
Semantic Analysis of Academic Citation Behavior: An Environment-Based Design Approach using Large Language Models

Hosseinmardi, Arman (2026) Semantic Analysis of Academic Citation Behavior: An Environment-Based Design Approach using Large Language Models. Masters thesis, Concordia University.

Full text: Hosseinmardi_MA_S2026.pdf (Accepted Version; PDF, 1 MB). Available under License Spectrum Terms of Access.

Abstract

The exponential growth of academic literature has led to a reliance on quantitative bibliometrics, such as citation counts and h-indices, to measure scientific impact. However, these metrics remain "meaning-blind": they treat all citations as equal endorsements and fail to capture why a paper was cited or whether it was represented faithfully. This thesis addresses the "Verification Gap," the systemic inability to verify citation accuracy at scale, by adopting an Environment-Based Design (EBD) methodology.
Framing citation verification as a transdisciplinary design problem, this study identifies a fundamental conflict between the built environment of digital archives and the cognitive limitations of the human environment. To resolve this conflict, a novel multi-agent system powered by Large Language Models (Gemini 3 Flash) was designed and implemented. The system operationalizes a recursive five-stage workflow whose core stages include: (1) Zero-Shot Extraction of unstructured bibliographies using LLM-native structural reasoning; (2) Hybrid "Hunter" Retrieval, which uses a prioritized "White-Hat" waterfall strategy (Crossref, arXiv, CORE) to solve the "cold start" problem of full-text acquisition; and (3) Semantic Alignment, in which the artifact identifies "Evidentiary Anchors" in the cited source to verify authorial claims. The analytical framework is grounded in the sociological taxonomy of Bornmann and Daniel (2008), which classifies citations into eight functional categories.
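The "White-Hat" waterfall in stage (2) amounts to querying legitimate sources in a fixed priority order and stopping at the first one that yields full text. The following is a minimal Python sketch under stated assumptions: the `Reference` schema, `waterfall_retrieve`, and the three stub fetchers are hypothetical and do not call the real Crossref, arXiv, or CORE services. A real implementation would replace each stub with an HTTP client for that service while keeping the same fall-through logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical minimal schema for one extracted bibliography entry."""
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None


# A fetcher returns the full text for a reference, or None if it has none.
Fetcher = Callable[[Reference], Optional[str]]


def waterfall_retrieve(
    ref: Reference, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[str]]:
    """Query sources in priority order and stop at the first non-empty full text."""
    for name, fetch in sources:
        try:
            text = fetch(ref)
        except Exception:
            # Treat a failing source (timeout, HTTP error, ...) as a miss and fall through.
            text = None
        if text:  # None or an empty string counts as a miss
            return name, text
    return None, None  # no legitimate source had the full text


# Hypothetical stubs standing in for real Crossref, arXiv and CORE clients.
def crossref_stub(ref: Reference) -> Optional[str]:
    return None  # e.g. metadata resolved, but no open full-text link


def arxiv_stub(ref: Reference) -> Optional[str]:
    return f"arXiv full text of {ref.title}" if ref.arxiv_id else None


def core_stub(ref: Reference) -> Optional[str]:
    return f"CORE full text of {ref.title}" if ref.doi else None


# Priority order as listed in the abstract: Crossref, then arXiv, then CORE.
SOURCES = [("crossref", crossref_stub), ("arxiv", arxiv_stub), ("core", core_stub)]

refs = [
    Reference("Paper A", arxiv_id="0000.00000"),
    Reference("Paper B", doi="10.0000/example"),
    Reference("Paper C"),
]
hits = [waterfall_retrieve(r, SOURCES)[0] for r in refs]
print(hits)  # ['arxiv', 'core', None]
```

Ordering the sources explicitly keeps the policy auditable: only the listed legitimate channels are ever consulted, and a reference that none of them can serve is reported as a miss rather than fetched by other means.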
The system was evaluated against a "Gold Standard" dataset of 50 citation pairs sampled from flagship design engineering journals (e.g., JMD, AIEDAM, CoDesign). Results demonstrate an 80% retrieval success rate through legitimate channels and a 100% detection rate for intentional citation distortions. In a hybrid evaluation comparing the system's critiques with those of human subject-matter experts, the artifact achieved a Cohen's Kappa of 0.81, indicating strong agreement. These findings confirm that modern LLMs, when constrained by EBD principles and strict structural prompting, can effectively serve as scalable "Augmented Intelligence" for research integrity. This research moves scientometrics beyond simple sentiment classification toward deep semantic verification, helping to ensure that scientific impact reflects how accurately intellectual debts are acknowledged.
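Cohen's Kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed fraction of items on which both raters agree and p_e is the chance agreement implied by each rater's label frequencies. A minimal Python sketch of that computation follows; the function name and the toy labels are illustrative only and are neither the thesis data nor its evaluation code. In practice, `sklearn.metrics.cohen_kappa_score` computes the same quantity.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy example (not thesis data): system verdict vs. expert verdict on four citations.
system = ["faithful", "faithful", "distorted", "faithful"]
expert = ["faithful", "distorted", "distorted", "faithful"]
print(cohens_kappa(system, expert))  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

The toy raters agree on three of four items (p_o = 0.75), but their label frequencies alone would produce 50% agreement by chance, so only half of the achievable above-chance agreement is realized.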

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Hosseinmardi, Arman
Institution: Concordia University
Degree Name: M.A.
Program: Information and Systems Engineering
Date: 24 February 2026
Thesis Supervisor(s): Zeng, Yong
ID Code: 996799
Deposited By: Arman Hosseinmardi
Deposited On: 29 Jun 2026 14:51
Last Modified: 29 Jun 2026 14:51
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
