Hosseinmardi, Arman (2026) Semantic Analysis of Academic Citation Behavior: An Environment-Based Design Approach using Large Language Models. Masters thesis, Concordia University.
Text (application/pdf): Hosseinmardi_MA_S2026.pdf (1MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The exponential growth of academic literature has led to a reliance on quantitative bibliometrics, such as citation counts and h-indices, to measure scientific impact. However, these metrics remain "meaning-blind": they treat all citations as equal endorsements and fail to capture why a paper was cited or how faithfully it was represented. This thesis addresses the "Verification Gap", the systemic inability to verify citation accuracy at scale, by adopting an Environment-Based Design (EBD) methodology.
Framing citation verification as a transdisciplinary design problem, this study identifies a fundamental conflict between the built environment of digital archives and the cognitive limitations of the human environment. To resolve this conflict, a novel multi-agent system powered by Large Language Models (Gemini 3 Flash) was designed and implemented. The system operationalizes a recursive five-stage workflow whose stages include: (1) Zero-Shot Extraction of unstructured bibliographies using LLM-native structural reasoning; (2) Hybrid "Hunter" Retrieval, which uses a prioritized "White-Hat" waterfall strategy (Crossref, arXiv, CORE) to solve the "cold start" problem of full-text acquisition; and (3) Semantic Alignment, where the artifact identifies "Evidentiary Anchors" in the cited source to verify authorial claims. The analytical framework is grounded in the sociological taxonomy of Bornmann and Daniel (2008), which classifies citations into eight functional categories.
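The prioritized waterfall retrieval described above can be illustrated with a minimal sketch. This is not the thesis implementation: the function `waterfall_retrieve`, the `RetrievalResult` type, the in-memory `FAKE_INDEX`, and the example identifiers are all hypothetical, and the stub fetchers stand in for real calls to Crossref, arXiv, and CORE. The sketch only shows the general pattern of trying sources in priority order and stopping at the first one that yields full text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a reference identifier and returns full text, or None on a miss.
Fetcher = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class RetrievalResult:
    source: str      # name of the source that supplied the text
    full_text: str   # retrieved full text


def waterfall_retrieve(
    reference: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Optional[RetrievalResult]:
    """Try each source in priority order; return the first successful hit."""
    for name, fetch in sources:
        try:
            text = fetch(reference)
        except Exception:
            # A failing source should not abort the waterfall; fall through.
            continue
        if text:
            return RetrievalResult(source=name, full_text=text)
    return None  # no legitimate channel produced the full text


# Hypothetical in-memory stand-ins for the three services (assumption:
# real fetchers would query each service's public API instead).
FAKE_INDEX = {
    "Crossref": {"ref-a": "full text of ref-a via Crossref"},
    "arXiv": {"ref-b": "full text of ref-b via arXiv"},
    "CORE": {"ref-a": "full text of ref-a via CORE",
             "ref-c": "full text of ref-c via CORE"},
}


def make_stub(name: str) -> Fetcher:
    return lambda reference: FAKE_INDEX[name].get(reference)


SOURCES = [(name, make_stub(name)) for name in ("Crossref", "arXiv", "CORE")]

if __name__ == "__main__":
    for ref in ("ref-a", "ref-b", "ref-c", "ref-d"):
        hit = waterfall_retrieve(ref, SOURCES)
        print(ref, "->", hit.source if hit else "not retrieved")
```

In this sketch, `ref-a` is available from two stub sources, and the earlier one in the priority list wins; a reference found in no source returns `None`, which would count as a retrieval failure.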
The system was evaluated against a "Gold Standard" dataset of 50 citation pairs sampled from flagship design engineering journals (e.g., JMD, AIEDAM, CoDesign). Results demonstrate an 80% retrieval success rate through legitimate channels and a 100% detection rate for intentional citation distortions. In a hybrid evaluation comparing the system’s critique against human subject-matter experts, the artifact achieved a Cohen’s Kappa of 0.81, indicating substantial agreement. These findings confirm that modern LLMs, when constrained by EBD principles and strict structural prompting, can effectively serve as scalable "Augmented Intelligence" for research integrity. This research moves the field of scientometrics from simple sentiment classification toward deep semantic verification, ensuring that scientific impact is measured by the quality and accuracy of intellectual debt.
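For readers unfamiliar with the agreement statistic reported above, Cohen's Kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation. The function name `cohens_kappa` and the labels in the example are illustrative assumptions; they are not the thesis data or its evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's Kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in counts_a)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Made-up labels for four citation pairs (illustration only).
    system = ["support", "support", "critique", "critique"]
    expert = ["support", "support", "critique", "support"]
    print(cohens_kappa(system, expert))
```

In the made-up example, the raters agree on three of four items (p_o = 0.75) and chance agreement is p_e = 0.5, so the statistic is (0.75 - 0.5) / (1 - 0.5) = 0.5.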
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Hosseinmardi, Arman |
| Institution: | Concordia University |
| Degree Name: | M.A. |
| Program: | Information and Systems Engineering |
| Date: | 24 February 2026 |
| Thesis Supervisor(s): | Zeng, Yong |
| ID Code: | 996799 |
| Deposited By: | Arman Hosseinmardi |
| Deposited On: | 29 Jun 2026 14:51 |
| Last Modified: | 29 Jun 2026 14:51 |