Rengarajan, Nanda Kumar
ORCID: https://orcid.org/0009-0004-3598-4457
(2025)
PANER: A Paraphrase-Augmented Framework for Low-Resource Named Entity Recognition.
Masters thesis, Concordia University.
Text (application/pdf): Rengarajan_MSc_F2025.pdf (904kB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Named Entity Recognition (NER) is a critical task that requires substantial annotated data, which makes it challenging in low-resource scenarios where label acquisition is expensive. While zero-shot and instruction-tuned approaches have made progress, they often fail to generalize to domain-specific entities and do not make effective use of the limited data that is available. We present a lightweight few-shot NER framework that addresses these challenges through two key innovations: (1) a new instruction tuning (IT) template with a simplified output format, which combines principles from prior IT approaches to leverage the large context window of recent state-of-the-art LLMs; and (2) a strategic data augmentation technique that preserves entity information while paraphrasing the surrounding context, thereby expanding the training data without compromising semantic relationships. Experiments on benchmark datasets demonstrate that our method achieves performance comparable to that of state-of-the-art models on few-shot and zero-shot tasks, with our few-shot approach attaining an average F1 score of 80.1% on the CrossNER datasets. Models trained with our instruction tuning approach show consistent F1 improvements of up to 17 percentage points over comparable baselines, providing a promising solution for groups with limited NER training data and computational resources.
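The entity-preserving augmentation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one plausible approach, in which entity mentions are masked with placeholders before the context is paraphrased and restored afterwards. The names `mask_entities`, `augment`, and `toy_paraphrase` are hypothetical, and `toy_paraphrase` is a trivial stand-in for whatever paraphrasing model would be used in practice.

```python
from typing import Callable, List, Optional, Tuple

Entity = Tuple[str, str]  # (surface text, label); assumed representation


def mask_entities(sentence: str, entities: List[Entity]) -> Tuple[str, List[str]]:
    """Replace the first occurrence of each entity mention with a placeholder."""
    masked = sentence
    placeholders = []
    for i, (surface, _label) in enumerate(entities):
        if surface not in masked:
            raise ValueError(f"entity {surface!r} not found in sentence")
        placeholder = f"[ENT{i}]"
        masked = masked.replace(surface, placeholder, 1)
        placeholders.append(placeholder)
    return masked, placeholders


def augment(
    sentence: str,
    entities: List[Entity],
    paraphrase: Callable[[str], str],
) -> Optional[Tuple[str, List[Entity]]]:
    """Paraphrase the context around the entities; return None if any entity is lost."""
    masked, placeholders = mask_entities(sentence, entities)
    rewritten = paraphrase(masked)
    # Reject paraphrases that dropped or duplicated a placeholder.
    if any(rewritten.count(p) != 1 for p in placeholders):
        return None
    for placeholder, (surface, _label) in zip(placeholders, entities):
        rewritten = rewritten.replace(placeholder, surface)
    return rewritten, entities


def toy_paraphrase(text: str) -> str:
    """Hypothetical stand-in for a real paraphrasing model."""
    return text.replace("was born in", "is a native of")


example = augment(
    "Marie Curie was born in Warsaw.",
    [("Marie Curie", "person"), ("Warsaw", "location")],
    toy_paraphrase,
)
print(example)
```

Masking before paraphrasing is only one way to keep entity labels valid for the augmented sentence; the rejection step discards any paraphrase in which a mention was dropped or repeated, so the original annotations can be reused unchanged.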
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Rengarajan, Nanda Kumar |
| Institution: | Concordia University |
| Degree Name: | M. Sc. |
| Program: | Computer Science |
| Date: | August 2025 |
| Thesis Supervisor(s): | Yan, Jun and Wang, Chun |
| Keywords: | Named Entity Recognition, NLP, LLM |
| ID Code: | 995873 |
| Deposited By: | Nanda Kumar Kumar Rengarajan |
| Deposited On: | 04 Nov 2025 15:39 |
| Last Modified: | 04 Nov 2025 15:39 |