PANER: A Paraphrase-Augmented Framework for Low-Resource Named Entity Recognition

Rengarajan, Nanda Kumar ORCID: https://orcid.org/0009-0004-3598-4457 (2025) PANER: A Paraphrase-Augmented Framework for Low-Resource Named Entity Recognition. Masters thesis, Concordia University.

Text (application/pdf)
Rengarajan_MSc_F2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.
904kB

Abstract

Named Entity Recognition (NER) is a critical task that requires substantial annotated data, making it challenging in low-resource scenarios where label acquisition is expensive. While zero-shot and instruction-tuned approaches have made progress, they often fail to generalize to domain-specific entities and do not effectively utilize the limited available data. We present a lightweight few-shot NER framework that addresses these challenges through two key innovations: (1) a new instruction tuning (IT) template with a simplified output format, which combines principles from prior IT approaches to leverage the large context window of recent state-of-the-art LLMs; and (2) a strategic data augmentation technique that preserves entity information while paraphrasing the surrounding context, thereby expanding our training data without compromising semantic relationships. Experiments on benchmark datasets demonstrate that our method achieves performance comparable to that of state-of-the-art models on few-shot and zero-shot tasks, with our few-shot approach attaining an average F1 score of 80.1% on the CrossNER datasets. Models trained with our instruction tuning approach exhibit consistent F1 improvements of up to 17 percentage points over comparable baselines, providing a promising solution for groups with limited NER training data and computational resources.
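The idea behind the second contribution, paraphrasing the context while leaving entity mentions untouched, can be illustrated with a minimal sketch. This is not code from the thesis: the placeholder scheme (`[ENT0]`, `[ENT1]`, ...), the function names `mask_entities`, `restore_entities` and `augment`, and the rule-based `toy_paraphraser` are all assumptions made for illustration. In practice an LLM call would presumably take the place of `paraphrase_fn`.

```python
from typing import Callable, List, Optional, Tuple

# (start, end, label) with character offsets, end exclusive. Hypothetical format.
Entity = Tuple[int, int, str]
Slot = Tuple[str, str, str]  # (placeholder, surface form, label)


def mask_entities(text: str, entities: List[Entity]) -> Tuple[str, List[Slot]]:
    """Replace each (non-overlapping) entity span with a placeholder like [ENT0]."""
    pieces, slots, cursor = [], [], 0
    for i, (start, end, label) in enumerate(sorted(entities)):
        placeholder = f"[ENT{i}]"
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        slots.append((placeholder, text[start:end], label))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), slots


def restore_entities(
    paraphrase: str, slots: List[Slot]
) -> Optional[Tuple[str, List[Entity]]]:
    """Put the original surface forms back and recompute their offsets.

    Returns None if the paraphraser dropped or duplicated a placeholder,
    so that a corrupted example is discarded rather than mislabeled.
    """
    found = []
    for placeholder, surface, label in slots:
        if paraphrase.count(placeholder) != 1:
            return None
        found.append((paraphrase.index(placeholder), placeholder, surface, label))

    pieces, entities, cursor, length = [], [], 0, 0
    for pos, placeholder, surface, label in sorted(found):
        chunk = paraphrase[cursor:pos]
        pieces.append(chunk)
        length += len(chunk)
        entities.append((length, length + len(surface), label))
        pieces.append(surface)
        length += len(surface)
        cursor = pos + len(placeholder)
    pieces.append(paraphrase[cursor:])
    return "".join(pieces), entities


def augment(
    text: str, entities: List[Entity], paraphrase_fn: Callable[[str], str]
) -> Optional[Tuple[str, List[Entity]]]:
    """Mask entities, paraphrase the masked sentence, then restore the entities."""
    masked, slots = mask_entities(text, entities)
    return restore_entities(paraphrase_fn(masked), slots)


def toy_paraphraser(masked: str) -> str:
    """Stand-in for a real paraphrasing model (assumption, for demonstration only)."""
    return masked.replace("was born in", "is a native of")


text = "Marie Curie was born in Warsaw."
entities = [(0, 11, "person"), (24, 30, "location")]
result = augment(text, entities, toy_paraphraser)
if result is not None:
    new_text, new_entities = result
    print(new_text)
    print([(new_text[s:e], label) for s, e, label in new_entities])
```

Rejecting any paraphrase in which a placeholder is missing or repeated is one simple way to keep the augmented labels aligned with the text; a real pipeline might retry the paraphraser instead of dropping the example.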

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Rengarajan, Nanda Kumar
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: August 2025
Thesis Supervisor(s): Yan, Jun and Wang, Chun
Keywords: Named Entity Recognition, NLP, LLM
ID Code: 995873
Deposited By: Nanda Kumar Kumar Rengarajan
Deposited On: 04 Nov 2025 15:39
Last Modified: 04 Nov 2025 15:39
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
