Detecting Location Names in French Life-Story Interview Transcripts



Bilal, Nadia (2021) Detecting Location Names in French Life-Story Interview Transcripts. Masters thesis, Concordia University.

Many real-world projects cannot take advantage of state-of-the-art techniques because labelled datasets are unavailable, because no models are tailored to their specific information extraction needs, or because no models exist for their language. In such scenarios, a rule-based syntactic analysis is a more feasible way to extract specific entities and their relationships. Working in such a scenario, this thesis uses prepositions to detect location names in French life-story interview transcripts. Compared with human annotations (the gold standard), this basic methodology achieves an average precision of 80% and a recall of 83%. The locations identified in the context of prepositional phrases are then also extracted from the rest of the text. This extension of the basic methodology increases recall significantly, but at the expense of precision: the extended version reaches a recall of 94% with a precision of 70%. An additional step that addresses a small set of false positives raises the precision of the extended version to 76% while recall remains at 94%. In addition to location detection, this thesis presents a simple demonstration of using grammatical context to detect other entities of interest, specifically the interviewee’s recollection of the past with respect to people associated with a location. The thesis thus demonstrates the utility of a rule-based, grammar-based methodology for detecting specific entities of interest and their relationships in the texts of specific projects.
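
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the preposition list, the capitalization heuristic for candidate names, and the function names `detect_basic`, `detect_extended`, and `precision_recall` are all assumptions made for illustration, and the thesis's actual rules are not given in this record.

```python
import re

# Hypothetical, deliberately small list of French prepositions that often
# introduce a place; the thesis's actual rule set is not stated in the abstract.
PREPOSITIONS = ("à", "au", "aux", "en", "dans", "vers", "de", "du")

# Assumption: a candidate name is one or more capitalized tokens
# (hyphens and apostrophes allowed inside a token).
CAPITALIZED = r"[A-ZÀ-ÖØ-Ý][\w'-]*"
NAME = rf"{CAPITALIZED}(?: {CAPITALIZED})*"

# The preposition is matched case-insensitively; the name must stay capitalized.
PREP_PATTERN = re.compile(rf"\b(?i:{'|'.join(PREPOSITIONS)})\s+({NAME})")


def detect_basic(text):
    """Basic stage: spans of capitalized names directly after a preposition."""
    return {m.span(1) for m in PREP_PATTERN.finditer(text)}


def detect_extended(text):
    """Extended stage: every occurrence of a name found by the basic stage."""
    names = {text[start:end] for start, end in detect_basic(text)}
    spans = set()
    for name in names:
        # Match the name only as a whole word, anywhere in the text.
        for m in re.finditer(rf"(?<!\w){re.escape(name)}(?!\w)", text):
            spans.add(m.span())
    return spans


def precision_recall(predicted, gold):
    """Precision and recall of predicted spans against gold-standard spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

On a sentence pair such as "Je suis née à Montréal, mais ma famille vivait en Haïti. Montréal a beaucoup changé.", the basic stage would pick up only the two mentions that follow a preposition, while the extended stage would also recover the sentence-initial "Montréal". The same propagation step can introduce false positives when a detected name is reused for something other than a place, which is consistent with the precision drop the abstract reports.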

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Bilal, Nadia
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: September 2021
Thesis Supervisor(s): Bergler, Sabine
Keywords: rule-based, grammar-based, information extraction, specific entities, complex relationships, interview transcripts, text analysis, named entity recognition
ID Code: 988905
Deposited By: Nadia Bilal
Deposited On: 29 Nov 2021 16:30
Last Modified: 29 Nov 2021 16:30


