Bilal, Nadia (2021) Detecting Location Names in French Life-Story Interview Transcripts. Masters thesis, Concordia University.
Text (application/pdf), 536 kB: BILAL_MCompSc_F2021.pdf (Accepted Version)
Abstract
Many real-world projects cannot leverage state-of-the-art techniques because labelled datasets are unavailable, no models are tailored to their specific information extraction needs, or no models exist for their language. In such scenarios, rule-based syntactic analysis is a more feasible way to extract specific entities and their relationships. In one such scenario, this thesis uses prepositions to detect location names in French life-story interview transcripts. Compared with human annotations (the gold standard), this basic methodology achieves an average precision of 80% and a recall of 83%. Locations identified in the context of prepositional phrases are then also extracted from the rest of the text. This extension of the basic methodology significantly increases recall, at the expense of precision: the extended version has a recall of 94% and a precision of 70%. An additional step that addresses a small set of false positives raises the precision of the extended version to 76% while keeping recall at 94%. In addition to location detection, this thesis presents a simple demonstration of using grammatical context to detect other entities of interest, specifically the interviewee’s recollections of the past about people associated with a location. The thesis thus demonstrates the utility of a rule-based, grammar-based methodology for detecting specific entities of interest and their relationships in the texts of specific projects.
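The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the preposition list, the capitalised-word pattern, the function names (`basic_mentions`, `extended_mentions`, `precision_recall`), and the toy sentence are all assumptions made for illustration. The abstract does not state the actual rules, tools, or matching criteria used.

```python
import re

# Assumed, deliberately small set of French prepositions (not the thesis's list).
PREP = r"(?i:à|au|aux|en|dans|vers|de|du)"
# Assumed shape of a candidate name: one or more capitalised, possibly hyphenated words.
NAME = r"[A-ZÀ-ÖØ-Ý][\w-]*(?:\s[A-ZÀ-ÖØ-Ý][\w-]*)*"
PREP_NAME = re.compile(rf"\b{PREP}\s+({NAME})")


def basic_mentions(text):
    """Basic step: character spans of names that directly follow a preposition."""
    return {m.span(1) for m in PREP_NAME.finditer(text)}


def extended_mentions(text):
    """Extended step: also mark every other occurrence of a name found by the basic step."""
    spans = set(basic_mentions(text))
    names = {text[start:end] for start, end in spans}
    for name in names:
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.add(m.span())
    return spans


def precision_recall(predicted, gold):
    """Exact-span precision and recall against gold-standard spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def spans_of(text, word):
    """Helper for building toy gold annotations: all spans of a literal word."""
    return {m.span() for m in re.finditer(re.escape(word), text)}


# Invented example sentence; "Marie" is a person, so it is a false positive.
text = (
    "Je suis née à Port-au-Prince. Montréal était froide quand nous sommes "
    "arrivés à Montréal avec la sœur de Marie. Marie aimait la neige."
)
gold = spans_of(text, "Port-au-Prince") | spans_of(text, "Montréal")

print("basic:   ", precision_recall(basic_mentions(text), gold))
print("extended:", precision_recall(extended_mentions(text), gold))
```

On this toy sentence the extended step recovers the sentence-initial "Montréal" that has no preceding preposition, so recall rises, but it also propagates the false positive "Marie", so precision falls. This mirrors only the direction of the trade-off reported in the abstract, not its figures.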
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Bilal, Nadia |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | September 2021 |
Thesis Supervisor(s): | Bergler, Sabine |
Keywords: | rule-based, grammar-based, information extraction, specific entities, complex relationships, interview transcripts, text analysis, named entity recognition |
ID Code: | 988905 |
Deposited By: | Nadia Bilal |
Deposited On: | 29 Nov 2021 16:30 |
Last Modified: | 29 Nov 2021 16:30 |