Investigating the Use of Transformer Based Embeddings for Multilingual Discourse Connective Identification

Chapados Muermans, Thomas (2022) Investigating the Use of Transformer Based Embeddings for Multilingual Discourse Connective Identification. Masters thesis, Concordia University.

Text (application/pdf)
Chapados-Muermans_MCompSci_S2022.pdf - Accepted Version

2MB

Abstract

In this thesis, we report on our experiments on multilingual discourse connective (DC) identification and show that language-specific BERT models appear to be sufficient even with little task-specific training data, achieving strong results without any additional handcrafted features. However, some languages are under-resourced and lack large corpora annotated with discourse connectives. To address this, we developed a methodology to induce large synthetic discourse-annotated corpora from a parallel word-aligned corpus. We evaluated our models on three languages (English, Turkish, and Mandarin Chinese) and applied our induction methodology to the English-Turkish and English-Chinese language pairs. All our models were evaluated in the context of the recent DISRPT 2021 Task 2 shared task. Results show that the F-measures achieved by our simple approach (93.12%, 94.42%, and 87.47% for English, Turkish, and Chinese, respectively) are at or near the state of the art for all three languages, without requiring any handcrafted features.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:	Thesis (Masters)
Authors:	Chapados Muermans, Thomas
Institution:	Concordia University
Degree Name:	M. Comp. Sc.
Program:	Computer Science
Date:	26 May 2022
Thesis Supervisor(s):	Kosseim, Leila
ID Code:	990633
Deposited By:	THOMAS CHAPADOS MUERMANS
Deposited On:	27 Oct 2022 14:24
Last Modified:	27 Oct 2022 14:24
