Costa, Nelson Filipe (2025) Harmonizing Divergence in Computational Discourse Analysis. PhD thesis, Concordia University.
Text (application/pdf): Costa_PhD_S2026.pdf (705kB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Understanding discourse is essential for advancing computational models from surface-level text processing to deeper language reasoning, as it captures the logical flow of ideas that shapes meaning into a coherent text. However, progress in computational discourse analysis is hindered by divergent theoretical frameworks, ambiguity in implicit discourse relations and a myopic focus on the English language.
This thesis addresses these challenges through three research objectives. First, it proposes an empirical mapping between the two most widely used discourse frameworks, the Rhetorical Structure Theory and the Penn Discourse Treebank, for explicit and implicit discourse relations. The proposed mapping covers 80.0% of the overlapping annotations between the most prominent corpora following each framework, laying the groundwork for cross-framework interoperability. Second, the thesis introduces a novel multi-task classification model, MTask, for Implicit Discourse Relation Recognition (IDRR). The model captures ambiguity in implicit relations by jointly learning multi-label representations of their senses. It establishes the first benchmark on multi-label IDRR and is also evaluated on traditional single-label IDRR. Third, the thesis extends the multi-label approach to other languages and presents a hierarchical classification model. This model outperforms MTask on English and establishes the first benchmark on multilingual and multi-label IDRR. The thesis further explores prompting strategies using recent large language models and shows that fine-tuning strategies still perform better on this task.
Together, these contributions advance the goal of harmonizing divergence in computational discourse analysis, offering more generalizable and inclusive methods for discourse modeling across frameworks, ambiguity and languages.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (PhD) |
| Authors: | Costa, Nelson Filipe |
| Institution: | Concordia University |
| Degree Name: | Ph. D. |
| Program: | Computer Science |
| Date: | 21 August 2025 |
| Thesis Supervisor(s): | Kosseim, Leila |
| ID Code: | 996415 |
| Deposited By: | Nelson Filipe Ferreira De Almeida Costa |
| Deposited On: | 29 Jun 2026 15:33 |
| Last Modified: | 29 Jun 2026 15:33 |