GHADIMIGHESHLAGHI, SHADAN ORCID: https://orcid.org/0009-0005-8822-6470 (2024) Adaptive Priors in Probabilistic Topic Models for Bursty Discovery in Textual Data. Masters thesis, Concordia University.
Text (application/pdf): Ghadimigheshlaghi_MASc_S2024.pdf (384kB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
In natural language processing, topic modeling plays an important role in detecting latent topics in large text collections. Models built on traditional representations, however, often fail to capture the 'burstiness' of natural language: the tendency for a word that has already occurred in a document to recur within that document. To address this limitation, we introduce two topic modeling frameworks: the Generalized Dirichlet Compound Multinomial Latent Dirichlet Allocation (GDCMLDA) and the Beta-Liouville Dirichlet Compound Multinomial Latent Dirichlet Allocation (BLDCMLDA). Both frameworks combine the Dirichlet Compound Multinomial distribution with a second distribution: the Generalized Dirichlet in GDCMLDA and the Beta-Liouville in BLDCMLDA. This combination makes it possible to model burstiness while accommodating flexible and varied topic proportion patterns. In comprehensive evaluations across multiple benchmark text datasets, GDCMLDA and BLDCMLDA outperform existing models, as shown by improved perplexity and coherence scores. Our results confirm that the proposed models capture the complexities of word usage dynamics, contributing a significant advancement in topic modeling.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | GHADIMIGHESHLAGHI, SHADAN |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | March 2024 |
Thesis Supervisor(s): | Bouguila, Nizar |
ID Code: | 993505 |
Deposited By: | Shadan Ghadimigheshlaghi |
Deposited On: | 05 Jun 2024 16:52 |
Last Modified: | 05 Jun 2024 16:52 |