Adaptive Priors in Probabilistic Topic Models for Bursty Discovery in Textual Data

GHADIMIGHESHLAGHI, SHADAN ORCID: https://orcid.org/0009-0005-8822-6470 (2024) Adaptive Priors in Probabilistic Topic Models for Bursty Discovery in Textual Data. Masters thesis, Concordia University.

Text (application/pdf)
Ghadimigheshlaghi_MASc_S2024.pdf - Accepted Version
Available under License Spectrum Terms of Access.
384kB

Abstract

In natural language processing, topic modeling plays an important role in detecting latent topics in large text collections. Traditional topic models, however, often fail to capture the 'burstiness' of natural language: the tendency for a word that has already occurred in a document to recur within that same document. To address this limitation, we introduce two topic modeling frameworks: the Generalized Dirichlet Compound Multinomial Latent Dirichlet Allocation (GDCMLDA) and the Beta-Liouville Dirichlet Compound Multinomial Latent Dirichlet Allocation (BLDCMLDA). Both frameworks combine the Dirichlet Compound Multinomial distribution with, respectively, the Generalized Dirichlet and Beta-Liouville distributions. This combination makes it possible to model burstiness while still allowing flexible topic proportion patterns. In comprehensive evaluations across multiple benchmark text datasets, GDCMLDA and BLDCMLDA outperform existing models on performance metrics including perplexity and coherence. These results indicate that the proposed models capture the complexities of word usage dynamics, contributing to an advancement in topic modeling.
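
As an illustration of the burstiness the abstract describes, the following is a minimal sketch, not the thesis's models: it samples word counts from a plain Dirichlet Compound Multinomial using the Polya urn scheme and compares them with an ordinary multinomial. The function names, vocabulary size, document length, and the value of alpha are all assumptions chosen for the example.

```python
import random

def polya_urn_document(alpha, length, rng):
    """Sample word counts for one document from a Dirichlet compound
    multinomial via the Polya urn scheme: every time a word is drawn,
    its weight grows by one, so it becomes more likely to recur."""
    weights = list(alpha)
    counts = [0] * len(alpha)
    for _ in range(length):
        w = rng.choices(range(len(weights)), weights=weights)[0]
        counts[w] += 1
        weights[w] += 1.0
    return counts

def multinomial_document(probs, length, rng):
    """Sample word counts with fixed word probabilities (no reinforcement)."""
    counts = [0] * len(probs)
    for w in rng.choices(range(len(probs)), weights=probs, k=length):
        counts[w] += 1
    return counts

def mean_max_count(sampler, n_docs, rng):
    """Average, over n_docs documents, of the most frequent word's count."""
    return sum(max(sampler(rng)) for _ in range(n_docs)) / n_docs

rng = random.Random(0)
vocab_size, doc_length = 20, 50          # illustrative sizes (assumed)
alpha = [0.1] * vocab_size               # small alpha gives strong burstiness
probs = [1.0 / vocab_size] * vocab_size  # uniform multinomial baseline

bursty = mean_max_count(lambda r: polya_urn_document(alpha, doc_length, r), 200, rng)
flat = mean_max_count(lambda r: multinomial_document(probs, doc_length, r), 200, rng)
print("mean count of the most frequent word, DCM:", bursty)
print("mean count of the most frequent word, multinomial:", flat)
```

Under these assumed settings the urn-based documents concentrate their mass on a few repeated words, while the multinomial documents spread counts far more evenly. That concentration is the within-document recurrence that the abstract calls burstiness.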

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: GHADIMIGHESHLAGHI, SHADAN
Institution: Concordia University
Degree Name: M.A.Sc.
Program: Quality Systems Engineering
Date: March 2024
Thesis Supervisor(s): Bouguila, Nizar
ID Code: 993505
Deposited By: Shadan Ghadimigheshlaghi
Deposited On: 05 Jun 2024 16:52
Last Modified: 05 Jun 2024 16:52
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
