
Novel Mixture Allocation Models for Topic Learning



Mathin Henry, Kamal Maanicshah (2023) Novel Mixture Allocation Models for Topic Learning. PhD thesis, Concordia University.

Text (application/pdf)
MathinHenry_PhD_S2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Unsupervised learning has been an active area of research in recent years, and novel algorithms built on unsupervised learning methodologies are being developed to solve many real-world problems. Topic modelling is one such methodology: it identifies patterns within data as topics. The introduction of latent Dirichlet allocation (LDA) has bolstered research on topic modelling, with many variants modified for specific applications. However, LDA's basic assumption of a Dirichlet prior over topic proportions might not hold in certain real-world scenarios.

Hence, in this thesis we explore the use of the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions as alternative priors for topic proportions. In addition, we assume a mixture of distributions over topic proportions, which provides a better fit to the data. To apply the resulting models to real-time streaming data, we also provide an online learning solution. A supervised version of the learning framework is also provided and is shown to be advantageous when labelled data are available.
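Both alternative priors can be sampled with short constructions built from Beta draws, which makes their extra flexibility over the Dirichlet easy to see. The sketch below is a minimal illustration under stated assumptions: it uses NumPy, the standard Connor-Mosimann stick-breaking construction for the generalized Dirichlet, and the usual "Beta-scaled Dirichlet" construction for the Beta-Liouville. The function names, parameter names, and the mixture helper are hypothetical and are not taken from the thesis.

```python
import numpy as np

# Hypothetical helpers illustrating standard constructions; not the thesis code.

def sample_generalized_dirichlet(alpha, beta, rng):
    """Draw one (D+1)-dim vector from GD(alpha_1..alpha_D, beta_1..beta_D)
    via Connor-Mosimann stick-breaking: each break d has its own Beta(alpha_d, beta_d)."""
    v = rng.beta(alpha, beta)              # D independent Beta draws
    remaining = np.cumprod(1.0 - v)        # stick left after each break
    theta = np.empty(len(v) + 1)
    theta[0] = v[0]
    theta[1:-1] = v[1:] * remaining[:-1]   # break d takes v_d of what is left
    theta[-1] = remaining[-1]              # last component gets the leftover stick
    return theta

def sample_beta_liouville(alpha_vec, a, b, rng):
    """Draw one (D+1)-dim vector from a Beta-Liouville distribution:
    u ~ Beta(a, b) sets the total mass of the first D components,
    which is split according to a Dirichlet(alpha_vec) draw."""
    u = rng.beta(a, b)
    y = rng.dirichlet(alpha_vec)
    return np.append(u * y, 1.0 - u)

def sample_gd_mixture(weights, alphas, betas, rng):
    """Hypothetical mixture prior: pick component k with probability weights[k],
    then draw topic proportions from that component's GD."""
    k = rng.choice(len(weights), p=weights)
    return k, sample_generalized_dirichlet(alphas[k], betas[k], rng)

rng = np.random.default_rng(0)
theta_gd = sample_generalized_dirichlet(np.array([2.0, 1.0, 3.0]), np.array([4.0, 2.0, 1.0]), rng)
theta_bl = sample_beta_liouville(np.array([1.0, 1.0, 1.0]), 5.0, 2.0, rng)
```

In the GD sketch every break has its own pair (alpha_d, beta_d), so means and variances of the components are not tied together the way a Dirichlet ties them to a single concentration parameter. The mixture helper only selects a component and samples from it; fitting such a mixture is not shown.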

The topics derived in this way may not always be accurate. To alleviate this problem, we integrate an interactive approach that uses input from the user to improve the quality of the identified topics. We have also adapted our models to applications such as parallel topic extraction from multilingual texts and content-based recommendation systems, demonstrating their adaptability. For multilingual topic extraction, we use global topic proportions sampled from a Dirichlet process (DP), and for recommendation systems, we take advantage of word co-occurrences.
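Global topic proportions drawn from a Dirichlet process can be illustrated with the standard stick-breaking (GEM) construction. The sketch below is a hedged illustration only: it truncates the DP at a fixed number of topics, a common approximation in variational treatments but an assumption here, and the function and parameter names (truncated_stick_breaking, gamma, num_topics) are hypothetical rather than taken from the thesis.

```python
import numpy as np

# Hypothetical sketch of truncated DP stick-breaking; not the thesis code.

def truncated_stick_breaking(gamma, num_topics, rng):
    """Approximate the weights of a DP(gamma, H) draw using num_topics sticks.
    Weight k is v_k times the stick remaining after the first k-1 breaks."""
    v = rng.beta(1.0, gamma, size=num_topics)
    v[-1] = 1.0  # truncation: the last break takes the whole remaining stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(1)
global_weights = truncated_stick_breaking(gamma=2.0, num_topics=20, rng=rng)
```

Smaller values of gamma concentrate mass on the first few topics. In a multilingual setting, one such weight vector would presumably be shared across languages, which is what makes the proportions global; how language-specific proportions are tied to it is defined in the thesis and is not reproduced here.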

For inference, we use a variational approach, which makes the solutions of the models easier to compute. The applications on which we validated our models demonstrate the efficiency of the proposed models.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (PhD)
Authors: Mathin Henry, Kamal Maanicshah
Institution: Concordia University
Degree Name: Ph. D.
Program: Information and Systems Engineering
Date: 8 March 2023
Thesis Supervisor(s): Bouguila, Nizar and Amayri, Manar
ID Code: 991858
Deposited By: Kamal Maanicshah Mathin Henry
Deposited On: 21 Jun 2023 14:30
Last Modified: 21 Jun 2023 14:30
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
