Nguyen Dinh, Hieu (2019) Variational Approaches For Learning Finite Scaled Dirichlet Mixture Models. Masters thesis, Concordia University.
Preview |
Text (application/pdf)
5MBThesis.pdf - Accepted Version Available under License Spectrum Terms of Access. |
Abstract
With a massive amount of data created on a daily basis, the ubiquitous demand for data analysis is undisputed. Recent development of technology has made machine learning techniques applicable to various problems. Particularly, we emphasize on cluster analysis, an important aspect of data analysis. Recent works with excellent results on the aforementioned task using finite mixture models have motivated us to further explore their extents with different applications. In other words, the main idea of mixture model is that the observations are generated from a mixture of components, in each of which the probability distribution should provide strong flexibility in order to fit numerous types of data. Indeed, the Dirichlet family of distributions has been known to achieve better clustering performances than those of Gaussian when the data are clearly non-Gaussian, especially proportional data.
Thus, we introduce several variational approaches for finite Scaled Dirichlet mixture models. The proposed algorithms guarantee reaching convergence while avoiding the computational complexity of conventional Bayesian inference. In summary, our contributions are threefold. First, we propose a variational Bayesian learning framework for finite Scaled Dirichlet mixture models, in which the parameters and complexity of the models are naturally estimated through the process of minimizing the Kullback-Leibler (KL) divergence between the approximated posterior distribution and the true one. Secondly, we integrate component splitting into the first model, a local model selection scheme, which gradually splits the components based on their mixing weights to obtain the optimal number of components. Finally, an online variational inference framework for finite Scaled Dirichlet mixture models is developed by employing a stochastic approximation method in order to improve the scalability of finite mixture models for handling large scale data in real time. The effectiveness of our models is validated with real-life challenging problems including object, texture, and scene categorization, text-based and image-based spam email detection.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Nguyen Dinh, Hieu |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | 29 May 2019 |
Thesis Supervisor(s): | Bouguila, Nizar |
ID Code: | 985459 |
Deposited By: | Dinh Hieu Nguyen |
Deposited On: | 05 Feb 2020 14:26 |
Last Modified: | 05 Feb 2020 14:26 |
Repository Staff Only: item control page