Arjmandiasl, Zeinab (2020) Variational Learning for Finite Shifted-Scaled Dirichlet Mixture Model and Its Applications. Masters thesis, Concordia University.
Text (application/pdf): Arjmandiasl_MASc_S2020.pdf (1MB), Accepted Version. Available under License Spectrum Terms of Access.
Abstract
With the huge amount of data produced every day, interest in data mining and machine learning techniques has been growing. Ongoing advancement of technology has made AI systems subject to different issues. Data clustering, the process of grouping similar observations into the same subset, is an important aspect of data analysis. Among known clustering techniques, finite mixture models have led to outstanding results, which has inspired further exploration of various mixture models and applications. The main idea of this clustering technique is to fit a mixture of components, generated from a predetermined probability distribution, to the data by approximating the parameters of the components. Therefore, choosing a proper distribution based on the type of data is another crucial step in data analysis. Although the Gaussian distribution has been widely used with mixture models, the Dirichlet family of distributions is known to achieve better results, particularly when dealing with proportional and non-Gaussian data.
Another crucial part of statistical modelling is the learning process. Among the conventional estimation approaches, Maximum Likelihood (ML) is widely used because it is simple to implement, but it has some drawbacks. The Bayesian approach overcomes some of the disadvantages of the ML approach by taking prior knowledge into account. However, it creates new issues, such as the need for additional estimation methods due to the intractability of the parameters' marginal probabilities. In this thesis, these limitations are discussed and addressed by defining a variational learning framework for the finite shifted-scaled Dirichlet mixture model. The motivation behind applying variational inference is that, compared to the conventional Bayesian approach, it is much less computationally costly. Furthermore, in this method, the optimal number of components is estimated automatically and simultaneously with the parameter approximation, while convergence is guaranteed. The performance of our model, in terms of clustering accuracy, is validated on challenging real-world medical applications, including image processing, namely malaria detection, breast cancer diagnosis, and cardiovascular disease detection, as well as text-based spam email detection. Finally, to evaluate the effectiveness of our model, it is compared with four other widely used methods.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Arjmandiasl, Zeinab |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | 24 April 2020 |
Thesis Supervisor(s): | Bouguila, Nizar and Bentahar, Jamal |
ID Code: | 986770 |
Deposited By: | Zeinab Arjmandiasl |
Deposited On: | 19 Apr 2021 17:26 |
Last Modified: | 19 Apr 2021 17:26 |