Oboh, Eromonsele Samuel (2016) Cluster Analysis of Multivariate Data Using Scaled Dirichlet Finite Mixture Model. Masters thesis, Concordia University.
Text (application/pdf): Oboh_MASc_F2016.pdf (2MB), Accepted Version. Available under License Spectrum Terms of Access.
Abstract
We have designed and implemented a finite mixture model based on the scaled Dirichlet distribution for the cluster analysis of multivariate proportional data. In this thesis, the task of cluster analysis first involves model selection, which helps to discover the number of natural groupings underlying a dataset. This is followed by estimation of the model parameters for those natural groupings using the expectation maximization framework.
This work aims to address the limited flexibility of the Dirichlet distribution by introducing a distribution with an extra model parameter. This is important because scientists and researchers are constantly searching for models that can fully describe the intrinsic characteristics of the observed data, and flexible models are increasingly used for this purpose.
In addition, we have applied our estimation and model selection algorithm to both synthetic and real datasets. Most importantly, we considered two areas of application: software module defect prediction and customer segmentation. Today, there is a growing challenge of detecting defective modules early in complex software development projects. This makes these sorts of machine learning algorithms crucial in driving key quality improvements that impact the bottom line and customer satisfaction.
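As an illustration of the extra model parameter mentioned in the abstract, the sketch below evaluates the log-density of a scaled Dirichlet distribution in the form commonly given in the literature, with a shape vector `alpha` and an additional scale vector `beta`. The parameterization, the function names, and the example values are assumptions made for illustration and are not taken from the thesis; when all entries of `beta` are equal, the density reduces to the ordinary Dirichlet.

```python
import math


def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log-density of a scaled Dirichlet at a point x on the simplex.

    Assumed form (not confirmed against the thesis):
      p(x) = Gamma(sum(alpha)) / prod(Gamma(alpha_d))
             * prod(beta_d**alpha_d * x_d**(alpha_d - 1))
             / (sum(beta_d * x_d)) ** sum(alpha)
    """
    alpha_sum = sum(alpha)
    # Normalizing constant, identical to that of the ordinary Dirichlet.
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(a) for a in alpha)
    # Numerator of the kernel: beta_d^alpha_d * x_d^(alpha_d - 1) for each dimension.
    log_kernel = sum(
        a * math.log(b) + (a - 1.0) * math.log(xi)
        for xi, a, b in zip(x, alpha, beta)
    )
    # Denominator of the kernel: (sum_d beta_d * x_d)^(sum of alpha).
    log_kernel -= alpha_sum * math.log(sum(b * xi for xi, b in zip(x, beta)))
    return log_norm + log_kernel


def dirichlet_logpdf(x, alpha):
    """Ordinary Dirichlet log-density, obtained by setting every scale to 1."""
    return scaled_dirichlet_logpdf(x, alpha, [1.0] * len(alpha))
```

In a mixture setting, a function like this would supply the per-component log-likelihoods used in the expectation step; the model selection and parameter updates described in the abstract are not reproduced here.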
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Oboh, Eromonsele Samuel |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | December 2016 |
Thesis Supervisor(s): | Bouguila, Nizar |
Keywords: | Unsupervised Learning, Software Modules Categorization, Finite Mixture Models, Scaled Dirichlet Distribution |
ID Code: | 982063 |
Deposited By: | EROMONSELE SAMU OBOH |
Deposited On: | 09 Jun 2017 14:49 |
Last Modified: | 18 Jan 2018 17:54 |