Nasfi, Rim (2022) Modeling Semi-Bounded Support Data using Non-Gaussian Hidden Markov Models with Applications. PhD thesis, Concordia University.
Text (application/pdf): Nasfi_PhD_F2022.pdf (23MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
With the exponential growth of data in all formats, and with data categorization rapidly becoming one of the most essential components of data analysis, it is crucial to research and identify hidden patterns in order to extract valuable information that supports accurate and sound decision making. Because data modeling is the first stage in accomplishing any of these tasks, its accuracy and consistency are critical for the later development of a complete data processing framework. Furthermore, selecting a distribution that corresponds to the nature of the data is a particularly interesting subject of research. Hidden Markov Models (HMMs) are among the most powerful probabilistic models and have recently made a strong resurgence in machine learning, despite having been known for decades. Their ever-increasing application in a variety of critical practical settings to model varied and heterogeneous data (image, video, audio, time series, etc.) has motivated countless extensions. Equally prevalent, finite mixture models are a potent tool for modeling heterogeneous data of various natures. The overuse of Gaussian mixture models for data modeling in the literature is one of the main driving forces for this thesis. This work focuses on modeling positive vectors, which naturally occur in a variety of real-life applications, by proposing novel HMM extensions that use the Inverted Dirichlet, the Generalized Inverted Dirichlet and the Beta-Liouville mixture models as emission probabilities. These extensions are motivated by the proven capacity of these mixtures to deal with positive vectors and by the need to overcome the inability of mixture models to account for any ordering or temporal constraints in the data. We utilize the aforementioned distributions to derive several theoretical approaches for learning and deploying Hidden Markov Models in real-world settings.
Further, we study online learning of parameters and explore the integration of a feature selection methodology. Extensive experimentation on highly challenging applications, including image categorization, video categorization, indoor occupancy estimation and Natural Language Processing, reveals scenarios in which such models are appropriate to apply and demonstrates their effectiveness compared to the extensively used Gaussian-based models.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Nasfi, Rim |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Information and Systems Engineering |
Date: | April 2022 |
Thesis Supervisor(s): | Bouguila, Nizar |
ID Code: | 990660 |
Deposited By: | Rim Nasfi |
Deposited On: | 27 Oct 2022 14:26 |
Last Modified: | 27 Oct 2022 14:26 |