Su, Xuanbo (2021) Fully Bayesian Inference for Finite and Infinite Discrete Exponential Mixture Models. Master's thesis, Concordia University.
Text (application/pdf): Xuanbo_MASc_F2021.pdf, Accepted Version (704 kB)
Abstract
Count data often appear in natural language processing and computer vision applications. For example, when clustering images or text documents, each image or document can be described by a histogram of visual words or text words. In real applications, these frequency vectors are often high-dimensional and sparse. In this setting, hierarchical Bayesian modeling frameworks have shown the ability to model the dependence among repeated occurrences of the same word, known as 'burstiness'.
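Burstiness can be made concrete with the simplest hierarchical count model, the Dirichlet compound multinomial (DCM), also called the Pólya or Dirichlet-multinomial distribution. It is used here only as a stand-in and is not one of the models studied in the thesis. The sketch compares a plain multinomial with a DCM that has the same mean word probabilities. It scores two eight-word documents over a hypothetical four-word vocabulary. All parameter values are illustrative assumptions.

```python
import math


def multinomial_logpmf(x, p):
    """log P(x | p) for word counts x under a multinomial with word probabilities p."""
    out = math.lgamma(sum(x) + 1)
    for xi, pi in zip(x, p):
        out += xi * math.log(pi) - math.lgamma(xi + 1)
    return out


def dcm_logpmf(x, alpha):
    """log P(x | alpha) under the Dirichlet compound multinomial: word probabilities
    are first drawn from Dirichlet(alpha), then the counts x from a multinomial."""
    n, a = sum(x), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for xi, ai in zip(x, alpha):
        out += math.lgamma(xi + ai) - math.lgamma(ai) - math.lgamma(xi + 1)
    return out


p = [0.25] * 4          # every word equally likely on average
alpha = [0.5] * 4       # same mean as p, small total concentration -> bursty
bursty = [8, 0, 0, 0]   # one word repeated eight times
spread = [2, 2, 2, 2]   # same length, counts spread evenly

for name, x in (("bursty", bursty), ("spread", spread)):
    print(name, math.exp(multinomial_logpmf(x, p)), math.exp(dcm_logpmf(x, alpha)))
```

With these settings, the multinomial rates the evenly spread document exactly 2520 times as likely as the bursty one. The DCM reverses that ranking and prefers the bursty document by a factor of about ten. Once a word has appeared, the Dirichlet layer makes further repeats of that word more likely.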
Moreover, approximating these models by exponential-family distributions improves computational efficiency, especially for high-dimensional count data and large data sets. However, classical deterministic approaches such as expectation-maximization (EM) often fall short in complex real-life applications. This thesis explores fully Bayesian inference for finite discrete exponential mixture models of the Multinomial Generalized Dirichlet (EMGD), Multinomial Beta-Liouville (EMBL), Multinomial Scaled Dirichlet (EMSD), and Multinomial Shifted Scaled Dirichlet (EMSSD) distributions. Finite mixtures of these models have already shown superior performance in clustering real data sets when learned with EM. The approaches proposed in this thesis are based on Markov chain Monte Carlo simulation, namely Gibbs sampling combined with a Metropolis-Hastings step. They use conjugate priors for the exponential family to construct the required posteriors via Bayes' theorem. Furthermore, we present infinite counterparts based on Dirichlet processes, which yield clustering algorithms that do not require the number of mixture components to be specified in advance. The performance of our Bayesian approaches was evaluated on challenging real-world applications: text sentiment analysis, fake news detection, and gender recognition from human faces.
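The inference scheme described in the abstract alternates Gibbs updates with a Metropolis-Hastings step. The sketch below shows that pattern on a deliberately simpler stand-in model: a finite mixture of Dirichlet compound multinomial (DCM) components. It is not the thesis's EMGD, EMBL, EMSD, or EMSSD sampler. All of the following are assumptions of this sketch:

- a symmetric Dirichlet prior with all parameters equal to 1 on the mixing weights, which is conjugate and therefore gets an exact Gibbs draw;
- an Exponential(1) prior on each DCM parameter;
- a Gaussian random walk on log α for those non-conjugate parameters;
- the synthetic two-group data in the usage lines.

The names `gibbs_mh`, `dcm_loglik`, and `log_target` are hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def dcm_loglik(X, alpha):
    """Row-wise DCM log-likelihood of count rows X given alpha, omitting the
    multinomial coefficient (it does not depend on alpha)."""
    n = X.sum(axis=1)
    a = alpha.sum()
    return (gammaln(a) - gammaln(n + a)
            + (gammaln(X + alpha) - gammaln(alpha)).sum(axis=1))


def log_target(log_alpha, Xk):
    """Unnormalized log posterior of log(alpha) for one component: DCM likelihood,
    an assumed Exponential(1) prior on each alpha_d, and the log-transform Jacobian."""
    alpha = np.exp(log_alpha)
    return dcm_loglik(Xk, alpha).sum() - alpha.sum() + log_alpha.sum()


def gibbs_mh(X, K, n_iter=200, step=0.2, seed=0):
    """Hypothetical Metropolis-within-Gibbs sampler for a K-component DCM mixture."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    z = rng.integers(K, size=N)            # random initial labels
    log_alpha = np.zeros((K, D))           # start every alpha_kd at 1
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Gibbs: draw each label from p(z_i = k | rest), prop. to weights_k * DCM(x_i | alpha_k)
        logp = np.log(weights) + np.column_stack(
            [dcm_loglik(X, np.exp(log_alpha[k])) for k in range(K)])
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])
        # Gibbs: conjugate Dirichlet(1 + counts) posterior for the mixing weights
        weights = rng.dirichlet(1.0 + np.bincount(z, minlength=K))
        # Metropolis-Hastings: symmetric Gaussian random walk on log(alpha_k)
        for k in range(K):
            Xk = X[z == k]
            prop = log_alpha[k] + step * rng.standard_normal(D)
            log_ratio = log_target(prop, Xk) - log_target(log_alpha[k], Xk)
            if np.log(rng.random()) < log_ratio:
                log_alpha[k] = prop
    return z, weights, np.exp(log_alpha)


# Usage on synthetic counts: two groups with different dominant words.
rng = np.random.default_rng(1)
X = np.vstack([rng.multinomial(100, [0.85, 0.05, 0.05, 0.05], size=30),
               rng.multinomial(100, [0.05, 0.05, 0.05, 0.85], size=30)])
truth = np.repeat([0, 1], 30)
z, w, alpha = gibbs_mh(X, K=2, n_iter=300)
agreement = max((z == truth).mean(), (z != truth).mean())  # up to label switching
print(agreement, w.round(2))
```

The labels and weights have closed-form conditionals, so they are drawn exactly. Metropolis-Hastings is used only for the parameters that lack one, which is the usual reason for mixing the two samplers. The random walk operates on log α so that every proposal stays positive; this is why `log_target` adds the Jacobian term `log_alpha.sum()`.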
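The Dirichlet-process models avoid fixing the number of components by placing a prior over partitions. A standard way to see this is the Chinese restaurant process (CRP), the partition distribution that a Dirichlet process with concentration α induces. Each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to α. The sketch below samples from this prior only and is not the thesis's posterior sampler. The names `sample_crp` and `expected_num_clusters` are hypothetical.

```python
import numpy as np


def sample_crp(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process
    (the partition induced by a Dirichlet process) with concentration alpha."""
    rng = np.random.default_rng(seed)
    labels, sizes = [0], [1]               # the first item opens cluster 0
    for i in range(1, n):
        # join cluster k w.p. sizes[k]/(i+alpha), open a new one w.p. alpha/(i+alpha)
        probs = np.array(sizes + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(sizes):
            sizes.append(1)                # new cluster gets the next unused label
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def expected_num_clusters(n, alpha):
    """E[number of clusters] under the CRP: sum over i = 0..n-1 of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


for a in (0.5, 2.0, 10.0):
    print(a, len(set(sample_crp(1000, a))), round(expected_num_clusters(1000, a), 1))
```

The number of occupied clusters is not set in advance. Its expectation grows only logarithmically with n and increases with α. In a Dirichlet-process mixture, this prior is combined with the component likelihoods, so the data determine how many clusters are actually used.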
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Su, Xuanbo |
Institution: | Concordia University |
Degree Name: | M.A.Sc. |
Program: | Information and Systems Engineering |
Date: | 5 November 2021 |
Thesis Supervisor(s): | Bouguila, Nizar and Zamzami, Nuha |
ID Code: | 990003 |
Deposited By: | Xuanbo Su |
Deposited On: | 16 Jun 2022 15:15 |
Last Modified: | 16 Jun 2022 15:15 |