
Mixture-Based Clustering for High-Dimensional Count Data Using Minorization-Maximization Approaches


Bregu, Ornela ORCID: https://orcid.org/0000-0001-8337-2376 (2020) Mixture-Based Clustering for High-Dimensional Count Data Using Minorization-Maximization Approaches. Masters thesis, Concordia University.

Text (application/pdf): Bregu_MASc_F2020.pdf - Accepted Version. Available under License Spectrum Terms of Access. 2MB.

Abstract

The Multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation of Fisher scoring as a learning algorithm, which is more robust to the choice of initial parameter values. Moreover, we consider the generalization of the multinomial model obtained by introducing the Dirichlet distribution as a prior, known as the Dirichlet Compound Multinomial (DCM). Even though the DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density function usually leads to undesired computational complications. In this thesis, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using a minorization-maximization (MM) algorithm. Moreover, we propose an online learning technique for unsupervised clustering based on a mixture of Neerchal-Morel distributions. While the novel mixture model is able to capture overdispersion thanks to a weight parameter assigned to each feature in each cluster, online learning overcomes the drawbacks of batch learning in that the mixture's parameters can be updated instantly for each new data instance. Finally, by implementing a minimum message length (MML) model selection criterion, the weights of irrelevant mixture components are driven towards zero, which resolves the problem of having to know the number of clusters beforehand. To evaluate and compare the performance of our proposed models, we consider five challenging real-world applications that involve high-dimensional count vectors: sentiment analysis, topic detection, facial expression recognition, human action recognition, and medical diagnosis. The results show that the proposed algorithms increase clustering efficiency remarkably compared to other benchmarks, and the best results are achieved by the models able to accommodate over-dispersed count data.
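The abstract's pipeline (a finite mixture over count vectors, iterative parameter updates, and pruning of near-zero component weights) can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: it uses plain multinomial components instead of DCM or Neerchal-Morel ones, EM (a special case of minorization-maximization) instead of the thesis's MM updates, and a simple weight threshold as a stand-in for MML-driven component annihilation. The function name, defaults, and pruning tolerance are all hypothetical.

```python
import numpy as np

def fit_multinomial_mixture(X, K, n_iter=100, prune_tol=1e-3, seed=0):
    """Toy EM fit of a K-component multinomial mixture to count rows X.

    Illustrative only: the thesis uses DCM / Neerchal-Morel components
    with MM updates and an MML criterion; here we use multinomial
    components, EM, and a crude weight threshold for pruning.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                      # mixing weights
    theta = rng.dirichlet(np.ones(D), size=K)     # per-component category probs

    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability
        # (the multinomial coefficient is constant per row and cancels)
        log_r = np.log(pi) + X @ np.log(theta).T  # shape (N, K)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: the maximization half of the MM iteration
        pi = r.sum(axis=0) / N
        theta = (r.T @ X) + 1e-10                 # small floor avoids log(0)
        theta /= theta.sum(axis=1, keepdims=True)

        # Stand-in for MML component annihilation: drop tiny weights
        keep = pi > prune_tol
        if not keep.all():
            pi, theta = pi[keep] / pi[keep].sum(), theta[keep]

    return pi, theta
```

In the thesis's setting the E-step surrogate would be built from a minorizer of the DCM or Neerchal-Morel log-likelihood rather than exact posteriors, and the pruning decision would come from the MML penalty on component weights rather than a fixed threshold.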

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Bregu, Ornela
Institution: Concordia University
Degree Name: M.A.Sc.
Program: Information and Systems Engineering
Date: 16 August 2020
Thesis Supervisor(s): Bouguila, Nizar
Keywords: count data, overdispersion, burstiness, high dimensionality, AFSA, DCM, Neerchal-Morel distribution, minorization-maximization, finite mixture models, minimum message length, image datasets
ID Code: 988291
Deposited By: ORNELA BREGU
Deposited On: 27 Oct 2022 13:51
Last Modified: 27 Oct 2022 13:51

All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
