
Mixture-Based Clustering for High-Dimensional Count Data Using Minorization-Maximization Approaches

Bregu, Ornela ORCID: https://orcid.org/0000-0001-8337-2376 (2020) Mixture-Based Clustering for High-Dimensional Count Data Using Minorization-Maximization Approaches. Masters thesis, Concordia University.

Text (application/pdf): Bregu_MASc_F2020.pdf - Accepted Version (2MB)
Available under License Spectrum Terms of Access.

Abstract

The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation of Fisher scoring as the learning algorithm, which is more robust to the choice of initial parameter values. Moreover, we consider the generalization of the multinomial model obtained by introducing a Dirichlet prior, known as the Dirichlet Compound Multinomial (DCM). Even though the DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density usually leads to undesired complications. In this thesis, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated with a minorization-maximization (MM) algorithm. Moreover, we propose an online learning technique for unsupervised clustering based on a mixture of Neerchal-Morel distributions. While the novel mixture model captures overdispersion through a weight parameter assigned to each feature in each cluster, online learning overcomes the drawbacks of batch learning by updating the mixture parameters instantly as new data instances arrive. Finally, by implementing a minimum message length (MML) model selection criterion, the weights of irrelevant mixture components are driven towards zero, which resolves the problem of having to know the number of clusters beforehand. To evaluate and compare the performance of the proposed models, we consider five challenging real-world applications involving high-dimensional count vectors: sentiment analysis, topic detection, facial expression recognition, human action recognition, and medical diagnosis. The results show that the proposed algorithms improve clustering performance markedly compared to other benchmarks, and the best results are achieved by the models that accommodate overdispersed count data.
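
For reference, the DCM (also known as the multivariate Polya distribution) has the standard density below; the ratios of Gamma functions it contains are the source of the complications mentioned in the abstract.

P(\mathbf{x} \mid \boldsymbol{\alpha}) = \frac{n!}{\prod_d x_d!} \cdot \frac{\Gamma\!\left(\sum_d \alpha_d\right)}{\Gamma\!\left(n + \sum_d \alpha_d\right)} \prod_d \frac{\Gamma(x_d + \alpha_d)}{\Gamma(\alpha_d)}, \qquad n = \sum_d x_d.

The following is a minimal sketch of mixture-based clustering of count vectors with an MM-type parameter update, assuming a Minka-style fixed-point step for the DCM parameters; the function names, initialization, and numerical settings are illustrative assumptions, not the algorithms developed in the thesis.

import numpy as np
from scipy.special import gammaln, digamma

def dcm_logpmf(X, alpha):
    """Log-density of the DCM (multivariate Polya) for each row of the count matrix X."""
    n = X.sum(axis=1)
    A = alpha.sum()
    return (gammaln(n + 1) - gammaln(X + 1).sum(axis=1)
            + gammaln(A) - gammaln(n + A)
            + (gammaln(X + alpha) - gammaln(alpha)).sum(axis=1))

def fit_dcm_mixture(X, K, n_iter=200, seed=0):
    """EM-style outer loop; each M-step applies one fixed-point (MM-type) update per component."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    weights = np.full(K, 1.0 / K)
    alphas = rng.gamma(2.0, 1.0, size=(K, D))  # random positive initialization
    n = X.sum(axis=1)
    for _ in range(n_iter):
        # E-step: responsibilities from current component log-densities
        log_r = np.log(weights) + np.column_stack([dcm_logpmf(X, alphas[k]) for k in range(K)])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, then one MM step on each component's alpha
        weights = r.mean(axis=0)
        for k in range(K):
            A = alphas[k].sum()
            num = (r[:, k][:, None] * (digamma(X + alphas[k]) - digamma(alphas[k]))).sum(axis=0)
            den = (r[:, k] * (digamma(n + A) - digamma(A))).sum()
            alphas[k] = np.maximum(alphas[k] * num / max(den, 1e-12), 1e-8)
    return weights, alphas, r

Hard cluster assignments would then be r.argmax(axis=1). The thesis additionally addresses online updates, the Neerchal-Morel mixture, and MML-driven selection of the number of components, none of which this sketch attempts.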

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:Thesis (Masters)
Authors:Bregu, Ornela
Institution:Concordia University
Degree Name:M.A.Sc.
Program:Information and Systems Engineering
Date:16 August 2020
Thesis Supervisor(s):Bouguila, Nizar
Keywords:count data, overdispersion, burstiness, high-dimensionality, AFSA, DCM, Neerchal-Morel distribution, minorization-maximization, finite mixture models, minimum message length, image datasets
ID Code:987364
Deposited By: ORNELA BREGU
Deposited On:21 Apr 2021 13:47
Last Modified:21 Apr 2021 13:47
