Statistical Models for Short Text Clustering

Hannachi, Samar (2021) Statistical Models for Short Text Clustering. Masters thesis, Concordia University.

Archive (application/pdf)
Hannachi_MASc_S2022.pdf - Accepted Version
Available under License Creative Commons Public Domain Dedication.
991kB

Abstract

The amount of data collected and made publicly available has risen notably. This has given rise to many research problems, among them extracting knowledge from short texts and the challenges this entails. In this thesis, we develop new approaches to improve the short text clustering results obtained with mixture models.
We apply the collapsed Gibbs sampling algorithm, previously used with the Dirichlet Multinomial mixture model, to our proposed statistical models. In particular, we propose the collapsed Gibbs sampling generalized Dirichlet Multinomial (CGSGDM) and the collapsed Gibbs sampling Beta-Liouville Multinomial (CGSBLM) mixture models to cope with the challenges that come with short texts. We demonstrate the efficiency of the proposed approaches on the Google News corpora and compare the experimental results with related works that use the Dirichlet distribution as a prior.
Finally, we extend our work to infinite mixture models, namely the collapsed Gibbs sampling infinite generalized Dirichlet Multinomial mixture model (CGSIGDMM) and the collapsed Gibbs sampling infinite Beta-Liouville Multinomial mixture model (CGSIBLMM). We evaluate these approaches on the Tweet dataset in addition to the previously used Google News dataset. We also propose an improvement through an online clustering process, which demonstrates good performance on the same datasets. A final application assesses the robustness of the proposed framework in the presence of outliers.
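
For orientation, the baseline that the abstract refers to, collapsed Gibbs sampling for a Dirichlet Multinomial mixture as described by Yin and Wang (2014), can be sketched as follows. This is a minimal illustration of that Dirichlet-prior baseline only, not the thesis's generalized Dirichlet or Beta-Liouville variants. The function name `gsdmm`, the parameter names, and the default values are assumptions chosen for this sketch.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet Multinomial mixture (sketch).

    docs is a list of token lists; returns one cluster label per document.
    K, alpha, beta and n_iters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    counts = [Counter(doc) for doc in docs]    # word counts per document
    m = [0] * K                                # documents per cluster
    n = [0] * K                                # tokens per cluster
    nw = [Counter() for _ in range(K)]         # word counts per cluster
    z = []                                     # cluster label per document

    def move(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[k][w] += sign * c

    # Random initial assignment.
    for d in range(len(docs)):
        k = rng.randrange(K)
        z.append(k)
        move(d, k, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, z[d], -1)  # exclude document d from the statistics
            logp = []
            for k in range(K):
                # Terms that do not depend on k are dropped.
                lp = math.log(m[k] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[k] + V * beta + i)
                logp.append(lp)
            mx = max(logp)
            weights = [math.exp(lp - mx) for lp in logp]
            k = rng.choices(range(K), weights=weights)[0]
            z[d] = k
            move(d, k, +1)
    return z
```

Calling `gsdmm([["apple", "banana"], ["goal", "match"]], K=4)` returns a list with one integer label per document. Working in log space avoids underflow when a document contains many tokens, and some of the K clusters may end up empty, which is how this kind of sampler can settle on fewer clusters than the upper bound K.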

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Hannachi, Samar
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Information and Systems Engineering
Date: 22 November 2021
Thesis Supervisor(s): Bouguila, Nizar
Keywords: Collapsed Gibbs sampling, short text, Generalized Dirichlet, Beta-Liouville, online clustering, outlier detection
ID Code: 990137
Deposited By: SAMAR HANNACHI
Deposited On: 16 Jun 2022 14:42
Last Modified: 16 Jun 2022 14:42

References:

@article{blei2003latent,
title={Latent dirichlet allocation},
author={Blei, David M and Ng, Andrew Y and Jordan, Michael I},
journal={Journal of machine Learning research},
volume={3},
number={Jan},
pages={993--1022},
year={2003}
}
@misc{hu2009latent,
title={Latent dirichlet allocation for text, images, and music},
author={Hu, Diane J},
howpublished={University of California, San Diego},
note={Retrieved April 26, 2013},
year={2009}
}
@incollection{aggarwal2012survey,
title={A survey of text clustering algorithms},
author={Aggarwal, Charu C and Zhai, ChengXiang},
booktitle={Mining text data},
pages={77--128},
year={2012},
publisher={Springer}
}


@article{DBLP:journals/eswa/BouguilaE12,
author = {Nizar Bouguila and
Tarek Elguebaly},
title = {A fully Bayesian model based on reversible jump {MCMC} and finite
Beta mixtures for clustering},
journal = {Expert Syst. Appl.},
volume = {39},
number = {5},
pages = {5946--5959},
year = {2012},
url = {https://doi.org/10.1016/j.eswa.2011.11.122},
doi = {10.1016/j.eswa.2011.11.122},
timestamp = {Fri, 26 May 2017 22:54:10 +0200},
biburl = {https://dblp.org/rec/journals/eswa/BouguilaE12.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}



@article{frunza2010machine,
title={A machine learning approach for identifying disease-treatment relations in short texts},
author={Frunza, Oana and Inkpen, Diana and Tran, Thomas},
journal={IEEE transactions on knowledge and data engineering},
volume={23},
number={6},
pages={801--814},
year={2010},
publisher={IEEE}
}
@article{alsmadi2019term,
title={Term weighting scheme for short-text classification: Twitter corpuses},
author={Alsmadi, Issa and Hoon, Gan Keng},
journal={Neural Computing and Applications},
volume={31},
number={8},
pages={3819--3831},
year={2019},
publisher={Springer}
}
@article{zeng2018topic,
title={Topic memory networks for short text classification},
author={Zeng, Jichuan and Li, Jing and Song, Yan and Gao, Cuiyun and Lyu, Michael R and King, Irwin},
journal={arXiv preprint arXiv:1809.03664},
year={2018}
}
@inproceedings{jin2011transferring,
title={Transferring topical knowledge from auxiliary long texts for short text clustering},
author={Jin, Ou and Liu, Nathan N and Zhao, Kai and Yu, Yong and Yang, Qiang},
booktitle={Proceedings of the 20th ACM international conference on Information and knowledge management},
pages={775--784},
year={2011}
}
@inproceedings{dos2014deep,
title={Deep convolutional neural networks for sentiment analysis of short texts},
author={Dos Santos, Cicero and Gatti, Maira},
booktitle={Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers},
pages={69--78},
year={2014}
}
@article{lee2016sequential,
title={Sequential short-text classification with recurrent and convolutional neural networks},
author={Lee, Ji Young and Dernoncourt, Franck},
journal={arXiv preprint arXiv:1603.03827},
year={2016}
}
@inproceedings{yin2014dirichlet,
title={A dirichlet multinomial mixture model-based approach for short text clustering},
author={Yin, Jianhua and Wang, Jianyong},
booktitle={Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining},
pages={233--242},
year={2014}
}

@article{karandikar2006markov,
title={On the markov chain monte carlo (MCMC) method},
author={Karandikar, Rajeeva L},
journal={Sadhana},
volume={31},
number={2},
pages={81--104},
year={2006},
publisher={Springer}
}
@article{DBLP:journals/prl/Bouguila12,
author = {Nizar Bouguila},
title = {Infinite Liouville mixture models with application to text and texture
categorization},
journal = {Pattern Recognit. Lett.},
volume = {33},
number = {2},
pages = {103--110},
year = {2012},
url = {https://doi.org/10.1016/j.patrec.2011.09.037},
doi = {10.1016/j.patrec.2011.09.037},
timestamp = {Sat, 22 Feb 2020 19:31:49 +0100},
biburl = {https://dblp.org/rec/journals/prl/Bouguila12.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{DBLP:conf/annpr/ElguebalyB10,
author = {Tarek Elguebaly and
Nizar Bouguila},
editor = {Friedhelm Schwenker and
Neamat El Gayar},
title = {Bayesian Learning of Generalized Gaussian Mixture Models on Biomedical
Images},
booktitle = {Artificial Neural Networks in Pattern Recognition, 4th {IAPR} {TC3}
Workshop, {ANNPR} 2010, Cairo, Egypt, April 11-13, 2010. Proceedings},
series = {Lecture Notes in Computer Science},
volume = {5998},
pages = {207--218},
publisher = {Springer},
year = {2010},
url = {https://doi.org/10.1007/978-3-642-12159-3\_19},
doi = {10.1007/978-3-642-12159-3\_19},
timestamp = {Tue, 14 May 2019 10:00:39 +0200},
biburl = {https://dblp.org/rec/conf/annpr/ElguebalyB10.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

@article{yildirim2012bayesian,
title={Bayesian inference: Gibbs sampling},
author={Yildirim, Ilker},
journal={Technical Note, University of Rochester},
year={2012}
}


@article{DBLP:journals/tnn/FanB13,
author = {Wentao Fan and
Nizar Bouguila},
title = {Online Learning of a Dirichlet Process Mixture of Beta-Liouville Distributions
Via Variational Inference},
journal = {{IEEE} Trans. Neural Networks Learn. Syst.},
volume = {24},
number = {11},
pages = {1850--1862},
year = {2013},
url = {https://doi.org/10.1109/TNNLS.2013.2268461},
doi = {10.1109/TNNLS.2013.2268461},
timestamp = {Mon, 09 Mar 2020 15:51:17 +0100},
biburl = {https://dblp.org/rec/journals/tnn/FanB13.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{bouguila2008clustering,
title={Clustering of count data using generalized Dirichlet multinomial distributions},
author={Bouguila, Nizar},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={20},
number={4},
pages={462--474},
year={2008},
publisher={IEEE}
}
@techreport{heinrich2005parameter,
title={Parameter estimation for text analysis},
author={Heinrich, Gregor},
year={2005},
institution={Technical report}
}
@inproceedings{banerjee2007clustering,
title={Clustering short texts using wikipedia},
author={Banerjee, Somnath and Ramanathan, Krishnan and Gupta, Ajay},
booktitle={Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval},
pages={787--788},
year={2007}
}
@book{han2011data,
title={Data mining: concepts and techniques},
author={Han, Jiawei and Pei, Jian and Kamber, Micheline},
year={2011},
publisher={Elsevier}
}
@phdthesis{becker2011identification,
title={Identification and characterization of events in social media},
author={Becker, Hila},
year={2011},
school={Columbia University}
}
@inproceedings{zhang2010arimp,
title={ARImp: A generalized adjusted rand index for cluster ensembles},
author={Zhang, Shaohong and Wong, Hau-San},
booktitle={2010 20th International Conference on Pattern Recognition},
pages={778--781},
year={2010},
organization={IEEE}
}
@article{Bouguila09,
author = {Nizar Bouguila},
title = {A Model-Based Approach for Discrete Data Clustering and Feature Weighting
Using {MAP} and Stochastic Complexity},
journal = {{IEEE} Trans. Knowl. Data Eng.},
volume = {21},
number = {12},
pages = {1649--1664},
year = {2009},
}
@article{BouguilaA09,
author = {Nizar Bouguila and
Ola Amayri},
title = {A discrete mixture-based kernel for SVMs: Application to spam and
image categorization},
journal = {Inf. Process. Manag.},
volume = {45},
number = {6},
pages = {631--642},
year = {2009},
}
@inproceedings{BouguilaZ05,
author = {Nizar Bouguila and
Djemel Ziou},
editor = {Petra Perner and
Atsushi Imiya},
title = {MML-Based Approach for Finite Dirichlet Mixture Estimation and Selection},
booktitle = {Machine Learning and Data Mining in Pattern Recognition, 4th International
Conference, {MLDM} 2005, Leipzig, Germany, July 9-11, 2005, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {3587},
pages = {42--51},
publisher = {Springer},
year = {2005},
}
@inproceedings{BouguilaE08,
author = {Nizar Bouguila and
Walid ElGuebaly},
title = {A generative model for spatial color image databases categorization},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech,
and Signal Processing, {ICASSP} 2008, March 30 - April 4, 2008, Caesars
Palace, Las Vegas, Nevada, {USA}},
pages = {821--824},
publisher = {{IEEE}},
year = {2008},
}
@article{BakhtiariB14,
author = {Ali Shojaee Bakhtiari and
Nizar Bouguila},
title = {A variational Bayes model for count data learning and classification},
journal = {Eng. Appl. Artif. Intell.},
volume = {35},
pages = {176--186},
year = {2014},
}
@article{BouguilaG10,
author = {Nizar Bouguila and
Mukti Nath Ghimire},
title = {Discrete visual features modeling via leave-one-out likelihood estimation
and applications},
journal = {J. Vis. Commun. Image Represent.},
volume = {21},
number = {7},
pages = {613--626},
year = {2010},
}
@inproceedings{ZamzamiB18,
author = {Nuha Zamzami and
Nizar Bouguila},
editor = {Malek Mouhoub and
Samira Sadaoui and
Otmane A{\"{\i}}t Mohamed and
Moonis Ali},
title = {Text Modeling Using Multinomial Scaled Dirichlet Distributions},
booktitle = {Recent Trends and Future Technology in Applied Intelligence - 31st
International Conference on Industrial Engineering and Other Applications
of Applied Intelligent Systems, {IEA/AIE} 2018, Montreal, QC, Canada,
June 25-28, 2018, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {10868},
pages = {69--80},
publisher = {Springer},
year = {2018},
}
@article{BouguilaE09,
author = {Nizar Bouguila and
Walid ElGuebaly},
title = {Discrete data clustering using finite mixture models},
journal = {Pattern Recognit.},
volume = {42},
number = {1},
pages = {33--42},
year = {2009},
}
@inproceedings{KeikhaBB12,
author = {Mohamad Mehdi and
Nizar Bouguila and
Jamal Bentahar},
editor = {Carole A. Goble and
Peter P. Chen and
Jia Zhang},
title = {Trustworthy Web Service Selection Using Probabilistic Models},
booktitle = {2012 {IEEE} 19th International Conference on Web Services, Honolulu,
HI, USA, June 24-29, 2012},
pages = {17--24},
publisher = {{IEEE} Computer Society},
year = {2012},
}
@article{blei2001latent,
title={Latent dirichlet allocation},
author={Blei, David and Ng, Andrew and Jordan, Michael},
journal={Advances in neural information processing systems},
volume={14},
pages={601--608},
year={2001}
}
@article{blei2007correlated,
title={A correlated topic model of science},
author={Blei, David M and Lafferty, John D and others},
journal={The Annals of Applied Statistics},
volume={1},
number={1},
pages={17--35},
year={2007},
publisher={Institute of Mathematical Statistics}
}
@inproceedings{li2006pachinko,
title={Pachinko allocation: DAG-structured mixture models of topic correlations},
author={Li, Wei and McCallum, Andrew},
booktitle={Proceedings of the 23rd international conference on Machine learning},
pages={577--584},
year={2006}
}
@article{li2012nonparametric,
title={Nonparametric bayes pachinko allocation},
author={Li, Wei and Blei, David and McCallum, Andrew},
journal={arXiv preprint arXiv:1206.5270},
year={2012}
}
@inproceedings{putthividhya2009independent,
title={Independent factor topic models},
author={Putthividhya, Duangmanee and Attias, Hagai T and Nagarajan, Srikantan},
booktitle={Proceedings of the 26th Annual International Conference on Machine Learning},
pages={833--840},
year={2009}
}
@inproceedings{caballero2012generalized,
title={The generalized dirichlet distribution in enhanced topic detection},
author={Caballero, Karla L and Barajas, Joel and Akella, Ram},
booktitle={Proceedings of the 21st ACM international conference on Information and knowledge management},
pages={773--782},
year={2012}
}
@article{bouguila2011hybrid,
title={Hybrid generative/discriminative approaches for proportional data modeling and classification},
author={Bouguila, Nizar},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={24},
number={12},
pages={2184--2202},
year={2011},
publisher={IEEE}
}
@inproceedings{canini2009online,
title={Online inference of topics with latent Dirichlet allocation},
author={Canini, Kevin and Shi, Lei and Griffiths, Thomas},
booktitle={Artificial Intelligence and Statistics},
pages={65--72},
year={2009}
}
@article{albalawi2020using,
title={Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis},
author={Albalawi, R and Yeap, TH and Benyoucef, M},
journal={Frontiers in Artificial Intelligence},
volume={3},
pages={42},
year={2020}
}
@article{kherwa2020topic,
title={Topic Modeling: A Comprehensive Review},
author={Kherwa, Pooja and Bansal, Poonam},
journal={EAI Endorsed Transactions on Scalable Information Systems},
volume={7},
number={24},
year={2020},
publisher={European Alliance for Innovation (EAI)}
}
@article{casella1992explaining,
title={Explaining the Gibbs sampler},
author={Casella, George and George, Edward I},
journal={The American Statistician},
volume={46},
number={3},
pages={167--174},
year={1992},
publisher={Taylor \& Francis}
}
@article{griffiths2002gibbs,
title={Gibbs sampling in the generative model of latent dirichlet allocation},
author={Griffiths, Tom},
year={2002},
publisher={Citeseer}
}
@article{gershman2012tutorial,
title={A tutorial on Bayesian nonparametric models},
author={Gershman, Samuel J and Blei, David M},
journal={Journal of Mathematical Psychology},
volume={56},
number={1},
pages={1--12},
year={2012},
publisher={Elsevier}
}
@article{ratnaparkhi2014multinomial,
title={Multinomial Distribution: Properties and Extensions},
author={Ratnaparkhi, MV},
journal={Wiley StatsRef: Statistics Reference Online},
year={2014},
publisher={Wiley Online Library}
}
@article{bouguila2010count,
title={Count data modeling and classification using finite mixtures of distributions},
author={Bouguila, Nizar},
journal={IEEE Transactions on Neural Networks},
volume={22},
number={2},
pages={186--198},
year={2010},
publisher={IEEE}
}
@article{zamzami2020high,
title={High-Dimensional Count Data Clustering Based on an Exponential Approximation to the Multinomial Beta-Liouville Distribution},
author={Zamzami, Nuha and Bouguila, Nizar},
journal={Information Sciences},
year={2020},
publisher={Elsevier}
}
@article{siddiqui2015short,
title={Short text clustering; challenges \& solutions: a literature review},
author={Siddiqui, Tamanna and Aalam, Parvej},
journal={Int. J. Math. Comput. Res},
volume={3},
number={6},
pages={1025--1031},
year={2015}
}
@inproceedings{koppen2000curse,
title={The curse of dimensionality},
author={K{\"o}ppen, Mario},
booktitle={5th Online World Conference on Soft Computing in Industrial Applications (WSC5)},
volume={1},
pages={4--8},
year={2000}
}
@inproceedings{ramos2003using,
title={Using tf-idf to determine word relevance in document queries},
author={Ramos, Juan and others},
booktitle={Proceedings of the first instructional conference on machine learning},
volume={242},
number={1},
pages={29--48},
year={2003},
organization={Citeseer}
}
@article{zhang2010understanding,
title={Understanding bag-of-words model: a statistical framework},
author={Zhang, Yin and Jin, Rong and Zhou, Zhi-Hua},
journal={International Journal of Machine Learning and Cybernetics},
volume={1},
number={1-4},
pages={43--52},
year={2010},
publisher={Springer}
}
@article{DBLP:journals/csda/Bouguila10,
author = {Nizar Bouguila},
title = {On multivariate binary data clustering and feature weighting},
journal = {Comput. Stat. Data Anal.},
volume = {54},
number = {1},
pages = {120--134},
year = {2010},
url = {https://doi.org/10.1016/j.csda.2009.07.013},
doi = {10.1016/j.csda.2009.07.013},
timestamp = {Tue, 18 Feb 2020 13:54:06 +0100},
biburl = {https://dblp.org/rec/journals/csda/Bouguila10.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}



@article{murphy2006binomial,
title={Binomial and multinomial distributions},
author={Murphy, Kevin P},
journal={University of British Columbia, Tech. Rep},
year={2006}
}
@article{farrow2017mas3301,
title={MAS3301 Bayesian Statistics},
author={Farrow, M},
journal={Newcastle University},
year={2017}
}
@article{lin2016dirichlet,
title={On the dirichlet distribution},
author={Lin, Jiayu},
journal={Master's Report},
year={2016},
publisher={Queen's University Kingston Ontario, Canada}
}
@book{bouguila2020mixture,
title={Mixture models and applications},
author={Bouguila, Nizar and Fan, Wentao},
year={2020},
publisher={Springer}
}
@article{mcdaid2011normalized,
title={Normalized mutual information to evaluate overlapping community finding algorithms},
author={McDaid, Aaron F and Greene, Derek and Hurley, Neil},
journal={arXiv preprint arXiv:1110.2515},
year={2011}
}
@article{qaiser2018text,
title={Text mining: use of TF-IDF to examine the relevance of words to documents},
author={Qaiser, Shahzad and Ali, Ramsha},
journal={International Journal of Computer Applications},
volume={181},
number={1},
pages={25--29},
year={2018}
}
@article{nigam2000text,
title={Text classification from labeled and unlabeled documents using EM},
author={Nigam, Kamal and McCallum, Andrew Kachites and Thrun, Sebastian and Mitchell, Tom},
journal={Machine learning},
volume={39},
number={2},
pages={103--134},
year={2000},
publisher={Springer}
}
@inproceedings{hannachi2021short,
title={Short Text Clustering Using Generalized Dirichlet Multinomial Mixture Model},
author={Hannachi, Samar and Najar, Fatma and Bouguila, Nizar},
booktitle={ACIIDS (Companion)},
pages={149--161},
year={2021}
}
@inproceedings{yin2016model,
title={A model-based approach for text clustering with outlier detection},
author={Yin, Jianhua and Wang, Jianyong},
booktitle={2016 IEEE 32nd International Conference on Data Engineering (ICDE)},
pages={625--636},
year={2016},
organization={IEEE}
}
@inproceedings{crook2009unsupervised,
title={Unsupervised classification of dialogue acts using a Dirichlet process mixture model},
author={Crook, Nigel and Granell, Ramon and Pulman, Stephen},
booktitle={Proceedings of the SIGDIAL 2009 Conference},
pages={341--348},
year={2009}
}
@article{crook2019fast,
title={Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics},
author={Crook, Oliver M and Gatto, Laurent and Kirk, Paul DW},
journal={Statistical applications in genetics and molecular biology},
volume={18},
number={6},
year={2019},
publisher={De Gruyter}
}
@article{teh2006hierarchical,
title={Hierarchical dirichlet processes},
author={Teh, Yee Whye and Jordan, Michael I and Beal, Matthew J and Blei, David M},
journal={Journal of the american statistical association},
volume={101},
number={476},
pages={1566--1581},
year={2006},
publisher={Taylor \& Francis}
}
@inproceedings{yin2016text,
title={A text clustering algorithm using an online clustering scheme for initialization},
author={Yin, Jianhua and Wang, Jianyong},
booktitle={Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining},
pages={1995--2004},
year={2016}
}

@article{boutemedjet2008hybrid,
title={A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering},
author={Boutemedjet, Sabri and Bouguila, Nizar and Ziou, Djemel},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={31},
number={8},
pages={1429--1443},
year={2008},
publisher={IEEE}
}
@incollection{vacavant2014incremental,
title={Incremental Learning of an Infinite Beta-Liouville Mixture Model for Video Background Subtraction},
author={Vacavant, Antoine},
booktitle={Background Modeling and Foreground Detection for Video Surveillance},
pages={317--334},
year={2014},
publisher={Chapman and Hall/CRC}
}
@inproceedings{fan2013spatio,
title={Spatio-temporal object recognition using variational learning of an infinite statistical model},
author={Fan, Wentao and Bouguila, Nizar},
booktitle={21st European Signal Processing Conference (EUSIPCO 2013)},
pages={1--5},
year={2013},
organization={IEEE}
}

@article{ishwaran2001gibbs,
title={Gibbs sampling methods for stick-breaking priors},
author={Ishwaran, Hemant and James, Lancelot F},
journal={Journal of the American Statistical Association},
volume={96},
number={453},
pages={161--173},
year={2001},
publisher={Taylor \& Francis}
}
@book{bishop2006pattern,
title={Pattern recognition and machine learning},
author={Bishop, Christopher M},
year={2006},
publisher={Springer}
}
@inproceedings{hannachi2021collapsed,
title={Collapsed Gibbs Sampling of Beta-Liouville Multinomial for Short Text Clustering},
author={Hannachi, Samar and Najar, Fatma and Ihou, Koffi Eddy and Bouguila, Nizar},
booktitle={International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems},
pages={564--571},
year={2021},
organization={Springer}
}
@article{connor1969concepts,
title={Concepts of independence for proportions with a generalization of the Dirichlet distribution},
author={Connor, Robert J and Mosimann, James E},
journal={Journal of the American Statistical Association},
volume={64},
number={325},
pages={194--206},
year={1969},
publisher={Taylor \& Francis}
}
@article{rahman2020efficient,
title={Efficient Feature Mapping in Classifying Proportional Data},
author={Rahman, Md Hafizur and Bouguila, Nizar},
journal={IEEE Access},
volume={9},
pages={3712--3724},
year={2020},
publisher={IEEE}
}
@article{bouguila2012infinite,
title={Infinite Liouville mixture models with application to text and texture categorization},
author={Bouguila, Nizar},
journal={Pattern Recognition Letters},
volume={33},
number={2},
pages={103--110},
year={2012},
publisher={Elsevier}
}
@inproceedings{fan2013learning,
title={Learning finite beta-liouville mixture models via variational bayes for proportional data clustering},
author={Fan, Wentao and Bouguila, Nizar},
booktitle={Twenty-Third International Joint Conference on Artificial Intelligence},
year={2013}
}
@inproceedings{bouguila2008discrete,
title={On discrete data clustering},
author={Bouguila, Nizar and ElGuebaly, Walid},
booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
pages={503--510},
year={2008},
organization={Springer}
}

@article{zamzami2019novel,
title={A novel scaled dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation},
author={Zamzami, Nuha and Bouguila, Nizar},
journal={Pattern Recognition},
volume={95},
pages={36--47},
year={2019},
publisher={Elsevier}
}
@article{ihou2019variational,
title={Variational-based latent generalized Dirichlet allocation model in the collapsed space and applications},
author={Ihou, Koffi Eddy and Bouguila, Nizar},
journal={Neurocomputing},
volume={332},
pages={372--395},
year={2019},
publisher={Elsevier}
}
@article{fan2015expectation,
title={Expectation propagation learning of a Dirichlet process mixture of Beta-Liouville distributions for proportional data clustering},
author={Fan, Wentao and Bouguila, Nizar},
journal={Engineering Applications of Artificial Intelligence},
volume={43},
pages={1--14},
year={2015},
publisher={Elsevier}
}
@inproceedings{bouguila2009nonparametric,
title={A nonparametric bayesian learning model: Application to text and image categorization},
author={Bouguila, Nizar and Ziou, Djemel},
booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
pages={463--474},
year={2009},
organization={Springer}
}
@inproceedings{bakhtiari2011expandable,
title={An expandable hierarchical statistical framework for count data modeling and its application to object classification},
author={Bakhtiari, Ali Shojaee and Bouguila, Nizar},
booktitle={2011 IEEE 23rd International Conference on Tools with Artificial Intelligence},
pages={817--824},
year={2011},
organization={IEEE}
}
@article{bouguila2010discrete,
title={Discrete visual features modeling via leave-one-out likelihood estimation and applications},
author={Bouguila, Nizar and Ghimire, Mukti Nath},
journal={Journal of Visual Communication and Image Representation},
volume={21},
number={7},
pages={613--626},
year={2010},
publisher={Elsevier}
}
@article{zamzami2019model,
title={Model selection and application to high-dimensional count data clustering},
author={Zamzami, Nuha and Bouguila, Nizar},
journal={Applied Intelligence},
volume={49},
number={4},
pages={1467--1488},
year={2019},
publisher={Springer}
}
@article{fan2013online,
title={Online learning of a dirichlet process mixture of beta-liouville distributions via variational inference},
author={Fan, Wentao and Bouguila, Nizar},
journal={IEEE transactions on neural networks and learning systems},
volume={24},
number={11},
pages={1850--1862},
year={2013},
publisher={IEEE}
}