Bouguila, Nizar (2013) On the smoothing of multinomial estimates using Liouville mixture models and applications. Pattern Analysis and Applications, 16 (3). pp. 349-363. ISSN 1433-7541

Text (application/pdf)
bouguila2013a.pdf (4MB), Accepted Version
Official URL: http://dx.doi.org/10.1007/s10044-011-0236-8
Abstract
There has been major progress in recent years in statistical model-based pattern recognition, data mining and knowledge discovery. In particular, generative models are widely used and are very reliable in terms of overall performance. The success of these models hinges on their ability to construct a representation that captures the underlying statistical distribution of the data. In this article, we focus on count data modeling, since this kind of data is naturally generated in many contexts and application domains. Models based on the multinomial assumption are usually used in this case, but they may have several shortcomings, especially for high-dimensional sparse data. We therefore propose a principled approach to smooth multinomials using a mixture of Beta-Liouville distributions, which is learned to reflect and model prior beliefs about multinomial parameters. Via both theoretical interpretations and experimental validations, we argue that the proposed smoothing model is general and flexible enough to allow accurate representation of count data.
Divisions:  Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering 

Item Type:  Article 
Refereed:  Yes 
Authors:  Bouguila, Nizar 
Journal or Publication:  Pattern Analysis and Applications 
Date:  August 2013 
Digital Object Identifier (DOI):  10.1007/s10044-011-0236-8 
Keywords:  Liouville family of distributions; Mixture models; Smoothing; Count data; Generative discriminative learning; SVM; Texture classification; Object recognition 
ID Code:  977853 
Deposited By:  DANIELLE DENNIE 
Deposited On:  27 Sep 2013 14:13 
Last Modified:  18 Jan 2018 17:45 