
Bayesian learning of inverted Dirichlet mixtures for SVM kernels generation

Bdiri, Taoufik and Bouguila, Nizar (2012) Bayesian learning of inverted Dirichlet mixtures for SVM kernels generation. Neural Computing and Applications . ISSN 0941-0643

Text (application/pdf): Bouguila2012.pdf - Accepted Version (3MB)

Official URL: http://dx.doi.org/10.1007/s00521-012-1094-z

Abstract

We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel-generation approaches we consider, grounded in ideas from information theory, allow the incorporation of the data structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework, using both Gibbs sampling and Metropolis-Hastings for parameter estimation and Bayes factors for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach uses priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the family of exponential distributions, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and effectiveness of the proposed method on two challenging real-world applications, namely object detection and visual scene analysis and classification.
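For readers unfamiliar with the model, the density at the core of this work can be sketched as follows. For a positive vector x in R^D and parameters alpha = (alpha_1, ..., alpha_{D+1}), the inverted Dirichlet density is p(x | alpha) = [Gamma(sum alpha) / prod_d Gamma(alpha_d)] * prod_{d=1}^{D} x_d^{alpha_d - 1} * (1 + sum_d x_d)^{-sum alpha}. The sketch below is not the authors' code; the function names and the mixture helper are illustrative, and it only evaluates the density (in the paper, the mixture parameters are themselves inferred by MCMC and then used to build information-theoretic SVM kernels).

```python
from math import lgamma, log, exp

def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the inverted Dirichlet distribution.

    x     : list of D positive values
    alpha : list of D+1 positive shape parameters
    """
    assert len(alpha) == len(x) + 1
    a_sum = sum(alpha)
    # Normalizing constant: Gamma(|alpha|) / prod_d Gamma(alpha_d)
    log_norm = lgamma(a_sum) - sum(lgamma(a) for a in alpha)
    # Kernel: prod_d x_d^(alpha_d - 1) * (1 + sum_d x_d)^(-|alpha|)
    log_kernel = sum((a - 1.0) * log(xd) for xd, a in zip(x, alpha))
    log_kernel -= a_sum * log(1.0 + sum(x))
    return log_norm + log_kernel

def mixture_logpdf(x, weights, alphas):
    """Log-density of a finite mixture, log sum_j w_j p(x | alpha_j),
    computed with the log-sum-exp trick for numerical stability."""
    logs = [log(w) + inverted_dirichlet_logpdf(x, a)
            for w, a in zip(weights, alphas)]
    m = max(logs)
    return m + log(sum(exp(l - m) for l in logs))
```

As a sanity check, for D = 1 and alpha = (1, 1) the density reduces to (1 + x)^(-2), and with alpha = (2, 3) it is 12 x (1 + x)^(-5).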

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Article
Refereed: Yes
Authors: Bdiri, Taoufik and Bouguila, Nizar
Journal or Publication: Neural Computing and Applications
Date: 2012
Digital Object Identifier (DOI): 10.1007/s00521-012-1094-z
Keywords: Mixture models – SVM – Hybrid models – Inverted Dirichlet – Bayesian inference – Bayes factor – Model selection – Gibbs sampling – Kernels – Object detection – Image databases
ID Code: 975145
Deposited By: DANIELLE DENNIE
Deposited On: 18 Jan 2013 18:43
Last Modified: 18 Jan 2018 17:39

All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
