1. Ho TK, Baird HS (1997) Large-scale simulation studies in image pattern recognition. IEEE Trans Pattern Anal Mach Intell 19(10):1067–1079 2. Saitta L, Neri F (1998) Learning in the ‘‘Real World’’. Mach Learn 30:133–163 3. Domingos P (1999) MetaCost: a general method for making Classifiers cost-sensitive. In: Proceedings of the international conference on knowledge discovery and data mining (KDD), pp 155–164 4. Sohn SY (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21(11):1137–1144 5. Cherkassky V, Mulier F (2007) Learning from data: concepts, theory, and methods, 2nd edn. Wiley, London 6. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York 7. Fraley C, Raftery AE (1999) MCLUST: software for modelbased cluster analysis. J Classif 16:297–306 8. Ghosh D, Chinnaiyan AM (2002) Mixture modelling of a gene expression data from microarray experiments. Bioinformatics 18(2):275–286 9. Keysers D, Och FJ, Ney H (2002) Maximum entropy and Gaussian models for image object recognition. In: Proceedings of the DAGM-symposium, pp 498–506 10. Lu Z, Peng Y, Ip HHS (2010) Gaussian mixture learning via robust competitive agglomeration. Pattern Recogn Lett 31(7):539–547 11. Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543 12. Bouguila N, Ziou D (2006) A hybrid SEM algorithm for highdimensional unsupervised learning using a finite generalized Dirichlet mixture. IEEE Trans Image Process 15(9):2657–2668 13. Bouguila N, Ziou D, Hammoud RI (2009) On Bayesian analysis of a finite generalized Dirichlet mixture via a metropolis-within-Gibbs sampling. Pattern Anal Appl 12(2):151–166 14. Bdiri T, Bouguila N (2011) Learning inverted Dirichlet mixtures for positive data clustering. In: Proceedings of 3th international conference on rough sets, fuzzy sets, data mining and granular computing (RSFDGrC), volume 6743 of Lecture Notes in Computer Science. Springer, Berlin, pp 265–272 15. Bdiri T, Bouguila N (2011) An infinite mixture of inverted dirichlet distributions. In: Lu B-L, Zhang L, Kwok JT (eds) ICONIP (2), volume 7063 of Lecture Notes in Computer Science. Springer, Berlin, pp 71–78 16. McLachlan JG, Krishnan T (1997) The EM algorithm and extensions. Wiley, London 17. Bdiri T, Bouguila N (2012) Positive vectors clustering using inverted Dirichlet finite mixture models. Exp Syst Appl 39(2): 1869–1882 18. Racine A, Grieve AP, Fluhler H, Smith AFM (1986) Bayesian methods in practice: experiences in the pharmaceutical industry (with discussion). Appl Stat 35(2):93–150 19. Barber D, Williams CKI (1996) Gaussian processes for Bayesian classification via hybrid Monte Carlo. In: Advances in neural information processing systems (NIPS), pp 340–346 20. Agarwal DK, Gelfand AE (2005) Slice sampling for simulation based fitting of spatial data models. Stat Comput 15:61–69 21. Meng X-L, Schilling S (1996) Fitting full-information item factor models and an empirical investigation of bridge sampling. J Am Stat Assoc 91(435):1254–1267 22. Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3(7):1–21 23. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795 24. Miller D, Rao AV, Rose K, Gersho A (1996) A global optimization technique for statistical classifier design. IEEE Trans Signal Process 44(12):3108–3122 25. Liu J, Song J-Q, Huang Y-L (2007) A generative/discriminative hybrid model: Bayes perceptron classifier. In: Proceedings of international conference on machine learning and cybernetics (ICLMC), pp 2767–2772 26. Scho¨lkopf B, Bartlett P, Smola A, Williamson R (1998) Shrinking the tube: a new support vector regression lgorithm. In:Advances in neural information processing systems (NIPS), pp 330–336 27. Joachims T (2000) Estimating the generalization performance of an SVM efficiently. In: Proceedings of the international conference on machine learning (ICML), pp 431–438 28. Cristianini N, Campbell C, Shawe-Taylor J (1998) Dynamically adapting kernels in support vector machines. In: Advances in neural information processing systems (NIPS), pp 204–210 29. Fine S, Scheinberg K (2001) Efficient SVM training using lowrank kernel representations. J Mach Learn Res 2:243–264 30. Jaakkola TS, Haussler D (1998) Exploiting generative models in discriminative classifiers. In: Proceedings of advances in neural information systems (NIPS). MIT Press, Cambridge, pp 487–493 31. Bouguila N (2011) Bayesian hybrid generative discriminative learning based on finite Liouville mixture model. Pattern Recogn 44(6):1183–1200 32. Moreno PJ, Ho PP, Vasconcelos N (2003) A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge 33. Chan AB, Vasconcelos N, Moreno PJ (2004) A family of probabilistic kernels based on information divergence. Technical report SVCL-TR 2004/01. University of California, San Diego 34. Jebara T, Kondor R (2003) Bhattacharyya expected likelihood kernels. In: Proceedings of the annual conference on computational learning theory (COLT) pp 57–71 35. Kwok JT-Y (1999) Moderating the outputs of support vector machine classifiers. IEEE Trans Neural Netw 10(5):1018–1031 36. Sollich P (1999) Probabilistic interpretations and Bayesian methods for support vector machines. In: Proceedings of the international conference on artificial neural networks (ICANN), pp 91–96 37. Sollich P (2002) Bayesian methods for support vector machines: evidence and predictive class probabilities. Mach Learn 46:21–52 38. Thompson EA (1996) Likelihood and linkage: from fisher to the future. Ann Stat 24(2):449–465 39. Stewart L (1987) Hierarchical Bayesian analysis using Monte Carlo integration: computing posterior distributions when there are many possible models. The Stat 36(2/3):211–219 40. Binder DA (1981) Approximations to Bayesian clustering rules. Biometrika 68(1):275–285 41. Winkler RL (1967) The quantification of judgment: some methodological suggestions. J Am Stat Assoc 62(320):1105–1129 42. von Holstein CSS (1971) Two techniques for assessment of subjective probability distributions—an experimental study. Acta Psychol 35(6):478–494 43. Haughton D (1988) On the choice of a model to fit data from an exponential family. Ann Stat 16(1):342–355 44. Haughton D (1989) Size of the error in the choice of a model to fit data from an exponential family. Sankhya Indian J Stat Ser A 51(1):45–58 45. Frigessi A, Ga°semyr J, Rue H (2000) Antithetic coupling of two Gibbs Sampler Chains. Ann Stat 28(4):1128–1149 46. Casella G (2004) Mixture models, latent variables and partitioned importance sampling. Stat Methodol 1(1–2):1–18 47. Roberts GO, Rosenthal JS (1998) Optimal scaling of discrete approximations to Langevin diffusions. J R Stat Soc Ser B (Stat Methodol) 60(1):255–268 48. Robert CP (2001) The Bayesian choice. Springer, Berlin 49. Rosenthal JS (1993) Rates of convergence for data augmentation on finite sample spaces. Ann Appl Probab 3(3):819–839 50. Roberts GO, Polson NG (1994) On the geometric convergence of the Gibbs sampler. J R Stat Soc Ser B (Stat Methodol) 56(2):377–384 51. Geyer CJ, Thompson EA (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. J Am Stat Assoc 90(431):909–920 52. Rosenthal JS (1995) Rates of convergence for Gibbs sampling for variance component models. Ann Stat 23(3):740–761 53. Roberts GO, Tweedie RL (1999) Bounds on regeneration times and convergence rates for Markov chains. Stochast Process Appl 80:211–229 54. Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov Chain Monte Carlo. Stat Sci 7(4):493–497 55. Ho TK, Kleinberg EM (1996) Building projectable classifiers of arbitrary complexity. In: Proceedings of the international conference on pattern recognition (ICPR), pp 880–885 56. Kleinberg EM (1996) An overtraining-resistant stochastic modeling method for pattern recognition. Ann Stat 24(6):2319–2349 57. Leroux BG (1992) Consistent estimation of a mixing distribution. Ann Stat 20(3):1350–1360 58. Raftery AE (1996) Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika 83(2):251–266 59. Meng X-L, Wong WH (1996) Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat Sin 6:831–860 60. Chen M-H, Shao Q-M (1997) Estimating ratios of normalizing constants for densities with different dimensions. Stat Sin 7:607–630 61. Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–471 62. Bennett KP, Campbell C (2000) Support vector machines: hype or Hallelujah? SIGKDD Explor 2(2):1–13 63. Van Gestel T, Suykens JAK, De Brabanter J, De Moor B, Vandewalle J (2001) Kernel canonical correlation analysis and least squares support vector machines. In: Proceedings of international conference on artificial neural networks (ICANN), pp 384–389 64. Cristianini N, Shawe-Taylor J, Elisseeff A, Kandola J (2001) On kernel-target alignment. In: Advances in neural information processing systems (NIPS), pp 367–373 65. Hammer B, Jain BJ (2004) Neural methods for non-standard data. In: Proceedings of the European symposium on artificial neural networks (ESANN), pp 281–292 66. Grauman K, Darrell T (2007) The pyramid match kernel: efficient learning with sets of features. J Mach Learn Res 8:725–760 67. Re`nyi A (1960) On measures of entropy and information. In: Proceedings of Berkeley symposium on mathematical statistics and probability, pp 547–561 68. Lin J (1991) Divergence measure based on shannon entropy. IEEE Trans Inf Theory 37(14):145–151 69. Vailaya A, Jain A (2000) Detecting sky and vegetation in outdoor images. In: Proceedings of the international society for optical engineering (SPIE), pp 411–420 70. Torralba A (2003) Contextual priming for object detection. Int J Comput Vis 53(2):169–191 71. Lin Y-Y, Liu T-L, Fuh C-S (2007) Local ensemble kernel learning for object category recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8 72. Vedaldi A, Soatto S (2008) Quick shift and kernel methods for mode seeking. In: Proceedings of the European conference on computer vision (ECCV), pp 705–718 73. Itti L, Koch C, Neibur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259 74. Fergus R, Perona P, Zisserman A (2005) A sparse object category model for efficient learning and exhaustive recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 380–387 75. Bosch A, Zisserman A, Mun˜oz X (2007) Representing shape with a spatial pyramid kernel. In: Proceedings of the 6th ACM international conference on Image and video retrieval (CIVR). ACM, New York, pp 401–408 76. Fulkerson B, Vedaldi A, Soatto S (2008) Localizing objects with smart dictionaries. In: Proceedings of the European conference on computer vision (ECCV), pp 179–192 77. Vedaldi A, Soatto S (2005) Features for recognition: viewpoint invariance for non-planar scenes. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1474–1481 78. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110 79. Leibe B, Schiele B (2003) Analyzing appearance and contour based methods for object categorization. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 409–415 80. Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 524–531 81. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2169–2178 82. Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8 83. Friedman N (1998) The Bayesian structural EM algorithm. In: Proceedings of the 4th conference on uncertainty in artificial intelligence (UAI), pp 129–138 84. Dalal SR, Hall WJ (1983) Approximating priors by mixtures of natural conjugate priors. J R Stat Soc Ser B (Stat Methodol) 45(2):278–286 85. Brown LD (1986) Fundamentals of statistical exponential families with applications in statistical decision theory. Institute of Mathematical Statistics, Hayward