The Multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation of the Fisher Scoring as a learning algorithm, which is more robust to the choice of the initial parameter values. Moreover, we consider the generalization of the multinomial model obtained by introducing the Dirichlet as prior, which is called the Dirichlet Compound Multinomial (DCM). Even though DCM can address the burstiness phenomenon of count data, the presence of Gamma function in its density function usually leads to undesired complications. In this thesis, we use two alternative representations of DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using minorization-maximization algorithm. Moreover, we propose an online learning technique for unsupervised clustering based on a mixture of Neerchal- Morel distributions. While, the novel mixture model is able to capture overdispersion due to a weight parameter assigned to each feature in each cluster, online learning is able to overcome the drawbacks of batch learning in such a way that the mixture’s parameters can be updated instantly for any new data instances. Finally, by implementing a minimum message length model selection criterion, the weights of irrelevant mixture components are driven towards zero, which resolves the problem of knowing the number of clusters beforehand. To evaluate and compare the performance of our proposed models, we have considered five challenging real-world applications that involve high-dimensional count vectors, namely, sentiment analysis, topic detection, facial expression recognition, human action recognition and medical diagnosis. The results show that the proposed algorithms increase the clustering efficiency remarkably as compared to other benchmarks, and the best results are achieved by the models able to accommodate over-dispersed count data.