In 2015, 2.5 quintillion bytes of data were daily generated worldwide of which 90% were unstructured data that do not follow any pre-defined model. These data can be found in a great variety of formats among them are texts, images, audio tracks, or videos. With appropriate techniques, this massive amount of data is a goldmine from which one can extract a variety of meaningful embedded information. Among those techniques, machine learning algorithms allow multiple processing possibilities from compact data representation, to data clustering, classification, analysis, and synthesis, to the detection of outliers. Data modeling is the first step for performing any of these tasks and the accuracy and reliability of this initial step is thus crucial for subsequently building up a complete data processing framework. The principal motivation behind my work is the over-use of the Gaussian assumption for data modeling in the literature. Though this assumption is probably the best to make when no information about the data to be modeled is available, in most cases studying a few data properties would make other distributions a better assumption. In this thesis, I focus on proportional data that are most commonly known in the form of histograms and that naturally arise in a number of situations such as in bag-of-words methods. These data are non-Gaussian and their modeling with distributions belonging the Dirichlet family, that have common properties, is expected to be more accurate. The models I focus on are the hidden Markov models, well-known for their capabilities to easily handle dynamic ordered multivariate data. They have been shown to be very effective in numerous fields for various applications for the last 30 years and especially became a corner stone in speech processing. Despite their extensive use in almost all computer vision areas, they are still mainly suited for Gaussian data modeling. I propose here to theoretically derive different approaches for learning and applying to real-world situations hidden Markov models based on mixtures of Dirichlet, generalized Dirichlet, Beta-Liouville distributions, and mixed data. Expectation-Maximization and variational learning approaches are studied and compared over several data sets, specifically for the task of detecting and localizing unusual events. Hybrid HMMs are proposed to model mixed data with the goal of detecting changes in satellite images corrupted by different noises. Finally, several parametric distances for comparing Dirichlet and generalized Dirichlet-based HMMs are proposed and extensively tested for assessing their robustness. My experimental results show situations in which such models are worthy to be used, but also unravel their strength and limitations.