Hierarchical Probabilistic Modeling with the Nested Dirichlet Distribution

Alkhawaja, Fares ORCID: https://orcid.org/0000-0002-1404-259X (2026) Hierarchical Probabilistic Modeling with the Nested Dirichlet Distribution. PhD thesis, Concordia University.

Full text: Alkhawja_PhD_S2026.pdf (application/pdf, 29MB), Accepted Version. Available under License Spectrum Terms of Access.

Abstract

In the era of digitization, the immense volume of data generated daily by abundant media streams across various modalities makes it challenging to infer patterns that help industries draw conclusions or make decisions. These challenges stem from the unstructured nature of such data, its high dimensionality, and the complex correlations among its features. As a result, traditional analytical methods struggle to capture the underlying structure of these data sources and fail to scale effectively. Moreover, existing deep learning models not only require labeled data for training but also rely on decision processes that are neither traceable nor explainable. Statistical generative models have recently seen notable adoption across many applications owing to their unsupervised nature, their efficient performance under data scarcity, and their interpretable coefficients, which facilitate explainability. The Dirichlet distribution is widely used in generative models, including mixture and topic models. Despite its flexibility, it has notable limitations, namely its restrictive covariance matrix (any two of its components are always negatively correlated) and the direct proportionality between its mean and variance. These limitations hinder its ability to generalize to data with more complex correlation structures and to capture latent patterns in high-dimensional, unstructured data.
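
The two Dirichlet limitations named above can be checked directly from the distribution's closed-form moments. The sketch below is a minimal illustration, assuming NumPy; the helper name `dirichlet_moments` and the parameter vector `alpha` are hypothetical choices for illustration. It computes the mean and covariance of Dir(alpha) and shows that every pair of components has negative covariance and that, once the total concentration alpha_0 is fixed, each variance is determined entirely by the corresponding mean.

```python
import numpy as np


def dirichlet_moments(alpha):
    """Closed-form mean and covariance matrix of Dir(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()  # total concentration alpha_0
    mean = alpha / a0  # E[X_i] = alpha_i / alpha_0
    # Cov[X_i, X_j] = (delta_ij * mean_i - mean_i * mean_j) / (alpha_0 + 1)
    cov = (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)
    return mean, cov


alpha = np.array([2.0, 3.0, 5.0])  # illustrative parameters
mean, cov = dirichlet_moments(alpha)

# Every off-diagonal covariance is strictly negative, for any positive alpha.
off_diag = cov[~np.eye(len(alpha), dtype=bool)]
print(mean)  # [0.2 0.3 0.5]
print(bool((off_diag < 0).all()))  # True

# With alpha_0 fixed, each variance is a function of its mean alone.
print(np.allclose(np.diag(cov), mean * (1 - mean) / (alpha.sum() + 1)))  # True

# Monte Carlo check against NumPy's built-in Dirichlet sampler.
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha, size=200_000)
print(np.abs(np.cov(samples, rowvar=False) - cov).max())
```

Because the components must sum to one, each row of the covariance matrix sums to zero; with a single concentration parameter controlling all spreads, there is no way to make two components positively correlated.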

To address these limitations, this thesis adopts a generalization of the Dirichlet distribution, the Nested Dirichlet distribution (NDD), within generative modeling frameworks, owing to its greater flexibility and hierarchical structure. The NDD also generalizes the well-known generalized Dirichlet distribution (GDD): it allows a more general covariance matrix and relaxes the dual-branch constraint imposed by the GDD, resulting in a more flexible tree structure. This enables the NDD to better handle complex sparse data that suffer from the curse of dimensionality, while its hierarchical structure makes it well suited to hierarchical feature learning and classification frameworks.

The thesis establishes these advantages through several contributions. First, it shows the improved performance of the NDD over the Dirichlet distribution and the GDD in mixture and hierarchical Bayesian models, achieved by introducing extensions that reduce model complexity and make better use of the NDD's hierarchical structure. Second, it introduces NDD-based hierarchical topic models that carry the NDD's hierarchical structure into the document-topic or topic-word representations of topic models such as Latent Dirichlet Allocation (LDA). These include parametric and non-parametric hierarchical topic models, as well as a Nested Dirichlet Forest that incorporates domain knowledge as a topic-word prior. Finally, it applies an NDD-based hierarchical topic model, augmented with point processes, to non-intrusive load monitoring (NILM) in order to disaggregate appliance energy consumption in residential houses. To account for the inherently event-driven and temporally correlated nature of appliance usage, temporal correlations among appliances are captured using the Hawkes process and the Transformer Hawkes process.
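
The tree structure that distinguishes the NDD from the GDD can be illustrated with a sampler. This is a minimal sketch, assuming the common tree-based construction of a nested Dirichlet distribution: each internal node draws Dirichlet branch weights over its children, and a leaf's probability is the product of the weights along its root-to-leaf path. The tree `TREE`, its parameters, and the helpers `sample_nested_dirichlet` and `gdd_tree` are hypothetical and for illustration only; the thesis may parameterize the NDD differently. `gdd_tree` encodes the GDD's dual-branch constraint by giving every internal node exactly two branches, one leaf and one subtree with the remaining leaves.

```python
import numpy as np

# Hypothetical tree with illustrative parameters: an internal node is a list of
# (alpha, child) pairs and a leaf is a string label.
TREE = [
    (4.0, [(2.0, "a"), (2.0, "b"), (1.0, "c")]),
    (3.0, [(1.0, "d"), (5.0, "e")]),
    (1.0, "f"),
]


def sample_nested_dirichlet(tree, rng, scale=1.0, out=None):
    """Draw one probability vector over the leaves of `tree`.

    Each internal node splits the mass it receives among its children using
    Dirichlet-distributed branch weights, so a leaf's probability is the
    product of the branch weights on its root-to-leaf path.
    """
    if out is None:
        out = {}
    weights = rng.dirichlet([a for a, _ in tree])
    for w, (_, child) in zip(weights, tree):
        if isinstance(child, str):
            out[child] = scale * w  # leaf: store its accumulated mass
        else:
            sample_nested_dirichlet(child, rng, scale * w, out)  # recurse
    return out


def gdd_tree(alphas, betas, labels):
    """Generalized Dirichlet as a special case: a chain in which every internal
    node has exactly two branches, one leaf (weight alphas[i]) and one subtree
    holding the remaining leaves (weight betas[i]). Needs len(labels) >= 2 and
    len(labels) - 1 entries in alphas and betas."""
    if len(labels) == 1:
        return labels[0]
    rest = gdd_tree(alphas[1:], betas[1:], labels[1:])
    return [(alphas[0], labels[0]), (betas[0], rest)]


rng = np.random.default_rng(42)
p = sample_nested_dirichlet(TREE, rng)
print(sorted(p), round(sum(p.values()), 10))  # ['a', 'b', 'c', 'd', 'e', 'f'] 1.0

g = gdd_tree([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], ["x1", "x2", "x3", "x4"])
q = sample_nested_dirichlet(g, rng)
print(len(q))  # 4
```

A one-level tree recovers the ordinary Dirichlet. More generally, when every internal node's weight equals the sum of its children's weights, the nested draw has the same distribution as a flat Dirichlet over the leaves, so the extra flexibility comes from breaking that equality and from choosing the tree shape freely.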

The results reported in this thesis highlight the advantages of the proposed NDD-based frameworks and pave the way for further improvements in future work.
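
The Hawkes process used above to capture temporal correlations among appliances is defined through a conditional intensity that jumps after each event and then decays. The sketch below assumes an exponential kernel, a common choice that the abstract does not specify, and treats each dimension as one appliance; the function name `hawkes_intensity` and all parameter values are hypothetical.

```python
import numpy as np


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t of a multivariate Hawkes process with an
    exponential kernel (an assumed, common choice):

        lambda_k(t) = mu_k + sum_j alpha[k, j] * sum_{t_i in events[j], t_i < t} exp(-beta * (t - t_i))

    events[j] holds the event times of dimension j (here: appliance j).
    """
    lam = np.asarray(mu, dtype=float).copy()  # baseline rates mu_k
    alpha = np.asarray(alpha, dtype=float)  # alpha[k, j]: excitation of k by j
    for j, times in enumerate(events):
        past = np.asarray(times, dtype=float)
        past = past[past < t]  # only strictly earlier events contribute
        lam += alpha[:, j] * np.exp(-beta * (t - past)).sum()
    return lam


# Hypothetical example: two appliances with illustrative parameters.
events = [[1.0, 2.0], [1.5]]  # switch-on times of appliance 0 and appliance 1
mu = [0.1, 0.2]
alpha = [[0.5, 0.0],
         [0.3, 0.4]]
print(hawkes_intensity(3.0, events, mu, alpha, beta=1.0))
```

Here appliance 1's rate is raised by recent switch-ons of both appliances, while appliance 0 responds only to its own history because `alpha[0][1]` is zero. The Transformer Hawkes process mentioned in the abstract replaces this fixed parametric kernel with a Transformer-based encoding of the event history.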

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (PhD)
Authors: Alkhawaja, Fares
Institution: Concordia University
Degree Name: Ph. D.
Program: Information and Systems Engineering
Date: 3 February 2026
Thesis Supervisor(s): Bouguila, Nizar and Amayri, Manar
Keywords: Nested Dirichlet Distribution, Dirichlet models, statistical generative models, unsupervised learning, hierarchical Bayesian modeling, topic models, Latent Dirichlet Allocation, high-dimensional data, sparse data, explainable models, hierarchical feature learning, non-intrusive load monitoring, Hawkes process.
ID Code: 996756
Deposited By: Fares Alkhawja
Deposited On: 29 Jun 2026 17:51
Last Modified: 29 Jun 2026 17:51
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.