Distributions based Regression Techniques for Compositional Data


Ankam, Divya (2019) Distributions based Regression Techniques for Compositional Data. Masters thesis, Concordia University.

Text (application/pdf)
Ankam_MASc_S2019.pdf - Accepted Version
Available under License Spectrum Terms of Access.


This thesis presents a systematic study of regression methods for compositional data, which are unique and rare. We start with the basic machine learning concept of regression and use regression equations to solve a classification problem. With partial least squares discriminant analysis (PLS-DA), we apply regression algorithms to classification problems such as spam filtering and intrusion detection. Having established a basic understanding of how regression works, we move on to more complex distribution-based regression algorithms. We first explore the one-dimensional case, beta regression, which shows how regression equations can produce a prediction when the outcome is assumed to follow a beta distribution. We then turn to the Dirichlet distribution, which covers the multi-dimensional case. Unlike traditional regression, here we predict a compositional outcome. Two novel distribution-based regression approaches are proposed for compositional data: generalized Dirichlet regression and Beta-Liouville regression. Like Dirichlet regression, they extend beta regression to the multi-dimensional setting. The models are learned by maximum likelihood estimation using the Newton-Raphson method. The proposed models are compared with other popular solutions on both synthetic data and real data sets drawn from challenging applications, such as market share analysis using Google Trends and occupancy estimation in smart buildings, to show the merits of the proposed approaches. Our work can act as a tool for product-based companies to estimate how their advertising investments have affected their market shares. Google Trends gives an estimate of a company's popularity, which reflects the effect of advertisements. This thesis bridges the gap between open-source data from Google Trends and market shares.
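
As a rough illustration of what predicting a compositional outcome involves, the sketch below evaluates the log-likelihood that a Dirichlet regression model would maximize. It is not taken from the thesis: the log link alpha_j = exp(x . beta_j), the function names, and the toy data are all assumptions made for this example, and the maximization step itself (for instance by Newton-Raphson) is left out.

```python
import math

def dirichlet_log_density(y, alpha):
    """Log density of Dirichlet(alpha) at a composition y (positive parts summing to 1)."""
    if abs(sum(y) - 1.0) > 1e-9 or min(y) <= 0.0:
        raise ValueError("y must be a composition: positive parts summing to 1")
    # Log of the normalizing constant: Gamma(sum alpha) / prod Gamma(alpha_j).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, y))

def alphas_from_covariates(x, coefs):
    """Assumed log link: alpha_j = exp(x . coefs[j]), one coefficient vector per part."""
    return [math.exp(sum(xi * bi for xi, bi in zip(x, b))) for b in coefs]

def log_likelihood(X, Y, coefs):
    """Sum of Dirichlet log densities over all (covariates, composition) pairs."""
    return sum(
        dirichlet_log_density(y, alphas_from_covariates(x, coefs))
        for x, y in zip(X, Y)
    )

# Toy data, made up for illustration: an intercept plus one covariate,
# and three-part compositions (for example, three market shares).
X = [[1.0, 0.2], [1.0, 0.8]]
Y = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]

# With all coefficients at zero, every alpha_j equals 1 (uniform on the simplex).
zero_coefs = [[0.0, 0.0] for _ in range(3)]
print(log_likelihood(X, Y, zero_coefs))
```

With all coefficients at zero the model reduces to the uniform distribution on the simplex, whose density for three parts is Gamma(3) = 2, so each observation contributes log 2 to the total. A fitting routine would adjust the coefficients to increase this value.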

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Ankam, Divya
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Quality Systems Engineering
Date: February 2019
Thesis Supervisor(s): Bouguila, Nizar
Keywords: Machine learning, data mining, regression, spam filter, intrusion detection, PLS-DA, beta, Dirichlet, share-market, google, trends
ID Code: 985009
Deposited By: Divya Ankam
Deposited On: 03 Aug 2020 14:49
Last Modified: 03 Aug 2020 14:49
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
