Ankam, Divya (2019) Distributions based Regression Techniques for Compositional Data. Masters thesis, Concordia University.
Text (application/pdf), 515kB: Ankam_MASc_S2019.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
This thesis presents a systematic study of regression methods for compositional data, which are unique and rare. We start with the basic machine learning concept of regression and use regression equations to solve classification problems. With partial least squares discriminant analysis (PLS-DA), we apply regression algorithms to classification problems such as spam filtering and intrusion detection. After establishing a basic understanding of how regression works, we move on to more complex distribution-based regression algorithms. We first explore the one-dimensional case, beta regression, which shows how a prediction can be made with regression equations when the outcome is assumed to follow a beta distribution. To further this understanding, we look into the Dirichlet distribution, which covers the multi-dimensional case. Unlike traditional regression, here we predict a compositional outcome. Two novel distribution-based regression approaches are proposed for compositional data, namely generalized Dirichlet regression and Beta-Liouville regression. Like Dirichlet regression, they extend beta regression to the multi-dimensional setting. The models are learned by maximum likelihood estimation using the Newton-Raphson approach. We compare the performance of the proposed models with other popular solutions on both synthetic and real data sets, the latter extracted from challenging applications such as market share analysis using Google-Trends and occupancy estimation in smart buildings, to show the merits of the proposed approaches. Our work will act as a tool for product-based companies to estimate how their investments in advertising have translated into market share. Google-Trends gives an estimate of the popularity of a company, which reflects the effect of advertisements. This thesis bridges the gap between open-source data from Google-Trends and market shares.
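To make the idea of predicting a compositional outcome concrete, the following is a minimal sketch of the log-likelihood that a Dirichlet regression would maximize. It is not taken from the thesis: the softmax link, the shared precision `phi`, and the function names and coefficient layout are all assumptions chosen for illustration, and the fitting step (Newton-Raphson in the thesis) is omitted. With two components the same density reduces to the beta distribution.

```python
import math

def softmax(scores):
    """Map real-valued scores to a composition (positive parts summing to 1)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dirichlet_logpdf(y, alpha):
    """Log-density of a Dirichlet(alpha) distribution at the composition y."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, y))

def dirichlet_regression_loglik(X, Y, coefs, phi):
    """Total log-likelihood of compositions Y given covariates X.

    Assumed (hypothetical) parameterization:
      coefs -- one coefficient vector per component of the composition
      phi   -- a single precision shared by all observations
    The mean composition is softmax(coefs @ x) and alpha = phi * mean.
    """
    total = 0.0
    for x, y in zip(X, Y):
        scores = [sum(b * v for b, v in zip(beta, x)) for beta in coefs]
        mu = softmax(scores)
        alpha = [phi * m for m in mu]
        total += dirichlet_logpdf(y, alpha)
    return total
```

A fitting routine would search over `coefs` and `phi` to maximize this quantity; any general-purpose optimizer could stand in for the Newton-Raphson updates described in the abstract.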
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Ankam, Divya |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | February 2019 |
Thesis Supervisor(s): | Bouguila, Nizar |
Keywords: | Machine learning, data mining, regression, spam filter, intrusion detection, PLS-DA, beta, Dirichlet, share-market, google, trends |
ID Code: | 985009 |
Deposited By: | Divya Ankam |
Deposited On: | 03 Aug 2020 14:49 |
Last Modified: | 03 Aug 2020 14:49 |