Koochemeshkian, Pantea (2020) Distribution-based Regression for Count and Semi-Bounded Data. Masters thesis, Concordia University.
Preview |
Text (application/pdf)
3MBKoochemeshkian_MASc_S2020.pdf - Accepted Version Available under License Spectrum Terms of Access. |
Abstract
Data mining techniques have been successfully utilized in different applications of significant fields, including pattern recognition, computer vision, medical researches, etc. With the wealth of data generated every day, there is a lack of practical analysis tools to discover hidden relationships and trends. Among all statistical frameworks, regression has been proven to be one of the most strong tools in prediction. The complexity of data that is unfavorable for most models is a considerable challenge in prediction. The ability of a model to perform accurately and efficiently is extremely important. Thus, a model must be selected to fit the data well, such that the learning from previous data is efficient and highly accurate.
This work is motivated by the limited number of regression analysis tools for multivariate count data in the literature. We propose two regression models for count data based on flexible distributions, namely, the multinomial Beta-Liouville and multinomial scaled Dirichlet, and evaluate them in the problem of disease diagnosis. The performance is measured based on the accuracy of the prediction, which depends on the nature and complexity of the dataset. Our results show the efficiency of the two proposed regression models where the prediction performance of both models is competitive to other previously used regression approaches for count data and to the best results in the literature.
Then, we propose three regression models for positive vectors based on flexible distributions for semi-bounded data, namely, inverted Dirichlet, inverted generalize Dirichlet, and inverted Beta-Liouville. The efficiency of these models is tested via real-world applications, including software defects prediction, spam filtering, and disease diagnosis. Our results show that the performance of the three proposed regression models is better than other commonly used regression models.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Koochemeshkian, Pantea |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Electrical and Computer Engineering |
Date: | 2 April 2020 |
Thesis Supervisor(s): | Bouguila, Nizar |
ID Code: | 986672 |
Deposited By: | pantea koochemeshkian |
Deposited On: | 19 Apr 2021 17:33 |
Last Modified: | 19 Apr 2021 17:33 |
Repository Staff Only: item control page