Distribution-based Regression for Count and Semi-Bounded Data

Koochemeshkian, Pantea (2020) Distribution-based Regression for Count and Semi-Bounded Data. Masters thesis, Concordia University.

Text (application/pdf)
Koochemeshkian_MASc_S2020.pdf - Accepted Version
Available under License Spectrum Terms of Access.
3MB

Abstract

Data mining techniques have been applied successfully in many important fields, including pattern recognition, computer vision, and medical research. Despite the wealth of data generated every day, practical analysis tools for discovering hidden relationships and trends are lacking. Among statistical frameworks, regression has proven to be one of the strongest tools for prediction. A considerable challenge in prediction is the complexity of the data, which is unfavorable for most models. A model must perform accurately and efficiently, so it should be selected to fit the data well, such that learning from previous data is efficient and highly accurate.

This work is motivated by the limited number of regression analysis tools for multivariate count data in the literature. We propose two regression models for count data based on flexible distributions, namely the multinomial Beta-Liouville and the multinomial scaled Dirichlet, and evaluate them on the problem of disease diagnosis. Performance is measured by prediction accuracy, which depends on the nature and complexity of the dataset. Our results show that the prediction performance of both proposed models is competitive with previously used regression approaches for count data and with the best results in the literature.
Then, we propose three regression models for positive vectors based on flexible distributions for semi-bounded data, namely the inverted Dirichlet, inverted generalized Dirichlet, and inverted Beta-Liouville. The efficiency of these models is tested on real-world applications, including software defect prediction, spam filtering, and disease diagnosis. Our results show that the three proposed regression models perform better than other commonly used regression models.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Koochemeshkian, Pantea
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Electrical and Computer Engineering
Date: 2 April 2020
Thesis Supervisor(s): Bouguila, Nizar
ID Code: 986672
Deposited By: pantea koochemeshkian
Deposited On: 19 Apr 2021 17:33
Last Modified: 19 Apr 2021 17:33
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
