Shehab, Mohammed (2024) Techniques to Enhance Just-In-Time Software Defect Prediction Models. PhD thesis, Concordia University.
Text (application/pdf): Shehab_PhD_S2024.pdf (4MB) - Accepted Version. Restricted to Repository staff only until 17 April 2026. Available under License Spectrum Terms of Access.
Abstract
Software defects can lead to critical failures. Just-In-Time Software Defect Prediction (JIT-SDP) techniques identify potential defects early, improving software reliability and maintainability. This thesis addresses project clusters, data imbalance, and classifier combination challenges for JIT-SDP. The contributions were evaluated using diverse software projects and 34 datasets, totaling 259k commits.
The first contribution introduces ClusterCommit, a JIT-SDP approach tailored for clusters of projects that share libraries and functionalities. Unlike traditional methods, ClusterCommit employs a machine learning model trained on commits from various projects within a cluster. The study evaluates six machine learning and three deep learning models. The results show mean Area Under the Curve (AUC) improvements ranging from 4% to 12%, most prominently for complex models such as Random Forest (RF) and Support Vector Machine (SVM) on large clusters. In contrast, simpler models such as Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and k-Nearest Neighbors (k-NN) do not perform as well when applied to clusters of projects. The trend extends to the deep learning models, all of which show a performance improvement of 3% to 30% with the ClusterCommit approach, irrespective of cluster size.
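The pooling idea behind training on a project cluster can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the project names, the two commit features, the data values, and the nearest-centroid classifier, which is a simple stand-in and not one of the models evaluated in the thesis.

```python
from statistics import mean

# Hypothetical commits per project: (lines_added, files_touched, is_defective).
# Values are made up for illustration, not taken from the thesis datasets.
cluster = {
    "project_a": [(500, 12, 1), (20, 1, 0), (350, 9, 1)],
    "project_b": [(15, 2, 0), (420, 10, 1), (30, 1, 0)],
}

def pool_commits(projects):
    """Merge the commits of every project in the cluster into one training set."""
    return [commit for commits in projects.values() for commit in commits]

def fit_centroids(commits):
    """Compute one mean feature vector per class (0 = clean, 1 = defective)."""
    centroids = {}
    for label in (0, 1):
        rows = [c[:2] for c in commits if c[2] == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids

def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=dist)

# One model is trained on the pooled commits of the whole cluster.
training_set = pool_commits(cluster)
model = fit_centroids(training_set)
```

A new commit from any project in the cluster would then be scored with `predict(model, (lines_added, files_touched))`.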
The second contribution proposes a One-Class Classification (OCC) approach to tackle the data imbalance challenge in JIT-SDP models. OCC algorithms, such as One-Class SVM (OC-SVM), Isolation Forest (IOF), and One-Class k-NN (OC-k-NN), perform better than binary classifiers at medium to high data imbalance ratios. They achieve mean AUCs of 83%, 81%, and 86% for IOF, OC-k-NN, and OC-SVM, respectively, and require fewer features, reducing computational overhead.
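As a rough illustration of one-class classification, the sketch below fits a one-class k-NN style detector on a single class only and flags points that lie far from it. It rests on several assumptions: that the model is trained on clean commits, that the score is the mean distance to the k nearest training points, and that the threshold is the largest leave-one-out training score. The features and data are hypothetical, and the thesis may configure its OCC models differently.

```python
import math

# Hypothetical clean (non-defective) commits: (lines_added, files_touched).
clean_commits = [(10, 1), (12, 1), (15, 2), (20, 2), (18, 1), (25, 3)]

def knn_score(train, point, k=2, skip_self=False):
    """Mean Euclidean distance from `point` to its k nearest training points."""
    dists = sorted(math.dist(point, t) for t in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance from the point to itself
    return sum(dists[:k]) / k

def fit_threshold(train, k=2):
    """Use the largest leave-one-out score among training points as the cutoff."""
    return max(knn_score(train, p, k, skip_self=True) for p in train)

def is_outlier(train, threshold, point, k=2):
    """Flag a commit whose score exceeds the cutoff learned from clean commits."""
    return knn_score(train, point, k) > threshold

threshold = fit_threshold(clean_commits)
```

Because only one class is needed for fitting, a detector of this kind does not depend on having many labeled defective commits, which is why the approach suits imbalanced data.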
Lastly, JITBoost is a framework that uses Boolean Combination of Classifiers (BCC) to construct robust JIT-SDP models. Three BCC algorithms are investigated. JITBoost achieves superior performance by combining decisions from six traditional machine learning and deep learning algorithms, with mean AUCs of 89%, 87%, and 88% for JITBoost-BBC, JITBoost-IBC, and JITBoost-WPIBC, respectively.
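The general idea of combining classifier decisions with Boolean functions can be sketched as follows. This is a deliberately simplified illustration and not the BBC, IBC, or WPIBC algorithms from the thesis: it assumes two classifiers with hard 0/1 decisions, tries only AND, OR, and XOR, and selects by true positive rate minus false positive rate. All labels and predictions are hypothetical.

```python
import operator

# Hypothetical ground truth and hard decisions of two classifiers (1 = defective).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
clf_a = [1, 1, 1, 0, 1, 0, 1, 1]
clf_b = [1, 1, 1, 0, 0, 1, 0, 0]

BOOLEAN_FUNCS = {
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}

def rates(y, pred):
    """Return (true positive rate, false positive rate)."""
    tp = sum(1 for t, p in zip(y, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y, pred) if t == 0 and p == 1)
    pos = sum(y)
    neg = len(y) - pos
    return tp / pos, fp / neg

def best_combination(y, a, b):
    """Pick the Boolean function that maximises TPR - FPR on the given labels."""
    scored = {}
    for name, func in BOOLEAN_FUNCS.items():
        combined = [func(x, z) for x, z in zip(a, b)]
        tpr, fpr = rates(y, combined)
        scored[name] = tpr - fpr
    best = max(scored, key=scored.get)
    return best, scored

best_name, scores = best_combination(y_true, clf_a, clf_b)
```

On this toy data each classifier alone scores 0.5, while their AND combination scores 0.75, because the two classifiers agree on most true defects but raise their false alarms on different commits.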
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Shehab, Mohammed |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Electrical and Computer Engineering |
Date: | 30 January 2024 |
Thesis Supervisor(s): | Hamou-Lhadj, Abdelwahab |
Keywords: | Deep Learning, Machine Learning, Just-In-Time Software Defect Prediction, Boolean Combination of Classifiers |
ID Code: | 993763 |
Deposited By: | Mohammed Shehab |
Deposited On: | 05 Jun 2024 15:27 |
Last Modified: | 05 Jun 2024 15:27 |