Techniques to Enhance Just-In-Time Software Defect Prediction Models

Shehab, Mohammed (2024) Techniques to Enhance Just-In-Time Software Defect Prediction Models. PhD thesis, Concordia University.

Text (application/pdf)
Shehab_PhD_S2024.pdf - Accepted Version
Restricted to Repository staff only until 17 April 2026.
Available under License Spectrum Terms of Access.

4MB

Abstract

Software defects can lead to critical failures. Just-In-Time Software Defect Prediction (JIT-SDP) techniques identify potential defects early, improving software reliability and maintainability. This thesis addresses project clusters, data imbalance, and classifier combination challenges for JIT-SDP. The contributions were evaluated using diverse software projects and 34 datasets, totaling 259k commits.

The first contribution introduces ClusterCommit, a JIT-SDP approach tailored for clusters of projects that share libraries and functionalities. Unlike traditional methods, ClusterCommit trains a machine learning model on commits from the various projects within a cluster. The study evaluates six machine learning and three deep learning models. The results show improvements in mean Area Under the Curve (AUC) of 4% to 12%, most prominently for complex models such as Random Forest (RF) and Support Vector Machine (SVM) on large clusters. In contrast, simpler models such as Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and k-Nearest Neighbors (k-NN) do not perform as well when applied to clusters of projects. This trend extends to the deep learning models, all of which show a performance change of 3% to 30% with the ClusterCommit approach, irrespective of cluster size.
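
The pooling idea behind ClusterCommit can be illustrated with a minimal sketch. It is not the thesis's implementation: the project names, the two commit features, and the nearest-centroid classifier (a stand-in for the RF, SVM, and deep learning models named above) are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: project -> list of (features, is_defective).
# The two features (lines added, files touched) are invented for this sketch.
cluster = {
    "proj_a": [((120.0, 9.0), 1), ((8.0, 1.0), 0)],
    "proj_b": [((95.0, 7.0), 1), ((12.0, 2.0), 0)],
    "proj_c": [((150.0, 11.0), 1), ((5.0, 1.0), 0)],
}

def pool_commits(cluster):
    """Merge the commits of every project in the cluster into one training set."""
    return [commit for commits in cluster.values() for commit in commits]

def fit_centroids(samples):
    """Nearest-centroid stand-in for the real models: one mean vector per label."""
    centroids = {}
    for label in {y for _, y in samples}:
        rows = [x for x, y in samples if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# One model is trained on the pooled commits and reused for any project
# in the cluster.
model = fit_centroids(pool_commits(cluster))
```

A new commit from any project in the cluster is then classified with `predict(model, features)`. The point of the sketch is only that a single model sees the commits of all related projects.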

The second contribution proposes a One-Class Classification (OCC) approach to tackle the data imbalance challenge in JIT-SDP models. OCC algorithms such as One-Class SVM (OC-SVM), Isolation Forest (IOF), and One-Class k-NN (OC-k-NN) perform better than binary classifiers at medium to high data imbalance ratios. They achieve mean AUCs of 83%, 81%, and 86% for IOF, OC-k-NN, and OC-SVM, respectively, and require fewer features, reducing computational overhead.
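
As an illustration of the one-class idea, the following minimal sketch fits a one-class k-NN detector on clean commits only and flags commits that lie far from them. The feature vectors, `k`, and the quantile-based threshold are assumptions chosen for this example, not settings taken from the thesis.

```python
import math

def kth_neighbor_distance(point, train, k):
    """Distance from `point` to its k-th nearest neighbor in `train`."""
    dists = sorted(math.dist(point, t) for t in train)
    return dists[k - 1]

def fit_one_class_knn(clean_commits, k=2, quantile=0.95):
    """Learn an outlier threshold from clean (non-defective) commits only."""
    scores = []
    for i, p in enumerate(clean_commits):
        # Leave-one-out: score each clean commit against the others.
        others = clean_commits[:i] + clean_commits[i + 1:]
        scores.append(kth_neighbor_distance(p, others, k))
    scores.sort()
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return {"train": clean_commits, "k": k, "threshold": scores[idx]}

def is_suspicious(model, commit):
    """Flag a commit whose k-NN distance exceeds the learned threshold."""
    score = kth_neighbor_distance(commit, model["train"], model["k"])
    return score > model["threshold"]

# Hypothetical clean commits described by (lines added, files touched).
clean = [(10.0, 1.0), (12.0, 2.0), (8.0, 1.0),
         (11.0, 2.0), (9.0, 1.0), (13.0, 2.0)]
occ_model = fit_one_class_knn(clean)
```

Because training uses only the majority class, the detector does not depend on how many defective commits are available, which is why one-class methods suit imbalanced data.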

Lastly, JITBoost is a framework that uses a Boolean Combination of Classifiers (BCC) to construct robust JIT-SDP models. Three BCC algorithms are investigated. JITBoost achieves superior performance by combining decisions from six traditional machine learning and deep learning algorithms, with mean AUCs of 89%, 87%, and 88% for JITBoost-BBC, JITBoost-IBC, and JITBoost-WPIBC, respectively.
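
The general idea of combining classifier decisions with Boolean functions can be sketched as follows. This is a simplified illustration rather than JITBoost itself: it exhaustively combines two classifiers over all threshold pairs and a few Boolean functions, and it selects the combination that maximizes TPR minus FPR. That selection rule, the function names, and the toy scores are assumptions for this example.

```python
from itertools import product

# Boolean functions applied to the two thresholded decisions.
BOOLEAN_FUNCS = {
    "a AND b": lambda a, b: a and b,
    "a OR b": lambda a, b: a or b,
    "a XOR b": lambda a, b: a != b,
    "a AND NOT b": lambda a, b: a and not b,
    "NOT a AND b": lambda a, b: (not a) and b,
}

def rates(decisions, labels):
    """Return (false positive rate, true positive rate)."""
    tp = sum(1 for d, y in zip(decisions, labels) if d and y == 1)
    fp = sum(1 for d, y in zip(decisions, labels) if d and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

def best_combination(scores_a, scores_b, labels):
    """Search threshold pairs and Boolean functions; keep the best TPR - FPR."""
    best = None
    for ta, tb in product(sorted(set(scores_a)), sorted(set(scores_b))):
        for name, func in BOOLEAN_FUNCS.items():
            decisions = [func(a >= ta, b >= tb)
                         for a, b in zip(scores_a, scores_b)]
            fpr, tpr = rates(decisions, labels)
            candidate = (tpr - fpr, name, ta, tb)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best

# Hypothetical validation scores from two classifiers (1 = defective commit).
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
scores_b = [0.4, 0.9, 0.8, 0.2, 0.85, 0.1]
best = best_combination(scores_a, scores_b, labels)
```

On this toy data neither classifier separates the classes with a single threshold, while a Boolean combination of the two does, which is the intuition behind combining decisions in ROC space.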

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type:	Thesis (PhD)
Authors:	Shehab, Mohammed
Institution:	Concordia University
Degree Name:	Ph. D.
Program:	Electrical and Computer Engineering
Date:	30 January 2024
Thesis Supervisor(s):	Hamou-Lhadj, Abdelwahab
Keywords:	Deep Learning, Machine Learning, Just-In-Time Software Defect Prediction, Boolean Combination of Classifiers
ID Code:	993763
Deposited By:	Mohammed Shehab
Deposited On:	05 Jun 2024 15:27
Last Modified:	05 Jun 2024 15:27
