Islam, Md. Shariful (2020) On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks. PhD thesis, Concordia University.
Preview |
Text (application/pdf)
4MBIslam_PhD_S2021.pdf - Accepted Version Available under License Spectrum Terms of Access. |
Abstract
In this thesis, we propose an approach that relies on Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using weighted Kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination) works in three phases. The first phase selects a subset of the available base diverse soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen’s kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: The detection of system anomalies and the prediction of the reassignment of bug report fields.
Detecting system anomalies at run-time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches -the ability to detect anomalies with high accuracy. Less attention was given to false alarm and efficiency. Although ensemble approaches for the detection of anomalies that use Boolean combination of classifier decisions have been shown to be useful in reducing the false alarm rate over that of a single classifier, existing methods rely on an exponential number of combinations making them impractical even for a small number of classifiers. Our approach is not only able to maintain and even improve the accuracy of existing Boolean combination techniques, but also significantly reduce the combination time and the number of classifiers selected for combination.
The second application domain of our approach is the prediction of the reassignment of bug report fields. Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, it is common to have bug reports with missing or incorrect information, hindering the bug resolution process. To address this problem. researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied upon classifiers that make use of natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information that is present in stack traces included in bug reports. To address this, we propose an approach called EnHMM which uses WPIBC and stack traces to predict the reassignment of bug report fields.
Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogenous classifiers.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Islam, Md. Shariful |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Electrical and Computer Engineering |
Date: | 22 November 2020 |
Thesis Supervisor(s): | Hamou-Lhadj, Dr. Abdelwahab |
ID Code: | 988189 |
Deposited By: | MD SHARIFUL ISLAM |
Deposited On: | 29 Jun 2021 22:30 |
Last Modified: | 29 Jun 2021 22:30 |
Repository Staff Only: item control page