Li, Ding (2023) An XAI-based Framework for Software Vulnerability Contributing Factors Assessment. Masters thesis, Concordia University.
Text (application/pdf): Li_MA_S2024.pdf (11MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Software vulnerability detection plays a proactive role in reducing risks to software security and reliability. Despite advancements in deep learning-based detection, a semantic gap persists between model-learned features and human-interpretable vulnerability semantics. The challenge is the lack of a systematic approach to assessing feature importance that can explain the relationship between the two. Explainable Artificial Intelligence (XAI) techniques are therefore indispensable for explaining the features learned by AI models, which makes them well suited to software vulnerability detection.
This research introduces an XAI-based framework to systematically evaluate XAI techniques and apply them for assessing the contributing factors of feature representations in classifying software code into Common Weakness Enumeration (CWE) types. The focus is on applying XAI methods to examine the importance of features underlying vulnerability detection. An additional challenge arises from the lack of a systematic evaluation to ensure consistent explanation results during the selection of state-of-the-art XAI methods.
To address this, this thesis defines three evaluation metrics for XAI: consistency, stability, and efficiency. A novel XAI method, named Mean-Centroid PredDiff, is introduced to strike a balance among these three metrics, significantly enhancing the framework’s efficacy. This method, along with SHAP, is integrated into the framework based on its strong performance in the evaluation across three domain case studies.
Findings from this work reveal that the proposed framework enables the summarization of the importance of 40 syntactic constructs and the similarities among 20 CWEs based on graph-embedded semantic features. The study results align closely with expert knowledge from the CWE community, achieving approximately 77.8% Top1 and 89% Top5 similarity hit rates and a mean average precision of 0.70 in CWE classification. The study validates the significance of attention values of transformer-based models in representing the importance of code tokens.
Overall, this thesis contributes a new XAI method to the open-source community that trades off efficiency against consistency and stability. In addition, the XAI-based framework successfully assesses the importance of the nine meta syntactic constructs across 20 CWE types and evaluates their similarity. The dataset and the code of the framework have been made publicly available on GitHub.
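For readers unfamiliar with the ranking metrics quoted in the abstract, the following is a minimal sketch of how a Top-k hit rate and a mean average precision are commonly computed. It is not taken from the thesis: the function names, the CWE rankings, and the expert-labelled sets are hypothetical, and the average-precision normalisation (dividing by the number of relevant items) is one common convention that may differ from the definition the thesis uses.

```python
def top_k_hit_rate(rankings, relevant, k):
    """Fraction of queries with at least one relevant item in the top k.

    Assumes `rankings` is a non-empty list of ranked lists and `relevant`
    is a parallel list of sets (hypothetical expert-labelled similar CWEs).
    """
    hits = sum(
        1
        for ranked, rel in zip(rankings, relevant)
        if any(item in rel for item in ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked, rel):
    """Average of precision@rank over the ranks where a relevant item occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in rel:
            hits += 1
            precision_sum += hits / rank
    # Assumed convention: normalise by the number of relevant items.
    return precision_sum / len(rel) if rel else 0.0


def mean_average_precision(rankings, relevant):
    """Mean of the per-query average precision."""
    scores = [average_precision(r, s) for r, s in zip(rankings, relevant)]
    return sum(scores) / len(scores)


# Hypothetical data: for each query CWE, a ranked list of predicted similar
# CWEs and the set of CWEs considered similar by experts.
rankings = [
    ["CWE-121", "CWE-122", "CWE-190"],
    ["CWE-78", "CWE-89", "CWE-134"],
]
relevant = [
    {"CWE-122"},
    {"CWE-78", "CWE-134"},
]

top1 = top_k_hit_rate(rankings, relevant, k=1)
top2 = top_k_hit_rate(rankings, relevant, k=2)
map_score = mean_average_precision(rankings, relevant)
print(top1, top2, round(map_score, 4))
```

With this toy data, only the second query has a relevant CWE at rank 1, while both have one within the top 2, so the hit rate rises with k; the mean average precision additionally rewards placing relevant items earlier in the ranking.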
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Li, Ding |
Institution: | Concordia University |
Degree Name: | M.A.Sc. |
Program: | Electrical and Computer Engineering |
Date: | 27 September 2023 |
Thesis Supervisor(s): | Liu, Yan |
ID Code: | 993035 |
Deposited By: | Ding Li |
Deposited On: | 05 Jun 2024 15:19 |
Last Modified: | 05 Jun 2024 15:19 |