Shabanfar, Mohammad Pasha (2025) Class Imbalance and Time-To-Detection in the Performance Analysis of Machine Learning-Based Intrusion Detection Systems. Masters thesis, Concordia University.
Text (application/pdf), 1MB: Shabanfar_MASc_F2025.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The increasing reliance on Industrial Control Systems (ICS) and Supervisory Control and Data Acquisition (SCADA) systems has raised critical concerns regarding their vulnerability to cyberattacks. While machine learning (ML) methods have emerged as effective tools for detecting such intrusions, their real-world applicability is challenged by two major issues: the class imbalance in cybersecurity datasets and the limited attention paid to the time required to detect attacks, referred to as Time-To-Detection (TTD). This thesis examines the corresponding gaps in the literature and, through one case study per issue, suggests practices toward a more precise and practical performance assessment of ML-based intrusion detection systems (IDS).
First, the thesis examines the class imbalance problem prevalent in popular Information Technology (IT) and Operational Technology (OT) cybersecurity datasets, where normal traffic often vastly outnumbers attack instances. This imbalance leads to biased model performance and inflated accuracy scores, which can overstate or understate a model’s ability to identify the minority classes correctly. Through a case study with several ML models on a realistic dataset, we demonstrate how class imbalance should be accounted for in performance evaluation, and how imbalanced-learning techniques such as resampling should be applied to obtain robust performance from ML-based IDS.
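The accuracy inflation described above can be illustrated with a small sketch. The class counts and the always-normal "detector" below are hypothetical, chosen only for illustration; they are not taken from the thesis or its dataset.

```python
# Hypothetical class counts (assumption, not from the thesis):
# 9,900 normal samples and 100 attack samples.
n_normal, n_attack = 9900, 100

y_true = [0] * n_normal + [1] * n_attack   # 0 = normal, 1 = attack
y_pred = [0] * len(y_true)                 # degenerate model: always predicts "normal"


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
attack_recall = tp / (tp + fn) if (tp + fn) else 0.0
normal_recall = tn / (tn + fp) if (tn + fp) else 0.0
balanced_accuracy = (attack_recall + normal_recall) / 2

print(f"accuracy={accuracy:.2f}")
print(f"attack recall={attack_recall:.2f}")
print(f"balanced accuracy={balanced_accuracy:.2f}")
```

In this sketch the model never detects an attack, yet plain accuracy is high because the majority class dominates the score; per-class recall and balanced accuracy expose the failure.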
Second, this thesis examines TTD, a crucial but often underemphasized performance indicator that measures how promptly an ML-based detection system identifies the onset of an attack. To complement the traditional metrics used in ML communities, which focus solely on classification performance, this thesis proposes a TTD model based on real-world responsiveness in OT systems, defining the stages of the detection process so that ML researchers can quantify and measure the temporal overhead of each. We also demonstrate how the proposed TTD model can be applied to OT datasets through a case study, thereby suggesting it as a best practice for a more comprehensive evaluation of ML-based intrusion detectors in practical OT use cases.
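As a minimal sketch of the basic idea only, the delay between attack onset and the first alert can be computed from timestamped labels and predictions. This is not the thesis's staged TTD model, whose stages are not reproduced here; the function name `time_to_detection` and the sample data are hypothetical.

```python
def time_to_detection(timestamps, y_true, y_pred):
    """Delay between the first attack sample and the first alert at or after it.

    Returns None if there is no attack in y_true or the attack is never flagged.
    Assumes timestamps are sorted in ascending order and 1 marks attack/alert.
    """
    onset = next((t for t, y in zip(timestamps, y_true) if y == 1), None)
    if onset is None:
        return None
    for t, p in zip(timestamps, y_pred):
        if t >= onset and p == 1:
            return t - onset
    return None


# Hypothetical trace: the attack starts at t=2.0 s; the alert at t=1.0 s is a
# false alarm raised before the onset, so it does not count as a detection.
timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

ttd = time_to_detection(timestamps, y_true, y_pred)
print(f"TTD = {ttd} s")
```

This sketch captures only the delay visible in the labeled samples; a fuller assessment would also account for the processing overheads of the detection pipeline itself.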
Through the two studies above, the thesis offers a more comprehensive and practical approach to evaluating ML-based IDS. It demonstrates that thoughtful consideration and integration of class imbalance and detection timeliness in the development and assessment of ML-based IDS are essential to deploying trustworthy and efficient cybersecurity solutions in critical infrastructure systems.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Shabanfar, Mohammad Pasha |
| Institution: | Concordia University |
| Degree Name: | M.A. Sc. |
| Program: | Information Systems Security |
| Date: | 17 August 2025 |
| Thesis Supervisor(s): | Yan, Jun and Ghafouri, Mohsen |
| ID Code: | 995937 |
| Deposited By: | Mohammad Pasha Shabanfar |
| Deposited On: | 04 Nov 2025 16:55 |
| Last Modified: | 04 Nov 2025 16:55 |