Jamali, Abdul Fareed (2025) A Framework for Scalable Dataset Generation and Deep Learning-Based IoT Device Identification: Redefining the Future Paradigm. Masters thesis, Concordia University.
Text (application/pdf)
Jamali_MASc_S2026.pdf (20MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The rapid evolution of automation, ubiquitous connectivity, and remote management capabilities has driven a significant increase in the deployment of Internet of Things (IoT) devices across sectors such as industry, agriculture, transportation, smart cities, smart homes, and healthcare. Despite their growing prevalence, many IoT devices possess limited computational capabilities and rely on lightweight or weak cryptographic mechanisms, leaving them highly vulnerable to a wide range of cyberattacks. As a result, reliable device identification has become a fundamental requirement for enforcing appropriate security policies and strengthening the cybersecurity posture of modern IoT ecosystems.
This thesis addresses the problem of accurate and scalable device identification in smart home IoT environments. While prior research has predominantly explored traditional machine learning techniques and comparative model evaluations, the present study proposes a deep learning–based approach tailored specifically for the heterogeneous and rapidly expanding nature of smart home devices. The research objectives include designing a high-performance identification model, developing a realistic and balanced dataset, and enabling incremental learning to support newly added devices without full model retraining.
A detailed examination of widely adopted public datasets reveals key limitations, notably complex network traffic capture setups, inconsistent annotation practices, and significant class imbalance, all of which hinder reliable model training and evaluation. To overcome these challenges and to support collaboration between the research community and smart home users, two traffic-capture frameworks, ScanIoT and DroidScour, were developed. These frameworks simplify device-specific traffic collection and annotation, and enable the generation of balanced, scalable datasets suitable for both device identification and behavioral analysis.
Recognizing the limitations of existing datasets, especially their laboratory-controlled environments, limited human interaction, and lack of diversity in network conditions, this thesis introduces the Concordia University IoT Device Identification Dataset (CU2025). CU2025 is a well-balanced, high-quality dataset that includes a few instances of identical devices operating under real human interaction and varied network environments, making it suitable not only for device identification but also for detailed device behavior analysis.
To perform device identification, a Deep Feedforward Neural Network (DFNN) model is proposed and evaluated on two distinct datasets. The model demonstrates exceptional performance, achieving 99.9% accuracy for both device category and device type classification. To address the continuous integration of new devices in smart homes, an incremental learning strategy is introduced, enabling the model to incorporate additional device types with only marginal performance degradation. An analysis of incremental update sizes further illustrates their effect on accuracy and model stability.
To evaluate practical generalizability, the model’s performance is assessed using both conventional train–validation–test splits and a Leave-One-Subject-Out Cross-Validation (LOSO-CV), the latter simulating real-world scenarios involving unseen device instances. The LOSO-CV evaluation yields an accuracy of 98.40%, demonstrating strong robustness and real-world applicability.
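The splitting scheme behind LOSO-CV can be illustrated with a minimal sketch. It assumes that each traffic sample is tagged with the physical device instance it came from and that this instance plays the role of the "subject"; the abstract does not state how the thesis implements the split. The function name `leave_one_group_out` and the toy instance labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct group label.

    All samples of the held-out group form the test fold; every other
    sample goes to the training fold, so the model never sees the
    held-out device instance during training.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out, test_idx in by_group.items():
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: one label per traffic sample, naming the
# physical device instance that produced it (not taken from CU2025).
instances = ["plug_A", "plug_A", "plug_B", "cam_A", "cam_B", "cam_B"]

folds = list(leave_one_group_out(instances))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In a full evaluation, a model would be trained on `train_idx` and scored on `test_idx` for each fold, with the per-fold accuracies then averaged.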
Overall, this thesis contributes a comprehensive and scalable approach to smart home IoT device identification, introducing a new dataset, traffic-capture frameworks, and a high-performing deep learning model. The findings offer substantial advancements toward practical and adaptive IoT environments capable of addressing the growing challenges of device heterogeneity and cybersecurity.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Jamali, Abdul Fareed |
| Institution: | Concordia University |
| Degree Name: | M.A.Sc. |
| Program: | Information Systems Security |
| Date: | 16 December 2025 |
| Thesis Supervisor(s): | Schiffauerova, Andrea |
| ID Code: | 996617 |
| Deposited By: | Abdul Fareed Jamali |
| Deposited On: | 29 Jun 2026 14:44 |
| Last Modified: | 29 Jun 2026 14:44 |