Data Labeling for Fault Detection in Cloud: A Test Suite-Based Active Learning Approach

Bagora, Prateek (2023) Data Labeling for Fault Detection in Cloud: A Test Suite-Based Active Learning Approach. Masters thesis, Concordia University.

Text (application/pdf)
Bagora_MCS_F2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.
2MB

Abstract

Cloud computing enables ubiquitous, on-demand network access to a shared pool of configurable computing resources with minimal management effort from the user. It has become a key computing paradigm for a wide variety of applications, such as e-commerce, social networks, high-performance computing, mission-critical applications, and the Internet of Things (IoT). Ensuring the quality of service of applications deployed in inherently complex and fault-prone cloud environments is of utmost concern to service providers and end users. Machine learning-based fault management solutions enable proactive identification and mitigation of faults in cloud environments to attain the desired reliability, but they require labeled cloud metrics data for training and evaluation. Moreover, the highly dynamic nature of cloud environments gives rise to new data distributions, so cloud metrics data must be relabeled frequently for models to adapt. In this thesis, we study the problem of data labeling for fault detection in cloud environments, paying close attention to evolving cloud metric data distributions. More specifically, we propose a test suite-based active learning framework that automatically labels cloud metrics data with the corresponding cloud system state while accounting for emerging fault patterns and data or concept drift. We implemented our solution on a cloud testbed and introduced various emerging data distribution scenarios to evaluate the framework's labeling efficacy over known and emerging data distributions. According to our evaluation results, the proposed framework achieves about 41% higher weighted F1-score and 34% higher average one-vs-rest area under the receiver operating characteristic curve (OvR ROC AUC) than a system without any adaptation for emerging data distributions.
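
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of their standard definitions in plain Python. It is not taken from the thesis, which does not state on this page how the scores were computed. The function names, the class labels ("normal", "cpu_fault", "mem_fault"), and the sample scores are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def ovr_roc_auc(y_true, scores, labels):
    """Unweighted average of one-vs-rest ROC AUC over all classes.

    scores[i][k] is the model's score for labels[k] on sample i.
    Each binary AUC is the fraction of (positive, negative) pairs in which
    the positive sample receives the higher score; ties count as one half.
    Every class must have at least one positive and one negative sample.
    """
    aucs = []
    for k, label in enumerate(labels):
        pos = [s[k] for t, s in zip(y_true, scores) if t == label]
        neg = [s[k] for t, s in zip(y_true, scores) if t != label]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical system states and predictions, for illustration only.
    labels = ["normal", "cpu_fault", "mem_fault"]
    y_true = ["normal", "normal", "cpu_fault", "mem_fault"]
    y_pred = ["normal", "cpu_fault", "cpu_fault", "mem_fault"]
    scores = [
        [0.8, 0.1, 0.1],
        [0.4, 0.5, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
    ]
    print("weighted F1:", weighted_f1(y_true, y_pred))
    print("OvR ROC AUC:", ovr_roc_auc(y_true, scores, labels))
```

Weighting F1 by class support keeps rare fault classes from being averaged on equal footing with the dominant normal state, while the OvR ROC AUC depends only on how the scores rank samples, not on the final predicted label. That is why the two figures can differ for the same model.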

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Bagora, Prateek
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 24 May 2023
Thesis Supervisor(s): Glitho, Roch
Keywords: Automated Data Labeling, Fault Detection, Cloud Computing, Active Learning, Deep Learning
ID Code: 992319
Deposited By: Prateek Bagora
Deposited On: 14 Nov 2023 19:52
Last Modified: 14 Nov 2023 19:52
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
