Machine Learning for Fault Prediction in Clouds

Shayesteh, Behshid (2024) Machine Learning for Fault Prediction in Clouds. PhD thesis, Concordia University.

Text (application/pdf)
Shayesteh_PhD_F2024.pdf - Accepted Version
Available under License Spectrum Terms of Access.

4MB

Abstract

The widespread adoption of cloud computing has increased the size and complexity of data centers, raising the possibility of faults. Faults can negatively impact the performance, availability, and reliability of cloud services, leading to significant maintenance costs and revenue loss for cloud service providers. Therefore, fault prediction in clouds is a critical task. Machine Learning (ML) is increasingly used for this purpose due to its pattern recognition capabilities. While predicting faults in clouds using ML enables a proactive approach to preventing faults, building accurate prediction models that can maintain their performance in dynamic clouds is challenging. One problem is concept drift, where changes in data distribution can degrade model performance. Similarly, feature drift, a change in feature relevance, can also degrade model performance. Additionally, model accuracy is influenced by data-related parameters, necessitating careful selection of these parameters to achieve high model performance. Existing ML-based fault prediction solutions do not focus on adaptability to dynamic conditions such as concept or feature drift. Additionally, selecting data-related parameters to balance model performance and resource consumption is not addressed in the current literature.

This thesis focuses mainly on the challenges of employing ML models to predict faults, and to predict application performance degradation caused by faults, in cloud environments. We first propose a concept drift adaptation algorithm for fault prediction in clouds using Reinforcement Learning (RL). This algorithm considers the cloud operator's requirements and uses RL to select the most appropriate drift adaptation method, as well as the data size for adaptation, that fulfill these requirements. Second, we propose a feature drift adaptation solution that adapts the model to feature drift while predicting application performance degradation in clouds. This solution consists of a feature drift detector, which monitors both the performance of the prediction model and the feature importance, and a feature drift adaptor, which measures the drift severity in order to adapt the prediction model. Finally, we propose a multi-objective optimization algorithm to select the training data size, data sampling interval, input window, and prediction horizon for training an ML model that predicts application performance degradation in clouds.
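
To make the data-related parameters named above more concrete, the following is a minimal illustrative sketch of how a sampling interval, an input window, and a prediction horizon could shape the training samples built from a monitored metric. It is not taken from the thesis: the function name `make_windows`, its signature, and the choice of a single scalar target per sample are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    sampling_interval: int,
    input_window: int,
    prediction_horizon: int,
) -> List[Tuple[List[float], float]]:
    """Build (input, target) pairs from a metric series (hypothetical helper).

    sampling_interval: keep every k-th raw measurement.
    input_window: number of sampled points given to the model as input.
    prediction_horizon: how many sampled steps after the window the target lies.
    """
    if min(sampling_interval, input_window, prediction_horizon) < 1:
        raise ValueError("all parameters must be positive integers")

    # A coarser sampling interval reduces the amount of data to store and process.
    sampled = list(series[::sampling_interval])

    pairs: List[Tuple[List[float], float]] = []
    last_start = len(sampled) - input_window - prediction_horizon
    for start in range(last_start + 1):
        end = start + input_window
        # The target is the value prediction_horizon steps after the window ends.
        target = sampled[end + prediction_horizon - 1]
        pairs.append((sampled[start:end], target))
    return pairs


if __name__ == "__main__":
    # 20 raw measurements, keep every 2nd, use 3 points to predict 2 steps ahead.
    for inputs, target in make_windows(list(range(20)), 2, 3, 2):
        print(inputs, "->", target)
```

In this sketch, the training data size would correspond to how many of the resulting pairs are kept for training (again an assumption). Larger windows and finer sampling give the model more information but cost more to collect and train on, which is the kind of trade-off a multi-objective selection has to balance.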

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:	Thesis (PhD)
Authors:	Shayesteh, Behshid
Institution:	Concordia University
Degree Name:	Ph. D.
Program:	Information and Systems Engineering
Date:	30 May 2024
Thesis Supervisor(s):	Glitho, Roch
ID Code:	994133
Deposited By:	Behshid Shayesteh
Deposited On:	24 Oct 2024 18:00
Last Modified:	24 Oct 2024 18:00

Spectrum Research Repository
