Automation And Intelligence In IT Operation Management: Machine Learning for Capacity Planning and Load Testing Optimization

Vitui, Arthur Marius ORCID: https://orcid.org/0000-0002-9669-4603 (2025) Automation And Intelligence In IT Operation Management: Machine Learning for Capacity Planning and Load Testing Optimization. PhD thesis, Concordia University.

Text (application/pdf)
Vitui_PhD_S2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.
2MB

Abstract

The increasing complexity and scale of modern IT infrastructures necessitate innovative strategies to maintain efficiency, reliability, and cost-effectiveness. Large-scale industrial systems require precise capacity planning to manage fluctuating demands, prevent downtime, and operate within optimal cost parameters. However, traditional capacity planning methods often fall short in today’s dynamic environments. This dissertation introduces an agentic approach to AIOps (Artificial Intelligence for IT Operations) aimed at enhancing the maintenance and operational stability of large-scale systems.

Effective capacity planning is essential for stable system operations. Over-provisioning leads to resource waste, while under-provisioning can cause failures and diminished performance. By utilizing load testing data and advanced machine learning (ML) models, we propose a blueprint process that optimizes system capacity planning. Integrating ML into this process enhances predictive capabilities, enabling proactive resource scaling, reducing costs, and increasing system resilience.

A significant challenge in optimizing this process is the inefficiency and time-consuming nature of traditional load testing. Existing methodologies often require substantial manual effort and considerable time to simulate large-scale workloads. To address this, we propose a framework that streamlines load testing through automation and early stopping rules based on spike detection techniques for system Key Performance Indicators (KPIs). By leveraging a system’s ability to predict KPI spikes, we can dynamically adjust capacity as needed.

We aim to integrate these processes into tools utilized by LLM (Large Language Model) agents within an AIOps system. These tools will act as intermediaries for monitoring and maintaining large-scale systems. This integration will establish a fully managed architecture, where AIOps agents enhance the IT operations team’s ability to perform proactive maintenance, respond to new incidents, autonomously monitor system health, predict potential issues, and implement proactive measures to maintain optimal performance.

This dissertation presents a novel approach to enhancing efficiency in large-scale systems by combining automation and load testing improvements with machine learning and LLM agents. By developing a comprehensive, scalable framework, this research seeks to reduce operational overhead and establish a new standard for IT system management and load testing practices within the Software Development Life Cycle (SDLC) in industrial settings.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Vitui, Arthur Marius
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science
Date: 27 February 2025
Thesis Supervisor(s): Chen, Tse-Hsun
ID Code: 995480
Deposited By: Arthur Marius Vitui
Deposited On: 17 Jun 2025 14:57
Last Modified: 17 Jun 2025 14:57
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
