
An Industrial Study on Predicting Crash Report Log Types Using Large Language Models

Heba, Aburish (2023) An Industrial Study on Predicting Crash Report Log Types Using Large Language Models. Masters thesis, Concordia University.

Aburish_MA_F2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.
946kB

Abstract

Software crashes and failures take a fair amount of effort and time to resolve. Software developers
use information submitted in crash reports (CRs) to conduct root cause analysis of faults. The
problem is that CRs often omit some of the required information. Automatic prediction of CR fields can
therefore shorten the crash resolution process. In this thesis, we use CR headings and
descriptions to predict the type of log files that should be attached to a CR. Our approach is to use
multilabel learning algorithms to train a machine learning model on a dataset from Ericsson’s
CR database. We use three different pre-trained language models, BERT, Telecom BERT, and
Word2Vec, to extract feature vectors from CR headings and descriptions, and then feed these
vectors to three different multilabel
learning algorithms, namely Binary Relevance (BR), Classifier Chain (CC), and Neural Network
(NN). Then, we compare the performance of different feature sets. We found that using
headings alone with the pre-trained language models BERT and Telecom BERT results in the best average
AUC (0.70). Using descriptions alone, or headings and descriptions together, as features resulted in
an average AUC ranging from 0.65 to 0.70. In general, the algorithms showed no significant
difference in performance, but the choice of features does affect performance. Also, the
performance of predicting each type of log is influenced by the use of keywords in headings and
descriptions that describe these files. We found that log types with a clear definition such as Key
Performance Indicators (KPI) Logs, Post-mortem Dumps (PMD), and execution traces can be
predicted with higher accuracy.
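
The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the thesis implementation: the feature vectors are synthetic stand-ins for the language-model embeddings, the four labels are hypothetical log types, and logistic regression is an assumed base classifier. Binary Relevance is approximated with `OneVsRestClassifier`, which fits one independent classifier per label, and the average AUC is taken as the macro average over labels.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain

# Assumption: synthetic data stands in for the CR embeddings (X) and for the
# multilabel log-type indicators (Y). The real Ericsson dataset is not public.
X, Y = make_multilabel_classification(
    n_samples=600, n_features=32, n_classes=4, n_labels=2,
    allow_unlabeled=False, random_state=0,
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)


def average_auc(model):
    """Fit a multilabel model and return the macro-averaged AUC on the test split."""
    model.fit(X_train, Y_train)
    scores = model.predict_proba(X_test)  # shape: (n_samples, n_labels)
    return roc_auc_score(Y_test, scores, average="macro")


# Binary Relevance: one independent binary classifier per label.
br = OneVsRestClassifier(LogisticRegression(max_iter=1000))

# Classifier Chain: each classifier also sees the labels earlier in the chain.
cc = ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=0)

results = {"BR": average_auc(br), "CC": average_auc(cc)}
for name, auc in results.items():
    print(f"{name}: average AUC = {auc:.2f}")
```

To move this closer to the setup in the abstract, `X` would be replaced with vectors extracted from CR headings or descriptions, and `Y` with one indicator column per log type.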

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Heba, Aburish
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Electrical and Computer Engineering
Date: 25 August 2023
Thesis Supervisor(s): Abdelwahab, Hamou-Lhadj
ID Code: 992691
Deposited By: Heba Abu-Rish
Deposited On: 15 Nov 2023 15:20
Last Modified: 15 Nov 2023 15:20
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
