Using ChatGPT to Augment Software Engineering Chatbots Datasets

Badran, Khaled (2023) Using ChatGPT to Augment Software Engineering Chatbots Datasets. Masters thesis, Concordia University.

Text (application/pdf)
Badran_MASc_S2024.pdf - Accepted Version
Available under License Spectrum Terms of Access.
852kB

Abstract

Chatbots are envisioned to bring about a significant shift in Software Engineering (SE), enabling practitioners to converse and interact with various services using natural language. At the heart of each chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to comprehend the user's queries. However, the NLU requires extensive, high-quality training data (examples) to interpret user queries accurately. Prior work shows that creating and augmenting SE datasets is resource-intensive and time-consuming. To address this challenge, we explore the potential of using ChatGPT to augment SE chatbot training datasets. Specifically, using four widely used SE datasets, we evaluate how retraining the NLU on the ChatGPT-augmented dataset affects the NLU's performance. Moreover, we assess the syntactic and semantic aspects of the generated examples compared to human-written examples. Additionally, we conduct an ablation study to investigate the impact of each component of the prompt on the NLU's performance and the diversity of the generated examples. The results show that ChatGPT significantly improves the NLU's performance, with F1-score improvements ranging from 3.9% to 11.6%. Moreover, we find that ChatGPT-generated examples exhibit syntactic diversity while maintaining consistent semantics (2.2% on average) across all datasets. Additionally, the results indicate that including a few human-written examples and a description of the intent's objective in the prompt impacts the quality of the generated examples. Finally, we provide implications for practitioners and researchers of SE chatbots.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Badran, Khaled
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Software Engineering
Date: 1 December 2023
Thesis Supervisor(s): Shihab, Emad
ID Code: 993248
Deposited By: KHALED BADRAN
Deposited On: 05 Jun 2024 16:57
Last Modified: 05 Jun 2024 16:57
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
