da S. Maldonado, Everton (2016) Identifying Self-Admitted Technical Debt. Masters thesis, Concordia University.
Text (application/pdf), 445kB: Maldonado_MASc_F2016.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Technical debt is a metaphor coined to express the trade-off between productivity and quality, e.g., when developers take shortcuts or perform quick hacks during the development of software projects. These non-optimal solutions are often implemented to allow the project to move faster in the short term, at the cost of increased maintenance in the future. The accumulation of technical debt during the ever-changing life cycle of a project is unavoidable and, if not properly managed, can severely hinder the development of the project. To help alleviate the impact of technical debt, a number of studies have focused on its detection. A recent study has shown that source code comments are one possible source for detecting technical debt; debt recorded this way is referred to as self-admitted technical debt. Therefore, in this dissertation we use empirical studies and NLP techniques to propose an approach to automatically identify self-admitted technical debt.
First, we examine source code comments to determine the different types of technical debt, and we propose four simple filtering heuristics to eliminate comments that are not likely to contain technical debt. Then, we read through more than 33K comments and find that self-admitted technical debt can be classified into five main types: design debt, defect debt, documentation debt, requirement debt, and test debt. In addition, the two most common types of self-admitted technical debt are design and requirement debt, making up 42% to 84% and 5% to 45% of the classified comments, respectively.
Second, we leverage the knowledge obtained in our first study to present an approach to automatically identify design and requirement self-admitted technical debt using Natural Language Processing (NLP). We study 10 open source projects (Ant, ArgoUML, Columba, EMF, Hibernate, JEdit, JFreeChart, JMeter, JRuby, and SQuirrel SQL) and find that 1) we are able to effectively identify self-admitted technical debt, significantly outperforming state-of-the-art techniques; 2) words related to sloppy or mediocre source code are the best indicators of design debt, whereas words related to enhancing or completing tasks are the best indicators of requirement debt; and 3) we can achieve 90% of the best classification performance using as little as 23% of the comments for both design and requirement self-admitted technical debt, and 80% of the best performance using as little as 9% and 5% of the comments for design and requirement self-admitted technical debt, respectively.
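As a rough illustration of how word-level evidence in comments can separate design debt, requirement debt, and ordinary comments, the sketch below trains a tiny multinomial Naive Bayes classifier using only the Python standard library. This is an assumption made for illustration, not the classifier used in the thesis (the abstract does not name one); the class `NaiveBayesCommentClassifier`, the label names, and the nine training comments are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())


class NaiveBayesCommentClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing.

    Illustrative stand-in only; not the approach described in the thesis.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of comments
        self.vocabulary = set()

    def fit(self, comments, labels):
        for comment, label in zip(comments, labels):
            self.label_counts[label] += 1
            for word in tokenize(comment):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, comment):
        total_comments = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(comment) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            label_total = sum(self.word_counts[label].values())
            score = math.log(count / total_comments)  # log prior
            for word in words:
                smoothed = self.word_counts[label][word] + 1
                score += math.log(smoothed / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training comments, invented purely for this sketch.
training = [
    ("this is a hack, clean up later", "design"),
    ("ugly workaround, needs refactoring", "design"),
    ("hack: this code is a mess", "design"),
    ("TODO: not implemented yet", "requirement"),
    ("TODO: add support for remaining formats", "requirement"),
    ("feature not yet complete, implement later", "requirement"),
    ("returns the number of rows", "none"),
    ("constructor for the parser", "none"),
    ("iterate over all entries in the list", "none"),
]
classifier = NaiveBayesCommentClassifier().fit(
    [text for text, _ in training], [label for _, label in training]
)
print(classifier.predict("ugly hack, refactor this mess"))
```

With this toy data, words such as "hack" and "ugly" pull a comment toward the design label, while "todo" and "implement" pull it toward the requirement label, which mirrors the kind of indicator words the abstract reports. A real study would need far more labelled comments and a proper evaluation.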
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | da S. Maldonado, Everton |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Computer Science and Software Engineering |
Date: | 15 September 2016 |
Thesis Supervisor(s): | Shihab, Emad |
Keywords: | Technical debt, Source code comments, Natural Language Processing, Empirical Study |
ID Code: | 981870 |
Deposited By: | Everton da Silva Maldonado |
Deposited On: | 08 Nov 2016 16:12 |
Last Modified: | 18 Jan 2018 17:54 |