Characterizing and Predicting Blocking Bugs in Open Source Projects

Valdivia-Garcia, Harold, Shihab, Emad and Nagappan, Mei (2018) Characterizing and Predicting Blocking Bugs in Open Source Projects. Journal of Systems and Software. ISSN 01641212 (In Press)

Text (application/pdf)
Shihab-2018.pdf - Accepted Version
Restricted to Repository staff only until 6 March 2020.
Available under License Spectrum Terms of Access.
837kB

Official URL: http://dx.doi.org/10.1016/j.jss.2018.03.053

Abstract

Software engineering researchers have studied specific types of issues such as reopened bugs, performance bugs, dormant bugs, etc. However, one special type of severe bug is the blocking bug. Blocking bugs are software bugs that prevent other bugs from being fixed. These bugs may increase maintenance costs, reduce overall quality and delay the release of the software systems. In this paper, we study blocking bugs in eight open source projects and propose a model to predict them early on. We extract 14 different factors (from the bug repositories) that are made available within 24 hours after the initial submission of the bug reports. Then, we build decision trees to predict whether a bug will be a blocking bug or not. Our results show that our prediction models achieve F-measures of 21%-54%, which is a two-fold improvement over the baseline predictors. We also analyze the fixes of these blocking bugs to understand their negative impact. We find that fixing blocking bugs requires more lines of code to be touched compared to non-blocking bugs. In addition, our file-level analysis shows that files affected by blocking bugs are more negatively impacted in terms of cohesion, coupling, complexity and size than files affected by non-blocking bugs.
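
The abstract reports results as F-measures without defining the term. As a reading aid, the sketch below computes the F-measure under its standard definition (the harmonic mean of precision and recall for the positive class, here "blocking"). The function name `f_measure` and the toy labels are hypothetical illustrations and are not taken from the paper or its data.

```python
def f_measure(true_labels, predicted_labels):
    """Return (precision, recall, F-measure) for the positive class (True = blocking)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # blocking, predicted blocking
    fp = sum(1 for t, p in pairs if not t and p)    # non-blocking, predicted blocking
    fn = sum(1 for t, p in pairs if t and not p)    # blocking, predicted non-blocking
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for eight bug reports (not data from the study).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

precision, recall, f1 = f_measure(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F-measure={f1:.2f}")
```

Because blocking bugs are a minority class, a score of this kind says more than plain accuracy: a predictor that labels every bug as non-blocking would be mostly correct yet have an F-measure of zero.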

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Article
Refereed: Yes
Authors: Valdivia-Garcia, Harold and Shihab, Emad and Nagappan, Mei
Journal or Publication: Journal of Systems and Software
Date: 6 April 2018
Digital Object Identifier (DOI): 10.1016/j.jss.2018.03.053
Keywords: Process Metrics; Code Metrics; Post-release Defects
ID Code: 983707
Deposited By: MICHAEL BIRON
Deposited On: 10 Apr 2018 19:39
Last Modified: 10 Apr 2018 19:39

All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
