Ali, Nasir, Cai, Haipeng, Hamou-Lhadj, Abdelwahab and Hassine, Jameleddine (2018) Exploiting Parts-of-Speech for Effective Automated Requirements Traceability. Information and Software Technology. ISSN 0950-5849 (In Press)
Official URL: http://dx.doi.org/10.1016/j.infsof.2018.09.009
Abstract
Context: Requirements traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirements with respect to a set of traceability links called trace links. Previous work leverages Parts of Speech (POS) tagging of software artifacts to recover trace links among them. These studies work on the premise that discarding one or more POS tags improves the accuracy of Information Retrieval (IR) techniques.

Objective: First, we show empirically that excluding one or more POS tags could negatively impact the accuracy of existing IR-based traceability approaches, namely the Vector Space Model (VSM) and the Jensen-Shannon Model (JSM). Second, we propose a method that improves the accuracy of IR-based traceability approaches.

Method: We developed an approach, called ConPOS, to recover trace links using constraint-based pruning. ConPOS uses major POS categories and applies constraints to the recovered trace links, pruning them as a filtering step to significantly improve the effectiveness of IR-based techniques. We conducted an experiment to provide evidence that removing POS tags does not improve the accuracy of IR techniques. Furthermore, we conducted two empirical studies to evaluate the effectiveness of ConPOS in recovering trace links compared to existing peer RT approaches.

Results: The results of the first empirical study show that removing one or more POS tags negatively impacts the accuracy of VSM and JSM. Furthermore, the results from the other empirical studies show that ConPOS provides 11%-107%, 8%-64%, and 15%-170% higher precision, recall, and mean average precision (MAP), respectively, than VSM and JSM.

Conclusion: We showed that ConPOS outperforms existing IR-based RT approaches that discard some POS tags from the input documents.
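
The abstract reports results as precision, recall, and mean average precision (MAP). As a rough illustration of how such figures are commonly computed for recovered trace links, here is a minimal Python sketch. It is not the authors' evaluation code: the function names, the link representation (requirement ID paired with artifact ID), and the choice to divide average precision by the number of relevant artifacts are all assumptions, and the paper may define these metrics slightly differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a trace link pairs a requirement ID with an artifact ID.
Link = Tuple[str, str]


def precision_recall(retrieved: List[Link], relevant: Set[Link]) -> Tuple[float, float]:
    """Precision and recall of recovered links against an oracle of true links."""
    hits = sum(1 for link in retrieved if link in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list for a single requirement.

    Assumption: the sum of precision-at-hit values is divided by the number
    of relevant artifacts, so relevant artifacts that are never retrieved
    lower the score.
    """
    hits = 0
    total = 0.0
    for rank, artifact in enumerate(ranked, start=1):
        if artifact in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    rankings: Dict[str, List[str]], oracle: Dict[str, Set[str]]
) -> float:
    """Mean of the per-requirement average precision values."""
    scores = [average_precision(rankings[req], oracle.get(req, set())) for req in rankings]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, given the made-up rankings `{"R1": ["A.java", "B.java", "C.java"], "R2": ["C.java", "A.java"]}` and the oracle `{"R1": {"A.java", "C.java"}, "R2": {"A.java"}}`, the per-requirement average precision values are 5/6 and 1/2, so MAP is 2/3.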
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Article |
Refereed: | Yes |
Authors: | Ali, Nasir and Cai, Haipeng and Hamou-Lhadj, Abdelwahab and Hassine, Jameleddine |
Journal or Publication: | Information and Software Technology |
Date: | 27 September 2018 |
Digital Object Identifier (DOI): | 10.1016/j.infsof.2018.09.009 |
Keywords: | Requirements Traceability (RT); Parts of Speech (POS); Information Retrieval (IR); Trace links |
ID Code: | 984566 |
Deposited By: | Monique Lane |
Deposited On: | 04 Oct 2018 18:32 |
Last Modified: | 27 Sep 2020 00:00 |