Source Code Similarity and Clone Search


Keivanloo, Iman (2013) Source Code Similarity and Clone Search. PhD thesis, Concordia University.

Text (application/pdf)
Keivanloo_PhD_F2013.pdf - Accepted Version


Historically, clone detection as a research discipline has focused on devising source code similarity measurement and search solutions to counter the effects of code reuse on software maintenance. However, it has also been observed that identifying duplications and similar programming patterns can be exploited for pragmatic reuse. Identifying such patterns requires a source code similarity model that detects Type-1, Type-2, and Type-3 clones. In the absence of such a model, ad hoc pattern detection models have been devised as part of state-of-the-art solutions that support pragmatic reuse via code search.
In this dissertation, we propose a clone search model that is based on clone detection principles and satisfies the fundamental requirements for supporting pragmatic reuse. The model not only supports scalability, short response times, and Type-1, Type-2, and Type-3 detection, but also treats ranking as a key functionality. It uses a multi-level (non-positional) indexing approach to achieve scalable, fast retrieval with high recall. Result sets are ranked using two approaches: the Jaccard similarity coefficient and cosine similarity (vector space model), the latter exploiting the code patterns’ local and global frequencies. We also extend the model by adapting a form of semantic search to cover bytecode. Finally, we demonstrate how the proposed clone search model can be applied to spot working code examples in the context of pragmatic reuse. Further evidence of the model's applicability is provided through performance evaluation.
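The two ranking measures named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes each code fragment has already been reduced to a list of pattern tokens (the token names below are hypothetical), treats a pattern's count within a fragment as its local frequency, and uses a smoothed inverse document frequency as the global term. The function names and the smoothing formula are assumptions made for illustration.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two pattern lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(fragments):
    """Weight each pattern by local frequency times a smoothed global idf.

    The smoothing (log((1 + n) / (1 + df)) + 1) is an assumption of this
    sketch, chosen so that no weight is zero or negative.
    """
    n = len(fragments)
    # Global frequency: number of fragments that contain each pattern.
    df = Counter(p for frag in fragments for p in set(frag))
    vectors = []
    for frag in fragments:
        tf = Counter(frag)  # local frequency within this fragment
        vectors.append(
            {p: tf[p] * (math.log((1 + n) / (1 + df[p])) + 1.0) for p in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query, corpus):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    names = list(corpus)
    # The query is weighted together with the corpus so idf values agree.
    vectors = tfidf_vectors([corpus[name] for name in names] + [query])
    query_vec = vectors[-1]
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical pattern tokens standing in for indexed code patterns.
corpus = {
    "a": ["open", "read", "close"],
    "b": ["open", "write", "close"],
    "c": ["connect", "send"],
}
query = ["open", "read", "close"]

ranking = rank_by_cosine(query, corpus)
```

In this toy corpus both measures place fragment "a" first, "b" second, and "c" last for the query. The measures can diverge on real data because Jaccard ignores how often a pattern occurs, while the vector space weighting favors patterns that are frequent in a fragment and rare across the corpus.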

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Keivanloo, Iman
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science
Date: 20 June 2013
Thesis Supervisor(s): Rilling, Juergen
ID Code: 977472
Deposited On: 13 Jan 2014 14:41
Last Modified: 18 Jan 2018 17:44
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
