Munkhjargal, Misheelt (2024) Bytecode Similarity Detection for Obfuscated Java Android Applications. Masters thesis, Concordia University.
Text (application/pdf), 789kB: Munkhjargal_MCompSc_F2024.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Code similarity detection has many practical applications, such as intellectual property protection, vulnerability search, and malware detection. However, existing approaches typically focus on source code, while many third-party libraries are released in bytecode format. Hence, developers may use third-party libraries without being aware of possible license violations or vulnerabilities. In this thesis, we introduce a deep learning approach, ByClone, to detect source code clones based on Java bytecode. We collect source-code-level clone data for bytecode in 140 Android applications to conduct the experiments. We find that ByClone is effective in detecting code clones based on bytecode, with a precision and recall of 78.37 and 75.24, respectively. After the bytecode is obfuscated, ByClone still achieves a precision and recall of 82.55 and 70.95, respectively. Finally, we find that ByClone is not sensitive to different obfuscation options. Our study highlights the potential of clone detection based on bytecode. We also release the data for future research in this direction.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Munkhjargal, Misheelt |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | 23 May 2024 |
Thesis Supervisor(s): | Chen, Tse-Hsun |
ID Code: | 993936 |
Deposited By: | Misheelt Munkhjargal |
Deposited On: | 24 Oct 2024 16:23 |
Last Modified: | 24 Oct 2024 16:23 |