
BinComp: A Stratified Approach to Compiler Provenance Attribution


Rahimian, Ashkan, Shirani, Paria, Alrabaee, Saed, Wang, Lingyu and Debbabi, Mourad (2015) BinComp: A Stratified Approach to Compiler Provenance Attribution. Digital Investigation, 14 (1). pp. 146-155. ISSN 1742-2876

BinComp A Stratified Approach to Compiler Provenance Attribution.pdf - Published Version
Available under License Spectrum Terms of Access.

Official URL: http://www.sciencedirect.com/science/article/pii/S...


Compiler provenance encompasses numerous pieces of information, such as the compiler family, compiler version, optimization level, and compiler-related functions. The extraction of such information is imperative for various binary analysis applications, such as function fingerprinting, clone detection, and authorship attribution. It is thus important to develop an efficient and automated approach for extracting compiler provenance. In this study, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance. BinComp has a stratified architecture with three layers. The first layer applies a supervised compilation process to a set of known programs to model the default code transformation of compilers. The second layer employs an intersection process that disassembles functions across compiled binaries to extract statistical features (e.g., numerical values) from common compiler/linker-inserted functions. This layer labels the compiler-related functions. The third layer extracts semantic features from the labeled compiler-related functions to identify the compiler version and the optimization level. Our experimental results demonstrate that BinComp is efficient in terms of both computational resources and time.
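The abstract describes the second layer's intersection process only at a high level. The following Python sketch is a rough illustration of that idea and rests on three assumptions:

- Each compiled binary has already been reduced to a set of function fingerprints, for example hashes of normalized disassembly.
- Fingerprints shared by every binary built with one toolchain are treated as candidate compiler/linker-inserted functions.
- An unknown binary is attributed to the toolchain whose candidate set it covers best.

The names `common_functions` and `attribute`, the toy fingerprints, and the coverage score are hypothetical. They are not BinComp's published statistical or semantic features, nor its matching method.

```python
from typing import Dict, List, Set

# Hypothetical input model: each compiled binary is reduced to a set of
# function fingerprints (e.g., hashes of normalized disassembly).
Fingerprints = Set[str]


def common_functions(binaries: List[Fingerprints]) -> Fingerprints:
    """Return fingerprints present in every binary built by one toolchain.

    Functions shared by otherwise unrelated programs are treated here as
    candidate compiler/linker-inserted functions (illustrative assumption).
    """
    if not binaries:
        return set()
    return set.intersection(*binaries)


def attribute(unknown: Fingerprints, signatures: Dict[str, Fingerprints]) -> str:
    """Pick the toolchain whose candidate functions best cover `unknown`.

    The coverage score (fraction of a toolchain's candidate set found in the
    unknown binary) is a hypothetical stand-in, not BinComp's matching method.
    """
    def coverage(toolchain: str) -> float:
        sig = signatures[toolchain]
        return len(sig & unknown) / len(sig) if sig else 0.0

    return max(signatures, key=coverage)


if __name__ == "__main__":
    # Toy "supervised compilation" corpus: two known programs per
    # hypothetical toolchain, each already reduced to fingerprints.
    corpus = {
        "toolchain_A": [{"a1", "a2", "p1"}, {"a1", "a2", "p2"}],
        "toolchain_B": [{"b1", "b2", "p1"}, {"b1", "b2", "p3"}],
    }
    signatures = {tc: common_functions(bins) for tc, bins in corpus.items()}
    print(signatures)
    print(attribute({"a1", "a2", "x9"}, signatures))
```

In this toy run, program-specific fingerprints such as `p2` and `p3` drop out of the intersection. The unknown binary `{"a1", "a2", "x9"}` is therefore attributed to `toolchain_A`. A real system would replace exact-match fingerprints with richer features, since identical compiler-inserted code can still differ in addresses and constants across builds.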

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science
Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Concordia University > Research Units > Computer Security Laboratory
Item Type:Article
Authors:Rahimian, Ashkan and Shirani, Paria and Alrabaee, Saed and Wang, Lingyu and Debbabi, Mourad
Journal or Publication:Digital Investigation
Date:August 2015
  • Software Fingerprinting
  • Authorship Analysis
  • Google
  • Computer Security Laboratory (CSL), ENCS, Concordia University
Digital Object Identifier (DOI):10.1016/j.diin.2015.05.015
Keywords:Compiler Provenance, Reverse Engineering, Binary Program Analysis, Digital Forensics, Program Analysis
ID Code:980325
Deposited On:31 Aug 2015 17:07
Last Modified:18 Jan 2018 17:51