Nouh, Lina (2017) BinSign: Fingerprinting Binary Functions to Support Automated Analysis of Code Executables. Masters thesis, Concordia University.
Preview |
Text (application/pdf)
5MBNouh_MASc_S2017.pdf - Accepted Version Available under License Spectrum Terms of Access. |
Abstract
Software reverse engineering is a complex process that incorporates different techniques involving static and dynamic analyses of software programs. Numerous tools are available that help reverse engineers in automating the dynamic analysis process. However, the process of static analysis remains a challenging and tedious process for reverse engineers. The static analysis process requires a great amount of manual work. Therefore, it is very demanding and time-consuming. One aspect of reverse engineering that provides reverse engineers with useful information regarding a statically analyzed piece of code is function fingerprinting. Binary code fingerprinting is a challenging problem that requires an in-depth analysis of internal binary code components for deriving identifiable and expressive signatures.
Binary code fingerprints are helpful in the reverse engineering process and have various security applications such as malware variant detection, malware clustering, binary auditing, function recognition, and library identification. Moreover, binary code fingerprinting is also useful in automating some reverse engineering tasks such as clone detection, library function identification, code similarity, authorship attribution, etc. In addition, code fingerprints are valuable in cyber forensics as well as the process of patch analysis in order to identify patches or make sure that the patch complies with the security requirements.
In this thesis, we propose a binary function fingerprinting and matching approach and implement a tool named BinSign based on the proposed approach that enhances and accelerates the reverse engineering process. The main objective of BinSign is to provide an accurate and scalable solution to binary code fingerprinting by computing and matching structural and syntactic code profiles for disassemblies while outperforming existing techniques. The structural profile of binary code is captured through decomposing the control-flow-graph of a function into tracelets. We describe the underlying methodology and evaluate its performance in several use cases, including function matching, function reuse, library function detection, malware analysis, and function indexing scalability. We also provide some insights into the effects of different optimization levels and obfuscation techniques on our fingerprint matching methodology. Additionally, we emphasize the scalability aspect of BinSign that is achieved through applying locality sensitive hashing, filtering techniques, and distributing the computations across several machines. The min-hashing process is combined with the banding technique of locality sensitive hashing in order to ensure a scalable and efficient fingerprint matching process. We perform our experiments on a database of 6 million functions that includes well-known libraries, malware samples, and some dynamic library files obtained from the Microsoft Windows operating system. The indexing process of fingerprints is distributed across multiple machines and it requires an average time of 0.0072 seconds per function. A comparison is also conducted with relevant existing tools, which shows that BinSign achieves a higher accuracy than these tools.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Nouh, Lina |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Information Systems Security |
Date: | January 2017 |
Thesis Supervisor(s): | Debbabi, Mourad and Hanna, Aiman |
ID Code: | 982206 |
Deposited By: | LINA ADNAN NOUH |
Deposited On: | 09 Jun 2017 14:35 |
Last Modified: | 18 Jan 2018 17:54 |
Repository Staff Only: item control page