
Binary Code Reuse Detection for Reverse Engineering and Malware Analysis



Huang, He (2015) Binary Code Reuse Detection for Reverse Engineering and Malware Analysis. Masters thesis, Concordia University.

Text (application/pdf)
Huang_MASc_S2016.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Code reuse detection is a key technique in reverse engineering. However, existing source code similarity comparison techniques are not applicable to binary code. Compilers make the problem harder still, because the same functionality can be compiled into different assembly code and control flow structures. To address this problem, we present a fuzzy matching approach for comparing two functions. We first obtain an initial mapping between basic blocks by applying the longest common subsequence (LCS) concept at both the basic block level and the execution path level. We then extend this mapping using neighborhood exploration. To make our approach applicable to large data sets, we designed an effective filtering process using MinHashing and locality-sensitive hashing.
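As an illustration of the longest common subsequence idea at the basic block level, the following is a minimal sketch, not the algorithm implemented in the thesis. It assumes that each basic block is a list of instruction strings and that instructions are compared by mnemonic only; the function names `normalize`, `lcs_length`, and `block_similarity` and the scoring formula are hypothetical choices made for this example.

```python
from typing import List, Sequence


def normalize(instruction: str) -> str:
    """Reduce an instruction to its mnemonic (a crude, assumed normalization)."""
    parts = instruction.split()
    return parts[0].lower() if parts else ""


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_similarity(block_a: List[str], block_b: List[str]) -> float:
    """Score two basic blocks in [0, 1] from the LCS of their mnemonics.

    The 2 * LCS / (len(a) + len(b)) formula is an assumption for this sketch.
    """
    a = [normalize(i) for i in block_a]
    b = [normalize(i) for i in block_b]
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


# Two blocks that differ in register allocation and one extra instruction.
block_a = ["mov eax, [ebp+8]", "add eax, 1", "push eax", "call foo"]
block_b = ["mov ecx, [ebp+8]", "push ecx", "call foo"]
score = block_similarity(block_a, block_b)
```

Comparing mnemonics only makes the score tolerant of register renaming, which is one source of the compiler-induced differences mentioned above; a real tool would likely use a finer-grained normalization of operands.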
Based on the approach proposed in this thesis, we implemented a tool named BinSequence. We conducted extensive experiments to evaluate BinSequence in terms of performance, accuracy, and scalability. Our results suggest that, given a large assembly code repository with millions of functions, BinSequence is efficient and can produce a high-quality similarity ranking of assembly functions, with accuracy above 90%, within seconds.
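The filtering step based on MinHashing and locality-sensitive hashing is what makes a search over a large repository feasible, so a small sketch may help. The code below is an assumption-laden illustration and not BinSequence's implementation: the token shingling, the SHA-1 based hash family, the signature length of 64, and the 16-band layout are all hypothetical parameters chosen for the example.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def shingles(tokens: List[str], k: int = 3) -> Set[str]:
    """Turn a token sequence (e.g. mnemonics) into a set of k-token shingles."""
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family (SHA-1 is an arbitrary choice here)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash value of the set under each seed."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_bands(signature: List[int], bands: int = 16) -> Set[Tuple[int, Tuple[int, ...]]]:
    """Split a signature into bands; each (band index, band values) pair is a bucket key."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


def candidates(query_sig: List[int], index: Dict[str, List[int]], bands: int = 16) -> Set[str]:
    """Return indexed functions that share at least one LSH bucket with the query."""
    query_keys = lsh_bands(query_sig, bands)
    return {name for name, sig in index.items() if query_keys & lsh_bands(sig, bands)}


# Hypothetical repository of two functions, represented by mnemonic sequences.
repo = {
    "func_similar": ["push", "mov", "sub", "mov", "add", "push", "call", "leave", "ret"],
    "func_other": ["xor", "cmp", "jz", "inc", "jmp", "dec", "test", "jnz", "nop"],
}
index = {name: minhash_signature(shingles(toks)) for name, toks in repo.items()}
query_sig = minhash_signature(shingles(repo["func_similar"]))
shortlist = candidates(query_sig, index)
```

Only functions on the shortlist would then be passed to a more expensive comparison, such as fuzzy matching of basic blocks. A production index would store bucket keys in a hash table rather than scanning every signature as this sketch does.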
We also present several practical use cases, including patch analysis, malware analysis, and bug search. In the patch analysis use case, we used BinSequence to compare the unpatched and patched versions of the same binary in order to reveal the vulnerability information and the details of the patch. For this use case, we used a Windows system driver (HTTP.sys) that contains a recently published critical vulnerability. In the malware analysis use case, we used BinSequence to identify reused components or already analyzed parts of malware, so that the human analyst can focus on new functionality and save time and effort. In this use case, two infamous malware families, Zeus and Citadel, are analyzed. Finally, in the bug search use case, we used BinSequence to identify vulnerable functions in software that result from copying and pasting or sharing buggy source code. In this case, we succeeded in using BinSequence to identify a bug in Firefox. Together, these use cases demonstrate that our tool is both efficient and effective when applied to real-world scenarios.
We also compared BinSequence with three state-of-the-art tools: Diaphora, PatchDiff2, and BinDiff. Experimental results show that BinSequence achieves the best accuracy among these tools.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Huang, He
Institution: Concordia University
Degree Name: M.A.Sc.
Program: Information Systems Security
Date: 21 December 2015
Thesis Supervisor(s): Debbabi, Mourad and Youssef, Amr
ID Code: 980764
Deposited By: HE HUANG
Deposited On: 15 Jun 2016 16:31
Last Modified: 18 Jan 2018 17:51
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
