The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT

Mokhov, Serguei A. (2011) The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT. Technical Report. NIST, Gaithersburg, MD.

Text (E-print of the technical report section.) (application/pdf) 1010.2511v5.pdf - Updated Version 374kB
Text (Presentation slides for SATE2010 workshop.) (application/pdf) SATE10_13_Marfcat_Mokhov.pdf - Presentation 327kB

Official URL: http://samate.nist.gov/docs/NIST_Special_Publicati...

Abstract

We present a machine learning approach to static code analysis and fingerprinting of weaknesses related to security, software engineering, and other concerns. The approach uses the open-source MARF framework and the MARFCAT application built on it, and was prepared for NIST's SATE2010 Static Analysis Tool Exposition workshop.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering; Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering; Concordia University > Research Units > Centre for Pattern Recognition and Machine Intelligence; Concordia University > Research Units > Computer Security Laboratory
Item Type:	Monograph (Technical Report)
Authors:	Mokhov, Serguei A.
Series Name:	NIST SP
Institution:	NIST
Date:	27 October 2011
Projects:	Modular Audio Recognition Framework (MARF), Static Code Analysis, MARFCAT
Funders:	Faculty of Engineering and Computer Science
Identification Number:	NIST Special Publication (SP) 500-283
Keywords:	static code analysis, vulnerability fingerprinting, machine learning, data mining, MARF, MARFCAT
ID Code:	36058
Deposited By:	Serguei Mokhov
Deposited On:	05 Jan 2012 17:03
Last Modified:	18 Jan 2018 17:36
Related URLs:	http://arxiv.org/abs/1010.2511 Publisher
Additional Information:	Editors: Vadim Okun, Aurelien Delaitre, Paul E. Black
