Das, Swagata (2013) INSTANT MESSAGING SPAM DETECTION IN LONG TERM EVOLUTION NETWORKS. Masters thesis, Concordia University.
Text (application/pdf): cuthesis_masters-SwagataDas-PDF-A.pdf (2MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The lack of efficient spam detection modules for packet data communication increases threat exposure for telecommunication network users and service providers. In this thesis, we propose a novel approach that classifies spam at the server side by intercepting packet data communication between instant messaging applications. Spam detection is performed using machine learning techniques on packet headers and, when unencrypted, packet contents, in two phases: offline training and online classification. The contribution of this study is threefold. First, it identifies the scope for deploying a spam detection module in a state-of-the-art telecommunication architecture. Second, it compares the usefulness of various existing machine learning algorithms for intercepting and classifying data packets in the near-real-time communication of instant messengers. Finally, it evaluates the accuracy and classification time of spam detection using our approach in a simulated environment of continuous packet data communication. Our research results are mainly generated by executing instances of a peer-to-peer instant messaging application prototype within a simulated Long Term Evolution (LTE) telecommunication network environment. This prototype is modeled and executed using OPNET network modeling and simulation tools. The research yields considerable insight into monitoring unsolicited packets in instant messaging and similar applications.
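The two-phase scheme described in the abstract (offline training, then online classification) can be illustrated with a minimal sketch. It is not the thesis's implementation: the choice of a multinomial Naive Bayes classifier, the toy training messages, and the names `tokenize`, `train`, `classify`, and `TRAINING` are all assumptions made for illustration, and only unencrypted payload text is modeled (packet-header features are omitted).

```python
import math
from collections import Counter

# Hypothetical labeled payloads; a real deployment would train on captured traffic.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free offer click this link", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at the meeting", "ham"),
]

def tokenize(payload):
    """Lowercase the payload text and split it on whitespace."""
    return payload.lower().split()

def train(samples):
    """Offline phase: count tokens per class from (payload, label) pairs."""
    token_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for payload, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokenize(payload))
    vocab = set(token_counts["spam"]) | set(token_counts["ham"])
    return {"tokens": token_counts, "classes": class_counts, "vocab": vocab}

def classify(model, payload):
    """Online phase: return the label with the higher log-posterior score."""
    total = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label in ("spam", "ham"):
        counts = model["tokens"][label]
        denom = sum(counts.values()) + vocab_size  # Laplace smoothing
        # Assumes every class has at least one training sample.
        score = math.log(model["classes"][label] / total)
        for token in tokenize(payload):
            if token in model["vocab"]:  # tokens never seen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)                            # done once, offline
print(classify(model, "click for a free prize"))   # scored per incoming message
print(classify(model, "see you at lunch"))
```

Keeping training separate from classification means the per-message cost at the server is a handful of dictionary lookups, which is the property that matters for near-real-time interception.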
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Das, Swagata |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | 15 August 2013 |
Thesis Supervisor(s): | Debbabi, Mourad and Pourzandi, Makan |
Keywords: | Classification, LTE, Machine Learning, Mobile Network, SPIM |
ID Code: | 977830 |
Deposited By: | SWAGATA DAS |
Deposited On: | 25 Nov 2013 19:51 |
Last Modified: | 18 Jan 2018 17:45 |