Login | Register

On email spam filtering using support vector machine

Title:

On email spam filtering using support vector machine

Amayri, Ola (2009) On email spam filtering using support vector machine. Masters thesis, Concordia University.

[img]
Preview
Text (application/pdf)
MR63317.pdf - Accepted Version
1MB

Abstract

Electronic mail is a major revolution taking place over traditional communication systems due to its convenient, economical, fast, and easy to use nature. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as "spam emails". A major concern is the developing of suitable filters that can adequately capture those emails and achieve high performance rate. Machine learning (ML) researchers have developed many approaches in order to tackle this problem. Within the context of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification approaches (TC). A crucial problem when using SVM is the choice of kernels as they directly affect the separation of emails in the feature space. We investigate the use of several distance-based kernels to specify spam filtering behaviors using SVM. However, most of used kernels concern continuous data, and neglect the structure of the text. In contrast to classical blind kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit spam filtering problem. On the other hand, data preprocessing is a vital part of text classification where the objective is to generate feature vectors usable by SVM kernels. We detail a feature mapping variant in TC that yields improved performance for the standard SVM in filtering task. Furthermore, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that active online method using string kernels achieves higher precision and recall rates.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:Thesis (Masters)
Authors:Amayri, Ola
Pagination:x, 55 leaves : ill. ; 29 cm.
Institution:Concordia University
Degree Name:M.A. Sc.
Program:Institute for Information Systems Engineering
Date:2009
Thesis Supervisor(s):Bouguila, N
ID Code:976212
Deposited By: Concordia University Library
Deposited On:22 Jan 2013 16:21
Last Modified:18 Jan 2018 17:42
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Back to top Back to top