
Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning



Ebrahimi, Mohammadreza (2016) Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning. Masters thesis, Concordia University.

Text (application/pdf)
Ebrahimi_MSc_S2016.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Providing a safe environment for children and juveniles in online social networks is a major factor in improving public safety. Because online conversations are now so prevalent, mitigating the harmful effects of juvenile abuse in cyberspace has become a necessity. Addressing this kind of crime automatically is challenging and demands efficient and scalable data mining techniques. The problem can be cast as a combination of textual preprocessing in data/text mining and binary classification in machine learning. This thesis proposes two machine learning approaches to deal with the following two issues in the domain of online predator identification: 1) Gathering a comprehensive set of negative training samples is unrealistic due to the nature of the problem. This issue is addressed by applying an existing method for semi-supervised anomaly detection that allows training based on only one class label. The method was tested on two datasets. 2) The performance of current binary classification methods, in terms of classification accuracy and F1-score, needs to be improved. To this end, we customized a deep learning approach, the Convolutional Neural Network, for use in this domain. Using this approach, we show that the classification performance (F1-score) improves by almost 1.7% compared to the Support Vector Machine classification method. Two datasets were used in the empirical experiments: PAN-2012 and SQ (Sûreté du Québec). The former is a large public dataset that has been used extensively in the literature, and the latter is a small dataset collected from the Sûreté du Québec.
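As a rough illustration of the two ideas named in the abstract, training from a single class label and evaluating with the F1-score, the following minimal sketch may help. It is not the method used in the thesis, which the abstract does not specify; it assumes a simple centroid-based detector over bag-of-words counts as a stand-in. All function names (`vectorize`, `fit_one_class`, `predict`, `f1_score`) and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_one_class(texts):
    """Fit on samples from ONE class only (assumed stand-in, not the thesis method).

    The model is the summed term-count centroid plus a threshold equal to the
    lowest similarity any training sample has to that centroid.
    """
    vectors = [vectorize(t) for t in texts]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    threshold = min(cosine(v, centroid) for v in vectors)
    return centroid, threshold


def predict(model, text):
    """Return 1 if the text resembles the training class, else 0 (anomaly)."""
    centroid, threshold = model
    return 1 if cosine(vectorize(text), centroid) >= threshold else 0


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens stand in for preprocessed chat text of the one labeled class.
model = fit_one_class(["alpha beta gamma", "alpha beta delta", "beta gamma delta"])
labels = [predict(model, t) for t in ["beta gamma alpha", "omega sigma"]]
```

Taking the lowest training similarity as the threshold is the strictest choice; a real system would more plausibly tune it on held-out data, and would compare detectors by F1-score as the abstract describes.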

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Ebrahimi, Mohammadreza
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: 14 April 2016
Thesis Supervisor(s): Suen, Ching Y. and Ormandjieva, Olga
ID Code: 981404
Deposited On: 26 Aug 2016 12:48
Last Modified: 18 Jan 2018 17:53
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
