Identifying Cyber Predators by Using Sentiment Analysis and Recurrent Neural Networks


Dan, Liu ORCID: https://orcid.org/0000-0002-3730-2139 (2018) Identifying Cyber Predators by Using Sentiment Analysis and Recurrent Neural Networks. Masters thesis, Concordia University.

Text (application/pdf)
DanLiu_Thesis_Final_2.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Recurrent neural networks with long short-term memory cells (LSTM-RNNs) are highly effective at processing sequence data, particularly for building language models and classifying text. This research proposes combining sentiment analysis, sentence vectors, and an LSTM-RNN as a novel approach to cyber Sexual Predator Identification (SPI). SPI comprises two tasks. The first is identifying sexual predators in chats. The second is highlighting the specific lines in chats that belong to sexual predators. Our research focuses on the first task.

An LSTM-RNN language model is used to generate sentence vectors, which are the last hidden states of the language model. The sentence vectors are fed into an LSTM-RNN classifier to capture suspicious conversations. Because the vectors are derived from hidden states, the system can generate vectors for unseen sentences, i.e., it can score a sentence that never appeared in the training data. fastText is used to filter the contents of conversations and to generate a sentiment score for the purpose of identifying potential predators. The IMDB sentiment review task is introduced to provide an intuitive measure of the combined method's performance. The model identified 206 predators out of 254. The experiment achieved a record-breaking F-0.5 score of 0.9555, higher than the top-ranked result in the SPI competition.
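
For readers unfamiliar with the F-0.5 measure, the following minimal sketch shows how it combines precision and recall, weighting precision more heavily. It reads "206 predators out of 254" as recall (true positives over actual predators). The abstract does not state precision, so the value `assumed_precision` below is purely an illustrative assumption, and the function name `f_beta` is hypothetical rather than taken from the thesis.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    A beta below 1 weights precision more heavily than recall.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# From the abstract: 206 of 254 predators identified (read here as recall).
recall = 206 / 254

# ASSUMPTION: precision is not given in the abstract; 1.0 is illustrative only.
assumed_precision = 1.0

score = f_beta(assumed_precision, recall, beta=0.5)
print(f"recall = {recall:.4f}, F-0.5 = {score:.4f}")
```

Under these assumptions the sketch prints a recall of 0.8110 and an F-0.5 of 0.9555, which agrees with the reported figure. The thesis itself remains the authority for the actual precision.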

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Concordia University > Research Units > Centre for Pattern Recognition and Machine Intelligence
Item Type: Thesis (Masters)
Authors: Dan, Liu
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: April 2018
Thesis Supervisor(s): Olga, Ormandjieva and Ching, Suen
ID Code: 983794
Deposited By: DAN LIU
Deposited On: 11 Jun 2018 03:37
Last Modified: 19 Mar 2019 14:47
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
