Dan, Liu ORCID: https://orcid.org/0000-0002-3730-2139 (2018) Identifying Cyber Predators by Using Sentiment Analysis and Recurrent Neural Networks. Masters thesis, Concordia University.
Text (application/pdf): DanLiu_Thesis_Final_2.pdf (2MB), Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Recurrent Neural Networks with Long Short-Term Memory cells (LSTM-RNNs) are highly effective at processing sequence data, particularly for language modeling and text classification. This research proposes combining sentiment analysis, sentence vectors, and LSTM-RNNs as a novel approach to cyber Sexual Predator Identification (SPI). SPI comprises two tasks. The first is identifying sexual predators in chats. The second is highlighting the specific lines written by sexual predators in those chats. Our research focuses on the first task.
An LSTM-RNN language model is used to generate sentence vectors, which are the last hidden states of the language model. The sentence vectors are fed into an LSTM-RNN classifier to capture suspicious conversations. Because the vectors are derived from hidden states, the system can generate vectors for unseen sentences, i.e., it can score a sentence that never appeared in the training data. fastText is used to filter the contents of conversations and to generate a sentiment score for identifying potential predators. The IMDB sentiment review task is introduced to provide an intuitive measure of the combined method. The model identified 206 predators out of 254. The experiment achieved a record-breaking F0.5 score of 0.9555, higher than the top-ranked result in the SPI competition.
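For readers unfamiliar with the metric, the following is a minimal sketch of how an F0.5 score is computed from raw counts. The function name `f_beta` and the variable names are illustrative and do not come from the thesis. The counts of 206 identified predators out of 254 are taken from the abstract; the false-positive count is not reported there, so the value of zero below is purely an assumption made for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts; beta < 1 weights precision above recall."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# From the abstract: 206 of 254 predators identified, so 48 were missed.
true_positives = 206
false_negatives = 254 - 206

# ASSUMPTION (not stated in the abstract): no false positives.
false_positives = 0

score = f_beta(true_positives, false_positives, false_negatives)
print(f"F0.5 = {score:.4f}")  # prints "F0.5 = 0.9555"
```

Under that assumption the result rounds to the reported 0.9555, but the actual precision should be taken from the thesis itself. With beta set to 0.5, precision counts more heavily than recall, so a system that misses some predators while raising few false alarms can still score highly.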
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering; Concordia University > Research Units > Centre for Pattern Recognition and Machine Intelligence |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Dan, Liu |
Institution: | Concordia University |
Degree Name: | M. Sc. |
Program: | Computer Science |
Date: | April 2018 |
Thesis Supervisor(s): | Olga, Ormandjieva and Ching, Suen |
ID Code: | 983794 |
Deposited By: | DAN LIU |
Deposited On: | 11 Jun 2018 03:37 |
Last Modified: | 19 Mar 2019 14:47 |