Sekeres, John (2022) Methodologies for the Management, Normalization and Identification of Sexual Predation of Minors in Cyber Chat Logs. Masters thesis, Concordia University.
Text (application/pdf), 6MB: Sekeres_MCompSc_F2022.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Neural networks based on the Transformer architecture have shown strong results in tasks such as machine translation and text generation. Our contribution provides a methodology for an AI agent capable of Sexual Predator Identification (SPI) based on the classification capabilities of models built on the Transformer architecture. Results are comparable to existing state-of-the-art methods, with an F0.5 score of 92.5% for predator identification on the PAN2012 test dataset, which consists of 2,004,235 lines of text. Practical considerations require an AI agent that can evaluate large numbers of chats quickly. In that regard, the Transformer-based AI agent is able to evaluate over 2 million lines of text in under 6 minutes on a modestly configured workstation.
An AI agent by itself does not provide a complete solution to sexual predator identification. In an effort to give practical value to an AI agent, we address the vitally important but often overlooked issues of chat management and normalization. Our contribution provides a methodology for efficiently transforming raw chats from a native format into a consistent 'normalized' format suitable for analysis. We define a methodology for managing large numbers of chats, converting and normalizing 10,000 documents in a dataset in under 3 minutes on a modestly configured workstation. We present a software-based solution that, among other things, brings together chat management, normalization, and AI-based analysis into a cohesive, productive environment that law enforcement can use to identify and build a case against suspected predators.
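For readers unfamiliar with the F0.5 metric cited in the abstract, the following is a minimal sketch of the standard F-beta formula, which weights precision more heavily than recall when beta is below 1. The function name `f_beta` and the precision and recall values in the usage lines are illustrative assumptions and are not figures from the thesis.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    beta < 1 favours precision; beta = 1 gives the usual F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show the effect of beta.
example_precision = 0.9
example_recall = 0.6
f_half = f_beta(example_precision, example_recall, beta=0.5)
f_one = f_beta(example_precision, example_recall, beta=1.0)
print(f"F0.5 = {f_half:.4f}, F1 = {f_one:.4f}")
```

With precision higher than recall, F0.5 comes out above F1, which suits a setting where falsely flagging a person as a predator is treated as more costly than missing one.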
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Sekeres, John |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | 28 August 2022 |
Thesis Supervisor(s): | Suen, Ching Y. and Ormandjieva, Olga |
ID Code: | 991466 |
Deposited By: | JOHN SEKERES |
Deposited On: | 21 Jun 2023 14:43 |
Last Modified: | 21 Jun 2023 14:43 |