Evaluating Website Data Leaks through Spam Collection on Honeypots

Oyinloye, Oghenerukevwe Elohor (2025) Evaluating Website Data Leaks through Spam Collection on Honeypots. Masters thesis, Concordia University.

Text (application/pdf)
Oyinloye_MASc_S2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.

1MB

Abstract

People increasingly rely on online services for communication, education, shopping, and entertainment,
but this convenience comes with escalating spam volumes. Prior studies have linked spam primarily
to user behavior or data leaks but lacked effective methods to distinguish between these causes. Our research
proposes using spam as a forensic indicator of data leaks and introduces a low-complexity fingerprinting
technique to trace leak sources, while also evaluating the consent and subscription practices of websites and
telecom providers.
We deployed 148 honeypots with 740 accounts across 370 websites in 12 communities over 12
months, analyzing 12,490 spam emails to assess forensic indicators and to evaluate exposure and the
exploitation of privacy policies by legitimate websites under Canada's Anti-Spam Legislation (CASL). Our method
was benchmarked against traditional leak detection models and fingerprinting techniques.
Our findings reveal that many legitimate websites exploit CASL consent practices by automatically
enrolling users in mailing lists via implicit consent, while sites requiring explicit consent often violate
their own policies, highlighting enforcement gaps. We recommend that regulators mandate a clear separation
between subscription agreements and privacy policies and require explicit third-party consent at
sign-up.
Additionally, our analysis shows that men aged 48–57 receive the highest spam volumes, and that spam
activity peaks in the first five minutes (minutes 00–04) of each hour and on Thursdays. These insights offer valuable
guidance for enhancing spam filtering models.
Using our analysis engine, we achieved 100% leak detection and 99.29% source attribution accuracy.
Compared to network intrusion detection, log analysis, machine learning, and traditional fingerprinting,
our method more effectively identifies compliance violations, traces leaks to their source, and estimates
exposure impact. Furthermore, to address telecom issues arising from phone number reassignments, we propose
a nonce-based de-association method that promises significant spam reduction.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:	Thesis (Masters)
Authors:	Oyinloye, Oghenerukevwe Elohor
Institution:	Concordia University
Degree Name:	M.A.Sc.
Program:	Information Systems Security
Date:	March 2025
Thesis Supervisor(s):	Fung, Carol
ID Code:	995280
Deposited By:	Oghenerukevwe Elohor Oyinloye
Deposited On:	17 Jun 2025 17:22
Last Modified:	17 Jun 2025 17:22
