Knowledge-Informed Self-Reflective Automated Penetration Testing Based on LLMs

Hanzheng, Dai ORCID: https://orcid.org/0009-0004-1073-526X (2025) Knowledge-Informed Self-Reflective Automated Penetration Testing Based on LLMs. Masters thesis, Concordia University.

Text (application/pdf)
Dai_MA_F2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.

698kB

Abstract

Automated penetration testing (AutoPT) plays a crucial role in identifying and mitigating cybersecurity vulnerabilities before they can be exploited. However, traditional AutoPT approaches remain limited in adaptability, contextual reasoning, and cross-stage coordination. Although recent advances in artificial intelligence (AI), including expert systems, reinforcement learning (RL), and large language models (LLMs), have improved penetration testing (PT) automation, existing AI-assisted AutoPT frameworks still suffer from poor generalization, context loss, hallucination, and an inability to incorporate prior trial-and-error experience, making it challenging to automate multi-stage PT tasks across different environments.
To address these limitations, we propose RefPentester, a knowledge-informed, self-reflective AutoPT framework built on LLMs that enables context-aware, adaptive, and interpretable automated workflows across multi-stage PT processes. RefPentester integrates three main components: 1) a Process Navigator that identifies the current PT stage and retrieves corresponding hierarchical knowledge via a retrieval-augmented generation (RAG) pipeline; 2) a Generator that produces actionable and context-aware step-by-step guidance; and 3) a Reflector that evaluates execution feedback and facilitates structured reflective learning. A comprehensive PT knowledge vector database (VDB) is built from public cybersecurity resources, including MITRE ATT&CK and the OWASP Testing Guide (OTG), forming a tree-structured repository of tactics, techniques, and abstract actions.
Experiments were conducted on the Sau machine from the Hack The Box (HTB) platform, which provides realistic, legally authorized, and reproducible virtual PT environments. Results show that RefPentester consistently outperforms a GPT-4o baseline in both credential capture and stage transition success rates, demonstrating improved operational reliability. Beyond quantitative gains, log-based qualitative analysis reveals two key patterns: 1) the Reflector enhances decision stability by preventing repeated mistakes and supporting adaptive recovery across stages; and 2) the interaction between hierarchical knowledge retrieval and structured reflection yields clearer and more interpretable reasoning trajectories. These findings indicate that combining knowledge grounding with reflective optimization substantially strengthens robustness and interpretability in multi-stage AutoPT reasoning.
The study also acknowledges practical limitations, such as reliance on a single HTB environment, and it outlines several future research directions to further advance AI-assisted AutoPT.
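
The three-component loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the knowledge tree contents, the function names (navigate, generate, reflect, run), and the Memory class are all hypothetical, a plain dictionary lookup stands in for the RAG pipeline and vector database, and simple rule-based functions stand in for the LLM calls. It only shows how stage identification, guidance generation, and reflection on execution feedback might fit together.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge tree: stage -> technique -> abstract actions.
# The thesis builds its tree from MITRE ATT&CK and the OWASP Testing Guide;
# these entries are placeholders for illustration only.
KNOWLEDGE_TREE = {
    "reconnaissance": {"service discovery": ["enumerate open ports"]},
    "exploitation": {"known vulnerability": ["try a public exploit for the identified service"]},
}
STAGES = list(KNOWLEDGE_TREE)


@dataclass
class Memory:
    # Actions that the Reflector stand-in has marked as failures.
    failed: set = field(default_factory=set)


def navigate(stage_index):
    """Process Navigator stand-in: name the current stage and look up its knowledge."""
    stage = STAGES[stage_index]
    return stage, KNOWLEDGE_TREE[stage]


def generate(knowledge, memory):
    """Generator stand-in: pick the first action not already known to fail."""
    for actions in knowledge.values():
        for action in actions:
            if action not in memory.failed:
                return action
    return None


def reflect(action, succeeded, memory):
    """Reflector stand-in: record failures so they are not repeated."""
    if not succeeded:
        memory.failed.add(action)
    return succeeded


def run(execute, max_steps=10):
    """Drive the loop; `execute` is a caller-supplied callable returning True or False."""
    memory, stage_index, log = Memory(), 0, []
    for _ in range(max_steps):
        if stage_index >= len(STAGES):
            break  # all stages completed
        stage, knowledge = navigate(stage_index)
        action = generate(knowledge, memory)
        if action is None:
            break  # no untried action left for this stage
        ok = reflect(action, execute(action), memory)
        log.append((stage, action, ok))
        if ok:
            stage_index += 1  # stage transition on success
    return log
```

Calling run with a callable that always returns True walks through both placeholder stages in order, while a callable that always returns False stops after one attempt because the only available action has been recorded as a failure.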

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:	Thesis (Masters)
Authors:	Hanzheng, Dai
Institution:	Concordia University
Degree Name:	M.A.Sc.
Program:	Information Systems Security
Date:	1 December 2025
Thesis Supervisor(s):	Jun, Yan
ID Code:	996544
Deposited By:	Hanzheng Dai
Deposited On:	29 Jun 2026 14:43
Last Modified:	29 Jun 2026 14:43
