Hanzheng, Dai
ORCID: https://orcid.org/0009-0004-1073-526X
(2025)
Knowledge-Informed Self-Reflective Automated Penetration Testing Based on LLMs.
Masters thesis, Concordia University.
Text (application/pdf)
Dai_MA_F2025.pdf (698kB) - Accepted Version
Available under License Spectrum Terms of Access.
Abstract
Automated penetration testing (AutoPT) plays a crucial role in identifying and mitigating cybersecurity vulnerabilities before they can be exploited. However, traditional AutoPT approaches remain limited in adaptability, contextual reasoning, and cross-stage coordination. Although recent advances in artificial intelligence (AI), including expert systems, reinforcement learning (RL), and large language models (LLMs), have improved penetration testing (PT) automation, existing AI-assisted AutoPT frameworks still suffer from poor generalization, context loss, hallucination, and an inability to incorporate prior trial-and-error experience, making it challenging to automate multi-stage PT tasks across different environments.
To address these limitations, we propose RefPentester, a knowledge-informed, self-reflective AutoPT framework built on LLMs that enables context-aware, adaptive, and interpretable automated workflows across multi-stage PT processes. RefPentester integrates three main components: 1) a Process Navigator that identifies the current PT stage and retrieves corresponding hierarchical knowledge via a retrieval-augmented generation (RAG) pipeline; 2) a Generator that produces actionable and context-aware step-by-step guidance; and 3) a Reflector that evaluates execution feedback and facilitates structured reflective learning. A comprehensive PT knowledge vector database (VDB) is built from public cybersecurity resources, including MITRE ATT&CK and the OWASP Testing Guide (OTG), forming a tree-structured repository of tactics, techniques, and abstract actions.
Experiments were conducted on the Sau machine from the Hack The Box (HTB) platform, which provides realistic, legally authorized, and reproducible virtual PT environments. Results show that RefPentester consistently outperforms a GPT-4o baseline in both credential capture and stage transition success rates, demonstrating improved operational reliability. Beyond quantitative gains, log-based qualitative analysis reveals two key patterns: 1) the Reflector enhances decision stability by preventing repeated mistakes and supporting adaptive recovery across stages; and 2) the interaction between hierarchical knowledge retrieval and structured reflection yields clearer and more interpretable reasoning trajectories. These findings indicate that combining knowledge grounding with reflective optimization substantially strengthens robustness and interpretability in multi-stage AutoPT reasoning.
The study also acknowledges practical limitations, such as reliance on a single HTB environment, and it outlines several future research directions to further advance AI-assisted AutoPT.
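The interaction of the three components named in the abstract (Process Navigator, Generator, Reflector) can be illustrated with a minimal sketch. This is not the thesis implementation. The knowledge tree contents, the function names (`navigate`, `generate`, `reflect`, `run`), and the `Memory` class are all hypothetical. A plain dictionary lookup stands in for the RAG pipeline and vector database, a fixed selection rule stands in for the LLM, and the loop assumes a stage is complete after one successful action.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge tree: stage (tactic) -> technique -> abstract actions.
# The abstract describes a tree of this shape; these entries are illustrative only.
KNOWLEDGE_TREE = {
    "reconnaissance": {
        "port scanning": [
            "enumerate open TCP ports",
            "identify service versions",
        ],
    },
    "initial access": {
        "exploit public-facing application": [
            "search for known vulnerabilities in the identified service",
        ],
    },
}
STAGES = list(KNOWLEDGE_TREE)


@dataclass
class Memory:
    # (stage, action) pairs that failed in earlier attempts
    failures: list = field(default_factory=list)


def navigate(stage_index):
    """Process Navigator stand-in: name the current stage and fetch its subtree."""
    stage = STAGES[stage_index]
    return stage, KNOWLEDGE_TREE[stage]


def generate(stage, subtree, memory):
    """Generator stand-in: return the first action not already recorded as failed."""
    for actions in subtree.values():
        for action in actions:
            if (stage, action) not in memory.failures:
                return action
    return None


def reflect(stage, action, succeeded, memory):
    """Reflector stand-in: record a failed action so it is not proposed again."""
    if not succeeded:
        memory.failures.append((stage, action))
    return succeeded


def run(execute, max_steps=10):
    """Drive the loop; `execute(stage, action)` returns True on success (assumed)."""
    memory, stage_index, trace = Memory(), 0, []
    for _ in range(max_steps):
        if stage_index >= len(STAGES):
            break  # every stage completed
        stage, subtree = navigate(stage_index)
        action = generate(stage, subtree, memory)
        if action is None:
            break  # no untried actions remain for this stage
        ok = reflect(stage, action, execute(stage, action), memory)
        trace.append((stage, action, ok))
        if ok:
            stage_index += 1  # simplifying assumption: one success ends a stage
    return trace, memory
```

Passing a stub such as `lambda stage, action: action != "enumerate open TCP ports"` to `run` shows the intended behaviour: the failed action is stored in `memory.failures`, the next proposal for the same stage skips it, and the loop moves to the following stage once an action succeeds.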
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Hanzheng, Dai |
| Institution: | Concordia University |
| Degree Name: | M.A.Sc. |
| Program: | Information Systems Security |
| Date: | 1 December 2025 |
| Thesis Supervisor(s): | Jun, Yan |
| ID Code: | 996544 |
| Deposited By: | Hanzheng Dai |
| Deposited On: | 29 Jun 2026 14:43 |
| Last Modified: | 29 Jun 2026 14:43 |