An Agentic Benchmarking of Large Language Models for Security Incident Analysis

Jajodia, Sourov (2025) An Agentic Benchmarking of Large Language Models for Security Incident Analysis. Masters thesis, Concordia University.

Text (application/pdf)
Jajodia_MASc_F2025.pdf - Accepted Version
Restricted to Repository staff only until 30 April 2026.
Available under License Spectrum Terms of Access.

2MB

Abstract

Security incident analysis is a difficult challenge for security operations centers: they must respond to an overwhelming number of alerts, analyze large volumes of data, and work with a myriad of tools, all while facing a shortage of experienced analysts. The analyst’s job is further complicated because incidents are dynamic and require multifaceted, multi-step analysis. While companies are keen to apply Large Language Models (LLMs) to augment analysts’ efforts in security incident analysis (SIA), the lack of LLM benchmarks for SIA leaves their overall effectiveness and the related design choices unverified, which poses significant risk. Moreover, such benchmarking is non-trivial because: (i) no dataset currently exists in a format digestible by LLMs that covers a wide range of SIA tasks; (ii) given the vast diversity of analysts’ work, there is a continuous need to add new tasks; and (iii) frequently released new models must be included in the evaluation. In this thesis, we aim to address these challenges by building an agentic evaluation framework, SIABENCH. First, we build a first-of-its-kind dataset that covers two major SIA tasks: (i) deep analysis of security incidents (25 scenarios) and (ii) alert triaging (35 scenarios). Second, we build an agent that automatically conducts a wide range of SIA tasks (covering network and memory forensics; malware analysis of binaries, code, and PDFs; phishing email and phishing kit analysis; and log analysis), along with false alert detection. Third, we evaluate the performance of nine major LLMs (both open- and closed-weight) on SIA, with the framework able to support newer models and tasks.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:	Thesis (Masters)
Authors:	Jajodia, Sourov
Institution:	Concordia University
Degree Name:	M.A. Sc.
Program:	Information Systems Security
Date:	23 August 2025
Thesis Supervisor(s):	Majumdar, Suryadipta and Sultana, Madeena
ID Code:	996238
Deposited By:	Sourov Jajodia
Deposited On:	04 Nov 2025 16:52
Last Modified:	04 Nov 2025 16:52

Spectrum Research Repository