Towards Supporting Penetration Testing Education with Large Language Models: an Evaluation and Comparison
Journal:
arXiv
Published Date:
Jan 29, 2025
Abstract
Cybersecurity education is challenging, and it is helpful for educators to
understand Large Language Models' (LLMs') capabilities for supporting
education. This study evaluates the effectiveness of LLMs in conducting a
variety of penetration testing tasks. Fifteen representative tasks were
selected to cover a comprehensive range of real-world scenarios. We evaluate
the performance of six models (GPT-4o mini, GPT-4o, Gemini 1.5 Flash, Llama 3.1
405B, Mixtral 8x7B and WhiteRabbitNeo) on the Metasploitable v3 Ubuntu image
and OWASP WebGoat. Our findings suggest that GPT-4o mini currently offers the
most consistent support, making it a valuable tool for educational purposes.
However, its use in conjunction with WhiteRabbitNeo should be considered,
because of WhiteRabbitNeo's innovative approach to tool and command recommendations. This
study underscores the need for continued research into optimising LLMs for
complex, domain-specific tasks in cybersecurity education.