Towards Supporting Penetration Testing Education with Large Language Models: an Evaluation and Comparison

Journal: arXiv

Published Date: Jan 29, 2025

Abstract

Cybersecurity education is challenging, and it is helpful for educators to understand the capabilities of Large Language Models (LLMs) for supporting education. This study evaluates the effectiveness of LLMs in conducting a variety of penetration testing tasks. Fifteen representative tasks were selected to cover a comprehensive range of real-world scenarios. We evaluate the performance of six models (GPT-4o mini, GPT-4o, Gemini 1.5 Flash, Llama 3.1 405B, Mixtral 8x7B, and WhiteRabbitNeo) on the Metasploitable v3 Ubuntu image and OWASP WebGOAT. Our findings suggest that GPT-4o mini currently offers the most consistent support, making it a valuable tool for educational purposes. However, its use in conjunction with WhiteRabbitNeo should be considered because of WhiteRabbitNeo's innovative approach to tool and command recommendations. This study underscores the need for continued research into optimising LLMs for complex, domain-specific tasks in cybersecurity education.

Authors

Martin Nizon-Deladoeuille
Brynjólfur Stefánsson
Helmut Neukirchen
Thomas Welsh

External Resources

View on arXiv (http://arxiv.org/abs/2501.17539v1)
