Leveraging Large Language Models for Challenge Solving in Capture-the-Flag

Published in 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2024

Keywords: Privacy; Large language models; Knowledge based systems; Collaboration; Problem-solving; Mirrors; Penetration testing; Multi-agent systems; LLM Agent; Capture the Flag

Abstract:
Capture-the-Flag (CTF) competitions are a prominent method in cybersecurity for practical attack and defense exercises. Despite the rapid advancements in Large Language Models (LLMs), their potential for solving CTF challenges remains underexplored. In this paper, we first propose a flexible CTF platform designed to reflect real-world penetration testing scenarios, bridging the gap between theoretical learning and practical cybersecurity challenges. Our platform is highly customizable, freely deployable, and capable of generating scenarios that closely mirror real network vulnerabilities. More importantly, we introduce an automated LLM agent framework that tackles CTF challenges using an integrated toolchain and various plugins to enhance problem-solving efficiency. In addition, we propose a human-validated LLM agent framework to address the potential limitations of the fully automated LLM agent, providing a clearer evaluation of the LLMs’ intrinsic capabilities. We evaluate the agent’s performance using four LLMs: GPT-4o, GPT-4o mini, o1-preview, and o1-mini. Although the LLMs’ performance on dynamic and complex penetration tests reveals certain limitations, which need further exploration, our experimental results demonstrate that LLMs can leverage their extensive knowledge bases to effectively solve CTF challenges.

Recommended citation: Y. Zou, Y. Hong, J. Xu, L. Liu and W. Fan, "Leveraging Large Language Models for Challenge Solving in Capture-the-Flag," 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Sanya, China, 2024, pp. 1541-1550. doi: 10.1109/TrustCom63139.2024.00213.


Yang Hong
