Researchers at the Rochester Institute of Technology (RIT) have released CTIBench, a new benchmark designed to evaluate how well Large Language Models (LLMs) perform in Cyber Threat Intelligence (CTI). CTIBench focuses on critical CTI tasks such as threat detection, vulnerability analysis, and threat actor attribution. By testing models like GPT-4 and Gemini 1.5, CTIBench aims to standardize the evaluation of LLMs in cybersecurity applications and to drive improvements in their capabilities.
CTIBench includes tasks such as Cyber Threat Intelligence Multiple Choice Questions (CTI-MCQ), Root Cause Mapping (CTI-RCM), Vulnerability Severity Prediction (CTI-VSP), and Threat Actor Attribution (CTI-TAA). These tasks assess an LLM's ability to interpret and process complex CTI data, providing insight into its accuracy and reliability in practical cybersecurity scenarios.
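To make the multiple-choice evaluation concrete, the sketch below shows one common way such a benchmark can be scored: compare each model's predicted choice letter against the gold answer and report accuracy. The data format and function name are illustrative assumptions, not CTIBench's actual evaluation harness.

```python
def score_mcq(questions, model_answers):
    """Return the fraction of multiple-choice questions answered correctly.

    questions: list of dicts, each with an "answer" key holding the gold
    choice letter (e.g. "B"). model_answers: list of predicted letters,
    in the same order. (Hypothetical format for illustration.)
    """
    if not questions:
        return 0.0
    correct = sum(
        1
        for q, pred in zip(questions, model_answers)
        if pred.strip().upper() == q["answer"].strip().upper()
    )
    return correct / len(questions)


# Two hypothetical questions: the model gets the first right, the second wrong.
sample = [{"answer": "B"}, {"answer": "D"}]
print(score_mcq(sample, ["b", "A"]))  # 0.5
```

Normalizing case and whitespace before comparison is a small but important detail, since LLM outputs rarely match the gold label byte-for-byte.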
Built on a knowledge base drawn from established CTI frameworks and standards, CTIBench features more than 2,500 multiple-choice questions derived from sources like MITRE ATT&CK and the Common Vulnerability Scoring System (CVSS). By evaluating a range of LLMs, CTIBench identifies areas where advances in LLM technology can enhance CTI effectiveness and response strategies.
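Since the CTI-VSP task is grounded in CVSS, it helps to see the scoring scale it rests on. The snippet below maps a CVSS v3.1 base score to its qualitative severity rating using the bands from the official v3.1 specification; how CTIBench itself compares predicted and actual scores is not shown here, and this helper is only an illustration.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.1 base score (0.0-10.0) to its qualitative severity
    rating, using the bands defined in the CVSS v3.1 specification."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


print(cvss_severity(9.8))  # Critical
```

Because severity bands are coarse, a model can predict a numeric score that is close yet lands in the wrong band, which is one reason severity prediction remains a meaningful test of CTI reasoning.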
The results from CTIBench testing highlight both strengths and weaknesses among LLMs, guiding future research and development efforts to bolster cybersecurity defenses. This initiative underscores the importance of robust evaluation frameworks like CTIBench in advancing LLM applications in CTI and fortifying cybersecurity operations globally.