Google has integrated artificial intelligence into its open source fuzz-testing framework, demonstrating the technology's potential to transform automated bug hunting.
By adding generative AI to its OSS-Fuzz project, which offers free fuzzers to open source projects and privately alerts developers to the bugs they find, Google has achieved significant gains in code coverage: large language model (LLM) algorithms now write new fuzz targets automatically, with remarkable results.
Incorporating LLMs into the fuzzing process expands code coverage for critical projects in Google's OSS-Fuzz service without anyone having to write harness code by hand. The approach not only improves the security of the more than 1,000 projects the service already fuzzes but also lowers the barrier to adopting fuzzing elsewhere.
Fuzz testers, or fuzzers, play a vital role in vulnerability research by feeding applications large volumes of random inputs to expose potential security flaws. When a fuzzer triggers a failure, researchers can analyze the test output to pinpoint the root cause of the error.
Traditionally, fuzzing demands substantial manual effort: engineers must write fuzz targets, the harness functions that route fuzzer-generated input into the code segments under test. Google's exploration of LLMs seeks to automate that work and improve the effectiveness of the six-year-old OSS-Fuzz service.
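To make that manual effort concrete, here is a minimal sketch of what such a fuzz target typically looks like, using the LibFuzzer-style entry point that OSS-Fuzz supports. The parse_config function is a hypothetical stand-in for a real project API, not part of any particular library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test; a real target would call into
// the project's actual API instead.
static bool parse_config(const std::string& text) {
  return !text.empty() && text.front() == '{';
}

// LibFuzzer-style entry point: the fuzzing engine calls this function
// repeatedly with mutated byte buffers and watches for crashes, hangs,
// and sanitizer reports.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string input(reinterpret_cast<const char*>(data), size);
  parse_config(input);  // Return value is ignored; only misbehavior matters.
  return 0;             // Non-zero return values are reserved by LibFuzzer.
}
```

Writing one such harness is easy; writing them for every interesting entry point across more than 1,000 projects is the scaling problem the LLM is meant to solve.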
While OSS-Fuzz has identified, and verified fixes for, more than 10,000 security bugs in open source software, broader code coverage would likely uncover many more. The service currently exercises only around 30% of an average open source project's code, leaving a significant portion untested.
To evaluate the feasibility of LLM-generated fuzz targets, Google's software engineers built an evaluation framework that connects OSS-Fuzz to an LLM and identifies under-fuzzed, high-potential sections of code within sample projects.
Employing LLMs to create new fuzz targets, Google measured code coverage improvements ranging from 1.5% to 31% across various projects. In the case of the tinyxml2 project, for instance, coverage rose from 38% to 69% without any human intervention.
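As an illustration, an LLM-generated target for a library like tinyxml2 could plausibly resemble the short harness below. This is a hedged sketch rather than Google's actual generated code, though XMLDocument::Parse is tinyxml2's real parsing API.

```cpp
#include <cstddef>
#include <cstdint>
#include "tinyxml2.h"

// Minimal LibFuzzer harness for tinyxml2's XML parser. Each fuzzer
// iteration parses one mutated buffer; a crash or sanitizer error
// inside the parser indicates a bug.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  tinyxml2::XMLDocument doc;
  doc.Parse(reinterpret_cast<const char*>(data), size);
  return 0;
}
```

The value of the LLM lies not in any single harness like this one but in producing many of them, each tailored to an uncovered part of a project's API, at a scale manual effort cannot match.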
The experiment also showed that LLMs can autonomously generate working targets that rediscover vulnerabilities earlier fuzzing had missed. Google plans to open source the evaluation framework, giving researchers the tools to test their own approaches to automatic fuzz target generation, further advancing the field of fuzz testing and bolstering software security.