Meta has announced LlamaFirewall, an open-source framework designed to strengthen the security of artificial intelligence (AI) systems. The framework aims to protect AI systems against emerging risks such as prompt injection, jailbreaks, and insecure code generation. It includes three key guardrails: PromptGuard 2, Agent Alignment Checks, and CodeShield, which respectively detect jailbreak attempts, inspect agent reasoning, and prevent the generation of insecure code.
PromptGuard 2 detects direct prompt injection and jailbreak attempts in real time, while Agent Alignment Checks inspects an agent's reasoning for signs of indirect prompt injection and goal hijacking. CodeShield is an online static analysis engine that prevents AI agents from generating dangerous code. Meta has built LlamaFirewall to be modular, allowing security teams to compose layered defenses for LLM-powered applications, spanning from raw input ingestion to final output actions.
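As a rough illustration of this modular design, the sketch below composes per-role scanners using the Python API shown in the project's repository; the specific names (LlamaFirewall, ScannerType, Role, UserMessage) follow the README at the time of writing and should be verified against the current release.

```python
# A minimal sketch of LlamaFirewall's modular, per-role guardrail
# composition, based on the project's published Python API; enum and
# class names follow the repository README and may differ in newer
# releases.
from llamafirewall import LlamaFirewall, UserMessage, Role, ScannerType

# Compose layered defenses: PromptGuard 2 screens raw user input,
# while the alignment scanner audits the agent's own outputs.
firewall = LlamaFirewall(
    scanners={
        Role.USER: [ScannerType.PROMPT_GUARD],
        Role.ASSISTANT: [ScannerType.AGENT_ALIGNMENT],
    }
)

# Scan an incoming message before it reaches the LLM.
incoming = UserMessage(content="Ignore all previous instructions and ...")
result = firewall.scan(incoming)

# The scan result carries a decision (e.g., allow or block) plus a
# score and a human-readable reason that security teams can log.
print(result.decision, result.score, result.reason)
```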
Alongside LlamaFirewall, Meta has updated two existing tools: LlamaGuard, which detects policy-violating content in LLM inputs and outputs, and CyberSecEval, which evaluates the cybersecurity defenses of AI systems.
The new version of CyberSecEval, CyberSecEval 4, includes AutoPatchBench, a benchmark that tests an LLM agent’s ability to repair vulnerabilities in C/C++ code discovered through fuzzing. AutoPatchBench provides a standardized way to evaluate AI-assisted vulnerability repair, offering insight into how effective AI tools are at patching fuzzing-identified bugs.
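To make the evaluation concrete, the sketch below shows the general verify-by-reproducer loop that benchmarks of this kind rely on: a candidate patch must still compile and must stop the fuzzing-discovered input from crashing the target. This is not AutoPatchBench's actual harness; every function here (build_target, crash_reproduced, evaluate_patch) is a hypothetical stand-in.

```python
# A conceptual sketch of the verify-by-reproducer loop that
# fuzzing-repair benchmarks such as AutoPatchBench rely on. This is
# NOT AutoPatchBench's real harness; all helper names are hypothetical.
import subprocess
from pathlib import Path


def build_target(source_dir: Path, binary: Path) -> bool:
    """Hypothetical: compile the patched C/C++ target with ASan enabled."""
    proc = subprocess.run(
        ["clang", "-fsanitize=address", "-g",
         str(source_dir / "target.c"), "-o", str(binary)],
        capture_output=True,
    )
    return proc.returncode == 0


def crash_reproduced(binary: Path, crash_input: Path) -> bool:
    """Hypothetical: re-run the fuzzing-discovered input; a non-zero
    exit under ASan means the original vulnerability still triggers."""
    proc = subprocess.run([str(binary), str(crash_input)], capture_output=True)
    return proc.returncode != 0


def evaluate_patch(source_dir: Path, crash_input: Path) -> str:
    """Judge one candidate patch the way such benchmarks typically do:
    it must compile and must no longer crash on the reproducer."""
    binary = source_dir / "target_patched"
    if not build_target(source_dir, binary):
        return "patch rejected: does not compile"
    if crash_reproduced(binary, crash_input):
        return "patch rejected: crash still reproduces"
    return "patch accepted: reproducer no longer crashes"
```

A full harness would additionally check that the patch preserves intended behavior, for example by running the project's test suite, since merely silencing the crash is not the same as a correct fix.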
Meta also launched the Llama for Defenders program to support AI developers and partner organizations. The initiative provides access to early-access and closed AI solutions for security challenges such as detecting AI-generated content used in phishing, fraud, and scams. Additionally, Meta previewed WhatsApp’s Private Processing technology, which lets users benefit from AI features while preserving privacy by handling requests in a secure, confidential environment.