Google has established an AI Red Team, focusing on simulating attacks on artificial intelligence (AI) systems, and released a comprehensive report detailing common attack types and key takeaways.
The AI Red Team complements traditional red teams with specialized AI expertise to execute complex technical attacks on AI systems. The report highlights prompt engineering, a widely-used AI attack method where attackers manipulate prompts to influence the system’s responses, potentially bypassing security measures like phishing detection in a webmail application that uses AI.
The report illustrates another example involving data used to train large language models (LLMs), such as ChatGPT. Despite efforts to remove personal and sensitive information, researchers demonstrated the extraction of personal data from LLMs, emphasizing the need for data protection in AI systems.
Moreover, the report raises concerns about AI autocomplete features, which an attacker can exploit to obtain private information by crafting sentences prompting the AI to reveal sensitive details about an individual.
To enhance AI security, Google advises traditional red teams to collaborate with AI experts, fostering realistic adversarial simulations. While traditional security controls can mitigate certain risks, addressing red team findings may pose challenges, requiring multi-layered security models to counter various AI attacks effectively.
The report serves as a valuable resource for organizations seeking to bolster the protection of their AI systems and highlights the importance of securing data, implementing robust controls, and integrating specialized AI subject matter expertise for better defense against emerging AI threats.