AI Attack Hides Prompts In Images

August 27, 2025

Researchers from Trail of Bits have developed a novel attack that exploits how AI systems process images, allowing malicious actors to steal sensitive user data. The method embeds hidden instructions within high-resolution images: invisible to a person viewing the image, they become readable to an AI system after the image is resampled, a routine step used to reduce resolution and file size. The attack leverages the fact that many AI systems, particularly large language models (LLMs) that accept image inputs, automatically downscale images for efficiency. It builds on the concept of an image-scaling attack in machine learning, presented in a 2020 USENIX paper by researchers at TU Braunschweig, a German university. The technique turns the system’s own preprocessing against it, creating a covert channel for malicious commands.

The core of the attack lies in the specific resampling algorithms used to downscale images. When an image is uploaded to an AI system, it is transformed to a lower resolution using, depending on the system, methods such as nearest-neighbor, bilinear, or bicubic interpolation. The Trail of Bits researchers crafted a high-resolution image in which specific patterns and dark areas are designed to transform into visible text when processed by these algorithms. For example, under bicubic downscaling, a particular arrangement of dark pixels in the malicious image turns into red areas from which hidden black text emerges. This hidden text acts as a malicious prompt, which the AI model interprets as part of the user’s legitimate input, merging it with the original instructions.
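The general principle can be illustrated with a toy sketch. The snippet below is not Trail of Bits’ actual technique (which targets bicubic resampling); it shows the simpler nearest-neighbor case, where a naive downscaler samples one pixel per block, so an attacker only needs to plant payload values at exactly the sampled positions while the rest of the image carries an innocuous cover:

```python
import numpy as np

SCALE = 4  # downscale factor: each 4x4 block becomes one output pixel

# A 12x12 "high-resolution" cover image: uniform mid-gray everywhere.
hi_res = np.full((12, 12), 200, dtype=np.uint8)

# Hypothetical payload the attacker wants to survive downscaling
# (in a real attack these pixels would render as prompt text).
payload = np.array([[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]], dtype=np.uint8)

# Plant payload values only at the pixels a stride-based nearest-neighbor
# downscaler will sample: the center pixel of each 4x4 block.
hi_res[SCALE // 2::SCALE, SCALE // 2::SCALE] = payload

# Naive nearest-neighbor downscale: sample one pixel per 4x4 block.
lo_res = hi_res[SCALE // 2::SCALE, SCALE // 2::SCALE]

print(lo_res)  # the 3x3 result is exactly the payload
```

Only 9 of the 144 high-resolution pixels differ from the cover, so the full-size image looks essentially uniform to a human, yet the downscaled version the model sees is pure payload. Bilinear and bicubic filters average neighborhoods instead of picking single pixels, which is why the real attack requires more carefully crafted patterns.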

From the user’s perspective, the attack is effectively undetectable. The image they see on screen appears normal, giving them no reason to suspect that hidden instructions are being executed in the background. The malicious prompt, once revealed by the downscaling process, is seamlessly integrated into the user’s query, allowing the attacker to make the AI system perform actions without the user’s consent or knowledge. The attack is particularly effective because it preys on a seemingly benign and routine part of the image-processing pipeline: it bypasses conventional security measures that look for malicious code or text in the initial user input, since the harmful instructions only become apparent later in the process.

The potential consequences of a successful attack are significant, ranging from data leakage to unauthorized actions. The malicious prompt can instruct the AI to perform risky actions, such as exfiltrating sensitive data to an external location. A compelling example provided by the researchers involved the Gemini CLI and Zapier MCP. By embedding a hidden prompt in an image, they were able to trick the model into exfiltrating Google Calendar data to a specified email address. This was possible because the Zapier MCP was configured with trust=True, which allowed it to approve tool calls without user confirmation. This specific example highlights the potential for such attacks to be integrated into larger systems and workflows, leading to serious security vulnerabilities.

Protecting against this kind of attack requires a multi-layered approach. A key mitigation strategy is to improve how AI systems handle image processing, particularly the downscaling process. Instead of simply relying on standard resampling algorithms, developers could implement more robust methods that do not produce the aliasing artifacts that make these attacks possible. Another important step is to enhance the security protocols for AI tool calls. Systems should require explicit user confirmation for sensitive actions, even when using platforms like Zapier. Developers should be cautious about configurations that automatically approve tool calls without verification. Additionally, ongoing research into prompt injection and covert channels is crucial to stay ahead of these evolving threats and develop more resilient AI systems.
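The confirmation requirement described above can be sketched as a simple gate in front of tool dispatch. This is a minimal illustration, not Zapier’s or Gemini CLI’s actual API; the framework, the tool names, and the `SENSITIVE_TOOLS` set are all hypothetical:

```python
# Sensitive operations that must never be auto-approved (illustrative names).
SENSITIVE_TOOLS = {"send_email", "delete_event", "export_calendar"}

def execute_tool(name, args, confirm=input):
    """Run a tool call, requiring explicit user approval for sensitive tools
    instead of auto-approving them (the trust=True failure mode)."""
    if name in SENSITIVE_TOOLS:
        answer = confirm(f"Allow tool call {name}({args})? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "denied", "tool": name}
    # ... dispatch to the real tool here; stubbed out for this sketch.
    return {"status": "executed", "tool": name, "args": args}

# A prompt hidden in an image cannot silently trigger exfiltration,
# because the user sees and declines the unexpected request:
result = execute_tool("export_calendar", {"to": "attacker@example.com"},
                      confirm=lambda _: "n")  # user declines
print(result["status"])  # prints "denied"
```

The key design choice is that the gate sits between the model’s tool request and its execution, so even a successfully injected prompt still surfaces as a visible, deniable request rather than a silent action.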

Reference:

  • Researchers find AI attack that hides data theft prompts inside downscaled images
Tags: August 2025, Cyber Alerts, Cyber Alerts 2025, Cyberattack, Cybersecurity
    © 2025 | CyberMaterial | All rights reserved
