A recent study has uncovered a new side-channel attack, called “InputSnatch,” that poses a serious threat to user privacy in large language models (LLMs). The attack exploits timing differences introduced by the cache-sharing mechanisms commonly used to speed up LLM inference. While these optimizations improve performance, they inadvertently allow attackers to infer private user inputs from response times. The finding highlights the trade-off between optimizing LLM performance and preserving user privacy, particularly in sensitive contexts such as healthcare, finance, and legal services.
The InputSnatch attack takes advantage of both prefix caching and semantic caching. These techniques, designed to speed up LLM inference, can inadvertently leak information about a user’s input: by measuring how long the LLM takes to respond to a query, an attacker can determine how much of the input prefix was already cached, potentially revealing private or sensitive information. The researchers demonstrated that InputSnatch can reconstruct user inputs with alarming accuracy, posing a serious risk to user confidentiality.
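To make the timing channel concrete, the following minimal sketch simulates a toy inference server with a shared prefix cache and an attacker who times probe queries to recover a previously cached prefix one token at a time. Everything here is an illustrative assumption: the cache layout, the per-token prefill cost, the tiny vocabulary, and the function names (`serve`, `timed`) are invented for this sketch and are not the system or code used in the study.

```python
import time

# --- Toy victim: inference server with a shared prefix cache (illustrative only) ---
PREFIX_CACHE = set()     # stands in for precomputed KV-cache entries, keyed by token prefix
PREFILL_COST = 0.005     # assumed seconds of prefill work per uncached token

def serve(prompt_tokens):
    """Simulate inference: tokens covered by a cached prefix skip prefill work."""
    hit = 0
    while hit < len(prompt_tokens) and tuple(prompt_tokens[:hit + 1]) in PREFIX_CACHE:
        hit += 1
    time.sleep(PREFILL_COST * (len(prompt_tokens) - hit))   # only uncached tokens cost time
    for n in range(1, len(prompt_tokens) + 1):              # new prefixes are cached for later requests
        PREFIX_CACHE.add(tuple(prompt_tokens[:n]))
    return "response"

# --- Attacker: time probes and keep the candidate that still hits the cache ---
def timed(prompt_tokens):
    start = time.perf_counter()
    serve(prompt_tokens)
    return time.perf_counter() - start

# A victim's earlier query leaves its prefixes in the shared cache.
victim_query = "patient reports chest pain and shortness of breath".split()
serve(victim_query)

# The attacker extends a guessed prefix one token at a time; the candidate that
# responds fastest is the one already cached by the victim's query.
vocabulary = ["patient", "reports", "chest", "pain", "billing", "invoice", "and"]
recovered = []
for _ in range(4):
    timings = {w: timed(recovered + [w]) for w in vocabulary}
    recovered.append(min(timings, key=timings.get))

print("recovered prefix guess:", " ".join(recovered))
```

In this toy setting the cache-hit probe finishes with essentially no prefill delay while every miss pays a per-token cost, which is the latency gap the attack measures; real deployments add network and scheduling noise, which the study addresses with more careful measurement than this sketch shows.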
In their experiments, the team showed that InputSnatch achieves high success rates across a range of scenarios. The attack correctly determined cache-hit prefix lengths 87.13% of the time. In more sensitive applications, such as medical question-answering systems, it extracted exact user inputs with a 62% success rate, and in legal consultation services it recovered semantic information with near-perfect accuracy, raising concerns about misuse of the technique in high-stakes industries.
To address these risks, the researchers call on LLM developers to reassess their caching strategies and implement stronger privacy-preserving measures. They argue that user privacy must be weighed alongside performance gains, especially as LLMs are integrated into critical sectors. The study serves as a wake-up call for the AI community to confront the balance between performance and privacy, so that LLM-powered applications can be both effective and secure for users across industries.
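One direction such privacy-preserving measures could take is sketched below: scoping cache entries to a single user so one user’s queries cannot produce cache hits for another, and quantizing response latency so any residual hits leak less through timing. This is an illustrative sketch assuming a simple key-value prefix cache; the helper names (`scoped_cache_key`, `pad_to_bucket`) and the bucket size are assumptions, not mitigations proposed or evaluated in the study.

```python
import hashlib
import time

BUCKET = 0.05  # assumed latency quantum in seconds; coarser buckets leak less but add delay

def scoped_cache_key(user_id: str, prefix_tokens: tuple) -> str:
    """Bind a cache entry to one user so hits cannot cross user boundaries."""
    raw = user_id + "\x00" + "\x00".join(prefix_tokens)
    return hashlib.sha256(raw.encode()).hexdigest()

def pad_to_bucket(start: float) -> None:
    """Round total handling time up to the next bucket to blunt the timing signal."""
    elapsed = time.perf_counter() - start
    time.sleep((-elapsed) % BUCKET)
```

Both measures embody the trade-off the authors describe: per-user scoping gives up the cross-user sharing that makes caching effective, and latency padding deliberately adds delay in exchange for a weaker timing signal.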