A new vulnerability named ‘LeftoverLocals’ has been identified, impacting graphics processing units (GPUs) from major manufacturers such as AMD, Apple, Qualcomm, and Imagination Technologies. Tracked as CVE-2023-4969, this security flaw allows malicious actors to retrieve data from the local memory space of vulnerable GPUs. Researchers at Trail of Bits, Tyler Sorensen and Heidy Khlaaf, discovered LeftoverLocals and reported it to the vendors before publicly disclosing the technical details. The vulnerability stems from incomplete memory isolation in some GPU frameworks, enabling one kernel to read values from local memory written by another kernel.
LeftoverLocals poses a significant security risk, particularly for large language model (LLM) and machine learning (ML) workloads. Adversaries can exploit the flaw by running a GPU compute application, written with a framework such as OpenCL, Vulkan, or Metal, that reads data left in GPU local memory. The researchers created a proof of concept (PoC) demonstrating that an attacker can recover up to 5.5 MB of data per GPU invocation, depending on the GPU framework used. In some scenarios, an attacker could retrieve sensitive information about the victim’s computations, including model inputs, outputs, weights, and intermediate computations.
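The attack pattern described above can be illustrated with a toy model. The sketch below is not GPU code and is not the researchers' PoC: `ToyGPU`, `victim_kernel`, and `listener_kernel` are hypothetical names, and a plain Python list stands in for the local memory scratchpad. It only shows the core idea, which is that a second kernel can read values a first kernel left behind when the device does not wipe local memory between launches.

```python
class ToyGPU:
    """Toy stand-in for a vulnerable GPU: local memory is shared across
    kernel launches and never cleared (an assumption of this model)."""

    def __init__(self, local_words=8):
        self.local_memory = [0] * local_words

    def launch(self, kernel):
        # Every kernel receives the same scratchpad, untouched since the
        # previous launch.
        return kernel(self.local_memory)


def victim_kernel(local_mem):
    # The victim stages sensitive values (for example, model data) in
    # local memory and exits without cleaning up.
    secret = [11, 22, 33, 44]
    for i, value in enumerate(secret):
        local_mem[i] = value
    return sum(local_mem[:len(secret)])


def listener_kernel(local_mem):
    # The attacker reads local memory without writing to it first and
    # copies out whatever is there.
    return list(local_mem)


gpu = ToyGPU()
gpu.launch(victim_kernel)
leaked = gpu.launch(listener_kernel)
print(leaked)  # [11, 22, 33, 44, 0, 0, 0, 0]
```

In this model the attacker needs no special privileges beyond the ability to launch a kernel on the same device, which mirrors the threat scenario the researchers describe.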
Mitigation is uneven: some vendors have already released fixes, while others are still working on defense mechanisms. Apple’s latest iPhone 15 is unaffected, and patches are available for A17 and M3 processors, though M2-powered computers still face the issue. AMD is actively investigating mitigation strategies for vulnerable GPU models. Qualcomm released a patch for some chips, but others remain vulnerable. Imagination released a fix in December 2023, but Google warned in January 2024 that some GPUs are still impacted. Trail of Bits recommends implementing an automatic local memory clearing mechanism between kernel calls to ensure the isolation of sensitive data, despite the potential performance overhead. Other suggested mitigations include avoiding multi-tenant GPU environments in security-critical scenarios and applying user-level measures, such as modifying kernels so that they clear the local memory they used before exiting.
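The recommended mitigation, clearing local memory between kernel calls, can be sketched in the same toy style. This is a simulation under stated assumptions rather than a driver implementation: `ToyDevice` and its `clear_between_kernels` flag are hypothetical, and the zeroing loop stands in for whatever mechanism a vendor actually ships.

```python
class ToyDevice:
    """Toy GPU model with an optional clear-on-exit mitigation.
    Hypothetical illustration only, not a real driver mechanism."""

    def __init__(self, local_words=8, clear_between_kernels=False):
        self.local_memory = [0] * local_words
        self.clear_between_kernels = clear_between_kernels

    def launch(self, kernel):
        result = kernel(self.local_memory)
        if self.clear_between_kernels:
            # Overwrite every word after each kernel finishes. The extra
            # writes are the source of the performance overhead.
            for i in range(len(self.local_memory)):
                self.local_memory[i] = 0
        return result


def victim_kernel(local_mem):
    # Leaves sensitive values behind in local memory.
    for i, value in enumerate([11, 22, 33, 44]):
        local_mem[i] = value


def listener_kernel(local_mem):
    # Dumps whatever the previous kernel left in local memory.
    return list(local_mem)


def run_attack(device):
    device.launch(victim_kernel)
    return device.launch(listener_kernel)


unpatched = run_attack(ToyDevice())
patched = run_attack(ToyDevice(clear_between_kernels=True))
print(unpatched)  # [11, 22, 33, 44, 0, 0, 0, 0]
print(patched)    # [0, 0, 0, 0, 0, 0, 0, 0]
```

The cost of the clearing step grows with the size of local memory and the number of kernel launches, which is why the recommendation comes with a caveat about overhead.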