Google has launched a significant new privacy-enhancing technology called Private AI Compute, designed to process artificial intelligence (AI) queries within a secure, cloud-based platform. The company's core goal with this infrastructure is to unlock the full speed and power of its Gemini cloud models for advanced AI experiences while guaranteeing that user data remains private and inaccessible, even to Google itself. The approach tackles a long-standing challenge: how to harness cutting-edge, high-speed cloud computation without compromising security and user confidentiality.
Private AI Compute operates as a secure, fortified space for handling sensitive user data. It matches the privacy assurances of on-device processing while extending them with the robust capabilities of Google's cloud AI. The system is powered by custom hardware, specifically Trillium Tensor Processing Units (TPUs) and Titanium Intelligence Enclaves (TIE). This specialized hardware foundation is what allows Google to deploy its most advanced frontier AI models without conceding security or privacy, striking a balance between performance and protection.
The underlying infrastructure for Private AI Compute, including the CPU and TPU workloads (referred to as trusted nodes), relies on an AMD-based hardware Trusted Execution Environment (TEE) that encrypts and isolates memory from the host system. Crucially, the technology giant has implemented measures to ensure that only attested workloads can run on these trusted nodes, and all administrative access to those workloads is completely severed. The nodes are also structurally hardened against physical attacks aimed at data exfiltration.
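To make the attestation gate concrete, here is a minimal Go sketch of how a launcher might refuse to run anything but attested workloads on a trusted node. Everything in it (the AttestationReport type, admitWorkload, the reference table) is an illustrative assumption rather than Google's actual code; real TEE reports carry far more than a single measurement.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// AttestationReport is a simplified stand-in for the evidence a hardware
// TEE (such as an AMD SEV-SNP report) produces for a loaded workload.
type AttestationReport struct {
	Measurement []byte // hash of the workload image, taken by the TEE
}

func sha256Sum(b []byte) []byte {
	h := sha256.Sum256(b)
	return h[:]
}

// referenceMeasurements is an allowlist of known-good workload builds.
var referenceMeasurements = map[string][]byte{
	"inference-server-v1": sha256Sum([]byte("inference-server-v1 image")),
}

// admitWorkload refuses to run any workload whose TEE-reported measurement
// does not match an internal reference value.
func admitWorkload(name string, report AttestationReport) error {
	want, ok := referenceMeasurements[name]
	if !ok {
		return errors.New("unknown workload: no reference measurement")
	}
	if !bytes.Equal(report.Measurement, want) {
		return errors.New("measurement mismatch: workload is not attested")
	}
	return nil // only now may the workload start on the trusted node
}

func main() {
	report := AttestationReport{Measurement: sha256Sum([]byte("inference-server-v1 image"))}
	if err := admitWorkload("inference-server-v1", report); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("workload admitted to trusted node")
}
```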
The system also incorporates peer-to-peer attestation and encryption between the trusted nodes. This ensures that user data is decrypted and processed strictly within the boundaries of the secure execution environment, shielding it from the broader Google infrastructure. As part of this process, each workload cryptographically validates the credentials of the other, establishing mutual trust within the protected space. A connection is established only if the node's attestation validates against internal reference values, which prevents any connection from untrusted components and safeguards user information.
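The mutual validation step might look roughly like the sketch below. The Evidence type, the hard-coded reference values, and the validate helper are hypothetical, and the X25519 key agreement (from Go's standard crypto/ecdh package) merely stands in for whatever channel establishment the production system performs.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"errors"
	"fmt"
)

// Evidence is a stand-in for a node's hardware-signed attestation report,
// carrying a public key bound to the attested workload.
type Evidence struct {
	NodeID      string
	Measurement []byte
	PubKey      *ecdh.PublicKey
}

// trustedReferences maps node identities to expected measurements.
var trustedReferences = map[string][]byte{
	"frontend-node":  []byte("frontend-measurement"),
	"inference-node": []byte("inference-measurement"),
}

// validate rejects any peer whose attestation does not match the internal
// reference values, so no connection forms with untrusted components.
func validate(peer Evidence) error {
	want, ok := trustedReferences[peer.NodeID]
	if !ok || !bytes.Equal(peer.Measurement, want) {
		return errors.New("attestation failed for " + peer.NodeID + ": refusing connection")
	}
	return nil
}

func main() {
	curve := ecdh.X25519()
	aPriv, _ := curve.GenerateKey(rand.Reader) // error handling elided for brevity
	bPriv, _ := curve.GenerateKey(rand.Reader)

	a := Evidence{"frontend-node", []byte("frontend-measurement"), aPriv.PublicKey()}
	b := Evidence{"inference-node", []byte("inference-measurement"), bPriv.PublicKey()}

	// Each side validates the other's evidence before trusting its key.
	if err := validate(a); err != nil {
		fmt.Println(err)
		return
	}
	if err := validate(b); err != nil {
		fmt.Println(err)
		return
	}

	// Only after mutual attestation do the nodes derive a shared session
	// key; user data is then decrypted strictly inside the attested boundary.
	aShared, _ := aPriv.ECDH(b.PubKey)
	bShared, _ := bPriv.ECDH(a.PubKey)
	fmt.Println("mutually attested channel established, keys agree:", bytes.Equal(aShared, bShared))
}
```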
The complete operational flow begins with the user's client establishing an encrypted connection to a frontend server and performing bidirectional attestation. The client uses an Oak end-to-end encrypted attested session to confirm that the server is genuine and unmodified. The frontend server then sets up an encrypted channel, using Application Layer Transport Security (ALTS), with other services in the scalable inference pipeline, which ultimately communicate with the model servers running on the hardened TPU platform. The entire architecture is ephemeral by design: even an attacker who gains privileged access cannot obtain past data, because all inputs, model inferences, and computations are discarded immediately upon completion of the user session.
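The end-to-end flow, with the ephemeral teardown made explicit, can be sketched as follows. The function names (attestAndConnect, runInference) and the session type are placeholders rather than Oak or ALTS APIs; only the ordering of steps and the discard-on-completion behavior mirror the description above.

```go
package main

import "fmt"

// session holds per-request data that must not outlive the user session.
type session struct {
	input     []byte
	inference []byte
}

// wipe zeroes the session buffers, mirroring the ephemeral-by-design
// property: inputs and inferences are discarded when the session ends.
func (s *session) wipe() {
	for i := range s.input {
		s.input[i] = 0
	}
	for i := range s.inference {
		s.inference[i] = 0
	}
	fmt.Println("session data discarded")
}

// attestAndConnect stubs step 1: the client opens an encrypted connection
// to the frontend and performs bidirectional attestation (in the real
// system, an Oak end-to-end encrypted attested session proves the server
// is genuine and unmodified).
func attestAndConnect(frontend string) bool {
	return true
}

// runInference stubs steps 2 and 3: the frontend relays the request over
// ALTS-encrypted channels to the inference pipeline, whose services reach
// the model servers on the hardened TPU platform. Here it simply echoes.
func runInference(input []byte) []byte {
	return append([]byte("result for: "), input...)
}

func main() {
	if !attestAndConnect("frontend.example") {
		return
	}
	s := &session{input: []byte("user query")}
	defer s.wipe() // step 4: everything is discarded when the session ends

	s.inference = runInference(s.input)
	fmt.Printf("%s\n", s.inference)
}
```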