On July 30, 2024, Microsoft experienced a significant global outage that disrupted its Azure cloud services and Microsoft 365 products for roughly eight hours. The outage, which began at approximately 11:45 UTC and ended by 19:43 UTC, was triggered by a Distributed Denial-of-Service (DDoS) attack. The attack overwhelmed key components of Microsoft’s infrastructure, including Azure Front Door (AFD) and Azure Content Delivery Network (CDN), causing intermittent errors, timeouts, and latency spikes for users around the world.
In response to the attack, Microsoft made a series of network configuration changes and failed over to alternative networking paths. This initial mitigation restored most services by 14:10 UTC, but some customers continued to see reduced availability and degraded performance, which Microsoft began addressing at around 18:00 UTC. It then deployed a more refined mitigation in phases, first to regions in Asia Pacific and Europe and then to the Americas. Services were fully restored by 19:43 UTC, and Microsoft declared the incident resolved at 20:48 UTC.
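The durations implied by this timeline are easy to check with a little date arithmetic. The following is a minimal Python sketch that uses only the timestamps reported above; the milestone labels and the `elapsed` helper are illustrative names, not Microsoft terminology.

```python
from datetime import datetime, timezone

# Timestamps (UTC, 30 July 2024) as reported in the timeline above.
# The milestone labels are illustrative, not Microsoft's terminology.
DAY = (2024, 7, 30)
MILESTONES = {
    "impact_start": (11, 45),
    "initial_mitigation": (14, 10),
    "service_restored": (19, 43),
    "resolution_declared": (20, 48),
}


def at(hour, minute):
    """Build a timezone-aware UTC datetime on the day of the incident."""
    return datetime(*DAY, hour, minute, tzinfo=timezone.utc)


def elapsed(start, end):
    """Return the time between two milestones as (hours, minutes)."""
    delta = at(*MILESTONES[end]) - at(*MILESTONES[start])
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


for name in ("initial_mitigation", "service_restored", "resolution_declared"):
    hours, minutes = elapsed("impact_start", name)
    print(f"impact_start -> {name}: {hours}h {minutes:02d}m")
```

By this arithmetic, user-facing impact lasted just under eight hours, and about nine hours passed between the start of the outage and the declared resolution.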
The global outage had considerable repercussions for businesses and services that rely on Microsoft’s cloud infrastructure. Notably, Starbucks in the United States was forced to disable its mobile ordering system because of the Azure disruptions. The incident followed a series of other recent issues affecting Microsoft’s services, and it has raised concerns about the resilience of cloud infrastructure and the risks of depending on centralized services.
To address the fallout from this incident, Microsoft has committed to a thorough internal review. The company plans to publish a Preliminary Post-Incident Review within 72 hours, followed by a Final Post-Incident Review within 14 days. These reviews are expected to detail the cause of the outage, the effectiveness of the response, and the lessons learned for improving the reliability and security of its cloud services. The commitment underscores how much defending against large-scale attacks and maintaining service continuity depend on continuous improvement.
Reference: