Microsoft recently dealt with a significant service disruption that primarily affected users across North America. The outage, classified as a critical service issue and tracked under MO1138499, prevented customers from accessing key services such as Office.com and the company’s AI-powered assistant, Copilot. Users first reported the incident on DownDetector, describing server connection problems and login failures. Although most reports originated from North America, Microsoft’s investigation sought to determine the full scope of the impact.
Initial efforts by Microsoft’s engineering teams focused on data collection and internal reproduction of the issue. The company was actively reviewing service telemetry and network diagnostics to pinpoint the exact cause of the problem. During this period, Microsoft advised customers that they could still access Copilot through alternative methods, such as copilot.microsoft.com, the Microsoft Copilot for the Microsoft 365 app, or other Microsoft 365 applications like Teams. This provided a temporary workaround for affected users while the company worked to find a permanent solution.
The investigation led Microsoft to identify a specific configuration change that had been deployed around the time the first outage reports appeared. Out of caution, and as a potential mitigation, the company decided to revert this recent update. The rollback was a key step in its efforts to restore service and reduce user impact. The company also continued to analyze network traces, authentication flows, and Content Delivery Network (CDN) interactions to fully understand the root cause.
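The correlation step described above, matching recent deployments against the time of the first reports, can be illustrated with a minimal Python sketch. This is not Microsoft's tooling: the `ConfigChange` type, the `recent_changes` helper, the change IDs, and all timestamps are hypothetical and exist only to show the idea of shortlisting rollback candidates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConfigChange:
    change_id: str        # hypothetical identifier, not a real Microsoft change ID
    deployed_at: datetime


def recent_changes(changes, first_report, lookback=timedelta(hours=1)):
    """Return changes deployed in the lookback window before the first report.

    Results are ordered most recent first, so the change closest to the
    start of the incident is the first rollback candidate.
    """
    window_start = first_report - lookback
    candidates = [c for c in changes if window_start <= c.deployed_at <= first_report]
    return sorted(candidates, key=lambda c: c.deployed_at, reverse=True)


# Illustrative data only: these timestamps are invented for the example.
first_report = datetime(2024, 1, 1, 12, 0)
changes = [
    ConfigChange("cfg-A", datetime(2024, 1, 1, 9, 0)),    # too early
    ConfigChange("cfg-B", datetime(2024, 1, 1, 11, 50)),  # in window
    ConfigChange("cfg-C", datetime(2024, 1, 1, 11, 20)),  # in window
    ConfigChange("cfg-D", datetime(2024, 1, 1, 12, 30)),  # after the first report
]
suspects = recent_changes(changes, first_report)
print([c.change_id for c in suspects])
```

A time-window match like this only narrows the list of suspects. It does not establish cause, which is consistent with Microsoft continuing its root-cause analysis after the rollback.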
After several hours of work, Microsoft confirmed that the reversion of the configuration change was complete and that the issue was resolved for all affected users. The company advised customers to refresh their web browsers so the fix would take effect. This incident highlights the potential for seemingly minor system changes to cause widespread disruption in cloud-based services.
This outage is not Microsoft’s first recent service incident. The company had recently provided a workaround for “couldn’t connect” errors in Microsoft Teams and, two months earlier, had mitigated an incident affecting Microsoft 365 authentication features in the EMEA and APAC regions. Such events underscore the complexity of managing a global cloud infrastructure and the constant need for robust testing and monitoring to prevent future disruptions.