Azure Outage Comes a Week After a Cloud DNS Error Disrupted AWS Users

Microsoft’s Azure cloud and Microsoft 365 services suffered an outage at noon on Wednesday because of a configuration error, hours before the company's quarterly earnings call and about a week after rival AWS underwent a widespread outage that shut down applications and services for most of a day.
Microsoft said Wednesday on its Azure Status page that “an inadvertent configuration change” triggered the outage, which resulted in “latencies, timeouts and errors” for users.
To restore the Azure Front Door system, Microsoft deployed its “last known good configuration” and posted at 5:40 p.m. that the recovery should be complete by 7:20 p.m. Wednesday. Affected Azure services included App Service, Azure Active Directory B2C, Azure Communication Services, Azure Databricks, Azure Healthcare APIs, Azure Maps, Azure Portal, Azure SQL Database, Container Registry, Media Services, Microsoft Defender External Attack Surface Management, Microsoft Entra ID, Microsoft Purview, Microsoft Sentinel, Video Indexer and Virtual Desktop.
Microsoft said it immediately began recovering nodes and re-routing traffic through healthy nodes, but the process takes time and “some requests may still land on unhealthy nodes, resulting in intermittent failures or reduced availability until more nodes are fully restored.” As the company reloads configurations and rebalances traffic across working nodes, the recovery has to proceed gradually, ensuring “stability and preventing overload as dependent services recover.”
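The pattern Microsoft describes, shifting load back onto recovering capacity in small increments rather than all at once, can be sketched in a few lines of Python. The node names, weights and the cap on how much traffic moves per step below are all hypothetical; this is a generic illustration of gradual rebalancing, not Azure Front Door's actual tooling.

```python
# Minimal sketch of gradual traffic rebalancing: traffic is shifted toward
# healthy nodes in small, capped steps so recovering infrastructure is not
# overloaded. Node names and weights are invented for illustration.

import time

# Hypothetical edge-node pool: health status and current share of traffic.
nodes = {
    "edge-1": {"healthy": True,  "weight": 0.20},
    "edge-2": {"healthy": True,  "weight": 0.20},
    "edge-3": {"healthy": False, "weight": 0.30},  # still recovering
    "edge-4": {"healthy": False, "weight": 0.30},  # still recovering
}

MAX_SHIFT_PER_STEP = 0.05  # at most 5% of total traffic moves per step


def rebalance_step(nodes):
    """Move a bounded slice of traffic from unhealthy to healthy nodes."""
    unhealthy = [n for n in nodes.values() if not n["healthy"] and n["weight"] > 0]
    healthy = [n for n in nodes.values() if n["healthy"]]
    if not unhealthy or not healthy:
        return False  # nothing left to shift

    # Take a small, capped amount from the most-loaded unhealthy node ...
    donor = max(unhealthy, key=lambda n: n["weight"])
    shift = min(MAX_SHIFT_PER_STEP, donor["weight"])
    donor["weight"] -= shift

    # ... and spread it evenly across the healthy nodes.
    per_node = shift / len(healthy)
    for n in healthy:
        n["weight"] += per_node
    return True


if __name__ == "__main__":
    while rebalance_step(nodes):
        print({name: round(n["weight"], 2) for name, n in nodes.items()})
        time.sleep(0.1)  # in practice, wait for health signals between steps
```

The cap on traffic moved per step is what makes the process "gradual": a real controller would also check health signals between steps before shifting more load.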
The Redmond, Washington-based software giant had scheduled its quarterly earnings call after the market closed on Wednesday. Microsoft’s stock price dropped to $523.25 in after-hours trading, down 3.57% from the market closing price.
The outage came a little over a week after Amazon Web Services dealt with a widespread outage that affected its own services and dozens of its clients for most of the day on Oct. 20. AWS blamed the outage on a domain name system misconfiguration.
AWS said the DNS resolution failure for the DynamoDB API endpoint occurred in its cluster of data centers in Northern Virginia, an area with the world’s largest concentration of cloud infrastructure.
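For readers who want to see what that failure mode looks like from the client side, the short Python probe below attempts to resolve the regional DynamoDB endpoint. The endpoint name shown is the standard public hostname for the us-east-1 (Northern Virginia) region; this is an illustrative check, not AWS's own diagnostic.

```python
# Illustrative DNS probe against the regional DynamoDB endpoint.
# During the Oct. 20 incident, lookups of the us-east-1 endpoint failed to
# resolve, so dependent services could not even open a connection.

import socket

# Public DynamoDB endpoint for the affected region (us-east-1).
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

try:
    records = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({record[4][0] for record in records})
    print(f"{ENDPOINT} resolves to: {addresses}")
except socket.gaierror as err:
    # Roughly the failure clients saw: the name would not resolve at all.
    print(f"DNS resolution failed for {ENDPOINT}: {err}")
```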
The AWS disruption affected several widely used applications and services, including British banks Lloyds and Halifax, as well as British government websites such as Gov.uk and HM Revenue and Customs. AWS said it detected the issue during the first hours of Monday morning, and by 5:27 a.m. EDT it said that most services had been restored. Affected companies also included the Amazon.com storefront, Disney+, Ring doorbells, Venmo, McDonald’s and the encrypted chat app Signal.
