Breaking: AWS Outage Brings Down Major Internet Services
On October 20, 2025, Amazon Web Services experienced a significant disruption that affected numerous major platforms, including Snapchat, Amazon Prime Video, Canva, Reddit, and many other services worldwide. The outage, which began shortly before midnight Pacific Time on October 19, exposed the internet’s heavy dependence on cloud infrastructure and affected millions of users globally.
What Caused the AWS Outage?
The root cause was traced to an unusual race condition in DynamoDB’s automated DNS management system. This technical fault left an empty DNS record for the service’s regional endpoint, preventing applications from connecting to DynamoDB—essentially making the service undiscoverable on the network.
DNS, or Domain Name System, acts as the internet’s address book, converting web addresses into IP addresses so applications and websites can load properly. When this system failed, countless services dependent on AWS infrastructure immediately experienced connectivity issues.
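To make the address book idea concrete, here is a minimal Python sketch of what a client does before it can connect to anything: it asks DNS for the IP addresses behind a hostname. The function name `resolve` and its `lookup` parameter are illustrative assumptions, not part of any AWS tooling. An empty result corresponds to the situation applications faced during the outage: the name exists, but there is no address to connect to.

```python
import socket
from typing import Callable, List


def resolve(hostname: str,
            lookup: Callable[..., list] = socket.getaddrinfo) -> List[str]:
    """Return the IP addresses for hostname, or an empty list if none resolve."""
    try:
        records = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed: the caller has no address to connect to.
        return []
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({record[4][0] for record in records})
```

Calling `resolve("example.com")` on a machine with working DNS would return one or more addresses. The `lookup` parameter exists only so the function can be exercised without network access.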
Technical Details of the Failure
The failure originated in the DNS management system for DynamoDB in the US-EAST-1 region (Northern Virginia). That system consists of two main components: the DNS Planner, which monitors load balancer health and creates traffic distribution plans, and the DNS Enactor, which applies these plans and runs redundantly across multiple availability zones.
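During the incident, one DNS Enactor was delayed and was still applying an outdated DNS plan while another Enactor applied a newer one. According to AWS’s postmortem, the delayed Enactor overwrote the newer plan with the stale one, and the other Enactor’s cleanup process then deleted that stale plan as obsolete, which left an empty DNS record for the endpoint.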
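The following Python sketch is a toy model of this kind of race, not AWS’s implementation. It assumes one plausible interleaving: a fast enactor applies a newer plan, a delayed enactor then overwrites it with a stale plan, and the fast enactor’s cleanup deletes the stale plan that the record now points to. The `Plan` and `Endpoint` classes, the version numbers, and the `guarded` version check are all hypothetical and are included only to show how a simple staleness check would change the outcome in this model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Plan:
    version: int
    addresses: List[str]


@dataclass
class Endpoint:
    """Toy DNS record that points at one stored plan."""
    plans: Dict[int, Plan] = field(default_factory=dict)
    active_version: Optional[int] = None

    def resolve(self) -> List[str]:
        plan = self.plans.get(self.active_version)
        # If the active plan was deleted, the record is empty.
        return plan.addresses if plan else []

    def apply(self, plan: Plan, guarded: bool) -> bool:
        # Optional guard: refuse to overwrite a newer plan with an older one.
        if (guarded and self.active_version is not None
                and plan.version < self.active_version):
            return False
        self.plans[plan.version] = plan
        self.active_version = plan.version
        return True

    def clean_up(self, newest_applied: int) -> None:
        # Delete every stored plan older than the one the enactor applied.
        for version in [v for v in self.plans if v < newest_applied]:
            del self.plans[version]


def simulate(guarded: bool) -> List[str]:
    endpoint = Endpoint()
    old = Plan(1, ["192.0.2.1"])
    new = Plan(2, ["192.0.2.2"])
    endpoint.apply(new, guarded)    # fast enactor applies the newer plan
    endpoint.apply(old, guarded)    # delayed enactor applies the stale plan
    endpoint.clean_up(new.version)  # fast enactor's cleanup removes old plans
    return endpoint.resolve()
```

In this model, `simulate(guarded=False)` ends with an empty record because the record still points at the deleted stale plan, while `simulate(guarded=True)` keeps the newer plan’s address.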
Timeline of the AWS Outage
The disruption unfolded across several critical hours:
- 11:49 PM PDT (October 19): AWS began experiencing increased error rates and latencies in the US-EAST-1 region
- 12:11 AM PDT (October 20): AWS started investigating the issue
- 12:26 AM PDT: Engineers identified DNS resolution issues as the trigger
- 2:24 AM PDT: The DynamoDB DNS issue was resolved
- 12:28 PM PDT: Significant recovery was observed across services
- 3:01 PM PDT: All AWS services returned to normal operations
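The gap between the DNS fix and full recovery is easier to see as arithmetic. This short Python sketch takes three timestamps from the timeline above (all PDT) and computes the elapsed time between them. The dictionary keys and helper names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps taken from the timeline above, all in PDT.
events = {
    "errors_begin": datetime(2025, 10, 19, 23, 49),
    "dns_resolved": datetime(2025, 10, 20, 2, 24),
    "full_recovery": datetime(2025, 10, 20, 15, 1),
}


def elapsed(start: str, end: str) -> timedelta:
    """Time between two named events."""
    return events[end] - events[start]


def fmt(delta: timedelta) -> str:
    """Format a duration as hours and minutes."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


print(fmt(elapsed("errors_begin", "dns_resolved")))   # 2h 35m
print(fmt(elapsed("dns_resolved", "full_recovery")))  # 12h 37m
print(fmt(elapsed("errors_begin", "full_recovery")))  # 15h 12m
```

By these figures, the underlying DNS issue lasted about two and a half hours, while the knock-on recovery took roughly five times as long.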
Which Services Were Affected?
The outage had widespread impact across multiple sectors:
Social Media and Communication Platforms
User reports on Downdetector showed major disruptions to Snapchat, Reddit, Signal, and other communication platforms, with users unable to send messages, load content, or access basic features.
Entertainment and Gaming
Popular gaming platforms including Roblox and Fortnite experienced server disconnections, while streaming services like Amazon Prime Video and Disney+ faced buffering issues during peak viewing hours.
Business and Productivity Tools
Canva, a widely used graphic design platform, became unresponsive for designers and creators working on projects. The service acknowledged experiencing significantly increased error rates due to issues with its underlying cloud provider.
Financial Services
Cryptocurrency exchange Coinbase, payment app Venmo, and investment platform Robinhood all reported service disruptions, preventing users from accessing their accounts and conducting transactions.
Government and Critical Services
British government websites including Gov.uk and HM Revenue and Customs experienced accessibility issues, while some healthcare systems reported disruptions that raised concerns about cloud reliance for critical operations.
Additional Affected Platforms
Other impacted services included United Airlines, Delta, T-Mobile, Starbucks, McDonald’s app, Lyft, Ring doorbells, Alexa devices, The New York Times, The Wall Street Journal, Duolingo, and Etsy.
The Cascading Effect: How One Service Took Down Hundreds
The disruption began with DynamoDB but quickly cascaded to other AWS services due to their interdependencies. The EC2 DropletWorkflow Manager, which manages physical servers for EC2 instances, relies on DynamoDB and began experiencing failures when it couldn’t establish leases.
Even after engineers identified and fixed the DNS issue, systems that had accumulated state problems during the outage needed additional recovery time. The recovery unfolded across distinct phases, with some services continuing to process message backlogs for several hours after the main issue was resolved.
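One way to reason about this kind of cascade is to treat services as a dependency graph and ask which ones are reachable from the failed component. The Python sketch below does that with a breadth-first search. The dependency map is hypothetical: only the link from the DropletWorkflow Manager to DynamoDB comes from the account above, and every other service name and edge is invented for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "dynamodb": [],
    "droplet-workflow-manager": ["dynamodb"],
    "ec2-launches": ["droplet-workflow-manager"],
    "chat-app": ["ec2-launches"],
    "static-site": [],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failed service plus everything that transitively depends on it."""
    # Invert the map so each service lists its direct dependents.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    impacted = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted
```

With this map, `impacted_by("dynamodb", DEPENDS_ON)` includes every service except `static-site`, which has no path to DynamoDB. Real dependency graphs are far larger and often undocumented, which is part of why the blast radius of an outage is hard to predict in advance.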
The Scale of AWS’s Market Dominance
AWS commands roughly one-third of the global cloud infrastructure market, which explains why the outage had such widespread consequences. When the backbone that powers so much of the internet experiences problems, the ripple effects are felt across virtually every sector.
The incident exposed how little redundancy many services maintain for major disruptions. Even large-scale applications with hundreds of millions of users, such as Snapchat and Duolingo, were offline for hours, unable to divert traffic to backup providers.
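Diverting traffic to a backup can be sketched in a few lines of Python, with the caveat that this is a simplified illustration rather than a production pattern. The function name `call_with_failover` and the idea of passing providers as plain callables are assumptions made for the example. The sketch tries each provider in order and returns the first successful result.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except ConnectionError as error:
            # Remember the failure and move on to the next provider.
            last_error = error
    raise RuntimeError("all providers failed") from last_error
```

The routing logic is the easy part. In practice, a backup provider is only useful if data, credentials, and configuration are already replicated to it, which is where much of the cost and complexity of redundancy lies.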
Economic Impact and Business Consequences
Some estimates suggest the total damage from the outage may reach hundreds of billions of dollars. Individual businesses faced losses in the millions, with small firms and individual creators particularly hard hit by stalled workflows.
Delivery drivers in the UK reported that Amazon warehouses were unable to sign off on packages due to technical faults, highlighting how the outage affected not just digital services but physical operations as well.
What AWS Says About Prevention
In its official postmortem, Amazon stated: “As we continue to work through the details of this event across all AWS services, we will look for additional ways to avoid impact from a similar event in the future, and how to further reduce time to recovery.” AWS has disabled the DynamoDB DNS Planner and DNS Enactor automation worldwide until safeguards can be implemented to prevent the race condition from recurring.
The Broader Cloud Infrastructure Question
The outage raises questions about the heavy reliance on just three major cloud providers—AWS, Microsoft Azure, and Google Cloud—which collectively control over 60 percent of the global cloud infrastructure market. All three companies are based in the United States, and all have experienced significant outages in recent years.
Following a major Google Cloud outage in June 2025 that disrupted services like Spotify and OpenAI, and a Microsoft Azure outage earlier in the year, this AWS incident underscores concerns about centralized cloud infrastructure and the need for greater diversification.
Historical Context: Previous Major Outages
This wasn’t AWS’s first significant disruption:
In December 2021, a widespread outage hit the same US-EAST-1 region, disrupting multiple AWS-connected services. A smaller outage in July 2024 primarily affected Amazon’s Ring camera system.
The most recent global IT outage before this occurred in July 2024, when cybersecurity firm CrowdStrike published a faulty update to its Falcon security software, causing millions of Windows computers worldwide to crash and resulting in airport delays and mass outages that took several days to resolve.
Key Takeaways for Businesses and Users
- Diversification is Critical: Organizations should consider multi-cloud strategies to reduce dependency on a single provider
- DNS Remains Vulnerable: Despite being a fundamental internet technology, DNS continues to be a common point of failure
- Cascading Failures: Modern infrastructure complexity means a single technical fault can have far-reaching consequences
- Recovery Takes Time: Even after root causes are addressed, dependent systems may need hours to fully recover
- Global Impact: Cloud infrastructure failures can affect services and users worldwide, regardless of geographic location
Lessons Learned and Moving Forward
The incident highlighted ongoing concerns about resilience and redundancy in cloud infrastructure. As one analyst noted, the heavy centralization of internet services on a handful of cloud providers means that when problems occur, the impact is amplified across the entire digital ecosystem.
For businesses relying on cloud infrastructure, this outage serves as a stark reminder of the importance of disaster recovery planning, backup systems, and diversified cloud strategies to maintain business continuity during unexpected disruptions.
Conclusion
The October 2025 AWS outage demonstrates both the power and the vulnerability of modern cloud infrastructure. While AWS engineers worked swiftly to identify and resolve the issue, the hours-long disruption affected millions of users and raised critical questions about the resilience of internet infrastructure. As our digital lives become increasingly dependent on cloud services, ensuring robust, diverse, and resilient infrastructure becomes not just a technical priority but a societal necessity.






