AWS outage: what it reveals about the fragility of cloud cybersecurity

Panda Security · Oct 20, 2025 · 3 min read

An outage at the world’s leading cloud infrastructure platform caused a blackout across websites, apps, and social networks that lacked contingency plans. Without a plan B, a failure like this can trigger total paralysis — and even blindness — multiplying the risk of intrusions.

The engine stopped

On the morning of Monday, October 20, 2025, numerous websites, applications, and social networks went dark due to a global outage of Amazon Web Services (AWS), the world’s largest cloud infrastructure platform. In the United States, users were unable to access Amazon, Alexa, Prime Video, Crunchyroll, Canva, Perplexity, and Duolingo; social networks like Snapchat or Goodreads; and games such as Fortnite, Roblox, or Clash Royale. In Europe, several services experienced similar accessibility issues.

“This happens because many invisible pieces of the internet live on AWS,” explains Hervé Lambert, Global Consumer Operations Manager at Panda Security. “When this platform fails, it’s not just a server that goes down — entire basic services collapse, affecting websites, apps, and social networks that rely on them.” In short, “they stop working because they share the same infrastructure and base services — computing, storage, DNS, authentication, and CDN — either directly in AWS or in third parties that depend on it. Without multi-region architecture or contingency plans, the entire user experience — loading, logging in, paying, or posting — falls apart.”
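To make the shared-dependency point concrete, here is a minimal sketch (in Python with boto3) of the multi-region fallback such services lack: the same replicated object is read from a second region when the first stops answering. The bucket names, regions, and key below are hypothetical.

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

# Hypothetical buckets, one per region, kept in sync by replication.
BUCKETS = {"us-east-1": "example-config-use1", "eu-west-1": "example-config-euw1"}
KEY = "app/settings.json"

def read_config() -> bytes:
    """Read the config object from the first region that answers."""
    last_error = None
    for region, bucket in BUCKETS.items():
        s3 = boto3.client(
            "s3",
            region_name=region,
            # Fail fast so a dead region does not stall the whole app.
            config=Config(connect_timeout=2, read_timeout=3,
                          retries={"max_attempts": 1}),
        )
        try:
            return s3.get_object(Bucket=bucket, Key=KEY)["Body"].read()
        except (BotoCoreError, ClientError) as err:
            last_error = err  # this region is unreachable; try the next one
    raise RuntimeError("no region could serve the config") from last_error
```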

“When an outage of this magnitude occurs,” continues Lambert, “some apps can’t serve pages, APIs, or feeds because their compute layer — EC2, EKS, or Lambda — fails at the nodes or control plane. If there’s nowhere to read or store data, the site can’t load or authenticate; logins break because authentication systems like Cognito, STS/AssumeRole, or AWS SSO stop issuing tokens; DNS fails to resolve, or the CDN can’t fetch origin data, so domains respond erratically. Even if an app isn’t hosted on AWS, it still suffers if its providers are — the whole chain behaves like a house of cards.”
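The “house of cards” is easy to picture in code. The sketch below (not the architecture of any affected app) shows the graceful-degradation step many of them were missing: a short timeout on the upstream call plus a last-known-good cache, so a dead dependency yields stale content instead of a blank page. The endpoint URL and cache path are hypothetical.

```python
import json
import time
import urllib.request

API_URL = "https://api.example.com/feed"  # hypothetical upstream dependency
CACHE = "/tmp/feed-cache.json"            # last-known-good copy

def fetch_feed() -> dict:
    try:
        # Short timeout: if the provider is down, find out in 2 s, not 2 min.
        with urllib.request.urlopen(API_URL, timeout=2) as resp:
            data = json.load(resp)
        with open(CACHE, "w") as f:       # refresh the cached copy on success
            json.dump({"saved": time.time(), "data": data}, f)
        return data
    except OSError:                       # timeout, DNS failure, connection refused
        with open(CACHE) as f:            # serve stale data instead of an error
            return json.load(f)["data"]
```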

Why AWS failures ripple across services and apps

Moreover, when AWS fails or degrades, “some companies go blind because their observability depends on that same platform,” warns Lambert. “If tools like CloudWatch, CloudTrail, GuardDuty, SIEMs, dashboards, SNS/SES alerts, or SSO are hosted in the same region, they too go down — leaving websites without metrics, logs, or valid credentials, and therefore exposed.” All of this is preventable “if monitoring, logging, and identity have an emergency exit outside the failure zone.”
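A sketch of what such an emergency exit can look like, assuming hypothetical log group, stream, and webhook names: every critical alert is written to CloudWatch Logs as usual, and a copy always goes to an endpoint hosted outside the failing provider.

```python
import json
import time
import urllib.request

import boto3
from botocore.exceptions import BotoCoreError, ClientError

logs = boto3.client("logs", region_name="us-east-1")    # primary region
FALLBACK_WEBHOOK = "https://oncall.example.net/ingest"  # hosted elsewhere

def emit_alert(message: str) -> None:
    event = {"timestamp": int(time.time() * 1000), "message": message}
    try:
        logs.put_log_events(
            logGroupName="/app/alerts",   # hypothetical group and stream
            logStreamName="prod",
            logEvents=[event],
        )
    except (BotoCoreError, ClientError):
        pass                              # primary observability is down
    # Always ship a copy out of the failure zone as well.
    req = urllib.request.Request(
        FALLBACK_WEBHOOK,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=2)
```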

Many companies, however, centralise everything in a single region and account — “including backups and KMS keys,” notes Lambert. “Without multi-region failover, unavailability is total. Under pressure, some teams open security groups, disable WAFs, or expand IAM permissions to keep systems running — often breaking more things or leaving apps vulnerable.”
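By contrast, getting backups out of the blast radius can start as simply as the sketch below: server-side copies into a bucket in another region (ideally, in a separate account with its own keys). The bucket names are hypothetical.

```python
import boto3

SRC_BUCKET = "example-backups-use1"   # bucket in the primary region
DST_BUCKET = "example-backups-euw1"   # replica bucket in another region
dst = boto3.client("s3", region_name="eu-west-1")

def replicate(keys: list[str]) -> None:
    for key in keys:
        # Server-side copy: the destination-region client pulls from the source.
        dst.copy_object(
            CopySource={"Bucket": SRC_BUCKET, "Key": key},
            Bucket=DST_BUCKET,
            Key=key,
        )
```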

The importance of having a “Plan B”

Why are there no contingency plans if outages are so risky?

“Because they aren’t incentivised — they seem expensive and technically tedious,” summarises Lambert. “Many websites and apps lack a Plan B because their priorities are misaligned: business rewards speed, not resilience; there’s a false sense of security — people believe these things won’t happen to them. Multi-region or multi-account setups, data replication, redundant identities, runbooks, and drills all sound like cost doubling. And many assume AWS won’t fail or that the SLA will cover the loss — which is not true.”

At this point, the role of security by design becomes crucial. Many organisations still don’t integrate cybersecurity from the earliest stages of product or infrastructure development. They often react later with patches instead of building resilient systems from the start — a less effective and ultimately more expensive approach.

To break that cycle, Lambert suggests: “build resilience into KPIs, separate accounts and regions, automate backups and guardrails, and run failover drills. That will always be cheaper than explaining to thousands of users why your service has disappeared.”
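A failover drill can start as small as the sketch below, which reuses the hypothetical BUCKETS and read_config() from the first example: pretend the primary region is down and verify that the fallback actually serves.

```python
def drill() -> None:
    """Simulate losing the primary region and check the fallback works."""
    global BUCKETS
    normal = dict(BUCKETS)                                  # remember topology
    BUCKETS = {r: b for r, b in normal.items() if r != "us-east-1"}
    try:
        assert read_config(), "fallback region failed to serve the config"
        print("drill passed: the fallback region served the data")
    finally:
        BUCKETS = normal                                    # restore

drill()
```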