Last Tuesday, the internet suffered a massive outage, and thousands of well-known sites went down because of a technical glitch on Amazon's servers.
But it wasn't Amazon itself that broke the internet; it was Amazon Web Services (AWS), the division that provides crucial back-end infrastructure for thousands of websites. In this case, the AWS problem lasted almost four hours and left about 100,000 sites either degraded or offline.
Affected sites included Slack, Quora, Imgur, Apple, Yahoo, and Medium until AWS could restore its services. Even isitdownrightnow.com, one of the best-known sites for checking whether other sites are down or running, was inaccessible because of the glitch. It's almost like karma.
But what really happened? Amazon says that one of the company's S3 datacenters in northern Virginia is to blame. The facility, part of the AWS region named US-EAST-1, experienced high error rates while sending and receiving clients' hosted data, and that glitch left all those pages unavailable, excruciatingly slow, or missing features.
The bug made it impossible for some cloud services, like Slack, to run at all in some regions. News sites hosted on AWS, such as The Huffington Post, The Verge, and Business Insider, were also brought down or displayed their pages without any images.
“Imagine your business not being able to run for a day”, head of research for Loup Ventures, Gene Munster, told Reuters. “That’s a big problem”.
Although this failure may seem similar to last October's massive outage, which took down multiple sites including Amazon itself, the causes are completely different. On that occasion, the main cause was a huge Distributed Denial of Service (DDoS) attack, a type of attack in which a large number of compromised devices flood the same infrastructure with traffic at the same time.
This S3 error, despite the surface similarities, did not originate from a hack; it was simply the consequence of a software malfunction.
Errors like this happen all the time, all over the internet, but because AWS has grown so large and the internet's reliance on it has become so concentrated, the effect was enormous.
“Every time there’s a major cloud outage, you occasionally get customers who thought that everything would be magical and forever working”, Gartner analyst Lydia Leong told Ángel González at The Seattle Times. “And then they’re disabused of that notion and everybody gets on with their lives”.