Amazon S3
Sunday, July 20th, 2008
UPDATE: Amazon S3 has been extremely reliable since I wrote this post. At this point they are almost certainly more reliable than us hosting ourselves.
FaceStat would have been much harder to make without Amazon’s S3. We’ve had over 100k images uploaded in the last month or two, and those images are requested constantly. We’re not big enough to justify using a CDN, but handling the image bandwidth ourselves would have introduced problems that I’m really glad we never had to deal with.
But I guess we’re going to have to deal with hosting our own images now. Since 8:43 this morning our site has been rendered non-functional. That’s 3 hours so far. I assumed Amazon would be more reliable than us at serving content.
All I can do is look at the status messages:
9:05 AM PDT We are currently experiencing elevated error rates with S3. We are investigating.
9:26 AM PDT We’re investigating an issue affecting requests. We’ll continue to post updates here.
9:48 AM PDT Just wanted to provide an update that we are currently pursuing several paths of corrective action.
10:12 AM PDT We are continuing to pursue corrective action.
10:32 AM PDT A quick update that we believe this is an issue with the communication between several Amazon S3 internal components. We do not have an ETA at this time but will continue to keep you updated.
11:01 AM PDT We’re currently in the process of testing a potential solution.
11:22 AM PDT Testing is still in progress. We’re working very hard to restore service to our customers.
I especially love this bullshit about “elevated error rates”. Requests for at least 99% of the images we have cause S3 to hang. If our site failed for 99% of requests, I would not call that “elevated error rates”; I would call that being completely fucking down.
So what can we learn from this? Since many commenters mocked us for not building our own homemade monitoring system for FaceStat when we got hit with the Yahoo traffic, I’m sure people will mock us for using S3 in our runtime. And in this case the criticism might be fair. Should we have known better than to rely on Amazon? How many sites are there out there like us? Do they fail over to their own servers? Seems like that defeats the point of using S3 or EC2 at all.
I guess I’m spending my Sunday building a “poor man’s CDN”.
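For the curious, failing over to your own servers could look something like the rough Python sketch below. It tries each image origin in order with a short timeout, so a request that hangs on S3 falls through to a second host instead of stalling the page. Everything in it is an assumption for illustration: the function names, the example hostnames, and the 2-second timeout are made up, and this is not what FaceStat actually runs.

```python
import urllib.request
from typing import Callable, Sequence


def http_get(url: str, timeout: float) -> bytes:
    """Fetch a URL; raises if the server errors out or hangs past the timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(
    path: str,
    origins: Sequence[str],
    fetch: Callable[[str, float], bytes] = http_get,
    timeout: float = 2.0,  # hypothetical value; tune for your own traffic
) -> bytes:
    """Return the image bytes from the first origin that answers in time."""
    last_error = None
    for base in origins:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url, timeout)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError
            # subclasses, so any of them moves us on to the next origin.
            last_error = exc
    raise RuntimeError(f"all origins failed for {path}") from last_error
```

You would call it with S3 first and your own box second, for example fetch_with_failover("/faces/123.jpg", ["http://images.s3.example.com", "http://static.example.com"]). The catch is that the second origin needs its own copy of every image, which is exactly the hosting work S3 was supposed to save us.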