Amazon S3: Simple Storage Service
Amazon’s Simple Storage Service provides virtually unlimited storage for any
conceivable kind of file or data. It is a highly scalable,
secure, distributed storage network, accessible from any internet
connection, and can be used for everything from backing up personal
data to distributing multimedia content to millions of users.
Amazon offers a Service Level Agreement
which states that if their uptime dips below 99.9%, you can claim
service credits to offset the disruption. In practice, though, you have
to provide Amazon with detailed logs to prove that you suffered
disruption, and that the disruption was caused by S3. The SLA details can be found here:
http://www.amazon.com/gp/browse.html?node=379654011
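To put 99.9% in perspective, here is a quick back-of-the-envelope calculation. It assumes a naive, time-based reading of uptime over a 30-day month; Amazon's SLA defines its own way of measuring uptime, so treat the result as a rough guide only. The function name `allowed_downtime_minutes` is just an illustration.

```python
# Naive reading of an uptime target as time-based availability.
# Assumption: a 30-day month; the real SLA uses Amazon's own measurement rules.
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Return the minutes of downtime permitted before falling below the target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes")  # prints "43.2 minutes"
```

In other words, under this reading S3 could be down for roughly three quarters of an hour in a month and still meet a 99.9% target.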
Overview
One of the great things about the S3 service is its utter
simplicity. S3 has just 2 types of construct, namely buckets and
objects. Your S3 account can contain up to 100 buckets, your
buckets can contain any number of objects and an object can hold data
and metadata. That’s it. There are no folder structures, no renaming
support, no in-place editing, nothing. Almost any application would
need to impose a layer on top of S3, and therein lies its brilliance.
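As an illustration of the kind of layer an application ends up imposing, here is a minimal in-memory sketch. It is not the S3 API: `FlatStore`, `StoredObject` and their methods are hypothetical names. It models buckets holding objects (data plus metadata) in a flat key namespace, emulates "rename" as copy-then-delete, and fakes a folder listing by splitting keys on a `/` delimiter. S3's own list operation accepts prefix and delimiter parameters for much the same purpose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object is just opaque data plus a bag of metadata."""
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class FlatStore:
    """Toy in-memory model of S3's two constructs (not the real S3 API)."""

    def __init__(self) -> None:
        # bucket name -> (object key -> object); keys live in one flat namespace
        self.buckets: dict[str, dict[str, StoredObject]] = {}

    def create_bucket(self, name: str) -> None:
        self.buckets.setdefault(name, {})

    def put(self, bucket: str, key: str, data: bytes,
            metadata: dict[str, str] | None = None) -> None:
        # A put always replaces the whole object; there is no partial edit.
        self.buckets[bucket][key] = StoredObject(data, dict(metadata or {}))

    def get(self, bucket: str, key: str) -> StoredObject:
        return self.buckets[bucket][key]

    def delete(self, bucket: str, key: str) -> None:
        del self.buckets[bucket][key]

    def rename(self, bucket: str, old: str, new: str) -> None:
        # No native rename: copy to the new key, then delete the old one.
        obj = self.get(bucket, old)
        self.put(bucket, new, obj.data, obj.metadata)
        self.delete(bucket, old)

    def list_folder(self, bucket: str, prefix: str = "",
                    delimiter: str = "/") -> tuple[list[str], list[str]]:
        """Emulate one level of a folder listing on top of flat keys.

        Returns (keys directly under prefix, sub-"folder" prefixes).
        """
        keys: list[str] = []
        folders: set[str] = set()
        for key in sorted(self.buckets[bucket]):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(folders)
```

With keys `photos/2008/a.jpg`, `photos/b.jpg` and `readme.txt`, listing the prefix `photos/` yields the key `photos/b.jpg` and the "folder" `photos/2008/`, even though nothing resembling a directory was ever created.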
S3 allows you to set access controls on both buckets and objects,
governing who can do what. Each bucket and object is uniquely
addressable with its own URI, for example:
http://s3.amazonaws.com/mybucket/myobject
S3 also allows you to map your own domain onto a bucket, so you could publish the same resource at, say:
http://www.agilemicroisv.com/myobject
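Since every object is addressable by URI, building links is just string assembly with proper escaping of the key. The sketch below uses two hypothetical helpers, `path_style_url` and `custom_domain_url`, that reproduce the two URL shapes above; the custom-domain form assumes the DNS and bucket mapping has already been set up as Amazon describes.

```python
from urllib.parse import quote


def path_style_url(bucket: str, key: str) -> str:
    """Build a URL of the form http://s3.amazonaws.com/<bucket>/<key>."""
    # quote() percent-encodes unsafe characters but leaves '/' intact,
    # so keys like "photos/2008/a.jpg" stay readable.
    return f"http://s3.amazonaws.com/{bucket}/{quote(key, safe='/')}"


def custom_domain_url(domain: str, key: str) -> str:
    """Build a URL for the same object published under your own domain."""
    return f"http://{domain}/{quote(key, safe='/')}"
```

For example, `path_style_url("mybucket", "my file.txt")` gives `http://s3.amazonaws.com/mybucket/my%20file.txt`.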
S3 is hugely distributed. It is spread across multiple servers in
multiple locations. Currently, you can house data in either the US or
Europe. This is a two-edged sword. On the one hand, it provides huge
benefits in terms of reliability, scalability and redundancy, but on
the other hand, its distributed nature can cause the odd headache.
- Objects are not files. S3 does not support any
file system-like operations on your objects. You cannot rename, move or
modify files in place. You must get a local copy, make any changes,
then commit the new file back.
- Propagation latency. Your files are distributed
over multiple servers in multiple data centres. This can cause issues
if you have users trying to access the same data at the same time from
different locations. Let’s say I upload a file, and you try to grab it.
The object might not appear to you straight away, which can cause all
kinds of fun with missing and out-of-date objects. If concurrent access
is important to your application, you will end up writing some sort of
version control or intermediary layer.
- S3 requests will fail. Occasionally. It’s not a
bug, rather a deliberate consequence of the architecture. Any
application needs to expect the occasional failure, handle it
gracefully, and retry after a small pause.
- S3 IP addresses will change. Occasionally. Again, because of the
distributed architecture of the system, you shouldn’t cache DNS lookups
for S3 locally for more than a few minutes.
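The "retry after a small pause" advice can be sketched as a generic wrapper. This is a minimal illustration, not part of any S3 library: `with_retries` and its parameters are hypothetical, the pauses use exponential backoff with a little random jitter (a common choice, not something S3 mandates), and which exceptions count as retryable depends on the HTTP client you actually use (here, anything derived from `OSError`).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(request: Callable[[], T],
                 attempts: int = 5,
                 base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), pausing and retrying if it raises a retryable error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the last error propagate
            # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter so
            # many clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` in as a parameter keeps the wrapper easy to test; in production you would leave it as `time.sleep` and wrap each S3 call, e.g. `with_retries(lambda: fetch_object("mybucket", "myobject"))`, where `fetch_object` stands in for whatever client call you use.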
It’s a great system, but if you’re going to use it, you need to have
a good grasp of what it is, and perhaps more importantly, what it isn’t.
Pricing
Users of Amazon’s web services are billed on 3 fronts.
- Storage. Storage space in S3 is charged at 15¢ per gigabyte per
month for data stored in the US, and 18¢ per gigabyte per month for
data stored in Europe.
- Data transfer. Data sent to S3 costs 10¢ per
gigabyte uploaded. Data retrieved from S3 is charged on a sliding
scale, depending on how much data was downloaded during the month: 18¢
per gigabyte for the first 10 terabytes downloaded, 16¢ per gigabyte
for the next 40 terabytes (between 10 TB and 50 TB), and 13¢ per
gigabyte for any additional data (over 50 TB).
- API requests.
You are also charged based on the number of API request messages S3
processes on your behalf. You pay per-request fees for the
requests made by your own application, as well as for requests made by
others when they download data you have made available from your
account.
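To see how the sliding scale works out, here is a rough estimator built only from the prices quoted above. It is a sketch under stated assumptions: inputs are whole gigabytes, `stored_gb` is the average amount stored over the month, 1 TB is taken as 1024 GB, and per-request fees are left out because their rates aren't quoted here. All names are hypothetical; Amazon's calculator remains the authoritative tool.

```python
# Prices quoted in the post, in US cents per gigabyte.
STORAGE_CENTS_PER_GB = {"US": 15, "EU": 18}
UPLOAD_CENTS_PER_GB = 10
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, cents per GB); None means "everything that's left".
DOWNLOAD_TIERS = [
    (10 * GB_PER_TB, 18),  # first 10 TB
    (40 * GB_PER_TB, 16),  # next 40 TB (10 TB to 50 TB)
    (None, 13),            # anything over 50 TB
]


def download_cents(gb: int) -> int:
    """Charge each tier's rate only on the gigabytes that fall inside it."""
    total, remaining = 0, gb
    for size, rate in DOWNLOAD_TIERS:
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
        if remaining == 0:
            break
    return total


def monthly_cents(stored_gb: int, uploaded_gb: int, downloaded_gb: int,
                  region: str = "US") -> int:
    """Estimated monthly bill in cents, excluding per-request fees."""
    return (stored_gb * STORAGE_CENTS_PER_GB[region]
            + uploaded_gb * UPLOAD_CENTS_PER_GB
            + download_cents(downloaded_gb))
```

For example, storing 50 GB in the US, uploading 10 GB and serving 100 GB comes to 750 + 100 + 1800 = 2650 cents, or $26.50. Working in integer cents avoids floating-point rounding in the totals.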
Amazon provide a calculator to estimate your usage costs:
http://calculator.s3.amazonaws.com/calc5.html
I’m not sure why the US is cheaper than Europe. It might be infrastructure costs, it might be the worthless dollar.
Spreading the love a little, here are a few applications that have been built on top of the S3 APIs:

Since I started reading your blog when you wrote about Zemanta, I might point out that Zemanta uses S3 too. And EC2.
Quite good experiences with both, apart from some hiccups. And take a look at SimpleDB too.
bye
andraz
@Andraz, awesome, I didn’t know that Zemanta backed on to the AWS.
I’ve got a series of AWS posts coming up over the next few days, I’ll
get to SimpleDB.
yeah, check out http://aws.typepad.com/aws/2008/04/zemanta.html
The interesting thing is that more and more startups use EC2 &
S3, which means when Amazon goes down it takes a large part of the
internet with it. However, the benefits are quite good.
The lack of automatic self-management software for EC2, however, is
the big obstacle. We had to write our own. Do you know any good
(and cheap) solutions for that?
>> I’m not sure why the US is cheaper than Europe. It might be infrastructure costs, it might be the worthless dollar.
Their answer to this question is usually that “electricity in Europe
is more expensive”. Think what you want about that answer.
On the subject of online backup and storage …
Online backup is becoming common these days. It is estimated that
70-75% of all PCs will be connected to online backup services within
the next decade.
Thousands of online backup companies exist, from one guy operating in his apartment to Fortune 500 companies.
Choosing the best online backup company can be very confusing and
difficult. One website I find very helpful in making a decision to pick
an online backup company is:
http://www.BackupReview.info
Have a look here, too:
http://www.backupreview.info/index.php?pid=read_article&article_id=9
This site lists more than 400 online backup companies in its directory and ranks the top 25 on a monthly basis.
@Andraz, stay tuned
@Jure, is there anything in Europe that isn’t more expensive?
@Jenny, thanks, great resource!