Archive for the ‘SaaS’ Category

Amazon SQS: Simple Queue Service

April 16th, 2008 No comments

A few days later than expected, it was the... erm... writers’ strike!

Amazon’s Simple Queue Service provides reliable storage and delivery
of messages between any number of computers that have at least periodic
access to the Internet.

SQS
has everything you expect from a messaging system, including loose
coupling and fault tolerance, with some things you might not, like a
distributed infrastructure that stores messages redundantly over
multiple data centres. And of course, being web service based, clients can be built on any platform that can use HTTP requests.

SQS Architecture

Like the other Amazon Web Services, SQS is elegant in its simplicity. It contains just 2 constructs:

  • Messages. A message is a piece of textual data up to 8KB in size.
  • Queues. Queues store related messages together with a common configuration.

SQS is implemented as a distributed system. It stores messages in
multiple servers and potentially across multiple data centres. Although
this has a positive impact on redundancy, reliability and scalability,
it also has its drawbacks. The distributed nature brings in a few
quirks that you need to bear in mind when developing an application
with SQS.

Message retrievals might return incomplete results. When
a request comes in from a client, the SQS system samples only a subset
of physical servers for messages to return. Although the subset
changes,  you cannot be sure that any particular retrieval
contains all the messages in a given queue.

Messages may not be delivered speedily. You cannot
rely on SQS as a messaging system if you require instant message
delivery. In the normal run of things, messages can take from 2 to 10
seconds to be delivered.

Messages may not be delivered in sequence. Although
SQS will try to deliver your messages in the order in which they were
sent, in a distributed system it’s not always possible. If you need to
have ordered delivery, you will need to add some sort of sequencing
layer on top of SQS.

Messages may be redelivered. SQS uses 2 criteria to
determine whether or not a message should be delivered: whether it
still exists in a queue, and its visibility state (see below). In a
distributed system, however, one can never be certain that these
properties are synchronised across multiple physical servers, so your
application must gracefully handle the delivery of a message that
should be invisible or has already been deleted.
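To make the last two quirks concrete, here is a minimal sketch of the kind of sequencing layer you might put on top of SQS. It assumes the sender stamps every message with an increasing sequence number starting at 0, which is my own convention and not something SQS provides. The Resequencer class and its method names are hypothetical.

```python
class Resequencer:
    """Releases payloads in sequence order and drops duplicates.

    Assumes the sender numbers messages 0, 1, 2, ... (an assumption of
    this sketch, not an SQS feature).
    """

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number we may release
        self.pending = {}           # out-of-order messages held back, keyed by seq

    def accept(self, seq, payload):
        """Take one delivered message; return the payloads now ready, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []               # a redelivery of something we already have
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


def demo():
    """Feed in late, early and duplicated messages; collect what comes out."""
    resequencer = Resequencer()
    released = []
    deliveries = [(1, "b"), (0, "a"), (1, "b"), (3, "d"), (2, "c"), (0, "a")]
    for seq, payload in deliveries:
        released.extend(resequencer.accept(seq, payload))
    return released
```

Calling demo() releases each payload exactly once and in sender order, despite the shuffled and repeated deliveries. A real layer would also need a policy for gaps that never fill, which this sketch ignores.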

Managing Messages

SQS will deliver a message as many times as it needs to in order to
ensure that it is properly received, processed and acknowledged. This
approach means that no message is lost, even if a message-receiving
component crashes or the network goes down before the message can be
processed.

Messages are managed by changing their visibility state. A message’s state can be set to invisible for a certain amount of time. This visibility timeout can range from 0 seconds to 2 hours, and is set on the queue.

Because
a message only remains invisible for a set period of time, the only way
to prevent eventual redelivery is to delete the message from the queue.
Setting the visibility timeout appropriately is essential for efficient
running of your application. If you set it too short, the message might
be delivered again whilst it is still being processed. Ideally, a
message’s visibility timeout should be slightly higher than the time it
would take to process the message.
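The visibility mechanism is easier to see in a toy model. The sketch below is an in-memory stand-in, not the SQS API: the VisibilityQueue class is hypothetical, and it takes the current time as an explicit argument so the behaviour is easy to follow.

```python
import itertools


class VisibilityQueue:
    """Toy single-process model of a queue with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout  # seconds
        self._ids = itertools.count(1)
        self._messages = {}  # message id -> [body, time at which it is visible again]

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]  # visible immediately
        return msg_id

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, else None."""
        for msg_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Deleting is the only way to stop a message coming back."""
        self._messages.pop(msg_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("resize image 42")
first = queue.receive(now=0)      # delivered, then hidden until t=30
hidden = queue.receive(now=10)    # nothing visible yet
again = queue.receive(now=30)     # timeout expired, so it is redelivered
queue.delete(again[0])
gone = queue.receive(now=100)     # deleted, so it never returns
```

If processing had taken longer than 30 seconds here, a second receiver could have picked up the same message, which is exactly why the timeout should sit a little above your processing time.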

SQS Roles

Message Sender.
A message sender contacts SQS and asks it to create a message on a given queue with some data.

Message Receiver.
The message receiver periodically polls SQS to see if there are any new
messages available on a specific queue. If there are messages on a
specific queue, and on the subset of sampled physical servers, SQS will
return up to 10 of these messages for processing. If a message is
processed successfully, the receiver will delete the message from the
queue. If it is not processed successfully, the receiver can opt to do
nothing, and let the visibility timeout expire.
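The receiver's contract fits in a few lines. This is a sketch under my own assumptions: receive, delete and handle are hypothetical callables that you would wire up to whatever SQS client you use, and the demo swaps in a plain dictionary so it runs on its own.

```python
def drain_once(receive, delete, handle, max_messages=10):
    """Poll once, handle each message, and delete only the ones that succeed.

    receive(limit) -> iterable of (receipt, body) pairs   (hypothetical)
    delete(receipt) removes a message from the queue      (hypothetical)
    handle(body) does the real work, raising on failure   (hypothetical)
    """
    processed = 0
    for receipt, body in receive(max_messages):
        try:
            handle(body)
        except Exception:
            # Do nothing: once the visibility timeout expires the message
            # becomes visible again and will be redelivered.
            continue
        delete(receipt)
        processed += 1
    return processed


def demo():
    """Run one poll against a dictionary standing in for a queue."""
    store = {1: "ok", 2: "bad", 3: "ok"}

    def receive(limit):
        return list(store.items())[:limit]

    def delete(receipt):
        del store[receipt]

    def handle(body):
        if body == "bad":
            raise ValueError("cannot process this one")

    return drain_once(receive, delete, handle), store
```

In the demo, two messages are processed and deleted, while the failing one stays on the queue for a later attempt.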

Administrator. The administrator monitors the
queues and keeps the infrastructure running smoothly. This role can
bounce quite nicely off EC2, in the sense that if a queue starts to
back up, the administrator can simply spawn a new server to process the messages!
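As a rough illustration of that administrator role, the sketch below works out how many worker servers a backlog calls for. The function name, the throughput figure and the upper cap are all assumptions of mine; actually starting or stopping EC2 servers is left out.

```python
import math


def workers_needed(backlog, per_worker_rate, target_seconds,
                   min_workers=1, max_workers=20):
    """Workers required to clear `backlog` messages within `target_seconds`.

    per_worker_rate is messages per second per worker (an assumed,
    measured figure); the result is clamped to [min_workers, max_workers].
    """
    if per_worker_rate <= 0 or target_seconds <= 0:
        raise ValueError("rate and target must be positive")
    needed = math.ceil(backlog / (per_worker_rate * target_seconds))
    return max(min_workers, min(max_workers, needed))


# 3,000 queued messages, 5 messages/second per worker, cleared in a minute.
busy = workers_needed(3000, per_worker_rate=5, target_seconds=60)
quiet = workers_needed(0, per_worker_rate=5, target_seconds=60)
```

Here busy comes out as 10 workers and quiet falls back to the minimum of 1. The cap is there so that a runaway queue cannot spawn an unbounded (and unboundedly expensive) farm.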

Pricing

$0.01 for 10,000 requests.

Data transfer is on a sliding scale – $0.18/GB from 0 to 10TB, $0.16/GB from 10 to 50TB and $0.13/GB for any amount over 50TB.
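Because receivers have to poll, the request fee is worth a quick back-of-envelope check. The sketch below assumes a single receiver polling at a fixed interval over a 30-day month, and counts only the polling requests (no sends, deletes or data transfer). The function name is my own.

```python
PRICE_PER_10K_REQUESTS = 0.01  # dollars, from the pricing above


def monthly_polling_cost(poll_interval_seconds, days=30):
    """Return (number of polls, cost in dollars) for one receiver.

    Assumes a whole-second polling interval and a 30-day month.
    """
    polls = days * 24 * 60 * 60 // poll_interval_seconds
    return polls, polls / 10_000 * PRICE_PER_10K_REQUESTS


polls_fast, cost_fast = monthly_polling_cost(1)    # poll every second
polls_slow, cost_slow = monthly_polling_cost(10)   # poll every 10 seconds
```

Polling once a second works out at 2,592,000 requests, or roughly $2.59 a month; backing off to every 10 seconds brings that down to about 26 cents.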

All in all, an enterprise level messaging system, but for a very low cost. Just the right formula for the micro ISV!

Categories: SaaS, Uncategorized Tags:

Amazon EC2: Elastic Compute Cloud

April 10th, 2008 1 comment

Amazon’s Elastic Compute Cloud (EC2)
is your very own on-demand virtual Linux server farm.
Each EC2 virtual server can be configured just like a real Linux
server: you can communicate with it, you can install software on it,
you can configure it, and you can even use virtual servers configured by
third parties.

The EC2 service allows you to resize your server pool (hence
‘elastic’) at a moment’s notice. If there is a demand spike, you simply
spawn more servers; if there is no more work to do, you make them all
vanish. As with Amazon’s other web services, you pay only for the
computing power you use. You can also scale the amount of processing
power available to you by using more or less powerful servers.

Your virtual server farm can be controlled by either an API, or by command line tools that Amazon provides.

Overview

There are 3 main components to the EC2 service:

  • Instances. These are the virtual servers that run inside your farm, and perform the tasks that would otherwise be done by real servers.
  • Environment. Instances run inside the EC2
    environment. The environment provides configurable access control,
    contextual data and other information that your instances need to do
    their work.
  • Amazon Machine Images. AMIs are not dissimilar to Virtual PC or VMware images. They represent a snapshot of your server, and serve as the boot disk.

Instances

Each EC2 instance is a Linux server to which you have root access.
The servers run inside a Xen virtual environment. Since the underlying
hardware is virtualised, its performance can be tuned within certain
preset parameters. Don’t let their virtuality fool you though, they are
configured to give you a very reasonable (and well defined) amount of
bang for your buck.

What do I mean by well defined? EC2 instances are rated by Amazon in
terms of the ‘EC2 Compute Unit’ (ECU). 1 ECU is approximately equal to
the ‘oomph’ provided by a physical machine with a 1.0 to 1.2 GHz AMD Opteron processor, circa 2007.

Server instances can be spawned with one of 3 specifications. They
are classed as small, large or extra large. A small server boasts 1 ECU
of power, with 1.7GB of RAM
and 160GB of storage. An extra large instance boasts 8 ECUs of power
(virtual quad core), with 15GB of RAM and around 1.7TB of storage. The
prices vary from 10¢ per hour for a small server to 80¢ per hour for an
extra large server. More details on the specifications of the machines
can be found on the EC2 home page.
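Those figures make for a quick comparison of what you pay per unit of compute. The sketch below uses only the small and extra large numbers quoted above (the large instance is left out because I have not quoted its figures), and assumes a 30-day month of continuous running. The names are my own.

```python
# Figures quoted above; the large instance is omitted deliberately.
INSTANCE_TYPES = {
    "small": {"ecu": 1, "cents_per_hour": 10},
    "extra large": {"ecu": 8, "cents_per_hour": 80},
}


def cents_per_ecu_hour(name):
    """Hourly price divided by compute units, in cents."""
    spec = INSTANCE_TYPES[name]
    return spec["cents_per_hour"] / spec["ecu"]


def monthly_cost_dollars(name, count=1, hours=24 * 30):
    """Cost of running `count` instances non-stop for an assumed 30-day month."""
    return INSTANCE_TYPES[name]["cents_per_hour"] * count * hours / 100


small_rate = cents_per_ecu_hour("small")
xl_rate = cents_per_ecu_hour("extra large")
small_month = monthly_cost_dollars("small")
xl_month = monthly_cost_dollars("extra large")
```

Both sizes come out at 10¢ per ECU-hour, so on these numbers the choice is about RAM, storage and how well your workload splits across machines rather than raw price. A small server left running all month costs $72, an extra large one $576.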

AMIs

Every EC2 server instance is based on an AMI. This is a disk image
of the server’s root file system. When you launch, or spawn a new
server, it boots from the image with all of its installed software,
configuration and data that is stored in the AMI.

You can log into the instance with root privileges and install and configure any software compatible with your instance’s Linux kernel.
Any changes you make to the system when it is live can be bundled back
into a new AMI that is the starting point for the next time your server
boots. If you so choose, you could create an AMI from scratch using a Linux distribution of your choice, Ubuntu for example.

The AMIs are stored in S3,
and they must be registered with EC2 before you can use them. You can
opt to keep the AMIs as private, so that only you can use them, or you
can share them. You can even create commercial AMIs and offer them to
other users on a paid basis. You simply register them with EC2 as a
paid image, then Amazon’s DevPay system will detect when it is run, and
bill its users a rate you define.

The EC2 Environment

The EC2 environment provides a set of services to your instance,
including networking, security, ephemeral storage etc. Access to the
instance is granted via key based SSH, so that only someone in
possession of the correct private key can access the system.

Originally, all internal and external IP addresses in EC2 were dynamic, and
there was no mechanism to hold on to a static IP, but it seems Amazon
have recently introduced good ol’ fashioned static IPs :) . In addition,
any data written to the ephemeral storage is lost when the instance
dies, so, as is quite common with Amazon’s web services, it’s up to
you, the user, to implement any persistence strategy.

Spreading the love – here are a couple of projects built with EC2 in mind.

EC2 offers a big leg up to fledgling micro ISVs who lack the capital
to invest in serious computational power, but with EC2, it seems a
little more... civilised.

What say you?

Categories: SaaS, Uncategorized Tags:

Amazon S3: Simple Storage Service

April 9th, 2008 6 comments

Amazon’s Simple Storage Service (S3) provides endless storage for any
conceivable kind of file or data. It provides a highly scalable,
secure, distributed storage network, accessible from any internet
connection, and can be used for everything from backing up personal
data to distributing multimedia content to millions of users.

Amazon offers a Service Level Agreement
which states that if their uptime dips below 99.9%, you can claim
service credits to offset the disruption. In practice though, you have
to provide Amazon with detailed logs to prove that you suffered
disruption, and that the disruption was caused by S3. The SLA details can be found here:

http://www.amazon.com/gp/browse.html?node=379654011

Overview

One of the great things about the S3 service is its utter
simplicity. S3 has just 2 types of construct, namely buckets and
objects. Your S3 account can contain any number of buckets, your
buckets can contain any number of objects and an object can hold data
and metadata. That’s it. There are no folder structures, no renaming
support, no in place editing, nothing. Almost any application would
need to impose a layer on top of S3, and therein lies its brilliance.
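To show what such a layer might look like, here is a toy in-memory model of a bucket with a rename operation built on top of it. The Bucket class and the rename helper are hypothetical and are not the S3 API. They only mirror the idea that a bucket maps keys to data plus metadata, and that anything file-like has to be composed from put, get and delete.

```python
class Bucket:
    """Toy stand-in for a bucket: a flat map from key to (data, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def keys(self, prefix=""):
        """'Folders' are just a naming convention on keys."""
        return sorted(k for k in self._objects if k.startswith(prefix))


def rename(bucket, old_key, new_key):
    """Layered rename: copy the object to the new key, then delete the old one."""
    if old_key == new_key:
        return
    data, metadata = bucket.get(old_key)
    bucket.put(new_key, data, metadata)
    bucket.delete(old_key)


bucket = Bucket()
bucket.put("photos/cat.jpg", b"...", {"content-type": "image/jpeg"})
bucket.put("notes.txt", b"hello")
rename(bucket, "photos/cat.jpg", "photos/felix.jpg")
listing = bucket.keys(prefix="photos/")
```

After the rename, listing contains only photos/felix.jpg. Note that the two steps are not atomic, so a real layer has to decide what happens if the delete fails after the copy succeeds.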

S3 allows you to set access controls on both buckets and objects,
governing who can do what. Each bucket and object is uniquely
addressable with its own URI, for example:

http://s3.amazonaws.com/mybucket/myobject

S3 also allows you to map your own domain onto a bucket, so you could publish the same resource at, say:

http://www.agilemicroisv.com/myobject
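Building these addresses by hand is error-prone once keys contain spaces or other awkward characters, so a small helper is handy. This is a sketch following the two URI shapes shown above; the object_uri function and its custom_domain parameter are my own invention.

```python
from urllib.parse import quote


def object_uri(bucket, key, custom_domain=None):
    """Build the address of an object, percent-encoding awkward characters.

    With custom_domain set, the bucket name drops out of the path, as in
    the mapped-domain example above.
    """
    if custom_domain:
        return f"http://{custom_domain}/{quote(key)}"
    return f"http://s3.amazonaws.com/{quote(bucket)}/{quote(key)}"


plain = object_uri("mybucket", "myobject")
mapped = object_uri("mybucket", "myobject", custom_domain="www.agilemicroisv.com")
spaced = object_uri("mybucket", "reports/q1 summary.txt")
```

The first two calls reproduce the example addresses above. The third shows a space being encoded as %20 while the slash in the key is left alone.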

S3 is hugely distributed. It is spread across multiple servers in
multiple locations. Currently, you can house data in either the US or
Europe. This is a double-edged sword. On the one hand, it provides huge
benefits in terms of reliability, scalability and redundancy, but on
the other hand, its distributed nature can cause the odd headache.

  • Objects are not files. S3 does not support any
    file system-like operations on your objects. You cannot rename, move or
    modify files in place. You must get a local copy, make any changes,
    then commit the new file back.
  • Propagation latency. Your files are distributed
    over multiple servers in multiple data centres. This can cause issues
    if you have users trying to access the same data at the same time from
    different locations. Let’s say I upload a file, and you try to grab it.
    The object might not appear to you straight away, which can cause all
    kinds of fun with missing and out of date objects. If concurrent access
    is important to your application, you will end up writing some sort of
    version control or intermediary layer.
  • S3 requests will fail. Occasionally. It’s not a
    bug, rather a deliberate consequence of the architecture. Any
    application will need to expect the occasional failure, handle it
    gracefully, and retry after a small pause.
  • S3 IP addresses will change. Occasionally. Again, because of the distributed architecture of the system, you shouldn’t employ any local DNS caching for more than a few minutes.
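The "retry after a small pause" advice can be wrapped up once and reused. The sketch below is one way of doing it, under my own assumptions: transient failures surface as OSError (which covers Python's ConnectionError), and the pause doubles after each failure, which is my choice rather than anything S3 prescribes. The with_retries name is hypothetical.

```python
import time


def with_retries(operation, attempts=4, first_pause=0.5, sleep=time.sleep):
    """Call operation(); on OSError, pause and try again, doubling the pause.

    Re-raises the last error once `attempts` tries have been used up.
    The `sleep` parameter exists so tests can avoid real waiting.
    """
    pause = first_pause
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise
            sleep(pause)
            pause *= 2


def demo():
    """A request that fails twice, then succeeds; pauses are recorded, not slept."""
    calls = []
    pauses = []

    def flaky_request():
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("transient failure")
        return "ok"

    result = with_retries(flaky_request, sleep=pauses.append)
    return result, len(calls), pauses
```

In the demo the request succeeds on the third call after pauses of 0.5 and 1.0 seconds. Only wrap operations that are safe to repeat, such as fetching an object or writing the same object again.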

It’s a great system, but if you’re going to use it, you need to have
a good grasp of what it is, and perhaps more importantly, what it isn’t.

Pricing

Users of Amazon’s web services are billed on 3 fronts.

  • Storage. Storage space in S3 is charged at 15¢ per gigabyte per
    month for data stored in the US, and 18¢ per gigabyte per month for
    data stored in Europe.
  • Data transfer. Data sent to S3 costs 10¢ per
    gigabyte uploaded. Data retrieved from S3 is charged on a sliding
    scale, depending on how much data was downloaded during the month: 18¢
    per gigabyte for the first 10 terabytes downloaded, 16¢ per gigabyte
    for the next 40 terabytes (between 10 TB and 50 TB), and 13¢ per
    gigabyte for any additional data (over 50 TB).
  • API requests.
    You are also charged based on the number of API request messages S3
    processes on your behalf. You must pay per-request fees for the
    requests performed by your own application, as well as requests made by
    others when they download data you have made available from your
    account.
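Putting the storage and transfer rates together, a rough monthly estimate looks like the sketch below. It works in whole cents, assumes 1 TB = 1,024 GB (my assumption about how the tiers are measured), and leaves out the per-request fees because I have not quoted a rate for them. All names are my own.

```python
GB_PER_TB = 1024  # assumption: the tiers are measured in binary terabytes

STORAGE_CENTS_PER_GB = {"us": 15, "europe": 18}
UPLOAD_CENTS_PER_GB = 10
# (tier size in GB, cents per GB); None marks the open-ended top tier.
DOWNLOAD_TIERS = [(10 * GB_PER_TB, 18), (40 * GB_PER_TB, 16), (None, 13)]


def download_cents(gigabytes):
    """Charge each slice of the month's downloads at its own tier rate."""
    total = 0
    remaining = gigabytes
    for size, rate in DOWNLOAD_TIERS:
        portion = remaining if size is None else min(remaining, size)
        total += portion * rate
        remaining -= portion
        if remaining <= 0:
            break
    return total


def monthly_cents(stored_gb, uploaded_gb, downloaded_gb, region="us"):
    """Storage plus transfer for one month, excluding per-request fees."""
    return (stored_gb * STORAGE_CENTS_PER_GB[region]
            + uploaded_gb * UPLOAD_CENTS_PER_GB
            + download_cents(downloaded_gb))


us_bill = monthly_cents(stored_gb=100, uploaded_gb=10, downloaded_gb=50)
eu_bill = monthly_cents(stored_gb=100, uploaded_gb=10, downloaded_gb=50,
                        region="europe")
```

For 100 GB stored, 10 GB uploaded and 50 GB downloaded, that comes to 2,500 cents ($25.00) in the US and 2,800 cents ($28.00) in Europe. Only the gigabytes that fall inside a tier are charged at that tier's rate, so crossing the 10 TB mark does not reprice the first 10 TB.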

Amazon provide a calculator to estimate your usage costs:

http://calculator.s3.amazonaws.com/calc5.html

I’m not sure why the US is cheaper than Europe. It might be infrastructure costs, it might be the worthless dollar.

Spreading the love a little, here are a few applications that have been built on top of the S3 APIs:

Categories: SaaS, Uncategorized Tags: