Monday 6 December 2010

Another Cloud

Windows Azure and Amazon EC2

Having spent some time working with Windows Azure, I wanted to take a look at some of the other cloud environments out there to get a feel for how they work and how they differ in approach. The first platform I decided to take a look at was the Amazon Elastic Compute Cloud (EC2).

Amazon’s cloud offering is a little different from Microsoft’s. Where Windows Azure is a platform and a specially designed framework that allows you to run specially written applications in the cloud, Amazon EC2 allows you to run standard operating systems in virtual machines on Amazon’s servers. So two trade-offs spring immediately to mind:

  1. Windows Azure offers you a single fixed environment as against EC2’s almost completely free choice of operating systems (including Windows). Note that the latest Windows Azure release also includes Virtual Machine Roles in addition to the existing Web and Worker roles, so that you can run your own virtual machines in the cloud.
  2. The Windows Azure platform manages all the scalability issues for you (because of the features built in to the platform), whereas with Amazon EC2 you have to do a lot of the work yourself if you want to build a scalable application that can run across multiple virtual machines. That said, Amazon does offer an auto scaling service that can start up (or shut down) virtual machine instances for you based on demand, and a MapReduce service that’s designed to process large amounts of data on demand (for a description of the MapReduce algorithm and how to implement it in Windows Azure, see here).

That said, there are a lot of similarities between the two platforms as I’ve outlined in the following table:

| Windows Azure | Amazon Web Services | Notes |
| --- | --- | --- |
| Content Delivery Network (CDN) | Amazon CloudFront | Both provide high-speed edge caches for static data, used for example to host video or other media for your cloud application. |
| Windows Azure Table Service | Amazon SimpleDB | Schema-less table storage. |
| SQL Azure | Amazon Relational Database Service (RDS) | SQL Azure is SQL Server in the cloud, Amazon RDS is MySQL in the cloud. |
| AppFabric Service Bus | Amazon Simple Queue Service and Amazon Simple Notification Service | Hosted queue services enabling computers to exchange data through a cloud-hosted message hub. |
| Windows Azure Connect | Amazon Virtual Private Cloud | Creating virtual private networks that connect on-premises computers with your cloud instances. |
| Windows Azure Blob Storage | Amazon Simple Storage Service (S3) | Facility to allow you to store arbitrary data in the cloud. |
| Windows Azure Drive | Amazon Elastic Block Store | Storage that can be formatted and used like hard drives by your cloud application. |

 

Daily News

Earlier this year I purchased a Kindle ebook reader, which has been fantastic as a way to carry around books and reference material. However, one area that I was slightly disappointed with was the subscriptions to newspapers and journals that are available on the Amazon site. I soon found that I could generate my own news digests from just about any source by using an open source tool called Calibre. Once I had customized the news feeds that I wanted to read on my Kindle, I could use Calibre’s command line interface to generate the file containing my news and email it directly to the Kindle. This is all great, except that the machine that generates the Kindle news feed using Calibre has to be running. Most of the time it is, but if on occasion I’m away from home without my laptop, it would still be great to get my daily fix of news delivered to my Kindle.

Calibre is very smart in the way that it generates ebooks containing news, provided you don’t mind doing a bit of Python scripting, so I wanted to carry on using it. Running Calibre on an Amazon EC2 virtual machine seemed like a good way to automate sending out daily news from an always-on machine, and this gave me a reason to investigate how easy it would be to achieve with Amazon EC2.

Setting up my Cloud Machine with Amazon EC2

After signing up for EC2, the first decision was what operating system to use. Amazon currently has an AWS Free Usage Tier offer, which is only free if you use a Linux operating system, so Linux it was. However, I was then faced with a choice of several hundred base virtual machine images in various flavours of Linux. Ubuntu seemed to be the most popular, and a bit of googling soon revealed which were the “official” Ubuntu machine images.

[Screenshot: AMIs]

Running my instance of Ubuntu on Amazon’s servers was as simple as selecting the base machine image and clicking the launch button in the web console (choosing a micro instance so that I stayed on the free usage tier). The Public DNS value shown in the console for the running instance is the machine’s DNS name.

[Screenshot: My Instances]

The next step was to connect to my virtual machine, which involved some security configuration. First, I needed a key, which was generated for me when I launched the virtual machine. Second, I needed to open up the virtual machine’s firewall to allow me administrative access, so I added an entry on the Security Group page to enable SSH.

[Screenshot: Security Group]

My only stumbling block came when I tried to connect to the virtual machine using PuTTY as an SSH client in Windows: PuTTY didn’t recognize the key that EC2 had generated for me when I launched the virtual machine. It turned out that I needed to convert the key to PuTTY’s own format by using PuTTYgen. With that sorted out I could run a command shell on the virtual machine, and copy files to and from it using PSCP.

Installing Calibre on Ubuntu turned out to be a single command:

sudo apt-get install calibre

Finally I could set up a scheduled command using crontab to generate and email my Kindle newsfeed every day at 6am.
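As a rough illustration of that last step, here is a minimal sketch of the kind of script cron could run. It is not the exact script I used: the recipe path, output file, email addresses and SMTP relay are all placeholders, and it assumes Calibre’s `ebook-convert` and `calibre-smtp` command line tools are on the path. By default it only prints the commands it would run, so it is safe to try first.

```shell
#!/bin/sh
# Sketch only: every path, address and server name below is a placeholder.
RECIPE="${RECIPE:-$HOME/news/daily.recipe}"          # hypothetical custom Calibre recipe
OUTPUT="${OUTPUT:-/tmp/daily-news.mobi}"             # ebook file to generate
FROM_ADDR="${FROM_ADDR:-me@example.com}"             # placeholder sender address
KINDLE_ADDR="${KINDLE_ADDR:-my-kindle@example.com}"  # placeholder Kindle address
SMTP_RELAY="${SMTP_RELAY:-smtp.example.com}"         # placeholder mail relay

# Run a command, or just print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the news ebook from the recipe, then email it as an attachment.
send_news() {
    run ebook-convert "$RECIPE" "$OUTPUT" &&
    run calibre-smtp --attachment "$OUTPUT" --subject "Daily news" \
        --relay "$SMTP_RELAY" --port 587 \
        --username "${SMTP_USER:-}" --password "${SMTP_PASS:-}" \
        "$FROM_ADDR" "$KINDLE_ADDR" "Daily news attached."
}

send_news

# Example crontab entry (added with `crontab -e`) to run this at 06:00 every day,
# assuming the script is saved at the hypothetical path shown:
#   0 6 * * * DRY_RUN=0 /home/ubuntu/send-news.sh
```

Setting `DRY_RUN=0` in the crontab entry switches the script from printing the commands to actually running them. Keeping the addresses and relay in environment variables means the same script can be pointed at a different recipe or mail server without editing it.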

Conclusions

To summarize what I learnt from my first use of Amazon EC2:

  • Setting up a virtual machine in the cloud is very straightforward with the Amazon Web Services web-based management console. It also looked as if it would be quite simple using the command line tools.
  • Choosing a suitable base operating system is more difficult. Because someone else has installed the OS and a selection of software before you start, you really need to know your way round the OS to be sure that it’s secure and properly configured. In fact you probably want to install it yourself, which is possible, but a bit more complicated. Also, it’s down to you to make sure everything is kept up to date with patches etc.
  • Given the choice of operating systems available, you can run just about any piece of software you like (even applications with GUIs if you use technologies like Remote Desktop or VNC). However, there’s no guarantee that it will scale — in order for an application to scale it must be able to run in multiple virtual machines simultaneously, and probably be designed to use one or more of the scalable storage services like Amazon SimpleDB or Amazon Simple Storage Service.
