Microsoft Windows is still amazingly ubiquitous. To a good approximation, everyone uses Windows. The actual figure is above 95%, measured by people’s browsers. (Compare this to opinion polls: nothing ever gets 100% support. There’s always the 3% who don’t agree to anything.)
It may therefore come as a surprise that two Gartner analysts are quoted as saying that Windows is ‘collapsing’. The Windows software platform is too bloated and inflexible to be competitive, they say, not least because of backward compatibility. The Windows codebase is Microsoft’s Iraq – a quagmire.
To those woes can be added Microsoft’s relative absence in the web space. I, for one, don’t use any of Microsoft’s web services. Not because of any ideological opposition, but because they’re not competitive. I try Live Search from time to time, to see how it stacks up against Google. It doesn’t. Hotmail versus Gmail – puh-lease. I even signed up – in good faith – for Office Live when that came out. Nice try.
That’s why Microsoft really needs Yahoo. People live their lives inside their browsers these days. True, the web does not yet solve all the problems that desktop apps do. I don’t know how many people do their video editing online, for instance. And Microsoft maintains its stranglehold with Office. But check out some of the features that are coming out on Google Docs: offline access, email notifications, gadget integration, and form support. The latter lets you publish web forms that submit directly into spreadsheets. Pretty cool. So Google is not only copying Office, they’re adding innovative, web-enabled features. Stiff competition on the horizon, in other words.
But don’t underestimate market inertia. It’s going to take time to eat into Microsoft’s market share. With 95% market dominance, rumours of Windows’s imminent death are clearly somewhat exaggerated. I have a feeling we’ll be having much the same discussion in 10 years’ time.
Cloud-based computing is what all the kids are talking about these days. And with good reason. Buying and maintaining your own physical servers is expensive and labour-intensive. To do it efficiently, you need an economy of scale. Providers like Amazon already have one, and they’re renting it out by the hour.
With the latest feature additions to Amazon EC2, it’s even easier to deploy a seriously fault-tolerant web service. You can now programmatically assign “availability zones” to your Amazon servers, to ensure that your service survives in case one availability zone goes down (e.g. because of a fire in a data centre). They’ve also added a long-requested feature, by the way: static IP addresses.
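The placement option maps onto a single parameter in the EC2 Query API’s RunInstances action. As a rough sketch (the AMI ID and zone names below are placeholders), spreading instances across two zones might look like this:

```python
def run_instances_params(ami, zone, count=1):
    """Build the query parameters for a RunInstances call pinned
    to a specific availability zone."""
    return {
        "Action": "RunInstances",
        "ImageId": ami,
        "MinCount": str(count),
        "MaxCount": str(count),
        "Placement.AvailabilityZone": zone,
    }

# Launch one instance in each of two zones, so a single-zone
# outage cannot take out the whole cluster.
requests = [run_instances_params("ami-12345678", z)
            for z in ("us-east-1a", "us-east-1b")]
```

Each request pins one instance to one zone; everything else about the launch stays the same.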
Given the scale and redundancy of Amazon’s infrastructure, your data is stored in one of the most secure locations in the world. There is always that nagging doubt about putting all your eggs in one basket, of course. And, more importantly, there is no way to request backup tapes from Amazon. The whole point is, after all, that they’re no longer needed. But hang on, that’s not the only reason we have backup tapes. I would hazard a guess that the majority of requests for backup tapes are not made because of drive failures. They’re made because someone (you know who you are, sales department) has deleted a file they shouldn’t have.
How is this relevant in the Cloud context? Well, you can think of Amazon Web Services customers as users. And users make mistakes. They delete their own data. Or become virus-infested. Or their software misbehaves. So while the Cloud may be performing stellarly (forgive the mixed metaphor), that doesn’t mean backups aren’t needed.
Fortunately, backing up your Amazon data to another location is easy. At Kalibera, we use two services for storing persistent data: S3 and SimpleDB. They need to be backed up in slightly different ways.
There are already a number of backup tools that work with S3. The catch is that we want to do the reverse of what these tools usually do. They push your data to S3; we want to pull it out. (Some of them might be capable of syncing.) At any rate, it’s quick to write a small script that checks the Last-Modified date of your S3 objects and downloads the ones changed since your last backup. In my script, I append the modified time to the file name so that new versions do not overwrite old versions.
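The selection logic of such a script is simple enough to sketch without any particular S3 library – here the bucket listing is just a list of (key, last-modified) pairs, which is essentially what an S3 listing gives you:

```python
from datetime import datetime

def changed_objects(listing, last_backup):
    """Given (key, last_modified) pairs from a bucket listing, return
    the objects modified since the last backup, each mapped to a local
    file name with the modified time appended, so that new versions
    never overwrite old ones."""
    to_download = []
    for key, modified in listing:
        if modified > last_backup:
            stamp = modified.strftime("%Y%m%dT%H%M%S")
            to_download.append((key, f"{key}.{stamp}"))
    return to_download

listing = [
    ("invoices/2008-03.pdf", datetime(2008, 3, 28, 9, 0)),
    ("logo.png",             datetime(2007, 11, 2, 14, 30)),
]
result = changed_objects(listing, datetime(2008, 1, 1))
# Only invoices/2008-03.pdf is newer than the last backup, so only
# that object gets downloaded, under a time-stamped file name.
```

The actual download step is then a straightforward GET per selected key.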
SimpleDB works almost like a relational database, so it is natural to back it up to one – MySQL, in our case. To facilitate a simple backup process, we add a little meta-data when we update SimpleDB:
- When we save a SimpleDB item (think “row”), we always add a “last modified” attribute. This makes it easy for the backup process to extract just the items that have changed since the last backup.
- When we delete a SimpleDB item, we also add an entry to a delete-log domain (think “table”) with a time stamp. That way, we know which items have been deleted since the last backup.
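The two conventions can be sketched as follows, with a plain in-memory dict standing in for SimpleDB (domain, item and attribute names here are hypothetical):

```python
from datetime import datetime, timezone

# `domains` maps domain name -> {item name -> attributes}, standing in
# for SimpleDB domains and items.
domains = {"customers": {}, "delete_log": {}}

def save_item(domain, name, attrs, now=None):
    """Save an item, always stamping it with a last-modified attribute
    so the backup process can select recently changed items."""
    attrs = dict(attrs)
    now = now or datetime.now(timezone.utc)
    attrs["last_modified"] = now.isoformat()
    domains[domain][name] = attrs

def delete_item(domain, name, now=None):
    """Delete an item and record the deletion in the delete-log domain,
    so the backup process knows which items have disappeared."""
    del domains[domain][name]
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    domains["delete_log"][f"{domain}/{name}/{stamp}"] = {
        "domain": domain, "item": name, "deleted_at": stamp,
    }

save_item("customers", "cust-001", {"name": "Acme Ltd"})
delete_item("customers", "cust-001")
```

The backup process then only has to query for items with `last_modified` after the previous run, plus the delete-log entries from the same interval.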
To take account of the (blissfully) dynamic nature of SimpleDB, the MySQL backup script needs to be able to create table structure on the fly, of course. Domain names become MySQL table names. Attribute names become column headers.
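Since SimpleDB stores everything as strings, the generated DDL can be kept very dumb – one TEXT column per attribute. A minimal sketch (table and attribute names are illustrative):

```python
def create_table_sql(domain, attribute_names):
    """Generate MySQL DDL for a SimpleDB domain on the fly.
    SimpleDB values are all strings, so every attribute becomes a
    TEXT column; the item name gets its own column."""
    cols = ",\n".join(f"  `{name}` TEXT" for name in sorted(attribute_names))
    return (
        f"CREATE TABLE IF NOT EXISTS `{domain}` (\n"
        f"  `item_name` VARCHAR(255),\n"
        f"{cols}\n"
        f")"
    )

sql = create_table_sql("customers", {"name", "email", "last_modified"})
```

When a new attribute turns up in a domain, the equivalent step is an `ALTER TABLE ... ADD COLUMN` built the same way.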
With this setup, we can reconstruct the state of our part of the Cloud as it existed at any time, because we’ve saved all historical versions of both S3 and SimpleDB data.
And voilà – cheap insurance against our own programming errors, errant end-user deletions and the (arguably unlikely) event of multiple asteroids taking out all of Amazon’s data centres.
Amazon’s Elastic Compute Cloud (EC2) is a pay-as-you-go server rental outfit. Using various APIs, developers can easily launch new server instances from disk images. Launching an image typically takes only about a minute, so this is a very convenient way of scaling a web application.
However, EC2 has some design peculiarities that need to be taken into account when you build your application, and deploying web applications on it poses particular challenges.
First of all, EC2 servers (or instances, as Amazon calls them) can fail at any moment. Of course, this applies to any server at any time, but losing an EC2 instance is particularly bad because you lose any data stored on the server’s local disk. If a regular, physical server crashes, you will usually be able to extract data from the hard drive. Not so with EC2. How often this actually happens is not well established, but it clearly does happen.
Secondly, EC2 servers have dynamic IP addresses. Amazon does not currently offer static IPs. At first blush, this looks like a severe limitation. How do you set up DNS records if you only have dynamic IPs? How do you configure load balancers if you don’t know the addresses of the backend servers?
(Another complication, which I won’t deal with here, is that EC2 instances boot up with no data on their hard drive except what you’ve put into the disk image.)
Are these complications fatal to the idea of deploying web applications on EC2? The ephemeral nature of EC2 instances does admittedly take a little time to get used to. But when you do, the idea is liberating. By treating servers as potentially short-lived, volatile entities, you end up with a design where redundancy and scalability are integral.
At Kalibera, we’ve set up a self-organizing system of EC2 instances based on dynamic, round-robin DNS and the load balancer Nginx.
The first thing to get used to is dynamic DNS. To deploy web applications entirely within EC2, you really need a DNS provider that lets you update records via an API. We use Nettica for this purpose, and their service works well. When one of our EC2 instances boots up, it immediately adds a DNS record for its dynamic IP to a special-purpose internal domain (such as mydomain.test). The point of this is to let any EC2 instance easily obtain a list of the servers in the cluster. (My initial idea was to use JGroups for this, but Amazon doesn’t support IP multicast). Once all instances have each other’s IP addresses, they can monitor each other for failure and coordinate tasks such as load balancing.
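The self-registration step can be sketched like so, with an in-memory record list standing in for the DNS provider’s update API (the domain name, TTL and helper names are all illustrative, not Nettica’s actual API):

```python
# In-memory stand-in for the DNS provider's record set.
# Each record is an A record: name -> IP, with a TTL in seconds.
records = []

def register_instance(ip, domain="cluster.mydomain.test", ttl=60):
    """Called by an instance on boot: publish an A record for its
    dynamic IP under the internal cluster domain."""
    records.append({"name": domain, "ip": ip, "ttl": ttl})

def cluster_members(domain="cluster.mydomain.test"):
    """Any instance can discover its peers simply by resolving the
    internal domain and collecting all the A records."""
    return [r["ip"] for r in records if r["name"] == domain]

register_instance("10.250.1.10")
register_instance("10.250.1.11")
```

In the real setup the lookup is an ordinary DNS query, so peer discovery needs no shared state beyond DNS itself.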
Instances that are configured to run as load balancers also add a DNS record for the domain they’re set to balance. Importantly, multiple records are added for each domain. In our setup, we let every instance be a load balancer for every other instance. The time-to-live (TTL) is set as short as possible, to let instances disappear from DNS quickly in case of failure. With this setup, browsers distribute requests between the registered load balancers in round-robin fashion.
Since the list of backend servers can change at any time, it’s convenient to use a load balancer that can reload its configuration without a restart. We’ve settled on Nginx, a lightweight web server which takes about five minutes to configure. It supports load balancing and failover, and scales well.
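The shape of such an Nginx configuration is roughly this (the backend IPs are placeholders, regenerated whenever the cluster membership changes):

```nginx
# Upstream pool: one entry per live backend instance. max_fails and
# fail_timeout make Nginx stop sending traffic to a backend that
# repeatedly fails, giving us failover for free.
upstream backend {
    server 10.250.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.250.1.11:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://backend;
    }
}
```

After rewriting the upstream block, a SIGHUP makes Nginx re-read its configuration without dropping in-flight connections, which is what makes the no-restart reload possible.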
So what happens when an instance dies? If the instance is a backend server, Nginx will detect the failure and stop forwarding requests. If the instance is a load balancer, the problem is more serious. We now have DNS entries pointing to a dead server. However, instances monitor each other, so the failure will be detected and the invalid DNS record will be deleted almost immediately. Because we’ve set a short TTL, the now-invalid DNS record expires within a few minutes. Not every client will respect the DNS timeout, but most will. And anyway, more than one server is registered for any domain, so the client can choose another server. Again, clients behave a little differently, but this works decently overall.
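The cleanup step amounts to pruning records for peers that stop answering health checks. A sketch, again with an in-memory record list and a trivial liveness check standing in for the DNS API and a real HTTP probe (all names and IPs are illustrative):

```python
# DNS records for a balanced domain: one A record per load balancer.
records = [
    {"name": "www.example.com", "ip": "10.250.1.10", "ttl": 60},
    {"name": "www.example.com", "ip": "10.250.1.11", "ttl": 60},
]

def prune_dead(records, is_alive):
    """Drop DNS records whose target instance no longer responds to
    health checks; surviving instances run this against the DNS API."""
    return [r for r in records if is_alive(r["ip"])]

# Simulate one load balancer dying: only 10.250.1.10 still answers.
alive = {"10.250.1.10"}
records = prune_dead(records, lambda ip: ip in alive)
```

Between this pruning and the short TTL, a dead load balancer is out of rotation for well-behaved clients within minutes.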
The real beauty of this setup becomes apparent when we need to scale up or down. Scaling up just means adding more EC2 instances. As they boot up, they register in DNS, and are added to the cluster automatically, with zero configuration. Scaling down just means terminating instances (we actually do it a little more cleanly, but the principle is the same). Testing and roll-out of new versions is suddenly a lot easier too.
What initially seemed to me to be severe limitations in EC2 have turned out to be strengths. Every server can fail. EC2 forces you to confront this fact with a little more seriousness than you might with traditional setups. The scalability and redundancy that comes out of that is liberating indeed.