Amazon EC2: Dealing with ephemeral servers

    February 15th, 2008 - Category: Cloud Computing

    Amazon’s Elastic Compute Cloud (EC2) is a pay-as-you-go server rental service. Using various APIs, developers can easily launch new server instances from disk images. Launching an image typically takes only about a minute, so this is a very convenient way of scaling a web application.

    However, EC2 has some design peculiarities that need to be taken into account when you build your application, and deploying web applications on it poses particular challenges.

    First of all, EC2 servers (or instances, as Amazon calls them) can fail at any moment. Of course, this applies to any server at any time, but losing an EC2 instance is particularly bad because you lose any data stored on the server’s local disk. If a regular, physical server crashes, you will usually be able to extract data from the hard drive. Not so with EC2. How often this actually happens is not well established, but it clearly does happen.

    Secondly, EC2 servers have dynamic IP addresses. Amazon does not currently offer static IPs. At first blush, this looks like a severe limitation. How do you set up DNS records if you only have dynamic IPs? How do you configure load balancers if you don’t know the addresses of the backend servers?

    (Another complication, which I won’t deal with here, is that EC2 instances boot up with no data on their hard drive except what you’ve put into the disk image.)

    Are these complications fatal to the idea of deploying web applications on EC2? The ephemeral nature of EC2 instances does admittedly take a little time to get used to. But when you do, the idea is liberating. By treating servers as potentially short-lived, volatile entities, you end up with a design where redundancy and scalability are integral.

    At Kalibera, we’ve set up a self-organizing system of EC2 instances based on dynamic, round-robin DNS and the load balancer Nginx.

    Clustering
    The first thing to get used to is dynamic DNS. To deploy web applications entirely within EC2, you really need a DNS provider that lets you update records via an API. We use Nettica for this purpose, and their service works well. When one of our EC2 instances boots up, it immediately adds a DNS record for its dynamic IP to a special-purpose internal domain (such as mydomain.test). The point of this is to let any EC2 instance easily obtain a list of the servers in the cluster. (My initial idea was to use JGroups for this, but Amazon doesn’t support IP multicast.) Once all instances have each other’s IP addresses, they can monitor each other for failure and coordinate tasks such as load balancing.
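
    As a minimal sketch of this registration step (in Python, with a hypothetical dns client standing in for your DNS provider's API; only the instance-metadata URL is a real EC2 detail), a newly booted instance could do something like this:

        import socket
        import urllib.request

        METADATA_IP_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"
        INTERNAL_DOMAIN = "mydomain.test"   # special-purpose internal domain

        def my_public_ip():
            # EC2 exposes instance metadata on a link-local address.
            return urllib.request.urlopen(METADATA_IP_URL).read().decode().strip()

        def register_self(dns):
            # 'dns' stands in for the DNS provider's API client (Nettica in our
            # case); add_record() is a hypothetical method, not a real client call.
            dns.add_record(domain=INTERNAL_DOMAIN, record_type="A",
                           value=my_public_ip(), ttl=60)

        def cluster_members():
            # Every registered instance shows up as an A record on the internal
            # domain, so a single lookup yields the current cluster membership.
            _, _, addresses = socket.gethostbyname_ex(INTERNAL_DOMAIN)
            return addresses

    register_self() runs once at boot; cluster_members() is what the monitoring and load-balancing code further down calls whenever it needs the current peer list.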

    Load balancing
    Instances that are configured to run as load balancers also add a DNS record for the domain they’re set to balance. Importantly, multiple records are added for each domain. In our setup, we let every instance be a load balancer for every other instance. The time-to-live (TTL) is set as short as possible, to let instances disappear from DNS quickly in case of failure. With this setup, browsers distribute requests between the registered load balancers in round-robin fashion.
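
    Continuing the sketch above, registration on a load balancer just adds one more record, this time for the public domain and with a deliberately short TTL (the domain name and TTL below are placeholders):

        PUBLIC_DOMAIN = "www.example.com"   # the domain being balanced (placeholder)

        def register_as_load_balancer(dns):
            # Each load balancer adds its own A record for the public domain.
            # Several records per domain means clients round-robin between them,
            # and the short TTL lets a dead balancer's record expire quickly.
            dns.add_record(domain=PUBLIC_DOMAIN, record_type="A",
                           value=my_public_ip(), ttl=60)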

    Since the list of backend servers can change at any time, it’s convenient to use a load balancer that can reload its configuration without a restart. We’ve settled on Nginx, a lightweight web server that takes about five minutes to configure. It supports load balancing and failover, and scales well.
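
    A rough sketch of how an instance might regenerate its Nginx configuration whenever the peer list changes; the file paths, backend port and failover parameters here are assumptions, not our exact values:

        import os
        import signal

        NGINX_CONF = "/etc/nginx/conf.d/backend.conf"   # assumed include path
        NGINX_PID_FILE = "/var/run/nginx.pid"

        def rewrite_upstreams(backends):
            # Build an upstream block listing every live backend; max_fails and
            # fail_timeout give Nginx its failover behaviour (a backend that
            # stops answering is taken out of rotation for a while).
            lines = ["upstream backend {"]
            lines += ["    server %s:8080 max_fails=3 fail_timeout=30s;" % ip
                      for ip in backends]
            lines += ["}", "",
                      "server {",
                      "    listen 80;",
                      "    location / { proxy_pass http://backend; }",
                      "}", ""]
            with open(NGINX_CONF, "w") as f:
                f.write("\n".join(lines))
            # SIGHUP makes Nginx re-read its configuration without a restart.
            with open(NGINX_PID_FILE) as f:
                os.kill(int(f.read().strip()), signal.SIGHUP)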

    So what happens when an instance dies? If the instance is a backend server, Nginx will detect the failure and stop forwarding requests to it. If the instance is a load balancer, the problem is more serious: we now have DNS entries pointing to a dead server. However, instances monitor each other, so the failure is detected and the invalid DNS record is deleted almost immediately. Because we’ve set a short TTL, the now-invalid record then expires from client caches within a few minutes. Not every client respects the DNS timeout, but most do. And in any case, more than one server is registered for each domain, so a client can simply pick another one. Client behavior varies a little here, but overall this works decently.
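
    The monitoring itself can be very simple. A sketch, reusing the helpers from the earlier snippets; the /ping path, port, check interval and delete_records_for_ip() are all made up for illustration:

        import time
        import urllib.request

        CHECK_INTERVAL = 10   # seconds between rounds of health checks

        def peer_is_alive(ip):
            # A minimal HTTP health check against the peer's backend port.
            try:
                urllib.request.urlopen("http://%s:8080/ping" % ip, timeout=5)
                return True
            except Exception:
                return False

        def monitor_peers(dns):
            while True:
                members = cluster_members()
                live = [ip for ip in members if peer_is_alive(ip)]
                for ip in set(members) - set(live):
                    # Drop every record pointing at the dead instance so clients
                    # stop being sent there once the short TTL runs out.
                    dns.delete_records_for_ip(ip)
                rewrite_upstreams(live)   # see the Nginx sketch above
                time.sleep(CHECK_INTERVAL)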

    The real beauty of this setup becomes apparent when we need to scale up or down. Scaling up just means adding more EC2 instances. As they boot up, they register in DNS and are added to the cluster automatically, with zero configuration. Scaling down just means terminating instances (we actually do it a little more cleanly, but the principle is the same). Testing and roll-out of new versions suddenly become a lot easier too.
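
    Scaling up or down can then be as small as a single EC2 API call, for example with the boto library; the AMI id and instance type below are placeholders, and in practice you would drain and deregister an instance before terminating it:

        import boto

        def scale_up(count):
            # Launch new instances from the prepared image; once booted, they
            # register themselves in DNS and join the cluster on their own.
            conn = boto.connect_ec2()
            conn.run_instances("ami-12345678", min_count=count, max_count=count,
                               instance_type="m1.small")

        def scale_down(instance_ids):
            # Terminating an instance is enough: the peers monitoring it will
            # clean up its DNS records when its health checks start failing.
            conn = boto.connect_ec2()
            conn.terminate_instances(instance_ids)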

    What initially seemed to me to be severe limitations in EC2 have turned out to be strengths. Every server can fail. EC2 forces you to confront this fact a little more seriously than you might with traditional setups. The scalability and redundancy that come out of that are liberating indeed.
