Experiments in DRBD

Recently I’ve been setting up a pair of machines for web hosting duty.  As part of my high availability strategy I’ve decided to mirror a Linux LVM Logical Volume from each host to the other.  This will let me fail services over between hosts without losing any data, with only a momentary blip while things shuffle around in the background.  Enter DRBD.

In a nutshell, DRBD allows you to mirror block devices (either synchronously or asynchronously) over standard network connections.  These mirrors are flexible, with features like “Truck Based Replication” (for pre-seeding a host locally, loading it on a truck and syncing it back up at a remote site) and split brain detection.  It also looks to integrate well with things like Linux-HA and Pacemaker.

In order to get started, you first have to set up a resource file.  The default global DRBD configuration options were pretty sane for Fedora 13, but there are a couple things worth noting:

/etc/drbd.d/global_common.conf:

  • global
    • common
      • protocol C; – This indicates a Synchronous mirror. You can read about the other protocol options here
      • pri-on-incon-degr – These commands are run if the node is primary for a DRBD device, the device is degraded and the data is inconsistent. The default handler calls notify.sh, which emails an appropriate error message and then a second message notifying the sysadmin that the host is about to reboot. The host then reboots immediately, without shutting things down cleanly through the shutdown command.
      • pri-lost-after-sb – These commands are run after the host loses a split brain election.  The defaults here result in the same ungraceful reboot as pri-on-incon-degr.
      • local-io-error – If the local IO subsystem returns an error the host is halted immediately without a clean shutdown.

It should be noted that the error conditions above will rely on the filesystem’s recovery mechanisms to maintain data integrity.  These events are serious enough that the underlying host may already be in a bad enough state that a standard “shutdown -r/-h now” wouldn’t return in a reasonable timeframe.

Getting Started

Now that you understand some of the more important defaults, let’s dig in and define a DRBD resource (or mirror).  This configuration will have to be put on both machines:

resource r0 {
    device          /dev/drbd1;
    meta-disk       internal;
    disk            /dev/volumegroup/logicalvolume;

    syncer { rate 50M; }

    on host1.blewtech.com     { address 192.168.2.1:7789; }
    on host2.blewtech.com     { address 192.168.2.2:7789; }
}

The addresses above are using network interfaces that are dedicated to DRBD.  If you’ve got fast drives you’ll probably even want to bond multiple Gigabit Ethernet interfaces together to maintain disk level performance. For my purposes I’ve set the syncer to rate limit itself to 50M in case there are other services utilizing the back channel interface DRBD will be using.

Now you can get your resource ready for use:

drbdadm create-md r0

And finally bring the resource online:

drbdadm up r0

Your resource is now online and ready to go.  You can get the current status of drbd a couple different ways:

service drbd status
drbdadm status 

If you’ve run this on both hosts you’ll notice that both hosts have the resource in a secondary state.  You’ll need to pick one host as the initial sync source and promote it, which overwrites the peer’s data and synchronizes the mirror.  Run this on that host only:

drbdadm -- --overwrite-data-of-peer primary r0
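
Once the sync is under way, it can be handy to check the connection state, roles and disk states from a script.  Here is a minimal sketch that picks one field out of /proc/drbd-style text.  It assumes the status line layout used by the 8.3 series (a line starting with the device minor, followed by cs:, ro: and ds: fields); the function name drbd_state is my own, not part of DRBD:

```shell
# drbd_state MINOR FIELD
# Reads /proc/drbd-style text on stdin and prints one field (cs, ro or ds)
# from the status line of the given device minor.
# Assumes status lines look like:
#   1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
drbd_state() {
    awk -v minor="$1" -v field="$2" '
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == field) { print kv[2]; exit }
            }
        }'
}
```

On a live host this would be used as "drbd_state 1 ds < /proc/drbd", where 1 matches the /dev/drbd1 device from the resource definition.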

Since we set a maximum sync rate of 50M in the DRBD configuration, modern, faster drives will likely be held back.  We can tell DRBD to temporarily speed this up:

drbdsetup /dev/drbd1 syncer -r 110M

You can revert the setting as follows:

drbdadm adjust r0
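
To get a feel for what the rate limit costs during the initial sync, here is a rough back-of-the-envelope calculation.  The 100 GiB volume size is a made-up example, and I’m assuming the M suffix means MiB per second and that the sync actually sustains the configured rate:

```shell
# sync_seconds SIZE_GIB RATE_MIB
# Whole seconds needed to copy SIZE_GIB GiB at RATE_MIB MiB/s, rounded up.
sync_seconds() {
    size_mib=$(( $1 * 1024 ))
    echo $(( (size_mib + $2 - 1) / $2 ))
}

sync_seconds 100 50     # hypothetical 100 GiB volume at the configured 50M
sync_seconds 100 110    # same volume at the temporary 110M
```

For that hypothetical volume, the difference is roughly 34 minutes versus roughly 16 minutes.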

Other Useful Tidbits

If you created the device but now need to fail it over to allow the other host to write to it, you have to first demote the primary host and then promote the other host:

Old Primary:
drbdadm secondary r0

New Primary:
drbdadm primary r0
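
If there is a filesystem mounted on the DRBD device, it has to be unmounted before the old primary can be demoted.  Here is a sketch of the two halves of a manual failover, assuming a filesystem sitting directly on /dev/drbd1.  The mount point /srv/www and the helper names run, demote and promote are hypothetical.  Setting DRY_RUN=1 only prints the commands, which is a safe way to try it out:

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# demote RESOURCE MOUNTPOINT: run on the old primary.
# The filesystem is unmounted first; demotion is skipped if that fails.
demote() {
    run umount "$2" && run drbdadm secondary "$1"
}

# promote RESOURCE DEVICE MOUNTPOINT: run on the new primary.
# The device is only mounted if the promotion succeeded.
promote() {
    run drbdadm primary "$1" && run mount "$2" "$3"
}
```

On the old primary this would be "demote r0 /srv/www", followed on the new primary by "promote r0 /dev/drbd1 /srv/www".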

Performance with DRBD is good as far as I’ve been able to discern.  Unfortunately I don’t have enough spindles handy to try pushing beyond a single GigE link.

Hopefully this article encourages you to give DRBD a shot.  Once you’ve got it in your arsenal I’m sure you’ll think of all sorts of interesting problems to solve with it.

Everyone into the (Server) Pool

So you’ve got a web server…you’ve started your new internet application and you’re going to strike it rich.  Awesome.  Next thing you know, you’ve grown and you need more web servers.  Congratulations!  You’ve probably built or purchased some sort of load balancer to put in front of your web servers.

Next thing you know, you’ve diversified and you want to run a few different apps, or if you’re lucky, a few different domains using the same codebase.  You think “I probably need a few new servers to run this hot new application”.  You’re probably right.  You may think “I should probably set up a new server pool/farm for this new domain/app/whatever”.  Think twice!

Additional server pools or farms need justification.  They’re additional overhead when you’ve probably already got enough on your plate.  Here are some problems with additional server pools:

Additional management overhead on your load balancer.  Put enough pools on there and you could actually increase latency and/or decrease throughput.  Sure, with modern hardware this isn’t a huge deal.  You might need a thousand pools before you start seeing a reduction in performance, but if you have a really complex rule or transform set applied to URLs for a pool, this could come to a head sooner.

Additional management overhead on your servers.  Keeping track of which servers are running which domain or which application can be difficult, especially if your growth has been organic.  Every customer facing web server should be able to run any domain you host.  This probably doesn’t apply to a major hosting company hosting different customer URLs since you may not get that much control over the application, but if you’re fortunate enough to have customers running something written to be domain/vhost aware, it’ll pay dividends later.

This methodology doesn’t apply to everything.  You may not want to mix presentation layer services with your business logic services unless they’re both pretty lightweight (though since you can squeeze 32GB of RAM in 1U these days, it may not be a big deal).  You’ll probably want to run your database on your front-end web servers even less.  If you have extra cycles to burn, this may be a way to get better utilization out of your machines, but it won’t suit every workload.  When it comes down to it, you (should) know your application best.

Another point: make sure you understand how your application behaves.  Benchmarking will save you when you suddenly need to know how many more servers you need to buy.  Do certain things make the memory footprint expand dramatically?  Do some hits generate a lot of extra CPU load (maybe those URLs will need their own pool soon)?
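
Once you have benchmark numbers, the sizing arithmetic is simple.  Here is a rough sketch with made-up figures (a peak of 1800 requests per second, 250 requests per second per server from benchmarking, and two spare servers for maintenance and failures); none of these numbers come from a real system:

```shell
# servers_needed PEAK_RPS PER_SERVER_RPS SPARE
# Servers required to carry the peak load (rounded up), plus spares.
servers_needed() {
    echo $(( ($1 + $2 - 1) / $2 + $3 ))
}

servers_needed 1800 250 2    # hypothetical figures
```

With these figures the answer is ten servers: eight to carry the peak and two spare.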

In my experience getting into this habit will make your life easier.  If you’ve got half a dozen domains and half a dozen servers that can all serve any of them, taking half the farm down for maintenance behind the scenes becomes much simpler.