Links of the Week – Week of January 3rd

With fears of del.icio.us going away and a little inspiration from @sampowers, I’ve decided to go a step further and blog about the links I run across each week.  Enjoy!

Infrastructure/Sysadmin

http://banksimple.com/engineering/2011/01/03/engineering-at-banksimple/ (link via @al3x)
A nice overview from Alex Payne of BankSimple, providing some insight into how BankSimple is being built.

http://gondor.io/blog/2011/01/03/hello-gondor/
Cloud-based Django hosting, similar to Heroku for Rails apps.  These services are interesting, but traditional hosting can still be significantly cheaper.

http://glarizza.posterous.com/using-crankd-to-react-to-network-events (via @0xEFF)
Looks like it could be really useful for dealing with certain problems in Mac administration.

https://puck.nether.net/pipermail/cisco-nsp/2010-December/thread.html and https://puck.nether.net/pipermail/cisco-nsp/2011-January/thread.html (via @kgasso)
There’s a good discussion on what serial console servers people are using and why they suck and/or rock. It’s near the bottom of the December thread list and at the beginning of the January thread list.

Computer History

http://www.youtube.com/watch?v=-Z6wfzKswUg – Wang “freestyle” Computer Demo (via @b6n).
A reminder that market adoption is king…

http://research.swtch.com/2011/01/mos-6502-and-best-layout-guy-in-world.html
Interesting article about old-school CPU design.

Web Development/Programming/Database

http://mootools.net/
Appears to be an interesting JavaScript framework.  I haven’t used this one yet.

http://www.basho.com/riakos.html
Yet another Dynamo-inspired “internet scale” distributed database thing.  Possibly interesting since its backend storage is pluggable.  It feels pretty similar to Cassandra from what I’ve read on the wiki.

http://ringce.com/hyde
Static content generator using Django and Python.  Could be interesting if you’re building something like Digg or Slashdot…

http://numpy.scipy.org/
Scientific computing module for Python.  I’ve not used it yet, but it seems like it should be very helpful for number crunching/statistics munging.

http://code.google.com/p/colorjizz/ (via @evilchili)
Interesting JavaScript color library with a terrible name.

https://github.com/montylounge/django-mingus and http://blog.montylounge.com/2009/sep/24/apps-that-power-django-mingus/
Mingus looks to be a strong jumping-off point if you’re writing your own blog app in Django.

http://www.doughellmann.com/projects/virtualenvwrapper/
Wrapper for virtualenv, a very useful tool for containing your Python projects and their associated eggs.

http://joeloughton.com/blog/web-applications/sparklines-using-flot/
Sparklines are probably one of the coolest ways you’re not using to present data.

http://omnipotent.net/jquery.sparkline
Sparkline-specific jQuery graphing library.  Not sure I’d use it over flot at this point, but I may adopt it in the future.

Business

http://www.otbc.org/node/936 – 2010 Oregon Startup investment stats/listings.

Experiments in DRBD

Recently I’ve been setting up a pair of machines for web hosting duty.  As part of my high availability strategy, I’ve decided to mirror a Linux LVM logical volume from each host to the other.  This will allow me to fail services over between hosts without losing any data, with only a momentary blip while things shuffle around in the background.  Enter DRBD.

In a nutshell, DRBD allows you to mirror (both synchronously and asynchronously) block devices over standard network connections.  These mirrors are flexible including features like “Truck Based Replication” (for pre-seeding a host locally, loading it on a truck and syncing it back up at a remote site) and split brain detection.  It also looks to integrate well with things like Linux-HA and Pacemaker.

To get started, you first have to set up a resource file.  The default global DRBD configuration options were pretty sane on Fedora 13, but there are a couple of things worth noting:

/etc/drbd.d/global_common.conf:

  • global
  • common
    • protocol C; – This indicates a synchronous mirror. You can read about the other protocol options here
    • handlers
      • pri-on-incon-degr – These commands are run if the node is primary for a DRBD device, degraded, and the data is inconsistent. The default behavior is to call notify.sh, which emails an appropriate error message. It then emails another message notifying the sysadmin that the host is about to reboot, and performs an immediate reboot without shutting things down cleanly through the shutdown command.
      • pri-lost-after-sb – These commands are run after the host loses a split brain election.  The defaults here result in the same ungraceful reboot as pri-on-incon-degr.
      • local-io-error – If the local IO subsystem returns an error, the host is halted immediately without a clean shutdown.

It should be noted that the error conditions above will rely on the filesystem’s recovery mechanisms to maintain data integrity.  These events are serious enough that the underlying host may already be in a bad enough state that a standard “shutdown -r/-h now” wouldn’t return in a reasonable timeframe.

Getting Started

Now that you understand some of the more important defaults, let’s dig in and define a DRBD resource (or mirror).  This configuration will have to be put on both machines:

resource r0 {
    device          /dev/drbd1;
    meta-disk       internal;
    disk            /dev/volumegroup/logicalvolume;

    syncer { rate 50M; }

    on host1.blewtech.com     { address 192.168.2.1:7789; }
    on host2.blewtech.com     { address 192.168.2.2:7789; }
}

The addresses above are using network interfaces that are dedicated to DRBD.  If you’ve got fast drives you’ll probably even want to bond multiple Gigabit Ethernet interfaces together to maintain disk level performance. For my purposes I’ve set the syncer to rate limit itself to 50M in case there are other services utilizing the back channel interface DRBD will be using.
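To put the 50M cap in perspective, here's a quick back-of-the-envelope sketch. It makes a few assumptions:

- DRBD's "M" suffix means MiB per second (DRBD rates are in bytes, not bits).
- The 500 GiB volume and the 300 MiB/s disk figure are hypothetical.
- About 110 MiB/s of usable throughput per Gigabit Ethernet link is a rule-of-thumb assumption, not something I measured.

```python
import math

# Assumption: "rate 50M" in the syncer section means 50 MiB/s (bytes, not bits).
# Assumption: ~110 MiB/s usable per GigE link (1 Gbit/s is ~119 MiB/s raw,
# before protocol overhead).
USABLE_MIB_PER_GIGE_LINK = 110


def sync_hours(volume_gib, rate_mib_per_s):
    """Hours needed to push a full volume across at a fixed sync rate."""
    seconds = volume_gib * 1024 / rate_mib_per_s
    return seconds / 3600


def gige_links_needed(disk_mib_per_s, per_link=USABLE_MIB_PER_GIGE_LINK):
    """Bonded GigE links needed so the network doesn't bottleneck the disks."""
    return math.ceil(disk_mib_per_s / per_link)


if __name__ == "__main__":
    # Hypothetical 500 GiB logical volume at the configured and boosted rates.
    for rate in (50, 110):
        print(f"500 GiB at {rate} MiB/s: {sync_hours(500, rate):.1f} hours")
    # Hypothetical array that can stream 300 MiB/s.
    print("GigE links for 300 MiB/s of disk:", gige_links_needed(300))
```

With those made-up numbers, the initial sync takes a little under three hours at 50 MiB/s and roughly 1.3 hours at 110 MiB/s.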

Now you can get your resource ready for use:

drbdadm create-md r0

And finally bring the resource online:

drbdadm up r0

Your resource is now online and ready to go.  You can get the current status of drbd a couple different ways:

service drbd status
drbdadm status 

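If you run these on both hosts, you’ll notice that both have the resource in the Secondary role with Inconsistent data.  You’ll need to synchronize the mirror in order to make it usable.  Run this on only the host whose data should become the source, since it overwrites the copy on the peer: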

drbdadm -- --overwrite-data-of-peer primary r0
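Besides the status commands, DRBD 8.x reports per-device state in /proc/drbd, which is handy to watch while the initial sync runs. Below is a minimal Python sketch that pulls out the connection state, roles and disk states. It assumes the 8.3-style `cs:`/`ro:`/`ds:` line format, so compare it against your own /proc/drbd before relying on it.

```python
import re

# Assumed DRBD 8.3-style device line, for example:
#  1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cstate>\S+)\s+"
    r"ro:(?P<role>\w+)/(?P<peer_role>\w+)\s+"
    r"ds:(?P<disk>\w+)/(?P<peer_disk>\w+)"
)


def parse_proc_drbd(text):
    """Map each DRBD minor number to its connection state, roles and disk states.

    Lines that don't look like device lines (version header, counters) are skipped.
    """
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            info = match.groupdict()
            devices[int(info.pop("minor"))] = info
    return devices


def read_proc_drbd(path="/proc/drbd"):
    """Read and parse the live status file (only works on a host with DRBD loaded)."""
    with open(path) as fh:
        return parse_proc_drbd(fh.read())
```

Minor 1 corresponds to /dev/drbd1 from the resource definition, so `read_proc_drbd().get(1)` gives you that device's state.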

Since we set a maximum sync rate of 50M in the DRBD configuration, modern, faster drives will likely be held back.  We can tell DRBD to temporarily speed this up:

drbdsetup /dev/drbd1 syncer -r 110M

You can revert to the configured rate as follows:

drbdadm adjust r0

Other Useful Tidbits

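If you created the device but now need to fail it over to allow the other host to write to it, you have to first demote the primary host and then promote the other host.  Unmount any filesystem on the device before demoting, since DRBD won’t demote a device that’s in use:

Old Primary:
drbdadm secondary r0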

New Primary:
drbdadm primary r0
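If you fail over often, the demote-then-promote order is easy to script. Below is a minimal sketch under these assumptions:

- You have passwordless ssh to both hosts, as a user allowed to run drbdadm.
- The hostnames come from the resource definition above.

By default the script only prints the commands. It doesn't unmount filesystems or move services for you.

```python
import subprocess


def failover_commands(resource, old_primary, new_primary):
    """Return (host, argv) pairs in the order they must run.

    Demote first: in the default single-primary setup, DRBD refuses to
    promote a node while its peer is still Primary.
    """
    return [
        (old_primary, ["drbdadm", "secondary", resource]),
        (new_primary, ["drbdadm", "primary", resource]),
    ]


def run_failover(resource, old_primary, new_primary, dry_run=True):
    """Print (dry run) or execute each step over ssh, in order."""
    executed = []
    for host, argv in failover_commands(resource, old_primary, new_primary):
        cmd = ["ssh", host] + argv
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on failure, so we never
            # try to promote the new host after a failed demotion.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_failover("r0", "host1.blewtech.com", "host2.blewtech.com")
```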

Performance with DRBD is good as far as I’ve been able to discern.  Unfortunately I don’t have enough spindles handy to try pushing beyond a single GigE link.

Hopefully this article encourages you to give DRBD a shot.  Once you’ve got it in your arsenal I’m sure you’ll think of all sorts of interesting problems to solve with it.

Proactive vs. Reactive Monitoring

Inevitably, after you’ve got some important stuff running on your servers, you’ll discover that things aren’t working the way you expect, or perhaps aren’t working at all.  Sounds like it’s time to think about how you’re going to monitor that stuff you’ve spent so much time tending to.

At a high level, monitoring falls into two categories: proactive and reactive.  Reactive monitoring tells you about an event after it has taken place, while proactive monitoring gives you a heads-up some time before the event is likely to occur.  Reactive monitoring is generally low-hanging fruit once you’ve got a monitoring framework in place that fits your environment: you can set up alerts that a service is no longer available, a filesystem has filled up, etc.  If you’re paying attention, you can often build new proactive alerts based on reactive alerts that you’ve had to address…
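Here's one example of turning a reactive "filesystem is full" alert into a proactive one. This minimal sketch fits a straight line through recent disk usage samples and warns if the disk is projected to fill within a window. The sample format, the 48-hour window and the linear-growth assumption are all illustrative; real usage often grows in bursts.

```python
def hours_until_full(samples, capacity_gb):
    """Project hours until a filesystem fills, from (hour, used_gb) samples.

    Fits a least-squares line through the samples and extrapolates from the
    latest one. Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        return None
    growth_per_hour = num / den  # GB per hour
    if growth_per_hour <= 0:
        return None
    _, latest_used = samples[-1]
    return max(0.0, (capacity_gb - latest_used) / growth_per_hour)


def should_alert(samples, capacity_gb, warn_hours=48):
    """Proactive check: alert if the disk is projected to fill within warn_hours."""
    eta = hours_until_full(samples, capacity_gb)
    return eta is not None and eta <= warn_hours


if __name__ == "__main__":
    # Made-up samples: usage taken every 6 hours on a 500 GB filesystem.
    usage = [(0, 400.0), (6, 412.0), (12, 424.0), (18, 436.0)]
    print(hours_until_full(usage, 500), should_alert(usage, 500))
```

The same reasoning works for anything that trends toward a hard limit, like inode counts, connection pools or log volume.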

Good proactive monitoring grows out of root cause analysis.  By paying attention to the log messages and performance metrics that you’ve collected before and during an event you can often create proactive alerts that can clue you into a problem before it becomes serious.  It’s also important to sift through this data with team members so you can share knowledge and troubleshooting techniques.  Different folks interpret error messages differently (and not always correctly).  Here are some tips for making the knowledge transfer and analysis go smoothly:

  • Draw out a timeline of events.  Start the timeline in the middle of the whiteboard because you’ll likely go back farther to find the root cause than you might initially think.
  • Get a meeting room with a projector to make it easy for everyone to see what you see (if possible).
  • Gather as much data as possible from the time period in question and from related systems.  Logs are generally your primary troubleshooting tool.
  • Divide and conquer the log review process among team members in the room (if possible).

Making group analysis a cultural norm can really help your team deal with future problems faster.  You can learn more from failure than from success.

Django fixtures with ManyToMany fields

This evening I learned how to set up a ManyToMany value in a fixture.  I ended up using the “dumpdata” feature of manage.py to figure out the syntax.

I’ve got a ManyToMany field called “district” on one of my models.  I wanted to link a model instance to one district object (district ID #1):

"district": [1]
If I wanted to add it to more than one district, it would look like this:
"district": [1, 2, 5]
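For context, here's how those values sit inside a complete fixture entry. This minimal sketch builds the fixture in Python and prints it as JSON. The `myapp.office` model name and its `name` field are hypothetical; `district` is the ManyToMany field from above, listed as primary keys of the related objects.

```python
import json

# Hypothetical app label and model name; substitute your own.
MODEL = "myapp.office"


def fixture_entry(pk, name, district_pks):
    """One fixture object; ManyToMany values are lists of related primary keys."""
    return {
        "model": MODEL,
        "pk": pk,
        "fields": {"name": name, "district": list(district_pks)},
    }


fixture = [
    fixture_entry(1, "Main Office", [1]),          # linked to district #1 only
    fixture_entry(2, "Branch Office", [1, 2, 5]),  # linked to three districts
]

if __name__ == "__main__":
    print(json.dumps(fixture, indent=2))
```

Save the output as a .json file in your app's fixtures directory and load it with manage.py loaddata. The referenced district objects need to exist by the time loading finishes, either already in the database or earlier in the same fixture.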

Ubuntu 10.04 nvidia Driver Screen Resolution Issue

Recently at work I installed Ubuntu 10.04 on a workstation under my desk.  After getting it installed, I couldn’t get the monitor working at 1920×1200 as it was designed to do, so I hunted around on the intertubes for a while and found an option that got it running correctly without lots of modeline weirdness.

In the video card ("Device") section of /etc/X11/xorg.conf, add the following line:

Option	    "ModeValidation" "NoMaxPClkCheck"

This disables pixel clock checks which probably only really matters for CRT displays (don’t quote me on that).

Why your daily team meeting sucks

Note: this is written from the “old-school” ops perspective.  Sprint meetings are kind of a different animal.

Your daily team meeting sucks.  You know it and your fellow team members know it.  Deep down it’s eating at you that you spend somewhere between 5 and 50 minutes in a room going over your hopes and dreams for the work day.  This can be especially disheartening if your coffee hasn’t kicked in and/or you’re coming down from that meth bender you were on earlier in the week.

Here’s why it sucks.

Nobody can stay on topic

You can’t and your boss can’t.  Since there’s no specific agenda for the meeting, discussions quickly go off on wild tangents.  You start out talking about a random task for a customer and suddenly you’re all bitching about how much you hate toe socks.  Maybe not toe socks specifically (though there’s not much to like about toe socks).

50% of the team isn’t ready for the meeting because they’re caught up with more pressing issues

Disks are on fire!  The customer needs their issue fixed ASAP (for reals)!  These things happen in the morning, especially if you’re working for a west coast company with east coast customers.  More than likely your team meeting will end up being right in the middle of the morning, when you’re caught up with these sorts of issues.  Sometimes you can sit it out if it’s a really urgent issue, but since nobody’s taking notes and everyone assumes that someone else will debrief you, the information shared in that meeting is likely lost.

The biggest reason your daily team meeting sucks?

It’s a coverup

Your daily team meeting is a cover for a lack of process around the work you’re doing.  If your workflow processes were well implemented and properly thought out, you’d know when to talk to your co-workers about interdependencies because the workflow would call them out.  It may be a project manager who makes sure the communication is in place, or it could be automatic ticket creation for canned or often-repeated tasks.

It’s not all bad though

Your DTM likely does do some good things though.  It’s an opportunity for team members to be in the same communication space (conference bridge, meeting room, whatever).  As good as your workflow is you still need face time with co-workers, especially if they work on the same kinds of things you do.

If you’re unable to fix the root problems, a daily meeting is a necessary thing.  The communication has to happen one way or another.

Generally speaking, meetings aren’t always the real problem, and meetings aren’t necessarily a solution.  Meetings are a means to an end, facilitating communication needed to get your “real work” done.

So go do it already.

WordPress fopen RSS issue w/ Solaris and CoolStack

I was getting errors like this when one of the WordPress modules was attempting to grab RSS updates:

WP HTTP Error: Could not open handle for fopen() to

After hunting around and trying to replace the Magpie RSS bits, I’m pretty confident the root cause was actually the PHP curl.so module not being enabled by default with Sun CoolStack.  This meant WordPress didn’t have a method with which to go grab the feeds, and after enabling it things were off and running again.

mysqldump – INSERTs too big

So recently I was attempting to migrate some rather large tables from one (slow) database host to another.  I was running mysqldump piped into a mysql client on localhost.  Unfortunately, I ran into a snag:

mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `SOME_TABLE` at row: 14913098

I had two things working against me in this situation.

  1. I was forking a mysqldump process for each table in the database, so I was running 100+ mysqldump processes at the same time.
  2. The host the data was dumping from was slow.

So since mysqldump returned the error, the issue seems to have originated on the host I was dumping from.  This is sometimes due to a max_allowed_packet issue, but max_allowed_packet was set at 16M on both hosts.  I also found this blog entry that sounded similar:
http://jeremy.zawodny.com/blog/archives/000690.html

Unfortunately, -q is enabled by default with --opt.  Foiled again!  I found some mentions of setting timeouts really, really high on the database server, which made me think, “What if the host is so slow it’s not able to return data before the session timeout is hit?”  So how do I make mysqldump return data more often?

I started playing with the different options.  Adjusting max_allowed_packet still produced large INSERTs.  Setting --skip-extended-insert would give me small INSERTs (one row per statement), but that could more than double my migration time (already projected to be several days).  Then I found this only slightly documented option:

--net_buffer_length

The default setting seems to be 1M in my installation, so setting this down to 128K or 64K will reduce the size of each generated INSERT.  It also means data is flushed out to the client more often, which avoids having to set the timeouts obscenely high.  Even if something is really causing the source database host to crunch and return data slowly, we’ll probably return data often enough to avoid hitting the timeouts.  If you’ve got rows bigger than what you set net_buffer_length to, mysqldump is smart enough to adjust the buffer for that row, so you won’t get a partial result.
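To see why a smaller --net_buffer_length means smaller, more frequent writes, here's a simplified model of an extended-INSERT writer. It is not mysqldump's actual code. It starts a new statement whenever adding the next row would push the current one past the buffer limit. The rows are made-up, pre-rendered strings, the table name is hypothetical, and characters are treated as bytes. A row bigger than the limit still goes out whole, in a statement by itself.

```python
def batch_inserts(table, rows, net_buffer_length):
    """Pack pre-rendered row tuples into extended INSERT statements.

    A new statement is started whenever appending the next row would push the
    current one past net_buffer_length (characters treated as bytes here).
    An oversized row is never split; it ends up alone in its own statement.
    """
    prefix = f"INSERT INTO `{table}` VALUES "
    statements, current, length = [], [], len(prefix)
    for row in rows:
        added = len(row) + (1 if current else 0)  # +1 for the comma separator
        if current and length + added > net_buffer_length:
            statements.append(prefix + ",".join(current) + ";")
            current, length, added = [], len(prefix), len(row)
        current.append(row)
        length += added
    if current:
        statements.append(prefix + ",".join(current) + ";")
    return statements


if __name__ == "__main__":
    rows = [f"({i},'row {i}')" for i in range(10000)]  # made-up data
    for limit in (1024 * 1024, 64 * 1024, 1024):
        count = len(batch_inserts("some_table", rows, limit))
        print(f"{limit} bytes -> {count} statements")
```

Shrinking the limit trades a few more statements for much smaller chunks, and smaller chunks are what keep data flowing to the client on a slow source host.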

Sun 2540 Network Timeouts – RESOLVED

We recently took delivery of two 2540 storage arrays to be used with MySQL and ZFS.  These are great little boxes that offer a lot of bang for the buck.

After getting them on-site and online, I started seeing lots of dropped packets between the CAM host and other devices on the same VLAN whenever CAM was attempting to communicate with the arrays.  Initially we thought this was a bad edge switch, but as it turns out, it looks to be related to the bug described here:

http://sunsolve.sun.com/search/document.do?assetkey=1-66-240105-1

Originally we were only upgrading the arrays to the firmware that comes with the latest release of the 6.0.x branch, because a Sun tech was concerned about false positives on broken drives/controllers with the 6.1.whatever CAM software.  However, after running into this network issue, I upgraded CAM to 6.1.2.8.  I’m guessing that the issues he was concerned about are gone, since it’s been a few months since that ticket was opened.

To get the arrays with more disks to upgrade correctly, I actually had to connect both controllers to separate network ports on the back of the CAM host (on different subnets to avoid routing issues).  Even this didn’t work until I unregistered and re-registered the array within CAM.  What a pain!

I hope this helps someone else out there if they run into this issue.

Everyone into the (Server) Pool

So you’ve got a web server…you’ve started your new internet application and you’re going to strike it rich.  Awesome.  Next thing you know, you’ve grown and you need more web servers.  Congratulations!  You’ve probably built or purchased some sort of load balancer to put in front of your web servers.

Next thing you know, you’ve diversified and you want to run a few different apps, or if you’re lucky, a few different domains using the same codebase.  You think “I probably need a few new servers to run this hot new application”.  You’re probably right.  You may think “I should probably set up a new server pool/farm for this new domain/app/whatever”.  Think twice!

Additional server pools or farms need justification.  They’re additional overhead when you’ve probably already got enough on your plate.  Here are some problems with additional server pools:

Additional management overhead on your load balancer.  Put enough pools on there and you could actually increase latency and/or decrease throughput.  Sure, with modern hardware this isn’t a huge deal.  You might need a thousand pools before you start seeing a reduction in performance, but if you have a really complex rule or transform set applied to URLs for a pool, this could come to a head sooner.

Additional management overhead on your servers.  Keeping track of which servers are running which domain or which application can be difficult, especially if your growth has been organic.  Every customer facing web server should be able to run any domain you host.  This probably doesn’t apply to a major hosting company hosting different customer URLs since you may not get that much control over the application, but if you’re fortunate enough to have customers running something written to be domain/vhost aware, it’ll pay dividends later.

This methodology doesn’t apply to everything.  You may not want to mix presentation layer services with your business logic services unless they’re both pretty lightweight (though since you can squeeze 32GB of RAM into 1U these days, it may not be a big deal).  You’ll want to run your database on your front-end web servers even less.  If you have extra cycles to burn, this may be a way to get better utilization out of your machines, but it doesn’t work for everything.  When it comes down to it, you (should) know your application best.

Another point: make sure you understand how your application behaves.  Benchmarking will save you when you suddenly need to know how many more servers to buy.  Do certain things make the memory footprint expand dramatically?  Do some hits generate a lot of extra CPU load (maybe those URLs will need their own pool soon)?
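Once benchmarking has given you a per-server number, "how many servers do I buy" comes down to simple arithmetic. In this minimal sketch, the request rates, the 25% headroom and the single spare (so you can pull a box for maintenance) are all assumptions. Swap in your own figures.

```python
import math


def servers_needed(peak_rps, per_server_rps, headroom=0.25, spares=1):
    """Servers required so peak load uses at most (1 - headroom) of each box.

    per_server_rps should come from benchmarking your own application.
    spares are extra machines so one can be pulled for maintenance.
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps) + spares


if __name__ == "__main__":
    # Hypothetical: 1,000 req/s at peak, 200 req/s per server from benchmarks.
    print(servers_needed(1000, 200))
```

If some URLs are much heavier than others, benchmark them separately. A per-URL number that drags the whole pool's figure down is a hint that those URLs may need their own pool.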

In my experience, getting into this habit will make your life easier.  If you’ve got half a dozen domains and half a dozen servers, taking half the farm down for maintenance behind the scenes becomes much more trivial.