
DRBD in the real world.

I’ve noticed a few blog posts recently from people saying how great DRBD is as a failover mechanism for MySQL. My experience with DRBD has been the complete opposite. It offers almost no benefit over binary log replication for typical MySQL setups, and it prevents a few things that are possible with binary log replication.

Kaj Arnö has written an excellent blog post on the basics of DRBD. DRBD has one great feature that binary log replication doesn’t have: it can ensure that a write is synced to disk on two different hosts before allowing the application to continue. This is great for data redundancy, but it introduces the potential for instability in the setup. In a good failover setup, a problem on the backup master should never cause an issue on the primary master. With DRBD, a second master that falls behind (because of a degraded RAID array, a network issue, operator error, name your poison) causes problems on the primary master, because MySQL has to wait for writes to be synced to disk on _both_ machines before continuing. I know DRBD can operate in three different protocol modes, but protocol C is really the only one that gives any extra data security over binary log replication, so it’s the one I’m focusing on. If an issue on one master causes problems on the other, the benefit of having redundant masters is effectively lost.
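
To make that blocking behavior concrete, here’s a toy Python sketch. This is not DRBD or MySQL code, and every name and number in it is made up purely for illustration; it just models the difference between a protocol C style commit, which waits for both hosts to sync, and an asynchronous binary-log-style commit, which only waits for the local sync.

```python
# Toy model, NOT DRBD or MySQL code: all names and timings are invented
# to illustrate why a slow peer stalls the primary under a synchronous
# protocol but not under asynchronous binary log replication.
import time


def commit_protocol_c(local_fsync_s, remote_fsync_s):
    """Protocol C style commit: acknowledged only after BOTH the local
    disk and the peer's disk have synced the write."""
    start = time.time()
    time.sleep(local_fsync_s)   # local write + fsync
    time.sleep(remote_fsync_s)  # wait for the peer to confirm its fsync
    return time.time() - start


def commit_binlog_replication(local_fsync_s):
    """Binary log replication style commit: acknowledged after the local
    fsync; the slave applies the binary log on its own time and simply
    lags if it is slow."""
    start = time.time()
    time.sleep(local_fsync_s)   # local write + fsync (data + binlog)
    return time.time() - start


if __name__ == "__main__":
    healthy = 0.01   # pretend a healthy box fsyncs in 10 ms
    degraded = 0.20  # pretend a degraded RAID array takes 200 ms

    print("protocol C, healthy peer:   %.0f ms"
          % (1000 * commit_protocol_c(healthy, healthy)))
    print("protocol C, degraded peer:  %.0f ms"
          % (1000 * commit_protocol_c(healthy, degraded)))
    print("binlog repl, degraded peer: %.0f ms"
          % (1000 * commit_binlog_replication(healthy)))
```

The point is simply that under protocol C every commit on the primary pays for the slowest disk in the pair, while with binary log replication a slow peer just falls behind.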

When DRBD, the operating system, or the hardware crashes, it crashes hard. Any corruption on the primary master [update 2008-05-19: above the DRBD layer] during a nasty failure gets happily propagated over DRBD. Binary log replication executes queries on the slave the same way they were executed on the master, giving a better chance that a kernel or filesystem bug tickled on one master won’t be tickled on the other. The primary master simply crashes, leaving the secondary master in a consistent state, ready to take on live traffic.

I’ve heard reports from clients running DRBD failover in the wild that bulk load operations over DRBD put enough load on the pair of masters that queries start timing out. I haven’t tested it directly, but the client is a reputable source. I’ve personally seen ALTER TABLE take much longer than normal (sorry for not having exact numbers, but it was more than 2x) and cause enough commit operations to stack up that system response time rose high enough to time out clients. That’s an outage caused by nothing more than a simple schema change.

I’ve saved the best for last. Since DRBD is a replicated block device, that block device can only be modified on one host at a time. With binary log replication and dual masters (one master hot), it’s possible to do most schema changes on the warm master, fail over, and then let the changes replicate to the previously hot (now warm) master, where they run again without interrupting clients. This is a great workaround for avoiding downtime during large ALTER TABLE operations.
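
Here’s a rough sketch of that procedure as I’d script it, assuming a dual-master pair where each host is a slave of the other. The host names, the example ALTER, and the run_sql()/move_traffic_to() helpers are all placeholders; the real traffic switch (VIP, proxy, or application config) is site specific.

```python
# Sketch of a rolling schema change on a dual-master pair. The helpers
# and host names below are placeholders, not a real deployment tool.

HOT = "db1"    # master currently serving client traffic
WARM = "db2"   # idle master, replicating from HOT (and vice versa)


def run_sql(host, statement):
    """Placeholder: execute a statement against the given host."""
    print("[%s] %s" % (host, statement))


def move_traffic_to(host):
    """Placeholder: repoint clients (VIP/proxy/app config) at `host`."""
    print("clients now writing to %s" % host)


if __name__ == "__main__":
    # 1. Keep the ALTER away from the hot master for now by stopping
    #    its replication SQL thread.
    run_sql(HOT, "STOP SLAVE SQL_THREAD")

    # 2. Run the long ALTER on the warm master; clients never see it.
    run_sql(WARM, "ALTER TABLE orders ADD COLUMN notes TEXT")

    # 3. Once the warm master has caught up on replicated client
    #    writes, fail traffic over to it.
    move_traffic_to(WARM)

    # 4. Let replication deliver the ALTER to the previously hot
    #    (now warm) master, where it runs without blocking clients.
    run_sql(HOT, "START SLAVE SQL_THREAD")
```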

DRBD offers slightly more data redundancy than normal RAID configurations, at the cost of a less stable, less operationally friendly system. I see it as a great stopgap solution for applications that have no ability to do replication on their own. Since MySQL has that ability, we should be focusing on hardening and optimizing replication (checksummed events, please!) instead of finding ways around it.

If you have a DRBD failover setup and want to get rid of it, or want help setting up a proven system based on binary log replication, drop me a line via the contact form on provenscaling.com or email me directly at eric@provenscaling.com. Flame in the comments.