DRBD in the real world.

I’ve noticed a few recent blog posts from people saying how great DRBD is as a failover mechanism for MySQL. My experience with DRBD has been the complete opposite. It offers almost no benefit over binary log replication for typical MySQL setups and prevents a few things that are possible with binary log replication.

Kaj Arnö has written an excellent blog post on the basics of DRBD. DRBD has one great feature that binary log replication doesn’t have: it can ensure that a write is synced to disk on two different hosts before allowing the application to continue. This is great for data redundancy, but it introduces potential for instability in the setup. In a good failover setup a problem on the backup master should never cause an issue on the primary master. With DRBD, the second master lagging behind because of a degraded RAID, a network issue, operator error, or name your poison causes issues on the primary master, because MySQL has to wait for writes to be synced to disk on _both_ machines before continuing. I know there are three different protocol modes that DRBD can operate in. Protocol C is really the only one that gives any extra data security over binary log replication, so it’s the one I’m focusing my attention on. If an issue on one master causes problems on another, then the benefit of having redundant masters is effectively lost.
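
For reference, the protocol mode is chosen in the DRBD resource definition. Here is a minimal sketch of an 8.x-style resource; the hostnames, devices, and addresses are made up, not taken from any real setup:

```
# Illustrative DRBD 8.x resource. Protocol C means a local write only
# completes after the peer confirms the data has reached its disk.
resource mysql {
  protocol C;
  on db1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on db2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

With protocol A or B the primary considers a write complete earlier (once it is on the local disk and handed to the network, or once it reaches the peer’s memory), which loosens the coupling between the hosts but also gives up the extra durability that is the main argument for DRBD in the first place.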

When DRBD, the operating system, or the hardware crashes, it crashes hard. Any corruption on the primary master [update 2008-05-19: above the DRBD layer] during a nasty failure gets happily propagated over DRBD. Binary log replication executes queries on the slave the same way they were executed on the master, giving a better chance that a kernel or filesystem bug tickled on one master won’t be tickled on the other. The primary master will simply crash, leaving the secondary master in a consistent state, waiting to take on live traffic.

I’ve heard reports from clients who run DRBD failover in the wild that bulk load operations over DRBD put enough load on the pair of masters that queries start timing out. I haven’t directly tested it, but the client is a reputable source. I’ve personally seen ALTER TABLE take much longer than normal (sorry about not having exact numbers, but it’s more than 2x) and cause enough commit operations to stack up that system response time climbs high enough to time out clients. That’s an outage caused by nothing more than a simple schema change.

I’ve saved the best for last. Since DRBD is a replicated block device, that block device can only be modified on one host at a time. With binary log replication and dual masters (one hot), it’s possible to do most schema changes on the warm master, fail over, and let the changes replicate over to the previously hot (now warm) master, where they are run again without interrupting clients. This is a great workaround for avoiding downtime during large ALTER TABLE operations.
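
A minimal sketch of that workflow, assuming a dual-master pair db1 (hot) and db2 (warm) with replication running in both directions; the hostnames and the ALTER statement are illustrative:

```sql
-- 1. On db1 (hot): pause the replication applier so the schema change
--    coming from db2 does not run here while clients are still writing.
STOP SLAVE SQL_THREAD;

-- 2. On db2 (warm): run the schema change. It lands in db2's binary log
--    but is not yet applied on db1.
ALTER TABLE orders ADD COLUMN shipped_at DATETIME NULL;

-- 3. Fail the application over to db2 (VIP, proxy, or config change).

-- 4. On db1, now the warm master: resume replication so the ALTER replays
--    there out of band while db2 serves live traffic.
START SLAVE SQL_THREAD;
```

The same idea works host by host with SET sql_log_bin = 0 if you would rather run the ALTER manually on each side instead of letting it flow through the binary log.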

DRBD offers slightly more data redundancy than normal RAID configurations, at the cost of a less stable, less operationally friendly system. I see it as a great stopgap solution for applications that have no ability to do replication on their own. Since MySQL has this ability, we should focus on hardening and optimizing replication (checksum events, please!) instead of finding ways around it.

If you have a DRBD failover setup and want to get rid of it, or want help setting up a proven system based on binary log replication, drop me a line via the contact form on provenscaling.com or email me directly at eric@provenscaling.com. Flame in the comments.

15 Comments

  1. Florian Haas says:

    Disclosure: I work for LINBIT, the company that drives the development of DRBD — and also provides commercial support for it, hint hint. :-) But what follows is my personal opinion, not company policy.

    “With DRBD, the second master lagging behind because of a degraded RAID, a network issue, operator error, or name your poison causes issues on the primary master, because MySQL has to wait for writes to be synced to disk on _both_ machines before continuing.”

    Operator error is something you’re unlikely to ever eradicate by means of technical solutions. Network issues you’ll address with network resilience (switch redundancy, bonding) and network load planning. Degraded RAID: replace the faulty disk. Finally, configure your cluster manager so it autodetects failures and migrates resources and/or fences out nodes (Heartbeat does this quite nicely).

    “Any corruption on the primary master during a nasty failure gets happily propagated over DRBD.”

    You’re entirely correct. It’s a block device. It doesn’t have a clue as to what happens in the layers above. But on the other hand it’s fast, it’s flexible, it’s completely workload agnostic, and it’s transaction safe.

    “I’ve heard reports from clients who run DRBD failover in the wild that bulk load operations over DRBD put enough load on the pair of masters that queries start timing out. I haven’t directly tested it, but the client is a reputable source.”

    Hmmm. I’d love to see that system to see what’s going wrong there.

    “With binary log replication and dual masters (one hot), it’s possible to do most schema changes on the warm master, fail over, and let the changes replicate over to the previously hot (now warm) master, where they are run again without interrupting clients. This is a great workaround for avoiding downtime during large ALTER TABLE operations.”

    Well if you’re doing schema changes all the time in your application, and you are observing a performance penalty on such operations when replicating over DRBD, then perhaps DRBD indeed isn’t the right solution for you. But I wonder if that’s true for the bulk of deployed database applications out there.

    Having said all that, you raised some important concerns, and we’ll be working together with the MySQL guys to address them in a couple of sessions in Santa Clara. Thanks a lot for your input!

  2. Eric Bergen says:

    The main point of this post is to show that DRBD isn’t the right solution for most MySQL failover setups, not that DRBD is a bad product.

    Schema changes are a normal part of life with MySQL. They can be reduced for the most part, but new requirements for applications usually mean schema changes, so it’s important for a failover solution to be able to handle them out of band. DRBD can’t do that.

  3. [...] Bergen posted an interesting entry on his blog this morning about DRBD. He makes some very good points, but I believe leaves out some [...]

  4. They could be compared based on the delay added to commit, if MySQL were to implement synchronous replication. They can be compared on the amount of network IO done. Does DRBD use more or less network IO than the typical MySQL replication stream?

  5. I am setting up an HA and load-balanced cluster for HTTP and MySQL. I like the idea of DRBD replication, but can I have the two MySQL servers load balanced, both running on DRBD-synchronized drives?

  6. Eric Bergen says:

    Mark,

    DRBD with InnoDB uses much more network I/O than binary log replication with this application. The binary log replicated slave uses about 50K/s of network read. The DRBD slave at the same time uses about 1800K/s, roughly 36 times as much.

  7. Julio Leiva says:

    Well, well, I just began playing with DRBD version 8.2.1 on a couple of SUSE 10.1 Linux boxes.

    I have a PostgreSQL DB running at 900 TPS, and so far so good. I’ve been able to go back and forth between the two, and all my data has been replicated properly. We have not experienced any loss in our TPS compared to when we were not using DRBD.

    I think we will continue playing around with it before we decide to go live.

  8. [...] The session is opening up talking about failover. The shared disk in this case is DRBD. DRBD is a fine product for replicating block devices of single-disk systems. It’s made redundant by RAID and doesn’t provide as much protection as binary log failover. You can find my notes on why I don’t recommend DRBD for MySQL in DRBD in the real world. [...]

  9. Hi Eric,
    I just wanted to comment that you can combine the use of MySQL Replication and DRBD/Heartbeat to get the best of both worlds: quick failover to a synchronous cold standby, and a warm slave that will let you do large table operations without much in the way of downtime.

  10. [...] DBA Dojo: Category: MySQL xaprb.com: How to sync tables in a master/master MySQL ReplicationAsk Bjoern Hansen: DRBD and MySQL MySQL Performance Blog MySQL HA Blog HowToForge: MySQL 5 Master/Master Replication on Fedora 8 Mark’s IT Blog: MySQL5 High Availability with DRBD 8 BobCares: High Availability Hosting with DRBD Eric Bergen: DRBD in the Real World [...]

  11. [...] to sum up yes we’ve seen all this before, and surprise surprise Eric Bergen’s “DRBD in the real world” has been quoted in that post as well. Now while I concede that some of the points Eric had made [...]

  12. MySQL and DRBD, Often say NO :)

    Florian is replying to James on the subject of using DRBD for MySQL HA, a discussion started earlier by Eric. Florian is refuting most of the arguments that James has against using MySQL and DRBD together.

    I’m also saying NO to MySQL and DRBD in mo…

  13. I think DRBD is the choice when there are few options, such as when you can’t afford losing any single transaction. If this is not the case, I think MySQL Replication is preferable for many reasons.

  14. [...] these days we see a lot of posts for and against (more, more) using MySQL and DRBD as a high availability [...]

  15. Yosef Coelho says:

    I strongly disagree with that “schema changes are a normal part of life with MySQL.” NO, that’s a normal part of life of disorganized development.

    Schema changes MUST be part of an organized and structured “new version” which has been extensively tested on a replica server, etc. You run the schema change on the hot one at a planned time, with the whole team ready and with escape plans in place, at the lowest-demand hour.

    I’ve worked with a lot of crazy “developers” who promote “constant schema changes” as a “normal” practice. This is just a plain lack of organization, project management, QA, etc. On the fly = low-level work.
