Archive for 14th April 2008

Replication tutorial notes – part 2

This is a continuation of the MySQL User Conference replication notes part one.

The session is opening up talking about failover. The shared disk in this case is drbd. DRBD is a fine product for replicating block devices of single disk systems. It’s made redundant by raid and doesn’t provide as much protection as binary log failover. You can find my notes on why I don’t recomment DRBD for MySQL in drbd in the real world.

Lars went a bit quick through the other two configurations. I’ll try to review the slides and post comments.

The next configuration is using federated. The federated storage engine has many problems that make it almost useless for any production deployment. Mats says, “Federated isn’t the fastest engine in the world”. That’s an understatement. Join on two tables as they describe it is almost impossible. Aside from the performance issues this is my favorite limitation of the federated engine, “There is no way for the FEDERATED engine to know if the remote table has changed. The reason for this is that this table must work like a data file that would never be written to by anything other than the database system. The integrity of the data in the local table could be breached if there was any change to the remote database.”

Lars mentions Jeremy’s failover design that is recommended by Proven Scaling. Thanks Lars!

There is a lot of confusion about the difference between row based and statement based replication. Hopefully my row based replication post can clear up some of the confusion.

For keeping things simple and easy to debug I recommend sticking with either row or statement based replication. Teaching the implications of each to application developers is going to be more difficult than sticking with one model. The exception is things that can’t use row based replication like DML statements.

When using a SAN make sure you have redundant SANs, your backups aren’t on the same SAN as mysql and that if you have to use a shared san you have good control over the resources. MySQL is very sensitive to i/o latency increases. If someone else does a large operation on the SAN it can increase the i/o latency in mysql causing operations to take longer which can bring down your application.

I don’t recommend using a hardware load balancer to manage write load between masters. There is a risk of sending writes to both masters at the same time. In a properly configured dual master setup half the writes will be rejected on a read only error. The worst case is that writes go to both masters causing replication error and inconsistent data.

The tutorial is wrapping up now. I look forward to using row based replication 5.1. Thanks Lars and Mats.

Replication tutorial notes – part 1

I’m attempting to live blog corrections and notes while sitting in the replication tutorial. Lars is covering available options in MySQL replication. I’m going to attempt to cover some recommended best practices and things that are possible to do in MySQL but should be avoided. Please keep in mind that I’m writing this during the presentation. If anything is confusing post a comment and I will clean it up.

When designing a MySQL architecture that are several possible configurations. Two that should be avoided are dual master where you write to both masters. Configuring replication in a dual master dual writer setup means there is no single authority on the data. There is also no need to write to both masters as this doesn’t give you any performance improvement. Each master has to process the same sql statements. One step further is circular replication that wasn’t mentioned in the talk but has been in other publications. When using three or more masters if one dies there is no way to restore it without brining down the other masters.

There are several configuration options for filtering data on the slaves. You can filter but the current database or by tables. My recommendation is to keep the data on the slave the same as the master as much as possible. When the slave is different you’re much more likely to have queries succeed on the master but fail on the slave. The worst of these options is replicate-do-db. This option filters queries based on “USE db”. If the use db is set differently on the master the query will succeed but can be passed through the slave without executing. This is a silent failure and won’t cause replication to stop.

A quick note about show binlog events. This is similar to mysqlbinlog in that it dumps events in the binary log. The difference is that show binlog events doesn’t include set variables.

When using relay slaves it’s very important to always configure them in pairs. If a relay slave dies there is no good way to to connect it’s slaves to the relay slave’s master. Using blackhole can cause silent failures to be passed through as well as issues with auto increment and storage engine differences. Blackhole should always be avoided.

Reset slave should function the way it does on the slide in that it should delete everything and leave the slave in a blank state. It actually deletes the logs and resets the host, user, and password. The functionality has been changed a few times in different version so be sure to check the manual. The differences are in that mysql may keep the old host information in memory or it may forget.

Purge master logs was covered briefly. It should be noted that purge master logs won’t purge logs that a connected slave hasn’t downloaded. If a slave is disconnected the master will happily delete logs that a slave hasn’t downloaded yet. I recommend keeping at least 7 days of logs on the master for situations where slave gets disconnected.

Now we’re into the section of specialized slaves. These slaves have a subset of data on the master. Avoid this as much as possible see my row based replication post about half way down for the reasons.

The slide for HA + Scale out has dual master with all of the slaves hanging off of one master. It’s better to have slaves hanging off each master because if the active master dies hard enough that you can’t get binlog offsets from it then there isn’t a good way to change slaves over to the existing master. Balancing your slaves will make it easier to failover. Ah excellent someone up front just pointed this out.