While attending Lars Thalmann’s session on Row Based Replication I began to think about how the new features will allow us to do creative new things with MySQL Replication. For example we can now issue an update query with a join and have one slave update only rows from table a and another slave only update rows from table b. This is very powerful but also very dangerous.
My experience has been that the average php developer can write average sql but doesn’t tend to think about the overall impact on a system when writing code. This is where system architects apply. A system architect can view an entire system and design it’s components to scale well, be fault tolerant and easily maintained. An application level developer will implement the system based on the design. Replication is a very deceptive convergence between the architect role and the developer role because single queries written by a developer if written incorrectly can break replication. Sometimes the break it noticeable sometimes it isn’t for several months.
Most developers see the database as a single entity. After some work they can begin to see it as two entities, a master and a slave. They can grasp that a read from the slave might not contain information that was just inserted on the master and if they need that information they should go directly to the master. Most developers fight the term ‘critical read’ by trying to send most reads to the master. This manifests itself in a very busy master and mostly idle slaves and is easily corrected. I’m not trying to say that architects are smarter than developers simply that they are different people and have different views of the system. It’s difficult to think about the overall view of the system while trying to focus on a small module of code.
Options like replicate-do-db, and replicate-ignore-db force developers to think about more system level aspects while they are writing code. Now instead of just master and slave they have to think about master, slave with db A only, slave with db B only. This greatly increases the chance of a developer making a mistake when writing code. The more complex the architecture is the more the developer has to keep in mind when writing code for it.
For many years MySQL experts have been advising against the use of replicate-do-db and replicate-ignore-db because it required the developer to think about how the default database affected the replication of queries. Use of these options are the frequent cause of problems because they force developers to think more about the overall system while writing individual queries than they normally would. Developers would forget what the default database is set to, write queries that should replicate and wouldn’t and wonder why. The slaves then miss updates and bad things happen.
Now we have row based replication. Row based replication is basically a hook into the handler interface that captures handler events such as insert, update, and delete and writes them out to the binary log. This gives us powerful new ways to speed up replication and make slaves more exact replicas of their masters. The problems with auto_increment, timestamps, stored procedures, and triggers vanish, statements that update very few rows are now blazing fast and a whole new world of optimization has opened up. With the restriction of statement execution order removed we’re now free to update multiple tables at the same time provided that updates to a single table aren’t done out of sequence. I’m not sure if these optimizations exist yet but they hopefully they will (multiple execution threads — hint hint Lars ) With this new technology MySQL replication will be more powerful and more flexible than ever. This comes with a price, the previous gun you could use to shoot yourself in the foot has been taken away and replaced with a cannon. Now not only can you shoot yourself in the foot with replication configurations you can blow your whole leg off.
Row based replication gives you the option to filter changes based on individual tables. Even individual tables in the same query. An update with a join can change rows in two tables in the same query. With row based replication the changes to the individual tables are replicated to slaves as individual events, they are no longer coupled together by the query. This means that a slave can choose to ignore events for a single table. What this means to the developer is that they have to think about the replication settings for each slave for every single DML query they write. That’s an easy things to forget given the mass of other things developers have to keep track of.
I think row based replication is a huge step forward for mysql and that it will eventually take over as the preferred method of replication (I’ll try to do another post on row vs statement based replication. This post is getting a bit long). Beware when deploying it that you don’t create a trap for your developers. Keep things simple and the design straight forward and everything will be fine.