Archive for the 'MySQL' Category

tests and mysql-test sub directories

Sunday, August 12th, 2007

Recently I’ve been making more modifications to the MySQL source. Part of making modifications is testing them. For years I’ve been curious why there are both tests and mysql-test sub directories in the source tarballs. Arjen Lentz tells me that tests is old and that mysql-test is the current testing framework. This makes sense since the manual testing pages are all about mysql-test.

Are there plans to clean up the remnants of the old testing framework? What else is there in the tarball that’s outdated and needs to be removed?

set sql_log_slow=0 to control the slow query log.

Tuesday, July 3rd, 2007

Currently there isn’t a good method to control sending individual queries to the slow query log. Typically this isn’t an issue. It becomes an issue when using very short query times and importing sql files. The slow query log doesn’t have a limit on the length of queries it will write to a log. If you’re importing a sql file with inserts that get sent to the log mysql will dump the entire insert query. This can cause the slow query log to grow to gigs in size in a very short time.

I’ve patched mysql 5.0.41 to add a session variable called sql_log_slow (think sql_log_bin) that when set to 0 will prevent queries from that session from being sent to the slow query log. The patch also updates mysqlbinlog to set that variable in it’s output. One issue with this patch is that the executable comment version in mysqldump is set to 5.0.41 but it will error when imported on any 5.0.41 server that doesn’t have this patch. If it’s merged the version in the comment will need to be bumped.

sql_log_slow is also handy for controlling noise in the slow query log. It can be used to prune down queries that are known to be slow so only true problem queries are logged.

Here is the patch. I’ll post this on internals soon.

How alter table locks tables and handles transactions

Monday, May 7th, 2007

I’ve talked to several people that have questions about how alter table works under the hood. They want to know how it handles locking tables why they can sometimes use a table during alter table and other times they can’t. Also why it’s so slow :)

First let’s look at the basic process alter table typically goes through.

  1. If a transaction is open on this thread, commit it.
  2. Acquire a read lock for the table.
  3. Make a temporary table with new structure
  4. Copy the old table to the temporary table row by row changing the structure of the rows on the fly.
  5. Rename the original table out of the way
  6. Rename the temporary table to the original table name.
  7. Drop the original table.
  8. Release the read lock.

The slowest part of the process is copying rows from the original table to the temporary table. For large tables this can take minutes to hours. There are a few optimizations built into this process. If the alter table query only renames the table then mysql doesn’t bother copying all the rows to a temporary table and just renames it. For most other things such as renaming columns, adding/dropping indexes, making columns nullable, changing the column default all require copying the entire table.

During the first 4 steps MySQL allows other clients to read from the table being altered. When alter table is done copying rows to the temporary table and is ready to rename it it changes the table lock. MySQL instructs all other clients currently reading from the table to close the table when they are done. While alter table is waiting for existing clients to finish reading from the table it prevents other clients from starting to read from the table. During this time selects will be blocked on “Waiting for tables”. When the last client is done reading from the table alter table continues renaming the table.

This has some interesting implications for transactions and repeatable reads. Internally innodb keeps track of when rows are created. When a transaction is started it can only see rows that were created before the transaction was started (using repeatable read). Any rows created after are not returned. Since alter table copied rows from the old table to the new table rows get a new version number as they are inserted into the temporary table. An alter table can cause a dirty read in transactions that span an alter table. Transactions started before alter table will get no rows back from the table after alter table is finished. If your application is sensitive to dirty reads or getting no rows back from a table (really dirty read :) ) then don’t run alter table on a server when clients are running.

Here is an example.

mysql a> alter ignore table t add unique index (t);

mysql b> begin;
Query OK, 0 rows affected (0.00 sec)

#This select is from the original table while alter table is copying rows
mysql> select * from t limit 10;
+——+
| t |
+——+
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
| 10 |
+——+
10 rows in set (0.00 sec)
#alter table finishes
#Rows created in the temporary table before we issued begin
mysql> select * from t limit 10;
+———-+
| t |
+———-+
| 10 |
| 6920631 |
| 27998430 |
| 41865298 |
| 49403894 |
+———-+
5 rows in set (0.00 sec)

#Get a new view of the table
mysql> commit;
Query OK, 0 rows affected (0.00 sec)

#This select returns all the rows.
mysql> select * from t limit 10;
+———–+
| t |
+———–+
| 10 |
| 6920631 |
| 27998430 |
| 41865298 |
| 49403894 |
| 50522347 |
| 84441015 |
| 109401269 |
| 110202688 |
| 123590778 |
+———–+
10 rows in set (0.00 sec)

If we start the transaction before alter table the select after alter table has finished will return no rows even though there are rows in the table. I’m not sure why innodb allows rename of a table when transactions still have a few of that table open. It seems like a bug to me.

[Updated 2007-05-14 Bug #28432

select sql_cache_ttl

Monday, April 23rd, 2007

I’ve been thinking about the query cache since last years user conference. One of the features of the query cache is that it’s completely transparent to client. It achieves this by clearing cache entries for tables as soon as the tables are updated. This makes the cache inefficient for tables that are constantly updating.

While thinking about this also thought about slaves and non critical reads. In a replication setup reads that are sent to slaves expect to get data back that might not be the most current. In this situation it doesn’t make sense to expire the cache for every single update because queries running against the slave already know that they are going to get back slightly old data. I thought, “Why not add a time to live to cache entries instead of clearing them for every update?” So, I did.

I added an option to select called sql_cache_ttl. This option instructs the query cache hold the entry in the cache for query_cache_ttl seconds. query_cache_ttl can be set globally or per thread. Initial testing shows that it does help eliminate the cache thrashing that occurs during updates on tables that have many large cache entries.

I have a few more optimizations that I want to add to the patch before sending it into MySQL. The patch below is a preview that works with 5.0.37.

The patch.

Where will I be at the UC?

Saturday, April 21st, 2007

Do you want to say hi? Pick up a free bottle opener (I haven’t seen them yet but Jeremy says they are high quality)? Sway my opinion on DRBD? Most of my time at the user conference this year will be spent at the Proven Scaling corner of the Solid booth. I also have the privilege of participating in Meet the Experts “Replication in the Real World” event at the Solid booth during lunch on Wednesday.

DRBD in the real world.

Monday, April 2nd, 2007

I’ve noticed a few blog posts recently about people saying how great DRBD is as a fail over mechanism for MySQL. My experience with DRBD has been the complete opposite. It offers almost no benefit over binary log replication for typical MySQL setups and prevents a few things that are possible with binary log replication.

Kaj Arnö has written an excellent blog post on the basics of DRBD. DRBD has one great feature that binary log replication doesn’t have. It can ensure that a write is synced to disk on two different hosts before allowing the application to continue. This is great for data redundancy but it introduces potential for instability in the setup. In a good fail over scenario a problem on the backup master should never cause an issue on the primary master. With DRBD the second master lagging behind because of a degraded raid, network issue, operator error, name your poison causes issues on the primary master because MySQL has to wait for writes to be synced to disk on _both_ machines before continuing. I know there are 3 different protocol modes that DRBD can operate in. Protocol C is really the only one that gives any extra data security over binary log replication so it’s the one I’m focusing my attention on. If an issue on one master causes problems on another then the benefit of having redundant masters is effectively lost.

When DRBD, the operating system, or hardware crashes it crashes hard. Any corruption on the primary master [update 2008-05-19: above the DRBD layer] during a nasty failure gets happily propagated over DRBD. Binary log replication executes queries on the slave the same way they were executed on the master giving a better chance of a tickled kernel/filesystem bug on one master won’t be ticked on the other master. The primary master will simply crash leaving the secondary master in a consistent state waiting to take on live traffic.

I’ve heard reports from clients that run DRBD fail over in the wild that do bulk load operations over DRBD puts enough load on the pair of masters that queries start timing out. I haven’t directly tested it but the client is a reputable source. I’ve personally seen alter table take much longer than normal (sorry about not having exact numbers but it’s > 2x) and cause enough commit operations to stack up to cause the system response time to go up high enough to time out clients. This is an outage caused by nothing more than a simple schema change.

I’ve saved the best for last. Since DRBD is a replicated block device, that block device can only be modified on one host a time. With binary log replication and dual master (one master hot) it’s possible to do most schema changes on the warm master, fail over, let the changes replicate over to the previously hot (now warm) master where they are run again without interrupting clients. This is a great workaround for not taking down time during large alter table operations.

DRBD offers slightly more data redundancy than normal raid configurations with the cost of a less stable less operationally friendly system. I see it as a great stop gap solution for applications that have no ability to do replication on their own. Since MySQL has this ability we should be focused on hardening and optimizing replication (checksum events please!) instead of finding ways around it.

If you have a DRBD fail over setup and want to get rid of it or want help setting up a proven system based on binary log replication drop me a line via the contact form on provenscaling.com or email me directly eric@provenscaling.com. Flame in the comments.

Don’t reference auto increment ids outside of mysql.

Tuesday, March 20th, 2007

It’s a common practice to use MySQL’s auto increment feature for things like product ids or invoice numbers. While it always works when the system is first setup it can cause major headaches over time. The problems usually fall into one of three categories, dual master, test/staging setups, and reconciliation of ids.

MySQL in a dual master configuration requires that each master not be able to assign the other masters id. Replication events that use auto increment are tagged with the id that was allocated for that transaction. If you run mysqlbinlog on a bin log it will show something like SET INSERT_ID=123; This ensures that auto increment events are replicated to slaves correctly.

In a dual master configuration each master will try to allocate the same id, replicate the event to the other master where that id is already taken. This causes replication to stop on a duplicate key error. In MySQL 5 there are the auto_increment_increment and auto_increment_offset options that attempt to alleviate this situation by letting masters allocate unique ids. More details on how they work can be found in the manual. Using these options wastes key space so you will end up using larger data types for increment ids making MySQL less efficient. if you really need dual master (talk to me if you _really_ think you do, chances are you don’t) it works ok. A nasty problem rears it’s head when you start referencing these ids outside of MySQL. My favorite example is using auto_increment for invoice ids. Since two masters are allocating ids at offsets they aren’t contiguous like most accountants like. Instead of invoice numbers like 1001, 1002, 1003 you end up with 1011, 1021, 1022. depending on which master allocated which id when. Some might argue that this is a handy way to artificially inflate invoice ids but I think it’s a waste.

The testing/staging scenario usually causes reconciliation of ids. It’s more painful than dual master because fixing it often involves contacting partners with new ids, throwing away existing data or writing reconciliation scripts. What happens is that during staging data entry occurs. The ids generated from auto increment are associated with specific data entries and handed out. When the data is ready, it’s deployed to production. If production has also allocated new ids from previous data entry a collision occurs. It’s true that this can be avoided by good communication with product teams about how to use the ids but it’s better to just not reference auto increment ids outside of mysql in the first place. It also works the other way. Records from production need to be imported into a staging environment for testing purposes but staging has allocated those ids from previous tasks.

Reconciliation of ids, not counting the scenario above usually involves someone somewhere in the company that doesn’t like that the ids start with 1 instead of 0 or that when an item is deleted it’s id isn’t used again. Maybe ids should have started with 1000 instead of 100. Most of these can be fixed with simple updates, others require costly migrations away from auto_increment still others require dumping all the data, fixing the ids, and importing it again.

I’ve found it very difficult to convince people not to reference auto increment ids outside of their application. It seems so simple and clean cut on the surface. This entry was spawned from two different issues I helped people with today involving reference auto increment ids outside of MySQL and how to recover from the situation it caused. I’ve ran into this problem many many times in the past. Not referencing auto increment values outside of MySQL is a good best practices bullet point for any general MySQL presentation.

[Updated: 2007-03-21 I just talked to another person that is setting a default value of 1000 for auto increment so they can use it for ID card numbers.]

Client auto vertical output

Wednesday, March 7th, 2007

Do you hate it when you execute a query in the mysql cli only to have the output wrap three times making it almost impossible to read? Do you end up re-executing the query with \G waiting once again for the query to run? Well wait no more! My very first submitted patch to mysql is a client patch that adds –auto-vertical-output. This option compares the width of your terminal to the width of the result set and enables vertical output if it’s going to wrap. The patch link and details are Bug #26780

I’ve given up on the MySQL manual search

Thursday, January 25th, 2007

It’s simply too inaccurate to be useful. I’ve switched to using google with firefox bookmark keywords instead. Here is how to set it up. In firefox click on Organize Bookmarks under the Bookmarks menu. Click the New Bookmark button and fill in the following values.

  • Name: MySQL Manual
  • Location: http://www.google.com/search?hl=en&q=site:dev.mysql.com%20%s
  • Keyword: mysql
  • Description: Keyword bookmark for using Google to search the MySQL manual.

Click ok and close the bookmark menu. Now instead of typing mysql.com/foo or using the search box on mysql.com you can simply type mysql foo into the address bar. Firefox will use your keyword to load the location in the bookmark replacing %s with foo resulting in results from the mysql manual. This can be done with any search engine. Simply do a test search and replace the search term with %s for the bookmark url.

BACKUP/RESTORE TABLE lose auto increment

Friday, January 5th, 2007

One small fact about backup table and restore table that isn’t listed in the manual is that these commands lose the auto increment value if rows at the head of the table are deleted. For example if you have auto increment values of 1,2,3,4 in a table the auto increment value is 5. If you delete row 4 the next auto increment will still be 5. If you backup/restore the table the auto increment will be reset to 4. The auto increment value in myisam is stored in the MYI file. Since this file isn’t backed up myisam restores the auto increment value from the highest existing value in the table. This value may or may not be the actual value of auto increment. As the manual says these commands should not be used. Instead use mysqlhotcopy.