Archive for the ‘MySQL’ Category.

Update on porting table_statistics to MariaDB

Over the past few weeks I’ve been working on porting the per table and per session row stats from the tivo patch into mariadb. This is the show table_statistics functionality. I’ve been pushing the code to a branch on lanchpad as I get features working. Up to revision 2953 the row stats are working but the index stats aren’t. I think I have a reasonable implementation of the row stats code given my level of C++ skill. I’m changing the syntax a bit from how it worked in the tivo patch to make it be more compatible with how the original row stats code worked that was ported into mariadb.

In MariaDB 5.2 there are show table_statistics show index_statistics, and show user_statistics functions that return global stats collected by the server when the userstats variable is enabled. There are also associated information schema tables that feed these commands. So far I’ve changed the syntax of the show command to be

SHOW [ QUERY | SESSION | GLOBAL ] TABLE_STATISTICS

Where the default show table_statistics returns the global stats. This is different from the tivo patch against 5.0.72 which changed the default to be the last query. The new syntax keeps things backwards compatible with the original show command. The query keyword will return the previous query_statistics_history_size queries of statistics. This is a bit different from the 5.0 version which only tracked one query.

For example if I set query_statistics_history_size=5 then mariadb will track the row stats from the previous 5 queries:

mysql> set query_statistics_history_size=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show tables;
+—————-+
| Tables_in_test |
+—————-+
| t |
| t2 |
+—————-+
2 rows in set (0.00 sec)

mysql> select * from t;
+—+
| t |
+—+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+—+
5 rows in set (0.00 sec)

mysql> select * from t2;
+—-+
| t2 |
+—-+
| 1 |
| 2 |
| 3 |
+—-+
3 rows in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
2 rows in set (0.00 sec)

mysql> select count(*) from t,t2;
+———-+
| count(*) |
+———-+
| 15 |
+———-+
1 row in set (0.00 sec)

mysql> show query table_statistics;
+———-+————–+————+———–+————–+————————-+
| Query_id | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+———-+————–+————+———–+————–+————————-+
| 81 | test | t | 5 | 0 | 0 |
| 82 | test | t2 | 3 | 0 | 0 |
| 84 | test | t | 5 | 0 | 0 |
| 84 | test | t2 | 3 | 0 | 0 |
+———-+————–+————+———–+————–+————————-+
4 rows in set (0.00 sec)

Note that show query table_statistics shows the stats only for the current connection. There isn’t a way to get lists from other threads. My next task is to add a separate table with the query_id and query text similar to show profiles that can be used to correlate a query back to the different statistics kept for it. After that I plan on adding index statistics.

The older version of this in the tivo patch has been extremely useful for optimizing queries. I hope I can make this even better and get it into mariadb for other people to use. If you have any suggestions for how I should change this please post in the comments.

Rare MyISAM failing to delete MYD file for temporary tables.

I manage a few systems that every hundred million disk temporary table deletions or so one of them will fail. MySQL will delete the MYI file but leave the MYD behind. It’s very strange. There isn’t an error in the error log but subsequent queries that try to use the same temporary table name will error because the MYD file still exists. The queries fail with an error like ERROR 1 (HY000): Can’t create/write to file ‘/tmp/#sql_25d1_0.MYD’. Fortunately the client error gives the temporary table name so it’s easy to clean up by deleted the MYD file. While I still don’t know what the root cause is but I was able to patch MySQL to fix the issue.

The old mi_delete_table function which is responsible for deleting only the MYI and MYD file would delete the MYI file and leave the MYD file behind. This function would try to delete the MYI file, get an error back from the filesystem and not attempt to delete the corresponding MYD file. The strange part is that the file is actually deleted. The error is captured in mi_delete_table but is lost by the functions calling it. I’ve tried to patch some parts of the temporary table code to try to track down the actual filesystem error but haven’t seen any errors returned. I don’t think I’ve correctly patched all the temporary table code to capture the error. This hasn’t been a priority because the patch clears up the symptom of the issue which is the MYD file being left behind.

In my new mi_delete_table function I try to delete both the MYD and MYI file before returning an error. This has fixed the issue of MYD files being left behind but I still don’t know the root cause of why they weren’t being deleted in the first place. I’m not sure about filing a bug for this because even though I think my mi_delete_table is better than the stock one I can’t really reproduce the issue.

Anyway here is the patch if anyone is interested.

Second draft of the per session row and index stats patch

I’ve taken the part of the tivo patch that includes table and index statistics and broken it out into it’s own patch. This patch includes the ability to do show [session | global ] table_statistics and the same for index_statistics. In this version the row stats are also logged in the slow query log. To log per query stats I had to track them separately from the per session stats. Because the tracking was already done for the slow query log I’ve modified the command to allow uses to access row stats for the previous query separate from the sum for the current session. The flush commands also act similarly.

Along with changing the slow query log format I’ve also change it to log the timestamp with every query. This made it easier to do automated parsing of the slow query log.

The queries now support three different modes. I’ve detailed how the show commands work in a previous post. This patch has helped developers diagnose numerous query problems that weren’t obvious by the row estimates in the explain output. I highly recommend it for users who run more complicated queries in mysql and have been having a hard time diagnosing why they are slow. This has proved particularly adept at finding misbehaving sub queries and bad join optimization which doesn’t show up in explain output. As I find example queries I will try to post them here.

The patch.

max_allowed_packet replication and global variables

The max_allowed_packet variable is used to control the maximum size of a query sent to MySQL. It’s function is fairly well defined in the manual but there is a significant gotcha that exists when changing the size of max_allowed_packet while using replication.

When the replication threads are created the global max_allowed_packet value is copied in to the thread context like doing a set session command in the slave connection. This is done because replication enforces max_allowed_packet a bit differently than other threads. It accounts for both the size of the packet and the overhead of the replication header. This makes the max_allowed_packet enforcement accurate in replication but it means that the slave thread won’t account for set global max_allowed_packet=N until replication is restarted. It should be possible to do the same calculation while checking incoming packets but I haven’t looked into making a patch.

I think this has flown under the radar because as soon as the slave hits a query that is larger than max_allowed_packet the i/o thread dies. For example when increasing the max_allowed_packet on a master and slave without restarting replication then running a query larger than the old max_allowed_packet but smaller than the new size and replication will break on the slave. When the operator runs slave start the slave i/o thread will pick up the new global max_allowed_packet and die with:

Last_Error: Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave.

This really only means that the relay log was corrupted. The fix for a corrupted relay log is to do slave stop then a change master to back to the same Exec_master_log_pos and Relay_Master_Log_File from show slave status. Only these two fields need to be specified. The username, password, and host will be reused from the old connection. This will delete the old binary log and download a new one. The new slave threads will have the new max_allowed_packet and everything will move along.

I think there are two things to consider here. First is that mysql can behave in unexpected ways when setting global variables. The other is that it’s important to restart replication when increasing max_allowed_packet on a slave. I haven’t looked into writing a patch for this because it’s so easy to work around by remembering to restart replication.

How to be a MySQL DBA and the best MySQL book on the planet.

Recently there was a thread on the mysql mailing list discussing how to become a MySQL DBA. I’m not sure the MySQL DBA role exists in the same capacity that it does in Oracle. Historically the Oracle’s DBAs that I’ve met are focused purely on Oracle. They focus on maintaining Oracle based systems including managing migrations, upgrades, table space sizes and other tasks. MySQL DBAs tend to be filed in to two different buckets, people that work like developers and help with query optimization and people that work like sys admins and are focused on the operation of MySQL. There are very few people who can fill both roles and I think that’s why there are so many MySQL DBA jobs on the market. Companies are looking for one DBA when they should really be looking for two.

Jeremy’s post on how to hire a MySQL DBA is still true today. These people still don’t exist. I’ve noticed there are two groups of people with part of the skills needed to be a MySQL DBA. Good Oracle DBAs tend to be very well versed in SQL and query optimization. They’re good at working with developers to write queries that will play nice with the database. They have brains that think in sets of data and can handle complex query logic. The downside is that they have only been exposed to Oracle which includes everything and have a hard time with the LAMP world where systems must be built out of a lot of separate components.

Sys admins on the other hand are used to managing daemons, working with rsync, linux, and shells. They can handle software deployment, monitoring, and understand system performance from the operating system level. They understand the basics of configuration and quite a few of them can handle simple MySQL tasks such as installation and basic replication configuration. They tend to not have very much experience in query optimization or the specifics of how applications interact with databases. I’ve long held the opinion that MySQL should just be another component in the system and doesn’t need specialized and isolated monitoring solutions which makes it easier for a group of sys admins for it to monitor along side apache and other daemons. To turn a sys admin in to a DBA they need to understand the special requirements MySQL has such as i/o latency and atomicity of backups. Good sys admins can pick up these skills quickly.

This brings me to the best MySQL book on the planet, High Performance MySQL Second Edition. Why is it the best? Because it applies to both types of DBAs and can help them develop the skills they need to become a super DBA that can handle both the sys admin tasks and the query optimization tasks. The book has been out for roughly two years and is still very relevant. I recommend it to everyone that asks me where they can go to learn how to be a MySQL DBA and it’s never disappointed. A quick note on books. Don’t loan out your copy, ask the person you recommend it to to buy a copy. It helps the writers and this book should be on the desk of anyone working with MySQL.

Table statistics draft 2, the slow query log

I’ve posted a new table statistics patch which is the next version of the session table/index statistics patch This version of the patch adds slow query log output. If a query is logged to the slow query log it will have row count statistics added to it.

I’m not sure about the format of the log which is why I’m posting this so early. The first format I tried was:

# Time: 100119 19:24:37
# User@Host: [ebergen] @ localhost []
# Query_time: 10 Lock_time: 0 Rows_sent: 7 Rows_examined: 3
# Rows_read: sbtest.foo:3, sbtest.bar:3,
select * from foo a, bar b where sleep(1) = 0;

Where there would be an additional line for each of rows_changed, rows_changed_x_indexes and index_rows_read. This seemed verbose so I tried a different format of:

# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Rows Stats: sbtest.foo 18 0 0, sbtest.bar 15 3 3,
# Index Stats: sbtest.bar.u 6,
select * from foo a, bar b where b.u=4 order by sleep(1);

Where the row stats has 3 columns per table of rows_read, rows_changed, rows_changed_x_index. I’m leaning towards the second format but I’m open to ideas. What do you think?

The new patch is here

First draft of per session table and index statistics

I had some free time over Thanksgiving so I decided to work on something I have been thinking about for quite some time. I hacked up Google’s show table_statistics patch to also track stats per connection. I say this is a first draft hack because I based it off of the v2 patch which uses a straight up hash table instead of the intermediate object cache.

I’ve added the global/session key word to the existing show table_statistics command in the same way that show status works. This means that the default behavior of show table_statistics is to show session data instead of global data. This is different from the Google patch which only works globally. This has been running in production environments for a bit and seems stable. Note that these environments don’t run at the concurrency that motivated Google to update the patch to be less likely to lock a global mutex. You have been warned!

I’m planning on updating the patch with more stats and a cache for the global stats. So far it’s been useful in debugging queries that have low row estimates in the explain plan but are actually scanning quite a few rows. Explain tends to handle row count estimates for sub queries poorly. It’s handy to copy a query from the slow query log on a production server and run it again using show session table_statistics to see how many rows it actually read from individual tables. I also have plans to have build time tests which can keep track of row counts from a sample database. I also want to look into adding these stats directly into the slow query log.

Here is the updated patch. The patch applies against 5.0.72sp1. Here are the command descriptions.

For table statistics:

SHOW [GLOBAL | SESSION] TABLE_STATISTICS [LIKE 'pattern' | WHERE expr]

FLUSH [GLOBAL | SESSION] TABLE_STATISTICS

Index statistics:

SHOW [GLOBAL | SESSION] INDEX_STATISTICS [LIKE 'pattern' | WHERE expr]

FLUSH [GLOBAL | SESSION] INDEX_STATISTICS

Some examples.

mysql> show session table_statistics;
Empty set (0.00 sec)

mysql> show global table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 6 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)

mysql> select * from sbtest.foo;
+——-+
| t |
+——-+
| 82921 |
| 24489 |
| 73681 |
+——-+
3 rows in set (0.00 sec)

mysql> show session table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 3 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)

mysql> show global table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 9 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)

pid file directory and a full disk

To continue the pid file theme I’ve found another slight issue. This was unrelated to the testing which I found the previous pid file issues. I was working on an unmonitored development mysql system. While working on it I ran it out of disk space in /. The box has it’s mysql datadir in a separate partition which had plenty of space. The pid file is in a dir on /. When I started mysqld_safe mysqld exited because it couldn’t create the pid file. mysqld_safe continued to restart mysqld until I saw the problem and killed it a few minutes later. I’m not sure exactly why, I didn’t spend very much time digging into a failure that I caused by filling up the disk. mysqld was exiting because it was trying to create a pid file in a full partition.

Note: This was a stock mysqld, not one running my pid file patch.

Attempting to unwind the tangled web of pid file creation.

Previously I wrote about how late the mysql pid file is created in the startup process. At first glance it seemed like a relatively easy thing to fix. In main() there is a call to start_signal_handler(). The first instance of static void start_signal_handler() does only one thing. It checks !opt_bootstrap to make sure mysqld isn’t being called by mysql_install_db. I’m not sure why mysql_install_db can’t have a pid file created but that’s getting outside the scope of my investigation. It seems simple enough to move the call to start_signal_handler() above the call to init_server_components() in main() and have the pidfile created earlier. It turns out pidfile creation happens differently on different arches.

For windows and netware start_signal_handler simply creates the pid file. For __EMX__ (I’m not sure what that is) start signal handler does nothing. By default start_signal_handler starts a signal handler thread. This thread then creates the pid file. I think this can be cleaned up by removing the start_signal_handler functions that either do nothing or only create a pid file and handle the pid file creation for arches that need it directly in main with some good self documenting ifdefs right around it.

I don’t have all the environments to test that this patch really works. I’ve tested it on linux and it does create a pid file and deletes it on shutdown. The pid file is created just after argument parsing and before the heavy weight storage engine initialization.

[Update 2009-12-07: I think the old patch broke embedded. I updated it to ifdef out the call to start_signal_handler]
Here is the patch Create pid file sooner patch.

mysqld_safe and pid file creation race condition

mysql_safe is responsible for restarting mysqld if it crashes and will exit cleanly if mysqld was shutdown. The way it determines if mysqld shutdown correctly is if the pid file is cleaned up correctly. MySQL does quite a few things before creating the pid file like initializing storage engines. It can take quite a while to initialize innodb if it has to roll forward the transaction log. During this time if mysqld crashes mysqld_safe won’t restart it. Normally this is ok because the server would just crash again but it can mess with your head a bit if you’re testing changes to mysqld_safe. Especially if those changes involve what mysqld_safe does if mysqld crashes. I think it makes sense to create the pidfile earlier and there is a bug for it. Chime in on the bug if this has burned you.