WordPress on Android
It’s like my own little twitter. I don’t think I will be publishing much from this but it’s great for creating stub posts.
You will probably want some waders, a pickaxe, and one of those hats with a light on it before you go in here.
It’s like my own little twitter. I don’t think I will be publishing much from this but it’s great for creating stub posts.
I’ve posted a new table statistics patch which is the next version of the session table/index statistics patch This version of the patch adds slow query log output. If a query is logged to the slow query log it will have row count statistics added to it.
I’m not sure about the format of the log which is why I’m posting this so early. The first format I tried was:
# Time: 100119 19:24:37
# User@Host: [ebergen] @ localhost []
# Query_time: 10 Lock_time: 0 Rows_sent: 7 Rows_examined: 3
# Rows_read: sbtest.foo:3, sbtest.bar:3,
select * from foo a, bar b where sleep(1) = 0;
Where there would be an additional line for each of rows_changed, rows_changed_x_indexes and index_rows_read. This seemed verbose so I tried a different format of:
# Time: 100119 20:27:16
# User@Host: [ebergen] @ localhost []
# Query_time: 6 Lock_time: 0 Rows_sent: 6 Rows_examined: 14
# Rows Stats: sbtest.foo 18 0 0, sbtest.bar 15 3 3,
# Index Stats: sbtest.bar.u 6,
select * from foo a, bar b where b.u=4 order by sleep(1);
Where the row stats has 3 columns per table of rows_read, rows_changed, rows_changed_x_index. I’m leaning towards the second format but I’m open to ideas. What do you think?
The new patch is here
I had some free time over Thanksgiving so I decided to work on something I have been thinking about for quite some time. I hacked up Google’s show table_statistics patch to also track stats per connection. I say this is a first draft hack because I based it off of the v2 patch which uses a straight up hash table instead of the intermediate object cache.
I’ve added the global/session key word to the existing show table_statistics command in the same way that show status works. This means that the default behavior of show table_statistics is to show session data instead of global data. This is different from the Google patch which only works globally. This has been running in production environments for a bit and seems stable. Note that these environments don’t run at the concurrency that motivated Google to update the patch to be less likely to lock a global mutex. You have been warned!
I’m planning on updating the patch with more stats and a cache for the global stats. So far it’s been useful in debugging queries that have low row estimates in the explain plan but are actually scanning quite a few rows. Explain tends to handle row count estimates for sub queries poorly. It’s handy to copy a query from the slow query log on a production server and run it again using show session table_statistics to see how many rows it actually read from individual tables. I also have plans to have build time tests which can keep track of row counts from a sample database. I also want to look into adding these stats directly into the slow query log.
Here is the updated patch. The patch applies against 5.0.72sp1. Here are the command descriptions.
For table statistics:
SHOW [GLOBAL | SESSION] TABLE_STATISTICS [LIKE 'pattern' | WHERE expr]
FLUSH [GLOBAL | SESSION] TABLE_STATISTICS
Index statistics:
SHOW [GLOBAL | SESSION] INDEX_STATISTICS [LIKE 'pattern' | WHERE expr]
FLUSH [GLOBAL | SESSION] INDEX_STATISTICS
Some examples.
mysql> show session table_statistics;
Empty set (0.00 sec)mysql> show global table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 6 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)mysql> select * from sbtest.foo;
+——-+
| t |
+——-+
| 82921 |
| 24489 |
| 73681 |
+——-+
3 rows in set (0.00 sec)mysql> show session table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 3 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)mysql> show global table_statistics;
+————+———–+————–+————————-+
| Table | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+————+———–+————–+————————-+
| sbtest.foo | 9 | 0 | 0 |
+————+———–+————–+————————-+
1 row in set (0.00 sec)
To continue the pid file theme I’ve found another slight issue. This was unrelated to the testing which I found the previous pid file issues. I was working on an unmonitored development mysql system. While working on it I ran it out of disk space in /. The box has it’s mysql datadir in a separate partition which had plenty of space. The pid file is in a dir on /. When I started mysqld_safe mysqld exited because it couldn’t create the pid file. mysqld_safe continued to restart mysqld until I saw the problem and killed it a few minutes later. I’m not sure exactly why, I didn’t spend very much time digging into a failure that I caused by filling up the disk. mysqld was exiting because it was trying to create a pid file in a full partition.
Note: This was a stock mysqld, not one running my pid file patch.
Previously I wrote about how late the mysql pid file is created in the startup process. At first glance it seemed like a relatively easy thing to fix. In main() there is a call to start_signal_handler(). The first instance of static void start_signal_handler() does only one thing. It checks !opt_bootstrap to make sure mysqld isn’t being called by mysql_install_db. I’m not sure why mysql_install_db can’t have a pid file created but that’s getting outside the scope of my investigation. It seems simple enough to move the call to start_signal_handler() above the call to init_server_components() in main() and have the pidfile created earlier. It turns out pidfile creation happens differently on different arches.
For windows and netware start_signal_handler simply creates the pid file. For __EMX__ (I’m not sure what that is) start signal handler does nothing. By default start_signal_handler starts a signal handler thread. This thread then creates the pid file. I think this can be cleaned up by removing the start_signal_handler functions that either do nothing or only create a pid file and handle the pid file creation for arches that need it directly in main with some good self documenting ifdefs right around it.
I don’t have all the environments to test that this patch really works. I’ve tested it on linux and it does create a pid file and deletes it on shutdown. The pid file is created just after argument parsing and before the heavy weight storage engine initialization.
[Update 2009-12-07: I think the old patch broke embedded. I updated it to ifdef out the call to start_signal_handler]
Here is the patch Create pid file sooner patch.
mysql_safe is responsible for restarting mysqld if it crashes and will exit cleanly if mysqld was shutdown. The way it determines if mysqld shutdown correctly is if the pid file is cleaned up correctly. MySQL does quite a few things before creating the pid file like initializing storage engines. It can take quite a while to initialize innodb if it has to roll forward the transaction log. During this time if mysqld crashes mysqld_safe won’t restart it. Normally this is ok because the server would just crash again but it can mess with your head a bit if you’re testing changes to mysqld_safe. Especially if those changes involve what mysqld_safe does if mysqld crashes. I think it makes sense to create the pidfile earlier and there is a bug for it. Chime in on the bug if this has burned you.
Deadlocks are a common occurrence in relational databases. They usually aren’t a problem until they start happening too frequently. Innodb can provide you with information about the latest deadlock in SHOW ENGINE INNODB STATUS. This can be useful for debugging but it’s almost impossible to get the rate at which deadlocks are occurring. This patch applies against MySQL 5.0.72sp1 and probably quite a few other versions. It adds a counter to show table status that tracks the number of deadlocks. In this example mysql-1> is connection 1 and mysql-2> is connection 2.
mysql-2> show global status like ‘innodb_deadlocks’;
+——————+——-+
| Variable_name | Value |
+——————+——-+
| Innodb_deadlocks | 0 |
+——————+——-+
1 row in set (0.00 sec)mysql-1> begin; select * from t where t=1 for update;
Query OK, 0 rows affected (0.00 sec)+—+
| t |
+—+
| 1 |
+—+
1 row in set (0.00 sec)mysql-2> begin; select * from t where t=2 for update;
Query OK, 0 rows affected (0.00 sec)+—+
| t |
+—+
| 2 |
+—+
1 row in set (0.00 sec)mysql-1> select * from t where t=2 for update;
+—+
| t |
+—+
| 2 |
+—+
1 row in set (1.71 sec)mysql-2> select * from t where t=1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> show global status like ‘innodb_deadlocks’;
+——————+——-+
| Variable_name | Value |
+——————+——-+
| Innodb_deadlocks | 1 |
+——————+——-+
1 row in set (0.00 sec)
Enjoy!
In a Great Magnet moment Trent Lloyd posted an excellent write-up on how to recover from relay log name changes on the same day I was going to write up a procedure to send to a client who had a similar issue. Thanks Trent! The problem goes a bit deeper than server hostname changes because there have been a few changes to how mysql handles default log file names in 5.0
Prior to 5.0.38 the default log file name started with the hostname. The problem is, as Trent points out, that if the hostname of the server changes then mysql doesn’t generate default log file names correctly. The error message though is something like:
090825 18:54:53 [ERROR] Failed to open the relay log ‘/mysql/old_hostname-relay-bin.000015′ (relay_log_pos 107657)
There are a few strange things going on here. First if the relay-log index file default name changed and it didn’t know how to open the file then how did it know to open file 15 which is the file it was processing before? The answer is simple, it gets that information from the relay-log.info file which records where the sql thread is in it’s processing. If it was processing that file and it exists, and it knows the correct file name then why can’t it open it?
A quick scan of the source shows two problems. Once the error messages are very vague, the other is that they are printed in a different order than they happened. This message:
[ERROR] Could not find target log during relay log initialization
Actually happens before the Failed to open relay log message. It’s just saved in a variable and printed out later. This message is kind of cryptic. It should be:
[ERROR] Could not find log file ‘old_hostname-relay-bin.000015′ in relay log index file ‘/mysql/new_hostname-relay-bin.index’
The issue isn’t that MySQL couldn’t open the binlog file. The file is there and perfectly healthy. The issue is that MySQL couldn’t match the log file name it had recorded in the relay-log.info file to a valid file name in the new_hostname-relay-bin.index file.
The “fix” that was put in place to solve the problem of default files being named after the hostname was to generate the default log file names from the pid-file variable without the extension. This doesn’t really fix the problem, it just moves it around. If you don’t set the relay-log and relay-log-index variables and you change the pid-file variable then it’s exactly the same as if you changed the hostname in the old method. If the master.info, and relay-log.info names are fixed why can’t the default log file name be something simple like ‘mysql’ instead of being based off of something else that can be changed and has nothing to do with replication?
cinfo is a kick ass program written by the author of PowerDNS Bert Hubert to show how much of a file is being cached by the operating system. It’s great for testing to see if myisam data files are being read from the operating system cache or off disk. For example this is the cache status of myd files for my blog:
wordpress_comments.MYD: 1127 pages, 1121 pages cached (99.47%)
wordpress_links.MYD: 1 pages, 1 pages cached (100.00%)
wordpress_options.MYD: 309 pages, 309 pages cached (100.00%)
wordpress_postmeta.MYD: 2 pages, 2 pages cached (100.00%)
wordpress_posts.MYD: 100 pages, 94 pages cached (94.00%)
wordpress_term_relationships.MYD: 2 pages, 2 pages cached (100.00%)
wordpress_terms.MYD: 1 pages, 1 pages cached (100.00%)
wordpress_term_taxonomy.MYD: 1 pages, 1 pages cached (100.00%)
wordpress_usermeta.MYD: 10 pages, 10 pages cached (100.00%)
wordpress_users.MYD: 2 pages, 2 pages cached (100.00%)
Most of the files are in the filesystem cache so there is a very good chance that when you loaded this page mysql was able to read all the data for my blog from ram.
Cinfo is distributed by source tarball but I’ve created a x86_64 rpm. You can download a modified tarball that can be used with rpmbuild to create your own rpm. The basic process for building an rpm from my tarball is:
create a ~/.rpmmacros file with a path to a buildroot:
%_topdir /tmp/my_buildroot
Then create some directories in that buildroot
cd /tmp/my_buildroot
mkdir SPECS SRPMS BUILD RPMS
The rpm can be built directly from the tarball with:
rpmbuild -ta cinfo-0.1.tar.gz
If everything works ok it should drop an arch specific rpm in:
/tmp/my_buildroot/RPMS
Modern operating systems give the impression of running more processes than the number of available processors by giving each process that wants to run, such as your web browser or text editor a tiny slice of time in which they can execute. By switching out the running process very quickly it looks as though all processes are running even though only a few are at a time. Other processes that want to run are placed in a queue where they wait for their slice of cpu time to execute. The number of processes waiting in this queue is called load average.
Most people who run a flavor of unix inevitably come across load average. It’s in quite a few tools such as uptime, top, and w. It’s usually displayed as a 1, 5, and 15 minute average. Load average can change based on the number of processes wanting to run on a machine. This is a deceptively evil monitoring metric because it’s very difficult to define what a good and bad load average is.
Load average is often used as a comparison metric when trying to diagnose machines that are overloaded. While participating in discussions about machine load I often hear things like, “But the load average is OK” or, “The load average is way high.”. The problem with this is that on one server an OK load average may be 10 and on a server that’s overloaded it may also be 10, or 5, or 200. For situations like this it’s much easier to use response time because it can be easily compared between different types of servers or software. A response time of one second is a response time of one second no matter where you are. It’s very easy to judge if that’s good or bad based on the application. On the same servers one could have a response time of one second and a load average of 200 and another could have a response time of one second with a load average of 50.
Using load average as a comparison metric is so difficult because a higher load average doesn’t really mean a server is more loaded or that it isn’t responding appropriately. The reason for this is that the time a process spends waiting in the queue can be effected by different forces including the number of processes in line before it, how long those processes run for, and if your process gets and interrupt that can let it jump the queue. This is why a server with a load average of 10 may be almost impossible to work with and a server on the same hardware right next to it with a load average of 200 may be responding just fine.
The moral of all of this is to use load average for exactly what it is, the depth of the run queue. Don’t use it for comparison between different types of servers and don’t quote it when saying a server is overloaded.