Archive for the ‘MySQL’ Category.

Attempting to unwind the tangled web of pid file creation.

Previously I wrote about how late the MySQL pid file is created in the startup process. At first glance it seemed like a relatively easy thing to fix. In main() there is a call to start_signal_handler(). The first instance of static void start_signal_handler() does only one thing: it checks !opt_bootstrap to make sure mysqld isn’t being called by mysql_install_db. I’m not sure why mysql_install_db can’t have a pid file created, but that’s getting outside the scope of my investigation. It seems simple enough to move the call to start_signal_handler() above the call to init_server_components() in main() and have the pid file created earlier. It turns out pid file creation happens differently on different arches.

For Windows and NetWare, start_signal_handler() simply creates the pid file. For __EMX__ (I’m not sure what that is) start_signal_handler() does nothing. By default start_signal_handler() starts a signal handler thread, and that thread then creates the pid file. I think this can be cleaned up by removing the start_signal_handler() variants that either do nothing or only create a pid file, and handling pid file creation for the arches that need it directly in main() with some good self-documenting ifdefs right around it.
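Roughly the shape I have in mind, sketched against the 5.0-era mysqld.cc (create_pid_file(), opt_bootstrap, and init_server_components() are the existing names; the placement and the ifdef guard here are only indicative, not lifted from the patch linked below):

/* In main(), just after argument parsing and before the call to
   init_server_components(): create the pid file directly instead of
   leaving it to start_signal_handler() or the signal handler thread. */
#ifndef EMBEDDED_LIBRARY
  if (!opt_bootstrap)
    create_pid_file();
#endif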

I don’t have all the environments to test that this patch really works. I’ve tested it on Linux, where it creates the pid file and deletes it on shutdown. The pid file is created just after argument parsing and before the heavyweight storage engine initialization.

[Update 2009-12-07: I think the old patch broke the embedded server. I updated it to ifdef out the call to start_signal_handler().]
Here is the patch: Create pid file sooner patch.

mysqld_safe and pid file creation race condition

mysqld_safe is responsible for restarting mysqld if it crashes and will exit cleanly if mysqld was shut down. The way it determines whether mysqld shut down correctly is whether the pid file was cleaned up correctly. MySQL does quite a few things before creating the pid file, like initializing storage engines. It can take quite a while to initialize InnoDB if it has to roll forward the transaction log, and if mysqld crashes during that window mysqld_safe won’t restart it. Normally this is OK because the server would just crash again, but it can mess with your head a bit if you’re testing changes to mysqld_safe, especially if those changes involve what mysqld_safe does when mysqld crashes. I think it makes sense to create the pid file earlier, and there is a bug for it. Chime in on the bug if this has burned you.
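To make the race concrete, here is the decision mysqld_safe makes, sketched as a tiny C supervisor purely for illustration (mysqld_safe itself is a shell script, and the pid file path below is just an example):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *pid_file = "/var/run/mysqld/mysqld.pid";   /* example path */

    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {
            execlp("mysqld", "mysqld", (char *)NULL);
            _exit(127);                     /* exec failed */
        }
        waitpid(pid, NULL, 0);              /* wait for mysqld to exit */

        /* A clean shutdown removes the pid file, so no restart.  But if
           mysqld crashed before it ever created the pid file (say, during
           InnoDB recovery), this also looks like a clean shutdown and the
           server is never restarted; that is the race described above. */
        if (access(pid_file, F_OK) != 0)
            return 0;

        fprintf(stderr, "mysqld crashed with the pid file left behind; restarting\n");
    }
}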

InnoDB Deadlock Count Patch

Deadlocks are a common occurrence in relational databases. They usually aren’t a problem until they start happening too frequently. InnoDB can provide you with information about the latest deadlock in SHOW ENGINE INNODB STATUS. This can be useful for debugging, but it’s almost impossible to get the rate at which deadlocks are occurring. This patch applies against MySQL 5.0.72sp1 and probably quite a few other versions. It adds a counter to SHOW GLOBAL STATUS that tracks the number of deadlocks. In this example, mysql-1> is connection 1 and mysql-2> is connection 2.

mysql-2> show global status like 'innodb_deadlocks';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_deadlocks | 0     |
+------------------+-------+
1 row in set (0.00 sec)

mysql-1> begin; select * from t where t=1 for update;
Query OK, 0 rows affected (0.00 sec)

+---+
| t |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql-2> begin; select * from t where t=2 for update;
Query OK, 0 rows affected (0.00 sec)

+---+
| t |
+---+
| 2 |
+---+
1 row in set (0.00 sec)

mysql-1> select * from t where t=2 for update;
+---+
| t |
+---+
| 2 |
+---+
1 row in set (1.71 sec)

mysql-2> select * from t where t=1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql-2> show global status like 'innodb_deadlocks';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_deadlocks | 1     |
+------------------+-------+
1 row in set (0.00 sec)

Enjoy!

Default log file name changes and replication breakage.

In a Great Magnet moment, Trent Lloyd posted an excellent write-up on how to recover from relay log name changes on the same day I was going to write up a procedure to send to a client who had a similar issue. Thanks Trent! The problem goes a bit deeper than server hostname changes, because there have been a few changes to how MySQL handles default log file names in 5.0.

Prior to 5.0.38 the default log file names started with the hostname. The problem is, as Trent points out, that if the hostname of the server changes then the default log file names change with it and no longer match the existing files. The error message, though, is something like:

090825 18:54:53 [ERROR] Failed to open the relay log '/mysql/old_hostname-relay-bin.000015' (relay_log_pos 107657)

There are a few strange things going on here. First, if the default relay log index file name changed and mysqld didn’t know how to open the file, then how did it know to open file 15, the file it was processing before? The answer is simple: it gets that information from the relay-log.info file, which records where the SQL thread is in its processing. If it was processing that file, the file exists, and it knows the correct file name, then why can’t it open it?

A quick scan of the source shows two problems. One is that the error messages are very vague; the other is that they are printed in a different order than they happened. This message:

[ERROR] Could not find target log during relay log initialization

actually happens before the “Failed to open the relay log” message; it’s just saved in a variable and printed out later. It’s also kind of cryptic. It should be something like:

[ERROR] Could not find log file 'old_hostname-relay-bin.000015' in relay log index file '/mysql/new_hostname-relay-bin.index'

The issue isn’t that MySQL couldn’t open the relay log file. The file is there and perfectly healthy. The issue is that MySQL couldn’t match the log file name it had recorded in the relay-log.info file to an entry in the new_hostname-relay-bin.index file.

The “fix” that was put in place to solve the problem of default files being named after the hostname was to generate the default log file names from the pid-file variable without the extension. This doesn’t really fix the problem; it just moves it around. If you don’t set the relay-log and relay-log-index variables and you change the pid-file variable, it’s exactly the same as if you had changed the hostname under the old scheme. If the master.info and relay-log.info names are fixed, why can’t the default log file name be something simple like 'mysql' instead of being based on something else that can change and has nothing to do with replication?
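One way to sidestep both the hostname-based and the pid-file-based defaults is to set the log names explicitly; a minimal my.cnf sketch (the paths are just examples) might look like:

[mysqld]
# Fixed, hostname-independent names so renaming the host or changing
# pid-file never touches replication.
log-bin         = /mysql/bin-log
log-bin-index   = /mysql/bin-log.index
relay-log       = /mysql/relay-bin
relay-log-index = /mysql/relay-bin.index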

fadvise syscall, myisam data file caching, and a lesson learned in debugging

fadvise is a system call that can be used to give Linux hints about how it should be caching files. It has a few options for caching, not caching, read-ahead, and random access. I was looking into using fadvise because a client ran into an issue where some infrequently used MyISAM data files were being pushed out of the filesystem cache by binary logs and other activity. The files are used infrequently for queries, but when they are used they need to be fast. When the files weren’t cached the particular query ran in about 30 seconds. When they were cached the query ran in 0.8 seconds, a huge difference. The fix seemed pretty trivial: call fadvise on the MYD files, they stay in cache, the queries are consistently faster, and the problem is solved. It seemed simple but it wasn’t. I’ll cover the MyISAM issue in more detail in another post; this one is all about fadvise and debugging.

I couldn’t find an RPM that contained a simple tool to control fadvise that I could call from a shell script (aside from the Perl-with-inline-C tool), so I decided to roll my own. The client has an existing system for building C tools to be deployed on servers. That system takes a bit of time to compile, deploy, and test, so I wrote a small C app to test my theory which I could later merge into the client’s tool system. I wrote the tool, tested it, and the files stayed in cache as predicted, so I started merging the tool into the client’s system. The system used to compile the tool is 32-bit; the MySQL servers are 64-bit. When I originally wrote the test tool I compiled it directly on the 64-bit boxes.

When I rolled the fadvise tool compiled on the 32-bit system out to the 64-bit system, it failed with an illegal seek error. My original call to fadvise looked something like this:

posix_fadvise(fd, 0, lseek(fd, 0, SEEK_END), POSIX_FADV_WILLNEED);

I’m sure some of you are thinking, “the lseek is pointless, 0 will include the entire file.” I learned that lesson later on, and it plays an important part in this story. When I ran strace on the 32-bit binary on the 64-bit system, the fadvise syscall looked like:

fadvise64(3, 0, 0, 0x594 /* POSIX_FADV_??? */) = -1 EINVAL (Invalid argument)

Strange, right? I checked the code, added some debugging output for the lseek, and tried again with the same result. I went back to my small test program and checked to see if it had the same fadvise and lseek calls. It did. I pinged a friend on IRC and he suggested quite a few things to check, including removing the unnecessary lseek. With the lseek removed, the strace output changes to:

fadvise64(3, 0, 0, POSIX_FADV_NORMAL) = 0

Notice that the last argument is POSIX_FADV_NORMAL instead of POSIX_FADV_WILLNEED. At this point I had been trying to debug this for a few hours and I started to think I was going insane. Frustration led to typos and eyestrain. I sent the output back to my friend, assuring him that I wasn’t insane and this had to be another problem. After some more debugging something clicked in his brain and he realized it was a kernel bug. (Note: I don’t fully understand this part, so if you’re reading this and you do, please leave a comment.) The 32-bit binary running on the buggy 64-bit kernel takes the third argument and splits it across two registers to make up the third and fourth arguments. POSIX_FADV_NORMAL is defined as 0, so the call is really

fadvise64(3,0,0,0)

In my case that’s a no-op, but it’s a successful no-op, and since I hadn’t checked it with strace I thought it was caching files. I moved this fix over to the client’s tool, recompiled, and retried my MyISAM performance test, only to find that the files weren’t being kept in memory. Rerunning the fadvise tool didn’t bring them into memory because it was being called with POSIX_FADV_NORMAL. After more work with my super smart friend we pinned the problem down to the register issue above, and I resolved to write a test the next day (today) to check the theory.

The test works.

Now I have two binaries created from the same source code, one of which will successfully execute on both 32-bit and 64-bit architectures but is a no-op on 64-bit. Since the client’s build system isn’t set up to compile 64-bit binaries, I ended up giving them two binaries along with a shell script that detects the architecture and executes the correct binary.
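For reference, here is a minimal sketch of the kind of standalone tool described above; it’s my own reconstruction, not the client’s code. It takes the file to warm as argv[1] and passes a length of 0 so the advice covers the whole file, with no lseek needed:

#define _XOPEN_SOURCE 600   /* expose posix_fadvise() on older glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    /* offset 0, length 0 means the whole file.  posix_fadvise() returns
       the error number directly rather than setting errno. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise %s: %s\n", argv[1], strerror(err));

    close(fd);
    return err ? 1 : 0;
}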

The debugging lesson learned is to stay calm and think through problems. Frustration and disbelief led to the debugging cycle taking a few hours longer than it should have (it was about 10 hours total, including all the testing). It also stressed me out. The other lesson learned is that it pays to have really smart friends. Thanks guys, I owe you a beer.

Why Oracle can’t kill MySQL.

When Oracle agreed to acquire Sun there was some speculation that Oracle might try to kill MySQL. First, this wouldn’t be a very prudent effort on Oracle’s part, and second, it’s not even possible. I think Monty has the best explanation, from a comment on his blog:

The simple fact is one can’t own an open source project. One can control it by controlling the people which are leading and developing it. When a company doesn’t take good care of their employees and those employees start to leave the company and work on the project elsewhere, that company has lost control of the project.

The whole comment is worth reading. Monty does a good job of putting the Sun purchase into perspective with regards to MySQL and the developer community.

Update of Google’s Sysbench patch to 0.4.12

[Update: I found the magic javascript links that show old releases of sysbench.]

Sysbench is an application that can be used to benchmark different system parameters and also includes support for testing MySQL directly. Google has released a patch for sysbench that adds a lot of new OLTP tests. It’s great for testing MySQL and for drag races against Mark’s tests. Their patch seems to apply against sysbench 0.4.10. I was able to find sysbench 0.4.10, but it wasn’t easy, so I’ve ported Google’s patch to sysbench 0.4.12.

Grab the patch here.

Percona Performance Conference EMT Presentation Slides

I sat down about 20 minutes ago to write a blog post that included a link to the slides of my EMT presentation. It turned into a long post about the presentation, how I feel EMT was received and my feelings on presentations in general. Here is the short post and the link to the slides.

The MySQL conference always inspires me to write so expect a longer post in a few days.

MySQL Brings the Heat

This week throngs of MySQL developers, users, and enthusiasts descended on Silicon Valley. Apparently the valley’s cooling system can’t keep up, because as they arrived the outside temperature went up into the 90s (32s for those of you who choose to use a sane temperature measurement system). I’m not attending the conference this year, but I almost wish I were, if only to get some of the air conditioning. It’s supposed to cool off and rain on Saturday. As the brain power leaves, I suspect the valley will cool and the moisture in the air will condense into rain, or tears.

I bet you thought this post was going to be about Oracle and Sun, sorry. I think the weather is more interesting.

Longest beta ever, myisamchk --parallel-recover

I was reading through the manual and noticed that the myisamchk --parallel-recover option is still listed as beta code. The feature was added in 4.0.2, which was released in July 2002. This means it’s been in beta longer than Gmail :)