Archive for the 'General' Category

Solving the final issue with electric cars.

Saturday, August 16th, 2008

For the past few years hybrids have been all the rage. Now electric cars are coming onto the scene. I realized a while ago that neither of these vehicles is good for the great American tradition of the road trip. Before gas prices started to climb, it was fairly common to pack up the family car and head across the country. This is still my preferred way to travel. I hate the uncertainty of flying and dealing with the TSA. Driving is nice and relaxing once you get out of traffic.

Hybrids are fine for city driving, but they offer no improvement over your typical four-banger once you get out onto the highway or back in the woods. Let’s face it, most hybrids have no business off pavement. I’m sure decent hybrid trucks will become available in a few years, but I’m not holding my breath.

The other alternative is electric cars. They’re great for the daily commute: you drive to work and back, then plug the car in when you get home at night. Electric cars are also appealing because we have so many different clean options for generating power. The typical range is 200 miles, so they have plenty of juice to get to work and grab groceries on the way home. If you need to drive outside that range, you’re effectively screwed. After the juice runs out, the car has to be plugged in for hours before you can go another 200 miles.

To get people to switch to all-electric cars, we need the range of a gas engine plus the ease of refueling. Since we can’t recharge batteries as fast as we can fill a tank with gas, the only other option is to swap the batteries out. I think we need to standardize on a battery size and make the batteries removable from the bottom of the car. Then, instead of gas stations, we can have automated robots that drop all the batteries out of the bottom of a car and put charged ones in. Think of it like driving through an automated car wash, except that instead of cleaning your car, it drops the old batteries out of the bottom and puts new ones in.

This eliminates two problems with electric cars. First, we can recharge electric cars as fast as or faster than we refuel gas cars, and as long as there are battery exchange stations we can keep going on road trips. Second, there is no longer the huge cost of replacing the batteries after a few years, because part of the recharge price would also cover maintenance on the batteries. It’s like the propane tank exchange at your local grocery store.

Ptrace on threads and Linux signal handling issues

Wednesday, June 25th, 2008

At Proven Scaling it’s not always all about scaling databases. Sometimes we get to solve problems that have nothing to do with scaling. We have a client that has been using jmap (unsupported) to grab memory statistics from Java. They found that after running jmap, they couldn’t shut down the JVM without it hanging.

After working on the problem for a bit, I found that once jmap had run, ps showed the Java process as stopped. This is strange, since Java was still able to process requests. In Linux, threads are treated as processes: each one gets a pid just like any other process. To be POSIX compliant, Linux has the notion of thread groups and a thread group leader, so that signals can be delivered to an entire thread group.

jmap gets memory statistics by using ptrace on the pid of the thread group leader. When ptracing a thread group leader, only that thread is stopped and analyzed; the other threads are free to continue processing. jmap is a bit buggy in that it attaches to a thread but never detaches. Linux has a safeguard for this: if the parent (the tracer) of a traced process exits, Linux changes the traced process’s state from traced to stopped, because traced processes can’t be killed.

When jmap quits, ps shows the process in state T, which means stopped. What it doesn’t say is that only the thread group leader is stopped. To get around the limitations of ps and process states, I went directly to /proc to get the state of each thread. The example below uses mysqld instead of java. It shows the state of all the threads, including the leader, during a simulated jmap run.

In this example 20924 is the pid of mysqld. I substituted mysqld for java because I had it handy on my dev server, and it reacts the same way java does. The bad_trace app simulates jmap by doing a ptrace attach and exiting without a detach. It sleeps for a bit in the middle so I can capture the state of a normally traced process. Here is the source for bad_trace if you want to follow along at home.


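#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* bad_trace: simulate jmap by attaching to a pid with ptrace and
   exiting without ever calling PTRACE_DETACH. */
int main(int argc, char **argv)
{
    pid_t pid;

    if (argc < 2) {
        fprintf(stderr, "First arg should be a pid\n");
        return 1;
    }

    pid = (pid_t)atoi(argv[1]);
    printf("Attaching to pid (%d)\n", (int)pid);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("ptrace(PTRACE_ATTACH)");
        return 1;
    }

    /* Leave time to inspect /proc/<pid>/task/<tid>/status while traced. */
    printf("Sleeping for ten seconds\n");
    sleep(10);

    /* Deliberately exit without PTRACE_DETACH, like jmap. */
    return 0;
}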

The bad_trace app does a ptrace attach and exits without detaching.

This is the normal state on an idle server: all threads are sleeping.

ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: S (sleeping)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

I run bad_trace, which puts the thread group leader into the traced state, waiting for bad_trace to examine it.


ebergen@kamet:(/proc/20924/task) ~/bad_trace 20924
Attaching to pid (20924)
Sleeping for ten seconds
[1]+ Stopped ~/bad_trace 20924

I suspend bad_trace to grab the stats. You can see that the thread group leader has stopped in tracing mode waiting for bad_trace to examine it and tell it to continue.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (tracing stop)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)
ebergen@kamet:(/proc/20924/task) fg
~/bad_trace 20924

bad_trace resumes and exits without detaching from the traced process. So that an admin can still kill the traced process, Linux does some cleanup work by changing its state from traced to stopped. Now we can send signals to the thread group and it will be able to respond.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (stopped)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

Naturally we don’t want the thread group leader stopped if everything is OK so I send it a continue signal to resume operation. This is where things get weird.


ebergen@kamet:(/proc/20924/task) kill -CONT 20924

Instead of the thread group leader resuming operation, Linux decides that it’s a good idea to stop all the threads.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (stopped)
20926/status:State: T (stopped)
20927/status:State: T (stopped)
20928/status:State: T (stopped)
20929/status:State: T (stopped)
20931/status:State: T (stopped)
20932/status:State: T (stopped)
20933/status:State: T (stopped)
20934/status:State: T (stopped)

Sending a second continue signal does the right thing and the process resumes.


ebergen@kamet:(/proc/20924/task) kill -CONT 20924
ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: S (sleeping)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

I did some digging in the kernel and didn’t see any specific reason for this behavior. I suspect it’s a kernel bug that has never been uncovered because tracing a single thread of a threaded process isn’t a very common operation.

Splitting flush logs command

Monday, May 19th, 2008

Last week I was working with a client that rediscovered a bug where setting expire_logs_days and issuing a flush logs causes the server to crash. It’s MySQL Bug #17733 if you want to have a look. Seeing MySQL crash was enough inspiration to fix something that I and others have wanted to fix in MySQL for years.

Currently a flush logs command tries to flush all of the following logs in order:

  • General Log
  • Slow Query Log
  • Binary Log
  • Relay Log
  • Storage Engine Logs (if available)
  • Error Log

I wanted to fix this because my client was issuing a flush logs to rotate the error log on a server with no replication. The crash was caused by replication. With individual flush commands it’s less likely for this to happen again in the future: people can simply issue a query for the one log they want to flush. Each new command flushes only the log named in it. They are:

  • flush general log;
  • flush slow log;
  • flush binary log;
  • flush relay log;
  • flush engine logs;
  • flush error log;

The words log and logs are interchangeable. The query “flush general log” is just as valid as “flush general logs” even though there is only one log. I submitted the patch as a fix for MySQL Bug #14104.

The patch, flush_logs.patch, was diffed against 6.0.4 but also applies to 5.1.24.

Rotation isn’t uniform across the different log files. Rotating the slow log, for example, simply closes and reopens it. I’m planning to write a second patch that rotates log files using the same numbered scheme as binary logs. That would fix rotation for the slow and general logs, and also eliminate the annoying issue of error logs being destroyed after they are rotated to foo.log-old.

This patch hasn’t been accepted or committed yet so if you have any suggestions on how to make it better please let me know.

Parallels CD Jacket

Tuesday, July 10th, 2007

“The following serial number must be entered to activate your software. We recommend writing the number inside your manual for future reference.”

Why didn’t you just put the sticker with the serial number on the manual?

Mytop support for 5.0

Saturday, December 9th, 2006

MySQL 5.0 added the ability to show status for either global or session variables. Unfortunately, the default for this command is session instead of global (which is how the old command behaved). This breaks many existing programs, such as mytop. Here is a patch for mytop 1.4 that makes it aware of the 5.0-style show status.

Here is my rant about changing the default value, which I won’t repeat here.

Show @&!# status again!

PHfhghasdbasdf

Sunday, September 10th, 2006

I’ve tried to write 4 different blog entries tonight. After driving 700 miles today I can’t complete any of them. Until my brain recovers these blog entries will join several other ideas that will exist as brain crack until I can get enough rest to form a complete thought.

Before the upgrade..

Wednesday, August 2nd, 2006

I just want to say that Adam Carolla is a better Death than Norm Macdonald on Family Guy. Debate it if you want, but Norm Macdonald is a turd.

Why uptime is bad

Wednesday, June 28th, 2006

Growing up in the world of Linux, uptime was always considered a good thing. On IRC, every once in a while someone would post an uptime. Everyone else in the channel would then check their own uptime, and if it was greater or close, they would post it too. Most of these systems were home Linux boxes used for compiling random programs or maybe hosting a web server for experimenting. It was fun to see how long we could keep them running. Since those days I have come to realize that high uptimes are a bad thing.

Keeping a server up for months or even years means that you aren’t maintaining it. It hasn’t been kept up to date with new kernels that fix security holes. It doesn’t have new packages or new tools that could help it run more efficiently and make it easier to use. It’s also out of step with the newer servers being deployed, which means that people logging into your high-uptime server have to adjust to older software and possibly missing tools.

Hardware fails. Colos lose power and network connections, and sometimes they catch on fire. If your entire system depends on a single server, say a MySQL master, it’s going to fail. I know there are MySQL servers out there that have been up for years. Those are going to fail too. It’s inevitable. If your system is not designed to withstand the failure of a master, it should be fixed. Jeremy Cole and I gave a tutorial at the 2006 MySQL User Conference about MySQL replication and failover. See Jeremy’s blog for links to the presentation and photos.

“But I can’t take down my master to fix it?” It’s much better to have a planned downtime than to get paged at 3am because the master died and the whole site is down. Take some time. Plan to take down the master and fix the system. It will be worth it in the end. If your manager says no to a planned downtime to make your website fault tolerant, find a new job. Preferably at Yahoo! :)

By building a system that can handle the failure of a master it’s much easier to upgrade MySQL so you can take advantage of all the new nifty features.

26

Friday, June 23rd, 2006

The number of cars I saw off the road between Ripon, CA and Mountain View, CA due to overheating. I went up to Ripon today to check out the Stanislaus River as a possible new fishing spot. According to my truck’s thermometer, the high temperature today was 106F. There are a few steep grades on I-580 going through Pleasanton. People were working their cars too hard and simply overheated. One of the cars I saw pulled over was a convertible with a couch upside down in the back seat. I’m not sure what they were thinking, but they paid for it. Here are some tips to prepare you for the summer heat.

If you see your car overheating, turn on the heater and roll your windows down until it cools off. The heater works by running hot coolant from the engine through a little radiator, so turning it on with the fan on high helps cool the engine down faster. Driving down the road with the heater on and the windows down is much better than sitting on the side of the road waiting for your car to cool off.

Change your radiator cap and thermostat. These two parts combined may cost you $10, and they are well worth it. Radiator caps can wear out and blow prematurely. Thermostats can get stuck closed. Both of these parts are cheap to buy and easy to change yourself.

Check your coolant. Coolant testers cost only a few dollars and can save you from a cracked block in the winter or overheating in the summer. Car coolant is a mixture of antifreeze and water, and the ratio of the mix determines its freezing point. A coolant tester tells you the approximate freezing point of your coolant. During the hot summer months you can mix your coolant with a little more water to be a little better protected against overheating. Cheap testers are nothing more than a turkey baster with a few colored balls inside. The balls have different weights, so more of them float as the coolant gets denser. The number or color of the floating balls tells you the freezing point.

If you’re going to be driving over hot mountains, pack a few gallons of water (no need for antifreeze) and some rags. If you do overheat, pull over, shut down the engine, and wait for the car to cool down. This is important: if your radiator has depressurized, the water and steam that came out were well over 200F. WARNING!! IF YOU TRY TO OPEN THE RADIATOR CAP WHEN IT’S HOT, THE RESULTING STEAM BLAST WILL BURN YOUR SKIN OFF. After the car has cooled down, remove the cap using the rags and fill the radiator with water. Close the cap and you should be good to go. If you do fill your radiator with mostly water, be sure to have your coolant flushed before winter. Waking up on a cold morning to a cracked block or all your freeze plugs on the ground is no fun.

It also helps to have a truck with a tow package (oversized radiator and transmission cooler) :)

Bait and switch

Tuesday, May 30th, 2006

This post should be a (hopefully good) review of the chili served today. It isn’t, because there was no chili. Instead of $25,000 chili, I was offered seafood chowder. My review of the chowder?

The quality was good but it just didn’t appeal to me.