Ptrace on threads and Linux signal handling issues

At Proven Scaling it’s not always about scaling databases. Sometimes we get to solve problems that have nothing to do with scaling. We have a client that has been using jmap (unsupported) to grab memory statistics from Java. They found that after running jmap they were unable to shut down the JVM without it hanging.

After working on the problem for a bit, I found that after jmap ran, ps showed the Java process as stopped. This is strange, since Java was still able to process requests. In Linux, threads are treated as processes: each one gets a pid just like any other process. To be POSIX compliant, Linux has the notion of thread groups and a thread group leader so that signals can be delivered to an entire thread group.

[Update: 2009-05-19 The version of jmap that ships with jdk-1.6 detaches correctly]

jmap gets memory statistics by using ptrace on the pid of the thread group leader. When ptracing a thread group leader, only that thread is stopped and analyzed; the other threads are free to continue processing. jmap is a bit buggy in that it attaches to a thread but never detaches. Linux has a safeguard: if the parent of a traced process quits, Linux changes the traced process’s state from traced to stopped, because traced processes can’t be killed.

When jmap quits, ps shows the process in state T, which means stopped. What it doesn’t say is that only the thread group leader is stopped. To get around the limitations of ps and process states, I went directly to /proc to get the process state. The example below shows the process state of all the threads, including the leader, during a simulated jmap run.

In this example, 20924 is the pid of mysqld. I substituted mysqld for Java because I had it handy on my dev server; it reacts the same way Java does. The bad_trace app simulates jmap by doing a ptrace attach and exiting before a detach. It sleeps for a bit in the middle so I can get the process state of a normal traced process. Here is the source for bad_trace if you want to follow along at home.


#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pid_t pid = 0;

    if (argc < 2)
    {
        printf("First arg should be a pid\n");
        return 1;
    }

    pid = atoi(argv[1]);
    printf("Attaching to pid (%d)\n", pid);

    /* Attach to the target; the kernel stops it so a tracer can examine it. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
    {
        perror("ptrace");
        return 1;
    }

    printf("Sleeping for ten seconds\n");
    sleep(10);

    /* Deliberately exit without PTRACE_DETACH to mimic the jmap bug. */
    return 0;
}

The bad_trace app does a ptrace attach and exits without detaching.

This is the normal state on an idle server: all threads are sleeping.

ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: S (sleeping)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

I execute bad_trace, which puts the thread group leader into traced mode, waiting for bad_trace to examine it.


ebergen@kamet:(/proc/20924/task) ~/bad_trace 20924
Attaching to pid (20924)
Sleeping for ten seconds
[1]+ Stopped ~/bad_trace 20924

I suspend bad_trace to grab the stats. You can see that the thread group leader has stopped in tracing mode waiting for bad_trace to examine it and tell it to continue.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (tracing stop)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)
ebergen@kamet:(/proc/20924/task) fg
~/bad_trace 20924

bad_trace resumes and exits without detaching from the traced process. To enable an admin to kill the traced process, Linux does some cleanup work by changing its state from traced to stopped. Now we can send signals to the thread group and they will be able to respond.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (stopped)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

Naturally we don’t want the thread group leader stopped if everything is OK, so I send it a continue signal to resume operation. This is where things get weird.


ebergen@kamet:(/proc/20924/task) kill -CONT 20924

Instead of resuming the thread group leader, Linux decides that it’s a good idea to stop all of the threads.


ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: T (stopped)
20926/status:State: T (stopped)
20927/status:State: T (stopped)
20928/status:State: T (stopped)
20929/status:State: T (stopped)
20931/status:State: T (stopped)
20932/status:State: T (stopped)
20933/status:State: T (stopped)
20934/status:State: T (stopped)

Sending a second continue signal does the right thing and the process resumes.


ebergen@kamet:(/proc/20924/task) kill -CONT 20924
ebergen@kamet:(/proc/20924/task) grep State */status
20924/status:State: S (sleeping)
20926/status:State: S (sleeping)
20927/status:State: S (sleeping)
20928/status:State: S (sleeping)
20929/status:State: S (sleeping)
20931/status:State: S (sleeping)
20932/status:State: S (sleeping)
20933/status:State: S (sleeping)
20934/status:State: S (sleeping)

I did some digging in the kernel and didn’t see any specific reason for this behavior. I suspect it’s a kernel bug that has never been uncovered because tracing a single thread of a threaded process isn’t a very common operation.

2 Comments

  1. Shruthi says:

    How can we access the reason for thread wait in linux?? If threads are waiting for long time which command helps us to know abt why thread is not getting that particular resource?

  2. Thank you very much! I have a JVM instance that I must get a heap dump from, but a previous jhat run was killed. Trying to run jhat again just hung. Thanks to your post, I’ve unstuck it with kill -CONT!
