I have an application that has run fine on RHEL 5.X and runs fine most of the time on RHEL 6.X. Now and then, however, it hangs without reporting any error, and the process cannot be terminated with kill -15 PID; it takes kill -9 PID. While researching another problem I came across the gstack command, which I had never used before. The next time the program hung I ran gstack on its PID; the results are below.
It appears the program had called a local function to get the current date and time and format it for display on the user's terminal just as the user disconnected. The disconnect would have sent the process SIGHUP, which it traps. The handler gets the date and time, formats it to write a timestamped message to a log file, and then exits. But __tz_convert() was already in the middle of its work when the signal arrived, and the handler called __tz_convert() again, so the second call blocks in _L_lock_2163() and __lll_lock_wait_private(). Since the interrupted call holds that lock and cannot resume until the handler returns, the lock is never released.
I know the general rule for a signal handler is to do very little, such as just setting a variable. But this handler never returns: it logs some information, removes a lock file, and then exits. So I never thought it could run into a problem like this. I can provide the exact kernel version and the version of anything else if needed.
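For reference, the flag-only pattern I mean would look roughly like the sketch below. This is not my actual code: got_sighup, on_sighup, install_sighup_handler and handle_pending_hangup are hypothetical names made up for illustration, and the sketch assumes the main loop can poll the flag between operations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical flag: the only thing the handler ever touches. */
static volatile sig_atomic_t got_sighup = 0;

/* Handler does no time conversion, no stdio, no locking; it only sets the flag. */
static void on_sighup(int signo)
{
    (void)signo;
    got_sighup = 1;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on failure. */
static int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, NULL);
}

/*
 * Called from the main loop, never from the handler.
 * Returns 0 if no hangup is pending; otherwise writes a
 * timestamped line to `log` and returns 1 so the caller
 * can remove its lock file and exit.
 */
static int handle_pending_hangup(FILE *log)
{
    char stamp[32];
    time_t now;
    struct tm tm_buf;

    assert(log != NULL);
    if (!got_sighup)
        return 0;

    /* Safe here: we are in normal context, not inside a signal handler. */
    now = time(NULL);
    if (localtime_r(&now, &tm_buf) == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        strcpy(stamp, "unknown time");

    fprintf(log, "%s hangup received, exiting\n", stamp);
    fflush(log);
    return 1;
}
```

The main loop would call install_sighup_handler() once at startup and then check handle_pending_hangup() between input operations, doing the lock-file cleanup and exit itself when it returns 1. The time formatting then never runs while another time conversion in the same thread has been interrupted partway through.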
My question: is this a known problem? If so, is there a fix or a workaround?
#0 0x0000003ead2f806e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003ead29de5d in _L_lock_2163 () from /lib64/libc.so.6
#2 0x0000003ead29dc17 in __tz_convert () from /lib64/libc.so.6
#3 0x000000000040abd8 in fmt_timestring ()
#4 0x000000000040b6b0 in fmt_timestamp ()
#5 0x000000000040b99d in fmt_mid ()
#6 0x0000000000401fa3 in getout ()
#7 0x0000000000402078 in gotsig ()
#9 0x0000003ead256578 in _IO_vfscanf_internal () from /lib64/libc.so.6
#10 0x0000003ead269945 in vsscanf () from /lib64/libc.so.6
#11 0x0000003ead2639a8 in sscanf () from /lib64/libc.so.6
#12 0x0000003ead29d172 in __tzset_parse_tz () from /lib64/libc.so.6
#13 0x0000003ead29e34e in __tzfile_compute () from /lib64/libc.so.6
#14 0x0000003ead29dcd7 in __tz_convert () from /lib64/libc.so.6
#15 0x000000000040f2a8 in time_update ()
#16 0x0000000000405da1 in timeupdate ()
#17 0x000000000040773e in disp_input_edit ()