Mail Archives: cygwin/2007/08/08/14:14:06
> -----Original Message-----
> From: cygwin-owner AT cygwin DOT com
> [mailto:cygwin-owner AT cygwin DOT com] On Behalf Of Ernie Coskrey
> Sent: Tuesday, July 31, 2007 3:40 PM
> To: cygwin AT cygwin DOT com
> Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
>
>
> I've run into a problem with cygwin 1.5.20-1 and pdksh
> 5.2.14. We've got a pdksh.exe process that is spinning,
> using all the CPU.
>
> This scenario is very hard to reproduce, but has happened on
> our test systems occasionally. It occurred recently, and I
> currently have gdb attached to the process and have the
> symbols loaded. I see that pdksh is continually calling
> "sigsuspend()", which is immediately returning from
> cancelable_wait due to the fact that the signal_arrived event
> is set. I also see that pdksh is waiting for a subprocess to
> complete, and has a handle to the PID of that process -
> however the process has long since terminated.
>
> It appears that something went wrong during delivery of SIGCHLD.
>
> I've got two questions related to this:
>
> - have there been changes between 1.5.20-1 and 1.5.24-2, or
> the latest snapshot, that might have fixed this issue? We've
> done some limited testing with 1.5.24-2 and haven't seen this
> happen yet, but as I said, it only happens rarely.
> - is there anything I can look at in gdb to help identify
> what the issue is?
>
> Any suggestions would be appreciated!
>
> ---------
> Ernie Coskrey

I've discovered an interesting piece of information that I think is
related to this. I'm hoping this might ring a bell with someone on the
list.

Looking at _main_tls->stack[], when I've set a breakpoint in
handle_sigsuspend just after the cancelable_wait() call, I see the
following entries:

0x6109186f 0x4132ac

0x6109186f is "sigdelayed()", which is the routine that should have been
called to deliver the signal and reset the signal_arrived event.
0x4132ac is j_waitj (in pdksh).

So, somehow, when this problem occurs, "sigdelayed" gets pushed onto the
stack *before* j_waitj does. So, _sigbe never calls sigdelayed.
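
To make the ordering argument concrete, here is a toy model in C. It is
only a sketch of a last-in, first-out return-address stack, not Cygwin's
actual code: every name in it (toy_tls, toy_push, toy_return and so on)
is hypothetical, and it assumes the return path simply pops the most
recently pushed entry and jumps to it.

```c
#include <stddef.h>

/* Toy model only. A LIFO of "return addresses", loosely mimicking the
   per-thread stack discussed above. All names are hypothetical. */
typedef void (*ret_fn)(void);

#define TOY_STACK_MAX 8

struct toy_tls {
    ret_fn stack[TOY_STACK_MAX];
    size_t depth;
};

static int delivered;   /* set to 1 when the toy sigdelayed runs */

static void toy_sigdelayed(void) { delivered = 1; }
static void toy_j_waitj(void)    { /* stands in for the caller's return address */ }

static void toy_push(struct toy_tls *t, ret_fn f)
{
    if (t->depth < TOY_STACK_MAX)
        t->stack[t->depth++] = f;
}

/* Models the return path: pop the newest entry and "return" to it. */
static void toy_return(struct toy_tls *t)
{
    if (t->depth > 0)
        t->stack[--t->depth]();
}

/* Expected order: caller's address first, sigdelayed pushed on top of it. */
int toy_expected_order(void)
{
    struct toy_tls t = { {0}, 0 };
    delivered = 0;
    toy_push(&t, toy_j_waitj);
    toy_push(&t, toy_sigdelayed);
    toy_return(&t);          /* pops toy_sigdelayed, so the signal is delivered */
    return delivered;
}

/* Order seen in the debugger: sigdelayed at index 0, caller above it. */
int toy_observed_order(void)
{
    struct toy_tls t = { {0}, 0 };
    delivered = 0;
    toy_push(&t, toy_sigdelayed);
    toy_push(&t, toy_j_waitj);
    toy_return(&t);          /* pops toy_j_waitj; toy_sigdelayed is left behind */
    return delivered;
}
```

In this model toy_expected_order() returns 1 and toy_observed_order()
returns 0: with the entries reversed, a single pop goes straight back to
the caller and the delivery routine stays stranded at the bottom of the
stack.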

I don't think there's ever a case where sigdelayed should be at
_main_tls->stack[0]. However it got there, I believe this is the cause
of the problem.
Ernie Coskrey