Mail Archives: cygwin/2007/08/08/14:14:06

Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
Date: Wed, 8 Aug 2007 14:10:57 -0400
Message-ID: <76087731258D2545B1016BB958F00ADA1239A5@STEELPO.steeleye.com>
In-Reply-To: <76087731258D2545B1016BB958F00ADA1234D7@STEELPO.steeleye.com>
From: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>
To: <cygwin AT cygwin DOT com>

> -----Original Message-----
> From: cygwin-owner AT cygwin DOT com 
> [mailto:cygwin-owner AT cygwin DOT com] On Behalf Of Ernie Coskrey
> Sent: Tuesday, July 31, 2007 3:40 PM
> To: cygwin AT cygwin DOT com
> Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> 
>  
> I've run into a problem with cygwin 1.5.20-1 and pdksh 
> 5.2.14.  We've got a pdksh.exe process that is spinning, 
> using all the CPU.
>  
> This scenario is very hard to reproduce, but has happened on 
> our test systems occasionally.  It occurred recently, and I 
> currently have gdb attached to the process and have the 
> symbols loaded.  I see that pdksh is continually calling 
> "sigsuspend()", which is immediately returning from 
> cancelable_wait due to the fact that the signal_arrived event 
> is set.  I also see that pdksh is waiting for a subprocess to 
> complete, and has a handle to the PID of that process - 
> however the process has long since terminated.
>  
> It appears that something went wrong during delivery of SIGCHLD.
>  
> I've got two questions related to this:
>  
> - have there been changes between 1.5.20-1 and 1.5.24-2, or 
> the latest snapshot, that might have fixed this issue?  We've 
> done some limited testing with 1.5.24-2 and haven't seen this 
> happen yet, but as I said, it only happens rarely.
> - is there anything I can look at in gdb to help identify 
> what the issue is?
>  
> Any suggestions would be appreciated!
>  
> ---------
> Ernie Coskrey 

I've discovered an interesting piece of information that I think is
related to this.  I'm hoping this might ring a bell with someone on the
list.

Looking at _main_tls->stack[], when I've set a breakpoint in
handle_sigsuspend just after the cancelable_wait() call, I see the
following entries:

    0x6109186f  0x4132ac

0x6109186f is "sigdelayed()", which is the routine that should have been
called to deliver the signal and reset the signal_arrived event.
0x4132ac is j_waitj (in pdksh).

So, when this problem occurs, "sigdelayed" has somehow been pushed onto
the stack *before* j_waitj.  As a result, _sigbe never calls sigdelayed,
so the signal_arrived event is never reset, which would explain why
sigsuspend() keeps returning immediately.

I don't think there's ever a case where sigdelayed should be at
_main_tls->stack[0].  Whatever put it there is, I believe, the cause of
this problem.
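
To illustrate the ordering described above, here is a toy model in
Python.  It is not Cygwin code: ToyThread, toy_return and count_spins are
hypothetical names, the stack is a plain list whose last element is the
top, and the "healthy" order (sigdelayed on top) is an assumption made
for contrast.  It only shows why an event that is reset solely by
sigdelayed stays set when sigdelayed sits underneath j_waitj.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThread:
    """Toy stand-in for the per-thread state in the post (not Cygwin code)."""
    stack: list = field(default_factory=list)  # index 0 is the bottom, end is the top
    signal_arrived: bool = False               # stand-in for the signal_arrived event


def toy_return(thread):
    """Pop the top entry, LIFO style; only 'sigdelayed' resets the event."""
    target = thread.stack.pop()
    if target == "sigdelayed":
        thread.signal_arrived = False
    return target


def count_spins(stack_order, max_spins=5):
    """Count waits that return immediately because the event is still set.

    stack_order lists entries from stack[0] upward. max_spins caps the
    loop so the toy model terminates; the real process has no such cap.
    """
    thread = ToyThread(stack=list(stack_order), signal_arrived=True)
    toy_return(thread)  # the return path takes only the top entry
    spins = 0
    while thread.signal_arrived and spins < max_spins:
        spins += 1  # the wait returns at once; nothing resets the event
    return spins
```

With sigdelayed on top, count_spins(["j_waitj", "sigdelayed"]) returns 0.
With the order reported above, count_spins(["sigdelayed", "j_waitj"])
returns 5, i.e. the loop runs until the artificial cap.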

Ernie Coskrey
