Mail Archives: cygwin/2007/08/01/11:31:36

Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
Date: Wed, 1 Aug 2007 11:30:55 -0400
Message-ID: <76087731258D2545B1016BB958F00ADA123580@STEELPO.steeleye.com>
In-Reply-To: <76087731258D2545B1016BB958F00ADA123578@STEELPO.steeleye.com>
References: <76087731258D2545B1016BB958F00ADA1234D7 AT STEELPO DOT steeleye DOT com> <76087731258D2545B1016BB958F00ADA123578 AT STEELPO DOT steeleye DOT com>
From: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>
To: <cygwin AT cygwin DOT com>, <pechtcha AT cs DOT nyu DOT edu>

> -----Original Message-----
> From: Igor Peshansky
> 
> On Tue, 31 Jul 2007, Ernie Coskrey wrote:
> 
> > I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
> > got a pdksh.exe process that is spinning, using all the CPU.
> >
> > This scenario is very hard to reproduce, but has happened on our test
> > systems occasionally.  It occurred recently, and I currently have gdb
> > attached to the process and have the symbols loaded.
> 
> I assume you've rebuilt pdksh from source, since the packaged binary is
> stripped...  Do you also have the symbols for the Cygwin DLL?

Yes, I've built both pdksh and cygwin1.dll from source and have the
symbols.

> 
> > I see that pdksh is continually calling "sigsuspend()", which is
> > immediately returning from cancelable_wait due to the fact that the
> > signal_arrived event is set.
> 
> Do you mean the sigpause() call?  Can you see which signal it attempts to
> suspend?  Can you email me (privately, if you wish) the stack dump from
> gdb?
> 

It's sigsuspend() in j_waitj - line 1191 in jobs.c.  It calls
sigsuspend(&sm_default), and sm_default is 0 (no signals are blocked).

This immediately returns, and I see that j->state is still PRUNNING
every time.

> > I also see that pdksh is waiting for a subprocess to complete, and has a
> > handle to the PID of that process - however the process has long since
> > terminated.
> 
> That's normal (I think).  Cygwin may not deliver SIGCHLD immediately after
> process termination.  Until pdksh gets SIGCHLD, it'll keep the process
> handle.
> 
> > It appears that something went wrong during delivery of SIGCHLD.
> 
> Does this happen before or after j_sigchld() gets invoked?
> 

I suspect that j_sigchld never got invoked, or didn't run properly, but
can't definitively prove that.

> > I've got two questions related to this:
> >
> > - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> > snapshot, that might have fixed this issue?  We've done some limited
> > testing with 1.5.24-2 and haven't seen this happen yet, but as I said
> > it only happens rarely.
> 
> Quite possibly.  There were changes to signal handling since 1.5.20, IIRC.
> Unless I'm mistaken, there's even a patch for a race condition in process
> handling code (though it's not in 1.5.24, I think).
> 
> > - is there anything I can look at in gdb to help identify what the issue
> > is?
> >
> > Any suggestions would be appreciated!
> 
> Posting a sequence of steps that reliably reproduces the problem for you
> would be great (but not necessarily easy).

I wish I could supply this, but the problem happens very rarely.  I've
run many thousands of test shell iterations and haven't seen it reoccur
yet.

> 
> As I said above, a stack dump (with full pdksh symbols) would help...
> That might mean that you'd need to build an unstripped pdksh and attempt
> to reproduce the problem again.
> 	Igor
> --

Here's a stack trace of the thread where the spin is occurring.  The
other threads in the process are quiet - the signal thread is in
ReadFile as expected, and the other threads are all in stub routines
doing WaitForSingleObject.

(gdb) bt
#0  handle_sigsuspend (tempmask=0)
    at ../../../../src/winsup/cygwin/exceptions.cc:694
#1  0x61094b93 in sigsuspend (set=0x42db80)
    at ../../../../src/winsup/cygwin/signal.cc:477
#2  0x610917b8 in _sigfe () at ../../../../src/winsup/cygwin/cygserver.h:82
#3  0x0022c588 in ?? ()
#4  0x600301dc in ?? ()
#5  0x006854d8 in ?? ()
#6  0x00000003 in ?? ()
#7  0x0022c588 in ?? ()
#8  0x006874b8 in ?? ()
#9  0x006854d8 in ?? ()
#10 0x00000003 in ?? ()
#11 0x0022c5a8 in ?? ()
#12 0x004126e0 in waitlast () at ../src/jobs.c:729
#13 0x004126e0 in waitlast () at ../src/jobs.c:729
#14 0x0040b160 in expand (
    cp=0x6874b8 "\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$LKBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001 -f8", wp=0x22c6b0, f=32) at ../src/eval.c:533
#15 0x0040a654 in evalstr (
    cp=0x6874b8 "\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$LKBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001 -f8", f=32) at ../src/eval.c:113
#16 0x0040d80a in comexec (t=0x6871e0, tp=0x0, ap=0x687350, flags=0)
    at ../src/exec.c:555
#17 0x0040cc7d in execute (t=0x6871e0, flags=0) at ../src/exec.c:155
#18 0x0040ce39 in execute (t=0x687778, flags=0) at ../src/exec.c:192
#19 0x0040d311 in execute (t=0x686620, flags=1) at ../src/exec.c:367
#20 0x004124c1 in exchild (t=0x686620, flags=74, close_fd=0)
    at ../src/jobs.c:641
#21 0x0040cdf6 in execute (t=0x686620, flags=10) at ../src/exec.c:185
#22 0x0040ce62 in execute (t=0x688470, flags=0) at ../src/exec.c:195
#23 0x0040d311 in execute (t=0x684ee0, flags=0) at ../src/exec.c:367
#24 0x0041766e in shell (s=0x6839b8, toplevel=1) at ../src/main.c:616
#25 0x00417204 in main (argc=6, argv=0x61171f74) at ../src/main.c:429

Please let me know if there's any other information that would be
useful.  Thanks!

Ernie Coskrey
SteelEye Technology, Inc.
