Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
Date: Wed, 1 Aug 2007 11:30:55 -0400
From: "Ernie Coskrey"
Message-ID: <76087731258D2545B1016BB958F00ADA123580@STEELPO.steeleye.com>

> -----Original Message-----
> From: Igor Peshansky
>
> On Tue, 31 Jul 2007, Ernie Coskrey wrote:
>
> > I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14. We've
> > got a pdksh.exe process that is spinning, using all the CPU.
> >
> > This scenario is very hard to reproduce, but has happened on our test
> > systems occasionally. It occurred recently, and I currently have gdb
> > attached to the process and have the symbols loaded.
>
> I assume you've rebuilt pdksh from source, since the packaged binary is
> stripped... Do you also have the symbols for the Cygwin DLL?

Yes, I've built both pdksh and cygwin1.dll from source and have the
symbols.

> > I see that pdksh is continually calling "sigsuspend()", which is
> > immediately returning from cancelable_wait due to the fact that the
> > signal_arrived event is set.
>
> Do you mean the sigpause() call? Can you see which signal it attempts to
> suspend? Can you email me (privately, if you wish) the stack dump from
> gdb?
>

It's sigsuspend() in j_waitj - line 1191 in jobs.c. It calls
sigsuspend(&sm_default), and sm_default is 0 (no signals are blocked).
This immediately returns, and I see that j->state is still PRUNNING every
time.

> > I also see that pdksh is waiting for a subprocess to complete, and has a
> > handle to the PID of that process - however the process has long since
> > terminated.
>
> That's normal (I think). Cygwin may not deliver SIGCHLD immediately after
> process termination. Until pdksh gets SIGCHLD, it'll keep the process
> handle.
>
> > It appears that something went wrong during delivery of SIGCHLD.
>
> Does this happen before or after j_sigchld() gets invoked?
>

I suspect that j_sigchld never got invoked, or didn't run properly, but I
can't definitively prove that.

> > I've got two questions related to this:
> >
> > - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> > snapshot, that might have fixed this issue? We've done some limited
> > testing with 1.5.24-2 and haven't seen this happen yet, but as I said
> > it only happens rarely.
>
> Quite possibly. There were changes to signal handling since 1.5.20, IIRC.
> Unless I'm mistaken, there's even a patch for a race condition in process
> handling code (though it's not in 1.5.24, I think).
>
> > - is there anything I can look at in gdb to help identify what the issue
> > is?
> >
> > Any suggestions would be appreciated!
>
> Posting a sequence of steps that reliably reproduces the problem for you
> would be great (but not necessarily easy).

I wish I could supply this, but the problem happens very rarely. I've run
many thousands of test shell iterations and haven't seen it reoccur yet.

> As I said above, a stack dump (with full pdksh symbols) would help...
> That might mean that you'd need to build an unstripped pdksh and attempt
> to reproduce the problem again.
>         Igor
> --

Here's a stack trace of the thread where the spin is occurring.
The other threads in the process are quiet - the signal thread is in
ReadFile as expected, and the other threads are all in stub routines doing
WaitForSingleObject.

(gdb) bt
#0  handle_sigsuspend (tempmask=0) at ../../../../src/winsup/cygwin/exceptions.cc:694
#1  0x61094b93 in sigsuspend (set=0x42db80) at ../../../../src/winsup/cygwin/signal.cc:477
#2  0x610917b8 in _sigfe () at ../../../../src/winsup/cygwin/cygserver.h:82
#3  0x0022c588 in ?? ()
#4  0x600301dc in ?? ()
#5  0x006854d8 in ?? ()
#6  0x00000003 in ?? ()
#7  0x0022c588 in ?? ()
#8  0x006874b8 in ?? ()
#9  0x006854d8 in ?? ()
#10 0x00000003 in ?? ()
#11 0x0022c5a8 in ?? ()
#12 0x004126e0 in waitlast () at ../src/jobs.c:729
#13 0x004126e0 in waitlast () at ../src/jobs.c:729
#14 0x0040b160 in expand (
    cp=0x6874b8 "\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$L KBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001 -f8", wp=0x22c6b0, f=32) at ../src/eval.c:533
#15 0x0040a654 in evalstr (
    cp=0x6874b8 "\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$L KBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001 -f8", f=32) at ../src/eval.c:113
#16 0x0040d80a in comexec (t=0x6871e0, tp=0x0, ap=0x687350, flags=0) at ../src/exec.c:555
#17 0x0040cc7d in execute (t=0x6871e0, flags=0) at ../src/exec.c:155
#18 0x0040ce39 in execute (t=0x687778, flags=0) at ../src/exec.c:192
#19 0x0040d311 in execute (t=0x686620, flags=1) at ../src/exec.c:367
#20 0x004124c1 in exchild (t=0x686620, flags=74, close_fd=0) at ../src/jobs.c:641
#21 0x0040cdf6 in execute (t=0x686620, flags=10) at ../src/exec.c:185
#22 0x0040ce62 in execute (t=0x688470, flags=0) at ../src/exec.c:195
#23 0x0040d311 in execute (t=0x684ee0, flags=0) at ../src/exec.c:367
#24 0x0041766e in shell (s=0x6839b8, toplevel=1) at ../src/main.c:616
#25 0x00417204 in main (argc=6, argv=0x61171f74) at ../src/main.c:429

Please let me know if there's any other information that would be useful.
Thanks!

Ernie Coskrey
SteelEye Technology, Inc.