Date: Tue, 31 Jul 2007 17:14:17 -0400 (EDT)
From: Igor Peshansky
Reply-To: cygwin AT cygwin DOT com
To: Ernie Coskrey
cc: cygwin AT cygwin DOT com
Subject: Re: cygwin 1.5.20-1, spinning pdksh, 100% CPU
In-Reply-To: <76087731258D2545B1016BB958F00ADA1234D7@STEELPO.steeleye.com>

On Tue, 31 Jul 2007, Ernie Coskrey wrote:

> I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
> got a pdksh.exe process that is spinning, using all the CPU.
>
> This scenario is very hard to reproduce, but has happened on our test
> systems occasionally.  It occurred recently, and I currently have gdb
> attached to the process and have the symbols loaded.

I assume you've rebuilt pdksh from source, since the packaged binary is
stripped...  Do you also have the symbols for the Cygwin DLL?

> I see that pdksh is continually calling "sigsuspend()", which is
> immediately returning from cancelable_wait due to the fact that the
> signal_arrived event is set.

Do you mean the sigpause() call?  Can you see which signal it attempts to
suspend?  Can you email me (privately, if you wish) the stack dump from
gdb?

> I also see that pdksh is waiting for a subprocess to complete, and has a
> handle to the PID of that process - however the process has long since
> terminated.

That's normal (I think).  Cygwin may not deliver SIGCHLD immediately after
process termination.  Until pdksh gets SIGCHLD, it'll keep the process
handle.

> It appears that something went wrong during delivery of SIGCHLD.

Does this happen before or after j_sigchld() gets invoked?

> I've got two questions related to this:
>
> - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> snapshot, that might have fixed this issue?  We've done some limited
> testing with 1.5.24-2 and haven't seen this happen yet, but as I said,
> it only happens rarely.

Quite possibly.  There were changes to signal handling since 1.5.20, IIRC.
Unless I'm mistaken, there's even a patch for a race condition in process
handling code (though it's not in 1.5.24, I think).

> - is there anything I can look at in gdb to help identify what the issue
> is?
>
> Any suggestions would be appreciated!

Posting a sequence of steps that reliably reproduces the problem for you
would be great (but not necessarily easy).  As I said above, a stack dump
(with full pdksh symbols) would help...  That might mean that you'd need
to build an unstripped pdksh and attempt to reproduce the problem again.
	Igor
-- 
                                http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_        pechtcha AT cs DOT nyu DOT edu | igor AT watson DOT ibm DOT com
ZZZzz /,`.-'`'    -.  ;-;;,_    Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'   old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL     a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Belief can be manipulated.  Only knowledge is dangerous.  -- Frank Herbert

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/