Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Thu, 22 Sep 2005 22:26:19 -0400
From: Christopher Faylor
To: cygwin AT cygwin DOT com
Subject: Re: Funny hang with snapshop 20050920
Message-ID: <20050923022619.GB21253@trixie.casa.cgf.cx>
Reply-To: cygwin AT cygwin DOT com
References: <4333660B DOT 7060305 AT scytek DOT de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4333660B.7060305@scytek.de>
User-Agent: Mutt/1.5.8i

On Thu, Sep 22, 2005 at 10:18:51PM -0400, Volker Quetschke wrote:
>My favorite testcase (building OOo) started hanging again.
>
>(Un?-)fortunately not on one of my systems and we also didn't manage
>to reproduce with a reduced testcase.  But the problem generally is:
>In a tcsh shell (2980) start a perl script (3012) that starts a cygwin
>program (a make clone) (3016) that starts a command in a tcsh.
>
>Now the fun part begins, the started tcsh command is (should be) just
>  "/usr/bin/tcsh -fc pwd" (3736)
>but there is another process started by this process (3176) that
>  "/usr/bin/tcsh -fc pwd" (3176)
>has exactly the same command and that appears to hang.
>Below you see the output of a ps command:
>
>      PID    PPID    PGID     WINPID  TTY  UID    STIME COMMAND
>     3772       1    3772       3772  con 11290 18:59:34 /usr/bin/bash
>     2980    3772    2980       3124  con 11290 18:59:39 /usr/bin/tcsh
>     3616       1    3616       3616  con 11290 19:10:02 /usr/bin/bash
>     3452    3616    3452        444  con 11290 19:10:07 /usr/bin/tcsh
>     3012    2980    3012       3912  con 11290 18:10:56 /usr/bin/perl
>     3016    3012    3012       3916  con 11290 18:37:01 /cygdrive/e/work/OOo/SRC680_m124/solenv/wntmsci10/bin/dmake
>     3736    3016    3012       3392  con 11290 18:37:01 /usr/bin/tcsh
>     3176    3736    3012       3176  con 11290 18:37:01 /usr/bin/tcsh
>     3804    3452    3804       3196  con 11290 18:40:17 /usr/bin/ps
>
>Attached you find the output of a "cat /proc//*" for the two pids.
>
>But now the *really* strange part begins: You can break the hang by doing
>  "ls /proc/3176/fd" !?
>and the build continues (until the next hang).
>
>Sorry, we're unable to create a reduced testcase but we thought the
>strange symptoms might help pinpoint the problem.
>
>Attached you also find the cygcheck output of that system.
>
>I hope this helps a little bit,

Does sending a 'kill -CONT 3176' also unstick things?  Both situations
send a signal to the process.

How about attaching to the hung process with strace?  You didn't mention
that.

cgf
(who deeply regrets ever trying to fix the windows 98 crash problem)

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/