Mail Archives: cygwin/2005/09/23/20:11:25

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Fri, 23 Sep 2005 20:11:10 -0400
From: Christopher Faylor <cgf-no-personal-reply-please AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Funny hang with snapshop 20050920
Message-ID: <20050924001110.GA1390@trixie.casa.cgf.cx>
Reply-To: cygwin AT cygwin DOT com
References: <4333660B DOT 7060305 AT scytek DOT de> <20050923022619 DOT GB21253 AT trixie DOT casa DOT cgf DOT cx> <43348E75 DOT 7080309 AT scytek DOT de>
Mime-Version: 1.0
In-Reply-To: <43348E75.7080309@scytek.de>
User-Agent: Mutt/1.5.8i

On Fri, Sep 23, 2005 at 07:23:33PM -0400, Volker Quetschke wrote:
>Christopher Faylor wrote:
>>On Thu, Sep 22, 2005 at 10:18:51PM -0400, Volker Quetschke wrote:
>>
>>>My favorite testcase (building OOo) started hanging again.
>>>...
>>>But now the *really* strange part begins: you can break the hang by running
>>>"ls /proc/3176/fd" (!?),
>>>and the build continues (until the next hang).
>>>
>>>Sorry, we're unable to create a reduced testcase but we thought the
>>>strange symptoms might help pinpoint the problem.
>>>
>>>Attached you also find the cygcheck output of that system.
>>
>>Does sending a 'kill -CONT 3176' also unstick things?  Both situations
>>send a signal to the process.
>
>Sorry, this question got lost, but ...
>
>>How about attaching to the hung process with strace?  You didn't mention
>>that.
>
>he tried to attach, but strace just sat there without producing any output.
>Running "ls /proc/<pid>/fd" then produced the first four lines of the
>attached strace log, but tcsh still hung.

You know, I noticed yesterday that there was some information missing
from the strace output in the open_shared function and, of course, I
didn't fix it.  Oh well.  That means that I don't get much from this
strace output.

>After several more "ls /proc/<pid>/fd" runs, it continued and produced
>the rest of that logfile.
>
>Did you notice that the WINPID of the hanging tcsh is the same as
>its PID? This is always the case when it hangs.

That just means that the process has forked but hasn't execed anything.
I don't think that's significant.

>Additional info: both tcsh processes show up in taskmgr with their
>respective WINPIDs.

I'd expect them to, or you wouldn't be able to attach to them.

There is a new snapshot up now.  I think I've given up on the
technique that I was trying to use to fix the Windows 98 bug.  I've
yanked out a lot of the code and simplified things, but I hope I haven't
caused the bug to reemerge.

Could you try the 2005-09-23 snapshot?  Same rules.  I'd still like to
know whether sending a CONT signal to the hung process unsticks it the
way "ls /proc/nnn/fd" does, and I'd still like to see the strace output
if the process hangs again.

Also, could anyone who was able to duplicate the Windows 98 error popup
dialog confirm or deny that it is still fixed?

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

  Copyright © 2019   by DJ Delorie     Updated Jul 2019