Mailing-List: contact cygwin-developers-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-developers-owner AT sources DOT redhat DOT com
Delivered-To: mailing list cygwin-developers AT sources DOT redhat DOT com
Date: Thu, 8 Nov 2001 12:09:56 -0500
From: Christopher Faylor
To: cygwin-developers AT cygwin DOT com
Subject: Re: Debugging problem in peek_pipe in select.cc
Message-ID: <20011108120956.A2730@redhat.com>
Reply-To: cygwin-developers AT cygwin DOT com
Mail-Followup-To: cygwin-developers AT cygwin DOT com
References: <20011108155542 DOT 19905 DOT qmail AT lizard DOT curl DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20011108155542.19905.qmail@lizard.curl.com>
User-Agent: Mutt/1.3.21i

On Thu, Nov 08, 2001 at 10:55:42AM -0500, Jonathan Kamens wrote:
>I'm trying to debug why "make -j2" continues to hang for us
>occasionally even after cgf's recent fix to the code in this area.
>
>After deploying a cygwin1.dll with his fix, I ran two builds in a row
>which both hung. I didn't get much useful information out of them, so
>I set things up to be able to debug better in case of future hangs,
>and then started running builds.
>
>I ran a whole bunch of builds over several days and none of them
>hung. Finally, one of them hung, and then one of my coworkers killed
>and restarted it before I could debug it :-).
>
>Shortly after that, I finally got another build to hang, and I'm
>looking at that one now. Here's the current roadblock preventing me
>from understanding what's going on....
>
>I attached to a hung process. The top of its stack trace in thread 1
>looks like this:
>
> #0 0x77f67a5b in ?? ()
> #1 0x61053b08 in peek_pipe (s=0x24aeee4, ignra=0, guard_mutex=0x1dc)
>    at /u/jik/cygwin-cvs/src/winsup/cygwin/select.cc:453
> #2 0x61053eba in fhandler_pipe::ready_for_read (this=0x61544920, fd=6,
>    howlong=4294967295, ignra=0)
>    at /u/jik/cygwin-cvs/src/winsup/cygwin/select.cc:512
> #3 0x61062b97 in _read (fd=6, ptr=0x24aeff2, len=1)
>    at /u/jik/cygwin-cvs/src/winsup/cygwin/syscalls.cc:315
> #4 0x6108cbce in read (fd=6, buf=0x24aeff2, cnt=1)
>    at /u/jik/cygwin-cvs/src/newlib/libc/syscalls/sysread.c:15
>
>Line 453 of select.cc is a call to PeekNamedPipe. According to the
>MSDN documentation for PeekNamedPipe, it never hangs. So, thinking
>that frame 0 must be the PeekNamedPipe invocation, I typed "frame 0"
>and then "finish" in a "gdb -nw" window (running inside an ssh session
>to the Windows servers), and now it's hung. How can that be? I don't
>get it.

The point of my addition of a mutex to peek_pipe was to prevent
occurrences of PeekNamedPipe blocking, actually. It can block in
pathological situations when another thread/process is doing a blocking
read.

From your backtrace, it looks like you are running an older version of
the sources. I have been making a lot of changes to select to try to fix
this problem. One change in particular allowed me to run "make -j2" for
more than 24 hours with no hang.

I'm sorry that I didn't specifically send you email about this.

>^C has no effect at this point, so I can't get get to stop the process
>and tell me where it is now.

If cygwin is in a blocking win32 API call, then ^C will not work.
ready_for_read is specifically designed to not block so that signals
will work wrt blocking reads.

If you are still seeing hangs in the most recent sources, then there is
still some kind of race with the guard mutex in peek_pipe. That is
where you will need to investigate.

cgf
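
For readers following the guard-mutex discussion above, here is a minimal portable sketch of the general idea: take a guard before peeking, and report "busy" rather than touching the pipe while another thread holds the guard during a blocking read. This is not the code in select.cc. GuardedPipe, Peek, and peek_pipe_sketch are hypothetical names, and a std::timed_mutex and an integer counter stand in for the Win32 mutex and PeekNamedPipe.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a pipe. A reader is assumed to hold `guard`
// for the whole duration of a blocking read.
struct GuardedPipe {
    std::timed_mutex guard;   // plays the role of the guard mutex
    int bytes_available = 0;  // stand-in for what a peek would report
};

enum class Peek { Ready, Empty, Busy };

// Peek without stalling indefinitely: wait at most `wait` for the guard.
// If the guard cannot be taken, assume a read is in progress and do not
// touch the pipe at all.
Peek peek_pipe_sketch(GuardedPipe& p, std::chrono::milliseconds wait) {
    // This unique_lock constructor calls try_lock_for(wait).
    std::unique_lock<std::timed_mutex> lock(p.guard, wait);
    if (!lock.owns_lock())
        return Peek::Busy;
    return p.bytes_available > 0 ? Peek::Ready : Peek::Empty;
}

// Small demonstration: a peek attempted while the "reader" holds the guard
// comes back Busy, and a peek after the guard is released sees the data.
void run_demo() {
    using std::chrono::milliseconds;
    GuardedPipe pipe;

    pipe.guard.lock();                  // simulate a reader inside a blocking read
    Peek during = Peek::Empty;
    std::thread peeker([&] { during = peek_pipe_sketch(pipe, milliseconds(10)); });
    peeker.join();
    pipe.bytes_available = 3;           // the read side produced data
    pipe.guard.unlock();                // the blocking read finished

    assert(during == Peek::Busy);
    assert(peek_pipe_sketch(pipe, milliseconds(10)) == Peek::Ready);
}
```

The sketch only helps if every blocking reader really does take the guard first. Any path that reads without it reopens the window for the peek to stall, which is the kind of race the message above suggests investigating.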