From: swamp-dog AT ntlworld DOT com (Guy Harrison)
To: cygwin AT cygwin DOT com
Subject: Re: cvs cygwin1.dll
Date: Sun, 22 Sep 2002 15:08:41 GMT
Message-ID: <3d90daeb.24161151@smtp.ntlworld.com>
References: <3d81aa1b DOT 1496411 AT smtp DOT ntlworld DOT com>
 <20020913125816 DOT GA1030 AT redhat DOT com>
 <3d88c6c1 DOT 17117483 AT smtp DOT ntlworld DOT com>
 <20020918193553 DOT GA9328 AT redhat DOT com>
 <3d8bf2b5 DOT 1540735 AT smtp DOT ntlworld DOT com>
 <20020920155657 DOT GH24740 AT redhat DOT com>
In-Reply-To: <20020920155657.GH24740@redhat.com>

On Fri, 20 Sep 2002 11:56:57 -0400, Christopher Faylor wrote:

>On Fri, Sep 20, 2002 at 11:26:42AM +0000, Guy Harrison wrote:
>>On Wed, 18 Sep 2002 15:35:53 -0400, Christopher Faylor
>>wrote:

Shame us non-developers can't get it "readonly".

http://cygwin.com/ml/cygwin-developers/2002-09/msg00071.html

...sounds *exactly* like my problem. Moreover the build date on my last
cygwin1.dll that works is 2002-07-11. Similar timeframe.

>>>On Wed, Sep 18, 2002 at 06:42:50PM +0000, Guy Harrison wrote:
>>>>On Fri, 13 Sep 2002 08:58:16 -0400, Christopher Faylor
>>>>wrote:
>>>>
>>>>>On Fri, Sep 13, 2002 at 09:09:37AM +0000, Guy Harrison wrote:
>>>>>>I can't seem to figure out how to set a breakpoint in sigproc.cc without
>>>>>>recompiling make with debug. Any hints?
>>>>>
>>>>>Just attach to the running process and set a breakpoint.
>>>>>
>>>>>Alternatively, use the "dll" command to load cygwin1.dll and then set
>>>>>a breakpoint on a *line number*.
>>>>
>>>>Thanks, the latter helped verify that debugging made the problem go away
>>>>- ditto strace. Initially I thought it was a race. Racing certainly
>>>>helps trigger it but that isn't the problem.
>>>>
>>>>I can't see a mechanism involving cygthread::stub to cater for the case
>>>>where "last man out"+1 ensures "last man out" is running. In all
>>>>situations where abnormal behaviour occurs we're left waiting upon a
>>>>process that consists of a single suspended cygthread::stub thread.
>>>>Others should be able to verify this by bumping the size of the
>>>>cygthread.cc threads[] array up to a silly value, then attempting an
>>>>intensive configure/make/install with it. Conversely now that I've set
>>>>threads[1] there's been no breakages.
>>>
>>>Where are you seeing this wait? Details please.
>>
>>Any reasonably intensive configure/make/install build. Not surprising
>>'cos that's what I do most. Name almost any process that occurs during
>>that and it's had a hang on a lone suspended thread - all the parent
>>processes waiting on it. Spurious.
>
>The "where" meant where in the code.
>
>You apparently tracked things down to the cygthread code but I don't
>see any real analysis of why the cygthread code would cause this. The
>fact that you twiddled something and the problem went away does not
>necessarily mean that you've found the source of the problem -- not
>in any complex system at least.

I stared at it until I went boss-eyed *then* I twiddled it.

>I suspect that this is actually due to a deadlock in the code init.cc
>which was recently discussed in cygwin-developers.
>
>>The implication is that cygthread::stub's should be in suspended state
>>as the process exits. Is this (a) correct, (b) expected, (c) required?
>
>a, b.
>
>>Anyone know, or heard of, issues regarding suspended threads and
>>::ExitProcess()?
>
>Deadlocks with thread or process attach/detach code are documented in
>MSDN.
>
>>Possibles that come to mind - winAPI bug whereby a suspended thread can
>>be momentarily woken (ie enough to become the main thread), or perhaps a
>>suspended thread can linger due to a handle being left open on it and
>>thereby become the main thread.
>
>I don't think it has anything to do with suspended threads. You can
>certainly verify this by adding code to kill the threads specifically,
>though, and see what happens.

I did. I declared threads[1]. All the work gets shoved onto
cygthread::simplestub which neither suspends nor stays resident.

>The deadlock would be more likely if there are more threads and with the
>new cygthread code there will always be at least six extra threads.

Thanks for confirming (a) & (b). I put some checks into _pinfo::exit()
immediately prior to ::ExitProcess(). The info didn't mean much without
that.

Hung process:

Name---------Pid-Pri-Thd--Hnd----Mem-----User-Time---Kernel-Time---Elapsed-Time
sh-----------344---4---1---67---1832---0:00:00.020---0:00:00.080----0:02:29.935
----------------------VM------WS---WS-Pk----Priv---Faults-NonP-Page-PageFile
------------------351732----1832----1964----1476------492----3---21-----1476
-Tid-Pri----Cswtch------------State-----User-Time---Kernel-Time---Elapsed-Time
-548---4---------1---Wait:Suspended---0:00:00.000---0:00:00.000----0:02:29.825

Relevant log:

Quick Key:

90      GetCommandLine() chars
[n/32]  =threads[n] of NTHREADS=32
mti     =main_thread_id
nam     =ignore fixed on "mti" here
sdc     =SD_count (member added to cygthread class) suspend count
av      =threads[].avail
id      =threads[].id
h       =threads[].h
sus     =another suspend count
gle     =GetLastError() for failed "sus"

<344/509> cli(90):J:\cygwin\bin\sh.exe
pid=344 tid=509[0/32]{mti:509}: nam=[main] sdc=-999 av=877 id=0 h=296 sus=2 gle=0
pid=344 tid=509[1/32]{mti:509}: nam=[main] sdc=-999 av=212 id=0 h=300 sus=2 gle=0
pid=344 tid=509[2/32]{mti:509}: nam=[main] sdc=-999 av=894 id=0 h=304 sus=2 gle=0
pid=344 tid=509[3/32]{mti:509}: nam=[main] sdc=-999 av=482 id=0 h=308 sus=2 gle=0
pid=344 tid=509[4/32]{mti:509}: nam=[main] sdc=-999 av=606 id=0 h=312 sus=2 gle=0
pid=344 tid=509[5/32]{mti:509}: nam=[main] sdc=-999 av=664 id=0 h=316 sus=2 gle=0
pid=344 tid=509[6/32]{mti:509}: nam=[main] sdc=-999 av=673 id=0 h=324 sus=2 gle=0
pid=344 tid=509[7/32]{mti:509}: nam=[main] sdc=-999 av=317 id=0 h=328 sus=2 gle=0
pid=344 tid=509[8/32]{mti:509}: nam=[main] sdc=-999 av=303 id=0 h=332 sus=2 gle=0
pid=344 tid=509[9/32]{mti:509}: nam=[main] sdc=-999 av=723 id=0 h=336 sus=2 gle=0
pid=344 tid=509[10/32]{mti:509}: nam=[main] sdc=-999 av=337 id=0 h=340 sus=2 gle=0
pid=344 tid=509[11/32]{mti:509}: nam=[main] sdc=-999 av=472 id=0 h=344 sus=2 gle=0
pid=344 tid=509[12/32]{mti:509}: nam=[main] sdc=-999 av=627 id=0 h=348 sus=2 gle=0
pid=344 tid=509[13/32]{mti:509}: nam=[main] sdc=-999 av=458 id=0 h=352 sus=2 gle=0
pid=344 tid=509[14/32]{mti:509}: nam=[main] sdc=-999 av=875 id=0 h=356 sus=2 gle=0
pid=344 tid=509[15/32]{mti:509}: nam=[main] sdc=-999 av=637 id=0 h=360 sus=2 gle=0
pid=344 tid=509[16/32]{mti:509}: nam=[main] sdc=-999 av=768 id=0 h=364 sus=2 gle=0
pid=344 tid=509[17/32]{mti:509}: nam=[main] sdc=-999 av=168 id=0 h=368 sus=2 gle=0
pid=344 tid=509[18/32]{mti:509}: nam=[main] sdc=-999 av=216 id=0 h=372 sus=2 gle=0
pid=344 tid=509[19/32]{mti:509}: nam=[main] sdc=-999 av=783 id=0 h=376 sus=2 gle=0
pid=344 tid=509[20/32]{mti:509}: nam=[main] sdc=-999 av=226 id=0 h=380 sus=2 gle=0
pid=344 tid=509[21/32]{mti:509}: nam=[main] sdc=-999 av=355 id=0 h=384 sus=2 gle=0
pid=344 tid=509[22/32]{mti:509}: nam=[main] sdc=-999 av=651 id=0 h=388 sus=2 gle=0
pid=344 tid=509[23/32]{mti:509}: nam=[main] sdc=-999 av=717 id=0 h=392 sus=2 gle=0
pid=344 tid=509[24/32]{mti:509}: nam=[main] sdc=-999 av=859 id=0 h=396 sus=2 gle=0
pid=344 tid=509[25/32]{mti:509}: nam=[main] sdc=-999 av=752 id=0 h=400 sus=2 gle=0
pid=344 tid=509[26/32]{mti:509}: nam=[main] sdc=-999 av=796 id=0 h=404 sus=2 gle=0
pid=344 tid=509[27/32]{mti:509}: nam=[main] sdc=-999 av=887 id=0 h=408 sus=2 gle=0
pid=344 tid=509[28/32]{mti:509}: nam=[main] sdc=-999 av=728 id=0 h=412 sus=2 gle=0
pid=344 tid=509[29/32]{mti:509}: nam=[main] sdc=-99 av=0 id=908 h=416 sus=1 gle=0
pid=344 tid=509[30/32]{mti:509}: nam=[main] sdc=0 av=0 id=0 h=0 sus=-1 gle=6
pid=344 tid=509[31/32]{mti:509}: nam=[main] sdc=0 av=0 id=0 h=0 sus=-1 gle=6

The ::SuspendThread() and ::ResumeThread() calls in cygthread.cc assign
their result directly to SD_count. I set it explicitly to silly negative
values at these points:

-999   in cygthread::runner() after its ::CreateThread()
-99    in cygthread::stub just prior to init_exceptions()
-2     cygthread::exit_thread ::SetEvent()
-9999  cygthread::stub ::ExitThread()

Nothing else touches 'SD_count'. The above output is generated by a
function 'SD_DumpLiving()' inserted immediately prior to ::ExitProcess()
within _pinfo::exit().

Our hung process is definitely suspended. I got this one woken back up
and the build went to completion. tid=548 is nowhere to be seen so it
stands to reason it formerly resided in threads[30] or threads[31].

Nowhere do I set SD_count=0. Must be cygthread::stub SuspendThread or
external influence.

-- 
swamp-dog AT ntlworld DOT com
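
For anyone wanting to sift a longer dump of this kind, here is a minimal
sketch of a line parser. It is not part of the cygwin sources: SlotDump,
parse_slot, last_checkpoint and invalid_handle are hypothetical names,
and it assumes every line follows the SD_DumpLiving() layout shown in the
log, including the fixed "nam=[main]" field. The sentinel meanings are
taken from the list of SD_count checkpoints; 6 is the Win32
ERROR_INVALID_HANDLE code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical record for one line of the SD_DumpLiving() output.
struct SlotDump {
    int slot = -1;    // n in "[n/32]"
    int sdc = 0;      // SD_count
    int avail = 0;    // av
    int id = 0;       // id
    int handle = 0;   // h
    int sus = 0;      // sus
    int gle = 0;      // gle
};

// Parse one dump line. Returns false if the line does not match the
// assumed layout; 'out' may be partially written in that case.
bool parse_slot(const std::string& line, SlotDump& out) {
    int pid = 0, tid = 0, nthreads = 0, mti = 0;
    int n = std::sscanf(
        line.c_str(),
        "pid=%d tid=%d[%d/%d]{mti:%d}: nam=[main] "
        "sdc=%d av=%d id=%d h=%d sus=%d gle=%d",
        &pid, &tid, &out.slot, &nthreads, &mti,
        &out.sdc, &out.avail, &out.id, &out.handle, &out.sus, &out.gle);
    return n == 11;   // all eleven fields converted
}

// Map an SD_count value onto the checkpoint that last wrote it.
const char* last_checkpoint(int sdc) {
    switch (sdc) {
    case -999:  return "cygthread::runner(), after ::CreateThread()";
    case -99:   return "cygthread::stub, just prior to init_exceptions()";
    case -2:    return "cygthread::exit_thread, ::SetEvent()";
    case -9999: return "cygthread::stub, ::ExitThread()";
    default:    return "result of ::SuspendThread()/::ResumeThread()";
    }
}

// True when the "sus" probe failed with ERROR_INVALID_HANDLE (6).
bool invalid_handle(const SlotDump& d) {
    return d.sus == -1 && d.gle == 6;
}
```

Feeding it the line for slot 29 yields the "just prior to
init_exceptions()" checkpoint, and slots 30 and 31 are the only ones
that report an invalid handle.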