Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Sun, 22 Sep 2002 12:40:54 -0400
From: Christopher Faylor
To: cygwin AT cygwin DOT com
Subject: Re: cvs cygwin1.dll
Message-ID: <20020922164054.GB25597@redhat.com>
Reply-To: cygwin AT cygwin DOT com
References: <3d81aa1b DOT 1496411 AT smtp DOT ntlworld DOT com> <20020913125816 DOT GA1030 AT redhat DOT com> <3d88c6c1 DOT 17117483 AT smtp DOT ntlworld DOT com> <20020918193553 DOT GA9328 AT redhat DOT com> <3d8bf2b5 DOT 1540735 AT smtp DOT ntlworld DOT com> <20020920155657 DOT GH24740 AT redhat DOT com> <3d90daeb DOT 24161151 AT smtp DOT ntlworld DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3d90daeb.24161151@smtp.ntlworld.com>
User-Agent: Mutt/1.4i

On Sun, Sep 22, 2002 at 03:08:41PM +0000, Guy Harrison wrote:
>On Fri, 20 Sep 2002 11:56:57 -0400, Christopher Faylor
>wrote:
>
>>On Fri, Sep 20, 2002 at 11:26:42AM +0000, Guy Harrison wrote:
>>>On Wed, 18 Sep 2002 15:35:53 -0400, Christopher Faylor
>>>wrote:
>
>Shame us non-developers can't get it "readonly".
>http://cygwin.com/ml/cygwin-developers/2002-09/msg00071.html

Yeah.  Life's a bitch, isn't it?

Anyway, you are looking at the wrong message.  Remember this?

On Fri, Sep 20, 2002 at 11:56:57AM -0400, Christopher Faylor wrote:
>I suspect that this is actually due to a deadlock in the code init.cc
>which was recently discussed in cygwin-developers.

If you look at my message which immediately follows the one that you
mention which actually *mentions init.cc*, you will see the cause of at
least one deadlock-on-exit in cygwin.

Robert Collins has vowed to fix this problem this weekend.  Until then,
however, I have commented out the code in question.

>>I don't think it has anything to do with suspended threads.  You can
>>certainly verify this by adding code to kill the threads specifically,
>>though, and see what happens.
>
>I did. I declared threads[1]. All the work gets shoved onto
>cygthread::simplestub which neither suspends nor stays resident.

Not the same thing at all.

>Hung process:
>
>Name---------Pid-Pri-Thd--Hnd----Mem-----User-Time---Kernel-Time---Elapsed-Time
>sh-----------344---4---1---67---1832---0:00:00.020---0:00:00.080----0:02:29.935
>----------------------VM------WS---WS-Pk----Priv---Faults-NonP-Page-PageFile
>------------------351732----1832----1964----1476------492----3---21-----1476
>-Tid-Pri----Cswtch------------State-----User-Time---Kernel-Time---Elapsed-Time
>-548---4---------1---Wait:Suspended---0:00:00.000---0:00:00.000----0:02:29.825
>
>Relevant log:
>
>Quick Key:
> 90 GetCommandLine() chars
>[n/32] =threads[n] of NTHREADS=32
>mti =main_thread_id
>nam =ignore fixed on "mti" here
>sdc =SD_count (member added to cygthread class) suspend count
>av =threads[].avail
>id =threads[].id
>h =threads[].h
>sus =another suspend count
>gle =GetLastError() for failed "sus"
>
><344/509> cli(90):J:\cygwin\bin\sh.exe
>pid=344 tid=509[0/32]{mti:509}: nam=[main] sdc=-999 av=877 id=0 h=296
>sus=2 gle=0
>pid=344 tid=509[1/32]{mti:509}: nam=[main] sdc=-999 av=212 id=0 h=300
>sus=2 gle=0
>pid=344 tid=509[2/32]{mti:509}: nam=[main] sdc=-999 av=894 id=0 h=304
>sus=2 gle=0
>pid=344 tid=509[3/32]{mti:509}: nam=[main] sdc=-999 av=482 id=0 h=308
>sus=2 gle=0
>
>The ::SuspendThread() and ::ResumeThread() calls in cygthread.cc assign
>their result directly to SD_count. I set it explicitly to silly negative
>values at these points:
>
>-999 in cygthread::runner() after their ::CreateThread()
>-99 in cygthread::stub just prior to init_exceptions()
>-2 cygthread::exit_thread ::SetEvent()
>-9999 cygthread::stub ::ExitThread()
>
>Nothing else touches 'SD_count'.
>The above output is generated by a
>function 'SD_DumpLiving()' inserted immediately prior to ::ExitProcess()
>within _pinfo::exit().
>
>Our hung process is definitely suspended.

Not necessarily.  I see nothing in the above which would disprove the
theory that this is the problem which I raised in cygwin-developers.

Of course, I am not 100% sure that I understand the above data.
However, I'm not going to devote too much time to studying it since
there is an obvious problem in the cygwin DLL now and you haven't,
AFAICT, addressed that.  This makes your analysis suspect until you
generate a version of the DLL without the already known problem.

However, if you want to provide an actual analysis of how the thread
could be in a suspended state with someone waiting for it, that would be
welcome.  So far, everything you've provided points to the fact that the
process in question is stuck in a deadlock state during ExitProcess,
which sort of confirms my theory.  That's why you can't debug it.

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/