Mail Archives: cygwin/2006/03/16/15:49:58

X-Spam-Check-By: sourceware.org
Date: Thu, 16 Mar 2006 15:49:46 -0500
From: Christopher Faylor <cgf-no-personal-reply-please AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Shells hang during script execution
Message-ID: <20060316204946.GC14672@trixie.casa.cgf.cx>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <B6C33E7A8278A0408B707C9B491720D4045298 AT STEELPO DOT steeleye DOT com>
Mime-Version: 1.0
In-Reply-To: <B6C33E7A8278A0408B707C9B491720D4045298@STEELPO.steeleye.com>
User-Agent: Mutt/1.5.11
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Thu, Mar 16, 2006 at 03:14:03PM -0500, Ernie Coskrey wrote:
>>On Wed, Mar 01, 2006 at 01:01:46PM -0500, Ernie Coskrey wrote:
>>>>>Here's a description of a second hang condition we were encountering, along 
>>>>>with a patch for it.
>>>>>
>>>>>
>>>>>The application (pdksh in this case) does a read on a pipe, which eventually 
>>>>>calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread 
>>>>>with "read_pipe()" as the function.  Then it calls th->detach(read_state).
>>>>>
>>>>>When the hang occurs, the new thread gets terminated early, before
>>>>>cygthread::stub() can call "callfunc()".  You see the error message
>>>>>"erroneous thread activation".  I'm not sure what's causing the thread
>>>>>to fail activation, but the result is, the read_state semaphore never
>>>>>gets signalled.
>>>>
>>>>Sorry but this is another band-aid around a problem.  The real problem
>>>>is that the code shouldn't get into the state that you are describing.
>>>>That's why cygwin prints an error message - it is a serious problem.
>>>>Making the code deal gracefully with a problem like this isn't going
>>>>to solve the underlying issue.
>>>>
>>>>If you can figure out what's causing the erroneous thread activation
>>>>then that will be the real culprit.
>>>>
>>>>cgf
>>>>
>>>
>>>OK, I believe I've tracked this down.
>>>
>>>The problem occurs when we get into a read_pipe cygthread constructor
>>>(cygthread::cygthread()) with a NULL h and an ev that is signalled.
>>>When this condition exists, a hang can occur as follows:
>>>
>>>1) Creator thread calls detach().  This waits for read_state to be released twice.
>>>2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice
>>>3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately because ev was set when the thread was created.
>>>4) Creator thread initiates another read_pipe cygthread to read more pipe data.
>>>
>>>At this point, there's a race: if the Creator thread gets past the
>>>initialization part of the constructor, which sets __name(name), BEFORE
>>>the original read_pipe thread gets to the part of cygthread::stub()
>>>that sets info->__name = NULL, then you'll see the hang.  The new
>>>read_pipe thread will give the "erroneous thread activation" message, and the
>>>parent will be stuck waiting for data that will never arrive.
>>>
>>>The only path that leaves an unused thread structure in a state where
>>>h==NULL and ev is signalled is cygthread::release().  So the fix is
>>>simple:
>>>
>>>$ cat cygthread.cc.udiff
>>>--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
>>>+++ cygthread.cc        2006-03-01 12:59:23.255023000 -0500
>>>@@ -268,7 +268,12 @@
>>> cygthread::release (bool nuke_h)
>>> {
>>>   if (nuke_h)
>>>+    {
>>>     h = NULL;
>>>+
>>>+    if (ev)
>>>+      ResetEvent (ev);
>>>+    }
>>> #ifdef DEBUGGING
>>>   __oldname = __name;
>>>   debug_printf ("released thread '%s'", __oldname);
>>
>>Nice analysis.  Thank you.  I think it's easier to fix this by just
>>making the ev event auto-reset; then this condition would be caught in
>>terminate_thread, as it was meant to be.
>
>Here's a patch for the problem that works with the latest snapshot.

I already changed this on 2006-03-01.  Making it auto-reset was actually
not the correct thing to do, so I just reset the event in terminate_thread.

cgf
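
For readers following the thread, the stale-event condition discussed above
can be modelled in portable C++.  This is only a rough sketch, not Cygwin
code: ManualResetEvent, ThreadSlot and next_wait_sees_stale_signal are
hypothetical names, and the event is emulated with a mutex and condition
variable rather than the Win32 CreateEvent/ResetEvent calls that
cygthread.cc uses.  It shows that a manual-reset event left signalled by a
previous user of a slot satisfies the next wait immediately, and that
resetting it on release avoids that.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Portable stand-in for a Win32 manual-reset event (hypothetical class,
// not Cygwin code): once set, it stays signalled until reset() is called.
class ManualResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signalled_ = true;
        }
        cv_.notify_all();
    }

    void reset() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = false;
    }

    // Returns true if the event was, or became, signalled within the timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return signalled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signalled_ = false;
};

// Hypothetical model of a pooled thread slot that is released and reused.
struct ThreadSlot {
    ManualResetEvent ev;
    bool has_handle = false;

    void release(bool reset_event) {
        has_handle = false;          // plays the role of "h = NULL"
        if (reset_event)
            ev.reset();              // clear any signal left by the last user
    }
};

// Simulates: a previous user leaves ev signalled, the slot is released,
// and the next user waits on ev.  Returns true when that wait is satisfied
// by the stale signal instead of by new activity.
bool next_wait_sees_stale_signal(bool reset_on_release) {
    ThreadSlot slot;
    slot.has_handle = true;
    slot.ev.set();                   // left signalled by the earlier use
    slot.release(reset_on_release);
    return slot.ev.wait_for(std::chrono::milliseconds(50));
}
```

In this model, next_wait_sees_stale_signal(false) reports that the wait was
satisfied by the leftover signal, while next_wait_sees_stale_signal(true)
times out because the event was cleared on release.  Where the reset belongs
in the real code (release versus terminate_thread) is the question settled
in the message above; the sketch takes no position on it.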

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
