Date: Thu, 16 Mar 2006 15:49:46 -0500
From: Christopher Faylor
To: cygwin AT cygwin DOT com
Subject: Re: Shells hang during script execution
Message-ID: <20060316204946.GC14672@trixie.casa.cgf.cx>

On Thu, Mar 16, 2006 at 03:14:03PM -0500, Ernie Coskrey wrote:
>>On Wed, Mar 01, 2006 at 01:01:46PM -0500, Ernie Coskrey wrote:
>>>>>Here's a description of a second hang condition we were encountering, along
>>>>>with a patch for it.
>>>>>
>>>>>
>>>>>The application (pdksh in this case) does a read on a pipe, which eventually
>>>>>calls pipe.cc fhandler_pipe::read in Thread 1. This creates a new cygthread
>>>>>with "read_pipe()" as the function. Then it calls th->detach(read_state).
>>>>>
>>>>>When the hang occurs, the new thread gets terminated early, before
>>>>>cygthread::stub() can call "callfunc()". You see the error message
>>>>>"erroneous thread activation". I'm not sure what's causing the thread
>>>>>to fail activation, but the result is, the read_state semaphore never
>>>>>gets signalled.
>>>>
>>>>Sorry but this is another band-aid around a problem. The real problem
>>>>is that the code shouldn't get into the state that you are describing.
>>>>That's why cygwin prints an error message - it is a serious problem.
>>>>Making the code deal gracefully with a problem like this isn't going
>>>>to solve the underlying issue.
>>>>
>>>>If you can figure out what's causing the erroneous thread activation
>>>>then that will be the real culprit.
>>>>
>>>>cgf
>>>>
>>>
>>>OK, I believe I've tracked this down.
>>>
>>>The problem occurs when we get into a read_pipe cygthread constructor
>>>(cygthread::cygthread()) with a NULL h and an ev that is signalled.
>>>When this condition exists, a hang can occur as follows:
>>>
>>>1) Creator thread calls detach(). This waits for pipe_state to be released twice
>>>2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice
>>>3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately because ev was set when the thread was created.
>>>4) Creator thread initiates another read_pipe cygthread to read more pipe data.
>>>
>>>At this point, there's a race: if the Creator thread gets past the
>>>initialization part of the constructor, which sets __name(name), BEFORE
>>>the original read_pipe thread gets to the part of cygthread::stub()
>>>that sets info->__name = NULL, then you'll see the hang. The new
>>>pipe_read will give the "erroneous thread activation" message, and the
>>>parent will be stuck waiting for data that will never arrive.
>>>
>>>The only path that leaves an unused thread structure in a state where
>>>h==NULL and ev is signalled is cygthread::release(). So the fix is
>>>simple:
>>>
>>>$ cat cygthread.cc.udiff
>>>--- cygthread.cc.ORIG 2006-02-22 10:57:42.123931300 -0500
>>>+++ cygthread.cc 2006-03-01 12:59:23.255023000 -0500
>>>@@ -268,7 +268,12 @@
>>> cygthread::release (bool nuke_h)
>>> {
>>>   if (nuke_h)
>>>+    {
>>>     h = NULL;
>>>+
>>>+    if (ev)
>>>+      ResetEvent (ev);
>>>+    }
>>> #ifdef DEBUGGING
>>>   __oldname = __name;
>>>   debug_printf ("released thread '%s'", __oldname);
>>
>>Nice analysis. Thank you. I think it's easier to fix this by just
>>making the ev event auto-reset then this condition would be caught in
>>terminate thread, as it was meant to be.
>
>Here's a patch for the problem that works with the latest snapshot.

I already changed this on 2006-03-01.
Making it auto-reset was actually not the correct thing to do so I just
reset the event in terminate_thread.

cgf