Subject: Re: Shells hang during script execution
Date: Thu, 16 Mar 2006 15:14:03 -0500
From: "Ernie Coskrey"

>On Wed, Mar 01, 2006 at 01:01:46PM -0500, Ernie Coskrey wrote:
>>>>Here's a description of a second hang condition we were encountering, along
>>>>with a patch for it.
>>>>
>>>>
>>>>The application (pdksh in this case) does a read on a pipe, which eventually
>>>>calls pipe.cc fhandler_pipe::read in Thread 1. This creates a new cygthread
>>>>with "read_pipe()" as the function. Then it calls th->detach(read_state).
>>>>
>>>>When the hang occurs, the new thread gets terminated early, before
>>>>cygthread::stub() can call "callfunc()". You see the error message
>>>>"erroneous thread activation". I'm not sure what's causing the thread
>>>>to fail activation, but the result is, the read_state semaphore never
>>>>gets signalled.
>>>
>>>Sorry but this is another band-aid around a problem. The real problem
>>>is that the code shouldn't get into the state that you are describing.
>>>That's why cygwin prints an error message - it is a serious problem.
>>>Making the code deal gracefully with a problem like this isn't going
>>>to solve the underlying issue.
>>>
>>>If you can figure out what's causing the erroneous thread activation
>>>then that will be the real culprit.
>>>
>>>cgf
>>>
>>
>>OK, I believe I've tracked this down.
>>
>>The problem occurs when we get into a read_pipe cygthread constructor
>>(cygthread::cygthread()) with a NULL h and an ev that is signalled.
>>When this condition exists, a hang can occur as follows:
>>
>>1) Creator thread calls detach(). This waits for pipe_state to be released twice
>>2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice
>>3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately because ev was set when the thread was created.
>>4) Creator thread initiates another read_pipe cygthread to read more pipe data.
>>
>>At this point, there's a race: if the Creator thread gets past the
>>initialization part of the constructor, which sets __name(name), BEFORE
>>the original read_pipe thread gets to the part of cygthread::stub()
>>that sets info->__name = NULL, then you'll see the hang. The new
>>pipe_read will give the "erroneous thread activation" message, and the
>>parent will be stuck waiting for data that will never arrive.
>>
>>The only path that leaves an unused thread structure in a state where
>>h==NULL and ev is signalled is cygthread::release(). So the fix is
>>simple:
>>
>>$ cat cygthread.cc.udiff
>>--- cygthread.cc.ORIG 2006-02-22 10:57:42.123931300 -0500
>>+++ cygthread.cc 2006-03-01 12:59:23.255023000 -0500
>>@@ -268,7 +268,12 @@
>> cygthread::release (bool nuke_h)
>> {
>>   if (nuke_h)
>>+    {
>>     h = NULL;
>>+
>>+    if (ev)
>>+      ResetEvent (ev);
>>+    }
>> #ifdef DEBUGGING
>>   __oldname = __name;
>>   debug_printf ("released thread '%s'", __oldname);
>
>Nice analysis. Thank you. I think it's easier to fix this by just
>making the ev event auto-reset; then this condition would be caught in
>terminate thread, as it was meant to be.
>
>cgf

Here's a patch for the problem that works with the latest snapshot.

-----
Ernie Coskrey
SteelEye Technology, Inc.

--- cygthread.cc.ORIG 2006-03-01 17:40:44.000000000 -0500
+++ cygthread.cc 2006-03-16 14:54:04.148312500 -0500
@@ -78,7 +78,7 @@
       debug_printf ("thread '%s', id %p, stack_ptr %p", info->name (), info->id, info->stack_ptr);
       if (!info->ev)
         {
-          info->ev = CreateEvent (&sec_none_nih, TRUE, FALSE, NULL);
+          info->ev = CreateEvent (&sec_none_nih, FALSE, FALSE, NULL);
           info->thread_sync = CreateEvent (&sec_none_nih, FALSE, FALSE, NULL);
         }
     }
@@ -197,8 +197,6 @@
   HANDLE htobe;
   if (h)
     {
-      if (ev)
-        ResetEvent (ev);
       while (!thread_sync)
         low_priority_sleep (0);
       SetEvent (thread_sync);
@@ -223,7 +221,6 @@
       while (!ev)
         low_priority_sleep (0);
       WaitForSingleObject (ev, INFINITE);
-      ResetEvent (ev);
     }
   h = htobe;
 }
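
For anyone following along, here is a rough sketch of why flipping the second CreateEvent argument (bManualReset) from TRUE to FALSE makes the explicit ResetEvent calls unnecessary. This is a portable model only, not Cygwin code and not the Win32 API: ModelEvent and waits_satisfied_by_one_signal are hypothetical names made up for illustration, and try_wait() merely imitates a zero-timeout wait. It assumes the usual event semantics, where a manual-reset event stays signalled until it is reset and an auto-reset event is cleared by the wait that it satisfies.

```cpp
#include <mutex>

// Hypothetical stand-in for a Win32 event object (illustration only).
class ModelEvent {
public:
    // manual_reset mirrors the bManualReset argument of CreateEvent.
    explicit ModelEvent(bool manual_reset) : manual_reset_(manual_reset) {}

    // Imitates SetEvent: put the event in the signalled state.
    void set() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = true;
    }

    // Imitates ResetEvent: clear the signalled state.
    void reset() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = false;
    }

    // Imitates a zero-timeout wait: returns true if the event was signalled.
    // An auto-reset event is cleared by the wait that it satisfies.
    bool try_wait() {
        std::lock_guard<std::mutex> lock(m_);
        if (!signalled_)
            return false;
        if (!manual_reset_)
            signalled_ = false;
        return true;
    }

private:
    std::mutex m_;
    bool manual_reset_;
    bool signalled_ = false;
};

// Signals the event once, then counts how many of `attempts` waits succeed.
int waits_satisfied_by_one_signal(bool manual_reset, int attempts) {
    ModelEvent ev(manual_reset);
    ev.set();
    int satisfied = 0;
    for (int i = 0; i < attempts; ++i)
        if (ev.try_wait())
            ++satisfied;
    return satisfied;
}
```

With three attempts, the manual-reset model satisfies all three waits from a single signal, which is the stale "ev is signalled" state described in step 3 of the analysis, while the auto-reset model satisfies only the first.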