Delivered-To: mailing list cygwin AT cygwin DOT com
Subject: RE: 1.1.3 and upwards: apparent bug with pthread_cond_wait() and/or signal()
Date: Thu, 2 May 2002 18:43:43 +1000
From: "Robert Collins"
To: "Michael Beach"

> -----Original Message-----
> From: Michael Beach [mailto:michaelb AT ieee DOT org]
> Sent: Thursday, May 02, 2002 2:16 AM
>
>
> > thread - lock
> > thread - state=run
> > thread - signal
> > main - lock
> > main - test state (passes)
>
> calls pthread_cond_wait(). Doh. I need some real serious sleep. "Linux
> made me do it". :].

> However if you're not expecting high bandwidth, if you could point me
> at a document or whatnot that explains how to set up a development
> environment I'd be willing to have a go.

There are very few developers contributing to the pthreads code. Right now I'm swamped, and a new contributor has offered some high quality patches.

http://www.cygwin.com/cvs.html explains how to grab the current source. You could also just click on the 'src' checkbox beside the cygwin package in setup.exe, to get it to download a snapshot.

> Sure. The quick'n'dirty pthreads calls were only so I didn't have to
> post half of our source tree in order to illustrate the problem with
> an example that actually compiles. If you're serious about wanting to
> run it, give me a shout and I'll give you a version with error
> handling.

I can duplicate the hang.
What appears to be happening is that signals sent from a thread while another thread is entering (or exiting?) the wait routine get dropped. The main signal() routine finds 0 waiting threads (see thread.cc:495) when it is called, so it does nothing.

A - main thread
B - new thread
L - lock
W - wait
S - signal
J - join
U - unlock

Fails
A       B
L
W
        L
        S (1)
W
        S <-- is dropped
        U
U
J

Ok, in detail, S (1) does this:
  locks the cond variable
  signals A
  waits for A to wake, to prevent dropped signals
  unlocks the cond struct

then the W
  locks the cond variable
  increases the waiting count
  waits, releasing the mutex and unlocking the cond variable

A on waking does this:
  decrements the waiting count (now 0)
  tells the S (1) routine that it's woken up
  locks the mutex that it's waiting on
  (*) clears the cond structure's cached mutex entry if it's the last
      waking thread
  locks the cond structure
  decrements the mutex's wait reference
  unlocks the cond structure.

(*) was buggy. What is happening is that the W, when it released the mutex, did so AFTER A had tested for being the last thread, so A's test was flawed.

I've a fix ready, I just need to get some time to test it, which I will do tonight. If you want to test it, it's:

Index: thread.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/thread.cc,v
retrieving revision 1.65
diff -u -p -r1.65 thread.cc
--- thread.cc   28 Feb 2002 13:50:41 -0000      1.65
+++ thread.cc   2 May 2002 08:42:21 -0000
@@ -1791,20 +1791,22 @@ __pthread_cond_dowait (pthread_cond_t *c
   InterlockedIncrement (&((*themutex)->condwaits));
   if (pthread_mutex_unlock (&(*cond)->cond_access))
     system_printf ("Failed to unlock condition variable access mutex, this %p", *cond);
+  /* At this point calls to Signal will progress even if we aren't yet waiting
+   * However, the loop there should allow us to get scheduled and call wait,
+   * and have them call PulseEvent again if we don't respond.
+   */
   rv = (*cond)->TimedWait (waitlength);
   /* this may allow a race on the mutex acquisition and waits..
    * But doing this within the cond access mutex creates a different race */
-  bool last = false;
-  if (InterlockedDecrement (&((*cond)->waiting)) == 0)
-    last = true;
+  InterlockedDecrement (&((*cond)->waiting));
   /* Tell Signal that we have been released */
   InterlockedDecrement (&((*cond)->ExitingWait));
   (*themutex)->Lock ();
-  if (last == true)
-    (*cond)->mutex = NULL;
   if (pthread_mutex_lock (&(*cond)->cond_access))
     system_printf ("Failed to lock condition variable access mutex, this %p", *cond);
+  if ((*cond)->waiting == 0)
+    (*cond)->mutex = NULL;
   InterlockedDecrement (&((*themutex)->condwaits));
   if (pthread_mutex_unlock (&(*cond)->cond_access))
     system_printf ("Failed to unlock condition variable access mutex, this %p", *cond);

Rob
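
For anyone who wants to exercise the L / W / S / U / J sequence described above without the original test case, a minimal sketch follows. It is not Michael's program. The Shared struct, worker() and run_handshake() are made-up names for illustration, error checking on the lock calls is left out for brevity, and only standard pthreads calls are used. On a working implementation it should return the number of rounds. If a signal were dropped while the main thread was going back into pthread_cond_wait(), the main thread could stay blocked, though whether that window is actually hit depends on scheduling.

```cpp
#include <pthread.h>

// Hypothetical shared state for this sketch; none of these names come from
// the original test case or from thread.cc.
struct Shared {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int state;   // bumped by the worker, once per round
  int rounds;  // how many signals the worker will send
};

// Thread B: L, change state, S, U, repeated once per round.
static void *worker(void *arg) {
  Shared *s = static_cast<Shared *>(arg);
  for (int i = 1; i <= s->rounds; ++i) {
    pthread_mutex_lock(&s->mutex);
    s->state = i;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
  }
  return nullptr;
}

// Thread A: L, then W until each state change has been seen, then U and J.
// Returns the number of state changes observed, or -1 if setup failed.
int run_handshake(int rounds) {
  Shared s;
  s.state = 0;
  s.rounds = rounds;
  if (pthread_mutex_init(&s.mutex, nullptr) != 0)
    return -1;
  if (pthread_cond_init(&s.cond, nullptr) != 0) {
    pthread_mutex_destroy(&s.mutex);
    return -1;
  }

  pthread_t tid;
  if (pthread_create(&tid, nullptr, worker, &s) != 0) {
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return -1;
  }

  int seen = 0;
  pthread_mutex_lock(&s.mutex);
  for (int i = 1; i <= rounds; ++i) {
    // Re-test the predicate in a loop: the wait is skipped if the worker
    // got there first, and spurious wakeups are harmless.
    while (s.state < i)
      pthread_cond_wait(&s.cond, &s.mutex);
    ++seen;
  }
  pthread_mutex_unlock(&s.mutex);

  pthread_join(tid, nullptr);
  pthread_cond_destroy(&s.cond);
  pthread_mutex_destroy(&s.mutex);
  return seen;
}
```

Calling run_handshake (2) from main() corresponds to the two signals in the failing sequence. Running it in a loop with a larger round count gives the scheduler more chances to land a signal while the main thread is between waits.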