From: "Mark Pizzolato"
Subject: Re: Multi Threaded programs deadlock doing simple I/O operations
Date: Sun, 19 Jun 2005 23:40:19 -0700
Message-ID: <004901c57562$ee7f1d80$1f3ca8c0@AlohaSunset.com>

On Sunday, June 12, 2005 at 5:37 PM, Mark Pizzolato wrote:
> On Friday, June 10, 2005 at 3:44 PM, Mark Pizzolato wrote:
>> > On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
>> >> On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
>> >> > On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>> >> > >There is a serious problem for multi threaded programs doing simple
>> >> > >I/O operations in cygwin (open, dup, fdopen, fclose, and close).
>> >> > >
>> >> > >The attached 81 line test program clearly demonstrates the issue
>> >> > >(by hanging and no longer consuming CPU or performing any I/O
>> >> > >operations).
>> >> >
>> >> > Thanks for the relatively small test case. That was enough to track
>> >> > the problem down. I'm generating a new snapshot with a fix for this
>> >> > problem.
>> >>
>> >> The snapshot looks good!
>> >>
>> >> This fixes the stability problems with clamav's clamd that I've been
>> >> chasing for a long time.
>> >
>> > Some more follow up here...I'm running with the 20050609 snapshot dll.
>> >
>> > clamav's clamd now runs better than it has ever for me on cygwin.....
>> >
>> > until "it doesn't",
>> >
>> > once it starts to run poorly it won't run cleanly again until I reboot
>> > the system
>> > (I haven't actually tried after merely exiting all processes ..)
>
> Well, I spoke too soon here. There may be some interaction with many
> recently closed TCP sessions sitting in TIME_WAIT. I'm not sure, but
> after some time, I can restart and experience apparently good behavior and
> then things get "poor" as described.
>
> If I run with the 20050607 snapshot, the new "poor" behavior doesn't
> happen, while the test program I provided earlier in this thread hangs as
> described. So, the fix to the original problem and the new "poor" behavior
> are clearly related to changes between the 20050607 and the 20050609
> snapshots.
>
>> > To be more specific about the "poor" behavior:
>> >
>> > - pthread_mutex_unlock fails leaving errno with a value of 90. This is
>> > in a place where there is only one path through about a dozen lines of
>> > code and the mutex is definitely locked. There may have been a call to
>> > pthread_create, and a definite call to pthread_cond_signal.
>> > - once the above error happens, calls (by the same thread) to accept()
>> > fail using a file descriptor which we've been successfully using all
>> > along and only close when the program exits.
>> >
>> > so some change introduced recently (since 1.5.17-1), and possibly in
>> > 20050609 fixes the dup() issue but now mutex operations are failing in
>> > strange ways.
>> >
>> > Sorry not to have a simple isolated test case for this. The good news
>> > is that once it breaks it won't run correctly again until a reboot.
>
> I'm working on a test program to recreate this behavior.

Well... The problem wasn't in cygwin.

As it happens, in clamav's clamd there were several pthread_mutex_t objects which weren't initialized to reasonable values (i.e. left to be zero instead of PTHREAD_MUTEX_INITIALIZER). Calls to pthread_mutex_lock and pthread_mutex_unlock on the uninitialized objects, depending on timing and sequence, apparently confused some aspect of mutex processing, causing other calls to pthread_mutex_lock and pthread_mutex_unlock to fail in strange ways.
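For illustration only, here is a minimal sketch of the two usual ways to give a pthread_mutex_t a valid initial state. This is not the clamav patch; the names (global_lock, struct work_queue, work_queue_init and so on) are hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example names; none of this is taken from clamav. */

/* Case 1: a statically allocated mutex gets the static initializer
 * instead of being left as all-zero storage. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static int global_counter = 0;

/* Case 2: a mutex embedded in a structure is initialized at run time. */
struct work_queue {
    pthread_mutex_t lock;
    int pending;
};

int work_queue_init(struct work_queue *q)
{
    assert(q != NULL);
    q->pending = 0;
    /* NULL attributes request a default mutex. */
    return pthread_mutex_init(&q->lock, NULL);
}

int work_queue_add(struct work_queue *q)
{
    /* pthread_mutex_* functions return an error number directly. */
    int rc = pthread_mutex_lock(&q->lock);
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
        return rc;
    }
    q->pending++;
    rc = pthread_mutex_unlock(&q->lock);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_unlock: %s\n", strerror(rc));
    return rc;
}

int work_queue_destroy(struct work_queue *q)
{
    return pthread_mutex_destroy(&q->lock);
}

/* Increments the counter under the statically initialized mutex.
 * Returns the new value, or -1 if locking or unlocking failed. */
int bump_global_counter(void)
{
    int value;
    if (pthread_mutex_lock(&global_lock) != 0)
        return -1;
    value = ++global_counter;
    if (pthread_mutex_unlock(&global_lock) != 0)
        return -1;
    return value;
}
```

Note that the pthread_mutex_* functions report failure through their return value rather than through errno, so checking the return code at each call, as the sketch does, is the more reliable way to notice a misbehaving mutex.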
Appropriate patches have been submitted to the clamav team.

- Mark Pizzolato