Mail Archives: cygwin/2005/06/20/02:52:18
On Sunday, June 12, 2005 at 5:37 PM, Mark Pizzolato wrote:
> On Friday, June 10, 2005 at 3:44 PM, Mark Pizzolato wrote:
>> > On Thursday, June 09, 2005 at 6:12 PM, Mark Pizzolato wrote:
>> >> On Thursday, June 09, 2005 at 3:35 PM, Christopher Faylor wrote:
>> >> > On Wed, Jun 08, 2005 at 05:43:59PM -0700, Mark Pizzolato wrote:
>> >> > >There is a serious problem for multi-threaded programs doing simple
>> >> > >I/O operations in cygwin (open, dup, fdopen, fclose, and close).
>> >> > >
>> >> > >The attached 81 line test program clearly demonstrates the issue
>> >> > >(by hanging and no longer consuming CPU or performing any I/O
>> >> > >operations).
>> >> >
>> >> > Thanks for the relatively small test case. That was enough to track
>> >> > the problem down. I'm generating a new snapshot with a fix for this
>> >> > problem.
>> >>
>> >> The snapshot looks good!
>> >>
>> >> This fixes the stability problems with clamav's clamd that I've been
>> >> chasing for a long time.
>> >
>> > Some more follow up here...I'm running with the 20050609 snapshot dll.
>> >
>> > clamav's clamd now runs better than it has ever for me on cygwin.....
>> >
>> > until "it doesn't",
>> >
>> > once it starts to run poorly it won't run cleanly again until I reboot
>> > the system (I haven't actually tried after merely exiting all
>> > processes).
>
> Well, I spoke too soon here. There may be some interaction with many
> recently closed tcp sessions sitting in TIME_WAIT. I'm not sure, but
> after some time, I can restart and experience apparently good behavior and
> then things get "poor" as described.
>
> If I run with the 20050607 snapshot, the new "poor" behavior doesn't
> happen, while the test program I provided earlier in this thread hangs as
> described. So, the fix to the original problem and the new "poor" behavior
> are clearly related to changes between the 20050607 and the 20050609
> snapshots.
>
>> > To be more specific about the "poor" behavior:
>> >
>> >
>> > - pthread_mutex_unlock fails leaving errno with a value of 90. This is
>> > in a place where there is only one path through about a dozen lines of
>> > code and the mutex is definitely locked. There may have been a call to
>> > pthread_create, and a definite call to pthread_cond_signal.
>> > - once the above error happens, calls (by the same thread) to accept()
>> > fail using a file descriptor which we've been successfully using all
>> > along and only close when the program exits.
>> >
>> > so some change introduced recently (since 1.5.17-1), and possibly in
>> > 20050609 fixes the dup() issue but now mutex operations are failing in
>> > strange ways.
>> >
>> > Sorry not to have a simple isolated test case for this. The good news
>> > is that once it breaks it won't run correctly again until a reboot.
>
> I'm working on a test program to recreate this behavior.
Well... The problem wasn't in cygwin.
As it happens, in clamav's clamd there were several pthread_mutex_t objects
which weren't initialized to reasonable values (i.e. left to be zero instead
of PTHREAD_MUTEX_INITIALIZER). Calls to pthread_mutex_lock and
pthread_mutex_unlock on the uninitialized objects, depending on timing and
sequence, apparently confused some aspect of mutex processing, causing
other calls to pthread_mutex_lock and pthread_mutex_unlock to fail in
strange ways.
Appropriate patches have been submitted to the clamav team.
- Mark Pizzolato
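
For readers hitting the same symptom, here is a minimal sketch of the two
usual ways to initialize a pthread_mutex_t, plus checking the return value
of each call. It is not taken from the clamav patches; the names
counter_lock, job_queue, job_queue_init, job_queue_add, job_queue_destroy
and bump_counter are hypothetical and exist only for illustration. Note
that the pthread_mutex_* functions report failure through their return
value (an error number), so the sketch inspects that rather than errno.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Statically allocated mutex: use the initializer macro instead of
 * relying on zero-filled storage. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Hypothetical structure whose embedded mutex is set up at run time. */
struct job_queue {
    pthread_mutex_t lock;
    int pending;
};

/* Initialize the queue; returns 0 or a pthread error number. */
int job_queue_init(struct job_queue *q)
{
    q->pending = 0;
    return pthread_mutex_init(&q->lock, NULL); /* NULL = default attributes */
}

/* Add one job under the lock, reporting any lock/unlock failure. */
int job_queue_add(struct job_queue *q)
{
    int rc = pthread_mutex_lock(&q->lock);
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(rc));
        return rc;
    }
    q->pending++;
    rc = pthread_mutex_unlock(&q->lock);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_unlock: %s\n", strerror(rc));
    return rc;
}

/* Release the mutex's resources once no thread uses the queue. */
int job_queue_destroy(struct job_queue *q)
{
    return pthread_mutex_destroy(&q->lock);
}

/* Increment the global counter under the statically initialized mutex. */
int bump_counter(void)
{
    int rc = pthread_mutex_lock(&counter_lock);
    if (rc != 0)
        return rc;
    counter++;
    return pthread_mutex_unlock(&counter_lock);
}
```

Checking every return value this way would likely have pointed at the
uninitialized objects sooner, since the first failing call is reported
where it happens instead of surfacing later in unrelated code.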