Mail Archives: cygwin-developers/2002/06/30/08:25:48
Dear list,
I've been suffering from what looks like a memory corruption (?) in
cygserver and it's been doing my head in. I think I've found the
solution, which I thought I'd share with everyone since it doesn't
seem to be in the archives. AFAICT the problem is that the C++ new and
delete operators are not thread-safe in the current cygwin g++
release. I knew there was some issue with this release of gcc/g++ not
being compiled with --enable-threads but since threads are being used
by the DLL itself, I thought that threads and C++ were basically okay.
What I've been seeing is a segmentation fault in either __builtin_new
or __builtin_delete when thrashing the cygserver with continuous shm
requests. A common stress test I'm running here uses a hundred
processes, each with three threads, all running continuous shmget(2)
calls on the same shared memory segment. I can re-create the problem
with fewer clients, but it's easier to generate with that sort of
load. If the segv doesn't occur, everything works fine; there are no
other symptoms.
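For anyone who wants to generate a similar load, here is a rough sketch of a client-side driver of the kind described above. It is not the actual test program: the names (WorkerArgs, shm_worker, run_client, run_stress), the 4096-byte segment size and the 0600 permissions are all made up for illustration, and it assumes a POSIX system with pthreads and System V shared memory.

```cpp
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical per-thread state; nothing here is taken from the real test.
struct WorkerArgs {
    key_t key;        // shared memory key every thread asks for
    int iterations;   // number of shmget(2) calls this thread makes
    int failures;     // calls that returned -1
};

// Thread body: call shmget(2) over and over on the same key.
static void *shm_worker(void *p) {
    WorkerArgs *a = static_cast<WorkerArgs *>(p);
    for (int i = 0; i < a->iterations; ++i)
        if (shmget(a->key, 4096, IPC_CREAT | 0600) == -1)
            ++a->failures;
    return 0;
}

// One client process: run nthreads threads, return the number of failures.
int run_client(key_t key, int nthreads, int iterations) {
    std::vector<WorkerArgs> args(nthreads);
    std::vector<pthread_t> tids(nthreads);
    int started = 0, failures = 0;
    for (int t = 0; t < nthreads; ++t) {
        args[t].key = key;
        args[t].iterations = iterations;
        args[t].failures = 0;
        if (pthread_create(&tids[t], 0, shm_worker, &args[t]) != 0) {
            ++failures;   // count a thread that could not be started
            break;
        }
        ++started;
    }
    for (int t = 0; t < started; ++t) {
        pthread_join(tids[t], 0);
        failures += args[t].failures;
    }
    return failures;
}

// Fork nprocs clients; return how many failed (fork error, crash, or
// non-zero exit). Removes the segment afterwards so reruns start clean.
int run_stress(key_t key, int nprocs, int nthreads, int iterations) {
    int forked = 0, bad = 0;
    for (int p = 0; p < nprocs; ++p) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(run_client(key, nthreads, iterations) == 0 ? 0 : 1);
        if (pid < 0) { ++bad; continue; }
        ++forked;
    }
    for (int p = 0; p < forked; ++p) {
        int status = 0;
        if (wait(&status) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++bad;
    }
    int id = shmget(key, 0, 0);
    if (id != -1)
        shmctl(id, IPC_RMID, 0);
    return bad;
}
```

Calling run_stress(key, 100, 3, n) with some large n would approximate the hundred-process, three-thread load; smaller numbers are enough to check that the driver itself works.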
I've disassembled the __builtin_delete operator, and the faulting
address appears to be in the exception-handling code that runs just
before the function returns. Since gcc is not compiled
with threading, there is one global exception handler context object.
Presumably, if one thread does a new or delete while another thread is
doing likewise, the scene is set for bogosity when one of the threads
tries to unwind its exception handling state on return.
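If that hypothesis is right, the fault should not need cygserver at all: several threads doing nothing but new and delete ought to be enough to provoke it eventually. The following is a minimal sketch of such a reproducer, not something taken from the cygserver tree. The names churn and hammer_new_delete are hypothetical, and it assumes pthreads are available. On a runtime where new and delete are thread-safe it should simply run to completion.

```cpp
#include <pthread.h>
#include <vector>

// Thread body: allocate and free a small object in a tight loop.
static void *churn(void *arg) {
    long iterations = *static_cast<long *>(arg);
    for (long i = 0; i < iterations; ++i) {
        // volatile keeps the compiler from optimising the pair away
        int *volatile p = new int(static_cast<int>(i));  // operator new
        delete p;                                        // operator delete
    }
    return 0;
}

// Start nthreads threads that all churn concurrently.
// Returns the number of threads that were started and joined.
int hammer_new_delete(int nthreads, long iterations) {
    std::vector<pthread_t> tids(nthreads);
    int started = 0;
    for (int t = 0; t < nthreads; ++t) {
        if (pthread_create(&tids[t], 0, churn, &iterations) != 0)
            break;
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(tids[t], 0);
    return started;
}
```

A crash inside operator new or operator delete while this runs would support the diagnosis; a clean run proves little, since the race may simply not have been hit.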
I've just re-built cygserver without any use of new and delete (I've
replaced them with free/malloc, placement new, and explicit calls to
destructors). The application is also compiled -fno-exceptions, so the
only exception-handling code linked in is in the built-in new and
delete operators, and I'm no longer calling them. Now, I can't get it
to fall over. That's not exactly proof of anything for a
multi-threaded application but I was managing to kick it over
regularly before. (I've also been testing all of this in a mingw
version of cygserver but I was having exactly the same problem with
the cygwin version.)
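To make the workaround concrete, here is a minimal sketch of the pattern: malloc plus placement new in place of operator new, and an explicit destructor call plus free in place of operator delete. The Request type and the two helper functions are hypothetical stand-ins, not code from cygserver. Since the code is meant to build with -fno-exceptions, allocation failure is reported with a null pointer.

```cpp
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new

// Hypothetical stand-in for an object the server allocates per request.
struct Request {
    int id;
    explicit Request(int i) : id(i) {}
    ~Request() {}
};

// Allocate with malloc and construct in place; operator new is never called.
Request *make_request(int id) {
    void *raw = std::malloc(sizeof(Request));
    if (raw == 0)
        return 0;                  // out of memory: null result, no exception
    return new (raw) Request(id);  // placement new only runs the constructor
}

// Run the destructor by hand, then free; operator delete is never called.
void destroy_request(Request *r) {
    if (r == 0)
        return;
    r->~Request();
    std::free(r);
}
```

Every object created with make_request has to be released with destroy_request; mixing these with ordinary new and delete on the same object would be an error.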
If my diagnosis is correct, I'm surprised it's not been seen before
(like, in the cygwin DLL itself?). Then again, I'm having to stress
the program pretty heavily to trip it up.
Now, this might all be academic, what with the looming (?) arrival of
gcc 3.1 for cygwin, but I thought I'd share the results of several
days' work . . . Any comments? Better ideas?
// Conrad