Mail Archives: cygwin-developers/2001/06/04/05:15:36
The attempt to port bind9 has uncovered some serious problems with
pthread. Probably one of the major differences between bind 8 and
bind 9 is the use of threads, not only in named but also in the resolver;
in fact, the lightweight resolver is a hard link to named, which is
another problem with the build process, BTW. It also seems that
dig and nslookup are threaded. They share some of the same source code,
and they appear to segfault in exactly the same place:
the call to pthread_cond_timedwait.
Using gdb, it appears that the segfault occurs on the last line of the
following; note the FIXME comment.
// FIXME: pshared mutexs have the cond count in the shared memory area.
// We need to accomodate that.
int
__pthread_cond_timedwait (pthread_cond_t * cond, pthread_mutex_t * mutex,
                          const struct timespec *abstime)
{
  // and yes cond_access here is still open to a race. (we increment, context swap,
  // broadcast occurs - we miss the broadcast. the functions aren't split properly.
  int rv;
  if (!abstime)
    return EINVAL;
  pthread_mutex **themutex = NULL;
  if (*mutex == PTHREAD_MUTEX_INITIALIZER)
    __pthread_mutex_init (mutex, NULL);
  if ((((pshared_mutex *)(mutex))->flags & SYS_BASE == SYS_BASE))
    // a pshared mutex
    themutex = __pthread_mutex_getpshared (mutex);
  if (!verifyable_object_isvalid (*themutex, PTHREAD_MUTEX_MAGIC))
    return EINVAL;
Even out of context, this is obviously buggy: themutex is initialized
to NULL and is only reassigned a "valid" value if the mutex is a
pshared mutex. If it is not, themutex stays NULL, and the *themutex
dereference in the verifyable_object_isvalid call on the last line faults.
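The pattern that avoids this can be shown in isolation. The following is a minimal sketch, not Cygwin's code: mutex_obj, mutex_handle, lookup_pshared, select_mutex, mutex_valid, and the SYS_BASE and MUTEX_MAGIC values are hypothetical stand-ins, and the pshared flags are passed in directly instead of being read through a pshared_mutex cast. It defaults the pointer to the caller's own mutex and redirects only for pshared mutexes, so the validity check never dereferences NULL. It also parenthesizes the flag test: in C, `==` binds tighter than `&`, so `flags & SYS_BASE == SYS_BASE` as written in the excerpt evaluates as `flags & 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Cygwin's internals: the names echo the
   excerpt, but the values and layouts are invented for illustration. */
#define SYS_BASE    0x1000u
#define MUTEX_MAGIC 0x4d555458u

typedef struct mutex_obj { unsigned magic; } mutex_obj;
typedef mutex_obj *mutex_handle;   /* plays the role of pthread_mutex_t */

/* Hypothetical shared-area entry handed out for pshared mutexes. */
static mutex_obj shared_table_entry = { MUTEX_MAGIC };

/* Hypothetical lookup; the excerpt calls __pthread_mutex_getpshared. */
static mutex_handle *lookup_pshared (mutex_handle *mutex)
{
  static mutex_handle slot = &shared_table_entry;
  (void) mutex;
  return &slot;
}

/* Parenthesized on purpose: "flags & SYS_BASE == SYS_BASE" would parse
   as "flags & (SYS_BASE == SYS_BASE)", i.e. "flags & 1". */
static int is_pshared (unsigned flags)
{
  return (flags & SYS_BASE) == SYS_BASE;
}

/* Default to the caller's own mutex and redirect only for pshared
   mutexes, so the returned pointer is never left NULL. */
static mutex_handle *select_mutex (mutex_handle *mutex, unsigned flags)
{
  mutex_handle *themutex = mutex;
  if (is_pshared (flags))
    themutex = lookup_pshared (mutex);
  assert (themutex != NULL);
  return themutex;
}

/* Rough analogue of verifyable_object_isvalid: check before use. */
static int mutex_valid (mutex_handle *m)
{
  return m != NULL && *m != NULL && (*m)->magic == MUTEX_MAGIC;
}
```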
And in fact, when the declaration
pthread_mutex **themutex = NULL;
is replaced with
pthread_mutex_t *themutex = mutex;
to mimic the initialization that takes place in pthread_cond_wait, the
segmentation fault goes away, and dig runs partway successfully, but
not completely:
$ ./dig 141monkeys.org
; <<>> DiG 9.1.2 <<>> 141monkeys.org
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38854
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;141monkeys.org. IN A
;; AUTHORITY SECTION:
141monkeys.org. 60 IN SOA bubba.141monkeys.org. root.141monkeys.org. 36 300 120 21600 60
;; Query time: 270 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Mon Jun 4 04:17:02 2001
;; MSG SIZE rcvd: 79
0 [unknown (0xFFFCEE81)] dig 199739 pthread_cond::BroadCast: Broadcast called with invalid mutex
Grepping through thread.cc shows that this message is raised in
pthread_cond::BroadCast (), which appears to be called from
int __pthread_cond_broadcast (pthread_cond_t * cond)
{
if (!verifyable_object_isvalid (*cond, PTHREAD_COND_MAGIC))
return EINVAL;
(*cond)->BroadCast ();
return 0;
}
Perhaps the mutex is not being fetched properly from the shared area,
as is done in __pthread_cond_timedwait? Unfortunately, the exact context
could not be determined: running under gdb froze the program, and
eventually the machine had to be rebooted.
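To separate the cond/mutex problem from bind9 itself, a small stand-alone program that makes the same two calls might help. This is a sketch using only standard POSIX calls (pthread_cond_timedwait, pthread_cond_broadcast, gettimeofday); wait_for_broadcast and signaller are hypothetical names. It uses a statically initialized, non-pshared mutex, so it goes through the PTHREAD_MUTEX_INITIALIZER branch shown in the excerpt, and it waits on a predicate in a loop with an absolute deadline, since POSIX permits spurious wakeups.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Statically initialized, non-pshared mutex and condition variable. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int done = 0;   /* predicate, protected by lock */

/* Hypothetical helper thread: sets the predicate and broadcasts. */
static void *signaller (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&lock);
  done = 1;
  pthread_cond_broadcast (&ready);
  pthread_mutex_unlock (&lock);
  return NULL;
}

/* Returns 0 if woken by the broadcast, ETIMEDOUT if the deadline
   passed, or another nonzero error number. */
static int wait_for_broadcast (int timeout_sec)
{
  struct timeval now;
  struct timespec deadline;
  pthread_t tid;
  int rv = 0;
  int err;

  /* pthread_cond_timedwait takes an absolute deadline, not a delay. */
  gettimeofday (&now, NULL);
  deadline.tv_sec = now.tv_sec + timeout_sec;
  deadline.tv_nsec = now.tv_usec * 1000L;

  pthread_mutex_lock (&lock);
  /* The thread is created while we hold the lock, so it cannot set the
     predicate or broadcast until the wait below releases the lock. */
  err = pthread_create (&tid, NULL, signaller, NULL);
  if (err != 0)
    {
      pthread_mutex_unlock (&lock);
      return err;
    }
  /* Loop on the predicate: a wakeup alone does not mean done is set. */
  while (!done && rv == 0)
    rv = pthread_cond_timedwait (&ready, &lock, &deadline);
  rv = done ? 0 : rv;
  pthread_mutex_unlock (&lock);
  pthread_join (tid, NULL);
  return rv;
}
```

On a working implementation, calling wait_for_broadcast (5) from a small main() should return 0 well before the deadline. A crash or an "invalid mutex" message there would suggest the problem lies in the pthread layer rather than in bind9.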
====================================================
====================================================
OK, so much for the background; now the question. Judging from
the comments and the to-do list, the pthread implementation is not
complete. Could someone describe, or point me to documentation that
describes, the architecture of Cygwin and how threads fit into it?
Which parts are done, or generally considered solid by now? And what
IS the shared area, BTW?
-Jeff
P.S. Oh yes, newbie here if the last question didn't give me away.