Mail Archives: cygwin-developers/2002/07/25/16:39:43
[Sorry: Long email]
About a week ago I discovered a race condition in the UNIX domain
socket emulation in cygwin. I've got a patch for this that works
(and fixes several other small problems) bar one *minor* issue and
since I'm out of ideas, I hope someone else out there has got some
advice for me (even if it's only "don't do that!").
Here goes. I've put together a new UNIX domain handshake
protocol, but somewhere it's got to pause long enough for the
server to pick up the client's half of the protocol, since with a
socket a client can get a connection, write some data and close the
socket before the server has accepted the connection (the
connection's just sitting on the pending queue).
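
For anyone who wants to see that behaviour in isolation, here is a
minimal sketch against native UNIX domain sockets (e.g. on Linux), not
the cygwin emulation. The function name early_close_demo and the socket
path passed to it are made up for illustration. The client connects,
writes and closes while the connection is still on the pending queue;
the server accepts only afterwards and should still read the data.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Returns what the server reads from a client that connected, wrote and
// closed before accept() was called; returns "" on any error.
// 'path' is a hypothetical scratch location for the socket file.
std::string early_close_demo(const char *path)
{
  sockaddr_un addr{};
  assert(std::strlen(path) < sizeof(addr.sun_path));
  addr.sun_family = AF_UNIX;
  std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
  unlink(path);                       // drop any stale socket file

  std::string got;
  int srv = socket(AF_UNIX, SOCK_STREAM, 0);
  int cli = socket(AF_UNIX, SOCK_STREAM, 0);
  int conn = -1;
  const sockaddr *sa = reinterpret_cast<const sockaddr *>(&addr);

  if (srv >= 0 && cli >= 0
      && bind(srv, sa, sizeof(addr)) == 0
      && listen(srv, 5) == 0
      // connect() returns once the connection is on the pending queue.
      && connect(cli, sa, sizeof(addr)) == 0
      && write(cli, "hello", 5) == 5)
    {
      close(cli);                     // the client is finished already
      cli = -1;
      conn = accept(srv, nullptr, nullptr);   // the server accepts only now
      char buf[16];
      ssize_t n = conn >= 0 ? read(conn, buf, sizeof(buf)) : -1;
      if (n > 0)
        got.assign(buf, static_cast<size_t>(n));
    }
  if (conn >= 0) close(conn);
  if (cli >= 0) close(cli);
  if (srv >= 0) close(srv);
  unlink(path);
  return got;
}
```

Calling early_close_demo("/tmp/demo.sock") on such a system should hand
back the client's bytes even though the client was gone before accept().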
So, I've got a piece of code in the fhandler_socket::close method
that only closes the client's secret event once the client has
received the server's okay signal *or* a (Unix) signal arrives
*or* the server closes its end of the connection (i.e. the server
exits w/o ever accepting the connection).
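
Roughly, the wait in close has the shape sketched below. This is a
portable stand-in using a mutex and condition variable, not the actual
patch: the class CloseWaiter and the WakeReason names are invented here,
and the real code would presumably wait on Win32 event handles (for
example with WaitForMultipleObjects) instead.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical names throughout; this is not the cygwin implementation.
enum class WakeReason { ServerOk, SignalArrived, ServerClosed };

class CloseWaiter
{
public:
  // Called by whichever party observes one of the three conditions.
  // Only the first report is kept.
  void notify(WakeReason r)
  {
    {
      std::lock_guard<std::mutex> lk(m_);
      if (!fired_)
        {
          fired_ = true;
          reason_ = r;
        }
    }
    cv_.notify_all();
  }

  // Blocks until any one condition has been reported. Note that there
  // is no timeout: if nothing ever calls notify(), this never returns.
  WakeReason wait()
  {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return fired_; });
    assert(fired_);
    return reason_;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  bool fired_ = false;
  WakeReason reason_ = WakeReason::ServerOk;
};
```

The lack of a timeout in wait() is deliberate in this sketch: it is the
property that matters for the rest of this mail.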
This is all fine and dandy except for two situations: if the
client receives an unhandled signal that should cause it to die
*or* if the client exits w/o closing the socket. At this point,
if the server is blocked itself and not accepting the connection,
the client will not exit and can't be ctrl-c'd either. The
problems in the two situations are caused by the same issue:
*) If the client receives an unhandled signal, e.g. SIGINT, the
do_exit function is called, which then calls close_all_files. But
it does this w/o setting the 'signal_arrived' event, so none of
the events are set that the fhandler_socket::close method is
waiting on (at least, not in the particular circumstances
mentioned here).
*) If the client exits w/o closing the socket, again it gets stuck
in fhandler_socket::close since no events are going to be raised.
Alternatives (AFAICT):
*) Just put a timeout in the fhandler_socket::close routine (as
was effectively the case in the previous protocol).
*) In do_exit, set a global flag that the close routine can pick
up. There is already such a flag: exit_already in "exceptions.cc"
but this is static and so inaccessible. Or is there an existing
mechanism that I'm missing?
*) A partial solution (and one that might be worth doing
regardless of any other solution) would be to set the
'signal_arrived' event before calling the do_exit function when
dying from a signal's arrival. I've tried this and it seems to
cause no problems, but is only a partial solution to the problem.
(Unless it's always set on exit . . . yuck?)
*) It would be okay perhaps to let the client block in this way,
*if* it could still be killed by a signal whilst blocked. *But*
the do_exit code in "dcrt0.cc" ignores a slew of signals, so if a
process does get blocked while exiting, it can't then be (easily)
killed. [You can still 'kill -9' it at this point.] Has someone
*) Or am I worrying too much? Don't worry about it much, bung in
a timeout, it'll hardly ever happen, relax?
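
To make the first two alternatives concrete, here is a rough portable
sketch that combines them: a bounded wait that also gives up early if a
process-wide "exiting" flag is set. Everything named here is
hypothetical (process_exiting, BoundedCloseWait, CloseResult); in
particular process_exiting only stands in for a non-static equivalent of
exit_already, and the polling slice is a simplification that a real
implementation could replace with an event.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical global that do_exit would set before closing files.
std::atomic<bool> process_exiting{false};

enum class CloseResult { Handshaken, Exiting, TimedOut };

class BoundedCloseWait
{
public:
  // Called when the server's okay signal has been received.
  void handshake_done()
  {
    {
      std::lock_guard<std::mutex> lk(m_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Waits at most 'limit' for the handshake, re-checking the exit flag
  // at least once every 'slice'.
  CloseResult wait(std::chrono::milliseconds limit,
                   std::chrono::milliseconds slice
                     = std::chrono::milliseconds(10))
  {
    using clock = std::chrono::steady_clock;
    assert(slice.count() > 0);
    const auto deadline = clock::now() + limit;
    std::unique_lock<std::mutex> lk(m_);
    for (;;)
      {
        if (done_)
          return CloseResult::Handshaken;
        if (process_exiting.load())
          return CloseResult::Exiting;
        if (clock::now() >= deadline)
          return CloseResult::TimedOut;
        // Sleep until the next poll or the deadline, whichever is first.
        const auto next = clock::now() + slice;
        cv_.wait_until(lk, std::min(deadline, next));
      }
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};
```

With something like this, a client that exits without closing the
socket would be held up for at most 'limit', and one that is exiting
because of a signal for at most one 'slice'.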
// Conrad