Mail Archives: cygwin-developers/1998/06/09/10:54:30
Hi,
I've stumbled across a winsock bug. After a TCP connection is
made and the connection is closed, the connection goes into TIME_WAIT
state as per the TCP spec. BUT Winsock lets you bind to ports that
are still held by connections in TIME_WAIT, even without the
SO_REUSEADDR option being set!!!
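
A minimal sketch of how the bind behaviour described above could be
probed, written in Python for brevity. This is not the test case
mentioned later in this message; the function name
can_rebind_time_wait_port and the 0.1 second settle delay are
assumptions made for illustration. It closes the client side first so
that the client's port should be the one left in TIME_WAIT (or about
to enter it), then tries a plain bind to that port without
SO_REUSEADDR.

```python
import socket
import time


def can_rebind_time_wait_port(host="127.0.0.1"):
    """Return True if bind() succeeds on a port that was just used by a
    closed TCP connection, without setting SO_REUSEADDR."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))          # port 0: let the stack pick a port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _ = listener.accept()
    port = client.getsockname()[1]    # implicitly bound client port

    client.close()                    # active close happens on the client side
    server.recv(1)                    # returns b"" once the client's FIN arrives
    server.close()
    listener.close()
    time.sleep(0.1)                   # assumed settle time for the close handshake

    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))      # note: no SO_REUSEADDR set here
        return True
    except OSError:
        return False
    finally:
        probe.close()


if __name__ == "__main__":
    print("bind over recently closed port allowed:",
          can_rebind_time_wait_port())
```

The result depends on the TCP stack it runs on, so no expected output
is given. A True result would correspond to the behaviour reported
here for Winsock.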
This might sound innocuous, but it isn't. It turns out that this
is the reason why cygwin's socketpair() implementation is broken. What
happens is the socketpair function tries to make a connection between
two TCP sockets over loopback. In the process, Winsock implicitly binds
to two ports. When the connection is actually made though, the tuple
(local host, local port, rem host, rem port) must be unique (TCP spec).
Winsock properly enforces this (if it didn't, it would be very broken).
So what happens when you open many socket pairs and close them? You
end up with lots of (127.0.0.1, xxx, 127.0.0.1, yyy) connections in
TIME_WAIT state. Winsock lets you freely bind over xxx and yyy. If
xxx and yyy are chosen such that they collide with an existing connection,
the socket pair function fails when trying to connect (EADDRINUSE).
To make matters worse, ports are bound sequentially, and the full
range of ports is not used. End result -- socketpair fails when you
try to use it many times. I have previously posted a test case that
demonstrates this.
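
As a rough illustration of the kind of stress loop that would expose
this (not the test case referred to above), the sketch below opens and
closes many loopback TCP pairs and counts the failures. The names
naive_tcp_socketpair and count_socketpair_failures are hypothetical.
The pair is built with an implicit bind, which is assumed here to
resemble what the broken socketpair() does.

```python
import socket


def naive_tcp_socketpair(host="127.0.0.1"):
    """Connect two TCP sockets over loopback, leaving port selection
    entirely to the stack (implicit bind)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind((host, 0))                 # stack chooses the port
        listener.listen(1)
        client.connect(listener.getsockname())   # client port bound implicitly
        server, _ = listener.accept()
        return client, server
    except OSError:
        client.close()
        raise
    finally:
        listener.close()


def count_socketpair_failures(make_pair, rounds=1000):
    """Open and close `rounds` pairs; return how many attempts failed."""
    failures = 0
    for _ in range(rounds):
        try:
            a, b = make_pair()
        except OSError:
            failures += 1
            continue
        a.close()
        b.close()
    return failures


if __name__ == "__main__":
    print("failures:", count_socketpair_failures(naive_tcp_socketpair))
```

On a stack that refuses to hand out ports still tied up in TIME_WAIT
the count should stay at or near zero. A growing count would point to
the collision described above.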
Ok... so how do you fix it? Get a random number generator that does
not repeat its sequence from run to run (rand() ^ GetTickCount() should
probably do), and bind explicitly to random ports over the full port
range (probably 1024-65535 is desired), retrying whenever a bind cannot
be performed. Then make the socket pair connection. If the connection
fails, retry. NOTE: If you use a repeating random number generator like
plain rand() you will get burned! (I did this first.. oops).
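
The steps above can be sketched roughly as follows, again in Python.
This is not the userland implementation mentioned in this message.
The names bind_random_port and tcp_socketpair, the retry counts, and
the use of random.SystemRandom in place of rand() ^ GetTickCount() are
all assumptions made for illustration.

```python
import random
import socket

PORT_MIN, PORT_MAX = 1024, 65535     # port range suggested in the text
_rng = random.SystemRandom()         # OS entropy: no fixed sequence per run


def bind_random_port(sock, host="127.0.0.1", attempts=64):
    """Bind sock to a random port in the range, retrying on failure."""
    for _ in range(attempts):
        port = _rng.randint(PORT_MIN, PORT_MAX)
        try:
            sock.bind((host, port))
            return port
        except OSError:
            continue                 # port unavailable, pick another one
    raise OSError("no free port found")


def tcp_socketpair(retries=16):
    """Build a connected pair of loopback TCP sockets, retrying the
    whole procedure if the connection cannot be made."""
    last_error = None
    for _ in range(retries):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            bind_random_port(listener)
            listener.listen(1)
            bind_random_port(client)             # explicit bind on this side too
            client.connect(listener.getsockname())
            server, peer = listener.accept()
            if peer != client.getsockname():     # some other process connected
                server.close()
                raise OSError("unexpected peer")
            return client, server
        except OSError as exc:
            last_error = exc
            client.close()
        finally:
            listener.close()
    raise OSError("tcp_socketpair failed") from last_error


if __name__ == "__main__":
    a, b = tcp_socketpair()
    a.sendall(b"ping")
    print(b.recv(4))
    a.close()
    b.close()
```

Both ends are bound explicitly so that neither port comes from the
stack's sequential allocation. The peer check is an extra precaution
that the text does not call for.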
We are currently using our own userland implementation of socketpair
using this technique and it appears to be working well. I will write
a cygwinb19.dll implementation when I get time, if no one else beats me
to it.
Test case to verify TIME_WAIT/bind and socketpair bugs, and example
working socketpair code available on request.
Tim N.