Mail Archives: cygwin-developers/2001/11/15/22:04:28
On Thursday 15 November 2001 14:21, you wrote:
> I've dug deeply enough into this to determine that I believe the
> problem is caused by a bug in winsock. I can get the problem to
> manifest itself completely independently from Cygwin. See the full
> description in the attached program, which one of my coworkers with an
> MSDN subscription is going to forward to Microsoft to see what they
> have to say about it.
For what it's worth, we recently encountered this problem in the ONC RPC
library. The original Sun code, and every revision I've been able to find,
binds a local port even for TCP. We see the same behavior: the bind does not
fail, and the failure occurs on the connect.
We depend on RPC heavily, and would see delays on startup when the initial
clnt_create would fail repeatedly. The RPC code tries a pool of local ports,
and will increment and retry if the bind fails, but the bind does not fail.
This is not a cygwin issue; we are using the MKS/DataFocus NutCracker
toolkit. DataFocus provided the ported ONC RPC code but does not support it.
We have been tinkering with it in-house. In this case, eliminating the bind
gives some improvement.
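
A minimal sketch of what "eliminating the bind" looks like, assuming POSIX
sockets rather than the WSA calls. The helper name connect_unbound is
hypothetical and is not part of the ONC RPC code; it only shows that connect
on an unbound TCP socket lets the OS choose the local port.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a TCP socket without calling bind first.
 * The OS assigns an unused local port during connect.
 * Returns the connected descriptor, or -1 on failure. */
static int connect_unbound(const struct sockaddr_in *server)
{
    assert(server != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* No bind here: the local address and port are left to the OS. */
    if (connect(fd, (const struct sockaddr *)server, sizeof *server) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The trade-off is that the client no longer connects from a port in the RPC
pool, which only matters if the server checks the client's source port.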
There are other issues we are dealing with. I've forwarded a couple of the
emails to another programmer at work who is also working on NT/2000 socket
issues.
Interestingly enough, on Linux, the bind also fails unless the process has
root privileges. However, the code only iterates on EADDRINUSE and the return
value is not checked, so the connect succeeds.
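
A rough sketch of that retry loop, to show why a permission failure ends it
after one attempt. This is not the actual RPC source: try_bind_pool and the
two stub functions are hypothetical, and the stubs stand in for a real bind
call so the control flow can be followed without a socket.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for bind(): returns 0 on success, -1 with errno set on failure. */
typedef int (*bind_fn)(int port);

/* Walk the ports first..last, retrying only while the failure is
 * EADDRINUSE. Any other error ends the loop immediately. */
static int try_bind_pool(bind_fn try_bind, int first, int last, int *attempts)
{
    assert(first <= last);

    int res = -1;
    *attempts = 0;
    errno = EADDRINUSE;
    for (int port = first;
         res < 0 && errno == EADDRINUSE && port <= last; port++) {
        (*attempts)++;
        res = try_bind(port);
    }
    return res;
}

/* Stub: behaves like an unprivileged process asking for a reserved port. */
static int deny_all(int port)
{
    (void)port;
    errno = EACCES;
    return -1;
}

/* Stub: ports below 603 are taken, everything else is free. */
static int busy_below_603(int port)
{
    if (port < 603) {
        errno = EADDRINUSE;
        return -1;
    }
    return 0;
}
```

With deny_all the loop gives up after a single attempt and returns -1. If the
caller ignores that result, the socket is simply left unbound and the later
connect picks its own port. With busy_below_603, starting at 600, the loop
succeeds on the fourth attempt.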
I also wrote a native test case with the WSA calls and got the same results.
I did note that the OS expires the port eventually, but it takes 5 to 20
minutes.
I believe the root of the problem is that both the remote host address and
local port are used to determine if the connection is unique. bind would fail
if anything other than ANY_ADDR is used, so at the time of the bind it isn't
known if the combination is unique. Only in connect, once the host address is
known, can the combination fail.
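
To illustrate the reasoning, here is a small model of that uniqueness check.
It is only a sketch of my theory, not how winsock is implemented. TCP
identifies a connection by local address, local port, remote address and
remote port; the struct name, the helper, and the addresses used in the
example are all made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative identity of a TCP connection (host byte order). */
struct conn_id {
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t remote_addr;   /* 0 while the peer is not yet known */
    uint16_t remote_port;   /* 0 while the peer is not yet known */
};

/* Two connections collide only when all four fields match. */
static bool same_connection(const struct conn_id *a, const struct conn_id *b)
{
    assert(a != NULL && b != NULL);
    return a->local_addr == b->local_addr &&
           a->local_port == b->local_port &&
           a->remote_addr == b->remote_addr &&
           a->remote_port == b->remote_port;
}
```

Suppose an old connection from local port 700 to the server is still
lingering. A new socket bound to port 700 has its remote fields at 0, so
same_connection reports no clash at bind time. Once connect fills in the same
server address and port, the two compare equal, and that is the point where
the failure can show up.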
Our problem was exacerbated by the fact that several apps are typically
started at the same time on one station, and they are all trying to make RPC
connections to the server machine. The ONC RPC algorithm uses the pid to
calculate which port to try first. With several clients starting and each
making several connections, there would be groups of used ports. If a
connection timed out, and the next attempt moved into a cluster of ports being
used by another app, the clnt_create would fail many times before it finally
iterated into fresh territory.
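
A small sketch of the clustering effect. The port range 600 to 1023 and the
pid modulo formula are my recollection of the classic bindresvport code and
should be treated as assumptions; the function names are invented for this
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pool bounds, recalled from the classic bindresvport code. */
#define START_PORT 600
#define END_PORT   1023
#define NUM_PORTS  (END_PORT - START_PORT + 1)

/* First port a process tries, derived from its pid. */
static int first_port(long pid)
{
    return (int)(pid % NUM_PORTS) + START_PORT;
}

/* Next port in the pool, wrapping around at the end. */
static int next_port(int port)
{
    assert(port >= START_PORT && port <= END_PORT);
    return port == END_PORT ? START_PORT : port + 1;
}

/* Number of ports tried before reaching a free one, or -1 if none is free.
 * in_use[] is indexed by port - START_PORT. */
static int attempts_until_free(long pid, const bool in_use[NUM_PORTS])
{
    int port = first_port(pid);
    for (int tries = 1; tries <= NUM_PORTS; tries++) {
        if (!in_use[port - START_PORT])
            return tries;
        port = next_port(port);
    }
    return -1;
}
```

Under these assumptions, pid 1000 starts at port 752. If another app already
holds ports 752 through 761, the client needs 11 attempts to reach a free
port, and in our case each failed attempt is a failed clnt_create rather than
a quick bind error.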