Mail Archives: cygwin-developers/2001/11/16/04:37:10

Mailing-List: contact cygwin-developers-help AT sourceware DOT cygnus DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-developers-subscribe AT sources DOT redhat DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin-developers/>
List-Post: <mailto:cygwin-developers AT sources DOT redhat DOT com>
List-Help: <mailto:cygwin-developers-help AT sources DOT redhat DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-developers-owner AT sources DOT redhat DOT com
Delivered-To: mailing list cygwin-developers AT sources DOT redhat DOT com
Date: Fri, 16 Nov 2001 10:36:57 +0100
From: Corinna Vinschen <vinschen AT redhat DOT com>
To: cygwin-developers AT cygwin DOT com
Subject: Re: TCP connections can occasionally fail because of a winsock bug
Message-ID: <20011116103657.H27452@cygbert.vinschen.de>
Reply-To: cygwin-developers AT cygwin DOT com
Mail-Followup-To: cygwin-developers AT cygwin DOT com
References: <20011115212156 DOT 5563 DOT qmail AT lizard DOT curl DOT com> <200111160258 DOT fAG2wVm27159 AT barbelith DOT montana DOT com>
Mime-Version: 1.0
User-Agent: Mutt/1.2.5i
In-Reply-To: <200111160258.fAG2wVm27159@barbelith.montana.com>; from bowman@montana.com on Thu, Nov 15, 2001 at 08:00:18PM -0700

On Thu, Nov 15, 2001 at 08:00:18PM -0700, robert bowman wrote:
> On Thursday 15 November 2001 14:21, you wrote:
> > I've dug deeply enough into this to determine that I believe the
> > problem is caused by a bug in winsock.  I can get the problem to
> > manifest itself completely independently from Cygwin.  See the full
> > description in the attached program, which one of my coworkers with an
> > MSDN subscription is going to forward to Microsoft to see what they
> > have to say about it.
> 
> For what it's worth, we recently encountered this problem in the ONC RPC 
> library. The original Sun code, and any revision I've been able to find, 
> binds a local port even on the TCP protocol. The same thing happens, with the 
> bind not failing, and the failure occurring on the connect. 
> 
> We depend on RPC heavily, and would see delays on startup when the initial 
> clnt_create would fail repeatedly. The RPC code tries a pool of local 
> ports, and will increment and retry if the bind fails, but the bind 
> never does fail.
> 
> This is not a cygwin issue; we are using the MKS/DataFocus NutCracker 
> toolkit. DataFocus provided the ported ONC RPC code but does not support it.  
> We have been tinkering with it in-house. The bind can be eliminated for some 
> improvement, in this case. 
> 
> There are other issues we are dealing with. I've forwarded a couple of the 
> emails to another programmer at work who is also working on NT/2000 socket 
> issues.
> 
> Interestingly enough, on Linux, the bind also fails unless the process has 
> root privileges. However, the code only iterates on EADDRINUSE and the return 
> value is not checked, so the connect succeeds. 
> 
> I also wrote a native test case with the WSA calls and got the same results. 
> I did note that the OS expires the port eventually, but it takes 5 to 20 
> minutes. 
> 
> I believe the root of the problem is that both the remote host address and 
> local port are used to determine if the connection is unique. bind would fail 
> if anything other than INADDR_ANY is used, so at the time of the bind it isn't 
> known if the combination is unique. Only when the host address is known, in 
> connect, can the combination fail.
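
A rough sketch of what that implies for a retry loop: since the conflict only shows up in connect, the loop has to treat a connect failure like a bind failure and move on to the next local port. This is Python rather than the RPC library's C. The function name connect_from_pool and its parameters are made up for illustration, and it assumes the platform reports the conflict as EADDRINUSE.

```python
import errno
import socket


def connect_from_pool(remote, candidate_ports, timeout=5.0):
    """Try each local port in turn and return a connected socket.

    A connect() that fails with EADDRINUSE also advances to the next
    local port, because the (local addr, local port, remote addr,
    remote port) combination is only fully known at connect time.
    """
    last_error = None
    for port in candidate_ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.bind(("", port))   # wildcard address, explicit local port
            sock.connect(remote)    # the full combination is fixed only here
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise               # unrelated failure: do not mask it
            last_error = exc
    raise last_error or OSError(errno.EADDRINUSE, "no candidate ports given")
```

Any error other than EADDRINUSE is re-raised rather than retried, so a refused or timed-out connection is not mistaken for a port conflict.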
> 
> Our problem was exacerbated by the fact that several apps are typically 
> started at the same time on one station, and they are all trying to make RPC 
> connections to the server machine. The ONC RPC algorithm uses the pid to 
> calculate which port to try first; with several clients starting and making 
> several connections, there would be groups of used ports; if a connection 
> timed out, and the next attempt moved into a cluster of ports being used by 
> another app, the clnt_create would fail many times, before it finally 
> iterated into fresh territory.
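
To make the clustering concrete, here is a small sketch of a pid-seeded port walk. The pool bounds (600 to 1023) are an assumption modeled on what bindresvport-style code commonly uses, not something taken from the NutCracker port, and the helper names are hypothetical.

```python
import os

START_PORT = 600          # assumed lower bound of the reserved-port pool
END_PORT = 1023           # assumed upper bound (last reserved port)
NUM_PORTS = END_PORT - START_PORT + 1


def candidate_ports(pid=None):
    """Return every port in the pool, starting at a pid-derived offset."""
    if pid is None:
        pid = os.getpid()
    first = pid % NUM_PORTS
    return [START_PORT + (first + i) % NUM_PORTS for i in range(NUM_PORTS)]


def shared_prefix(pid_a, pid_b, attempts):
    """Ports that two processes would both try within `attempts` tries."""
    a = candidate_ports(pid_a)[:attempts]
    b = candidate_ports(pid_b)[:attempts]
    return sorted(set(a) & set(b))
```

With these assumed bounds, two processes whose pids differ by 3 share 7 of their first 10 candidate ports, which is the kind of overlap that would make one client walk straight into ports another client is already using.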

Thanks for that interesting description.  There's that SO_REUSEADDR
option to setsockopt().  I wonder if that could be a help.  It's
considered somewhat dangerous, though.
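
For reference, setting the option looks roughly like the sketch below (Python for brevity; the helper name make_reusable_socket is made up). Whether it actually avoids the connect failure discussed here is untested. Part of the danger is that on Windows SO_REUSEADDR lets a second socket bind an address and port that another socket already holds, which is looser than the usual BSD meaning.

```python
import socket


def make_reusable_socket(local_port=0):
    """Create a TCP socket with SO_REUSEADDR set, bound to local_port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to influence the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", local_port))   # port 0 lets the OS pick a free port
    return sock
```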

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                                mailto:cygwin AT cygwin DOT com
Red Hat, Inc.
