Mailing-List: contact cygwin-developers-help AT cygwin DOT com; run by ezmlm
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: cygwin-developers-owner AT cygwin DOT com
Delivered-To: mailing list cygwin-developers AT cygwin DOT com
Message-ID: <005801c23730$02304170$6132bc3e@BABEL>
From: "Conrad Scott"
To:
Cc: "Pierre A. Humblet"
References: <010901c23724$96e5d430$6132bc3e AT BABEL> <3D4581E4 DOT BB580995 AT ieee DOT org>
Subject: Re: TCP problems
Date: Mon, 29 Jul 2002 19:44:44 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000

"Pierre A. Humblet" wrote:
> Conrad Scott wrote:
> > *) On win98 (and possibly other non-NT systems) sockets don't seem
> > to be released properly so with a long-running server you get
> > WSAENOBUFS errors (sooner or later) and no clients can attach
> > until the server is restarted. This is what I'm trying to
> > understand right now (w/ no success as yet) --- an "equivalent"
> > server using winsock2 directly doesn't suffer from this problem.
> >
> > *) There are a couple of reported bugs I've come across in the
> > MSDN archives that need to be worked around but aren't currently
> > (AFAICT). For example, see "BUG: Closesocket() on a Duplicated
> > Socket Fails to Clean Up"
> > (http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q198663&)
> > and "INFO: WSA_FLAG_OVERLAPPED Is Needed for Non-Blocking Sockets"
> > (http://support.microsoft.com/default.aspx?scid=kb;[LN];Q179942).
>
> Those two are the same, AFAIK.
> The problem occurs when the primary socket is closed before the
> duplicated socket (in another process). Does your "equivalent"
> server do that?

Sorry: I should have been clearer there. No: my test cygwin server
doesn't duplicate any of its sockets (AFAICT etc. but I'm pretty sure).
It's a really simple server: blocking accept, read/write on the new
file descriptor, then shutdown/close it and back to a blocking accept.
And it still hits the WSAENOBUFS wall eventually (altho' it can be
delayed by registry patches to increase various TCP parameters).

> The solution I implemented in some test code (and which runs fine,
> but uses a non-unix "close on fork" fcntl) is the second one,
> i.e. "The other possibility...".
> I have scratched my head about the "dummy tcp socket" and tried
> various things, without success. Have you experimented with that?

I haven't experimented with it yet and it does look weird. The code
would need to detect a close of a duplicated socket, so it would need
a new flag in the fhandler_socket structure to do it right (I suppose
I could just add a dummy socket/closesocket call regardless to see if
it has any effect). In general I've done less work with dup(2) and
fork(2) than with the other problems so far.

> > One idea I've had is to extend the semaphore work I put into the
> > UNIX domain socket patch to allow the DLL to detect the last close
> > of a socket if it's been duplicated by whatever means. This would
> > allow the DLL to close the socket "properly" (e.g. non-blocking +
> > shutdown(2) + linger as appropriate).
>
> I am not sure this does it (perhaps I don't understand what you mean).
> As I recall, calling shutdown makes the socket not appear as
> CLOSE_WAIT in netstat -a, but you still get the WSAENOBUFS after
> a while. Again, the key is to delay closing the primary socket.

The idea would be to detect the last close of a given socket system
wide, so it wouldn't matter whether the parent or child or whatever
was the last to close the socket. Then, since we know it's the last
close and so there can be no other operations outstanding on the
socket (or none that we need to worry about: the code is closing the
socket after all), we can close it with shutdown and linger delays etc.
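
As a rough sketch of that final step only (this is not the actual
fhandler_socket::close code, and the system-wide last-close detection
is not shown), the "proper" close could look something like the
following. It uses plain POSIX calls rather than winsock; the name
graceful_close and its linger_secs parameter are made up for
illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the Cygwin DLL): close a socket
 * "properly", assuming the caller already knows this is the last
 * close of the socket anywhere on the system.
 */
int graceful_close(int fd, int linger_secs)
{
    struct linger lg;

    /* Ask close() to linger so queued data gets a chance to go out.
       Best effort: the result is deliberately ignored. */
    lg.l_onoff = 1;
    lg.l_linger = linger_secs;
    (void) setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);

    /* Signal end-of-data to the peer before releasing the descriptor. */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

Called on one end of a connected stream socket after writing, the peer
should still be able to read the queued data and then see end-of-file.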
Detecting the last close solves the problem of a client closing a
socket w/o shutdown and then exiting, thus leading to data loss, which
is what the current linger mod. in fhandler_socket::close addresses.

It doesn't solve the "must close first socket last of all" problem: I
wanted to see how far we could get without that being done anywhere,
especially as I can't see any fix for that short of drastic surgery
(all sockets opened in the cygserver, which keeps them until a client
detects last close . . .? yuck, but possible except for systems where
sockets are used to communicate with cygserver: oops).

Thanks for the comments even if none of them suggest easy ways around
these problems :-)

// Conrad