Date: Sat, 04 May 2002 14:06:32 -0400
To: cygwin AT cygwin DOT com
From: "Pierre A. Humblet"
Subject: Catalog of TCP socket problems

During the last year a number of TCP socket problems have been
reported, mainly on Win98/ME. This message catalogs them and gives
references to the discussion threads. The second part discusses
some proposed solutions.

Win98/ME

1) CLOSE_WAIT / WSAENOBUFS
   http://support.microsoft.com/default.aspx?scid=kb;EN-US;q229658
   Application level fix: fcntl("close on fork")
   http://cygwin.com/ml/cygwin-patches/2002-q2/msg00039.html
   Cygwin level fix: Corinna's socket/pid bookkeeping
   http://cygwin.com/ml/cygwin-patches/2002-q2/msg00049.html

2) Steve Chew ssh -R / persisting listen sockets
   http://sources.redhat.com/ml/cygwin/2002-04/msg00515.html
   Application level fix: make socket blocking before close
   Cygwin level fix: make socket blocking before close
   http://cygwin.com/ml/cygwin-patches/2002-q2/msg00107.html

3) Unexpected exit from ssh or other "forked workers"
   http://cygwin.com/ml/cygwin-patches/2002-q2/msg00102.html
   Application level fix: fcntl("close on fork")
   Cygwin level fix: (???) do not duplicate "listen" sockets
   after an accept() has succeeded

4) Jonathan Kamens (below), with an extra read() hanging while
   waiting for EOF
   http://cygwin.com/ml/cygwin-patches/2002-q2/msg00117.html
   Application level fix: shutdown()
   Cygwin level fix: Corinna's socket/pid bookkeeping

5) Steve Chew ssh -R when no server is present
   http://sources.redhat.com/ml/cygwin/2002-04/msg00515.html
   Fix: ????????

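For illustration, two of the application level fixes listed above (make the socket blocking before close, and shutdown()) can be sketched in plain POSIX C as below. This is only a sketch, not code from any of the referenced patches: the helper names set_blocking() and careful_close() are made up for this example, and it says nothing about how Cygwin implements close() internally. The "close on fork" fcntl is not shown, since it refers to a proposed hook whose final form is not described here.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: clear O_NONBLOCK on fd.
   Returns 0 on success, -1 on error. */
static int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}

/* Hypothetical helper: application level "careful" close.
   1. Make the socket blocking before closing it (fix for Win98/ME #2).
   2. shutdown() it so that a peer blocked in read() sees EOF even if
      another process still holds a copy (fix for Win98/ME #4, NT #1).
   3. close() the descriptor.
   Returns 0 on success, -1 if set_blocking() or close() failed. */
static int careful_close(int fd)
{
    int rc = 0;

    if (set_blocking(fd) == -1)
        rc = -1;

    /* The result is ignored on purpose: shutdown() may fail with
       ENOTCONN on a socket that was never connected (e.g. a listen
       socket), which is harmless here. */
    (void) shutdown(fd, SHUT_RDWR);

    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Note that shutdown() acts on the socket itself, not on one descriptor, so careful_close() is only appropriate when the caller knows that no other process still needs the connection.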
NT

1) Jonathan Kamens socketpair() / linger on close hack
   http://cygwin.com/ml/cygwin/2001-07/msg00758.html
   Application level fix: shutdown()
   http://cygwin.com/ml/cygwin/2001-07/msg00815.html
   Cygwin level fix: Corinna's socket/pid bookkeeping

2) Apache CLOSE_WAIT
   http://sources.redhat.com/ml/cygwin/2001-10/msg01171.html
   Fix: ???????

**********************************************************************

As discussed in
http://cygwin.com/ml/cygwin/2001-07/msg00815.html
the best solution to NT problem #1 and Win98 #4 is to have Cygwin
issue shutdown() on the last close(). This was dismissed for now.

The "bookkeeping" solution is based on processes and may be easier
to implement. It also helps Win98/ME. Its drawback is that a read()
waiting for EOF returns when all processes with a copy of the socket
are done, not when the last close() occurs.

As I see it, there are three main cases to consider in a bookkeeping
solution, depending on how much interprocess communication is required.

1) PID_A is a long lived process. It opens a socket, forks PID_B.
   PID_B forks other processes. When PID_B exits all subprocesses
   are already terminated.
   In that case it is enough for Cygwin in PID_A to really close the
   socket when PID_B terminates, if it has already been close()d in
   PID_A. This can be accomplished without changes to the Cygwin
   interprocess communication mechanism, only local bookkeeping is
   required. It probably covers 90% of the applications (sshd,
   inetd (I think), qpopper, Jonathan Kamens' example...).
   Looks like an excellent benefit/work ratio.

2) Same as in 1), but some subprocesses are still running
   (with ppid = 1) when PID_B exits. I see two solutions:
   2a) PID_B "reparents (like)" the subprocesses, making PID_A wait
       for them and close the socket after they terminate.
   2b) PID_B signals to PID_A that it is logically exited, but keeps
       running in an "angel" state until subprocesses are done.

3) PID_A exits while PID_B is still alive.
   If so, some kind of "angel" state would be necessary.

By the way, I just tried that on WinME. PID_A does socketpair(),
children B & C use it. Bug Win98 #4 occurs as expected.
In addition, if the parent is gone when a child calls write(), the
child gets "Socket operation on non-socket". But if the parent is
gone and the reader has closed its (useless) write socket, then the
write() succeeds. The list above is already incomplete :(

Regarding solution Win98/ME #2, I think the easiest is to split
the Cygwin socket close in two cases:
a) NT: keep "linger on close" for now, it helps with #1.
b) Win98/ME: set blocking, if not already set.

Finally, having Cygwin work around MS bugs is much better than
having applications do it. However if Cygwin doesn't do it,
having hooks (e.g. "close on fork") to fix applications is better
than nothing.

Pierre
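As a rough illustration of the "local bookkeeping" in case 1) above, the logic in PID_A could look something like the following C sketch. It is not Corinna's patch and does not use any Cygwin internals: the table, the bk_* names, and the fixed BK_MAX size are all assumptions made up for this example. The idea is only that close() in PID_A is recorded but deferred while PID_B may still hold a copy, and the real close happens once PID_A learns that PID_B has terminated.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define BK_MAX 16                 /* arbitrary table size for the sketch */

struct bk_entry {
    int   fd;                     /* socket descriptor in PID_A */
    pid_t child;                  /* PID_B, which inherited a copy */
    bool  in_use;                 /* slot is occupied */
    bool  app_closed;             /* application already called close() */
};

static struct bk_entry bk_table[BK_MAX];

/* Record that `child` inherited a copy of fd (call after fork()).
   Returns 0 on success, -1 if the table is full. */
static int bk_track(int fd, pid_t child)
{
    for (int i = 0; i < BK_MAX; i++) {
        if (!bk_table[i].in_use) {
            bk_table[i] = (struct bk_entry){ fd, child, true, false };
            return 0;
        }
    }
    return -1;
}

/* What the application sees as close(): if a child may still hold a
   copy, only note the close and defer the real one. */
static int bk_close(int fd)
{
    for (int i = 0; i < BK_MAX; i++) {
        if (bk_table[i].in_use && bk_table[i].fd == fd
            && !bk_table[i].app_closed) {
            bk_table[i].app_closed = true;
            return 0;
        }
    }
    return close(fd);             /* untracked: close immediately */
}

/* Call when PID_A learns that `child` has terminated, e.g. after
   waitpid(). Really closes every socket the application had already
   closed. Returns the number of sockets really closed. */
static int bk_child_exited(pid_t child)
{
    int closed = 0;
    for (int i = 0; i < BK_MAX; i++) {
        if (bk_table[i].in_use && bk_table[i].child == child) {
            if (bk_table[i].app_closed) {
                close(bk_table[i].fd);
                closed++;
            }
            bk_table[i].in_use = false;
        }
    }
    return closed;
}
```

Cases 2) and 3) are not covered by this sketch, since they need some form of communication between processes rather than a table local to PID_A.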