Date: Mon, 29 Mar 2004 16:16:51 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Postgres Backend doesn't catch the next command, after SIGUSR2
Message-ID: <20040329141651.GZ17229@cygbert.vinschen.de>
In-Reply-To: <20040329121443.52094.qmail@web60307.mail.yahoo.com>
References: <20040309152842 DOT 40194 DOT qmail AT web60303 DOT mail DOT yahoo DOT com> <20040329121443 DOT 52094 DOT qmail AT web60307 DOT mail DOT yahoo DOT com>

On Mar 29 04:14, Patrick Samson wrote:
> ! The explanation is spotted in net.cc !
>
> in wsock_event::wait()
>
>     case WSA_WAIT_EVENT_0 + 1:
>       if (!CancelIo ((HANDLE) socket))
>         {
>           debug_printf ("CancelIo() %E, fallback to blocking io");
>           WSAGetOverlappedResult (socket, &ovr, &len, TRUE, flags);
>         }
>       else
>         WSASetLastError (WSAEINTR);
>       break;
>
> Most of the time, when signal_arrived is raised,
> there is nothing but the EINTR code to set, and
> the backend loops in recv() to receive the next
> command.
> But the race conditions may be different, and
> the command is available at the same time the
> signal is detected.
> So the CancelIo() call discards the command.
> When the backend returns in recv(), the command
> is lost, and the sender waits for an answer
> -> deadlock
>
> Why this CancelIo() ??
> It seems too intrusive.

When WSAWaitForMultipleEvents returns WSA_WAIT_EVENT_0 + 1, you can be
sure that the event hasn't happened at this point.  Otherwise it would
have returned WSA_WAIT_EVENT_0.  Unfortunately this doesn't mean that
the event couldn't happen a nanosecond later.
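
To make that window concrete, here is a minimal, portable C model of the
two ways to handle the signal case.  It is only a sketch and not Cygwin
code: struct pending_recv, RECV_EINTR and both function names are
hypothetical, and the "completed" flag stands in for an overlapped receive
that finished just after the wait reported the signal.  The second
function follows the idea of the change proposed further down, polling for
a result before cancelling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an overlapped receive; not a Winsock type. */
struct pending_recv {
    bool completed;  /* data has already landed in the caller's buffer */
    bool cancelled;  /* the operation was cancelled */
    int  len;        /* number of bytes received, valid if completed */
};

enum { RECV_EINTR = -1 };  /* illustrative "interrupted" return value */

/* Behaviour described in the report: cancel as soon as the signal wins. */
int on_signal_cancel_only(struct pending_recv *op)
{
    op->cancelled = true;  /* data that arrived in the meantime is dropped */
    return RECV_EINTR;
}

/* Idea of the change: poll for a result without blocking, then cancel. */
int on_signal_poll_then_cancel(struct pending_recv *op)
{
    if (op->completed)
        return op->len;    /* deliver the data instead of discarding it */
    op->cancelled = true;
    return RECV_EINTR;
}

/* Walks through the two cases from the discussion. */
void run_examples(void)
{
    struct pending_recv late  = { true,  false, 42 };  /* command arrived late */
    struct pending_recv late2 = { true,  false, 42 };
    struct pending_recv idle  = { false, false, 0 };   /* nothing arrived */

    assert(on_signal_cancel_only(&late) == RECV_EINTR);   /* command lost */
    assert(on_signal_poll_then_cancel(&late2) == 42);     /* command kept */
    assert(on_signal_poll_then_cancel(&idle) == RECV_EINTR);
    assert(idle.cancelled && !late2.cancelled);
}
```

The model is single-threaded, so it cannot show the remaining gap between
the poll and the cancel; it only illustrates why checking for a result
first keeps a command that has already arrived.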

If the signal has arrived and the WSARecvFrom call should be
interrupted, you can't just go ahead, since the call to WSARecvFrom got
a pointer to application-allocated memory.  You can't rely on the
application keeping this memory intact after recvfrom has returned with
EINTR.  If you do, Windows might scramble application memory.  To avoid
that, the CancelIo cancels the active call.

Having said that, does the change below at least alleviate the problem?
The implementation would have to be changed a bit more to get this
entirely non-racy, though.

> Additional note:
> DWORD len;
> is present in case WSA_WAIT_EVENT_0
> but is missing in case WSA_WAIT_EVENT_0 + 1

Thanks for catching this.  I've applied a patch.

Corinna

Index: net.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/net.cc,v
retrieving revision 1.162
diff -u -p -r1.162 net.cc
--- net.cc      29 Mar 2004 14:08:44 -0000      1.162
+++ net.cc      29 Mar 2004 14:09:17 -0000
@@ -83,7 +83,9 @@ wsock_event::wait (int socket, LPDWORD f
       ret = (int) len;
       break;
     case WSA_WAIT_EVENT_0 + 1:
-      if (!CancelIo ((HANDLE) socket))
+      if (WSAGetOverlappedResult (socket, &ovr, &len, FALSE, flags))
+        ret = (int) len;
+      else if (!CancelIo ((HANDLE) socket))
         {
           debug_printf ("CancelIo() %E, fallback to blocking io");
           WSAGetOverlappedResult (socket, &ovr, &len, TRUE, flags);

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                  mailto:cygwin AT cygwin DOT com
Red Hat, Inc.