Date: Fri, 23 Nov 2012 17:44:04 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Possible race in SYSV IPC (semaphores)
Message-ID: <20121123164404.GX17347@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
References: <5F8AAC04F9616747BC4CC0E803D5907D012856 AT MLBXV09 DOT nih DOT gov>
 <20121123113605 DOT GN17347 AT calimero DOT vinschen DOT de>
 <20121123131020 DOT GR17347 AT calimero DOT vinschen DOT de>
 <20121123133332 DOT GU17347 AT calimero DOT vinschen DOT de>
In-Reply-To: <20121123133332.GU17347@calimero.vinschen.de>

On Nov 23 14:33, Corinna Vinschen wrote:
> On Nov 23 14:10, Corinna Vinschen wrote:
> > On Nov 23 12:36, Corinna Vinschen wrote:
> > > On Nov 19 21:06, Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
> > > > Hello again,
> > > >
> > > > I can now positively confirm the race condition in cygserver w.r.t. the named
> > > > pipe used to serialize SYSV requests through the server. The race is due to
> > > > that transport_layer_pipes::accept() (bool *const recoverable)
> > > > (file: transport_pipes.cc) does actually _create_ the pipe when pipe_instance == 0
> > > > (ironically, transport_layer_pipes::listen() does not create any OS primitives
> > > > at all!).
> > > >
> > > > This means that under heavy load, cygserver threads may all end up processing
> > > > their requests and closing all instances of the pipe (bringing pipe_instance == 0)
> > > > yet not being able to get to the point of accepting new request (that is, to
> > > > re-create the pipe). For the client (user process), this looks like the pipe
> > > > does not exist (during that very tiny period of time), and the following message
> > > > gets printed:
> > > >
> > > > Iteration 3016
> > > > 1 [main] a 4872 transport_layer_pipes::connect: lost connection to cygserver, error = 2
> > >
> > > Thanks for analyzing this situation. IIUC, that means if we create the
> > > pipe in listen(), and then create another pipe instance per accepting
> > > thread in accept(), we will always have at least one instance of the pipe,
> > > whatever the load, right?
> > >
> > > Are you set up to test a patch? If so, I'd propose the below patch. It
> > > would be nice if you could test it in your environment.
> >
> > Forget it. That doesn't work.
> >
> > Surprisingly, the next client calling CreateFile is *not* connected to
> > any server side of the pipe which calls ConnectNamedPipe, but apparently
> > the pipes are used in the order of creation. Thus, the first client
> > hangs waiting for the pipe instance created in the listen() method, but
> > since the server never calls ConnectNamedPipe on that instance, the
> > client will wait forever.
> >
> > Bummer.
> >
> > Back to the drawing board...
>
> Try the below one. It also connects to the listening pipe in listen(),
> so there's a single instance of the pipe connected and blocked for
> further use forever. This should avoid the race (*and* work...)
> Please give it a try.

You can also simply try today's snapshot version of cygserver from
http://cygwin.com/snapshots/ It contains this patch, as well as the
semaphore patch I mailed to the other thread.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
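
For readers following the thread, the race and the idea behind the fix can be illustrated with a small portable model. This is only a sketch under stated assumptions, not the cygserver code or the actual patch: PipeModel, error_seen_in_gap and the constant names are hypothetical, and no Windows API is called. The model assumes that a pipe name exists only while at least one server-side instance is open, that a client opening a nonexistent name sees error 2 (ERROR_FILE_NOT_FOUND, the "error = 2" in the log above), and that a client opening a name whose instances are all busy sees error 231 (ERROR_PIPE_BUSY).

```cpp
#include <cassert>

// Windows error codes, reproduced as plain constants for this model.
constexpr int kOk = 0;
constexpr int kErrorFileNotFound = 2;   // the pipe name does not exist
constexpr int kErrorPipeBusy = 231;     // the name exists, all instances in use

// Hypothetical toy model of one named pipe and its server-side instances.
class PipeModel {
    int free_ = 0;  // instances created and waiting for a client
    int busy_ = 0;  // instances currently connected
public:
    // Server side: create one more instance of the pipe.
    void create_instance() { ++free_; }

    // Client side: try to open the pipe; returns one of the codes above.
    int client_open() {
        if (free_ > 0) { --free_; ++busy_; return kOk; }
        return busy_ > 0 ? kErrorPipeBusy : kErrorFileNotFound;
    }

    // Server side: a worker finishes a request and closes its instance.
    void close_connected() { assert(busy_ > 0); --busy_; }
};

// Returns what a client sees if it arrives after every worker has closed
// its instance but before any worker has re-entered accept().
int error_seen_in_gap(bool hold_instance_in_listen, int workers) {
    PipeModel pipe;
    if (hold_instance_in_listen) {
        pipe.create_instance();  // listen() creates one instance ...
        pipe.client_open();      // ... and connects to it itself, for good
    }
    for (int i = 0; i < workers; ++i) pipe.create_instance();  // accept()
    for (int i = 0; i < workers; ++i) pipe.client_open();      // clients connect
    for (int i = 0; i < workers; ++i) pipe.close_connected();  // requests done
    return pipe.client_open();   // the client that hits the gap
}
```

In this model, error_seen_in_gap(false, 4) returns 2, which matches the "lost connection to cygserver, error = 2" symptom, while error_seen_in_gap(true, 4) returns 231: the permanently connected instance keeps the name alive, so the late client finds a busy pipe rather than a missing one. How a real client reacts to a busy pipe is outside the model.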