Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-owner AT sources DOT redhat DOT com
Delivered-To: mailing list cygwin AT sources DOT redhat DOT com
Date: Mon, 6 Nov 2000 11:33:13 -0500
From: Christopher Faylor
To: "'cygwin AT sources DOT redhat DOT com'"
Subject: Re: showstopper bugs (boring technical details -- run away! run away!)
Message-ID: <20001106113313.E1289@redhat.com>
Reply-To: cygwin AT sources DOT redhat DOT com
Mail-Followup-To: "'cygwin AT sources DOT redhat DOT com'"
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.6i
In-Reply-To: ; from btown@ceddec.com on Mon, Nov 06, 2000 at 09:55:30AM -0500

On Mon, Nov 06, 2000 at 09:55:30AM -0500, Town, Brad wrote:
>Chris Faylor wrote:
>>I've had a couple of show stopper bugs reported to me which, of course,
>>I can't duplicate, so I've held off on the release until I can either
>>duplicate and fix them or someone else can fix them (hah).
>
>Arrgh! There's that "hah" again! :)
>
>Would it be possible for you to briefly recap the show-stopper bugs?
>I'll help if I can.

Wow. I've really stumbled onto something with the (hah).

The showstopper bugs were (I'm using the past tense because I am such an
incurable optimist) random errors from wait_subproc when logging in via
ssh. Corinna reported them and since they were indicative of a serious
problem in cygwin, I've been trying to track them down "in my spare
time" (I'm supposed to be doing more managing and less programming).

I duplicated the problems last night at around 9PM and checked in a fix
at around 1AM. As I was triumphantly drifting off to sleep, I realized
that some of my fix was questionable, so I have to redo it today.

The problem was due to the way cygwin handles the 'exec' call.
Since Windows has nothing that says "start a new process and give it the
same pid", we have to kludge around this. So, when a program execs, a
stub sticks around waiting for an event from the newly "execed" process.
When it gets the event, the stub opens the parent process with
OpenProcess, duplicates a handle to the newly execed process into its
parent, and then exits. The parent notices the exit, discovers that
there is a new handle for its child, does some bookkeeping, and goes
back to waiting for children to exit.

The problem was that the process of contacting the parent was not 100%
reliable. I don't know why this is now the case, but I worked around
the problem by always passing a handle to the parent process to all of
the children. This is something that I've wanted to do for a while
anyway.

In the process of fixing this bug, I stumbled across several other *#$!
signal races which I worked around. Today, after a fresh night's sleep,
I believe that I know how to fix them.

Anyway, thanks for the offer. If you want to look at the code in
question, it's in sigproc.cc (wait_subproc) and spawn.cc (spawn_guts).
This is not for the faint of heart. I keep meaning to add more comments
and document the whole sorry mess but I've never gotten around to it.

By the way, I now need to do some laundry unless someone else gets
around to it (hah).

cgf
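
For readers who want the exec handoff described above in concrete form, here is a toy model of the bookkeeping. It is only a sketch under loud assumptions: handles are plain integers, nothing here calls OpenProcess or any other Windows API, and the names Parent, Stub, spawn, hand_off and on_child_exit are invented for illustration and do not come from sigproc.cc or spawn.cc. The stub is given its link to the parent at spawn time, which mirrors the workaround of always passing a parent handle to the children.

```cpp
#include <cassert>
#include <map>

// Toy model only: a "handle" is just an int, not a Windows HANDLE.
using Handle = int;

// Hypothetical stand-in for the waiting parent process.
struct Parent {
    std::map<int, Handle> children;  // pid -> handle the parent waits on
    std::map<int, Handle> pending;   // pid -> handle left behind by a stub

    // Called when the handle for `pid` signals that a process exited.
    // Returns true if the child is really gone, false if it only exec'ed
    // and a replacement handle was found for the same pid.
    bool on_child_exit(int pid) {
        auto it = pending.find(pid);
        if (it != pending.end()) {
            children[pid] = it->second;  // bookkeeping: keep pid, swap handle
            pending.erase(it);
            return false;                // go back to waiting
        }
        children.erase(pid);
        return true;
    }
};

// Hypothetical stand-in for the stub that lingers after an exec.
struct Stub {
    int pid;
    Parent* parent;  // passed down at spawn time, never looked up later

    // Runs once the newly exec'ed process has signalled its event:
    // leave the new handle with the parent, after which the stub exits.
    void hand_off(Handle new_process) {
        assert(parent != nullptr);
        parent->pending[pid] = new_process;
    }
};

// Register a child with the parent and return the stub that represents it.
inline Stub spawn(Parent& p, int pid, Handle h) {
    p.children[pid] = h;
    return Stub{pid, &p};
}
```

In this model, calling spawn, then hand_off on the stub, then on_child_exit on the parent leaves the parent still tracking the same pid with the new handle. A second on_child_exit for that pid, with nothing pending, is treated as a real exit.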