Subject: Re: Problem with zombie processes
To: cygwin AT cygwin DOT com
From: Mark Geisert
Date: Mon, 20 Feb 2017 14:54:45 -0800
Message-ID: <58AB73B5.6040104@maxrnd.com>
References: <58A3598F DOT 2020405 AT maxrnd DOT com> <58A773C9 DOT 1080905 AT maxrnd DOT com> <58AACADF DOT 6080101 AT maxrnd DOT com>

Erik Bray wrote:
> On Mon, Feb 20, 2017 at 11:54 AM, Mark Geisert wrote:
>>> So my guess was that Cygwin might try to hold on to a handle to a
>>> child process at least until it's been explicitly wait()ed. But that
>>> does not seem to be the case after all.
>>
>>
>> You might have missed a subtlety in what I said above. The Python
>> interpreter itself is calling wait4() to reap your child process. Cygwin
>> has told Python one of its children has died. You won't get the chance to
>> wait() for it yourself. Cygwin *does* have a handle to the process, but it
>> gets closed as part of Python calling wait4().
>
> To be clear, wait4() is not called from Python until the script
> explicitly calls p.wait().
> In other words, when run this step by step (e.g. in gdb) I don't see a
> wait4() call until the point where the script explicitly waits(). I
> don't see any reason Python would do this behind the scenes.

You're right. I missed the wait in your script and ASSumed too much of the
Python interpreter :-( .

>>> Anyways, I think it would be nicer if /proc returned at least partial
>>> information on zombie processes, rather than an error. I have a patch
>>> to this effect for /proc//stat, and will add a few more as well.
>>> To me /proc//stat was the most important because that's the
>>> easiest way to check the process's state in the first place! Now I
>>> also have to catch EINVAL as well and assume that means a zombie
>>> process.
>>
>>
>> The file /proc//stat is there until Cygwin finishes cleanup of the
>> child due to Python having wait()ed for it. When you run your test script,
>> pay attention to the process state character in those cases where you
>> successfully read the stat file. It's often S (stopped, I think) or R
>> (running) but I also see Z (zombie) sometimes. Your script is in a race
>> with Cygwin, and you cannot guarantee you'll see a killed process's state
>> before Cygwin cleans it up.
>>
>> One way around this *might* be to install a SIGCHLD handler in your Python
>> script. If that's possible, that should tell you when your child exits.
>
> Perhaps the Python script is a red herring. I just wrote it to
> demonstrate the problem. The difference between where I send stdout
> to is strange, but you're likely right that it just comes down to
> subtle timing differences. Here's a C program that demonstrates the
> same issue more reliably. Interestingly, it works when I run it in
> strace (probably just because of the strace overhead) but not when I
> run it normally.
>
> My point in all this is I'm confused why Cygwin would give up its
> handles to the Windows process before wait() has been called.
>
> (In fact, it's pretty confusing to have fopen returning EINVAL which
> according to [1] it should only be doing if the mode string were
> invalid.)
>
> Thanks,
> Erik
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html

O.K., you may be on to something amiss in the Cygwin DLL. Thanks for the STC in
C; that'll help somebody looking further at this. I'm out of ideas.

It might be possible to reduce strace overhead somewhat by selecting a smaller
set of trace options than the default.

..mark
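
For anyone following the thread, here is a rough sketch of the SIGCHLD idea combined with a tolerant read of the stat file. It is not Erik's test script or his C STC (neither is reproduced in this message). The names on_sigchld and read_proc_state are made up for illustration, and treating EINVAL as "state unavailable" is an assumption based only on the behaviour reported above. On Cygwin the read may still race with the DLL's cleanup, so the first print can legitimately show None.

```python
import errno
import os
import signal
import subprocess
import sys
import time

child_exited = False  # set by the handler below


def on_sigchld(signum, frame):
    # Hypothetical handler: only note that a child changed state.
    # It deliberately does not reap, so the child stays a zombie
    # until the script calls wait() itself.
    global child_exited
    child_exited = True


def read_proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            data = f.read()
    except OSError as e:
        # EINVAL is the error reported in this thread for a zombie on
        # Cygwin (assumption); ENOENT/ESRCH mean the process is gone.
        if e.errno in (errno.EINVAL, errno.ENOENT, errno.ESRCH):
            return None
        raise
    # The state field follows the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def main():
    signal.signal(signal.SIGCHLD, on_sigchld)
    p = subprocess.Popen([sys.executable, "-c", "pass"])

    # Poll a plain flag rather than blocking, so the handler stays trivial.
    deadline = time.monotonic() + 5
    while not child_exited and time.monotonic() < deadline:
        time.sleep(0.01)

    print("state before wait():", read_proc_state(p.pid))
    p.wait()
    print("state after wait(): ", read_proc_state(p.pid))


if __name__ == "__main__":
    main()
```

The handler only sets a flag, so reaping stays under the script's control and the window between the child's exit and the explicit wait() can be observed.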