Subject: Re: Problem with zombie processes
To: cygwin@cygwin.com
From: Mark Geisert <mark@maxrnd.com>
Message-ID: <58AACADF.6080101@maxrnd.com>
Date: Mon, 20 Feb 2017 02:54:23 -0800

Erik Bray wrote:
> On Fri, Feb 17, 2017 at 11:06 PM, Mark Geisert <REMOVED> wrote:
>> Erik Bray wrote:
>>>
>>> On Tue, Feb 14, 2017 at 8:25 PM, Mark Geisert <XXXX@XXXXXX.XXX> wrote:
>>
>>
>> Please don't quote raw email addresses.  We try to avoid feeding spammers.
>
> Sorry--normally replies on this ML are just back to the ML itself (my
> preference as well) so I wasn't expecting it.

Reiterating this: Please manually remove quoted email addresses from your replies.

>>>> Erik Bray wrote:
>>>>>
>>>>>
>>>>> The attached Python script
>>>>
>>>>
>>>> ??
>>>
>>>
>>> D'oh!  Here is the script.  It at least demonstrates the problem.
>>>
>> [...]
>>
>> Thanks!  Running this script repeatedly on my system (Win7, 2 cores / 4 HT
>> threads) showed no differences between your Test 1 and Test 2.  Each Test
>> concludes in one of three ways, seemingly randomly: (1) read of
>> /proc/<pid>/stat succeeds and process status is displayed, (2) read fails
>> with Python IOError, (3) read apparently succeeds but there's no process
>> data displayed.
>>
>> An strace of your script shows Python itself is calling wait4() to reap the
>> child process.  So, as Doug suggested on another thread, the script's
>> actions are just subject to the whims of process scheduling and vary from
>> run to run.
>
> You're right.  The first time I was testing this, for whatever reason,
> I was getting *very* consistent results.  Test 1 *always* succeeded
> and test 2 always fails.  But trying it now, I am getting similar
> results.
>
> What I was going by was the docs for ExitProcess [1] which states:
>
> "Exiting a process does not necessarily remove the process object from
> the operating system. A process object is deleted when the last handle
> to the process is closed."
>
> So my guess was that Cygwin might try to hold on to a handle to a
> child process at least until it's been explicitly wait()ed.  But that
> does not seem to be the case after all.

You might have missed a subtlety in what I said above.  The Python interpreter 
itself is calling wait4() to reap your child process.  Cygwin has told Python 
one of its children has died.  You won't get the chance to wait() for it 
yourself.  Cygwin *does* have a handle to the process, but it gets closed as 
part of Python calling wait4().
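
To illustrate why a second wait is not possible, here is a minimal sketch, assuming a plain POSIX fork/waitpid setup rather than your actual test script.  The function name reap_twice is made up for the example.  Once the child has been reaped, a second waitpid() on the same pid fails with ECHILD.

```python
import errno
import os


def reap_twice():
    """Fork a child, reap it, then try to reap it a second time.

    Returns the errno from the second waitpid(), or None if it
    unexpectedly succeeded.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running Python cleanup handlers.
        os._exit(0)

    # First wait reaps the child; its process table entry is released.
    os.waitpid(pid, 0)

    try:
        # Second wait: the child no longer exists from our point of view.
        os.waitpid(pid, 0)
    except OSError as exc:
        return exc.errno
    return None
```

If the interpreter (or anything else in the process) has already done the first wait for you, your own wait() ends up in the position of the second call here.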

> Anyways, I think it would be nicer if /proc returned at least partial
> information on zombie processes, rather than an error.  I have a patch
> to this effect for /proc/<pid>/stat, and will add a few more as well.
> To me /proc/<pid>/stat was the most important because that's the
> easiest way to check the process's state in the first place!  Now I
> also have to catch EINVAL as well and assume that means a zombie
> process.

The file /proc/<pid>/stat exists until Cygwin finishes cleaning up the child, 
which happens once Python has wait()ed for it.  When you run your test script, 
pay attention to the process state character in those cases where you 
successfully read the stat file.  It's often S (sleeping) or R (running), but I 
also see Z (zombie) sometimes.  Your script is in a race with Cygwin, and you 
cannot guarantee you'll see a killed process's state before Cygwin cleans it up.
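
A rough sketch of reading the state character while tolerating that race follows.  The helper names parse_stat_state and read_state are hypothetical, and the parsing assumes the usual "pid (comm) state ..." layout of the stat line.  All three outcomes I saw are handled: a normal read, a read error, and a read that returns no data.

```python
def parse_stat_state(text):
    """Return the state character from a stat line, or None if absent.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    close = text.rfind(")")
    if close < 0:
        return None
    fields = text[close + 1:].split()
    return fields[0] if fields else None


def read_state(pid):
    """Best-effort read of /proc/<pid>/stat.

    Returns the state character, or None if the file is missing,
    unreadable, or empty (the process may already have been cleaned up).
    """
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_stat_state(f.read())
    except (IOError, OSError):
        return None
```

A None result only means the state could not be read at that moment.  It does not by itself prove the process was a zombie.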

One way around this *might* be to install a SIGCHLD handler in your Python 
script.  If that's possible, that should tell you when your child exits.
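
Something along these lines is what I have in mind.  It is only a sketch: it assumes the child is created with os.fork() so that nothing else in the interpreter reaps it, and the names on_sigchld and run_child_and_wait are made up.  I have not verified that it behaves the same way under Cygwin.

```python
import os
import signal
import time

notifications = []  # appended to by the handler


def on_sigchld(signum, frame):
    # Only record that a child changed state; reaping is left to the
    # main code so the child stays inspectable until then.
    notifications.append(signum)


def run_child_and_wait(timeout=5.0):
    """Fork a child that exits with status 7 and wait for SIGCHLD.

    Returns (child pid, reaped pid, exit status, handler fired?).
    """
    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(7)  # child: exit at once

        # Poll until the handler has fired or the timeout expires.
        deadline = time.time() + timeout
        while not notifications and time.time() < deadline:
            time.sleep(0.01)

        # The child has exited but has not been reaped yet; this is
        # the window in which its state could be examined.
        reaped, status = os.waitpid(pid, 0)
        return pid, reaped, os.WEXITSTATUS(status), bool(notifications)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGCHLD, old_handler)
```

The handler deliberately does not call wait() itself, so the decision about when to reap stays with your script.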

..mark



