delorie.com/archives/browse.cgi   search  
Mail Archives: cygwin/2017/04/23/18:46:07

X-Recipient: archive-cygwin AT delorie DOT com
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:subject:to:references:from:message-id:date
:mime-version:in-reply-to:content-type
:content-transfer-encoding; q=dns; s=default; b=KQHu1FmL9SFO7YIL
C1K+JNlAU2DvUrCPcsvWd0XUmpJ6gK+cmXI17FuSpmQ8/wajnzjWfs0aZ/gWc8Xp
4JlQirKk0gju94ATC/M6cwwsNpgx/G9OVSinZQjJroYp8LA4LmqNO6+hpgLlIbN/
oxmOtk6lwvWOPhjwugovpQcRn30=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:subject:to:references:from:message-id:date
:mime-version:in-reply-to:content-type
:content-transfer-encoding; s=default; bh=3gP5X0oxBAGNMR55eF2804
UEYQY=; b=ZMIpF2R4cV0qcK7i15Yccs7Dwi8nEgfSMI44sFHCoLKYpg2NEXSrlO
lRTovezXK90RvRRzRE5BJz6V+mH0UEyFefHTfiQVcUeVVRwxWjd+/ygnjLMPm+jY
JKMKVoca15T7WVhOVD7+WAb+buIvYnli1Y//ilTP/GvANxjg2xZNk=
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=ancillary, ear, patchset, interesting!
X-HELO: sasl.smtp.pobox.com
Subject: Re: long I/O delays when strace is running
To: cygwin AT cygwin DOT com
References: <be8713ec-5b68-e0e9-c5a5-d48fce491a2d AT pobox DOT com> <91DCAC3CB99C724EB365BB64677FBE7B16EBBD AT MX204CL04 DOT corp DOT emc DOT com> <46e39bce-9782-5c74-c196-35ee97ebbc64 AT pobox DOT com> <ace81c86-80d5-bef9-6b93-7d36dde5a4e8 AT pobox DOT com> <bff9c5ea-85d9-bf4e-2f25-271c1490565f AT pobox DOT com> <58F9D314 DOT 2000100 AT maxrnd DOT com> <7d56e728-32a3-9179-b002-e83880e297d0 AT pobox DOT com> <58FA83DB DOT 3000209 AT maxrnd DOT com> <a4afe377-3b4f-8373-8116-e812a8d82fbe AT pobox DOT com> <58FC5AFB DOT 3010903 AT maxrnd DOT com>
From: Daniel Santos <daniel DOT santos AT pobox DOT com>
Message-ID: <142a2f3c-fc03-a58d-bc67-ac27d04573f5@pobox.com>
Date: Sun, 23 Apr 2017 17:50:29 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To: <58FC5AFB.3010903@maxrnd.com>
X-Pobox-Relay-ID: 98F6A21E-2876-11E7-8E6D-C260AE2156B6-06139138!pb-smtp2.pobox.com
X-IsSubscribed: yes

Unfortunately, I don't have much time to spend on this issue as the gcc8 
stage1 has started and I have a few more issues to clear up with my 
patchset.

On 04/23/2017 02:42 AM, Mark Geisert wrote:
>> Anyway, I can see that the strace process's shared _pinfo object is 
>> never fully
>> populated:
>>
>> _pinfo 0x30000 {
>>    pid 2800,
>>    process_state 0x00000001,
>>    ppid 0,
>>    exitcode 0
>>    cygstarted 0,
>>    dwProcessId 0x00000AF0,
>>    progname "D:\cygwin64\bin\strace.exe",
>>    uid 0,
>>    gid 0,
>>    pgid 0,
>>    sid 0,
>>    ctty 0,
>>    has_pgid_children 0
>>    start_time 1492881370,
>>    nice 0,
>>    stopsig 0,
>>    sendsig 0x0,
>>    exec_sendsig 0x0,
>>    exec_dwProcessId 0
>> }
>
> Again, strace.exe is a Windows executable, so perhaps some of those 
> fields don't make sense for a non-Cygwin process and are not 
> initialized?  Purely speculation on my part.

Oh, I understand now, thanks. :)  So it doesn't link to cygwin1.dll (or 
any other cygwin libs), that makes sense.  So the flaw is probably 
thinking that this executable *should* have uid, guid, ppid, etc.  Yet, 
it exists in the cygwin process database (apparently a bunch of shared 
(probably anonymous) files?).  So the mistake is either listing it in 
the database or not accounting for the possibility of strace, the 
semi-cygwin program?  Maybe there should be (or is?) a flag to tell 
readers of the cygwin process database that this is a "special case" 
process?

>> So I would venture to say that is a problem.  Also, pinfo::init() should
>> probably issure some error message if it waits 2-ish seconds and the 
>> struct
>> still isn't correctly populated.
>
> That seems right.  I unfortunately don't know why the code presumes 
> the struct is always populated within a certain (small) amount of 
> time.  The complaint comment about minimum possible sleep duration 
> sure makes it seem like it's always supposed to be populated very 
> quickly.
>

Yes, and not knowing cygwin's architecture it's hard for me to guess 
why, although I can do a git blame and try to understand when the code 
was put in.  Also, anything like this usually screams race condition in 
my ear, but I can't say that w/o really understanding it well and what 
assumptions are being made.  For instance, if another thread/process 
could really modify this then reads should be done using known atomic 
instructions.  On 32-bit x86, iirc, a mov of the machine word size is 
always atomic, i.e., either you get an intact old value or you get an 
intact new value, you should never get two bytes of the new value and 
two bytes of the old.  But when I'm writing C code, I never want to 
presume what the compiler will emit for situations like this and it's 
better to use some atomic read/write macro/inline, even though I can't 
really imagine this particular snippet not using a simple mov, using an 
explicit "atomic" function/macro conveys the intention.

I should note that in the case of trying to analyze this problem with 
expect, I allowed a make -kj8 check to run for a few days (should take 
maybe 4 hours) and I never had the race condition. Presumably, if I 
allowed it to run for a very long period or time (months or years) it 
would have likely occurred.

>> Is there a way to debug the children of strace?  It would make it a 
>> lot easier.
>> That's part of why I wrote the _pinfo::debug(), but also when I debug 
>> strace
>> with gdb, the _pinfo struct IS properly populated.  The best problems 
>> are the
>> ones that disappear when you try to debug them.
>
> strace always acts as the debugger of the target process you start 
> strace with (or attach to; see '-p' in strace's help).  strace has a 
> switch '-f' == '--trace-children' that defaults to being ON.  So by 
> default strace is getting DEBUG_EVENTs from the target strace launched 
> as well as any process the target creates.
>
> If you explicitly set the '-f' flag, you're actually turning OFF that 
> default and *only* the target process sends DEBUG_EVENTs.  In that 
> case any process the target creates will be invisible to strace.  You 
> could conceivably debug those sub-processes with gdb but you likely 
> won't be able to catch them at their startup unless they wait for your 
> attach.
>

Very interesting!  Is it possible to have two processes debugging and 
have strace forward debug events that it isn't interested in to another 
debugger in the chain?  I'm probably just talking crazy here.  Either 
way, that's ancillary to fixing the problem.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019