Mail Archives: cygwin/2014/08/18/08:30:51

Message-ID: <53F1F154.1020702@cornell.edu>
Date: Mon, 18 Aug 2014 08:28:04 -0400
From: Ken Brown <kbrown AT cornell DOT edu>
To: cygwin AT cygwin DOT com
Subject: Re: (call-process ...) hangs in emacs
In-Reply-To: <53E4D01B.9010005@cornell.edu>

On 8/8/2014 9:26 AM, Ken Brown wrote:
> On 8/7/2014 5:42 PM, Eric Blake wrote:
>> On 08/07/2014 12:53 PM, Ken Brown wrote:
>>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>>
>>>>> I think I found the problem with NORMAL mutexes.  emacs calls
>>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>>> 'prepare' handler locks the mutexes.  (The parent and child handlers
>>>>> unlock them.)  So when emacs calls fork, the mutexes are locked, and
>>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>>> Here's a gdb backtrace showing the sequence of calls:
>>>>
>>>> Arguably, that's an upstream bug in emacs.  POSIX has declared
>>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>>> because you cannot use it for anything that is not async-signal-safe
>>>> without risking deadlock.  And (except for sem_post()), NONE of the
>>>> standardized locking functions are async-signal-safe.
>>>>
>>>> http://austingroupbugs.net/view.php?id=858
>>>>
>>>> That said, it would still be nice to support this, since even though the
>>>> theory says it is broken, there are still lots of (broken)
>>>> programs/libraries trying to use it.
>>>
>>> So what do you think emacs should do instead of using pthread_atfork? Or
>>> is it better to just remove it?  I don't know how likely it is that this
>>> would cause a problem.
>>
>> The POSIX recommendation is that multithreaded apps limit themselves
>> solely to async-signal-safe functions in the window between fork and
>> exec (or use posix_spawn instead of fork/exec).  I don't know what
>> emacs is trying to do in that window, but at this point, it's certainly
>> worth reporting it upstream.  If you need a pointer to the full list of
>> async-signal-safe functions:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>
>> and search for "The following table defines a set of functions that
>> shall be async-signal-safe."
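
As a rough illustration of the posix_spawn alternative mentioned above, here is a minimal sketch. The helper name spawn_and_wait is hypothetical (it is not something emacs or Cygwin provides), and the sketch assumes the program to run can be found on PATH. Because the process is created in a single call, no application code runs in the window between fork and exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up on PATH), wait for it,
 * and return its exit status, or -1 if it could not be started or did
 * not exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* No file actions and no spawn attributes: the child inherits
     * the parent's descriptors and environment. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    /* Retry waitpid if it is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example { "sh", "-c", "exit 7", NULL }, and get back the child's exit status.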
>>
>> The most common deadlocks when violating async-signal-safety rules look
>> like this in single-threaded programs:
>>
>> function calls malloc()
>>    malloc() grabs a non-recursive mutex
>>      async signal arrives
>>        signal handler called
>>          signal handler calls malloc()
>>            malloc() can't grab the mutex - deadlock
>>
>> and this counterpart in multithreaded programs:
>>
>> thread1 calls malloc()
>>    malloc() grabs a non-recursive mutex
>> thread2 gains control and calls fork()
>>    because of the fork, thread1 no longer exists in the child to release the lock
>>    child process calls malloc()
>>      malloc() tries to grab mutex, but it is locked with no thread to release it
>>
>> Switching malloc() to a recursive lock may or may not "solve" the
>> single-threaded deadlock (in that malloc can now obtain the mutex), but
>> it is probably NOT what you want to happen (unless malloc is fully
>> re-entrant, the inner instance will see incomplete data and either be
>> totally clobbered itself, or else totally clobber the outer instance
>> when it returns).  So it's GOOD that malloc does NOT use a recursive
>> mutex by default.
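
For the single-threaded signal case, one common way to stay within the async-signal-safety rules is to have the handler do nothing but set a flag, and to do the real work (including anything that may call malloc) later from normal code. The following is only a sketch of that idea; the names install_handler and signal_was_seen are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t got_signal = 0;

/* The handler only stores to a sig_atomic_t flag. It never calls
 * malloc() or any other function that might take a lock. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from the main loop: report whether a signal arrived since
 * the last call, and clear the flag. Allocation is safe here. */
int signal_was_seen(void)
{
    int seen = got_signal;

    got_signal = 0;
    return seen;
}
```

The main loop would poll signal_was_seen() at a convenient point and only then run code that allocates memory, so malloc is never entered from inside the handler.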
>>
>> In the multithreaded case, you are flat out hosed. Switching to a
>> recursive lock does not change the picture - you are still deadlocked
>> waiting on thread1 to release the lock, but thread1 doesn't exist.
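
When fork has to be used in a multithreaded program, the recommendation quoted earlier amounts to something like the sketch below: the child calls only async-signal-safe functions (here execv and _exit) before the new program image takes over. The helper name fork_exec_wait is hypothetical, and the sketch assumes the caller supplies a full path to the executable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec the program at `path` in the child,
 * wait for it, and return its exit status, or -1 on failure. */
int fork_exec_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: nothing here may allocate or take a lock, since the
         * thread holding that lock may not exist in this process. */
        execv(path, argv);
        _exit(127);     /* exec failed; _exit() is safe here, exit() is not */
    }

    /* Parent: retry waitpid if it is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If execv fails, the child reports status 127 through _exit rather than returning into the parent's code, which would run arbitrary non-safe functions in the child.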
>
> Thanks for the explanations, Eric.  I've filed an emacs bug report:
>
>    http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222

I've just made a new emacs test release that includes a workaround for 
this bug.  I think I see a way to make emacs use Cygwin's malloc; if 
this works, it will provide a better fix for the bug.

Ken
