Mail Archives: cygwin/2014/08/08/09:27:26

Message-ID: <53E4D01B.9010005@cornell.edu>
Date: Fri, 08 Aug 2014 09:26:51 -0400
From: Ken Brown <kbrown AT cornell DOT edu>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: (call-process ...) hangs in emacs
In-Reply-To: <53E3F2AE.7030608@redhat.com>
X-IsSubscribed: yes

On 8/7/2014 5:42 PM, Eric Blake wrote:
> On 08/07/2014 12:53 PM, Ken Brown wrote:
>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>
>>>> I think I found the problem with NORMAL mutexes.  emacs calls
>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>> 'prepare' handler locks the mutexes.  (The parent and child handlers
>>>> unlock them.)  So when emacs calls fork, the mutexes are locked, and
>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>> Here's a gdb backtrace showing the sequence of calls:
>>>
>>> Arguably, that's an upstream bug in emacs.  POSIX has declared
>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>> because you cannot use it for anything that is not async-signal-safe
>>> without risking deadlock.  And (except for sem_post()), NONE of the
>>> standardized locking functions are async-signal-safe.
>>>
>>> http://austingroupbugs.net/view.php?id=858
>>>
>>> That said, it would still be nice to support this, since even though the
>>> theory says it is broken, there are still lots of (broken)
>>> programs/libraries still trying to use it.
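
For reference, the pattern under discussion (lock in the 'prepare' handler, unlock in the 'parent' and 'child' handlers) looks roughly like the sketch below. This is not the emacs source: app_lock, the handler names, and fork_with_atfork_handlers() are hypothetical, and the child's use of pthread_mutex_trylock() is only there to show the lock state after the handlers have run.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for the mutexes discussed in the thread. */
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t atfork_once = PTHREAD_ONCE_INIT;

/* Runs in the forking thread before fork(): take the lock. */
static void prepare(void)      { pthread_mutex_lock(&app_lock); }
/* Runs in the parent after fork(): release the lock. */
static void parent_after(void) { pthread_mutex_unlock(&app_lock); }
/* Runs in the child after fork(): release the inherited lock. */
static void child_after(void)  { pthread_mutex_unlock(&app_lock); }

static void register_handlers(void)
{
    pthread_atfork(prepare, parent_after, child_after);
}

/* Forks once. Returns 0 if the child could take app_lock after the
 * handlers ran, 1 if it could not, and -1 on error. */
int fork_with_atfork_handlers(void)
{
    int status;
    pid_t pid;

    /* Register the handlers only once per process. */
    pthread_once(&atfork_once, register_handlers);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&app_lock) == 0 ? 0 : 1);

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The weak point is the interval between prepare() and the two unlock handlers: anything on the fork path that needs the same lock during that interval will block forever.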
>>
>> So what do you think emacs should do instead of using pthread_atfork? Or
>> is it better to just remove it?  I don't know how likely it is that this
>> would cause a problem.
>
> The POSIX recommendation is that multithreaded apps limit themselves
> solely to async-signal-safe functions in the window between fork and
> exec (or to use posix_spawn instead of fork/exec).  I don't know what
> emacs is trying to do in that window, but at this point, it's certainly
> worth reporting it upstream.  If you need a pointer to the full list of
> async-signal-safe functions:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
> and search for "The following table defines a set of functions that
> shall be async-signal-safe."
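
As a minimal sketch of the spawn-style alternative to fork/exec (the POSIX interface for this is posix_spawn()/posix_spawnp()), something like the following avoids running any application code in the forked child at all. The helper name spawn_and_wait() is made up for illustration, and error handling is reduced to returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs argv[0] (looked up on PATH) with the current environment and
 * returns its exit status, or -1 if it could not be spawned or did not
 * exit normally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() returns an error number directly on failure. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example { "ls", "-l", NULL }.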
>
> The most common deadlocks when violating async-signal-safety rules look
> like this in single-threaded programs:
>
> function calls malloc()
>    malloc() grabs a non-recursive mutex
>      async signal arrives
>        signal handler called
>          signal handler calls malloc()
>            malloc() can't grab the mutex - deadlock
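
One common way to stay clear of the single-threaded deadlock above is to do nothing in the handler except set a flag, and to do the real work (including any malloc()) later from normal program flow. The sketch below assumes that approach; install_handler() and process_pending_signal() are hypothetical names, not anything from emacs or Cygwin.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Assigning to a volatile sig_atomic_t is safe inside a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;          /* no malloc(), no stdio, no locks here */
}

/* Installs on_signal() for signo; returns 0 on success, -1 on failure. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from the main loop, outside any handler, where malloc() is fine.
 * Returns 1 if a pending signal was processed, 0 if none was pending,
 * and -1 if the allocation failed. */
int process_pending_signal(void)
{
    char *buf;

    if (!got_signal)
        return 0;
    got_signal = 0;

    buf = malloc(64);        /* stand-in for the real, non-safe work */
    if (buf == NULL)
        return -1;
    free(buf);
    return 1;
}
```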
>
> and this counterpart in multithreaded programs:
>
> thread1 calls malloc()
>    malloc() grabs a non-recursive mutex
> thread2 gains control and calls fork()
>    because of the fork, thread1 no longer exists in the child to release the lock
>    child process calls malloc()
>      malloc() tries to grab mutex, but it is locked with no thread to release it
>
> Switching malloc() to a recursive lock may or may not "solve" the
> single-threaded deadlock (in that malloc can now obtain the mutex), but
> it is probably NOT what you want to happen (unless malloc is fully
> re-entrant, the inner instance will see incomplete data and either be
> totally clobbered itself, or else totally clobber the outer instance
> when it returns).  So it's GOOD that malloc does NOT use a recursive
> mutex by default.
>
> In the multithreaded case, you are flat out hosed. Switching to a
> recursive lock does not change the picture - you are still deadlocked
> waiting on thread1 to release the lock, but thread1 doesn't exist.
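
To make the multithreaded case concrete, here is a small sketch in which one thread holds an ordinary mutex while another thread forks, and the child then checks the lock with pthread_mutex_trylock() rather than blocking on it. All names (demo_lock, holder, child_sees_lock_held) are hypothetical. Calling pthread_mutex_trylock() in the child is itself outside the async-signal-safe set, so this is only a demonstration of what common implementations do, not something to rely on.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_held, may_release;

/* Plays the role of thread1: takes the lock and holds it across the fork. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_held);               /* tell the forking thread */
    while (sem_wait(&may_release) != 0 && errno == EINTR)
        ;                               /* retry if interrupted */
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found demo_lock busy, 0 if the child could take
 * it, and -1 on error. */
int child_sees_lock_held(void)
{
    pthread_t t;
    pid_t pid;
    int status = 0;

    if (sem_init(&lock_held, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;
    while (sem_wait(&lock_held) != 0 && errno == EINTR)
        ;                               /* wait until the lock is taken */

    pid = fork();                       /* only this thread exists in the child */
    if (pid == 0)
        /* Not async-signal-safe; used here purely to observe the state. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);

    if (pid > 0)
        waitpid(pid, &status, 0);
    sem_post(&may_release);             /* let the holder thread finish */
    pthread_join(t, NULL);
    sem_destroy(&lock_held);
    sem_destroy(&may_release);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A blocking pthread_mutex_lock() in place of the trylock would be expected to hang in the child, which is the deadlock described above.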

Thanks for the explanations, Eric.  I've filed an emacs bug report:

   http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222

Ken
