Message-ID: <53E0CC2D.4080305@cornell.edu>
Date: Tue, 05 Aug 2014 08:21:01 -0400
From: Ken Brown
To: cygwin AT cygwin DOT com
Subject: Re: (call-process ...) hangs in emacs
References: <53DB8D23 DOT 7060806 AT alice DOT it>
 <20140801133225 DOT GD25860 AT calimero DOT vinschen DOT de>
 <53DEDBBA DOT 20102 AT cornell DOT edu>
 <20140804080034 DOT GA2578 AT calimero DOT vinschen DOT de>
 <53DF8BDC DOT 8090104 AT cornell DOT edu>
 <20140804134526 DOT GK2578 AT calimero DOT vinschen DOT de>
In-Reply-To: <20140804134526.GK2578@calimero.vinschen.de>

On 8/4/2014 9:45 AM, Corinna Vinschen wrote:
> On Aug 4 09:34, Ken Brown wrote:
>> On 8/4/2014 4:00 AM, Corinna Vinschen wrote:
>>> On Aug 3 21:02, Ken Brown wrote:
>>>> On 8/1/2014 9:32 AM, Corinna Vinschen wrote:
>>>>> It could be a problem with the new default pthread mutexes being
>>>>> NORMAL, rather than ERRORCHECK mutexes.
>>>>
>>>> That does seem to be the problem, since I can reproduce the bug starting
>>>> with the 2014-07-14 snapshot. More precisely, I can reproduce it using
>>>> emacs-nox (which is what the OP was using according to his cygcheck output)
>>>> but not using emacs-X11 or emacs-w32.
>>>>
>>>> I tried running emacs under gdb with a breakpoint at call_process, but all I
>>>> could see from that is that emacs tries to fork a subprocess, but the call
>>>> to fork() never returns. I also tried running it under strace, but again
>>>> all I can see is that fork() is called and then everything seems to be at a
>>>> standstill.
>>>>
>>>> Corinna, if you want to take a look, here's the precise recipe:
>>>>
>>>> 1. emacs-nox -Q [This should start emacs and put you in the *scratch*
>>>> buffer.]
>>>>
>>>> 2. Enter the following text into the buffer:
>>>>
>>>> (call-process "pwd" nil t)
>>>>
>>>> 3. Position the cursor at the end of the line and type Ctrl-j.
>>>>
>>>> What should happen, and what does happen prior to the 2014-07-14 snapshot,
>>>> is that the current directory is displayed, followed by the exit code of 0.
>>>> What happens instead is that emacs appears to hang.
>>>
>>> How does emacs start a process? Does it create a thread and then
>>> fork and exec from the thread? Does it use its own pthread_mutex
>>> to control the job? Is there a chance to create an STC of this
>>> process?
>>
>> emacs does some bookkeeping and then calls vfork. It does not create a new
>> thread, nor does it create a pthread_mutex. The only pthread_mutexes
>> created anywhere in the emacs source code are in its implementation of
>> malloc and friends, not in anything directly related to controlling
>> subprocesses. (FWIW, this malloc implementation is used in the Cygwin build
>> of emacs but not in the Linux build.)
>
> Can you take a close look here? This malloc will be used by Cygwin
> as well if it's implemented in the usual way and...
>
>> I did think about trying to create an STC, but I'm stymied because the
>> problem depends so strongly on how emacs is run:
>>
>> - If emacs is run interactively, the problem only occurs with emacs-nox,
>> not with emacs-X11 or emacs-w32.
>>
>> - If emacs is run non-interactively (i.e., in batch mode), the problem
>> occurs with emacs-w32 and emacs-X11 too, as Angelo and Katsumi pointed out
>> earlier in the thread.
>>
>> I can't think of any way to capture these peculiarities in an STC.
>
> ...this, and the fact that fork/exec (vfork == fork on Cygwin) still
> works nicely in other scenarios, suggests that the usage of
> pthread_mutexes in the application may be the culprit.
>
> For instance, is it possible that emacs expects the pthread_mutexes
> in malloc to be ERRORCHECK mutexes? What if you explicitly set
> them to ERRORCHECK at creation time?

That doesn't seem to be the issue, but I think I did find the problem,
and it looks like there might be both an emacs bug and a Cygwin bug.
Here's the relevant code from emacs's gmalloc.c:

  pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
  [...]
    /* Some pthread implementations call malloc for statically
       initialized mutexes when they are used first.  To avoid such a
       situation, we initialize mutexes here while their use is
       disabled in malloc etc.  */
    pthread_mutex_init (&_malloc_mutex, NULL);
    pthread_mutex_init (&_aligned_blocks_mutex, NULL);

The pthread_mutexes are initialized twice, resulting in undefined
behavior according to Posix. That's the emacs bug. But simply removing
the static initialization doesn't fix the problem. On the other hand,
the following patch does seem to fix it, at least in preliminary testing:

=== modified file 'src/gmalloc.c'
--- src/gmalloc.c	2014-03-04 19:02:49 +0000
+++ src/gmalloc.c	2014-08-05 01:35:38 +0000
@@ -490,8 +490,8 @@
 }
 
 #ifdef USE_PTHREAD
-pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
-pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t _malloc_mutex;
+pthread_mutex_t _aligned_blocks_mutex;
 int _malloc_thread_enabled_p;
 
 static void
@@ -526,8 +526,11 @@
      initialized mutexes when they are used first.  To avoid such a
      situation, we initialize mutexes here while their use is
      disabled in malloc etc.  */
-  pthread_mutex_init (&_malloc_mutex, NULL);
-  pthread_mutex_init (&_aligned_blocks_mutex, NULL);
+  pthread_mutexattr_t attr1, attr2;
+  pthread_mutexattr_settype (&attr1, PTHREAD_MUTEX_NORMAL);
+  pthread_mutexattr_settype (&attr2, PTHREAD_MUTEX_NORMAL);
+  pthread_mutex_init (&_malloc_mutex, &attr1);
+  pthread_mutex_init (&_aligned_blocks_mutex, &attr2);
   pthread_atfork (malloc_atfork_handler_prepare,
 		  malloc_atfork_handler_parent,
 		  malloc_atfork_handler_child);

The first hunk avoids the double initialization, but I don't understand
why the second hunk does anything.
Since PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling
pthread_mutex_init with a NULL second argument be equivalent to my
calls to pthread_mutexattr_settype? Does this indicate a Cygwin bug,
or am I misunderstanding something?

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
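
For comparison with the second hunk of the patch, here is a minimal sketch of creating a mutex with an explicit type. It is not taken from gmalloc.c, and the function names init_typed_mutex and default_mutex_type are hypothetical. One difference from the hunk as posted: POSIX leaves pthread_mutexattr_settype undefined when the attributes object has not first been set up with pthread_mutexattr_init, so the sketch initializes it before use. Whether that difference accounts for the behavior described in the message is not established here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialize *m as a mutex of the given type
   (e.g. PTHREAD_MUTEX_NORMAL or PTHREAD_MUTEX_ERRORCHECK).
   Returns 0 on success, otherwise the first pthread error code. */
int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);

    /* The attributes object must be initialized before settype. */
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attributes object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: report the type stored in a freshly initialized
   attributes object, i.e. the type a mutex gets when no type is set.
   Returns 0 on success, otherwise a pthread error code. */
int default_mutex_type(int *type_out)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_gettype(&attr, type_out);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A caller would check the return value, for example treating a nonzero result from init_typed_mutex (&lock, PTHREAD_MUTEX_NORMAL) as a fatal setup error. Comparing the value reported by default_mutex_type against PTHREAD_MUTEX_NORMAL is one way to see which type a given platform uses when pthread_mutex_init is passed NULL.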