Message-ID: <53E3685B.8050508@cornell.edu>
Date: Thu, 07 Aug 2014 07:51:55 -0400
From: Ken Brown
To: cygwin AT cygwin DOT com
Subject: Re: (call-process ...)
 hangs in emacs
References: <53DB8D23 DOT 7060806 AT alice DOT it>
 <20140801133225 DOT GD25860 AT calimero DOT vinschen DOT de>
 <53DEDBBA DOT 20102 AT cornell DOT edu>
 <20140804080034 DOT GA2578 AT calimero DOT vinschen DOT de>
 <53DF8BDC DOT 8090104 AT cornell DOT edu>
 <20140804134526 DOT GK2578 AT calimero DOT vinschen DOT de>
 <53E0CC2D DOT 4080305 AT cornell DOT edu>
 <20140805135830 DOT GA9994 AT calimero DOT vinschen DOT de>
 <53E11A93 DOT 9070800 AT cornell DOT edu>
 <20140805184047 DOT GC13601 AT calimero DOT vinschen DOT de>
In-Reply-To: <20140805184047.GC13601@calimero.vinschen.de>

Hi Corinna,

On 8/5/2014 2:40 PM, Corinna Vinschen wrote:
> Hi Ken,
>
> On Aug 5 13:55, Ken Brown wrote:
>> On 8/5/2014 9:58 AM, Corinna Vinschen wrote:
>>> On Aug 5 08:21, Ken Brown wrote:
>>>> === modified file 'src/gmalloc.c'
>>>> --- src/gmalloc.c 2014-03-04 19:02:49 +0000
>>>> +++ src/gmalloc.c 2014-08-05 01:35:38 +0000
>>>> @@ -490,8 +490,8 @@
>>>>  }
>>>>
>>>>  #ifdef USE_PTHREAD
>>>> -pthread_mutex_t _malloc_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>> -pthread_mutex_t _aligned_blocks_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>> +pthread_mutex_t _malloc_mutex;
>>>> +pthread_mutex_t _aligned_blocks_mutex;
>>>>  int _malloc_thread_enabled_p;
>>>>
>>>>  static void
>>>> @@ -526,8 +526,11 @@
>>>>       initialized mutexes when they are used first. To avoid such a
>>>>       situation, we initialize mutexes here while their use is
>>>>       disabled in malloc etc. */
>>>> -  pthread_mutex_init (&_malloc_mutex, NULL);
>>>> -  pthread_mutex_init (&_aligned_blocks_mutex, NULL);
>>>> +  pthread_mutexattr_t attr1, attr2;
>>>> +  pthread_mutexattr_settype (&attr1, PTHREAD_MUTEX_NORMAL);
>>>> +  pthread_mutexattr_settype (&attr2, PTHREAD_MUTEX_NORMAL);
>>>> +  pthread_mutex_init (&_malloc_mutex, &attr1);
>>>> +  pthread_mutex_init (&_aligned_blocks_mutex, &attr2);
>>>>    pthread_atfork (malloc_atfork_handler_prepare,
>>>>                    malloc_atfork_handler_parent,
>>>>                    malloc_atfork_handler_child);
>>>>
>>>>
>>>> The first hunk avoids the double initialization, but I don't
>>>> understand why the second hunk does anything. Since
>>>> PTHREAD_MUTEX_NORMAL is now the default, shouldn't calling
>>>> pthread_mutex_init with NULL second argument be equivalent to my
>>>> calls to pthread_mutexattr_settype? Does this indicate a Cygwin
>>>> bug, or am I misunderstanding something?
>>>
>>> AFAICS you're missing something. Your pthread_mutexattr_t attr1, attr2
>>> are not initialized. They contain some random values, thus they are not
>>> good objects. The calls to pthread_mutexattr_settype as well as the
>>> calls to pthread_mutex_init will fail with EINVAL, but you won't see it
>>> due to missing error handling, and you end up without mutexes at all.
>>> If you call pthread_mutexattr_init before calling
>>> pthread_mutexattr_settype the situation should be the same as before.
>>
>> Thanks for catching my mistake. Your earlier suggestion about explicitly
>> setting the pthread_mutexes to be ERRORCHECK mutexes seems to fix the
>> problem (as long as I remember to call pthread_mutexattr_init). The
>> revised patch is attached. I went back to using both the static and
>> dynamic initializations as in the original code, since you said that's
>> harmless.
>
> I'm glad to read that, but I'm still a little bit concerned. If your
> code works with ERRORCHECK mutexes but hangs with NORMAL mutexes, you
> *might* miss an error case.
>
> I'd suggest to tweak the pthread_mutex_lock/unlock calls and log the
> threads calling it. It looks like the same thread calls malloc from
> malloc for some reason and it might be interesting to learn how that
> happens and if it's really ok in this scenario, because it seems to
> be unexpected by the code.

I think I found the problem with NORMAL mutexes. emacs calls
pthread_atfork after initializing the mutexes, and the resulting
'prepare' handler locks the mutexes. (The parent and child handlers
unlock them.) So when emacs calls fork, the mutexes are locked, and
shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
Here's a gdb backtrace showing the sequence of calls:

#0  malloc (size=size@entry=40) at gmalloc.c:919
#1  0x0053fc28 in calloc (nmemb=1, size=40) at gmalloc.c:1510
#2  0x61082074 in calloc (nmemb=1, size=40)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/malloc_wrapper.cc:100
#3  0x61003177 in operator new (s=s@entry=40)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/cxx.cc:23
#4  0x610fc9d3 in pthread_mutex::init (mutex=0x61187d34, attr=0x0,
    initializer=0x12)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3118
#5  0x610fcc13 in pthread_mutex_lock (mutex=0x61187d34)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:3170
#6  0x611319d8 in __fp_lock (ptr=0x61187cd0)
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:287
#7  0x61154f75 in _fwalk (ptr=0x28d544,
    function=function@entry=0x611319c0 <__fp_lock>)
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/fwalk.c:50
#8  0x61131dea in __fp_lock_all ()
    at /usr/src/debug/cygwin-1.7.31-3/newlib/libc/stdio/findfp.c:307
#9  0x610fa45e in pthread::atforkprepare ()
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/thread.cc:2031
#10 0x61076292 in lock_pthread (this=)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:137
#11 hold_everything (x=, this=)
    at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/sigproc.h:169
#12 fork () at /usr/src/debug/cygwin-1.7.31-3/winsup/cygwin/fork.cc:582

Is there a better way to deal with this issue than using ERRORCHECK
mutexes?

Ken
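
For anyone following the thread, here is a minimal sketch in C of the two
points discussed above: the attribute object has to be set up with
pthread_mutexattr_init before pthread_mutexattr_settype is called, and a
second lock attempt from the same thread is reported as an error by an
ERRORCHECK mutex rather than blocking. This is not the emacs or Cygwin
code. The helper names init_errorcheck_mutex and relock_same_thread are
made up for illustration, and the sketch assumes a POSIX-conforming
pthreads implementation.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialize *m as an error-checking mutex.
   Returns 0 on success or the error number of the first failing call,
   so a bad attribute object cannot go unnoticed. */
int
init_errorcheck_mutex (pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc;

  assert (m != NULL);

  /* The attribute object must be initialized before settype is used. */
  rc = pthread_mutexattr_init (&attr);
  if (rc != 0)
    return rc;

  rc = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (rc == 0)
    rc = pthread_mutex_init (m, &attr);

  pthread_mutexattr_destroy (&attr);
  return rc;
}

/* Hypothetical helper: lock an error-checking mutex twice from the same
   thread and return the result of the second attempt.  With a
   PTHREAD_MUTEX_NORMAL mutex the second call would never return. */
int
relock_same_thread (void)
{
  pthread_mutex_t m;
  int rc = init_errorcheck_mutex (&m);

  if (rc != 0)
    return rc;

  rc = pthread_mutex_lock (&m);
  if (rc == 0)
    {
      rc = pthread_mutex_lock (&m);   /* relock by the owning thread */
      pthread_mutex_unlock (&m);
    }

  pthread_mutex_destroy (&m);
  return rc;
}
```

POSIX specifies that relocking an ERRORCHECK mutex from the owning thread
returns an error (EDEADLK), so relock_same_thread should return that value
instead of hanging. That is consistent with the behaviour described above,
where the 'prepare' handler already holds the mutex when calloc is entered
during fork; what the caller does after the failed lock is outside the
scope of this sketch.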