Mail Archives: cygwin/2013/08/10/14:01:58

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <52067FDD.4000708@cornell.edu>
Date: Sat, 10 Aug 2013 14:01:01 -0400
From: Ken Brown <kbrown AT cornell DOT edu>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130620 Thunderbird/17.0.7
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 64-bit emacs crashes a lot
References: <51F3151D DOT 7040000 AT cs DOT utoronto DOT ca> <51F33565 DOT 1090406 AT cornell DOT edu> <51F33F52 DOT 4060405 AT cs DOT utoronto DOT ca> <51FB1D9E DOT 5090102 AT cs DOT utoronto DOT ca> <20130802080211 DOT GA18054 AT calimero DOT vinschen DOT de> <51FB9228 DOT 2020309 AT cornell DOT edu> <51FBA100 DOT 90005 AT cs DOT utoronto DOT ca> <51FD5462 DOT 5020400 AT cs DOT utoronto DOT ca> <51FFBDFF DOT 7040501 AT cornell DOT edu> <51FFC4F2 DOT 8080909 AT cs DOT utoronto DOT ca> <5203D89E DOT 6030801 AT cornell DOT edu> <5203DCCA DOT 1010105 AT cs DOT utoronto DOT ca> <5205B364 DOT 8090007 AT cs DOT utoronto DOT ca> <52064730 DOT 50404 AT cornell DOT edu> <52065B3C DOT 6060104 AT cs DOT utoronto DOT ca>
In-Reply-To: <52065B3C.6060104@cs.utoronto.ca>

On 8/10/2013 11:24 AM, Ryan Johnson wrote:
> On 10/08/2013 9:59 AM, Ken Brown wrote:
>> On 8/9/2013 11:28 PM, Ryan Johnson wrote:
>>> On 08/08/2013 2:00 PM, Ryan Johnson wrote:
>>>> On 08/08/2013 1:42 PM, Ken Brown wrote:
>>>>> On 8/5/2013 11:29 AM, Ryan Johnson wrote:
>>>>>> On 05/08/2013 11:00 AM, Ken Brown wrote:
>>>>>>> On 8/3/2013 3:05 PM, Ryan Johnson wrote:
>>>>>>>> On 02/08/2013 8:07 AM, Ryan Johnson wrote:
>>>>>>>>> On 02/08/2013 7:04 AM, Ken Brown wrote:
>>>>>>>>>> On 8/2/2013 4:02 AM, Corinna Vinschen wrote:
>>>>>>>>>>> On Aug  1 22:46, Ryan Johnson wrote:
>>>>>>>>>>>> Here's a new one... I started a compilation, but before it actually
>>>>>>>>>>>> invoked the command it started pegging the CPU. After ^G^G^G, it
>>>>>>>>>>>> crashed with the following:
>>>>>>>>>>>>> Auto-save? (y or n) y
>>>>>>>>>>>>>       0 [main] emacs 5076 C:\cygwin64\bin\emacs-nox.exe: *** fatal error - Internal error: TP_NUM_W_BUFS too small 2268032 >= 10.
>>>>>>>>>>>
>>>>>>>>>>> That looks like a memory overwrite.  2268032 is 0x229b80, which looks
>>>>>>>>>>> suspiciously like a stack address.  And the overwritten value is on the
>>>>>>>>>>> stack, too, well within the cygwin TLS area.  If *this* value gets
>>>>>>>>>>> overwritten, the TLS is probably totally hosed at this point.  There's
>>>>>>>>>>> just no way to infer the culprit from this limited info.
>>>>>>>>>>
>>>>>>>>>> Could this be BLODA?  Ryan, I noticed that you wrote in a different
>>>>>>>>>> thread, "I recently migrated to 64-bit cygwin...and so far have not
>>>>>>>>>> had to disable Windows Defender; the latter was a recurring source of
>>>>>>>>>> trouble for my previous 32-bit cygwin install on Win7/64."
>>>>>>>>> This would be a whole new level of nasty from a BLODA... I thought
>>>>>>>>> they only interfered with fork()?
>>>>>>>>>
>>>>>>>>> However, this *is* Windows Defender we're talking about... service
>>>>>>>>> disabled and all cygwin processes restarted. I'll let you know in a
>>>>>>>>> day or so if the crashes go away.
>>>>>>>> Rats. I just had another crash, the "Fatal error 6" variety. Windows
>>>>>>>> Defender has not turned itself back on (it's been known to do that), and
>>>>>>>> a scan of the BLODA list didn't match anything else on my system.
>>>>>>>>
>>>>>>>> So I don't think it's BLODA...
>>>>>>>>
>>>>>>>> Ideas?
>>>>>>>
>>>>>>> Not really, other than the obvious: (a) Find a reproducible way of
>>>>>>> making emacs-nox crash.  (b) Catch the crash in gdb by setting a
>>>>>>> suitable break point.
>>> Got one! Looks like a stack overflow somewhere in the garbage collector:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 5316.0x1af4]
>>> 0x00000001004df44a in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>> 5903            if (CONS_MARKED_P (ptr))
>>> (gdb) bt
>>> #0  0x00000001004df44a in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>> #1  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>> #2  0x00000001004df593 in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5809
>>> #3  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>> #4  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>> #5  0x00000001004df585 in mark_object (arg=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5808
>>> #6  0x00000001004dfa4e in mark_vectorlike (
>>>      ptr=0x100f66f28 <bss_sbrk_buffer+6955080>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5501
>>> ... snip ...
>>> #2606 0x00000001004dfaf4 in mark_buffer (buffer=<optimized out>)
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5552
>>> #2607 0x00000001004dff2c in Fgarbage_collect ()
>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5181
>>> #2608 0x0000000000000000 in ?? ()
>>
>> I don't know whether 2608 stack frames is unusual or not.  Is this
>> enough to cause a stack overflow?
> I don't know the answer to that for emacs, but in general that's an
> exceedingly deep stack that would normally indicate some sort of
> infinite recursion. Would you actually expect an object tree in emacs to
> be 2000+ pointers deep? No plausible non-bug scenarios leap to mind
> right off...

I'd be very surprised if there were a bug in the garbage collection 
routine that's causing this.  If there were, I'd expect to see lots of 
people reporting this.  Could there be some memory corruption that 
creeps in when you suspend/resume emacs?  You did say that the crashes 
are less frequent since you deactivated Windows Defender, so I'm not 
sure you can rule out BLODA.
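
One rough way to put numbers on the question of whether 2608 frames could
exhaust the stack is to multiply the frame count by a guessed frame size and
compare the result with a guessed stack size. The sketch below does only
that arithmetic. The frame sizes and the 2 MiB stack size are assumptions
chosen for illustration, not values measured from emacs or Cygwin, and the
function names are hypothetical.

```python
# Back-of-the-envelope stack usage estimate for a deep recursion.
# All sizes below are ASSUMED for illustration; none were measured.

def estimated_stack_use(frames, bytes_per_frame):
    """Total stack bytes consumed by `frames` frames of equal size."""
    return frames * bytes_per_frame


def frames_until_overflow(stack_size, bytes_per_frame):
    """How many equal-sized frames fit in a stack of `stack_size` bytes."""
    return stack_size // bytes_per_frame


FRAMES_IN_BACKTRACE = 2608            # frame count from the gdb backtrace
ASSUMED_STACK_SIZE = 2 * 1024 * 1024  # hypothetical 2 MiB stack

if __name__ == "__main__":
    for frame_size in (64, 256, 1024):  # hypothetical bytes per frame
        used = estimated_stack_use(FRAMES_IN_BACKTRACE, frame_size)
        fits = used < ASSUMED_STACK_SIZE
        print(f"{frame_size:5d} B/frame -> {used:8d} B used, fits: {fits}")
```

Under these assumed numbers, only the largest frame size would exceed the
stack, so the answer depends on how big each mark_object frame really is
and on the actual stack limit of the process.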

By the way, are your crashes always related to suspending and resuming 
emacs?  I don't recall that you said that before, but you keep 
mentioning ^Z.  Do you still get crashes if you never suspend emacs? 
You could also try one of the GUI versions of emacs to see if you get 
crashes.  "Suspending" in that case simply iconifies the frame.

>>
>>> I have the full backtrace saved to file, let me know if that would be
>>> useful (there wasn't anything obvious that I could see, just more of the
>>> same). Meanwhile, I verified that none of the addresses printed is
>>> repeated, so it doesn't seem to be due to an obvious cycle in the object
>>> graph.
>>
>> From what you've shown, it appears that most of the addresses have
>> been optimized out.  I think you would need an unoptimized build in
>> order to check that, wouldn't you?
> Probably, yes. That's why I said no "obvious" cycles -- at least the 400
> pointers that are shown don't show a problem.
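
The "no repeated addresses" check can be automated against the saved
backtrace file. The following is a minimal sketch, assuming the file looks
like the gdb output quoted above: it collects argument values of the form
name=0x... and reports any that occur more than once. Arguments printed as
<optimized out> carry no address and are skipped, so an empty result only
says that the visible pointers show no cycle. The function name is
hypothetical.

```python
import re
from collections import Counter

# Matches gdb argument values such as "ptr=0x100f66f28". Return addresses
# (e.g. "0x00000001004df66e in mark_object") are not preceded by "name=",
# so they are ignored; they repeat naturally in any recursion.
ARG_RE = re.compile(r"\b\w+=(0x[0-9a-fA-F]+)")


def repeated_arg_addresses(backtrace_text):
    """Return {address: count} for argument addresses seen more than once."""
    counts = Counter(m.group(1).lower() for m in ARG_RE.finditer(backtrace_text))
    return {addr: n for addr, n in counts.items() if n > 1}


# Two frames copied from the backtrace in this thread.
sample = """\
#5  0x00000001004df585 in mark_object (arg=<optimized out>)
     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5808
#6  0x00000001004dfa4e in mark_vectorlike (
     ptr=0x100f66f28 <bss_sbrk_buffer+6955080>)
     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5501
"""

if __name__ == "__main__":
    print(repeated_arg_addresses(sample))
```

To run it on the full backtrace, read the saved file into a string and pass
it to repeated_arg_addresses.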
>
>>
>>> The crash happened when I foregrounded a stopped emacs. I tried playing
>>> around with various breakpoints while repeatedly sending ^Z, but no luck
>>> repeating the "feat" yet.
>>>
>>> Ideas?
>>
>> Can you trigger the bug by calling garbage collection manually (M-x
>> garbage-collect)?  What happens if you put a breakpoint at
>> Fgarbage_collect and step through it?  (Again, you might need an
>> unoptimized build before that will be useful.)
> I tried breaking on Fgarbage_collect and hitting ^Z; no love. I also
> tried setting a breakpoint on one of those other internal functions,
> with an ignore count intended to trigger it deep in a GC cycle. It
> triggered some tens of frames deep and ^Z there didn't cause trouble
> either. I wonder if the GC cycle just happened to coincide with
> reactivating emacs (perhaps triggered by some internal timeout that
> elapsed while it was stopped?)
>
>>
>> There are lots of lisp variables that can be used to control garbage
>> collection and get information about it.  See the section on garbage
>> collection in the elisp manual.  For example, you could try
>> customizing garbage-collection-messages.  Or you could play with
>> gc-cons-threshold.
> I didn't see anything glaringly useful there... the messages just
> announce a GC run, which gdb can catch just fine. There doesn't seem to
> be any way of tracking how deep an object tree emacs traversed, or how
> many objects were freed.

Sorry, I misread what the message would be.  I should have said that you 
could look directly at the output from garbage-collect, which you can 
see if you evaluate (garbage-collect) in the *scratch* buffer.  But, as 
I said above, I'm not sure that garbage collection is the underlying 
problem here.

Ken

