Mail Archives: cygwin/2013/08/14/10:05:19

Message-ID: <520B8E7F.6060709@cs.utoronto.ca>
Date: Wed, 14 Aug 2013 10:04:47 -0400
From: Ryan Johnson <ryan DOT johnson AT cs DOT utoronto DOT ca>
To: cygwin AT cygwin DOT com
Subject: Re: 64-bit emacs crashes a lot
References: <51F3151D DOT 7040000 AT cs DOT utoronto DOT ca> <51F33565 DOT 1090406 AT cornell DOT edu> <51F33F52 DOT 4060405 AT cs DOT utoronto DOT ca> <51FB1D9E DOT 5090102 AT cs DOT utoronto DOT ca> <20130802080211 DOT GA18054 AT calimero DOT vinschen DOT de> <51FB9228 DOT 2020309 AT cornell DOT edu> <51FBA100 DOT 90005 AT cs DOT utoronto DOT ca> <51FD5462 DOT 5020400 AT cs DOT utoronto DOT ca> <51FFBDFF DOT 7040501 AT cornell DOT edu> <51FFC4F2 DOT 8080909 AT cs DOT utoronto DOT ca> <5203D89E DOT 6030801 AT cornell DOT edu> <5203DCCA DOT 1010105 AT cs DOT utoronto DOT ca> <5205B364 DOT 8090007 AT cs DOT utoronto DOT ca> <52064730 DOT 50404 AT cornell DOT edu> <52065B3C DOT 6060104 AT cs DOT utoronto DOT ca> <52067FDD DOT 4000708 AT cornell DOT edu>
In-Reply-To: <52067FDD.4000708@cornell.edu>

On 10/08/2013 2:01 PM, Ken Brown wrote:
> On 8/10/2013 11:24 AM, Ryan Johnson wrote:
>> On 10/08/2013 9:59 AM, Ken Brown wrote:
>>> On 8/9/2013 11:28 PM, Ryan Johnson wrote:
>>>> On 08/08/2013 2:00 PM, Ryan Johnson wrote:
>>>>> On 08/08/2013 1:42 PM, Ken Brown wrote:
>>>>>> On 8/5/2013 11:29 AM, Ryan Johnson wrote:
>>>>>>> On 05/08/2013 11:00 AM, Ken Brown wrote:
>>>>>>>> On 8/3/2013 3:05 PM, Ryan Johnson wrote:
>>>>>>>>> On 02/08/2013 8:07 AM, Ryan Johnson wrote:
>>>>>>>>>> On 02/08/2013 7:04 AM, Ken Brown wrote:
>>>>>>>>>>> On 8/2/2013 4:02 AM, Corinna Vinschen wrote:
>>>>>>>>>>>> On Aug  1 22:46, Ryan Johnson wrote:
>>>>>>>>>>>>> Here's a new one... I started a compilation, but before it
>>>>>>>>>>>>> actually invoked the command it started pegging the CPU.
>>>>>>>>>>>>> After ^G^G^G, it crashed with the following:
>>>>>>>>>>>>>> Auto-save? (y or n) y
>>>>>>>>>>>>>>       0 [main] emacs 5076 C:\cygwin64\bin\emacs-nox.exe: *** fatal error - Internal error: TP_NUM_W_BUFS too small 2268032 >= 10.
>>>>>>>>>>>>
>>>>>>>>>>>> That looks like a memory overwrite.  2268032 is 0x229b80, which
>>>>>>>>>>>> looks suspiciously like a stack address.  And the overwritten
>>>>>>>>>>>> value is on the stack, too, well within the cygwin TLS area.  If
>>>>>>>>>>>> *this* value gets overwritten, the TLS is probably totally hosed
>>>>>>>>>>>> at this point.  There's just no way to infer the culprit from
>>>>>>>>>>>> this limited info.
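The decimal-to-hex step above is easy to double-check mechanically. A minimal C sketch, with hypothetical helper names (format_hex, same_value), that formats the number from the error message the way printf and gdb print addresses:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format value as C-style hexadecimal (e.g. "0x229b80") into buf. */
static void format_hex(unsigned long value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%#lx", value);
}

/* Return 1 if the decimal value printed in the error message has the
   hexadecimal spelling given in hex, 0 otherwise. */
static int same_value(unsigned long decimal, const char *hex)
{
    char buf[32];
    format_hex(decimal, buf, sizeof buf);
    return strcmp(buf, hex) == 0;
}
```

With this, same_value(2268032UL, "0x229b80") returns 1: 2*16^5 + 2*16^4 + 9*16^3 + 11*16^2 + 8*16 = 2268032.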
>>>>>>>>>>>
>>>>>>>>>>> Could this be BLODA?  Ryan, I noticed that you wrote in a
>>>>>>>>>>> different thread, "I recently migrated to 64-bit cygwin...and so
>>>>>>>>>>> far have not had to disable Windows Defender; the latter was a
>>>>>>>>>>> recurring source of trouble for my previous 32-bit cygwin install
>>>>>>>>>>> on Win7/64."
>>>>>>>>>> This would be a whole new level of nasty from a BLODA... I thought
>>>>>>>>>> they only interfered with fork()?
>>>>>>>>>>
>>>>>>>>>> However, this *is* Windows Defender we're talking about... service
>>>>>>>>>> disabled and all cygwin processes restarted. I'll let you know in a
>>>>>>>>>> day or so if the crashes go away.
>>>>>>>>> Rats. I just had another crash, the "Fatal error 6" variety. Windows
>>>>>>>>> Defender has not turned itself back on (it's been known to do that),
>>>>>>>>> and a scan of the BLODA list didn't match anything else on my system.
>>>>>>>>>
>>>>>>>>> So I don't think it's BLODA...
>>>>>>>>>
>>>>>>>>> Ideas?
>>>>>>>>
>>>>>>>> Not really, other than the obvious: (a) Find a reproducible way of
>>>>>>>> making emacs-nox crash.  (b) Catch the crash in gdb by setting a
>>>>>>>> suitable break point.
>>>> Got one! Looks like a stack overflow somewhere in the garbage collector:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 5316.0x1af4]
>>>> 0x00000001004df44a in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>>> 5903            if (CONS_MARKED_P (ptr))
>>>> (gdb) bt
>>>> #0  0x00000001004df44a in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>>> #1  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #2  0x00000001004df593 in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5809
>>>> #3  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #4  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #5  0x00000001004df585 in mark_object (arg=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5808
>>>> #6  0x00000001004dfa4e in mark_vectorlike (
>>>>      ptr=0x100f66f28 <bss_sbrk_buffer+6955080>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5501
>>>> ... snip ...
>>>> #2606 0x00000001004dfaf4 in mark_buffer (buffer=<optimized out>)
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5552
>>>> #2607 0x00000001004dff2c in Fgarbage_collect ()
>>>>      at /usr/src/debug/emacs-24.3-4/src/alloc.c:5181
>>>> #2608 0x0000000000000000 in ?? ()
>>>
>>> I don't know whether 2608 stack frames is unusual or not.  Is this
>>> enough to cause a stack overflow?
>> I don't know the answer to that for emacs, but in general that's an
>> exceedingly deep stack that would normally indicate some sort of
>> infinite recursion. Would you actually expect an object tree in emacs to
>> be 2000+ pointers deep? No plausible non-bug scenarios leap to mind
>> right off...
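Whether a cycle alone could produce 2608 frames can be reasoned about with a toy model. In a mark-bit collector, an object that is already marked is skipped, so a cycle cannot cause unbounded recursion; recursion depth instead tracks the longest chain of distinct, not-yet-marked objects. A C sketch, assuming a hypothetical single-pointer struct obj (this does not model Emacs's Lisp_Object layout or its actual mark_object code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object, not Emacs's real representation. */
struct obj {
    int marked;
    struct obj *child;  /* one outgoing pointer, e.g. a car field */
};

static int max_depth;

/* Naive recursive mark: one C stack frame per unmarked object on the
   path. The mark bit is what stops a cycle from recursing forever. */
static void mark(struct obj *o, int depth)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    if (depth > max_depth)
        max_depth = depth;
    mark(o->child, depth + 1);
}

/* Mark everything reachable from root; return the deepest level reached. */
static int mark_from(struct obj *root)
{
    assert(root != NULL);
    max_depth = 0;
    mark(root, 1);
    return max_depth;
}
```

Under this toy model, marking a two-object cycle stops at depth 2, while a chain of 2608 distinct objects reaches depth 2608.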
>
> I'd be very surprised if there were a bug in the garbage collection 
> routine that's causing this.  If there were, I'd expect to see lots of 
> people reporting this.  Could there be some memory corruption that 
> creeps in when you suspend/resume emacs?  You did say that the crashes 
> are less frequent since you deactivated Windows Defender, so I'm not 
> sure you can rule out BLODA.
>
> By the way, are your crashes always related to suspending and resuming 
> emacs?  I don't recall that you said that before, but you keep 
> mentioning ^Z.  Do you still get crashes if you never suspend emacs? 
> You could also try one of the GUI versions of emacs to see if you get 
> crashes.  "Suspending" in that case simply iconifies the frame.
>
>>>
>>>> I have the full backtrace saved to a file; let me know if that would be
>>>> useful (there wasn't anything obvious that I could see, just more of the
>>>> same). Meanwhile, I verified that none of the addresses printed is
>>>> repeated, so it doesn't seem to be due to an obvious cycle in the object
>>>> graph.
>>>
>>> From what you've shown, it appears that most of the addresses have
>>> been optimized out.  I think you would need an unoptimized build in
>>> order to check that, wouldn't you?
>> Probably, yes. That's why I said no "obvious" cycles -- at least the 400
>> pointers that are shown don't show a problem.
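Checking a saved backtrace for repeated pointer values is mechanical once the printed addresses have been pulled out into an array (that extraction step is not shown). A sketch with a hypothetical helper name; it can only see the values gdb actually printed, not those reported as <optimized out>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for pointer-sized integers. */
static int cmp_addr(const void *a, const void *b)
{
    uintptr_t x = *(const uintptr_t *)a;
    uintptr_t y = *(const uintptr_t *)b;
    return (x > y) - (x < y);
}

/* Return 1 if any address occurs more than once, 0 otherwise.
   Sorts addrs in place, so pass a copy if frame order matters. */
static int has_repeated_address(uintptr_t *addrs, size_t n)
{
    if (n < 2)
        return 0;
    assert(addrs != NULL);
    qsort(addrs, n, sizeof addrs[0], cmp_addr);
    for (size_t i = 1; i < n; i++)
        if (addrs[i] == addrs[i - 1])
            return 1;
    return 0;
}
```

Sorting first keeps the check O(n log n), which is comfortable even for a few thousand frames.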
>>
>>>
>>>> The crash happened when I foregrounded a stopped emacs. I tried playing
>>>> around with various breakpoints while repeatedly sending ^Z, but no luck
>>>> repeating the "feat" yet.
>>>>
>>>> Ideas?
>>>
>>> Can you trigger the bug by calling garbage collection manually (M-x
>>> garbage-collect)?  What happens if you put a breakpoint at
>>> Fgarbage_collect and step through it?  (Again, you might need an
>>> unoptimized build before that will be useful.)
>> I tried breaking on Fgarbage_collect and hitting ^Z; no love. I also
>> tried setting a breakpoint on one of those other internal functions,
>> with an ignore count intended to trigger it deep in a GC cycle. It
>> triggered some tens of frames deep and ^Z there didn't cause trouble
>> either. I wonder if the GC cycle just happened to coincide with
>> reactivating emacs (perhaps triggered by some internal timeout that
>> elapsed while it was stopped?)
>>
>>>
>>> There are lots of lisp variables that can be used to control garbage
>>> collection and get information about it.  See the section on garbage
>>> collection in the elisp manual.  For example, you could try
>>> customizing garbage-collection-messages.  Or you could play with
>>> gc-cons-threshold.
>> I didn't see anything glaringly useful there... the messages just
>> announce a GC run, which gdb can catch just fine. There doesn't seem to
>> be any way of tracking how deep an object tree emacs traversed, or how
>> many objects were freed.
>
> Sorry, I misread what the message would be.  I should have said that 
> you could look directly at the output from garbage-collect, which you 
> can see if you evaluate (garbage-collect) in the *scratch* buffer.  
> But, as I said above, I'm not sure that garbage collection is the 
> underlying problem here.
Agree it's probably not GC... GC would just tend to trip over any bad 
pointers that were lurking around...

After a rash of crashes where I either forgot to attach gdb or forgot to
set appropriate breakpoints, I finally managed to catch the stack trace
below. It occurred during M-x compile, while emacs was parsing the
compilation's rather copious output; that is by far the most common type
of crash I've been getting lately. I have no idea how to interpret the
backtrace, though.

What should I try next? I assume I'll need a debug-compiled emacs so the
backtrace isn't garbage? If so, (a) what is the most straightforward way
to compile emacs-nox that way and (b) what would I be looking for if I
encountered the stack trace below in a debug build?

Thanks,
Ryan

Breakpoint 2, 0x000000010055d190 in kill ()
(gdb) bt
#0  0x000000010055d190 in kill ()
#1  0x000000010053702e in process_send_signal (process=process@entry=25781889629, signo=signo@entry=2, current_group=<optimized out>, nomsg=nomsg@entry=0)
    at /usr/src/debug/emacs-24.3-4/src/process.c:5948
#2  0x0000000100537198 in Finterrupt_process (process=25781889629, current_group=<optimized out>)
    at /usr/src/debug/emacs-24.3-4/src/process.c:5966
#3  0x00000001004f7761 in Ffuncall (nargs=<optimized out>, args=<optimized out>)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2781
#4  0x000000010052b5ed in exec_byte_code (bytestr=4294962344, vector=2268896, maxdepth=2, args_template=4303595040, nargs=4304157760, args=0x100902032 <bss_sbrk_buffer+250194>)
    at /usr/src/debug/emacs-24.3-4/src/bytecode.c:900
#5  0x00000001004f7293 in funcall_lambda (fun=25778101277, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x22a188)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:3010
#6  0x00000001004f75cb in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x22a180)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2839
#7  0x00000001004f8bef in apply1 (fn=25778613730, fn@entry=4304161216, arg=arg@entry=4304412722)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2539
#8  0x00000001004f3567 in Fcall_interactively (function=4304161216, record_flag=4304412722, keys=4299711881)
    at /usr/src/debug/emacs-24.3-4/src/callint.c:377
#9  0x00000001004f7752 in Ffuncall (nargs=nargs@entry=4, args=args@entry=0x22a3b0)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2785
#10 0x00000001004f91b7 in call3 (fn=<optimized out>, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2603
#11 0x00000001004883cd in Fcommand_execute (cmd=<optimized out>, record_flag=<optimized out>, keys=<optimized out>, special=<optimized out>)
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:10241
#12 0x0000000100494ae8 in command_loop_1 ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1587
#13 0x00000001004f5c2e in internal_condition_case (bfun=bfun@entry=0x100494740 <command_loop_1>, handlers=4304470642, hfun=hfun@entry=0x10048ae40 <cmd_error>)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:1289
#14 0x000000010048630a in command_loop_2 (ignore=ignore@entry=4304412722)
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1168
#15 0x00000001004f5aef in internal_catch (tag=<optimized out>, func=func@entry=0x1004862e0 <command_loop_2>, arg=4304412722)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:1060
#16 0x000000010048a914 in command_loop ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1147
#17 recursive_edit_1 () at /usr/src/debug/emacs-24.3-4/src/keyboard.c:779
#18 0x000000010048ac47 in Frecursive_edit ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:843
#19 0x000000010055e8ef in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/debug/emacs-24.3-4/src/emacs.c:1537
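Frames #0 to #2 show interrupt-process (Finterrupt_process) reaching process_send_signal with signo=2, which ends in a plain kill() call, i.e. SIGINT (signal 2 on POSIX systems) being sent to the subprocess. A minimal POSIX sketch of that single step outside Emacs, assuming a hypothetical helper interrupt_child (this is not the process.c code, which may signal a whole process group):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, deliver SIGINT to it with kill(), and return the signal
   that terminated it, or -1 on any failure. */
static int interrupt_child(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: use the default SIGINT action (terminate), tell the
           parent it is ready, then sleep until a signal arrives. */
        char ready = 'r';
        signal(SIGINT, SIG_DFL);
        close(fds[0]);
        if (write(fds[1], &ready, 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    /* Parent: wait for the "ready" byte so SIG_DFL is already in place. */
    char ready;
    close(fds[1]);
    ssize_t got = read(fds[0], &ready, 1);
    close(fds[0]);
    if (got != 1) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* The equivalent of the kill() call that frame #0 stopped in. */
    int status;
    if (kill(pid, SIGINT) != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The pipe handshake avoids a race where SIGINT could arrive before the child has reset its signal disposition.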


