Message-ID: <520B8E7F.6060709@cs.utoronto.ca>
Date: Wed, 14 Aug 2013 10:04:47 -0400
From: Ryan Johnson
To: cygwin AT cygwin DOT com
Subject: Re: 64-bit emacs crashes a lot
In-Reply-To: <52067FDD.4000708@cornell.edu>

On 10/08/2013 2:01 PM, Ken Brown wrote:
> On 8/10/2013 11:24 AM, Ryan Johnson wrote:
>> On 10/08/2013 9:59 AM, Ken Brown wrote:
>>> On 8/9/2013 11:28 PM, Ryan Johnson wrote:
>>>> On 08/08/2013 2:00 PM, Ryan Johnson wrote:
>>>>> On 08/08/2013 1:42 PM, Ken Brown wrote:
>>>>>> On 8/5/2013 11:29 AM, Ryan Johnson wrote:
>>>>>>> On 05/08/2013 11:00 AM, Ken Brown wrote:
>>>>>>>> On 8/3/2013 3:05 PM, Ryan Johnson wrote:
>>>>>>>>> On 02/08/2013 8:07 AM, Ryan Johnson wrote:
>>>>>>>>>> On 02/08/2013 7:04 AM, Ken Brown wrote:
>>>>>>>>>>> On 8/2/2013 4:02 AM, Corinna Vinschen wrote:
>>>>>>>>>>>> On Aug 1 22:46, Ryan Johnson wrote:
>>>>>>>>>>>>> Here's a new one... I started a compilation, but before it
>>>>>>>>>>>>> actually invoked the command it started pegging the CPU. After
>>>>>>>>>>>>> ^G^G^G, it crashed with the following:
>>>>>>>>>>>>>> Auto-save? (y or n) y
>>>>>>>>>>>>>> 0 [main] emacs 5076 C:\cygwin64\bin\emacs-nox.exe: *** fatal
>>>>>>>>>>>>>> error - Internal error: TP_NUM_W_BUFS too small 2268032 >= 10.
>>>>>>>>>>>>
>>>>>>>>>>>> That looks like a memory overwrite. 2268032 is 0x229b80, which
>>>>>>>>>>>> looks suspiciously like a stack address. And the overwritten
>>>>>>>>>>>> value is on the stack, too, well within the cygwin TLS area.
>>>>>>>>>>>> If *this* value gets overwritten, the TLS is probably totally
>>>>>>>>>>>> hosed at this point. There's just no way to infer the culprit
>>>>>>>>>>>> from this limited info.
>>>>>>>>>>>
>>>>>>>>>>> Could this be BLODA? Ryan, I noticed that you wrote in a
>>>>>>>>>>> different thread, "I recently migrated to 64-bit cygwin...and
>>>>>>>>>>> so far have not had to disable Windows Defender; the latter was
>>>>>>>>>>> a recurring source of trouble for my previous 32-bit cygwin
>>>>>>>>>>> install on Win7/64."
>>>>>>>>>> This would be a whole new level of nasty from a BLODA... I
>>>>>>>>>> thought they only interfered with fork()?
>>>>>>>>>>
>>>>>>>>>> However, this *is* Windows Defender we're talking about...
>>>>>>>>>> service disabled and all cygwin processes restarted. I'll let
>>>>>>>>>> you know in a day or so if the crashes go away.
>>>>>>>>> Rats. I just had another crash, the "Fatal error 6" variety.
>>>>>>>>> Windows Defender has not turned itself back on (it's been known
>>>>>>>>> to do that), and a scan of the BLODA list didn't match anything
>>>>>>>>> else on my system.
>>>>>>>>>
>>>>>>>>> So I don't think it's BLODA...
>>>>>>>>>
>>>>>>>>> Ideas?
>>>>>>>>
>>>>>>>> Not really, other than the obvious: (a) Find a reproducible way of
>>>>>>>> making emacs-nox crash. (b) Catch the crash in gdb by setting a
>>>>>>>> suitable break point.
>>>> Got one! Looks like a stack overflow somewhere in the garbage
>>>> collector:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 5316.0x1af4]
>>>> 0x00000001004df44a in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>>> 5903          if (CONS_MARKED_P (ptr))
>>>> (gdb) bt
>>>> #0  0x00000001004df44a in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5903
>>>> #1  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #2  0x00000001004df593 in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5809
>>>> #3  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #4  0x00000001004df66e in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5914
>>>> #5  0x00000001004df585 in mark_object (arg=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5808
>>>> #6  0x00000001004dfa4e in mark_vectorlike (
>>>>     ptr=0x100f66f28 )
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5501
>>>> ... snip ...
>>>> #2606 0x00000001004dfaf4 in mark_buffer (buffer=<optimized out>)
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5552
>>>> #2607 0x00000001004dff2c in Fgarbage_collect ()
>>>>     at /usr/src/debug/emacs-24.3-4/src/alloc.c:5181
>>>> #2608 0x0000000000000000 in ?? ()
>>>
>>> I don't know whether 2608 stack frames is unusual or not. Is this
>>> enough to cause a stack overflow?
>> I don't know the answer to that for emacs, but in general that's an
>> exceedingly deep stack that would normally indicate some sort of
>> infinite recursion. Would you actually expect an object tree in emacs
>> to be 2000+ pointers deep? No plausible non-bug scenarios leap to mind
>> right off...
>
> I'd be very surprised if there were a bug in the garbage collection
> routine that's causing this. If there were, I'd expect to see lots of
> people reporting this. Could there be some memory corruption that
> creeps in when you suspend/resume emacs? You did say that the crashes
> are less frequent since you deactivated Windows Defender, so I'm not
> sure you can rule out BLODA.
>
> By the way, are your crashes always related to suspending and resuming
> emacs? I don't recall that you said that before, but you keep
> mentioning ^Z. Do you still get crashes if you never suspend emacs?
> You could also try one of the GUI versions of emacs to see if you get
> crashes. "Suspending" in that case simply iconifies the frame.
>
>>>
>>>
>>>> I have the full backtrace saved to file, let me know if that would be
>>>> useful (there wasn't anything obvious that I could see, just more of
>>>> the same). Meanwhile, I verified that none of the addresses printed is
>>>> repeated, so it doesn't seem to be due to an obvious cycle in the
>>>> object graph.
>>>
>>> From what you've shown, it appears that most of the addresses have
>>> been optimized out. I think you would need an unoptimized build in
>>> order to check that, wouldn't you?
>> Probably, yes. That's why I said no "obvious" cycles -- at least the 400
>> pointers that are shown don't show a problem.
>>
>>>
>>>> The crash happened when I foregrounded a stopped emacs. I tried
>>>> playing around with various breakpoints while repeatedly sending ^Z,
>>>> but no luck repeating the "feat" yet.
>>>>
>>>> Ideas?
>>>
>>> Can you trigger the bug by calling garbage collection manually (M-x
>>> garbage-collect)? What happens if you put a breakpoint at
>>> Fgarbage_collect and step through it? (Again, you might need an
>>> unoptimized build before that will be useful.)
>> I tried breaking on Fgarbage_collect and hitting ^Z: no love. I also
>> tried setting a breakpoint on one of those other internal functions,
>> with an ignore count intended to trigger it deep in a GC cycle. It
>> triggered some tens of frames deep and ^Z there didn't cause trouble
>> either. I wonder if the GC cycle just happened to coincide with
>> reactivating emacs (perhaps triggered by some internal timeout that
>> elapsed while it was stopped?)
>>
>>>
>>> There are lots of lisp variables that can be used to control garbage
>>> collection and get information about it. See the section on garbage
>>> collection in the elisp manual. For example, you could try
>>> customizing garbage-collection-messages. Or you could play with
>>> gc-cons-threshold.
>> I didn't see anything glaringly useful there... the messages just
>> announce a GC run, which gdb can catch just fine. There doesn't seem to
>> be any way of tracking how deep an object tree emacs traversed, or how
>> many objects were freed.
>
> Sorry, I misread what the message would be. I should have said that
> you could look directly at the output from garbage-collect, which you
> can see if you evaluate (garbage-collect) in the *scratch* buffer.
> But, as I said above, I'm not sure that garbage collection is the
> underlying problem here.

Agree it's probably not GC... GC would just tend to trip over any bad
pointers that were lurking around...

After a rash of crashes where I either forgot to attach gdb or forgot to
set appropriate breakpoints, I finally managed to catch the stack trace
below. It occurred during M-x compile, while emacs was parsing the
compilation's rather copious output; that is by far the most common
crash scenario I've been hitting lately. I have no idea how to interpret
the backtrace, though.

What should I try next? I assume I'll need a debug-compiled emacs so the
backtrace isn't garbage? If so, (a) what is the most straightforward way
to compile emacs-nox that way, and (b) what would I be looking for if I
encountered the below stack trace in a debug build?
Thanks,
Ryan

Breakpoint 2, 0x000000010055d190 in kill ()
(gdb) bt
#0  0x000000010055d190 in kill ()
#1  0x000000010053702e in process_send_signal (process=process@entry=25781889629,
    signo=signo@entry=2, current_group=, nomsg=nomsg@entry=0)
    at /usr/src/debug/emacs-24.3-4/src/process.c:5948
#2  0x0000000100537198 in Finterrupt_process (process=25781889629,
    current_group=) at /usr/src/debug/emacs-24.3-4/src/process.c:5966
#3  0x00000001004f7761 in Ffuncall (nargs=, args=)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2781
#4  0x000000010052b5ed in exec_byte_code (bytestr=4294962344,
    vector=2268896, maxdepth=2, args_template=4303595040,
    nargs=4304157760, args=0x100902032 )
    at /usr/src/debug/emacs-24.3-4/src/bytecode.c:900
#5  0x00000001004f7293 in funcall_lambda (fun=25778101277,
    nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x22a188)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:3010
#6  0x00000001004f75cb in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x22a180) at /usr/src/debug/emacs-24.3-4/src/eval.c:2839
#7  0x00000001004f8bef in apply1 (fn=25778613730, fn@entry=4304161216,
    arg=arg@entry=4304412722) at /usr/src/debug/emacs-24.3-4/src/eval.c:2539
#8  0x00000001004f3567 in Fcall_interactively (function=4304161216,
    record_flag=4304412722, keys=4299711881)
    at /usr/src/debug/emacs-24.3-4/src/callint.c:377
#9  0x00000001004f7752 in Ffuncall (nargs=nargs@entry=4,
    args=args@entry=0x22a3b0) at /usr/src/debug/emacs-24.3-4/src/eval.c:2785
#10 0x00000001004f91b7 in call3 (fn=, arg1=, arg2=, arg3=)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:2603
#11 0x00000001004883cd in Fcommand_execute (cmd=, record_flag=,
    keys=, special=) at /usr/src/debug/emacs-24.3-4/src/keyboard.c:10241
#12 0x0000000100494ae8 in command_loop_1 ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1587
#13 0x00000001004f5c2e in internal_condition_case (
    bfun=bfun@entry=0x100494740 , handlers=4304470642,
    hfun=hfun@entry=0x10048ae40 ) at /usr/src/debug/emacs-24.3-4/src/eval.c:1289
#14 0x000000010048630a in command_loop_2 (ignore=ignore@entry=4304412722)
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1168
#15 0x00000001004f5aef in internal_catch (tag=,
    func=func@entry=0x1004862e0 , arg=4304412722)
    at /usr/src/debug/emacs-24.3-4/src/eval.c:1060
#16 0x000000010048a914 in command_loop ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:1147
#17 recursive_edit_1 () at /usr/src/debug/emacs-24.3-4/src/keyboard.c:779
#18 0x000000010048ac47 in Frecursive_edit ()
    at /usr/src/debug/emacs-24.3-4/src/keyboard.c:843
#19 0x000000010055e8ef in main (argc=, argv=)
    at /usr/src/debug/emacs-24.3-4/src/emacs.c:1537

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple