Mail Archives: cygwin/2013/08/16/02:00:11

Message-ID: <520DBFCD.4080808@cs.utoronto.ca>
Date: Fri, 16 Aug 2013 01:59:41 -0400
From: Ryan Johnson <ryan DOT johnson AT cs DOT utoronto DOT ca>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 64-bit emacs crashes a lot
In-Reply-To: <520DABDC.8020304@cs.utoronto.ca>

On 16/08/2013 12:34 AM, Ryan Johnson wrote:
> On 15/08/2013 10:35 PM, Ken Brown wrote:
>> On 8/15/2013 4:55 PM, Ryan Johnson wrote:
>>> At this point I'm pretty confident it's memory corruption of some kind.
>>> Consider the following semi-STC:
>>> 1. Invoke: emacs-nox -Q; echo -e "att $(jobs -p)\nc" > /dev/clipboard; fg
>>> 2. ^Z
>>> 3. (switch to window running gdb and hit [shift]+[insert] to paste from
>>> clipboard)
>>> 5. (switch to window running emacs): M-x compile C-a C-k ls [ret]
>>> 6. C-x o (to switch to the compilation output window)
>>> 7. Hit 'g' to keep repeating the "compilation" until gdb picks up a 
>>> crash.
>>
>> I tried a simpler version of this (without gdb and without 
>> suspending/resuming):
>>
>> 1. Invoke 'emacs-nox -Q' in mintty.
>>
>> 2. M-x compile C-a C-k ls RET
>>
>> 3. C-x o
>>
>> 4. Hit 'g' repeatedly.
>>
>> I got it to abort with Fatal error 6 after slightly over 100 
>> repetitions.
>>
>> I then tried the same thing with emacs-X11 (running under X, not in 
>> mintty).  I hit 'g' 200 times without a problem.  I repeated this 
>> with emacs-w32, again 200 times without a problem.
>>
>> So there's a bug somewhere.  But if it's an emacs bug, it's strange 
>> that it only occurs with emacs-nox and not with either of the GUI 
>> versions of emacs.
> Well, at least I'm not (necessarily) crazy or BLODA-infested... out of 
> curiosity, can you repro with 32-bit emacs-nox? I don't remember 
> 32-bit being so crash-happy, which makes me wonder if something about 
> 64-bit cygwin interacts poorly with emacs.

This is really weird... I got a crash in emacs compiled with `-g -O0', 
but it makes no sense:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 7160.0xf70]
> 0x0000000100535d0f in regex_compile (pattern=0x6000ac580 "\\(?:^\\|::  
> \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?:: 
> \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388, 
> bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627
> 3627                  || pending_exact + *pending_exact + 1 != b
> bt
> #0  0x0000000100535d0f in regex_compile (pattern=0x6000ac580 
> "\\(?:^\\|::  \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?:: 
> \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388, 
> bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627

The variable pending_exact has value 0x0, which would be a Bad Thing... 
except that the code looks like this:
>           if (!pending_exact
>
>               /* If last exactn not at current position.  */
> =>            || pending_exact + *pending_exact + 1 != b
>
... with corresponding assembly code looking very reasonable:
>    0x0000000100535cfa <regex_compile+34482>:    cmpq   $0x0,0x3f8(%rbp)
>    0x0000000100535d02 <regex_compile+34490>:    je     0x100535eca <regex_compile+34946>
>    0x0000000100535d08 <regex_compile+34496>:    mov    0x3f8(%rbp),%rax
> => 0x0000000100535d0f <regex_compile+34503>:    movzbl (%rax),%eax
>    0x0000000100535d12 <regex_compile+34506>:    movzbl %al,%eax
>    0x0000000100535d15 <regex_compile+34509>:    lea    0x1(%rax),%rdx
>    0x0000000100535d19 <regex_compile+34513>:    mov    0x3f8(%rbp),%rax
>    0x0000000100535d20 <regex_compile+34520>:    add    %rdx,%rax
>    0x0000000100535d23 <regex_compile+34523>:    cmp    %rbx,%rax
>    0x0000000100535d26 <regex_compile+34526>:    jne    0x100535eca <regex_compile+34946>

Something apparently set 0x3f8(%rbp) to NULL during the very small 
window between the cmpq that checked it and the mov that reloaded it 
two instructions later.
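
To make the short-circuit argument concrete, here is a minimal sketch of 
the same test in isolation. The function name needs_new_exactn and its 
signature are hypothetical (they are not from regex.c); only the shape 
of the condition is taken from the line at regex.c:3627. If the value 
cannot change between the NULL check and the dereference, a NULL 
pending_exact never reaches the dereference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the condition at regex.c:3627.
   Returns true when the last exactn run cannot be extended, i.e. when
   there is no pending run or it does not end at the current position b. */
bool
needs_new_exactn (const unsigned char *pending_exact, const unsigned char *b)
{
  assert (b != NULL);

  /* `||' short-circuits: if pending_exact is NULL the right-hand
     operand is not evaluated, so *pending_exact is never read.  */
  return !pending_exact
         || pending_exact + *pending_exact + 1 != b;
}
```

Calling needs_new_exactn (NULL, b) simply returns true. The fault in the 
real code therefore requires the stack slot to change after the check, 
which is what the disassembly above suggests.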

A second crash hit here:
> #1  0x000000010052d589 in re_iswctype (ch=80, cc=RECC_ALPHA) at 
> regex.c:2087 

The default branch was taken even though cc should have matched the 
RECC_ALPHA case:
>   switch (cc)
>     {
>     case RECC_ALNUM: return ISALNUM (ch) != 0;
>     case RECC_ALPHA: return ISALPHA (ch) != 0;
>     case RECC_BLANK: return ISBLANK (ch) != 0;
>     ....
>     case RECC_ERROR: return false;
>     default:
> =>    abort ();
>     }

This time there's a jump table involved at the machine-code level, so I 
couldn't easily dig deeper into why the wrong jump target was chosen.
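
For anyone not used to reading jump tables, the following is a rough 
sketch of what a compiler typically does with a dense switch: a range 
check followed by an indexed indirect dispatch. Everything here is 
hypothetical (the reduced enum cc_t, the handler names, and the -1 
return standing in for abort ()); it is not the code gcc generated for 
re_iswctype. It only shows that the default path is reached when the 
value is out of range at dispatch time:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical, reduced set of character classes; the real enum in
   regex.c has more members.  */
typedef enum { CC_ERROR, CC_ALNUM, CC_ALPHA, CC_BLANK, CC_COUNT } cc_t;

typedef bool (*cc_handler) (int ch);

bool h_error (int ch) { (void) ch; return false; }
bool h_alnum (int ch) { return isalnum (ch) != 0; }
bool h_alpha (int ch) { return isalpha (ch) != 0; }
bool h_blank (int ch) { return isblank (ch) != 0; }

/* Stand-in for the compiler's jump table: one target per case label.  */
const cc_handler cc_table[CC_COUNT] = { h_error, h_alnum, h_alpha, h_blank };

/* Returns 1 or 0 for a known class, and -1 where the switch in
   re_iswctype would fall through to its default branch.  */
int
classify (cc_t cc, int ch)
{
  unsigned idx = (unsigned) cc;

  assert (ch >= 0 && ch <= 255);   /* the <ctype.h> functions need this range */

  if (idx >= CC_COUNT)             /* range check guarding the table lookup */
    return -1;
  return cc_table[idx] (ch) ? 1 : 0;  /* indexed indirect dispatch */
}
```

With this model, classify (CC_ALPHA, 80) cannot return -1. So if the 
real code behaves similarly, either cc held a different value when the 
dispatch happened than the one gdb shows in the frame, or the indirect 
jump itself went astray.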

A third crash:
> #1  0x0000000100541930 in re_match_2_internal (bufp=0x10095ce20 
> <searchbufs+2912>, string1=0x0, size1=0, string2=0x6fffff00028 "-*- 
> mode: compilation; default-directory: \"~/\" -*-\nCompilation started 
> at Fri Aug 16 01:32:19\n\nls\n#message-20130808-090732#\t 
> emacs-crash.txt\t\tmusic\n6b8ob06a.default.tar.xz\t\t 
> emacs-nox.exe."..., size2=355, pos=254, regs=0x10095def0 
> <search_regs>, stop=317) at regex.c:6217
> 6217              abort ();
Here p (the subject of the switch statement) points to 0x76b3b6c7, 
which is the middle of a function (ntdll!RtlFillMemory, though the 
memory map places that address smack in the middle of kernel32.dll 
instead). With p pointing there, it makes perfect sense that the switch 
statement should fail, but how did p go so wrong?

Even more strangely, it seems to be deterministic: a second crash there 
had exactly the same address as before.

The fifth crash was a repeat of the NULL pending_exact scenario that 
came first.

One last observation, or perhaps just superstition: if gdb reports a 
single thread being created at some point during the compile-fest, a 
crash usually follows soon after. If no threads are created after gdb 
attaches and continues, or if two threads are created in quick 
succession, the crash never comes (where "never" = 300+ successful 
compiles). I have no idea why that would mean anything, though...

I'm officially stumped at this point... any ideas?

Ryan




