Mail Archives: cygwin/2013/08/16/02:00:11
On 16/08/2013 12:34 AM, Ryan Johnson wrote:
> On 15/08/2013 10:35 PM, Ken Brown wrote:
>> On 8/15/2013 4:55 PM, Ryan Johnson wrote:
>>> At this point I'm pretty confident it's memory corruption of some kind.
>>> Consider the following semi-STC:
>>> 1. Invoke: emacs-nox -Q; echo -e "att $(jobs -p)\nc" >
>>> /dev/clipboard; fg
>>> 2. ^Z
>>> 3. (switch to window running gdb and hit [shift]+[insert] to paste from
>>> clipboard)
>>> 5. (switch to window running emacs): M-x compile C-a C-k ls [ret]
>>> 6. C-x o (to switch to the compilation output window)
>>> 7. Hit 'g' to keep repeating the "compilation" until gdb picks up a
>>> crash.
>>
>> I tried a simpler version of this (without gdb and without
>> suspending/resuming):
>>
>> 1. Invoke 'emacs-nox -Q' in mintty.
>>
>> 2. M-x compile C-a C-k ls RET
>>
>> 3. C-x o
>>
>> 4. Hit 'g' repeatedly.
>>
>> I got it to abort with Fatal error 6 after slightly over 100
>> repetitions.
>>
>> I then tried the same thing with emacs-X11 (running under X, not in
>> mintty). I hit 'g' 200 times without a problem. I repeated this
>> with emacs-w32, again 200 times without a problem.
>>
>> So there's a bug somewhere. But if it's an emacs bug, it's strange
>> that it only occurs with emacs-nox and not with either of the GUI
>> versions of emacs.
> Well, at least I'm not (necessarily) crazy or BLODA-infested... out of
> curiosity, can you repro with 32-bit emacs-nox? I don't remember
> 32-bit being so crash-happy, which makes me wonder if something about
> 64-bit cygwin interacts poorly with emacs.
This is really weird... I got a crash in emacs compiled with `-g -O0',
but it makes no sense:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 7160.0xf70]
> 0x0000000100535d0f in regex_compile (pattern=0x6000ac580 "\\(?:^\\|::
> \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?::
> \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388,
> bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627
> 3627 || pending_exact + *pending_exact + 1 != b
> bt
> #0 0x0000000100535d0f in regex_compile (pattern=0x6000ac580
> "\\(?:^\\|:: \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?::
> \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388,
> bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627
The variable pending_exact has value 0x0, which would be a Bad Thing...
except that the code looks like this:
> if (!pending_exact
>
> /* If last exactn not at current position. */
> => || pending_exact + *pending_exact + 1 != b
>
... with corresponding assembly code looking very reasonable:
> 0x0000000100535cfa <regex_compile+34482>: cmpq $0x0,0x3f8(%rbp)
> 0x0000000100535d02 <regex_compile+34490>: je 0x100535eca
> <regex_compile+34946>
> 0x0000000100535d08 <regex_compile+34496>: mov 0x3f8(%rbp),%rax
> => 0x0000000100535d0f <regex_compile+34503>: movzbl (%rax),%eax
> 0x0000000100535d12 <regex_compile+34506>: movzbl %al,%eax
> 0x0000000100535d15 <regex_compile+34509>: lea 0x1(%rax),%rdx
> 0x0000000100535d19 <regex_compile+34513>: mov 0x3f8(%rbp),%rax
> 0x0000000100535d20 <regex_compile+34520>: add %rdx,%rax
> 0x0000000100535d23 <regex_compile+34523>: cmp %rbx,%rax
> 0x0000000100535d26 <regex_compile+34526>: jne 0x100535eca
> <regex_compile+34946>
Something apparently set 0x3f8(%rbp) to NULL during the very small
window between the cmpq and the mov two instructions later.
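
To spell out why this looks impossible, here is a minimal sketch of the same guard in isolation. The function name needs_new_exactn and the byte layout are made up for illustration and are not taken from regex.c; only the shape of the condition mirrors the excerpt above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the check at regex.c:3627; not the Emacs source.
   pending_exact, when non-NULL, is assumed to point at a length byte. */
static bool needs_new_exactn(const unsigned char *pending_exact,
                             const unsigned char *b)
{
    /* || short-circuits: the right operand, and therefore the dereference,
       is evaluated only when pending_exact is non-NULL. */
    return !pending_exact
           || pending_exact + *pending_exact + 1 != b;
}
```

In a single-threaded run with intact memory, a NULL pending_exact can never reach the dereference, so the fault implies that the stack slot changed between the two loads.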
A second crash hit here:
> #1 0x000000010052d589 in re_iswctype (ch=80, cc=RECC_ALPHA) at
> regex.c:2087
The default branch was taken even though cc should have matched the
RECC_ALPHA case:
> switch (cc)
> {
> case RECC_ALNUM: return ISALNUM (ch) != 0;
> case RECC_ALPHA: return ISALPHA (ch) != 0;
> case RECC_BLANK: return ISBLANK (ch) != 0;
> ....
> case RECC_ERROR: return false;
> default:
> => abort ();
> }
This time there's a jump table involved at the machine-code level, so I
couldn't easily go deeper into why the wrong jump target was chosen.
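
For reference, a small sketch of the same switch shape. The enum demo_class and the function demo_classify are hypothetical, not the real RECC_* definitions, and the default branch returns -1 where the original calls abort ().

```c
#include <ctype.h>

/* Hypothetical enum for illustration; not the real RECC_* definitions. */
enum demo_class { DEMO_ALNUM, DEMO_ALPHA, DEMO_BLANK, DEMO_ERROR };

/* Returns 1 or 0 for a known class, -1 where the original calls abort (). */
static int demo_classify(int ch, int cc)
{
    switch (cc) {
    case DEMO_ALNUM: return isalnum(ch) != 0;
    case DEMO_ALPHA: return isalpha(ch) != 0;
    case DEMO_BLANK: return isblank(ch) != 0;
    case DEMO_ERROR: return 0;
    default:         return -1;   /* reached only for a value outside the enum */
    }
}
```

With a valid class value the default branch is unreachable, which is why landing in abort () with cc=RECC_ALPHA points to a bad jump rather than bad input.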
A third crash:
> #1 0x0000000100541930 in re_match_2_internal (bufp=0x10095ce20
> <searchbufs+2912>, string1=0x0, size1=0, string2=0x6fffff00028 "-*-
> mode: compilation; default-directory: \"~/\" -*-\nCompilation started
> at Fri Aug 16 01:32:19\n\nls\n#message-20130808-090732#\t
> emacs-crash.txt\t\tmusic\n6b8ob06a.default.tar.xz\t\t
> emacs-nox.exe."..., size2=355, pos=254, regs=0x10095def0
> <search_regs>, stop=317) at regex.c:6217
> 6217 abort ();
This time, p (the subject of the switch statement) points to 0x76b3b6c7,
which is the middle of a function (ntdll!RtlFillMemory, though the
memory map places that address smack in the middle of kernel32.dll
instead). This time it makes perfect sense that the switch statement
should fail, but how did p go so wrong?
Even more strangely, it seems to be deterministic: a second crash there
had exactly the same address as before.
The fifth crash was a repeat of the NULL pending_exact scenario that
came first.
One last observation, or perhaps just superstition: if gdb reports a
single thread being created at some point during the compile-fest, a
crash usually follows soon after. If no threads are created after gdb
attaches and continues, or if two threads are created in quick
succession, the crash never comes (where "never" = 300+ successful
compiles). I have no idea why that would mean anything, though...
I'm officially stumped at this point... any ideas?
Ryan
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple