Mail Archives: cygwin/2013/08/16/04:57:02

Date: Fri, 16 Aug 2013 11:56:48 +0300
From: Eli Zaretskii <eliz AT gnu DOT org>
Subject: Re: 64-bit emacs crashes a lot
In-reply-to: <520D4036.8010303@cs.utoronto.ca>
To: Ryan Johnson <ryan DOT johnson AT cs DOT utoronto DOT ca>
Cc: cygwin AT cygwin DOT com
Reply-to: Eli Zaretskii <eliz AT gnu DOT org>
Message-id: <8361v6nhdb.fsf@gnu.org>

I'm not subscribed to this list, so if you want me to reply, please CC
me explicitly.  Besides, this discussion should be moved to
emacs-devel AT gnu DOT org, since I don't see anything Cygwin-specific
here at this point.

> Date: Thu, 15 Aug 2013 16:55:18 -0400
> From: Ryan Johnson <ryan DOT johnson AT cs DOT utoronto DOT ca>
> 
> On 15/08/2013 1:10 PM, Eli Zaretskii wrote:
> >> Date: Thu, 15 Aug 2013 12:58:02 -0400
> >> From: Ken Brown <kbrown AT cornell DOT edu>
> >> CC: Eli Zaretskii <eliz AT gnu DOT org>
> >>
> >> Eli is the expert on bidi.c (he wrote it).  He can probably tell you
> >> whether you've really bumped into an emacs bug here.
> > There's nothing wrong with bidi.c here, it just aborts because it is
> > handed an invalid character codepoint.  It would have been useful to
> > see the value of that character.
> I guess I would just consider crashing to be overkill for a bad byte on 
> the input stream...

It's not a crash, it's a deliberate abort.  Any invalid codepoint at
such a low level of the Emacs display engine means only one thing: a
bug, and a grave one at that.  Such bugs must be flagged prominently
and unequivocally, prompting users to report them.  We could in
principle "recover" by substituting some other character, but such
recovery would only sweep a grave problem under the carpet.  Since
Emacs isn't a safety-critical program, and auto-saves your edits
before it commits suicide, such a recovery feature is deemed
inappropriate, and detrimental to the general quality of Emacs code in
the long run.

> and in any case, if 5-byte UTF-8 is illegal, and 
> worth dying for, wouldn't it make sense to die right away rather than 
> processing it so something else can croak down the road?

See above: yes, it's worth dying for, because I'm quite sure this is a
sign of very serious trouble in the session anyway.  Why does it
matter to you, as a user, whether we abort here or "down the road"?
The principle is to die as soon as possible, because in many cases
this makes it faster and easier to identify the culprit.  IOW, dying
sooner helps the Emacs maintainers find and fix problems without any
real effect on the users.

> > Anyway, I generally agree that this is probably some memory
> > corruption, as I'm guessing that the text in the window was all ASCII
> > in this case, so any character codepoint beyond 127 is not to be
> > expected.
> I set a breakpoint there, since I thought it was guaranteed to lead to a 
> crash if it ever ran, but it turns out that's not true. Invoking M-x 
> compile triggers the breakpoint twice in a row with the following 
> (valid!) 5-byte UTF-8:
> 
> 111110XX 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX
> 11111000 10001111 10111111 10111101 10111111
> 
> The value is always the same, and corresponds to the code point 
> U+3FFF7F, FWIW.

If the value is positive and below 0x3FFFFF, then the abort could not
have happened at that range check.  Therefore, I believe that the
optimized build lies to GDB, and the actual value is not what you see
in GDB.
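
As a cross-check of the arithmetic only, here is a minimal sketch that
decodes the five bytes quoted above.  The function name
decode_5byte_sketch is made up for illustration and is not Emacs code;
it handles only the 111110XX lead-byte form shown in the quoted
message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: decode one sequence of the
   form 111110XX 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX.
   Returns the code point, or -1 if the bytes do not match that form.  */
long
decode_5byte_sketch (const unsigned char *b)
{
  assert (b != NULL);

  /* The lead byte must be 111110XX; it contributes 2 payload bits.  */
  if ((b[0] & 0xFC) != 0xF8)
    return -1;
  long cp = b[0] & 0x03;

  /* Each of the 4 trailing bytes must be 10XXXXXX and contributes
     6 payload bits, for 26 bits in total.  */
  for (size_t i = 1; i < 5; i++)
    {
      if ((b[i] & 0xC0) != 0x80)
        return -1;
      cp = (cp << 6) | (b[i] & 0x3F);
    }
  return cp;
}
```

Feeding it the bytes F8 8F BF BD BF gives 0x3FFF7F, which matches the
value reported and is below 0x3FFFFF, so a range check against that
limit would not fire for it.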

Alternatively (and that is also a known effect of debugging an
optimized build), the abort happened not where you think, but rather a
few lines below:

  default_type = (bidi_type_t) XINT (CHAR_TABLE_REF (bidi_type_table, ch));
  /* Every valid character code, even those that are unassigned by the
     UCD, have some bidi-class property, according to
     DerivedBidiClass.txt file.  Therefore, if we ever get UNKNOWN_BT
     (= zero) code from CHAR_TABLE_REF, that's a bug.  */
  if (default_type == UNKNOWN_BT)
    emacs_abort (); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Optimized code frequently emits only one call to emacs_abort, and
converts the other calls to a jump to the locus of that single call.
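
To make the two candidate abort sites easier to tell apart on paper,
here is a small sketch that reports which check would fire for a given
input.  It assumes the first check is a plain range test against
0x3FFFFF, as described above; the names which_abort_sketch,
MAX_CHAR_SKETCH and UNKNOWN_BT_SKETCH are hypothetical stand-ins, and
the table lookup is replaced by a parameter.

```c
/* Hypothetical stand-ins for the limits discussed in this message;
   these are not the Emacs definitions.  */
enum { MAX_CHAR_SKETCH = 0x3FFFFF, UNKNOWN_BT_SKETCH = 0 };

enum abort_site
{
  NO_ABORT,            /* neither check fires */
  ABORT_RANGE_CHECK,   /* codepoint outside 0..MAX_CHAR_SKETCH */
  ABORT_UNKNOWN_TYPE   /* table lookup returned the "unknown" type */
};

/* Report which of the two checks would fire.  TABLE_TYPE stands in
   for the value that the char-table lookup would return for CH.  */
enum abort_site
which_abort_sketch (long ch, int table_type)
{
  if (ch < 0 || ch > MAX_CHAR_SKETCH)
    return ABORT_RANGE_CHECK;
  if (table_type == UNKNOWN_BT_SKETCH)
    return ABORT_UNKNOWN_TYPE;
  return NO_ABORT;
}
```

Under these assumptions, a codepoint of 0x3FFF7F can only reach the
second site, and only if the table yields the "unknown" type for it.
In an optimized build both sites may share a single call instruction,
so the debugger alone cannot say which one was taken.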

I really suggest getting an unoptimized build and debugging that instead.
Debugging optimized builds, even with GCC 4.8, is a hard and
frustrating task.  In particular, most of the backtraces you posted
don't make any sense at all -- a frequent problem in optimized builds.

