Mail Archives: cygwin/2013/08/16/04:57:02
I'm not subscribed to this list, so if you want me to reply, please CC
me explicitly. Besides, this discussion should be moved to
emacs-devel AT gnu DOT org, since I don't see anything Cygwin-specific here
at this point.
> Date: Thu, 15 Aug 2013 16:55:18 -0400
> From: Ryan Johnson <ryan DOT johnson AT cs DOT utoronto DOT ca>
>
> On 15/08/2013 1:10 PM, Eli Zaretskii wrote:
> >> Date: Thu, 15 Aug 2013 12:58:02 -0400
> >> From: Ken Brown <kbrown AT cornell DOT edu>
> >> CC: Eli Zaretskii <eliz AT gnu DOT org>
> >>
> >> Eli is the expert on bidi.c (he wrote it). He can probably tell you
> >> whether you've really bumped into an emacs bug here.
> > There's nothing wrong with bidi.c here, it just aborts because it is
> > handed an invalid character codepoint. It would have been useful to
> > see the value of that character.
> I guess I would just consider crashing to be overkill for a bad byte on
> the input stream...
It's not a crash; it's a deliberate abort. Any invalid codepoint at
such a low level of the Emacs display engine means only one thing: a
bug, and a grave one at that. Such bugs must be flagged prominently
and unequivocally, prompting users to report them. We could in
principle "recover" by substituting some other character, but such
recovery would only sweep a grave problem under the carpet. Since
Emacs isn't a safety-critical program, and auto-saves your edits
before it commits suicide, such a recovery feature is deemed
inappropriate, and detrimental to the general quality of Emacs code in
the long run.
> and in any case, if 5-byte UTF-8 is illegal, and
> worth dying for, wouldn't it make sense to die right away rather than
> process it so that something else can croak down the road?
See above: yes, it's worth dying for, because I'm quite sure this is a
sign of very serious trouble in the session anyway. Why does it
matter to you, as a user, whether we abort here or "down the road"?
The principle is to die as soon as possible, because in many cases
this makes it faster and easier to identify the culprit. IOW, dying
early helps the Emacs maintainers find and fix problems without any
real effect on the users.
> > Anyway, I generally agree that this is probably some memory
> > corruption, as I'm guessing that the text in the window was all ASCII
> > in this case, so any character codepoint beyond 127 is not to be
> > expected.
> I set a breakpoint there, since I thought it was guaranteed to lead to a
> crash if it ever ran, but it turns out that's not true. Invoking M-x
> compile triggers the breakpoint twice in a row with the following
> (valid!) 5-byte UTF-8:
>
> 111110XX 10XXXXXX 10XXXXXX 10XXXXXX 10XXXXXX
> 11111000 10001111 10111111 10111101 10111111
>
> The value is always the same, and corresponds to the code point
> U+3FFF7F, FWIW.
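To double-check the arithmetic in the report above, here is a minimal C sketch that decodes those five bytes. decode_extended_utf8 is a hypothetical helper written only for this illustration, not Emacs's own decoder; it assumes the original UTF-8 layout from before RFC 3629 limited sequences to 4 bytes, in which a 111110XX lead byte starts a 5-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not Emacs code): decode one character from the
   original, pre-RFC 3629 UTF-8 layout, which allows 5-byte sequences.
   Returns the code point, or -1 if the input is not a well-formed sequence
   of the length announced by the lead byte.  Overlong forms are not
   rejected; this sketch only reassembles the payload bits. */
static long decode_extended_utf8(const unsigned char *s, size_t len)
{
    size_t n;   /* total sequence length announced by the lead byte */
    long cp;    /* payload bits accumulated so far */

    assert(s != NULL || len == 0);
    if (len == 0)
        return -1;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }        /* 0XXXXXXX */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; } /* 110XXXXX */
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; } /* 1110XXXX */
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; } /* 11110XXX */
    else if ((s[0] & 0xFC) == 0xF8) { n = 5; cp = s[0] & 0x03; } /* 111110XX */
    else
        return -1;                   /* stray continuation byte or other lead */

    if (len < n)
        return -1;                   /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* continuation bytes must be 10XXXXXX */
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}
```

Fed the bytes F8 8F BF BD BF, the loop accumulates 2 + 4 × 6 = 26 payload bits and returns 0x3FFF7F, the same value reported in the quoted message.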
If the value is non-negative and not above 3FFFFF, then the abort could not
have happened. Therefore, I believe that the optimized build lies to
GDB, and the actual value is not what you see in GDB.
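The argument can be made concrete with a range check of the kind the first abort implies. This is a sketch under stated assumptions: MAX_CHAR is taken to be 0x3FFFFF (a 22-bit character space), and checked_char is a hypothetical stand-in for the guard in bidi.c, using the standard abort() in place of emacs_abort().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: MAX_CHAR is 0x3FFFFF, i.e. character codes are 22 bits. */
#define MAX_CHAR 0x3FFFFF
static_assert(MAX_CHAR == (1L << 22) - 1, "character codes are 22 bits wide");

/* True for every code the range check would let through. */
static bool char_code_in_range(long ch)
{
    return ch >= 0 && ch <= MAX_CHAR;
}

/* Hypothetical fail-fast guard: abort at the first impossible value rather
   than substituting some other character and carrying on. */
static long checked_char(long ch)
{
    if (!char_code_in_range(ch))
        abort();
    return ch;
}
```

Since 0x3FFF7F lies inside the range, checked_char returns it unchanged; only a negative code or one above 0x3FFFFF reaches abort(), which is why the value shown by GDB cannot be the one that tripped this check.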
Alternatively (and that is also a known effect of debugging an
optimized build), the abort happened not where you think, but rather a
few lines below:
  default_type = (bidi_type_t) XINT (CHAR_TABLE_REF (bidi_type_table, ch));
  /* Every valid character code, even those that are unassigned by the
     UCD, have some bidi-class property, according to
     DerivedBidiClass.txt file. Therefore, if we ever get UNKNOWN_BT
     (= zero) code from CHAR_TABLE_REF, that's a bug. */
  if (default_type == UNKNOWN_BT)
    emacs_abort ();   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Optimized code frequently emits only one call to emacs_abort, and
converts the other calls to a jump to the locus of that single call.
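As a toy illustration of why the reported line can mislead, the function below has two separate abort sites shaped like the bidi.c excerpt. Everything here is hypothetical: toy_lookup stands in for the bidi_type_table lookup with a made-up two-class table, and TOY_UNKNOWN plays the role of UNKNOWN_BT. With optimization on (GCC enables -fcrossjumping at -O2), the compiler may merge both abort() calls into one, so a backtrace can report the line of either check.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHAR 0x3FFFFF   /* assumed upper bound on character codes */

/* Hypothetical classes; 0 plays the role of UNKNOWN_BT ("no class"). */
enum toy_type { TOY_UNKNOWN = 0, TOY_L, TOY_R };

/* Made-up stand-in for CHAR_TABLE_REF (bidi_type_table, ch): only ASCII
   and the Hebrew letters U+05D0..U+05EA get a class in this toy. */
static enum toy_type toy_lookup(long ch)
{
    if (ch >= 0 && ch < 0x80)
        return TOY_L;
    if (ch >= 0x5D0 && ch <= 0x5EA)
        return TOY_R;
    return TOY_UNKNOWN;
}

/* Two distinct abort sites.  An optimizing compiler may emit a single
   call to abort() and make both branches jump to it. */
static enum toy_type toy_get_type(long ch)
{
    if (ch < 0 || ch > MAX_CHAR)
        abort();                        /* site 1: impossible character code */
    enum toy_type type = toy_lookup(ch);
    if (type == TOY_UNKNOWN)
        abort();                        /* site 2: table has no class for ch */
    return type;
}
```

In this toy, 0x3FFF7F passes the first check but would abort at the second. Compiled with -O0, the two calls stay distinct, which is one reason an unoptimized build gives more trustworthy backtraces.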
I really suggest building an unoptimized Emacs and debugging that instead.
Debugging optimized builds, even with GCC 4.8, is a hard and
frustrating task. In particular, most of the backtraces you posted
don't make any sense at all -- a frequent problem in optimized builds.
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple