Mail Archives: cygwin/2018/09/04/17:43:34

Subject: Re: Cygwin fails to utilize Unicode replacement character
To: cygwin AT cygwin DOT com
References: <4a728822-3c4f-c99f-51cd-63822445aa18 AT towo DOT net> <5b8ee2ae DOT 1c69fb81 DOT 7f961 DOT 3c7d AT mx DOT google DOT com>
From: Thomas Wolff <towo AT towo DOT net>
Message-ID: <5c366e53-ad20-7ccc-5d76-c4fd5adefdf9@towo.net>
Date: Tue, 4 Sep 2018 23:43:16 +0200

On 04.09.2018 at 21:53, Steven Penny wrote:
> On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
> ...
>> the .notdef glyph is not an appropriate indication of illegal 
>> encoding (like broken UTF-8 bytes)
>
> true, but neither is U+2592. as far as i know U+2592 is not defined officially
> anywhere as being a representation of anything other than "MEDIUM SHADE".
Traditionally, many terminals displayed the DEL character as a
checkered block, which is more or less the MEDIUM SHADE.
So, by convention, the glyph already reads as somewhat "erroneous".

> Corinna originally added it in 2009:
>
> http://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git&a=commitdiff&h=161211d 
>
> with no justification of why it was chosen that i can tell.
The justification is the traditional usage of the symbol, as described above.

> similarly, mintty
> actually changed from U+FFFD to U+2592 in 2009:
>
> http://github.com/mintty/mintty/commit/90c11d3
>
> with actually a good reason, which was to avoid ambiguity with fonts that
> didn't have U+FFFD. but again, no reason why U+2592 was chosen. i personally
> see both sides of the argument but i tend to land on the side of any
> standards if they exist.

> Here is the standard for U+FFFD:
>
> http://unicode.org/charts/nameslist/n_FFF0.html
>    FFFD     �     Replacement Character
>    •    used to replace an incoming character whose value is unknown or
>         unrepresentable in Unicode
>
> if we were to use something other than U+FFFD, I would propose U+25A1, as it
> is also defined by Unicode:
>
>    25A1     □     White Square
>    •    may be used to represent a missing ideograph
>
> http://unicode.org/charts/nameslist/n_25A0.html
Quoting yourself from your other response:
> U+2592 MEDIUM SHADE is *only* used in cases of invalid UTF-8. In case of
> missing character - the ".notdef" glyph is used
This is my point. We have two use cases here:
  invalid code point -> MEDIUM SHADE
  valid code point with no glyph in font -> .notdef glyph -> WHITE SQUARE
If you switch to U+FFFD REPLACEMENT CHARACTER for invalid code points,
then, since that character does not exist in most actual fonts and the
console does not apply font fallback, it will also resolve to WHITE SQUARE.
That folds the two different use cases into the same appearance,
which is bad.
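
To make the two cases concrete, here is a minimal Python sketch. It is only
a model and not the actual Cygwin or mintty code: the error-handler name
"medium_shade", the helpers render() and show(), and the toy FONT set are
all made up for illustration, and WHITE SQUARE merely stands in for the
font's .notdef glyph.

```python
import codecs

MEDIUM_SHADE = "\u2592"  # marker for invalid input, as discussed in this thread
WHITE_SQUARE = "\u25a1"  # stand-in for the font's .notdef glyph (assumption)


def _medium_shade_handler(err):
    # Hypothetical handler: emit U+2592 for the undecodable bytes and
    # resume decoding right after them.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    return (MEDIUM_SHADE, err.end)


codecs.register_error("medium_shade", _medium_shade_handler)


def render(text, font_glyphs):
    # Model a console without font fallback: any character the font
    # lacks is drawn as the .notdef stand-in.
    return "".join(ch if ch in font_glyphs else WHITE_SQUARE for ch in text)


def show(raw, errors, font_glyphs):
    # Decode the bytes with the given error policy, then "draw" the result.
    return render(raw.decode("utf-8", errors=errors), font_glyphs)


# Toy font: printable ASCII plus MEDIUM SHADE, but no U+FFFD and no CJK.
FONT = set(map(chr, range(0x20, 0x7F))) | {MEDIUM_SHADE}

# "ok ", one invalid byte (0xFF), a space, then valid UTF-8 for U+4E2D.
RAW = b"ok \xff \xe4\xb8\xad"

print(show(RAW, "medium_shade", FONT))  # ok ▒ □  (two distinct appearances)
print(show(RAW, "replace", FONT))       # ok □ □  (both cases look the same)
```

With the built-in "replace" policy the invalid byte becomes U+FFFD, which
the toy font lacks, so it ends up looking exactly like the missing glyph.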
Thomas

