Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Subject: | Re: Cygwin fails to utilize Unicode replacement character |
To: | cygwin AT cygwin DOT com |
References: | <4a728822-3c4f-c99f-51cd-63822445aa18 AT towo DOT net> <5b8ee2ae DOT 1c69fb81 DOT 7f961 DOT 3c7d AT mx DOT google DOT com> |
From: | Thomas Wolff <towo AT towo DOT net> |
Message-ID: | <5c366e53-ad20-7ccc-5d76-c4fd5adefdf9@towo.net> |
Date: | Tue, 4 Sep 2018 23:43:16 +0200 |
User-Agent: | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 |
MIME-Version: | 1.0 |
In-Reply-To: | <5b8ee2ae.1c69fb81.7f961.3c7d@mx.google.com> |
On 04.09.2018 at 21:53, Steven Penny wrote:
> On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
> ...
>> the .notdef glyph is not an appropriate indication of illegal
>> encoding (like broken UTF-8 bytes)
>
> true, but neither is U+2592. as far as i know U+2592 is not defined officially
> anywhere as being a representation of anything other than "MEDIUM SHADE".

Traditionally, many terminals displayed the DEL character as a checkered block, which is more or less the MEDIUM SHADE. By convention, this gives the glyph a somewhat "erroneous" appearance.

> Corinna originally added it in 2009:
>
> http://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git&a=commitdiff&h=161211d
>
> with no justification of why it was chosen that i can tell.

The justification is the traditional usage of the symbol as described above.

> similarly, mintty actually changed from U+FFFD to U+2592 in 2009:
>
> http://github.com/mintty/mintty/commit/90c11d3
>
> with actually a good reason, which was to avoid ambiguity with fonts that didnt
> have U+FFFD. but again, no reason why U+2592 was chosen. i personally see both
> sides of the argument but i tend to land of the side of any standards if they
> exist.
> Here is the standard for U+FFFD:
>
> http://unicode.org/charts/nameslist/n_FFF0.html

  FFFD    �    Replacement Character
  •   used to replace an incoming character whose value is unknown or unrepresentable in Unicode

> if we were to use something other than U+FFFD, I would propose U+25A1, as it is
> also defined by Unicode:
>
>   25A1    □    White Square
>   •   may be used to represent a missing ideograph
>
> http://unicode.org/charts/nameslist/n_25A0.html

Quoting yourself from your other response:

> U+2592 MEDIUM SHADE is *only* used in cases of invalid UTF-8. In case
> of missing character - the ".notdef" glyph is used

This is my point.
We have two use cases here:

  invalid code point -> MEDIUM SHADE
  valid code point with no glyph in font -> .notdef glyph -> WHITE SQUARE

Now if you switch to FFFD REPLACEMENT CHARACTER for invalid code points, then, since it does not exist in most actual fonts and the console does not apply font fallback, it will resolve to WHITE SQUARE. That folds the two different use cases into the same appearance, which is bad.

Thomas
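To illustrate the two use cases, here is a rough Python sketch, not the actual Cygwin console code. The names `render`, `FONT`, and the `medium_shade` error handler are made up for this example, and the "font" is just a set of characters assumed to have glyphs (ASCII plus MEDIUM SHADE, but not U+FFFD). WHITE SQUARE stands in for the font's .notdef glyph, and no font fallback is applied.

```python
import codecs

MEDIUM_SHADE = "\u2592"  # marker for invalid UTF-8, as discussed in this thread
NOTDEF = "\u25a1"        # WHITE SQUARE, standing in for a font's .notdef glyph

def _medium_shade(err):
    # Error handler (hypothetical name): emit one MEDIUM SHADE per
    # undecodable byte sequence and resume decoding right after it.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    return (MEDIUM_SHADE, err.end)

codecs.register_error("medium_shade", _medium_shade)

# Assumed font coverage: printable ASCII plus MEDIUM SHADE, no U+FFFD.
FONT = {chr(c) for c in range(0x20, 0x7F)} | {MEDIUM_SHADE}

def render(raw: bytes, errors: str) -> str:
    # Step 1: decode, marking invalid input according to `errors`.
    text = raw.decode("utf-8", errors=errors)
    # Step 2: any character without a glyph falls back to .notdef.
    return "".join(ch if ch in FONT else NOTDEF for ch in text)

invalid = b"a\xffb"                   # 0xFF is never valid in UTF-8
missing = "a\u4e2db".encode("utf-8")  # valid code point, no glyph in FONT

print(render(invalid, "medium_shade"))  # invalid input, MEDIUM SHADE marker
print(render(missing, "medium_shade"))  # missing glyph, .notdef
print(render(invalid, "replace"))       # invalid input, U+FFFD marker
```

With the `medium_shade` handler the invalid input and the missing glyph render differently. With Python's built-in `replace` handler, which substitutes U+FFFD, the marker itself has no glyph in the assumed font, so both cases end up as the same white square.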