Mail Archives: cygwin/2018/09/04/15:53:33

Message-ID: <5b8ee2ae.1c69fb81.7f961.3c7d@mx.google.com>
Date: Tue, 04 Sep 2018 12:53:18 -0700 (PDT)
From: Steven Penny <svnpenn AT gmail DOT com>
Subject: Re: Cygwin fails to utilize Unicode replacement character
To: cygwin AT cygwin DOT com
References: <4a728822-3c4f-c99f-51cd-63822445aa18 AT towo DOT net>
User-Agent: Tryst/2.8.0 (cup.github.io/tryst)

On Tue, 4 Sep 2018 20:41:48, Thomas Wolff wrote:
> No idea what you consider dangerous. Anyway, we obviously agree that 
> hardly any available console font supports the REPLACEMENT CHARACTER. 
> You had previously suggested code that might work (using CreateFont(0, 
> 0, ....)). Maybe you can sort out with Corinna how to get that work 
> inside cygwin. Otherwise, my opinion:
> - *working* fallback from FFFD to 2592: good

i am fine with this, but i think Corinna feels it is too much code for not
enough benefit - that's her decision.

> - fix FFFD: not good, because the .notdef glyph is not an appropriate 
> indication of illegal encoding (like broken UTF-8 bytes)

not sure what you even mean by this - FFFD doesn't need fixing - Windows just
needs to adopt some fonts with proper Unicode support. we are working around
the fact that it has not done so.

> the .notdef glyph is not an appropriate indication of illegal encoding (like
> broken UTF-8 bytes)

true, but neither is U+2592. as far as i know U+2592 is not defined officially
anywhere as being a representation of anything other than "MEDIUM SHADE".
Corinna originally added it in 2009:

http://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git&a=commitdiff&h=161211d

with no justification that i can find for why it was chosen. similarly, mintty
changed from U+FFFD to U+2592 in 2009:

http://github.com/mintty/mintty/commit/90c11d3

with a good reason actually, which was to avoid ambiguity with fonts that
didn't have U+FFFD. but again, no reason why U+2592 was chosen. i personally
see both sides of the argument but i tend to land on the side of standards
where they exist. Here is the standard for U+FFFD:

http://unicode.org/charts/nameslist/n_FFF0.html
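
For readers who want to check the naming point locally, here is a minimal sketch using Python's standard library. It is an illustration, not how Cygwin or mintty handle this; the sample bytes and variable names are made up for the example. It prints the official Unicode names of the two characters under discussion and shows a standards-following decoder emitting U+FFFD for a broken UTF-8 byte.

```python
import unicodedata

# Official Unicode character names for the two code points discussed above.
names = [unicodedata.name(ch) for ch in ("\uFFFD", "\u2592")]
print(names)  # ['REPLACEMENT CHARACTER', 'MEDIUM SHADE']

# 0xFF can never appear in well-formed UTF-8, so a decoder asked to
# "replace" errors substitutes U+FFFD for it. (Sample bytes are hypothetical.)
broken = b"abc\xffdef"
decoded = broken.decode("utf-8", errors="replace")
print(decoded == "abc\uFFFDdef")  # True
```

Whether the terminal can then draw U+FFFD is a separate question, which depends on the font.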

> - revert to 2592: OK

if we were to use something other than U+FFFD, I would propose U+25A1, as
Unicode also documents a related use for it:

    25A1	 □ 	White Square
    •	may be used to represent a missing ideograph

http://unicode.org/charts/nameslist/n_25A0.html

and it has better font support than U+FFFD:

    yes:
    - Consolas
    - Courier New
    - DejaVu Sans Mono
    - MS Gothic
    - NSimSun

    no:
    - Lucida Console
    - SimSun-ExtB
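
The fallback idea discussed in this thread (try U+FFFD, fall back to something the font can draw) can be sketched roughly as below. This is only an illustration under stated assumptions: `pick_replacement`, `glyph_test_for` and the `"?"` last resort are hypothetical names and choices, not Cygwin or mintty code, and the coverage table contains only the U+25A1 results listed above. A real implementation would query the font instead of a table.

```python
# Candidates in order of preference: the standard replacement character,
# then WHITE SQUARE, then MEDIUM SHADE (the current fallback).
CANDIDATES = ["\uFFFD", "\u25A1", "\u2592"]

# U+25A1 coverage as listed in this message. Coverage of the other
# candidates is not listed here, so it is left out rather than guessed.
HAS_WHITE_SQUARE = {
    "Consolas": True,
    "Courier New": True,
    "DejaVu Sans Mono": True,
    "MS Gothic": True,
    "NSimSun": True,
    "Lucida Console": False,
    "SimSun-ExtB": False,
}

def pick_replacement(has_glyph, candidates=CANDIDATES, last_resort="?"):
    """Return the first candidate that has_glyph() accepts, else last_resort."""
    for ch in candidates:
        if has_glyph(ch):
            return ch
    return last_resort

def glyph_test_for(font):
    """Hypothetical stand-in for a font query; it only knows about U+25A1."""
    return lambda ch: ch == "\u25A1" and HAS_WHITE_SQUARE.get(font, False)

print(pick_replacement(glyph_test_for("Consolas")) == "\u25A1")   # True
print(pick_replacement(glyph_test_for("Lucida Console")) == "?")  # True
```

Because the stand-in reports every other candidate as missing, the result for fonts without U+25A1 is the last resort. With real coverage data, U+2592 or U+FFFD would be picked wherever the font has them.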


