Mail Archives: cygwin/2018/09/04/17:06:06

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <5b8ef3af.1c69fb81.6801.f392@mx.google.com>
Date: Tue, 04 Sep 2018 14:05:51 -0700 (PDT)
From: Steven Penny <svnpenn AT gmail DOT com>
To: cygwin AT cygwin DOT com
References: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw AT mail DOT gmail DOT com>
Subject: Re: Cygwin fails to utilize Unicode replacement character
User-Agent: Tryst/2.8.0 (cup.github.io/tryst)

On Tue, 4 Sep 2018 13:59:10, Doug Henderson wrote:
> My preference is to remove the output fiddling code that Corinna has
> been working on. It is trying to solve the wrong problem.
> I think we have gone down a rabbit hole at the wrong end of cat's data flow.

this has nothing to do with "cat". it has to do with the unfounded design
decision to use U+2592. Granted at this point we are bikeshedding - but an
official standard does exist, namely Unicode, with 2 applicable characters for
this use case:

1. U+FFFD REPLACEMENT CHARACTER: http://unicode.org/charts/nameslist/n_FFF0.html
2. U+25A1 WHITE SQUARE: http://unicode.org/charts/nameslist/n_25A0.html
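
for anyone who wants to check the names of the code points mentioned in this thread, here is a quick lookup sketch using Python's standard "unicodedata" module. it is only an illustration and has nothing to do with the Cygwin or Mintty code:

```python
import unicodedata

# Code points mentioned in this thread: the two candidates plus the
# character that was being used (U+2592).
candidates = ["\uFFFD", "\u25A1", "\u2592"]

# Map "U+XXXX" to the official Unicode character name.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in candidates}

for code_point, name in names.items():
    print(code_point, name)
```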

> Should any changes to the way a character is displayed be required, it
> needs to be in the terminal program that displays the character, not in
> cygwin which should pass the character along unmodified.

the "terminal" in this case is either "cygwin" or "xterm" - in both cases code
changes have already been made in response to this thread, so i don't think your
comment here holds weight.

> Both cygwin and Debian 9.5 show:
>
>     $ file alfa.txt
>     alfa.txt: ISO-8859 text
>
> When Linux reads the file, it assumes the encoding is UTF-8.
> When cygwin reads the file, it assumes the encoding is CP1252.
> This command shows the problem
>
>     $ iconv -f utf8 alfa.txt
>     iconv: alfa.txt:1:0: incomplete character or shift sequence
>
> On Linux, this shows a slightly different message, with the same intent.
>
> Try using this string:
>
>     $ printf "\xC3\xAB\353\n"
>     ë▒
>
> to get a better understanding of the problem. It contains two
> representations of LATIN SMALL LETTER E WITH DIAERESIS, first encoded
> in UTF-8, then using ISO-8859-1.

now it appears *you* are going down the rabbit hole. both Cygwin and Mintty were
in violation of the Unicode standard - however this has already been remedied in
the code.
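
to make the byte-level behavior concrete, here is a minimal Python sketch using the same bytes as the quoted printf example. it shows what a standard UTF-8 decoder does with the stray 0xEB byte; it is an illustration under that assumption, not the actual Cygwin or Mintty code path:

```python
# Same bytes as printf "\xC3\xAB\353\n":
# UTF-8 "ë" (C3 AB), then ISO-8859-1 "ë" (octal 353 = 0xEB), then newline.
data = b"\xC3\xAB\353\n"

# Strict decoding rejects the input, much like "iconv -f utf8" does.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD for the invalid byte.
replaced = data.decode("utf-8", errors="replace")

# The same single byte is a valid character when read as ISO-8859-1.
latin1 = data[2:3].decode("iso-8859-1")

print(strict_ok, [f"U+{ord(c):04X}" for c in replaced], latin1)
```

the lenient decode yields U+00EB, U+FFFD, U+000A, which is the substitution the Unicode standard describes for ill-formed input.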

> There are two different reasons for the MEDIUM SHADE. Here it
> indicates an invalid UTF-8 character, and the font does not have a
> glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also used in
> place of an ordinary character without a glyph in the font.

this is flat wrong. U+2592 MEDIUM SHADE is *only* used in cases of invalid
UTF-8. in the case of a character missing from the font, the ".notdef" glyph is
used, as has been discussed several times in this thread. ".notdef" is a glyph,
not an actual character, so i cannot paste it here - but as an example, with
"DejaVu Sans Mono" the glyph is an empty rectangle.
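
the two cases can be sketched as separate branches. this is a hypothetical model only: "displayed_cells", "font_coverage", "invalid_marker" and the "<.notdef>" label are names made up for illustration, and a real terminal asks the font renderer for glyphs instead of checking a set:

```python
# Placeholder label: .notdef is a glyph in the font, not a character,
# so it cannot be represented as a code point here.
NOTDEF = "<.notdef>"

def displayed_cells(raw, font_coverage, invalid_marker="\u2592"):
    """Return what a hypothetical terminal would show for each decoded unit.

    raw            -- bytes received by the terminal
    font_coverage  -- set of characters the (hypothetical) font has glyphs for
    invalid_marker -- character shown for invalid UTF-8 (U+2592 in the
                      behavior discussed here; U+FFFD per the Unicode standard)
    """
    cells = []
    # Simplification: a genuine U+FFFD in the input is treated like a
    # decoding error, because errors="replace" makes them indistinguishable.
    for ch in raw.decode("utf-8", errors="replace"):
        if ch == "\ufffd":
            cells.append(invalid_marker)   # case 1: invalid UTF-8
        elif ch in font_coverage:
            cells.append(ch)               # normal case: glyph exists
        else:
            cells.append(NOTDEF)           # case 2: valid character, no glyph
    return cells

# Invalid byte 0xEB after a valid "ë": marker, not .notdef
print(displayed_cells(b"\xc3\xab\xeb", {"\u00eb"}))
# Valid "€" that the font lacks: .notdef, not the marker
print(displayed_cells("\u20ac".encode("utf-8"), {"\u00eb"}))
```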


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
