Message-ID: <5b8ef3af.1c69fb81.6801.f392@mx.google.com>
Date: Tue, 04 Sep 2018 14:05:51 -0700 (PDT)
From: Steven Penny <svnpenn AT gmail DOT com>
To: cygwin AT cygwin DOT com
References: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw AT mail DOT gmail DOT com>
Subject: Re: Cygwin fails to utilize Unicode replacement character
User-Agent: Tryst/2.8.0 (cup.github.io/tryst)

On Tue, 4 Sep 2018 13:59:10, Doug Henderson wrote:
> My preference is to remove the output fiddling code that Corrina has
> been working on. It is trying to solve the wrong problem.
> I think we have gone down a rabbit hole at the wrong end of cat's data flow.

This has nothing to do with "cat". It has to do with the unfounded design
decision to use U+2592. Granted, at this point we are bikeshedding, but an
official standard does exist, namely Unicode, with two applicable characters
for this use case:

1. U+FFFD REPLACEMENT CHARACTER: http://unicode.org/charts/nameslist/n_FFF0.html
2. U+25A1 WHITE SQUARE: http://unicode.org/charts/nameslist/n_25A0.html

> Should any changes to the way a character is displayed be required, it
> needs to be in the terminal program that display the character, not in
> cygwin which should pass the character along unmodified.

The "terminal" in this case is either "cygwin" or "xterm". In both cases, code
changes have already been made in response to this thread, so I don't think
your comment here holds weight.

> Both cygwin and Debian 9.5 show:
>
> $ file alfa.txt
> alfa.txt: ISO-8859 text
>
> When Linux reads the file, it assumes the encoding is UTF-8.
> When cygwin reads the file, it assume the encoding is CP1252
> This command shows the problem
>
> $ iconv -f utf8 alfa.txt
> iconv: alfa.txt:1:0: incomplete character or shift sequence
>
> On Linux, this shows a slightly different message, with the same intent.
>
> Try using this string:
>
> $ printf "\xC3\xAB\353\n"
> ë▒
>
> to get a better understanding of the problem. It contains two
> representation of LATIN SMALL LETTER E WITH DIAERESIS, first encoded
> in UTF-8, then using ISO-8859-1.

Now it appears *you* are going down the rabbit hole. Both Cygwin and Mintty
were in violation of the Unicode standard; however, this has already been
remedied in the code.

> There are two different reasons for the MEDIUM SHADE. Here it
> indicates an invalid UTF-8 character, and the font does not have a
> glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also used in
> place of an ordinary character without a glyph in the font.

This is flat wrong. U+2592 MEDIUM SHADE is *only* used in cases of invalid
UTF-8. In case of a missing character, the ".notdef" glyph is used, as has
been discussed several times in this thread. This is not an actual character,
so I cannot paste it here, but as an example, with "DejaVu Sans Mono" the
glyph is an empty rectangle.
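To make the invalid-UTF-8 case concrete, here is a minimal sketch in Python (an illustration only; nobody in the thread used Python, and the function names are hypothetical). It decodes the same bytes as the printf example with Python's built-in `errors="replace"` handler, which substitutes U+FFFD REPLACEMENT CHARACTER for the ill-formed byte 0xEB, the substitution the Unicode standard recommends. It does not model how Mintty or xterm pick a glyph, or what happens when a font lacks one.

```python
import unicodedata
from typing import List, Optional

# Bytes from the thread's example: printf "\xC3\xAB\353\n"
# C3 AB encodes U+00EB in UTF-8; EB (octal 353) is U+00EB in ISO-8859-1.
DATA = b"\xC3\xAB\353\n"


def decode_utf8_with_replacement(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each ill-formed sequence."""
    return data.decode("utf-8", errors="replace")


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def describe(text: str) -> List[str]:
    """Label each code point 'U+XXXX NAME'; unnamed ones (e.g. controls) get '<control>'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}" for ch in text]


if __name__ == "__main__":
    for label in describe(decode_utf8_with_replacement(DATA)):
        print(label)
    # The lone third byte is a valid character only when read as ISO-8859-1.
    print("byte 2 as ISO-8859-1:", describe(DATA[2:3].decode("latin-1"))[0])
    print("strict UTF-8 decoding fails at offset", first_invalid_offset(DATA))
```

Run as a script, it lists U+00EB LATIN SMALL LETTER E WITH DIAERESIS, U+FFFD REPLACEMENT CHARACTER and U+000A, then reports that strict decoding fails at byte offset 2: the lone ISO-8859-1 byte that a strict UTF-8 decoder such as `iconv -f utf8` rejects.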
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |