Mail Archives: cygwin/2018/09/04/15:59:42

From: Doug Henderson <djndnbvg AT gmail DOT com>
Date: Tue, 4 Sep 2018 13:59:10 -0600
Message-ID: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw@mail.gmail.com>
Subject: Re: Cygwin fails to utilize Unicode replacement character
To: cygwin <cygwin AT cygwin DOT com>

On Sat, 1 Sep 2018 at 10:13, Steven Penny  wrote:
<snip>
> You get this result with Linux:
>
>     $ cat alfa.txt
>     �
>
> Where "cat" properly outputs Unicode 'REPLACEMENT CHARACTER' (U+FFFD). However
> with Cygwin you get this:
>
>     $ cat alfa.txt
>     ▒
>
> Where "cat" outputs Unicode Character 'MEDIUM SHADE' (U+2592).
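
For reference, the two characters also differ at the byte level. Here is a minimal sketch that dumps their UTF-8 encodings. It assumes a POSIX shell with printf, od and tr, and hex_bytes is a hypothetical helper:

```shell
#!/bin/sh
# Print the UTF-8 bytes of U+FFFD and U+2592.
# Octal escapes are used because POSIX printf does not guarantee \x escapes.

# hex_bytes (hypothetical helper): dump stdin as lowercase hex, no spaces or newlines.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

printf 'U+FFFD REPLACEMENT CHARACTER: '
printf '\357\277\275' | hex_bytes   # efbfbd
echo
printf 'U+2592 MEDIUM SHADE:          '
printf '\342\226\222' | hex_bytes   # e29692
echo
```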


My preference is to remove the output-fiddling code that Corinna has
been working on. It is trying to solve the wrong problem.
I think we have gone down a rabbit hole at the wrong end of cat's data flow.

If any change to the way a character is displayed is required, it
belongs in the terminal program that displays the character, not in
Cygwin, which should pass the character along unmodified.
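
You can check directly that cat hands the bytes over untouched. Here is a minimal sketch that assumes a POSIX shell. The single ISO-8859-1 byte 0xEB (octal 353) is only an illustrative stand-in, because the exact contents of alfa.txt are not shown here. hex_bytes is a hypothetical helper:

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): dump stdin as lowercase hex, no spaces or newlines.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

# Illustrative input: ISO-8859-1 'ë' (0xEB) followed by a newline.
in_bytes=$(printf '\353\n' | hex_bytes)
out_bytes=$(printf '\353\n' | cat | hex_bytes)

echo "before cat: $in_bytes"    # eb0a
echo "after cat:  $out_bytes"   # eb0a, so cat changed nothing
```

The terminal and its font decide afterwards how those bytes are drawn.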

Both Cygwin and Debian 9.5 show:

    $ file alfa.txt
    alfa.txt: ISO-8859 text

When Linux reads the file, it assumes the encoding is UTF-8.
When Cygwin reads the file, it assumes the encoding is CP1252.
This command shows the problem:

    $ iconv -f utf8 alfa.txt
    iconv: alfa.txt:1:0: incomplete character or shift sequence
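
The sketch below shows why the UTF-8 decode fails and how naming the right source encoding fixes it. It assumes a glibc-style iconv, which rejects malformed input when asked to convert UTF-8 to UTF-8. The byte 0xEB piped on stdin is a hypothetical stand-in for the contents of alfa.txt:

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): dump stdin as lowercase hex, no spaces or newlines.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

# 0xEB is the lead byte of a 3-byte UTF-8 sequence, but the next byte (0x0A)
# is not a continuation byte, so decoding the input as UTF-8 fails.
if printf '\353\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    utf8_status=valid
else
    utf8_status=invalid
fi

# Decoding with the encoding the byte was written in succeeds and
# re-encodes 'ë' as UTF-8 (C3 AB).
converted=$(printf '\353\n' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes)

echo "read as UTF-8:       $utf8_status"   # invalid
echo "ISO-8859-1 -> UTF-8: $converted"     # c3ab0a
```

For the real file, the equivalent command would be `iconv -f ISO-8859-1 -t UTF-8 alfa.txt`. CP1252 maps 0xEB to the same letter, so it would give the same result here.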

On Linux, this prints a slightly different message with the same meaning.

Try using this string:

    $ printf "\xC3\xAB\353\n"
    ë▒

to get a better understanding of the problem. It contains two
representations of LATIN SMALL LETTER E WITH DIAERESIS (U+00EB): first
encoded in UTF-8 (\xC3\xAB), then in ISO-8859-1 (\353, i.e. 0xEB).
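
Here is the same test string at the byte level, as a minimal sketch that assumes a POSIX shell and iconv. The octal escapes \303\253 are the portable spelling of \xC3\xAB, and \353 is kept as in the original command. hex_bytes is a hypothetical helper:

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): dump stdin as lowercase hex, no spaces or newlines.
hex_bytes() { od -An -tx1 | tr -d ' \n'; }

whole=$(printf '\303\253\353' | hex_bytes)   # c3abeb
utf8_form=$(printf '\303\253' | hex_bytes)   # c3ab: 'ë' in UTF-8
# Convert the lone ISO-8859-1 byte to UTF-8 to show it is the same letter.
latin1_form=$(printf '\353' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes)

echo "whole string:             $whole"
echo "UTF-8 form:               $utf8_form"
echo "ISO-8859-1 form as UTF-8: $latin1_form"   # c3ab
```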

The MEDIUM SHADE can appear for two different reasons. Here it
indicates an invalid UTF-8 sequence, shown as MEDIUM SHADE because the
font has no glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also
used in place of an ordinary character that has no glyph in the font.
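
To tell the two cases apart, check whether the bytes are valid UTF-8 before blaming the font. This rough sketch assumes a glibc-style iconv that rejects malformed input when converting UTF-8 to UTF-8. The helper names is_valid_utf8 and classify are hypothetical:

```shell
#!/bin/sh
# is_valid_utf8 (hypothetical helper): succeed if stdin is well-formed UTF-8.
is_valid_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

# classify (hypothetical helper): label stdin as valid or invalid UTF-8.
classify() {
    if is_valid_utf8; then echo "valid UTF-8"; else echo "invalid UTF-8"; fi
}

replacement=$(printf '\357\277\275\n' | classify)   # U+FFFD itself
shade=$(printf '\342\226\222\n' | classify)         # U+2592 itself
latin1=$(printf '\353\n' | classify)                # lone ISO-8859-1 byte 0xEB

echo "U+FFFD:    $replacement"   # valid UTF-8
echo "U+2592:    $shade"         # valid UTF-8
echo "0xEB byte: $latin1"        # invalid UTF-8
```

If the bytes are valid UTF-8 and a MEDIUM SHADE still appears, the font is the more likely cause. If they are invalid, the fix belongs on the encoding side.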

HTH
Doug

-- 
Doug Henderson, Calgary, Alberta, Canada - from gmail.com

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


  Copyright © 2019   by DJ Delorie     Updated Jul 2019