Mail Archives: cygwin/2018/06/26/15:24:06

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Subject: Re: UTF-8 character encoding
To: cygwin AT cygwin DOT com
References: <CAD8GWss253v-p+FjeonEqibr53v6wZRCQ+NWxBhb0LimQaM4sQ AT mail DOT gmail DOT com> <1183751257 DOT 20180621042620 AT yandex DOT ru> <CAD8GWsuo3PuQSdSyMRhbxZQXa=GUSBcyes7QEaqDYfh3FCof0Q AT mail DOT gmail DOT com> <5B3045B1 DOT 4080504 AT tlinx DOT org> <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg AT mail DOT gmail DOT com>
From: Thomas Wolff <towo AT towo DOT net>
Message-ID: <981ba1fe-7961-5ed0-e3c7-a5717af8c141@towo.net>
Date: Tue, 26 Jun 2018 21:23:53 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg@mail.gmail.com>
X-IsSubscribed: yes

On 25.06.2018 at 20:33, Lee wrote:
> On 6/24/18, L A Walsh <cygwin AT tlinx DOT org> wrote:
>> Lee wrote:
>>> So... keep it simple, set
>>>    LANG=en_US.UTF-8
>>> and use vi or something else that comes with cygwin to create the file
>>> and I'll have a file with UTF-8 character encoding - correct?
>> ---
>> The first 128 characters of UTF-8 (0x00-0x7f) are identical to
>> ASCII, which is also the first half of latin1 (iso-8859-1).
>>
>> If you don't use any characters that need accents or special symbols,
>> then the file is byte-for-byte the same as ASCII, because only
>> the characters beyond the first 128 are encoded differently in UTF-8
>> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
> I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
> 0xff is part of the utf-8 encoding.  This chart makes things clearer
> ... at least for me :)
>      http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
>   The proposed UCS transformation format encodes UCS values in the range
>   [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
>   bytes.  For all encodings of more than one byte, the initial byte
>   determines the number of bytes used and the high-order bit in each byte
>   is set.
>
>   An easy way to remember this transformation format is to note that the
>   number of high-order 1's in the first byte is the same as the number of
>   subsequent bytes in the multibyte character:
>
>   Bytes  Bits  Hex Min   Hex Max   Byte Sequence in Binary
>   1         7  00000000  0000007f  0zzzzzzz
>   2        13  00000080  0000207f  10zzzzzz 1yyyyyyy
>   3        19  00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
>   4        25  00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
>   5        31  02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv
This is not the UTF-8 encoding scheme as it is defined today; where did
you get it from? Maybe it's an obsolete version of UTF-8...
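
For comparison, here is a minimal Python sketch (not part of the original
thread) that prints the bytes standard UTF-8 produces for a few sample
characters. The helper name utf8_bits and the choice of sample characters
are illustrative assumptions; the only library call used is the built-in
str.encode. In standard UTF-8 a two-byte sequence starts with 110 and every
continuation byte starts with 10, which differs from the quoted table, where
a two-byte sequence starts with 10.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


# Sample characters chosen for illustration: one per UTF-8 sequence length.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    # Print the code point followed by its UTF-8 bytes in binary.
    print("U+%04X" % ord(ch), " ".join(utf8_bits(ch)))
```

Note that U+00E9 (e with acute accent, byte 0xe9 in latin1) takes two bytes
here, so only the range 0x00-0x7f is encoded as a single byte in UTF-8.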

