Subject: | Re: UTF-8 character encoding |
To: | cygwin AT cygwin DOT com |
References: | <CAD8GWss253v-p+FjeonEqibr53v6wZRCQ+NWxBhb0LimQaM4sQ AT mail DOT gmail DOT com> <1183751257 DOT 20180621042620 AT yandex DOT ru> <CAD8GWsuo3PuQSdSyMRhbxZQXa=GUSBcyes7QEaqDYfh3FCof0Q AT mail DOT gmail DOT com> <5B3045B1 DOT 4080504 AT tlinx DOT org> <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg AT mail DOT gmail DOT com> |
From: | Thomas Wolff <towo AT towo DOT net> |
Message-ID: | <981ba1fe-7961-5ed0-e3c7-a5717af8c141@towo.net> |
Date: | Tue, 26 Jun 2018 21:23:53 +0200 |
User-Agent: | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 |
MIME-Version: | 1.0 |
In-Reply-To: | <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg@mail.gmail.com> |
On 25.06.2018 at 20:33, Lee wrote:
> On 6/24/18, L A Walsh <cygwin AT tlinx DOT org> wrote:
>> Lee wrote:
>>> So... keep it simple, set
>>> LANG=en_US.UTF-8
>>> and use vi or something else that comes with cygwin to create the file
>>> and I'll have a file with UTF-8 character encoding - correct?
>> ---
>> The first 127 characters of UTF-8 are identical to the
>> first 127 characters of ASCII, and latin1 and iso-8859-1.
>>
>> If you don't use any characters that need accents or special symbols,
>> then nothing will be encoded in UTF-8, because its only
>> the characters OVER the first 127
>> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
> I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
> 0xff is part of the utf-8 encoding. This chart makes things clearer
> ... at least for me :)
> http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
> The proposed UCS transformation format encodes UCS values in the range
> [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
> bytes. For all encodings of more than one byte, the initial byte
> determines the number of bytes used and the high-order bit in each byte
> is set.
>
> An easy way to remember this transformation format is to note that the
> number of high-order 1's in the first byte is the same as the number of
> subsequent bytes in the multibyte character:
>
>    Bits  Hex Min  Hex Max  Byte Sequence in Binary
> 1    7  00000000 0000007f 0zzzzzzz
> 2   13  00000080 0000207f 10zzzzzz 1yyyyyyy
> 3   19  00002080 0008207f 110zzzzz 1yyyyyyy 1xxxxxxx
> 4   25  00082080 0208207f 1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
> 5   31  02082080 7fffffff 11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv

This encoding scheme is wrong; where did you get it from?
Maybe it's the obsolete original proposal for UTF-8...
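For comparison with the chart quoted above: in UTF-8 as standardized in RFC 3629, a character takes at most 4 bytes and code points stop at U+10FFFF. The lead byte is 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx, and every following byte is 10xxxxxx. In the quoted scheme, by contrast, a two-byte sequence starts with 10, sequences run to 5 bytes, and ranges are offset (0x80 to 0x207f). Only U+0000 to U+007F encode as a single byte, so ASCII and UTF-8 coincide on 0x00 to 0x7f, not on 0x00 to 0xff. The sketch below illustrates this. `utf8_encode` is a hypothetical helper written only for illustration, and it is cross-checked against Python's built-in "utf-8" codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 following RFC 3629 (1 to 4 bytes)."""
    # RFC 3629 limits UTF-8 to U+0000..U+10FFFF and excludes UTF-16 surrogates.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:
        # 0xxxxxxx: one byte, identical to 7-bit ASCII.
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Pure ASCII text is byte-for-byte the same in ASCII and UTF-8.
    assert "plain text".encode("ascii") == "plain text".encode("utf-8")

    for ch in ["A", "\x7f", "\xe9", "\u20ac", "\U0001F600"]:
        cp = ord(ch)
        enc = utf8_encode(cp)
        # Cross-check the hand-written encoder against Python's own codec.
        assert enc == ch.encode("utf-8")
        print(f"U+{cp:04X}: {enc.hex(' ')} ({len(enc)} byte(s))")

    # Latin-1 and UTF-8 agree only below 0x80: e-acute is 1 byte vs. 2 bytes.
    print("\xe9".encode("latin-1").hex(), "vs", "\xe9".encode("utf-8").hex())
```

For example, é (U+00E9) is the single byte e9 in Latin-1 but the two bytes c3 a9 in UTF-8. This is why a file that contains only ASCII characters cannot be told apart from a UTF-8 file, while one containing é can.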
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |