Subject: Re: UTF-8 character encoding
To: cygwin AT cygwin DOT com
From: Thomas Wolff
Date: Tue, 26 Jun 2018 21:23:53 +0200

On 25.06.2018 at 20:33, Lee wrote:
> On 6/24/18, L A Walsh wrote:
>> Lee wrote:
>>> So... keep it simple, set
>>>    LANG=en_US.UTF-8
>>> and use vi or something else that comes with cygwin to create the file
>>> and I'll have a file with UTF-8 character encoding - correct?
>> ---
>> The first 127 characters of UTF-8 are identical to the
>> first 127 characters of ASCII, and latin1 and iso-8859-1.
>>
>> If you don't use any characters that need accents or special symbols,
>> then nothing will be encoded in UTF-8, because it's only
>> the characters OVER the first 127
>> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
> I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
> 0xff is part of the utf-8 encoding.  This chart makes things clearer
> ... at least for me :)
>    http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
> The proposed UCS transformation format encodes UCS values in the range
> [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
> bytes.  For all encodings of more than one byte, the initial byte
> determines the number of bytes used and the high-order bit in each byte
> is set.
>
> An easy way to remember this transformation format is to note that the
> number of high-order 1's in the first byte is the same as the number of
> subsequent bytes in the multibyte character:
>
>    Bits  Hex Min   Hex Max   Byte Sequence in Binary
> 1    7   00000000  0000007f  0zzzzzzz
> 2   13   00000080  0000207f  10zzzzzz 1yyyyyyy
> 3   19   00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
> 4   25   00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
> 5   31   02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv

This encoding scheme is wrong; where did you get it from?
Maybe it's the obsolete UTF-8...
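
For comparison with the table quoted above, here is a minimal Python sketch of the bit layout used by standard UTF-8 as defined in RFC 3629. It is only an illustration, not anything taken from the thread: the function name utf8_encode is hypothetical, and Python's built-in str.encode("utf-8") is what one would normally use. The point is that every continuation byte starts with the bits 10, and the lead byte of an n-byte sequence starts with n one-bits followed by a zero, which is not what the quoted table shows.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (RFC 3629 layout).

    Hypothetical helper for illustration only.
    """
    # RFC 3629 limits UTF-8 to U+0000..U+10FFFF and excludes surrogates.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 1 byte:  0xxxxxxx  (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Compare the sketch against Python's own encoder for a few samples.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        encoded = utf8_encode(cp)
        assert encoded == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} -> {encoded.hex(' ')}")
```

Note that only 0x00 - 0x7f are stored as single bytes; a code point such as U+00E9 in the 0x80 - 0xff range becomes the two bytes c3 a9.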