Mail Archives: cygwin/2018/06/25/14:33:24

MIME-Version: 1.0
In-Reply-To: <5B3045B1.4080504@tlinx.org>
References: <CAD8GWss253v-p+FjeonEqibr53v6wZRCQ+NWxBhb0LimQaM4sQ AT mail DOT gmail DOT com> <1183751257 DOT 20180621042620 AT yandex DOT ru> <CAD8GWsuo3PuQSdSyMRhbxZQXa=GUSBcyes7QEaqDYfh3FCof0Q AT mail DOT gmail DOT com> <5B3045B1 DOT 4080504 AT tlinx DOT org>
From: Lee <ler762 AT gmail DOT com>
Date: Mon, 25 Jun 2018 14:33:06 -0400
Message-ID: <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg@mail.gmail.com>
Subject: Re: UTF-8 character encoding
To: cygwin AT cygwin DOT com

On 6/24/18, L A Walsh <cygwin AT tlinx DOT org> wrote:
> Lee wrote:
>> So... keep it simple, set
>>   LANG=en_US.UTF-8
>> and use vi or something else that comes with cygwin to create the file
>> and I'll have a file with UTF-8 character encoding - correct?
> ---
> 	The first 127 characters of UTF-8 are identical to the
> first 127 characters of ASCII, and latin1 and iso-8859-1.
>
> If you don't use any characters that need accents or special symbols,
> then nothing will be encoded in UTF-8, because it's only
> the characters OVER the first 127
> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
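
A quick way to check the point about the ASCII range is to encode the same text three ways and compare the bytes. This is only a minimal sketch in Python; the helper name encodings_agree and the sample strings are made up for illustration.

```python
def encodings_agree(text):
    """Return True if ASCII, Latin-1 and UTF-8 give identical bytes for text."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character is above 0x7f, so ASCII cannot represent it.
        return False
    return ascii_bytes == text.encode("latin-1") == text.encode("utf-8")


plain = "LANG=en_US.UTF-8"   # only characters in the ASCII range
accented = "caf\u00e9"       # ends with U+00E9 (e with acute accent)

print(encodings_agree(plain))        # True
print(accented.encode("latin-1"))    # b'caf\xe9'      (one byte for U+00E9)
print(accented.encode("utf-8"))      # b'caf\xc3\xa9'  (two bytes for U+00E9)
```

So a file holding only ASCII-range characters has the same bytes in all three encodings, and the encodings start to differ at the first character above 0x7f.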

I'm still trying to figure UTF-8 out, but it seems to me that bytes in
the whole range 0x00 - 0xff are part of the UTF-8 encoding.  The chart
in this document makes things clearer ... at least for me :)
    http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
 The proposed UCS transformation format encodes UCS values in the range
 [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
 bytes.  For all encodings of more than one byte, the initial byte
 determines the number of bytes used and the high-order bit in each byte
 is set.

 An easy way to remember this transformation format is to note that the
 number of high-order 1's in the first byte is the same as the number of
 subsequent bytes in the multibyte character:

 Bytes  Bits  Hex Min   Hex Max   Byte Sequence in Binary
   1      7   00000000  0000007f  0zzzzzzz
   2     13   00000080  0000207f  10zzzzzz 1yyyyyyy
   3     19   00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
   4     25   00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
   5     31   02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv
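
To make the table concrete, here is a rough Python sketch of an encoder that follows it row by row. Two caveats. First, it assumes the payload bits hold the value minus the row's Hex Min, which is what the Min/Max columns suggest but which the excerpt above does not spell out. Second, this layout is not what current UTF-8 uses: in UTF-8 as defined today (RFC 3629) every continuation byte has the form 10xxxxxx and the lead bytes of multibyte sequences start with 110, 1110 or 11110. The names ROWS and encode_proposed are made up for illustration.

```python
# Rows of the quoted table: (byte count, minimum value, maximum value).
ROWS = [
    (1, 0x00000000, 0x0000007F),
    (2, 0x00000080, 0x0000207F),
    (3, 0x00002080, 0x0008207F),
    (4, 0x00082080, 0x0208207F),
    (5, 0x02082080, 0x7FFFFFFF),
]


def encode_proposed(value):
    """Encode one value using the byte layout of the quoted table."""
    for nbytes, lo, hi in ROWS:
        if lo <= value <= hi:
            break
    else:
        raise ValueError("value outside [0, 0x7fffffff]")

    # Assumption: the stored bits are the value biased by the row minimum.
    biased = value - lo

    # Each subsequent byte has its high bit set and carries 7 payload bits.
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (biased & 0x7F))
        biased >>= 7
    tail.reverse()

    # Lead byte: (nbytes - 1) high-order 1 bits, a 0 bit, then the rest.
    payload_bits = 8 - nbytes
    prefix = (0xFF << (payload_bits + 1)) & 0xFF
    return bytes([prefix | biased] + tail)


print(encode_proposed(0x41))              # b'A'
print(encode_proposed(0x80).hex())        # 8080
print(encode_proposed(0x2080).hex())      # c08080
print("\u0080".encode("utf-8").hex())     # c280 (current UTF-8, for contrast)
```

The last two lines for 0x80 show the difference: the table above gives 80 80, while a current UTF-8 encoder gives c2 80.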

Thanks
Lee

