Mail Archives: cygwin/2018/06/26/17:39:55

From: Michael Enright <mike AT kmcardiff DOT com>
Date: Tue, 26 Jun 2018 14:39:35 -0700
Message-ID: <CAOC2fq-OZE1DvG_p1tcVpfJHD3x2rg2nmiY1ox8mAoVSEM3S7w@mail.gmail.com>
Subject: Re: UTF-8 character encoding
To: cygwin AT cygwin DOT com

On Mon, Jun 25, 2018 at 11:33 AM, Lee <ler762 AT gmail DOT com> wrote:
> I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
> 0xff is part of the utf-8 encoding.

I don't see how you arrived at this. 0xFF is not the initial byte of
any valid UTF-8 byte sequence. It also doesn't agree with the
statement you quote later:

>  An easy way to remember this transformation format is to note that the
>  number of high-order 1's in the first byte is the same as the number of
>  subsequent bytes in the multibyte character:

This is close, but the count of high-order 1's gives the total length
of the sequence, lead byte included, and a zero bit always ends the
high-order-1's bit string, which means that 0xFF is not a valid lead
byte. 0x7F is the highest byte value that you can have as a
single-byte UTF-8 string.
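
To make the lead-byte rule concrete, here is a minimal Python sketch of
how a lead byte maps to a sequence length under RFC 3629. The function
name utf8_sequence_length is made up for illustration; it is not part
of any library, and it only classifies the first byte without checking
the continuation bytes that follow.

```python
from typing import Optional


def utf8_sequence_length(lead: int) -> Optional[int]:
    """Return the total sequence length implied by a lead byte.

    Returns None when the byte cannot start a valid UTF-8 sequence.
    Hypothetical helper for illustration; ranges follow RFC 3629.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("not a byte value")
    if lead <= 0x7F:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if 0xC2 <= lead <= 0xDF:    # 110xxxxx: two bytes (0xC0/0xC1 would be overlong)
        return 2
    if 0xE0 <= lead <= 0xEF:    # 1110xxxx: three bytes
        return 3
    if 0xF0 <= lead <= 0xF4:    # 11110xxx: four bytes (0xF5+ would exceed U+10FFFF)
        return 4
    # Continuation bytes 0x80..0xBF, plus 0xC0, 0xC1 and 0xF5..0xFF
    return None


for byte in (0x41, 0x7F, 0x80, 0xC3, 0xE2, 0xF0, 0xFF):
    print(f"0x{byte:02X} -> {utf8_sequence_length(byte)}")
```

Note that 0xFF has no terminating zero bit at all, so it falls through
to the final return along with the other bytes that never start a
sequence.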

Perhaps your statement about 0-0xFF was meant to be read differently.

Thomas Wolff's note seems to be objecting to the inclusion of
characters above U+10FFFF, which aren't legal in UTF-8 but were in the
original proposal. Otherwise rows 1-4 of your table are correct.
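
As a rough check of the U+10FFFF limit, the sketch below leans on
Python's built-in strict UTF-8 codec, which I am assuming follows
RFC 3629 on this point. The helper is_valid_utf8 and the sample byte
strings are my own illustrations, not something from the thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+10FFFF is the highest code point and needs four bytes.
highest = chr(0x10FFFF).encode("utf-8")

# The same four-byte pattern one step further would be U+110000.
beyond_limit = b"\xf4\x90\x80\x80"

# A five-byte form of the kind the older, longer scheme allowed.
five_byte_form = b"\xf8\x88\x80\x80\x80"

print(highest.hex(), is_valid_utf8(highest))
print(beyond_limit.hex(), is_valid_utf8(beyond_limit))
print(five_byte_form.hex(), is_valid_utf8(five_byte_form))
```

Only the first of the three should be accepted; the other two are
rejected because they would encode values above U+10FFFF.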

The standards, such as IETF RFC 3629, are easy enough to read, so I
recommend using them and citing them to others instead of trying to
summarize.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
