Mail Archives: cygwin/2023/07/31/09:38:52

X-Recipient: archive-cygwin AT delorie DOT com
X-Original-To: cygwin AT cygwin DOT com
Delivered-To: cygwin AT cygwin DOT com
Date: Mon, 31 Jul 2023 15:38:11 +0200
To: Bruno Haible <bruno AT clisp DOT org>
Subject: Re: GB18030 locale
Message-ID: <ZMe5Q02S5ap5gBbJ@calimero.vinschen.de>
Mail-Followup-To: Bruno Haible <bruno AT clisp DOT org>, cygwin AT cygwin DOT com
References: <3884636 DOT 3uDm00564X AT nimes> <4641755 DOT FJ9Bj1ZfmD AT nimes>
<ZMTaqNf4dnbry6BD AT calimero DOT vinschen DOT de> <5536760 DOT 9AX2XiyloC AT nimes>
<ZMeH6yZQkK0exU8H AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <ZMeH6yZQkK0exU8H@calimero.vinschen.de>
X-BeenThere: cygwin AT cygwin DOT com
X-Mailman-Version: 2.1.29
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Corinna Vinschen via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>, cygwin AT cygwin DOT com
Errors-To: cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>

Hi Bruno,

On Jul 31 12:07, Corinna Vinschen via Cygwin wrote:
> On Jul 29 11:53, Bruno Haible via Cygwin wrote:
> > Corinna Vinschen wrote:
> > > However, on debugging this, I see it's totally broken.  Trying to fix
> > > this in the existing functions is futile.  We need dedicated
> > > support functions for GB18030, kind of like the FreeBSD functions,
> > > just with extra support for surrogate pairs, as with our UTF8 stuff.
> > 
> > In case it helps: Find here a test suite for the various multibyte
> > functions with GB18030 specific test cases. (Extracted from gnulib.)
> > https://haible.de/bruno/gnu/testdir-gb18030.tar.gz
> 
> Thank you, I'm already hacking and testing :)

I have a problem with the c32isalpha function.

The c32isalpha test fails for the character U+FF11 FULLWIDTH DIGIT ONE,
because the test expects the character to be alphabetic.

The Cygwin Unicode information is generated automatically from the
Unicode data file UnicodeData.txt, fetched fresh from the Unicode
homepage.  iswalpha in newlib checks the Unicode general category,
using the expression:

    return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
           || cat == CAT_Lm || cat == CAT_Lo
           || cat == CAT_Nl // Letter_Number
           ;

with CAT_foo being equivalent to Unicode category foo.

Per UnicodeData.txt, U+FF11 is of category Nd, so it's a decimal digit,
not an alphabetic character.
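
To make the reasoning concrete, here is a minimal, self-contained sketch
of a category-based check.  It is not the newlib code: the names
sketch_category, sketch_isalpha and sketch_self_check are hypothetical,
and the switch statement is only a hand-written stand-in for the table
generated from UnicodeData.txt, covering a handful of code points.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the Unicode general categories, named like the CAT_foo
   constants in the expression above.  CAT_Other is an assumption of
   this sketch and stands for "anything not listed here". */
enum category {
    CAT_LC, CAT_Lu, CAT_Ll, CAT_Lt, CAT_Lm, CAT_Lo,
    CAT_Nl, CAT_Nd, CAT_Other
};

/* Hypothetical stand-in for the generated lookup table. */
static enum category sketch_category(uint32_t c)
{
    switch (c) {
    case 0x0041: return CAT_Lu;   /* LATIN CAPITAL LETTER A */
    case 0x0061: return CAT_Ll;   /* LATIN SMALL LETTER A */
    case 0x4E2D: return CAT_Lo;   /* CJK ideograph */
    case 0x2160: return CAT_Nl;   /* ROMAN NUMERAL ONE */
    case 0x0031: return CAT_Nd;   /* DIGIT ONE */
    case 0xFF11: return CAT_Nd;   /* FULLWIDTH DIGIT ONE */
    default:     return CAT_Other;
    }
}

/* Same shape as the expression quoted above: letters plus Nl. */
static int sketch_isalpha(uint32_t c)
{
    enum category cat = sketch_category(c);
    return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
           || cat == CAT_Lm || cat == CAT_Lo
           || cat == CAT_Nl;
}

/* Usage example: the cases discussed in this mail. */
static void sketch_self_check(void)
{
    assert(sketch_isalpha(0x0041));    /* Lu counts as alphabetic */
    assert(sketch_isalpha(0x2160));    /* Nl counts as alphabetic */
    assert(!sketch_isalpha(0x0031));   /* Nd does not */
    assert(!sketch_isalpha(0xFF11));   /* Nd does not */
}
```

With a check of this shape, any character whose category is Nd is
rejected, which is why U+FF11 is not reported as alphabetic here.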

I see that Glibc returns 1 from c32isalpha for U+FF11, but I don't see
where it takes that info from, or why this is correct.  Can you point
me to some info on this?


Thanks,
Corinna

