To: cygwin@cygwin.com
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 16:06:13 +0200
Message-ID: <5176597.IBPj4gxFZX@nimes>
In-Reply-To: <ZMe5Q02S5ap5gBbJ@calimero.vinschen.de>
References: <3884636.3uDm00564X@nimes> <ZMeH6yZQkK0exU8H@calimero.vinschen.de>
 <ZMe5Q02S5ap5gBbJ@calimero.vinschen.de>
From: Bruno Haible via Cygwin <cygwin@cygwin.com>
Reply-To: Bruno Haible <bruno@clisp.org>

Corinna Vinschen wrote:
> I have a problem with the c32isalpha function.
> 
> c32isalpha fails for the character U+FF11 FULLWIDTH DIGIT ONE,
> because it expects the character to be an alphabetic character.

This is not a big problem. You can see in the test-c32isalpha.c file
that this test is disabled for many platforms, in particular glibc.
There's no problem with disabling it on Cygwin as well.
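To see what a given platform does with such a character, a small probe
along these lines can help. It is only a sketch: it uses the standard
<wctype.h> functions rather than gnulib's c32isalpha, and the names
wc_classes and classify_wc are made up for this example.

```c
#include <stdbool.h>
#include <wctype.h>

/* Results of the three classification calls for one wide character. */
struct wc_classes {
    bool alpha;
    bool digit;
    bool alnum;
};

/* Ask the current locale how it classifies wc.
   (Hypothetical helper, not part of gnulib or newlib.) */
static struct wc_classes classify_wc(wint_t wc)
{
    struct wc_classes r;
    r.alpha = iswalpha(wc) != 0;
    r.digit = iswdigit(wc) != 0;
    r.alnum = iswalnum(wc) != 0;
    return r;
}
```

Call setlocale (LC_ALL, "") with a UTF-8 locale first, then inspect
classify_wc (0xFF11). The "alpha" and "alnum" results are the ones that
differ between implementations; "digit" should be false everywhere.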

> The Cygwin unicode information is automatically generated from the
> Unicode data file UnicodeData.txt, fresh from their homepage.  iswalpha
> in newlib is checking for the Unicode categories, using the expression:
> 
>     return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
>           || cat == CAT_Lm || cat == CAT_Lo
> 	  || cat == CAT_Nl // Letter_Number
> 	  ;
> 
> with CAT_foo being equivalent to Unicode category foo.
> 
> Per UnicodeData.txt, ff11 is of category Nd, so it's a digit, not an
> alphabetic character.

This is not wrong. However, see the comments in the generator of the
gnulib tables:

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/gen-uni-tables.c;h=0dceedc06cd72f886807fd575a2c4dba99cd147a;hb=HEAD#l5789

   /* Consider all the non-ASCII digits as alphabetic.
      ISO C 99 forbids us to have them in category "digit",
      but we want iswalnum to return true on them.  */

Likewise in the generator of the glibc tables:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/unicode-gen/unicode_utils.py;h=5af03113a2f1f063769752ea426fcaf6f6ba9e95;hb=HEAD#l274

The original comment (from 2000) was:

  /* SUSV2 gives us some freedom for the "digit" category, but ISO C 99
     takes it away:
     7.25.2.1.5:
        The iswdigit function tests for any wide character that corresponds
        to a decimal-digit character (as defined in 5.2.1).
     5.2.1:
        the 10 decimal digits 0 1 2 3 4 5 6 7 8 9
   */
  return (ch >= 0x0030 && ch <= 0x0039);

The question is: In which category do you put these non-ASCII digits?
"print" and "graph", sure. But other than that? "punct" or "alnum"?
"punct" seems wrong. If you, like me, decide to put them in "alnum",
then they need to be in "alpha" or "digit" (per POSIX
https://pubs.opengroup.org/onlinepubs/9699919799/functions/iswalnum.html ).
But ISO C 23 § 7.4.1.5 + § 5.2.1 does not allow them in category "digit".
That leaves "alpha" as the only choice.
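
The reasoning above can be sketched in C as follows. This is a
simplified illustration, not the actual gnulib or glibc generator code:
the enum gen_cat and the sketch_* functions are hypothetical, and the
real tables take more properties into account than the general category.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the Unicode general categories, for illustration. */
enum gen_cat { GC_Lu, GC_Ll, GC_Lt, GC_Lm, GC_Lo, GC_Nl, GC_Nd, GC_OTHER };

/* "digit": only the ten ASCII decimal digits, as ISO C requires. */
static bool sketch_isdigit(uint32_t cp)
{
    return cp >= 0x0030 && cp <= 0x0039;
}

/* "alpha": letters and letter numbers, plus every Nd character that is
   not an ASCII digit (the policy described in the gnulib comment). */
static bool sketch_isalpha(uint32_t cp, enum gen_cat cat)
{
    switch (cat) {
    case GC_Lu: case GC_Ll: case GC_Lt:
    case GC_Lm: case GC_Lo: case GC_Nl:
        return true;
    case GC_Nd:
        return !sketch_isdigit(cp);
    default:
        return false;
    }
}

/* "alnum" is exactly the union of "alpha" and "digit", per POSIX. */
static bool sketch_isalnum(uint32_t cp, enum gen_cat cat)
{
    return sketch_isalpha(cp, cat) || sketch_isdigit(cp);
}
```

With this policy, U+FF11 (category Nd) is "alpha" and "alnum" but not
"digit", while U+0031 is "digit" and "alnum" but not "alpha".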

Bruno





