Mail Archives: cygwin/2023/07/31/10:06:34

To: cygwin AT cygwin DOT com
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 16:06:13 +0200
Message-ID: <5176597.IBPj4gxFZX@nimes>
In-Reply-To: <ZMe5Q02S5ap5gBbJ@calimero.vinschen.de>
References: <3884636 DOT 3uDm00564X AT nimes> <ZMeH6yZQkK0exU8H AT calimero DOT vinschen DOT de>
<ZMe5Q02S5ap5gBbJ AT calimero DOT vinschen DOT de>
From: Bruno Haible via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Bruno Haible <bruno AT clisp DOT org>

Corinna Vinschen wrote:
> I have a problem with the c32isalpha function.
> 
> c32isalpha fails for the character U+FF11 FULLWIDTH DIGIT ONE,
> because it expects the character to be an alphabetic character.

This is not a big problem. As you can see in the test-c32isalpha.c file,
this check is already disabled on many platforms, glibc in particular.
There's no problem with disabling it on Cygwin as well.

> The Cygwin unicode information is automatically generated from the
> Unicode data file UnicodeData.txt, fresh from their homepage.  iswalpha
> in newlib is checking for the Unicode categories, using the expression:
> 
>     return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
>            || cat == CAT_Lm || cat == CAT_Lo
>            || cat == CAT_Nl // Letter_Number
>            ;
> 
> with CAT_foo being equivalent to Unicode category foo.
> 
> Per UnicodeData.txt, ff11 is of category Nd, so it's a digit, not an
> alphabetic character.

This is not wrong. However, see the comments in the generator of the
gnulib tables:

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/gen-uni-tables.c;h=0dceedc06cd72f886807fd575a2c4dba99cd147a;hb=HEAD#l5789

   /* Consider all the non-ASCII digits as alphabetic.
      ISO C 99 forbids us to have them in category "digit",
      but we want iswalnum to return true on them.  */

Likewise in the generator of the glibc tables:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/unicode-gen/unicode_utils.py;h=5af03113a2f1f063769752ea426fcaf6f6ba9e95;hb=HEAD#l274

The original comment (from 2000) was:

  /* SUSV2 gives us some freedom for the "digit" category, but ISO C 99
     takes it away:
     7.25.2.1.5:
        The iswdigit function tests for any wide character that corresponds
        to a decimal-digit character (as defined in 5.2.1).
     5.2.1:
        the 10 decimal digits 0 1 2 3 4 5 6 7 8 9
   */
  return (ch >= 0x0030 && ch <= 0x0039);
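
The two generator rules quoted above (only ASCII 0-9 count as "digit", and
every other decimal digit is counted as alphabetic) can be sketched in a few
lines. This is only an illustration, not the gnulib or glibc code: the
toy_* names, the enum, and the tiny hard-coded category lookup are all
made up for this example and cover just a handful of code points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, tiny subset of the Unicode general categories. */
enum toy_cat { CAT_Lu, CAT_Ll, CAT_Nd, CAT_Nl, CAT_Po, CAT_OTHER };

/* Toy stand-in for a UnicodeData.txt lookup; covers only a few ranges. */
static enum toy_cat toy_category(uint32_t ch)
{
    if (ch >= 'A' && ch <= 'Z') return CAT_Lu;
    if (ch >= 'a' && ch <= 'z') return CAT_Ll;
    if (ch >= '0' && ch <= '9') return CAT_Nd;
    if (ch >= 0xFF10 && ch <= 0xFF19) return CAT_Nd; /* FULLWIDTH DIGIT ZERO..NINE */
    if (ch >= 0x2160 && ch <= 0x216F) return CAT_Nl; /* Roman numerals */
    if (ch == '!') return CAT_Po;
    return CAT_OTHER;
}

/* "digit" as ISO C requires it: the ten ASCII digits and nothing else. */
bool toy_is_digit(uint32_t ch)
{
    return ch >= 0x0030 && ch <= 0x0039;
}

/* Category-only rule: letters plus Nl, digits of any kind excluded. */
bool toy_is_alpha_strict(uint32_t ch)
{
    enum toy_cat cat = toy_category(ch);
    return cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Nl;
}

/* Generator-style rule: additionally count every non-ASCII Nd as alphabetic. */
bool toy_is_alpha_lenient(uint32_t ch)
{
    return toy_is_alpha_strict(ch)
           || (toy_category(ch) == CAT_Nd && !toy_is_digit(ch));
}

/* "alnum" is then simply the union of "alpha" and "digit". */
bool toy_is_alnum(uint32_t ch)
{
    return toy_is_alpha_lenient(ch) || toy_is_digit(ch);
}

/* Small usage example for U+FF11 FULLWIDTH DIGIT ONE. */
void toy_self_check(void)
{
    assert(!toy_is_digit(0xFF11));        /* not a "digit" per ISO C */
    assert(!toy_is_alpha_strict(0xFF11)); /* category-only rule says no */
    assert(toy_is_alpha_lenient(0xFF11)); /* generator-style rule says yes */
    assert(toy_is_alnum(0xFF11));         /* so iswalnum-like check is true */
}
```

With the strict rule, U+FF11 would end up in neither "alpha" nor "digit" and
therefore not in "alnum"; the lenient rule is what keeps it in "alnum".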

The question is: In which category do you put these non-ASCII digits?
"print" and "graph", sure. But other than that? "punct" or "alnum"?
"punct" seems wrong. If you, like me, decide to put them in "alnum",
then they need to be in "alpha" or "digit" (per POSIX
https://pubs.opengroup.org/onlinepubs/9699919799/functions/iswalnum.html ).
But ISO C 23 § 7.4.1.5 + § 5.2.1 does not allow them in category "digit".
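
To see where a given libc lands on this question, one can probe the standard
<wctype.h> functions directly. The following is a rough sketch only; the
struct and function names are invented for this example, and what it reports
for non-ASCII characters depends on the libc and on the current locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <wctype.h>

/* How the local libc classifies one wide character (hypothetical helper). */
struct wclass {
    bool alpha;
    bool digit;
    bool alnum;
    bool punct;
};

struct wclass probe_wclass(wint_t wc)
{
    struct wclass r;
    r.alpha = iswalpha(wc) != 0;
    r.digit = iswdigit(wc) != 0;
    r.alnum = iswalnum(wc) != 0;
    r.punct = iswpunct(wc) != 0;
    return r;
}

/* "alnum" must be exactly the union of "alpha" and "digit". */
bool alnum_is_consistent(wint_t wc)
{
    struct wclass r = probe_wclass(wc);
    return r.alnum == (r.alpha || r.digit);
}

/* ISO C reading: nothing outside 0..9 may be reported as a digit. */
bool digit_is_ascii_only(wint_t wc)
{
    struct wclass r = probe_wclass(wc);
    return !r.digit || (wc >= L'0' && wc <= L'9');
}

/* Checks that only rely on ASCII behaviour, which ISO C pins down. */
void probe_self_check(void)
{
    assert(probe_wclass(L'7').digit);
    assert(alnum_is_consistent(L'7'));
    assert(alnum_is_consistent(L'a'));
    assert(digit_is_ascii_only(L'7'));
}
```

Calling probe_wclass(0xFF11) after setlocale(LC_ALL, "") shows which of the
two choices (alpha or neither) the platform made for U+FF11.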

Bruno



