Mail Archives: cygwin/2023/07/31/17:37:51

To: cygwin AT cygwin DOT com, Brian Inglis <Brian DOT Inglis AT shaw DOT ca>
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 23:37:08 +0200
Message-ID: <18620212.dDkQJl9nhx@nimes>
In-Reply-To: <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>
References: <3884636 DOT 3uDm00564X AT nimes> <4474610 DOT kIfH5X4irW AT nimes>
<223e3d56-1a63-57ef-5236-bc1df37716a0 AT Shaw DOT ca>
From: Bruno Haible via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Bruno Haible <bruno AT clisp DOT org>

Brian Inglis wrote:
> It seems to me that most application developers needing to support 
> non-Western-European languages might want a non-POSIX interpretation of digits.

Sure. GNU libunistring has a dedicated API for this:
  - https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
    UC_DECIMAL_DIGIT_NUMBER.
  - https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
    UC_PROPERTY_DECIMAL_DIGIT
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
    uc_is_property_decimal_digit

I'm sure ICU4C has similar APIs too.

> Are the Unicode character attribute classes supported for those application use 
> cases that need more than POSIX limitations allow?

POSIX allows the libc to define additional character classes. But these will be
platform and locale dependent, and I don't know of any application which makes
use of such additional character classes via wctype() and iswctype().
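
For illustration, here is a minimal sketch in C of the wctype() / iswctype()
lookup mentioned above. The helper name count_in_class is hypothetical, and the
sketch assumes only the standard <wctype.h> interface. Which class names exist
beyond the standard ones ("alpha", "digit", ...) depends on the platform and the
locale, so the lookup can fail and the caller has to check for that.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the wide characters in s that belong to the
   character class named class_name in the current LC_CTYPE locale.
   Returns -1 if the locale does not define a class with that name. */
static int count_in_class(const wchar_t *s, const char *class_name)
{
    /* wctype() returns (wctype_t) 0 for an unknown class name. */
    wctype_t desc = wctype(class_name);
    if (desc == (wctype_t) 0)
        return -1;

    int n = 0;
    for (; *s != L'\0'; s++)
        if (iswctype((wint_t) *s, desc))
            n++;
    return n;
}
```

A caller would typically invoke setlocale (LC_ALL, "") first and then try a
class name, for example count_in_class (L"a1b22", "digit"), treating a result
of -1 as "this class is not available on this platform and locale".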

> I know that I sometimes want to see some alternative numeric digit forms and 
> expect to be able to find those with an appropriate grep expression.

I think you can do so with GNU 'grep' (option -P), provided it was built with
PCRE support. PCRE includes support for Unicode character classes.
<https://www.pcre.org/current/doc/html/pcre2pattern.html>

Bruno




