To: cygwin@cygwin.com, Brian Inglis <Brian.Inglis@shaw.ca>
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 23:37:08 +0200
Message-ID: <18620212.dDkQJl9nhx@nimes>
In-Reply-To: <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>
References: <3884636.3uDm00564X@nimes> <4474610.kIfH5X4irW@nimes>
 <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>
From: Bruno Haible via Cygwin <cygwin@cygwin.com>
Reply-To: Bruno Haible <bruno@clisp.org>

Brian Inglis wrote:
> It seems to me that most application developers needing to support 
> non-Western-European languages might want a non-POSIX interpretation of digits.

Sure. GNU libunistring has a dedicated API for this:
  - https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
    UC_DECIMAL_DIGIT_NUMBER.
  - https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
    UC_PROPERTY_DECIMAL_DIGIT
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
    uc_is_property_decimal_digit

I'm sure ICU4C has similar APIs too.
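
To make the distinction between these properties concrete, here is a minimal
sketch. It does not use libunistring; it assumes Python's standard
'unicodedata' module as a stand-in, and the three helper names are
hypothetical. The general category Nd roughly corresponds to what the
UC_DECIMAL_DIGIT_NUMBER link describes, and the two value lookups are in the
spirit of the "Decimal digit value" and "Digit value" pages.

```python
import unicodedata


def is_decimal_digit(ch: str) -> bool:
    """Return True if ch has Unicode general category Nd (Decimal_Number)."""
    return unicodedata.category(ch) == "Nd"


def decimal_value(ch: str) -> int:
    """Return the decimal digit value (0..9) of ch, or -1 if it has none."""
    return unicodedata.decimal(ch, -1)


def digit_value(ch: str) -> int:
    """Return the digit value of ch, or -1 if it has none.

    This also covers characters such as superscript digits, which are
    digits but not decimal digits.
    """
    return unicodedata.digit(ch, -1)
```

For example, U+0663 (ARABIC-INDIC DIGIT THREE) is a decimal digit with value
3, while U+00B2 (SUPERSCRIPT TWO) has a digit value of 2 but no decimal value.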

> Are the Unicode character attribute classes supported for those application use 
> cases that need more than POSIX limitations allow?

POSIX allows the libc to define additional character classes. But these will be
platform- and locale-dependent, and I don't know of any application that makes
use of such additional character classes via wctype() and iswctype().

> I know that I sometimes want to see some alternative numeric digit forms and 
> expect to be able to find those with an appropriate grep expression.

I think you can do so with GNU 'grep' and its -P option, if it was built with
PCRE support. PCRE includes support for Unicode character classes, such as
\p{Nd} for decimal digits.
<https://www.pcre.org/current/doc/html/pcre2pattern.html>
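
As a rough illustration of that kind of search, here is a minimal sketch. It
uses Python's standard 're' module rather than grep or PCRE, and the function
name is hypothetical. In a Python str pattern, \d matches any Unicode decimal
digit (category Nd), so excluding the ASCII range leaves only the alternative
digit forms.

```python
import re

# \d matches any character in Unicode category Nd when the pattern is a str;
# the negative lookahead rejects the ASCII digits 0-9.
NON_ASCII_DIGIT = re.compile(r"(?![0-9])\d")


def lines_with_alternative_digits(lines):
    """Return the lines that contain at least one non-ASCII decimal digit."""
    return [line for line in lines if NON_ASCII_DIGIT.search(line)]
```

Given the lines "abc 123" and "price \u0663\u0664", only the second one is
returned, since it contains Arabic-Indic digits.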

Bruno




