To: cygwin AT cygwin DOT com, Brian Inglis
Subject: Re: character class "alpha"
Date: Mon, 31 Jul 2023 23:37:08 +0200
Message-ID: <18620212.dDkQJl9nhx@nimes>
In-Reply-To: <223e3d56-1a63-57ef-5236-bc1df37716a0@Shaw.ca>
From: Bruno Haible via Cygwin
Reply-To: Bruno Haible

Brian Inglis wrote:
> It seems to me that most application developers needing to support
> non-Western-European languages might want a non-POSIX interpretation of digits.

Sure. GNU libunistring has dedicated API for this:
  - https://www.gnu.org/software/libunistring/manual/html_node/Object-oriented-API.html
    UC_DECIMAL_DIGIT_NUMBER.
  - https://www.gnu.org/software/libunistring/manual/html_node/Decimal-digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Digit-value.html
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-objects.html
    UC_PROPERTY_DECIMAL_DIGIT
  - https://www.gnu.org/software/libunistring/manual/html_node/Properties-as-functions.html
    uc_is_property_decimal_digit

I'm sure ICU4C has similar APIs too.
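
As a rough illustration of what a decimal-digit property plus a digit-value
lookup gives an application, here is a minimal sketch. It is written in Python
and uses the standard `unicodedata` module as a stand-in; it does not call
libunistring. The helper name `decimal_digit_value` and the choice to return -1
for non-digits are assumptions made for this example.

```python
import unicodedata


def decimal_digit_value(ch):
    """Return the decimal value (0..9) of ch, or -1 if ch is not a decimal digit.

    Hypothetical helper backed by Python's Unicode database, not libunistring.
    """
    # General category "Nd" is Unicode's Decimal_Number category.
    if unicodedata.category(ch) != "Nd":
        return -1
    return unicodedata.decimal(ch)


if __name__ == "__main__":
    # ASCII 7, ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT FIVE,
    # a letter, and SUPERSCRIPT TWO (category No, so not a decimal digit).
    for ch in ["7", "\u0663", "\u096B", "x", "\u00B2"]:
        print("U+%04X" % ord(ch), unicodedata.name(ch, "?"), decimal_digit_value(ch))
```

The point is only that "digit" in the Unicode sense covers many scripts, while
a character such as SUPERSCRIPT TWO is numeric but not a decimal digit.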
> Are the Unicode character attribute classes supported for those application use
> cases that need more than POSIX limitations allow?

POSIX allows the libc to define additional character classes. But these will be
platform and locale dependent, and I don't know of any application which makes
use of such additional character classes via wctype() and iswctype().

> I know that I sometimes want to see some alternative numeric digit forms and
> expect to be able to find those with an appropriate grep expression.

I think you can do so with GNU 'grep', when it was built with PCRE support.
PCRE includes support for Unicode character classes.

Bruno

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
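
A small sketch of the kind of search for alternative digit forms discussed in
this message, written in Python rather than as a grep invocation so that it does
not depend on how grep was built. It relies on the documented behaviour of
Python's `re` module that `\d` in a `str` pattern matches any character in
Unicode category Nd. The helper name `non_ascii_digits` is made up for this
example.

```python
import re

# For str patterns, \d matches any Unicode decimal digit (category Nd).
# The negative lookahead (?![0-9]) excludes the ASCII digits, so only the
# alternative digit forms remain.
NON_ASCII_DIGIT = re.compile(r"(?![0-9])\d")


def non_ascii_digits(text):
    """Return all decimal digits in text that are not ASCII 0-9 (hypothetical helper)."""
    return NON_ASCII_DIGIT.findall(text)


if __name__ == "__main__":
    # Mixed ASCII, Arabic-Indic and Devanagari digits.
    sample = "room 42, \u0664\u0662, \u096A\u0968"
    print(non_ascii_digits(sample))
```

With a PCRE-enabled grep, the analogous expression would presumably use a
Unicode property such as `\p{Nd}`; whether that works depends on the build and
on running in a UTF-8 locale.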