Mail Archives: cygwin/2023/02/15/08:53:26

Date: Wed, 15 Feb 2023 14:52:23 +0100
To: cygwin AT cygwin DOT com
Subject: Re: [ANNOUNCEMENT] Updated: dash 0.5.12-2
Message-ID: <Y+zjl5E4SsUZpQ4Y@calimero.vinschen.de>
Mail-Followup-To: cygwin AT cygwin DOT com, Brian Inglis <Brian DOT Inglis AT shaw DOT ca>
References: <announce DOT 20230212220325 DOT 5447-1-Brian DOT Inglis AT Shaw DOT ca>
<Y+oBUDYR+cFCx3Fw AT calimero DOT vinschen DOT de>
<6810586169 DOT 20230213204858 AT yandex DOT ru>
<8a583e14-b413-d1a2-35d9-e76f73a4b338 AT Shaw DOT ca>
<Y+qRXYAzPKsSHWAy AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <Y+qRXYAzPKsSHWAy@calimero.vinschen.de>
From: Corinna Vinschen via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>,
Brian Inglis <Brian DOT Inglis AT shaw DOT ca>

Hi Brian,

On Feb 13 20:37, Corinna Vinschen via Cygwin wrote:
> On Feb 13 12:03, Brian Inglis via Cygwin wrote:
> > On 2023-02-13 10:43, ASSI via Cygwin wrote:
> > > Corinna Vinschen via Cygwin writes:
> > > > Can you give me an example?  I'm a bit puzzled because fnmatch as well
> > > > as glob in Cygwin support native characters.
> > 
> > But not locale dependent named character classes like regexp in paths.
> 
> I checked the dash code in current dash git, and while its internal glob
> implementation supports character classes, they are not localized, using
> the standard single-byte functions isalnum, isalpha, etc. under the hood.
> 
> So, yeah, what you say further down this mail... looks like dash
> supports locale dependent character classes only with glibc.
> [...]
> Either way, I don't care much for what a certain application provides by
> itself.  I'm talking about our libc, that is Cygwin, and what it
> provides to processes calling its implementations of regcomp/regexec,
> glob and fnmatch.
> 
> All these functions have been taken from FreeBSD and all three suffer
> shortcomings:
> 
> - regcomp/regexec supports POSIX named character classes, collating
>   symbols, and equivalence class expressions, but all of them only work
>   for ASCII chars.
> 
> - fnmatch and glob support none of named character classes,
>   collating symbols, or equivalence class expressions.
> 
> I checked the upstream code in FreeBSD, OpenBSD and NetBSD and none of
> these functions are improved to support locales (regcomp) or any of
> the character classes stuff (fnmatch/glob).
> 
> So, if we want to add this support to Cygwin (and thus, to all
> applications calling the libc implementation of these functions),
> quite a bit of work is required.
> 
> Being able to fetch the implementation from some other source
> would reduce the effort enormously :}

I took the liberty of adding [:<class>:] support to Cygwin's fnmatch(3) and
glob(3) functions.  They also recognize collating symbols [.<coll>.] and
equivalence class expressions [=<equiv>=], but the latter two are not
implemented yet, so fnmatch/glob simply skip them in the pattern.
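
For illustration, here is a minimal sketch of how a caller could exercise
named character classes through fnmatch(3).  The helper name name_matches
is made up for this example and is not part of Cygwin or dash; only
fnmatch() and FNM_NOMATCH come from <fnmatch.h>.  Whether a given class
matches depends on the libc in use, so take this as a test aid rather
than a statement about any particular implementation.

```c
#include <fnmatch.h>

/*
 * Hypothetical helper: returns 1 if `name` matches `pattern`,
 * 0 if it does not, and -1 if fnmatch() reports an error.
 */
int name_matches(const char *pattern, const char *name)
{
    int rc = fnmatch(pattern, name, 0);

    if (rc == 0)
        return 1;           /* pattern matched */
    if (rc == FNM_NOMATCH)
        return 0;           /* valid pattern, no match */
    return -1;              /* any other value is an error */
}
```

With a libc that supports [:<class>:] in fnmatch, a call such as
name_matches("[[:digit:]]*.txt", "1file.txt") should return 1, while the
same pattern against "file.txt" should return 0.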

Given that glob and fnmatch use wide characters internally, the support
for character classes is internationalized by default, albeit in a
slightly different way than in glibc.  The classes a Unicode character
belongs to are not locale dependent in Cygwin/newlib.  All characters
have their classes assigned all the time, so, for instance, the German
character 'ä' is lower and alpha even in the en_US.utf8 locale.
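
A quick way to inspect which classes a wide character belongs to is to
query the standard <wctype.h> functions directly.  The sketch below is
only an illustration: the function wide_classes and the CLS_* flags are
invented for this example.  For non-ASCII input such as L'\u00e4' ('ä')
the result depends on the libc and, on glibc, on the active locale, so
no output is claimed for that case here.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical flag values, used only by this example. */
enum { CLS_ALPHA = 1, CLS_LOWER = 2, CLS_UPPER = 4 };

/*
 * Hypothetical helper: returns a bitmask of the classes the wide
 * character `wc` belongs to, as reported by the C library.
 */
int wide_classes(wint_t wc)
{
    int flags = 0;

    if (iswalpha(wc))
        flags |= CLS_ALPHA;
    if (iswlower(wc))
        flags |= CLS_LOWER;
    if (iswupper(wc))
        flags |= CLS_UPPER;
    return flags;
}
```

Following the description above, wide_classes(L'\u00e4') would be
expected to report alpha and lower on Cygwin/newlib regardless of the
locale; other C libraries may answer differently until a suitable
locale has been selected with setlocale(3).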

The cygwin test release 3.5.0-0.174.gd6d4436145b8, which is currently
building, contains the new code.  Would you mind building a dash for
testing so we can see if and how it works?


Thanks,
Corinna

