| Date: | Mon, 1 Sep 2025 16:56:20 -0600 |
| Subject: | Re: bash 5.2.21-1: a bug in [0-9] expansion |
| To: | cygwin AT cygwin DOT com |
| Organization: | Systematic Software |
| From: | Brian Inglis via Cygwin <cygwin AT cygwin DOT com> |
On 2025-09-01 15:23, Sam Edge via Cygwin wrote:
> On 01/09/2025 18:19, Brian Inglis via Cygwin wrote:
> > On 2025-08-31 13:06, Mariusz Wodzicki via Cygwin wrote:
> >> Description of the problem.
> >> [0-9] picks also certain Unicode superscript characters ( namely, ⁰ ⁴ ⁵ ⁶
> >> ⁷ ⁸ ⁹ ), and every Unicode subscript character.
> >>
> >> Example: the directory has the following files:
> >> $ /bin/ls
> >> ₀.txt ₁.txt ₂.txt ₃.txt ₄.txt ₅.txt ₆.txt ₇.txt ₈.txt ₉.txt
> >> ⁰.txt ¹.txt ².txt ³.txt ⁴.txt ⁵.txt ⁶.txt ⁷.txt ⁸.txt ⁹.txt
> >>
> >> $ /bin/ls [0-9].txt
> >> ₀.txt ₁.txt ₃.txt ⁴.txt ⁵.txt ⁶.txt ⁷.txt ⁸.txt
> >> ⁰.txt ₂.txt ₄.txt ₅.txt ₆.txt ₇.txt ₈.txt
> >>
> >> $ locale
> >> LANG=en_US.UTF-8
> >> LC_CTYPE="en_US.UTF-8"
> >> LC_NUMERIC="en_US.UTF-8"
> >> LC_TIME="en_US.UTF-8"
> >> LC_COLLATE="en_US.UTF-8"
> >> LC_MONETARY="en_US.UTF-8"
> >> LC_MESSAGES="en_US.UTF-8"
> >> LC_ALL=
> >>
> >> System.
> >> Fully up to date Windows 11
> >> cygwin 3.6.4-1
> >> bash 5.2.21-1
> >
> > For reproducible results prefix commands with LC_ALL=C … or possibly just
> > LC_COLLATE=C or LC_CTYPE=C or =POSIX to standardize the locale, otherwise many
> > commands will respect the current locale, and some respect Unicode regardless
> > of locale e.g. `info wc`:
> >
> > "Unless the environment variable ‘POSIXLY_CORRECT’ is set, GNU ‘wc’ treats
> > the following Unicode characters as white space even if the current locale
> > does not: U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK
> > SPACE, and U+2060 WORD JOINER."
> >
> > For GNU utilities, where info pages are preferred, such as coreutils*,
> > compiler and language processors, and tools packages, many details do not
> > appear in the man pages, for example:
> >
> > "Full documentation <https://www.gnu.org/software/coreutils/wc> or available
> > locally via: info '(coreutils) wc invocation'"
> >
> > although `info wc` shows the same page.
> >
> > —————
> > * [ arch b2sum base32 base64 basename cat chcon chgrp chmod chown chroot
> > cksum comm cp csplit cut date dd df dir dircolors dirname du echo env expand
> > expr factor false fmt fold gkill groups head hostid id install join link ln
> > logname ls md5sum mkdir mkfifo mknod mktemp mv nice nl nohup nproc numfmt od
> > paste pathchk pinky pr printenv printf ptx pwd readlink realpath rm rmdir
> > runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum shred shuf sleep
> > sort split stat stdbuf stty sum sync tac tail tee test timeout touch tr true
> > truncate tsort tty uname unexpand uniq unlink users vdir wc who whoami yes
> >
>
> Bash is GNU but isn't part of coreutils as far as I know. Type 'man bash' and
> then read the 'Pattern Matching' section for its globbing behaviour.
Good point - must have needed brain food! ;^>
> TL;DR For bash 5.2, using 'export LC_ALL=C.UTF-8' as Brian suggests or 'export
> LC_COLLATE=C.UTF-8' or 'shopt -s globasciiranges' should revert to simple ASCII
> ranges for '[0-9]', '[a-z]' etc.
>
> I'm seeing the correct behaviour with up-to-date Cygwin bash/coreutils etc. by
> the way. 'echo [0-9]*' only expands out sub/super-digits if I use
> 'LC_COLLATE=en_GB.UTF-8' or similar with 'shopt -u globasciiranges'.
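What I find interesting is that the low-code superscripts ¹ (U+00B9), ² (U+00B2),
and ³ (U+00B3) are not matched by [0-9], and neither is ⁹ (U+2079); they are
matched only by the explicit code-point ranges shown below. The wider range
[\u2070-\u2089] also excludes more names than either narrower range, and the
class [[:digit:]] and the equivalence classes [[=0=]] etc. match nothing: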
$ echo ?.txt
₀.txt ⁰.txt ₁.txt ¹.txt ₂.txt ².txt ₃.txt ³.txt ₄.txt ⁴.txt ₅.txt ⁵.txt ₆.txt
⁶.txt ₇.txt ⁷.txt ₈.txt ⁸.txt ₉.txt ⁹.txt
$ echo [$'\u2070'-$'\u2079'].txt
⁰.txt ₁.txt ¹.txt ₂.txt ².txt ₃.txt ³.txt ₄.txt ⁴.txt ₅.txt ⁵.txt ₆.txt ⁶.txt
₇.txt ⁷.txt ₈.txt ⁸.txt ₉.txt ⁹.txt
$ echo [$'\u2080'-$'\u2089'].txt
₀.txt ⁰.txt ₁.txt ¹.txt ₂.txt ².txt ₃.txt ³.txt ₄.txt ⁴.txt ₅.txt ⁵.txt ₆.txt
⁶.txt ₇.txt ⁷.txt ₈.txt ⁸.txt ₉.txt
$ echo [$'\u2070'-$'\u2089'].txt
⁰.txt ₁.txt ¹.txt ₂.txt ².txt ₃.txt ³.txt ₄.txt ⁴.txt ₅.txt ⁵.txt ₆.txt ⁶.txt
₇.txt ⁷.txt ₈.txt ⁸.txt ₉.txt
$ echo [0-9].txt
₀.txt ⁰.txt ₁.txt ₂.txt ₃.txt ₄.txt ⁴.txt ₅.txt ⁵.txt ₆.txt ⁶.txt ₇.txt ⁷.txt
₈.txt ⁸.txt
$ echo [$'\u00b2'-$'\u00b9'].txt
¹.txt ².txt ³.txt
$ echo [$'\ub2'-$'\ub9'].txt
¹.txt ².txt ³.txt
$ echo [$'\ub2'$'\ub3'$'\ub9'].txt
¹.txt ².txt ³.txt
$ echo [[=0=][=1=][=2=][=3=]].txt
[[=0=][=1=][=2=][=3=]].txt
$ echo [[:digit:]].txt
[[:digit:]].txt
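
For comparison, here is a minimal sketch of the LC_ALL=C suggestion from earlier
in the thread: under the C locale the range [0-9] should match only the ASCII
digits. The helper name match_ascii_digits, the temporary directory, and the
sample file names are assumptions made up for illustration and are not part of
the original report. The non-ASCII names are written as UTF-8 byte escapes so
the script does not depend on the caller's locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): expand [0-9].txt inside the
# given directory with the locale forced to C, so the range is plain ASCII.
match_ascii_digits() {
    local dir=$1
    # A child bash started with LC_ALL=C does the globbing; nullglob makes
    # an unmatched pattern expand to nothing instead of to itself.
    ( cd "$dir" && LC_ALL=C bash -c 'shopt -s nullglob; echo [0-9].txt' )
}

demo_dir=$(mktemp -d)
# Two ASCII-digit names plus subscript zero (U+2080) and superscript four
# (U+2074), both spelled out as their UTF-8 bytes.
touch "$demo_dir/0.txt" "$demo_dir/5.txt" \
      "$demo_dir/"$'\xe2\x82\x80'.txt "$demo_dir/"$'\xe2\x81\xb4'.txt

match_ascii_digits "$demo_dir"   # prints: 0.txt 5.txt
rm -rf "$demo_dir"
```

The child shell keeps the locale change from leaking into the calling session.
The 'shopt -s globasciiranges' alternative mentioned above is not exercised
here, since its effect depends on which locales are installed.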
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher but when there is no more to cut
-- Antoine de Saint-Exupéry