Date: Sat, 23 Nov 2024 12:21:56 +0100
Message-ID: <CALXu0UfYmRP5yMG4J6znd4svqq1kbgEkpvHj-CWjB6APE8C3uw@mail.gmail.com>
Subject: Re: /bin/ls -l cannot handle printable Unicode characters outside the
 BMP ...
To: cygwin@cygwin.com
From: Cedric Blancher via Cygwin <cygwin@cygwin.com>
Reply-To: Cedric Blancher <cedric.blancher@gmail.com>

On Sat, 23 Nov 2024 at 11:44, Cedric Blancher <cedric.blancher@gmail.com> wrote:
>
> Good morning!
>
> /bin/ls -l cannot handle printable Unicode characters outside the BMP
>
> Example using '𝒯'
> bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
> (yes, our mathematicians want to use THAT as file name)
>
> On Linux:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> ls -la
> total 8
> -rw-r--r--  1 ced staden  0 Nov 23 11:29 ööööööö
> -rw-r--r--  2 ced staden  4 Nov 23 11:31 𝒯
> -rw-r--r--  2 ced staden  4 Nov 23 11:31 𝒯𝒯
>
> On Cygwin:
> LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
> $ ls -la
> -rw-r--r-- 1 ced staden  0 Nov 23 11:29  ööööööö
> -rw-r--r-- 2 ced staden  4 Nov 23 11:31 ''$'\360\235\222\257'
> -rw-r--r-- 2 ced staden  4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'
>
> Looks like the Cygwin locale has a problem with non-BMP chars.
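
For reference, the octal escapes in the Cygwin listing are the UTF-8
bytes of U+1D4AF, which suggests the name itself is stored intact and
only the display side treats it as unprintable. A minimal check,
assuming a POSIX printf and od (the function name utf8_hex is made up
for illustration):

```shell
#!/bin/sh
# Hypothetical helper: dump the bytes named by ls's octal escapes as hex.
utf8_hex() {
    # printf interprets \ooo octal escapes in its format string (POSIX).
    printf '\360\235\222\257' | od -An -tx1 | tr -d ' \n'
}

utf8_hex; echo   # prints f09d92af
```

F0 9D 92 AF is the four-byte UTF-8 encoding of U+1D4AF.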

find(1) is even worse:
$ find .
.
./ööööööö
./????
./x??x

Windows Explorer displays the file names correctly, so IMO this
is not a Windows or Win32 API problem.
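
Windows stores file names as UTF-16, where a character outside the BMP
takes two 16-bit code units (a surrogate pair) instead of one. Whether
that is what trips up the locale code here is only an assumption, not
something confirmed in this thread. A small sketch of the arithmetic,
with a made-up function name (surrogates):

```shell
#!/bin/sh
# Hypothetical helper: print the UTF-16 surrogate pair for a code point
# at or above U+10000 (no range checking is done).
surrogates() {
    cp=$(( $1 - 0x10000 ))            # 20-bit offset above the BMP
    hi=$(( 0xD800 + (cp >> 10) ))     # high surrogate: top 10 bits
    lo=$(( 0xDC00 + (cp & 0x3FF) ))   # low surrogate: bottom 10 bits
    printf '%04X %04X\n' "$hi" "$lo"
}

surrogates 0x1D4AF   # MATHEMATICAL SCRIPT CAPITAL T, prints D835 DCAF
```

Any code that handles one 16-bit unit at a time would see two
surrogates here rather than a single printable character.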

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur


