Mail Archives: cygwin/2023/04/19/07:58:20

MIME-Version: 1.0
Date: Wed, 19 Apr 2023 13:56:54 +0200
To: L A Walsh <cygwin AT tlinx DOT org>
Cc: cygwin AT cygwin DOT com
Subject: Re: Can not stat file with utf char U+F020
In-Reply-To: <643F3F87.2050403@tlinx.org>
References: <992b3c28d7f1cfc17f7c9bb47b53f770 AT assyoma DOT it>
<ZDmiyYS+m0x4QZmh AT calimero DOT vinschen DOT de>
<f4d0bd30-731a-fb5e-43d2-a86d1af761b6 AT Shaw DOT ca>
<dfbdeec869acd6ed1ad7b4ec803336f2 AT assyoma DOT it>
<ZDm20Rug7TswU4KI AT calimero DOT vinschen DOT de>
<1274a3199d9bedab4f15d209694c6e1f AT assyoma DOT it>
<de19dc1ecd891f748e5e4a317f347477 AT assyoma DOT it>
<ZD0L5VoqmTHmHBT9 AT calimero DOT vinschen DOT de>
<fc6d3aa5f3ae525a777e65d62ccbd6c2 AT assyoma DOT it>
<1a7db5a68644e5b66634d5af9b402caf AT assyoma DOT it> <643F3F87 DOT 2050403 AT tlinx DOT org>
Message-ID: <9f1593d259faf7f845b96947eaff8619@assyoma.it>
X-Sender: g DOT danti AT assyoma DOT it
X-BeenThere: cygwin AT cygwin DOT com
X-Mailman-Version: 2.1.29
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Gionatan Danti via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Gionatan Danti <g DOT danti AT assyoma DOT it>
Errors-To: cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>

On 2023-04-19 03:10, L A Walsh wrote:
> I'm a bit confused as to what char you are trying to access/use, as
> U+F020 is in the Private Use area (PUA)
> 
> Since it's in the PUA, it seems its meaning could differ by
> application/OS/User, no?
> I.e. have no set definition
> 
> I mean you can use it in Cygwin to represent some character not
> usually permitted in
> a DOS/Win filename (like :/\, etc.), but it wouldn't have the same 
> meaning then
> in Windows though.?  Isn't Private Use area application specific so an
> application can
> create and use its own symbol set -- even though it wouldn't be
> portable to another application.

The issue is with any client or application (Cygwin included) that
creates a filename ending with a dot (or certain other chars), which
is then replaced with U+F020. If the file is later renamed by adding
other characters *after* the replaced dot, it becomes unreadable by
Cygwin.

Something like this:
- a user creates a file named "project.", forgetting the extension, on
a Windows share;
- the client replaces the dot with U+F020;
- at this point all is good: the file can be read by the client,
Windows and Cygwin;
- the user notices the missing extension and renames the file to
"project.txt";
- Cygwin now does *not* translate U+F020 back to a dot and is unable to
read the file.
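
The sequence above can be sketched in a few lines of Python. This is
only a toy model of the behaviour as I described it, not Cygwin's or
any SMB client's real code: the function names are hypothetical, and
the only value taken from this thread is the code point U+F020.

```python
# Toy model of the rename scenario; hypothetical helpers, not real Cygwin code.
PUA_CHAR = "\uf020"  # code point reported in this thread for the replaced dot


def client_encode(name: str) -> str:
    """Assumed client behaviour: hide a trailing dot behind the PUA char."""
    if name.endswith("."):
        return name[:-1] + PUA_CHAR
    return name


def trailing_only_decode(stored: str) -> str:
    """Assumed reader behaviour: map the PUA char back only at the very end."""
    if stored.endswith(PUA_CHAR):
        return stored[:-1] + "."
    return stored


stored = client_encode("project.")            # extension forgotten
readable = trailing_only_decode(stored)       # maps back to "project."

renamed = stored + "txt"                      # extension appended after the PUA char
unreadable = trailing_only_decode(renamed)    # PUA char is no longer trailing

print(repr(readable))
print(repr(unreadable))
```

In this model the second name keeps the raw PUA character in the
middle, which is the situation where the file can no longer be read.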

> I think characters in the PUA range are used to allow Cygwin filenames
> to contain colon, slashes
> and quotes -- so one wouldn't want Windows to understand the cygwin
> intent or it would defeat
> the purpose of using custom characters to represent filenames that are
> legal under POSIX but not
> under Windows.

True, but dots and spaces are somewhat different from the other
reserved chars. While backslashes, colons, etc. are rejected by NTFS
itself (or by a lower-layer API), trailing dots and spaces are
ignored/stripped by Win32. This means that Linux clients accessing an
SMB share *can* successfully create such filenames without any issue
and without replacing them with PUA chars.

For example, I created a file called "zzz." from a Linux+Mate client.
Cygwin correctly sees the filename as:
$ ls "zzz." | od -x --endian=big
0000000 7a7a 7a2e 0a00

True, Windows cannot access this file, but this is fine because such a
filename should never be understood by Windows. Being unable to open
the file from Windows, its users will find and correct the issue
themselves by renaming the file.

As things are now, we have the opposite issue: should (for whatever
reason) a file exist with a name such as "zzz[U+F020]txt", Cygwin will
not be able to access it. This means that anyone using Cygwin+rsync to
back up a Windows server will now have an inaccessible file that is
impossible to back up.
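
As a stopgap, such names could be located before a backup run. The
following is a minimal sketch, assuming it runs in an environment that
sees the raw names (for example native Windows Python or a Linux
client, not necessarily Cygwin itself). The range U+F000..U+F0FF is the
one the Cygwin documentation gives for its filename transposition, as
far as I know; the function names are hypothetical.

```python
import os
from typing import Iterator

# Assumed range used for transposed filename characters.
PUA_START, PUA_END = 0xF000, 0xF0FF


def has_pua_char(name: str) -> bool:
    """Return True if any character of the name falls in the assumed PUA range."""
    return any(PUA_START <= ord(ch) <= PUA_END for ch in name)


def find_pua_names(root: str) -> Iterator[str]:
    """Yield paths under root whose last component contains a PUA character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_pua_char(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # "." is only a placeholder; point it at the share to be backed up.
    for path in find_pua_names("."):
        print(ascii(path))  # ascii() makes the otherwise invisible char visible
```

The output is only a list of candidates: a name flagged here may still
be perfectly readable if the PUA character is the trailing one.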

Thinking about that: how would you feel about having an option to
exclude trailing dots and spaces from PUA translation (effectively
reverting them to the status of "normal" characters)?

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g DOT danti AT assyoma DOT it - info AT assyoma DOT it
GPG public key ID: FF5F32A8

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
