Mail Archives: cygwin/2025/06/29/13:48:32

Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
Message-ID: <2ff83e59-9374-a04a-36fb-e51e5dd5f6b7@t-online.de>
Date: Sun, 29 Jun 2025 19:47:29 +0200
From: Christian Franke via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Christian Franke <Christian DOT Franke AT t-online DOT de>

Christian Franke wrote:
> Corinna Vinschen via Cygwin wrote:
>> On Jun 27 15:32, Christian Franke via Cygwin wrote:
>>> $ touch $'t-\xef\x80\x80'
>>> The name mapping is:
>>> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
>> Did you copy/paste this from the old mail, by any chance?
>
> Sorry, I accidentally mixed two cases with same readdir() result:
>
> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
> "t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
>
> $ touch $'t-\xed\xad\x99'
> $ touch $'t-\xef\x80\x80'
> $ ls | uniq -c
>       2 t-
>
> This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.
> ...
>> ...
>> I'll apply the patch shortly.
>
> $ touch $'t-\xed\xad\x90'
> $ touch $'t-\xed\xad\x91'
> $ touch $'t-\xed\xad\x92'
> $ touch $'t-\xed\xad\x93'
> $ touch $'t-\xed\xad\x94'
> $ ls | uniq -c
>       5 t-
>
> $ ls -s
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> total 0
> ? t-  ? t-  ? t-  ? t-  ? t-
>
> All results found by several runs with different seeds of the attached 
> test program have in common that the Windows path name contains an 
> invalid word in UTF-16 High Surrogate range:
>
> $ ./randnames 42
> $'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
> $'t-\xED\xA4\xA8\x80\xE0': original path
> L"t-\xD928\xF080\xF0E0": Windows path
>
> $'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
> $'t-\xED\xA6\xB0\xE1': original path
> L"t-\xD9B0\xF0E1": Windows path
> ...
> $'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
> $'t-\xED\xA2\x96\xB3': original path
> L"t-\xD896\xF0B3": Windows path
>
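
The observation quoted above (every failing Windows path contains a UTF-16 high surrogate that is not part of a valid pair) can be checked mechanically. The following is a minimal sketch, not part of the attached test program; the helper name unpaired_high_surrogates is hypothetical, and the code unit lists are transcribed by hand from the three L"..." paths shown above.

```python
def unpaired_high_surrogates(units):
    """Return the indexes of UTF-16 high surrogates (0xD800..0xDBFF)
    that are not immediately followed by a low surrogate (0xDC00..0xDFFF)."""
    lone = []
    for i, unit in enumerate(units):
        if 0xD800 <= unit <= 0xDBFF:
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt is None or not (0xDC00 <= nxt <= 0xDFFF):
                lone.append(i)
    return lone


# Windows paths from the randnames output, as UTF-16 code units.
# "t-" is 0x74, 0x2D; the remaining values are copied from the L"..." lines.
paths = [
    [0x74, 0x2D, 0xD928, 0xF080, 0xF0E0],
    [0x74, 0x2D, 0xD9B0, 0xF0E1],
    [0x74, 0x2D, 0xD896, 0xF0B3],
]

for units in paths:
    # Print the offending code units of each path in hex.
    print([hex(units[i]) for i in unpaired_high_surrogates(units)])
```

Each of the three paths yields exactly one unpaired high surrogate at index 2, directly after "t-".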

A closer look reveals two problems:

1.) A lone high surrogate is not encoded correctly. This could be fixed 
with this patch:
https://cygwin.com/pipermail/cygwin-patches/2025q2/014001.html

2.) A high surrogate at the very end of the string is not encoded at 
all. A fix would require enhancing the interface between __*_wctomb() 
and the outer functions: the outer loop would need to call the function 
again after L'\0' has been reached.
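
To illustrate why problem 2 needs an extra call, here is a toy model of a per-code-unit converter that keeps a pending high surrogate as state. This is only a sketch and not the Cygwin __*_wctomb() code: the names encode_cp, UnitEncoder and convert are hypothetical, and emitting a lone surrogate as a 3-byte sequence is an assumption chosen to match the original paths shown earlier in this thread.

```python
def encode_cp(cp):
    """Encode one code point with the generic UTF-8 bit layout.
    Surrogates are deliberately not rejected (assumption of this sketch)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


class UnitEncoder:
    """Converts one UTF-16 code unit per call; a high surrogate is held
    back until the next unit shows whether it is part of a pair."""

    def __init__(self):
        self.pending = None

    def feed(self, unit):
        out = b""
        if self.pending is not None:
            high, self.pending = self.pending, None
            if 0xDC00 <= unit <= 0xDFFF:
                # Valid pair: combine into one supplementary code point.
                cp = 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)
                return encode_cp(cp)
            out = encode_cp(high)  # lone high surrogate, emitted on its own
        if 0xD800 <= unit <= 0xDBFF:
            self.pending = unit    # wait for the next unit
            return out
        return out + encode_cp(unit)

    def flush(self):
        """The extra call at end of string: emit a held-back high surrogate."""
        if self.pending is None:
            return b""
        high, self.pending = self.pending, None
        return encode_cp(high)


def convert(units, flush_at_end):
    enc = UnitEncoder()
    out = b"".join(enc.feed(u) for u in units)
    if flush_at_end:
        out += enc.flush()
    return out


name = [0x74, 0x2D, 0xDB59]      # L"t-\xDB59"
print(convert(name, False))      # trailing surrogate is silently dropped
print(convert(name, True))       # trailing surrogate survives
```

Without the final flush() the model returns just "t-", which is the same shape as the readdir() result reported above; with it, the trailing surrogate is emitted as \xED\xAD\x99.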

BTW: if the file name consists only of a single high surrogate, an 
interesting corner case of readdir() is visible:

$ echo foo >$'\uD876' # Windows name: L"\xD876"
$ cat $'\uD876'
foo
$ ls
$ ls -a | uniq -c
       1 .
       2 ..

-- 
Regards,
Christian

