Subject: | Re: readdir() returns inaccessible name if file was created with |
invalid UTF-8 | |
To: | cygwin AT cygwin DOT com |
References: | <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de> |
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de> | |
<aFxRfI4NdZ8y5IlK AT calimero DOT vinschen DOT de> | |
<f78c615c-aefe-b3d0-aada-5f9d0cf73a0a AT t-online DOT de> | |
<aF5y15iQ840LxLYJ AT calimero DOT vinschen DOT de> | |
<3295c8bd-2c09-76c7-8b5f-0106dc39dd96 AT t-online DOT de> | |
<aF6x55WXIS1t655i AT calimero DOT vinschen DOT de> | |
<5fae4fcc-6847-ab19-b487-3a28c76d96e4 AT t-online DOT de> | |
Message-ID: | <2ff83e59-9374-a04a-36fb-e51e5dd5f6b7@t-online.de> |
Date: | Sun, 29 Jun 2025 19:47:29 +0200 |
User-Agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 |
SeaMonkey/2.53.20 | |
MIME-Version: | 1.0 |
In-Reply-To: | <5fae4fcc-6847-ab19-b487-3a28c76d96e4@t-online.de> |
List-Id: | General Cygwin discussions and problem reports <cygwin.cygwin.com> |
List-Unsubscribe: | <https://cygwin.com/mailman/options/cygwin>, |
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe> | |
List-Archive: | <https://cygwin.com/pipermail/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-request AT cygwin DOT com?subject=help> |
List-Subscribe: | <https://cygwin.com/mailman/listinfo/cygwin>, |
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe> | |
From: | Christian Franke via Cygwin <cygwin AT cygwin DOT com> |
Reply-To: | cygwin AT cygwin DOT com |
Cc: | Christian Franke <Christian DOT Franke AT t-online DOT de> |
Errors-To: | cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com |
Sender: | "Cygwin" <cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com> |
X-MIME-Autoconverted: | from base64 to 8bit by delorie.com id 55THmVhd2416418 |
Christian Franke wrote:
> Corinna Vinschen via Cygwin wrote:
>> On Jun 27 15:32, Christian Franke via Cygwin wrote:
>>> $ touch $'t-\xef\x80\x80'
>>> The name mapping is:
>>> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
>> Did you copy/paste this from the old mail, by any chance?
>
> Sorry, I accidentally mixed two cases with the same readdir() result:
>
> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
> "t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
>
> $ touch $'t-\xed\xad\x99'
> $ touch $'t-\xef\x80\x80'
> $ ls | uniq -c
>      2 t-
>
> This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.
> ...
>> ...
>> I'll apply the patch shortly.
>
> $ touch $'t-\xed\xad\x90'
> $ touch $'t-\xed\xad\x91'
> $ touch $'t-\xed\xad\x92'
> $ touch $'t-\xed\xad\x93'
> $ touch $'t-\xed\xad\x94'
> $ ls | uniq -c
>      5 t-
>
> $ ls -s
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> ls: cannot access 't-': No such file or directory
> total 0
> ? t- ? t- ? t- ? t- ? t-
>
> All results found by several runs with different seeds of the attached
> test program have in common that the Windows path name contains an
> invalid word in the UTF-16 High Surrogate range:
>
> $ ./randnames 42
> $'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
> $'t-\xED\xA4\xA8\x80\xE0': original path
> L"t-\xD928\xF080\xF0E0": Windows path
>
> $'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
> $'t-\xED\xA6\xB0\xE1': original path
> L"t-\xD9B0\xF0E1": Windows path
> ...
> $'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
> $'t-\xED\xA2\x96\xB3': original path
> L"t-\xD896\xF0B3": Windows path
>

A closer look reveals two problems:

1.) A lone high surrogate is not encoded correctly.
Could be fixed with this patch:
https://cygwin.com/pipermail/cygwin-patches/2025q2/014001.html

2.) A high surrogate at the very end of the string is not encoded at all.
A fix would require enhancing the interface between __*_wctomb() and
the outer functions. The outer loop would need to call the function
again after the terminating L'\0' has been processed.

BTW: if the file name consists only of a single high surrogate, an
interesting corner case of readdir() is visible:

$ echo foo >$'\uD876' # Windows name: L"\xD876"
$ cat $'\uD876'
foo
$ ls
$ ls -a | uniq -c
     1 .
     2 ..

-- 
Regards,
Christian

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
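
Illustration (not part of the original message): the "original path" to
"Windows path" pairs quoted above can be reproduced with a small Python
sketch. It is not Cygwin's conversion code. It assumes two rules read off
the examples in the message: a three-byte sequence that encodes a surrogate
(such as \xED\xAD\x99) is passed through as that UTF-16 word, and a byte
that does not start a decodable UTF-8 sequence becomes 0xF000 plus the byte
value (\x80 becomes \xF080). The function names are hypothetical.

```python
def utf8_seq_len(lead: int) -> int:
    """Length of the UTF-8 sequence started by `lead`, or 0 if it is no lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def to_utf16_units(name: bytes) -> list[int]:
    """Map a file name to UTF-16 code units under the assumptions stated above."""
    units = []
    i = 0
    while i < len(name):
        n = utf8_seq_len(name[i])
        chunk = name[i:i + n]
        try:
            if n == 0 or len(chunk) < n:
                raise ValueError("stray or truncated byte")
            # 'surrogatepass' lets ED A0..BF xx decode to a lone surrogate.
            cp = ord(chunk.decode("utf-8", "surrogatepass"))
        except ValueError:
            # Assumed fallback: undecodable byte -> private-use word 0xF0xx.
            units.append(0xF000 + name[i])
            i += 1
            continue
        if cp >= 0x10000:
            # Supplementary code point: split into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
        i += n
    return units


def has_lone_high_surrogate(units: list[int]) -> bool:
    """True if a high surrogate is not directly followed by a low surrogate."""
    for k, u in enumerate(units):
        if 0xD800 <= u <= 0xDBFF:
            nxt = units[k + 1] if k + 1 < len(units) else None
            if nxt is None or not 0xDC00 <= nxt <= 0xDFFF:
                return True
    return False


if __name__ == "__main__":
    for raw in (b"t-\xef\x80\x80", b"t-\xed\xad\x99", b"t-\xed\xa4\xa8\x80\xe0"):
        units = to_utf16_units(raw)
        print(raw, [f"{u:04X}" for u in units], has_lone_high_surrogate(units))
```

Under these assumptions every failing name from the randnames output maps
to a word sequence containing a lone high surrogate, while
"t-\xEF\x80\x80" (Windows name L"t-\xF000") does not.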