Mail Archives: cygwin/2025/06/28/06:19:36

Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>
<aFxRfI4NdZ8y5IlK AT calimero DOT vinschen DOT de>
<f78c615c-aefe-b3d0-aada-5f9d0cf73a0a AT t-online DOT de>
<aF5y15iQ840LxLYJ AT calimero DOT vinschen DOT de>
<3295c8bd-2c09-76c7-8b5f-0106dc39dd96 AT t-online DOT de>
<aF6x55WXIS1t655i AT calimero DOT vinschen DOT de>
Message-ID: <5fae4fcc-6847-ab19-b487-3a28c76d96e4@t-online.de>
Date: Sat, 28 Jun 2025 12:18:57 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101
SeaMonkey/2.53.20
MIME-Version: 1.0
In-Reply-To: <aF6x55WXIS1t655i@calimero.vinschen.de>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Christian Franke via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Christian Franke <Christian DOT Franke AT t-online DOT de>
Errors-To: cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces~archive-cygwin=delorie DOT com AT cygwin DOT com>

This is a multi-part message in MIME format.
--------------B19B623E207A25A37D693D40
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Corinna Vinschen via Cygwin wrote:
> On Jun 27 15:32, Christian Franke via Cygwin wrote:
>> $ touch $'t-\xef\x80\x80'
>> The name mapping is:
>> "t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
> Did you copy/paste this from the old mail, by any chance?

Sorry, I accidentally mixed up two cases that produce the same readdir()
result:

"t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
"t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"

$ touch $'t-\xed\xad\x99'
$ touch $'t-\xef\x80\x80'
$ ls | uniq -c
       2 t-

This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.
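
For reference, the two 16-bit values in the mapping above follow directly
from the 3-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx). The
following is a minimal sketch, not code from the Cygwin DLL: the function
name decode_utf8_3 is made up for illustration, and it deliberately skips
the surrogate check that a strict UTF-8 decoder would apply.

```c
#include <stdint.h>

/* Decode one 3-byte UTF-8 sequence into a 16-bit value.
   Hypothetical helper for illustration only.  It does NOT reject the
   surrogate range 0xD800..0xDFFF, which strict UTF-8 forbids.
   Returns 0 on success, -1 if the bytes do not have the 3-byte layout. */
int decode_utf8_3(const unsigned char *s, uint16_t *out)
{
  if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80
      || (s[2] & 0xC0) != 0x80)
    return -1;
  *out = (uint16_t)(((s[0] & 0x0F) << 12)    /* top 4 bits    */
                    | ((s[1] & 0x3F) << 6)   /* middle 6 bits */
                    | (s[2] & 0x3F));        /* low 6 bits    */
  return 0;
}
```

Fed the bytes ED AD 99 this yields 0xDB59 (a high surrogate), and fed
EF 80 80 it yields 0xF000 (the first value of the private use range
discussed below).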


> Using the latest test DLL the mapping is
>
>    "t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000"
>
> And that's basically correct, albeit it leads to problems.
>
> You know that we defined the area from 0xf000 to 0xf0ff as our private
> use area to create filenames with characters invalid in DOS filenames
> by transposing these chars into the private use area.  When converting
> the filenames back, the 0xf0XX chars are transposed back to 0xXX.

Yes.
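
The transposition described in the quoted text can be sketched as below.
This is an illustration under assumptions, not the DLL's code: the
function names are made up, and the set of transposed characters in
needs_transpose() is a guess at "characters invalid in DOS filenames";
the real table may differ.

```c
#include <stdint.h>
#include <string.h>

#define PUA_BASE 0xF000u  /* start of the 0xf000..0xf0ff range from the mail */

/* ASSUMPTION: an illustrative set of characters that are invalid in DOS
   filenames.  The actual table in the Cygwin DLL may differ. */
int needs_transpose(unsigned char c)
{
  return (c >= 1 && c < 0x20)
         || (c != 0 && strchr("\"*:<>?|", c) != NULL);
}

/* POSIX byte -> UTF-16 code unit used for the Windows name. */
uint16_t to_windows(unsigned char c)
{
  return needs_transpose(c) ? (uint16_t)(PUA_BASE + c) : (uint16_t)c;
}

/* UTF-16 code unit from the Windows name -> POSIX value:
   0xf0XX is transposed back to 0xXX, everything else is unchanged. */
uint16_t from_windows(uint16_t w)
{
  return (w >= 0xF000 && w <= 0xF0FF) ? (uint16_t)(w - PUA_BASE) : w;
}
```

With this sketch, ':' (0x3A) is stored as 0xF03A and read back as ':',
while an ordinary letter passes through unchanged in both directions.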


> But yeah, I found the bug here.  The problem is that the transpose table
> incorrectly contains NUL as transposable character.  So if you create
> L"t-\xF000", that's fine.  However, when converting this name back to
> UTF-8, the filename becomes L"t-\0".  Oops.
>
> I dropped the ASCII NUL from the list of transposable characters and
> now what you get is this:
>
>    $ touch $'t-\xef\x80\x80'
>    $ touch $'t-\xef\x80\x81'
>    $ ls -l
>    total 0
>    -rw-r--r-- 1 corinna vinschen 0 Jun 27 16:49 't-'$'\001'
>    -rw-r--r-- 1 corinna vinschen 0 Jun 27 16:49 't-'$'\357\200\200'
>
> Apart from the incorrect transposition of ASCII NUL, the transposition
> works transparently:
>
>    $ echo foo > $'t-\xef\x80\x81'
>    $ cat $'t-\xef\x80\x81'
>    foo
>    $ cat $'t-\x01'
>    foo
>
> I'll apply the patch shortly.
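
To make the effect of the NUL entry concrete, here is a small sketch of
the back-conversion with and without 0xF000 in the transposable range.
It is a simplified model, not the actual patch: name_to_posix and its
keep_f000 flag are hypothetical, surrogates are not handled, and the
whole range 0xF001..0xF0FF is assumed to be transposed back.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 name (BMP only, no surrogate handling)
   to bytes.  Code units in 0xF000..0xF0FF become the single byte 0xXX.
   If keep_f000 is nonzero, 0xF000 is exempt from that rule and is
   encoded as ordinary 3-byte UTF-8 instead.
   Returns the number of bytes written, not counting the terminator. */
size_t name_to_posix(const uint16_t *w, char *out, int keep_f000)
{
  size_t n = 0;
  for (; *w; w++) {
    uint16_t c = *w;
    if (c >= 0xF000 && c <= 0xF0FF && !(keep_f000 && c == 0xF000)) {
      out[n++] = (char)(c & 0xFF);        /* transposed back; may be NUL */
    } else if (c < 0x80) {
      out[n++] = (char)c;
    } else if (c < 0x800) {
      out[n++] = (char)(0xC0 | (c >> 6));
      out[n++] = (char)(0x80 | (c & 0x3F));
    } else {
      out[n++] = (char)(0xE0 | (c >> 12));
      out[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
      out[n++] = (char)(0x80 | (c & 0x3F));
    }
  }
  out[n] = '\0';
  return n;
}
```

For L"t-\xF000" the old behaviour (keep_f000 == 0) writes 't', '-', NUL,
so any C string function sees just "t-". With keep_f000 != 0 the result
is "t-\xEF\x80\x80", matching the 't-'$'\357\200\200' entry in the ls
output above.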

$ touch $'t-\xed\xad\x90'
$ touch $'t-\xed\xad\x91'
$ touch $'t-\xed\xad\x92'
$ touch $'t-\xed\xad\x93'
$ touch $'t-\xed\xad\x94'
$ ls | uniq -c
       5 t-

$ ls -s
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
total 0
? t-  ? t-  ? t-  ? t-  ? t-

Every failure found over several runs of the attached test program with
different seeds has one thing in common: the Windows path name contains
an invalid (unpaired) code unit in the UTF-16 high surrogate range:

$ ./randnames 42
$'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
$'t-\xED\xA4\xA8\x80\xE0': original path
L"t-\xD928\xF080\xF0E0": Windows path

$'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
$'t-\xED\xA6\xB0\xE1': original path
L"t-\xD9B0\xF0E1": Windows path
...
$'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
$'t-\xED\xA2\x96\xB3': original path
L"t-\xD896\xF0B3": Windows path
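
The common property of these Windows paths can be checked mechanically.
The sketch below is a standalone helper written for this illustration
(it is not part of randnames.c, and the name first_unpaired_surrogate is
made up): it scans a UTF-16 string and reports the first surrogate that
is not part of a valid high/low pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first unpaired surrogate in a NUL-terminated
   UTF-16 string, or -1 if every surrogate is part of a valid pair. */
long first_unpaired_surrogate(const uint16_t *w)
{
  for (size_t i = 0; w[i]; i++) {
    if (w[i] >= 0xD800 && w[i] <= 0xDBFF) {            /* high surrogate */
      if (w[i + 1] >= 0xDC00 && w[i + 1] <= 0xDFFF)
        i++;                                           /* valid pair: skip low */
      else
        return (long)i;                                /* high without low */
    } else if (w[i] >= 0xDC00 && w[i] <= 0xDFFF) {
      return (long)i;                                  /* low without high */
    }
  }
  return -1;
}
```

For L"t-\xD928\xF080\xF0E0" this returns 2: 0xD928 is a high surrogate,
but the following 0xF080 is not in the low surrogate range
0xDC00..0xDFFF.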


-- 
Thanks,
Christian



--------------B19B623E207A25A37D693D40
Content-Type: text/plain; charset=UTF-8;
 name="randnames.c"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="randnames.c"

I2luY2x1ZGUgPGRpcmVudC5oPg0KI2luY2x1ZGUgPGVycm5vLmg+DQojaW5jbHVkZSA8ZmNu
dGwuaD4NCiNpbmNsdWRlIDxzdGRpby5oPg0KI2luY2x1ZGUgPHN0ZGxpYi5oPg0KI2luY2x1
ZGUgPHN0cmluZy5oPg0KI2luY2x1ZGUgPHVuaXN0ZC5oPg0KI2luY2x1ZGUgPHdjaGFyLmg+
DQojaW5jbHVkZSA8d2luZG93cy5oPg0KDQpzdGF0aWMgdm9pZCBwcmludF9jKEZJTEUgKiBm
LCBjb25zdCBjaGFyICogcykNCnsNCiAgZnB1dHMoIiQnIiwgZik7DQogIGNoYXIgYzsNCiAg
Zm9yIChpbnQgaSA9IDA7IChjID0gc1tpXSk7IGkrKykgew0KICAgIGlmIChjID09ICdcJycp
DQogICAgICBmcHV0cygiJ1xcJyQnIiwgZik7DQogICAgZWxzZSBpZiAoJyAnIDw9IGMgJiYg
YyA8PSAnficpDQogICAgICBmcHV0YyhjLCBmKTsNCiAgICBlbHNlDQogICAgICBmcHJpbnRm
KGYsICJcXHglMDJYIiwgYyAmIDB4ZmYpOw0KICB9DQogIGZwdXRjKCdcJycsIGYpOw0KfQ0K
DQpzdGF0aWMgdm9pZCBwcmludF93KEZJTEUgKiBmLCBjb25zdCB3Y2hhcl90ICogcykNCnsN
CiAgZnB1dHMoIkxcIiIsIGYpOw0KICB3Y2hhcl90IGM7DQogIGZvciAoaW50IGkgPSAwOyAo
YyA9IHNbaV0pOyBpKyspIHsNCiAgICBpZiAoYyA9PSBMJyInIHx8IGMgPT0gTCdcXCcpDQog
ICAgICBmcHJpbnRmKGYsICJcXCVjIiwgYyk7DQogICAgZWxzZSBpZiAoTCcgJyA8PSBjICYm
IGMgPD0gTCd+JykNCiAgICAgIGZwdXRjKGMsIGYpOw0KICAgIGVsc2UNCiAgICAgIGZwcmlu
dGYoZiwgIlxceCUwNFgiLCBjICYgMHhmZmZmKTsNCiAgfQ0KICBmcHV0YygnIicsIGYpOw0K
fQ0KDQpzdGF0aWMgdm9pZCBnZXRfd2lubmFtZSh3Y2hhcl90ICogbmFtZSkNCnsNCiAgV0lO
MzJfRklORF9EQVRBVyBlOw0KICBIQU5ETEUgaCA9IEZpbmRGaXJzdEZpbGVXKEwiKiIsICZl
KTsNCiAgaWYgKGggPT0gSU5WQUxJRF9IQU5ETEVfVkFMVUUpIHsNCiAgICBmcHJpbnRmKHN0
ZGVyciwgIkZpbmRGaXJzdEZpbGVXKCk6IEVycm9yPSV1XG4iLCBHZXRMYXN0RXJyb3IoKSk7
DQogICAgZXhpdCgxKTsNCiAgfQ0KICBpbnQgaSA9IDA7DQogIGRvIHsNCiAgICBpZiAoIXdj
c2NtcChlLmNGaWxlTmFtZSwgTCIuIikgfHwgIXdjc2NtcChlLmNGaWxlTmFtZSwgTCIuLiIp
KQ0KICAgICAgY29udGludWU7DQogICAgd2NzY3B5KG5hbWUsIGUuY0ZpbGVOYW1lKTsNCiAg
ICBpKys7DQogIH0gd2hpbGUgKEZpbmROZXh0RmlsZVcoaCwgJmUpKTsNCiAgRmluZENsb3Nl
KGgpOw0KICBpZiAoaSAhPSAxKSB7DQogICAgZnByaW50ZihzdGRlcnIsICJFcnJvcjogJWQg
V2luMzIgZmlsZXMgZm91bmRcbiIsIGkpOw0KICAgIGV4aXQoMSk7DQogIH0NCn0NCg0Kc3Rh
dGljIHZvaWQgZ2V0X2N5Z25hbWUoY2hhciAqIG5hbWUpDQp7DQogIERJUiAqIGQgPSBvcGVu
ZGlyKCIuIik7IA0KICBpZiAoIWQpIHsNCiAgICBwZXJyb3IoIm9wZW5kaXIiKTsNCiAgICBl
eGl0KDEpOw0KICB9DQogIGludCBpID0gMDsNCiAgY29uc3Qgc3RydWN0IGRpcmVudCAqIGU7
DQogIHdoaWxlICgoZSA9IHJlYWRkaXIoZCkpKSB7DQogICAgaWYgKCFzdHJjbXAoZS0+ZF9u
YW1lLCAiLiIpIHx8ICFzdHJjbXAoZS0+ZF9uYW1lLCAiLi4iKSkNCiAgICAgIGNvbnRpbnVl
Ow0KICAgIHN0cmNweShuYW1lLCBlLT5kX25hbWUpOw0KICAgIGkrKzsNCiAgfQ0KICBjbG9z
ZWRpcihkKTsNCiAgaWYgKGkgIT0gMSkgew0KICAgIGZwcmludGYoc3RkZXJyLCAiRXJyb3I6
ICVkIEN5Z3dpbiBmaWxlcyBmb3VuZFxuIiwgaSk7DQogICAgZXhpdCgxKTsNCiAgfQ0KfQ0K
DQpzdGF0aWMgdm9pZCByYW5kbmFtZShjaGFyICogbmFtZSwgaW50IG1heGxlbikNCnsNCiAg
aW50IGxlbiA9IDEgKyByYW5kKCkgJSAobWF4bGVuICsgMSAtIDEpOw0KICBmb3IgKGludCBp
ID0gMDsgaSA8IGxlbjsgaSsrKSB7DQogICAgY2hhciBjID0gMSArIHJhbmQoKSAlICgyNTYg
LSAyIC0gMSk7DQogICAgaWYgKGMgPj0gJy8nKQ0KICAgICAgYysrOw0KICAgIGlmIChjID49
ICdcXCcpDQogICAgICBjKys7DQogICAgbmFtZVtpXSA9IGM7DQogIH0NCiAgbmFtZVtsZW5d
ID0gMDsNCn0NCg0Kc3RhdGljIGludCB0ZXN0bmFtZShjb25zdCBjaGFyICogbmFtZSkNCnsN
CiAgaW50IGZkID0gb3BlbihuYW1lLCBPX1dST05MWXxPX0NSRUFULCAwNjQ0KTsNCiAgaWYg
KGZkIDwgMCkgew0KICAgIHByaW50X2Moc3Rkb3V0LCBuYW1lKTsgcHJpbnRmKCI6IG9wZW4o
KSBmYWlsZWQsIGVycm5vPSVkXG4iLCBlcnJubyk7DQogICAgZXhpdCgxKTsNCiAgfQ0KICBj
bG9zZShmZCk7DQoNCiAgY2hhciBjeWduYW1lW01BWF9QQVRIXTsNCiAgZ2V0X2N5Z25hbWUo
Y3lnbmFtZSk7DQogIHdjaGFyX3Qgd2lubmFtZVtNQVhfUEFUSF07DQogIGdldF93aW5uYW1l
KHdpbm5hbWUpOw0KDQogIGludCByYyA9IDE7DQogIGlmIChhY2Nlc3MoY3lnbmFtZSwgMCkp
IHsNCiAgICBwcmludF9jKHN0ZG91dCwgY3lnbmFtZSk7IHByaW50ZigiOiBhY2Nlc3MoKSBm
YWlsZWQsIGVycm5vPSVkOlxuIiwgZXJybm8pOw0KICAgIHByaW50X2Moc3Rkb3V0LCBuYW1l
KTsgcHJpbnRmKCI6IG9yaWdpbmFsIHBhdGhcbiIpOyANCiAgICBwcmludF93KHN0ZG91dCwg
d2lubmFtZSk7IHByaW50ZigiOiBXaW5kb3dzIHBhdGhcblxuIik7DQogICAgcmMgPSAwOw0K
ICB9DQoNCiAgaWYgKHVubGluayhuYW1lKSkgew0KICAgIHByaW50X2Moc3Rkb3V0LCBuYW1l
KTsgcHJpbnRmKCI6IHVubGluaygpIGZhaWxlZCwgZXJybm89JWRcbiIsIGVycm5vKTsNCiAg
ICBwcmludF93KHN0ZG91dCwgd2lubmFtZSk7IHByaW50ZigiOiBXaW5kb3dzIHBhdGhcbiIp
Ow0KICAgIGV4aXQoMSk7DQogIH0NCiAgcmV0dXJuIHJjOw0KfQ0KDQppbnQgbWFpbihpbnQg
YXJnYywgY2hhciAqKmFyZ3YpDQp7DQogIGlmIChhcmdjID4gMSkNCiAgICBzcmFuZChhdG9p
KGFyZ3ZbMV0pKTsNCg0KICBjb25zdCBjaGFyICogZGlyID0gInRlc3QudG1wIjsNCiAgcm1k
aXIoZGlyKTsNCiAgaWYgKG1rZGlyKGRpciwgMDc1NSkpIHsNCiAgICBwZXJyb3IoZGlyKTsg
cmV0dXJuIDE7DQogIH0NCiAgaWYgKGNoZGlyKGRpcikpIHsNCiAgICBwZXJyb3IoZGlyKTsg
cmV0dXJuIDE7DQogIH0NCg0KICBpbnQgZXJycyA9IDA7DQogIGZvciAoaW50IGkgPSAwOyBp
IDwgMTAwMDAwOyBpKyspIHsNCiAgICBjaGFyIG5hbWVbOF0gPSAidC0iOw0KICAgIHJhbmRu
YW1lKG5hbWUgKyAyLCBzaXplb2YobmFtZSkgLSAxIC0gMik7DQogICAgaWYgKCF0ZXN0bmFt
ZShuYW1lKSAmJiArK2VycnMgPj0gMTApDQogICAgICBicmVhazsNCiAgfQ0KICByZXR1cm4g
MDsNCn0NCg==
--------------B19B623E207A25A37D693D40
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

--------------B19B623E207A25A37D693D40--
