Mail Archives: cygwin/2025/07/21/23:35:27

Message-ID: <ca205dbd-907f-4552-9e5c-2cb0050f83a3@towo.net>
Date: Tue, 22 Jul 2025 05:38:10 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>
<aFxRfI4NdZ8y5IlK AT calimero DOT vinschen DOT de>
<f78c615c-aefe-b3d0-aada-5f9d0cf73a0a AT t-online DOT de>
<aF5y15iQ840LxLYJ AT calimero DOT vinschen DOT de>
In-Reply-To: <aF5y15iQ840LxLYJ@calimero.vinschen.de>
X-BeenThere: cygwin AT cygwin DOT com
X-Mailman-Version: 2.1.30
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Thomas Wolff via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Thomas Wolff <towo AT towo DOT net>

Am 27.06.2025 um 12:30 schrieb Corinna Vinschen via Cygwin:
> Hi Christian,
>
> On Jun 26 19:07, Christian Franke via Cygwin wrote:
>> Corinna Vinschen via Cygwin wrote:
>>> On Jun 25 16:59, Christian Franke via Cygwin wrote:
>>>> On Sun, 15 Sep 2024 19:47:11 +0200, Christian Franke wrote:
>>>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>>>> does not refuse to create the file. Later readdir() returns a different
>>>>> name which could not be used to access the file.
>>>>>
>>>>> Testcase with U+1F321 (Thermometer):
>>>>>
>>>>> $ uname -r
>>>>> 3.5.4-1.x86_64
>>>>>
>>>>> $ printf $'\U0001F321' | od -A none -t x1
>>>>>    f0 9f 8c a1
>>>>>
>>>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>>>
>>>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>>>
>>>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>>>
>>>>> $ ls -1
>>>>> ls: cannot access 'file2-.?ext': No such file or directory
>>>>> ls: cannot access 'file3-': No such file or directory
>>>>> 'file1-'$'\360\237\214\241''.ext'
>>>>> file2-.?ext
>>>>> file3-
>>>>> [...]
>>> I don't know exactly where this happens, but the input of the
>>> conversion is invalid UTF-8 because it's missing the 4th byte.
>>> There's no way to represent these filenames on Windows
>>> filesystems storing filenames as UTF-16 values.
>>>
>>> So the problem here is that the conversion somehow misses that
>>> the 4th byte is invalid and just plods forward and converts the
>>> leading three bytes into the matching high surrogate value and
>>> then stumbles over the conversion for the low surrogate.
>>>
>>> It would be really helpful to have an STC for this problem.
>> With some trial and error I found a testcase for this more serious problem
>> reported yesterday but not quoted above:
>>
>>>> In cases like file3-... above, the converted Windows path ends with
>>>> 0xF000. This suggests that this is an accidental conversion of the
>>>> terminating null to the 0xF0xx range.
>>>>
>>>> In some cases, the created Windows file name has random garbage
>>>> behind the 0xF000. Then even Cygwin is not able to access or unlink
>>>> the file after creation.
>> Testcase (attached):
> Thanks for the testcase!
>
> I found the problem in the newlib core function creating wchar_t from
> UTF-8 input.  In case of 4 byte UTF-8 sequences, the code created the
> low surrogate already after reading byte 3, without checking if byte 4
> of the UTF-8 sequence is a valid byte. Hilarity ensues.
I'm afraid the fix may have broken mbrtowc, as I just reported to the
list with a test case, and that breaks mintty as well.
The high surrogate MUST be created after byte 3: mbrtowc delivers only
one wchar_t per call, so if nothing were returned for byte 3, the call
for byte 4 could not deliver both surrogates, and the low surrogate
could not follow after byte 4 as it needs to.
I think this is a drawback of UTF-16 that has to be accepted, even if
some invalid sequences slip through.
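To make the ordering argument concrete, here is a minimal sketch of an
incremental UTF-8 to UTF-16 decoder. The state struct `u8dec` and the
function `u8dec_feed` are hypothetical; this is not newlib's actual
mbrtowc code or mbstate_t layout. For a 4-byte sequence it delivers the
high surrogate after byte 3, since bits 10..20 of the code point are
already known at that point, and it still checks that byte 4 is a
continuation byte before producing the low surrogate. A truncated
sequence is therefore reported as an error, but only after the high
surrogate has already gone out.

```c
#include <stdint.h>

/* Hypothetical incremental decoder state (not newlib's mbstate_t). */
typedef struct {
  unsigned need;  /* length of the sequence in progress, 0 if idle */
  unsigned have;  /* bytes of that sequence consumed so far */
  uint32_t cp;    /* code point bits accumulated so far */
} u8dec;

/* Feed one UTF-8 byte.  Returns 1 if *out now holds a UTF-16 code unit,
   0 if more bytes are needed, -1 on an invalid sequence (state reset). */
static int u8dec_feed(u8dec *d, unsigned char c, uint16_t *out)
{
  if (d->need == 0) {                       /* expecting a lead byte */
    if (c < 0x80) { *out = c; return 1; }
    if (c >= 0xC2 && c <= 0xDF)      { d->need = 2; d->cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { d->need = 3; d->cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { d->need = 4; d->cp = c & 0x07; }
    else return -1;                         /* stray continuation or bad lead */
    d->have = 1;
    return 0;
  }
  if (c < 0x80 || c > 0xBF ||               /* not a continuation byte */
      (d->have == 1 && d->need == 3 && d->cp == 0x0 && c < 0xA0) ||  /* overlong E0 */
      (d->have == 1 && d->need == 3 && d->cp == 0xD && c > 0x9F) ||  /* UTF-16 surrogate */
      (d->have == 1 && d->need == 4 && d->cp == 0x0 && c < 0x90) ||  /* overlong F0 */
      (d->have == 1 && d->need == 4 && d->cp == 0x4 && c > 0x8F)) {  /* > U+10FFFF */
    d->need = 0;                            /* reset; this sketch does not re-read the byte */
    return -1;
  }
  d->cp = (d->cp << 6) | (c & 0x3F);
  d->have++;
  if (d->need == 4 && d->have == 3) {
    /* cp holds code point bits 6..20, which include bits 10..20,
       so the high surrogate can be delivered before byte 4 arrives. */
    *out = (uint16_t)(0xD800 + ((d->cp >> 4) - 0x40));
    return 1;
  }
  if (d->have < d->need)
    return 0;                               /* need more bytes */
  d->need = 0;                              /* sequence complete */
  *out = (d->cp >= 0x10000)
           ? (uint16_t)(0xDC00 | (d->cp & 0x3FF))  /* low surrogate */
           : (uint16_t)d->cp;
  return 1;
}
```

Fed f0 9f 8c a1 (U+1F321) byte by byte, this yields 0xD83C after the
third byte and 0xDF21 after the fourth; fed f0 9f 8c followed by '.',
it yields 0xD83C and then an error.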

Thomas

> Fortunately this bug has only been introduced very recently, to wit, on
> 2009-03-24, a mere 16 years ago.  And it is my bug and mine alone :}
>
> I'm just prep'ing a fix which I'll push in a minute or two.
>
>
> Thanks,
> Corinna
>
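Converting a whole path differs from mbrtowc in one respect: the
complete name is available up front, so a converter can validate each
multibyte sequence before emitting anything. A minimal sketch of such a
check, using a hypothetical helper `utf8_first_invalid` (not a Cygwin or
newlib function), applied to the file names from the testcase quoted
above:

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte that does not
   start a complete, well-formed UTF-8 sequence in the NUL-terminated
   string str, or -1 if the whole string is valid UTF-8. */
static long utf8_first_invalid(const char *str)
{
  const unsigned char *s = (const unsigned char *)str;
  size_t i = 0;
  while (s[i] != '\0') {
    unsigned char c = s[i];
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of the 2nd byte */
    size_t len;
    if (c < 0x80) { i++; continue; }     /* ASCII */
    else if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF) {
      len = 3;
      if (c == 0xE0) lo = 0xA0;          /* no overlong forms */
      if (c == 0xED) hi = 0x9F;          /* no UTF-16 surrogates */
    }
    else if (c >= 0xF0 && c <= 0xF4) {
      len = 4;
      if (c == 0xF0) lo = 0x90;          /* no overlong forms */
      if (c == 0xF4) hi = 0x8F;          /* nothing above U+10FFFF */
    }
    else return (long)i;                 /* invalid lead byte */
    for (size_t k = 1; k < len; k++) {
      /* A terminating NUL fails the range test, so we never read past it. */
      unsigned char b = s[i + k];
      unsigned char min = (k == 1) ? lo : 0x80;
      unsigned char max = (k == 1) ? hi : 0xBF;
      if (b < min || b > max)
        return (long)i;                  /* truncated or malformed sequence */
    }
    i += len;
  }
  return -1;
}
```

For the `file2-` and `file3-` names it reports offset 6, the \xf0 lead
byte of the truncated sequence, while the `file1-` name passes; a caller
could refuse such a name instead of creating a file whose name cannot be
read back.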


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple



  Copyright © 2019   by DJ Delorie     Updated Jul 2019