Mail Archives: cygwin/2025/07/22/23:45:30

Message-ID: <68f65634-8f4e-436b-ba6a-d30bdf882aaa@towo.net>
Date: Wed, 23 Jul 2025 05:44:23 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>
<aFxRfI4NdZ8y5IlK AT calimero DOT vinschen DOT de>
<f78c615c-aefe-b3d0-aada-5f9d0cf73a0a AT t-online DOT de>
<aF5y15iQ840LxLYJ AT calimero DOT vinschen DOT de>
<ca205dbd-907f-4552-9e5c-2cb0050f83a3 AT towo DOT net>
<aH-MtwqARmDmLwoo AT calimero DOT vinschen DOT de>
<91f26856-72b0-483b-8d04-bd90a27b6be0 AT towo DOT net>
<4ab2c1b7-3164-4556-ba36-29814ecf5766 AT towo DOT net>
In-Reply-To: <4ab2c1b7-3164-4556-ba36-29814ecf5766@towo.net>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Thomas Wolff via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Thomas Wolff <towo AT towo DOT net>


On 23.07.2025 at 04:25, Thomas Wolff via Cygwin wrote:
>
>
> On 22.07.2025 at 17:09, Thomas Wolff via Cygwin wrote:
>>
>>
>> On 22.07.2025 at 15:05, Corinna Vinschen wrote:
>>> On Jul 22 05:38, Thomas Wolff via Cygwin wrote:
>>>> On 27.06.2025 at 12:30, Corinna Vinschen via Cygwin wrote:
>>>>> On Jun 26 19:07, Christian Franke via Cygwin wrote:
>>>>>> With some trial and error I found a testcase for this more serious
>>>>>> problem reported yesterday but not quoted above:
>>>>>>
>>>>>>>> In cases like file3-... above, the converted Windows path ends
>>>>>>>> with 0xF000. This suggests that this is an accidental conversion
>>>>>>>> of the terminating null to the 0xF0xx range.
>>>>>>>>
>>>>>>>> In some cases, the created Windows file name has random garbage
>>>>>>>> behind the 0xF000. Then even Cygwin is not able to access or
>>>>>>>> unlink the file after creation.
>>>>>> Testcase (attached):
>>>>> Thanks for the testcase!
>>>>>
>>>>> I found the problem in the newlib core function creating wchar_t from
>>>>> UTF-8 input.  In case of 4 byte UTF-8 sequences, the code created the
>>>>> low surrogate already after reading byte 3, without checking if byte 4
>>>>> of the UTF-8 sequence is a valid byte. Hilarity ensues.
>>>> I'm afraid the fix may have broken mbrtowc as I just reported to the
>>>> list, with a test case, thus also breaking mintty.
>>>> The low surrogate MUST be created after byte 3 because otherwise the
>>>> high surrogate cannot be delivered after byte 4 as it needs to.
>>>> I think it's a drawback of UTF-16 that must be swallowed, even if some
>>>> incorrect sequences slip through somehow.
>>> Bummer.  What bugs me most is that you might be right here. It's a bit
>>> late, but we should have defined wchar_t as a 4 byte type back when we
>>> worked on Cygwin 1.7.0... sigh.
>>>
>>> mbrtowc() is inherently a bad idea when it comes to UTF-16. It's a
>>> function which only works really correctly for the unicode base plane,
>>> or if wchar_t is big enough.
>>>
>>> It's the reason we don't use mbrtowc() if possible.  It's better to call
>>> mbstowcs() or friends and allow at least 3 chars in the wchar_t buffer.
>>> You can't change that in mintty by any chance?
>> Well, I've started to think about a workaround but it's code I've never
>> touched before and I'd need to carefully ponder about all kinds of
>> possible special situations, so my testing effort would be high.
>> Also, I'd need to implement bytewise mbr collection which is right now
>> done by that function.
>> Since not using mbrtowc anymore would leave it still broken (and what
>> other software may fall into that trap...), I'd prefer a fix of that
>> function anyway.
> I've checked whether to use the old version of mbrtowc from newlib
> directly in mintty but it pulls too many dependencies...
> I've also checked whether to use _mbrtowc_r instead which is defined in
> wchar.h but it does not link.
> By the way, discussion and commit log mix up the order: the high
> surrogate comes first.
>
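On the surrogate order mentioned in the quoted text, the bit layout can be worked through directly. The following is a minimal sketch, not the newlib code: the function names are made up for illustration, and apart from a lead-byte check no validation is performed. It shows that the high (leading) surrogate depends only on bytes 1 to 3 of a 4-byte UTF-8 sequence, while the low (trailing) surrogate also needs byte 4.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only; names are hypothetical and input is assumed
   to be a well-formed 4-byte UTF-8 sequence (no overlong/range checks). */

/* Full code point from all four bytes: 3 + 6 + 6 + 6 = 21 bits. */
static uint32_t utf8_4_to_cp(const unsigned char b[4])
{
    return ((uint32_t)(b[0] & 0x07) << 18) |
           ((uint32_t)(b[1] & 0x3F) << 12) |
           ((uint32_t)(b[2] & 0x3F) << 6)  |
            (uint32_t)(b[3] & 0x3F);
}

/* High surrogate from the first three bytes only.
   After byte 3 we know cp >> 6, and
   (cp - 0x10000) >> 10 == ((cp >> 6) - 0x400) >> 4. */
static uint16_t high_surrogate_from_3(const unsigned char b[3])
{
    assert((b[0] & 0xF8) == 0xF0);   /* must be a 4-byte lead byte */
    uint32_t top = ((uint32_t)(b[0] & 0x07) << 12) |
                   ((uint32_t)(b[1] & 0x3F) << 6)  |
                    (uint32_t)(b[2] & 0x3F);
    return (uint16_t)(0xD800 + ((top - 0x400) >> 4));
}

/* Low surrogate: the low 10 bits of the code point, which come from
   the low 4 bits of byte 3 and the 6 payload bits of byte 4. */
static uint16_t low_surrogate_from_34(unsigned char b3, unsigned char b4)
{
    return (uint16_t)(0xDC00 | ((uint32_t)(b3 & 0x0F) << 6) | (b4 & 0x3F));
}
```

For example, U+1F600 is encoded as F0 9F 98 80: the first three bytes already determine the high surrogate 0xD83D, and the low surrogate 0xDE00 becomes available only once byte 4 has arrived.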
OK, suppose I switch to mbs[[n]r]towcs, collecting bytes until the
function gives me a result.
This would work fine as long as I receive only valid sequences. But
consider this input string test case:
char nonbmp[] = {0xF8, 0x88, 0x8A, 0xAF, 0x2D, 0}; // an invalid sequence followed by a valid char
The functions only return -1 and (in the case of mbsnrtowcs) do not
advance the input pointer.
So how am I supposed to recognize that the invalid sequence has ended
and a valid character has arrived?
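One conceivable way around this is to find the sequence boundaries before calling the conversion functions at all. The sketch below is only an illustration under stated assumptions, not something mintty or newlib does: utf8_next_unit and its helpers are hypothetical names, the policy of swallowing an invalid lead byte together with all continuation bytes that follow it is one choice among several, and overlong forms or UTF-8-encoded surrogates are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* UTF-8 continuation byte (10xxxxxx)? */
static int is_cont(unsigned char c) { return (c & 0xC0) == 0x80; }

/* Sequence length implied by a lead byte, or 0 if the byte can never
   start a valid UTF-8 sequence (stray continuation, C0/C1, F5..FF). */
static size_t utf8_expected_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: look at the start of buf (n bytes available).
   Returns the length of the next unit and sets *valid to 1 for a
   structurally complete sequence, or to 0 for an invalid run that the
   caller should discard (or replace with U+FFFD).
   Returns 0, leaving *valid untouched, if more input is needed. */
static size_t utf8_next_unit(const unsigned char *buf, size_t n, int *valid)
{
    assert(valid != NULL);
    if (n == 0) return 0;

    size_t want = utf8_expected_len(buf[0]);
    size_t i = 1;

    if (want == 0) {
        /* Invalid lead byte: skip it and any continuation bytes after it. */
        while (i < n && is_cont(buf[i])) i++;
        *valid = 0;
        return i;
    }
    while (i < want) {
        if (i == n) return 0;                            /* incomplete so far */
        if (!is_cont(buf[i])) { *valid = 0; return i; }  /* cut short */
        i++;
    }
    *valid = 1;
    return want;
}
```

Applied to the test string above, the first call reports an invalid run of 4 bytes (0xF8 0x88 0x8A 0xAF), and the next call, starting at the fifth byte, reports a valid 1-byte unit (0x2D). Only the units flagged as valid would then be passed on to mbsnrtowcs or a similar function.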


>>
>> Thomas
>>
>>> Corinna
>>
>>
>
>


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
