Message-ID: <91f26856-72b0-483b-8d04-bd90a27b6be0@towo.net>
Date: Tue, 22 Jul 2025 17:09:52 +0200
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de> <03c4fae7-7322-572c-ae72-52e300f0b438 AT t-online DOT de>
List-Id: General Cygwin discussions and problem reports
From: Thomas Wolff via Cygwin
Reply-To: Thomas Wolff

On 22.07.2025 at 15:05, Corinna Vinschen wrote:
> On Jul 22 05:38, Thomas Wolff via Cygwin wrote:
>> On 27.06.2025 at 12:30, Corinna Vinschen via Cygwin wrote:
>>> On Jun 26 19:07, Christian Franke via Cygwin wrote:
>>>> With some trial and error I found a testcase for this more serious problem
>>>> reported yesterday but not quoted above:
>>>>
>>>>>> In cases like file3-... above, the converted Windows path ends with
>>>>>> 0xF000. This suggests that this is an accidental conversion of the
>>>>>> terminating null to the 0xF0xx range.
>>>>>>
>>>>>> In some cases, the created Windows file name has random garbage
>>>>>> behind the 0xF000.
>>>>>> Then even Cygwin is not able to access or unlink
>>>>>> the file after creation.
>>>> Testcase (attached):
>>> Thanks for the testcase!
>>>
>>> I found the problem in the newlib core function creating wchar_t from
>>> UTF-8 input. In case of 4 byte UTF-8 sequences, the code created the
>>> low surrogate already after reading byte 3, without checking if byte 4
>>> of the UTF-8 sequence is a valid byte. Hilarity ensues.
>> I'm afraid the fix may have broken mbrtowc as I just reported to the list,
>> with a test case, thus also breaking mintty.
>> The low surrogate MUST be created after byte 3 because otherwise the high
>> surrogate cannot be delivered after byte 4 as it needs to.
>> I think it's a drawback of UTF-16 that must be swallowed, even if some
>> incorrect sequences slip through somehow.
> Bummer. What bugs me most is that you might be right here. It's a bit
> late, but we should have defined wchar_t as a 4 byte type back when we
> worked on Cygwin 1.7.0... sigh.
>
> mbrtowc() is inherently a bad idea when it comes to UTF-16. It's a
> function which only works really correctly for the unicode base plane,
> or if wchar_t is big enough.
>
> It's the reason we don't use mbrtowc() if possible. It's better to call
> mbstowcs() or friends and allow at least 3 chars in the wchar_t buffer.
> You can't change that in mintty by any chance?

Well, I've started to think about a workaround but it's code I've never
touched before and I'd need to carefully ponder about all kinds of possible
special situations, so my testing effort would be high. Also, I'd need to
implement bytewise mbr collection which is right now done by that function.

Since not using mbrtowc anymore would leave it still broken (and what other
software may fall into that trap...), I'd prefer a fix of that function
anyway.
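
For reference, the UTF-16 arithmetic behind this discussion can be sketched
in C. This is only an illustration, assuming a well-formed 4-byte UTF-8
sequence; it is not the newlib code, and the function names are made up for
this example. It shows that the first (leading) UTF-16 unit of the pair
depends only on bytes 1 to 3, while the second (trailing) unit needs byte 4.
That is the situation described above for mbrtowc() with a 16-bit wchar_t,
where each call can hand back only one unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: leading UTF-16 unit (0xD800..0xDBFF) computed from
   the first three bytes of a 4-byte UTF-8 sequence. Byte 4 is not needed.
   Assumes a well-formed, non-overlong sequence (code point >= 0x10000). */
static uint16_t leading_surrogate(uint8_t b1, uint8_t b2, uint8_t b3)
{
    assert((b1 & 0xF8) == 0xF0);              /* 11110xxx lead byte */
    assert(is_continuation(b2) && is_continuation(b3));

    /* Bits 20..10 of the code point come from b1, b2 and the top two
       payload bits of b3. */
    uint32_t top = ((uint32_t)(b1 & 0x07) << 8)
                 | ((uint32_t)(b2 & 0x3F) << 2)
                 | ((uint32_t)(b3 & 0x30) >> 4);

    /* Subtracting 0x40 here equals subtracting 0x10000 from the code point. */
    return (uint16_t)(0xD800 + (top - 0x40));
}

/* Hypothetical helper: trailing UTF-16 unit (0xDC00..0xDFFF). It needs the
   low four payload bits of byte 3 and all six payload bits of byte 4. */
static uint16_t trailing_surrogate(uint8_t b3, uint8_t b4)
{
    assert(is_continuation(b3) && is_continuation(b4));
    return (uint16_t)(0xDC00 | ((uint32_t)(b3 & 0x0F) << 6) | (b4 & 0x3F));
}
```

For example, U+1F600 is encoded as F0 9F 98 80 in UTF-8:
leading_surrogate(0xF0, 0x9F, 0x98) gives 0xD83D and
trailing_surrogate(0x98, 0x80) gives 0xDE00. A converter that emits the
first unit after byte 3 cannot yet know whether byte 4 is a valid
continuation byte, which is the trade-off being argued about in this thread.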

Thomas
> Corinna

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple