Date: Thu, 24 Jul 2025 19:45:16 +0200
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
To: cygwin AT cygwin DOT com
From: Thomas Wolff via Cygwin
Reply-To: Thomas Wolff

On 24.07.2025 at 17:35, Corinna Vinschen wrote:
> Thomas,
>
> On Jul 23 05:44, Thomas Wolff via Cygwin wrote:
>>>> On 22.07.2025 at 15:05, Corinna Vinschen wrote:
>>>>> mbrtowc() is inherently a bad idea when it comes to UTF-16. It's a
>>>>> function which only works really correctly for the Unicode base plane,
>>>>> or if wchar_t is big enough.
>>>>>
>>>>> It's the reason we don't use mbrtowc() if possible.  It's better
>>>>> to call mbstowcs() or friends and allow at least 3 chars in the
>>>>> wchar_t buffer.
>>>>> You can't change that in mintty by any chance?
>>> [...]
>> OK, suppose I'd consider switching to mbs[[n]r]towcs, collecting bytes until
>> the function gives me a result.
>> This would work fine as long as I receive only valid sequences. But look at
>> this input string test case:
>>   char nonbmp[] = {0xF8, 0x88, 0x8A, 0xAF, 0x2D, 0}; // an invalid sequence followed by a valid char
>> The functions only return -1 and (in the case of mbsnrtowcs) do not advance
>> the input pointer.
>> So how am I supposed to recognize that the invalid sequence has ended and a
>> valid character has arrived?
> Apart from that, you probably still have a problem in mintty: GB18030.
>
> The problem with GB18030 is that you need all four bytes to generate
> the high surrogate.
>
> Consider the following GB18030 string: 0x90 0x30 0x81 0x30
>
> This string translates into a UTF-16 surrogate pair: 0xd800 0xdc00.
>
> If you run a tweaked version of your test application from
> https://cygwin.com/pipermail/cygwin/2025-July/258513.html:
>
>   setlocale (LC_CTYPE, "zh_CN.gb18030");
>   mb (0x90);
>   mb (0x30);
>   mb (0x81);
>   mb (0x30);
>
> Then the output is:
>
>   90 -> 0000 : -2
>   30 -> 0000 : -2
>   81 -> 0000 : -2
>   30 -> D800 : 0
>
> However, if you notice this situation...
>
>   if (ret_from_mbrtowc == 0 && codeset == gb18030
>       && (pwc & 0xfc00) == 0xd800)
>
> ...you can just add a fake NUL byte:
>
>   mbrtowc (&wc, "\0", 1, &mbstate);
>
> If you do that, the above sequence becomes:
>
>   90 -> 0000 : -2
>   30 -> 0000 : -2
>   81 -> 0000 : -2
>   30 -> D800 : 0
>   00 -> DC00 : 1
>
> I hope this helps, if you didn't already handle GB18030 differently
> in mintty.

Oooff. No, I didn't. So that was already the case before 3.6.4 (and again
3.6.5), right?
Thanks for the notice, I'll check and test your workaround.
Thomas

> Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
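
For illustration, the fake-NUL workaround described in the quoted text could be wrapped in a byte-at-a-time helper roughly like the one below. This is only a sketch under assumptions, not mintty or Cygwin code: the names feed_byte, is_high_surrogate and combine_surrogates are hypothetical, and so is the gb18030 flag, which the caller is assumed to set after checking the locale's codeset. The special case relies on the Cygwin behaviour shown in the quoted output (16-bit wchar_t, return value 0 together with a high surrogate); on platforms with a 32-bit wchar_t that branch is not expected to trigger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* True if wc is a UTF-16 high (leading) surrogate, 0xD800..0xDBFF. */
static int is_high_surrogate(wchar_t wc)
{
    return wc >= 0xD800 && wc <= 0xDBFF;
}

/* Combine a UTF-16 surrogate pair into the code point it encodes. */
static unsigned long combine_surrogates(wchar_t hi, wchar_t lo)
{
    return 0x10000UL + (((unsigned long)hi - 0xD800UL) << 10)
                     + ((unsigned long)lo - 0xDC00UL);
}

/*
 * Feed one input byte to mbrtowc() (hypothetical helper).
 * Returns the number of wchar_t units stored in out[] (0, 1 or 2),
 * or -1 if the bytes fed so far form an invalid sequence.
 * gb18030 is a caller-supplied flag: nonzero if the current locale's
 * codeset is GB18030 (an assumption of this sketch).
 */
static int feed_byte(unsigned char byte, int gb18030,
                     mbstate_t *st, wchar_t out[2])
{
    char c = (char)byte;
    wchar_t wc = 0;
    size_t ret;

    assert(st != NULL && out != NULL);

    ret = mbrtowc(&wc, &c, 1, st);
    if (ret == (size_t)-2)
        return 0;                   /* incomplete: more bytes needed */
    if (ret == (size_t)-1) {
        memset(st, 0, sizeof *st);  /* back to the initial state */
        return -1;
    }

    out[0] = wc;
    if (ret == 0 && gb18030 && is_high_surrogate(wc)) {
        /* The situation from the thread: all four bytes are consumed and
           only the high surrogate was delivered. Feed a fake NUL byte so
           that the pending low surrogate is handed out as well. */
        wchar_t lo = 0;
        mbrtowc(&lo, "", 1, st);
        out[1] = lo;
        return 2;
    }
    return 1;
}
```

A caller would zero an mbstate_t, call setlocale (LC_CTYPE, "zh_CN.gb18030"), and pass each incoming byte to feed_byte() with the gb18030 flag set. Going by the output quoted above, the bytes 0x90 0x30 0x81 0x30 should then end with out[] holding D800 and DC00, which combine_surrogates() maps to U+10000. That end-to-end behaviour is taken from the thread, not verified independently here.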