Message-ID: <7adc5002-932a-48e3-82ce-7a59713b5476@towo.net>
Date: Thu, 24 Jul 2025 19:43:22 +0200
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Thomas Wolff via Cygwin
To: Christian Franke, cygwin AT cygwin DOT com

On 24.07.2025 at 16:08, Corinna Vinschen wrote:
> On Jul 24 15:41, Thomas Wolff via Cygwin wrote:
>> On 24.07.2025 at 12:30, Corinna Vinschen wrote:
>>> What does that mean?
>>> Consider this UTF8 input string:
>>>
>>>   0xf0 0x90 0x80 0x2e
>>>
>>>   mbstowcs:     returns -1
>>>   sys_mbstowcs: f0f0 f090 f080 002e
>>>
>>> Let's convert it back to multibyte:
>>>
>>>   sys_wcstombs: 0xf0 0x90 0x80 0x2e
>>>   wcstombs:     0xef 0x83 0xb0 0xef 0x82 0x90 0xef 0x82 0x80 0x2e
>>>
>>> So while sys_wcstombs has special code converting the string back to its
>>> original MB string, wcstombs converts to the CESU-8 representation.
>>>
>>> This is transparent.  If we convert this CESU-8 string back to
>>> wide-char, the resulting wide-char strings are the same:
>>>
>>>   mbstowcs:     f0f0 f090 f080 002e
>>>   sys_mbstowcs: f0f0 f090 f080 002e
>>>
>>> So the question here is, shall we keep the special case converting
>>> private use area bytes back to their original byte encoding?
>>>
>>> Or shall we simply go along with CESU-8 when converting back to multibyte
>>> to keep the string the same as with wcstombs?
>>>
>>> Exempt from this are the characters not valid in a DOS filename.
>>> These will always be converted if we create wide-char filenames.
>> Sounds like a fair solution with only minor glitches. Poor 4th byte but
>> thanks a lot anyway.
>> About the latter decision, if there's no strong bias otherwise, I'd prefer
>> to drop special handling (but don't take my vote, I don't care so much about
>> that).
> Thanks for your input.
>
> As another datapoint we have to consider how sys_wcstombs is used.
>
> wcstombs on a filename will be used by the application only, and only if
> the filename is incoming application level data or has been converted to a
> wide char by the application itself.
>
> sys_wcstombs will be used to generate a readable multi-byte filename from
> UTF-16 filenames read from the filesystem.  So its major use in terms of
> filenames is by readdir().
>
> Knowing that, the question boils down to this:
>
> Do we want readdir() returning the same name as given to open(), or is
> CESU-8 sufficient?

You mean for "normal" cases (i.e.
proper non-BMP characters, not invalid sequences, specially handled
characters, or private-use-range characters)? In that case, as an
application developer, I would neither expect nor wish to handle CESU-8.

Thomas

>
>
> Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
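
For illustration, the round trip from the quoted example can be sketched in Python. This is not Cygwin's code: the function names and the "pua-escape" error handler are hypothetical, and the mapping of each undecodable byte to U+F000 plus the byte value is only inferred from the quoted output (0xf0 becomes f0f0, and so on). The sketch ignores the separate handling of characters not valid in a DOS filename.

```python
import codecs

PUA_BASE = 0xF000  # assumption, inferred from the quoted f0f0/f090/f080 values


def _pua_escape(err):
    # Map every undecodable byte to a private use area character.
    bad = err.object[err.start:err.end]
    return "".join(chr(PUA_BASE + b) for b in bad), err.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("pua-escape", _pua_escape)


def to_wide(raw: bytes) -> str:
    """Lenient decode, standing in for what sys_mbstowcs does in the example."""
    return raw.decode("utf-8", errors="pua-escape")


def to_mb_plain(wide: str) -> bytes:
    """Plain encode: each escaped byte comes back as a three-byte sequence."""
    return wide.encode("utf-8")


def to_mb_special(wide: str) -> bytes:
    """Encode with the special case: escaped bytes return to their original value."""
    out = bytearray()
    for ch in wide:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    raw = bytes([0xF0, 0x90, 0x80, 0x2E])
    wide = to_wide(raw)
    print([f"{ord(c):04x}" for c in wide])  # f0f0 f090 f080 002e
    print(to_mb_special(wide).hex(" "))     # f0 90 80 2e
    print(to_mb_plain(wide).hex(" "))       # ef 83 b0 ef 82 90 ef 82 80 2e
```

With the special case, the name returned to the application is byte-for-byte the one originally passed in; without it, the name is longer but still converts back to the same wide-char string.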