Date: Tue, 22 Jul 2025 15:05:59 +0200
From: Corinna Vinschen via Cygwin
To: Thomas Wolff
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8

On Jul 22 05:38, Thomas Wolff via Cygwin wrote:
> On 27.06.2025 at 12:30, Corinna Vinschen via Cygwin wrote:
> > On Jun 26 19:07, Christian Franke via Cygwin wrote:
> > > With some trial and error I found a testcase for this more serious problem
> > > reported yesterday but not quoted above:
> > >
> > > > > In cases like file3-... above, the converted Windows path ends with
> > > > > 0xF000. This suggests that this is an accidental conversion of the
> > > > > terminating null to the 0xF0xx range.
> > > > >
> > > > > In some cases, the created Windows file name has random garbage
> > > > > behind the 0xF000. Then even Cygwin is not able to access or unlink
> > > > > the file after creation.
> > > Testcase (attached):
> > Thanks for the testcase!
> >
> > I found the problem in the newlib core function creating wchar_t from
> > UTF-8 input. In case of 4 byte UTF-8 sequences, the code created the
> > low surrogate already after reading byte 3, without checking if byte 4
> > of the UTF-8 sequence is a valid byte. Hilarity ensues.
> I'm afraid the fix may have broken mbrtowc as I just reported to the list,
> with a test case, thus also breaking mintty.
> The low surrogate MUST be created after byte 3 because otherwise the high
> surrogate cannot be delivered after byte 4 as it needs to.
> I think it's a drawback of UTF-16 that must be swallowed, even if some
> incorrect sequences slip through somehow.

Bummer. What bugs me most is that you might be right here.

It's a bit late, but we should have defined wchar_t as a 4-byte type
back when we worked on Cygwin 1.7.0... sigh.

mbrtowc() is inherently a bad idea when it comes to UTF-16. It's a
function which only works really correctly for the Unicode Basic
Multilingual Plane, or if wchar_t is big enough. It's the reason we
don't use mbrtowc() if possible. It's better to call mbstowcs() or
friends and allow at least 3 chars in the wchar_t buffer.
You can't change that in mintty by any chance?


Corinna