Date: Wed, 23 Jul 2025 09:53:41 +0200
To: cygwin AT cygwin DOT com
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Corinna Vinschen via Cygwin

On Jul 23 05:44, Thomas Wolff via Cygwin wrote:
> OK, suppose I'd consider to switch to mbs[[n]r]towcs, collecting bytes until
> the function gives me a result.
> This would work fine as long as I receive only valid sequences. But look at
> input string test case
> char nonbmp[] = {0xF8, 0x88, 0x8A, 0xAF, 0x2D, 0}; // an invalid sequence followed by a valid char
> The functions only return -1 and (in the case of mbsnrtowcs) do not advance
> the input pointer.
> So how am I supposed to recognize that the invalid sequence has ended and a
> valid character has arrived?

Yeah, I see the problem. One of the slightly puzzling behaviours of
mbsnrtowcs is the fact that the src pointer stays at the start of the
invalid sequence. I think the idea is to skip the invalid sequence
byte-wise until mbsnrtowcs reports a valid sequence again.

What bugs me is that we have the choice between a broken mbrtowc on one
side and a chance to generate broken filenames on the other side.

I think we should actually revert fa272e05bbd0 ("wcstombs: also call
__WCTOMB on terminating NUL if output buffer is NULL") and see if we can
fix the filename issue in the Cygwin functions for filename conversion
alone.

Any ideas appreciated.


Corinna