Date: Thu, 19 Sep 2024 10:18:50 -0700 (PDT)
To: Brian Inglis via Cygwin
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
From: Jeremy Drake via Cygwin
Reply-To: Jeremy Drake

On Thu, 19 Sep 2024, Brian Inglis via Cygwin wrote:

> On 2024-09-19 07:27, Christian Franke via Cygwin wrote:
> >
> > Yes, but Cygwin does not provide consistent forward/reverse UTF-8 <-> UTF-16
> > mappings.
>
> Surrogates halves are invalid for UTF-8 encoding; they should be first be
> encoded as a valid UTF-16 code point.
> The encoder should just fail if it encounters any invalid sequence!
> Handling surrogates or other invalid values as anything other than invalid
> turns the encoding into what has been called WTF-8 where W may be for
> Windows! ;^>

This may be necessary, though, in order to round-trip every name that is
valid in NTFS. In my opinion, having rm -rf not fail when faced with
potentially maliciously named files/directories is more important than
strictly adhering to a standard that says 'fail if you see these values'.
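To make the trade-off concrete, here is a minimal Python sketch of the difference between a strict converter and a WTF-8-style one. It is only an illustration under stated assumptions: the helper names and the sample file name are made up, Python's "surrogatepass" error handler stands in for a lenient conversion, and none of this is Cygwin's actual code path. Note also that "surrogatepass" is close to WTF-8 but not identical to it, since it does not forbid encoding a surrogate pair as two separate 3-byte sequences.

```python
# Illustrative only: Python's "surrogatepass" handler stands in for a
# WTF-8-style conversion; this is NOT how Cygwin implements it.

def utf16_name_to_bytes(raw_utf16le: bytes, lenient: bool) -> bytes:
    """Convert a UTF-16LE file name to a UTF-8-like byte string."""
    errors = "surrogatepass" if lenient else "strict"
    text = raw_utf16le.decode("utf-16-le", errors)
    return text.encode("utf-8", errors)


def bytes_to_utf16_name(name: bytes, lenient: bool) -> bytes:
    """Convert the byte string back to UTF-16LE code units."""
    errors = "surrogatepass" if lenient else "strict"
    return name.decode("utf-8", errors).encode("utf-16-le", errors)


# Hypothetical file name: "a", an unpaired high surrogate U+D800, "b".
# NTFS does not validate surrogate pairing, so such names can exist.
ntfs_name = b"a\x00" + b"\x00\xd8" + b"b\x00"

# Strict conversion rejects the name, so a tool relying on it could
# neither address nor delete the file.
try:
    utf16_name_to_bytes(ntfs_name, lenient=False)
    strict_ok = True
except UnicodeError:
    strict_ok = False

# Lenient conversion yields bytes that map back to the same code units.
posix_name = utf16_name_to_bytes(ntfs_name, lenient=True)
round_trip_ok = bytes_to_utf16_name(posix_name, lenient=True) == ntfs_name

print(strict_ok, posix_name, round_trip_ok)
```

The point of the sketch is only that the lenient mapping is reversible for every sequence of 16-bit units, which is what readdir() followed by unlink() needs, while the strict mapping has names it cannot represent at all.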
https://cygwin.com/pipermail/cygwin/2024-June/256111.html