Date: Thu, 19 Sep 2024 10:18:50 -0700 (PDT)
X-X-Sender: jeremyd@resin.csoft.net
To: Brian Inglis via Cygwin <cygwin@cygwin.com>
Subject: Re: readdir() returns inaccessible name if file was created with
 invalid UTF-8
In-Reply-To: <984103a4-ab2d-4337-9964-cc1e3208155d@SystematicSW.ab.ca>
Message-ID: <a135ca64-7245-7453-9693-4bf9dda8bb49@jdrake.com>
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1@t-online.de>
 <bc8bd61c-818e-424f-bb42-52f4fecd4849@towo.net>
 <b6ab074b-919e-4514-8276-72a30c36ab58@towo.net>
 <de4767e2-85b7-ead2-df9a-64e1f24f4e8f@t-online.de>
 <6451a249-adcd-9c56-b76e-1b00886cea80@t-online.de>
 <CAN0SSYx+g4JE6AA6krNAzG6QXrve52TBv0d3VM0SODV-tzZQSQ@mail.gmail.com>
 <66051d82-e2c3-684f-d13f-d1301170b0d4@t-online.de>
 <984103a4-ab2d-4337-9964-cc1e3208155d@SystematicSW.ab.ca>
MIME-Version: 1.0
From: Jeremy Drake via Cygwin <cygwin@cygwin.com>
Reply-To: Jeremy Drake <cygwin@jdrake.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, 19 Sep 2024, Brian Inglis via Cygwin wrote:

> On 2024-09-19 07:27, Christian Franke via Cygwin wrote:
> >
> >
> > Yes, but Cygwin does not provide consistent forward/reverse UTF-8 <-> UTF-16
> > mappings.
>
> Surrogate halves are invalid in UTF-8 encoding; they should first be
> encoded as a valid UTF-16 code point.
> The encoder should just fail if it encounters any invalid sequence!
> Handling surrogates or other invalid values as anything other than invalid
> turns the encoding into what has been called WTF-8 where W may be for
> Windows! ;^>

This may be necessary, though, in order to round-trip every file name that
is valid on NTFS, which stores names as arbitrary 16-bit units and so allows
unpaired surrogates.  In my opinion, rm -rf not failing in the face of
potentially maliciously named files/directories is more important than
strictly adhering to a standard that says 'fail if you see these values'.

https://cygwin.com/pipermail/cygwin/2024-June/256111.html
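
To illustrate the trade-off, here is a minimal Python sketch.  It is not
Cygwin's conversion code; the function names are made up for this example,
and it assumes Python's "surrogatepass" error handler as a stand-in for a
WTF-8 style encoder.  The strict path fails on an unpaired surrogate, while
the lenient path maps any sequence of 16-bit units to bytes and back.

```python
import struct

def _pack_units(units):
    # Serialize 16-bit code units as little-endian bytes.
    return struct.pack(f"<{len(units)}H", *units)

def strict_utf8(units):
    # Standards-conforming conversion: raises UnicodeDecodeError
    # when the input contains an unpaired surrogate.
    return _pack_units(units).decode("utf-16-le").encode("utf-8")

def utf16_units_to_wtf8(units):
    # Lenient conversion (hypothetical helper): valid surrogate pairs
    # become one 4-byte sequence, unpaired surrogates are passed through
    # as 3-byte sequences instead of being rejected.
    text = _pack_units(units).decode("utf-16-le", "surrogatepass")
    return text.encode("utf-8", "surrogatepass")

def wtf8_to_utf16_units(data):
    # Reverse mapping (hypothetical helper): recover the 16-bit units.
    text = data.decode("utf-8", "surrogatepass")
    raw = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

if __name__ == "__main__":
    name = [0x0061, 0xD800, 0x0062]   # 'a', unpaired high surrogate, 'b'
    encoded = utf16_units_to_wtf8(name)
    print(encoded)                     # b'a\xed\xa0\x80b'
    print(wtf8_to_utf16_units(encoded) == name)
    try:
        strict_utf8(name)
    except UnicodeDecodeError as exc:
        print("strict conversion failed:", exc.reason)
```

With the lenient mapping, a name returned by readdir() converts back to
exactly the units stored on disk, so it can be opened or unlinked.  With the
strict mapping such a name has no byte representation at all.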

