Mail Archives: cygwin/2024/09/19/13:20:40

Date: Thu, 19 Sep 2024 10:18:50 -0700 (PDT)
X-X-Sender: jeremyd AT resin DOT csoft DOT net
To: Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
In-Reply-To: <984103a4-ab2d-4337-9964-cc1e3208155d@SystematicSW.ab.ca>
Message-ID: <a135ca64-7245-7453-9693-4bf9dda8bb49@jdrake.com>
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<bc8bd61c-818e-424f-bb42-52f4fecd4849 AT towo DOT net>
<b6ab074b-919e-4514-8276-72a30c36ab58 AT towo DOT net>
<de4767e2-85b7-ead2-df9a-64e1f24f4e8f AT t-online DOT de>
<6451a249-adcd-9c56-b76e-1b00886cea80 AT t-online DOT de>
<CAN0SSYx+g4JE6AA6krNAzG6QXrve52TBv0d3VM0SODV-tzZQSQ AT mail DOT gmail DOT com>
<66051d82-e2c3-684f-d13f-d1301170b0d4 AT t-online DOT de>
<984103a4-ab2d-4337-9964-cc1e3208155d AT SystematicSW DOT ab DOT ca>
MIME-Version: 1.0
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Jeremy Drake via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Jeremy Drake <cygwin AT jdrake DOT com>

On Thu, 19 Sep 2024, Brian Inglis via Cygwin wrote:

> On 2024-09-19 07:27, Christian Franke via Cygwin wrote:
> >
> >
> > Yes, but Cygwin does not provide consistent forward/reverse UTF-8 <-> UTF-16
> > mappings.
>
> Surrogate halves are invalid for UTF-8 encoding; they should first be
> encoded as a valid UTF-16 code point.
> The encoder should just fail if it encounters any invalid sequence!
> Handling surrogates or other invalid values as anything other than invalid
> turns the encoding into what has been called WTF-8 where W may be for
> Windows! ;^>

This may be necessary though, in order to round-trip anything which
is valid in NTFS.  In my opinion, rm -rf not failing in the face of
potentially maliciously named files/directories is more important than
strictly adhering to a standard that says 'fail if you see these values'.

https://cygwin.com/pipermail/cygwin/2024-June/256111.html
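
To make the trade-off concrete, here is a minimal sketch in Python (not Cygwin's actual conversion code) of what happens to a name containing a lone surrogate half. The helper names are hypothetical. Python's "surrogatepass" error handler is used only as a stand-in for a WTF-8-like encoder; it is similar in spirit but not identical to WTF-8.

```python
# Illustration only: Python's codecs, not Cygwin's UTF-16 <-> UTF-8 routines.

def encode_strict(name: str) -> bytes:
    # Standards-conforming UTF-8: raises on any surrogate code point.
    return name.encode("utf-8")

def encode_lenient(name: str) -> bytes:
    # Lets surrogate halves through as 3-byte sequences (WTF-8-like).
    return name.encode("utf-8", errors="surrogatepass")

def decode_lenient(raw: bytes) -> str:
    # Inverse of encode_lenient.
    return raw.decode("utf-8", errors="surrogatepass")

# A name with an unpaired high surrogate, as NTFS permits in file names.
lone = "bad\ud800name"

try:
    encode_strict(lone)
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False  # the name cannot be represented at all

raw = encode_lenient(lone)
round_tripped = decode_lenient(raw) == lone

print(strict_ok, raw, round_tripped)
```

With the strict encoder the name has no byte representation, so a tool walking the directory has nothing to pass back to unlink(). With the lenient one, the bytes map back to exactly the same UTF-16 code units.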

