Mail Archives: cygwin/2024/09/17/06:44:06

References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<bc8bd61c-818e-424f-bb42-52f4fecd4849 AT towo DOT net>
<b6ab074b-919e-4514-8276-72a30c36ab58 AT towo DOT net>
<de4767e2-85b7-ead2-df9a-64e1f24f4e8f AT t-online DOT de>
<6451a249-adcd-9c56-b76e-1b00886cea80 AT t-online DOT de>
In-Reply-To: <6451a249-adcd-9c56-b76e-1b00886cea80@t-online.de>
Date: Tue, 17 Sep 2024 12:42:24 +0200
Message-ID: <CAN0SSYx+g4JE6AA6krNAzG6QXrve52TBv0d3VM0SODV-tzZQSQ@mail.gmail.com>
Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
From: Mark Liam Brown via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Mark Liam Brown <brownmarkliam AT gmail DOT com>

On Mon, Sep 16, 2024 at 11:51 AM Christian Franke via Cygwin
<cygwin AT cygwin DOT com> wrote:
>
> Christian Franke via Cygwin wrote:
> > Thomas Wolff via Cygwin wrote:
> >>
> >> Am 15.09.2024 um 20:15 schrieb Thomas Wolff via Cygwin:
> >>> Am 15.09.2024 um 19:47 schrieb Christian Franke via Cygwin:
> >>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
> >>>> does not refuse to create the file. Later readdir() returns a
> >>>> different name which could not be used to access the file.
> >>>>
> >>>> Testcase with U+1F321 (Thermometer):
> >>>>
> >>>> $ uname -r
> >>>> 3.5.4-1.x86_64
> >>>>
> >>>> $ printf $'\U0001F321' | od -A none -t x1
> >>>>  f0 9f 8c a1
> >>>>
> >>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
> >>>>
> >>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
> >>>>
> >>>> $ touch 'file3-'$'\xf0\x9f\x8c'
> >>>>
> >>>> $ ls -1
> >>>> ls: cannot access 'file2-.?ext': No such file or directory
> >>>> ls: cannot access 'file3-': No such file or directory
> >>>> 'file1-'$'\360\237\214\241''.ext'
> >>>> file2-.?ext
> >>>> file3-
> >>> I don't reproduce this.
> >
> > Yes, sorry, the above 'ls' was actually aliased to 'ls --color=auto'
> > which needs to call stat(). Plain 'ls' does not, so the errors do not
> > occur then.
> >
> >
> >>>
> >>> While the file name gets mangled, all resulting file names are valid and
> >>> listed:
> >>> In file2 the sequence is turned into U+17B3 but exchanged with the dot.
> >>> In file3 the same sequence is just dropped.
> >>> $ ls -1|cat
> >>> file1-🌡.ext
> >>> file2-.ឳext
> >>> file3-
> >>>
> >>> However, ls file2* fails, as does ls *.
> >> On the other hand, ls file3- fails too, so some mapping error occurs
> >> internally.
> >> Also, the files cannot be deleted from cygwin (need to use cmd).
> >
> > 'rm' using the original names works for file2-..., but not for file3-...
> >
> > $ rm -v 'file2-'$'\xf0\x9f\x8c''.ext'
> > removed 'file2-'$'\360\237\214''.ext'
> >
> > $ rm -v 'file3-'$'\xf0\x9f\x8c'
> > rm: cannot remove 'file3-'$'\360\237\214': No such file or directory
> >
>
> Further tests suggest that the problem only occurs with:
> - incomplete 4 byte UTF-8 sequences (Unicode above 16 bit)
> - complete but invalid 3 byte UTF-8 sequences which encode the UTF-16
> 'high surrogate' range (0xD800..0xDBFF).

Makes perfect sense: the Windows kernel uses UTF-16 internally.
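
To make the UTF-16 connection concrete, here is a rough sketch in plain Python. It is not Cygwin's conversion code, and the helper names utf16_surrogates and high_surrogate_from_prefix are made up for illustration. It only shows the arithmetic: the first three bytes of a 4-byte UTF-8 sequence already determine the UTF-16 high surrogate, so a truncated sequence could plausibly end up as a lone high surrogate, the same range Christian lists for the invalid 3-byte case.

```python
# Illustrative arithmetic only; this is NOT how Cygwin is known to do it.

def utf16_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def high_surrogate_from_prefix(b0, b1, b2):
    """Compute the high surrogate from the first three bytes of a 4-byte
    UTF-8 sequence. The missing fourth byte only carries the lowest six
    bits, which affect the low surrogate alone."""
    partial = ((b0 & 0x07) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
    return 0xD800 + (((partial << 6) - 0x10000) >> 10)

full = "\U0001F321".encode("utf-8")
print(full.hex(" "))                                # f0 9f 8c a1

hi, lo = utf16_surrogates(0x1F321)
print(hex(hi), hex(lo))                             # 0xd83c 0xdf21

# Same high surrogate from the truncated sequence f0 9f 8c:
print(hex(high_surrogate_from_prefix(*full[:3])))   # 0xd83c

# A lone high surrogate written as a (formally invalid) 3-byte sequence:
lone = chr(hi).encode("utf-8", "surrogatepass")
print(lone.hex(" "))                                # ed a0 bc

# A strict UTF-8 decoder refuses the truncated bytes outright.
try:
    full[:3].decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

So both cases from the list above map onto an unpaired 0xD800..0xDBFF unit on the UTF-16 side, which would fit the observation that names containing them do not survive the round trip.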

Mark
-- 
IT Infrastructure Consultant
Windows, Linux
