Mail Archives: cygwin/2024/09/16/05:50:43

Subject: Re: readdir() returns inaccessible name if file was created with
invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
<bc8bd61c-818e-424f-bb42-52f4fecd4849 AT towo DOT net>
<b6ab074b-919e-4514-8276-72a30c36ab58 AT towo DOT net>
<de4767e2-85b7-ead2-df9a-64e1f24f4e8f AT t-online DOT de>
Message-ID: <6451a249-adcd-9c56-b76e-1b00886cea80@t-online.de>
Date: Mon, 16 Sep 2024 11:49:41 +0200
From: Christian Franke via Cygwin <cygwin AT cygwin DOT com>

Christian Franke via Cygwin wrote:
> Thomas Wolff via Cygwin wrote:
>>
>> Am 15.09.2024 um 20:15 schrieb Thomas Wolff via Cygwin:
>>> Am 15.09.2024 um 19:47 schrieb Christian Franke via Cygwin:
>>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>>> does not refuse to create the file. Later readdir() returns a
>>>> different name which could not be used to access the file.
>>>>
>>>> Testcase with U+1F321 (Thermometer):
>>>>
>>>> $ uname -r
>>>> 3.5.4-1.x86_64
>>>>
>>>> $ printf $'\U0001F321' | od -A none -t x1
>>>>  f0 9f 8c a1
>>>>
>>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>>
>>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>>
>>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>>
>>>> $ ls -1
>>>> ls: cannot access 'file2-.?ext': No such file or directory
>>>> ls: cannot access 'file3-': No such file or directory
>>>> 'file1-'$'\360\237\214\241''.ext'
>>>> file2-.?ext
>>>> file3-
>>> I don't reproduce this.
>
> Yes, sorry, the above 'ls' was actually aliased to 'ls --color=auto' 
> which needs to call stat(). Plain 'ls' does not, so the errors do not 
> occur then.
>
>
>>>
>>> While the file name gets mangled, all resulting file names are valid and
>>> listed:
>>> In file2 the sequence is turned into U+17B3 but exchanged with the dot.
>>> In file3 the same sequence is just dropped.
>>> $ ls -1|cat
>>> file1-🌡.ext
>>> file2-.ឳext
>>> file3-
>>>
>>> However, ls file2* fails, as does ls *.
>> On the other hand, ls file3- fails too, so some mapping error occurs
>> internally.
>> Also, the files cannot be deleted from cygwin (need to use cmd).
>
> 'rm' using the original names works for file2-..., but not for file3-...
>
> $ rm -v 'file2-'$'\xf0\x9f\x8c''.ext'
> removed 'file2-'$'\360\237\214''.ext'
>
> $ rm -v 'file3-'$'\xf0\x9f\x8c'
> rm: cannot remove 'file3-'$'\360\237\214': No such file or directory
>

Further tests suggest that the problem only occurs with:
- incomplete 4-byte UTF-8 sequences (code points above U+FFFF, i.e. 
outside the 16-bit range)
- complete but invalid 3-byte UTF-8 sequences which encode the UTF-16 
'high surrogate' range (0xD800..0xDBFF).
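
To make the two cases easier to check against other file names, here is a rough Python sketch that scans a byte string for them. The function name `find_suspect_sequences` and the label strings are made up for illustration; this is not what Cygwin does internally, only a byte-level test that assumes the two patterns listed above are the relevant ones.

```python
def find_suspect_sequences(name):
    """Return (offset, kind) pairs for the two patterns described above.

    name is a bytes object holding the raw file name.
    """
    findings = []
    i = 0
    n = len(name)
    while i < n:
        b = name[i]
        if 0xF0 <= b <= 0xF4:
            # Lead byte of a 4-byte sequence: consume up to three
            # continuation bytes (0x80..0xBF) that follow it.
            j = i + 1
            while j < n and j < i + 4 and 0x80 <= name[j] <= 0xBF:
                j += 1
            if j - i < 4:
                findings.append((i, "truncated 4-byte sequence"))
            i = j
        elif (b == 0xED and i + 2 < n
              and 0xA0 <= name[i + 1] <= 0xAF
              and 0x80 <= name[i + 2] <= 0xBF):
            # ED A0..AF 80..BF is the 3-byte form of U+D800..U+DBFF,
            # which is not valid UTF-8.
            findings.append((i, "encoded high surrogate"))
            i += 3
        else:
            i += 1
    return findings


if __name__ == "__main__":
    samples = [
        b"file1-\xf0\x9f\x8c\xa1.ext",  # complete U+1F321
        b"file2-\xf0\x9f\x8c.ext",      # truncated, followed by '.ext'
        b"file3-\xf0\x9f\x8c",          # truncated at end of name
        b"file4-\xed\xa0\xbc.ext",      # 3-byte encoding of U+D83C
    ]
    for s in samples:
        print(s, find_suspect_sequences(s))
```

The first three samples are the names from the testcase; the fourth is a hypothetical name for the surrogate case (U+D83C is the high surrogate of U+1F321 in UTF-16). The sketch only looks for these two patterns and does not validate UTF-8 in general.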


-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
