Subject: Re: readdir() returns inaccessible name if file was created with invalid UTF-8
To: cygwin AT cygwin DOT com
References: <96f2253b-791b-b8a0-97dd-8d257eefb9b1 AT t-online DOT de>
Date: Sun, 15 Sep 2024 22:48:53 +0200
List-Id: General Cygwin discussions and problem reports
From: Christian Franke via Cygwin
Reply-To: cygwin AT cygwin DOT com
Cc: Christian Franke

Thomas Wolff via Cygwin wrote:
>
> On 15.09.2024 at 20:15, Thomas Wolff via Cygwin wrote:
>> On 15.09.2024 at 19:47, Christian Franke via Cygwin wrote:
>>> If a file name contains an invalid (truncated) UTF-8 sequence, open()
>>> does not refuse to create the file. Later readdir() returns a
>>> different name which could not be used to access the file.
>>>
>>> Testcase with U+1F321 (Thermometer):
>>>
>>> $ uname -r
>>> 3.5.4-1.x86_64
>>>
>>> $ printf $'\U0001F321' | od -A none -t x1
>>>  f0 9f 8c a1
>>>
>>> $ touch 'file1-'$'\xf0\x9f\x8c\xa1''.ext'
>>>
>>> $ touch 'file2-'$'\xf0\x9f\x8c''.ext'
>>>
>>> $ touch 'file3-'$'\xf0\x9f\x8c'
>>>
>>> $ ls -1
>>> ls: cannot access 'file2-.?ext': No such file or directory
>>> ls: cannot access 'file3-': No such file or directory
>>> 'file1-'$'\360\237\214\241''.ext'
>>> file2-.?ext
>>> file3-
>> I don't reproduce this.

Yes, sorry, the above 'ls' was actually aliased to 'ls --color=auto',
which needs to call stat(). Plain 'ls' does not, so the errors do not
occur then.

>>
>> While the file name gets mangled, all resulting file names are valid and
>> listed:
>> In file2 the sequence is turned into U+17B3 but exchanged with the dot.
>> In file3 the same sequence is just dropped.
>> $ ls -1|cat
>> file1-🌡.ext
>> file2-.ឳext
>> file3-
>>
>> However, ls file2* fails, as does ls *.
> On the other hand, ls file3- fails too, so some mapping error occurs
> internally.
> Also, the files cannot be deleted from cygwin (need to use cmd).

'rm' using the original names works for file2-..., but not for file3-...

$ rm -v 'file2-'$'\xf0\x9f\x8c''.ext'
removed 'file2-'$'\360\237\214''.ext'

$ rm -v 'file3-'$'\xf0\x9f\x8c'
rm: cannot remove 'file3-'$'\360\237\214': No such file or directory

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
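
The two observations in this thread (the truncated byte sequence in the file names, and directory entries that cannot be passed to stat()) can be checked from a script. What follows is a minimal Python sketch under stated assumptions, not a Cygwin test: the helper names classify_utf8 and inaccessible_entries are hypothetical, and the byte strings are the three names from the test case above. It uses bytes paths throughout so that Python does not re-encode the names itself.

```python
import os


def classify_utf8(name: bytes) -> str:
    """Return 'valid', or say where UTF-8 decoding of the name fails."""
    try:
        name.decode("utf-8")
        return "valid"
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte of the bad sequence.
        return f"invalid at byte {exc.start}: {exc.reason}"


def inaccessible_entries(directory: bytes) -> list:
    """Return the names listed in `directory` that lstat() cannot reach.

    This mirrors what 'ls --color=auto' runs into: the directory listing
    yields a name, but a later stat-style call on that name fails.
    """
    bad = []
    for name in os.listdir(directory):  # bytes in, bytes out
        try:
            os.lstat(os.path.join(directory, name))
        except OSError:
            bad.append(name)
    return bad


# The three names from the test case (hypothetical variable name).
NAMES = [
    b"file1-\xf0\x9f\x8c\xa1.ext",  # complete U+1F321
    b"file2-\xf0\x9f\x8c.ext",      # last byte of the sequence missing
    b"file3-\xf0\x9f\x8c",          # truncated sequence at end of name
]

for n in NAMES:
    print(n, "->", classify_utf8(n))
```

To check a directory, call inaccessible_entries(os.fsencode(path)) after creating the files. Where the file system stores names as opaque bytes, the returned list would be expected to be empty; a non-empty list corresponds to the mismatch between readdir() and stat() reported above.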