Mail Archives: cygwin/2021/06/25/21:54:27

To: cygwin AT cygwin DOT com
From: Vadim <vad AT syping DOT de>
Subject: Cygwin, Unicode and "long" path names
Message-ID: <952ad3ba-34f4-c3a4-450c-263b16795c8d@syping.de>
Date: Sat, 26 Jun 2021 03:53:29 +0200

Ah, this beautiful topic. Windows 7 x64.

This summary was written afterwards as a postscript; the tests and
findings follow below:

1) Cygwin limits individual names to 255 bytes, whereas Windows seems to
count UTF-16 characters and works fine: a name of 256 bytes in 108
characters works.

Basically, this becomes a bytes vs characters story.

2) Bash file name auto-expansion detects the file of that name, but the
name gets truncated to 255 bytes. find behaves the same way ("No such
file or directory", because it tries to access a non-existent truncated
name).

2.1) If you try to correct the above mistake by adding the truncated
characters back, the program (cat) complains: "File name too long".

2.2) If a folder exists whose 255-byte name equals the truncated name,
then "find ." lists that folder twice (effectively hiding the long-named
folder from tools without any error message).

3) Paths with the \\?\ prefix get the same treatment: "File name too
long".
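
To find which entries in a tree are affected, something along these
lines might help. It is only a sketch: the function names are made up
for illustration, the 255-byte limit is the one observed above, and it
assumes the bytes Cygwin counts are the UTF-8 encoding of the name. It
probably needs to run under a native Windows Python, on the assumption
that such a Python sees the full names the way Explorer and cmd do.

```python
import os

def names_over_limit(names, limit=255):
    """Return the names whose UTF-8 encoding is longer than `limit` bytes.

    `limit` defaults to the per-name byte limit observed in this report.
    """
    # surrogatepass keeps odd names (lone surrogates) from raising an error
    return [n for n in names
            if len(n.encode("utf-8", errors="surrogatepass")) > limit]

def scan_tree(root, limit=255):
    """Yield (directory, name) for every entry under `root` over the limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in names_over_limit(dirnames + filenames, limit):
            yield dirpath, name

if __name__ == "__main__":
    # "." is just a placeholder starting point
    for directory, name in scan_tree("."):
        print(len(name.encode("utf-8", errors="surrogatepass")),
              os.path.join(directory, name))
```

Each printed line gives the byte length of the name followed by its
path, so the entries Cygwin would refuse can be renamed beforehand.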

I expected Cygwin to handle these names without problems, just like
Windows, Explorer, cmd etc. do. Is this particular problem new or known?
All I could find on the mailing list dates from around the time when
Cygwin hadn't yet implemented Unicode support (UTF-8?), ~2004-2008.

These names were created by youtube-dl.exe executed from within Cygwin.

- Vadim

---

This file name is 255 bytes long and works:

s123點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt

This is 256 bytes and works perfectly normally in Windows (Explorer; I
can paste it and run "dir <name>" in cmd, despite cmd showing [] block
chars), but not in the Cygwin terminal (I used s123/s1234 as a prefix
for easy auto-expansion):

s1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt


If I try to use tab-expansion in the terminal (mintty, bash) the problem 
becomes apparent ("xt" missing at the end):

$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.t
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.t': No such file or directory


However, with one fewer byte it expands properly:

$ cat s123點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
hello


A name-length limit? Yes, 255 bytes per name (MAX_PATH itself is 260
characters and applies to a whole path). Why then does the full
file/folder name work in Windows? This is the full name (a folder),
257 bytes:

20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞

And it can get longer! In fact, I can bump the total path to 396 bytes,
or "Column 249" as Notepad++ counts the characters (the individual
folder name is 359 bytes or 211 characters, "column 212"):

D:/abcdefgh/Local_TEMP/cygwinunicode/1_123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789020210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞


NTFS allows up to 255 UTF-16 code units for an individual path segment,
and this seems to align with the Windows tooling: cmd and Explorer can
browse these fine (but the included file in the folder spills beyond the
limit and you run into the usual 'total path too long' problem).
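
The difference between the two ways of counting can be checked with a
few lines of Python. This is only a sketch: `measure` and the two
constants are names chosen here, the sample name is hypothetical, and it
assumes that the bytes Cygwin counts are the UTF-8 encoding of the name.

```python
NTFS_SEGMENT_LIMIT_UTF16 = 255  # per-segment limit in UTF-16 code units
CYGWIN_NAME_LIMIT_BYTES = 255   # per-name byte limit observed in this report

def measure(name: str) -> dict:
    """Count one name in code points, UTF-8 bytes and UTF-16 code units."""
    utf8_bytes = len(name.encode("utf-8"))
    # UTF-16LE uses exactly 2 bytes per code unit and adds no BOM
    utf16_units = len(name.encode("utf-16-le")) // 2
    return {
        "code_points": len(name),
        "utf8_bytes": utf8_bytes,
        "utf16_units": utf16_units,
        "fits_utf16_limit": utf16_units <= NTFS_SEGMENT_LIMIT_UTF16,
        "fits_byte_limit": utf8_bytes <= CYGWIN_NAME_LIMIT_BYTES,
    }

if __name__ == "__main__":
    # Hypothetical name: 84 CJK characters (3 bytes each in UTF-8) plus ".txt"
    sample = "香港新聞" * 21 + ".txt"
    print(measure(sample))
```

The sample comes to 256 bytes in UTF-8 but only 88 UTF-16 code units, so
it fits the UTF-16 limit comfortably while exceeding the byte limit by
one, which is the situation described in this report.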

Whether you manually add the missing "xt" to the tab-completion or use
a \\?\-prefixed path, the result is the same:

$ cat s1234點半蘋果新聞報道\ 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ 
O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 
民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long
$ cat '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt'
cat: '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 
字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 
O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 
民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long

