Mail Archives: cygwin/2013/12/10/05:28:22

Date: Tue, 10 Dec 2013 11:27:55 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: cant access to files more than 128 utf-8 symbol long names
Message-ID: <20131210102755.GQ2527@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <52A6BFA4 DOT 9010101 AT spektr-rfs DOT ru>
MIME-Version: 1.0
In-Reply-To: <52A6BFA4.9010101@spektr-rfs.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Dec 10 11:15, Nikolay Ilychev wrote:
> Hello!
>
> When using cygwin, i can't list, copy, remove files and directories
> with 128 utf-8 symbol long names.
>
> useless examples that illustrates the problem:
> [...]
> same problem with other tools - find, perl, rsync from cygwin repo.
>
> Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.

Easier said than done.

First of all, this is NOT about MAX_PATH.  MAX_PATH (260 chars) is the
number of characters allowed in the Win32 ANSI file API for a complete
path, including the terminating null.  Cygwin uses the native NT API
and, occasionally, the Win32 UNICODE file API, which allow paths of up
to 32767 chars.

The problem here is about NAME_MAX.  NAME_MAX is per POSIX[1] the
"maximum number of bytes in a filename (not including the terminating
null)."

Note the word *bytes*.  Not characters, bytes. UTF-8 chars are 1 to 4
bytes in length.  Thus, the maximum number of UTF-8 chars in a filename
is potentially less than NAME_MAX:

A filename made up only of chars from the basic latin charset (1 byte
each in UTF-8) may consist of NAME_MAX characters.  A filename built
solely from chars of the latin-1 supplement (2-byte chars) may consist
of NAME_MAX / 2 characters, and a filename built only from emoticons
(4-byte chars) of just NAME_MAX / 4 chars.
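
To make the arithmetic concrete, here is a minimal Python sketch.  The
helper name utf8_len and the sample characters are my own choices for
illustration, and the Cyrillic sample assumes the reported filenames
used 2-byte characters, which the report does not actually say.

```python
NAME_MAX = 255  # the value discussed in this thread


def utf8_len(name: str) -> int:
    """Return the number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


# One sample character per width class mentioned above.
samples = {
    "a": "basic latin",
    "\u00e9": "latin-1 supplement",   # e with acute accent
    "\u0436": "cyrillic",             # zhe
    "\U0001F600": "emoticon",         # grinning face
}

for ch, block in samples.items():
    width = utf8_len(ch)
    print(f"{block}: {width} byte(s) per char, "
          f"at most {NAME_MAX // width} chars per filename")

# 128 two-byte characters need 256 bytes, one more than NAME_MAX allows.
print(utf8_len("\u0436" * 128))
```

With integer division, NAME_MAX / 2 comes out as 127, so a name of 128
two-byte characters is exactly one byte too long.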

Ok, so we all know that Windows does not use a byte representation of
filenames; rather, the OS uses UTF-16 to store and handle filenames
internally.  A filename on a Windows filesystem may consist of up to
255 UTF-16 chars[2].

How do you represent this in a byte-oriented POSIX system?  What do you
set NAME_MAX to?  You can't get it right due to the unfortunate multibyte
vs. UTF-16 encoding issue.

To cover all UTF-8 chars, NAME_MAX would have to be 1020.  But then,
applications relying on NAME_MAX will be surprised by ENAMETOOLONG
errors for perfectly valid POSIX filenames.
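
The 1020 figure can be checked with a little arithmetic.  The sketch
below is only an illustration: the helper names are made up, and it
assumes the Windows limit is counted in UTF-16 code units, so a char
outside the Basic Multilingual Plane costs two units.

```python
WINDOWS_NAME_UNITS = 255  # assumed: limit counted in UTF-16 code units


def utf16_units(name: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of the name."""
    return len(name.encode("utf-16-le")) // 2


def utf8_bytes(name: str) -> int:
    """Number of bytes in the UTF-8 encoding of the name."""
    return len(name.encode("utf-8"))


bmp_char = "\u20ac"         # euro sign: 1 UTF-16 unit, 3 UTF-8 bytes
astral_char = "\U0001F600"  # emoticon: 2 UTF-16 units, 4 UTF-8 bytes

# 255 BMP chars fill the Windows limit exactly.
bmp_name = bmp_char * WINDOWS_NAME_UNITS
print(utf16_units(bmp_name), utf8_bytes(bmp_name))        # 255 765

# 255 four-byte chars are what 1020 bytes corresponds to, but they
# already need 510 UTF-16 code units.
astral_name = astral_char * 255
print(utf16_units(astral_name), utf8_bytes(astral_name))  # 510 1020
```

Under that assumption a name of 255 code units needs at most 765 bytes
in UTF-8, and 1020 is what 255 four-byte chars would need.  Either way,
1020 is a safe upper bound and both numbers are far above 255.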

If you make it 255, applications will be surprised by ENAMETOOLONG
errors for perfectly valid Windows filenames.
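
A small sketch of that mismatch, again with hypothetical helper names
and assuming the Windows limit is 255 UTF-16 code units while NAME_MAX
is 255 bytes of UTF-8:

```python
NAME_MAX = 255            # POSIX limit, in bytes
WINDOWS_NAME_UNITS = 255  # assumed Windows limit, in UTF-16 code units


def fits_windows(name: str) -> bool:
    """True if the name stays within the assumed UTF-16 code unit limit."""
    return len(name.encode("utf-16-le")) // 2 <= WINDOWS_NAME_UNITS


def fits_posix(name: str) -> bool:
    """True if the UTF-8 form of the name stays within NAME_MAX bytes."""
    return len(name.encode("utf-8")) <= NAME_MAX


# 200 Cyrillic letters: 200 UTF-16 code units, but 400 bytes in UTF-8.
name = "\u0436" * 200
print(fits_windows(name), fits_posix(name))  # True False
```

The same name is fine for the filesystem and too long for a
byte-oriented application, which is exactly the ENAMETOOLONG case.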

If you make it 255 on the application level but then return filenames
longer than 255 bytes to the application, applications will crash
due to buffer overflow issues.  After all, NAME_MAX is a contractual
obligation.
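
To illustrate what that contract looks like from the application side,
here is a rough Python sketch.  The function check_name and the
fixed-size buffer are hypothetical stand-ins for what a C program does
with a char array of NAME_MAX + 1 bytes; this is not how Cygwin itself
is implemented.

```python
import errno
import os

NAME_MAX = 255  # the value discussed in this thread


def check_name(name: str, name_max: int = NAME_MAX) -> bytes:
    """Return the UTF-8 form of the name, or fail like the OS would."""
    encoded = name.encode("utf-8")
    if len(encoded) > name_max:
        # Refuse the name up front instead of handing the caller more
        # bytes than it was promised.
        raise OSError(errno.ENAMETOOLONG,
                      os.strerror(errno.ENAMETOOLONG), name)
    return encoded


# An application buffer sized by the contract: NAME_MAX bytes plus a
# terminating null.
buf = bytearray(NAME_MAX + 1)
encoded = check_name("a" * 255)
buf[:len(encoded)] = encoded  # always fits once check_name has passed

try:
    check_name("\u0436" * 128)  # 256 bytes in UTF-8
except OSError as exc:
    print(exc.errno == errno.ENAMETOOLONG)
```

On a POSIX system the limit for a given directory can be queried with
os.pathconf(path, "PC_NAME_MAX") instead of hard-coding 255.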

There was also the backward compatibility issue.  Back in the pre-Cygwin
1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
Changing that to a bigger value might have resulted in the
aforementioned application crashes due to buffer overflows as well.

So we decided to keep NAME_MAX at the value it always had, 255.
This restricts the actual filename length when using multibyte
characters, just as on any other POSIX system, with the downside that,
occasionally, a Windows filename will be too long to handle.

Sorry if that is frustrating in your current situation, but this
isn't something we can just change on a whim.  It would break
compatibility with all existing Cygwin executables.


Corinna


[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
[2] However, this does *not* cover NFS or other filesystems using a
    byte representation for storing filenames.


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
