Date: Tue, 10 Dec 2013 11:27:55 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: cant access to files more than 128 utf-8 symbol long names
Message-ID: <20131210102755.GQ2527@calimero.vinschen.de>
In-Reply-To: <52A6BFA4.9010101@spektr-rfs.ru>

On Dec 10 11:15, Nikolay Ilychev wrote:
> Hello!
>
> When using cygwin, i can't list, copy, remove files and directories
> with 128 utf-8 symbol long names.
>
> useless examples that illustrates the problem:
> [...]
> same problem with other tools - find, perl, rsync from cygwin repo.
>
> Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.

Easier said than done.

First of all, this is NOT about MAX_PATH. MAX_PATH (260 chars) is the
number of characters allowed in the Win32 ANSI file API for a complete
path, including the terminating null. Cygwin uses the native NT API
and, occasionally, the Win32 UNICODE file API, which allow paths of up
to 32767 chars.

The problem here is about NAME_MAX. NAME_MAX is, per POSIX[1], the
"maximum number of bytes in a filename (not including the terminating
null)." Note the word *bytes*. Not characters, bytes.

UTF-8 chars are 1 to 4 bytes in length. Thus, the maximum number of
UTF-8 chars in a filename is potentially less than NAME_MAX: a filename
consisting only of chars from the Basic Latin charset (1 byte in UTF-8)
may consist of NAME_MAX characters, a filename constructed solely from
chars of the Latin-1 Supplement (2-byte chars) may consist of
NAME_MAX / 2 characters, and a filename constructed only from emoticons
(4-byte chars) may consist of NAME_MAX / 4 chars.

Ok, so we all know that Windows does not use a byte representation of
filenames; rather, the OS uses UTF-16 to store and handle filenames
internally. A filename on Windows filesystems may consist of 255 UTF-16
chars[2].

How do you represent this in a byte-oriented POSIX system? What do you
set NAME_MAX to? You can't get it right due to the unfortunate
multibyte vs. UTF-16 encoding issue. To cover all UTF-8 chars, NAME_MAX
would have to be 1020.
But then, applications relying on NAME_MAX will be surprised by
ENAMETOOLONG errors for perfectly valid POSIX filenames. If you make it
255, applications will be surprised by ENAMETOOLONG errors for
perfectly valid Windows filenames. If you make it 255 on the
application level but then return filenames longer than 255 bytes to
the application, applications will crash due to buffer overflow issues.
After all, NAME_MAX is a contractual obligation.

There was also the backward compatibility issue. Back in the pre-Cygwin
1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
Changing that to a bigger value might have resulted in the
aforementioned application crashes due to buffer overflows as well.

So we decided to keep NAME_MAX at the same value as it always was, 255.
This restricts the actual filename length when using multibyte
characters just as on any other POSIX system, with the downside that,
occasionally, a Windows filename will be too long to handle.

Sorry if that is frustrating in your current situation, but this isn't
something we can just change on a whim. It would break compatibility
with all existing Cygwin executables.

Corinna

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
[2] However, this does *not* cover NFS or other filesystems using a
    byte representation for storing filenames.
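
As a rough illustration of the bytes-versus-characters point above, the
following Python sketch counts UTF-8 bytes and UTF-16 code units for a
few sample names. The helper names (utf8_len, utf16_units,
fits_name_max) and the sample strings are made up for this example and
are not part of Cygwin or any library; NAME_MAX = 255 is the value
quoted above. The Cyrillic sample is only an assumed instance of a name
built from 2-byte characters.

```python
NAME_MAX = 255  # limit in bytes, as quoted in the text above


def utf8_len(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Number of 16-bit code units the name occupies in UTF-16."""
    return len(name.encode("utf-16-le")) // 2


def fits_name_max(name: str, name_max: int = NAME_MAX) -> bool:
    """True if the UTF-8 encoded name is at most name_max bytes long."""
    return utf8_len(name) <= name_max


# (label, single character, repeat count) -- hypothetical sample names
samples = [
    ("basic latin, 1 byte/char", "a", 255),
    ("latin-1 supplement, 2 bytes/char", "\u00e9", 128),
    ("cyrillic, 2 bytes/char", "\u0436", 128),
    ("emoticon, 4 bytes/char", "\U0001F600", 64),
]

for label, char, count in samples:
    name = char * count
    print(
        f"{label}: {count} chars, "
        f"{utf16_units(name)} UTF-16 units, "
        f"{utf8_len(name)} UTF-8 bytes, "
        f"fits NAME_MAX: {fits_name_max(name)}"
    )
```

With these samples, a 128-letter name built from 2-byte characters
needs 256 bytes in UTF-8, one more than NAME_MAX, even though it is
only 128 UTF-16 units and so well within the Windows limit.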

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat
Please, send mails regarding Cygwin to cygwin AT cygwin DOT com