Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Date: | Wed, 11 Dec 2013 11:04:39 +0400 |
From: | Andrey Repin <anrdaemon AT yandex DOT ru> |
Reply-To: | Andrey Repin <cygwin AT cygwin DOT com> |
Message-ID: | <238428894.20131211110439@mtu-net.ru> |
To: | Corinna Vinschen <cygwin AT cygwin DOT com> |
Subject: | Re: cant access to files more than 128 utf-8 symbol long names |
In-Reply-To: | <20131210102755.GQ2527@calimero.vinschen.de> |
References: | <52A6BFA4 DOT 9010101 AT spektr-rfs DOT ru> <20131210102755 DOT GQ2527 AT calimero DOT vinschen DOT de> |

Greetings, Corinna Vinschen!

> The problem here is about NAME_MAX. NAME_MAX is per POSIX[1] the
> "maximum number of bytes in a filename (not including the terminating
> null)."

Does this mean that the POSIX standard is not compatible with real life?
No surprise I was having a hard time copying a rather simple directory
structure to a UNIX server. It was just 2 levels deep, with 4-5 words in
each element name.

> Note the word *bytes*. Not characters, bytes. UTF-8 chars are 1 to 4
> bytes in length. Thus, the maximum number of UTF-8 chars in a filename
> is potentially less than NAME_MAX:
>
> A filename of chars only from the basic latin charset (1 byte in UTF-8)
> may consist of NAME_MAX characters, a filename solely constructed from
> chars of the latin-1 supplement (2 byte chars) may consist of NAME_MAX /
> 2 characters, a filename constructed from emoticons (4 byte chars) only
> of NAME_MAX / 4 chars.
>
> Ok, so we all know that Windows is not using a byte representation of
> filenames, rather the OS uses UTF-16 to store and handle filenames
> internally. Filename on Windows filesystems may consist of 255 UTF-16
> chars[2].
>
> How do you represent this in a byte-oriented POSIX system? What do you
> set NAME_MAX to? You can't get it right due to the unfortunate multibyte
> vs. UTF-16 encoding issue.
>
> To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then,
> applications relying on NAME_MAX will be surprised by ENAMETOOLONG
> errors for perfectly valid POSIX filenames.
>
> If you make it 255, applications will be surprised by ENAMETOOLONG
> errors for perfectly valid Windows filenames.
>
> If you make it 255 on the application level but then return filenames
> longer than 255 multibyte chars to the application, they will crash
> due to buffer overflow issues. After all, NAME_MAX is a contractual
> obligation.
>
> There was also the backward compatibility issue. Back in the pre-Cygwin
> 1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
> Changing that to a bigger value might have resulted in the
> aforementioned application crashes due to buffer overflows as well.
>
> So we decided to keep NAME_MAX at the same value as it always was, 255.
> This restricts the actual filename length when using multibyte
> characters just as on any other POSIX system with the downside that,
> occasionally, a Windows filename will be too long to handle.
>
> Sorry if that is frustrating in your current situation, but this
> isn't something we can just change at a whim and go ahead. It would
> break compatibility with all existing Cygwin executables.
>
> Corinna
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
> [2] However, this does *not* cover NFS or other filesystems using a
> byte representation for storing filenames.

--
WBR,
Andrey Repin (anrdaemon AT yandex DOT ru) 11.12.2013, <10:55>

Sorry for my terrible english...
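
For anyone who wants to check the byte-versus-character arithmetic from the quoted explanation, here is a minimal Python sketch. It is an illustration only, not anything Cygwin ships: the helper names (`utf8_len`, `utf16_units`, `fits_name_max`) and the sample names are made up for this example, and the limit of 255 is simply the NAME_MAX value mentioned in the thread.

```python
# Assumption: 255 is the NAME_MAX value discussed in the thread.
NAME_MAX = 255


def utf8_len(name: str) -> int:
    """Bytes the name occupies when encoded as UTF-8 (what NAME_MAX counts)."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """16-bit code units the name occupies in UTF-16 (what Windows counts)."""
    return len(name.encode("utf-16-le")) // 2


def fits_name_max(name: str, limit: int = NAME_MAX) -> bool:
    """True if the UTF-8 form of the name is at most `limit` bytes long."""
    return utf8_len(name) <= limit


# Hypothetical sample names, one per character class from the quoted text.
samples = {
    "basic latin, 1 byte each": "a" * 128,
    "latin-1 supplement, 2 bytes each": "\u00e9" * 128,  # e with acute accent
    "cyrillic, 2 bytes each": "\u0436" * 128,            # Cyrillic letter zhe
    "emoticons, 4 bytes each": "\U0001F600" * 64,        # grinning face
}

for label, name in samples.items():
    print(
        f"{label}: {len(name)} chars, {utf16_units(name)} UTF-16 units, "
        f"{utf8_len(name)} UTF-8 bytes, fits={fits_name_max(name)}"
    )
```

With these samples, 128 two-byte characters come to 256 UTF-8 bytes, one byte over the limit, while the same name is only 128 UTF-16 units and therefore well within what Windows itself accepts. That is consistent with the "more than 128 symbols" threshold in the subject line, assuming the affected names consist of two-byte characters.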