Mail Archives: cygwin/2013/12/11/02:05:45

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Wed, 11 Dec 2013 11:04:39 +0400
From: Andrey Repin <anrdaemon AT yandex DOT ru>
Reply-To: Andrey Repin <cygwin AT cygwin DOT com>
Message-ID: <238428894.20131211110439@mtu-net.ru>
To: Corinna Vinschen <cygwin AT cygwin DOT com>
Subject: Re: cant access to files more than 128 utf-8 symbol long names
In-Reply-To: <20131210102755.GQ2527@calimero.vinschen.de>
References: <52A6BFA4 DOT 9010101 AT spektr-rfs DOT ru> <20131210102755 DOT GQ2527 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
X-IsSubscribed: yes

Greetings, Corinna Vinschen!

> The problem here is about NAME_MAX.  NAME_MAX is per POSIX[1] the
> "maximum number of bytes in a filename (not including the terminating
> null)."
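To see what a particular system reports as NAME_MAX for a given filesystem, POSIX provides `pathconf(3)` with `_PC_NAME_MAX`, which Python exposes as `os.pathconf`. The sketch below assumes a POSIX platform (Cygwin, Linux, BSD). `os.pathconf` is not available in native Windows Python, and the value can differ from one filesystem to another, so it is queried for a specific path. The helper name `name_max` is hypothetical:

```python
import os


def name_max(path: str = ".") -> int:
    """Return NAME_MAX (in bytes) for the filesystem holding `path`.

    Uses pathconf(_PC_NAME_MAX). Raises NotImplementedError where
    os.pathconf does not exist (e.g. native Windows Python).
    """
    if not hasattr(os, "pathconf"):
        raise NotImplementedError("os.pathconf is not available on this platform")
    return os.pathconf(path, "PC_NAME_MAX")
```

On a typical Linux filesystem, `name_max("/tmp")` returns 255.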

Does this mean that the POSIX standard is not compatible with real life?
No wonder I had such a hard time copying a rather simple directory
structure to UNIX servers: just 2 levels deep, with 4-5 words in each
element name.

> Note the word *bytes*.  Not characters, bytes. UTF-8 chars are 1 to 4
> bytes in length.  Thus, the maximum number of UTF-8 chars in a filename
> is potentially less than NAME_MAX:

> A filename of chars only from the basic latin charset (1 byte in UTF-8)
> may consist of NAME_MAX characters, a filename solely constructed from
> chars of the latin-1 supplement (2 byte chars) may consist of NAME_MAX /
> 2 characters, a filename constructed from emoticons (4 byte chars) only
> of NAME_MAX / 4 chars.
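The arithmetic above can be checked by encoding sample characters. This sketch assumes NAME_MAX = 255 (the value Cygwin keeps, as discussed further down) and picks one arbitrary example character per UTF-8 length class. The Cyrillic row is included because Cyrillic letters are 2 bytes each in UTF-8:

```python
NAME_MAX = 255  # bytes; assumed value (Cygwin's)

# One arbitrary example character per UTF-8 length class.
SAMPLES = [
    ("basic latin", "a"),              # U+0061, 1 byte
    ("latin-1 supplement", "\u00e9"),  # e with acute, 2 bytes
    ("cyrillic", "\u044f"),            # small letter ya, 2 bytes
    ("euro sign", "\u20ac"),           # 3 bytes
    ("emoticon", "\U0001F600"),        # grinning face, 4 bytes
]


def max_chars(ch: str, name_max: int = NAME_MAX) -> int:
    """How many copies of `ch` fit in a filename of at most name_max bytes."""
    return name_max // len(ch.encode("utf-8"))


for label, ch in SAMPLES:
    print(f"{label:20} {len(ch.encode('utf-8'))} byte(s) -> {max_chars(ch)} chars")
```

With these samples the loop reports 255, 127, 127, 85 and 63 characters. A purely Cyrillic name therefore tops out at 127 characters, which is consistent with the problem described in the thread's subject line.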

> Ok, so we all know that Windows is not using a byte representation of
> filenames, rather the OS uses UTF-16 to store and handle filenames
> internally.  Filenames on Windows filesystems may consist of up to 255 UTF-16
> chars[2].
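The sketch below makes the mismatch concrete. It counts a name's length the way Windows does, in UTF-16 code units, and the way a POSIX application sees it, in UTF-8 bytes. It is only an illustration, and the helper names `utf16_units` and `utf8_bytes` are hypothetical:

```python
def utf16_units(name: str) -> int:
    """Length in UTF-16 code units (characters outside the BMP count as 2)."""
    return len(name.encode("utf-16-le")) // 2  # -le: no byte order mark


def utf8_bytes(name: str) -> int:
    """Length in bytes of the UTF-8 (multibyte) representation."""
    return len(name.encode("utf-8"))


name = "\u044f" * 200  # 200 Cyrillic letters
print(utf16_units(name), utf8_bytes(name))
```

A 200-letter Cyrillic name is 200 UTF-16 units, so it is within the Windows limit. In UTF-8, however, it is 400 bytes, well past a 255-byte NAME_MAX.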

> How do you represent this in a byte-oriented POSIX system?  What do you
> set NAME_MAX to?  You can't get it right due to the unfortunate multibyte
> vs. UTF-16 encoding issue.

> To cover all UTF-8 chars, NAME_MAX would have to be 1020.  But then,
> applications relying on NAME_MAX will be surprised by ENAMETOOLONG
> errors for perfectly valid POSIX filenames.

> If you make it 255, applications will be surprised by ENAMETOOLONG
> errors for perfectly valid Windows filenames.
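Both failure modes can be reproduced with two simple predicates: one for the Windows limit of 255 UTF-16 code units, and one for a byte-based NAME_MAX. This is a sketch with hypothetical helper names, not Cygwin code. It only shows that neither candidate value makes the two predicates agree:

```python
WINDOWS_MAX_UNITS = 255  # UTF-16 code units per filename component


def windows_ok(name: str) -> bool:
    """True if the name fits the Windows per-component limit."""
    return len(name.encode("utf-16-le")) // 2 <= WINDOWS_MAX_UNITS


def posix_ok(name: str, name_max: int) -> bool:
    """True if the UTF-8 form of the name fits in name_max bytes."""
    return len(name.encode("utf-8")) <= name_max


ascii_name = "x" * 300          # 300 bytes, 300 UTF-16 units
cyrillic_name = "\u044f" * 200  # 400 bytes, 200 UTF-16 units

# NAME_MAX = 1020: a valid POSIX name that Windows rejects.
print(posix_ok(ascii_name, 1020), windows_ok(ascii_name))
# NAME_MAX = 255: a valid Windows name that the POSIX limit rejects.
print(windows_ok(cyrillic_name), posix_ok(cyrillic_name, 255))
```

Each line prints `True False`.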

> If you make it 255 on the application level but then return filenames
> whose multibyte representation is longer than 255 bytes to applications,
> they will crash due to buffer overflow issues.  After all, NAME_MAX is a
> contractual obligation.
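The "contractual obligation" refers to C code such as `char name[NAME_MAX + 1]`. The caller sizes its buffer from NAME_MAX and trusts the system never to hand back anything longer. The sketch below mimics that contract in Python using a hypothetical `copy_name` helper, and it assumes NAME_MAX = 255. Where C code doing a blind `strcpy` would overflow, this helper refuses with ENAMETOOLONG instead:

```python
import errno
import os

NAME_MAX = 255  # assumed value, matching what Cygwin keeps


def copy_name(name: str) -> bytes:
    """Hypothetical helper: store `name` in a NUL-terminated buffer of
    NAME_MAX + 1 bytes, the way C code sizes `char buf[NAME_MAX + 1]`.

    Raises OSError(ENAMETOOLONG) rather than writing past the buffer.
    """
    raw = name.encode("utf-8")  # NAME_MAX limits bytes, not characters
    if len(raw) > NAME_MAX:
        raise OSError(errno.ENAMETOOLONG, os.strerror(errno.ENAMETOOLONG), name)
    buf = bytearray(NAME_MAX + 1)  # zero-filled, like a static char array
    buf[: len(raw)] = raw          # the terminating NUL is already in place
    return bytes(buf)
```

`copy_name("report.txt")` fits easily. `copy_name("\u044f" * 200)` raises, because its 400 UTF-8 bytes exceed the buffer.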

> There was also the backward compatibility issue.  Back in the pre-Cygwin
> 1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
> Changing that to a bigger value might have resulted in the
> aforementioned application crashes due to buffer overflows as well.

> So we decided to keep NAME_MAX at the same value as it always was, 255.
> This restricts the actual filename length when using multibyte
> characters just as on any other POSIX system with the downside that,
> occasionally, a Windows filename will be too long to handle.
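With NAME_MAX fixed at 255, a tree that is fine on NTFS can still contain components that a byte-oriented system will reject. This is what happens when such a tree is copied to a UNIX server. The following is a minimal pre-flight sketch using a hypothetical `overlong_components` helper. It assumes the target stores names as UTF-8, and it is meant to be run on the source machine (for example with native Windows Python) before copying:

```python
import os
from typing import Iterator, Tuple

NAME_MAX = 255  # assumed target limit, in bytes


def overlong_components(root: str, name_max: int = NAME_MAX) -> Iterator[Tuple[str, int]]:
    """Yield (path, utf8_length) for each entry under `root` whose own name
    is longer than name_max bytes when encoded as UTF-8."""
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            # surrogateescape maps undecodable bytes back to single bytes
            size = len(entry.encode("utf-8", "surrogateescape"))
            if size > name_max:
                yield os.path.join(dirpath, entry), size
```

Usage, with a hypothetical source directory: `for path, size in overlong_components("C:/data"): print(size, path)`.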

> Sorry if that is frustrating in your current situation, but this
> isn't something we can just change at a whim and go ahead.  It would
> break compatibility with all existing Cygwin executables.


> Corinna


> [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
> [2] However, this does *not* cover NFS or other filesystems using a
>     byte representation for storing filenames.




--
WBR,
Andrey Repin (anrdaemon AT yandex DOT ru) 11.12.2013, <10:55>

Sorry for my terrible English...


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

  Copyright © 2019   by DJ Delorie     Updated Jul 2019