Mail Archives: cygwin/2011/03/02/07:26:40

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Wed, 2 Mar 2011 13:26:16 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: readdir truncates file names whose UTF-8 representation is longer than 255 bytes
Message-ID: <20110302122616.GR22240@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <DDB181A6E7B893429D83CAC8DDBA74F1114B753028 AT VMBX108 DOT ihostexchange DOT net>
MIME-Version: 1.0
In-Reply-To: <DDB181A6E7B893429D83CAC8DDBA74F1114B753028@VMBX108.ihostexchange.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Mar  2 06:56, Uri Simchoni wrote:
> Hi,
> I'm using Cygwin 1.7.7 in UTF-8 mode. I have a file whose name is composed of Hebrew characters, so its UTF-8 representation is longer than 255 bytes.
> Trying "ls -l" fails to list the file's attributes.
> Using a short C program that loops through a directory (readdir()/stat()) shows that readdir() truncates the file name.
> Is there any way around it (an environment variable, an fstab setting, or a system call other than readdir())? I want to keep UTF-8.

I don't think there's a way around this, at least not an easy one for
Cygwin.  The problem is that the dirent structure has no room for a
multibyte filename longer than 255 bytes, while the underlying OS
provides filenames of up to 255 UTF-16 characters.

To support that, we would have to enlarge a single dirent so that it
allows names of at least 512 bytes, but even that might be too short:
in UTF-8 a single UTF-16 character can take up to 3 bytes, so 1024
would be required.  That's not exactly an easy change, so we won't do
that any time soon, I think.

The only solutions for now are to switch to another charset or to
shorten the filenames.


Sorry for not having better news,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

  Copyright © 2019   by DJ Delorie     Updated Jul 2019