Mail Archives: cygwin/2009/08/30/20:53:21

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sun, 30 Aug 2009 20:52:58 -0400
From: Christopher Faylor <cgf-use-the-mailinglist-please AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: The C locale
Message-ID: <20090831005258.GG2068@ednor.casa.cgf.cx>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <416096c60908300959i1e0084b1xc8f6e65e792b035d AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <416096c60908300959i1e0084b1xc8f6e65e792b035d@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Sun, Aug 30, 2009 at 05:59:11PM +0100, Andy Koppe wrote:
>Trying to reply to Tuomo Valkonen's post about locale issues, I got
>rather confused about the C locale. The manual and the POSIX standard
>say that it supports ASCII only, so in theory anything above 0x7F
>should be rejected. In practice though, both Cygwin 1.5 and 1.7 do
>support characters above 0x7F in the C locale, which could be quite
>useful. Trouble is, they do so rather inconsistently.
>
>Both in 1.5 and 1.7, the mb conversion functions treat such characters
>as ISO-8859-1. In other words, conversions between chars and wchars are
>simple casts (except that wchars above 0xFF can't be converted). This
>makes some sense.
>
>Filename handling is different though. Cygwin 1.5 translates filenames
>according to the system's ANSI codepage. I guess the inconsistency
>with the mb functions didn't really matter, as the mb functions were
>pretty much useless anyway, and supporting the system codepage was
>more important.
>
>So, with Cygwin 1.7, I'd have expected filename handling in the C
>locale to either use ISO-8859-1 for consistency with the mb functions,
>or the ANSI codepage for compatibility with 1.5. In actual fact
>though, it uses UTF-8.
>
>Is this on purpose? If so, shouldn't the multibyte conversion
>functions in the C locale use UTF-8 as well?

Since Cygwin has a clear system that it is supposed to be emulating,
the real question is "What does Linux do?"

cgf

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
