Mail Archives: cygwin/2009/08/30/12:59:29

Date: Sun, 30 Aug 2009 17:59:11 +0100
Message-ID: <416096c60908300959i1e0084b1xc8f6e65e792b035d@mail.gmail.com>
Subject: The C locale
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com

Trying to reply to Tuomo Valkonen's post about locale issues, I got
rather confused about the C locale. The manual and the POSIX standard
say that it supports ASCII only, so in theory anything above 0x7F
should be rejected. In practice though, both Cygwin 1.5 and 1.7 do
support characters above 0x7F in the C locale, which could be quite
useful. Trouble is, they do so rather inconsistently.

Both in 1.5 and 1.7, the mb conversion functions treat such characters
as ISO-8859-1. In other words, conversions between chars and wchars are
simple casts (except that wchars above 0xFF can't be converted). This
makes some sense.
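
To make the "simple cast" behaviour concrete, here is a minimal sketch
of the conversion as described above. It is a model only, not Cygwin's
actual code, and the names c_locale_btowc_model and c_locale_wctob_model
are hypothetical.

```c
#include <wchar.h>

/* Hypothetical model, not Cygwin's implementation: in the C locale a
   byte becomes a wide character by a plain cast, so 0x80..0xFF land on
   the ISO-8859-1 code points of the same value. */
static wchar_t c_locale_btowc_model(unsigned char byte)
{
    return (wchar_t)byte;
}

/* Reverse direction: only values up to 0xFF fit in a single char.
   Returns 1 and stores the byte on success, -1 if wc cannot be
   converted. */
static int c_locale_wctob_model(wchar_t wc, unsigned char *out)
{
    if ((unsigned long)wc > 0xFFUL)
        return -1;
    *out = (unsigned char)wc;
    return 1;
}
```

Under this model, byte 0xE9 converts to the wchar 0xE9 and back again,
while a wchar such as 0x20AC is rejected.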

Filename handling is different though. Cygwin 1.5 translates filenames
according to the system's ANSI codepage. I guess the inconsistency
with the mb functions didn't really matter, as the mb functions were
pretty much useless anyway, and supporting the system codepage was
more important.

So, with Cygwin 1.7, I'd have expected filename handling in the C
locale to either use ISO-8859-1 for consistency with the mb functions,
or the ANSI codepage for compatibility with 1.5. In actual fact
though, it uses UTF-8.
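
The practical effect of the mismatch can be sketched with a small
encoder for code points up to 0xFF. This is an illustration only, and
the function name latin1_range_to_utf8 is hypothetical. A character such
as U+00E9 is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3
0xA9 in UTF-8, so a filename produced under the UTF-8 rule and then fed
byte by byte through an ISO-8859-1 style conversion would presumably
come out as two wchars rather than one.

```c
#include <stddef.h>

/* Encode a code point in the range 0..0xFF as UTF-8.
   Returns the number of bytes written (1 or 2), or 0 if cp is outside
   the range this sketch handles. */
static size_t latin1_range_to_utf8(unsigned int cp, unsigned char out[2])
{
    if (cp <= 0x7F) {
        /* ASCII is identical in ISO-8859-1 and UTF-8. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0xFF) {
        /* Two-byte form: 110xxxxx 10xxxxxx. */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```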

Is this on purpose? If so, shouldn't the multibyte conversion
functions in the C locale use UTF-8 as well?

Andy

