Mail Archives: cygwin/2009/12/29/08:41:16

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_SOFTFAIL
X-Spam-Check-By: sourceware.org
Message-ID: <4B3A070D.4080407@byu.net>
Date: Tue, 29 Dec 2009 06:41:33 -0700
From: Eric Blake <ebb9 AT byu DOT net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: gcc4[1.7] printf treats differently a string constant and a character array
References: <380-2200912128193944786 AT cantv DOT net> <416096c60912281437o16aec4cct8b64b7518d9a9a1 AT mail DOT gmail DOT com> <416096c60912282217h57cf311h6af5d98ff9580f0 AT mail DOT gmail DOT com> <4B3A0246 DOT 4050705 AT byu DOT net> <416096c60912290530m4d70e587iad6d551b231d9776 AT mail DOT gmail DOT com>
In-Reply-To: <416096c60912290530m4d70e587iad6d551b231d9776@mail.gmail.com>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Andy Koppe on 12/29/2009 6:30 AM:
>> Remember, POSIX states that any use in a character context of bytes with
>> the 8th-bit set is specifically undefined in the C locale (whether that be
>> C.ASCII or C.UTF-8).
> 
> I very much disagree with that. C.ASCII and C.UTF-8 are different
> locales from plain "C", and the whole point of the explicitly stated
> charset is to define the meaning of bytes beyond 7-bit ASCII.

Point taken: an explicit "C.UTF-8" is a request for a specific charset
along with C semantics (such as no translation of output messages,
POSIX-mandated formatting for time and money, ...), but because the
charset is explicit, the use of 8-bit bytes is well-defined in our
implementation (and since POSIX does not specify C.UTF-8, you've already
left the realm of portability and entered implementation-defined territory).
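A minimal C sketch of what "the charset is explicit" buys you, assuming a C library that provides a locale named "C.UTF-8" (this thread indicates Cygwin 1.7 does; other systems may not, so the sketch checks). The helper name decode_first_char and its negative return codes are hypothetical. Under C.UTF-8 the two 8-bit bytes 0xC3 0xA9 decode to exactly one character, U+00E9.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first character of BYTES (length N)
   with LC_CTYPE set to LOCALE_NAME.  Returns the number of bytes
   consumed, -1 for an invalid sequence, -2 for an incomplete one,
   or -3 if the locale is not available on this system. */
static int decode_first_char(const char *locale_name,
                             const char *bytes, size_t n, wchar_t *out)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -3;                     /* e.g. no "C.UTF-8" installed */

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t r = mbrtowc(out, bytes, n, &state);
    if (r == (size_t)-1)
        return -1;                     /* EILSEQ: not a valid character */
    if (r == (size_t)-2)
        return -2;                     /* valid prefix, needs more bytes */
    return (int)r;                     /* 0 only for an embedded NUL */
}
```

For example, decode_first_char("C.UTF-8", "\xC3\xA9", 2, &wc) should return 2 with wc == 0xE9 wherever that locale exists. Note that setlocale changes process-wide state; a real program would normally set the locale once at startup.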

But my point remains: an explicit "C" is specified to be charset-agnostic,
so a portable program requesting "C" should not expect any particular
behavior of 8-bit bytes in character contexts.  Programs that use
LC_ALL=C to try to get 8-bit transparency from character contexts are
flat-out non-portable.  They get other well-defined benefits on 8-bit
bytes (such as sorting by strcmp instead of strcoll, fixed-format
messages, ...), but only insofar as those 8-bit bytes are in byte contexts
rather than character contexts.
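A minimal C sketch of the byte-context versus character-context distinction; the helper names sort_bytewise and probe_high_byte_in_c_locale are hypothetical. Sorting with strcmp is a byte context: strcmp compares bytes as unsigned char, so strings containing 8-bit bytes sort the same way in any locale. Decoding a lone 8-bit byte with mbrtowc in the "C" locale is a character context, and the outcome differs between C libraries (some report an invalid sequence, others map the byte to a wide character), so the sketch only reports that result rather than relying on it.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* qsort comparator: plain byte order via strcmp, which compares the
   bytes as unsigned char regardless of the current locale. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Byte context: the ordering is fully determined by the byte values. */
static void sort_bytewise(const char **strs, size_t count)
{
    qsort(strs, count, sizeof *strs, cmp_bytes);
}

/* Character context: ask the "C" locale what a lone 8-bit byte means.
   Returns the raw mbrtowc result; whether that is 1 (a wide character)
   or (size_t)-1 (EILSEQ) is implementation-specific, which is why
   portable code must not depend on it. */
static size_t probe_high_byte_in_c_locale(unsigned char byte, wchar_t *out)
{
    unsigned char buf[1] = { byte };
    mbstate_t state;

    setlocale(LC_ALL, "C");            /* the "C" locale always exists */
    memset(&state, 0, sizeof state);
    return mbrtowc(out, (const char *)buf, 1, &state);
}
```

Sorting { "\xE9t\xE9", "abc", "Zed" } with sort_bytewise yields "Zed", "abc", "\xE9t\xE9" everywhere (0x5A < 0x61 < 0xE9), whereas the value returned by probe_high_byte_in_c_locale(0xE9, &wc) may legitimately differ from one system to the next.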

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 AT byu DOT net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAks6Bw0ACgkQ84KuGfSFAYByhQCZAWbgggdJm5KBtBfNm9ElHmJN
p14AoMoKgy2XxhNqnV/KxuFVyttbp+m6
=eLYn
-----END PGP SIGNATURE-----

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple



  Copyright © 2019   by DJ Delorie     Updated Jul 2019