Message-ID: <4B3A070D.4080407@byu.net>
Date: Tue, 29 Dec 2009 06:41:33 -0700
From: Eric Blake
To: cygwin AT cygwin DOT com
Subject: Re: gcc4[1.7] printf treats differently a string constant and a character array
In-Reply-To: <416096c60912290530m4d70e587iad6d551b231d9776@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Andy Koppe on 12/29/2009 6:30 AM:
>> Remember, POSIX states that any use in a character context of bytes with
>> the 8th-bit set is specifically undefined in the C locale (whether that be
>> C.ASCII or C.UTF-8).
>
> I very much disagree with that. C.ASCII and C.UTF-8 are different
> locales from plain "C", and the whole point of the explicitly stated
> charset is to define the meaning of bytes beyond 7-bit ASCII.

Point taken: an explicit "C.UTF-8" is a request for a specific charset
along with C semantics (such as no translation of output messages,
POSIX-mandated formatting for time and money, ...), but because the
charset is explicit, the use of 8-bit bytes is well-defined in our
implementation (and since POSIX does not specify C.UTF-8, you've already
left the realm of portability and gone into implementation-defined
territory).

But my point remains: an explicit "C" is specified to be charset-agnostic,
so a portable program requesting "C" should not be expecting any
particular behavior of 8-bit bytes in character contexts. Programs that
use LC_ALL=C to try to get 8-bit transparency from character contexts are
flat-out non-portable. They get other well-defined benefits on 8-bit
bytes (such as sorting by strcmp instead of strcoll, fixed-format
messages, ...), but only insofar as those 8-bit bytes are in byte contexts
rather than character contexts.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 AT byu DOT net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAks6Bw0ACgkQ84KuGfSFAYByhQCZAWbgggdJm5KBtBfNm9ElHmJN
p14AoMoKgy2XxhNqnV/KxuFVyttbp+m6
=eLYn
-----END PGP SIGNATURE-----
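
To make the byte-context versus character-context distinction concrete, here is a minimal sketch in C. It is an illustration only, not code from this thread: the helper names (`sign`, `byte_order`, `c_locale_order`, `decodes_as_char`) are hypothetical, and it assumes a hosted C99 environment. `byte_order` and `c_locale_order` exercise the byte context (strcmp, and strcoll under an explicit "C" locale, which in practice orders the same way), while `decodes_as_char` asks the locale to interpret a single byte as a character via mbrtowc.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: reduce a comparison result to -1, 0, or 1. */
static int sign(int v) { return (v > 0) - (v < 0); }

/* Byte context: strcmp compares the bytes as unsigned char values,
 * so the result does not depend on any charset. */
int byte_order(const char *a, const char *b)
{
    return sign(strcmp(a, b));
}

/* Still a byte context: under an explicit "C" locale, strcoll is
 * expected to order strings the same way strcmp does. */
int c_locale_order(const char *a, const char *b)
{
    setlocale(LC_ALL, "C");
    return sign(strcoll(a, b));
}

/* Character context: ask the "C" locale to decode one byte as a
 * character. Returns 1 if mbrtowc accepts the byte, 0 if it reports
 * an invalid or incomplete sequence. For bytes with the 8th bit set,
 * the answer is the part a portable program should not rely on. */
int decodes_as_char(unsigned char byte)
{
    mbstate_t state;
    wchar_t wc;
    char in = (char)byte;
    size_t n;

    setlocale(LC_ALL, "C");
    memset(&state, 0, sizeof state);
    n = mbrtowc(&wc, &in, 1, &state);
    return n != (size_t)-1 && n != (size_t)-2;
}
```

Called from a small `main`, `byte_order("z", "\xC3\xA9")` is negative on any conforming implementation because 0x7A is less than 0xC3, and `decodes_as_char('A')` is 1. What `decodes_as_char(0xE9)` returns under "C" is deliberately left unstated here, since it can differ between C libraries.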