Message-ID: <380-22009123301337494@cantv.net>
Reply-To: rodmedina AT cantv DOT net
From: "Rodrigo Medina"
To: cygwin AT cygwin DOT com
Subject: Re: gcc4[1.7] printf treats differently a string constant and a character array
Date: Wed, 30 Dec 2009 09:07:04 -0430

Hi,

Eric Blake on Dec 2009 06:41:33 wrote:
>According to Andy Koppe on 12/29/2009 6:30 AM:
>>> Remember, POSIX states that any use in a character context of bytes with
>>> the 8th-bit set is specifically undefined in the C locale (whether that be
>>> C.ASCII or C.UTF-8).
>>
>> I very much disagree with that. C.ASCII and C.UTF-8 are different
>> locales from plain "C", and the whole point of the explicitly stated
>> charset is to define the meaning of bytes beyond 7-bit ASCII.
>Point taken: an explicit "C.UTF-8" is a request of a specific charset
>along with C semantics (such as no translation of output messages,
>posix-mandated formatting for time and money, ...), but because the
>charset is explicit, the use of 8-bit bytes is well-defined in our
>implementation (and since POSIX does not specify C.UTF-8, you've already
>left the realm of portability and gone into implementation-defined).
>But my point remains: an explicit "C" is specified to be charset-agnostic,
>so a portable program requesting "C" should not be expecting any
>particular behavior of 8-bit bytes in character contexts. Programs that
Programs that >use LC_ALL=C to try to get 8-bit transparency from character contexts are >flat-out non-portable. They get other well-defined benefits on 8-bit >bytes (such as sorting by strcmp instead of strcoll, fixed-format >messages, ...), but only insofar as those 8-bit bytes are in byte contexts >rather than character contexts. Some comments: 1- I think that printf(string_constant) and printf(char_array) should give the same output in any circumstance. 2- In absence of a call to setlocale printf((string_constant) writes according to the locale of the environment, but printf(char_array) does not, even though it is affected by the locale of the environment. 3- I think that a program that was written for locale=C should work without modification if the locale in the environment is any of the one-byte characters ones. 4- I think that a plain C (8-bit transparent) locale should be available, even if it is not the default one. RM -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple