Mail Archives: cygwin/2009/12/29/08:31:02

In-Reply-To: <4B3A0246.4050705@byu.net>
References: <380-2200912128193944786 AT cantv DOT net> <416096c60912281437o16aec4cct8b64b7518d9a9a1 AT mail DOT gmail DOT com> <416096c60912282217h57cf311h6af5d98ff9580f0 AT mail DOT gmail DOT com> <4B3A0246 DOT 4050705 AT byu DOT net>
Date: Tue, 29 Dec 2009 13:30:51 +0000
Message-ID: <416096c60912290530m4d70e587iad6d551b231d9776@mail.gmail.com>
Subject: Re: gcc4[1.7] printf treats differently a string constant and a character array
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com

2009/12/29 Eric Blake:
>> I couldn't find specific text about invalid bytes in the POSIX printf
>> spec,
>
> http://www.opengroup.org/onlinepubs/9699919799/functions/fprintf.html
>
> "all forms of fprintf() shall fail if:
>
> [EILSEQ]
>     [CX] A wide-character code that does not correspond to a valid
> character has been detected."

The issue wasn't with wide characters, but with invalid multibyte chars.
But anyway, we're agreed that printf is right to bail out.
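
To make the wide-character vs. multibyte distinction concrete: a multibyte
string is a sequence of bytes in the locale's charset, and it only becomes
wide characters once converted. Below is a minimal sketch, assuming the
standard C function mbrtowc(); the helper name first_invalid_mb is
hypothetical and does not come from this thread. It reports where a byte
string stops being valid in the current LC_CTYPE locale, which is the kind
of input that can make a conversion fail with EILSEQ.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): returns the byte offset of the
 * first invalid or truncated multibyte sequence in s under the current
 * LC_CTYPE locale, or -1 if the whole string decodes cleanly. */
long first_invalid_mb(const char *s)
{
    mbstate_t st;
    size_t len, off = 0;

    assert(s != NULL);
    len = strlen(s);
    memset(&st, 0, sizeof st);              /* initial conversion state */

    while (off < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + off, len - off, &st);

        /* (size_t)-1: invalid sequence (errno is set to EILSEQ);
         * (size_t)-2: sequence is incomplete at end of input. */
        if (n == (size_t)-1 || n == (size_t)-2)
            return (long)off;
        if (n == 0)                         /* decoded a NUL; stop defensively */
            break;
        off += n;
    }
    return -1;
}
```

Pure ASCII input decodes in any locale, so first_invalid_mb("abc") returns
-1. What happens for bytes with the 8th bit set depends on the locale that
is in effect when the helper is called.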


> Remember, POSIX states that any use in a character context of bytes with
> the 8th-bit set is specifically undefined in the C locale (whether that be
> C.ASCII or C.UTF-8).

I very much disagree with that. C.ASCII and C.UTF-8 are different
locales from plain "C", and the whole point of the explicitly stated
charset is to define the meaning of bytes beyond 7-bit ASCII.
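
One way to see which charset a given locale name selects is to ask the C
library. The following is a minimal sketch, assuming the POSIX
nl_langinfo(CODESET) interface; the helper name codeset_of is hypothetical.
Whether names such as "C.UTF-8" or "C.ASCII" are accepted at all depends on
the system, so the helper reports failure rather than guessing.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): copies the charset name of the
 * named locale into buf. Returns 0 on success, or -1 if the system does not
 * accept that locale name. */
int codeset_of(const char *locale_name, char *buf, size_t n)
{
    char saved[256];
    const char *cur;

    assert(locale_name != NULL && buf != NULL && n > 0);

    /* Remember the current LC_CTYPE setting so it can be restored.
     * The returned string may be overwritten by later calls, so copy it. */
    cur = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                          /* name rejected; locale unchanged */

    snprintf(buf, n, "%s", nl_langinfo(CODESET));
    setlocale(LC_CTYPE, saved);             /* restore the previous locale */
    return 0;
}
```

Calling codeset_of("C", ...) and codeset_of("C.UTF-8", ...) on the same
system and comparing the two results shows whether that system treats them
as having different charsets. The exact strings returned vary between C
libraries.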

Andy

