Mail Archives: cygwin/2009/12/29/01:18:09
2009/12/28 Andy Koppe:
> 2009/12/28 Rodrigo Medina:
>> Hi,
>> I am moving from cygwin-1.5 and gcc3.4 to cygwin1.7 and gcc4.
>> Some simple programs of mine fail.
>>
>> I am using LC_ALL=es_VE.ISO-8859-15.
>>
>> I have reduced the problem to this example
>>
>> --------------
>> #include <stdio.h>
>> main()
>> {
>> static char* line1 =
>> " This letter has an accent -->á, this one has no accent -->a\n\n";
>> static char* line2 = " ***** another line ******\n\n";
>> static char* line3 =
>> " These letters have an accent -->Ã¡, these ones have no accent -->A!\n\n";
>> static char* line4 =
>> " This letter has an accent -->Ã, this one has no accent -->A\n\n";
>>  printf(" This letter has an accent -->á, this one has no accent -->a\n\n");
>>  printf(line2);
>>  printf("%d %d %d\n\n",line1[29],line1[30],line1[31]);
>>  printf(line1);
>>  printf(line2);
>>  printf(" These letters have an accent -->Ã¡, these ones have no accent -->A!\n\n");
>>  printf(line2);
>>  printf("%d %d %d %d\n\n",line3[32],line3[33],line3[34],line3[35]);
>>  printf(line3);
>>  printf(line2);
>>  printf(" This letter has an accent -->Ã, this one has no accent -->A\n\n");
>>  printf(line2);
>>  printf("%d %d %d\n\n",line4[29],line4[30],line4[31]);
>>  printf(line4);
>>  printf(line2);
>>  printf(" ----- END ------");
>> }
>> ----------------
>>
>> My output is:
>>
>>  This letter has an accent -->á, this one has no accent -->a
>>
>>  ***** another line ******
>>
>> 62 -31 44
>>
>>  This letter has an accent --> ***** another line ******
>>
>>  These letters have an accent -->Ã¡, these ones have no accent -->A!
>>
>>  ***** another line ******
>>
>> 62 -61 -95 44
>>
>>  These letters have an accent -->Ã¡, these ones have no accent -->A!
>>
>>  ***** another line ******
>>
>>  This letter has an accent -->Ã, this one has no accent -->A
>>
>>  ***** another line ******
>>
>> 62 -61 44
>>
>>  This letter has an accent --> ***** another line ******
>>
>>  ----- END ------
>>
>> As you can see, the output of printf(string_constant) is what
>> I expected. The output of printf(char_array) is truncated at the
>> non-ASCII character.
>
> Reproduced. Looking at the compiler's assembly output, some of the
> printf() calls are replaced by calls to puts(), and those do work
> correctly, whereas the remaining printf() calls with accented
> characters misbehave. So printf()'s handling of non-ASCII characters
> needs a closer look.
Ah, the problem actually is that your program is missing a call to
setlocale(LC_CTYPE, "") to switch to the locale and character set
specified in the environment. In fact, since your program contains
hard-coded ISO-8859-15 strings, you should probably do
setlocale(LC_CTYPE, "<whatever>.ISO-8859-15").
Without a setlocale call, programs use the "C" locale, and on Cygwin
1.7 that implies the UTF-8 character set. Those single accented
ISO-8859-15 characters are invalid when interpreted as UTF-8, so
printf halts there. The accented character pairs like "Ã¡", meanwhile,
happen to be valid UTF-8, so they get through.
I couldn't find specific text about invalid bytes in the POSIX printf
spec, but it does say the following: "The format is a character
string, beginning and ending in its initial shift state, if any. The
format is composed of zero or more directives: ordinary characters,
which are simply copied to the output stream, and conversion
specifications, each of which shall result in the fetching of zero or
more arguments."
It's talking about "characters" rather than "bytes" there, which I
think does leave the behaviour for invalid bytes undefined, so
newlib's printf implementation is within its rights to just stop
processing the string at one of those.
Andy