Mail Archives: cygwin/2011/01/25/06:15:38
On Jan 24 22:09, Charles Wilson wrote:
> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> > Here's what happens on Cygwin:
> >
> > $ gcc -g -o ic ic.c -liconv
> > $ ./ic
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
>
> Confirmed.
>
> > So, AFAICS, there are two problems:
> >
> > - Even though iconv_open has been called explicitly with "UTF-8" as
> > input charset, the conversion still depends on the current application
> > codeset. That doesn't make sense.
> >
> > - Even though the last parameter to iconv is defined in bytes, the
> > value of outbytesleft after the conversion is the number of remaining
> > wchar_t's, not the number of remaining bytes. That's contrary to what
> > POSIX defines, see
> > http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> >
> > Is this analysis correct? Is there by any chance a newer version of
> > libiconv2 which does not have these problems?
>
> Well, iconv's behavior is very dependent on detailed characteristics of
> the system on which it was compiled -- e.g. it's very finicky about the
> platform's behavior vis-a-vis character sets.
OK, but that doesn't mean it has to stumble over its own feet when the
current locale's codeset differs from the codeset being converted.
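A minimal sketch of the behavior POSIX asks for, assuming a glibc-like iconv. The ic.c test program isn't shown, so the helper name utf8_to_ucs4le_bytes() is hypothetical and the target charset "UCS-4LE" is only an example. With the source charset named explicitly, the conversion should not care which locale is active, and outbytesleft should shrink by exactly the number of bytes written.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to UCS-4LE
   and returns the number of bytes written to out, or -1 on error.
   UCS-4LE is stateless, so no final flush call is needed here. */
long utf8_to_ucs4le_bytes(const char *utf8, char *out, size_t outsize)
{
    /* The source charset is named explicitly, so the result must not
       depend on the locale selected with setlocale(). */
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;

    char *inbuf = (char *) utf8;
    size_t inbytesleft = strlen(utf8);
    char *outbuf = out;
    size_t outbytesleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    int saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t) -1) {
        errno = saved_errno;   /* e.g. EILSEQ, EINVAL or E2BIG */
        return -1;
    }

    /* POSIX: outbytesleft counts bytes, so it must have shrunk by exactly
       the number of bytes iconv() advanced outbuf by. */
    assert((size_t) (outbuf - out) == outsize - outbytesleft);
    return (long) (outsize - outbytesleft);
}
```

Converting "Liian pitkä sana" (16 characters, 17 UTF-8 bytes) into a 512-byte buffer with the locale set to "C" should then leave outbytesleft at 448, i.e. 64 bytes of UCS-4LE output.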
I found that gencat keeps using the pointer returned by nl_langinfo after
it has called setlocale again, like this:
  setlocale (LC_ALL, "");
  codeset = nl_langinfo (CODESET);
  setlocale (LC_ALL, "C");
  [...]
This is plain wrong. See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html
"Calls to setlocale() with a category corresponding to the category of
item (see <langinfo.h>), or to the category LC_ALL, may overwrite the
array pointed to by the return value."
Newlib does overwrite that array; glibc doesn't. Maybe that's
libiconv's problem as well?
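A minimal sketch of the safe pattern, using a hypothetical helper environment_codeset(): copy the string nl_langinfo() returns before calling setlocale() again, so it no longer matters whether the C library reuses the buffer (newlib) or not (glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the codeset of the locale
   named by the environment (LANG, LC_ALL, ...), then switches back to
   the "C" locale.  The caller must free() the result. */
char *environment_codeset(void)
{
    /* If the environment names an unavailable locale, setlocale() fails
       and the current locale is left unchanged. */
    setlocale(LC_ALL, "");

    const char *cs = nl_langinfo(CODESET);
    assert(cs != NULL);            /* nl_langinfo() never returns NULL */

    /* Copy the string BEFORE the next setlocale() call, which may
       overwrite the buffer cs points into. */
    char *copy = strdup(cs);

    setlocale(LC_ALL, "C");
    return copy;                   /* NULL only if strdup() ran out of memory */
}
```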
I also found that
iconv_close ((iconv_t) -1);
crashes the application with a SEGV. It's clearly the fault of the
application, but it doesn't deserve a SEGV, imho.
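Until the library checks for that value itself, an application can guard against it with a small wrapper. This is a hypothetical sketch, not part of libiconv's API; returning -1 with errno set to EBADF mirrors the optional error POSIX lists for iconv_close() when the descriptor is invalid.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>

/* Hypothetical wrapper: rejects the (iconv_t) -1 value that a failed
   iconv_open() returns instead of handing it to iconv_close(). */
int safe_iconv_close(iconv_t cd)
{
    if (cd == (iconv_t) -1) {
        errno = EBADF;             /* "invalid conversion descriptor" */
        return -1;
    }
    return iconv_close(cd);
}
```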
FYI, I took a cursory look at the libiconv sources and found a couple of
Cygwin-specific code snippets which are rather questionable.
- Why on earth is libiconv on Cygwin using Windows functions in some
places?
- libcharset/lib/relocatable.c
- srclib/progreloc.c
- srclib/relocatable.c
- lib/relocatable.c
- libcharset/lib/relocatable.c and srclib/relocatable.c define their own
DllMain and use Windows functions, as well as the old, deprecated
cygwin_conv_to_posix_path function.
- Using a fixed table instead of the charset.alias file in
libcharset/lib/localcharset.c, function get_charset_aliases(), is
not good, not good at all.
- Same file, function locale_charset() contains outdated Cygwin-specific
code. AFAICS it shouldn't hurt, though, since Cygwin no longer
returns "US-ASCII".
- lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
ei_ucs2internal encoding table. I'm not sure if that's right or
wrong, but it looks worrying. Please note that I defined
__STDC_ISO_10646__ for Cygwin 1.7.8 yesterday. This define had been
missing since 1.7.2.
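For reference, a sketch of what that macro promises: when __STDC_ISO_10646__ is defined, wchar_t values are ISO 10646 code points, so a converter could in principle map them straight to UCS-2 (16-bit wchar_t, as on Cygwin) or UCS-4 (32-bit wchar_t, as on glibc). The helper names are hypothetical, and tying this to libiconv's ei_ucs2internal handling is an assumption, not verified against the libiconv sources.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if the implementation promises that each wchar_t value is the
   ISO 10646 (Unicode) code point of its character, 0 otherwise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: width of wchar_t in bytes.  With the macro defined,
   2 suggests a UCS-2/UTF-16 style mapping (Cygwin), 4 a UCS-4 one (glibc). */
size_t wchar_width(void)
{
    return sizeof(wchar_t);
}

/* With the macro defined, U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS)
   must be stored as the value 0xE4. */
unsigned long wide_a_umlaut(void)
{
    return (unsigned long) L'\u00e4';
}
```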
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple