Mail Archives: cygwin/2011/01/25/10:04:50
On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> On Jan 24 22:09, Charles Wilson wrote:
>> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
>>> So, AFAICS, there are two problems:
>>>
>>> - Even though iconv_open has been opened explicitely with "UTF-8" as
>>> input string, the conversion still depends on the current application
>>> codeset. That dsoesn't make sense.
>>>
>>> - Even though the last parameter to iconv is defined in bytes, the
>>> value of outbytesleft after the conversion is the number of remaining
>>> wchar"t's, not the number of remaining bytes. That's contrary to what
>>> POSIX defines, see
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>>
>>> Is this analyzes correct? Is there by any chance a newer version of
>>> libiconv2 which does not have these problems?
>>
>> Well, iconv's behavior is very dependent on detailed characteristics of
>> the system on which it was compiled -- e.g. it's very finicky about the
>> platform's behavior vis character sets.
>
> Ok, but that doesn't mean it has to stumble over its own feet if the
> current locale's codeset is different from the codeset which has to
> be converted.
True, of course. I was just thinking that *maybe* just recompiling
libiconv now that cygwin's i18n stuff has become more stable...might help.
> I found that gencat uses the return value of the nl_langinfo call
> after it called setlocale, like this:
>
> setlocale (LC_ALL, "");
> codeset = nl_langinfo (CODESET);
> setlocale (LC_ALL, "C");
> [...]
>
> This is plain wrong. See
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html
>
> "Calls to setlocale() with a category corresponding to the category of
> item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
> array pointed to by the return value."
>
> That's what happens in newlib, but not in glibc. Maybe that's
> libiconv's problem as well?
Hmm. I'll look for that.
> I also found that
>
> iconv_close ((iconv_t) -1);
>
> crashes the application with a SEGV. It's clearly the fault of the
> application, but it doesn't deserve a SEGV, imho.
Yeah, that's bad.
> FYI, I examined the libiconv sources cursorily, and I found a couple of
> code snippets with Cygwin-specific code which is rather questionable.
>
> - Why on earth is libiconv on Cygwin using Windows functions in some
> places?
>
> - libcharset/lib/relocatable.c
> - srclib/progreloc.c
> - srclib/relocatable.c
> - lib/relocatable.c
whoo boy. That's...a long story. It's all part of Bruno's magic
relocatability machinery. However, on cygwin it should be using unixish
mechanisms (at least for exe's -- looking at /proc/$pid/exe. For
DLLs...I think it needs to keep using the DllMain approach).
> - libcharset/lib/relocatable.c and srclib/relocatable.c define their own
> DllMain and use Windows functions. And the old
> cygwin_conv_to_posix_path function as well.
Well, yes. It's how DLLs determine their installation path, so they can
then automatically deduce the relative path to <whatever>. And since
that requires using a win32 function (GetModuleFileName) it needs to
convert to cygwin format. These days it ought to use the new
functions...I'll prepare a gnulib patch, and from there it will work its
way down into libiconv/gettext. I'm not sure if the gnulib guys want to
preserve compat with 1.5 (e.g. check for cygwin_conv_path() and only use
if present, otherwise use deprecated?) or not.
> - The usage of a fixed table instaed of the charset.alias file in
> libcharset/lib/localcharset.c, function get_charset_aliases() is
> not good, not good at all.
Yeah, you're right. It looks like there's been some bitrot with respect
to some of the "&& !CYGWIN" guards on WIN32. Both libiconv and gettext,
IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
"regular" win32 and for cygwin...which means defined(CYGWIN) guards are
necessary.
> - Same file, function locale_charset() contains old Cygwin-specific
> code which is outdated. AFAICS it shouldn't hurt, though, since
> Cygwin no longer returns "US-ASCII".
>
> - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
> ei_ucs2internal encoding table. I'm not sure if that's right or
> wrong, but it looks worrying.
Well, remember (A) upstream libiconv itself hasn't been updated since
30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our
most recent version (1.13.1-1) was released almost simultaneously (23
Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1
thru 1.7.5.
> Please note that I defined
> __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday. This define is
> missing since 1.7.2.
Hmmm...maybe I should (re)build libiconv against a snapshot?
I don't routinely use extended character sets, and have to rely on the
test suites. They passed, so...I thought good enough. Perhaps not...
--
Chuck
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
- Raw text -