Mail Archives: cygwin/2011/01/25/10:04:50

Message-ID: <4D3EE67D.3000900@cwilson.fastmail.fm>
Date: Tue, 25 Jan 2011 10:04:29 -0500
From: Charles Wilson <cygwin AT cwilson DOT fastmail DOT fm>
Reply-To: Charles Wilson <cygwin AT cwilson DOT fastmail DOT fm>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?
References: <20110124154158 DOT GA15279 AT calimero DOT vinschen DOT de> <4D3E3EF6 DOT 7010501 AT cwilson DOT fastmail DOT fm> <20110125111507 DOT GA28470 AT calimero DOT vinschen DOT de>
In-Reply-To: <20110125111507.GA28470@calimero.vinschen.de>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> On Jan 24 22:09, Charles Wilson wrote:
>> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
>>> So, AFAICS, there are two problems:
>>>
>>>   - Even though iconv_open has been opened explicitly with "UTF-8" as
>>>     input string, the conversion still depends on the current application
>>>     codeset.  That doesn't make sense.
>>>
>>>   - Even though the last parameter to iconv is defined in bytes, the
>>>     value of outbytesleft after the conversion is the number of remaining
>>>     wchar_t's, not the number of remaining bytes.  That's contrary to what
>>>     POSIX defines, see
>>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>>
>>> Is this analysis correct?  Is there by any chance a newer version of
>>> libiconv2 which does not have these problems?
>>
>> Well, iconv's behavior is very dependent on detailed characteristics of
>> the system on which it was compiled -- e.g. it's very finicky about the
>> platform's behavior vis-a-vis character sets.
> 
> Ok, but that doesn't mean it has to stumble over its own feet if the
> current locale's codeset is different from the codeset which has to
> be converted.

True, of course. I was just thinking that *maybe* just recompiling
libiconv now that cygwin's i18n stuff has become more stable...might help.
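
For reference, the byte-counting contract Corinna cites from POSIX can be
sketched as below. This is a minimal sketch, not code from libiconv or
gencat: the helper name utf8_to_wchar() is hypothetical, "WCHAR_T" is
assumed to be an encoding name the local iconv accepts (GNU libiconv and
glibc both do), and the second argument of iconv() is assumed to be
char ** (some older headers declare it const char **). The point is that
outbytesleft goes in and comes out in bytes, so the wide-character count
has to be derived by dividing by sizeof (wchar_t).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into `out`,
   which has room for `out_count` wide characters.  Returns the number of
   wide characters written, or (size_t) -1 on any conversion error.  */
size_t
utf8_to_wchar (const char *in, wchar_t *out, size_t out_count)
{
  assert (in != NULL && out != NULL);

  iconv_t cd = iconv_open ("WCHAR_T", "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *inptr = (char *) in;
  size_t inleft = strlen (in);
  char *outptr = (char *) out;
  size_t total = out_count * sizeof (wchar_t);  /* capacity in BYTES */
  size_t outleft = total;

  size_t rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return (size_t) -1;

  /* outleft is the number of bytes still free, so convert the number
     of bytes consumed into a count of wide characters.  */
  return (total - outleft) / sizeof (wchar_t);
}
```

An implementation that left a wchar_t count in outbytesleft would make
this helper return a wrong length, which is the second symptom
described above.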

> I found that gencat uses the return value of the nl_langinfo call
> after it called setlocale, like this:
> 
>   setlocale (LC_ALL, "");
>   codeset = nl_langinfo (CODESET);
>   setlocale (LC_ALL, "C");
>   [...]
> 
> This is plain wrong.  See
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html
> 
>   "Calls to setlocale() with a category corresponding to the category of
>    item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
>    array pointed to by the return value."
> 
> That's what happens in newlib, but not in glibc.  Maybe that's
> libiconv's problem as well?

Hmm. I'll look for that.
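
A safer form of the gencat sequence quoted above copies the codeset name
before the locale is switched back. This is only a sketch under stated
assumptions: get_env_codeset() is a hypothetical name, not a function
from gencat or libiconv, and the caller is assumed to supply the buffer.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the environment locale's codeset name in
   `buf`, then reset the locale to "C".  Returns 0 on success, -1 if
   `buf` is too small.  */
int
get_env_codeset (char *buf, size_t buflen)
{
  assert (buf != NULL);

  setlocale (LC_ALL, "");
  const char *cs = nl_langinfo (CODESET);
  int rc = -1;

  /* Copy NOW: the next setlocale call may overwrite the array that
     nl_langinfo returned.  */
  if (strlen (cs) < buflen)
    {
      strcpy (buf, cs);
      rc = 0;
    }

  setlocale (LC_ALL, "C");
  return rc;
}
```

With the copy taken before the second setlocale call, the result no
longer depends on whether the C library reuses the returned array.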

> I also found that
> 
>   iconv_close ((iconv_t) -1);
> 
> crashes the application with a SEGV.  It's clearly the fault of the
> application, but it doesn't deserve a SEGV, imho.

Yeah, that's bad.
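
Until the library itself guards against it, a caller can avoid the crash
with a thin wrapper. A minimal sketch: safe_iconv_close() is a
hypothetical name, and returning -1 with errno set to EBADF is an
assumption modeled on the error POSIX permits for iconv_close, not
libiconv's actual behavior.

```c
#include <errno.h>
#include <iconv.h>

/* Hypothetical wrapper: refuse to pass the iconv_open failure value
   on to iconv_close.  */
int
safe_iconv_close (iconv_t cd)
{
  if (cd == (iconv_t) -1)
    {
      errno = EBADF;   /* assumed error code for an invalid descriptor */
      return -1;
    }
  return iconv_close (cd);
}
```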

> FYI, I examined the libiconv sources cursorily, and I found a couple of
> code snippets with Cygwin-specific code which is rather questionable.
> 
> - Why on earth is libiconv on Cygwin using Windows functions in some
>   places?
> 
>   - libcharset/lib/relocatable.c
>   - srclib/progreloc.c
>   - srclib/relocatable.c
>   - lib/relocatable.c

Whoo boy. That's...a long story. It's all part of Bruno's magic
relocatability machinery.  However, on cygwin it should be using unixish
mechanisms, at least for exe's (looking at /proc/$pid/exe).  For
DLLs...I think it needs to keep using the DllMain approach.
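
The unixish mechanism for executables amounts to reading a symlink. A
minimal sketch, not the gnulib relocatable code: get_exe_path() is a
hypothetical name, and /proc/self/exe is assumed to be available as the
current-process equivalent of /proc/$pid/exe.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the running executable's path into `buf`
   as a NUL-terminated string.  Returns its length, or -1 on error.
   readlink truncates silently, so a result of buflen - 1 may mean the
   buffer was too small.  */
ssize_t
get_exe_path (char *buf, size_t buflen)
{
  assert (buf != NULL);
  if (buflen == 0)
    return -1;

  ssize_t n = readlink ("/proc/self/exe", buf, buflen - 1);
  if (n < 0)
    return -1;

  buf[n] = '\0';   /* readlink does not NUL-terminate */
  return n;
}
```

This only covers the exe case; a DLL cannot find its own file name this
way, which is why the DllMain approach remains for DLLs.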

> - libcharset/lib/relocatable.c and srclib/relocatable.c define their own
>   DllMain and use Windows functions.  And the old
>   cygwin_conv_to_posix_path function as well.

Well, yes.  It's how DLLs determine their installation path, so they can
then automatically deduce the relative path to <whatever>.  And since
that requires using a win32 function (GetModuleFileName) it needs to
convert to cygwin format.  These days it ought to use the new
functions...I'll prepare a gnulib patch, and from there it will work its
way down into libiconv/gettext.  I'm not sure if the gnulib guys want to
preserve compat with cygwin 1.5 (e.g. check for cygwin_conv_path() and
use it only if present, otherwise fall back to the deprecated
cygwin_conv_to_posix_path()?) or not.

> - The usage of a fixed table instead of the charset.alias file in
>   libcharset/lib/localcharset.c, function get_charset_aliases() is
>   not good, not good at all.

Yeah, you're right.  It looks like there's been some bitrot with respect
to some of the "&& !CYGWIN" guards on WIN32.  Both libiconv and gettext,
IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
"regular" win32 and for cygwin...which means defined(CYGWIN) guards are
necessary.
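
The kind of guard meant here can be sketched as follows. This is an
illustration only: NATIVE_WINDOWS and platform_kind() are hypothetical
names, and __CYGWIN__ is assumed to be the macro the "CYGWIN" shorthand
above refers to. The idea is that Windows-only code paths are taken only
when a WIN32 macro is defined and the Cygwin macro is not.

```c
#include <stddef.h>

/* Hypothetical guard: true only for native Windows builds, even if the
   build system force-defines a WIN32 macro on Cygwin.  */
#if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
# define NATIVE_WINDOWS 1
#else
# define NATIVE_WINDOWS 0
#endif

/* Report which branch the guard selected at compile time.  */
const char *
platform_kind (void)
{
#if NATIVE_WINDOWS
  return "native-windows";   /* fixed alias table, Win32 calls */
#else
  return "posix-like";       /* Cygwin and Unix: charset.alias file */
#endif
}
```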

> - Same file, function locale_charset() contains old Cygwin-specific
>   code which is outdated.  AFAICS it shouldn't hurt, though, since
>   Cygwin no longer returns "US-ASCII".
> 
> - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
>   ei_ucs2internal encoding table.  I'm not sure if that's right or
>   wrong, but it looks worrying.

Well, remember (A) upstream libiconv itself hasn't been updated since
30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our
most recent version (1.13.1-1) was released almost simultaneously (23
Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1
thru 1.7.5.

> Please note that I defined
> __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define has been
> missing since 1.7.2.

Hmmm...maybe I should (re)build libiconv against a snapshot?

I don't routinely use extended character sets, and have to rely on the
test suites.  They passed, so...I thought good enough.  Perhaps not...

--
Chuck

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
