X-Recipient: archive-cygwin AT delorie DOT com X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_LOW X-Spam-Check-By: sourceware.org Message-ID: <4D3EE67D.3000900@cwilson.fastmail.fm> Date: Tue, 25 Jan 2011 10:04:29 -0500 From: Charles Wilson Reply-To: Charles Wilson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 MIME-Version: 1.0 To: cygwin AT cygwin DOT com Subject: Re: Bug in libiconv? References: <20110124154158 DOT GA15279 AT calimero DOT vinschen DOT de> <4D3E3EF6 DOT 7010501 AT cwilson DOT fastmail DOT fm> <20110125111507 DOT GA28470 AT calimero DOT vinschen DOT de> In-Reply-To: <20110125111507.GA28470@calimero.vinschen.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com On 1/25/2011 6:15 AM, Corinna Vinschen wrote: > On Jan 24 22:09, Charles Wilson wrote: >> On 1/24/2011 10:41 AM, Corinna Vinschen wrote: >>> So, AFAICS, there are two problems: >>> >>> - Even though iconv_open has been opened explicitely with "UTF-8" as >>> input string, the conversion still depends on the current application >>> codeset. That dsoesn't make sense. >>> >>> - Even though the last parameter to iconv is defined in bytes, the >>> value of outbytesleft after the conversion is the number of remaining >>> wchar"t's, not the number of remaining bytes. That's contrary to what >>> POSIX defines, see >>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html >>> >>> Is this analyzes correct? Is there by any chance a newer version of >>> libiconv2 which does not have these problems? >> >> Well, iconv's behavior is very dependent on detailed characteristics of >> the system on which it was compiled -- e.g. it's very finicky about the >> platform's behavior vis character sets. > > Ok, but that doesn't mean it has to stumble over its own feet if the > current locale's codeset is different from the codeset which has to > be converted. True, of course. I was just thinking that *maybe* just recompiling libiconv now that cygwin's i18n stuff has become more stable...might help. > I found that gencat uses the return value of the nl_langinfo call > after it called setlocale, like this: > > setlocale (LC_ALL, ""); > codeset = nl_langinfo (CODESET); > setlocale (LC_ALL, "C"); > [...] > > This is plain wrong. See > http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html > > "Calls to setlocale() with a category corresponding to the category of > item (see ), or to the category LC_ALL , may overwrite the > array pointed to by the return value." > > That's what happens in newlib, but not in glibc. Maybe that's > libiconv's problem as well? Hmm. I'll look for that. > I also found that > > iconv_close ((iconv_t) -1); > > crashes the application with a SEGV. It's clearly the fault of the > application, but it doesn't deserve a SEGV, imho. Yeah, that's bad. > FYI, I examined the libiconv sources cursorily, and I found a couple of > code snippets with Cygwin-specific code which is rather questionable. > > - Why on earth is libiconv on Cygwin using Windows functions in some > places? > > - libcharset/lib/relocatable.c > - srclib/progreloc.c > - srclib/relocatable.c > - lib/relocatable.c whoo boy. That's...a long story. It's all part of Bruno's magic relocatability machinery. However, on cygwin it should be using unixish mechanisms (at least for exe's -- looking at /proc/$pid/exe. For DLLs...I think it needs to keep using the DllMain approach). > - libcharset/lib/relocatable.c and srclib/relocatable.c define their own > DllMain and use Windows functions. And the old > cygwin_conv_to_posix_path function as well. Well, yes. It's how DLLs determine their installation path, so they can then automatically deduce the relative path to . And since that requires using a win32 function (GetModuleFileName) it needs to convert to cygwin format. These days it ought to use the new functions...I'll prepare a gnulib patch, and from there it will work its way down into libiconv/gettext. I'm not sure if the gnulib guys want to preserve compat with 1.5 (e.g. check for cygwin_conv_path() and only use if present, otherwise use deprecated?) or not. > - The usage of a fixed table instaed of the charset.alias file in > libcharset/lib/localcharset.c, function get_charset_aliases() is > not good, not good at all. Yeah, you're right. It looks like there's been some bitrot with respect to some of the "&& !CYGWIN" guards on WIN32. Both libiconv and gettext, IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both "regular" win32 and for cygwin...which means defined(CYGWIN) guards are necessary. > - Same file, function locale_charset() contains old Cygwin-specific > code which is outdated. AFAICS it shouldn't hurt, though, since > Cygwin no longer returns "US-ASCII". > > - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the > ei_ucs2internal encoding table. I'm not sure if that's right or > wrong, but it looks worrying. Well, remember (A) upstream libiconv itself hasn't been updated since 30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our most recent version (1.13.1-1) was released almost simultaneously (23 Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1 thru 1.7.5. > Please note that I defined > __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday. This define is > missing since 1.7.2. Hmmm...maybe I should (re)build libiconv against a snapshot? I don't routinely use extended character sets, and have to rely on the test suites. They passed, so...I thought good enough. Perhaps not... -- Chuck -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple