Date: Tue, 25 Jan 2011 12:15:07 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?
Message-ID: <20110125111507.GA28470@calimero.vinschen.de>
References: <20110124154158 DOT GA15279 AT calimero DOT vinschen DOT de> <4D3E3EF6 DOT 7010501 AT cwilson DOT fastmail DOT fm>
In-Reply-To: <4D3E3EF6.7010501@cwilson.fastmail.fm>

On Jan 24 22:09, Charles Wilson wrote:
> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> > Here's what happens on Cygwin:
> >
> >   $ gcc -g -o ic ic.c -liconv
> >   $ ./ic
> >   iconv: 138
> >   in = , inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138
> >   in = , inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138
> >   in = , inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   in = , inbuf = <>, inbytesleft = 0, outbytesleft = 480
>
> Confirmed.
>
> > So, AFAICS, there are two problems:
> >
> > - Even though iconv_open has been called explicitly with "UTF-8" as
> >   input string, the conversion still depends on the current application
> >   codeset.  That doesn't make sense.
> >
> > - Even though the last parameter to iconv is defined in bytes, the
> >   value of outbytesleft after the conversion is the number of remaining
> >   wchar_t's, not the number of remaining bytes.
> >   That's contrary to what POSIX defines, see
> >   http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> >
> > Is this analysis correct?  Is there by any chance a newer version of
> > libiconv2 which does not have these problems?
>
> Well, iconv's behavior is very dependent on detailed characteristics of
> the system on which it was compiled -- e.g. it's very finicky about the
> platform's behavior vis character sets.

Ok, but that doesn't mean it has to stumble over its own feet if the
current locale's codeset is different from the codeset which has to
be converted.

I found that gencat keeps using the return value of the nl_langinfo call
after it has called setlocale again, like this:

  setlocale (LC_ALL, "");
  codeset = nl_langinfo (CODESET);
  setlocale (LC_ALL, "C");
  [...]

This is plain wrong.  See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html

  "Calls to setlocale() with a category corresponding to the category of
   item (see <langinfo.h>), or to the category LC_ALL, may overwrite the
   array pointed to by the return value."

The array does get overwritten in newlib, but not in glibc.  Maybe that's
libiconv's problem as well?

I also found that

  iconv_close ((iconv_t) -1);

crashes the application with a SEGV.  It's clearly the fault of the
application, but it doesn't deserve a SEGV, imho.

FYI, I examined the libiconv sources cursorily, and I found a couple of
code snippets with Cygwin-specific code which is rather questionable.

- Why on earth is libiconv on Cygwin using Windows functions in some
  places?

  - libcharset/lib/relocatable.c
  - srclib/progreloc.c
  - srclib/relocatable.c
  - lib/relocatable.c

- libcharset/lib/relocatable.c and srclib/relocatable.c define their own
  DllMain and use Windows functions.  And the old cygwin_conv_to_posix_path
  function as well.

- The usage of a fixed table instead of the charset.alias file in
  libcharset/lib/localcharset.c, function get_charset_aliases() is not
  good, not good at all.

- Same file, function locale_charset() contains old Cygwin-specific code
  which is outdated.  AFAICS it shouldn't hurt, though, since Cygwin no
  longer returns "US-ASCII".

- lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
  ei_ucs2internal encoding table.  I'm not sure if that's right or wrong,
  but it looks worrying.

Please note that I defined __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.
This define has been missing since 1.7.2.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
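
P.S. Regarding the nl_langinfo issue above: a portable way around it is
to copy the string before calling setlocale a second time.  The following
is only a minimal sketch of that idea, not the actual gencat or libiconv
code.  The helper name dup_user_codeset is made up for illustration.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of gencat or libiconv): return a
   malloc'ed copy of the codeset name of the user's locale, then switch
   back to the "C" locale.  The caller has to free the result.  Returns
   NULL if memory is exhausted.  */
char *
dup_user_codeset (void)
{
  const char *codeset;
  char *copy;
  size_t len;

  /* Pick up the locale from the environment.  If this fails, the
     previous locale stays in effect and we report its codeset.  */
  setlocale (LC_ALL, "");
  codeset = nl_langinfo (CODESET);

  /* Copy the string *before* the next setlocale call, because POSIX
     allows that call to overwrite the array nl_langinfo returned.  */
  len = strlen (codeset) + 1;
  copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, codeset, len);

  setlocale (LC_ALL, "C");
  return copy;
}
```

A caller would use the returned copy wherever the original code used
the nl_langinfo pointer, and free it when done.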