Mail Archives: cygwin/2011/01/24/22:10:21

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Message-ID: <4D3E3EF6.7010501@cwilson.fastmail.fm>
Date: Mon, 24 Jan 2011 22:09:42 -0500
From: Charles Wilson <cygwin AT cwilson DOT fastmail DOT fm>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?
References: <20110124154158 DOT GA15279 AT calimero DOT vinschen DOT de>
In-Reply-To: <20110124154158.GA15279@calimero.vinschen.de>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> Here's what happens on Cygwin:
> 
>   $ gcc -g -o ic ic.c -liconv
>   $ ./ic
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

Confirmed.

> So, AFAICS, there are two problems:
> 
>   - Even though iconv_open has been called explicitly with "UTF-8" as
>     the input codeset, the conversion still depends on the current
>     application codeset.  That doesn't make sense.
> 
>   - Even though the last parameter to iconv is defined in bytes, the
>     value of outbytesleft after the conversion is the number of remaining
>     wchar_t's, not the number of remaining bytes.  That's contrary to what
>     POSIX defines, see
>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> 
> Is this analysis correct?  Is there by any chance a newer version of
> libiconv2 which does not have these problems?

Well, iconv's behavior is very dependent on detailed characteristics of
the system on which it was compiled -- e.g. it's very finicky about the
platform's behavior with respect to character sets.

Now, cygwin's libiconv-1.13.1 was built a LONG time ago (2009 Dec 23),
and many things have changed in cygwin itself since then (e.g.
cygwin-1.7.1-1 was current at that time).

Since there has not yet been an updated upstream release of
libiconv, my first step would be to simply rebuild our existing
libiconv-1.13.1 on a platform with current cygwin (1.7.7-1), and try the
test case again.

If that doesn't correct the issue...then I'd try to run your test case
on Linux, but *explicitly* using libiconv on that system, rather than
(as is typically the case on Linux) relying on the underlying glibc
implementation of iconv functionality.  If the test case fails there,
then we can presume that the problem is in the (generic, cross-platform
bits of the) libiconv library itself.  Then, it's debugging time... :-(
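
For comparing platforms, a reduced check along the following lines might
help. It is only a sketch, not the ic.c from the original report: the
helper name check_utf8_to_wchar and its parameters are made up for
illustration, and it assumes the iconv implementation accepts "WCHAR_T"
as a target encoding name (GNU libiconv and glibc both do, as far as I
know). It converts a UTF-8 string under a caller-chosen LC_CTYPE locale
and measures the output consumed twice: once from the change in
outbytesleft and once from how far the output pointer advanced. POSIX
describes both in bytes, so the two numbers should agree, and neither
should change with the locale.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not taken from ic.c): convert UTF-8 text to wchar_t
 * under the given LC_CTYPE locale and report output consumption two ways.
 * Returns 0 on success, otherwise an errno value.
 */
int check_utf8_to_wchar(const char *locale, const char *in,
                        wchar_t *out, size_t out_bytes,
                        size_t *by_count, size_t *by_pointer)
{
    assert(in != NULL && out != NULL && by_count != NULL && by_pointer != NULL);

    /* Select the application codeset; the conversion should not care. */
    if (setlocale(LC_CTYPE, locale) == NULL)
        return EINVAL;

    /* Assumption: "WCHAR_T" is a recognized encoding name here. */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    char *inbuf = (char *)in;          /* iconv() does not modify the input bytes */
    size_t inbytesleft = strlen(in);
    char *outbuf = (char *)out;
    size_t outbytesleft = out_bytes;   /* size of the output buffer in BYTES */
    int err = 0;

    if (iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft) == (size_t)-1)
        err = errno;
    iconv_close(cd);

    /* Bytes consumed according to the counter iconv() updated ... */
    *by_count = out_bytes - outbytesleft;
    /* ... and according to how far iconv() advanced the output pointer. */
    *by_pointer = (size_t)(outbuf - (char *)out);
    return err;
}
```

Calling it once with "C" and once with "" (or a specific UTF-8 locale) on
the same input, and comparing the return value and both counters, would
show whether either of the two reported problems reproduces. Some
libiconv builds declare the second argument of iconv() as const char **,
in which case the compiler may warn about &inbuf.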

--
Chuck
