Mail Archives: cygwin/2011/01/26/08:44:16

Message-ID: <4D402507.6030205@cwilson.fastmail.fm>
Date: Wed, 26 Jan 2011 08:43:35 -0500
From: Charles Wilson <cygwin AT cwilson DOT fastmail DOT fm>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?
References: <1309 DOT 192 DOT 168 DOT 6 DOT 58 DOT 1296044105 DOT squirrel AT simlinux> <20110126132613 DOT GN28470 AT calimero DOT vinschen DOT de>
In-Reply-To: <20110126132613.GN28470@calimero.vinschen.de>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 1/26/2011 8:26 AM, Corinna Vinschen wrote:
> On Jan 26 13:15, simrw AT sim-basis DOT de wrote:
>>> Here's what happens on Cygwin:
>>>
>>> $ gcc -g -o ic ic.c -liconv
>>> $ ./ic
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
>>>
>>> So, AFAICS, there are two problems:
>>>
>>>   - Even though iconv_open has been called explicitly with "UTF-8" as
>>>     the input codeset, the conversion still depends on the current
>>>     application codeset.  That doesn't make sense.
>>>
>>>   - Even though the last parameter to iconv is defined in bytes, the
>>>     value of outbytesleft after the conversion is the number of remaining
>>>     wchar_t's, not the number of remaining bytes.  That's contrary to
>>>     what POSIX defines, see
>>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>
>> IMHO, the count is correct.
>> On Windows/Cygwin, wchar_t is 2 bytes, on Linux, 4 bytes.
>> So the buffer is 512 bytes.
>> In the first 3 cases, 10 input bytes were consumed and written as
>> 10 wchar_t's, so (512 - 20) = 492 bytes remain in the buffer.
>> In the last case all 16 characters (17 input bytes, since ä takes two
>> bytes in UTF-8) are consumed, so (512 - 32) = 480 bytes remain.
> 
> Yes, you're right.  Quite obviously I misinterpreted the results without
> realizing that the buffer is smaller under Cygwin.
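
The arithmetic quoted above can be checked with a small C sketch. The
helper name out_bytes_left is hypothetical, and the sketch assumes that
every converted character occupies exactly one wchar_t in the output
buffer (true for the sample string, which has no characters outside the
Basic Multilingual Plane):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes still free in an iconv output buffer of
   buf_bytes bytes after chars_written wide characters, each wchar_size
   bytes long, have been stored in it. */
size_t out_bytes_left(size_t buf_bytes, size_t chars_written,
                      size_t wchar_size)
{
    size_t used = chars_written * wchar_size;

    /* The conversion can never write past the end of the buffer. */
    assert(used <= buf_bytes);
    return buf_bytes - used;
}
```

With the values from the thread, out_bytes_left(512, 10, 2) gives 492
and out_bytes_left(512, 16, 2) gives 480, matching the Cygwin output.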

Sure, but there ARE still bugs in libiconv on Cygwin -- specifically:
 - Even though iconv_open has been called explicitly with "UTF-8" as
   the input codeset, the conversion still depends on the current
   application codeset.  That doesn't make sense.
and
 - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.
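
As a caller-side workaround for the second item, the failure value can
be filtered out before it ever reaches iconv_close. This is only a
sketch: safe_iconv_close is a hypothetical wrapper, not part of
libiconv, and setting errno to EBADF is an assumption modeled on the
error POSIX lists for an invalid conversion descriptor:

```c
#include <errno.h>
#include <iconv.h>

/* Hypothetical wrapper: refuse to hand the iconv_open failure value,
   (iconv_t) -1, to iconv_close. */
int safe_iconv_close(iconv_t cd)
{
    if (cd == (iconv_t) -1) {
        errno = EBADF;      /* assumed error code for a bad descriptor */
        return -1;
    }
    return iconv_close(cd); /* valid descriptor: close it normally */
}
```

This does not fix the library, it only keeps an application from
tripping over the SEGV when an earlier iconv_open call has failed.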

--
Chuck

