Mail Archives: cygwin/2009/07/28/05:51:13

In-Reply-To: <20090728091413.GJ18621@calimero.vinschen.de>
References: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb AT mail DOT gmail DOT com> <20090728091413 DOT GJ18621 AT calimero DOT vinschen DOT de>
Date: Tue, 28 Jul 2009 06:50:58 -0300
Message-ID: <94b5b62d0907280250q3321f62ft6cc542367dbc68d2@mail.gmail.com>
Subject: Re: bug in mbrtowc?
From: Pedro Izecksohn <pedro DOT izecksohn AT gmail DOT com>
To: cygwin AT cygwin DOT com

  The bug is in the original poster's code, as &s is not being passed to mbrtowc.

  I'm on Ubuntu. I do not have Cygwin here.

  I should consume some calories before trying to debug anything.

On Tue, Jul 28, 2009 at 6:14 AM, Corinna Vinschen <corinna-cygwin AT cygwin DOT com> wrote:
> On Jul 27 22:56, Andy Koppe wrote:
>> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
>> Here's an example:
>>
>> #include <stdio.h>
>> #include <locale.h>
>> #include <stdlib.h>
>> #include <wchar.h>
>>
>> int main(void) {
>>   wchar_t wc;
>>   size_t ret;
>>   mbstate_t s = { 0 };
>>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>>   printf("%x\n", wc);
>>   return 0;
>> }
>>
>> The sequence E2 94 84 should translate to U+2504. Instead, the second
>> and third calls to mbrtowc report encoding errors. It does work
>> correctly if the three bytes are passed to mbrtowc() in one go:
>>
>>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
>
> That's a bug in the newlib function __utf8_mbtowc.  I'm really surprised
> that this bug has never been reported before, since it has been in the
> code for years, probably since it was introduced in 2002.
>
> I'll follow up on the newlib list.
>
>
> Thanks for the report and especially thanks for the testcase,
> Corinna

