X-Recipient: archive-cygwin@delorie.com
X-Spam-Check-By: sourceware.org
Date: Tue, 28 Jul 2009 11:14:13 +0200
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: bug in mbrtowc?
Message-ID: <20090728091413.GJ18621@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
References: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb@mail.gmail.com>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie.com@cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-help@cygwin.com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner@cygwin.com
Delivered-To: mailing list cygwin@cygwin.com

On Jul 27 22:56, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
> 
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> #include <wchar.h>
> 
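> int main(void) {
>   wchar_t wc = 0;
>   const char *loc = setlocale(LC_CTYPE, "en_GB.UTF-8");
>   puts(loc ? loc : "setlocale failed");
>   /* A null mbstate_t pointer makes mbrtowc use its internal state. */
>   /* mbrtowc returns size_t; cast so that -1 and -2 print as such. */
>   printf("%i\n", (int) mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", (int) mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", (int) mbrtowc(&wc, "\x84", 1, 0));
>   printf("%x\n", (unsigned) wc);
>   return 0;
> }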
> 
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
> 
>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));

That's a bug in the newlib function __utf8_mbtowc.  I'm really surprised
that it has never been reported before, since it has been in the code
for years, probably ever since the function was introduced in 2002.
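
For illustration only, here is a minimal sketch of what a restartable
UTF-8 decoder has to do when it is fed one byte per call.  This is not
the newlib code: the names utf8_state and utf8_feed are made up for this
example, the return values merely mirror mbrtowc's conventions, and
overlong or surrogate encodings are deliberately not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, playing the role that mbstate_t plays
   for mbrtowc.  Zero-initialised means "no sequence in progress". */
typedef struct {
    uint32_t partial;    /* code point bits collected so far */
    int      remaining;  /* continuation bytes still expected */
} utf8_state;

/* Feed a single byte to the decoder.
   Returns  1 when *out holds a complete code point,
           -2 when more bytes are needed (like mbrtowc's (size_t)-2),
           -1 on an invalid byte (like mbrtowc's (size_t)-1).
   Assumption: overlong forms and surrogates are not rejected here. */
static int utf8_feed(utf8_state *st, unsigned char byte, uint32_t *out)
{
    assert(st != NULL && out != NULL);

    if (st->remaining == 0) {
        /* Start of a new character: classify the lead byte. */
        if (byte < 0x80) {
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {
            st->partial = byte & 0x1F;
            st->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) {
            st->partial = byte & 0x0F;
            st->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) {
            st->partial = byte & 0x07;
            st->remaining = 3;
        } else {
            return -1;  /* stray continuation byte or invalid lead byte */
        }
        return -2;
    }

    /* Inside a sequence: only continuation bytes 10xxxxxx are valid. */
    if ((byte & 0xC0) != 0x80) {
        st->remaining = 0;
        return -1;
    }
    st->partial = (st->partial << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return -2;

    *out = st->partial;
    return 1;
}
```

With a zero-initialised utf8_state, feeding E2, 94 and 84 one at a time
yields -2, -2 and 1, and the decoded code point is 0x2504.  Feeding 94 to
a fresh state is rejected with -1, which is the kind of error one would
expect to see if the state from the previous call were not carried over;
whether that is what actually goes wrong in __utf8_mbtowc is not
established by this sketch.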

I'll follow up on the newlib list.


Thanks for the report and especially thanks for the testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
