Date: Tue, 28 Jul 2009 11:14:13 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: bug in mbrtowc?
Message-ID: <20090728091413.GJ18621@calimero.vinschen.de>
In-Reply-To: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb@mail.gmail.com>
Reply-To: cygwin AT cygwin DOT com

On Jul 27 22:56, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
>
> #include <stdio.h>
> #include <locale.h>
> #include <wchar.h>
>
> int main(void) {
>   wchar_t wc;
>   size_t ret;
>   mbstate_t s = { 0 };
>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>   printf("%x\n", wc);
>   return 0;
> }
>
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
>
>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));

That's a bug in the newlib function __utf8_mbtowc.  I'm really surprised
that this bug has never been reported before, since it has been in the
code for years, probably since it was introduced in 2002.  I'll follow
up on the newlib list.
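For illustration only, here is a minimal sketch of the state a restartable UTF-8 decoder has to carry between calls so that bytes can be fed one at a time. This is not newlib's __utf8_mbtowc code; the names utf8_state and utf8_feed are hypothetical, and the sketch does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; not newlib's internal mbstate_t layout. */
typedef struct {
    uint32_t partial;   /* code point bits collected so far */
    int      remaining; /* continuation bytes still expected */
} utf8_state;

/* Feed a single byte to the decoder.
 * Returns  1 when *out holds a complete code point,
 *          0 when more bytes are needed (compare mbrtowc's (size_t)-2),
 *         -1 on an invalid byte (the state is reset). */
int utf8_feed(utf8_state *st, unsigned char byte, uint32_t *out)
{
    if (st->remaining == 0) {
        if (byte < 0x80) {              /* plain ASCII */
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {        /* lead byte of a 2-byte sequence */
            st->partial = byte & 0x1F;
            st->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) { /* lead byte of a 3-byte sequence */
            st->partial = byte & 0x0F;
            st->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) { /* lead byte of a 4-byte sequence */
            st->partial = byte & 0x07;
            st->remaining = 3;
        } else {
            return -1;                  /* stray continuation or invalid lead */
        }
        return 0;
    }

    /* In the middle of a sequence: only 10xxxxxx bytes are acceptable. */
    if ((byte & 0xC0) != 0x80) {
        st->partial = 0;
        st->remaining = 0;
        return -1;
    }
    st->partial = (st->partial << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return 0;

    *out = st->partial;
    st->partial = 0;
    return 1;
}
```

Starting from a zeroed utf8_state and feeding 0xE2, 0x94, 0x84 in three separate calls gives 0, 0, 1 and leaves 0x2504 in *out. A decoder that drops the partial value between calls would instead see 0x94 as a stray continuation byte, which matches the symptom in the testcase.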

Thanks for the report and especially thanks for the testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple