In-Reply-To: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb@mail.gmail.com>
References: <416096c60907271456x5e8cb3f7y64433d542ec6cdcb@mail.gmail.com>
Date: Tue, 28 Jul 2009 01:33:43 -0300
Message-ID: <94b5b62d0907272133g5d75858ei2a328d82cd54da11@mail.gmail.com>
Subject: Re: bug in mbrtowc?
From: Pedro Izecksohn <pedro.izecksohn@gmail.com>
To: cygwin@cygwin.com

From the "Linux Programmer’s Manual" (release 3.15 of the Linux man-pages):
"If the n bytes starting at s do not contain a complete multibyte
character, mbrtowc() returns (size_t) -2."
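
To make that return value concrete, here is a minimal sketch of feeding a
sequence to mbrtowc() one byte at a time while checking for (size_t) -2.
The helper name decode_bytewise and its 1 / 0 / -1 result convention are
my own assumptions for illustration and come from neither the man page
nor the quoted program. It also assumes the caller has already selected
a UTF-8 locale with setlocale().

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), to be called by the caller beforehand */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of any library): feed `len` bytes to
 * mbrtowc() one at a time, carrying the shift state in an explicit
 * mbstate_t instead of the function's internal one.
 * Returns  1 once a complete character has been stored in *out,
 *          0 if the input ran out while the character was incomplete,
 *         -1 on an encoding error. */
int decode_bytewise(const char *bytes, size_t len, wchar_t *out)
{
    mbstate_t state;
    size_t i;

    assert(bytes != NULL && out != NULL);
    memset(&state, 0, sizeof state);        /* initial conversion state */

    for (i = 0; i < len; i++) {
        size_t ret = mbrtowc(out, bytes + i, 1, &state);

        if (ret == (size_t)-2)
            continue;       /* incomplete so far: supply the next byte */
        if (ret == (size_t)-1)
            return -1;      /* invalid sequence (errno is set to EILSEQ) */
        return 1;           /* complete character (ret is 0 or 1 here) */
    }
    return 0;
}
```

On an implementation that follows the man page, calling
decode_bytewise("\xe2\x94\x84", 3, &wc) in a UTF-8 locale would be
expected to return 1, with the first two bytes each producing (size_t) -2
along the way.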

On Mon, Jul 27, 2009 at 6:56 PM, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
>
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> #include <wchar.h>
>
> int main(void) {
>   wchar_t wc;
>   size_t ret;
>   mbstate_t s = { 0 };
>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>   printf("%x\n", wc);
>   return 0;
> }
>
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
>
>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
>
> Andy
