Mail Archives: cygwin/2015/04/01/12:10:52

Date: Wed, 1 Apr 2015 18:10:29 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Message-ID: <20150401161029.GB13285@calimero.vinschen.de>

On Apr  1 15:34, Corinna Vinschen wrote:
> Hi Stuart,
>
> On Mar 30 13:04, Corinna Vinschen wrote:
> > On Mar 25 14:34, Kyzer wrote:
> > > Hello,
> > >
> > > I've found that if you use cygwin to create a file with badly-encoded
> > > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > > subsequently accept.
> > >
> > > * create a file using filename with hex bytes F4 8F BF BF
> > > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > > * attempting to open or unlink the filename F4 8F BF BF succeeds
> >
> > Thanks for the testcase.  I'll have a look later this week (I hope).
>
> Wow.  Just wow.  You found a long-standing bug in the wctomb conversion
> from UTF-16 to UTF-8.
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
>
> While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
> worked fine, the conversion back to UTF-8 has a subtle bug.  There's
> a test for a lone high surrogate in the underlying conversion
> function.  This tests the next UTF-16 value like this:
>
>   if (wchar < 0xdc00 || wchar >= 0xdfff)
>     /* Handle lone high surrogate */
>
> Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  This
> bug is only a bit over 5 years old...
>
> Fixed in the git repo.  I'll regenerate today's fool..., erm,
> today's developer snapshot on https://cygwin.com/snapshots/ later today.
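
For readers who want to see the off-by-one in isolation, here is a minimal sketch in C. It is not the actual Cygwin/newlib conversion code: the function names (is_lone_high_buggy, is_lone_high_fixed, combine_surrogates, utf8_encode4, check_examples) are hypothetical and exist only for illustration. It assumes the check is applied to the UTF-16 value that follows a high surrogate, as described above, and shows that 0xdfff is wrongly rejected as a low surrogate by the old comparison but accepted by the corrected one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old comparison: treats 0xdfff as "not a low surrogate". */
static bool is_lone_high_buggy(uint16_t next)
{
  return next < 0xdc00 || next >= 0xdfff;
}

/* Corrected comparison: 0xdc00..0xdfff inclusive are low surrogates. */
static bool is_lone_high_fixed(uint16_t next)
{
  return next < 0xdc00 || next > 0xdfff;
}

/* Combine a valid high/low surrogate pair into one code point. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
  return 0x10000u + (((uint32_t)(hi - 0xd800) << 10)
                     | (uint32_t)(lo - 0xdc00));
}

/* Encode a code point in 0x10000..0x10ffff as four UTF-8 bytes. */
static void utf8_encode4(uint32_t cp, unsigned char out[4])
{
  out[0] = (unsigned char)(0xf0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
  out[3] = (unsigned char)(0x80 | (cp & 0x3f));
}

/* Walk through the dbff dfff case from the report. */
void check_examples(void)
{
  unsigned char buf[4];

  /* The old test misclassifies the low surrogate 0xdfff... */
  assert(is_lone_high_buggy(0xdfff));
  /* ...while the corrected test accepts it as part of a pair. */
  assert(!is_lone_high_fixed(0xdfff));
  /* Both tests agree on every other low surrogate, e.g. 0xdffe. */
  assert(!is_lone_high_buggy(0xdffe) && !is_lone_high_fixed(0xdffe));

  /* With the pair accepted, dbff dfff round-trips to f4 8f bf bf. */
  uint32_t cp = combine_surrogates(0xdbff, 0xdfff);
  assert(cp == 0x10ffff);
  utf8_encode4(cp, buf);
  assert(buf[0] == 0xf4 && buf[1] == 0x8f && buf[2] == 0xbf && buf[3] == 0xbf);
}
```

Calling check_examples() from a test program exercises all of the cases. Only pairs whose low surrogate is exactly 0xdfff take the wrong branch, which may be why the problem went unnoticed for so long.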

Snapshot is up.  Please give it a try.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

