| Date: | Wed, 1 Apr 2015 15:34:01 +0200 |
| From: | Corinna Vinschen <corinna-cygwin AT cygwin DOT com> |
| To: | cygwin AT cygwin DOT com |
| Subject: | Re: With bad UTF-8, cygwin can create files it can't read |
| Message-ID: | <20150401133401.GV13285@calimero.vinschen.de> |
| References: | <CAOCY71AaRWGEFVcPqLKNEjqWEkELdfLD-KBvxMAQCi0wt2A5ZA AT mail DOT gmail DOT com> <20150330110446 DOT GK29875 AT calimero DOT vinschen DOT de> |
| MIME-Version: | 1.0 |
| In-Reply-To: | <20150330110446.GK29875@calimero.vinschen.de> |
| User-Agent: | Mutt/1.5.23 (2014-03-12) |
Hi Stuart,
On Mar 30 13:04, Corinna Vinschen wrote:
> On Mar 25 14:34, Kyzer wrote:
> > Hello,
> >
> > I've found that if you use cygwin to create a file with badly-encoded
> > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > subsequently accept.
> >
> > * create a file using filename with hex bytes F4 8F BF BF
> > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > * attempting to open or unlink the filename F4 8F BF BF succeeds
>
> Thanks for the testcase. I'll have a look later this week (I hope).
Wow. Just wow. You found a long-standing bug in the wctomb conversion
from UTF-16 to UTF-8.
As you probably know, Unicode values beyond the base plane (that is,
everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
are represented as so-called surrogate pairs in UTF-16, two UTF-16
values in the 0xd800 - 0xdfff range.
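To make the surrogate arithmetic concrete, here is a minimal sketch in C. The helper names decode_utf8_4 and to_surrogates are made up for illustration and are not the Cygwin or newlib functions; the decoder assumes a well-formed 4-byte sequence and does no validation.

```c
#include <stdint.h>

/* Decode one 4-byte UTF-8 sequence into a code point.
   Hypothetical helper; assumes well-formed input, no validation. */
static uint32_t decode_utf8_4(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3f) << 12)
         | ((uint32_t)(s[2] & 0x3f) << 6)
         |  (uint32_t)(s[3] & 0x3f);
}

/* Split a code point > 0xffff into a UTF-16 surrogate pair.
   Hypothetical helper; the caller must ensure cp is in 0x10000..0x10ffff. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xd800 | (cp >> 10));    /* upper 10 bits */
    *lo = (uint16_t)(0xdc00 | (cp & 0x3ff));  /* lower 10 bits */
}
```

Feeding the filename bytes f4 8f bf bf through decode_utf8_4 gives 0x10ffff, and to_surrogates splits that into dbff dfff, i.e. the very last value of the low surrogate range.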
While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
worked fine, the conversion back to UTF-8 has a subtle bug.  There's
a test for a lone high surrogate in the underlying conversion
function.  This tests the next UTF-16 value like this:
  if (wchar < 0xdc00 || wchar >= 0xdfff)
    /* Handle lone high surrogate */
Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  This
bug is only a bit over 5 years old...
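For illustration, here are the two variants of the check side by side, plus a sketch of the pair-to-UTF-8 step. This is not the actual patch: the function names are hypothetical, and what a real converter does with a lone surrogate is left to the caller here.

```c
#include <stddef.h>
#include <stdint.h>

/* The check as it was: wrongly treats 0xdfff as "not a low surrogate". */
static int buggy_is_lone_high(uint16_t next)
{
    return next < 0xdc00 || next >= 0xdfff;
}

/* The check with > instead of >=: 0xdc00..0xdfff are all low surrogates. */
static int fixed_is_lone_high(uint16_t next)
{
    return next < 0xdc00 || next > 0xdfff;
}

/* Convert a UTF-16 surrogate pair to 4 UTF-8 bytes (hypothetical helper).
   Returns 4 on success, 0 if hi/lo do not form a valid pair. */
static size_t pair_to_utf8(uint16_t hi, uint16_t lo, unsigned char out[4])
{
    if (hi < 0xd800 || hi > 0xdbff || fixed_is_lone_high(lo))
        return 0;  /* not a pair; handling is up to the caller */

    uint32_t cp = 0x10000
                + (((uint32_t)(hi - 0xd800) << 10) | (uint32_t)(lo - 0xdc00));

    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}
```

With the >= variant, a pair ending in dfff is misclassified as a lone high surrogate; with > it converts back to f4 8f bf bf as expected.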
Fixed in the git repo.  I'll regenerate today's fool..., erm,
today's developer snapshot on https://cygwin.com/snapshots/ later today.
Thanks, especially for the simple testcase,
Corinna
-- 
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat