Date: Wed, 1 Apr 2015 15:34:01 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Message-ID: <20150401133401.GV13285@calimero.vinschen.de>
In-Reply-To: <20150330110446.GK29875@calimero.vinschen.de>

Hi Stuart,

On Mar 30 13:04, Corinna Vinschen wrote:
> On Mar 25 14:34, Kyzer wrote:
> > Hello,
> >
> > I've found that if you use cygwin to create a file with badly-encoded
> > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > subsequently accept.
> >
> > * create a file using filename with hex bytes F4 8F BF BF
> > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > * attempting to open or unlink the filename F4 8F BF BF succeeds
>
> Thanks for the testcase.  I'll have a look later this week (I hope).

Wow.  Just wow.  You found a long-standing bug in the wctomb conversion
from UTF-16 to UTF-8.

As you probably know, Unicode values beyond the base plane (that is,
everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation) are
represented as so-called surrogate pairs in UTF-16: two UTF-16 values
in the 0xd800 - 0xdfff range.

While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff worked
fine, the conversion back to UTF-8 has a subtle bug.  There's a test for
a lone high surrogate in the underlying conversion function.  It checks
the UTF-16 value following the high surrogate like this:

  if (wchar < 0xdc00 || wchar >= 0xdfff)
    /* Handle lone high surrogate */

Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  As written,
a low surrogate of exactly 0xdfff is rejected, so the pair dbff dfff is
treated as a lone high surrogate.  This bug is only a bit over 5 years
old...

Fixed in the git repo.  I'll regenerate today's fool..., erm, today's
developer snapshot on https://cygwin.com/snapshots/ later today.
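
To make the off-by-one concrete, here is a minimal sketch of the range check and of the recombination of a surrogate pair into UTF-8. This is not the actual Cygwin source: the helper names is_low_surrogate_buggy, is_low_surrogate_fixed and pair_to_utf8 are made up for illustration, and only the two comparison expressions are taken from the explanation above.

```c
#include <stddef.h>
#include <stdint.h>

/* Range test as quoted above: ">= 0xdfff" wrongly excludes 0xdfff itself. */
static int is_low_surrogate_buggy(uint16_t w)
{
    return !(w < 0xdc00 || w >= 0xdfff);
}

/* Corrected test: low surrogates are 0xdc00..0xdfff inclusive. */
static int is_low_surrogate_fixed(uint16_t w)
{
    return !(w < 0xdc00 || w > 0xdfff);
}

/*
 * Hypothetical helper: combine a UTF-16 surrogate pair into a code point
 * and encode it as 4-byte UTF-8.  The is_low callback selects which range
 * test to use.  Returns 4 on success, or 0 if the pair is rejected (the
 * caller would then have to treat hi as a lone high surrogate).
 */
static size_t pair_to_utf8(uint16_t hi, uint16_t lo,
                           int (*is_low)(uint16_t), unsigned char out[4])
{
    uint32_t cp;

    if (hi < 0xd800 || hi > 0xdbff || !is_low(lo))
        return 0;

    /* 10 bits from each surrogate, offset by 0x10000. */
    cp = 0x10000 + ((uint32_t)(hi - 0xd800) << 10) + (uint32_t)(lo - 0xdc00);

    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}
```

With the fixed test, the pair dbff dfff encodes back to f4 8f bf bf.  With the buggy test, the same pair is rejected, while dbff dffe still passes, so the bug only affects pairs whose low surrogate is exactly 0xdfff.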

Thanks, especially for the simple testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat