Subject: | Re: With bad UTF-8, cygwin can create files it can't read |
From: | Warren Young <wyml AT etr-usa DOT com> |
In-Reply-To: | <20150401133401.GV13285@calimero.vinschen.de> |
Date: | Wed, 1 Apr 2015 10:01:42 -0600 |
Message-Id: | <F7BC8B64-DE90-4F01-9C8F-2BB3511B4EF5@etr-usa.com> |
References: | <CAOCY71AaRWGEFVcPqLKNEjqWEkELdfLD-KBvxMAQCi0wt2A5ZA AT mail DOT gmail DOT com> <20150330110446 DOT GK29875 AT calimero DOT vinschen DOT de> <20150401133401 DOT GV13285 AT calimero DOT vinschen DOT de> |
To: | cygwin AT cygwin DOT com |
On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin AT cygwin DOT com> wrote:
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.

I happened to have run across a similar strangeness in Unicode earlier today. Does Cygwin cope with/care about Unicode normalization forms?

http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk. A different browser given the “same” string could result in a different series of bytes passed to the Cygwin POSIX layer.
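To make the concern concrete, here is a minimal sketch of how one visually identical file name can reach the file system as two different byte sequences. It uses Python's standard unicodedata module only to illustrate the normalization forms; the name "café.txt" is a made-up example, and nothing here says what Cygwin or CreateFile() actually does with either form.

```python
import unicodedata

# Hypothetical file name chosen for illustration; not taken from the thread.
name = "caf\u00e9.txt"

# NFC keeps "é" as one precomposed code point (U+00E9).
# NFD splits it into "e" (U+0065) plus a combining acute accent (U+0301).
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# The two forms render the same but differ in every encoding.
nfc_utf8 = nfc.encode("utf-8")
nfd_utf8 = nfd.encode("utf-8")
nfc_utf16 = nfc.encode("utf-16-le")
nfd_utf16 = nfd.encode("utf-16-le")

print(nfc_utf8)                 # b'caf\xc3\xa9.txt'
print(nfd_utf8)                 # b'cafe\xcc\x81.txt'
print(nfc_utf16 == nfd_utf16)   # False

# Normalizing both sides to one form before comparing makes them equal again.
same_after_nfc = unicodedata.normalize("NFC", nfd) == nfc
print(same_after_nfc)           # True
```

A layer that compares or looks up names byte for byte would treat these as two distinct files unless something normalizes them first, which is the case the web app scenario above is worried about.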