Mail Archives: cygwin/2015/04/01/12:02:04

Subject: Re: With bad UTF-8, cygwin can create files it can't read
From: Warren Young <wyml AT etr-usa DOT com>
In-Reply-To: <20150401133401.GV13285@calimero.vinschen.de>
Date: Wed, 1 Apr 2015 10:01:42 -0600
Message-Id: <F7BC8B64-DE90-4F01-9C8F-2BB3511B4EF5@etr-usa.com>
References: <CAOCY71AaRWGEFVcPqLKNEjqWEkELdfLD-KBvxMAQCi0wt2A5ZA AT mail DOT gmail DOT com> <20150330110446 DOT GK29875 AT calimero DOT vinschen DOT de> <20150401133401 DOT GV13285 AT calimero DOT vinschen DOT de>
To: cygwin AT cygwin DOT com

On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin AT cygwin DOT com> wrote:
> 
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
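
To make the quoted point concrete, here is a minimal Python sketch of how a code point above 0xffff maps to a UTF-16 surrogate pair. The function name to_surrogate_pair and the sample code point U+1F600 are illustrative choices, not anything from Cygwin itself.

```python
def to_surrogate_pair(code_point):
    """Split a code point beyond the base plane into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not beyond the base plane")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


cp = 0x1F600                               # illustrative sample code point
high, low = to_surrogate_pair(cp)
utf8_bytes = chr(cp).encode("utf-8")       # 4-byte UTF-8 form of the same value

print(hex(high), hex(low))                 # 0xd83d 0xde00
print(utf8_bytes.hex())                    # f09f9880
```

Both surrogate values fall inside the 0xd800 - 0xdfff range mentioned above, and the UTF-8 form sorts after ef bf bf.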

I ran across a similar oddity in Unicode earlier today.  Does Cygwin handle, or care about, Unicode normalization forms?

  http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk.  A different browser given the “same” string could result in a different series of bytes passed to the Cygwin POSIX layer.
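
A small Python sketch of the concern, assuming a hypothetical file name "café.txt": the composed (NFC) and decomposed (NFD) forms render identically but encode to different UTF-8 byte sequences, so a byte-wise file name comparison would treat them as two different names. This only demonstrates the encoding difference; it says nothing about what Cygwin's open(2) actually does with either form.

```python
import unicodedata

# Hypothetical file name; U+00E9 is the precomposed "e with acute".
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
# NFD splits it into "e" followed by U+0301 COMBINING ACUTE ACCENT.
name_nfd = unicodedata.normalize("NFD", name_nfc)

nfc_bytes = name_nfc.encode("utf-8")
nfd_bytes = name_nfd.encode("utf-8")

print(nfc_bytes.hex())         # 636166c3a92e747874
print(nfd_bytes.hex())         # 63616665cc812e747874
print(nfc_bytes == nfd_bytes)  # False

# Normalizing both inputs to one form before comparing makes them match.
same_after_nfc = unicodedata.normalize("NFC", name_nfd) == name_nfc
print(same_after_nfc)          # True
```

An application that accepts names from several clients could normalize to a single form before touching the file system, though which form is appropriate depends on the platform.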

