Mail Archives: cygwin/2009/11/23/23:29:18


X-Recipient: archive-cygwin AT delorie DOT com

X-SWARE-Spam-Status: No, hits=-2.1 required=5.0 tests=AWL,BAYES_00,SPF_PASS

X-Spam-Check-By: sourceware.org

Message-ID: <4B0B610D.6080709@tlinx.org>

Date: Mon, 23 Nov 2009 20:29:01 -0800

From: Linda Walsh <cygwin AT tlinx DOT org>

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.22) Gecko/20090605 Lightning/0.9 Thunderbird/2.0.0.22 ThunderBrowse/3.2.6.5 Mnenhy/0.7.6.666

MIME-Version: 1.0

To: cygwin AT cygwin DOT com

Subject: Re: cyg1.7 - DOS character remapping: change request.

References: <4B0B21E0 DOT 3050909 AT tlinx DOT org> <4B0B5433 DOT 8020603 AT byu DOT net>

In-Reply-To: <4B0B5433.8020603@byu.net>

X-Stationery: 0.4.10

X-IsSubscribed: yes

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

List-Id: <cygwin.cygwin.com>

List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>

List-Archive: <http://sourceware.org/ml/cygwin/>

List-Post: <mailto:cygwin AT cygwin DOT com>

List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>

Sender: cygwin-owner AT cygwin DOT com

Mail-Followup-To: cygwin AT cygwin DOT com

Delivered-To: mailing list cygwin AT cygwin DOT com

Eric Blake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> According to Linda Walsh on 11/23/2009 4:59 PM:
>> Instead of using random characters out of the 'random free area' --
>> which could display as anything if you aren't in cygwin, depending
>> on what charset you have loaded,  why not use 'dedicated' unicode
>> characters that map to the signs for those characters?  They aren't
>> exactly equivalent, as they include some built-in display spacing,
>> BUT, they would display a colon as a colon, "*" as a asterisk, etc.
> 
> But then, how would you distinguish between the valid UTF-16 replacement
> used to represent an invalid character, and a valid UTF-16 character
> representing itself?  I'm sorry, but the value of a 1-to-1 round trip
> mapping outweighs the convenience of displaying a glyph that looks the
> same but causes ambiguous round trip conversions.
----

	You've already broken 1-to-1 round trip compatibility by NOT
using an **INVALID** UTF-16 character.  You are using the 0xf000-0xf0ff
range.  This range is part of the UNICODE block 95, "Private Use Area".
These are *valid* unicode characters -- they are just NOT reserved for
a particular application.  This means they will be displayed randomly
and CAN be used by other applications (Mathematica does, for more than one
of its character sets).  IF you had used something that was NOT valid unicode,
you'd be safe.  But the private use area IS a valid, usable area that is
already in use by other applications.  You are 'illusioned' if you think
cygwin can use those characters without conflict.  (I hate disillusioning
people...they usually don't like it, likely due to my great skill in the
area of 'tact'(!*sigh*!)).
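
	To make the point concrete, here is a minimal sketch.  It assumes
the remapping is simply "0xF000 + the ASCII code" (the thread only names
the 0xf000-0xf0ff range, so the exact offset is an assumption), and the
helper names are made up for illustration.  It checks that the resulting
code points are ordinary private-use characters that encode in UTF-16
like any other character.

```python
import unicodedata

# The seven characters discussed in this thread.
DOS_SPECIAL = '"*:<>|?'

# Assumption: each character is remapped to 0xF000 + its ASCII code,
# which lands inside the 0xf000-0xf0ff range mentioned above.
PUA_OFFSET = 0xF000


def to_private_use(ch: str) -> str:
    """Return the assumed private-use stand-in for one special character."""
    return chr(PUA_OFFSET + ord(ch))


def is_private_use(ch: str) -> bool:
    """True if Unicode classifies ch as private use (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


if __name__ == "__main__":
    for ch in DOS_SPECIAL:
        mapped = to_private_use(ch)
        # A valid character encodes without error; an unpaired surrogate would not.
        encoded = mapped.encode("utf-16-le")
        print(f"{ch!r} -> U+{ord(mapped):04X} "
              f"private_use={is_private_use(mapped)} utf16_bytes={len(encoded)}")
```

	Nothing in the encoding marks these code points as belonging to
cygwin; any font or application is free to assign them its own meaning.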

	This being the case, using characters that *are* reserved
for displaying the characters cygwin needs ("*:<>|?) makes sense.  No
one will be using those characters for anything other than displaying
those 7 characters.

	Those are "display forms" of those characters -- used for
displaying those characters when the actual characters can't be used
due to encoding issues.

	That pretty much sums up how Cygwin is using them.  In order to
not break other applications and standards, I strongly urge you to consider
using the allocated forms for the 'display' versions of the characters
you are using.  There should be absolutely no break in compatibility,
since anyone using those in a filename would be trying to get exactly the
effect Cygwin wants -- something that displays as those characters, but
isn't treated as those characters semantically.

	This is coming from someone who DOES use those characters, and I know
that if cygwin treated them as standard characters (converting them to their
ASCII equivalents) in programs, it wouldn't break anything, because those
are all generic filename characters.
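
	A rough sketch of what that could look like follows.  The thread
does not say which code points are meant by "display forms"; this sketch
assumes the Unicode fullwidth forms (ASCII code + 0xFEE0, e.g. U+FF0A
FULLWIDTH ASTERISK), and the function names are hypothetical.

```python
import unicodedata

# The seven characters discussed in this thread.
DOS_SPECIAL = '"*:<>|?'

# Assumption: the "display forms" are the fullwidth forms, which sit at
# a fixed offset of 0xFEE0 above their ASCII counterparts.
FULLWIDTH_OFFSET = 0xFEE0

# Translation tables in both directions, keyed by code point.
TO_DISPLAY = {ord(c): chr(ord(c) + FULLWIDTH_OFFSET) for c in DOS_SPECIAL}
TO_ASCII = {ord(c) + FULLWIDTH_OFFSET: c for c in DOS_SPECIAL}


def to_display_form(name: str) -> str:
    """Replace each special character with its fullwidth look-alike."""
    return name.translate(TO_DISPLAY)


def to_ascii_form(name: str) -> str:
    """Fold fullwidth look-alikes back to their ASCII equivalents."""
    return name.translate(TO_ASCII)


if __name__ == "__main__":
    original = 'what?:"draft*".txt'
    stored = to_display_form(original)
    for ch in stored:
        if ord(ch) in TO_ASCII:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(to_ascii_form(stored) == original)
```

	Note that a name which already contains, say, U+FF1A folds to the
same ASCII colon; that is the ambiguity raised in the quoted reply, and
the argument here is that it is harmless in practice.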

	Your argument of trying not to break 1:1 roundtrip compatibility is
specious: it's simply broken already, because you are using characters that
many fonts use.  I have a few thousand fonts, and a surprising number use that
area for storing alternative glyphs.  You are more likely to encounter a
conflict using something that is documented to be usable by anyone for
anything than if you use characters that are documented to be used for exactly
the purpose cygwin is using them for.

-l

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
