Mail Archives: cygwin/2009/11/23/23:29:18
Eric Blake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> According to Linda Walsh on 11/23/2009 4:59 PM:
>> Instead of using random characters out of the 'random free area' --
>> which could display as anything if you aren't in cygwin, depending
>> on what charset you have loaded, why not use 'dedicated' unicode
>> characters that map to the signs for those characters? They aren't
>> exactly equivalent, as they include some built-in display spacing,
>> BUT, they would display a colon as a colon, "*" as an asterisk, etc.
>
> But then, how would you distinguish between the valid UTF-16 replacement
> used to represent an invalid character, and a valid UTF-16 character
> representing itself? I'm sorry, but the value of a 1-to-1 round trip
> mapping outweighs the convenience of displaying a glyph that looks the
> same but causes ambiguous round trip conversions.
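
To make the round-trip concern above concrete, here is a minimal sketch. It is not Cygwin's code: the helper names `encode` and `decode` are hypothetical, and it assumes the look-alike substitutes are the fullwidth forms (ASCII code point + 0xFEE0), which neither message states outright.

```python
# Hypothetical sketch of the ambiguity; not Cygwin's implementation.
RESERVED = '"*:<>|?'   # the 7 characters discussed in this thread
OFFSET = 0xFEE0        # assumed: ASCII -> fullwidth look-alike

def encode(name):
    """POSIX-side name -> on-disk name: swap reserved chars for look-alikes."""
    return "".join(chr(ord(c) + OFFSET) if c in RESERVED else c for c in name)

def decode(name):
    """On-disk name -> POSIX-side name: swap look-alikes back to ASCII."""
    lookalikes = {chr(ord(c) + OFFSET): c for c in RESERVED}
    return "".join(lookalikes.get(c, c) for c in name)

# A name that already contains a literal FULLWIDTH COLON (U+FF1A).
original = "a\uff1ab"
round_tripped = decode(encode(original))

print(round_tripped == original)         # False: it comes back as "a:b"
print(encode("a:b") == encode(original)) # True: two names share one on-disk name
```

Under these assumptions, a name that legitimately contains the look-alike cannot be told apart from one where it stands in for the reserved character.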
----
You've already broken 1-to-1 round trip compatibility by NOT
using an **INVALID** UTF-16 character. You are using the 0xf000-0xf0ff
range. This range is part of the UNICODE block 95, "Private Use Area".
These are *valid* unicode characters -- they are just NOT reserved for
a particular application. This means they will be displayed randomly
and CAN be used by other applications (Mathematica uses them for more than
one of its character sets). IF you had used something that was NOT valid
unicode, you'd be safe. But the private use area IS a valid, usable area that
is already in use by other applications. You are 'illusioned' if you think
cygwin can use those characters without conflict. (I hate disillusioning
people...they usually don't like it, likely due to my great skill in the
area of 'tact'(!*sigh*!)).
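
A small sketch of the point about the private use area, for illustration only. It assumes the mapping is simply 0xf000 plus the ASCII value (the message gives only the range), and the names `to_pua` and `from_pua` are hypothetical, not Cygwin functions.

```python
import unicodedata

RESERVED = '"*:<>|?'   # the 7 characters discussed in this thread
PUA_BASE = 0xF000      # assumed: substitute = 0xf000 + ASCII value

def to_pua(name):
    """Replace each reserved character with its private-use substitute."""
    return "".join(chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in name)

def from_pua(name):
    """Map private-use substitutes back to the reserved ASCII characters."""
    back = {chr(PUA_BASE + ord(c)): c for c in RESERVED}
    return "".join(back.get(c, c) for c in name)

# The substitute for ':' is U+F03A, an assigned, valid code point whose
# general category is "Co" (private use), not an invalid one.
print(unicodedata.category("\uf03a"))

# A name written by some other application that uses U+F03A for its own
# purposes is indistinguishable from a substituted colon.
foreign = "x\uf03ay"
print(from_pua(foreign) == "x:y")
print(to_pua(from_pua(foreign)) == foreign)
```

Under these assumptions the mapping round-trips only for names that do not already contain characters from that range.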
This being the case, using characters that *are* reserved
for displaying the characters cygwin needs ("*:<>|?) makes sense. No
one will be using those characters for something other than to display
those 7 characters.
Those are "display forms" of those characters -- used for
displaying those characters when the actual characters can't be used or
aren't usable due to encoding issues.
That pretty much sums up how Cygwin is using them. In order to
not break other applications and standards, I strongly urge you to consider
using the allocated forms for the 'display' versions of the characters
you are using. There should be absolutely no break in compatibility,
since anyone using those in a filename would be trying to get exactly the
effect Cygwin wants -- something that displays as those characters, but
isn't treated as those characters semantically.
This is coming from someone who DOES use those characters, and I know
that if cygwin treated them as standard characters (converting them to their
ASCII equivalents) in programs, it wouldn't break anything, because those
are all generic filename characters.
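
For reference, a sketch of what the allocated 'display' versions could look like. This assumes they are the code points in the Unicode "Halfwidth and Fullwidth Forms" block, which this message does not name explicitly; `display_form` is a hypothetical helper.

```python
import unicodedata

RESERVED = '"*:<>|?'        # the 7 characters discussed in this thread
FULLWIDTH_OFFSET = 0xFEE0   # ASCII 0x21-0x7E -> U+FF01-U+FF5E

def display_form(c):
    """Return the fullwidth counterpart of a printable ASCII character."""
    return chr(ord(c) + FULLWIDTH_OFFSET)

for c in RESERVED:
    fw = display_form(c)
    # Each counterpart has a dedicated name and is not a private-use code point.
    print(f"{c!r} -> U+{ord(fw):04X} {unicodedata.name(fw)}")

# Unicode compatibility normalization (NFKC) folds each one back to ASCII,
# which is the "converting them to their ASCII equivalents" step.
folded = "".join(unicodedata.normalize("NFKC", display_form(c)) for c in RESERVED)
print(folded == RESERVED)
```

Unlike the 0xf000-0xf0ff range, these code points have one assigned meaning each, so a font or application is not free to reuse them for unrelated glyphs.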
Your argument of trying not to break 1:1 roundtrip compatibility is
specious as it's simply broken already, as you are using characters that many
fonts use. I have a few thousand fonts, and a surprising number use that area
for storing alternative glyphs. You are more likely to encounter a conflict
using something that is documented to be usable by anyone for anything, than
if you use characters that are documented to be used exactly for the purpose
cygwin is using them.
-l