Mail Archives: cygwin/2009/11/28/05:24:33

Message-ID: <4B10FA45.6060605@tlinx.org>
Date: Sat, 28 Nov 2009 02:24:05 -0800
From: Linda Walsh <cygwin AT tlinx DOT org>
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
References: <4B0B21E0 DOT 3050909 AT tlinx DOT org> <4B0B5433 DOT 8020603 AT byu DOT net> <4B0B610D DOT 6080709 AT tlinx DOT org> <20091124085022 DOT GR29173 AT calimero DOT vinschen DOT de> <20091124090646 DOT GS29173 AT calimero DOT vinschen DOT de>
In-Reply-To: <20091124090646.GS29173@calimero.vinschen.de>

# Eric Blake:...
   [believes he has round trip mapping and that it is more valuable
    than users being able to identify their files in the OS GUI or
    on a linux server]

# Linda W. replies to Eric:
   [points out that the current system already uses valid Unicode values
   (as others have pointed out, not all UTF-16 values are valid Unicode
   values) and that the current system is already breaking round trip
   mapping, so this is not a valid point with the current encoding]

# Corinna Vinschen writes:
> Right, and we use them to map characters from the base plane.  There's
> no area in the entire Unicode plane which would not conflict one way or
> the other.
---

   But there are "probabilities of conflict".   Cygwin wants to allow
the use of the 7 deadly chars by mapping them 'randomly' (to some
"hopefully" unused area).  I say let's map them to their visual
equivalents.  That way, they have a strong chance of being recognized
as correct in Explorer and in linux GUIs, whether they are on SMB
mappings or have been copied across.

As it is, those characters, in Explorer or on a linux server, will
look like random blanks, boxes or other garbage, and the files won't
be identifiable.   I see that as bad.   The display equivalents
would look enough like the ASCII equivalents to allow recognition
of what the filename is meant to be.    Best of all, that displayed
value would be a constant based on the reserved Unicode value
of those characters.  They could always (with a character set that
displays those values) display their ASCII equivalents.
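
To make the difference concrete, here is a rough sketch of the two
mappings. It is not Cygwin's actual code. It assumes the 7 deadly chars
are the ones Windows forbids in filenames other than the path separators,
that the current scheme adds 0xF000 to land in the private use area, and
that "visual equivalents" means the fullwidth forms (ASCII + 0xFEE0).
All function names are made up for illustration.

```python
# Assumption: the "deadly 7" are the filename characters Windows rejects,
# excluding the path separators / and \.
DEADLY_7 = '"*:<>?|'

PUA_OFFSET = 0xF000        # assumed current scheme: private use area
FULLWIDTH_OFFSET = 0xFEE0  # assumed visual scheme: fullwidth forms block


def to_private_use(name):
    """Map each deadly char into the private use area (usually shown as a box)."""
    return "".join(chr(ord(c) + PUA_OFFSET) if c in DEADLY_7 else c for c in name)


def to_visual(name):
    """Map each deadly char to its fullwidth look-alike, e.g. ':' -> U+FF1A."""
    return "".join(chr(ord(c) + FULLWIDTH_OFFSET) if c in DEADLY_7 else c for c in name)


def from_visual(name):
    """Reverse to_visual(); only the seven mapped code points are touched."""
    reverse = {chr(ord(c) + FULLWIDTH_OFFSET): c for c in DEADLY_7}
    return "".join(reverse.get(c, c) for c in name)
```

With this sketch, to_visual("10:30?") gives a name that still reads as
"10:30?" in any font that carries the fullwidth forms, while
to_private_use("10:30?") gives code points with no standard glyph. Either
way, a file whose name already contains a mapped code point would not
survive the round trip.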

> We're using the same mapping as Interix does, so we're at
> least compatible with one other product.  The only alternative is
> not to map ascii chars at all and revert this change.
----
   Interix is a MS product.  MS is not noted for following standards,
but for doing their darndest to do harm to standards.  They chose their
values before Unicode was standardized.

   Any other standards group I know of is going UTF-8.  All of the 
linux distributions I know are going UTF-8.  I'd like to see Cygwin
go that way too.  Using the visual encodings for the deadly 7 will
allow the chars to look correct on Windows (in Explorer and 
Unicode browsers like IE and FF) as well as on Linux.

> Oh and, btw., the conversion between base plane and private use area is
> only done for system objects like filenames.  It's not done for every
> multibyte to widechar conversion within the application itself.  So,
> *if* you have collisions, they will only occur for filenames, which are
> rather unlikely (not impossible, I know) to use these private use
> characters.
----
   Am aware of this -- it's when looking at the files in Explorer or on
linux that I want to see something that looks like a colon when I
put in a colon.  :-).

   If you were strongly concerned about mapping collisions,
      you could:

   1) use a single env var to turn it off or on,   OR 
   2) use the html entities to provide valid mappings,   OR
   3) do either of the above in the registry
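
A minimal sketch of what options 1 and 2 might look like together. The
variable name CYGWIN_CHARMAP and its values are hypothetical (nothing like
it exists in Cygwin as far as I know), the fullwidth offset is my
assumption for "visual", and numeric entities stand in for the HTML ones.

```python
import os

# Assumption: the seven characters Windows forbids in filenames,
# excluding the path separators.
DEADLY_7 = '"*:<>?|'


def map_name(name, environ=os.environ):
    """Pick a mapping for the deadly chars from a (hypothetical) env var."""
    mode = environ.get("CYGWIN_CHARMAP", "visual")  # hypothetical variable
    if mode == "off":
        # No remapping at all; the OS would reject such a name itself.
        return name
    if mode == "entity":
        # Numeric HTML entity, e.g. ':' -> '&#58;' (all legal on Windows).
        return "".join(f"&#{ord(c)};" if c in DEADLY_7 else c for c in name)
    # Default: fullwidth look-alikes (ASCII + 0xFEE0).
    return "".join(chr(ord(c) + 0xFEE0) if c in DEADLY_7 else c for c in name)
```

For example, map_name("a:b", {"CYGWIN_CHARMAP": "entity"}) yields
"a&#58;b", which is ugly but readable everywhere and cannot collide with
any other Unicode value.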

   But barring any other changes, I'd really (like pretty please!)
like to see them mapped to their reserved-visual, but semantically
impotent, equivalents.  After all, that's one reason those characters
are there! :-)

linda




