Mail Archives: cygwin/2009/11/24/04:07:07

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Tue, 24 Nov 2009 10:06:46 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
Message-ID: <20091124090646.GS29173@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <4B0B21E0 DOT 3050909 AT tlinx DOT org> <4B0B5433 DOT 8020603 AT byu DOT net> <4B0B610D DOT 6080709 AT tlinx DOT org> <20091124085022 DOT GR29173 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <20091124085022.GR29173@calimero.vinschen.de>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Nov 24 09:50, Corinna Vinschen wrote:
> On Nov 23 20:29, Linda Walsh wrote:
> > Eric Blake wrote:
> > >But then, how would you distinguish between the valid UTF-16 replacement
> > >used to represent an invalid character, and a valid UTF-16 character
> > >representing itself?  I'm sorry, but the value of a 1-to-1 round trip
> > >mapping outweighs the convenience of displaying a glyph that looks the
> > >same but causes ambiguous round trip conversions.
> > ----
> > 
> > 	You've already broken 1-to-1 round trip compatibility by NOT
> > using an **INVALID** UTF-16 character.  You are using "the 0xf000-0xf0ff
> 
> There is no invalid UTF-16.  There could be invalid UTF-32, but that's
> not used by Windows.
> 
> > range.  This range is part of the UNICODE block 95, "Private Use Area".
> > These are *valid* unicode characters -- they are just NOT reserved for
> > a particular application.  This means they will be displayed randomly
> > and CAN be used by other applications
> 
> Right, and we use them to map characters from the base plane.  There's
> no area in the entire Unicode plane which would not conflict one way or
> the other.  We're using the same mapping as Interix does, so we're at
> least compatible with one other product.  The only alternative is
> not to map ASCII chars at all and revert this change.

Oh and, btw, the conversion between the base plane and the private use
area is only done for system objects like filenames.  It's not done for
every multibyte-to-widechar conversion within the application itself.
So, *if* you have collisions, they will only occur in filenames, which
are rather unlikely (not impossible, I know) to use these private use
characters.
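
To make the mapping and the possible collision concrete, here is a
minimal Python sketch.  It is not Cygwin's actual code: the function
names and the exact set of reserved characters are assumptions made for
illustration.  Only the 0xf000-0xf0ff target range comes from this
thread.

```python
# Illustrative sketch only; names and the RESERVED set are assumptions.
PUA_BASE = 0xF000          # start of the private use range discussed above
RESERVED = set('"*:<>?|')  # assumed: ASCII chars Windows rejects in filenames


def to_windows_name(posix_name: str) -> str:
    """Shift each reserved ASCII char into the private use area."""
    return "".join(
        chr(PUA_BASE + ord(c)) if c in RESERVED else c
        for c in posix_name
    )


def from_windows_name(win_name: str) -> str:
    """Shift private use chars that encode a reserved char back to ASCII."""
    out = []
    for c in win_name:
        cp = ord(c)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF and chr(cp - PUA_BASE) in RESERVED:
            out.append(chr(cp - PUA_BASE))
        else:
            out.append(c)
    return "".join(out)


# Round trip for a name created on the POSIX side.
assert from_windows_name(to_windows_name("a:b?")) == "a:b?"

# Collision: a file whose on-disk name already contains U+F03A is read
# back as if it contained ':'.
assert from_windows_name("x\uf03ay") == "x:y"
```

The second assertion is the collision case: it can only arise when a
filename on disk already contains one of these private use characters.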


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple