Mail Archives: cygwin/2009/11/24/11:44:12

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Message-ID: <4B0C0D4D.7050303@towo.net>
Date: Tue, 24 Nov 2009 17:43:57 +0100
From: Thomas Wolff <towo AT towo DOT net>
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
References: <4B0B21E0 DOT 3050909 AT tlinx DOT org> <4B0B5433 DOT 8020603 AT byu DOT net> <4B0B610D DOT 6080709 AT tlinx DOT org> <20091124085022 DOT GR29173 AT calimero DOT vinschen DOT de>
In-Reply-To: <20091124085022.GR29173@calimero.vinschen.de>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Corinna Vinschen wrote:
> On Nov 23 20:29, Linda Walsh wrote:
>   
>> Eric Blake wrote:
>>     
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> According to Linda Walsh on 11/23/2009 4:59 PM:
>>>       
>>>> Instead of using random characters out of the 'random free area' --
>>>> which could display as anything if you aren't in cygwin, depending
>>>> on what charset you have loaded,  why not use 'dedicated' unicode
>>>> characters that map to the signs for those characters?  They aren't
>>>> exactly equivalent, as they include some built-in display spacing,
>>>> BUT, they would display a colon as a colon, "*" as an asterisk, etc.
>>>>         
>>> But then, how would you distinguish between the valid UTF-16 replacement
>>> used to represent an invalid character, and a valid UTF-16 character
>>> representing itself?  I'm sorry, but the value of a 1-to-1 round trip
>>> mapping outweighs the convenience of displaying a glyph that looks the
>>> same but causes ambiguous round trip conversions.
>>>       
>> ----
>>
>> 	You've already broken 1-to-1 round trip compatibility by NOT
>> using an **INVALID** UTF-16 character.  You are using "the 0xf000-0xf0ff
>>     
>
> There is no invalid UTF-16.  There could be invalid UTF-32, but that's
> not used by Windows.
>   
Isolated surrogates are invalid UTF-16. Low surrogates could be used for
this purpose; I don't know whether that was already discussed as an alternative.
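
To illustrate the point about isolated surrogates, here is a minimal Python sketch of a well-formedness check over 16-bit code units. The function name is_valid_utf16 is made up for this example and is not part of Cygwin or any library; it only shows that a lone low surrogate is rejected while a Private Use Area unit such as 0xF03A is accepted.

```python
def is_valid_utf16(units):
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration only.
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is isolated.
            return False
        else:
            i += 1
    return True
```

A filename containing an isolated low surrogate would therefore never collide with well-formed text, whereas a Private Use Area unit is indistinguishable from ordinary valid data.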
>> range.  This range is part of the UNICODE block 95, "Private Use Area".
>> These are *valid* unicode characters -- they are just NOT reserved for
>> a particular application.  This means they will be displayed randomly
>> and CAN be used by other applications
>>     
>
> Right, and we use them to map characters from the base plane.  There's
> no area in the entire Unicode plane which would not conflict one way or
> the other.  We're using the same mapping as Interix does, so we're at
> least compatible with one other product.
Which is a convincing argument since this choice is kind of native...
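
For readers who want to see the conflict concretely, here is a rough Python sketch of an offset mapping into the 0xf000-0xf0ff range. The set of remapped characters and the names to_windows and from_windows are assumptions made for this example; the actual Cygwin code may treat more or fewer characters specially.

```python
PUA_BASE = 0xF000

# Assumed set of characters that are not allowed in DOS/Windows filenames.
SPECIAL = set('"*:<>?|')


def to_windows(name):
    """Shift each special character into the Private Use Area."""
    return "".join(
        chr(PUA_BASE + ord(c)) if c in SPECIAL else c for c in name
    )


def from_windows(name):
    """Shift Private Use Area characters back if they map to a special one."""
    out = []
    for c in name:
        code = ord(c)
        if PUA_BASE <= code <= PUA_BASE + 0xFF and chr(code - PUA_BASE) in SPECIAL:
            out.append(chr(code - PUA_BASE))
        else:
            out.append(c)
    return "".join(out)
```

With this sketch, from_windows(to_windows("a:b")) gives back "a:b", but a name that another application stored with a genuine U+F03A also comes back as "a:b", which is the kind of conflict discussed in this thread.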

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
