Message-ID: <4B0C0D4D.7050303@towo.net>
Date: Tue, 24 Nov 2009 17:43:57 +0100
From: Thomas Wolff
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
References: <4B0B21E0 DOT 3050909 AT tlinx DOT org> <4B0B5433 DOT 8020603 AT byu DOT net> <4B0B610D DOT 6080709 AT tlinx DOT org> <20091124085022 DOT GR29173 AT calimero DOT vinschen DOT de>
In-Reply-To: <20091124085022.GR29173@calimero.vinschen.de>

Corinna Vinschen wrote:
> On Nov 23 20:29, Linda Walsh wrote:
>
>> Eric Blake wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> According to Linda Walsh on 11/23/2009 4:59 PM:
>>>
>>>> Instead of using random characters out of the 'random free area' --
>>>> which could display as anything if you aren't in cygwin, depending
>>>> on what charset you have loaded, why not use 'dedicated' unicode
>>>> characters that map to the signs for those characters?  They aren't
>>>> exactly equivalent, as they include some built-in display spacing,
>>>> BUT, they would display a colon as a colon, "*" as a asterisk, etc.
>>>>
>>> But then, how would you distinguish between the valid UTF-16 replacement
>>> used to represent an invalid character, and a valid UTF-16 character
>>> representing itself?
>>> I'm sorry, but the value of a 1-to-1 round trip
>>> mapping outweighs the convenience of displaying a glyph that looks the
>>> same but causes ambiguous round trip conversions.
>>>
>> ----
>>
>> You've already broken 1-to-1 round trip compatibility by NOT
>> using an **INVALID** UTF-16 character.  You are using "the 0xf000-0xf0ff
>>
>
> There is no invalid UTF-16.  There could be invalid UTF-32, but that's
> not used by Windows.
>
Isolated surrogates are invalid UTF-16. Low surrogates could be used for
this purpose; I don't know whether that was already discussed as an
alternative.

>> range.  This range is part of the UNICODE block 95, "Private Use Area".
>> These are *valid* unicode characters -- they are just NOT reserved for
>> a particular application.  This means they will be displayed randomly
>> and CAN be used by other applications
>>
>
> Right, and we use them to map characters from the base plane.  There's
> no area in the entire Unicode plane which would not conflict one way or
> the other.  We're using the same mapping as Interix does, so we're at
> least compatible with one other product.
Which is a convincing argument since this choice is kind of native...
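
To make the two options in this thread concrete, here is a rough Python sketch. It is not Cygwin's code. It assumes the remapping simply adds 0xf000 to the character code (the thread only names the 0xf000-0xf0ff range), and the RESERVED set and all function names are made up for illustration.

```python
# Illustrative sketch only; not taken from Cygwin.
# Assumption: a reserved character c is stored as U+F000 + ord(c).
# Assumption: RESERVED is an example set of characters DOS names reject.
RESERVED = '"*:<>?|'
PUA_BASE = 0xF000  # start of the 0xf000-0xf0ff range named in the thread


def to_windows(name: str) -> str:
    """Replace reserved characters with private-use code points."""
    return "".join(chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in name)


def from_windows(name: str) -> str:
    """Map private-use code points back to the reserved characters."""
    out = []
    for c in name:
        low = ord(c) - PUA_BASE
        if 0 <= low <= 0xFF and chr(low) in RESERVED:
            out.append(chr(low))
        else:
            out.append(c)
    return "".join(out)


def is_valid_utf16(s: str) -> bool:
    """True if s can be encoded as well-formed UTF-16."""
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    stored = to_windows("a:b*c")
    print([hex(ord(c)) for c in stored])
    print(from_windows(stored))
    # A private-use code point is valid UTF-16; an isolated low surrogate is not.
    print(is_valid_utf16("\uf03a"), is_valid_utf16("\udc3a"))
```

With this sketch, a name that genuinely contains U+F03A comes back from from_windows() as ":", which is the conflict described above. An isolated low surrogate such as U+DC3A cannot occur in well-formed UTF-16, so it would not collide with a valid character, though whether Windows APIs accept it in file names is not covered here.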