Date: Tue, 24 Nov 2009 09:50:22 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
Message-ID: <20091124085022.GR29173@calimero.vinschen.de>
In-Reply-To: <4B0B610D.6080709@tlinx.org>

On Nov 23 20:29, Linda Walsh wrote:
> Eric Blake wrote:
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1
> >
> >According to Linda Walsh on 11/23/2009 4:59 PM:
> >>Instead of using random characters out of the 'random free area' --
> >>which could display as anything if you aren't in cygwin, depending
> >>on what charset you have loaded, why not use 'dedicated' unicode
> >>characters that map to the signs for those characters?  They aren't
> >>exactly equivalent, as they include some built-in display spacing,
> >>BUT, they would display a colon as a colon, "*" as a asterisk, etc.
> >
> >But then, how would you distinguish between the valid UTF-16 replacement
> >used to represent an invalid character, and a valid UTF-16 character
> >representing itself?  I'm sorry, but the value of a 1-to-1 round trip
> >mapping outweighs the convenience of displaying a glyph that looks the
> >same but causes ambiguous round trip conversions.
> ----
>
>   You've already broken 1-to-1 round trip compatibility by NOT
> using an **INVALID** UTF-16 character.  You are using "the 0xf000-0xf0ff

There is no invalid UTF-16.  There could be invalid UTF-32, but that's
not used by Windows.

> range.  This range is part of the UNICODE block 95, "Private Use Area".
> These are *valid* unicode characters -- they are just NOT reserved for
> a particular application.  This means they will be displayed randomly
> and CAN be used by other applications

Right, and we use them to map characters from the base plane.  There's
no area in the entire Unicode plane which would not conflict one way or
the other.  We're using the same mapping as Interix does, so we're at
least compatible with one other product.

The only alternative is not to map ascii chars at all and revert this
change.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
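
For readers of the archive, the remapping discussed in this thread can be
sketched roughly as below.  This is only an illustration under stated
assumptions, not Cygwin's actual code: the names PUA_BASE, RESERVED,
to_windows and from_windows are hypothetical, and the exact set of
remapped characters is assumed.  The only detail taken from the thread is
that the targets lie in the 0xf000-0xf0ff private use range.

```python
# Illustrative sketch only; NOT Cygwin's implementation.
# ASSUMPTION: offset 0xF000 plus the ASCII code, as suggested by the
# 0xf000-0xf0ff range mentioned in the thread.
PUA_BASE = 0xF000

# ASSUMPTION: a hypothetical set of ASCII characters that are not
# allowed in Windows filenames and therefore get remapped.
RESERVED = set('"*:<>?|')


def to_windows(name: str) -> str:
    """Map each reserved ASCII character into the private use area."""
    return "".join(
        chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in name
    )


def from_windows(name: str) -> str:
    """Map private use characters back to the ASCII they stand for."""
    out = []
    for c in name:
        code = ord(c)
        if 0xF000 <= code <= 0xF0FF and chr(code - PUA_BASE) in RESERVED:
            out.append(chr(code - PUA_BASE))
        else:
            out.append(c)
    return "".join(out)


if __name__ == "__main__":
    original = "a:b*c?.txt"
    stored = to_windows(original)
    # The colon (0x3A) is stored as U+F03A.
    print([hex(ord(c)) for c in stored])
    print(from_windows(stored) == original)
```

Note that a filename created outside Cygwin which already contains, say,
U+F03A would come back as ':' in this sketch, which is the conflict the
thread is arguing about.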