Date: Tue, 24 Nov 2009 10:06:46 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: cyg1.7 - DOS character remapping: change request.
Message-ID: <20091124090646.GS29173@calimero.vinschen.de>
In-Reply-To: <20091124085022.GR29173@calimero.vinschen.de>

On Nov 24 09:50, Corinna Vinschen wrote:
> On Nov 23 20:29, Linda Walsh wrote:
> > Eric Blake wrote:
> > >But then, how would you distinguish between the valid UTF-16 replacement
> > >used to represent an invalid character, and a valid UTF-16 character
> > >representing itself?  I'm sorry, but the value of a 1-to-1 round trip
> > >mapping outweighs the convenience of displaying a glyph that looks the
> > >same but causes ambiguous round trip conversions.
> > ----
> >
> >     You've already broken 1-to-1 round trip compatibility by NOT
> > using an **INVALID** UTF-16 character.  You are using "the 0xf000-0xf0ff
>
> There is no invalid UTF-16.  There could be invalid UTF-32, but that's
> not used by Windows.
>
> > range.  This range is part of the UNICODE block 95, "Private Use Area".
> > These are *valid* unicode characters -- they are just NOT reserved for
> > a particular application.
> > This means they will be displayed randomly
> > and CAN be used by other applications
>
> Right, and we use them to map characters from the base plane.  There's
> no area in the entire Unicode plane which would not conflict one way or
> the other.  We're using the same mapping as Interix does, so we're at
> least compatible with one other product.  The only alternative is
> not to map ascii chars at all and revert this change.

Oh and, btw., the conversion between base plane and private use area is
only done for system objects like filenames.  It's not done for every
multibyte to widechar conversion within the application itself.  So,
*if* you have collisions, they will only occur for filenames, which are
rather unlikely (not impossible, I know) to use these private use
characters.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
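
For illustration, the filename-only remapping and the collision case discussed in this thread could be sketched roughly as below. This is not Cygwin's actual code. Only the 0xf000-0xf0ff target range comes from the thread; the set of remapped characters (`SPECIAL`) and the function names `to_windows_name` / `to_posix_name` are assumptions made up for the example.

```python
# Sketch only, not Cygwin's implementation.
PUA_BASE = 0xF000  # start of the 0xf000-0xf0ff range named in the thread

# Assumption: ASCII characters that Windows rejects in filenames.
# The exact set that gets remapped is not stated in this message.
SPECIAL = set('"*:<>?|')


def to_windows_name(posix_name: str) -> str:
    """Move special ASCII characters into the private use area."""
    return "".join(
        chr(PUA_BASE + ord(c)) if c in SPECIAL else c
        for c in posix_name
    )


def to_posix_name(windows_name: str) -> str:
    """Map private use code points back to their ASCII characters."""
    out = []
    for c in windows_name:
        offset = ord(c) - PUA_BASE
        if 0 <= offset <= 0xFF and chr(offset) in SPECIAL:
            out.append(chr(offset))
        else:
            out.append(c)
    return "".join(out)


# Normal case: the round trip is lossless.
name = "a:b?c"
assert to_posix_name(to_windows_name(name)) == name

# Collision case: a name that already contains U+F03A on disk comes
# back as ':' and can no longer be told apart from a remapped colon.
clash = "a\uf03ab"
assert to_posix_name(clash) == "a:b"
```

In this sketch the mapping is applied only by the two filename helpers, never to ordinary strings inside the application, which mirrors the point that collisions can only show up for filenames that already use these private use code points.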