Date: Tue, 23 Jun 2009 17:04:33 +0200 (CEST)
Message-Id: <200906231504.n5NF4Xiv027571@mail.bln1.bf.nsn-intra.net>
From: Thomas Wolff
To: cygwin AT cygwin DOT com
Subject: Re: default codepage
References: <200906221448 DOT n5MEmF1r018726 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <200906231345 DOT n5NDj9i1026763 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <20090623140643 DOT GB3024 AT calimero DOT vinschen DOT de>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

> > > > On Jun 22 16:48, Thomas Wolff wrote:
> > > > > Since the latest locale-related changes, the default codepage after
> > > > > starting cygwin _without_ explicit setting (of a locale variable)
> > > > > seems to have changed from CP1252 ("Windows ANSI") to ISO 8859-1 ("Latin 1").
> > > > > Was this change on purpose?
> > > > > ...
> I tested this myself and now I understand what you mean. The console
> seems to use ISO-8859-1, but actually it doesn't. What happens is this:
> The console I/O functions are using UTF-16 under the hood, so each
> incoming character is converted to Unicode. The ASCII->Unicode
> conversion treats all incoming bytes literally. Since the Unicode
> values from 0x80 to 0xff are derived from the ISO-8859-1 table, you
> actually see ISO-8859-1 by default on the console.

Understood; this means the effective codepage of the terminal is
ISO-8859-1 (by whatever mechanism this is achieved). wcwidth etc. may
behave differently in this configuration (I haven't tested this), which
might raise additional problems.

> So here's the question: Why is that a problem? It's just the default
> output. I *can't* use CP1252 as default, because it's only a valid
> default on western language versions of Windows. Rather I would have to
> use the default ANSI codepage, whatever that is on the machine.

OK, if that's how it was in 1.5, it would be fine.

> ISO-8859-1 OTOH is the least intrusive default since it allows a
> representation on all machines, independent of their default ANSI
> codepage.

The new approach is not a problem for me. I was just wondering about
compatibility issues, and thinking that keeping the 1.5 default might
reduce the number of complaints from users on this mailing list
later, when 1.7 goes mainstream...

But wait, here is my remaining question: why is there a difference
between bash --login and plain bash, where in the latter case CP1252
(or the default ANSI codepage) *is* still the default?

Thomas
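
For illustration, the byte-to-Unicode behaviour described in the quoted
explanation can be sketched in Python. This is not Cygwin's console code:
literal_widen is a hypothetical helper that only mimics "treat each byte as
the Unicode code point with the same value", and the codec names are those
of Python's standard library.

```python
# Sketch only: compares a literal byte -> code point mapping with
# Python's iso-8859-1 and cp1252 codecs. Nothing here is Cygwin code.

def literal_widen(data: bytes) -> str:
    """Hypothetical helper: map each byte to the code point of the same value."""
    return "".join(chr(b) for b in data)

# All byte values outside the ASCII range.
high_bytes = bytes(range(0x80, 0x100))

# Literal widening gives the same result as decoding as ISO-8859-1.
same_as_latin1 = literal_widen(high_bytes) == high_bytes.decode("iso-8859-1")

# Some bytes are unassigned in Python's cp1252 codec, so decode with
# errors="replace" instead of letting the decode raise.
cp1252_text = high_bytes.decode("cp1252", errors="replace")

# Collect the byte values where CP1252 disagrees with the literal mapping.
differing = [b for b, ch in zip(high_bytes, cp1252_text) if ch != chr(b)]

euro_cp1252 = b"\x80".decode("cp1252")   # EURO SIGN, U+20AC
euro_literal = literal_widen(b"\x80")    # C1 control character, U+0080

print(same_as_latin1)
print([hex(b) for b in differing])
```

In this sketch the two interpretations differ only for bytes 0x80 to 0x9F;
for example, 0x80 is the Euro sign in CP1252 but a C1 control character
under the literal mapping.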