Date: Tue, 28 Mar 2006 09:47:28 -0500 (EST)
From: Igor Peshansky
Reply-To: cygwin AT cygwin DOT com
To: Lapo Luchini
cc: cygwin AT cygwin DOT com
Subject: Re: Locales with wrong umlauts
In-Reply-To: <4428D135.7020601@lapo.it>

On Tue, 28 Mar 2006, Lapo Luchini wrote:

> Igor Peshansky wrote:
> > The system has no idea what charset it's using, because it depends on the
> > font you set for your terminal, which is outside of the terminal's
> > control.  Even if you use a Unicode font with charset conversion, the
> > charset is specified outside of the console.
>
> Oh? I had no idea about that.
> Then the "Arial" distributed with latin1-like CP1252 areas (most western
> europe) is a different font than the "Arial" used in eastern europe
> (CP1250 AFAIR?) or the "Arial" used for cyrillic-using places (CP1251?)?

Nope, the font is probably the same (Unicode/UCS-2), but the encoding
vector is specified in the properties of each terminal window, and thus
not set globally.  That said, there may be a system-default encoding (in
the language preferences) that can be used as a good guess for the output
encoding of filenames as converted to 8-bit from UCS-2.  In particular, my
Windows is set to accept Russian as one of its primary locales (the main
one being en_US), and thus my non-English filenames are rendered in the
CP1251 encoding (as is evident from xterms trying to display them using a
latin1-encoded font).
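
To make the CP1251-vs-latin1 effect concrete, here is a minimal Python
sketch.  The filename is hypothetical, and CP1251 as the system-default
codepage is an assumption taken from my setup above; this only simulates
the 8-bit conversion and is not how Cygwin or newlib actually does it.

```python
# Hypothetical filename; the Unicode name stored on disk never changes.
name = "привет.txt"

# Bytes an 8-bit (non-Unicode) call would return if the system-default
# codepage were CP1251 (an assumption for this illustration).
raw = name.encode("cp1251")

# The same bytes as shown by a terminal that agrees on CP1251 ...
as_cp1251 = raw.decode("cp1251")
# ... and by a terminal whose font is latin1-encoded.
as_latin1 = raw.decode("latin-1")

print(raw.hex(" "))   # the byte values are identical in both cases
print(as_cp1251)      # readable Cyrillic
print(as_latin1)      # accented Latin letters instead of Cyrillic
```

The bytes are the same both times; only the encoding used to interpret
them for display differs.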

> Anyway, regarding file names, I don't think it is correct to say that
> the name depends on the font: the "correct" name depends on the system
> default codepage (or, well, since I guess underneath it now uses Unicode
> let's say "the codepage used for retro-compatibility in the non-unicode
> system calls").

Yep, except I would even say "the correct *rendering* of the name depends
on the default codepage".  The name doesn't change if you change the
codepages.

> If I have a filename with accents I want "ls" to show it "just like
> Explorer", at least by default, with no explicit override on my part
> using .Xdefaults or "rxvt -fn".

Windows terminals use the above system-default encoding.  IIRC, xterm and
rxvt use latin1 by default.

> OK, maybe I prefer to use a CP850-font like LucidaP because I want to
> see line-drawings in "mc" and thus every accent will be messed up, but
> that's another matter 0=)

So, in this case, the encoding vector is part of the font.  And no Windows
API call will identify this vector for you so that OUTPUT_CHARSET can be
set in the terminal...

> > Is there any way to tell mv, rm &co to display non-ASCII characters in
> > filenames?  I know this isn't Cygwin-specific, but I'm not even sure what
> > to Google for.
>
> Ohh, us poor non-ASCII-using people, don't you know it is just plain
> wrong to use "strange accents" in filenames? Even more "wrong" starting
> a filename with a dot or (what horror) using an extension more than 3
> chars long! (just kidding ^_^)

Yes.  Languages with different alphabets have a long history of
transliteration on the Internet, specifically because i18n became
widespread only fairly recently (relatively speaking, of course).

> Lapo
>
> PS:
> don't we blame Cygwin too much, many Windows apps have problems with
> unicode. E.g. if I create a folder name with japanese characters in it,
> most applications are not even able to save a file in it.

I'm not blaming Cygwin.  If anything, I'm blaming newlib...  J/K.
:-)
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_	pechtcha AT cs DOT nyu DOT edu | igor AT watson DOT ibm DOT com
ZZZzz /,`.-'`'    -.  ;-;;,_	Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'	old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"Las! je suis sot... -Mais non, tu ne l'es pas, puisque tu t'en rends compte."
"But no -- you are no fool; you call yourself a fool, there's proof enough in
that!" -- Rostand, "Cyrano de Bergerac"

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/