Mail Archives: cygwin/2006/03/28/09:47:41

X-Spam-Check-By: sourceware.org
Date: Tue, 28 Mar 2006 09:47:28 -0500 (EST)
From: Igor Peshansky <pechtcha AT cs DOT nyu DOT edu>
Reply-To: cygwin AT cygwin DOT com
To: Lapo Luchini <lapo DOT luchini AT gmail DOT com>
cc: cygwin AT cygwin DOT com
Subject: Re: Locales with wrong umlauts
In-Reply-To: <4428D135.7020601@lapo.it>
Message-ID: <Pine.GSO.4.63.0603280934290.18642@access1.cims.nyu.edu>
References: <loom DOT 20060326T135539-102 AT post DOT gmane DOT org> <Pine DOT GSO DOT 4 DOT 63 DOT 0603272344260 DOT 18642 AT access1 DOT cims DOT nyu DOT edu> <4428D135 DOT 7020601 AT lapo DOT it>
MIME-Version: 1.0
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Tue, 28 Mar 2006, Lapo Luchini wrote:

> Igor Peshansky wrote:
> > The system has no idea what charset it's using, because it depends on the
> > font you set for your terminal, which is outside of the terminal's
> > control.  Even if you use a Unicode font with charset conversion, the
> > charset is specified outside of the console.
>
> Oh? I had no idea about that.
> Then the "Arial" distributed in latin1-like CP1252 areas (most of western
> Europe) is a different font than the "Arial" used in eastern Europe
> (CP1250 AFAIR?) or the "Arial" used in Cyrillic-using places (CP1251?)?

Nope, the font is probably the same (Unicode/UCS-2), but the encoding
vector is specified in the properties of each terminal window, and thus
not set globally.  That said, there may be a system-default encoding (in
the language preferences) that can be used as a good guess for the output
encoding of filenames as converted to 8-bit from UCS-2.  In particular, my
Windows is set to accept Russian as one of its primary locales (the main
one being en_US), and thus my non-English filenames are rendered in the
CP1251 encoding (as is evident from xterms trying to display them using a
latin1-encoded font).
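
To make the effect above concrete, here is a minimal Python sketch of what
happens when CP1251 bytes reach a latin1 display.  The filename is
hypothetical, and Python's codecs only stand in for the UCS-2 to 8-bit
conversion that Windows performs:

```python
# Hypothetical Cyrillic filename, as the filesystem stores it (Unicode).
name = "привет.txt"

# The 8-bit bytes a non-Unicode call would hand back when the
# system-default codepage is CP1251 (assumption: Russian locale).
raw = name.encode("cp1251")

# What a terminal that interprets those bytes as latin1 would draw.
shown = raw.decode("latin-1")

# The bytes themselves are fine: decoding with the right codepage
# recovers the original name.
recovered = raw.decode("cp1251")

print(raw)  # b'\xef\xf0\xe8\xe2\xe5\xf2.txt'
# shown is "ïðèâåò.txt"; recovered is the original name again.
```

The garbage is purely a display-side mismatch; nothing was lost in the
bytes.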

> Anyway, regarding file names, I don't think it is correct to say that
> the name depends on the font: the "correct" name depends on the system
> default codepage (or, well, since I guess underneath it now uses Unicode
> let's say "the codepage used for retro-compatibility in the non-unicode
> system calls").

Yep, except I would even say "the correct *rendering* of the name depends
on the default codepage".  The name doesn't change if you change the
codepages.
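
A rough sketch of that distinction in Python, assuming a hypothetical
filename and using errors="replace" only as a stand-in for whatever
fallback the real 8-bit conversion applies:

```python
# Hypothetical filename with an umlaut and a sharp s.
name = "Grüße.txt"

# Roughly how a UCS-2 filesystem keeps the name: this never changes.
stored = name.encode("utf-16-le")

# The 8-bit "views" of the same name under three different codepages.
# errors="replace" substitutes '?' for characters the codepage lacks;
# the real Windows conversion may pick a different fallback.
views = {
    cp: name.encode(cp, errors="replace")
    for cp in ("cp1252", "cp850", "cp1251")
}

for cp, data in views.items():
    print(cp, data)
# cp1252 b'Gr\xfc\xdfe.txt'
# cp850 b'Gr\x81\xe1e.txt'
# cp1251 b'Gr??e.txt'
```

The stored name is identical in all three cases; only its 8-bit
rendering differs, and under CP1251 it cannot be rendered faithfully at
all.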

> If I have a filename with accents I want "ls" to show it "just like
> Explorer", at least by default, with no explicit override on my part
> using .Xdefaults or "rxvt -fn".

Windows terminals use the above system-default encoding.  IIRC, xterm and
rxvt use latin1 by default.

> OK, maybe I prefer to use a CP850-font like LucidaP because I want to
> see line-drawings in "mc" and thus every accent will be messed up, but
> that's another matter 0=)

So, in this case, the encoding vector is part of the font.  And no Windows
API call will identify this vector for you so that OUTPUT_CHARSET can be
set in the terminal...
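
For illustration, a small Python sketch of the CP850-font case, assuming
the filename bytes arrive in CP1252 (a guess at a typical western
European system default) and the font draws them as CP850:

```python
# Hypothetical accented filename.
name = "perché.txt"

# Bytes produced under an assumed CP1252 system-default codepage.
raw = name.encode("cp1252")

# What a font whose encoding vector is CP850 would draw for those bytes.
shown = raw.decode("cp850")

# The reverse mismatch: bytes written as CP850, displayed as CP1252.
reverse = name.encode("cp850").decode("cp1252")

print(raw)  # b'perch\xe9.txt'
# shown is "perchÚ.txt"; reverse is "perch‚.txt" (a low quotation mark).
```

Line-drawing characters live in the same upper half of CP850, which is
why getting them right and getting the accents right pull in opposite
directions.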

> > Is there any way to tell mv, rm &co to display non-ASCII characters in
> > filenames?  I know this isn't Cygwin-specific, but I'm not even sure what
> > to Google for.
>
> Ohh, us poor non-ASCII-using people, don't you know it is just plain
> wrong to use "strange accents" in filenames? Even more "wrong" starting
> a filename with a dot or (what horror) using an extension more than 3
> chars long! (just kidding ^_^)

Yes.  Languages with different alphabets have a long history of
transliteration on the Internet, specifically because i18n became
widespread not too long ago (relatively speaking, of course).

>     Lapo
>
> PS:
> don't we blame Cygwin too much? Many Windows apps have problems with
> Unicode. E.g. if I create a folder name with Japanese characters in it,
> most applications are not even able to save a file in it.

I'm not blaming Cygwin.  If anything, I'm blaming newlib...  J/K. :-)
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_	    pechtcha AT cs DOT nyu DOT edu | igor AT watson DOT ibm DOT com
ZZZzz /,`.-'`'    -.  ;-;;,_		Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'		old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"Las! je suis sot... -Mais non, tu ne l'es pas, puisque tu t'en rends compte."
"But no -- you are no fool; you call yourself a fool, there's proof enough in
that!" -- Rostand, "Cyrano de Bergerac"

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
