Mail Archives: cygwin/2009/03/20/07:40:54

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Fri, 20 Mar 2009 13:40:31 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
Message-ID: <20090320124030.GM9322@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <20090319130909 DOT GZ9322 AT calimero DOT vinschen DOT de> <49C281F7 DOT 6080602 AT acm DOT org> <20090319181323 DOT GB1868 AT calimero DOT vinschen DOT de> <49C29366 DOT 8080708 AT acm DOT org> <20090319192031 DOT GB9322 AT calimero DOT vinschen DOT de> <20090319192229 DOT GC9322 AT calimero DOT vinschen DOT de> <loom DOT 20090319T193406-675 AT post DOT gmane DOT org> <20090319201144 DOT GE9322 AT calimero DOT vinschen DOT de> <20090319203046 DOT GF9322 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <20090319203046.GF9322@calimero.vinschen.de>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Mar 19 21:30, Corinna Vinschen wrote:
> Here's another idea:
> 
> If the codeset is not UTF-8, and if a filename contains wide chars not
> representable in the current ANSI codeset, use the good old ASCII "SO/SI"
> method.
> 
> Example:  Assuming the ANSI codepage is CP1252.  Assuming the filename
> is in UTF-16
> 
>   /dir/to/foo\x1234bar
>   
> All chars except for \x1234 are convertible to the current ANSI code
> page.  The convertible chars are converted as usual.  The
> non-convertible characters are converted to an ASCII SO/SI sequence:
> 
>   /dir/to/foo\x0e\x12\x34\x0fbar

Of course, this requires converting the wchar to a UTF-8 sequence.

> On the way back, Cygwin converts SO/SI sequences back to their
> UTF-16 counterpart and converts everything else using the current
> codepage to UTF-16 conversion.
> 
> This would allow manipulating all files on the disk, even those whose
> names use characters invalid in the current CP.
> 
> Does that solution make sense?
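
To make the quoted proposal concrete, here is a minimal Python sketch of the
round trip. It is an illustration only, not Cygwin's actual code. The names
encode_name and decode_name are hypothetical, and the sketch assumes that the
bytes between SO (0x0e) and SI (0x0f) are the UTF-8 form of the character, as
noted above, rather than the raw \x12\x34 shown in the example. It also
assumes that filenames never contain literal SO or SI characters.

```python
SO, SI = b"\x0e", b"\x0f"  # ASCII "shift out" / "shift in" control bytes


def encode_name(name: str, codepage: str = "cp1252") -> bytes:
    """Convert a filename to the codepage, escaping unrepresentable chars."""
    out = bytearray()
    for ch in name:
        try:
            # Convertible characters are converted as usual.
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # Assumption: wrap the character's UTF-8 bytes in SO ... SI.
            out += SO + ch.encode("utf-8") + SI
    return bytes(out)


def decode_name(raw: bytes, codepage: str = "cp1252") -> str:
    """Reverse encode_name: SO/SI runs are UTF-8, the rest is codepage text."""
    parts = []
    pos = 0
    while pos < len(raw):
        start = raw.find(SO, pos)
        if start == -1:
            # No further escape sequence: decode the remainder normally.
            parts.append(raw[pos:].decode(codepage))
            break
        parts.append(raw[pos:start].decode(codepage))
        end = raw.index(SI, start + 1)  # raises ValueError if SI is missing
        parts.append(raw[start + 1:end].decode("utf-8"))
        pos = end + 1
    return "".join(parts)


escaped = encode_name("/dir/to/foo\u1234bar")
restored = decode_name(escaped)
```

Since the UTF-8 encoding of a non-ASCII character only uses bytes >= 0x80,
the closing SI byte cannot be confused with part of the escaped character.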

Apart from that, I have now proposed a change to newlib, so that setlocale
on Cygwin always chooses the charset which is equivalent to the
current ANSI codepage, if the charset is not given explicitly.
The list of codepages supported so far is the one I posted in
http://cygwin.com/ml/cygwin/2009-03/msg00693.html

For instance, if you set $LANG to "de_DE", the charset will become
CP1252, as is the default on German Windows systems.  If you set
$LANG to "de_DE.ISO-8859-15", you will get iso-8859-15 instead.
Setting it to "de_DE.UTF-8" ... you get the idea.
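
The selection rule described above can be modelled in a few lines of Python.
This is only a sketch of the described behaviour, not the newlib patch itself:
charset_for_locale is a hypothetical name, the ANSI codepage is passed in as a
plain number instead of being queried from Windows, and the handling of an
"@modifier" suffix is an assumption.

```python
def charset_for_locale(lang: str, ansi_codepage: int) -> str:
    """Return the charset implied by a $LANG value.

    An explicit charset after the '.' wins; otherwise fall back to the
    charset equivalent to the current ANSI codepage.
    """
    # Assumption: a trailing "@modifier" (e.g. "@euro") is not part of the charset.
    base = lang.split("@", 1)[0]
    if "." in base:
        # Charset given explicitly, e.g. "de_DE.ISO-8859-15".
        return base.split(".", 1)[1]
    # No charset given: use the one matching the ANSI codepage.
    return f"CP{ansi_codepage}"


default_charset = charset_for_locale("de_DE", 1252)
explicit_charset = charset_for_locale("de_DE.ISO-8859-15", 1252)
utf8_charset = charset_for_locale("de_DE.UTF-8", 1252)
```

With an ANSI codepage of 1252, the three calls give CP1252, ISO-8859-15 and
UTF-8 respectively, matching the $LANG examples above.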


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

