Mail Archives: cygwin/2009/03/20/07:40:54
On Mar 19 21:30, Corinna Vinschen wrote:
> Here's another idea:
>
> If the codeset is not UTF-8, and if a filename contains wide chars not
> representable in the current ANSI codeset, use the good old ASCII "SO/SI"
> method.
>
> Example: Assume the ANSI codepage is CP1252, and assume the filename,
> in UTF-16, is
>
> /dir/to/foo\x1234bar
>
> All chars except for \x1234 are convertible to the current ANSI code
> page. The convertible chars are converted as usual. The
> non-convertible characters are converted to an ASCII SO/SI sequence:
>
> /dir/to/foo\x0e\x12\x34\x0fbar
Of course, this requires converting the wchar to a UTF-8 sequence.
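A minimal sketch of the encoding direction, written in Python rather than Cygwin's C and assuming CP1252 as the ANSI codepage. It follows the UTF-8 note above: characters the codepage can represent are encoded as usual, and each run of non-representable characters is emitted as SO, the characters' UTF-8 bytes, SI. The name `encode_filename` is hypothetical, and a Python `str` stands in for the UTF-16 filename.

```python
SO, SI = b"\x0e", b"\x0f"  # ASCII Shift Out / Shift In


def encode_filename(name: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical helper: encode a filename into the ANSI codepage,
    wrapping runs of non-representable characters as SO <UTF-8> SI."""
    out = bytearray()
    shifted = False  # True while inside an SO ... SI run
    for ch in name:
        try:
            encoded = ch.encode(codepage)
        except UnicodeEncodeError:
            # Not representable in the codepage: open a run if needed and
            # append the UTF-8 bytes (a lone surrogate would still raise here).
            if not shifted:
                out += SO
                shifted = True
            out += ch.encode("utf-8")
            continue
        if shifted:
            # Back to a representable character: close the run first.
            out += SI
            shifted = False
        out += encoded
    if shifted:
        out += SI  # close a run that reaches the end of the name
    return bytes(out)
```

With this sketch, `/dir/to/foo\x1234bar` becomes `/dir/to/foo\x0e\xe1\x88\xb4\x0fbar`, since U+1234 is E1 88 B4 in UTF-8. Using UTF-8 inside the shift also keeps SI unambiguous: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so 0x0F inside a run can only mean "shift in". Win32 filenames cannot contain characters 1 through 31, so a literal SO or SI should not occur in a real filename.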
> On the way back, Cygwin converts SO/SI sequences back to their
> UTF-16 counterparts and converts everything else using the current
> codepage-to-UTF-16 conversion.
>
> This would allow manipulating all files on the disk, even those whose
> names contain characters invalid in the current CP.
>
> Does that solution make sense?
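The way back can be sketched the same way, again in Python with CP1252 assumed and a hypothetical `decode_filename` helper: bytes between SO and SI are decoded as UTF-8 (per the UTF-8 variant of the scheme), and everything else goes through the codepage-to-Unicode conversion. This sketch raises an error on an unterminated SO instead of guessing.

```python
SO, SI = 0x0E, 0x0F  # ASCII Shift Out / Shift In as byte values


def decode_filename(raw: bytes, codepage: str = "cp1252") -> str:
    """Hypothetical helper: turn an SO/SI-escaped codepage filename
    back into a Unicode string."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == SO:
            # Shifted run: UTF-8 bytes up to the matching SI.
            # bytes.index raises ValueError if the SI is missing.
            end = raw.index(SI, i + 1)
            out.append(raw[i + 1:end].decode("utf-8"))
            i = end + 1
        else:
            # Plain run: decode with the codepage up to the next SO (or the end).
            j = raw.find(SO, i)
            if j == -1:
                j = len(raw)
            out.append(raw[i:j].decode(codepage))
            i = j
    return "".join(out)
```

For example, `decode_filename(b"/dir/to/foo\x0e\xe1\x88\xb4\x0fbar")` yields the original name with U+1234 restored, so a file created with a non-CP1252 name can be reached again through the escaped form.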
Apart from that, I have now proposed a change to newlib so that setlocale
on Cygwin always chooses the charset equivalent to the current ANSI
codepage if the charset is not given explicitly. The list of codepages
supported so far is the one I posted in
http://cygwin.com/ml/cygwin/2009-03/msg00693.html
For instance, if you set $LANG to "de_DE", the charset will become
CP1252, as is the default on German Windows systems. If you set
$LANG to "de_DE.ISO-8859-15", you will get iso-8859-15 instead.
Setting it to "de_DE.UTF-8" ... you get the idea.
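A rough sketch of the proposed charset selection, assuming a $LANG value of the usual form language_TERRITORY[.charset][@modifier]. The `DEFAULT_ANSI_CHARSET` table and `resolve_charset` are hypothetical: the table is a small illustrative excerpt of Windows default ANSI codepages (the actual list of supported codepages is in the message linked above), and the ASCII fallback for unknown locales is an assumption of this sketch, not part of the proposal.

```python
# Hypothetical excerpt: locale -> default ANSI codepage on Windows systems
# for that language. Not the real newlib/Cygwin table.
DEFAULT_ANSI_CHARSET = {
    "de_DE": "CP1252",
    "en_US": "CP1252",
    "pl_PL": "CP1250",
    "ru_RU": "CP1251",
    "el_GR": "CP1253",
}


def resolve_charset(lang: str) -> str:
    """Pick the charset for a locale string such as 'de_DE.ISO-8859-15'."""
    # Drop an optional @modifier, e.g. 'de_DE@euro' -> 'de_DE'.
    base = lang.split("@", 1)[0]
    locale_name, _, charset = base.partition(".")
    if charset:
        # An explicit charset in $LANG always wins.
        return charset
    # No charset given: use the locale's default ANSI codepage.
    # Falling back to ASCII for unknown locales is an assumption here.
    return DEFAULT_ANSI_CHARSET.get(locale_name, "ASCII")
```

So `resolve_charset("de_DE")` gives CP1252, while `"de_DE.ISO-8859-15"` and `"de_DE.UTF-8"` return their explicit charsets unchanged.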
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat