Mail Archives: cygwin/2014/02/14/07:05:38
Greetings, Corinna Vinschen!
>> The issue can be observed when you have a user or group name containing
>> characters outside the basic ASCII character set. Even Western diacritics
>> will suffice.
>>
>> Add an equivalent of the following block somewhere in your startup files
>> (I have it in my private .profile):
>>
>> ---->8-------->8-------->8-------->8-------->8-------->8-------->8----
>> case "$TERM" in
>> xterm*)
>> LANG=ru_RU.UTF-8
>> ;;
>> *)
>> LANG=ru_RU.CP866
>> ;;
>> esac
>>
>> export PATH HISTCONTROL LANG
>> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
>>
>> Restart your shell and run ls -l on a directory containing files owned by
>> the above-mentioned user/group.
>>
>> Try it in mintty (the encoding will be UTF-8 and the names will show up
>> readable) and in the native console (with an appropriate single-byte
>> encoding set, the names are still printed as UTF-8, i.e., raw multibyte
>> sequences are dumped to the terminal).
>> I thought it could be caused by changing LANG on the fly, but starting bash
>> in a console that already has the correct LANG= variable doesn't change the
>> observed results.
> Yes, this is a problem, and I'm not sure how to fix it, if at all.
> The problem is hopefully obvious. We have to initialize things in some
> order. For instance, to read /etc/fstab.d/$USER, we need the username.
> And since the Cygwin username can be different from the Windows username
> (I guess I should have never added this functionality in the first
> place),
I feel your pain...
> we have to read the user's passwd before we read the fstabs.
> Same for the initialization of $LANG and friends. That occurs pretty
> late in the process initialization. You know that Windows uses UTF-16
> under the hood, so a lot of stuff gets read and converted to UTF-8
> before we even care about the environment. And if you set the codeset in
> the application only, all the relevant information has already been read
> long ago, of course.
> But this problem is no different on Linux. If you have a username
> with non-ASCII chars, it will use *some* encoding in the passwd DB,
> usually UTF-8 these days. If you then change the codeset in your
> application, you will still get your username in UTF-8. It won't be
> changed on the fly, just because your application calls setlocale.
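Corinna's point, that switching the codeset does not re-encode data that has already been read, can be seen at the byte level. A minimal POSIX sh sketch, assuming a hypothetical user name "Иван" stored as UTF-8 (the name and the ru_RU.CP866 locale string are illustrations, not taken from a real setup):

```shell
#!/bin/sh
# Hypothetical user name "Иван" (4 Cyrillic letters), written out as its
# UTF-8 bytes with octal escapes so the script itself stays plain ASCII.
name=$(printf '\320\230\320\262\320\260\320\275')

# The consumer's locale does not change the bytes it is handed:
# printf and od simply pass them through in either case.
bytes_utf8=$(printf '%s' "$name" | LC_ALL=ru_RU.UTF-8 od -A n -t x1 | tr -d ' \n')
bytes_cp866=$(printf '%s' "$name" | LC_ALL=ru_RU.CP866 od -A n -t x1 | tr -d ' \n')

# 4 letters but 8 bytes: a single-byte (CP866) console draws one glyph per
# byte, so it shows 8 unrelated pseudo-graphic and Cyrillic glyphs.
nbytes=$(printf '%s' "$name" | wc -c | tr -d ' ')

echo "$bytes_utf8"    # d098d0b2d0b0d0bd
echo "$bytes_cp866"   # d098d0b2d0b0d0bd
echo "$nbytes"        # 8
```

The same bytes that mintty renders correctly are what the native console receives; only an explicit conversion step would change them.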
I understand it (mostly), but there are actually two issues, not one.
One issue is the display part, where names are output for user consumption.
The other can be observed in, e.g., rsync, and in file access in general
(remember the discussion about accessing long directory names in Unicode).
Changing the LANG variable DOES matter for the latter, and for the former you
can only hope that whatever is output is actually printable (thank God, in the
case of UTF-8 it usually is).
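For the display half only, one user-side stopgap is to re-encode UTF-8 output into the console's codeset before it reaches the terminal. A minimal sketch: the helper name to_codeset is hypothetical, and it assumes an iconv that recognizes the codeset named in the locale string (e.g. CP866). It does nothing for the file-access half of the problem.

```shell
#!/bin/sh
# Hypothetical helper: re-encode UTF-8 text on stdin into the codeset named
# by a locale string (default: $LANG), e.g. "ru_RU.CP866" -> "CP866".
to_codeset() {
  loc=${1:-$LANG}
  case $loc in
    *.*) cs=${loc#*.}; cs=${cs%%@*} ;;   # "ru_RU.CP866@mod" -> "CP866"
    *)   cs= ;;                           # no codeset in the locale name
  esac
  case $cs in
    ''|UTF-8|utf-8|UTF8|utf8) cat ;;      # UTF-8 or unspecified: pass through
    *) iconv -f UTF-8 -t "$cs" ;;         # single-byte console codeset
  esac
}
```

On the native console this would be used as `ls -l | to_codeset`. Characters with no equivalent in the target codeset (for example most Western diacritics in CP866) make iconv report a conversion error, so this only papers over the display side.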
It gets even more complicated when you consider that Windows has two
different single-byte encodings: the so-called ANSI codepage (for GUI
applications) and the OEM codepage (for the console). And a lot of software
makes assumptions without checking the current state of things.
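To make the ANSI/OEM split concrete: on a Russian Windows the ANSI codepage is 1251 and the OEM codepage is 866, and the same name has different bytes in each. A small sketch, again with the hypothetical name "Иван" and assuming an iconv that supports both CP1251 and CP866 (hex_in is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical name "Иван" as UTF-8 bytes (octal escapes).
name=$(printf '\320\230\320\262\320\260\320\275')

# Encode $name into codeset $1 and hex-dump the result as one string.
hex_in() {
  printf '%s' "$name" | iconv -f UTF-8 -t "$1" | od -A n -t x1 | tr -d ' \n'
}

ansi=$(hex_in CP1251)   # ANSI codepage, used by GUI applications
oem=$(hex_in CP866)     # OEM codepage, used by the console
echo "ANSI (CP1251): $ansi"   # c8e2e0ed
echo "OEM  (CP866):  $oem"    # 88a2a0ad
```

A program that silently picks the wrong one of the two gets mojibake, even though both are a legitimate "single-byte encoding" for the same system.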
As convoluted as the problem is, I think we need some sort of solution, or at
the very least, documentation.
--
WBR,
Andrey Repin (anrdaemon AT yandex DOT ru) 14.02.2014, <15:15>
Sorry for my terrible English...
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple