Mail Archives: cygwin/2014/02/14/07:05:38

Date: Fri, 14 Feb 2014 15:56:31 +0400
From: Andrey Repin <anrdaemon AT yandex DOT ru>
Reply-To: Andrey Repin <cygwin AT cygwin DOT com>
Message-ID: <1078913914.20140214155631@mtu-net.ru>
To: Corinna Vinschen <cygwin AT cygwin DOT com>
Subject: Re: New passwd/group handling in Cygwin - test results and observations
In-Reply-To: <20140214102044.GX2246@calimero.vinschen.de>
References: <20140213143849 DOT GH2246 AT calimero DOT vinschen DOT de> <1717869165 DOT 20140214021113 AT mtu-net DOT ru> <20140214102044 DOT GX2246 AT calimero DOT vinschen DOT de>

Greetings, Corinna Vinschen!

>> The issue can be observed when you have a user or group name containing
>> characters outside basic ASCII character set. Even western diacritics will
>> suffice.
>> 
>> Add somewhere in your startup files an equivalent of the following block:
>> (I have it in private .profile)
>> 
>> ---->8-------->8-------->8-------->8-------->8-------->8-------->8----
>> case "$TERM" in
>>   xterm*)
>>     LANG=ru_RU.UTF-8
>>     ;;
>>   *)
>>     LANG=ru_RU.CP866
>>     ;;
>> esac
>> 
>> export PATH HISTCONTROL LANG
>> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
>> 
>> Restart your shell and try to ls -l a directory containing files owned
>> by the abovementioned user/group.
>> 
>> Try it in mintty (the encoding will be UTF-8 and the names will show up
>> readable) and in a native console (with the appropriate single-byte
>> encoding, the names will still be printed in Unicode, meaning raw byte
>> sequences will be dumped to the terminal).
>> I thought it could be affected by the fact that I'm changing LANG on the
>> fly, but starting bash in a console that initially has the correct LANG=
>> variable doesn't change the observed results.
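
To make "raw byte sequences" concrete, here is a minimal sketch, not taken from the thread. It assumes a name consisting of a single Cyrillic letter (U+0416) stored as UTF-8; the helper name hexbytes is hypothetical. A console that expects one byte per character receives two bytes for this one letter and draws two unrelated glyphs.

```shell
#!/bin/sh
# Hypothetical helper: dump stdin as a lowercase hex string.
hexbytes() {
  od -An -v -tx1 | tr -d ' \n'
}

# Assumed example name: U+0416 (Cyrillic capital Zhe) encoded as UTF-8.
# Octal escapes \320 \226 are the bytes 0xD0 0x96.
name_utf8=$(printf '\320\226')

# Number of bytes a single-byte console would treat as separate characters.
nbytes=$(printf '%s' "$name_utf8" | wc -c | tr -d ' ')

printf 'bytes: %s (count: %s)\n' "$(printf '%s' "$name_utf8" | hexbytes)" "$nbytes"
```

One visible letter therefore occupies two bytes on the wire, which is what a CP866 console ends up showing as two separate characters.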

> Yes, this is a problem, and I'm not sure how to fix it, if at all.

> The problem is hopefully obvious.  We have to initialize things in some
> order.  For instance, to read /etc/fstab.d/$USER, we need the username.
> And since the Cygwin username can be different from the Windows username
> (I guess I should have never added this functionality in the first
> place),

I feel your pain...

> we have to read the user's passwd before we read the fstabs.

> Same for the initialization of $LANG and friends.  That occurs pretty
> late in the process initialization. You know that Windows uses UTF-16
> under the hood, so a lot of stuff gets read and converted to UTF-8
> before we even care for the environment. And if you set the codeset in
> the application only, all the relevant information has already been read
> long ago, of course.

> But this is a problem not different from Linux.  If you have a username
> with non-ASCII chars, it will use *some* encoding in the passwd DB,
> usually UTF-8 these days.  If you then change the codeset in your
> application, you will still get your username in UTF-8.  It won't be
> changed on the fly, just because your application calls setlocale.

I understand it (mostly), but there are actually two issues, not one.
One issue is the display part, where names are output for user consumption.
The other can be observed in, e.g., rsync, and in file access in general
(remember the discussion about accessing long directory names in Unicode).
Changing the LANG variable DOES matter for the latter, and you can only hope
that whatever is output in the former case is actually printable (thank God,
most of the time it actually is, in the case of UTF-8).
It gets even more complicated when you consider that Windows has two
different single-byte encodings, the so-called ANSI (for GUI applications)
and OEM (for the console), and a lot of software makes assumptions without
checking the current state of things.
As convoluted as the problem is, I think we need some sort of solution, or at
the very least, documentation.
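
A small sketch of the ANSI/OEM split, under stated assumptions: CP866 stands in for the OEM code page (it is the one used in the LANG setting quoted above) and CP1251 for the ANSI code page, which is my assumption for a Russian Windows setup and is not stated in this thread. The helper name encode_hex is hypothetical, and the sketch relies on the standard iconv utility knowing both codeset names.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-8 string ($2) to codeset $1, print hex.
encode_hex() {
  printf '%s' "$2" | iconv -f UTF-8 -t "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Assumed sample letter: U+0416 in UTF-8 (bytes 0xD0 0x96).
letter=$(printf '\320\226')

oem=$(encode_hex CP866 "$letter")    # assumed OEM (console) code page
ansi=$(encode_hex CP1251 "$letter")  # assumed ANSI (GUI) code page

printf 'OEM  CP866 : %s\nANSI CP1251: %s\n' "$oem" "$ansi"
```

The same letter gets a different single byte in each code page, so a program that guesses the wrong one prints a different character without any error.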


--
WBR,
Andrey Repin (anrdaemon AT yandex DOT ru) 14.02.2014, <15:15>

Sorry for my terrible english...


