Date: Fri, 14 Feb 2014 15:56:31 +0400
From: Andrey Repin
To: Corinna Vinschen
Delivered-To: mailing list cygwin AT cygwin DOT com
Subject: Re: New passwd/group handling in Cygwin - test results and observations
Greetings, Corinna Vinschen!

>> The issue can be observed when you have a user or group name containing
>> characters outside the basic ASCII character set. Even Western diacritics
>> will suffice.
>>
>> Add an equivalent of the following block somewhere in your startup files
>> (I have it in my private .profile):
>>
>> ---->8-------->8-------->8-------->8-------->8-------->8-------->8----
>> case "$TERM" in
>>   xterm*)
>>     LANG=ru_RU.UTF-8
>>   ;;
>>   *)
>>     LANG=ru_RU.CP866
>>   ;;
>> esac
>>
>> export PATH HISTCONTROL LANG
>> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
>>
>> Restart your shell and run ls -l on a directory containing files owned
>> by the above-mentioned user/group.
>>
>> Try it in mintty (the encoding will be UTF-8 and the names will show up
>> readable) and in the native console (with the appropriate single-byte
>> encoding, the names will still be printed in Unicode, meaning raw byte
>> sequences will be dumped to the terminal).
>> I thought it could be affected by the fact that I'm changing LANG on the
>> fly, but starting bash in a console that initially has the correct LANG=
>> variable doesn't change the observed results.

> Yes, this is a problem, and I'm not sure how to fix it, if at all.

> The problem is hopefully obvious. We have to initialize things in some
> order. For instance, to read /etc/fstab.d/$USER, we need the username.
> And since the Cygwin username can be different from the Windows username
> (I guess I should have never added this functionality in the first
> place),

I feel your pain...

> we have to read the user's passwd before we read the fstabs.

> Same for the initialization of $LANG and friends. That occurs pretty
> late in the process initialization. You know that Windows uses UTF-16
> under the hood, so a lot of stuff gets read and converted to UTF-8
> before we even care for the environment.
> And if you set the codeset in the application only, all the relevant
> information has already been read long ago, of course.

> But this is a problem not different from Linux. If you have a username
> with non-ASCII chars, it will use *some* encoding in the passwd DB,
> usually UTF-8 these days. If you then change the codeset in your
> application, you will still get your username in UTF-8. It won't be
> changed on the fly, just because your application calls setlocale.

I understand it (mostly), but there are actually two issues, not one.

One issue is the display part, where names are output for user consumption.
The other can be observed in, e.g., rsync, and in file access in general
(remember the discussion about accessing long directory names in Unicode).
Changing the LANG variable DOES matter for the latter, while for the former
you can only hope that whatever is output is actually printable (thank God,
most of the time it actually is, in the case of UTF-8).

It gets even more complicated when you consider that Windows has two
different single-byte encodings, the so-called ANSI (for GUI applications)
and OEM (for the console) code pages. And a lot of software makes
assumptions without checking the current state of things.

As convoluted as the problem is, I think we need some sort of solution, or
at the very least, documentation.

-- 
WBR,
Andrey Repin (anrdaemon AT yandex DOT ru) 14.02.2014, <15:15>

Sorry for my terrible English...
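The display half of the problem comes down to one name having different bytes in each encoding. A minimal POSIX shell sketch, assuming the script file is saved as UTF-8 and that iconv accepts the charset names CP1251 (Russian ANSI code page) and CP866 (Russian OEM code page), as GNU libc iconv does. The user name Андрей and the helper hexbytes are hypothetical, chosen only for illustration. It prints the bytes of the name as stored in a UTF-8 passwd DB, the bytes the ANSI and OEM code pages would use instead, and roughly what a CP866 console shows when handed the UTF-8 bytes unconverted.

```shell
#!/bin/sh
# Minimal sketch: one user name, different bytes per encoding.
# Assumptions: this file is saved as UTF-8, and iconv accepts the charset
# names CP1251 (Russian ANSI code page) and CP866 (Russian OEM code page).

name='Андрей'   # hypothetical user name, as stored in a UTF-8 passwd DB

# Read bytes from stdin and print them as one hex string (no spaces).
hexbytes() {
  od -An -v -tx1 | tr -d ' \n'
}

printf 'UTF-8  (passwd DB): %s\n' "$(printf '%s' "$name" | hexbytes)"
printf 'CP1251 (ANSI):      %s\n' \
  "$(printf '%s' "$name" | iconv -f UTF-8 -t CP1251 | hexbytes)"
printf 'CP866  (OEM):       %s\n' \
  "$(printf '%s' "$name" | iconv -f UTF-8 -t CP866 | hexbytes)"

# Decode the stored UTF-8 bytes as if they were CP866. This roughly mimics
# a CP866 console receiving the name without any conversion.
printf 'UTF-8 bytes shown as CP866: %s\n' \
  "$(printf '%s' "$name" | iconv -f CP866 -t UTF-8)"
```

With GNU iconv the three hex lines come out as d090d0bdd0b4d180d0b5d0b9, c0ede4f0e5e9 and 80ada4e0a5a9. The stored UTF-8 bytes stay the same whatever LANG says; a program has to convert them explicitly to the console's code page before they become readable there.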