In-Reply-To: | <e2480c70905140614w427eb5bcpf1482512e43f70a@mail.gmail.com> |
References: | <e2480c70905140614w427eb5bcpf1482512e43f70a AT mail DOT gmail DOT com> |
Date: | Thu, 14 May 2009 17:35:58 +0400 |
Message-ID: | <e2480c70905140635q4fdcd53bt9db497f81477205@mail.gmail.com> |
Subject: | [1.7] Problem with national characters in directory names when using UTF-8 charset |
From: | Alexey Borzenkov <snaury AT gmail DOT com> |
To: | cygwin AT cygwin DOT com |
There is something strange going on with national characters in directory names when using Cygwin 1.7 with UTF-8. Here's a sample session:

# test.rb
# -*- coding: utf-8 -*-
filename = File.expand_path("test.txt")
puts filename
puts File.open(filename) { |f| f.read }

# test.txt
This is a test

C:\cygwin\home\aborzenkov> set LANG=en_US.UTF-8
C:\cygwin\home\aborzenkov> mkdir проверка
C:\cygwin\home\aborzenkov> copy test.rb проверка
C:\cygwin\home\aborzenkov> copy test.txt проверка
C:\cygwin\home\aborzenkov> C:\cygwin\bin\ruby проверка/test.rb
/usr/bin/ruby: No such file or directory -- проверка/test.rb (LoadError)
C:\cygwin\home\aborzenkov> C:\cygwin\bin\cat проверка/test.txt
This is a test
C:\cygwin\home\aborzenkov> cd проверка
C:\cygwin\home\aborzenkov\проверка> C:\cygwin\bin\ruby test.rb
/home/aborzenkov/▒??▒N?▒??▒??▒?ч▒N?▒??▒?°/test.txt
This is a test
C:\cygwin\home\aborzenkov\проверка> C:\cygwin\bin\cat test.txt
/usr/bin/cat: test.txt: No such file or directory
C:\cygwin\home\aborzenkov\проверка> C:\cygwin\bin\ls -al
/usr/bin/ls: cannot open directory .: No such file or directory

Why is it that some commands can't accept Russian characters in filenames yet work within Russian directories, while others can open files with Russian paths but can't work within Russian directories? It seems extremely weird to me.
:-/

Also, I'm wondering about this discrepancy:

C:\cygwin\home\aborzenkov> C:\cygwin\bin\ruby /bin/irb
irb(main):001:0> Dir.chdir("проверка")
=> 0
irb(main):002:0> File.expand_path("*")
=> "/home/aborzenkov/\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260/*"

C:\cygwin\home\aborzenkov\проверка> C:\cygwin\bin\ruby /bin/irb
irb(main):001:0> File.expand_path("*")
=> "/home/aborzenkov/\016\320\277\016\321\200\016\320\276\016\320\262\016\320\265\016\321\200\016\320\272\016\320\260/*"

Notice how the same current directory gives different results for File.expand_path in Ruby, depending on whether the Cygwin process did the chdir into the Russian directory itself or was started inside it. If I understood the Cygwin documentation correctly, \016 is supposed to appear only for characters that cannot be represented in the current charset (which is UTF-8), yet in the second case it appears all over the place. The same thing happens with, for example, bash, which shows garbled pwd output when started from within the Russian directory, yet works well when I chdir to that directory manually.

What's going on?
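To make the difference between the two File.expand_path results concrete, here is a minimal sketch in modern Ruby that compares the two directory names byte by byte. The octal escapes are copied from the irb output above; the variable names (chdir_name, started_name, stripped, decoded) are made up for illustration. It only checks the observation that the second string is the first one with a \016 (ASCII SO, 0x0E) byte in front of every character. It does not reproduce Cygwin's path conversion and makes no claim about why the prefix is added.

```ruby
# Directory-name bytes exactly as irb printed them in the two sessions above.
chdir_name   = "\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260".b
started_name = "\016\320\277\016\321\200\016\320\276\016\320\262\016\320\265\016\321\200\016\320\272\016\320\260".b

# Remove every SO byte (0x0E, printed by irb as \016) from the second form.
stripped = started_name.delete("\x0E")

# Compare at the byte level, then reinterpret the remaining bytes as UTF-8.
same_bytes = (stripped == chdir_name)
decoded    = stripped.dup.force_encoding(Encoding::UTF_8)

puts "identical after stripping SO: #{same_bytes}"
puts "valid UTF-8: #{decoded.valid_encoding?}"
puts "decoded name: #{decoded}"
```

If the comparison holds, both sessions see the same UTF-8 name and differ only in the SO markers, which would be consistent with every character being treated as unrepresentable when the process is started inside the directory. That is an inference from the output, not a confirmed diagnosis.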