Mail Archives: cygwin/2010/03/16/15:36:42

Reply-To: <dbyron AT dbyron DOT com>
From: "David Byron" <dbyron AT dbyron DOT com>
To: "'Andy Koppe'" <andy DOT koppe AT gmail DOT com>, <cygwin AT cygwin DOT com>
References: <493F5820D3F64434A76F433604C79D4A AT pleaset> <416096c61003160019p24e58433x4a969c0f99068fa6 AT mail DOT gmail DOT com> <6C05DF4D85804B3A865E7FE549B0475E AT pleaset> <416096c61003161315p504dff5dn7d1e847db01754c8 AT mail DOT gmail DOT com>
Subject: RE: filenames with characters that have the high bit set
Date: Tue, 16 Mar 2010 13:36:47 -0700
Message-ID: <21BD29CF0D2D42A1ABD17B6AEB918C77@pleaset>
In-Reply-To: <416096c61003161315p504dff5dn7d1e847db01754c8@mail.gmail.com>

> > > > $ echo $LC_ALL
> > > > en_US
> > >
> > > Hang on, where did that come from?

It was in my environment.  My apologies for being dense.

> > I unset LC_ALL and...
>
> Where?

I unset LC_ALL in bash, which was the wrong place.

> > Now ls foo<tab> adds the actual accented character to
> > the command line, but when I press return I get:
> >
> > ls: cannot access foo<a gray box>: No such file or directory

And of course this works now.  Sorry for the trouble.

> > I still get the right answer from test -f, when using
> > the shell builtin.  /usr/bin/test tells me the file
> > doesn't exist.
>
> .. and that.

As does this, as long as I use the same encoding I originally used to create
the file, which is totally fine.

> > > The \x18 scheme is only used for codepoints that can
> > > not be represented in the selected character set, yet
> > > U+00E9 can be represented in CP1252. By definition, any
> > > Unicode codepoint can be represented in UTF-8, so the
> > > \x18 scheme is never used when that is selected.
> > >
> > > To enable C-style backslash interpretation, you need
> > > to use $'...' quoting.
> >
> > I now see the bash man page explains this.  Must have
> > missed it the first time.  The above paragraphs with
> > some examples (where \x18 is needed and where it isn't)
> > added to
> > http://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-unusual
> > would have gotten me farther before posting.
>
> But what I said is explained there already:

I suppose, but the point about \x18 not working with a character set that
represents the desired codepoint wasn't clear.  Nor was the bash syntax for
using \x in general.  It's in the bash man page and not cygwin-specific, but
an example showing the gory details would have helped me at least.
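A minimal bash sketch of those gory details. It shows that $'...' quoting expands \xHH escapes into raw bytes while double quotes leave them alone, and compares the bytes for U+00E9 in CP1252 and in UTF-8. The \x18 line is an assumption: it reads the scheme on the user's guide page linked above as "0x18 followed by the character's UTF-8 bytes", and U+03B1 (which CP1252 lacks) is an arbitrary example. to_hex is a hypothetical helper built on od and tr, not a Cygwin tool.

```bash
#!/usr/bin/env bash
# Sketch: bash $'...' quoting turns \xHH escapes into raw bytes.
# to_hex is a hypothetical helper for this example only.
to_hex() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; }

cp1252_e_acute=$'\xE9'       # U+00E9 as its single CP1252 byte
utf8_e_acute=$'\xC3\xA9'     # U+00E9 as its two UTF-8 bytes
not_escaped="\xE9"           # double quotes leave \x uninterpreted

# Assumed Cygwin form (0x18, then the UTF-8 bytes) for U+03B1, a character
# CP1252 cannot represent; only relevant when the locale charset is CP1252.
alpha_fallback=$'\x18\xCE\xB1'

to_hex "$cp1252_e_acute"; echo    # e9
to_hex "$utf8_e_acute"; echo      # c3a9
to_hex "$not_escaped"; echo       # 5c784539
to_hex "$alpha_fallback"; echo    # 18ceb1
```

Under a CP1252 locale the accented filename is the single byte e9; under a UTF-8 locale it is c3 a9. In a UTF-8 locale the \x18 form never comes into play, since every code point has a UTF-8 encoding.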

> > And finally here are the steps that illustrate what's going on.
> >
> > $ touch $'\x18'; echo $?
> > 0
> >
> > ls shows a file named up-arrow (0x18):
>
> What do you mean by up-arrow? I'm getting a question mark, because
> that's what ls prints for non-printable characters by default. You can
> choose various quoting styles using the --quoting-style option.

I mean the up-arrow that ls prints with --show-control-chars.  Another
important omission on my part.  Doh!
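To make the ls side of this concrete, here is a small bash sketch, assuming GNU coreutils ls. It creates the same one-byte 0x18 filename in a throwaway directory and lists it under explicit quoting styles. The styles are pinned with --quoting-style because the default depends on whether output goes to a terminal and on the coreutils version. The variable name dir is illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: one file named with the single byte 0x18 (Ctrl-X), listed with
# different GNU ls display settings, inside a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT          # clean up when the script exits
cd "$dir" || exit 1
touch $'\x18'

ls --quoting-style=literal -q      # ?       (non-printable replaced by ?)
ls --quoting-style=escape          # \030    (octal escape)
ls --quoting-style=c               # "\030"  (C string syntax)
# Raw byte, dumped as hex; any glyph a terminal shows for it
# (such as an up-arrow) is the terminal's choice, not ls's.
ls --quoting-style=literal --show-control-chars | od -An -tx1
```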

> Yep, but that's a bash vs ls issue rather than a Cygwin
> one. You'd get the same on Linux. But if you use control
> characters in filenames, you'd better know what you're doing
> anyway. Some argue that it shouldn't be allowed in the
> first place, e.g.
> http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

Thanks for the link.  I don't typically use control characters in filenames.
It was just an example.

> > $ mkshortcut -n shortcut$'\xC3\xA9' plain; echo $?
> > $ readshortcut shortcut$'\xE9'
>
> I'm afraid these aren't yet Unicode-ready, i.e. they still use Windows
> "ANSI" APIs.

Guess it's time to roll up my sleeves and write a patch.
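One plausible way the two shortcut names above diverge can be sketched in bash. This assumes the Windows ANSI code page is CP1252 and uses iconv to mimic the decoding an "ANSI" API would apply; it does not call mkshortcut, readshortcut, or any Windows API, and to_hex is a hypothetical helper. Treat it as an illustration of the byte mismatch, not a trace of what those tools do internally.

```bash
#!/usr/bin/env bash
# Sketch, assuming an ANSI code page of CP1252: the same character typed as
# UTF-8 bytes and as a CP1252 byte gives different byte strings, and an
# ANSI-style decode of the UTF-8 bytes does not give back U+00E9.
to_hex() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; }   # hypothetical helper

utf8_e=$'\xC3\xA9'      # e-acute as UTF-8 bytes (as given to mkshortcut above)
cp1252_e=$'\xE9'        # e-acute as one CP1252 byte (as given to readshortcut above)

# Same character, different byte strings.
[ "$utf8_e" = "$cp1252_e" ] && echo same || echo different    # different

# Decoding the UTF-8 bytes as CP1252 yields U+00C3 U+00A9 ("Ã©"),
# printed here re-encoded as UTF-8 hex.
misread=$(printf '%s' "$utf8_e" | iconv -f CP1252 -t UTF-8)
to_hex "$misread"; echo     # c383c2a9

# The single CP1252 byte decodes to U+00E9 (UTF-8 hex c3a9).
decoded=$(printf '%s' "$cp1252_e" | iconv -f CP1252 -t UTF-8)
to_hex "$decoded"; echo     # c3a9
```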

-DB


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
