From: "David Byron"
To: "'Andy Koppe'"
Subject: RE: filenames with characters that have the high bit set
Date: Tue, 16 Mar 2010 13:36:47 -0700
Delivered-To: mailing list cygwin AT cygwin DOT com

> > > > $ echo $LC_ALL
> > > > en_US
> > >
> > > Hang on, where did that come from?

It was in my environment.  My apologies for being dense.

> > I unset LC_ALL and...
>
> Where?

I unset LC_ALL in bash, which was the wrong place.

> > Now ls foo adds the actual accented character to
> > the command line, but when I press return I get:
> >
> > ls: cannot access foo: No such file or directory

And of course this works now.  Sorry for the trouble.

> > I still get the right answer from test -f, when using
> > the shell builtin. /usr/bin/test tells me the file
> > doesn't exist.
>
> .. and that.

As does this, as long as I use the same encoding I used to originally
create the file, which is totally fine.

> > > The \x18 scheme is only used for codepoints that can
> > > not be represented in the selected character set, yet
> > > U+00E9 can be represented in CP1252.
> > > By definition, any Unicode codepoint can be represented in
> > > UTF-8, so the \x18 scheme is never used when that is selected.
> > >
> > > To enable C-style backslash interpretation, you need
> > > to use $'...' quoting.
> >
> > I now see the bash man page explains this.  Must have
> > missed it the first time.  The above paragraphs with
> > some examples (where \x18 is needed and where it isn't)
> > added to
> > http://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-unusual
> > would have gotten me farther before posting.
>
> But what I said is explained there already:

I suppose, but the point about \x18 not working with a character set
that represents the desired codepoint wasn't clear.  Nor was the bash
syntax for using \x in general.  It's in the bash man page and not
Cygwin-specific, but an example showing the gory details would have
helped me at least.

> > And finally here are the steps that illustrate what's going on.
> >
> > $ touch $'\x18'; echo $?
> > 0
> >
> > ls shows a file named up-arrow (0x18):
>
> What do you mean by up-arrow? I'm getting a question mark, because
> that's what ls prints for non-printable characters by default. You can
> choose various quoting styles using the --quoting-style option.

I mean the up-arrow that ls prints with --show-control-chars.  Another
important omission on my part.  Doh!

> Yep, but that's a bash vs ls issue rather than a Cygwin
> one. You'd get the same on Linux. But if you use control
> characters in filenames, you better know what you're doing
> anyway. Some argue that it shouldn't be allowed in the
> first place, e.g.
> http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

Thanks for the link.  I don't typically use control characters in
filenames.  Just an example.

> > $ mkshortcut -n shortcut$'\xC3\xA9' plain; echo $?
> > $ readshortcut shortcut$'\xE9'
>
> I'm afraid these aren't yet Unicode-ready, i.e. they still use Windows
> "ANSI" APIs.

Guess it's time to roll up my sleeves and write a patch.

-DB
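
For the archive, here is a minimal sketch of the $'...' point from this thread, since an example of the gory details was what I was missing. It only dumps the bytes that bash hands to a program for each quoting style and never touches the filesystem. It assumes bash plus the usual od and tr utilities. hex_of is a hypothetical helper name made up for this sketch, not something from bash or Cygwin.

```shell
#!/usr/bin/env bash
# Sketch only: show which bytes bash produces for \xNN escapes.
# hex_of is a hypothetical helper, not part of bash or Cygwin.
hex_of() {
    local bytes
    # Dump the argument as hex bytes and squeeze od's padding
    # (spaces and newlines) down to single spaces.
    bytes=$(printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ')
    bytes=${bytes# }    # drop one leading space, if any
    bytes=${bytes% }    # drop one trailing space, if any
    printf '%s\n' "$bytes"
}

# $'...' quoting: bash interprets the C-style escapes.
hex_of $'\xC3\xA9'    # U+00E9 as UTF-8: prints "c3 a9"
hex_of $'\xE9'        # U+00E9 as ISO-8859-1/CP1252: prints "e9"

# Plain single quotes: the backslash and letters stay literal.
hex_of '\xE9'         # prints "5c 78 45 39"
```

The first two calls name the same character in two different encodings. That is why a file created under one encoding is only found again when the name is given in that same encoding.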