Mail Archives: djgpp/2010/03/03/15:30:57

X-Authentication-Warning: delorie.com: mail set sender to djgpp-bounces using -f
X-Recipient: djgpp AT delorie DOT com
Date: Wed, 03 Mar 2010 22:15:16 +0200
From: Eli Zaretskii <eliz AT gnu DOT org>
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
In-reply-to: <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772@f35g2000yqd.googlegroups.com>
To: djgpp AT delorie DOT com
Message-id: <83ocj52mcr.fsf@gnu.org>
MIME-version: 1.0
X-012-Sender: halo1 AT inter DOT net DOT il
References: <2PydnQe72P4H_BrWnZ2dnUVZ_vmdnZ2d AT giganews DOT com> <hmbvg7$ieq$1 AT speranza DOT aioe DOT org> <Br-dnQ3aTcaMsRfWnZ2dnUVZ_uSdnZ2d AT giganews DOT com> <4b8ba6a1 AT news DOT x-privat DOT org> <OKSdnaHEbqHx8BPWnZ2dnUVZ_h2dnZ2d AT giganews DOT com> <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772 AT f35g2000yqd DOT googlegroups DOT com>
Reply-To: djgpp AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> From: Rugxulo <rugxulo AT gmail DOT com>
> Date: Wed, 3 Mar 2010 11:12:09 -0800 (PST)
>
> > > ... but with Windows.  It is /Windows/ that is renaming the file...
> >
> > I don't know what you mean by "windows is renaming the file".
> > *I* am the one attempting to rename a file.
>
> ... by asking Windows to do it. This is all a bit weird, but it
> *appears* to be Windows' fault. Maybe it's some setting somewhere
> (most likely), but it seems that a file called "A^.txt" (capital A
> with circumflex) is returned as "a.txt" by DOS findfirst / findnext
> regardless of using DJGPP or something else.

Windows transparently translates UTF-16 encoded file names to the OEM
codepage used by the DOS box.  This translation is in general lossy.
That's all there is to it.
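
As a rough illustration of that lossy step, here is a minimal sketch in Python. It is not Windows' actual conversion table: the helper name to_oem, the accent-stripping fallback, and the choice of cp437 and cp850 as example codepages are all assumptions made for the example. It keeps a character if the codepage can encode it, falls back to the unaccented base letter if that helps, and otherwise substitutes `?'.

```python
import unicodedata


def to_oem(name, codepage="cp437"):
    """Approximate a lossy Unicode -> OEM codepage translation.

    Hypothetical helper: the real Windows best-fit tables differ in detail.
    """
    out = []
    for ch in name:
        try:
            ch.encode(codepage)          # representable as-is
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        # Assumed fallback: drop combining marks and keep the base letter.
        base = unicodedata.normalize("NFD", ch)[0]
        try:
            base.encode(codepage)
            out.append(base)
        except UnicodeEncodeError:
            out.append("?")              # nothing close enough exists
    return "".join(out)


# "\u00c2" is capital A with circumflex, "\u0416" is Cyrillic Zhe.
print(to_oem("\u00c2.txt", "cp437"))
print(to_oem("\u00c2.txt", "cp850"))
print(to_oem("\u0416.txt", "cp437"))
```

Under these assumptions the same file name survives in one codepage and is altered in another, which is why the result depends on the codepage the DOS box is using.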

> "cmd /c dir /x" will show
> a completely random (and different but *valid* "7CE5~1.TXT" in my
> case ...) SFN unlike what DOS apps show ("command /c dir").

SFNs are something entirely different, and unrelated.  findfirst
returns the _long_ file name (by default, unless you disable LFNs by
setting USE_LFN=n in the environment).

> > Unicode is not relevant here, since we're talking
> > about single-byte encodings.
>
> But the problem (for you) is on Win2k, which internally is UTF-16.

Internally, yes.  But it will only return UTF-16 encoded file names if
you use Unicode APIs.  DJGPP (and DOS programs in general) cannot
access those APIs, so we can only get single-byte encodings.
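
To make the difference concrete, this small Python sketch prints the bytes of one file name in both forms. The name and the cp850 codepage are assumptions picked for the example; the point is only that a single-byte encoding has one byte per character and can hold nothing outside its 256-entry table.

```python
# "\u00c2" is capital A with circumflex.
name = "\u00c2.txt"

# What a Unicode (wide-character) API would hand back: two bytes per character here.
wide = name.encode("utf-16-le")

# What a DOS program sees: one byte per character, in the OEM codepage
# (cp850 is assumed here; the actual codepage depends on the system).
oem = name.encode("cp850")

print(len(wide), wide.hex(" "))
print(len(oem), oem.hex(" "))
```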

> > And the ideal solution to the bug would be to fix it,
> > rather than to tell people to abandon djgpp as being useless.
> > Fixing it should not be that hard.
>
> If it was fixable on DJGPP's end, that would indeed be a good idea.

It isn't.  Not without having the locales supported much better than
what we have now.  And even then we will see characters replaced with
`?' sometimes, because they cannot be encoded in the OEM charset.

> P.S. It is indeed bad / weird (but unavoidable???) that "ls.exe -l
> *.txt" does this:
>
> [ Vista/DJGPP  ] - Wed 03/03/2010 >ls *.txt -l
> ls: A.txt: No such file or directory (ENOENT)
> -rw-r--r--    1 Rugxulo  root            3 Mar  3 13:00 ?.txt
> -rw-r--r--    1 Rugxulo  root            3 Mar  3 12:53 ?.txt
> -rw-r--r--    1 Rugxulo  root            4 Mar  3 12:55 ??.txt

I see the same with a MinGW compiled ls.exe, for file names that have
characters outside the OEM codepage.
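
One way to predict which names will be affected is to check whether a name survives a round trip through the codepage. The sketch below is in Python; the helper name round_trips and the default of cp437 are assumptions for the example. A name that fails the check is one that a program limited to that codepage cannot pass back to the file system unchanged, which is consistent with the ENOENT above.

```python
def round_trips(name, codepage="cp437"):
    """Return True if `name` can be encoded in `codepage` and decoded back unchanged.

    Hypothetical helper for illustration only.
    """
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        return False


# Example names; "\u00c2" is capital A with circumflex.
names = ["readme.txt", "\u00c2.txt"]
at_risk = [n for n in names if not round_trips(n)]
print(at_risk)
```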
