Date: Wed, 03 Mar 2010 22:15:16 +0200
From: Eli Zaretskii
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
In-reply-to: <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772@f35g2000yqd.googlegroups.com>
To: djgpp AT delorie DOT com
Message-id: <83ocj52mcr.fsf@gnu.org>
References: <2PydnQe72P4H_BrWnZ2dnUVZ_vmdnZ2d AT giganews DOT com> <4b8ba6a1 AT news DOT x-privat DOT org> <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772 AT f35g2000yqd DOT googlegroups DOT com>
Reply-To: djgpp AT delorie DOT com

> From: Rugxulo
> Date: Wed, 3 Mar 2010 11:12:09 -0800 (PST)
>
> > > ... but with Windows. It is /Windows/ that is renaming the file...
> >
> > I don't know what you mean by "windows is renaming the file".
> > *I* am the one attempting to rename a file.
>
> ... by asking Windows to do it. This is all a bit weird, but it
> *appears* to be Windows' fault. Maybe it's some setting somewhere
> (most likely), but it seems that a file called "A^.txt" (capital A
> with circumflex) is returned as "a.txt" by DOS findfirst / findnext
> regardless of using DJGPP or something else.

Windows transparently translates UTF-16 encoded file names to the OEM
codepage used by the DOS box. This translation is in general lossy.
That's all there is to it.

> "cmd /c dir /x" will show
> a completely random (and different but *valid* "7CE5~1.TXT" in my
> case ...) SFN unlike what DOS apps show ("command /c dir").

SFNs are something entirely different, and unrelated. findfirst
returns the _long_ file name (by default, unless you disable LFNs by
setting USE_LFN=n in the environment).
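To make "lossy" concrete, here is a toy model of such a translation in C. It is only a sketch: the names to_oem and name_to_oem are made up for illustration, the tables cover a handful of CP437 code points, and the "best fit" entries are an assumption about the kind of fallback involved, not a copy of the tables Windows actually uses.

```c
#include <stddef.h>

/* Toy model of a UTF-16 -> OEM (CP437) file-name translation.
   Hypothetical helper names; tiny hand-written tables only. */

struct map_entry { unsigned short ucs; unsigned char oem; };

/* Code points that CP437 can encode exactly. */
static const struct map_entry exact[] = {
    { 0x00C7, 0x80 },  /* capital C with cedilla */
    { 0x00FC, 0x81 },  /* small u with diaeresis */
    { 0x00E9, 0x82 },  /* small e with acute */
    { 0x00E2, 0x83 }   /* small a with circumflex */
};

/* Assumed "best fit" fallbacks: the accent is silently dropped. */
static const struct map_entry best_fit[] = {
    { 0x00C2, 'A' },   /* capital A with circumflex: not in CP437 */
    { 0x00CA, 'E' }    /* capital E with circumflex: not in CP437 */
};

/* Return the OEM byte for ucs from table t, or -1 if absent. */
static int lookup(const struct map_entry *t, size_t n, unsigned short ucs)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (t[i].ucs == ucs)
            return t[i].oem;
    return -1;
}

/* Translate one UTF-16 code unit to a single OEM byte.
   Order: ASCII, exact match, best fit, and finally '?'. */
unsigned char to_oem(unsigned short ucs)
{
    int b;
    if (ucs < 0x80)
        return (unsigned char)ucs;
    b = lookup(exact, sizeof exact / sizeof exact[0], ucs);
    if (b >= 0)
        return (unsigned char)b;
    b = lookup(best_fit, sizeof best_fit / sizeof best_fit[0], ucs);
    if (b >= 0)
        return (unsigned char)b;
    return '?';
}

/* Translate a zero-terminated UTF-16 name.
   out must have room for one byte per code unit plus the terminator. */
void name_to_oem(const unsigned short *in, char *out)
{
    while (*in)
        *out++ = (char)to_oem(*in++);
    *out = '\0';
}
```

In this model, a name consisting of U+00C2 followed by ".txt" comes back as "A.txt", and a name whose first character has no mapping at all comes back as "?.txt". Neither result can be mapped back to the original name, which is the sense in which the translation is lossy.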

> > Unicode is not relevant here, since we're talking
> > about single-byte encodings.
>
> But the problem (for you) is on Win2k, which internally is UTF-16.

Internally, yes. But it will only return UTF-16 encoded file names if
you use Unicode APIs. DJGPP (and DOS programs in general) cannot
access those APIs, so we can only get single-byte encodings.

> > And the ideal solution to the bug would be to fix it,
> > rather than to tell people to abandon djgpp as being useless.
> > Fixing it should not be that hard.
>
> If it was fixable on DJGPP's end, that would indeed be a good idea.

It isn't. Not without having the locales supported much better than
what we have now. And even then we will see characters replaced with
`?' sometimes, because they cannot be encoded in the OEM charset.

> P.S. It is indeed bad / weird (but unavoidable???) that "ls.exe -l
> *.txt" does this:
>
> [ Vista/DJGPP ] - Wed 03/03/2010 >ls *.txt -l
> ls: A.txt: No such file or directory (ENOENT)
> -rw-r--r-- 1 Rugxulo root 3 Mar 3 13:00 ?.txt
> -rw-r--r-- 1 Rugxulo root 3 Mar 3 12:53 ?.txt
> -rw-r--r-- 1 Rugxulo root 4 Mar 3 12:55 ??.txt

I see the same with a MinGW compiled ls.exe, for file names that have
characters outside the OEM codepage.
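Since `?' is a wildcard and cannot appear in a real DOS or Windows file name, a program can at least detect names that came back with such a replacement. A minimal sketch in C follows; the function name name_was_replaced is hypothetical, and the check is only a heuristic under the assumption that unencodable characters show up as `?'.

```c
#include <string.h>

/* Return 1 if a name returned by a directory search contains '?',
   which is taken here as a sign that some character could not be
   encoded in the OEM charset.  Return 0 otherwise.
   Hypothetical helper, not part of any library. */
int name_was_replaced(const char *name)
{
    return strchr(name, '?') != NULL;
}
```

This flags entries like "?.txt" and "??.txt" from the listing above. It cannot flag a name such as "A.txt" where an accented letter was silently replaced by a plain one, so a negative result does not prove that the name is the one stored on disk.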