Mail Archives: djgpp/2010/03/03/14:30:16.1

X-Authentication-Warning: delorie.com: mail set sender to djgpp-bounces using -f
From: Rugxulo <rugxulo AT gmail DOT com>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
Date: Wed, 3 Mar 2010 11:12:09 -0800 (PST)
Organization: http://groups.google.com
Lines: 87
Message-ID: <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772@f35g2000yqd.googlegroups.com>
References: <2PydnQe72P4H_BrWnZ2dnUVZ_vmdnZ2d AT giganews DOT com>
<hmbvg7$ieq$1 AT speranza DOT aioe DOT org> <Br-dnQ3aTcaMsRfWnZ2dnUVZ_uSdnZ2d AT giganews DOT com>
<4b8ba6a1 AT news DOT x-privat DOT org> <OKSdnaHEbqHx8BPWnZ2dnUVZ_h2dnZ2d AT giganews DOT com>
NNTP-Posting-Host: 65.13.115.246
Mime-Version: 1.0
X-Trace: posting.google.com 1267643530 7750 127.0.0.1 (3 Mar 2010 19:12:10 GMT)
X-Complaints-To: groups-abuse AT google DOT com
NNTP-Posting-Date: Wed, 3 Mar 2010 19:12:10 +0000 (UTC)
Complaints-To: groups-abuse AT google DOT com
Injection-Info: f35g2000yqd.googlegroups.com; posting-host=65.13.115.246;
posting-account=p5rsXQoAAAB8KPnVlgg9E_vlm2dvVhfO
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.7)
Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729),gzip(gfe),gzip(gfe)
Bytes: 5078
X-Original-Bytes: 5035
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Hi,

On Mar 3, 8:08 am, "Robbie Hatley"
<see DOT my DOT signat DOT  DOT  DOT  AT for DOT my DOT contact DOT info> wrote:
> "Jason Hood" wrote:
>
> > The problem is not with findfirst ...
>
> Actually, I'm pretty sure it is.

findfirst / findnext (INT 21h functions 4Eh and 4Fh, or 714Eh and
714Fh for LFNs) are DOS functions which are emulated by Windows'
NTVDM. DJGPP doesn't appear to do anything special here (see
DJLSR204.ZIP: d_findf.c, d_findn.c, findfirs.c, findnext.c).

> > ... but with Windows.  It is /Windows/ that is renaming the file...
>
> I don't know what you mean by "windows is renaming the file".
> *I* am the one attempting to rename a file.

... by asking Windows to do it. This is all a bit weird, but it
*appears* to be Windows' fault. Maybe it's some setting somewhere
(most likely), but it seems that a file called "A^.txt" (capital A
with circumflex) is returned as "a.txt" by DOS findfirst / findnext
regardless of using DJGPP or something else. Unlike what DOS apps
show ("command /c dir"), "cmd /c dir /x" shows a completely
different, seemingly random, but *valid* SFN ("7CE5~1.TXT" in my
case).

Curiously, "tdep 7ce5~1.txt" will open it, but "mined 7ce5~1.txt"
won't. (Neither works on the bogus, non-existing "a.txt".) So "thar be
dragons here."    :-/

EDIT: Now here's a dumb question, what encoding is your file name in?

A.TXT -> UTF-16 (no circumflex shown)
-A:.TXT -> UTF-8 (two chars, looks like dash and 'a' w/ diaeresis)
T.TXT -> ISO-8859-1 (Latin-1; boxchar, not really a T, just looks like
it)
-||.TXT -> cp850 (one boxchar, looks like dash and two pipes)

How each name is displayed varies with the code page (try "chcp
850"), but the bytes used do differ. I'm fairly certain that DJGPP
can handle the last three but not the first (which doesn't exist,
even to TDE/Win32 [tdew]).
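
To make "the bytes used differ" concrete, here is a minimal C sketch
(not taken from DJGPP's sources) that produces the byte sequences for
capital A with circumflex in each of the four encodings. It assumes
the character is U+00C2 and that UTF-16 means little-endian without a
BOM. The function names are hypothetical, the cp850 value is
hard-coded from the code page table, and the UTF-8 encoder does not
reject surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* cp850 byte for U+00C2, taken from the code page table (hard-coded). */
#define CP850_A_CIRCUMFLEX 0xB6

/* Encode one code point as UTF-8; returns the number of bytes written. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode a BMP code point as UTF-16LE (low byte first); returns 2. */
size_t utf16le_encode_bmp(unsigned int cp, unsigned char out[2])
{
    assert(cp <= 0xFFFFU);
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)(cp >> 8);
    return 2;
}

/* Print the bytes of U+00C2 in each encoding, one encoding per line. */
void dump_a_circumflex(FILE *fp)
{
    unsigned char buf[4];
    size_t n, i;

    n = utf16le_encode_bmp(0xC2, buf);
    fprintf(fp, "UTF-16LE:");
    for (i = 0; i < n; i++) fprintf(fp, " %02X", buf[i]);

    n = utf8_encode(0xC2, buf);
    fprintf(fp, "\nUTF-8:   ");
    for (i = 0; i < n; i++) fprintf(fp, " %02X", buf[i]);

    /* Latin-1 maps U+0000..U+00FF straight to the same byte value. */
    fprintf(fp, "\nLatin-1:  %02X", 0xC2);
    fprintf(fp, "\ncp850:    %02X\n", CP850_A_CIRCUMFLEX);
}
```

Calling dump_a_circumflex(stdout) lists one line per encoding, which
makes it easy to compare against a hex dump of the directory entry.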

> Unicode is not relevant here, since we're talking
> about single-byte encodings.

But the problem (for you) is on Win2k, which internally is UTF-16.

> And the ideal solution to the bug would be to fix it,
> rather than to tell people to abandon djgpp as being useless.
> Fixing it should not be that hard.

If it was fixable on DJGPP's end, that would indeed be a good idea.
However, I'm skeptical at this point.   :-/
Besides, even if it was a confirmed Windows bug *and* they agreed to
fix it (sigh), you personally would probably be out of luck since
Win2k will be EOL'd completely in July.

> Ironically, for the names returned by findfirst()
> to be USEFUL, if you print them to the screen they
> should look like GIBBERISH in the default code page,
> not even remotely similar to the correct file names.
> (This is because CP437 is so drastically different
> from iso-8859-1, which seems to be what Windows is
> using for Long File Names, unless you force it to
> use Unicode by using Hebrew or Chinese or some such.)

I think most people (esp. in Europe) just use cp850 as "close enough"
to Latin-1 (most of the same glyphs, just different code points).
Personally, I'd suggest Kostis' ISOLATIN.CPI for true Latin-1 although
that's useless for Win2k (NT-based) on up.

You should also try pure DOS (or DOSEMU) to see what happens.

P.S. It is indeed bad / weird (but unavoidable???) that "ls.exe -l
*.txt" does this:

[ Vista/DJGPP  ] - Wed 03/03/2010 >ls *.txt -l
ls: A.txt: No such file or directory (ENOENT)
-rw-r--r--    1 Rugxulo  root            3 Mar  3 13:00 ?.txt
-rw-r--r--    1 Rugxulo  root            3 Mar  3 12:53 ?.txt
-rw-r--r--    1 Rugxulo  root            4 Mar  3 12:55 ??.txt

(Although if you redirect the output to a file, it shows that '?'
isn't actually used; the above-mentioned bytes, e.g. 0xC2, appear
instead.)
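
One way to see the real bytes on screen, without redirecting to a
file, is to escape anything outside printable ASCII. This is only a
minimal sketch and says nothing about how ls itself decides what to
print; escape_name is a hypothetical helper, not a DJGPP or libc
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy name into out, replacing every byte outside printable ASCII
 * (0x20..0x7E) with a \xHH escape. Returns the length of the escaped
 * string, or -1 if out (of size outsz) is too small.
 */
int escape_name(const char *name, char *out, size_t outsz)
{
    const unsigned char *p;
    size_t n = 0;

    assert(name != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p >= 0x20 && *p < 0x7F) {
            if (n + 1 >= outsz)       /* 1 char plus the final NUL */
                return -1;
            out[n++] = (char)*p;
        } else {
            if (n + 4 >= outsz)       /* 4 chars plus the final NUL */
                return -1;
            n += (size_t)sprintf(out + n, "\\x%02X", (unsigned int)*p);
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Passing each name returned by the directory search through such a
helper before printing shows which byte sequence the file name
actually contains, whatever the active code page makes of it.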
