From: Rugxulo
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
Date: Wed, 3 Mar 2010 11:12:09 -0800 (PST)
Message-ID: <3efa6f5b-92ae-4e91-b4f7-cb3e8cb9f772@f35g2000yqd.googlegroups.com>

Hi,

On Mar 3, 8:08 am, "Robbie Hatley" wrote:
> "Jason Hood" wrote:
>
> > The problem is not with findfirst ...
>
> Actually, I'm pretty sure it is.

findfirst / findnext (INT 21h functions 4E, 4F or 714E, 714F for LFNs)
are DOS functions which are emulated by Windows' NTVDM. DJGPP doesn't
appear to do anything special here (see DJLSR204.ZIP: d_findf.c,
d_findn.c, findfirs.c, findnext.c).

> > ... but with Windows. It is /Windows/ that is renaming the file...
>
> I don't know what you mean by "windows is renaming the file".
> *I* am the one attempting to rename a file.

... by asking Windows to do it. This is all a bit weird, but it
*appears* to be Windows' fault.
Maybe it's some setting somewhere (most likely), but it seems that a
file called "A^.txt" (capital A with circumflex) is returned as "a.txt"
by DOS findfirst / findnext regardless of using DJGPP or something else.
"cmd /c dir /x" shows a seemingly random but *valid* SFN ("7CE5~1.TXT"
in my case ...), which differs from what DOS apps show
("command /c dir"). Curiously, "tdep 7ce5~1.txt" will open it, but
"mined 7ce5~1.txt" won't. (Neither works on the bogus, non-existing
"a.txt".) So "thar be dragons here." :-/

EDIT: Now here's a dumb question, what encoding is your file name in?

A.TXT    -> UTF-16 (no circumflex shown)
-A:.TXT  -> UTF-8 (two chars, looks like dash and 'a' w/ diaeresis)
T.TXT    -> ISO-8859-1 (Latin-1; boxchar, not really a T, just looks like it)
-||.TXT  -> cp850 (one boxchar, looks like dash and two pipes)

How it is displayed varies with the code page (try "chcp 850"), but the
bytes used do indeed differ. I'm fairly certain that DJGPP can handle
the last three but not the first (which doesn't exist, even to
TDE/Win32 [tdew]).

> Unicode is not relevant here, since we're talking
> about single-byte encodings.

But the problem (for you) is on Win2k, which uses UTF-16 internally.

> And the ideal solution to the bug would be to fix it,
> rather than to tell people to abandon djgpp as being useless.
> Fixing it should not be that hard.

If it was fixable on DJGPP's end, that would indeed be a good idea.
However, I'm skeptical at this point. :-/ Besides, even if it was a
confirmed Windows bug *and* they agreed to fix it (sigh), you
personally would probably be out of luck since Win2k will be EOL'd
completely in July.

> Ironically, for the names returned by findfirst()
> to be USEFUL, if you print them to the screen they
> should look like GIBBERISH in the default code page,
> not even remotely similar to the correct file names.
> (This is because CP437 is so drastically different
> from iso-8859-1, which seems to be what Windows is
> using for Long File Names, unless you force it to
> use Unicode by using Hebrew or Chinese or some such.)

I think most people (esp. in Europe) just use cp850 as "close enough"
to Latin-1 (most of the same glyphs, just different code points).
Personally, I'd suggest Kostis' ISOLATIN.CPI for true Latin-1, although
that's useless for Win2k (NT-based) on up. You should also try pure DOS
(or DOSEMU) to see what happens.

P.S. It is indeed bad / weird (but unavoidable???) that
"ls.exe -l *.txt" does this:

[ Vista/DJGPP ] - Wed 03/03/2010 >ls *.txt -l
ls: A.txt: No such file or directory (ENOENT)
-rw-r--r-- 1 Rugxulo root 3 Mar 3 13:00 ?.txt
-rw-r--r-- 1 Rugxulo root 3 Mar 3 12:53 ?.txt
-rw-r--r-- 1 Rugxulo root 4 Mar 3 12:55 ??.txt

(Although if you redirect it to a file, it shows that '?' isn't used
but instead the above-mentioned bytes, e.g. 0xC2.)
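
For anyone who wants to check the "bytes differ" point without a DOS box, here is a minimal Python sketch (not part of the original test; the names NAME_CHAR, encoded_forms and shown_in are made up for illustration). It encodes capital A with circumflex in the four encodings listed earlier and shows how a console that assumes CP437 would render each byte sequence. It says nothing about what NTVDM itself does with the name.

```python
# Illustration only: one character, four encodings, viewed through CP437.
# All names here are hypothetical helpers, not DJGPP or Windows APIs.
NAME_CHAR = "\u00c2"  # LATIN CAPITAL LETTER A WITH CIRCUMFLEX

ENCODINGS = ["utf-16-le", "utf-8", "latin-1", "cp850"]


def encoded_forms(ch):
    """Return a dict mapping each encoding name to the bytes it produces."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}


def shown_in(raw, codepage):
    """Decode raw bytes the way a console assuming `codepage` would show them."""
    return raw.decode(codepage, errors="replace")


if __name__ == "__main__":
    for enc, raw in encoded_forms(NAME_CHAR).items():
        # Print the encoding, its bytes in hex, and the CP437 rendering.
        print(f"{enc:10} {raw.hex():6} cp437: {shown_in(raw, 'cp437')!r}")
```

Under these assumptions the Latin-1 form is the single byte 0xC2, which CP437 draws as a box-drawing "T", and the cp850 form is 0xB6, which CP437 draws as the "dash and two pipes" box character. That appears to match two of the names in the list above.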