From: "Robbie Hatley"
Newsgroups: comp.os.msdos.djgpp
References: <2PydnQe72P4H_BrWnZ2dnUVZ_vmdnZ2d AT giganews DOT com> <4b8ba6a1 AT news DOT x-privat DOT org>
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
Date: Wed, 3 Mar 2010 06:08:26 -0800

"Jason Hood" wrote:

> On 28/02/2010 18:43, Robbie Hatley wrote:
> > Try renaming THAT file using findfirst/findnext and rename().
> > It won't work. (Findfirst will turn "Ìn-Ít" to "In-It",
> > for one thing. rename() will then complain "no such file".)
>
> The problem is not with findfirst ...

Actually, I'm pretty sure it is.

> ... but with Windows. It is /Windows/ that is renaming the file...

I don't know what you mean by "windows is renaming the file".
*I* am the one attempting to rename a file. In the case of the
example file name I gave, Windows holds the file name to be:

"Fíle-Nàme-Wíth-Måny-ïsõ-8859-1-Lëttêrs-Ìn-Ít.txt"

That is a fully-valid Windows Long File name, and Windows makes
no complaint about it and does not attempt to rename it.

Do you mean that Windows is remapping the character set of the
file name before handing it to findfirst()? Are you *sure* of
that? Do you have the full source code in front of you as you
say that? Have you thoroughly inspected it? Or, as I suspect,
are you just *assuming* that's what's happening?

Personally, I don't think that's what's happening at all. I think
the remapping is in the findfirst/findnext code. In the case of
the example file I give above, what findfirst() returns is the
following:

"Fíle-Nàme-Wíth-Måny-ïsõ-8859-1-Lëttêrs-In-It.txt"

Do you see the 2 differences?
1. capital-I-grave-accent --> capital-I-no-accent
2. capital-I-acute-accent --> capital-I-no-accent

Non-injective, hence non-invertible, hence non-remappable.
And since no such file exists, rename() fails.

Also, I have determined that Code Pages have nothing to do with
it. I tried the following experiment with two different code
pages, and determined that while the file name from findfirst()
LOOKS very different, if you look at the raw numbers, they're the
same regardless of Code Page. Code pages just re-map character
sets immediately before writing text to the screen. That has
nothing to do with the encodings used in file names or by
findfirst()/findnext().

========================================================
Using Code Page 437:

wd=E:\TEST-R~1\FINDFI~1
%chcp 437
Active code page: 437

wd=E:\TEST-R~1\FINDFI~1
%findfirst-test
Fíle-Nàme-Wíth-Måny-ïso-8859-1-Lëttêrs-And-Symbols_¿¼½_x÷_In-It.txt
70, 161, 108, 101, 45, 78, 133, 109, 101, 45, 87, 161, 116, 104,
45, 77, 134, 110, 121, 45, 139, 115, 111, 45, 56, 56, 53, 57, 45,
49, 45, 76, 137, 116, 116, 136, 114, 115, 45, 65, 110, 100, 45,
83, 121, 109, 98, 111, 108, 115, 95, 168, 172, 171, 95, 120, 246,
95, 73, 110, 45, 73, 116, 46, 116, 120, 116,

========================================================
Using Code Page 1252:

wd=E:\TEST-R~1\FINDFI~1
%chcp 1252
Active code page: 1252

wd=E:\TEST-R~1\FINDFI~1
%findfirst-test
F¡le-N.me-W¡th-M?ny- ...

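For anyone wanting to repeat the experiment, a byte dump like the
one above needs only a few lines of C. The sketch below is not the
original findfirst-test source (which was not posted);
dump_name_bytes is a hypothetical helper name, and it is assumed
that under DJGPP the string passed in would be the ff_name member
of the struct ffblk filled in by findfirst()/findnext().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): write each byte
   of `name` as an unsigned decimal value followed by ", " into `out`.
   Stops early if fewer than 6 bytes of space remain, so `out` is
   always NUL-terminated. Returns the number of characters written,
   not counting the terminating NUL. */
size_t dump_name_bytes(const char *name, char *out, size_t outsz)
{
    const unsigned char *p;
    size_t used = 0;

    assert(name != NULL && out != NULL);
    if (outsz == 0)
        return 0;
    out[0] = '\0';

    /* Go through unsigned char so bytes above 127 print as
       128..255 rather than as negative numbers. */
    p = (const unsigned char *)name;

    /* "255, " is 5 characters; keep one more for the NUL. */
    while (*p != '\0' && outsz - used >= 6) {
        used += (size_t)sprintf(out + used, "%u, ", (unsigned)*p);
        ++p;
    }
    return used;
}
```

Called on the 4-byte string "F\xA1le", this fills the buffer with
"70, 161, 108, 101, ", which matches the start of the dump above.
No character conversion is involved, so the numbers do not depend
on the active code page.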
> so there is nothing that findfirst can do...

Sure there is. It can stop remapping!

> The same problem even happens with Win32 console programs,
> when switching the file APIs to OEM.

I don't think that's the same thing at all.

> The ideal solution is to use a Windows Unicode program, so
> you might want to have a look at MinGW.

Unicode is not relevant here, since we're talking about
single-byte encodings. And the ideal solution to the bug would
be to fix it, rather than to tell people to abandon djgpp as
being useless.

Fixing it should not be that hard. As near as I can tell, there's
some remapping code in findfirst()/findnext() which is remapping
file names to something very-close-to (but not quite) Code Page
437, possibly because that's the default code page for MS-DOS and
Windows Command Consoles. If the strings from findfirst() are
printed to the screen using that code page (and no other), the
file names then look almost correct.

But this is horribly broken, because rename() cannot understand
the remapped file names! It expects the original, un-remapped
numerical values from the Windows Long File Name. If it gets a
remapped version, it will fail with error "File Not Found".

Ironically, for the names returned by findfirst() to be USEFUL,
if you print them to the screen they should look like GIBBERISH
in the default code page, not even remotely similar to the
correct file names. (This is because CP437 is so drastically
different from iso-8859-1, which seems to be what Windows is
using for Long File Names, unless you force it to use Unicode by
using Hebrew or Chinese or some such.)

But in reality, the names returned by findfirst(), when printed
on the screen using CP437, look almost correct. And that is a
very bad sign. It means that these functions are broken, and
cannot handle most Windows Long File Names which contain
characters over char(126).

-- 
Cheers,
Robbie Hatley
lonewolf at well dot com
www dot well dot com slant tilde lonewolf slant