From: Rugxulo
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Bug in findfirst/findnext: mangles certain characters.
Date: Fri, 26 Feb 2010 17:27:47 -0800 (PST)

Hi,

On Feb 25, 11:54 pm, "Robbie Hatley" wrote:
>
> I've noticed that whenever I write programs using djgpp which
> rename files, if they encounter files with certain characters
> in their names, rename attempts fail, because findfirst()
> and findnext() change the numeric value of those characters,
> apparently in an attempt to re-map them to some other
> character encoding.
>
> Some extended-ASCII characters DO get through unmolested.
> But most non-ASCII characters get re-mapped.
> (snip)
> Those are all legal characters in both iso-8859-1 and
> in Windows long file names.

DJGPP only properly supports the "C" locale, i.e. 7-bit ASCII. Anything beyond that isn't available.
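To see the 7-bit ASCII limit in practice, a small diagnostic can print each directory entry's raw bytes in hex and flag anything above 0x7F. This is a minimal sketch, not DJGPP's own code. It assumes the POSIX opendir()/readdir() from <dirent.h> (which DJGPP also provides) instead of findfirst()/findnext() from <dir.h>, so it builds outside DJGPP too. The helper names name_is_7bit_ascii, format_name_hex and dump_dir_names are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if every byte of name is 7-bit ASCII (0x00-0x7F), else 0. */
int name_is_7bit_ascii(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    assert(name != NULL);
    for (; *p != '\0'; ++p)
        if (*p > 0x7F)
            return 0;
    return 1;
}

/* Writes name's bytes as space-separated hex pairs into out (size outsz),
   stopping early if out is too small. Returns the characters written,
   not counting the terminating NUL. */
size_t format_name_hex(const char *name, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)name;
    size_t used = 0;
    assert(name != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (; *p != '\0'; ++p) {
        /* Room for " XX" plus the terminator. */
        if (used + 4 > outsz)
            break;
        used += (size_t)sprintf(out + used, used ? " %02X" : "%02X", *p);
    }
    return used;
}

/* Prints every entry of dir_path with its raw bytes in hex, marking names
   outside 7-bit ASCII. Returns the number of entries, or -1 on error. */
int dump_dir_names(const char *dir_path)
{
    char hex[256 * 3 + 1];   /* enough for a 255-byte name */
    struct dirent *ent;
    int count = 0;
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        format_name_hex(ent->d_name, hex, sizeof hex);
        printf("%-20s %s%s\n", ent->d_name, hex,
               name_is_7bit_ascii(ent->d_name) ? "" : "  <-- non-ASCII");
        ++count;
    }
    closedir(dir);
    return count;
}
```

Running dump_dir_names(".") in the directory with the problem files and comparing the hex against the expected ISO-8859-1 values (e.g. 0xE9 for é) shows whether the bytes were already changed before rename() ever sees them.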
For pure DOS, you can try the third-party llocl02b.zip library, but even that may not work (I haven't tested it much myself), and it needs COUNTRY.SYS + DISPLAY + EGA?.CPI + KEYB or similar. (Henrique Peron of FreeDOS is the resident expert in this area, FYI, if you really, really need help.)

http://djgpp.cybermirror.org/current/v2tk/llocl02b.zip
http://djgpp.cybermirror.org/current/v2tk/llocl02s.zip

BTW, what version of Windows are you using? I'll guess XP. Anyway, I guess you know that XP (even with FAT partitions?) uses UTF-16 for file names. So there is no Latin-1 there (nor was there any in Win9x either; cp850 is just an altered variant with most of the same glyphs).

http://www.kostis.net/en/index.htm
http://www.kostis.net/freeware/isocp101.zip

isocp101.zip V1.01 1993-12-19 ISO 8859-x code pages for MS-DOS

> BUT, for some reason,
> findfirst() and findnext() convert them to other characters.
> It looks to me like these functions are trying to convert
> characters they don't like into characters with similar-looking
> glyphs in some other encoding.
>
> This is broken, because it causes rename attempts to fail,
> because no files actually exist with the altered versions
> of their names given by findfirst/findnext.

So is this a problem with findfirst/findnext, with rename, or both? Does a simple findfirst/findnext app (e.g. ls.exe) report the names correctly? (Using iconv???)

> I'm curious if anyone has run across this bug before?

Probably not English-only Americans like me. I've (very, very) briefly dabbled in code pages "for fun" (Latin-3 ftw!), but nothing hardcore. ;-)

> And has this been fixed in recent versions? (I'm using djgpp's
> gcc version "4.2.3", so i'm about 3 versions behind the latest.)

http://gcc.gnu.org/releases.html

GCC 4.2.3    February 1, 2008

That's not really old, IMHO. Besides, it's not GCC proper's fault, per se; it's our libc (i.e. DJGPP's) or the OS or both or ....

> If it's not been fixed, I suggest it should be put on the list of
> "bugs to fix in next release".
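To make the cp850 remark concrete: the same letter has different byte values in ISO-8859-1 and CP850, so "café" is 63 61 66 E9 in Latin-1 but 63 61 66 82 in CP850. Assuming (unverified here) that file names reach a DJGPP program under Windows in the OEM code page, taken to be 850 below, rather than in Latin-1, that alone would produce byte changes like the ones described. This is only a sketch: the table covers CP850 bytes 0x80 through 0xA5, not the full upper half, and cp850_to_unicode and latin1_to_cp850 are hypothetical helper names. A real converter would need the complete table (or iconv).

```c
#include <assert.h>
#include <stddef.h>

/* Unicode code points for CP850 bytes 0x80-0xA5 (a subset of the upper
   half, covering common Western European accented letters).
   Entry i corresponds to CP850 byte 0x80 + i. */
static const unsigned short cp850_80_to_a5[] = {
    0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7, /* 80-87 */
    0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5, /* 88-8F */
    0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9, /* 90-97 */
    0x00FF, 0x00D6, 0x00DC, 0x00F8, 0x00A3, 0x00D8, 0x00D7, 0x0192, /* 98-9F */
    0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1                  /* A0-A5 */
};

/* Maps a CP850 byte to its Unicode code point. 7-bit ASCII maps to itself;
   bytes above 0xA5 are outside this subset and return 0. */
unsigned cp850_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;
    if (b <= 0xA5)
        return cp850_80_to_a5[b - 0x80];
    return 0;
}

/* Maps a Latin-1 byte (whose value equals its Unicode code point) to the
   CP850 byte for the same character, or '?' if it is not in the subset. */
unsigned char latin1_to_cp850(unsigned char c)
{
    size_t i;
    /* Sanity check: the table must cover exactly 0x80..0xA5. */
    assert(sizeof cp850_80_to_a5 / sizeof cp850_80_to_a5[0] == 0xA6 - 0x80);
    if (c < 0x80)
        return c;
    for (i = 0; i < sizeof cp850_80_to_a5 / sizeof cp850_80_to_a5[0]; ++i)
        if (cp850_80_to_a5[i] == c)
            return (unsigned char)(0x80 + i);
    return '?';
}
```

For example, latin1_to_cp850(0xE9) gives 0x82 (é) and latin1_to_cp850(0xF1) gives 0xA4 (ñ). If the hex dump of a problem name shows the CP850 value where you expected the Latin-1 one, the "re-mapping" is a code page conversion rather than random damage.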
Cygwin (1.7) only recently gained full Unicode support, by dropping Win9x support (bleh). And DJGPP is not Cygwin. The problem may indeed lie with Windows (an NTVDM limitation?). As mentioned, pure DOS is a whole other ball of wax.

> In the mean time, anyone know of any workarounds for this?
> Some way to turn off the "character re-mapping" which
> findfirst and findnext are doing, and force them to retain
> the original numeric value of each character?

Does a simple "ren blah blah2" at the shell work? In Bash? 4DOS? WinXP CMD or command.com? FreeCOM? You'll have to test a few things to see what to expect, what works, etc.

P.S. The best (only??) DJGPP program that really supports i18n features is the text editor Mined (2000.16 was just released). It probably has some code in there that you would find useful. Give it a whirl, in addition to trying the above-mentioned stuff for completeness.

http://www.towo.net/mined/
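The "ren blah blah2" test can also be run from C, which takes the shell out of the picture and exercises only the libc rename(). A minimal sketch, assuming nothing beyond standard C fopen()/rename()/remove(); rename_roundtrip and the file names used with it are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Creates an empty file called name, renames it to name + suffix (like
   "ren name name2"), renames it back, then deletes it. Returns 0 if every
   step succeeded, otherwise the number of the failing step (1 = create,
   including "name too long", 2 = first rename, 3 = rename back,
   4 = delete). */
int rename_roundtrip(const char *name, const char *suffix)
{
    char other[260];
    FILE *fp;
    assert(name != NULL && suffix != NULL);
    if (strlen(name) + strlen(suffix) >= sizeof other)
        return 1;
    strcpy(other, name);
    strcat(other, suffix);

    fp = fopen(name, "wb");           /* step 1: create the file */
    if (fp == NULL)
        return 1;
    fclose(fp);
    if (rename(name, other) != 0) {   /* step 2: name -> name + suffix */
        remove(name);
        return 2;
    }
    if (rename(other, name) != 0) {   /* step 3: rename it back */
        remove(other);
        return 3;
    }
    if (remove(name) != 0)            /* step 4: clean up */
        return 4;
    return 0;
}
```

For example, rename_roundtrip("\xE9t\xE9.txt", "2") tries a Latin-1 name ("été"). Comparing the result under plain DOS and in a Windows DOS box may help tell a libc problem from an OS one.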