Mail Archives: cygwin/1996/12/22/13:19:14
Something strange happened when I ran cmp and wc on two large files. I used an editor to create
test.txt from the start of xx.txt, then checked the sizes with ls -l. However, wc reports the same
counts for test.txt as for xx.txt, and cmp suggests the two files are identical! cmp works
correctly for smaller files, as shown at the end.
I am using B17.1 on Win95. I tried a DOS version of wc and it worked OK on the same files, so this
appears to be something to do with Cygwin.dll.
Here is what happens:
chanrossa(1) download2$ ls -l
total 11404
-rw-r--r-- 1 500 everyone 15350889 Dec 21 17:00 xx.txt
-rw-r--r-- 1 500 everyone 7820765 Dec 22 10:26 test.txt
-rw-r--r-- 1 500 everyone 181129 Dec 22 10:27 test2.txt
chanrossa(1) download2$ wc *
23120 61481 1404603 xx.txt
23120 61481 1404603 test.txt
3000 7383 178129 test2.txt
49240 130345 2987335 total
chanrossa(1) download2$ which wc
wc is /UNIX/H-I386-CYGWIN32/BIN/wc
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ which cmp
cmp is /UNIX/H-I386-CYGWIN32/BIN/cmp
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ cmp test2.txt test.txt
cmp: EOF on test2.txt
chanrossa(1) download2$
After some more investigation, I have worked out that at line 23120 (in both xx.txt and test.txt)
where wc and cmp stop processing there is a Ctrl/Z character.
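
One way to confirm where such a character sits is to scan the file in binary mode, so that no
end-of-file or CR/LF translation gets in the way. The following is only a minimal sketch in
Python; the function name find_first_ctrl_z and the chunk size are assumptions made for
illustration, not part of any Cygwin tool.

```python
CTRL_Z = b"\x1a"  # ASCII SUB, the character DOS treats as an end-of-file marker


def find_first_ctrl_z(path, chunk_size=65536):
    """Return (byte_offset, line_number) of the first Ctrl-Z in the file, or None.

    byte_offset is 0-based; line_number is 1-based and counts the LF characters
    that precede the Ctrl-Z. The name and defaults are hypothetical.
    """
    offset = 0     # bytes consumed before the current chunk
    newlines = 0   # LF characters seen before the current chunk
    with open(path, "rb") as f:  # binary mode: Ctrl-Z is read as ordinary data
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached the real end of file without a Ctrl-Z
            pos = chunk.find(CTRL_Z)
            if pos != -1:
                newlines += chunk.count(b"\n", 0, pos)
                return offset + pos, newlines + 1
            newlines += chunk.count(b"\n")
            offset += len(chunk)
```

Calling find_first_ctrl_z("xx.txt") and find_first_ctrl_z("test.txt") would show whether both
files hit a Ctrl-Z at the same position, which would explain the matching wc counts.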
As it says in the FAQ:
Control-Z's are now handled as a valid EOF token in files opened as text.
Unfortunately, handling Ctrl/Z's like this is extremely un-Unix-like - and in fact most Windows
tools, and many DOS tools, ignore Ctrl/Z these days. For example, the old DOS COPY command built
in to COMMAND.COM processes Ctrl/Z and has a /B option to ignore them, but the more modern XCOPY
and XCOPY32 commands supplied since DOS 4 or 5 ignore Ctrl/Z's at all times and have no options to
handle them.
I would like Ctrl/Z handling to at least be configurable, perhaps using the filesystem mount
technique used for CR/CRLF handling, or some sort of global registry setting (preferable).
The MKS toolkit programs completely ignore Ctrl/Z, which seems much more Unix-like and also does
not silently truncate files like this. The Ctrl/Z actually occurred in a DOS text file that
somebody sent me in an email, so I did not even know it was there.
The \n to \r\n mapping is inherent in the way that DOS/Windows stores files, but Ctrl/Z processing
is really a DOS relic. If people are still using apps with Gnu-Win32 (almost certainly old DOS
applications) that still need these semantics, the behaviour should be configurable and probably
not the default.
Although taking out the Ctrl/Z manually is feasible in this case, doing so will cause message
lengths to be wrong in the email file, which may cause problems re-importing the messages. Even
taking out the Ctrl/Z's requires writing a small C program, since all the obvious Gnu tools
(cat -v, tr, etc) open files in text mode ...
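
For illustration, the kind of small filter meant here could look like the following. It is
sketched in Python rather than C for brevity, and it relies on opening both files in binary mode
so that Ctrl/Z is treated as ordinary data. The name strip_ctrl_z and its parameters are
hypothetical. As noted above, removing bytes changes the file length, so any stored message
lengths would no longer match.

```python
def strip_ctrl_z(src_path, dst_path, chunk_size=65536):
    """Copy src_path to dst_path, dropping every Ctrl-Z (0x1A) byte.

    Returns the number of Ctrl-Z bytes removed. Hypothetical helper: the
    name and signature are assumptions for this sketch.
    """
    removed = 0
    # Both files are opened in binary mode, so no EOF or CR/LF translation occurs.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x1a")
            dst.write(chunk.replace(b"\x1a", b""))
    return removed
```

All other bytes, including CR/LF pairs, are copied through unchanged, so the output differs from
the input only by the missing Ctrl/Z characters.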
Richard
--
richardd AT cix DOT compulink DOT co DOT uk http://www.inside-edge.co.uk/
Inside Edge Consultancy Client/Server and Internet Applications
PGP key from: pgp-public-keys AT keys DOT pgp DOT net -or- http://www.four11.com/
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".