From: richardd AT cix DOT compulink DOT co DOT uk (Richard Donkin)
Subject: B17.1 Ctrl/Z processing
22 Dec 1996 13:19:14 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID:
Reply-To: richardd AT cix DOT compulink DOT co DOT uk
Original-To: gnu-win32 AT cygnus DOT com
Original-Cc: richardd AT cix DOT compulink DOT co DOT uk
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Something strange started happening when I did a cmp and wc on two large
files - I used an editor to create test.txt from the start of xx.txt, then
checked the size with ls -l. However, a wc of test.txt makes the file seem
the same size as xx.txt, and cmp suggests they are identical! cmp is working
for smaller files as shown at the end.

I am using B17.1 on Win95. I tried a DOS version of wc and it worked OK on
the same files, so this is something to do with Cygwin.dll.

Here is what happens:

chanrossa(1) download2$ ls -l
total 11404
-rw-r--r--   1 500      everyone 15350889 Dec 21 17:00 xx.txt
-rw-r--r--   1 500      everyone  7820765 Dec 22 10:26 test.txt
-rw-r--r--   1 500      everyone   181129 Dec 22 10:27 test2.txt
chanrossa(1) download2$ wc *
  23120   61481 1404603 xx.txt
  23120   61481 1404603 test.txt
   3000    7383  178129 test2.txt
  49240  130345 2987335 total
chanrossa(1) download2$ which wc
wc is /UNIX/H-I386-CYGWIN32/BIN/wc
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ which cmp
cmp is /UNIX/H-I386-CYGWIN32/BIN/cmp
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ cmp test2.txt test.txt
cmp: EOF on test2.txt
chanrossa(1) download2$

After some more investigation, I have worked out that at line 23120 (in both
xx.txt and test.txt) where wc and cmp stop processing there is a Ctrl/Z
character. As it says in the FAQ:

    Control-Z's are now handled as a valid EOF token in files opened as text.
Unfortunately, handling Ctrl/Z's like this is extremely un-Unix-like - and in
fact most Windows tools, and many DOS tools, ignore Ctrl/Z these days. For
example, the old DOS COPY command built in to COMMAND.COM processes Ctrl/Z
and has a /B option to ignore them, but the more modern XCOPY and XCOPY32
commands supplied since DOS 4 or 5 ignore Ctrl/Z's at all times and have no
options to handle them.

I would like Ctrl/Z handling to at least be configurable, perhaps using the
filesystem mount technique used for CR/CRLF handling, or some sort of global
registry setting (preferable). The MKS toolkit programs completely ignore
Ctrl/Z, which seems much more Unix-like and also does not silently truncate
files like this. The Ctrl/Z actually occurred in a DOS text file that
somebody sent me in an email, so I did not even know it was there.

The \n to \r\n mapping is inherent in the way that DOS/Windows stores files,
but Ctrl/Z processing is really a DOS relic. If people are still using apps
with Gnu-Win32 (almost certainly old DOS applications) that still need these
semantics, this should be configurable and probably not the default.

Although taking out the Ctrl/Z manually is feasible in this case, this will
cause message lengths to be wrong in the email file, which may cause problems
re-importing the messages. Even taking out the Ctrl/Z's requires writing a
small C program, since all the obvious Gnu tools (cat -v, tr, etc) open files
in text mode ...

Richard
--
richardd AT cix DOT compulink DOT co DOT uk   http://www.inside-edge.co.uk/
Inside Edge Consultancy   Client/Server and Internet Applications
PGP key from: pgp-public-keys AT keys DOT pgp DOT net -or- http://www.four11.com/

-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".