Mail Archives: cygwin/1996/12/22/13:19:14

From: richardd AT cix DOT compulink DOT co DOT uk (Richard Donkin)
Subject: B17.1 Ctrl/Z processing
Date: 22 Dec 1996 13:19:14 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <memo.779262.cygnus.gnu-win32@cix.compulink.co.uk>
Reply-To: richardd AT cix DOT compulink DOT co DOT uk
Original-To: gnu-win32 AT cygnus DOT com
Original-Cc: richardd AT cix DOT compulink DOT co DOT uk
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Something strange started happening when I ran cmp and wc on two large files - I used an editor 
to create test.txt from the start of xx.txt, then checked the size with ls -l.  However, wc 
reports the same counts for test.txt as for xx.txt, and cmp suggests they are identical!  cmp 
works correctly on smaller files, as shown at the end.

I am using B17.1 on Win95.  I tried a DOS version of wc and it worked OK on the same files, so this 
is something to do with Cygwin.dll.

Here is what happens:

chanrossa(1) download2$ ls -l
total 11404
-rw-r--r--   1 500      everyone 15350889 Dec 21 17:00 xx.txt
-rw-r--r--   1 500      everyone  7820765 Dec 22 10:26 test.txt
-rw-r--r--   1 500      everyone   181129 Dec 22 10:27 test2.txt
chanrossa(1) download2$ wc *
  23120   61481 1404603 xx.txt
  23120   61481 1404603 test.txt
   3000    7383  178129 test2.txt
  49240  130345 2987335 total
chanrossa(1) download2$ which wc
wc is /UNIX/H-I386-CYGWIN32/BIN/wc
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ which cmp
cmp is /UNIX/H-I386-CYGWIN32/BIN/cmp
chanrossa(1) download2$ cmp test.txt xx.txt
chanrossa(1) download2$ cmp test2.txt test.txt
cmp: EOF on test2.txt
chanrossa(1) download2$

After some more investigation, I have worked out that at line 23120 (in both xx.txt and test.txt) 
where wc and cmp stop processing there is a Ctrl/Z character.
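
A Ctrl/Z like this can be located without going through text mode at all.  The following is only 
a minimal sketch: the names find_first_ctrl_z and CTRL_Z are made up for illustration, and it 
assumes the caller opened the stream with fopen(path, "rb") and that the C runtime honours the 
binary flag.

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A  /* ASCII SUB, the DOS end-of-file marker */

/* Hypothetical helper: scan a stream opened in binary mode and report
 * where the first Ctrl/Z byte sits.
 * Returns 1 if a Ctrl/Z was found (*offset is its 0-based byte offset,
 * *line is the 1-based line it falls on), or 0 if none was found. */
int find_first_ctrl_z(FILE *in, long *offset, long *line)
{
    long pos = 0;
    long lineno = 1;
    int c;

    assert(in != NULL && offset != NULL && line != NULL);

    while ((c = getc(in)) != EOF) {
        if (c == CTRL_Z) {
            *offset = pos;
            *line = lineno;
            return 1;
        }
        if (c == '\n')
            lineno++;   /* count lines by newline bytes, as wc -l does */
        pos++;
    }
    return 0;
}
```

If the reported line matches the count at which wc stops, that would point to the Ctrl/Z, and not 
the file size, as the reason for the truncation.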

As it says in the FAQ:

   Control-Z's are now handled as a valid EOF token in files opened as text.
   
Unfortunately, handling Ctrl/Z's like this is extremely un-Unix-like - and in fact most Windows 
tools, and many DOS tools, ignore Ctrl/Z these days.  For example, the old DOS COPY command built 
into COMMAND.COM processes Ctrl/Z and has a /B option to ignore it, but the more modern XCOPY 
and XCOPY32 commands supplied since DOS 4 or 5 ignore Ctrl/Z's at all times and have no options to 
handle them.

I would like Ctrl/Z handling to at least be configurable, perhaps using the filesystem mount 
technique used for CR/CRLF handling, or some sort of global registry setting (preferable).

The MKS toolkit programs completely ignore Ctrl/Z, which seems much more Unix-like and also does 
not silently truncate files like this.  The Ctrl/Z actually occurred in a DOS text file that 
somebody sent me in an email, so I did not even know it was there.

The \n to \r\n mapping is inherent in the way that DOS/Windows stores files, but Ctrl/Z processing 
is really a DOS relic.  If people are still using apps with Gnu-Win32 (almost certainly old DOS 
applications) that need these semantics, the behaviour should be configurable and probably not 
the default.

Although taking out the Ctrl/Z manually is feasible in this case, this will cause message lengths 
to be wrong in the email file, which may cause problems re-importing the messages.  Even taking out 
the Ctrl/Z's requires writing a small C program, since all the obvious Gnu tools (cat -v, tr, etc) 
open files in text mode ...
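
For reference, the kind of small C program meant here might look like the sketch below.  It is 
not a tested fix for B17.1: strip_ctrl_z and CTRL_Z are hypothetical names, and it assumes both 
streams were opened in binary mode (fopen with "rb" and "wb") so that the runtime does not stop 
at the Ctrl/Z itself.

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A  /* ASCII SUB, the DOS end-of-file marker */

/* Hypothetical helper: copy every byte from in to out except Ctrl/Z.
 * Both streams are assumed to be in binary mode.
 * Returns the number of Ctrl/Z bytes dropped, or -1 on an I/O error. */
long strip_ctrl_z(FILE *in, FILE *out)
{
    long dropped = 0;
    int c;

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (c == CTRL_Z) {
            dropped++;          /* skip the marker, keep reading */
            continue;
        }
        if (putc(c, out) == EOF)
            return -1;          /* write failed */
    }
    return ferror(in) ? -1 : dropped;
}
```

Dropping the bytes shortens the file, which is exactly the message-length problem described 
above; writing a space in place of each Ctrl/Z would keep the length unchanged, at the cost of 
altering the text.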

Richard
--
richardd AT cix DOT compulink DOT co DOT uk                http://www.inside-edge.co.uk/
Inside Edge Consultancy           Client/Server and Internet Applications
PGP key from:  pgp-public-keys AT keys DOT pgp DOT net  -or-  http://www.four11.com/

-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".
