From: bob DOT mcgowan AT artecon DOT com (Bob McGowan)
Subject: RE: echo is wrong...
Date: 16 Apr 1998 10:13:35 -0700
Message-ID: <8B40B8756FA1D111BCB900A02495E24F36B42C.cygnus.gnu-win32@neptune.xstor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
To: "'Andrew Dalgleish'"
Cc: "'gnu-win32 AT cygnus DOT com'"

My comments are marked below by:

[Bob McGowan]
    comment
[END]

-----Original Message-----
From: Andrew Dalgleish [mailto:andrewd AT axonet DOT com DOT au]
Sent: Monday, April 13, 1998 11:51 PM
To: gnu-win32 AT cygnus DOT com
Subject: RE: echo is wrong...

--> -----Original Message-----
--> From: Larry Hall [SMTP:lhall AT rfk DOT com]
--> Sent: 1998 April 13, Monday 23:53
--> To: earnie_boyd AT hotmail DOT com
--> Cc: gw32
--> Subject: Re: echo is wrong...
-->
--> At 05:13 AM 4/13/98 -0700, Earnie Boyd wrote:
--> >---Larry Hall wrote:
--> --
--> >Why? Who in their right mind would want anything but binary pipe
--> >reads? What purpose would text pipes give? I can't think of any.
--> >Pipes should always just pass along any data received. They should
--> >never do anything with the data, including interpret a ^Z as the end
--> >of file.
--
--> I completely agree with you Earnie. Not that I want to start up a
--> text vs binary war again but I've always come down on the side of
--> using binary. While there may be reasons why it's beneficial to have
--> "text" mode files, it's not at all clear to me that there are any
--> benefits whatsoever to having "text" mode pipes. If there are some
--> good reasons (and it might be interesting to hear what people think
--> these could be), it's also not clear to me that there are enough
--> Win32 programs that would rely on "text" mode pipes to warrant the
--> pain it causes all those who attempt to use the Cygwin utilities.

--[Andrew Dalgleish]
--Assuming you have text mode files, there is a very good reason for using
--text mode pipes.
--It is not a good idea to have a tool operate in two different modes (one
--mode for reading from a file, one mode for reading from a pipe).
--The characters which get passed through a pipe should be exactly the
--same as the characters which would be written to a file.
--This means translating end-of-line when reading and writing to a pipe,
--but *only* if a tool opens the pipe in text mode.

[Bob McGowan]
You appear to be assuming that the application (more, cat or whatever)
manipulates the pipe. I don't know how MS Win systems (or DOS, for that
matter) do it, but I do know that in UNIX shells (including bash) all
I/O redirection is handled by the shell. And the shell does not "know"
whether the tool being used will want its data in binary or text mode.
Safest, I think, is to do binary mode for the pipe. Then at least the
data is passed in a consistent way.
[END]

----snip snip

--The plain vanilla Win32 tools are just as inconsistent with ^Z.
--What little documentation there is suggests that ^Z is only used to
--terminate stdin coming from the console, and is NOT the end-of-file
--marker when reading from a file or a pipe.

[Bob McGowan]
The whole point of this discussion is that the ^Z IS interpreted. A
binary file, containing an embedded ^Z character, read through a text
mode file descriptor, will return EOF on reading the ^Z character. This
results in the "truncated" file problems that so many posters have been
talking about.
[END]

--Remember that fgetc() returns an int so it can hold EOF, if ^Z was the
--end-of-file then fgetc() would return a char.

[Bob McGowan]
This logic does not work, for 2 reasons:

1) If you go back to C compilers ported to the DOS environment, you will
find all sorts of UNIX'ish stuff that is clearly not supported by DOS (a
good example is the stat structure, which has all 3 time fields, which
all hold the same value, as well as fields for user id, group id and
i-node number, none of which are valid for DOS).
The DOS way of doing things is being translated, as best as possible,
into the UNIX/C way. So, for text mode file descriptors, the underlying
code could very well take a ^Z character and return whatever it needs to
emulate the UNIX/C world.

2) Even on UNIX, this does not quite work. At the OS level, the read()
system call will return 0 characters read on EOF, which is then
translated by higher level routines to be whatever EOF is defined to be.
The reason EOF is defined as an int is so it can hold a value (generally
-1, but not necessarily) that is guaranteed to NOT be a char.
[END]

--file A contains "123^Z456\n" (8 characters, ^Z == 0x1A)
--type A
--displays "123"
--moreB
--leaves B with 8 chars (123^Z456\n)
--type A|more>B
--leaves B with 11 chars (123^Z456\r\n\r\n)
--moreB
--leaves B with 9 chars (123^Z456\r\n)

--I would suggest that ^Z is *never* used for the end-of-file when reading
--from a file or pipe.

[Bob McGowan]
Per my comments above, I clearly disagree with this statement. Also, MS
itself has had to deal with this sort of thing. Refer to the
documentation for "copy" and the /b switch, which forces binary mode. I
have done binary downloads of "split" files which I needed to use in the
MS environment. The tool to "cat" them together is "copy":

    copy a+b+c+... destfile

But, the file+file format defaults to text mode and the above fails. The
proper format is:

    copy /b a+b+c+... destfile

And the reason has always been the presence, in the binary "split"
pieces, of ^Z characters.

The series of examples only proves that the utility is taking a peek at
what is going on (writing a pipe vs. the console) and changing the file
mode as the programmer deemed necessary. And I guess this could
technically be taken to mean that ^Z is then not used as an end of file
mark, but that is because the file is "probably" being accessed in
binary mode (note this is a guess, I have no access to any source to
prove the point).
[END]

--I use text files, and on the few occasions I run into problems I remind
--myself that cygwin32 is not unix.
--It's great, but it's not unix, so I don't expect everything to work
--perfectly.
--But I am satisfied more often than I am not.

--Regards,
--Andrew Dalgleish

[Bob McGowan]
I think that the series of examples you have here just shows to what
lengths MS Win has to go to get things "right". Also, the point that
this is NOT UNIX is well taken. A good understanding of the DOS/MS Win
way of doing things helps a lot in understanding what is going on. And
it is infinitely better, as it stands.

But, the environment being set up IS trying to emulate UNIX as much as
possible. And pipes as well as commands like "cat" are "expected" to do
the right thing with both text and binary items. I think the safest,
most consistent and reliable way of working this is to use binary mode
file semantics in all cases. The other alternative would be to add code
to test files and adjust the I/O in some way. But this adds complexity
and potential problems.

The final point is that I not only want the tools to work in as much of
a UNIX way as possible, but they have to work consistently on both text
and binary files to be useful. And pipes and commands that "always" work
in text mode cause me no end of problems.
[END]

---
Please accept my apologies for sending you this directly as well as to
the list. The list is so flakey and slow currently that I felt this
would be the more reliable and speedy way to get an answer to you.

Bob McGowan
i'm: bob dot mcgowan at artecon dot com
-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".
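
The truncation problem argued over in this thread can be sketched in a
few lines of Python. This is only a simplified model, not the actual
behavior of the Win32 C runtime or of "copy": it assumes that a
text-mode read stops at the first ^Z (0x1A) and maps CRLF to LF, and
that a binary-mode read passes every byte through. The function names
and the two "split" pieces are made up for illustration; file_a is the
8-character file A from the examples above.

```python
CTRL_Z = b"\x1a"  # the ^Z byte (0x1A) discussed in the thread


def text_mode_read(raw: bytes) -> bytes:
    """Assumed model of a text-mode read: stop at ^Z, map CRLF to LF."""
    visible = raw.split(CTRL_Z, 1)[0]      # everything after ^Z is lost
    return visible.replace(b"\r\n", b"\n")


def binary_mode_read(raw: bytes) -> bytes:
    """Model of a binary-mode read: every byte is passed through as-is."""
    return raw


def concat(pieces, reader):
    """Join pieces one after another, reading each with the given reader."""
    return b"".join(reader(p) for p in pieces)


file_a = b"123\x1a456\n"                   # file A from the thread, 8 bytes
pieces = [b"AB\x1aCD", b"EF\x1aGH"]        # hypothetical binary "split" parts

print(text_mode_read(file_a))              # data after ^Z is dropped
print(binary_mode_read(file_a))            # all 8 bytes survive
print(concat(pieces, text_mode_read))      # truncated, in the spirit of 'copy a+b'
print(concat(pieces, binary_mode_read))    # intact, in the spirit of 'copy /b a+b'
```

Under these assumptions, the text-mode reader silently loses everything
after the first ^Z in each piece, which is the "truncated file" symptom
described above, while the binary-mode reader keeps pipes and
concatenation safe for both text and binary data.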