From: jqb AT netcom DOT com (Jim Balter)
Subject: Re: ASCII and BINARY files. Why?
Date: 2 Feb 1997 23:32:23 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <32F57A8E.6FCD.cygnus.gnu-win32@netcom.com>
References: <199702030314 DOT AA25561 AT crl8 DOT crl DOT com>
X-Mailer: Mozilla 3.01Gold (WinNT; I)
MIME-Version: 1.0
Original-To: gnu-win32 AT cygnus DOT com
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Alex Stewart wrote:
> > > POSIX is a flawed standard and always has been. It is fundamentally
> > > incompatible with the already-established ANSI standard for C programming while
> > > offering no substantial gains in its incompatibility. For this reason, the
> > > POSIX standard should and must be ignored where such incompatibilities arise as
> > > it is the only sane response to such an assenine flaw.
> >
> > Be careful of who you call asinine. POSIX *conforms* to ANSI C.
>
> ANSI C requires that files are opened in text mode by default.

The ANSI C language standard standardized the C language. Standards
committees for existing ad hoc entities (such as existing programming
languages and operating systems) have a responsibility to cover existing
practice. The C standard in particular had to have wide applicability.
Since the C language was and is implemented on systems that have a useful
text/binary distinction, such as VMS, and there was existing technology
in the form of the "b" flag, the standards committee standardized that in
the language spec. In *those implementations that make such a
distinction*, the default mode is text mode. In those implementations
that do not make such a distinction, the "b" flag is ignored. ANSI C
does not mandate whether an implementation must make the distinction.
It certainly does not mandate that implementations that run on Windows
boxes must make the distinction (which makes Geoff Noer's comment that he
is following the ANSI spec rather odd).

> POSIX requires
> that there is no distinction between text and binary files.

POSIX also is a standard for an existing entity, namely ***unix***.
POSIX grew out of the /usr/group standard initiated by Heinz Lycklama,
a former boss of mine, VP at Interactive Systems Corporation, a unix VAR.
In formulating a standard for unix, the POSIX standards committee had the
responsibility to cover existing practice, which in unix means that
files, even those opened via fopen, are byte streams. Since ANSI C
mandates that the default mode for fopen is "text" mode, but in unix
fopen by default opens byte streams, which correspond to ANSI C's
"binary streams", POSIX mandates that there is no distinction between
the two, as ANSI C explicitly allows it to do (well, technically, it is
only *explicit* in a footnote). Doing anything else would not have
created a *unix* standard, which is what POSIX is:

    The purpose of this part of ISO/IEC 9945 [POSIX.1 -- jqb] is to
    define a standard operating systems interface and environment based
    upon the UNIX Operating System documentation to support application
    portability at the source level.

To complain that the POSIX standard, by virtue of being what it is
(a standard for unix) is "fatally flawed" is, well, that word you used.
To say that it "offers no substantial gains" is woefully ignorant and
confused.

> These two
> standards can coexist only on underlying systems where there is no distinction
> between file types (such as POSIX OSes).

POSIX and Windows are identical in lacking a distinction between text and
binary files. The difference is that, because C and unix were designed
together, the mapping from the C newline character to the unix
end-of-line indicator is 1-1, and thus binary and text streams are
equivalent.
This is the crux of the matter, and a point that people (including
myself) often miss or misunderstand in these discussions. ANSI C allows
implementations to make a text/binary distinction or not. POSIX, an API
for systems in which the line terminators in files are the same as the
line terminators in C, naturally does not make this distinction. Windows
implementations are in a much more difficult position, because the line
terminator in Windows does not match that of C, *yet there is no file
type in Windows*. Thus, it is necessary under Windows to know, when a
program writes a newline, whether it is writing a line terminator or
just another byte.

> Win32 is not a POSIX OS environment

Which hardly makes POSIX "fatally flawed". Any more than the Win32 API
is "fatally flawed" because it isn't a POSIX API.

> in that it does distinguish between text and binary file types,

This isn't true. If it were, GNU-win32 would have less of a problem.
There is nothing in the Win32 API that allows you to open files in "text"
mode or to mark files as being "text" files. Windows simply uses a
different convention for terminating text lines than does unix/POSIX,
one that also is different from the convention in C. That's why cygwin's
imposing a text/binary distinction has so many problems.

> and therefore
> ANSI C requires that files be opened with newline conversions by
> default,

No, it does not. In an ANSI C implementation in which newlines are
converted, such as VC++, you will not see carriage returns upon reading
autoexec.bat. In an ANSI C implementation in which newlines are not
converted, such as a unix system reading a copy of autoexec.bat or
reading it via a network mount, or a GNU-win32 system with a filesystem
mounted -b, you will see carriage returns upon reading autoexec.bat.
ANSI C does not mandate which must occur, and thinking it does is a
major misunderstanding (one that Geoff Noer apparently shares).
Implementations can do newline translation or not, define a text/binary
distinction or not, at their discretion. All ANSI C says is that *if*
your implementation defines a text/binary distinction, fopen opens in
text mode unless a "b" flag is provided. But the distinction is for the
ANSI C implementation, and need not reflect the underlying system.

> however the POSIX C standard requires that they not be. This is a
> fundamental
> incompatibility which renders POSIX inherently _incompatible_ with
> ANSI C

No, this is a serious misunderstanding at several different levels.
POSIX is entirely compatible with the ANSI C standard, which allows
ANSI C implementations to impose a text/binary distinction or not.
POSIX simply mandates that those ANSI C implementations that are POSIX
implementations must not impose such a distinction (of course, POSIX
mandates a bunch of other things unrelated to ANSI C, like providing an
API with specific (unix) semantics).

> (please note here that we are discussing the POSIX API standard, not the larger
> POSIX OS standards. Such issues in an OS specification would easily be
> dismissed by simply saying "well, Win32 isn't a POSIX OS", however the POSIX
> API specification should be applicable in an ANSI C, non-POSIX OS environment

This is utter nonsense. The POSIX "OS standard" *is* the POSIX API
standard. There is no "POSIX OS standard" separate from the API. What
in the world do you think the POSIX API *is*?? All the rest of POSIX
has to do with even more specific levels above the API, such as exactly
how sh and cpio and termcap function. The POSIX API can be layered on
other systems, such as Mach or even Windows NT.

> (as is exactly what people are attempting with GNU-Win32), and the fact that it
> cannot be is still a flaw)

POSIX could be exactly emulated on GNU-Win32. However, the result
wouldn't be very useful, because it wouldn't coexist at the same level
as "native" Windows programs.
It can be usefully emulated somewhat closely, but there are many POSIX
facilities, such as file protection modes, effective uids, tty modes,
ptys, device abstraction, fork, file locking, etc. etc. that are missing
from Win32 or come in a radically different form or are done poorly.
I don't know how many people on this list have an appreciation for just
how difficult a job it is to implement GNU/POSIX under Win32. The POSIX
API is not a matter of line terminators; it is much more than that.

But even if you emulate POSIX somewhat closely, you are left with the
fact that in Windows text lines end with CRLF but in unix text lines end
with LF. The same would hold true if you tried to emulate the Win32 API
on a unix system. Programs that do CreateFile and do their own writing
would fail miserably if the emulation magically transformed CRLF's in
the written data to LF's (although they would fail less often, since
CRLF's in binary files are of course less frequent than are LF's).

> > Perhaps someone around here is an idiot and a moron, but it isn't
> > me or those in charge of the GNU project. Since GNU programs require
> > many POSIX extensions to ANSI C, such as, say, "stat", it is pointless
> > to try to make GNU programs *strictly* ANSI conforming. But programs
> > that conform to POSIX already conform (but not strictly) to ANSI C.
>
> They do not conform to ANSI C if they (for example) fopen a file without a "b"
> flag and expect to read/write binary data from it without problems.

This is simply false. "conform" is well defined in the standard.
ANSI C allows implementations that do not distinguish between text and
binary. All POSIX implementations are such implementations, including
the Windows NT POSIX implementation. Of course, printf("hello world\n")
from a program under that implementation will produce a file that
doesn't contain a CR, but nothing in ANSI C or the Win32 API says it
must.
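
A minimal sketch of that last point, assuming hypothetical helper names
and a scratch file path chosen only for illustration: write one line
through a text stream (fopen without the "b" flag), then re-read the
file as a binary stream and count the carriage returns. Under an
implementation that makes no text/binary distinction, such as a POSIX
one, the count should come out 0; under one that translates newlines to
CRLF, such as VC++, it should come out 1. Either result is consistent
with a conforming ANSI C implementation.

```c
#include <stdio.h>

/* Hypothetical helper: count CR bytes in a file read as a binary stream.
   Returns -1 if the file cannot be opened. */
long count_cr(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    int c;

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\r')
            n++;
    fclose(fp);
    return n;
}

/* Hypothetical helper: write one line through a text stream (no "b"
   flag), then report how many CR bytes ended up in the file.
   Returns -1 on I/O failure. */
long cr_after_text_write(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    if (fputs("hello world\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;
    return count_cr(path);
}
```

Calling cr_after_text_write("hello.txt") from main and printing the
result is enough to see which convention a given implementation (or a
given GNU-win32 mount mode) applies to text streams.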

> Many GNU
> utilities do this and are therefore incompatible with the ANSI C standard
> (strictly or otherwise).

Wrong.

> There is no reason for this as it is possible to
> design application code in such a way that it will function correctly under
> both systems, and therefore any code which does not is flawed and requires a
> bug fix.

Programs that, say, print out the number of links to a file, such as ls,
cannot be written in strictly conforming ANSI C, yet ls is not
"incompatible with the ANSI C standard". Since a large percentage of
GNU programs fall into this category, trying to convert individual
pieces of GNU code to be strictly conforming to ANSI C, rather than
merely conforming to the POSIX extensions, is pointless.

> If the GNU project will not accept such bug fixes (thus requiring
> their software to be incompatible with ANSI standards) for no reason other than
> "we don't want it that way", then I reiterate my statement that they are
> morons.

I'm sure it gives you a great sense of righteousness when better
informed persons dismiss your ranting, but it won't further your goals.

> > I really don't think that understanding this distinction makes one
> > an idiot or a moron. I suggest you think twice or more before throwing
> > those words around.
>
> Understanding the distinction is not the issue. Requiring incompatibility for
> no reason is the issue, and it is still a valid one. If you don't like my
> choice of words, fair enough, but it doesn't change the actual issues involved.

You still don't understand the distinction. Perhaps if you did you
would understand where you have gone wrong.

> > While strictly conforming ANSI C programs can use fopen(file, "rt"),
> > they cannot use open, O_BIN, O_BINARY, or O_TEXT. And if they
> > do use "rt", they cannot depend upon its effects and still be strictly
> > conforming. Since the meaning of none of these is defined by either
> > ANSI C nor POSIX, their use is not portable.
>
> Under an ANSI-compatible system, "t" is unnecessary as all files are opened in
> text mode by default, therefore "depending on its effect" (its effect being
> that it doesn't need one) is a non-issue. One would only need to depend on its
> effect within an environment which itself did not conform to these standards
> anyway, and therefore the point becomes moot.

It is moot in a limited sense. ANSI C says that streams default to text
mode, that conforming implementations must accept all strictly
conforming programs, and that other characters are allowed after the
standard prefixes ("r", "rb", "w+", etc.). Therefore, a conforming
implementation can only interpret "t" in a way that makes no difference
(and not as "time bomb", as someone suggested).

However, a non-conforming implementation could open streams in binary
mode by default, but open them in text mode if the "t" flag were
present. Such non-conforming implementations generally have a way to
tell them to be conforming; e.g., gcc, which is by default
non-conforming, can be told to conform via a command-line switch.
A system like GNU-win32 could non-conform when mounted -b (because it
makes a text/binary distinction but defaults to binary) but conform when
mounted -t. That would make the "t" flag useful in that context.
However, it would not be portable, in the sense that there might be some
other non-conforming implementation that takes "t" to mean "time bomb".

--
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".