Mail Archives: djgpp/1994/03/23/11:34:01

To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Line terms; UNIX v. DOS; opinions wanted (was: Info port...)
Date: Wed, 23 Mar 94 18:08:47 +0200
From: eliz AT is DOT elta DOT co DOT il

  This is NOT meant to add to the on-going and never-ending dispute about
the CR/LF pairs, DOS vs U**x, meaning of life, the universe, and everything!
I only want here to try to clarify a simple issue of porting from Unix to the
messy-DOS.  Those of you who had enough of this thing lately, please forgive
me and press the ``Delete'' button.
  For those who are still reading: some of you were concerned about breaking
existing Info files by teaching Info to understand DOS text files.  Well, before
I plunge into too much theoretical hand-waving, let me tell you the bottom
line: it works for BOTH Unix-style and DOS-style files.  I ran it on all the
files I have (some original, some edited under DOS), and it never complained.
Makeinfo also does its thing on both styles.
  Now for the simple explanation.  When a Unix program read()s a text file, it
expects to find each line terminated by a single character -- LF, aka Newline,
because this is how files are stored under Unix.  Many Unix-originated programs
rely on this fact.  Therefore, C compilers for DOS were made compatible with
this behavior, so as not to break zillions of lines of existing code.  DJGCC is
no exception to this rule.  The solution is simple: if you open() the file in
the ``text'' mode and then read() it, the library discards all the CR's for you,
so that the program never sees them.  Because Unix programs don't care about
``text'' mode (it wasn't even in existence when many programs were written) the
TEXT mode is usually the default under MS-DOS.  I have yet to see a DOS-based
compiler which doesn't behave like this.  The result of all this is that if
you leave the program code as it was, all the characters the program sees in its
buffer are EXACTLY THE SAME under DOS and Unix, including the relative positions
of the characters from the beginning of the buffer (which are recorded in the
tag tables of Info files).  So, everybody lives happily ever after, right?
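  To make the offsets argument concrete, here is a minimal sketch that
simulates on any OS what the message says a text-mode read() does.  It is not
the library's actual code: strip_cr() and find_offset() are hypothetical names
invented for this illustration, and find_offset() merely stands in for the
kind of position lookup that a tag table records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop every CR from buf in place, as the text says
   a text-mode read() does, and return the number of bytes that remain. */
static size_t strip_cr(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL);
    for (in = 0; in < len; in++)
        if (buf[in] != '\r')
            buf[out++] = buf[in];
    return out;
}

/* Hypothetical helper: offset of the first occurrence of needle within
   the first len bytes of buf, or -1 if it is not there. */
static long find_offset(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle), i;

    for (i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return (long)i;
    return -1;
}
```

  Feed strip_cr() a buffer holding DOS-style lines and what comes out is
byte-for-byte what a Unix program would have read, so find_offset() reports
the same position for a given string in both.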
  Well, not quite.  The problem bites you if you try to be too smart, and, say,
check if the value returned by read() is EXACTLY the same as the st_size field
of the structure returned by stat() or fstat().  read() adjusts its value after
discarding CR's, so it tells how many characters you ACTUALLY have, and this
would break some too scrupulous sanity checks inside the Info code as
distributed.  Those are the only REAL code changes I needed to make Info hum
happily on any OS-style file.  Just a couple of #ifdef'ed lines; no need to
change ANYTHING in ANY of the existing Info files.
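  For illustration only, the kind of #ifdef'ed sanity check described above
might look roughly like the sketch below.  This is not the actual Info patch:
load_file() is a hypothetical function, and the use of __MSDOS__ as the
compiler's predefined macro is an assumption.  On a Unix build the strict
comparison is kept; on a DOS build a short count is accepted, because the CR's
were discarded on the way in.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical loader: read a whole file into a malloc'ed, NUL-terminated
   buffer.  Stores the byte count in *len; returns NULL on any failure. */
static char *load_file(const char *path, long *len)
{
    struct stat st;
    char *buf;
    long n;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }
    buf = malloc((size_t)st.st_size + 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }
    n = (long)read(fd, buf, (size_t)st.st_size);
    close(fd);

#ifdef __MSDOS__   /* assumed macro name for a DOS build */
    /* Text mode: CR's were dropped, so fewer bytes than st_size is normal. */
    if (n < 0 || n > (long)st.st_size)
#else
    /* Unix: the count is expected to match the file size exactly. */
    if (n != (long)st.st_size)
#endif
    {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    *len = n;
    return buf;
}
```

  The caller should then work with the returned count rather than st_size,
since that is how many characters are ACTUALLY in the buffer.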
  Sorry about so many words, but this whole thing is about porting from Unix
to DOS, so I thought it might be interesting to some.

	Eli Zaretskii
