Mail Archives: djgpp-workers/1998/12/14/09:57:05

Date: Mon, 14 Dec 1998 16:57:12 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: "Salvador Eduardo Tropea (SET)" <salvador AT inti DOT gov DOT ar>
cc: Tim Zastai <zastai AT hotmail DOT com>, djgpp-workers AT delorie DOT com
Subject: Re: Bugs in CVS 1.9 distributed in Simtelnet
In-Reply-To: <m0zpXoO-000S57C@inti.gov.ar>
Message-ID: <Pine.SUN.3.91.981214163014.6784A-100000@is>
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com

On Mon, 14 Dec 1998, Salvador Eduardo Tropea (SET) wrote:

> 1) The import command imports the files using fopen(file,"r"); and for some
> strange reason Tim used _fmode=O_BINARY

In my experience, it is almost never a good idea to use _fmode=O_BINARY 
globally.  Most programs need to open some files in text mode and others 
in binary mode, so _fmode doesn't save much trouble.

>   Anyways that's a serious bug because then the text files are saved in the
> repository with \r\n. I think that here CVS should check the mode and open
> the file according to it.

I don't really know much about CVS internals, so what follows are some 
general remarks about this text/binary nuisance, mainly in the context of 
revision-control software.

Saving a source file with the DOS-style CRLF EOLs is indeed a Bad Idea 
(IMHO).  The main problem with that is that the master file is then 
non-portable to Unix.  So I think source files should be checked in with 
the CR characters stripped.

However, CVS can also be used to store non-text files.  Clearly, those 
should have all their characters preserved.

Therefore, opening in text mode in all cases is not a good solution 
either.  CVS should peek at some portion of the file, decide if it is 
text or binary (maybe even provide an option for the user to force one of 
these), and then strip all CR characters from the CRLF pairs if that's a 
text file.  That requires a binary open for reading.

Note that it is a mistake to strip ALL the CR characters, even if they are
not followed by an LF: this will cause subtle bugs if the source file has
literal CR characters in it.  Only a CR before the LF should be stripped 
from a text file.  This requires reading the file in binary mode and 
then looping through it, stripping the CRs manually, since no library 
function is smart enough to do this for you.
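
The manual loop could look something like the sketch below.  The name
strip_cr_before_lf is hypothetical, and the sketch assumes the whole file
has already been read into memory in binary mode; code that works on
chunks would also have to handle a CRLF pair split across two chunks.

```c
#include <stddef.h>

/* Remove, in place, every CR that is immediately followed by an LF.
   A CR not followed by an LF is kept.  Returns the new length. */
static size_t strip_cr_before_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the CR of a CRLF pair */
        buf[out++] = buf[in];   /* copy everything else verbatim */
    }
    return out;
}
```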

A related question is how to write the file when it is checked out. 
Clearly, a non-text file should be written verbatim.  For a text file, the
answer is less obvious, but I think DOS-style CRLF format is better.  For
example, imagine a source file which originally had strings with CRLF
pairs inside it: these would be stripped by check-in, and must be added on
check-out, to prevent the program from breaking.  Also, many DOS editors still
have trouble with Unix-style EOLs (I was surprised to learn that even
MultiEdit has this bug). 
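
For the check-out direction, a sketch along these lines would do, assuming
the output stream was opened in binary mode so that the library adds no
translation of its own.  The name write_with_crlf is hypothetical.  Note
that it puts a CR before every LF, including any LF that was originally
bare.

```c
#include <stddef.h>
#include <stdio.h>

/* Write buf to a stream opened in binary mode ("wb"), emitting CR LF
   for each LF in the buffer.  Returns 0 on success, -1 on error. */
static int write_with_crlf(FILE *out, const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' && putc('\r', out) == EOF)
            return -1;
        if (putc((unsigned char)buf[i], out) == EOF)
            return -1;
    }
    return 0;
}
```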

If it is possible, CVS should record the type of the file (text or 
binary) in the repository when the file is checked in, and use that info 
when it writes the file out later.

The above still has some problems, e.g. when a source file has embedded 
strings which end in a single literal newline.  But these cases are rare.

>   I have a working patch for it that checks the mode and explicitly selects
> "wb" or "wt" (no default assumptions for "w"=xxxx).

Some old non-ANSI compilers don't support "wt" and "wb".  You might be 
better off defining a bunch of macros, like FOPEN_WBIN_MODE, so that 
users of other compilers could define them as appropriate.
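
Something like the following is what I have in mind.  Only
FOPEN_WBIN_MODE is a name I suggested above; the other macro names and
the open_for_writing helper are made up for this sketch.  The text-mode
defaults use plain "r" and "w", since the "t" suffix is a DOS compiler
extension and not part of ANSI C.

```c
#include <stdio.h>

/* Each macro can be overridden on the compiler command line or in a
   configuration header for compilers that need different strings. */
#ifndef FOPEN_RBIN_MODE
# define FOPEN_RBIN_MODE "rb"
#endif
#ifndef FOPEN_WBIN_MODE
# define FOPEN_WBIN_MODE "wb"
#endif
#ifndef FOPEN_RTXT_MODE
# define FOPEN_RTXT_MODE "r"
#endif
#ifndef FOPEN_WTXT_MODE
# define FOPEN_WTXT_MODE "w"
#endif

/* Hypothetical helper: choose the mode explicitly for each file
   instead of relying on a global default. */
static FILE *open_for_writing(const char *path, int is_binary)
{
    return fopen(path, is_binary ? FOPEN_WBIN_MODE : FOPEN_WTXT_MODE);
}
```

A compiler that lacks "wb" could then be handled with, say,
-DFOPEN_WBIN_MODE='"w"' without touching the rest of the code.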

> I think it is much better than RCS

CVS and RCS are not generally interchangeable: they are designed to work 
on different levels and support different development models.
