Mail Archives: djgpp-workers/2000/11/23/02:08:27

From: "Juan Manuel Guerrero" <ST001906 AT HRZ1 DOT HRZ DOT TU-Darmstadt DOT De>
Organization: Darmstadt University of Technology
To: djgpp-workers AT delorie DOT com
Date: Thu, 23 Nov 2000 08:06:53 +0200
MIME-Version: 1.0
Subject: New patch for dtou.c
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <58DAE532FD@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

Date: Wed, 8 Nov 2000 09:43:15 +0200 (WET)
From: Andris Pavenis <pavenis AT lanet DOT lv>
> One additional suggestion: There is a small DOS utility in Simtelnet
> (simtelnet/msdos/fileutils/nocrlf10.zip) which permits repairing
> binary files which were erroneously transferred as text (it was DOS only so
> no LFN support, of course). Now the only change we need for that in dtou is to
> skip Ctrl-Z processing. My suggestion is to do that if the executable name is
> nocrlf only. So one can do:


Date: Wed, 8 Nov 2000 13:27:53 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
> And, if we are talking about adding features to DTOU, here's a small 
> wishlist:
>
>   - add verbose operation option, whereby the program will print whether 
>     it removed any CR's and ^Z's, and whether some lines had LF without a 
>     CR (a sure sign the file is either binary or has inconsistent EOL 
>     format for some other reason).
>
>   - add an option which will remove any number of CRs before an LF, as in 
>     "\r\r\r\r\r\n" (this happens with buggy ports of Unix software, such 
>     as the Windows CVS client, which always blindly add a CR to LF, even 
>     if there's already a CR there).
>
>   - return an exit status which says whether any changes were done to the 
>     file.
>
>   - add an option which causes the file time stamps to be preserved only 
>     if the file was left unchanged.
>
>   - explain more about how these two programs work in utils.tex.


I have added 5 command-line options to dtou:
 -h: Displays a help text and exits.

 -r: Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
     UNIX-style EOL (LF). It ignores Cntl-Z, so it will not truncate the file.
     CR sequences in front of LFs are left unchanged. Here a CR sequence means
     all the consecutive CRs in front of a LF except for the last one. This last
     CR together with the LF forms the MSDOS-style EOL (CRLF). This implies that
     if there are n CRs followed by a LF, the sequence is only n-1 CRs long.
     This mode is intended for repairing files that have erroneously been
     transmitted in text mode instead of binary mode during an FTP session.

 -s: Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL (LF)
     and strips a CR sequence of arbitrary length from a file, if the sequence
     is followed by a LF. CR sequences that are not followed by a LF are left
     unchanged.

 -t: Timestamp. With this option the timestamp of a file (modified or not)
     will be preserved.

 -v: Verbose mode. This mode outputs some information during file processing.
     All possible output looks like:

       File: foo.c
       File unchanged.
       At least one CRLF to LF transformation.
       Warning: At least one CR sequence striped from a LF.
       Warning: At least one Cntl-Z. File truncated at line n.
       Warning: At least one LF without a preceeding CR.
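
The difference between the default, -r and -s conversions described above can
be sketched on an in-memory buffer. This is only an illustration, not the code
from the patch: the enum and the function name convert_eol are made up for the
sketch, and it assumes that -s still truncates at Cntl-Z like the default mode.

```c
#include <stddef.h>

/* Hypothetical mode names for this sketch; the patch itself passes
   separate r_mode/s_mode flags instead. */
enum eol_mode { MODE_DEFAULT, MODE_REPAIR, MODE_STRIP };

/* Convert buf[0..len) in place and return the new length.
   - A CR directly in front of a LF is dropped (CRLF becomes LF).
   - MODE_STRIP also drops every further CR of a run that ends in a LF.
   - Cntl-Z (0x1A) ends the data unless MODE_REPAIR is selected. */
size_t convert_eol(char *buf, size_t len, enum eol_mode mode)
{
  size_t in = 0, out = 0;

  while (in < len)
  {
    char c = buf[in];

    if (c == 0x1A && mode != MODE_REPAIR)
      break;                          /* software EOF: truncate here */

    if (c == '\r')
    {
      size_t run = 0, j;

      /* Count the run of consecutive CRs starting at position in. */
      while (in + run < len && buf[in + run] == '\r')
        run++;

      if (in + run < len && buf[in + run] == '\n')
      {
        /* Run ends in a LF: the last CR belongs to the CRLF pair. */
        size_t keep = (mode == MODE_STRIP) ? 0 : run - 1;
        for (j = 0; j < keep; j++)
          buf[out++] = '\r';
        buf[out++] = '\n';
        in += run + 1;
      }
      else
      {
        /* Run is not followed by a LF: copy it unchanged. */
        for (j = 0; j < run; j++)
          buf[out++] = '\r';
        in += run;
      }
      continue;
    }

    buf[out++] = c;
    in++;
  }
  return out;
}
```

For the input "a CR CR LF b Cntl-Z c" the sketch keeps one CR in default mode
and stops at the Cntl-Z, keeps one CR and the Cntl-Z in repair mode, and drops
both CRs in strip mode.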

The program is backward compatible with previous versions if no options
are given at all. In this case, an occurrence of Cntl-Z will truncate the file,
MSDOS-style EOLs (CRLF) are transformed into UNIX-style EOLs (LF), and CR sequence
stripping will not happen at all. Also, the timestamp will not be altered.
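
Keeping the timestamp comes down to remembering the file times before the
file is rewritten and putting them back afterwards. A minimal sketch with
stat() and utime() follows; the helper names save_times and restore_times are
invented for the illustration and do not appear in the patch.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <utime.h>

/* Remember the access and modification times of path.
   Returns 0 on success, -1 if the file cannot be examined. */
int save_times(const char *path, struct utimbuf *saved)
{
  struct stat st;

  if (stat(path, &st) != 0)
    return -1;
  saved->actime = st.st_atime;
  saved->modtime = st.st_mtime;
  return 0;
}

/* Put the remembered times back after the file has been rewritten.
   Returns 0 on success, -1 on failure (as utime() does). */
int restore_times(const char *path, const struct utimbuf *saved)
{
  return utime(path, saved);
}
```

A caller would use save_times() before converting the file and call
restore_times() only when the timestamp is supposed to be kept.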


The table below summarizes the exit status:
   0: File is unchanged.
   1: At least one CRLF to LF conversion has occurred in the processed file.
   2: At least one CR sequence has been removed from a LF in the processed file.
   3: At least one CR sequence has been removed from a LF and one CRLF to LF
      conversion has occurred in the processed file.
   4: Cntl-Z (software EOF) has occurred, thus the processed file has been truncated.
   5: Cntl-Z has occurred, thus the processed file has been truncated and at least
      one CRLF to LF conversion has occurred.
   6: Cntl-Z has occurred, thus the processed file has been truncated and at least
      one CR sequence has been removed from a LF in the processed file.
   7: At least one LF has occurred without a preceding CR in the processed file.
   8: At least one LF has occurred without a preceding CR and at least one
      CRLF to LF conversion has occurred in the processed file.
   9: At least one LF has occurred without a preceding CR and at least one
      CR sequence has been removed from a LF in the processed file.
  10: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed from a LF and one CRLF to LF conversion has occurred
      in the processed file.
  11: At least one LF has occurred without a preceding CR and at least one
      Cntl-Z has occurred. The processed file has been truncated.
  12: At least one LF has occurred without a preceding CR, one CRLF to LF
      conversion and one Cntl-Z has occurred in the processed file.
      The file has been truncated.
  13: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed and one Cntl-Z has occurred in the processed file.
      The file has been truncated.
  14: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed, one CRLF to LF conversion and one Cntl-Z has occurred
      in the processed file. The file has been truncated.
  16: Some I/O error occurred.
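
The patch below defines the conditions as the values 1, 2, 4, 8 and 16
(CR_REMOVED, nCR_REMOVED, CntlZ_EOF, LF_ONLY, IO_ERROR) and ORs them into the
exit status, so a caller can test single bits instead of comparing against
the whole table. Read that way, a LF without a preceding CR on its own would
give 8 rather than 7, so the entries from 7 upward are worth rechecking
against the code. A minimal sketch of such a decoder, where describe_status
is a made-up helper and the message texts are only illustrative:

```c
#include <stdio.h>
#include <stddef.h>

/* Flag values copied from the #defines in the patch. */
#define CR_REMOVED    0x01
#define nCR_REMOVED   0x02
#define CntlZ_EOF     0x04
#define LF_ONLY       0x08
#define IO_ERROR      0x10

/* Write a readable description of status into out (at most size bytes)
   and return the number of conditions that were set. */
int describe_status(int status, char *out, size_t size)
{
  static const struct { int flag; const char *text; } table[] = {
    { CR_REMOVED,  "CRLF converted to LF" },
    { nCR_REMOVED, "CR sequence stripped" },
    { CntlZ_EOF,   "truncated at Cntl-Z" },
    { LF_ONLY,     "LF without preceding CR" },
    { IO_ERROR,    "I/O error" },
  };
  size_t i, used = 0;
  int count = 0;

  if (size == 0)
    return 0;
  out[0] = '\0';

  for (i = 0; i < sizeof table / sizeof table[0]; i++)
  {
    if (status & table[i].flag)
    {
      int n = snprintf(out + used, size - used, "%s%s",
                       count ? "; " : "", table[i].text);
      if (n < 0 || (size_t) n >= size - used)
        break;                        /* no room left in out */
      used += (size_t) n;
      count++;
    }
  }
  if (count == 0)
    snprintf(out, size, "file unchanged");
  return count;
}
```

With this reading, a status of 5 decodes to "CRLF converted to LF" plus
"truncated at Cntl-Z".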

I have tested it on DOS only. I have not tested it on Linux/Unix at all, but I have
used only POSIX functions, so there should be no difficulties compiling and running
dtou under Unix.

Comments, objections, suggestions, etc. are welcome.

Regards,
Guerrero, Juan

diff -acprNC5 djgpp.orig/src/util/dtou.c djgpp/src/util/dtou.c
*** djgpp.orig/src/util/dtou.c	Wed Nov 22 23:43:52 2000
--- djgpp/src/util/dtou.c	Thu Nov 23 01:22:58 2000
***************
*** 12,98 ****
  
  #ifndef O_BINARY
  #define O_BINARY 0
  #endif
  
  static int
! dtou(char *fname)
  {
!   int i, k, k2, sf, df, l, l2=0, err=0, isCR=0;
!   char buf[16384];
    char tfname[FILENAME_MAX], *bn, *w;
    struct stat st;
    struct utimbuf tim1;
!   sf = open(fname, O_RDONLY|O_BINARY);
    if (sf < 1)
    {
!     perror(fname);
!     return 1;
    }
    
    fstat (sf,&st);
    tim1.actime = st.st_atime;
    tim1.modtime = st.st_mtime;
  
    strcpy (tfname, fname);
!   for (bn=w=tfname; *w; w++) 
!     if (*w=='/' || *w=='\\' || *w==':') 
        bn = w+1;  
    if (bn) *bn=0;
!   strcat (tfname,"utod.tm$");
    
!   df = open(tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
    if (df < 1)
    {
!     perror(tfname);
!     close(sf);
!     return 1;
    }
  
!   k2=0;
!   while ((l=read(sf, buf, 16384)) > 0)
    { 
!     int CtrlZ=0;
!     for (i=k=0; i<l; i++) 
        {
!          if (isCR && buf[i]!=0x0A) buf[k++] = 0x0D; 
!          if (buf[i]==0x0D) { isCR=1; continue; }
!          if (buf[i]==0x1A) { CtrlZ=1; break; }
!          	     else    buf[k++] = buf[i];
!          isCR = 0;
        }
!     l2=(k>0 ? write(df, buf, k) : 0);
!     if (l2<0 || CtrlZ) break;
!     if (l2!=k) { err=1; break; }
    }
  
!   if (l<0) perror (fname);
!   if (l2<0) perror (tfname);
!   if (err) fprintf (stderr,"Cannot process file %s\n",fname);
  
!   close(sf);
!   close(df);
  
!   if (l>=0 && l2>=0 && err==0)
    {
!     remove(fname);
!     rename(tfname, fname);
!     utime(fname, &tim1);
!     chown(fname, st.st_uid, st.st_gid);
!     chmod(fname, st.st_mode);
    }
!   else 
!   {
!     remove(tfname);
!   }
!   return 0;
  }
  
  int
  main(int argc, char **argv)
  {
!   int rv = 0;
!   for (argc--, argv++; argc; argc--, argv++)
!     rv += dtou(*argv);
!   return rv;
! }
  
--- 12,292 ----
  
  #ifndef O_BINARY
  #define O_BINARY 0
  #endif
  
+ #define IS_DIR_SEPARATOR(path) ((path) == '/' || (path) == '\\' || (path) == ':')
+ #define IS_LAST_IN_BUF  (i == l - 1)
+ #define IS_LAST_IN_FILE (position + i + 1 == st.st_size)
+ #define SET_FLAG(flag)         \
+ do {                           \
+   if ((flag) == 0) (flag) = 1; \
+ } while (0)
+ #define BUF_SIZE      16384
+ 
+ /* Control characters. */    
+ #define LF            0x0A
+ #define CR            0x0D
+ #define CntlZ         0x1A
+ 
+ /* Exit codes. */
+ #define NO_CHANGE     0x00  /* No changes at all have been done to the file. */
+ #define CR_REMOVED    0x01  /* Single CR removed from a LF. */
+ #define nCR_REMOVED   0x02  /* Multiple CRs removed from a LF. */
+ #define CntlZ_EOF     0x04  /* ^Z as EOF appeared. */
+ #define LF_ONLY       0x08  /* A LF without a preceeding CR appeared. */
+ 
+ #define NO_ERROR      0x00
+ #define IO_ERROR      0x10  /* Some I/O error occurred. */
+ 
+ 
  static int
! dtou(char *fname, int r_mode, int s_mode, int v_mode, int t_mode)
  {
!   int i, k, sf, df, l, l2 = 0, is_CR = 0, is_nCR = 0, is_CR_sequence = 0;
!   int CntlZ_flag = 0, CR_flag = 0, nCR_flag = 0, LF_flag = 0, exit_status = NO_CHANGE;
!   int buf_counter, nbufs, LF_counter, must_rewind, position, offset, whence;
!   char buf[BUF_SIZE];
    char tfname[FILENAME_MAX], *bn, *w;
    struct stat st;
    struct utimbuf tim1;
! 
!   sf = open (fname, O_RDONLY|O_BINARY);
    if (sf < 1)
    {
!     perror (fname);
!     return IO_ERROR;
    }
    
    fstat (sf,&st);
    tim1.actime = st.st_atime;
    tim1.modtime = st.st_mtime;
+   nbufs = st.st_size / BUF_SIZE;
  
    strcpy (tfname, fname);
!   for (bn = w = tfname; *w; w++) 
!     if (IS_DIR_SEPARATOR (*w))
        bn = w+1;  
    if (bn) *bn=0;
!   strcat (tfname,"dtou.tm$");
    
!   df = open (tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
    if (df < 1)
    {
!     perror (tfname);
!     close (sf);
!     return IO_ERROR;
    }
  
!   buf_counter = LF_counter = must_rewind = position = 0;
!   if (s_mode)
!   {
!     offset = 0;
!     whence = SEEK_SET;
!   }
!   else
!   {
!     offset = -1;
!     whence = SEEK_CUR;
!   }
!   while ((l = read (sf, buf, BUF_SIZE)) > 0)
    { 
!     for (i = k = 0; i < l; i++) 
!     {
!       if (!r_mode)
!         if (buf[i] == CntlZ) { SET_FLAG (CntlZ_flag); break; }
!       if (s_mode)
        {
!         if (buf[i] == LF)
!         {
!           if (!(is_CR || is_nCR)) SET_FLAG (LF_flag);
!           if (is_nCR) { SET_FLAG (nCR_flag); is_nCR = 0; }
!           if (is_CR) { SET_FLAG (CR_flag); is_CR = 0; }
!           LF_counter++;
!           offset = must_rewind = 0;
!           buf[k++] = buf[i]; continue;
!         }
!         if (is_CR_sequence)
!         {
!           if (buf[i] == CR) { buf[k++] = buf[i]; continue; }
!           else is_CR_sequence = 0;
!         }
!         if (is_nCR)
!         {
!           if (buf[i] != CR || IS_LAST_IN_FILE)
!           {
!             is_CR_sequence = must_rewind = 1;
!             is_nCR = 0; break;
!           }
!           else
!             continue;
!         }
!         if (is_CR && buf[i] == CR) { is_nCR = 1; is_CR = 0; continue; }
!         if (buf[i] == CR)
!         {
!           if (IS_LAST_IN_FILE) { buf[k++] = buf[i]; break; }
!           is_CR = must_rewind = 1;
!           offset = position + i;
!           continue;
!         }
        }
!       else
!       {
!         if (buf[i] == LF)
!         {
!           if (is_CR)  SET_FLAG (CR_flag);
!           if (!is_CR) SET_FLAG (LF_flag);
!           LF_counter++;
!         }
!         if (is_CR && buf[i] != LF) buf[k++] = CR;
!         if (buf[i] == CR)
!         {
!           if (IS_LAST_IN_BUF)
!           {
!             if (buf_counter < nbufs) must_rewind = 1;
!             else buf[k++] = CR;
!           }
!           is_CR = 1; continue;
!         }
!         is_CR = 0;
!       }
!       buf[k++] = buf[i];
!     }
! 
!     is_CR = 0;
!     buf_counter++;
!     position += l;
!     /* Last character/s in buf are CR/s.
!        Push it/them back and reread it/them next time. */
!     if (must_rewind)
!     {
!       position = lseek (sf, offset, whence);
!       must_rewind = 0;
!     }
! 
!     l2 = (k > 0 ? write (df, buf, k) : 0);
!     if (l2 < 0 || CntlZ_flag) break;
!     if (l2 != k) { exit_status = IO_ERROR; break; }
    }
  
!   if (l < 0) perror (fname);
!   if (l2 < 0) perror (tfname);
!   if (exit_status != NO_ERROR)
!     fprintf (stderr,"Cannot process file %s\n",fname);
  
!   close (sf);
!   close (df);
  
!   if (l >= 0 && l2 >= 0 && exit_status == NO_ERROR)
    {
!     remove (fname);
!     rename (tfname, fname);
!     chown (fname, st.st_uid, st.st_gid);
!     chmod (fname, st.st_mode);
!     if (t_mode)
!       utime (fname, &tim1);
!     if (v_mode) 
!       printf ("File: %s\n",fname);
!     if (CR_flag)
!     {
!       exit_status |= CR_REMOVED;
!       if (v_mode) 
!         printf ("At least one CRLF to LF transformation.\n");
!     }
!     if (nCR_flag)
!     {
!       exit_status |= nCR_REMOVED;
!       if (v_mode) 
!         printf ("Warning: At least one CR sequence striped from a LF.\n");
!     }
!     if (CntlZ_flag)
!     {
!       exit_status |= CntlZ_EOF;
!       if (v_mode) 
!         printf ("Warning: At least one Cntl-Z. File truncated at line %i.\n", LF_counter);
!     }
!     if (LF_flag)
!     {
!       exit_status |= LF_ONLY;
!       if (v_mode) 
!         printf ("Warning: At least one LF without a preceeding CR.\n");
!     }
!     if (v_mode && exit_status == NO_CHANGE)
!       printf ("File unchanged.\n");
    }
!   else
!     remove (tfname);
! 
!   return exit_status;
! }
! 
! void
! usage(char *progname)
! {
!   printf ("Usage: %s [-h] [-r] [-s] [-t] [-v] files...\n\n", progname);
!   printf ("Options are:\n");
!   printf ("              -h:  Display this help and exit.\n");
!   printf ("              -r:  repair mode. Transform MSDOS-style EOL (CRLF) into\n");
!   printf ("                   UNIX-style EOL (LF).\n");
!   printf ("                   Cntl-Z are ignored and will not truncate the file and\n");
!   printf ("                   CR sequences in front of LF will be left unchanged.\n");
!   printf ("              -s:  strip mode. Transform MSDOS-style EOL (CRLF) into\n");
!   printf ("                   UNIX-style EOL (LF) and strip a CR sequence of\n");
!   printf ("                   arbitrary length from the file, if and only if\n");
!   printf ("                   the sequence is followed by LF. CR sequences that\n");
!   printf ("                   are not followed by LF are always left unchanged.\n");
!   printf ("              -t:  timestamp. The timestamp of the file (modified or\n");
!   printf ("                   not modified) will be preserved.\n");
!   printf ("              -v:  verbose mode.\n\n");
!   printf ("The program is backward compatible with previous program versions if no options\n");
!   printf ("are given at all. In this case, an occurrence of Cntl-Z will truncate the file,\n");
!   printf ("MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence\n");
!   printf ("stripping will not happen at all. Also the timestamp will not be alterated.\n");
  }
  
  int
  main(int argc, char **argv)
  {
!   int exit_status = NO_ERROR, i, repair_mode, strip_mode, verbose_mode, timestamp;
!   char* progname = strlwr(strdup(argv[0]));
  
+   if (argc < 2)
+   {
+     usage (progname);
+     exit(NO_ERROR);
+   }
+ 
+   repair_mode = strip_mode = verbose_mode = 0; /* Default for */
+   timestamp = 1;                               /* backward compatibility. */
+   i = 1;
+   while ((argc > i) && (argv[i][0] == '-') && argv[i][1])
+   {
+     switch (argv[i][1])
+     {
+       case 'h':
+         usage (progname);
+         exit(NO_ERROR);
+         break;
+       case 'r':
+         repair_mode = 1;
+         strip_mode = 0;
+         timestamp = 0;
+         break;
+       case 's':
+         strip_mode = 1;
+         repair_mode = 0;
+         timestamp = 0;
+         break;
+       case 't':
+         timestamp = 1;
+         break;
+       case 'v':
+         verbose_mode = 1;
+         break;
+     }
+     i++;
+   }
+ 
+   for (; i < argc; i++)
+     exit_status = dtou (argv[i], repair_mode, strip_mode, verbose_mode, timestamp);
+   return exit_status;
+ }
diff -acprNC5 djgpp.orig/src/util/utils.tex djgpp/src/util/utils.tex
*** djgpp.orig/src/util/utils.tex	Wed Nov 22 23:44:24 2000
--- djgpp/src/util/utils.tex	Thu Nov 23 01:22:58 2000
*************** so that they won't get mixed with the fi
*** 320,333 ****
  
  @c -----------------------------------------------------------------------------
  @node dtou, utod, djtar, Top
  @chapter dtou
  
  Each file specified on the command line is converted from dos's CR/LF
  text file mode to unix's NL text file mode.
  
! All djgpp wildcards are supported.  Timestamps of the files are preserved.
  
  @c -----------------------------------------------------------------------------
  @node utod, gxx, dtou, Top
  @chapter utod
  
--- 320,433 ----
  
  @c -----------------------------------------------------------------------------
  @node dtou, utod, djtar, Top
  @chapter dtou
  
+ Usage: @code{dtou} [@code{-h}] [@code{-r}] [@code{-s}] [@code{-t}]
+ [@code{-v}] @file{files}
+ 
  Each file specified on the command line is converted from dos's CR/LF
  text file mode to unix's NL text file mode.
  
! All djgpp wildcards are supported. Timestamps of the files are preserved
! if the files are left unchanged.
! 
! @strong{Options:}
! 
! @table @code
! 
! @item -h
! 
! Displays a help text and exits.
! 
! @item -r
! 
! Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
! UNIX-style EOL (LF). It ignores Cntl-Z thus it will not truncate the file.
! CR sequences in front of LFs are left unchanged. This mode is intended
! for repairing files that have erroneously been transmited in text-mode
! instead of binary-mode during a FTP session.
! 
! @item -s
! 
! Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL (LF)
! and strips a CR sequence of arbitrary length from a file, if the sequence
! is followed by a LF. CR sequences that are not followed by a LF are left
! unchanged.
! 
! @item -t
! 
! Timestamp. With this option the timestamp of file (modified or not modified)
! will be preserved.
! 
! @item -v
! 
! Verbose mode.
! 
! @end table
! 
! The program is backward compatible with previous program versions if no options
! are given at all. In this case, an occurrence of Cntl-Z will truncate the file,
! MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence
! stripping will not happen at all. Also the timestamp will not be alterated.
! 
! The table below summarizes the exit status. When wildcards are used
! the exit status always refers to the last processed file.
! 
! @strong{Exit status:}
! 
! @enumerate 0
! 
! @item
! File is unchanged.
! @item
! At least one CRLF to LF convertion has occurred in the processed file.
! @item
! At least one CR sequence has been removed from a LF in the processed file.
! @item
! At least one CR sequence has been removed from a LF and one CRLF to CR
! conversion has occurred in the processed file.
! @item
! Cntl-Z (software EOF) has occurred, thus the processed file has been truncated.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CRLF to LF convertion has occurred.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CR sequence has been removed from a LF in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! CRLF to LF convertion has occurred in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! CR sequence has been removed from a LF in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed from a LF and one CRLF to LF conversion has occurred
! in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! Cntl-Z has occurred. The processed file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CRLF to LF
! convertion and one Cntl-Z has occurred in the processed file.
! The file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed and one Cntl-Z has occurred in the processed file.
! The file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed, CRLF to LF convertion and one Cntl-Z has occurred
! in the processed file. The file has been truncated.
! @end enumerate
! @enumerate 16
! @item
! Some I/O error occurred.
! @end enumerate
  
  @c -----------------------------------------------------------------------------
  @node utod, gxx, dtou, Top
  @chapter utod
  
