From: "Juan Manuel Guerrero"
Organization: Darmstadt University of Technology
To: djgpp-workers AT delorie DOT com
Date: Thu, 23 Nov 2000 08:06:53 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: New patch for dtou.c
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <58DAE532FD@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

Date: Wed, 8 Nov 2000 09:43:15 +0200 (WET)
From: Andris Pavenis
> One additional suggestion: There are small DOS utility in Simtelnet
> (simtelnet/msdos/fileutils/nocrlf10.zip) which permits to repair
> binary files which are errorously transfered as text (it was DOS only so
> no LFN support, of course). Now only change we need for that in dtou is to
> skip Ctrl-Z processing. My suggestion is to do that if executable name is
> nocrlf only. So one can do:

Date: Wed, 8 Nov 2000 13:27:53 +0200 (IST)
From: Eli Zaretskii
> And, if we are talking about adding features to DTOU, here's a small
> wishlist:
>
> - add verbose operation option, whereby the program will print whether
>   it removed any CR's and ^Z's, and whether some lines had LF without a
>   CR (a sure sign the file is either binary or has inconsistent EOL
>   format for some other reason).
>
> - add an option which will remove any number of CRs before an LF, as in
>   "\r\r\r\r\r\n" (this happens with buggy ports of Unix software, such
>   as the Windows CVS client, which always blindly add a CR to LF, even
>   if there's already a CR there).
>
> - return an exit status which says whether any changes were done to the
>   file.
>
> - add an option which causes the file time stamps to be preserved only
>   if the file was left unchanged.
>
> - explain more about how these two programs work in utils.tex.

I have added 5 command-line options to dtou:

  -h: Displays a help text and exits.

  -r: Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
      UNIX-style EOL (LF).
      It ignores Cntl-Z, so it will not truncate the file. CR sequences
      in front of LFs are left unchanged. Here a "CR sequence" means a run
      of CRs excluding the last CR of the run: that last CR together with
      the LF forms the MSDOS-style EOL (CRLF). So if n CRs are followed by
      a LF, the CR sequence is only n-1 CRs long. This mode is intended
      for repairing files that have erroneously been transmitted in text
      mode instead of binary mode during an FTP session.

  -s: Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL
      (LF) and strips a CR sequence of arbitrary length from a file if the
      sequence is followed by a LF. CR sequences that are not followed by
      a LF are left unchanged.

  -t: Timestamp. With this option the timestamp of a file (modified or not)
      will be preserved.

  -v: Verbose mode. This mode outputs some information during file
      processing. All possible output looks like:

        File: foo.c
        File unchanged.
        At least one CRLF to LF transformation.
        Warning: At least one CR sequence striped from a LF.
        Warning: At least one Cntl-Z. File truncated at line n.
        Warning: At least one LF without a preceeding CR.

The program is backward compatible with previous versions if no options
are given at all. In this case, an occurrence of Cntl-Z will truncate the
file, MSDOS-style EOLs (CRLF) are transformed into UNIX-style EOLs (LF) and
CR sequence stripping will not happen at all. Also, the timestamp will not
be altered.

The table below summarizes the exit status:

   0: File is unchanged.
   1: At least one CRLF to LF conversion has occurred in the processed file.
   2: At least one CR sequence has been removed from a LF in the processed
      file.
   3: At least one CR sequence has been removed from a LF and one CRLF to LF
      conversion has occurred in the processed file.
   4: Cntl-Z (software EOF) has occurred, thus the processed file has been
      truncated.
   5: Cntl-Z has occurred, thus the processed file has been truncated, and
      at least one CRLF to LF conversion has occurred.
   6: Cntl-Z has occurred, thus the processed file has been truncated, and
      at least one CR sequence has been removed from a LF in the processed
      file.
   7: Cntl-Z has occurred, at least one CR sequence has been removed from a
      LF and one CRLF to LF conversion has occurred. The file has been
      truncated.
   8: At least one LF has occurred without a preceding CR in the processed
      file.
   9: At least one LF has occurred without a preceding CR and at least one
      CRLF to LF conversion has occurred in the processed file.
  10: At least one LF has occurred without a preceding CR and at least one
      CR sequence has been removed from a LF in the processed file.
  11: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed from a LF and one CRLF to LF conversion has occurred
      in the processed file.
  12: At least one LF has occurred without a preceding CR and at least one
      Cntl-Z has occurred. The processed file has been truncated.
  13: At least one LF has occurred without a preceding CR, one CRLF to LF
      conversion and one Cntl-Z have occurred in the processed file. The
      file has been truncated.
  14: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed and one Cntl-Z has occurred in the processed file.
      The file has been truncated.
  15: At least one LF has occurred without a preceding CR, one CR sequence
      has been removed, one CRLF to LF conversion and one Cntl-Z have
      occurred in the processed file. The file has been truncated.
  16: Some I/O error occurred.

Statuses 1 to 15 are the bitwise OR of four flags (1: CRLF converted,
2: CR sequence removed, 4: Cntl-Z, 8: LF without CR), matching the exit
code #defines in the patch below.

I have tested it on DOS. It has not been tested on Linux/Unix at all, but
I have only used POSIX functions, so there should be no difficulties
compiling and running dtou under Unix.
Comments, objections, suggestions, etc. are welcome.

Regards,
Guerrero, Juan

diff -acprNC5 djgpp.orig/src/util/dtou.c djgpp/src/util/dtou.c
*** djgpp.orig/src/util/dtou.c Wed Nov 22 23:43:52 2000
--- djgpp/src/util/dtou.c Thu Nov 23 01:22:58 2000
***************
*** 12,98 ****
  #ifndef O_BINARY
  #define O_BINARY 0
  #endif

  static int
! dtou(char *fname)
  {
!   int i, k, k2, sf, df, l, l2=0, err=0, isCR=0;
!   char buf[16384];
    char tfname[FILENAME_MAX], *bn, *w;
    struct stat st;
    struct utimbuf tim1;
!   sf = open(fname, O_RDONLY|O_BINARY);
    if (sf < 1)
    {
!     perror(fname);
!     return 1;
    }
    fstat (sf,&st);
    tim1.actime = st.st_atime;
    tim1.modtime = st.st_mtime;
    strcpy (tfname, fname);
!   for (bn=w=tfname; *w; w++)
!     if (*w=='/' || *w=='\\' || *w==':')
        bn = w+1;
    if (bn) *bn=0;
!   strcat (tfname,"utod.tm$");
!   df = open(tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
    if (df < 1)
    {
!     perror(tfname);
!     close(sf);
!     return 1;
    }
!   k2=0;
!   while ((l=read(sf, buf, 16384)) > 0)
    {
!     int CtrlZ=0;
!     for (i=k=0; i0 ? write(df, buf, k) : 0);
!     if (l2<0 || CtrlZ) break;
!     if (l2!=k) { err=1; break; }
    }
!   if (l<0) perror (fname);
!   if (l2<0) perror (tfname);
!   if (err) fprintf (stderr,"Cannot process file %s\n",fname);
!   close(sf);
!   close(df);
!   if (l>=0 && l2>=0 && err==0)
    {
!     remove(fname);
!     rename(tfname, fname);
!     utime(fname, &tim1);
!     chown(fname, st.st_uid, st.st_gid);
!     chmod(fname, st.st_mode);
    }
!   else
!   {
!     remove(tfname);
!   }
!   return 0;
  }

  int
  main(int argc, char **argv)
  {
!   int rv = 0;
!   for (argc--, argv++; argc; argc--, argv++)
!     rv += dtou(*argv);
!   return rv;
! }
--- 12,292 ----
  #ifndef O_BINARY
  #define O_BINARY 0
  #endif

+ #define IS_DIR_SEPARATOR(path) ((path) == '/' || (path) == '\\' || (path) == ':')
+ #define IS_LAST_IN_BUF         (i == l - 1)
+ #define IS_LAST_IN_FILE        (position + i + 1 == st.st_size)
+ #define SET_FLAG(flag)           \
+ do {                             \
+   if ((flag) == 0) (flag) = 1;   \
+ } while (0)
+ #define BUF_SIZE               16384
+
+ /* Control characters. */
+ #define LF    0x0A
+ #define CR    0x0D
+ #define CntlZ 0x1A
+
+ /* Exit codes. */
+ #define NO_CHANGE   0x00  /* No changes at all have been done to the file. */
+ #define CR_REMOVED  0x01  /* Single CR removed from a LF. */
+ #define nCR_REMOVED 0x02  /* Multiple CRs removed from a LF. */
+ #define CntlZ_EOF   0x04  /* ^Z as EOF appeared. */
+ #define LF_ONLY     0x08  /* A LF without a preceeding CR appeared. */
+
+ #define NO_ERROR    0x00
+ #define IO_ERROR    0x10  /* Some I/O error occurred. */
+
+
  static int
! dtou(char *fname, int r_mode, int s_mode, int v_mode, int t_mode)
  {
!   int i, k, sf, df, l, l2 = 0, is_CR = 0, is_nCR = 0, is_CR_sequence = 0;
!   int CntlZ_flag = 0, CR_flag = 0, nCR_flag = 0, LF_flag = 0, exit_status = NO_CHANGE;
!   int buf_counter, nbufs, LF_counter, must_rewind, position, offset, whence;
!   char buf[BUF_SIZE];
    char tfname[FILENAME_MAX], *bn, *w;
    struct stat st;
    struct utimbuf tim1;
!
!   sf = open (fname, O_RDONLY|O_BINARY);
    if (sf < 1)
    {
!     perror (fname);
!     return IO_ERROR;
    }
    fstat (sf,&st);
    tim1.actime = st.st_atime;
    tim1.modtime = st.st_mtime;
+   nbufs = st.st_size / BUF_SIZE;
    strcpy (tfname, fname);
!   for (bn = w = tfname; *w; w++)
!     if (IS_DIR_SEPARATOR (*w))
        bn = w+1;
    if (bn) *bn=0;
!   strcat (tfname,"dtou.tm$");
!   df = open (tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
    if (df < 1)
    {
!     perror (tfname);
!     close (sf);
!     return IO_ERROR;
    }
!   buf_counter = LF_counter = must_rewind = position = 0;
!   if (s_mode)
!   {
!     offset = 0;
!     whence = SEEK_SET;
!   }
!   else
!   {
!     offset = -1;
!     whence = SEEK_CUR;
!   }
!   while ((l = read (sf, buf, BUF_SIZE)) > 0)
    {
!     for (i = k = 0; i < l; i++)
!     {
!       if (!r_mode)
!         if (buf[i] == CntlZ) { SET_FLAG (CntlZ_flag); break; }
!       if (s_mode)
        {
!         if (buf[i] == LF)
!         {
!           if (!(is_CR || is_nCR)) SET_FLAG (LF_flag);
!           if (is_nCR) { SET_FLAG (nCR_flag); is_nCR = 0; }
!           if (is_CR) { SET_FLAG (CR_flag); is_CR = 0; }
!           LF_counter++;
!           offset = must_rewind = 0;
!           buf[k++] = buf[i]; continue;
!         }
!         if (is_CR_sequence)
!         {
!           if (buf[i] == CR) { buf[k++] = buf[i]; continue; }
!           else is_CR_sequence = 0;
!         }
!         if (is_nCR)
!         {
!           if (buf[i] != CR || IS_LAST_IN_FILE)
!           {
!             is_CR_sequence = must_rewind = 1;
!             is_nCR = 0; break;
!           }
!           else
!             continue;
!         }
!         if (is_CR && buf[i] == CR) { is_nCR = 1; is_CR = 0; continue; }
!         if (buf[i] == CR)
!         {
!           if (IS_LAST_IN_FILE) { buf[k++] = buf[i]; break; }
!           is_CR = must_rewind = 1;
!           offset = position + i;
!           continue;
!         }
        }
!       else
!       {
!         if (buf[i] == LF)
!         {
!           if (is_CR) SET_FLAG (CR_flag);
!           if (!is_CR) SET_FLAG (LF_flag);
!           LF_counter++;
!         }
!         if (is_CR && buf[i] != LF) buf[k++] = CR;
!         if (buf[i] == CR)
!         {
!           if (IS_LAST_IN_BUF)
!           {
!             if (buf_counter < nbufs) must_rewind = 1;
!             else buf[k++] = CR;
!           }
!           is_CR = 1; continue;
!         }
!         is_CR = 0;
!       }
!       buf[k++] = buf[i];
!     }
!
!     is_CR = 0;
!     buf_counter++;
!     position += l;
!     /* Last character/s in buf are CR/s.
!        Push it/them back and reread it/them next time. */
!     if (must_rewind)
!     {
!       position = lseek (sf, offset, whence);
!       must_rewind = 0;
!     }
!
!     l2 = (k > 0 ? write (df, buf, k) : 0);
!     if (l2 < 0 || CntlZ_flag) break;
!     if (l2 != k) { exit_status = IO_ERROR; break; }
    }
!   if (l < 0) perror (fname);
!   if (l2 < 0) perror (tfname);
!   if (exit_status != NO_ERROR)
!     fprintf (stderr,"Cannot process file %s\n",fname);
!   close (sf);
!   close (df);
!   if (l >= 0 && l2 >= 0 && exit_status == NO_ERROR)
    {
!     remove (fname);
!     rename (tfname, fname);
!     chown (fname, st.st_uid, st.st_gid);
!     chmod (fname, st.st_mode);
!     if (t_mode)
!       utime (fname, &tim1);
!     if (v_mode)
!       printf ("File: %s\n",fname);
!     if (CR_flag)
!     {
!       exit_status |= CR_REMOVED;
!       if (v_mode)
!         printf ("At least one CRLF to LF transformation.\n");
!     }
!     if (nCR_flag)
!     {
!       exit_status |= nCR_REMOVED;
!       if (v_mode)
!         printf ("Warning: At least one CR sequence striped from a LF.\n");
!     }
!     if (CntlZ_flag)
!     {
!       exit_status |= CntlZ_EOF;
!       if (v_mode)
!         printf ("Warning: At least one Cntl-Z. File truncated at line %i.\n", LF_counter);
!     }
!     if (LF_flag)
!     {
!       exit_status |= LF_ONLY;
!       if (v_mode)
!         printf ("Warning: At least one LF without a preceeding CR.\n");
!     }
!     if (v_mode && exit_status == NO_CHANGE)
!       printf ("File unchanged.\n");
    }
!   else
!     remove (tfname);
!
!   return exit_status;
! }
!
! void
! usage(char *progname)
! {
!   printf ("Usage: %s [-h] [-r] [-s] [-t] [-v] files...\n\n", progname);
!   printf ("Options are:\n");
!   printf (" -h: Display this help and exit.\n");
!   printf (" -r: repair mode. Transform MSDOS-style EOF (CRLF) into\n");
!   printf (" UNIX-style EOL (LF).\n");
!   printf (" Cntl-Z are ignored and will not truncate the file and\n");
!   printf (" CR sequences in front of LF will left unchanged.\n");
!   printf (" -s: strip mode. Transform MSDOS-style EOF (CRLF) into\n");
!   printf (" UNIX-style EOL (LF) and strip a CR sequence of\n");
!   printf (" arbitrary length from the file, if and only if\n");
!   printf (" the sequence is followed by LF. CR sequences that\n");
!   printf (" are not followed by LF are always left unchanged.\n");
!   printf (" -t: timestamp. The timestamp of the file (modified or\n");
!   printf (" not modified) will be preserved.\n");
!   printf (" -v: verbose mode.\n\n");
!   printf ("The program is backward compatible with previous program versions if no options\n");
!   printf ("are given at all. In this case, an occurrence of Cntl-Z will truncate the file,\n");
!   printf ("MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence\n");
!   printf ("stripping will not happen at all. Also the timestamp will not be alterated.\n");
  }

  int
  main(int argc, char **argv)
  {
!   int exit_status = NO_ERROR, i, repair_mode, strip_mode, verbose_mode, timestamp;
!   char* progname = strlwr(strdup(argv[0]));

+   if (argc < 2)
+   {
+     usage (progname);
+     exit(NO_ERROR);
+   }
+
+   repair_mode = strip_mode = verbose_mode = 0;   /* Default for */
+   timestamp = 1;                                 /* backward compatibility. */
+   i = 1;
+   while ((argc > i) && (argv[i][0] == '-') && argv[i][1])
+   {
+     switch (argv[i][1])
+     {
+       case 'h':
+         usage (progname);
+         exit(NO_ERROR);
+         break;
+       case 'r':
+         repair_mode = 1;
+         strip_mode = 0;
+         timestamp = 0;
+         break;
+       case 's':
+         strip_mode = 1;
+         repair_mode = 0;
+         timestamp = 0;
+         break;
+       case 't':
+         timestamp = 1;
+         break;
+       case 'v':
+         verbose_mode = 1;
+         break;
+     }
+     i++;
+   }
+
+   for (; i < argc; i++)
+     exit_status = dtou (argv[i], repair_mode, strip_mode, verbose_mode, timestamp);
+   return exit_status;
+ }
diff -acprNC5 djgpp.orig/src/util/utils.tex djgpp/src/util/utils.tex
*** djgpp.orig/src/util/utils.tex Wed Nov 22 23:44:24 2000
--- djgpp/src/util/utils.tex Thu Nov 23 01:22:58 2000
*************** so that they won't get mixed with the fi
*** 320,333 ****

  @c -----------------------------------------------------------------------------
  @node dtou, utod, djtar, Top
  @chapter dtou

  Each file specified on the command line is converted from dos's CR/LF
  text file mode to unix's NL text file mode.

! All djgpp wildcards are supported. Timestamps of the files are preserved.

  @c -----------------------------------------------------------------------------
  @node utod, gxx, dtou, Top
  @chapter utod

--- 320,433 ----

  @c -----------------------------------------------------------------------------
  @node dtou, utod, djtar, Top
  @chapter dtou

+ Usage: @code{dtou} [@code{-h}] [@code{-r}] [@code{-s}] [@code{-t}]
+ [@code{-v}] @file{files}
+
  Each file specified on the command line is converted from dos's CR/LF
  text file mode to unix's NL text file mode.

! All djgpp wildcards are supported. Timestamps of the files are preserved
! if the files are left unchanged.
!
! @strong{Options:}
!
! @table @code
!
! @item -h
!
! Displays a help text and exits.
!
! @item -r
!
! Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
! UNIX-style EOL (LF). It ignores Cntl-Z thus it will not truncate the file.
! CR sequences in front of LFs are left unchanged. This mode is intended
! for repairing files that have erroneously been transmited in text-mode
! instead of binary-mode during a FTP session.
!
! @item -s
!
! Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL (LF)
! and strips a CR sequence of arbitrary length from a file, if the sequence
! followed by a LF. CR sequences that are not followed by a LF are left
! unchanged.
!
! @item -t
!
! Timestamp. With this option the timestamp of file (modified or not modified)
! will be preserved.
!
! @item -v
!
! Verbose mode.
!
! @end table
!
! The program is backward compatible with previous program versions if no options
! are given at all. In this case, an occurrence of Cntl-Z will truncate the file,
! MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence
! stripping will not happen at all. Also the timestamp will not be alterated.
!
! The table below summarizes the exit status. When wildcards are used
! the exit status always refers to the last processed file.
!
! @strong{Exit status:}
!
! @enumerate 0
!
! @item
! File is unchanged.
! @item
! At least one CRLF to LF convertion has occurred in the processed file.
! @item
! At least one CR sequence has been removed from a LF in the processed file.
! @item
! At least one CR sequence has been removed from a LF and one CRLF to CR
! conversion has occurred in the processed file.
! @item
! Cntl-Z (software EOF) has occurred, thus the processed file has been truncated.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CRLF to LF convertion has occurred.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CR sequence has been removed from a LF in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! CRLF to LF convertion has occurred in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! CR sequence has been removed from a LF in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed from a LF and one CRLF to LF conversion has occurred
! in the processed file.
! @item
! At least one LF has ocurred without a preceeding CR and at least one
! Cntl-Z has occurred. The processed file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CRLF to LF
! convertion and one Cntl-Z has occurred in the processed file.
! The file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed, and one Cntl-Z has occurred in the processed file.
! The file has been truncated.
! @item
! At least one LF has ocurred without a preceeding CR, one CR sequence
! has been removed, CRLF to LF convertion and one Cntl-Z has occurred
! in the processed file. The file has been truncated.
! @end enumerate
! @enumerate 16
! @item
! Some I/O error occurred.
! @end enumerate

  @c -----------------------------------------------------------------------------
  @node utod, gxx, dtou, Top
  @chapter utod

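
For readers who want to check the strip-mode (-s) semantics and the flag bits described in this message without reading the whole patch, here is a minimal in-memory sketch. It is not the code from the patch: the helper name strip_mode_convert is hypothetical, it converts a single buffer rather than a file (so it needs none of the rewind logic), and it only borrows the flag values from the patch's exit code #defines.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values copied from the exit code #defines in the patch. */
enum {
  NO_CHANGE   = 0x00,
  CR_REMOVED  = 0x01,  /* a single CR in front of a LF was removed   */
  nCR_REMOVED = 0x02,  /* a run of several CRs in front of a LF      */
  CntlZ_EOF   = 0x04,  /* Cntl-Z seen, output truncated there        */
  LF_ONLY     = 0x08   /* a LF with no CR in front of it             */
};

/* Hypothetical helper, not part of the patch: convert buf[0..len) in
   place with strip-mode semantics, store the OR of the flags above in
   *flags and return the new length.  */
static size_t
strip_mode_convert(char *buf, size_t len, int *flags)
{
  size_t i = 0, k = 0;

  assert(buf != NULL && flags != NULL);
  *flags = NO_CHANGE;
  while (i < len)
    {
      if (buf[i] == 0x1A)   /* Cntl-Z: software EOF, stop copying. */
        {
          *flags |= CntlZ_EOF;
          break;
        }
      if (buf[i] == '\r')
        {
          size_t run = 0;   /* length of the run of CRs starting at i */

          while (i + run < len && buf[i + run] == '\r')
            run++;
          if (i + run < len && buf[i + run] == '\n')
            {
              /* CRs directly in front of a LF: drop the whole run. */
              *flags |= (run == 1) ? CR_REMOVED : nCR_REMOVED;
              buf[k++] = '\n';
              i += run + 1;
            }
          else
            {
              /* CRs not followed by a LF are copied unchanged. */
              while (run-- > 0)
                buf[k++] = buf[i++];
            }
          continue;
        }
      if (buf[i] == '\n')
        *flags |= LF_ONLY;  /* LF that had no CR in front of it. */
      buf[k++] = buf[i++];
    }
  return k;
}
```

Under these assumptions, the 8-byte input "a\r\r\r\nb\r\n" shrinks to "a\nb\n" with flags CR_REMOVED | nCR_REMOVED, which corresponds to exit status 3 in the table; a lone CR in the middle of a line is kept as it is.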