Mail Archives: djgpp-workers/2000/11/23/02:08:27
Date: Wed, 8 Nov 2000 09:43:15 +0200 (WET)
From: Andris Pavenis <pavenis AT lanet DOT lv>
> One additional suggestion: There is a small DOS utility in Simtelnet
> (simtelnet/msdos/fileutils/nocrlf10.zip) which repairs binary files
> that were erroneously transferred as text (it was DOS only so
> no LFN support, of course). Now the only change we need for that in dtou is
> to skip Ctrl-Z processing. My suggestion is to do that only if the
> executable name is nocrlf. So one can do:
Date: Wed, 8 Nov 2000 13:27:53 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
> And, if we are talking about adding features to DTOU, here's a small
> wishlist:
>
> - add verbose operation option, whereby the program will print whether
> it removed any CR's and ^Z's, and whether some lines had LF without a
> CR (a sure sign the file is either binary or has inconsistent EOL
> format for some other reason).
>
> - add an option which will remove any number of CRs before an LF, as in
> "\r\r\r\r\r\n" (this happens with buggy ports of Unix software, such
> as the Windows CVS client, which always blindly add a CR to LF, even
> if there's already a CR there).
>
> - return an exit status which says whether any changes were done to the
> file.
>
> - add an option which causes the file time stamps to be preserved only
> if the file was left unchanged.
>
> - explain more about how these two programs work in utils.tex.
I have added 5 command-line options to dtou:
-h: Displays a help text and exits.
-r: Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
UNIX-style EOL (LF). It ignores Cntl-Z, so it will not truncate the file.
CR sequences in front of LFs are left unchanged. Here a CR sequence is the
run of CRs in front of a LF, not counting the last CR of the run. This last
CR together with the LF forms the MSDOS-style EOL (CRLF). This implies that
if there are n CRs followed by a LF, the CR sequence is only n-1 CRs long.
This mode is intended for repairing files that have erroneously been
transmitted in text mode instead of binary mode during an FTP session.
-s: Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL (LF)
and strips a CR sequence of arbitrary length from a file, if the sequence
is followed by a LF. CR sequences that are not followed by a LF are left
unchanged.
-t: Timestamp. With this option the timestamp of a file (modified or not)
will be preserved.
-v: Verbose mode. This mode outputs some information during file processing.
All possible output looks like:
File: foo.c
File unchanged.
At least one CRLF to LF transformation.
Warning: At least one CR sequence stripped from a LF.
Warning: At least one Cntl-Z. File truncated at line n.
Warning: At least one LF without a preceding CR.
The program is backward compatible with previous program versions if no options
are given at all. In this case, an occurrence of Cntl-Z will truncate the file,
MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence
stripping will not happen at all. Also the timestamp will not be altered.
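
The three behaviours described above (default, -r and -s) can be illustrated with a small in-memory sketch. This is not the buffered, lseek-based code from the patch below; it only follows the prose description, and the names convert_eol and eol_mode are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; they do not appear in dtou.c. */
enum eol_mode { MODE_DEFAULT, MODE_REPAIR, MODE_STRIP };

/* Convert n bytes from in[] to out[] (out must hold at least n bytes).
   Returns the number of bytes written to out[]. */
size_t
convert_eol (const unsigned char *in, size_t n, unsigned char *out,
             enum eol_mode mode)
{
  size_t i = 0, k = 0;

  while (i < n)
    {
      /* Cntl-Z acts as software EOF except in repair mode. */
      if (in[i] == 0x1A && mode != MODE_REPAIR)
        break;

      if (in[i] == 0x0D)
        {
          size_t run = 0, keep;

          /* Measure the run of consecutive CRs starting here. */
          while (i + run < n && in[i + run] == 0x0D)
            run++;

          if (i + run < n && in[i + run] == 0x0A)
            /* The run ends in a LF: the last CR belongs to the CRLF pair
               and is always dropped; strip mode also drops the other
               run-1 CRs. */
            keep = (mode == MODE_STRIP) ? 0 : run - 1;
          else
            /* No LF follows: the CRs are left unchanged. */
            keep = run;

          memset (out + k, 0x0D, keep);
          k += keep;
          i += run;
          continue;
        }

      out[k++] = in[i++];
    }
  return k;
}
```

With this sketch, the input "a\r\r\nb" becomes "a\r\nb" in default and repair mode and "a\nb" in strip mode, while a Cntl-Z in the input ends the output early in every mode except repair mode.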
The table below summarizes the exit status. The status is the bitwise OR of
the flags defined in dtou.c (1: CRLF to LF conversion, 2: CR sequence removed
from a LF, 4: Cntl-Z truncated the file, 8: LF without a preceding CR):
0: File is unchanged.
1: At least one CRLF to LF conversion has occurred in the processed file.
2: At least one CR sequence has been removed from a LF in the processed file.
3: At least one CR sequence has been removed from a LF and one CRLF to LF
conversion has occurred in the processed file.
4: Cntl-Z (software EOF) has occurred, thus the processed file has been truncated.
5: Cntl-Z has occurred, thus the processed file has been truncated and at least
one CRLF to LF conversion has occurred.
6: Cntl-Z has occurred, thus the processed file has been truncated and at least
one CR sequence has been removed from a LF in the processed file.
7: Cntl-Z has occurred, thus the processed file has been truncated, and at least
one CR sequence has been removed from a LF and one CRLF to LF conversion
has occurred.
8: At least one LF has occurred without a preceding CR in the processed file.
9: At least one LF has occurred without a preceding CR and at least one
CRLF to LF conversion has occurred in the processed file.
10: At least one LF has occurred without a preceding CR and at least one
CR sequence has been removed from a LF in the processed file.
11: At least one LF has occurred without a preceding CR, one CR sequence
has been removed from a LF and one CRLF to LF conversion has occurred
in the processed file.
12: At least one LF has occurred without a preceding CR and at least one
Cntl-Z has occurred. The processed file has been truncated.
13: At least one LF has occurred without a preceding CR, one CRLF to LF
conversion and one Cntl-Z has occurred in the processed file.
The file has been truncated.
14: At least one LF has occurred without a preceding CR, one CR sequence
has been removed and one Cntl-Z has occurred in the processed file.
The file has been truncated.
15: At least one LF has occurred without a preceding CR, one CR sequence
has been removed, one CRLF to LF conversion and one Cntl-Z has occurred
in the processed file. The file has been truncated.
16: Some I/O error occurred.
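
Since dtou.c builds the exit status by OR-ing the flag macros CR_REMOVED, nCR_REMOVED, CntlZ_EOF and LF_ONLY (and returns IO_ERROR on failure), a caller could decode a status value roughly as follows. The flag values are copied from the patch below; the helper describe_status and its output words are hypothetical and only meant as a sketch.

```c
#include <stdio.h>
#include <string.h>

/* Flag values as defined in the patched dtou.c. */
#define CR_REMOVED  0x01  /* Single CR removed from a LF. */
#define nCR_REMOVED 0x02  /* Multiple CRs removed from a LF. */
#define CntlZ_EOF   0x04  /* ^Z as EOF appeared. */
#define LF_ONLY     0x08  /* A LF without a preceding CR appeared. */
#define IO_ERROR    0x10  /* Some I/O error occurred. */

/* Hypothetical helper: write a space-separated description of STATUS
   into BUF (SIZE bytes, SIZE > 0) and return BUF. */
const char *
describe_status (int status, char *buf, size_t size)
{
  size_t len;

  if (status == 0)
    {
      snprintf (buf, size, "unchanged");
      return buf;
    }

  /* Each set bit contributes one word followed by a space. */
  snprintf (buf, size, "%s%s%s%s%s",
            (status & CR_REMOVED)  ? "crlf-converted " : "",
            (status & nCR_REMOVED) ? "cr-sequence-stripped " : "",
            (status & CntlZ_EOF)   ? "ctrl-z-truncated " : "",
            (status & LF_ONLY)     ? "bare-lf-seen " : "",
            (status & IO_ERROR)    ? "io-error " : "");

  /* Drop the trailing space left by the last word. */
  len = strlen (buf);
  if (len > 0 && buf[len - 1] == ' ')
    buf[len - 1] = '\0';
  return buf;
}
```

For example, a status of 5 decodes to "crlf-converted ctrl-z-truncated", which corresponds to entry 5 in the table.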
I have tested it on DOS only. There has been no Linux/Unix testing at all, but I have
used only POSIX functions, so there should be no difficulties when compiling and
running dtou under Unix.
Comments, objections, suggestions, etc. are welcome.
Regards,
Guerrero, Juan
diff -acprNC5 djgpp.orig/src/util/dtou.c djgpp/src/util/dtou.c
*** djgpp.orig/src/util/dtou.c Wed Nov 22 23:43:52 2000
--- djgpp/src/util/dtou.c Thu Nov 23 01:22:58 2000
***************
*** 12,98 ****
#ifndef O_BINARY
#define O_BINARY 0
#endif
static int
! dtou(char *fname)
{
! int i, k, k2, sf, df, l, l2=0, err=0, isCR=0;
! char buf[16384];
char tfname[FILENAME_MAX], *bn, *w;
struct stat st;
struct utimbuf tim1;
! sf = open(fname, O_RDONLY|O_BINARY);
if (sf < 1)
{
! perror(fname);
! return 1;
}
fstat (sf,&st);
tim1.actime = st.st_atime;
tim1.modtime = st.st_mtime;
strcpy (tfname, fname);
! for (bn=w=tfname; *w; w++)
! if (*w=='/' || *w=='\\' || *w==':')
bn = w+1;
if (bn) *bn=0;
! strcat (tfname,"utod.tm$");
! df = open(tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
if (df < 1)
{
! perror(tfname);
! close(sf);
! return 1;
}
! k2=0;
! while ((l=read(sf, buf, 16384)) > 0)
{
! int CtrlZ=0;
! for (i=k=0; i<l; i++)
{
! if (isCR && buf[i]!=0x0A) buf[k++] = 0x0D;
! if (buf[i]==0x0D) { isCR=1; continue; }
! if (buf[i]==0x1A) { CtrlZ=1; break; }
! else buf[k++] = buf[i];
! isCR = 0;
}
! l2=(k>0 ? write(df, buf, k) : 0);
! if (l2<0 || CtrlZ) break;
! if (l2!=k) { err=1; break; }
}
! if (l<0) perror (fname);
! if (l2<0) perror (tfname);
! if (err) fprintf (stderr,"Cannot process file %s\n",fname);
! close(sf);
! close(df);
! if (l>=0 && l2>=0 && err==0)
{
! remove(fname);
! rename(tfname, fname);
! utime(fname, &tim1);
! chown(fname, st.st_uid, st.st_gid);
! chmod(fname, st.st_mode);
}
! else
! {
! remove(tfname);
! }
! return 0;
}
int
main(int argc, char **argv)
{
! int rv = 0;
! for (argc--, argv++; argc; argc--, argv++)
! rv += dtou(*argv);
! return rv;
! }
--- 12,292 ----
#ifndef O_BINARY
#define O_BINARY 0
#endif
+ #define IS_DIR_SEPARATOR(path) ((path) == '/' || (path) == '\\' || (path) == ':')
+ #define IS_LAST_IN_BUF (i == l - 1)
+ #define IS_LAST_IN_FILE (position + i + 1 == st.st_size)
+ #define SET_FLAG(flag) \
+ do { \
+ if ((flag) == 0) (flag) = 1; \
+ } while (0)
+ #define BUF_SIZE 16384
+
+ /* Control characters. */
+ #define LF 0x0A
+ #define CR 0x0D
+ #define CntlZ 0x1A
+
+ /* Exit codes. */
+ #define NO_CHANGE 0x00 /* No changes at all have been done to the file. */
+ #define CR_REMOVED 0x01 /* Single CR removed from a LF. */
+ #define nCR_REMOVED 0x02 /* Multiple CRs removed from a LF. */
+ #define CntlZ_EOF 0x04 /* ^Z as EOF appeared. */
+ #define LF_ONLY 0x08 /* A LF without a preceding CR appeared. */
+
+ #define NO_ERROR 0x00
+ #define IO_ERROR 0x10 /* Some I/O error occurred. */
+
+
static int
! dtou(char *fname, int r_mode, int s_mode, int v_mode, int t_mode)
{
! int i, k, sf, df, l, l2 = 0, is_CR = 0, is_nCR = 0, is_CR_sequence = 0;
! int CntlZ_flag = 0, CR_flag = 0, nCR_flag = 0, LF_flag = 0, exit_status = NO_CHANGE;
! int buf_counter, nbufs, LF_counter, must_rewind, position, offset, whence;
! char buf[BUF_SIZE];
char tfname[FILENAME_MAX], *bn, *w;
struct stat st;
struct utimbuf tim1;
!
! sf = open (fname, O_RDONLY|O_BINARY);
if (sf < 1)
{
! perror (fname);
! return IO_ERROR;
}
fstat (sf,&st);
tim1.actime = st.st_atime;
tim1.modtime = st.st_mtime;
+ nbufs = st.st_size / BUF_SIZE;
strcpy (tfname, fname);
! for (bn = w = tfname; *w; w++)
! if (IS_DIR_SEPARATOR (*w))
bn = w+1;
if (bn) *bn=0;
! strcat (tfname,"dtou.tm$");
! df = open (tfname, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0644);
if (df < 1)
{
! perror (tfname);
! close (sf);
! return IO_ERROR;
}
! buf_counter = LF_counter = must_rewind = position = 0;
! if (s_mode)
! {
! offset = 0;
! whence = SEEK_SET;
! }
! else
! {
! offset = -1;
! whence = SEEK_CUR;
! }
! while ((l = read (sf, buf, BUF_SIZE)) > 0)
{
! for (i = k = 0; i < l; i++)
! {
! if (!r_mode)
! if (buf[i] == CntlZ) { SET_FLAG (CntlZ_flag); break; }
! if (s_mode)
{
! if (buf[i] == LF)
! {
! if (!(is_CR || is_nCR)) SET_FLAG (LF_flag);
! if (is_nCR) { SET_FLAG (nCR_flag); is_nCR = 0; }
! if (is_CR) { SET_FLAG (CR_flag); is_CR = 0; }
! LF_counter++;
! offset = must_rewind = 0;
! buf[k++] = buf[i]; continue;
! }
! if (is_CR_sequence)
! {
! if (buf[i] == CR) { buf[k++] = buf[i]; continue; }
! else is_CR_sequence = 0;
! }
! if (is_nCR)
! {
! if (buf[i] != CR || IS_LAST_IN_FILE)
! {
! is_CR_sequence = must_rewind = 1;
! is_nCR = 0; break;
! }
! else
! continue;
! }
! if (is_CR && buf[i] == CR) { is_nCR = 1; is_CR = 0; continue; }
! if (buf[i] == CR)
! {
! if (IS_LAST_IN_FILE) { buf[k++] = buf[i]; break; }
! is_CR = must_rewind = 1;
! offset = position + i;
! continue;
! }
}
! else
! {
! if (buf[i] == LF)
! {
! if (is_CR) SET_FLAG (CR_flag);
! if (!is_CR) SET_FLAG (LF_flag);
! LF_counter++;
! }
! if (is_CR && buf[i] != LF) buf[k++] = CR;
! if (buf[i] == CR)
! {
! if (IS_LAST_IN_BUF)
! {
! if (buf_counter < nbufs) must_rewind = 1;
! else buf[k++] = CR;
! }
! is_CR = 1; continue;
! }
! is_CR = 0;
! }
! buf[k++] = buf[i];
! }
!
! is_CR = 0;
! buf_counter++;
! position += l;
! /* Last character/s in buf are CR/s.
! Push it/them back and reread it/them next time. */
! if (must_rewind)
! {
! position = lseek (sf, offset, whence);
! must_rewind = 0;
! }
!
! l2 = (k > 0 ? write (df, buf, k) : 0);
! if (l2 < 0 || CntlZ_flag) break;
! if (l2 != k) { exit_status = IO_ERROR; break; }
}
! if (l < 0) perror (fname);
! if (l2 < 0) perror (tfname);
! if (exit_status != NO_ERROR)
! fprintf (stderr,"Cannot process file %s\n",fname);
! close (sf);
! close (df);
! if (l >= 0 && l2 >= 0 && exit_status == NO_ERROR)
{
! remove (fname);
! rename (tfname, fname);
! chown (fname, st.st_uid, st.st_gid);
! chmod (fname, st.st_mode);
! if (t_mode)
! utime (fname, &tim1);
! if (v_mode)
! printf ("File: %s\n",fname);
! if (CR_flag)
! {
! exit_status |= CR_REMOVED;
! if (v_mode)
! printf ("At least one CRLF to LF transformation.\n");
! }
! if (nCR_flag)
! {
! exit_status |= nCR_REMOVED;
! if (v_mode)
! printf ("Warning: At least one CR sequence stripped from a LF.\n");
! }
! if (CntlZ_flag)
! {
! exit_status |= CntlZ_EOF;
! if (v_mode)
! printf ("Warning: At least one Cntl-Z. File truncated at line %i.\n", LF_counter);
! }
! if (LF_flag)
! {
! exit_status |= LF_ONLY;
! if (v_mode)
! printf ("Warning: At least one LF without a preceding CR.\n");
! }
! if (v_mode && exit_status == NO_CHANGE)
! printf ("File unchanged.\n");
}
! else
! remove (tfname);
!
! return exit_status;
! }
!
! void
! usage(char *progname)
! {
! printf ("Usage: %s [-h] [-r] [-s] [-t] [-v] files...\n\n", progname);
! printf ("Options are:\n");
! printf (" -h: Display this help and exit.\n");
! printf (" -r: repair mode. Transform MSDOS-style EOL (CRLF) into\n");
! printf (" UNIX-style EOL (LF).\n");
! printf (" Cntl-Z is ignored and will not truncate the file and\n");
! printf (" CR sequences in front of LF will be left unchanged.\n");
! printf (" -s: strip mode. Transform MSDOS-style EOL (CRLF) into\n");
! printf (" UNIX-style EOL (LF) and strip a CR sequence of\n");
! printf (" arbitrary length from the file, if and only if\n");
! printf (" the sequence is followed by LF. CR sequences that\n");
! printf (" are not followed by LF are always left unchanged.\n");
! printf (" -t: timestamp. The timestamp of the file (modified or\n");
! printf (" not modified) will be preserved.\n");
! printf (" -v: verbose mode.\n\n");
! printf ("The program is backward compatible with previous program versions if no options\n");
! printf ("are given at all. In this case, an occurrence of Cntl-Z will truncate the file,\n");
! printf ("MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence\n");
! printf ("stripping will not happen at all. Also the timestamp will not be altered.\n");
}
int
main(int argc, char **argv)
{
! int exit_status = NO_ERROR, i, repair_mode, strip_mode, verbose_mode, timestamp;
! char* progname = strlwr(strdup(argv[0]));
+ if (argc < 2)
+ {
+ usage (progname);
+ exit(NO_ERROR);
+ }
+
+ repair_mode = strip_mode = verbose_mode = 0; /* Default for */
+ timestamp = 1; /* backward compatibility. */
+ i = 1;
+ while ((argc > i) && (argv[i][0] == '-') && argv[i][1])
+ {
+ switch (argv[i][1])
+ {
+ case 'h':
+ usage (progname);
+ exit(NO_ERROR);
+ break;
+ case 'r':
+ repair_mode = 1;
+ strip_mode = 0;
+ timestamp = 0;
+ break;
+ case 's':
+ strip_mode = 1;
+ repair_mode = 0;
+ timestamp = 0;
+ break;
+ case 't':
+ timestamp = 1;
+ break;
+ case 'v':
+ verbose_mode = 1;
+ break;
+ }
+ i++;
+ }
+
+ for (; i < argc; i++)
+ exit_status = dtou (argv[i], repair_mode, strip_mode, verbose_mode, timestamp);
+ return exit_status;
+ }
diff -acprNC5 djgpp.orig/src/util/utils.tex djgpp/src/util/utils.tex
*** djgpp.orig/src/util/utils.tex Wed Nov 22 23:44:24 2000
--- djgpp/src/util/utils.tex Thu Nov 23 01:22:58 2000
*************** so that they won't get mixed with the fi
*** 320,333 ****
@c -----------------------------------------------------------------------------
@node dtou, utod, djtar, Top
@chapter dtou
Each file specified on the command line is converted from dos's CR/LF
text file mode to unix's NL text file mode.
! All djgpp wildcards are supported. Timestamps of the files are preserved.
@c -----------------------------------------------------------------------------
@node utod, gxx, dtou, Top
@chapter utod
--- 320,433 ----
@c -----------------------------------------------------------------------------
@node dtou, utod, djtar, Top
@chapter dtou
+ Usage: @code{dtou} [@code{-h}] [@code{-r}] [@code{-s}] [@code{-t}]
+ [@code{-v}] @file{files}
+
Each file specified on the command line is converted from dos's CR/LF
text file mode to unix's NL text file mode.
! All djgpp wildcards are supported. Timestamps of the files are preserved
! if the files are left unchanged.
!
! @strong{Options:}
!
! @table @code
!
! @item -h
!
! Displays a help text and exits.
!
! @item -r
!
! Repair mode. This mode transforms MSDOS-style EOL (CRLF) into
! UNIX-style EOL (LF). It ignores Cntl-Z, so it will not truncate the file.
! CR sequences in front of LFs are left unchanged. This mode is intended
! for repairing files that have erroneously been transmitted in text mode
! instead of binary mode during an FTP session.
!
! @item -s
!
! Strip mode. It transforms MSDOS-style EOL (CRLF) into UNIX-style EOL (LF)
! and strips a CR sequence of arbitrary length from a file, if the sequence
! is followed by a LF. CR sequences that are not followed by a LF are left
! unchanged.
!
! @item -t
!
! Timestamp. With this option the timestamp of a file (modified or not modified)
! will be preserved.
!
! @item -v
!
! Verbose mode.
!
! @end table
!
! The program is backward compatible with previous program versions if no options
! are given at all. In this case, an occurrence of Cntl-Z will truncate the file,
! MSDOS-style EOL (CRLF) are transformed into UNIX-style EOL (LF) and CR sequence
! stripping will not happen at all. Also the timestamp will not be altered.
!
! The table below summarizes the exit status. When wildcards are used
! the exit status always refers to the last processed file.
!
! @strong{Exit status:}
!
! @enumerate 0
!
! @item
! File is unchanged.
! @item
! At least one CRLF to LF conversion has occurred in the processed file.
! @item
! At least one CR sequence has been removed from a LF in the processed file.
! @item
! At least one CR sequence has been removed from a LF and one CRLF to LF
! conversion has occurred in the processed file.
! @item
! Cntl-Z (software EOF) has occurred, thus the processed file has been truncated.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CRLF to LF conversion has occurred.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated and at least
! one CR sequence has been removed from a LF in the processed file.
! @item
! Cntl-Z has occurred, thus the processed file has been truncated, and at least
! one CR sequence has been removed from a LF and one CRLF to LF conversion has occurred.
! @item
! At least one LF has occurred without a preceding CR in the
! processed file.
! @item
! At least one LF has occurred without a preceding CR and at least one
! CRLF to LF conversion has occurred in the processed file.
! @item
! At least one LF has occurred without a preceding CR and at least one
! CR sequence has been removed from a LF in the processed file.
! @item
! At least one LF has occurred without a preceding CR, one CR sequence has been
! removed from a LF and one CRLF to LF conversion has occurred in the processed file.
! @item
! At least one LF has occurred without a preceding CR and at least one
! Cntl-Z has occurred. The processed file has been truncated.
! @item
! At least one LF has occurred without a preceding CR, one CRLF to LF conversion
! and one Cntl-Z has occurred in the processed file. The file has been truncated.
! @item
! At least one LF has occurred without a preceding CR, one CR sequence has been
! removed and one Cntl-Z has occurred in the processed file. The file has been truncated.
! @item
! At least one LF has occurred without a preceding CR, one CR sequence has been removed,
! one CRLF to LF conversion and one Cntl-Z has occurred. The file has been truncated.
! @end enumerate
! @enumerate 16
! @item
! Some I/O error occurred.
! @end enumerate
@c -----------------------------------------------------------------------------
@node utod, gxx, dtou, Top
@chapter utod