Mail Archives: cygwin/2003/04/22/01:44:19
Roman Belenov wrote:
> I found that cygwin tools can generate files with strange line
> endings in certain situations. I have a file (call it foo.txt) with
> dos-style line endings in a text-mounted directory. If I do
> gawk '{print;}' <foo.txt >bar.txt
> or
> cat foo.txt >bar.txt
> I get a copy of foo.txt. But if I do
> cat foo.txt | gawk '{print;}' >bar.txt
> I get 0xd doubled in the line separators (so lines are separated by 0xd
> 0xd 0xa in bar.txt).
>
> <disclaimer>
> This is just a bug report, I don't expect a timely reaction of any
> kind.
> </disclaimer>
>
> --
> With regards, Roman.
This is very interesting as I couldn't reproduce Roman's results at
all, although I did get some results that I didn't expect. Details
follow.
System: Win98SE
Cygwin: 1.3.22
Gawk: 3.1.2-2
$ echo "CYGWIN = $CYGWIN"
CYGWIN = tty
$ mount # output wrapped at col 72
C:\Cygwin\usr\X11R6\lib\X11\fonts on /usr/X11R6/lib/X11/fonts type
system (binmode)
C:\Cygwin\bin on /usr/bin type system (binmode)
C:\Cygwin\lib on /usr/lib type system (binmode)
C:\Cygwin on / type system (binmode)
a: on /cygdrive/a type user (textmode)
c: on /cygdrive/c type user (binmode,noumount)
d: on /cygdrive/d type user (binmode,noumount)
$ cd /cygdrive/a
The following 3 commands give the output that I expect.
$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ cat foo.txt | od -ba # same as above - as it should be: UUOC ;-)
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ cat foo.txt >bar.txt;od -ba bar.txt # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
However, this doesn't:
$ awk 1 foo.txt | od -ba
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
For a text mount I'd expect "\n" -> "\r\n" translation on output, but
it doesn't seem to be happening.
Other gawk Windows ports normally translate "\r\n" -> "\n" on input and
"\n" -> "\r\n" on output, unless the BINMODE variable is used. This is
so that gawk can work internally with "\n" as a line ending, but handle
the system's line endings correctly. [See gawk manual]
For the Cygwin port and a text mount I'd expect the same behaviour,
i.e., "\r\n" -> "\n" on input and "\n" -> "\r\n" on output, unless the
BINMODE variable was set.
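To make that expectation concrete, here is a minimal sketch of the two
translations using ordinary filters rather than Cygwin's own text-mode
layer. The helper names to_unix, to_dos and hex are made up for
illustration, and it assumes a shell where nothing else translates line
endings (for example a binmode mount or a Unix box). The last command
shows one way Roman's doubled 0xd could arise, namely output translation
applied to data whose "\r" was never stripped on input. That is a guess
at the mechanism, not something I have confirmed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Cygwin or gawk) that imitate
# text-mode line-end translation with ordinary filters.

# Input side: strip one trailing CR from each line ("\r\n" -> "\n").
to_unix() { awk '{ sub(/\r$/, ""); print }'; }

# Output side: terminate every line with CR LF ("\n" -> "\r\n").
to_dos() { awk '{ printf "%s\r\n", $0 }'; }

# Print stdin as one string of hex bytes, followed by a newline.
hex() { od -An -v -tx1 | tr -d ' \n'; echo; }

printf '1\r\n2\r\n' | to_unix | hex   # 310a320a
printf '1\n2\n'     | to_dos  | hex   # 310d0a320d0a
# Output translation without input translation doubles the CR:
printf '1\r\n2\r\n' | to_dos  | hex   # 310d0d0a320d0d0a
```

Chaining to_unix and then to_dos round-trips a dos-style file unchanged,
which is the behaviour I would expect from gawk on a text mount.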
Next I took a file on the text mount with unix line endings:
$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ cat unixle.txt | od -ba # no surprise here
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ cat unixle.txt >bar.txt;od -ba bar.txt # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ awk 1 unixle.txt | od -ba # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
The results of the above 2 commands again seem odd to me, as I would
expect the output to be "\r\n" terminated.
I re-read the rules in the Cygwin manual about line end translation and
tried this:
$ od -ba a:foo.txt # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ awk 1 a:foo.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
But:
$ awk 1 a:foo.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
As the manual says, if you use a path for the file that includes a drive
letter then the file is treated as being on a text mount, but shouldn't
we get the same output without the drive letter, since /cygdrive/a is
text mounted?
Interestingly (still on the text mounted /cygdrive/a):
$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ awk 1 a:unixle.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ awk 1 a:unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ awk 1 unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
These are as I would expect, given the manual's rules.
How about a bin mount, I thought. So:
$ cd ~
$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ cat foo.txt | od -ba
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ cat foo.txt >bar.txt;od -ba bar.txt # mmm should cat translate?
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl
0000017
$ awk 1 foo.txt | od -ba # well awk does ...
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ awk 1 foo.txt >bar.txt;od -ba bar.txt # ... however you do it
0000000 061 012 062 012 063 012 064 012 065 012
1 nl 2 nl 3 nl 4 nl 5 nl
0000012
$ # yes, I know the last two should work the same.
So it seems that with gawk on a bin mount we get line end translation
on output, but not on a text mount, unless you force Cygwin to do it by
using a drive letter in the file path.
Or am I missing something significant in the documentation?
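For anyone who wants to repeat these tests without reading od dumps by
eye, something like the following could classify a file's line endings.
It is only a sketch: eol_kind is a name I made up, and it reports the
first pattern it finds in the order "cr cr nl", "cr nl", "nl", so a file
with mixed endings gets the most unusual label.

```shell
#!/bin/sh
# eol_kind FILE: hypothetical helper that names the line-ending style
# of FILE from its hex dump.
eol_kind() {
    # One space-separated hex byte per position, e.g. " 31 0d 0a ".
    # -v stops od from collapsing repeated lines into "*".
    bytes=$(od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' ')
    case "$bytes " in
        *" 0d 0d 0a "*) echo "cr cr nl" ;;
        *" 0d 0a "*)    echo "cr nl" ;;
        *" 0a "*)       echo "nl" ;;
        *)              echo "none" ;;
    esac
}

# Small demonstration on a throwaway file.
tmp=${TMPDIR:-/tmp}/eol_demo.$$
printf '1\r\r\n2\r\r\n' >"$tmp"
eol_kind "$tmp"   # cr cr nl
rm -f "$tmp"
```

After each redirection in the tests above, running eol_kind bar.txt would
give the same information as od -ba bar.txt in one line.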
Peter S Tillier
"Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?"