From: "Peter S Tillier"
To: "Roman Belenov"
Subject: Re: Erroneous line endings (cat,gawk,text mount)
Date: Tue, 22 Apr 2003 06:43:55 +0100

Roman Belenov wrote:
> I encountered that cygwin tools can generate file with strange line
> endings in certain situation. I have a file (name it foo.txt) with
> dos-style line endings in text mounted directory. If I do
> gawk {print;} bar.txt
> or
> cat foo.txt >bar.txt
> I get a copy of foo.txt. But if I do
> cat foo.txt | gawk {print;} >bar.txt
> I get 0xd doubled in line separators (so lines are separated with 0xd
> 0xd 0xa in bar.txt).
>
>
> This is just a bug report, I don't expect timely reaction of any
> kind.
>
>
> --
> With regards, Roman.

This is very interesting as I couldn't reproduce Roman's results at
all, although I did get some results that I didn't expect. Details
follow.

System: Win98SE
Cygwin: 1.3.22
Gawk:   3.1.2-2

$ echo "CYGWIN = $CYGWIN"
CYGWIN = tty
$ mount # output wrapped at col 72
C:\Cygwin\usr\X11R6\lib\X11\fonts on /usr/X11R6/lib/X11/fonts type
system (binmode)
C:\Cygwin\bin on /usr/bin type system (binmode)
C:\Cygwin\lib on /usr/lib type system (binmode)
C:\Cygwin on / type system (binmode)
a: on /cygdrive/a type user (textmode)
c: on /cygdrive/c type user (binmode,noumount)
d: on /cygdrive/d type user (binmode,noumount)
$ cd /cygdrive/a

The following 3 commands give the output that I expect.
$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ cat foo.txt | od -ba # same as above - as it should be: UUOC ;-)
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ cat foo.txt >bar.txt;od -ba bar.txt # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

However, this one doesn't:

$ awk 1 foo.txt | od -ba
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

For a text mount I'd expect "\n" -> "\r\n" translation on output, but
it doesn't seem to be happening.

Other gawk Windows ports normally translate "\r\n" -> "\n" on input and
"\n" -> "\r\n" on output, unless the BINMODE variable is used. This is
so that gawk can work internally with "\n" as a line ending but still
handle the system's line endings correctly. [See gawk manual]

For the Cygwin port and a text mount I'd expect the same behaviour,
i.e., "\r\n" -> "\n" on input and "\n" -> "\r\n" on output, unless the
BINMODE variable was set.

Next I took a file on the text mount with unix line endings:

$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ cat unixle.txt | od -ba # no surprise here
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ cat unixle.txt >bar.txt;od -ba bar.txt # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ awk 1 unixle.txt | od -ba # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

The results of the last 2 commands again seem odd to me, as I would
expect their output to be "\r\n" terminated.
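
To make the rules above concrete, here is a rough Python model of them. It is only a sketch of my reading of those rules. text_read, text_write, binary and od_ba are hypothetical helpers, not Cygwin or gawk code, and od_ba only approximates the layout of `od -ba`. The model shows the round trip I'd expect on a text mount. It also shows that keeping the "\r" on input while expanding "\n" on output is one way to end up with the "\r\r\n" endings Roman reported.

```python
# Rough model of the text/binary line-ending rules described above.
# text_read, text_write, binary and od_ba are hypothetical helpers for
# illustration only; they are not Cygwin or gawk code.

CR, LF = 0x0D, 0x0A
OD_NAMES = {CR: "cr", LF: "nl"}  # od -a names for the two control bytes used here


def text_read(raw: bytes) -> bytes:
    """Text-mode input: each CR LF pair becomes a single LF."""
    return raw.replace(b"\r\n", b"\n")


def text_write(data: bytes) -> bytes:
    """Text-mode output: every LF is written as CR LF."""
    return data.replace(b"\n", b"\r\n")


def binary(data: bytes) -> bytes:
    """Binary mode: bytes pass through unchanged."""
    return data


def od_ba(data: bytes) -> str:
    """Approximate `od -ba` output, 16 bytes per line.

    Only letters, digits, CR and LF get correct -a names, and od's "*"
    folding of repeated lines is not modelled.
    """
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        lines.append(f"{off:07o} " + " ".join(f"{b:03o}" for b in chunk))
        names = "".join(f"{OD_NAMES.get(b, chr(b)):>4}" for b in chunk)
        lines.append(" " * 7 + names)
    lines.append(f"{len(data):07o}")  # final offset = byte count in octal
    return "\n".join(lines)


dos = b"1\r\n2\r\n3\r\n4\r\n5\r\n"  # same bytes as foo.txt
unix = b"1\n2\n3\n4\n5\n"           # same bytes as unixle.txt

# What I'd expect from `awk 1` on a text mount: "\r\n" endings either way.
print(od_ba(text_write(text_read(dos))))
print(od_ba(text_write(text_read(unix))))

# Keeping the "\r" on input but expanding "\n" on output gives "\r\r\n",
# i.e. the 0xd 0xd 0xa separators in Roman's report.
print(od_ba(text_write(binary(dos))))
```

Under this model, if the text-mode output translation were applied to `awk 1 unixle.txt >bar.txt`, bar.txt would end up byte-for-byte identical to foo.txt.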

I re-read the rules in the Cygwin manual about line end translation and
tried this:

$ od -ba a:foo.txt # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ awk 1 a:foo.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

But:

$ awk 1 a:foo.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

As the manual says, if you use a path that includes a drive letter, the
file is treated as text mode. But shouldn't we get the same output
without the drive letter, since /cygdrive/a is text mounted?

Interestingly (still on the text mounted /cygdrive/a):

$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ awk 1 a:unixle.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ awk 1 a:unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ awk 1 unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

These are as I would expect, given the manual's rules.

What about a bin mount, I thought. So:

$ cd ~
$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ cat foo.txt | od -ba
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ cat foo.txt >bar.txt;od -ba bar.txt # mmm should cat translate?
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017
$ awk 1 foo.txt | od -ba # well awk does ...
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ awk 1 foo.txt >bar.txt;od -ba bar.txt # ... however you do it
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012
$ # yes, I know the last two should work the same.

So it seems that with gawk on a bin mount we get "\r\n" -> "\n" line
end translation. On a text mount, however, we don't get the
"\n" -> "\r\n" translation on output that I'd expect, unless you force
Cygwin to do it by using a drive letter in the file path.

Or am I missing something significant in the documentation?

Peter S Tillier
"Who needs perl when you can write dc, sokoban, arkanoid and an
unlambda interpreter in sed?"