Mail Archives: cygwin/2009/10/07/12:24:53

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 tests=BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4ACCC0BC.4050204@freesbee.fr>
Date: Wed, 07 Oct 2009 18:24:28 +0200
From: Vincent Rivière <vincent DOT riviere AT freesbee DOT fr>
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Additional carriage return added by cygwin commands to DOS text files
References: <loom DOT 20091007T161054-245 AT post DOT gmane DOT org> <4ACCB085 DOT 3070304 AT freesbee DOT fr> <4ACCB4AE DOT 8030409 AT freesbee DOT fr> <loom DOT 20091007T174555-684 AT post DOT gmane DOT org>
In-Reply-To: <loom.20091007T174555-684@post.gmane.org>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

ttjqryfbndgdx wrote:
> Note that I don't have the issue with cat.
> bash-3.2$ cat test1 > test2
> bash-3.2$ xxd test2
> 0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

"cat" treats its input and output as binary.
So "cat a > b" is always equivalent to "cp a b".

Now, if you think that cat should treat the files as text (telling 
Cygwin to remove the CRs on input and add them back on output), then 
there is an error on input (the CRs are not removed)
and an error on output (they are not added back).
The two errors cancel each other out, so the result is still good.
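
A minimal sketch of that equivalence, which can be checked anywhere since it 
involves no translation at all. The scratch directory and the name test3 are 
only illustrative; test1 and test2 follow the quoted session.

```shell
# Scratch directory and file names are illustrative.
tmp=$(mktemp -d)
printf 'aaa\r\nbbb\r\n' > "$tmp/test1"    # DOS text file: CR LF line endings

cat "$tmp/test1" > "$tmp/test2"           # binary in, binary out
cp  "$tmp/test1" "$tmp/test3"             # plain byte-for-byte copy

# cmp is silent and returns 0 when the two files are byte-identical.
cmp "$tmp/test2" "$tmp/test3" && echo "cat and cp produced the same bytes"
```

The scratch directory is left in place; remove it with rm -r "$tmp" when done.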

> I don't have it with sort used alone :
> bash-3.2$ /usr/bin/sort test1 > test2
> bash-3.2$ xxd test2
> 0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

"sort" opens both its input and its output as text; it is what I call a "good 
text filter", like "more".

> But get it when using sort in a pipe with cat :
> bash-3.2$ cat test1 | /usr/bin/sort > test2
> bash-3.2$ xxd test2
> 0000000: 6161 610d 0d0a 6262 620d 0d0a            aaa...bbb...

"cat" opens test1 in binary: error on input.
The unexpected CRs go into cat's memory, then into the pipe, then into 
sort's memory, then into the output file, where additional CRs are 
inserted because sort uses text-mode output.
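
The doubling can be imitated on any system with a small model. This is only a 
sketch: text_write is a hypothetical helper that stands in for Cygwin's 
text-mode output (it appends CR LF to every line it is given), and hex is a 
hypothetical helper that prints the bytes as hex digits. Neither is a Cygwin 
command.

```shell
# Hypothetical stand-in for text-mode output: end every line with CR LF.
text_write() { awk '{ printf "%s\r\n", $0 }'; }
# Hypothetical helper: dump stdin as a contiguous string of hex digits.
hex() { od -An -tx1 | tr -d ' \n'; }

# cat passes the CRs through untouched, sort keeps them as part of each
# line, and the text-mode writer then adds a second CR before every LF.
result=$(printf 'aaa\r\nbbb\r\n' | cat | LC_ALL=C sort | text_write | hex)
echo "$result"    # prints 6161610d0d0a6262620d0d0a
```

The output matches the xxd dump quoted above: each line now ends in 0d 0d 0a.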

> But using more instead of cat solves the issue :
> bash-3.2$ more test1 | /usr/bin/sort > test2
> bash-3.2$ xxd test2
> 0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

Same as sort.

test1 is opened in text mode by more, so the CRs are automatically stripped.
The correct data, free of CRs, goes through more's memory, the pipe, then 
sort's memory.
Then test2 is opened for output in text mode and the CRs automagically 
reappear.

The key thing to understand is that when text files are opened in 
text mode (as they always should be), programs never see the CRs in 
memory. They are automatically stripped/added by Cygwin when 
reading from/writing to real files. Note that pipes (unlike real files) 
are always binary: nothing is stripped or added there, so they should 
carry text without CRs.
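
The whole model can be sketched in a few lines. As an assumption for 
illustration only, text_read and text_write are hypothetical helpers that 
imitate what Cygwin does at the file boundary in text mode, and hex is a 
hypothetical helper that prints bytes as hex digits. None of them are real 
Cygwin commands.

```shell
# Hypothetical stand-ins for Cygwin's text-mode translation.
text_read()  { awk '{ sub(/\r$/, ""); print }'; }    # file -> memory: CR LF becomes LF
text_write() { awk '{ printf "%s\r\n", $0 }'; }      # memory -> file: LF becomes CR LF
hex() { od -An -tx1 | tr -d ' \n'; }

# What travels through the pipe after a text-mode read: no CRs at all.
in_pipe=$(printf 'aaa\r\nbbb\r\n' | text_read | hex)
# What lands on disk after a text-mode write: exactly one CR per line.
on_disk=$(printf 'aaa\r\nbbb\r\n' | text_read | LC_ALL=C sort | text_write | hex)

echo "pipe: $in_pipe"    # prints pipe: 6161610a6262620a
echo "disk: $on_disk"    # prints disk: 6161610d0a6262620d0a
```

This mirrors the "more test1 | sort > test2" case: the CRs exist only in the 
files, never in memory or in the pipe.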

No mystery (but hard to understand at first).

-- 
Vincent Rivière

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
