Mail Archives: cygwin/2003/01/15/19:38:37

Message-ID: <231417CB271FD61197020002A593077FC4BEB3@cat01s2c.catena.com>
From: Stacey Sheldon <ssheldon AT catena DOT com>
To: "'cygwin AT cygwin DOT com'" <cygwin AT cygwin DOT com>
Subject: 1.3.18: BUG: Piping DOS files to grep (v2.5) doesn't work properly
Date: Wed, 15 Jan 2003 19:39:38 -0500
MIME-Version: 1.0

Mailing list search didn't find this, nor does it appear
in the FAQ... hopefully this isn't old news to all of you.

Files read from a pipe are treated differently by grep
than files read directly.  This results in some unexpected
(by me) behaviour when using grep on files which use
DOS line ends (cr/nl).  This looks like a bug to me.

I'd expect the following commands to have equivalent
results:

  grep myregex blah
  grep myregex < blah
  cat blah | grep myregex

They are equivalent when the regular file blah uses
Unix line ends, but they differ for a file blahdos which
uses DOS line ends.  It appears to me as though grep
is treating its input as binary when reading from a pipe,
but correctly using "undossify_input()" in other cases.
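
The three-way comparison can be automated roughly as follows.  This is
only a sketch: compare_grep_inputs is a hypothetical helper name, not
something from grep or Cygwin, and it assumes a POSIX shell with cat,
grep and od on the PATH.  It dumps the output of each invocation with od
so that a stray cr shows up in the comparison.

```shell
# compare_grep_inputs PATTERN FILE   (hypothetical helper, for illustration)
# Runs grep three ways (file argument, stdin redirect, pipe) and reports
# whether all three produce byte-identical output.
compare_grep_inputs() {
    pattern=$1
    file=$2

    # od makes invisible characters such as cr visible in the comparison.
    direct=$(grep -- "$pattern" "$file" | od -An -c)
    redirected=$(grep -- "$pattern" < "$file" | od -An -c)
    piped=$(cat "$file" | grep -- "$pattern" | od -An -c)

    if [ "$direct" = "$redirected" ] && [ "$direct" = "$piped" ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}
```

Calling it as "compare_grep_inputs 'Test$' blah" and again with blahdos
shows at a glance whether a given setup treats the pipe case differently.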

Here is an example.  I've created two files, blah (nl line-end)
and blahdos (cr/nl line-end).

   $ cat blah
   foobarTest
   $ od -Ax -a blah
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ od -Ax -a blahdos
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c
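
For anyone who wants to reproduce this, the two files can be recreated
along these lines.  A minimal sketch, assuming a POSIX printf and that
the shell writes redirected output without translating line ends:

```shell
# Recreate the two test files (file names taken from the report).
printf 'foobarTest\n'   > blah      # Unix line end (nl)
printf 'foobarTest\r\n' > blahdos   # DOS line end (cr/nl)

# Sizes should be 11 (0x0b) and 12 (0x0c) bytes, matching the final
# offsets printed by od above.
wc -c < blah
wc -c < blahdos
```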

Both files should match the regex 'Test$' in all cases,
but grep fails on blahdos when it is read through a pipe:

   $ cat blahdos | grep 'Test$'
   $

And here's why (note the -v to invert the match so we have
something to look at):

   $ cat blahdos | grep -v 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

There's still a cr/nl on the output which wouldn't be there if
grep had interpreted its input as having DOS line ends.  Here's
what a successful grep of the UNIX line end file looks like:

   $ cat blah | grep 'Test$' | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
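
If the cr is what defeats the '$' anchor, then stripping it before grep
reads the pipe should make the piped case match.  The following is a
workaround sketch under that assumption, not a fix for the underlying
behaviour; note that tr removes every cr in the stream, not only the
ones at line ends.

```shell
# Workaround sketch: delete all cr characters before grep sees the data,
# so that "Test" is immediately followed by nl and 'Test$' can match.
printf 'foobarTest\r\n' > blahdos
cat blahdos | tr -d '\r' | grep 'Test$'
```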

In fact, if I read the blahdos file in any other way except through
a pipe, it successfully matches (note the stripped out cr on the output):

   $ grep 'Test$' blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ grep 'Test$' < blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b

Just in case you might think that this has something to do with cat
(I did), here's the output of cat for each file:

   $ cat blah | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  nl
   00000b
   $ cat blahdos | od -Ax -a
   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
   00000c

Using head instead of cat gives the same results as well, just to 
completely remove cat from the picture.

I'm currently running these versions of tools on win2k:
  cygwin     1.3.18-1
  textutils  2.0.21 (cat, od, head)
  grep       2.5
  bash       2.05b.0(8)-release

I also tried this out with cygwin 1.3.17-1 with identical results.

If you need any further information, please cc me directly since I
don't read the mailing lists very often.

Stacey.
