Mail Archives: cygwin/2003/01/15/19:38:37
Mailing list search didn't find this, nor does it appear
in the FAQ... hopefully this isn't old news to all of you.
Files read from a pipe are treated differently by grep
than files read directly. This results in some unexpected
(by me) behaviour when using grep on files which use
DOS line ends (cr/nl). This looks like a bug to me.
I'd expect the following commands to have equivalent
results:
grep myregex blah
grep myregex < blah
cat blah | grep myregex
They are equivalent when the regular file blah uses
Unix line ends, but they differ for a file blahdos which
uses DOS line ends. It appears to me as though grep
is treating its input as binary when reading from a pipe,
but correctly using "undossify_input()" in other cases.
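To make the comparison repeatable, here is a minimal sketch of a helper that runs the three invocations against one file and prints a match count for each. The function name compare_grep and its output format are made up for illustration; it uses only grep -c and cat.

```shell
# compare_grep FILE REGEX
# Hypothetical helper: run grep three ways on the same file and print how
# many lines matched each time.
compare_grep() {
    file=$1
    regex=$2
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps that from aborting a script that runs under "set -e".
    direct=$(grep -c -- "$regex" "$file" || true)          # file named as argument
    redirected=$(grep -c -- "$regex" < "$file" || true)    # file on stdin via redirect
    piped=$(cat "$file" | grep -c -- "$regex" || true)     # file on stdin via pipe
    echo "direct=$direct redirect=$redirected pipe=$piped"
}
```

Calling it as compare_grep somefile 'Test$' prints the three counts on one line. If the three forms really are equivalent, the three numbers are identical; any difference points at the input path rather than the regex.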
Here is an example. I've created two files, blah (nl line-end)
and blahdos (cr/nl line-end).
$ cat blah
foobarTest
$ od -Ax -a blah
000000 f o o b a r T e s t nl
00000b
$ od -Ax -a blahdos
000000 f o o b a r T e s t cr nl
00000c
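For anyone who wants to reproduce this, the two files can be rebuilt along these lines. This is only a sketch: the helper name make_test_files and the use of printf and mktemp are my assumptions, not necessarily how the files above were created, and a text-mode mount could alter the bytes written, so check the result with od.

```shell
# make_test_files DIR
# Hypothetical helper: write blah (nl line end) and blahdos (cr/nl line end)
# into DIR with the same content as the files shown above.
make_test_files() {
    dir=$1
    printf 'foobarTest\n'   > "$dir/blah"
    printf 'foobarTest\r\n' > "$dir/blahdos"
}

workdir=$(mktemp -d)          # scratch directory, so nothing is overwritten
make_test_files "$workdir"

# Dump both files to confirm which line end each one really has.
od -Ax -a "$workdir/blah"
od -Ax -a "$workdir/blahdos"
```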
These files should match the regex 'Test$' in all cases,
but grep on blahdos fails for this case:
$ cat blahdos | grep 'Test$'
$
And here's why (note the -v to invert the match so we have
something to look at):
$ cat blahdos | grep -v 'Test$' | od -Ax -a
000000 f o o b a r T e s t cr nl
00000c
There's still a cr before the nl on the output, which wouldn't be there if
grep had interpreted its input as having DOS line ends. Here's
what a successful grep of the UNIX line end file looks like:
$ cat blah | grep 'Test$' | od -Ax -a
000000 f o o b a r T e s t nl
00000b
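The role of the leftover cr can be checked in isolation. The sketch below assumes a grep that passes piped input through untouched (as appears to be happening here); the variable names are made up for illustration. It counts matches for the same cr/nl line three ways: as is, with the cr written into the pattern, and with the cr stripped by tr before grep sees it.

```shell
# Hold a literal carriage return (command substitution only strips newlines).
cr=$(printf '\r')

# Line still carries its cr: '$' anchors after the cr, not after "Test".
with_cr=$(printf 'foobarTest\r\n' | grep -c 'Test$' || true)

# Same input, but the pattern names the cr explicitly before the anchor.
explicit_cr=$(printf 'foobarTest\r\n' | grep -c "Test${cr}\$" || true)

# Same input with the cr removed before grep reads it.
stripped=$(printf 'foobarTest\r\n' | tr -d '\r' | grep -c 'Test$' || true)

echo "plain=$with_cr explicit_cr=$explicit_cr stripped=$stripped"
```

If grep is not converting line ends on the pipe, only the last two counts should be non-zero, which is consistent with the cr being the only thing standing between "Test" and the end of the line.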
In fact, if I read the blahdos file any way other than through
a pipe, it matches successfully (note the cr stripped from the output):
$ grep 'Test$' blahdos | od -Ax -a
000000 f o o b a r T e s t nl
00000b
$ grep 'Test$' < blahdos | od -Ax -a
000000 f o o b a r T e s t nl
00000b
Just in case you might think that this has something to do with cat
(I did), here's the output of cat for each file:
$ cat blah | od -Ax -a
000000 f o o b a r T e s t nl
00000b
$ cat blahdos | od -Ax -a
000000 f o o b a r T e s t cr nl
00000c
Using head instead of cat gives the same results as well, just to
completely remove cat from the picture.
I'm currently running these versions of tools on win2k:
cygwin 1.3.18-1
textutils 2.0.21 (cat, od, head)
grep 2.5
bash 2.05b.0(8)-release
I also tried this out with cygwin 1.3.17-1 with identical results.
If you need any further information, please cc me directly since I
don't read the mailing lists very often.
Stacey.