Date: Wed, 15 Jan 2003 19:48:32 -0500 (EST)
From: Igor Pechtchanski
Reply-To: cygwin AT cygwin DOT com
To: Stacey Sheldon
cc: cygwin AT cygwin DOT com
Subject: Re: 1.3.18: BUG: Piping DOS files to grep (v2.5) doesn't work properly
In-Reply-To: <231417CB271FD61197020002A593077FC4BEB3@cat01s2c.catena.com>

On Wed, 15 Jan 2003, Stacey Sheldon wrote:

> Mailing list search didn't find this, nor does it appear
> in the FAQ... hopefully this isn't old news to all of you.
>
> Files read from a pipe are treated differently by grep
> than files read directly. This results in some unexpected
> (by me) behaviour when using grep on files which use
> a DOS line-end (cr/nl). This looks like a bug to me.
>
> I'd expect the following commands to have equivalent
> results:
>
>   grep myregex blah
>   grep myregex < blah
>   cat blah | grep myregex
>
> They are equivalent when the regular file blah uses
> Unix line ends, but they differ for a file blahdos which
> uses DOS line ends. It appears to me as though grep
> is treating its input as binary when reading from a pipe,
> but correctly using "undossify_input()" in other cases.
>
> Here is an example. I've created two files, blah (nl line-end)
> and blahdos (cr/nl line-end).
>
>   $ cat blah
>   foobarTest
>   $ od -Ax -a blah
>   000000   f   o   o   b   a   r   T   e   s   t  nl
>   00000b
>   $ od -Ax -a blahdos
>   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
>   00000c
>
> These files should match the regex 'Test$' in all cases,
> but grep on blahdos fails for this case:
>
>   $ cat blahdos | grep 'Test$'
>   $
>
> And here's why (note the -v to invert the match so we have
> something to look at):
>
>   $ cat blahdos | grep -v 'Test$' | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
>   00000c
>
> There's still a cr/nl on the output which wouldn't be there if
> grep had interpreted its input as having DOS line ends. Here's
> what a successful grep of the UNIX line end file looks like:
>
>   $ cat blah | grep 'Test$' | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  nl
>   00000b
>
> In fact, if I read the blahdos file in any other way except through
> a pipe, it successfully matches (note the stripped out cr on the output):
>
>   $ grep 'Test$' blahdos | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  nl
>   00000b
>   $ grep 'Test$' < blahdos | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  nl
>   00000b
>
> Just in case you might think that this has something to do with cat
> (I did), here's the output of cat for each file:
>
>   $ cat blah | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  nl
>   00000b
>   $ cat blahdos | od -Ax -a
>   000000   f   o   o   b   a   r   T   e   s   t  cr  nl
>   00000c
>
> Using head instead of cat gives the same results as well, just to
> completely remove cat from the picture.
>
> I'm currently running these versions of tools on win2k:
>   cygwin 1.3.18-1
>   textutils 2.0.21 (cat, od, head)
>   grep 2.5
>   bash 2.05b.0(8)-release
>
> I also tried this out with cygwin 1.3.17-1 with identical results.
>
> If you need any further information, please cc me directly since I
> don't read the mailing lists very often.
>
> Stacey.

Stacey,
This is not a bug.  This is expected behavior.  For details, read .
        Igor
-- 
                                http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_        pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_    igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'   Igor Pechtchanski
    '---''(_/--'  `-'\_) fL     a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

Oh, boy, virtual memory!  Now I'm gonna make myself a really *big* RAMdisk!
  -- /usr/games/fortune
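
For anyone who hits the same mismatch, the failing pipe case in the report can be sidestepped by deleting the carriage returns before grep sees them. What follows is only a minimal sketch, assuming a POSIX shell with printf, tr and grep available. The file names blah and blahdos come from the report; the helper name strip_cr is hypothetical, and the tr step is one possible workaround, not necessarily what the reply above has in mind.

```shell
#!/bin/sh
# Recreate the two test files from the report:
# blah uses Unix line ends (nl), blahdos uses DOS line ends (cr/nl).
printf 'foobarTest\n'   > blah
printf 'foobarTest\r\n' > blahdos

# strip_cr (hypothetical helper): delete every carriage return on stdin,
# so that '$' in a regex can match directly after the last visible character.
strip_cr() {
    tr -d '\r'
}

# The pipe case from the report, with the cr removed before grep runs.
cat blahdos | strip_cr | grep 'Test$'

# The Unix file passes through unchanged, so the same pipeline works for both.
cat blah | strip_cr | grep 'Test$'
```

Both pipelines print foobarTest. Note that tr removes every cr in the stream, not only those directly before an nl, which is fine for plain text but would be wrong for data where a lone cr is meaningful.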