Mail Archives: cygwin/2006/12/13/14:18:15

Subject: RE: xargs gives grep/gawk too much
Date: Wed, 13 Dec 2006 14:17:56 -0500
Message-ID: <31DDB7BE4BF41D4888D41709C476B657041694B6@NIHCESMLBX5.nih.gov>
In-Reply-To: <20061210063339.GB15846@trixie.casa.cgf.cx>
From: "Buchbinder, Barry \(NIH/NIAID\) [E]" <BBuchbinder AT niaid DOT nih DOT gov>
To: <cygwin AT cygwin DOT com>

Christopher Faylor wrote on Sunday, December 10, 2006 1:34 AM:

> On Sun, Dec 10, 2006 at 01:21:19AM -0500, Christopher Faylor wrote:
>> On Sat, Dec 09, 2006 at 11:31:05AM -0800, Karr, David wrote:
>>> If the point of this note is to get your pipeline to work, would it
>>> help if you added something like "-n 30" to the xargs command?  This
>>> should execute one instance of 'grep -l -e "error '1234567890'"' for
>>> every 30 lines of output from the previous pipe entry.
>>> 
>>> 	$ echo path/* | \
>>> 		tr ' ' '\n' | \
>>> 		grep -v -e '\*' | \
>>> 		xargs -r grep -l -e "error '1234567890'" | \
>>> 		rest_of_pipe
>> 
>> I'm sure that the point of the note was to report a Cygwin bug.
> 
> ...but I can't duplicate this with the latest snapshot...

Sorry for my delayed follow-up, but between family duties and work ...

Below is a test that looks at the length of the command line and the
number of arguments.  I started out by looking for the exact line length
that would cause problems.  The line length that resulted was well under
32k, so I started looking at the number of arguments.  I initially came
up with a number in the 5000s.  It seemed to be consistent at first, but
after I took a break, did some things in other bash shells, and started
up again (I think I was in the same shell), the number changed.

As you can see below, I'm now getting errors with half as many
arguments.  (Different machines (laptop vs. desktop), but the setups
should be very close, if not identical.)  Especially interesting is
that, as the line length gets longer, a number of arguments that had
been giving errors stops doing so.

Also, inserting tr ' ' '\0' between gawk and xargs and adding -0 to
xargs shifted the point where the errors start down by one, from
2868/2869 to 2867/2868.
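
The NUL-separated variant looks roughly like this.  It is only a sketch,
assuming GNU seq, tr and xargs: count_args_nul is a hypothetical helper
name, and seq stands in for the gawk generator used in the test below.

```shell
# Hypothetical helper: generate $1 space-separated dummy arguments,
# turn each space into a NUL, and let xargs -0 split on NUL.
# wc -w counts how many arguments echo received.
count_args_nul() {
    seq -s ' ' 1 "$1" | tr ' ' '\0' | xargs -0 -r echo | wc -w
}

count_args_nul 5
```

Because only spaces are converted, the trailing newline stays attached
to the last argument, so that argument is one byte longer than in the
whitespace-separated case.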

As for my scripts, changing the directory from which I was doing the
echo path/* fixed things, though now that I've done these experiments, I
may switch to using the -s or -n options of xargs.  Thanks for the
suggestions.
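
For anyone following along, here is a minimal sketch of the -n approach,
assuming GNU xargs.  The batch size of 30 is the one suggested in the
quoted message; the helper name batch_count and the seq-generated input
are made up for illustration and are not part of my scripts.

```shell
# Hypothetical helper: feed $1 dummy arguments to echo in batches of
# at most $2, and report how many echo invocations were needed.
batch_count() {
    seq 1 "$1" | xargs -r -n "$2" echo | wc -l
}

# 100 arguments in batches of 30 need 4 invocations (30 + 30 + 30 + 10).
batch_count 100 30
```

Capping the batch with -n keeps each command line short regardless of
where the real limit sits; -s caps it by total size in characters
instead.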

And a big thanks in general to all the cygwin developers and package
maintainers and in particular to whoever may decide to debug this
(presuming that this is a problem with cygwin or one of its packages and
that one of the volunteers decides to debug it).  And if this is not a
cygwin/package problem, I'd still appreciate any hints about what I
might be doing wrong.

- Barry

===

for X in 1 2 4 8 16
do
    for A in 2968 2969
    do
        echo
        echo $X $A | \
            gawk '{
                for (N = 1; N <= $1; N++) { x = x "x" }
                x = " " x
                for (N = 1; N <= $2; N++) { o = o x }
                print "arg len: " $1 ", no args: " $2 ", tot len: " length(o) - 1 > "/dev/stderr"
                print substr(o,2)
                }' | \
            xargs -r echo | \
            wc | \
            grep -v -e ' 0  * 0  * 0'
    done
done

arg len: 1, no args: 2968, tot len: 5935
      1    2968    5936

arg len: 1, no args: 2969, tot len: 5937
xargs: echo: Argument list too long

arg len: 2, no args: 2968, tot len: 8903
      1    2968    8904

arg len: 2, no args: 2969, tot len: 8906
xargs: echo: Argument list too long

arg len: 4, no args: 2968, tot len: 14839
      1    2968   14840

arg len: 4, no args: 2969, tot len: 14844
xargs: echo: Argument list too long

arg len: 8, no args: 2968, tot len: 26711
      1    2968   26712

arg len: 8, no args: 2969, tot len: 26720
      1    2969   26721

arg len: 16, no args: 2968, tot len: 50455
      2    2968   50456

arg len: 16, no args: 2969, tot len: 50472
      2    2969   50473

===

Running it again later in a different bash shell gave the following, so
the failure point still changes with immediate conditions.

===

arg len: 1, no args: 2968, tot len: 5935
      1    2968    5936

arg len: 1, no args: 2969, tot len: 5937
      1    2969    5938

arg len: 2, no args: 2968, tot len: 8903
      1    2968    8904

arg len: 2, no args: 2969, tot len: 8906
      1    2969    8907

arg len: 4, no args: 2968, tot len: 14839
      1    2968   14840

arg len: 4, no args: 2969, tot len: 14844
      1    2969   14845

arg len: 8, no args: 2968, tot len: 26711
xargs: echo: Argument list too long

arg len: 8, no args: 2969, tot len: 26720
xargs: echo: Argument list too long

arg len: 16, no args: 2968, tot len: 50455
xargs: echo: Argument list too long

arg len: 16, no args: 2969, tot len: 50472
xargs: echo: Argument list too long
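
One condition that can differ between shells is the size of the
environment, which on POSIX systems counts against the same exec limit
as the arguments.  Whether that accounts for the shifting results above
is an untested assumption.  A rough way to compare shells, with
arg_headroom as a hypothetical helper name:

```shell
# Hypothetical helper: print a rough estimate of the bytes left for
# arguments, i.e. the system limit minus the current environment size.
# This ignores per-string overhead (pointers), so treat the result as
# an approximation only.
arg_headroom() {
    limit=$(getconf ARG_MAX)
    env_bytes=$(env | wc -c)
    echo $((limit - env_bytes))
}

arg_headroom
```

Running this in each shell before the test would show whether the
headroom differs between the sessions that fail and those that do not.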


