Mail Archives: cygwin/2024/09/01/02:17:29

Message-ID: <b54f8ffa-feea-424b-a8b3-9dfaf4adf00d@SystematicSW.ab.ca>
Date: Sun, 1 Sep 2024 00:16:20 -0600
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: grepping a large file through a pipe takes eons
To: cygwin AT cygwin DOT com
References: <CAK-n8j6cjd5mHah6y1EVgbRsXLrdbati-j1QS1r1+aDc8jwg=g AT mail DOT gmail DOT com>
<20240901042425 DOT 702a5242c4bd5573ae993497 AT nifty DOT ne DOT jp>
Organization: Systematic Software
In-Reply-To: <20240901042425.702a5242c4bd5573ae993497@nifty.ne.jp>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Brian Inglis <Brian DOT Inglis AT SystematicSW DOT ab DOT ca>

On 2024-08-31 13:24, Takashi Yano via Cygwin wrote:
> On Sat, 31 Aug 2024 09:59:11 -0600
> Jim Reisert AD1C wrote:
>> Something has changed in the last month or two.  I have a very large
>> file I am trying to grep (465 MB):
>>
>> -rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt
>>
>>
>> If I grep for something near the end of the file, the results return right away:
>>
>> # time grep -n N0FUL all_spots.txt
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real    0m0.190s
>> user    0m0.078s
>> sys     0m0.078s
>>
>>
>> If I pipe the file through cat, grep takes much longer:
>>
>> # time cat all_spots.txt | grep -n N0FUL
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>>
>> real    1m4.934s
>> user    0m0.031s
>> sys     0m0.124s
> 
> Thanks for the report. This seems to be a regression of cygwin 3.5.4.
> I'll submit a patch for this issue shortly.

Remember that many Unix-derived utilities mmap files when they can, letting the
paging system handle file I/O so that reads, writes, and searches become fast
memory operations.
It would be worth your while to time grepping all the files directly against
concatenating them into one file with cat and grepping that.
In either case, it will usually be faster to operate directly on files than on a
pipe.
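
For anyone who wants to reproduce the comparison from the original report on a
small scale, here is a minimal sketch. The sample data, the `timed` helper, and
the stdin-redirect case are assumptions added for illustration and are not from
the thread; `date +%s` only has whole-second resolution, so the numbers are
meaningful only on a large file or an affected Cygwin version.

```shell
# Build a throwaway sample file with the match on the last line
# (hypothetical data standing in for all_spots.txt).
tmp=$(mktemp)
seq 1 200000 > "$tmp"
echo 'N0FUL,20240615,20240615,1' >> "$tmp"

# Hypothetical helper: run a command, keep its output in $result,
# and report the elapsed wall-clock time in whole seconds.
timed() {
    label=$1; shift
    start=$(date +%s)
    result=$("$@")
    end=$(date +%s)
    echo "$label: $((end - start))s -> $result"
}

# 1. grep opens the file itself.
timed 'file argument ' grep -n N0FUL "$tmp"
direct=$result

# 2. grep reads from a pipe fed by cat (the slow case in the report).
timed 'through cat   ' sh -c 'cat "$1" | grep -n N0FUL' sh "$tmp"
piped=$result

# 3. grep reads the file on stdin without a pipe.
timed 'stdin redirect' sh -c 'grep -n N0FUL < "$1"' sh "$tmp"
redirected=$result

rm -f "$tmp"
```

All three variants should print the same match; only the elapsed time should
differ. For finer resolution, prefix each command with the shell's `time`
keyword instead, as in the transcripts in this thread.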

$ ls -1gloU /var/log/*.log | awk '{t+=$3};END{print int(NR/1024+0.5) "k files",int(t/1024/1024+0.5) "MB"}'
26k files 59MB

$ time grep -h -e cygwin -- /var/log/*.log > /tmp/grep.log

real    0m8.996s
user    0m1.015s
sys     0m7.983s

$ time cat -- /var/log/*.log > /tmp/var.log && grep -h -e cygwin -- /tmp/var.log > /tmp/cat-grep.log

real    0m9.557s
user    0m0.953s
sys     0m8.609s

$ wc -lc -- /tmp/var.log /tmp/*grep.log
   708552 61905630 /tmp/var.log
    35481  5652354 /tmp/cat-grep.log
    35481  5652354 /tmp/grep.log
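
Matching line and byte counts suggest the two result files are the same, and
`cmp` can confirm it byte for byte. The sketch below repeats both approaches on
a tiny throwaway directory; the directory, file names, and log contents are
hypothetical and chosen only so the example is self-contained.

```shell
# Throwaway sample logs (hypothetical contents).
dir=$(mktemp -d)
printf 'cygwin start\nother line\n' > "$dir/a.log"
printf 'noise\ncygwin stop\n' > "$dir/b.log"

# Approach 1: grep each file directly.
grep -h -e cygwin -- "$dir"/*.log > "$dir/grep.out"

# Approach 2: concatenate first, then grep the single file.
# The combined file is not named *.log, so the glob cannot pick it up.
cat -- "$dir"/*.log > "$dir/all.txt"
grep -h -e cygwin -- "$dir/all.txt" > "$dir/cat-grep.out"

# cmp -s is silent and exits 0 only if the files are byte-identical.
if cmp -s "$dir/grep.out" "$dir/cat-grep.out"; then
    same=yes
else
    same=no
fi
lines=$(wc -l < "$dir/grep.out" | tr -d ' ')
echo "identical: $same, matching lines: $lines"

rm -rf "$dir"
```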

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry
