From: Brian Inglis via Cygwin
To: cygwin AT cygwin DOT com
Date: Sun, 1 Sep 2024 00:16:20 -0600
Subject: Re: grepping a large file through a pipe takes eons
Organization: Systematic Software

On 2024-08-31 13:24, Takashi Yano via Cygwin wrote:
> On Sat, 31 Aug 2024 09:59:11 -0600
> Jim Reisert AD1C wrote:
>> Something has changed in the last month or two.
>> I have a very large file I am trying to grep (465 MB):
>>
>> -rwxrw----+ 1 jjrei jjrei 465092052 Aug 31 09:39 all_spots.txt
>>
>>
>> If I grep for something near the end of the file, the results return right away:
>>
>> # time grep -n N0FUL all_spots.txt
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>> real    0m0.190s
>> user    0m0.078s
>> sys     0m0.078s
>>
>>
>> If I pipe the file through cat, grep takes much longer:
>>
>> # time cat all_spots.txt | grep -n N0FUL
>>
>> 17027336:N0FUL,20240615,20240615,1
>> 17027337:N0FUL,20240629,20240629,1
>>
>>
>> real    1m4.934s
>> user    0m0.031s
>> sys     0m0.124s
>
> Thanks for the report. This seems to be a regression of cygwin 3.5.4.
> I'll submit a patch for this issue shortly.

Remember that many Unix-derived utilities use mmap-ed files when available,
letting the paging system handle the file I/O so that reads, writes, and
searches become fast memory operations. That is not possible when the data
arrives through a pipe, which cannot be mmap-ed.

It would be worth your while to time grepping all the files directly versus
cat-ing them into one file and grepping that. In either case, operating
directly on the files will usually be faster:

$ ls -1gloU /var/log/*.log | awk '{t+=$3};END{print int(NR/1024+0.5) "k files",int(t/1024/1024+0.5) "MB"}'
26k files 59MB
$ time grep -h -e cygwin -- /var/log/*.log > /tmp/grep.log

real    0m8.996s
user    0m1.015s
sys     0m7.983s
$ time cat -- /var/log/*.log > /tmp/var.log && grep -h -e cygwin -- /tmp/var.log > /tmp/cat-grep.log

real    0m9.557s
user    0m0.953s
sys     0m8.609s
$ wc -lc -- /tmp/var.log /tmp/*grep.log
   708552  61905630 /tmp/var.log
    35481   5652354 /tmp/cat-grep.log
    35481   5652354 /tmp/grep.log

-- 
Take care.
Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
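The comparison suggested in this thread (grep on a regular file versus grep fed by cat) can be scripted so each approach is timed the same way. This is a minimal sketch assuming bash; `compare_grep` is a hypothetical helper name, not part of Cygwin or grep. One detail worth knowing: in bash, `time` applies to a single pipeline, so in `time cat ... && grep ...` only the cat step is timed, while grouping the steps in braces times both.

```shell
#!/usr/bin/env bash
# compare_grep FILE PATTERN  (hypothetical helper, for illustration only)
# Times three ways of searching FILE for PATTERN and checks that they
# all produce identical matches.
compare_grep() {
  local file=$1 pattern=$2 tmp
  tmp=$(mktemp -d) || return 1

  # 1. grep opens the regular file itself.
  time grep -n -e "$pattern" -- "$file" > "$tmp/direct.out"

  # 2. grep reads the same bytes from a pipe fed by cat.
  time cat -- "$file" | grep -n -e "$pattern" > "$tmp/pipe.out"

  # 3. Braces make `time` cover both steps; a bare
  #    `time cat ... && grep ...` would report the cat step only.
  time { cat -- "$file" > "$tmp/copy.txt" &&
         grep -n -e "$pattern" -- "$tmp/copy.txt" > "$tmp/copy.out"; }

  # With a single file argument grep prints no file name prefix,
  # so all three outputs should be byte-identical.
  if cmp -s "$tmp/direct.out" "$tmp/pipe.out" &&
     cmp -s "$tmp/direct.out" "$tmp/copy.out"; then
    echo "outputs identical"
    rm -rf -- "$tmp"
  else
    echo "outputs differ; see $tmp" >&2
    return 1
  fi
}
```

For example, `compare_grep all_spots.txt N0FUL` would rerun the original report's comparison; the three `time` reports go to stderr and the verdict to stdout.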