Mail Archives: cygwin/2016/02/17/08:44:30

Subject: Re: locate and updatedb
To: cygwin AT cygwin DOT com
References: <56BC940F DOT 6070109 AT zoho DOT com> <56BCD05C DOT 2040409 AT gmail DOT com> <56BCD414 DOT 2010304 AT zoho DOT com> <56BD0D87 DOT 6030008 AT gmail DOT com> <56BF1E4D DOT 5000901 AT tlinx DOT org> <6CF2FC1279D0844C9357664DC5A08BA21BD2FA07 AT msgb09 DOT nih DOT gov>
From: Byron Boulton <daytonb AT zoho DOT com>
Message-ID: <56C478E0.70904@zoho.com>
Date: Wed, 17 Feb 2016 08:42:56 -0500
In-Reply-To: <6CF2FC1279D0844C9357664DC5A08BA21BD2FA07@msgb09.nih.gov>

On 2/16/2016 5:55 PM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
> Linda Walsh sent the following at Saturday, February 13, 2016 7:15 AM
>> Marco Atzeri wrote: ---
>>> On 11/02/2016 19:33, Byron Boulton wrote:
>>>> On 2/11/2016 1:18 PM, cyg Simple wrote:
>>>>> On 2/11/2016 9:00 AM, Byron Boulton wrote:
>>>>>> Does anyone here have success using `updatedb` and `locate` in
>>>>>> cygwin? I use `locate` heavily on my Linux machines, but every time
>>>>>> I've tried to run `updatedb` on cygwin I've given up and killed the
>>>>>> process because it is taking too long.
>> There's a reason why on linux it is usually set to run when you are asleep.  ;-)
>>
>>>>>>   Is there something wrong with cygwin's implementation of
>>>>>> `updatedb` making it not work at all or making it slower than on my
>>>>>> Linux machines? Or are there others who have success using it on
>>>>>> cygwin?
>>
>> But it might have to do with disk speed and memory. Laptop drives are
>> usually among the slowest.
>>
>> I ran it just now (this is with MS's Home Essentials real-time
>> protection turned on).
>>> locate / >/tmp/all
>>> wc /tmp/all
>>   1479146   4014375 133322318 /tmp/all
>>> df .
>>
>> law.Bliss/bin> time index_files.sh
>> 670592 (process ID) old priority 0, new priority 19
>> 44.21sec 15.06usr 28.30sys (98.09% cpu)
>> Filesystem      Size  Used Avail Use% Mounted on
>> C:              949G  585G  365G  62% /
>> ----
>>
>> So ~1.4 million files... Using the following exclusions:
>>
>> ---(index_files.sh)----
>> renice +19 $$
>> Local="/"
>> if [[ -d /windows/sysnative/. ]]; then
>>   Local+=" /windows/sysnative/."
>> fi
>> Prunepaths='/.usr /proc /C /B /H /I
>> /M /D /P /System[[:space:]]Volume[[:space:]]Information /Windows/CSC
>> /pagefile.sys /Music /Pictures /Share /Media /home /Doc /$RECYCLE.BIN
>> /cygdrive'
>>
>> /bin/updatedb --findoptions=-noleaf --localpaths="$Local" \
>>   --prunepaths="$Prunepaths" --netpaths="$Net"
>> ----
>> Most of those pruned files are pruned either due to redundancy or
>> being on a local network server...
>>
>> That's fairly fast vs. the MS-Home Essentials, full malware scan I
>> run once a week that takes ~ 8-16 hours (It scans a few of my network
>> directories, as well).
>>
>>>>> Processing every file on the drive will be slow just because it's
>>>>> Windows.  Initializing the database with updatedb will require a large
>>>>> amount of time.  There are processes such as AntiVirus intrusion
>>>>> protection that might make it even slower.
>>>>>
>>>> Hmmm, the reason the slowness is particularly strange to me is that in
>>>> place of using `locate` from my cygwin terminal, I have to use a program
>>>> called "Everything Search Engine" available at www.voidtools.com. The
>>>> first time I install it, it takes maybe a few minutes to index the hard
>>>> drive, then every once in a while when I open the program it takes a few
>>>> seconds to update the index, but in general the performance for indexing
>>>> and searching the index is comparable to `updatedb` and `locate` on a
>>>> Linux machine, so it's possible to do on Windows.
>>>>
>>>> Byron
>>>>
>>>
>>> the time taken from updatedb is mainly due to
>>> the execution time of "find" on the disks.
>>>
>>> It takes ~ 70 minutes for my 500 GB of data,
>>> and likely the AV is impacting the execution.
>>>
>>> I suspect voidtools is using MS disk indexing
>>> to speed things up.
>
> This is technically OT since it involves a non-Cygwin tool.
>
> find is slow compared with a non-Cygwin tool, specifically dir (cmd.exe).
>
> Compare find with cmd.exe's dir.  Note that even with the benefit of
> caching (compare the 1st and 3rd times), find takes twice as long as dir.
> Comparing cached times (2nd vs 3rd), dir is 3X faster.
>
> $ time cmd /c dir /s /b 'C:\usr' > /dev/null ; \
> time find /c/usr > /dev/null ; \
> time cmd /c dir /s /b 'C:\usr' > /dev/null
>
> real    0m1.326s
> user    0m0.000s
> sys     0m0.047s
>
> real    0m2.465s
> user    0m0.280s
> sys     0m2.184s
>
> real    0m0.874s
> user    0m0.000s
> sys     0m0.031s
>
> (Note: c:\usr has nothing to do with /usr.)
>
> Here's how I use dir *in the abstract* for drives C: and D:.  (Note: the
> /a: option of dir lists all files, including hidden ones; /o:n sorts by
> name.)
>
> for D in /c /d
> do
>      "$(cygpath "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "$D")"
> done | \
> tr -s '\r\n' '\n' | \
> cygpath -u -f - | \
> sed -e '/^$/d' -e 's,/\+,/,g' | \
> sort -u | \
> /usr/libexec/frcode > /tmp/updatedb.tmp
> chmod --reference /var/locatedb /tmp/updatedb.tmp
> mv /tmp/updatedb.tmp /var/locatedb
>
> What I actually do (attached) is more complicated.  My script chooses
> which directories are scanned, does them in parallel, and prints pretty
> messages.  I get error messages for very long paths (> ~250 bytes).  It
> works well enough for me; YMMV.
>
> - Barry
>    Disclaimer: Statements made herein are not made on behalf of NIAID.
>
>
>
> --
> Problem reports:       http://cygwin.com/problems.html
> FAQ:                   http://cygwin.com/faq/
> Documentation:         http://cygwin.com/docs.html
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
>

Barry,

Are you using dir in some sort of custom way to build the database used 
by locate? Or are you saying that rather than ever using the find 
command to find files, you use a custom script which uses dir?
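
For reference, the cleanup stage of the quoted pipeline (the `tr`, `sed`, and `sort -u` steps that run before `frcode`) can be tried on its own. This is only a minimal sketch: the function name `normalize_paths` and the sample input are made up for illustration, and the `sed` expression uses `//*` in place of the quoted `/\+` so that it does not depend on a GNU extension. The `dir`, `cygpath`, and `frcode` stages are left out so that it runs without Windows or Cygwin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the quoted script): normalize a raw
# path list read from stdin, mirroring the tr/sed/sort stages quoted above.
normalize_paths() {
    # Turn carriage returns into newlines and squeeze repeated newlines,
    # which also removes the blank lines left behind by CRLF endings.
    tr -s '\r\n' '\n' |
    # Delete any remaining empty line and collapse runs of slashes.
    sed -e '/^$/d' -e 's,//*,/,g' |
    # Sort bytewise and drop duplicate paths.
    LC_ALL=C sort -u
}

# Made-up sample input standing in for converted `dir /s /b` output.
printf '/c/usr//bin\r\n\r\n/c/usr/bin\r\n/c/a\r\n' | normalize_paths
```

In the quoted script the result of these steps is what gets piped into `/usr/libexec/frcode`, so the list is already sorted and free of duplicates by the time the database is written.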

Byron


