Mail Archives: cygwin/2021/08/29/20:07:13

X-Recipient: archive-cygwin AT delorie DOT com
X-Original-To: cygwin AT cygwin DOT com
Delivered-To: cygwin AT cygwin DOT com
To: cygwin AT cygwin DOT com
References: <986736274 DOT 144968 DOT 1630167325057 DOT ref AT mail DOT yahoo DOT com>
<986736274 DOT 144968 DOT 1630167325057 AT mail DOT yahoo DOT com>
<a60ffa68-274a-5072-c90a-0dce7bc93431 AT harkless DOT org>
<3457cee1-18b5-2916-adee-afdfaf9769ea AT t-online DOT de>
<525a832a-78fd-5a32-e195-5747120da922 AT harkless DOT org>
From: Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
Organization: Systematic Software
Subject: Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no
longer being provided
Message-ID: <a3772125-f8a0-05fb-2dc5-c3a650fed7c8@SystematicSw.ab.ca>
Date: Sun, 29 Aug 2021 18:06:21 -0600
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: <525a832a-78fd-5a32-e195-5747120da922@harkless.org>
X-BeenThere: cygwin AT cygwin DOT com
X-Mailman-Version: 2.1.29
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
Reply-To: cygwin AT cygwin DOT com
Errors-To: cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>
X-MIME-Autoconverted: from base64 to 8bit by delorie.com id 17U07CNR024566

On 2021-08-29 06:06, Dan Harkless wrote:
> On 8/29/2021 4:02 AM, Hans-Bernhard Bröker wrote:
>> Am 28.08.2021 um 18:23 schrieb Dan Harkless:
>>> Looks like it's because in findutils 4.8.0-1, the bigram.exe program 
>>> is no longer provided, but the /usr/bin/updatedb script (still) 
>>> depends on it being there:
>>      [...]
>>>      + for binary in $find $frcode $bigram $code
>>>      + checkbinary /usr/libexec/frcode

>> The version of updatedb in the 4.8.0-1 package does not actually 
>> contain those lines.  Mention of both $bigram and $code has been 
>> removed from the loop construct (and from everywhere else in the script).
>>
>> That's because the old format of find databases, which is the only one 
>> actually using bigram and code, was removed from updatedb as of 
>> findutils version 4.7, so there really cannot be a need for the bigram 
>> tool any more.

> Argh!  So sorry for the false report!  I completely forgot that years 
> back I had made a locally patched version (which is earlier in my path) 
> of Cygwin updatedb 4.6.0-1 to troubleshoot and work around problems I 
> was having with the tool.
> 
> I have 12M+ pathnames on my main Windows system, and I suddenly started 
> having issues with the updatedb going from taking less than an hour, to 
> taking more than 24 hours, and running into the next job.
> 
> It was very awkward to try to troubleshoot what was going on without a 
> 'find' log to 'tail', so I patched my local copy of updatedb to write 
> to an intermediate file, rather than going direct to 'sort' over a pipe.
> 
> Another problem I was having was that though I have 24 GB of RAM on my 
> system, I would get low-memory popup warnings from the OS when the sort 
> would go off.  (The warnings misplace the blame on Firefox, because I 
> usually have big sessions running that take even more RAM than the sort.)
> 
> I was hoping running sort on a _file_ rather than stdin might allow it 
> to reduce the RAM use enough to not get the warning, but unfortunately 
> (and unsurprisingly) I still get it with the intermediate file.  This is 
> just a warning, though — I haven't had it actually run out of RAM so 
> far, I don't think.
> 
> The final problem I was addressing in my patched version was some 
> missing error-checking, which was causing me to be left with _no_ 
> filename DB, when the update would fail, rather than at least being left 
> with the one from last time.
> 
> I could send along my patches, but I don't know that I've solved these 
> issues in a general enough way.  For instance, my 12 million+ pathnames 
> come out to about 1.4 GiB of UNIX-linefeed-separated UTF-8 strings. 
> Writing that much to my HD is not a concern, but obviously some people 
> might not want to write that much every time to, say, a small 
> flash-based device.
> 
> Thoughts?
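
As a rough illustration of the three workarounds Dan describes above (an intermediate file that can be watched with 'tail', a sort with a capped memory buffer, and keeping the previous result if anything fails), a minimal shell sketch might look like the following. It is not Dan's patch and not the updatedb script shipped with findutils. The function name `build_path_list`, its arguments, and the 512M buffer size are assumptions chosen for illustration, and the sketch produces only a sorted path list, not a locate database. `sort -S` and `sort -o` are GNU coreutils options.

```shell
#!/bin/sh
# Hypothetical helper: list all paths under ROOT into OUT,
# leaving any existing OUT untouched if a step fails.
build_path_list() {
    root=$1
    out=$2
    # Work files live next to OUT so the final mv is a same-filesystem rename.
    raw=$(mktemp "$out.raw.XXXXXX") || return 1
    sorted=$(mktemp "$out.sorted.XXXXXX") || { rm -f "$raw"; return 1; }

    # Step 1: write find output to a file that can be followed with 'tail -f'.
    # Step 2: sort with a capped in-memory buffer (-S, GNU sort);
    #         larger inputs spill to temporary files instead of RAM.
    # Step 3: replace OUT only when both earlier steps succeeded.
    if find "$root" -print > "$raw" &&
       LC_ALL=C sort -S 512M -o "$sorted" "$raw" &&
       mv -f "$sorted" "$out"
    then
        rm -f "$raw"
        return 0
    fi

    # Any failure: drop the work files and keep the previous OUT.
    rm -f "$raw" "$sorted"
    return 1
}
```

Note that find exits non-zero when it cannot read a directory, so a real script would probably want to tolerate that case rather than discard the whole run, and the intermediate file still costs the full write Dan mentions.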

Thanks for the analysis, Hans-Bernhard.

Please recheck the announcement for 4.8 and the change info for 4.7: as of 
4.8, locate should still work on old-format databases, but from 4.7 
updatedb no longer generates or updates them, and in some future release 
locate will no longer work on them.
The old (pre-GNU Unix) format was deprecated from 4.0 (~25 years ago!), 
and each run of updatedb should have warned you to upgrade, unless you 
patched that out.

See:

	$ info finding databases 'database formats' old

or:

<https://gnu.org/software/findutils/manual/html_node/find_html/Old-Database-Format.html>

I searched for more info on the discussion list archive at:

	<https://lists.gnu.org/archive/html/bug-findutils/>

but could find nothing obviously related to upgrading or migrating, 
although that archive goes back only ~20 years! ;^>

Migration appears to require running the previous 4.6 updatedb without 
--old-format, so that it regenerates the database in the newer LOCATE02 
format.
You should then be able to upgrade to the latest 4.8 findutils and use 
that going forward.
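
Before and after migrating, it may help to confirm which format an existing database is in. Per the findutils manual, a LOCATE02 database begins with a dummy entry for a file called LOCATE02, so the file starts with a zero byte followed by that string. A minimal sketch of such a check follows; the function name `is_locate02` is hypothetical, and this only inspects the leading bytes rather than validating the whole file:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file looks like a LOCATE02 database.
is_locate02() {
    # Read the first 9 bytes (leading zero byte + "LOCATE02"),
    # strip NUL bytes, and compare with the expected name.
    magic=$(head -c 9 -- "$1" 2>/dev/null | tr -d '\000')
    [ "$magic" = "LOCATE02" ]
}
```

Usage would be something like `is_locate02 /path/to/locatedb && echo LOCATE02`, where the path is a placeholder for wherever your database actually lives. `locate --statistics` should also report the format of each database it reads.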

You could email the discussion list <mailto:bug-findutils AT gnu DOT org> about 
your situation, file sizes, timings, migration path, and issues, and 
cross-post here about anything in the replies we may be able to help you 
with.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]

