Mail Archives: cygwin/2008/11/23/17:50:55

From: Barry Kelly <bkelly DOT ie AT gmail DOT com>
To: cygwin AT cygwin DOT com, bug-coreutils <bug-coreutils AT gnu DOT org>
Subject: Re: "du -b --files0-from=-" running out of memory
Date: Sun, 23 Nov 2008 22:49:47 +0000
Message-ID: <a2nji4p8vr3v453qa3lhi4kdd92e920qqk@4ax.com>
References: <nacii4p76633jbufvfoj4qjesrph05rjga AT 4ax DOT com> <49296551 DOT 4010801 AT byu DOT net>
In-Reply-To: <49296551.4010801@byu.net>

Eric Blake wrote:

> [adding the upstream coreutils list]
> 
> According to Barry Kelly on 11/23/2008 6:24 AM:
> > I have a problem with du running out of memory.
> > 
> > I'm feeding it a list of null-separated file names via standard input,
> > to a command-line that looks like:
> > 
> >   du -b --files0-from=-
> > 
> > The problem is that when du is run in this way, it leaks memory like a
> > sieve. I feed it about 4.7 million paths but eventually it falls over as
> > it hits the 32-bit address space limit.
> 
> That's because du must keep track of which files it has visited, so that
> it can determine whether to recount or ignore hard links that visit a file

That's why I said this:

> > Now, I can understand why a du -c might want to exclude excess hard
> > links to files, but that at most requires a hash table for device &
> > inode pairs - it's hard to see why 4.7 million entries would cause OOM

And 4.7 million inode and device pairs, assuming 64-bit inodes and
16-bit device data (major & minor), even including alignment (so 16
bytes), only adds up to about 75MB of data. That shouldn't exhaust a
2GB address space.
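
The estimate above can be checked with a short C sketch. The struct layout and the names `dev_ino_pair` and `visited_set_bytes` are assumptions made for illustration; they are not taken from the du source. The padding is written out explicitly so that each entry is 16 bytes regardless of the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, one per visited file; not du's actual structure. */
struct dev_ino_pair {
    uint64_t ino;      /* 64-bit inode number */
    uint16_t dev;      /* 16-bit device id (major and minor) */
    uint8_t  pad[6];   /* explicit padding, so each entry is 16 bytes */
};

/* Bytes needed to store n_files entries back to back,
 * ignoring any per-entry overhead of the hash table itself. */
static uint64_t visited_set_bytes(uint64_t n_files)
{
    /* Guard against overflow in the multiplication below. */
    assert(n_files <= UINT64_MAX / sizeof(struct dev_ino_pair));
    return n_files * (uint64_t) sizeof(struct dev_ino_pair);
}
```

Calling `visited_set_bytes(4700000)` gives 75,200,000 bytes, which matches the roughly 75MB figure and is far below a 2GB address space.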

> already seen.  The upstream ls source code was recently changed to store
> this information only for command line arguments, rather than every file
> visited; I wonder if a similar change for du would make sense.

A "visited" hashtable would still be required for calculating '-c'
though.
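
As a rough illustration of such a "visited" table, here is a minimal sketch of a fixed-capacity, open-addressing set keyed on device and inode. Every name in it (`seen_set`, `seen_insert`, and so on) is hypothetical, the hash function is an arbitrary choice, and the table does not grow; it is not how coreutils implements this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not coreutils code. */
struct seen_entry {
    uint64_t ino;
    uint16_t dev;
    uint8_t  used;     /* 0 = empty slot, 1 = occupied */
    uint8_t  pad[5];   /* explicit padding: 16 bytes per entry */
};

struct seen_set {
    struct seen_entry *slots;
    size_t capacity;   /* must be a power of two */
    size_t count;      /* number of occupied slots */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int seen_init(struct seen_set *s, size_t capacity_pow2)
{
    assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    s->slots = calloc(capacity_pow2, sizeof *s->slots);
    s->capacity = capacity_pow2;
    s->count = 0;
    return s->slots ? 0 : -1;
}

/* Mix device and inode into one value (multiplicative hashing). */
static size_t seen_hash(uint16_t dev, uint64_t ino)
{
    uint64_t h = (ino ^ ((uint64_t) dev << 48)) * 0x9E3779B97F4A7C15ULL;
    return (size_t) (h >> 32);
}

/* Returns 1 if the pair was newly recorded (count the file),
 * 0 if it was already present (an extra hard link, skip it),
 * -1 if the table is full. Uses linear probing. */
static int seen_insert(struct seen_set *s, uint16_t dev, uint64_t ino)
{
    size_t mask = s->capacity - 1;
    size_t i = seen_hash(dev, ino) & mask;
    for (size_t probes = 0; probes < s->capacity; probes++) {
        struct seen_entry *e = &s->slots[i];
        if (!e->used) {
            e->ino = ino;
            e->dev = dev;
            e->used = 1;
            s->count++;
            return 1;
        }
        if (e->ino == ino && e->dev == dev)
            return 0;
        i = (i + 1) & mask;
    }
    return -1;
}

static void seen_free(struct seen_set *s)
{
    free(s->slots);
    s->slots = NULL;
    s->capacity = 0;
    s->count = 0;
}
```

A caller would add a file's size to the total only when `seen_insert` returns 1, so each device and inode pair is counted once.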

-- Barry

-- 
http://barrkel.blogspot.com/


