Mail Archives: cygwin/2008/12/16/09:11:22

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Tue, 16 Dec 2008 15:09:49 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: [ANNOUNCEMENT] [1.7] Updated: coreutils-7.0-1
Message-ID: <20081216140949.GH6830@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <announce DOT 4947253E DOT 7000707 AT byu DOT net> <20081216092025 DOT GA15438 AT calimero DOT vinschen DOT de> <4947AC31 DOT 2000005 AT byu DOT net>
MIME-Version: 1.0
In-Reply-To: <4947AC31.2000005@byu.net>
User-Agent: Mutt/1.5.16 (2007-06-09)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Dec 16 06:25, Eric Blake wrote:
> According to Corinna Vinschen on 12/16/2008 2:20 AM:
> >> unfortunately, is that the Linux patch to use d_type and inode
> >> sorting to speed up rm from quadratic to linear on directories with a
> >> large number of files did not apply to cygwin because of differences in
> >> statfs.
> > 
> > -v?  Is that something we can support by tweaking Cygwin?
> 
> I'm not sure yet.  It doesn't even work on Hurd, and part of the bug is
> coreutils' fault:
> http://lists.gnu.org/archive/html/bug-coreutils/2008-10/msg00005.html
> 
> The problem is that Linux has hardcoded magic constants for various
> filesystem types, returned through struct statfs.f_type, which are

Hmm, Cygwin's statvfs struct doesn't have f_type.

> distinct from magic constants returned by other OSs. 
> [...]
> Also, coreutils currently only sorts large directories, but cygwin reports
> directory st_size as 0 regardless of directory size, so there is no way to
> identify large directories up front.

Not quite.  Did you call `ls -s' on cygwin's / directory lately?  A snippet
from mine on one of my machines looks like this:

160 drwxrwx---+ 1 corinna vinschen       163840 Dec 16 10:13 bin
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 15  2008 cygdrive
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 30  2008 dev
 12 drwxrwx---+ 1 corinna vinschen        12288 Dec 15 11:15 etc
  4 drwxr-xr-x+ 1 corinna vinschen         4096 Jul  4 10:41 home
 40 drwxrwx---+ 1 corinna vinschen        40960 Dec  8 11:58 lib
  0 dr-xr-xr-x  8 corinna vinschen            0 Dec  1  2006 proc
  0 drwxrwx---+ 1 corinna vinschen            0 Apr 15  2008 sbin
  4 drwxrwxrwt+ 1 corinna vinschen         4096 Dec 15 16:35 tmp
  4 drwxrwx---+ 1 corinna vinschen         4096 Dec  8 11:54 usr
  0 drwxr-xr-x+ 1 SYSTEM  Administrators      0 May 21  2008 var

The size of a directory which you just created is 0.  But big
directories (like /bin), or directories which once were big (like /tmp)
have a size which is a multiple of 4K.  This size is what's returned by
the NT function NtQueryInformationFile.  I assume that a directory is
created with one block in a pre-allocated area in the MFT or so, which
explains size 0.  When the dir grows, then normal FS blocks are added,
so the size grows beyond 0.  But actually I have no idea, so it could be
entirely different. :)
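
For illustration, the directory size discussed above is what stat() reports in st_size. The following is a minimal C sketch, not code from Cygwin or coreutils; the function name dir_size_bytes is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return st_size of the directory at `path`,
   or -1 if stat() fails or `path` is not a directory. */
long long dir_size_bytes(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;              /* stat() failed, e.g. no such path */
    if (!S_ISDIR(st.st_mode))
        return -1;              /* exists, but is not a directory */
    return (long long) st.st_size;
}
```

Going by the listing above, dir_size_bytes("/bin") would presumably return 163840 on that machine, while a freshly created directory would give 0.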

> /* Return the type of the specified file system.
>    Some systems have statvfs.f_basetype[FSTYPSZ] (AIX, HP-UX, and Solaris).
>    Others have statvfs.f_fstypename[_VFS_NAMELEN] (NetBSD 3.0).
>    Others have statfs.f_fstypename[MFSNAMELEN] (NetBSD 1.5.2).
>    Still others have neither and have to get by with f_type (Linux).
>    But f_type may only exist in statfs (Cygwin).  */

Yeah, but we don't have that.  For type recognition we have
statvfs::f_flag, which is an exact copy of the Windows FS flags, or
mntent::mnt_type, which is the file system name (like "ntfs").  So the
capability is available, it would just have to be used.
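
As a rough sketch of the mntent route, the file system name for a mount point can be looked up with setmntent()/getmntent(). This assumes <mntent.h> provides the MOUNTED macro for the mount table path (as glibc does); the function name mount_type is hypothetical, and this is not how coreutils currently does it.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the file system type of the mount point
   `mount_dir` into `buf`.  Returns 0 on success, -1 if the mount table
   cannot be opened or no entry matches. */
int mount_type(const char *mount_dir, char *buf, size_t buflen)
{
    FILE *fp;
    struct mntent *ent;
    int found = -1;

    assert(mount_dir != NULL && buf != NULL);
    fp = setmntent(MOUNTED, "r");   /* MOUNTED: assumed mount table path */
    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, mount_dir) == 0) {
            /* Keep scanning: a later entry for the same directory
               replaces an earlier one. */
            snprintf(buf, buflen, "%s", ent->mnt_type);
            found = 0;
        }
    }
    endmntent(fp);
    return found;
}
```

A caller would pass the mount point of the directory being removed and compare the result against names such as "ntfs".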

> [...]
>   And even if the
> coreutils files are improved, we are back to the bigger original question:
> 
> Are there any file systems accessed by cygwin where sorting readdir()
> results into inode order, rather than visiting contents in directory
> listing or name order, provides a speedup by allowing less disk seek time
> (or put another way, do the inode numbers presented by Cygwin for local
> NTFS disks match disk seek order)?  Conversely, are there any file systems
> where taking the time to sort readdir() results is provably a waste (for
> example, a ramdisk, where seek time is instant regardless of inode, or FAT
> and NFS where inode numbers are synthesized with no correlation to disk
> layout,

Interesting question.  NTFS and FAT filesystems are name-sorted by
default.  AFAIK directory changes on FAT are done in-memory, resorted,
and then written back as a whole block to disk.  NTFS is using an
always name-sorted B+ tree anyway.  So, as far as I can tell, resorting
by inode number would probably not help to speed up rm.  But that's
just me.
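
For reference, the inode-order idea under discussion can be sketched in C roughly as follows: collect the readdir() results, then sort them by d_ino before visiting them. This is only an illustration of the technique, not the actual coreutils patch; the names struct entry, read_sorted_by_inode and free_entries are hypothetical. Whether the sort buys anything depends on the inode numbers correlating with disk layout, which is exactly the open question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct entry {
    ino_t ino;      /* inode number as reported by readdir() */
    char *name;     /* malloc'd copy of d_name */
};

/* Order by inode number, falling back to name for equal inodes. */
static int by_inode(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;

    if (ea->ino < eb->ino) return -1;
    if (ea->ino > eb->ino) return 1;
    return strcmp(ea->name, eb->name);
}

void free_entries(struct entry *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(list[i].name);
    free(list);
}

/* Read all entries of `path` except "." and "..", sorted by inode.
   Returns the entry count and stores the array in *out, or returns -1
   on failure.  Release the result with free_entries(). */
long read_sorted_by_inode(const char *path, struct entry **out)
{
    DIR *dir;
    struct dirent *de;
    struct entry *list = NULL;
    size_t n = 0, cap = 0;

    assert(path != NULL && out != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {             /* grow the array geometrically */
            size_t newcap = cap ? cap * 2 : 64;
            struct entry *tmp = realloc(list, newcap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            list = tmp;
            cap = newcap;
        }
        list[n].name = strdup(de->d_name);
        if (list[n].name == NULL)
            goto fail;
        list[n].ino = de->d_ino;
        n++;
    }
    closedir(dir);

    if (n > 1)
        qsort(list, n, sizeof *list, by_inode);
    *out = list;
    return (long) n;

fail:
    free_entries(list, n);
    closedir(dir);
    return -1;
}
```

A caller such as rm would then walk the returned array in order and unlink each name.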


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
