Mail Archives: cygwin/2008/12/16/08:25:49

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.1 required=5.0 tests=AWL,BAYES_00,SPF_SOFTFAIL
X-Spam-Check-By: sourceware.org
Message-ID: <4947AC31.2000005@byu.net>
Date: Tue, 16 Dec 2008 06:25:05 -0700
From: Eric Blake <ebb9 AT byu DOT net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.18) Gecko/20081105 Thunderbird/2.0.0.18 Mnenhy/0.7.5.666
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: [ANNOUNCEMENT] [1.7] Updated: coreutils-7.0-1
References: <announce DOT 4947253E DOT 7000707 AT byu DOT net> <20081216092025 DOT GA15438 AT calimero DOT vinschen DOT de>
In-Reply-To: <20081216092025.GA15438@calimero.vinschen.de>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

According to Corinna Vinschen on 12/16/2008 2:20 AM:
>>   This release also takes advantage of
>> the new d_type support to speed up several utilities; one notable
>> exception, unfortunately, is that the Linux patch to use d_type and inode
>> sorting to speed up rm from quadratic to linear on directories with a
>> large number of files did not apply to cygwin because of differences in
>> statfs.
> 
> -v?  Is that something we can support by tweaking Cygwin?

I'm not sure yet.  It doesn't even work on Hurd, and part of the bug is
coreutils' fault:
http://lists.gnu.org/archive/html/bug-coreutils/2008-10/msg00005.html

The problem is that Linux has hardcoded magic constants for various
filesystem types, returned through struct statfs.f_type, which are
distinct from magic constants returned by other OSs.  Coreutils provides a
reverse mapping of constants back to file system names, in a generated
file fs.h, but only builds that file for Linux:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extract-magic;h=6a0ac7f;hb=99eccc3
Also, coreutils currently only sorts large directories, but cygwin reports
directory st_size as 0 regardless of directory size, so there is no way to
identify large directories up front.
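
To illustrate the reverse mapping described above, here is a minimal sketch in C. It is not the generated fs.h from coreutils; the table, the struct fs_magic type, and the function names fs_name_from_magic and fs_name_of_path are hypothetical. The magic values are the ones Linux uses, so the lookup is only meaningful for an f_type reported by Linux statfs(), which is exactly the portability problem.

```c
#include <stddef.h>

#ifdef __linux__
#include <sys/vfs.h>   /* statfs() and struct statfs on Linux */
#endif

/* Hypothetical table entry: one Linux f_type magic and a readable name. */
struct fs_magic {
    unsigned long magic;
    const char *name;
};

/* A few Linux magic constants; other OSs report different values. */
static const struct fs_magic known_fs[] = {
    { 0xEF53UL,     "ext2/ext3" },
    { 0x6969UL,     "nfs"       },
    { 0x4d44UL,     "msdos"     },
    { 0x01021994UL, "tmpfs"     },
};

/* Map a magic constant back to a name, or NULL if it is not in the table. */
const char *fs_name_from_magic(unsigned long magic)
{
    size_t i;
    for (i = 0; i < sizeof known_fs / sizeof known_fs[0]; i++) {
        if (known_fs[i].magic == magic)
            return known_fs[i].name;
    }
    return NULL;
}

#ifdef __linux__
/* Look up the file system holding PATH; NULL on error or unknown type. */
const char *fs_name_of_path(const char *path)
{
    struct statfs buf;
    if (statfs(path, &buf) != 0)
        return NULL;
    return fs_name_from_magic((unsigned long) buf.f_type);
}
#endif
```

On a platform whose f_type values differ from these, the lookup returns NULL or, worse, the wrong name, so the table cannot simply be reused.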

The coreutils checks won't work as-is, even if cygwin were to use the same
magic constants for the same types of file systems (I'm not sure whether it
does yet) and the same struct layout as Linux (right now, it does not; this
comment is rather telling):
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/stat.c;h=f5bf8cd;hb=99eccc3
/* Return the type of the specified file system.
   Some systems have statfvs.f_basetype[FSTYPSZ] (AIX, HP-UX, and Solaris).
   Others have statvfs.f_fstypename[_VFS_NAMELEN] (NetBSD 3.0).
   Others have statfs.f_fstypename[MFSNAMELEN] (NetBSD 1.5.2).
   Still others have neither and have to get by with f_type (Linux).
   But f_type may only exist in statfs (Cygwin).  */

My impression is that even if cygwin's statfs/statvfs were more closely
aligned with Linux, upstream coreutils needs some work to make the generated
fs.h work across more OSs, before cygwin could even attempt to filter
which file systems might benefit from inode sorting.  And even if the
coreutils files are improved, we are back to the bigger original question:

Are there any file systems accessed by cygwin where sorting readdir()
results into inode order, rather than visiting contents in directory
listing or name order, provides a speedup by allowing less disk seek time
(or put another way, do the inode numbers presented by Cygwin for local
NTFS disks match disk seek order)?  Conversely, are there any file systems
where taking the time to sort readdir() results is provably a waste (for
example, a ramdisk, where seek time is instant regardless of inode, or FAT
and NFS where inode numbers are synthesized with no correlation to disk
layout, and thus no better than any other traversal ordering)?  Coreutils
can only answer this question if all four of d_ino, d_type, f_type, and
st_size are reliable, so that it can collect inode numbers without stat,
visit directories separately from other files, filter which file systems
the sort will help or hurt, and provide a heuristic of only sorting large
directories:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ab02e25
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=24412ed
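
For readers unfamiliar with the technique, here is a rough sketch in C of collecting d_ino from readdir() without calling stat, then sorting into inode order only when the directory is large. It is not the code from the commits above: struct entry, sort_entries_by_inode, load_entries, and the SORT_THRESHOLD value are all assumptions made for illustration, and the d_type and f_type filtering is left out.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cutoff: smaller directories are left in readdir() order. */
#define SORT_THRESHOLD 10000

struct entry {
    ino_t ino;
    char name[256];
};

/* Order two entries by ascending inode number. */
static int cmp_ino(const void *a, const void *b)
{
    ino_t x = ((const struct entry *) a)->ino;
    ino_t y = ((const struct entry *) b)->ino;
    return (x > y) - (x < y);
}

void sort_entries_by_inode(struct entry *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_ino);
}

/* Read PATH into a malloc'd array stored in *OUT (caller frees).
   Returns the entry count, or (size_t) -1 on error. */
size_t load_entries(const char *path, struct entry **out)
{
    DIR *d = opendir(path);
    struct entry *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;

    if (d == NULL)
        return (size_t) -1;
    while ((de = readdir(d)) != NULL) {
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            struct entry *tmp = realloc(v, ncap * sizeof *v);
            if (tmp == NULL) {
                free(v);
                closedir(d);
                return (size_t) -1;
            }
            v = tmp;
            cap = ncap;
        }
        v[n].ino = de->d_ino;   /* inode taken from readdir, no stat call */
        strncpy(v[n].name, de->d_name, sizeof v[n].name - 1);
        v[n].name[sizeof v[n].name - 1] = '\0';
        n++;
    }
    closedir(d);
    if (n >= SORT_THRESHOLD)    /* sort only when it might pay off */
        sort_entries_by_inode(v, n);
    *out = v;
    return n;
}
```

Whether visiting the entries in this order saves any seek time depends on the inode numbers correlating with on-disk layout, which is the open question for NTFS under cygwin.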

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 AT byu DOT net

