Message-ID: <4947AC31.2000005@byu.net>
Date: Tue, 16 Dec 2008 06:25:05 -0700
From: Eric Blake
To: cygwin AT cygwin DOT com
Subject: Re: [ANNOUNCEMENT] [1.7] Updated: coreutils-7.0-1
References: <20081216092025 DOT GA15438 AT calimero DOT vinschen DOT de>
In-Reply-To: <20081216092025.GA15438@calimero.vinschen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Corinna Vinschen on 12/16/2008 2:20 AM:
>> This release also takes advantage of
>> the new d_type support to speed up several utilities; one notable
>> exception, unfortunately, is that the Linux patch to use d_type and inode
>> sorting to speed up rm from quadratic to linear on directories with a
>> large number of files did not apply to cygwin because of differences in
>> statfs.
>
> -v?  Is that something we can support by tweaking Cygwin?

I'm not sure yet.  It doesn't even work on Hurd, and part of the bug is
coreutils' fault:

http://lists.gnu.org/archive/html/bug-coreutils/2008-10/msg00005.html

The problem is that Linux has hardcoded magic constants for various
filesystem types, returned through struct statfs.f_type, which are
distinct from magic constants returned by other OSs.
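To make the f_type dependence concrete, here is a minimal sketch of a reverse mapping from magic constant to file system name. The four constants are the Linux values documented in statfs(2); the table, the struct, and the function name fs_name_from_magic are hypothetical and chosen for illustration only. This is not the content of the generated fs.h.

```c
#include <stddef.h>

/* Linux-specific f_type magic numbers, as listed in statfs(2).
   Other systems report different values, or have no f_type at all. */
#define MAGIC_EXT2  0xEF53UL      /* ext2 and its successors */
#define MAGIC_TMPFS 0x01021994UL  /* tmpfs (RAM-backed) */
#define MAGIC_NFS   0x6969UL      /* NFS */
#define MAGIC_MSDOS 0x4d44UL      /* FAT */

/* One row of the reverse mapping (hypothetical layout). */
struct fs_magic
{
  unsigned long magic;
  const char *name;
};

static const struct fs_magic fs_table[] = {
  { MAGIC_EXT2,  "ext2/ext3" },
  { MAGIC_TMPFS, "tmpfs" },
  { MAGIC_NFS,   "nfs" },
  { MAGIC_MSDOS, "msdos" },
};

/* Hypothetical helper: map an f_type value back to a name.
   Returns NULL when the constant is not in the table, which is what
   happens to every value on a system that uses different constants. */
const char *
fs_name_from_magic (unsigned long magic)
{
  size_t i;

  for (i = 0; i < sizeof fs_table / sizeof fs_table[0]; i++)
    if (fs_table[i].magic == magic)
      return fs_table[i].name;
  return NULL;
}
```

On Linux, the argument would be buf.f_type after a successful statfs (path, &buf). On a system whose statfs reports other constants, every lookup falls through to NULL, which is the portability problem in a nutshell.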
Coreutils provides a reverse mapping of constants back to file system
names, in a generated file fs.h, but only builds that file for Linux:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/extract-magic;h=6a0ac7f;hb=99eccc3

Also, coreutils currently only sorts large directories, but cygwin
reports directory st_size as 0 regardless of directory size, so there is
no way to identify large directories up front.

The coreutils checks won't work as-is, even if cygwin were to use the
same magic constants when identifying the same types of file systems
(I'm not sure whether that happens yet), and to use the same struct
layout as Linux (right now, it does not; this comment is rather telling):

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/stat.c;h=f5bf8cd;hb=99eccc3

  /* Return the type of the specified file system.
     Some systems have statfvs.f_basetype[FSTYPSZ] (AIX, HP-UX, and Solaris).
     Others have statvfs.f_fstypename[_VFS_NAMELEN] (NetBSD 3.0).
     Others have statfs.f_fstypename[MFSNAMELEN] (NetBSD 1.5.2).
     Still others have neither and have to get by with f_type (Linux).
     But f_type may only exist in statfs (Cygwin).  */

My impression is that even if cygwin statfs/statvfs were more closely
aligned with Linux, upstream coreutils needs some work to make the
generated fs.h work across more OSs, before cygwin could even attempt to
filter which file systems might benefit from inode sorting.

And even if the coreutils files are improved, we are back to the bigger
original question: Are there any file systems accessed by cygwin where
sorting readdir() results into inode order, rather than visiting
contents in directory listing or name order, provides a speedup by
allowing less disk seek time (or put another way, do the inode numbers
presented by Cygwin for local NTFS disks match disk seek order)?
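For reference, the inode-order traversal under discussion can be sketched roughly as follows. This is a minimal illustration that assumes d_ino from readdir() is reliable; the names ino_entry, sort_by_inode, read_dir_inode_order, and free_entries are hypothetical, and the code is not taken from coreutils. It omits the large-directory heuristic and any file system filtering.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: its name plus the inode number readdir() gave. */
struct ino_entry
{
  ino_t ino;
  char *name;
};

/* qsort comparator, ascending inode number.  Compares rather than
   subtracts, since ino_t may be wider than int. */
static int
compare_ino (const void *a, const void *b)
{
  const struct ino_entry *ea = a;
  const struct ino_entry *eb = b;
  return (ea->ino > eb->ino) - (ea->ino < eb->ino);
}

/* Sort N entries into inode order. */
void
sort_by_inode (struct ino_entry *entries, size_t n)
{
  if (n > 1)
    qsort (entries, n, sizeof *entries, compare_ino);
}

/* Release an array returned by read_dir_inode_order. */
void
free_entries (struct ino_entry *entries, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    free (entries[i].name);
  free (entries);
}

/* Read DIR_NAME without calling stat, skipping "." and "..", and return
   a malloc'd array sorted by inode, storing the count in *N.  Returns
   NULL with *N == 0 on error or for an empty directory. */
struct ino_entry *
read_dir_inode_order (const char *dir_name, size_t *n)
{
  struct ino_entry *entries = NULL;
  size_t used = 0, alloc = 0;
  struct dirent *d;
  DIR *dir;

  assert (n != NULL);
  *n = 0;
  dir = opendir (dir_name);
  if (dir == NULL)
    return NULL;

  while ((d = readdir (dir)) != NULL)
    {
      size_t len = strlen (d->d_name) + 1;
      char *copy;

      if (strcmp (d->d_name, ".") == 0 || strcmp (d->d_name, "..") == 0)
        continue;
      if (used == alloc)
        {
          size_t new_alloc = alloc ? 2 * alloc : 64;
          struct ino_entry *tmp = realloc (entries, new_alloc * sizeof *tmp);
          if (tmp == NULL)
            goto fail;
          entries = tmp;
          alloc = new_alloc;
        }
      copy = malloc (len);
      if (copy == NULL)
        goto fail;
      memcpy (copy, d->d_name, len);
      entries[used].ino = d->d_ino;
      entries[used].name = copy;
      used++;
    }
  closedir (dir);
  sort_by_inode (entries, used);
  *n = used;
  return entries;

fail:
  closedir (dir);
  free_entries (entries, used);
  return NULL;
}
```

A caller would walk the returned array in order, unlinking or visiting each name, and then call free_entries. Whether this order actually reduces seek time depends entirely on how the file system assigns inode numbers.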

Conversely, are there any file systems where taking the time to sort
readdir() results is provably a waste (for example, a ramdisk, where
seek time is instant regardless of inode, or FAT and NFS where inode
numbers are synthesized with no correlation to disk layout, and thus no
better than any other traversal ordering)?

Coreutils can only answer this question if all four of d_ino, d_type,
f_type, and st_size are reliable, so that it can collect inode numbers
without stat, visit directories separately from other files, filter
which file systems the sort will help or hurt, and provide a heuristic
of only sorting large directories:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ab02e25
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=24412ed

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 AT byu DOT net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklHrDEACgkQ84KuGfSFAYCh3gCeMN2trVOTq8eqDBIzNaVCpXDl
56gAn3r1hZSYKu1wD2T+YpCSDaIulA09
=9e50
-----END PGP SIGNATURE-----