Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16072.892.778395.24290@gargle.gargle.HOWL>
Date: Sun, 18 May 2003 15:04:44 -0700
From: Martin Buchholz
To: cygwin AT cygwin DOT com
Subject: SPARSE files considered harmful - please revert
Reply-To: martin AT xemacs DOT org

This patch is a bad idea.

2003-02-18  Vaclav Haisman

	* fhandler_disk_file.cc: Include winioctl.h for DeviceIoControl.
	(fhandler_disk_file::open): Set newly created and truncated files as
	sparse on platforms that support it.

As someone on the mailing list asked, "If making every file sparse is
such a good idea, why isn't it the default?"

In my experience, sparse files take up much more disk space than
non-sparse files, and are also significantly slower.

I build software. My build trees have 50000 files, average size 8k.
When I copied build trees to a Win2000 NTFS disk using Cygwin tools
(either cp or tar or rsync), the actual space used on the disk
(as reported by df, not du) quintupled.

Here's what I think is happening. Sparse files are implemented like
compressed files, using units of 16 clusters. See this web page:

http://www.storageadmin.com/Articles/Index.cfm?ArticleID=15900&pg=1&show=654

As a result, a non-empty but small sparse file takes up a minimum of
16*clustersize bytes on the disk. My measurements suggest an overhead
of 32 KB per file with a cluster size of 4 KB.

Here are some experiments to support my results:

MKS's commands create files 5 times smaller than Cygwin's commands do.

----------------------------------------------------------------
In 1.3.22:

cpdir is a trivial script that basically does
(cd $dir1; tar cf - .) | (cd $dir2; tar xf -)
`cp -pr' works the same way.
# Use Cygwin commands to create a huge file tree #
$ df .; cpdir dev2 copy-of-dev2; df .
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  6001      5491  53% /d
==> mkdir -p copy-of-dev2
cpdir dev2 copy-of-dev2  17.46s user 53.72s system 18% cpu 6:33.99 total
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  8438      3054  74% /d

$ du -sm dev2 copy-of-dev2
419     dev2
419     copy-of-dev2
du -h -sm dev2 copy-of-dev2  5.64s user 16.36s system 76% cpu 28.784 total

----------------------------------------------------------------
After reverting to 1.3.20, or patching latest CVS:

I used this method to reclaim the disk space that had been eaten up by
the SPARSE file disk hog.

$ df .; mv ws ws-old; cpdir ws-old ws; df .
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  6910      4582  61% /d
==> mkdir -p ws
cpdir ws-old ws  58.68s user 225.50s system 19% cpu 23:44.30 total
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  9085      2407  80% /d

$ df .; rm -rf ws-old; df .
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  9085      2407  80% /d
rm -rf ws-old  21.86s user 71.33s system 38% cpu 4:01.85 total
Filesystem    Type  1M-blocks  Used Available Use% Mounted on
d:          system      11492  3689      7803  33% /d

----------------------------------------------------------------

I'm sure that if you run the experiments, you will see this for
yourself. To reproduce this problem, you need NTFS 5.0 on Windows 2000.
Sparse files are a recent NTFS feature.

The patch is obvious, but I'll send it to cygwin-patches anyway.

Without this patch, Cygwin is unusable for me.

Martin
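
P.S. For anyone who wants to check the arithmetic, here is a rough sketch that works only from the df and du figures quoted in the transcripts above. The variable names are made up for illustration, and the 16-cluster minimum is the hypothesis stated earlier in this message, not a confirmed description of NTFS internals.

```python
# All readings are the "Used" column (1M-blocks) from the df output above.

# 1.3.22 (files created sparse): copying dev2, which du reports as 419 MB.
sparse_copy_mb = 8438 - 6001          # growth in df "Used" caused by the copy
du_mb = 419                           # apparent size of dev2 according to du
sparse_ratio = sparse_copy_mb / du_mb # disk space consumed per MB of data

# 1.3.20 (files not sparse): ws-old was the sparse tree, ws the fresh copy.
ws_plain_mb = 9085 - 6910             # growth from creating the non-sparse copy
ws_sparse_mb = 9085 - 3689            # space freed by deleting the sparse tree
ws_ratio = ws_sparse_mb / ws_plain_mb

# Hypothesis from the message: a small non-empty sparse file occupies at
# least 16 clusters. With the 4 KB cluster size mentioned above:
cluster_bytes = 4 * 1024
min_sparse_alloc = 16 * cluster_bytes

print(f"dev2 copy (sparse): {sparse_copy_mb} MB on disk for {du_mb} MB of data "
      f"({sparse_ratio:.1f}x)")
print(f"ws tree: {ws_sparse_mb} MB sparse vs {ws_plain_mb} MB non-sparse "
      f"({ws_ratio:.1f}x)")
print(f"hypothesized minimum allocation per sparse file: {min_sparse_alloc} bytes")
```

The first ratio compares df growth against du for the same tree. The second compares the same tree stored sparse and non-sparse, both measured by df.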