Mail Archives: cygwin/2006/11/19/23:49:59

Message-ID: <456133E5.8000509@tlinx.org>
Date: Sun, 19 Nov 2006 20:49:41 -0800
From: Linda Walsh <cygwin AT tlinx DOT org>
To: cygwin AT cygwin DOT com
CC: dave DOT korn AT artimi DOT com, vdergachev AT rcgardis DOT com
Subject: NTFS fragmentation redux

Some time back (~Aug), there was a discussion about NTFS's file 
fragmentation problem.

Some notes at the time:

From:  Vladimir Dergachev
>        I have encountered a rather puzzling fragmentation 
> that occurs when writing files using Cygwin. 
...
>        a small Tcl script that, when run, creates 
> files fragmented into about 300 pieces on my system)
        &&
On 03 August 2006 18:50, Vladimir Dergachev wrote:
> I guess this means that sequential writes are officially broken on NTFS. 
> Anyone has any idea for a workaround ? It would be nice if a simple
> tar zcvf a.tgz * does not result in a completely fragmented file.
	&&
On Aug  3 14:54, Vladimir Dergachev wrote:
> What I am thinking about is modifying cygwin's open and write calls so that 
> they preallocate files in chunks of 10MB (configurable by an environment 
> variable). 
------------
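Vladimir's preallocation idea could be sketched in user space roughly as follows. This is a minimal sketch, not Cygwin code: the class name `PreallocWriter` is hypothetical, the chunk size is a constructor argument rather than the environment variable he mentions, and it assumes a platform where Python exposes `os.posix_fallocate` (Linux does; availability elsewhere, including Cygwin, may vary). Whether the reserved space is actually contiguous is still up to the filesystem.

```python
import os

CHUNK = 10 * 1024 * 1024  # 10 MB, the chunk size mentioned in the proposal


class PreallocWriter:
    """Hypothetical sequential writer: reserve disk space in chunk-sized steps
    ahead of the data, then trim the unused reservation when closed."""

    def __init__(self, path, chunk=CHUNK):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.chunk = chunk
        self.written = 0   # bytes of real data written so far
        self.reserved = 0  # bytes reserved with posix_fallocate so far

    def write(self, data):
        end = self.written + len(data)
        if end > self.reserved:
            # Grow the reservation to the smallest multiple of chunk >= end.
            new_reserved = -(-end // self.chunk) * self.chunk
            os.posix_fallocate(self.fd, self.reserved, new_reserved - self.reserved)
            self.reserved = new_reserved
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            n = os.write(self.fd, view)
            view = view[n:]
        self.written = end

    def close(self):
        os.ftruncate(self.fd, self.written)  # drop the reserved but unused tail
        os.close(self.fd)
```

The truncate in `close()` matters: `posix_fallocate` extends the file size to cover the reservation, so without it the file would end in zero padding up to the next chunk boundary.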

The "fault" is the behavior of the file system.
I compared NTFS with ext3 and XFS on Linux (JFS and ReiserFS don't
show how many fragments a file is divided into).

NTFS falls in the middle as far as fragmentation goes.  My disk
is usually defragmented, but the built-in Windows defragmenter doesn't
defragment free space.

I used a 64M source file and copied it to a destination file
with various utilities.

With XFS (Linux), I wasn't able to fragment the target file.  Even
when writing 1K chunks in append mode, the target file always ended up
as a single 64M fragment.

With ext3 (also Linux), the copy method didn't seem to matter:
cp, dd (block size 64M), and rsync all produced a target file with
2473 fragments.

With NTFS under Cygwin, the number of fragments varies with the tool
writing the output.
"cp" produced the most, at 515 fragments.
"rsync" came next with 19 fragments.
"dd" (using bs=32M or bs=64M) did best at 1 fragment.
"dd" with a block size of 8k produced the same result as "cp".

It appears Cygwin does exactly the right thing as far as file
writes are concerned: it writes the output using the block size
specified by the client program you are running.  If you use a
small block size, NTFS allocates space for each write you do.
If you use a big block size, NTFS appears to look for the first
place where the entire write will fit.  Back in DOS days, the
built-in COPY command buffered as much data as would fit in
memory and then wrote it out, so it was likely to create
the output with a minimal number of fragments.

If you want your files to be unfragmented, you need to use a
file copy (or file write) utility that uses a large buffer size,
ideally one that writes the entire file in a single write.
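To make the block-size effect concrete, here is a minimal Python sketch (the function name `copy_file` is hypothetical, not an existing tool) that copies a file using write() calls of a chosen size and reports how many it made. For the 64M test file, a bufsize of 8192 means 8192 write() calls, each of which NTFS may place separately, while the default buffer the size of the whole file normally yields a single write(). That default holds the entire file in memory, the same trade-off the DOS COPY command made.

```python
import os


def copy_file(src, dst, bufsize=None):
    """Copy src to dst with write() calls of at most bufsize bytes and return
    how many write() calls were made. bufsize=None uses the source file's
    size, so a regular file is normally written with one write() call."""
    if bufsize is None:
        bufsize = max(os.path.getsize(src), 1)
    writes = 0
    # buffering=0 gives raw FileIO objects: each read()/write() maps to one system call
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            block = fin.read(bufsize)
            if not block:  # EOF
                break
            view = memoryview(block)
            while view:  # a write may be partial; finish the block
                n = fout.write(view)
                view = view[n:]
                writes += 1
    return writes
```

The sketch only controls the size of the requests; how many fragments result is still the filesystem's decision, as the XFS and ext3 numbers show.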

In the "tar zcvf a.tgz *" case, I'd suggest having tar write to
stdout and piping that into "dd" with a large block size.
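One caveat for the pipe case: reads from a pipe usually return far less than the requested block size, and GNU dd's documentation says that when only bs= is given (and no data-transforming conv= is used), each input block is copied to the output as soon as it is read, even if it is short. So `dd bs=64M` at the end of a pipe may still issue many small writes. Giving dd an output block size instead, e.g. `tar zcf - * | dd of=a.tgz obs=64M`, or adding iflag=fullblock on newer GNU coreutils, makes it accumulate full blocks. The reblocking that obs= performs is sketched below in Python; `reblock` and `_write_all` are hypothetical names, and this is an illustration, not dd's implementation.

```python
import os


def _write_all(fd, data):
    """Write all of data to fd, retrying after partial writes."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def reblock(in_fd, out_fd, obs=64 * 1024 * 1024):
    """Copy in_fd to out_fd, but only write full obs-sized blocks (plus a
    final partial one at EOF), however small the individual reads are.
    Returns the number of blocks written."""
    buf = bytearray()
    blocks = 0
    while True:
        chunk = os.read(in_fd, 1024 * 1024)  # a pipe often returns much less
        buf += chunk
        # Emit every full block; at EOF (empty chunk) also emit the remainder.
        while len(buf) >= obs or (not chunk and buf):
            _write_all(out_fd, buf[:obs])
            del buf[:obs]
            blocks += 1
        if not chunk:
            return blocks
```

Called as `reblock(sys.stdin.fileno(), sys.stdout.fileno())` from a small script with stdout redirected to the archive file, it would play the role dd plays in the tar pipeline; the cost is holding up to one output block in memory.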

Linda
