Mail Archives: cygwin/2006/11/20/12:58:35
On Sunday 19 November 2006 11:49 pm, Linda Walsh wrote:
> Some time back (~Aug), there was a discussion about NTFS's file
> fragmentation problem.
>
> Some notes at the time:
>
> From: Vladimir Dergachev
>
> > I have encountered a rather puzzling fragmentation
> > that occurs when writing files using Cygwin.
>
> ...
>
> > a small Tcl script that, when run, creates
> > files fragmented into about 300 pieces on my system)
>
> &&
>
> On 03 August 2006 18:50, Vladimir Dergachev wrote:
> > I guess this means that sequential writes are officially broken on NTFS.
> > Does anyone have an idea for a workaround? It would be nice if a simple
> > tar zcvf a.tgz * does not result in a completely fragmented file.
>
> &&
>
> On Aug 3 14:54, Vladimir Dergachev wrote:
> > What I am thinking about is modifying cygwin's open and write calls so
> > that they preallocate files in chunks of 10MB (configurable by an
> > environment variable).
>
> ------------
>
> The "fault" is the behavior of the file system.
> I compared NTFS with ext3 & xfs on linux (jfs & reiser hide how many
> fragments a file is divided into).
>
> NTFS is in the middle as far as fragmentation performance goes. My disk
> is usually defragmented, but the built-in Windows defragmenter doesn't
> defragment free space.
>
> I used a file size of 64M and proceeded copying that file to
> a destination file using various utils.
>
> With Xfs (linux), I wasn't able to fragment the target file. Even
> writing 1K chunks in append mode, the target file always ended up
> in 1 64M fragment.
>
> With ext3 (also linux), the copy method didn't seem to matter:
> cp, dd (blocksize 64M), and rsync all produced a target file with
> 2473 fragments.
This is curious - how do you find out the fragmentation of an ext3 file? I do
not know of a utility that reports it.
From indirect observation, ext3 fragmentation is not nearly that bad until the
filesystem is close to full; otherwise I would not be able to reach sequential
read speeds (the all-seeks speed is about 6 MB/sec for me, and I was getting
40-50 MB/sec). This was on much larger files, though.
Which journal option was the filesystem mounted with?
>
> NTFS using cygwin varies the fragment size based on the tool
> writing the output.
> "cp" produced the most fragments at 515 fragments.
> "rsync" came next with 19 fragments.
> "dd" (using a bs=32M or bs=64M) did best at 1 fragment.
> using "dd" and using a block size of 8k produced the same
> results as "cp".
>
> It appears cygwin does exactly the right thing as far as file
> writes are concerned -- it writes the output using the block size
> specified by the client program you are running. If you use a
> small block size, NTFS allocates space for each write that you do.
> If you use a big block size, NTFS appears to look for the first
> place that the entire write will fit. Back in DOS days, the
> built-in COPY command buffered as much data as would fit in
> memory then wrote it out -- meaning it would be likely to create
> the output with a minimal number of fragments.
>
> If you want your files to be unfragmented, you need to use a
> file copy (or file write) util that uses a large buffer size --
> one that (if possible), writes the entire file in 1 write.
I actually implemented a workaround that calls "fsutil file createnew
FILENAME FILESIZE" to preallocate space and then writes the data in append
mode (after seeking to offset 0).
Thank you!
Vladimir Dergachev
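
For illustration only, here is a minimal Python sketch of the preallocate-then-write idea described in the reply above. It is not the actual workaround code. The function name `write_preallocated` and its parameters are hypothetical, and `truncate()` is used as a portable stand-in for "fsutil file createnew"; whether extending a file this way really reserves contiguous space depends on the operating system and filesystem. The sketch also opens the file in ordinary write mode rather than append mode.

```python
import os


def write_preallocated(path, chunks, final_size):
    """Reserve final_size bytes for path, then fill it from offset 0.

    Hypothetical helper: truncate() stands in for `fsutil file createnew`,
    and the space reservation it gives is an assumption, not a guarantee.
    """
    with open(path, "wb") as f:
        f.truncate(final_size)   # extend the empty file to its expected size
        f.seek(0)                # start writing at the beginning
        written = 0
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
        f.truncate(written)      # set the final length to what was written
    return written
```

The final `truncate(written)` trims the file when less data arrived than was reserved, which matters for cases like compressed tar output where the final size is only an estimate.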
>
> In the "tar zcvf a.tgz *" case, I'd suggest piping the output of
> tar into "dd" and using a large blocksize.
>
> Linda
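
One caveat on the pipe suggestion: when dd reads from a pipe with only "bs=" set, each short read is written out as an equally short write, so the writes may stay small. Setting "obs=" (or GNU dd's "iflag=fullblock") makes dd collect full blocks before writing. The same reblocking can be sketched in Python as below. This is a rough sketch, not a tested fix for NTFS fragmentation; the names `reblock` and `_write_all` and the default sizes are made up for the example.

```python
import io
import os


def _write_all(fd, data):
    """Write all of data to fd, retrying if the OS accepts only part of it."""
    data = bytes(data)
    total = 0
    while total < len(data):
        total += os.write(fd, data[total:])
    return total


def reblock(src, dst_fd, block_size=64 * 1024 * 1024, read_size=64 * 1024):
    """Copy src to dst_fd, handing the OS blocks of block_size bytes.

    src is any binary file object (for example sys.stdin.buffer fed by tar).
    Only the last block may be shorter. Returns the list of block sizes.
    """
    pending = bytearray()
    sizes = []
    while True:
        chunk = src.read(read_size)
        if not chunk:
            break
        pending += chunk
        # Flush only once a full output block has accumulated.
        while len(pending) >= block_size:
            sizes.append(_write_all(dst_fd, pending[:block_size]))
            del pending[:block_size]
    if pending:
        sizes.append(_write_all(dst_fd, pending))
    return sizes
```

To use it with tar, one would open the destination with `os.open()` and call `reblock(sys.stdin.buffer, fd)` in a small script placed at the end of the pipe. Note that the whole output block is held in memory, so the block size trades memory for fewer, larger writes.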