From: Vladimir Dergachev
To: Linda Walsh
Cc: cygwin@cygwin.com, dave.korn@artimi.com
Subject: Re: NTFS fragmentation redux
Date: Mon, 20 Nov 2006 12:52:31 -0500

On Sunday 19 November 2006 11:49 pm, Linda Walsh wrote:
> Some time back (~Aug), there was a discussion about NTFS's file
> fragmentation problem.
>
> Some notes at the time:
>
> From: Vladimir Dergachev
>
> > I have encountered a rather puzzling fragmentation
> > that occurs when writing files using Cygwin.
> > ...
> > a small Tcl script that, when run, creates
> > files fragmented into about 300 pieces on my system)
>
> &&
>
> On 03 August 2006 18:50, Vladimir Dergachev wrote:
> > I guess this means that sequential writes are officially broken on
> > NTFS. Does anyone have an idea for a workaround? It would be nice
> > if a simple "tar zcvf a.tgz *" did not result in a completely
> > fragmented file.
>
> &&
>
> On Aug 3 14:54, Vladimir Dergachev wrote:
> > What I am thinking about is modifying cygwin's open and write calls
> > so that they preallocate files in chunks of 10MB (configurable by
> > an environment variable).
>
> ------------
>
> The "fault" is the behavior of the file system.
> I compared NTFS with ext3 & XFS on Linux (JFS & ReiserFS hide how
> many fragments a file is divided into).
>
> NTFS is in the middle as far as fragmentation performance goes. My
> disk is usually defragmented, but the built-in Windows defragmenter
> doesn't defragment free space.
>
> I used a file size of 64M and proceeded to copy that file to a
> destination file using various utilities.
>
> With XFS (Linux), I wasn't able to fragment the target file. Even
> writing 1K chunks in append mode, the target file always ended up
> as one 64M fragment.
>
> With ext3 (also Linux), the copy method didn't seem to matter:
> cp, dd (blocksize 64M), and rsync all produced a target file with
> 2473 fragments.

This is curious -- how do you find out the fragmentation of an ext3
file? I do not know of a utility that would tell me that.

From indirect observation, ext3 does not fragment nearly that badly
until the filesystem is close to full, or I would not be able to reach
sequential read speeds (the all-seeks speed is about 6 MB/sec for me,
and I was getting 40-50 MB/sec). This was on much larger files,
though.

Which journal option was the filesystem mounted with?

> NTFS under cygwin varies the fragment count depending on the tool
> writing the output.
> "cp" produced the most fragments: 515.
> "rsync" came next with 19 fragments.
> "dd" (using bs=32M or bs=64M) did best at 1 fragment.
> Using "dd" with a block size of 8k produced the same results as
> "cp".
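Out of curiosity, what did the test commands look like exactly? My
guess from the description (a sketch only -- the file names are made
up):

    # one 64M write: the copy ends up in a single fragment
    dd if=src.dat of=dst-big.dat bs=64M
    # 8k writes: ~515 fragments, the same as "cp"
    dd if=src.dat of=dst-small.dat bs=8k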
> It appears cygwin does exactly the right thing as far as file
> writes are concerned -- it writes the output using the block size
> specified by the client program you are running. If you use a
> small block size, NTFS allocates space for each write that you do.
> If you use a big block size, NTFS appears to look for the first
> place where the entire write will fit. Back in DOS days, the
> built-in COPY command buffered as much data as would fit in
> memory and then wrote it out -- meaning it was likely to create
> the output with a minimal number of fragments.
>
> If you want your files to be unfragmented, you need to use a
> file copy (or file write) utility that uses a large buffer size --
> one that, if possible, writes the entire file in one write.

I actually implemented a workaround that calls "fsutil file createnew
FILESIZE" to preallocate space, and then writes the data starting at
offset 0, without truncating the file first; a command-line sketch of
the same idea is at the end of this message.

thank you!

Vladimir Dergachev

> In the "tar zcvf a.tgz *" case, I'd suggest piping the output of
> tar into "dd" and using a large blocksize.
>
> Linda
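P.S. Spelled out as commands, the preallocation trick would look
something like this (a sketch only -- the file name and the 64M size
are made-up examples; fsutil has to be told the final size up front
and needs administrator rights):

    # preallocate a zero-filled 64M file; NTFS gets the chance to
    # place it in one contiguous run
    fsutil file createnew a.tgz 67108864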
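Combining that with your dd suggestion for the tar case (conv=notrunc
keeps dd from truncating away the preallocated run):

    # write the archive in place over the reserved blocks
    tar zcvf - * | dd of=a.tgz bs=64M conv=notrunc

dd may still get short reads from the pipe, but with the space already
reserved that no longer matters; the file does stay 64M long, so it
would still have to be truncated to the real archive size afterwards.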