X-Spam-Check-By: sourceware.org
Message-ID: <20060805210507.4049.qmail@web54206.mail.yahoo.com>
Date: Sat, 5 Aug 2006 14:05:07 -0700 (PDT)
From: Jim Lawson
Subject: Re: NTFS fragmentation
To: cygwin AT cygwin DOT com
In-Reply-To: <1154685472.16415.ezmlm@cygwin.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Precedence: bulk
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

> From: Vladimir Dergachev
> To: cygwin AT cygwin DOT com
> Subject: Re: NTFS fragmentation
> Date: Thu, 3 Aug 2006 14:54:33 -0400
>
> On Thursday 03 August 2006 2:37 pm, Dave Korn wrote:
> > On 03 August 2006 18:50, Vladimir Dergachev wrote:
> > > On Thursday 03 August 2006 5:18 am, Dave Korn wrote:
> > >> On 03 August 2006 00:46, Vladimir Dergachev wrote:
> > >>
> > >> Hi Vladimir,
> > >>
> > >>>>> Please CC me - I am not on the list.
> > >>
> > >> Done :)
> > >
> > > I guess this means that sequential writes are officially broken on NTFS.
> > >
> > > Anyone has any idea for a workaround ? It would be nice if a simple
> > > tar zcvf a.tgz * does not result in a completely fragmented file.
> >
> > I can only think of one thing worth trying off the top of my head: what
> > happens if you open a file (in non-sparse mode) and immediately seek to the
> > file size, then seek back to the start and actually write the contents? Or
> > perhaps after seeking to the end you'd need to write (at least) a single
> > byte, then seek back to the beginning?
>
> I am not sure that I understand, if one creates the file and then seeks to
> +1G, wouldn't the file pointer be still at 0 as the filesize is 0 ?
>
> What I am thinking about is modifying cygwin's open and write calls so that
> they preallocate files in chunks of 10MB (configurable by an environment
> variable).
>
> This way we still get some fragmentation, but it would not be so bad -
> assuming 50MB/sec disk read speed reading 10MB will take 200ms, while a seek
> is at worst 20ms (usually around 10-15ms).
>
>                  best
>
>                     Vladimir Dergachev

It turns out that to actually allocate the file blocks, you need to write
some data. Seeking to the desired size doesn't (or didn't use to) allocate
the intervening blocks. As Dave suggests, you need to seek to the end and
actually write something to get the file blocks allocated.

If you try this for a very large file (several gigabytes), you had better be
prepared to go and have a nice meal while you wait for the block allocation
to complete. Windows' security policy requires that the blocks not only be
allocated, but that they be written with data as well - ostensibly to prevent
malicious code from reading old data it shouldn't have access to.

Granted, there are better ways to do this: zero-fill on attempts to read from
allocated but uninitialized file space or, at the very least, throw some kind
of exception when an application attempts to read uninitialized file data.
Since Windows supports sparse files, the basic mechanism is there somewhere.

Windows doesn't (or didn't use to) allow preallocation of files without
actually writing data UNLESS you know the proper incantation to prove you're
a good guy (your application needs to do a dance to grant itself the
"SeManageVolumePrivilege" privilege so it can issue the "SetFileValidData"
call).
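
For anyone who wants to experiment, the seek-to-the-end-and-write-a-byte idea, combined with the chunked preallocation proposed earlier in the thread, might be sketched roughly as follows. This is only an illustration in portable Python, not Cygwin's code: the names reserve, write_preallocated and CHUNK are made up for this example, and whether the single-byte write really allocates the intervening blocks depends on the filesystem (on a filesystem that supports sparse files it may just leave a hole).

```python
CHUNK = 10 * 1024 * 1024  # hypothetical default: the 10MB chunk size from the thread


def reserve(f, size):
    """Extend an open binary file to `size` bytes by writing one byte at the end.

    Seeking alone does not change the file size; the write does.
    """
    if size <= 0:
        return
    f.seek(size - 1)   # move to the last byte of the desired size
    f.write(b"\0")     # the actual write is what extends the file
    f.seek(0)          # go back to the start for the real contents


def write_preallocated(path, data, chunk=CHUNK):
    """Reserve space in whole chunks, write `data`, then trim the unused tail.

    Returns the number of bytes that were reserved up front.
    """
    reserved = -(-len(data) // chunk) * chunk  # round up to a multiple of chunk
    with open(path, "wb") as f:
        reserve(f, reserved)
        f.write(data)
        f.truncate(len(data))  # drop whatever part of the last chunk went unused
    return reserved
```

Something like write_preallocated("a.bin", payload) would then reserve the next multiple of 10MB before writing payload. Comparing the resulting fragmentation with a plain sequential write is left to whoever tries it on NTFS.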