From: "Bill C. Riemers"
Subject: Re: SPARSE files considered harmful - please revert
Date: Tue, 20 May 2003 10:50:56 -0400

> 1) You are assuming behavior that isn't documented. I can imagine that
> the first block could occupy, say 16 blocks and depending on the size of
> the hole, there could be no fragmentation.

You are assuming an optimization that may or may not exist. In my example, there is certainly no reason why the first block would occupy 16 blocks: I already specified that the hole is exactly one block in size. At most, the filesystem might allocate three blocks so that the middle one could be filled in later. Even then you would still get fragmentation, though in that case it would more likely come from a one-block file being written into the "reserved" space before the updated sparse file needs it. Either way, using a sparse file for a file that is regularly accessed in read-write mode will cause fragmentation. The only question is how fast it fragments, and that depends on the filesystem and on how its drivers are implemented.
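To make the one-block-hole scenario concrete, here is a minimal C sketch using POSIX calls (open, lseek, pwrite, fstat) rather than anything Windows-specific. It writes block 0, seeks past block 1 to write block 2, leaving a one-block hole, and later fills that hole the way a database update would. The 4096-byte block size, the file name, and the function names are hypothetical. The code only reproduces the access pattern; where the filesystem places the block it allocates for the hole, and therefore whether the file fragments, is not something the code controls or measures.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 4096 /* hypothetical block size; real filesystems vary */

/* Bytes actually allocated on disk. Assumes st_blocks is counted in
 * 512-byte units, as on Linux; POSIX leaves the unit unspecified. */
static long long allocated_bytes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Write block 0, seek beyond EOF over block 1, write block 2.
 * On filesystems that support sparse files, block 1 becomes a hole.
 * Returns an open read-write descriptor, or -1 on error. */
static int make_one_block_hole(const char *path)
{
    char buf[BLK];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    memset(buf, 'A', BLK);
    if (write(fd, buf, BLK) != BLK)          /* block 0 */
        goto fail;
    if (lseek(fd, 2 * BLK, SEEK_SET) < 0)    /* skip block 1 */
        goto fail;
    memset(buf, 'C', BLK);
    if (write(fd, buf, BLK) != BLK)          /* block 2 */
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/* A later update writes into the hole. The filesystem must allocate a
 * block now, and nothing guarantees it sits between blocks 0 and 2. */
static int fill_hole(int fd)
{
    char buf[BLK];
    memset(buf, 'B', BLK);
    return pwrite(fd, buf, BLK, BLK) == BLK ? 0 : -1;
}
```

On a filesystem with sparse-file support, calling allocated_bytes() before and after fill_hole() typically shows the allocation grow by about one block; the hole itself reads back as zeros until it is filled.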
Really sophisticated drivers might even rewrite a file that is below some threshold size, just to fix fragmentation on the fly. I can definitely say NTFS is not that sophisticated. Even on disks with plenty of free space, NTFS fragments at an alarmingly fast rate. I defragment my Linux partition once every few years at most (by repartitioning and copying); doing it more often brings no noticeable improvement in performance. For NTFS, I find I need to run the defragmenter every weekend for optimal performance.

> 2) Normal read/write behavior would not result in a file that has a
> sparse block. I think it is a rare program which writes beyond EOF. So
> this would normally be a non-issue.

Correct. I am only talking about why it is a bad idea to blindly convert all files to sparse files, which can be done with either GNU tar or GNU cp. The fragmentation described above will happen, and does happen, when the file in question is a database file, since databases tend to contain lots of blank space intended for adding new records.

> 3) What no one seems to be mentioning is that we are trying to emulate
> UNIX behavior here. If the above is an issue for Windows then it could
> also be an issue for UNIX.

It sounds like we are really on the same page, but discussing different issues. CYGWIN should definitely support creating sparse files in the classic Unix way, by seeking beyond the end of the file. From what I've seen in this discussion it already does, and that is not at issue. What I'm arguing is that files should not be blindly converted into sparse files with GNU tar -S, GNU cp --sparse=always, etc. For example, if you convert a database file into a sparse file, it is not uncommon for the resulting fragmentation to slow database access by an order of magnitude or more.
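Roughly, converting a file to sparse form means reading it block by block and, instead of writing a block that is entirely zeros, seeking over it so the destination gets a hole there. The following is a minimal C sketch of that idea under a hypothetical 4096-byte granularity; the function names are made up, and this is not the actual code of GNU cp or GNU tar. A database file full of zeroed, not-yet-used record space comes out of such a copy full of holes, each of which has to be allocated later when records are written.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 4096 /* hypothetical copy granularity */

static bool all_zero(const char *buf, ssize_t n)
{
    for (ssize_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Copy src to dst, turning every all-zero block into a hole by seeking
 * over it instead of writing it. Returns 0 on success, -1 on error. */
static int sparse_copy(const char *src, const char *dst)
{
    char buf[BLK];
    ssize_t n;
    off_t size = 0;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, BLK)) > 0) {
        if (all_zero(buf, n)) {
            if (lseek(out, n, SEEK_CUR) < 0)  /* leave a hole */
                break;
        } else if (write(out, buf, n) != n) {
            break;                            /* short write: give up */
        }
        size += n;
    }
    /* A trailing hole does not extend the file by itself, so set the
     * final length explicitly. n == 0 means we reached EOF cleanly. */
    int rc = (n == 0 && ftruncate(out, size) == 0) ? 0 : -1;
    close(in);
    close(out);
    return rc;
}
```

The copy reads back byte-for-byte identical to the source; the difference from a plain copy is only that the zero-filled regions are no longer allocated, which is exactly what sets up the later fragmentation for files that get written in place.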
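To check whether a file, such as a database that went through tar -S or cp --sparse=always, has become sparse, a common heuristic is to compare the space actually allocated with the file's apparent size. This minimal C sketch assumes st_blocks is counted in 512-byte units, which holds on Linux but is not guaranteed by POSIX, so check your platform; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Heuristic: report a file as probably sparse if fewer bytes are allocated
 * than its apparent size. Assumes 512-byte st_blocks units. Compression,
 * preallocation, and delayed allocation can skew the result either way.
 * Returns 0 on success (result in *sparse), -1 if stat() fails. */
static int is_probably_sparse(const char *path, bool *sparse)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *sparse = (long long)st.st_blocks * 512 < (long long)st.st_size;
    return 0;
}
```

From the shell, comparing the output of GNU du with and without --apparent-size gives the same information.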
Bill