Message-ID: <32851B7C.66E2@ananke.amu.edu.pl>
Date: Sun, 10 Nov 1996 01:02:04 +0100
From: Mark Habersack
Reply-To: grendel AT ananke DOT amu DOT edu DOT pl
Organization: Home, sweet home
MIME-Version: 1.0
To: George Foot
CC: djgpp AT delorie DOT com
Subject: Re: Why not to use 'tar' before packing DJGPP?
References: <32823D97 DOT 44DD AT sabat DOT tu DOT kielce DOT pl> <3282A82E DOT 7EE7 AT cs DOT com>
 <55vapk$s4l AT news DOT ox DOT ac DOT uk> <561pv7$36c AT news DOT ox DOT ac DOT uk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

George Foot wrote:
> Sorry, I don't really understand tar (yes, I'm a Dos user...), but I

tar is not a compression utility. The name stands for Tape ARchiver: it
was originally designed to hold the contents of tapes, which were used
as a storage medium in the early days of computing (and sometimes still
are today). As such, a tar archive was meant to be an exact image of the
data on the tape. It simply strings the files together, each preceded by
a header, without compressing anything, so a tarred distribution is one
long file that a compressor can then handle as a single stream.

> thought the point of the original article was that tar could achieve
> better compression ratios than zip? The quoted figures certainly looked
> impressive...

The reason for the better compression ratios of tar archives is very
simple. PKZIP and most other archivers use some variant of Lempel-Ziv
(LZ) compression, which replaces repeated data with references to
patterns seen earlier in the input. In LZW, for example, the compressor
keeps a dictionary that holds pairs of data: a pattern and its numerical
code. As the compressor reads the input stream, it looks up each
just-read pattern in the dictionary to see whether it has already
occurred earlier in the stream. If so, the pattern is replaced with the
corresponding code from the dictionary, and that is where the
compression comes from. (PKZIP 2.x's "deflate" method works on the same
principle, but uses a "sliding dictionary": a window over the most
recent 32 KB of input.) This is a simplified description, but it's
enough to understand what follows.

If the archiver compresses several files, it resets its dictionary every
time it starts on a new file.
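To make the effect of that reset concrete, here is a minimal sketch of a
toy LZW encoder in Python. It follows the simplified description above,
not PKZIP's real method. The names `lzw_codes`, `per_file` and `solid`
are made up for this illustration, and the number of emitted codes is
only a rough stand-in for output size (a real LZW coder packs the codes
into variable-width bit fields). `per_file` gives every file a fresh
dictionary, the way zip treats each member separately; `solid` compresses
the files joined into one stream, which is what happens when you tar
first.

```python
def lzw_codes(data: bytes) -> list[int]:
    """Toy LZW encoder: return the list of dictionary codes for `data`."""
    # Fresh dictionary: every single byte value 0..255 is its own pattern.
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Known pattern: keep extending it.
            current = candidate
        else:
            # Emit the code for the longest known pattern...
            codes.append(dictionary[current])
            # ...and learn the new, one-byte-longer pattern.
            dictionary[candidate] = len(dictionary)
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def per_file(files: list[bytes]) -> int:
    # Zip-like: the dictionary is rebuilt from scratch for each file.
    return sum(len(lzw_codes(f)) for f in files)


def solid(files: list[bytes]) -> int:
    # tar-then-compress: one stream, so patterns learned in earlier
    # files stay available for later ones.
    return len(lzw_codes(b"".join(files)))


# Hypothetical sample data: 20 small, similar C source files.
files = [
    f"/* file {i} */\n#include <stdio.h>\nint main(void) {{ return {i}; }}\n".encode()
    for i in range(20)
]
print("per-file codes:", per_file(files))
print("solid codes:   ", solid(files))
```

On repetitive input like these near-identical files, `solid` should
report far fewer codes than `per_file`; the exact numbers depend on the
data.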
When the dictionary is reset, all the learned patterns are lost and the
compressor has to build the dictionary from scratch. This of course
reduces the compression ratio, because patterns often have to be learned
anew even though they occurred, say, two files earlier. OTOH, when the
archiver compresses one file, such as a tar archive, it resets the
dictionary far less often, so the same patterns are much less likely to
be learned over and over. That is why the compression ratio improves by
about 30%.

Hope this clarifies things a little.

greetings,
mark

-- 
**************************************************************************
You tell me I'm drunk then you sit back and smug a while convinced that
you're right, that you're still in command of your senses. I laugh at your
superior attitude, your insincere platitudes will make me throw up. The
sooner you realise I'm perfectly happy if I'm left to decide the company
I choose.
********************** http://ananke.amu.edu.pl/~grendel *****************