Mail Archives: djgpp/1999/03/19/06:25:42
In article <36F17BC3 DOT 78B926E4 AT cableol DOT co DOT uk> you wrote:
> I'm sure I heard somewhere that tgz's are based around the same
> algorithm as zips, so why the mega space saving? (Perhaps because
> they use a different algorithm?)
No, the packing algorithm itself is 100% identical. The difference
between .zip and .tgz is in what gets packed: zip compresses each
file *in* the archive individually, while tgz compresses the whole
archive as one stream.
[...]
> DJ Delorie wrote:
[...]
> > file    zip     tgz
> > djdev   1.42M   1.36M
> > djlsr   1.45M   0.87M
Just to complement what DJ already answered to this: note the
difference between the given examples: djdev gains much less than
djlsr does from the use of tgz format. In essence, that's because
djlsr contains a really enormous number of quite similar, and very
*small* files. That's exactly the situation where zip's approach of
packing each file individually is rather inefficient. Roughly, packing
works by finding and exploiting repetitions in the input, but inside a
single, small file there's not much repetition to be found, and thus
little to be gained from exploiting it.
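
To make that concrete, here is a rough sketch in Python. It is not how
zip or tar actually lay out their archives: it only compares deflating
many small, similar inputs one at a time against deflating them as one
concatenated stream. The file contents and the helper names
(make_small_files and friends) are made up for illustration.

```python
import zlib


def make_small_files(count=200):
    """Build many small, similar byte strings (hypothetical stand-ins
    for the kind of tiny source files found in djlsr)."""
    files = []
    for i in range(count):
        text = (
            "/* Copyright (C) 1999 Example */\n"
            "#include <stdio.h>\n"
            f"int func_{i}(int x)\n"
            "{\n"
            f"  return x + {i};\n"
            "}\n"
        )
        files.append(text.encode("ascii"))
    return files


def size_packed_individually(files):
    # zip-style: every member is deflated on its own, so repetition
    # *between* files cannot be exploited.
    return sum(len(zlib.compress(data, 9)) for data in files)


def size_packed_as_one(files):
    # tgz-style: concatenate first, then deflate the whole stream once.
    return len(zlib.compress(b"".join(files), 9))


files = make_small_files()
individual = size_packed_individually(files)
solid = size_packed_as_one(files)
print(f"individually: {individual} bytes, as one stream: {solid} bytes")
```

Real archives add per-file headers on top of this, so the absolute
numbers will differ, but the gap between the two totals comes from the
same effect.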
Some people have reported that you can do even a bit better than .tar.gz
by using .zip.gz, or .zip.zip, i.e.:

    zip -0 temparchive contained_files...
    zip -9 archive temparchive.zip

(or equivalently, replace the 'zip -9' by 'gzip -9'). The trick is
that zip -0 makes a slightly smaller, and more easily packable package
file than tar does.
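
If you want to try the comparison without shelling out, the following
sketch builds both container types in memory with Python's standard
zipfile, tarfile and gzip modules. ZIP_STORED is used here as the
counterpart of 'zip -0', and gzip.compress at level 9 as the
counterpart of 'gzip -9'. The member names and contents are invented,
and which compressed result ends up smaller depends on your input, so
treat the printed numbers as an experiment rather than a proof.

```python
import gzip
import io
import tarfile
import zipfile


def sample_members(count=100):
    """Hypothetical archive contents: many tiny, similar files."""
    return {
        f"src/file_{i:03d}.c": f"int f{i}(void) {{ return {i}; }}\n".encode("ascii")
        for i in range(count)
    }


def build_tar(members):
    # Uncompressed tar: each member gets a 512-byte header and its
    # data is padded out to a multiple of 512 bytes.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_stored_zip(members):
    # Store-only zip (no compression), analogous to 'zip -0'.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


members = sample_members()
tar_bytes = build_tar(members)
stored_zip = build_stored_zip(members)

# Second stage: compress each container as a single stream.
tar_gz = gzip.compress(tar_bytes, compresslevel=9)
zip_gz = gzip.compress(stored_zip, compresslevel=9)

print(f"tar: {len(tar_bytes)} bytes, stored zip: {len(stored_zip)} bytes")
print(f"tar.gz: {len(tar_gz)} bytes, zip.gz: {len(zip_gz)} bytes")
```

With many tiny files the stored zip container comes out much smaller
than the tar, mostly because tar pads every header and every member to
512-byte blocks.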
--
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.