Mail Archives: djgpp/1996/08/31/07:37:34
-----BEGIN PGP SIGNED MESSAGE-----
==============================================================================
LZO -- a real-time data compression library
==============================================================================
Author : Markus Franz Xaver Johannes Oberhumer
<markus DOT oberhumer AT jk DOT uni-linz DOT ac DOT at>
http://www.infosys.tuwien.ac.at/Staff/lux/marco/lzo.html
Version : 0.20
Date : 11-Aug-1996
Availability
------------
ftp://hpv17.infosys.tuwien.ac.at/pub/mfx/lzo/lzo-0.20.tar.gz
ftp://tsx-11.mit.edu/pub/linux/sources/libs/lzo-0.20.tar.gz
Archive size is about 99 kB.
Introduction
------------
LZO is a data compression library which is suitable for data
de-/compression in real-time. This means it favours speed
over compression ratio.
I named it LZO standing for Lempel-Ziv-Oberhumer.
LZO is written in ANSI C. Both the source code and the compressed
data format are designed to be portable across platforms,
and I intend to make LZO work on each architecture that
has at least 32 bit integers. Somewhat limited support for 16 bit
MSDOS is implemented as well (using 'huge' pointers).
This release includes 7 algorithms (LZO1, LZO1A, LZO1B, LZO1C,
LZO1D, LZO1F and LZO1X) each of which can be tuned to different
compression levels.
I already published LZO1 and LZO1A via the Internet newsgroups
comp.compression and comp.compression.research in March 1996.
LZO1 and LZO1A are mainly included for compatibility reasons.
LZO1D decompression is too slow, and there is no fast compressor anyway.
So the algorithms of current interest are LZO1B, LZO1C, LZO1F
and LZO1X. Each of these has a slightly different characteristics,
and it seems that the new LZO1X algorithm is a good overall
choice for a wide range of data.
The LZO1B/LZO1C/LZO1F/LZO1X algorithm has the following features:
- Decompression is simple and *very* fast.
- Requires no memory for decompression.
- Requires 64kB of memory for compression.
- Allows you to dial up extra compression at a speed cost in the
compressor. The speed of the decompressor is not affected.
- Algorithm is thread safe.
- Algorithm is lossless.
- LZO1B is deterministic.
Design criteria
---------------
LZO was designed with speed in mind. Decompressor speed has been
favoured over compressor speed. Real-time decompression should be
possible for virtually any application. I expect an implementation
of the decompressor in assembler code to run at the third of the
speed of a memcpy().
In fact, I first wrote the decompressor of each algorithm thereby
defining the compressed data format, verified it with manually created
test data and at last added the compressor.
Performance
-----------
To keep you interested, here is a short overview of the average
results when compressing the (in-)famous Calgary Corpus test suite
with a blocksize of 256kB, running on my i486 DX2/66.
The naming convention of the algorithms goes LZOxx-N, where N is the
compression level. Range 1-9 indicates the fast standard levels using
64kB memory for compression. Level 99 offers better compression at the
cost of more memory (256kB), and is still reasonable fast.
Level 999 achieves nearly optimal compression - but it is slow
and uses much memory, and is mainly intended for generating
pre-compressed data.
+--------------------------------------------------------------------+
| Algorithm Length CxB ComLen %Remn Bits Com K/s Dec K/s |
| --------- ------ --- ------ ----- ---- ------- ------- |
| LZO1-1 224401 1 118375 53.5 4.28 1704.11 3580.56 |
| |
| LZO1A-1 224401 1 115988 52.0 4.16 1656.07 3670.48 |
| |
| LZO1B-1 224401 1 110653 49.6 3.97 1575.28 4411.62 |
| LZO1B-2 224401 1 107416 48.6 3.89 1451.90 4437.53 |
| LZO1B-3 224401 1 105537 47.9 3.83 1343.32 4422.70 |
| LZO1B-4 224401 1 104828 47.4 3.79 1037.02 4514.11 |
| LZO1B-5 224401 1 102724 46.7 3.73 943.35 4521.54 |
| LZO1B-6 224401 1 101210 46.0 3.68 883.91 4483.46 |
| LZO1B-7 224401 1 101388 46.0 3.68 811.64 4572.20 |
| LZO1B-8 224401 1 99453 45.2 3.62 731.56 4514.98 |
| LZO1B-9 224401 1 99118 45.0 3.60 592.89 4426.57 |
| |
| LZO1C-1 224401 1 112051 50.2 4.02 1518.80 4392.48 |
| LZO1C-2 224401 1 108791 49.1 3.93 1416.69 4413.89 |
| LZO1C-3 224401 1 106825 48.4 3.87 1321.23 4389.14 |
| LZO1C-4 224401 1 105717 47.7 3.82 1032.84 4494.94 |
| LZO1C-5 224401 1 103605 46.9 3.76 940.19 4498.48 |
| LZO1C-6 224401 1 102585 46.5 3.72 863.84 4452.71 |
| LZO1C-7 224401 1 101937 46.2 3.70 772.97 4548.76 |
| LZO1C-8 224401 1 100779 45.6 3.65 708.72 4487.48 |
| LZO1C-9 224401 1 100252 45.4 3.63 598.06 4396.90 |
| |
| LZO1F-1 224401 1 116765 51.5 4.12 1665.56 4805.74 |
| |
| LZO1X-1 224401 1 110368 49.4 3.95 1601.11 4919.72 |
| |
| LZO1-99 224401 1 101560 46.7 3.73 473.35 3643.55 |
| LZO1A-99 224401 1 99958 45.5 3.64 471.55 3692.92 |
| LZO1B-99 224401 1 95399 43.6 3.49 454.05 4461.44 |
| LZO1C-99 224401 1 97252 44.1 3.53 455.97 4447.05 |
| |
| LZO1C-999 224401 1 87762 40.2 3.21 87.57 4714.26 |
| LZO1D-999 224401 1 87901 40.0 3.20 84.90 3208.91 |
| LZO1F-999 224401 1 89620 40.4 3.23 78.35 5466.84 |
| LZO1X-999 224401 1 84521 38.5 3.08 53.40 5203.72 |
| |
| LZRW1-A 224401 1 123194 55.1 4.41 1564.23 3396.10 |
| LZRW2 224401 1 115399 51.5 4.12 1301.08 2379.13 |
| LZRW3 224401 1 110942 50.0 4.00 1518.32 1893.92 |
| LZRW3-A(2) 224401 1 106308 48.1 3.85 1089.16 1989.55 |
| LZRW3-A(4) 224401 1 103126 46.8 3.74 889.30 2060.22 |
| LZRW3-A(8) 224401 1 100722 45.8 3.67 607.05 2064.86 |
| LZV 224401 1 112511 51.4 4.11 1115.18 4247.76 |
| zlib-8/1 224401 1 85302 38.8 3.10 360.44 1084.91 |
| memcpy() 224401 1 224401 100.0 8.00 19992.22 20023.29 |
+--------------------------------------------------------------------+
More detailed results can be found in the ./doc directory.
Short documentation
-------------------
LZO is a block compression algorithm - it compresses and decompresses
a block of data. Block size must be the same for compression
and decompression.
LZO compresses a block of data into matches (a LZ77 sliding dictionary)
and runs of non-matching literals. LZO takes care about long matches
and long literal runs so that it produces good results on highly
redundant data and deals accecptable with non-compressible data.
When dealing with uncompressible data, LZO expands the input
block by a maximum of 16 bytes per 1024 bytes input (probably
only by a maximum of 8 bytes, I still have to compute the exact
value).
Most algorithms come with two decompressors - a fast and a safe one.
The safe decompressor is designed such a way that it won't crash your
application when it is fed with incorrect or damaged data but will
catch all possible overruns and return an error code in this case.
When using the safe decompressor you must pass the number of
bytes available in 'dst' via the parameter 'dst_len'.
I have verified LZO using gcc 2.7.2 with Richard W.M. Jones's
bounds checking patches and compressed a lot of files when tuning
some parameters. LZO is free of any known bugs.
The algorithms
--------------
There are (too) many algorithms implemented. But I want to support
unlimited backwards compatibility, so I will *not* reduce the LZO
distribution in the future (what I had planned some time ago).
As the many object files are mostly independent of each other, the size
overhead for an executable statically linked with the LZO library
is usually pretty low cause the linker will only add the modules that
you are actually using.
My experiments have shown that LZO1B is good with a large blocksize
or with very redundant data, LZO1F is good with a small blocksize
or with binary data and that LZO1X is often the best choice of all.
LZO1F seems to compress better than LZO1C on these files where LZO1C
is better than LZO1B. Beware, your mileage may vary.
Portability
-----------
I have built and tested LZO successfully on the following platforms:
i486 - Linux 1.2.6 - gcc 2.6.3
i486 - MSDOS - djgpp v2 + gcc 2.7.2
i486 - MSDOS - emx + gcc 2.7.2
i486 - MSDOS - Watcom C 10.5 (32 bit)
i486 - MSDOS - Borland C 4.0 (16 bit)
LZO should work without problems on all machines that have an ANSI C
compiler and at least 32 bit integers. The supplied makefile requires
GNU make.
I need *your* feedback to support more platforms and fix possible
portability problems. Please take a look at the file PLATFORM.TXT.
The future
----------
Right now my time is very limited as I'm (still :-) working on finishing
my diploma thesis (which has nothing to do with compression at all).
I consider LZO to be pretty finished right now, and I'd also like
to freeze most of the source code except for fixes - but no promises...
The following things still need to be done before this could
be called version 1:
- test on other 32/64 bit architechtures. I do not expect major problems.
- thoroughly test on 16 bit systems (already works with Borland C 4.0)
- implement the missing LZO1F compression levels
- implement the missing LZO1X compression levels
- write a more detailed documentation
I'm also thinking about these features (contributions are welcome):
- implement an ultra fast LZO1X-0 (at the cost of compression ratio)
- speed up the compressors using unaligned memory access where possible
- a stream interface (compatible to zlib ?) would be nice (and slow ?)
- can someone help me setting up automake/autoconf ?
- interfaces to Perl and Java
Some comments about the source code
-----------------------------------
If you want to understand how LZO works, carefully study the sources
in the ./extra directory first.
Be warned: the main source code in the ./src directory is a
real pain to understand as I've experimented with hundreds of slightly
different versions. It contains many #if and some gotos, and
is completely optimized for speed and not for readability.
Code sharing of the different algorithms is implemented by stressing
the preprocessor - this can be really confusing. Lot's of marcos and
assertions don't make things better.
Nevertheless the sources compile very quiet on a variety of
compilers with the highest warning levels turned on, even
in C++ mode.
Thanks
------
Laszlo Molnar <lmolnar AT goliat DOT eik DOT bme DOT hu>
Charles W. Sandmann <sandmann AT clio DOT rice DOT edu>
Wolfgang Lugmayr <W DOT Lugmayr AT infosys DOT tuwien DOT ac DOT at>
Natascha
Copyright
---------
LZO is Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer
LZO is distributed under the terms of the GNU Library General
Public License (LGPL). See the file COPYING.LIB.
Special licenses for commercial and other applications which
are not willing to accept the GNU Library General Public License
are available by contacting the author.
-----BEGIN PGP SIGNATURE-----
Version: 2.6.2i
iQCVAwUBMh3RaW10fyLu8beJAQHZCwP/bE/eEJvRIsI2B3CtQ1g6fzmmt8ZHikXo
wRTdk5gSccTuDIFBYqOqKpMurmEjOKQFlr34e787MbpXZHAWl2yAbh1pPmg9iFZy
7d4uow1a3J/72QIt/ucWavT0mxXzsBl19ylc71E0Auc76xvxg2HgsD5Iax4L70au
PG/mQiOsVxo=
=LXqt
-----END PGP SIGNATURE-----
- Raw text -