snappy 0.0.10 → 0.0.11
- checksums.yaml +4 -4
- data/.gitmodules +3 -0
- data/Rakefile +12 -13
- data/ext/extconf.rb +22 -31
- data/lib/snappy/reader.rb +10 -7
- data/lib/snappy/version.rb +1 -1
- data/snappy.gemspec +24 -0
- data/test/test-snappy-reader.rb +16 -0
- data/vendor/snappy/AUTHORS +1 -0
- data/vendor/snappy/COPYING +54 -0
- data/vendor/snappy/ChangeLog +1916 -0
- data/vendor/snappy/Makefile.am +23 -0
- data/vendor/snappy/NEWS +128 -0
- data/vendor/snappy/README +135 -0
- data/vendor/snappy/autogen.sh +7 -0
- data/vendor/snappy/configure.ac +133 -0
- data/vendor/snappy/format_description.txt +110 -0
- data/vendor/snappy/framing_format.txt +135 -0
- data/vendor/snappy/m4/gtest.m4 +74 -0
- data/vendor/snappy/snappy-c.cc +90 -0
- data/vendor/snappy/snappy-c.h +138 -0
- data/vendor/snappy/snappy-internal.h +150 -0
- data/vendor/snappy/snappy-sinksource.cc +71 -0
- data/vendor/snappy/snappy-sinksource.h +137 -0
- data/vendor/snappy/snappy-stubs-internal.cc +42 -0
- data/vendor/snappy/snappy-stubs-internal.h +491 -0
- data/vendor/snappy/snappy-stubs-public.h.in +98 -0
- data/vendor/snappy/snappy-test.cc +606 -0
- data/vendor/snappy/snappy-test.h +582 -0
- data/vendor/snappy/snappy.cc +1306 -0
- data/vendor/snappy/snappy.h +184 -0
- data/vendor/snappy/snappy_unittest.cc +1355 -0
- data/vendor/snappy/testdata/alice29.txt +3609 -0
- data/vendor/snappy/testdata/asyoulik.txt +4122 -0
- data/vendor/snappy/testdata/baddata1.snappy +0 -0
- data/vendor/snappy/testdata/baddata2.snappy +0 -0
- data/vendor/snappy/testdata/baddata3.snappy +0 -0
- data/vendor/snappy/testdata/fireworks.jpeg +0 -0
- data/vendor/snappy/testdata/geo.protodata +0 -0
- data/vendor/snappy/testdata/html +1 -0
- data/vendor/snappy/testdata/html_x_4 +1 -0
- data/vendor/snappy/testdata/kppkn.gtb +0 -0
- data/vendor/snappy/testdata/lcet10.txt +7519 -0
- data/vendor/snappy/testdata/paper-100k.pdf +600 -2
- data/vendor/snappy/testdata/plrabn12.txt +10699 -0
- data/vendor/snappy/testdata/urls.10K +10000 -0
- metadata +51 -12
data/vendor/snappy/Makefile.am
ADDED
@@ -0,0 +1,23 @@
|
|
1
|
+
ACLOCAL_AMFLAGS = -I m4
|
2
|
+
|
3
|
+
# Library.
|
4
|
+
lib_LTLIBRARIES = libsnappy.la
|
5
|
+
libsnappy_la_SOURCES = snappy.cc snappy-sinksource.cc snappy-stubs-internal.cc snappy-c.cc
|
6
|
+
libsnappy_la_LDFLAGS = -version-info $(SNAPPY_LTVERSION)
|
7
|
+
|
8
|
+
include_HEADERS = snappy.h snappy-sinksource.h snappy-stubs-public.h snappy-c.h
|
9
|
+
noinst_HEADERS = snappy-internal.h snappy-stubs-internal.h snappy-test.h
|
10
|
+
|
11
|
+
# Unit tests and benchmarks.
|
12
|
+
snappy_unittest_CPPFLAGS = $(gflags_CFLAGS) $(GTEST_CPPFLAGS)
|
13
|
+
snappy_unittest_SOURCES = snappy_unittest.cc snappy-test.cc
|
14
|
+
snappy_unittest_LDFLAGS = $(GTEST_LDFLAGS)
|
15
|
+
snappy_unittest_LDADD = libsnappy.la $(UNITTEST_LIBS) $(gflags_LIBS) $(GTEST_LIBS)
|
16
|
+
TESTS = snappy_unittest
|
17
|
+
noinst_PROGRAMS = $(TESTS)
|
18
|
+
|
19
|
+
EXTRA_DIST = autogen.sh testdata/alice29.txt testdata/asyoulik.txt testdata/baddata1.snappy testdata/baddata2.snappy testdata/baddata3.snappy testdata/geo.protodata testdata/fireworks.jpeg testdata/html testdata/html_x_4 testdata/kppkn.gtb testdata/lcet10.txt testdata/paper-100k.pdf testdata/plrabn12.txt testdata/urls.10K
|
20
|
+
dist_doc_DATA = ChangeLog COPYING INSTALL NEWS README format_description.txt framing_format.txt
|
21
|
+
|
22
|
+
libtool: $(LIBTOOL_DEPS)
|
23
|
+
$(SHELL) ./config.status --recheck
|
data/vendor/snappy/NEWS
ADDED
@@ -0,0 +1,128 @@
|
|
1
|
+
Snappy v1.1.2, February 28th 2014:
|
2
|
+
|
3
|
+
This is a maintenance release with no changes to the actual library
|
4
|
+
source code.
|
5
|
+
|
6
|
+
* Stop distributing benchmark data files that have unclear
|
7
|
+
or unsuitable licensing.
|
8
|
+
|
9
|
+
* Add support for padding chunks in the framing format.
|
10
|
+
|
11
|
+
|
12
|
+
Snappy v1.1.1, October 15th 2013:
|
13
|
+
|
14
|
+
* Add support for uncompressing to iovecs (scatter I/O).
|
15
|
+
The bulk of this patch was contributed by Mohit Aron.
|
16
|
+
|
17
|
+
* Speed up decompression by ~2%; much more so (~13-20%) on
|
18
|
+
a few benchmarks on given compilers and CPUs.
|
19
|
+
|
20
|
+
* Fix a few issues with MSVC compilation.
|
21
|
+
|
22
|
+
* Support truncated test data in the benchmark.
|
23
|
+
|
24
|
+
|
25
|
+
Snappy v1.1.0, January 18th 2013:
|
26
|
+
|
27
|
+
* Snappy now uses 64 kB block size instead of 32 kB. On average,
|
28
|
+
this means it compresses about 3% denser (more so for some
|
29
|
+
inputs), at the same or better speeds.
|
30
|
+
|
31
|
+
* libsnappy no longer depends on iostream.
|
32
|
+
|
33
|
+
* Some small performance improvements in compression on x86
|
34
|
+
(0.5–1%).
|
35
|
+
|
36
|
+
* Various portability fixes for ARM-based platforms, for MSVC,
|
37
|
+
and for GNU/Hurd.
|
38
|
+
|
39
|
+
|
40
|
+
Snappy v1.0.5, February 24th 2012:
|
41
|
+
|
42
|
+
* More speed improvements. Exactly how big will depend on
|
43
|
+
the architecture:
|
44
|
+
|
45
|
+
- 3–10% faster decompression for the base case (x86-64).
|
46
|
+
|
47
|
+
- ARMv7 and higher can now use unaligned accesses,
|
48
|
+
and will see about 30% faster decompression and
|
49
|
+
20–40% faster compression.
|
50
|
+
|
51
|
+
- 32-bit platforms (ARM and 32-bit x86) will see 2–5%
|
52
|
+
faster compression.
|
53
|
+
|
54
|
+
These are all cumulative (e.g., ARM gets all three speedups).
|
55
|
+
|
56
|
+
* Fixed an issue where the unit test would crash on systems
|
57
|
+
with less than 256 MB address space available,
|
58
|
+
e.g. some embedded platforms.
|
59
|
+
|
60
|
+
* Added a framing format description, for use over e.g. HTTP,
|
61
|
+
or for a command-line compressor. We do not have any
|
62
|
+
implementations of this at the current point, but there seems
|
63
|
+
to be enough of a general interest in the topic.
|
64
|
+
Also make the format description slightly clearer.
|
65
|
+
|
66
|
+
* Remove some compile-time warnings in -Wall
|
67
|
+
(mostly signed/unsigned comparisons), for easier embedding
|
68
|
+
into projects that use -Wall -Werror.
|
69
|
+
|
70
|
+
|
71
|
+
Snappy v1.0.4, September 15th 2011:
|
72
|
+
|
73
|
+
* Speeded up the decompressor somewhat; typically about 2–8%
|
74
|
+
for Core i7, in 64-bit mode (comparable for Opteron).
|
75
|
+
Somewhat more for some tests, almost no gain for others.
|
76
|
+
|
77
|
+
* Make Snappy compile on certain platforms it didn't before
|
78
|
+
(Solaris with SunPro C++, HP-UX, AIX).
|
79
|
+
|
80
|
+
* Correct some minor errors in the format description.
|
81
|
+
|
82
|
+
|
83
|
+
Snappy v1.0.3, June 2nd 2011:
|
84
|
+
|
85
|
+
* Speeded up the decompressor somewhat; about 3-6% for Core 2,
|
86
|
+
6-13% for Core i7, and 5-12% for Opteron (all in 64-bit mode).
|
87
|
+
|
88
|
+
* Added compressed format documentation. This text is new,
|
89
|
+
but an earlier version from Zeev Tarantov was used as reference.
|
90
|
+
|
91
|
+
* Only link snappy_unittest against -lz and other autodetected
|
92
|
+
libraries, not libsnappy.so (which doesn't need any such dependency).
|
93
|
+
|
94
|
+
* Fixed some display issues in the microbenchmarks, one of which would
|
95
|
+
frequently make the test crash on GNU/Hurd.
|
96
|
+
|
97
|
+
|
98
|
+
Snappy v1.0.2, April 29th 2011:
|
99
|
+
|
100
|
+
* Relicense to a BSD-type license.
|
101
|
+
|
102
|
+
* Added C bindings, contributed by Martin Gieseking.
|
103
|
+
|
104
|
+
* More Win32 fixes, in particular for MSVC.
|
105
|
+
|
106
|
+
* Replace geo.protodata with a newer version.
|
107
|
+
|
108
|
+
* Fix timing inaccuracies in the unit test when comparing Snappy
|
109
|
+
to other algorithms.
|
110
|
+
|
111
|
+
|
112
|
+
Snappy v1.0.1, March 25th 2011:
|
113
|
+
|
114
|
+
This is a maintenance release, mostly containing minor fixes.
|
115
|
+
There is no new functionality. The most important fixes include:
|
116
|
+
|
117
|
+
* The COPYING file and all licensing headers now correctly state that
|
118
|
+
Snappy is licensed under the Apache 2.0 license.
|
119
|
+
|
120
|
+
* snappy_unittest should now compile natively under Windows,
|
121
|
+
as well as on embedded systems with no mmap().
|
122
|
+
|
123
|
+
* Various autotools nits have been fixed.
|
124
|
+
|
125
|
+
|
126
|
+
Snappy v1.0, March 17th 2011:
|
127
|
+
|
128
|
+
* Initial version.
|
data/vendor/snappy/README
ADDED
@@ -0,0 +1,135 @@
|
|
1
|
+
Snappy, a fast compressor/decompressor.
|
2
|
+
|
3
|
+
|
4
|
+
Introduction
|
5
|
+
============
|
6
|
+
|
7
|
+
Snappy is a compression/decompression library. It does not aim for maximum
|
8
|
+
compression, or compatibility with any other compression library; instead,
|
9
|
+
it aims for very high speeds and reasonable compression. For instance,
|
10
|
+
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
|
11
|
+
for most inputs, but the resulting compressed files are anywhere from 20% to
|
12
|
+
100% bigger. (For more information, see "Performance", below.)
|
13
|
+
|
14
|
+
Snappy has the following properties:
|
15
|
+
|
16
|
+
* Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
|
17
|
+
See "Performance" below.
|
18
|
+
* Stable: Over the last few years, Snappy has compressed and decompressed
|
19
|
+
petabytes of data in Google's production environment. The Snappy bitstream
|
20
|
+
format is stable and will not change between versions.
|
21
|
+
* Robust: The Snappy decompressor is designed not to crash in the face of
|
22
|
+
corrupted or malicious input.
|
23
|
+
* Free and open source software: Snappy is licensed under a BSD-type license.
|
24
|
+
For more information, see the included COPYING file.
|
25
|
+
|
26
|
+
Snappy has previously been called "Zippy" in some Google presentations
|
27
|
+
and the like.
|
28
|
+
|
29
|
+
|
30
|
+
Performance
|
31
|
+
===========
|
32
|
+
|
33
|
+
Snappy is intended to be fast. On a single core of a Core i7 processor
|
34
|
+
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
|
35
|
+
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
|
36
|
+
benchmark suite; others are much faster.) In our tests, Snappy usually
|
37
|
+
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
|
38
|
+
etc.) while achieving comparable compression ratios.
|
39
|
+
|
40
|
+
Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
|
41
|
+
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
|
42
|
+
other already-compressed data. Similar numbers for zlib in its fastest mode
|
43
|
+
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
|
44
|
+
capable of achieving yet higher compression rates, although usually at the
|
45
|
+
expense of speed. Of course, compression ratio will vary significantly with
|
46
|
+
the input.
|
47
|
+
|
48
|
+
Although Snappy should be fairly portable, it is primarily optimized
|
49
|
+
for 64-bit x86-compatible processors, and may run slower in other environments.
|
50
|
+
In particular:
|
51
|
+
|
52
|
+
- Snappy uses 64-bit operations in several places to process more data at
|
53
|
+
once than would otherwise be possible.
|
54
|
+
- Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
|
55
|
+
On some platforms, these must be emulated with single-byte loads
|
56
|
+
and stores, which is much slower.
|
57
|
+
- Snappy assumes little-endian throughout, and needs to byte-swap data in
|
58
|
+
several places if running on a big-endian platform.
|
59
|
+
|
60
|
+
Experience has shown that even heavily tuned code can be improved.
|
61
|
+
Performance optimizations, whether for 64-bit x86 or other platforms,
|
62
|
+
are of course most welcome; see "Contact", below.
|
63
|
+
|
64
|
+
|
65
|
+
Usage
|
66
|
+
=====
|
67
|
+
|
68
|
+
Note that Snappy, both the implementation and the main interface,
|
69
|
+
is written in C++. However, several third-party bindings to other languages
|
70
|
+
are available; see the Google Code page at http://code.google.com/p/snappy/
|
71
|
+
for more information. Also, if you want to use Snappy from C code, you can
|
72
|
+
use the included C bindings in snappy-c.h.
|
73
|
+
|
74
|
+
To use Snappy from your own C++ program, include the file "snappy.h" from
|
75
|
+
your calling file, and link against the compiled library.
|
76
|
+
|
77
|
+
There are many ways to call Snappy, but the simplest possible is
|
78
|
+
|
79
|
+
snappy::Compress(input.data(), input.size(), &output);
|
80
|
+
|
81
|
+
and similarly
|
82
|
+
|
83
|
+
snappy::Uncompress(input.data(), input.size(), &output);
|
84
|
+
|
85
|
+
where "input" and "output" are both instances of std::string.
|
86
|
+
|
87
|
+
There are other interfaces that are more flexible in various ways, including
|
88
|
+
support for custom (non-array) input sources. See the header file for more
|
89
|
+
information.
|
90
|
+
|
91
|
+
|
92
|
+
Tests and benchmarks
|
93
|
+
====================
|
94
|
+
|
95
|
+
When you compile Snappy, snappy_unittest is compiled in addition to the
|
96
|
+
library itself. You do not need it to use the compressor from your own library,
|
97
|
+
but it contains several useful components for Snappy development.
|
98
|
+
|
99
|
+
First of all, it contains unit tests, verifying correctness on your machine in
|
100
|
+
various scenarios. If you want to change or optimize Snappy, please run the
|
101
|
+
tests to verify you have not broken anything. Note that if you have the
|
102
|
+
Google Test library installed, unit test behavior (especially failures) will be
|
103
|
+
significantly more user-friendly. You can find Google Test at
|
104
|
+
|
105
|
+
http://code.google.com/p/googletest/
|
106
|
+
|
107
|
+
You probably also want the gflags library for handling of command-line flags;
|
108
|
+
you can find it at
|
109
|
+
|
110
|
+
http://code.google.com/p/google-gflags/
|
111
|
+
|
112
|
+
In addition to the unit tests, snappy contains microbenchmarks used to
|
113
|
+
tune compression and decompression performance. These are automatically run
|
114
|
+
before the unit tests, but you can disable them using the flag
|
115
|
+
--run_microbenchmarks=false if you have gflags installed (otherwise you will
|
116
|
+
need to edit the source).
|
117
|
+
|
118
|
+
Finally, snappy_unittest can benchmark Snappy against a few other compression libraries
|
119
|
+
(zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at configure time.
|
120
|
+
To benchmark using a given file, give the compression algorithm you want to test
|
121
|
+
Snappy against (e.g. --zlib) and then a list of one or more file names on the
|
122
|
+
command line. The testdata/ directory contains the files used by the
|
123
|
+
microbenchmark, which should provide a reasonably balanced starting point for
|
124
|
+
benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
|
125
|
+
are used to verify correctness in the presence of corrupted data in the unit
|
126
|
+
test.)
|
127
|
+
|
128
|
+
|
129
|
+
Contact
|
130
|
+
=======
|
131
|
+
|
132
|
+
Snappy is distributed through Google Code. For the latest version, a bug tracker,
|
133
|
+
and other information, see
|
134
|
+
|
135
|
+
http://code.google.com/p/snappy/
|
data/vendor/snappy/configure.ac
ADDED
@@ -0,0 +1,133 @@
|
|
1
|
+
m4_define([snappy_major], [1])
|
2
|
+
m4_define([snappy_minor], [1])
|
3
|
+
m4_define([snappy_patchlevel], [2])
|
4
|
+
|
5
|
+
# Libtool shared library interface versions (current:revision:age)
|
6
|
+
# Update this value for every release! (A:B:C will map to foo.so.(A-C).C.B)
|
7
|
+
# http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
|
8
|
+
m4_define([snappy_ltversion], [3:1:2])
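As a worked instance of the mapping stated in the comment above (A:B:C maps to foo.so.(A-C).C.B), the following Python sketch computes the suffix for the value 3:1:2. The helper name `libtool_so_suffix` is hypothetical, and the sketch only restates the comment's rule; it does not cover platforms where libtool names libraries differently.

```python
def libtool_so_suffix(version_info: str) -> str:
    """Apply the rule from the comment: current:revision:age -> (A-C).C.B."""
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"


# 3:1:2 gives major 3-2=1, then age 2, then revision 1.
print("libsnappy.so." + libtool_so_suffix("3:1:2"))
```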
|
9
|
+
|
10
|
+
AC_INIT([snappy], [snappy_major.snappy_minor.snappy_patchlevel])
|
11
|
+
AC_CONFIG_MACRO_DIR([m4])
|
12
|
+
|
13
|
+
# These are flags passed to automake (though they look like gcc flags!)
|
14
|
+
AM_INIT_AUTOMAKE([-Wall])
|
15
|
+
|
16
|
+
LT_INIT
|
17
|
+
AC_SUBST([LIBTOOL_DEPS])
|
18
|
+
AC_PROG_CXX
|
19
|
+
AC_LANG([C++])
|
20
|
+
AC_C_BIGENDIAN
|
21
|
+
AC_TYPE_SIZE_T
|
22
|
+
AC_TYPE_SSIZE_T
|
23
|
+
AC_CHECK_HEADERS([stdint.h stddef.h sys/mman.h sys/resource.h windows.h byteswap.h sys/byteswap.h sys/endian.h sys/time.h])
|
24
|
+
|
25
|
+
# Don't use AC_FUNC_MMAP, as it checks for mappings of already-mapped memory,
|
26
|
+
# which we don't need (and does not exist on Windows).
|
27
|
+
AC_CHECK_FUNC([mmap])
|
28
|
+
|
29
|
+
GTEST_LIB_CHECK([], [true], [true # Ignore; we can live without it.])
|
30
|
+
|
31
|
+
AC_ARG_WITH([gflags],
|
32
|
+
[AS_HELP_STRING(
|
33
|
+
[--with-gflags],
|
34
|
+
[use Google Flags package to enhance the unit test @<:@default=check@:>@])],
|
35
|
+
[],
|
36
|
+
[with_gflags=check])
|
37
|
+
|
38
|
+
if test "x$with_gflags" != "xno"; then
|
39
|
+
PKG_CHECK_MODULES(
|
40
|
+
[gflags],
|
41
|
+
[libgflags],
|
42
|
+
[AC_DEFINE([HAVE_GFLAGS], [1], [Use the gflags package for command-line parsing.])],
|
43
|
+
[if test "x$with_gflags" != "xcheck"; then
|
44
|
+
AC_MSG_FAILURE([--with-gflags was given, but test for gflags failed])
|
45
|
+
fi])
|
46
|
+
fi
|
47
|
+
|
48
|
+
# See if we have __builtin_expect.
|
49
|
+
# TODO: Use AC_CACHE.
|
50
|
+
AC_MSG_CHECKING([if the compiler supports __builtin_expect])
|
51
|
+
|
52
|
+
AC_TRY_COMPILE(, [
|
53
|
+
return __builtin_expect(1, 1) ? 1 : 0
|
54
|
+
], [
|
55
|
+
snappy_have_builtin_expect=yes
|
56
|
+
AC_MSG_RESULT([yes])
|
57
|
+
], [
|
58
|
+
snappy_have_builtin_expect=no
|
59
|
+
AC_MSG_RESULT([no])
|
60
|
+
])
|
61
|
+
if test x$snappy_have_builtin_expect = xyes ; then
|
62
|
+
AC_DEFINE([HAVE_BUILTIN_EXPECT], [1], [Define to 1 if the compiler supports __builtin_expect.])
|
63
|
+
fi
|
64
|
+
|
65
|
+
# See if we have working count-trailing-zeros intrinsics.
|
66
|
+
# TODO: Use AC_CACHE.
|
67
|
+
AC_MSG_CHECKING([if the compiler supports __builtin_ctzll])
|
68
|
+
|
69
|
+
AC_TRY_COMPILE(, [
|
70
|
+
return (__builtin_ctzll(0x100000000LL) == 32) ? 1 : 0
|
71
|
+
], [
|
72
|
+
snappy_have_builtin_ctz=yes
|
73
|
+
AC_MSG_RESULT([yes])
|
74
|
+
], [
|
75
|
+
snappy_have_builtin_ctz=no
|
76
|
+
AC_MSG_RESULT([no])
|
77
|
+
])
|
78
|
+
if test x$snappy_have_builtin_ctz = xyes ; then
|
79
|
+
AC_DEFINE([HAVE_BUILTIN_CTZ], [1], [Define to 1 if the compiler supports __builtin_ctz and friends.])
|
80
|
+
fi
|
81
|
+
|
82
|
+
# Other compression libraries; the unit test can use these for comparison
|
83
|
+
# if they are available. If they are not found, just ignore.
|
84
|
+
UNITTEST_LIBS=""
|
85
|
+
AC_DEFUN([CHECK_EXT_COMPRESSION_LIB], [
|
86
|
+
AH_CHECK_LIB([$1])
|
87
|
+
AC_CHECK_LIB(
|
88
|
+
[$1],
|
89
|
+
[$2],
|
90
|
+
[
|
91
|
+
AC_DEFINE_UNQUOTED(AS_TR_CPP(HAVE_LIB$1))
|
92
|
+
UNITTEST_LIBS="-l$1 $UNITTEST_LIBS"
|
93
|
+
],
|
94
|
+
[true]
|
95
|
+
)
|
96
|
+
])
|
97
|
+
CHECK_EXT_COMPRESSION_LIB([z], [zlibVersion])
|
98
|
+
CHECK_EXT_COMPRESSION_LIB([lzo2], [lzo1x_1_15_compress])
|
99
|
+
CHECK_EXT_COMPRESSION_LIB([lzf], [lzf_compress])
|
100
|
+
CHECK_EXT_COMPRESSION_LIB([fastlz], [fastlz_compress])
|
101
|
+
CHECK_EXT_COMPRESSION_LIB([quicklz], [qlz_compress])
|
102
|
+
AC_SUBST([UNITTEST_LIBS])
|
103
|
+
|
104
|
+
# These are used by snappy-stubs-public.h.in.
|
105
|
+
if test "$ac_cv_header_stdint_h" = "yes"; then
|
106
|
+
AC_SUBST([ac_cv_have_stdint_h], [1])
|
107
|
+
else
|
108
|
+
AC_SUBST([ac_cv_have_stdint_h], [0])
|
109
|
+
fi
|
110
|
+
if test "$ac_cv_header_stddef_h" = "yes"; then
|
111
|
+
AC_SUBST([ac_cv_have_stddef_h], [1])
|
112
|
+
else
|
113
|
+
AC_SUBST([ac_cv_have_stddef_h], [0])
|
114
|
+
fi
|
115
|
+
if test "$ac_cv_header_sys_uio_h" = "yes"; then
|
116
|
+
AC_SUBST([ac_cv_have_sys_uio_h], [1])
|
117
|
+
else
|
118
|
+
AC_SUBST([ac_cv_have_sys_uio_h], [0])
|
119
|
+
fi
|
120
|
+
|
121
|
+
# Export the version to snappy-stubs-public.h.
|
122
|
+
SNAPPY_MAJOR="snappy_major"
|
123
|
+
SNAPPY_MINOR="snappy_minor"
|
124
|
+
SNAPPY_PATCHLEVEL="snappy_patchlevel"
|
125
|
+
|
126
|
+
AC_SUBST([SNAPPY_MAJOR])
|
127
|
+
AC_SUBST([SNAPPY_MINOR])
|
128
|
+
AC_SUBST([SNAPPY_PATCHLEVEL])
|
129
|
+
AC_SUBST([SNAPPY_LTVERSION], snappy_ltversion)
|
130
|
+
|
131
|
+
AC_CONFIG_HEADERS([config.h])
|
132
|
+
AC_CONFIG_FILES([Makefile snappy-stubs-public.h])
|
133
|
+
AC_OUTPUT
|
data/vendor/snappy/format_description.txt
ADDED
@@ -0,0 +1,110 @@
|
|
1
|
+
Snappy compressed format description
|
2
|
+
Last revised: 2011-10-05
|
3
|
+
|
4
|
+
|
5
|
+
This is not a formal specification, but should suffice to explain most
|
6
|
+
relevant parts of how the Snappy format works. It is originally based on
|
7
|
+
text by Zeev Tarantov.
|
8
|
+
|
9
|
+
Snappy is an LZ77-type compressor with a fixed, byte-oriented encoding.
|
10
|
+
There is no entropy encoder backend nor framing layer -- the latter is
|
11
|
+
assumed to be handled by other parts of the system.
|
12
|
+
|
13
|
+
This document only describes the format, not how the Snappy compressor or
|
14
|
+
decompressor actually works. The correctness of the decompressor should not
|
15
|
+
depend on implementation details of the compressor, and vice versa.
|
16
|
+
|
17
|
+
|
18
|
+
1. Preamble
|
19
|
+
|
20
|
+
The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
|
21
|
+
stored as a little-endian varint. Varints consist of a series of bytes,
|
22
|
+
where the lower 7 bits are data and the upper bit is set iff there are
|
23
|
+
more bytes to be read. In other words, an uncompressed length of 64 would
|
24
|
+
be stored as 0x40, and an uncompressed length of 2097150 (0x1FFFFE)
|
25
|
+
would be stored as 0xFE 0xFF 0x7F.
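The varint layout described above can be sketched in Python as follows. This is an illustration of the preamble encoding only, not code from the Snappy sources; the function names `encode_varint` and `decode_varint` are assumptions made for this example.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a length as a little-endian varint (7 data bits per byte)."""
    if not 0 <= n < 2**32:
        raise ValueError("the preamble length must fit in 32 bits")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # upper bit set: more bytes follow
        else:
            out.append(low7)         # upper bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position of the first byte after the varint)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
        if shift > 28:  # a 32-bit value needs at most five bytes
            raise ValueError("varint too long")


# The two examples from the text.
print(encode_varint(64).hex())       # 40
print(encode_varint(2097150).hex())  # feff7f
```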
|
26
|
+
|
27
|
+
|
28
|
+
2. The compressed stream itself
|
29
|
+
|
30
|
+
There are two types of elements in a Snappy stream: Literals and
|
31
|
+
copies (backreferences). There is no restriction on the order of elements,
|
32
|
+
except that the stream naturally cannot start with a copy. (Having
|
33
|
+
two literals in a row is never optimal from a compression point of
|
34
|
+
view, but nevertheless fully permitted.) Each element starts with a tag byte,
|
35
|
+
and the lower two bits of this tag byte signal what type of element will
|
36
|
+
follow:
|
37
|
+
|
38
|
+
00: Literal
|
39
|
+
01: Copy with 1-byte offset
|
40
|
+
10: Copy with 2-byte offset
|
41
|
+
11: Copy with 4-byte offset
|
42
|
+
|
43
|
+
The interpretation of the upper six bits is element-dependent.
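A small Python sketch of how a decoder might split a tag byte into the element type and the remaining six bits. The names `ELEMENT_TYPES` and `split_tag` are hypothetical and chosen only for this illustration.

```python
from typing import Tuple

# Lower two bits of the tag byte select the element type.
ELEMENT_TYPES = {
    0b00: "literal",
    0b01: "copy with 1-byte offset",
    0b10: "copy with 2-byte offset",
    0b11: "copy with 4-byte offset",
}


def split_tag(tag: int) -> Tuple[str, int]:
    """Return (element type, upper six bits) for one tag byte."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("a tag is a single byte")
    return ELEMENT_TYPES[tag & 0b11], tag >> 2


print(split_tag(0x08))  # ('literal', 2)
```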
|
44
|
+
|
45
|
+
|
46
|
+
2.1. Literals (00)
|
47
|
+
|
48
|
+
Literals are uncompressed data stored directly in the byte stream.
|
49
|
+
The literal length is stored differently depending on the length
|
50
|
+
of the literal:
|
51
|
+
|
52
|
+
- For literals up to and including 60 bytes in length, the upper
|
53
|
+
six bits of the tag byte contain (len-1). The literal follows
|
54
|
+
immediately thereafter in the bytestream.
|
55
|
+
- For longer literals, the (len-1) value is stored after the tag byte,
|
56
|
+
little-endian. The upper six bits of the tag byte describe how
|
57
|
+
many bytes are used for the length; 60, 61, 62 or 63 for
|
58
|
+
1-4 bytes, respectively. The literal itself follows after the
|
59
|
+
length.
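The two literal-length cases above can be sketched in Python as below. This is a minimal illustration under the rules just stated, not the Snappy implementation; `encode_literal` and `decode_literal` are hypothetical names.

```python
from typing import Tuple


def encode_literal(data: bytes) -> bytes:
    """Emit a literal element: tag byte, optional length bytes, then data."""
    if not data:
        raise ValueError("a literal must contain at least one byte")
    n = len(data) - 1
    if n < 60:
        # Short literal: (len-1) fits in the upper six bits of the tag.
        return bytes([n << 2]) + data
    for extra in (1, 2, 3, 4):
        if n < 1 << (8 * extra):
            # Upper six bits are 60..63, meaning 1..4 length bytes follow.
            tag = (59 + extra) << 2
            return bytes([tag]) + n.to_bytes(extra, "little") + data
    raise ValueError("literal too long")


def decode_literal(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (literal bytes, position after the element)."""
    tag = buf[pos]
    if tag & 0b11 != 0b00:
        raise ValueError("not a literal element")
    n = tag >> 2
    pos += 1
    if n >= 60:
        extra = n - 59
        n = int.from_bytes(buf[pos:pos + extra], "little")
        pos += extra
    end = pos + n + 1
    if end > len(buf):
        raise ValueError("truncated literal")
    return buf[pos:end], end


print(encode_literal(b"xab").hex())  # 08786162
```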
|
60
|
+
|
61
|
+
|
62
|
+
2.2. Copies
|
63
|
+
|
64
|
+
Copies are references back into previous decompressed data, telling
|
65
|
+
the decompressor to reuse data it has previously decoded.
|
66
|
+
They encode two values: The _offset_, saying how many bytes back
|
67
|
+
from the current position to read, and the _length_, how many bytes
|
68
|
+
to copy. Offsets of zero can be encoded, but are not legal;
|
69
|
+
similarly, it is possible to encode backreferences that would
|
70
|
+
go past the end of the block (offset > current decompressed position),
|
71
|
+
which is also nonsensical and thus not allowed.
|
72
|
+
|
73
|
+
As in most LZ77-based compressors, the length can be larger than the offset,
|
74
|
+
yielding a form of run-length encoding (RLE). For instance,
|
75
|
+
"xababab" could be encoded as
|
76
|
+
|
77
|
+
<literal: "xab"> <copy: offset=2 length=4>
|
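To illustrate why a copy whose length exceeds its offset behaves like run-length encoding, here is a minimal Python sketch of a byte-by-byte copy. The function name `apply_copy` is an assumption for this example; real decompressors use faster strategies, but the result is the same.

```python
def apply_copy(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes read from `offset` bytes back in `out`."""
    if offset <= 0 or offset > len(out):
        raise ValueError("offset must point into already-decoded data")
    start = len(out) - offset
    for i in range(length):
        # Reading one byte at a time lets the copy consume bytes that
        # it has itself just written, which produces the repetition.
        out.append(out[start + i])


decoded = bytearray(b"xab")             # <literal: "xab">
apply_copy(decoded, offset=2, length=4)  # <copy: offset=2 length=4>
print(decoded.decode())                  # xababab
```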
78
|
+
|
79
|
+
Note that since the current Snappy compressor works in 32 kB
|
80
|
+
blocks and does not do matching across blocks, it will never produce
|
81
|
+
a bitstream with offsets larger than about 32768. However, the
|
82
|
+
decompressor should not rely on this, as it may change in the future.
|
83
|
+
|
84
|
+
There are several different kinds of copy elements, depending on
|
85
|
+
the amount of bytes to be copied (length), and how far back the
|
86
|
+
data to be copied is (offset).
|
87
|
+
|
88
|
+
|
89
|
+
2.2.1. Copy with 1-byte offset (01)
|
90
|
+
|
91
|
+
These elements can encode lengths between [4..11] bytes and offsets
|
92
|
+
between [0..2047] bytes. (len-4) occupies three bits and is stored
|
93
|
+
in bits [2..4] of the tag byte. The offset occupies 11 bits, of which the
|
94
|
+
upper three are stored in the upper three bits ([5..7]) of the tag byte,
|
95
|
+
and the lower eight are stored in a byte following the tag byte.
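The bit layout of this element can be sketched in Python as follows. The helper names `encode_copy1` and `decode_copy1` are hypothetical; the sketch rejects offset 0 because the text above states that such offsets are not legal.

```python
from typing import Tuple


def encode_copy1(offset: int, length: int) -> bytes:
    """Pack a copy with a 1-byte offset into its two-byte form."""
    if not 4 <= length <= 11:
        raise ValueError("length must be in [4..11]")
    if not 0 < offset <= 2047:
        raise ValueError("offset must be in [1..2047]")
    tag = 0b01                   # element type in bits [0..1]
    tag |= (length - 4) << 2     # (len-4) in bits [2..4]
    tag |= (offset >> 8) << 5    # upper three offset bits in bits [5..7]
    return bytes([tag, offset & 0xFF])


def decode_copy1(tag: int, next_byte: int) -> Tuple[int, int]:
    """Return (offset, length) from the tag byte and the byte after it."""
    if tag & 0b11 != 0b01:
        raise ValueError("not a copy with 1-byte offset")
    length = ((tag >> 2) & 0b111) + 4
    offset = ((tag >> 5) << 8) | next_byte
    return offset, length


print(encode_copy1(2047, 11).hex())  # fdff
```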
|
96
|
+
|
97
|
+
|
98
|
+
2.2.2. Copy with 2-byte offset (10)
|
99
|
+
|
100
|
+
These elements can encode lengths between [1..64] and offsets from
|
101
|
+
[0..65535]. (len-1) occupies six bits and is stored in the upper
|
102
|
+
six bits ([2..7]) of the tag byte. The offset is stored as a
|
103
|
+
little-endian 16-bit integer in the two bytes following the tag byte.
|
104
|
+
|
105
|
+
|
106
|
+
2.2.3. Copy with 4-byte offset (11)
|
107
|
+
|
108
|
+
These are like the copies with 2-byte offsets (see previous subsection),
|
109
|
+
except that the offset is stored as a 32-bit integer instead of a
|
110
|
+
16-bit integer (and thus will occupy four bytes).
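The 2-byte and 4-byte copy elements differ only in the width of the offset field, so one Python sketch can cover both. The name `encode_copy_wide` is hypothetical, and the 4-byte offset is assumed to be little-endian like the 2-byte one, which is how the text reads.

```python
def encode_copy_wide(offset: int, length: int, offset_bytes: int) -> bytes:
    """Pack a copy element with a 2-byte or 4-byte little-endian offset."""
    if offset_bytes not in (2, 4):
        raise ValueError("offset_bytes must be 2 or 4")
    if not 1 <= length <= 64:
        raise ValueError("length must be in [1..64]")
    if not 0 < offset < 1 << (8 * offset_bytes):
        raise ValueError("offset out of range for this element type")
    type_bits = 0b10 if offset_bytes == 2 else 0b11
    tag = type_bits | ((length - 1) << 2)  # (len-1) in bits [2..7]
    return bytes([tag]) + offset.to_bytes(offset_bytes, "little")


print(encode_copy_wide(2, 4, 2).hex())       # 0e0200
print(encode_copy_wide(65536, 64, 4).hex())  # ff00000100
```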
|
data/vendor/snappy/framing_format.txt
ADDED
@@ -0,0 +1,135 @@
|
|
1
|
+
Snappy framing format description
|
2
|
+
Last revised: 2013-10-25
|
3
|
+
|
4
|
+
This document describes a framing format for Snappy, allowing compression to
|
5
|
+
files or streams that can then more easily be decompressed without having
|
6
|
+
to hold the entire stream in memory. It also provides data checksums to
|
7
|
+
help verify integrity. It does not provide metadata checksums, so it does
|
8
|
+
not protect against e.g. all forms of truncations.
|
9
|
+
|
10
|
+
Implementation of the framing format is optional for Snappy compressors and
|
11
|
+
decompressors; it is not part of the Snappy core specification.
|
12
|
+
|
13
|
+
|
14
|
+
1. General structure
|
15
|
+
|
16
|
+
The file consists solely of chunks, lying back-to-back with no padding
|
17
|
+
in between. Each chunk consists first of a single byte of chunk identifier,
|
18
|
+
then a three-byte little-endian length of the chunk in bytes (from 0 to
|
19
|
+
16777215, inclusive), and then the data if any. The four bytes of chunk
|
20
|
+
header are not counted in the data length.
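A minimal Python sketch of walking the chunk structure described above: one type byte, a three-byte little-endian length, then the data. The names `parse_chunk_header` and `iter_chunks` are assumptions for this illustration, and the sketch does not interpret chunk contents.

```python
from typing import Iterator, Tuple


def parse_chunk_header(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (chunk type, data length, position of the chunk data)."""
    if len(buf) - pos < 4:
        raise ValueError("truncated chunk header")
    chunk_type = buf[pos]
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    return chunk_type, length, pos + 4


def iter_chunks(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk type, chunk data) until the stream ends."""
    pos = 0
    while pos < len(stream):
        chunk_type, length, pos = parse_chunk_header(stream, pos)
        if pos + length > len(stream):
            raise ValueError("truncated chunk data")
        yield chunk_type, stream[pos:pos + length]
        pos += length


# Two back-to-back chunks with made-up contents.
example = b"\xfe\x02\x00\x00\x00\x00" + b"\x80\x01\x00\x00Z"
print(list(iter_chunks(example)))
```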
|
21
|
+
|
22
|
+
The different chunk types are listed below. The first chunk must always
|
23
|
+
be the stream identifier chunk (see section 4.1, below). The stream
|
24
|
+
ends when the file ends -- there is no explicit end-of-file marker.
|
25
|
+
|
26
|
+
|
27
|
+
2. File type identification
|
28
|
+
|
29
|
+
The following identifiers for this format are recommended where appropriate.
|
30
|
+
However, note that none have been registered officially, so this is only to
|
31
|
+
be taken as a guideline. We use "Snappy framed" to distinguish between this
|
32
|
+
format and raw Snappy data.
|
33
|
+
|
34
|
+
File extension: .sz
|
35
|
+
MIME type: application/x-snappy-framed
|
36
|
+
HTTP Content-Encoding: x-snappy-framed
|
37
|
+
|
38
|
+
|
39
|
+
3. Checksum format
|
40
|
+
|
41
|
+
Some chunks have data protected by a checksum (the ones that do will say so
|
42
|
+
explicitly). The checksums are always masked CRC-32Cs.
|
43
|
+
|
44
|
+
A description of CRC-32C can be found in RFC 3720, section 12.1, with
|
45
|
+
examples in section B.4.
|
46
|
+
|
47
|
+
Checksums are not stored directly, but masked, as checksumming data and
|
48
|
+
then its own checksum can be problematic. The masking is the same as used
|
49
|
+
in Apache Hadoop: Rotate the checksum by 15 bits, then add the constant
|
50
|
+
0xa282ead8 (using wraparound as normal for unsigned integers). This is
|
51
|
+
equivalent to the following C code:
|
52
|
+
|
53
|
+
uint32_t mask_checksum(uint32_t x) {
|
54
|
+
return ((x >> 15) | (x << 17)) + 0xa282ead8;
|
55
|
+
}
|
56
|
+
|
57
|
+
Note that the masking is reversible.
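To make the reversibility concrete, here is a Python sketch of the masking and its inverse. `mask_checksum` mirrors the C function above with explicit 32-bit wraparound; `unmask_checksum` and `MASK_DELTA` are names introduced only for this example.

```python
MASK_DELTA = 0xA282EAD8
U32 = 0xFFFFFFFF


def mask_checksum(x: int) -> int:
    """Rotate right by 15 bits, then add the constant modulo 2**32."""
    rotated = ((x >> 15) | (x << 17)) & U32
    return (rotated + MASK_DELTA) & U32


def unmask_checksum(masked: int) -> int:
    """Undo the masking: subtract the constant, then rotate left by 15."""
    rotated = (masked - MASK_DELTA) & U32
    return ((rotated << 15) | (rotated >> 17)) & U32


print(hex(mask_checksum(0)))  # 0xa282ead8
```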
|
58
|
+
|
59
|
+
The checksum is always stored as a four-byte integer, in little-endian byte order.
|
60
|
+
|
61
|
+
|
62
|
+
4. Chunk types
|
63
|
+
|
64
|
+
The currently supported chunk types are described below. The list may
|
65
|
+
be extended in the future.
|
66
|
+
|
67
|
+
|
68
|
+
4.1. Stream identifier (chunk type 0xff)
|
69
|
+
|
70
|
+
The stream identifier is always the first element in the stream.
|
71
|
+
It is exactly six bytes long and contains "sNaPpY" in ASCII. This means that
|
72
|
+
a valid Snappy framed stream always starts with the bytes
|
73
|
+
|
74
|
+
0xff 0x06 0x00 0x00 0x73 0x4e 0x61 0x50 0x70 0x59
|
75
|
+
|
76
|
+
The stream identifier chunk can come multiple times in the stream besides
|
77
|
+
the first; if such a chunk shows up, it should simply be ignored, assuming
|
78
|
+
it has the right length and contents. This allows for easy concatenation of
|
79
|
+
compressed files without the need for re-framing.
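The ten leading bytes can be derived from the chunk layout rather than memorized, as this short Python sketch shows. `STREAM_IDENTIFIER` and `has_stream_identifier` are hypothetical names for this illustration.

```python
# Chunk type 0xff, three-byte little-endian length 6, then the magic text.
STREAM_IDENTIFIER = bytes([0xFF]) + (6).to_bytes(3, "little") + b"sNaPpY"


def has_stream_identifier(stream: bytes) -> bool:
    """Check that a stream starts with the stream identifier chunk."""
    return stream.startswith(STREAM_IDENTIFIER)


print(STREAM_IDENTIFIER.hex(" "))  # ff 06 00 00 73 4e 61 50 70 59
```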
|
80
|
+
|
81
|
+
|
82
|
+
4.2. Compressed data (chunk type 0x00)
|
83
|
+
|
84
|
+
Compressed data chunks contain a normal Snappy compressed bitstream;
|
85
|
+
see the compressed format specification. The compressed data is preceded by
|
86
|
+
the masked CRC-32C (see section 3) of the _uncompressed_ data.
|
87
|
+
|
88
|
+
Note that the data portion of the chunk, i.e., the compressed contents,
|
89
|
+
can be at most 16777211 bytes (2^24 - 1, minus the checksum).
|
90
|
+
However, we place an additional restriction that the uncompressed data
|
91
|
+
in a chunk must be no longer than 65536 bytes. This allows consumers to
|
92
|
+
easily use small fixed-size buffers.
|
93
|
+
|
94
|
+
|
95
|
+
4.3. Uncompressed data (chunk type 0x01)
|
96
|
+
|
97
|
+
Uncompressed data chunks allow a compressor to send uncompressed,
|
98
|
+
raw data; this is useful if, for instance, incompressible or
|
99
|
+
near-incompressible data is detected, and faster decompression is desired.
|
100
|
+
|
101
|
+
As in the compressed chunks, the data is preceded by its own masked
|
102
|
+
CRC-32C (see section 3).
|
103
|
+
|
104
|
+
An uncompressed data chunk, like compressed data chunks, should contain
|
105
|
+
no more than 65536 data bytes, so the maximum legal chunk length with the
|
106
|
+
checksum is 65540.
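Putting the pieces together, an uncompressed data chunk could be assembled as in the Python sketch below. The bitwise `crc32c` is a slow reference implementation written for this example (reflected polynomial 0x82F63B78), and `uncompressed_chunk` is a hypothetical helper, not part of any Snappy API.

```python
U32 = 0xFFFFFFFF


def crc32c(data: bytes) -> int:
    """Bit-at-a-time CRC-32C (Castagnoli); slow but dependency-free."""
    crc = U32
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ U32


def mask_checksum(x: int) -> int:
    """Masking from section 3: rotate right by 15, add the constant."""
    return ((((x >> 15) | (x << 17)) & U32) + 0xA282EAD8) & U32


def uncompressed_chunk(data: bytes) -> bytes:
    """Build a chunk of type 0x01: header, masked CRC-32C, raw data."""
    if len(data) > 65536:
        raise ValueError("at most 65536 data bytes per chunk")
    body = mask_checksum(crc32c(data)).to_bytes(4, "little") + data
    return bytes([0x01]) + len(body).to_bytes(3, "little") + body


chunk = uncompressed_chunk(b"hello")
print(chunk[:4].hex(), len(chunk))  # 01090000 13
```

The length field counts the four checksum bytes plus the data, which is why five data bytes give a length of 9.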
|
107
|
+
|
108
|
+
|
109
|
+
4.4. Padding (chunk type 0xfe)
|
110
|
+
|
111
|
+
Padding chunks allow a compressor to increase the size of the data stream
|
112
|
+
so that it complies with external demands, e.g. that the total number of
|
113
|
+
bytes is a multiple of some value.
|
114
|
+
|
115
|
+
All bytes of the padding chunk, except the chunk byte itself and the length,
|
116
|
+
should be zero, but decompressors must not try to interpret or verify the
|
117
|
+
padding data in any way.
|
118
|
+
|
119
|
+
|
120
|
+
4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)
|
121
|
+
|
122
|
+
These are reserved for future expansion. A decoder that sees such a chunk
|
123
|
+
should immediately return an error, as it must assume it cannot decode the
|
124
|
+
stream correctly.
|
125
|
+
|
126
|
+
Future versions of this specification may define meanings for these chunks.
|
127
|
+
|
128
|
+
|
129
|
+
4.6. Reserved skippable chunks (chunk types 0x80-0xfd)
|
130
|
+
|
131
|
+
These are also reserved for future expansion, but unlike the chunks
|
132
|
+
described in 4.5, a decoder seeing these must skip them and continue
|
133
|
+
decoding.
|
134
|
+
|
135
|
+
Future versions of this specification may define meanings for these chunks.
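The chunk type ranges from sections 4.1 through 4.6 can be summarized in a small Python dispatch sketch. The function name `classify_chunk` and the returned labels are assumptions made for this illustration.

```python
def classify_chunk(chunk_type: int) -> str:
    """Map a chunk type byte to the category the specification gives it."""
    if not 0 <= chunk_type <= 0xFF:
        raise ValueError("chunk type is a single byte")
    if chunk_type == 0xFF:
        return "stream identifier"
    if chunk_type == 0x00:
        return "compressed data"
    if chunk_type == 0x01:
        return "uncompressed data"
    if chunk_type == 0xFE:
        return "padding"
    if chunk_type <= 0x7F:
        # 0x02-0x7f: a decoder should return an error for these.
        return "reserved unskippable"
    # 0x80-0xfd: a decoder must skip these and continue.
    return "reserved skippable"


for t in (0xFF, 0x00, 0x01, 0xFE, 0x02, 0x80):
    print(hex(t), classify_chunk(t))
```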
|