snappy 0.0.13 → 0.1.0

Files changed (43)
  1. checksums.yaml +5 -5
  2. data/.travis.yml +28 -1
  3. data/Gemfile +6 -1
  4. data/README.md +28 -4
  5. data/Rakefile +1 -0
  6. data/ext/extconf.rb +21 -24
  7. data/lib/snappy.rb +3 -1
  8. data/lib/snappy/hadoop.rb +22 -0
  9. data/lib/snappy/hadoop/reader.rb +58 -0
  10. data/lib/snappy/hadoop/writer.rb +51 -0
  11. data/lib/snappy/reader.rb +11 -7
  12. data/lib/snappy/shim.rb +30 -0
  13. data/lib/snappy/version.rb +3 -1
  14. data/lib/snappy/writer.rb +8 -9
  15. data/smoke.sh +8 -0
  16. data/snappy.gemspec +6 -30
  17. data/test/hadoop/test-snappy-hadoop-reader.rb +103 -0
  18. data/test/hadoop/test-snappy-hadoop-writer.rb +48 -0
  19. data/test/test-snappy-hadoop.rb +22 -0
  20. data/vendor/snappy/AUTHORS +1 -0
  21. data/vendor/snappy/CMakeLists.txt +174 -0
  22. data/vendor/snappy/CONTRIBUTING.md +26 -0
  23. data/vendor/snappy/COPYING +54 -0
  24. data/vendor/snappy/NEWS +180 -0
  25. data/vendor/snappy/README.md +149 -0
  26. data/vendor/snappy/cmake/SnappyConfig.cmake +1 -0
  27. data/vendor/snappy/cmake/config.h.in +62 -0
  28. data/vendor/snappy/format_description.txt +110 -0
  29. data/vendor/snappy/framing_format.txt +135 -0
  30. data/vendor/snappy/snappy-c.cc +90 -0
  31. data/vendor/snappy/snappy-c.h +138 -0
  32. data/vendor/snappy/snappy-internal.h +224 -0
  33. data/vendor/snappy/snappy-sinksource.cc +104 -0
  34. data/vendor/snappy/snappy-sinksource.h +182 -0
  35. data/vendor/snappy/snappy-stubs-internal.cc +42 -0
  36. data/vendor/snappy/snappy-stubs-internal.h +561 -0
  37. data/vendor/snappy/snappy-stubs-public.h.in +94 -0
  38. data/vendor/snappy/snappy-test.cc +612 -0
  39. data/vendor/snappy/snappy-test.h +573 -0
  40. data/vendor/snappy/snappy.cc +1515 -0
  41. data/vendor/snappy/snappy.h +203 -0
  42. data/vendor/snappy/snappy_unittest.cc +1410 -0
  43. metadata +38 -46
data/vendor/snappy/NEWS
@@ -0,0 +1,180 @@
1
+ Snappy v1.1.7, August 24th 2017:
2
+
3
+ * Improved CMake build support for 64-bit Linux distributions.
4
+
5
+ * MSVC builds now use MSVC-specific intrinsics that map to clzll.
6
+
7
+ * ARM64 (AArch64) builds use the code paths optimized for 64-bit processors.
8
+
9
+ Snappy v1.1.6, July 12th 2017:
10
+
11
+ This is a re-release of v1.1.5 with proper SONAME / SOVERSION values.
12
+
13
+ Snappy v1.1.5, June 28th 2017:
14
+
15
+ This release has broken SONAME / SOVERSION values. Users of snappy as a shared
16
+ library should avoid 1.1.5 and use 1.1.6 instead. SONAME / SOVERSION errors will
17
+ manifest as the dynamic library loader complaining that it cannot find snappy's
18
+ shared library file (libsnappy.so / libsnappy.dylib), or that the library it
19
+ found does not have the required version. 1.1.6 has the same code as 1.1.5, but
20
+ carries build configuration fixes for the issues above.
21
+
22
+ * Add CMake build support. The autoconf build support is now deprecated, and
23
+ will be removed in the next release.
24
+
25
+ * Add AppVeyor configuration, for Windows CI coverage.
26
+
27
+ * Small performance improvement on little-endian PowerPC.
28
+
29
+ * Small performance improvement on LLVM with position-independent executables.
30
+
31
+ * Fix a few issues with various build environments.
32
+
33
+ Snappy v1.1.4, January 25th 2017:
34
+
35
+ * Fix a 1% performance regression when snappy is used in PIE executables.
36
+
37
+ * Improve compression performance by 5%.
38
+
39
+ * Improve decompression performance by 20%.
40
+
41
+ Snappy v1.1.3, July 6th 2015:
42
+
43
+ This is the first release to be done from GitHub, which means that
44
+ some minor things like the ChangeLog format have changed (git log
45
+ format instead of svn log).
46
+
47
+ * Add support for Uncompress() from a Source to a Sink.
48
+
49
+ * Various minor changes to improve MSVC support; in particular,
50
+ the unit tests now compile and run under MSVC.
51
+
52
+
53
+ Snappy v1.1.2, February 28th 2014:
54
+
55
+ This is a maintenance release with no changes to the actual library
56
+ source code.
57
+
58
+ * Stop distributing benchmark data files that have unclear
59
+ or unsuitable licensing.
60
+
61
+ * Add support for padding chunks in the framing format.
62
+
63
+
64
+ Snappy v1.1.1, October 15th 2013:
65
+
66
+ * Add support for uncompressing to iovecs (scatter I/O).
67
+ The bulk of this patch was contributed by Mohit Aron.
68
+
69
+ * Speed up decompression by ~2%; much more so (~13-20%) on
70
+ a few benchmarks on given compilers and CPUs.
71
+
72
+ * Fix a few issues with MSVC compilation.
73
+
74
+ * Support truncated test data in the benchmark.
75
+
76
+
77
+ Snappy v1.1.0, January 18th 2013:
78
+
79
+ * Snappy now uses 64 kB block size instead of 32 kB. On average,
80
+ this means it compresses about 3% denser (more so for some
81
+ inputs), at the same or better speeds.
82
+
83
+ * libsnappy no longer depends on iostream.
84
+
85
+ * Some small performance improvements in compression on x86
86
+ (0.5–1%).
87
+
88
+ * Various portability fixes for ARM-based platforms, for MSVC,
89
+ and for GNU/Hurd.
90
+
91
+
92
+ Snappy v1.0.5, February 24th 2012:
93
+
94
+ * More speed improvements. Exactly how big will depend on
95
+ the architecture:
96
+
97
+ - 3–10% faster decompression for the base case (x86-64).
98
+
99
+ - ARMv7 and higher can now use unaligned accesses,
100
+ and will see about 30% faster decompression and
101
+ 20–40% faster compression.
102
+
103
+ - 32-bit platforms (ARM and 32-bit x86) will see 2–5%
104
+ faster compression.
105
+
106
+ These are all cumulative (e.g., ARM gets all three speedups).
107
+
108
+ * Fixed an issue where the unit test would crash on systems
109
+ with less than 256 MB address space available,
110
+ e.g. some embedded platforms.
111
+
112
+ * Added a framing format description, for use over e.g. HTTP,
113
+ or for a command-line compressor. We do not have any
114
+ implementations of this at the current point, but there seems
115
+ to be enough of a general interest in the topic.
116
+ Also make the format description slightly clearer.
117
+
118
+ * Remove some compile-time warnings in -Wall
119
+ (mostly signed/unsigned comparisons), for easier embedding
120
+ into projects that use -Wall -Werror.
121
+
122
+
123
+ Snappy v1.0.4, September 15th 2011:
124
+
125
+ * Sped up the decompressor somewhat; typically about 2–8%
126
+ for Core i7, in 64-bit mode (comparable for Opteron).
127
+ Somewhat more for some tests, almost no gain for others.
128
+
129
+ * Make Snappy compile on certain platforms it didn't before
130
+ (Solaris with SunPro C++, HP-UX, AIX).
131
+
132
+ * Correct some minor errors in the format description.
133
+
134
+
135
+ Snappy v1.0.3, June 2nd 2011:
136
+
137
+ * Sped up the decompressor somewhat; about 3-6% for Core 2,
138
+ 6-13% for Core i7, and 5-12% for Opteron (all in 64-bit mode).
139
+
140
+ * Added compressed format documentation. This text is new,
141
+ but an earlier version from Zeev Tarantov was used as reference.
142
+
143
+ * Only link snappy_unittest against -lz and other autodetected
144
+ libraries, not libsnappy.so (which doesn't need any such dependency).
145
+
146
+ * Fixed some display issues in the microbenchmarks, one of which would
147
+ frequently make the test crash on GNU/Hurd.
148
+
149
+
150
+ Snappy v1.0.2, April 29th 2011:
151
+
152
+ * Relicense to a BSD-type license.
153
+
154
+ * Added C bindings, contributed by Martin Gieseking.
155
+
156
+ * More Win32 fixes, in particular for MSVC.
157
+
158
+ * Replace geo.protodata with a newer version.
159
+
160
+ * Fix timing inaccuracies in the unit test when comparing Snappy
161
+ to other algorithms.
162
+
163
+
164
+ Snappy v1.0.1, March 25th 2011:
165
+
166
+ This is a maintenance release, mostly containing minor fixes.
167
+ There is no new functionality. The most important fixes include:
168
+
169
+ * The COPYING file and all licensing headers now correctly state that
170
+ Snappy is licensed under the Apache 2.0 license.
171
+
172
+ * snappy_unittest should now compile natively under Windows,
173
+ as well as on embedded systems with no mmap().
174
+
175
+ * Various autotools nits have been fixed.
176
+
177
+
178
+ Snappy v1.0, March 17th 2011:
179
+
180
+ * Initial version.
data/vendor/snappy/README.md
@@ -0,0 +1,149 @@
1
+ Snappy, a fast compressor/decompressor.
2
+
3
+
4
+ Introduction
5
+ ============
6
+
7
+ Snappy is a compression/decompression library. It does not aim for maximum
8
+ compression, or compatibility with any other compression library; instead,
9
+ it aims for very high speeds and reasonable compression. For instance,
10
+ compared to the fastest mode of zlib, Snappy is an order of magnitude faster
11
+ for most inputs, but the resulting compressed files are anywhere from 20% to
12
+ 100% bigger. (For more information, see "Performance", below.)
13
+
14
+ Snappy has the following properties:
15
+
16
+ * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
17
+ See "Performance" below.
18
+ * Stable: Over the last few years, Snappy has compressed and decompressed
19
+ petabytes of data in Google's production environment. The Snappy bitstream
20
+ format is stable and will not change between versions.
21
+ * Robust: The Snappy decompressor is designed not to crash in the face of
22
+ corrupted or malicious input.
23
+ * Free and open source software: Snappy is licensed under a BSD-type license.
24
+ For more information, see the included COPYING file.
25
+
26
+ Snappy has previously been called "Zippy" in some Google presentations
27
+ and the like.
28
+
29
+
30
+ Performance
31
+ ===========
32
+
33
+ Snappy is intended to be fast. On a single core of a Core i7 processor
34
+ in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
35
+ about 500 MB/sec or more. (These numbers are for the slowest inputs in our
36
+ benchmark suite; others are much faster.) In our tests, Snappy usually
37
+ is faster than algorithms in the same class (e.g. LZO, LZF, QuickLZ,
38
+ etc.) while achieving comparable compression ratios.
39
+
40
+ Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
41
+ for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
42
+ other already-compressed data. Similar numbers for zlib in its fastest mode
43
+ are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
44
+ capable of achieving yet higher compression rates, although usually at the
45
+ expense of speed. Of course, compression ratio will vary significantly with
46
+ the input.
47
+
48
+ Although Snappy should be fairly portable, it is primarily optimized
49
+ for 64-bit x86-compatible processors, and may run slower in other environments.
50
+ In particular:
51
+
52
+ - Snappy uses 64-bit operations in several places to process more data at
53
+ once than would otherwise be possible.
54
+ - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
55
+ On some platforms, these must be emulated with single-byte loads
56
+ and stores, which is much slower.
57
+ - Snappy assumes little-endian throughout, and needs to byte-swap data in
58
+ several places if running on a big-endian platform.
59
+
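The portability points above (unaligned access and byte order) can be illustrated with a byte-at-a-time accessor. This is a minimal sketch, not Snappy's actual code: the names `LoadLE32` and `StoreLE32` are hypothetical. It shows the slow but fully portable fallback that the README contrasts with cheap unaligned loads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads a 32-bit little-endian value one byte at a time.
// The result is the same on little- and big-endian hosts, and the pointer
// does not need to be aligned.
inline uint32_t LoadLE32(const uint8_t* p) {
  assert(p != nullptr);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: writes a 32-bit value in little-endian byte order.
inline void StoreLE32(uint8_t* p, uint32_t v) {
  assert(p != nullptr);
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
  p[2] = static_cast<uint8_t>(v >> 16);
  p[3] = static_cast<uint8_t>(v >> 24);
}
```

On platforms where unaligned loads are cheap, a single word-sized load does the same work as the four byte loads here, which is the speed difference the list above describes.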
60
+ Experience has shown that even heavily tuned code can be improved.
61
+ Performance optimizations, whether for 64-bit x86 or other platforms,
62
+ are of course most welcome; see "Contact", below.
63
+
64
+
65
+ Building
66
+ ========
67
+
68
+ CMake is supported and autotools will soon be deprecated.
69
+ You need CMake 3.4 or above to build:
70
+
71
+ mkdir build
72
+ cd build && cmake ../ && make
73
+
74
+
75
+ Usage
76
+ =====
77
+
78
+ Note that Snappy, both the implementation and the main interface,
79
+ is written in C++. However, several third-party bindings to other languages
80
+ are available; see the home page at http://google.github.io/snappy/
81
+ for more information. Also, if you want to use Snappy from C code, you can
82
+ use the included C bindings in snappy-c.h.
83
+
84
+ To use Snappy from your own C++ program, include the file "snappy.h" from
85
+ your calling file, and link against the compiled library.
86
+
87
+ There are many ways to call Snappy, but the simplest possible is
88
+
89
+ snappy::Compress(input.data(), input.size(), &output);
90
+
91
+ and similarly
92
+
93
+ snappy::Uncompress(input.data(), input.size(), &output);
94
+
95
+ where "input" and "output" are both instances of std::string.
96
+
97
+ There are other interfaces that are more flexible in various ways, including
98
+ support for custom (non-array) input sources. See the header file for more
99
+ information.
100
+
101
+
102
+ Tests and benchmarks
103
+ ====================
104
+
105
+ When you compile Snappy, snappy_unittest is compiled in addition to the
106
+ library itself. You do not need it to use the compressor from your own library,
107
+ but it contains several useful components for Snappy development.
108
+
109
+ First of all, it contains unit tests, verifying correctness on your machine in
110
+ various scenarios. If you want to change or optimize Snappy, please run the
111
+ tests to verify you have not broken anything. Note that if you have the
112
+ Google Test library installed, unit test behavior (especially failures) will be
113
+ significantly more user-friendly. You can find Google Test at
114
+
115
+ http://github.com/google/googletest
116
+
117
+ You probably also want the gflags library for handling of command-line flags;
118
+ you can find it at
119
+
120
+ http://gflags.github.io/gflags/
121
+
122
+ In addition to the unit tests, snappy_unittest contains microbenchmarks used to
123
+ tune compression and decompression performance. These are automatically run
124
+ before the unit tests, but you can disable them using the flag
125
+ --run_microbenchmarks=false if you have gflags installed (otherwise you will
126
+ need to edit the source).
127
+
128
+ Finally, snappy_unittest can benchmark Snappy against a few other compression libraries
129
+ (zlib, LZO, LZF, and QuickLZ), if they were detected at configure time.
130
+ To benchmark using a given file, give the compression algorithm you want to test
131
+ Snappy against (e.g. --zlib) and then a list of one or more file names on the
132
+ command line. The testdata/ directory contains the files used by the
133
+ microbenchmark, which should provide a reasonably balanced starting point for
134
+ benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
135
+ are used to verify correctness in the presence of corrupted data in the unit
136
+ test.)
137
+
138
+
139
+ Contact
140
+ =======
141
+
142
+ Snappy is distributed through GitHub. For the latest version, a bug tracker,
143
+ and other information, see
144
+
145
+ http://google.github.io/snappy/
146
+
147
+ or the repository at
148
+
149
+ https://github.com/google/snappy
data/vendor/snappy/cmake/SnappyConfig.cmake
@@ -0,0 +1 @@
1
+ include("${CMAKE_CURRENT_LIST_DIR}/SnappyTargets.cmake")
data/vendor/snappy/cmake/config.h.in
@@ -0,0 +1,62 @@
1
+ #ifndef THIRD_PARTY_SNAPPY_OPENSOURCE_CMAKE_CONFIG_H_
2
+ #define THIRD_PARTY_SNAPPY_OPENSOURCE_CMAKE_CONFIG_H_
3
+
4
+ /* Define to 1 if the compiler supports __builtin_ctz and friends. */
5
+ #cmakedefine HAVE_BUILTIN_CTZ 1
6
+
7
+ /* Define to 1 if the compiler supports __builtin_expect. */
8
+ #cmakedefine HAVE_BUILTIN_EXPECT 1
9
+
10
+ /* Define to 1 if you have the <byteswap.h> header file. */
11
+ #cmakedefine HAVE_BYTESWAP_H 1
12
+
13
+ /* Define to 1 if you have a definition for mmap() in <sys/mman.h>. */
14
+ #cmakedefine HAVE_FUNC_MMAP 1
15
+
16
+ /* Define to 1 if you have a definition for sysconf() in <unistd.h>. */
17
+ #cmakedefine HAVE_FUNC_SYSCONF 1
18
+
19
+ /* Define to 1 to use the gflags package for command-line parsing. */
20
+ #cmakedefine HAVE_GFLAGS 1
21
+
22
+ /* Define to 1 if you have Google Test. */
23
+ #cmakedefine HAVE_GTEST 1
24
+
25
+ /* Define to 1 if you have the `lzo2' library (-llzo2). */
26
+ #cmakedefine HAVE_LIBLZO2 1
27
+
28
+ /* Define to 1 if you have the `z' library (-lz). */
29
+ #cmakedefine HAVE_LIBZ 1
30
+
31
+ /* Define to 1 if you have the <stddef.h> header file. */
32
+ #cmakedefine HAVE_STDDEF_H 1
33
+
34
+ /* Define to 1 if you have the <stdint.h> header file. */
35
+ #cmakedefine HAVE_STDINT_H 1
36
+
37
+ /* Define to 1 if you have the <sys/endian.h> header file. */
38
+ #cmakedefine HAVE_SYS_ENDIAN_H 1
39
+
40
+ /* Define to 1 if you have the <sys/mman.h> header file. */
41
+ #cmakedefine HAVE_SYS_MMAN_H 1
42
+
43
+ /* Define to 1 if you have the <sys/resource.h> header file. */
44
+ #cmakedefine HAVE_SYS_RESOURCE_H 1
45
+
46
+ /* Define to 1 if you have the <sys/time.h> header file. */
47
+ #cmakedefine HAVE_SYS_TIME_H 1
48
+
49
+ /* Define to 1 if you have the <sys/uio.h> header file. */
50
+ #cmakedefine HAVE_SYS_UIO_H 1
51
+
52
+ /* Define to 1 if you have the <unistd.h> header file. */
53
+ #cmakedefine HAVE_UNISTD_H 1
54
+
55
+ /* Define to 1 if you have the <windows.h> header file. */
56
+ #cmakedefine HAVE_WINDOWS_H 1
57
+
58
+ /* Define to 1 if your processor stores words with the most significant byte
59
+ first (like Motorola and SPARC, unlike Intel and VAX). */
60
+ #cmakedefine SNAPPY_IS_BIG_ENDIAN 1
61
+
62
+ #endif // THIRD_PARTY_SNAPPY_OPENSOURCE_CMAKE_CONFIG_H_
data/vendor/snappy/format_description.txt
@@ -0,0 +1,110 @@
1
+ Snappy compressed format description
2
+ Last revised: 2011-10-05
3
+
4
+
5
+ This is not a formal specification, but should suffice to explain most
6
+ relevant parts of how the Snappy format works. It is originally based on
7
+ text by Zeev Tarantov.
8
+
9
+ Snappy is an LZ77-type compressor with a fixed, byte-oriented encoding.
10
+ There is no entropy encoder backend nor framing layer -- the latter is
11
+ assumed to be handled by other parts of the system.
12
+
13
+ This document only describes the format, not how the Snappy compressor or
14
+ decompressor actually work. The correctness of the decompressor should not
15
+ depend on implementation details of the compressor, and vice versa.
16
+
17
+
18
+ 1. Preamble
19
+
20
+ The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
21
+ stored as a little-endian varint. Varints consist of a series of bytes,
22
+ where the lower 7 bits are data and the upper bit is set iff there are
23
+ more bytes to be read. In other words, an uncompressed length of 64 would
24
+ be stored as 0x40, and an uncompressed length of 2097150 (0x1FFFFE)
25
+ would be stored as 0xFE 0xFF 0x7F.
26
+
27
+
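The varint layout in the preamble above can be sketched as follows. This is an illustration under assumptions, not the library's API: `EncodeVarint32` and `DecodeVarint32` are hypothetical names, and the decoder's rejection of values that overflow 32 bits is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes a length as a little-endian base-128 varint.
// Each byte carries 7 data bits; the top bit is set iff more bytes follow.
std::vector<uint8_t> EncodeVarint32(uint32_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));  // low 7 bits + "more" bit
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));  // final byte, top bit clear
  return out;
}

// Hypothetical helper: decodes a varint from data[0..size).
// Returns the number of bytes consumed, or 0 if the input is truncated
// or does not fit in 32 bits.
size_t DecodeVarint32(const uint8_t* data, size_t size, uint32_t* value) {
  assert(data != nullptr && value != nullptr);
  uint32_t result = 0;
  for (size_t i = 0; i < size && i < 5; ++i) {
    const uint32_t bits = data[i] & 0x7F;
    // The fifth byte may only contribute the top 4 bits of a 32-bit value.
    if (i == 4 && bits > 0x0F) return 0;
    result |= bits << (7 * i);
    if ((data[i] & 0x80) == 0) {
      *value = result;
      return i + 1;
    }
  }
  return 0;
}
```

With these definitions, 64 encodes to the single byte 0x40 and 2097150 encodes to 0xFE 0xFF 0x7F, matching the two examples in the text.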
28
+ 2. The compressed stream itself
29
+
30
+ There are two types of elements in a Snappy stream: Literals and
31
+ copies (backreferences). There is no restriction on the order of elements,
32
+ except that the stream naturally cannot start with a copy. (Having
33
+ two literals in a row is never optimal from a compression point of
34
+ view, but nevertheless fully permitted.) Each element starts with a tag byte,
35
+ and the lower two bits of this tag byte signal what type of element will
36
+ follow:
37
+
38
+ 00: Literal
39
+ 01: Copy with 1-byte offset
40
+ 10: Copy with 2-byte offset
41
+ 11: Copy with 4-byte offset
42
+
43
+ The interpretation of the upper six bits is element-dependent.
44
+
45
+
46
+ 2.1. Literals (00)
47
+
48
+ Literals are uncompressed data stored directly in the byte stream.
49
+ The literal length is stored differently depending on the length
50
+ of the literal:
51
+
52
+ - For literals up to and including 60 bytes in length, the upper
53
+ six bits of the tag byte contain (len-1). The literal follows
54
+ immediately thereafter in the bytestream.
55
+ - For longer literals, the (len-1) value is stored after the tag byte,
56
+ little-endian. The upper six bits of the tag byte describe how
57
+ many bytes are used for the length; 60, 61, 62 or 63 for
58
+ 1-4 bytes, respectively. The literal itself follows after the
59
+ length.
60
+
61
+
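The tag-byte dispatch and the two literal length encodings described above can be sketched like this. The names `ElementType`, `TagType` and `ParseLiteralHeader` are hypothetical and exist only for this illustration; the real decompressor is organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The lower two bits of the tag byte select the element type.
enum class ElementType : uint8_t {
  kLiteral = 0,
  kCopy1ByteOffset = 1,
  kCopy2ByteOffset = 2,
  kCopy4ByteOffset = 3
};

inline ElementType TagType(uint8_t tag) {
  return static_cast<ElementType>(tag & 0x03);
}

// Hypothetical helper: parses a literal header starting at the tag byte p[0].
// On success, *len is the literal length and *header_size is the number of
// bytes (tag plus any length bytes) that precede the literal data.
bool ParseLiteralHeader(const uint8_t* p, size_t avail,
                        uint32_t* len, size_t* header_size) {
  assert(p != nullptr && len != nullptr && header_size != nullptr);
  if (avail < 1 || TagType(p[0]) != ElementType::kLiteral) return false;
  const uint32_t upper = p[0] >> 2;  // upper six bits of the tag
  if (upper < 60) {                  // short form: upper bits hold len-1
    *len = upper + 1;
    *header_size = 1;
    return true;
  }
  const size_t extra = upper - 59;   // 60..63 mean 1..4 length bytes
  if (avail < 1 + extra) return false;
  uint64_t len_minus_1 = 0;
  for (size_t i = 0; i < extra; ++i) {
    len_minus_1 |= static_cast<uint64_t>(p[1 + i]) << (8 * i);  // little-endian
  }
  if (len_minus_1 >= 0xFFFFFFFFull) return false;  // len would exceed 2^32 - 1
  *len = static_cast<uint32_t>(len_minus_1 + 1);
  *header_size = 1 + extra;
  return true;
}
```

For example, the tag 0x10 announces a 5-byte literal, while the bytes 0xF0 0x63 announce a 100-byte literal whose data starts two bytes after the tag position.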
62
+ 2.2. Copies
63
+
64
+ Copies are references back into previous decompressed data, telling
65
+ the decompressor to reuse data it has previously decoded.
66
+ They encode two values: The _offset_, saying how many bytes back
67
+ from the current position to read, and the _length_, how many bytes
68
+ to copy. Offsets of zero can be encoded, but are not legal;
69
+ similarly, it is possible to encode backreferences that would
70
+ reach before the start of the block (offset > current decompressed position),
71
+ which is also nonsensical and thus not allowed.
72
+
73
+ As in most LZ77-based compressors, the length can be larger than the offset,
74
+ yielding a form of run-length encoding (RLE). For instance,
75
+ "xababab" could be encoded as
76
+
77
+ <literal: "xab"> <copy: offset=2 length=4>
78
+
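The run-length behaviour of an overlapping copy follows from copying one byte at a time, so that bytes written earlier in the same copy become readable later in it. A minimal sketch, assuming the output is held in a `std::string`; `AppendCopy` is a hypothetical name, not a library function:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends `length` bytes to `out`, each read from
// `offset` bytes behind the current end of the output. Copying byte by byte
// lets the source range overlap the bytes being written (length > offset).
// Returns false for the illegal cases named in the text: a zero offset, or
// an offset that reaches before the start of the output.
bool AppendCopy(std::string* out, size_t offset, size_t length) {
  assert(out != nullptr);
  if (offset == 0 || offset > out->size()) return false;
  out->reserve(out->size() + length);
  for (size_t i = 0; i < length; ++i) {
    out->push_back((*out)[out->size() - offset]);
  }
  return true;
}
```

Starting from the literal "xab", `AppendCopy(&s, 2, 4)` yields "xababab", which is the example given above.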
79
+ Note that since the current Snappy compressor works in 64 kB blocks
80
+ (32 kB before v1.1.0) and does not do matching across blocks, it will never
81
+ produce a bitstream with offsets larger than about 65536. However, the
82
+ decompressor should not rely on this, as it may change in the future.
83
+
84
+ There are several different kinds of copy elements, depending on
85
+ the number of bytes to be copied (length), and how far back the
86
+ data to be copied is (offset).
87
+
88
+
89
+ 2.2.1. Copy with 1-byte offset (01)
90
+
91
+ These elements can encode lengths between [4..11] bytes and offsets
92
+ between [0..2047] bytes. (len-4) occupies three bits and is stored
93
+ in bits [2..4] of the tag byte. The offset occupies 11 bits, of which the
94
+ upper three are stored in the upper three bits ([5..7]) of the tag byte,
95
+ and the lower eight are stored in a byte following the tag byte.
96
+
97
+
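The bit layout of the 1-byte-offset copy can be unpacked as below. This is a sketch under assumptions: `ShortCopy` and `ParseCopy1ByteOffset` are hypothetical names, and rejecting a zero offset at parse time is a choice made here, based on the earlier statement that zero offsets are not legal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ShortCopy {
  uint32_t offset;  // 1..2047 after validation
  uint32_t length;  // 4..11
};

// Hypothetical helper: decodes a copy element with tag type 01.
// p[0] is the tag byte and p[1] holds the low eight bits of the offset.
bool ParseCopy1ByteOffset(const uint8_t* p, size_t avail, ShortCopy* copy) {
  assert(p != nullptr && copy != nullptr);
  if (avail < 2 || (p[0] & 0x03) != 0x01) return false;
  copy->length = 4 + ((p[0] >> 2) & 0x07);  // bits [2..4] hold len-4
  copy->offset = (static_cast<uint32_t>(p[0] >> 5) << 8)  // bits [5..7]: high 3 bits
                 | p[1];                                  // next byte: low 8 bits
  return copy->offset != 0;  // a zero offset is encodable but not legal
}
```

As a worked case, length 7 and offset 300 (0x12C) pack into the tag 0x2D followed by the byte 0x2C.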
98
+ 2.2.2. Copy with 2-byte offset (10)
99
+
100
+ These elements can encode lengths between [1..64] and offsets from
101
+ [0..65535]. (len-1) occupies six bits and is stored in the upper
102
+ six bits ([2..7]) of the tag byte. The offset is stored as a
103
+ little-endian 16-bit integer in the two bytes following the tag byte.
104
+
105
+
106
+ 2.2.3. Copy with 4-byte offset (11)
107
+
108
+ These are like the copies with 2-byte offsets (see previous subsection),
109
+ except that the offset is stored as a 32-bit integer instead of a
110
+ 16-bit integer (and thus will occupy four bytes).
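
The 2-byte and 4-byte offset copies share one layout and differ only in offset width, so a single routine can decode both. This is a sketch with hypothetical names (`WideCopy`, `ParseWideCopy`). It assumes the 4-byte offset is little-endian like the 2-byte one, which the "like the copies with 2-byte offsets" wording suggests but does not state outright.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct WideCopy {
  uint32_t offset;  // distance back into the decompressed data
  uint32_t length;  // 1..64
  size_t size;      // encoded size of the element: 3 or 5 bytes
};

// Hypothetical helper: decodes a copy element with tag type 10 or 11.
bool ParseWideCopy(const uint8_t* p, size_t avail, WideCopy* copy) {
  assert(p != nullptr && copy != nullptr);
  if (avail < 1) return false;
  const uint8_t type = p[0] & 0x03;
  if (type != 0x02 && type != 0x03) return false;
  const size_t offset_bytes = (type == 0x02) ? 2 : 4;
  if (avail < 1 + offset_bytes) return false;
  uint32_t offset = 0;
  for (size_t i = 0; i < offset_bytes; ++i) {
    offset |= static_cast<uint32_t>(p[1 + i]) << (8 * i);  // little-endian
  }
  if (offset == 0) return false;   // encodable, but not legal
  copy->offset = offset;
  copy->length = 1 + (p[0] >> 2);  // upper six bits hold len-1
  copy->size = 1 + offset_bytes;
  return true;
}
```

For instance, the bytes 0x26 0xE8 0x03 decode to a copy of length 10 at offset 1000.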