snappy 0.0.13 → 0.1.0

Files changed (43)
  1. checksums.yaml +5 -5
  2. data/.travis.yml +28 -1
  3. data/Gemfile +6 -1
  4. data/README.md +28 -4
  5. data/Rakefile +1 -0
  6. data/ext/extconf.rb +21 -24
  7. data/lib/snappy.rb +3 -1
  8. data/lib/snappy/hadoop.rb +22 -0
  9. data/lib/snappy/hadoop/reader.rb +58 -0
  10. data/lib/snappy/hadoop/writer.rb +51 -0
  11. data/lib/snappy/reader.rb +11 -7
  12. data/lib/snappy/shim.rb +30 -0
  13. data/lib/snappy/version.rb +3 -1
  14. data/lib/snappy/writer.rb +8 -9
  15. data/smoke.sh +8 -0
  16. data/snappy.gemspec +6 -30
  17. data/test/hadoop/test-snappy-hadoop-reader.rb +103 -0
  18. data/test/hadoop/test-snappy-hadoop-writer.rb +48 -0
  19. data/test/test-snappy-hadoop.rb +22 -0
  20. data/vendor/snappy/AUTHORS +1 -0
  21. data/vendor/snappy/CMakeLists.txt +174 -0
  22. data/vendor/snappy/CONTRIBUTING.md +26 -0
  23. data/vendor/snappy/COPYING +54 -0
  24. data/vendor/snappy/NEWS +180 -0
  25. data/vendor/snappy/README.md +149 -0
  26. data/vendor/snappy/cmake/SnappyConfig.cmake +1 -0
  27. data/vendor/snappy/cmake/config.h.in +62 -0
  28. data/vendor/snappy/format_description.txt +110 -0
  29. data/vendor/snappy/framing_format.txt +135 -0
  30. data/vendor/snappy/snappy-c.cc +90 -0
  31. data/vendor/snappy/snappy-c.h +138 -0
  32. data/vendor/snappy/snappy-internal.h +224 -0
  33. data/vendor/snappy/snappy-sinksource.cc +104 -0
  34. data/vendor/snappy/snappy-sinksource.h +182 -0
  35. data/vendor/snappy/snappy-stubs-internal.cc +42 -0
  36. data/vendor/snappy/snappy-stubs-internal.h +561 -0
  37. data/vendor/snappy/snappy-stubs-public.h.in +94 -0
  38. data/vendor/snappy/snappy-test.cc +612 -0
  39. data/vendor/snappy/snappy-test.h +573 -0
  40. data/vendor/snappy/snappy.cc +1515 -0
  41. data/vendor/snappy/snappy.h +203 -0
  42. data/vendor/snappy/snappy_unittest.cc +1410 -0
  43. metadata +38 -46
@@ -0,0 +1,135 @@
1
+ Snappy framing format description
2
+ Last revised: 2013-10-25
3
+
4
+ This document describes a framing format for Snappy, allowing compressing to
5
+ files or streams that can then more easily be decompressed without having
6
+ to hold the entire stream in memory. It also provides data checksums to
7
+ help verify integrity. It does not provide metadata checksums, so it does
8
+ not protect against e.g. all forms of truncations.
9
+
10
+ Implementation of the framing format is optional for Snappy compressors and
11
+ decompressors; it is not part of the Snappy core specification.
12
+
13
+
14
+ 1. General structure
15
+
16
+ The file consists solely of chunks, lying back-to-back with no padding
17
+ in between. Each chunk consists first of a single byte of chunk identifier,
18
+ then a three-byte little-endian length of the chunk in bytes (from 0 to
19
+ 16777215, inclusive), and then the data if any. The four bytes of chunk
20
+ header are not counted in the data length.
21
+
22
+ The different chunk types are listed below. The first chunk must always
23
+ be the stream identifier chunk (see section 4.1, below). The stream
24
+ ends when the file ends -- there is no explicit end-of-file marker.
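The header layout described above can be sketched in C++ as follows. This is a minimal illustration, not code from Snappy: the ChunkHeader type and the ParseChunkHeader function are hypothetical names chosen for this example.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper type (not part of Snappy): a decoded chunk header.
struct ChunkHeader {
  uint8_t type;     // chunk identifier byte
  uint32_t length;  // data length in bytes, 0..16777215, header excluded
};

// Decodes the four header bytes at "p". Returns false when fewer than
// four bytes are available, e.g. because the stream was truncated.
inline bool ParseChunkHeader(const unsigned char* p, size_t available,
                             ChunkHeader* out) {
  assert(p != NULL && out != NULL);
  if (available < 4) return false;
  out->type = p[0];
  // Three-byte little-endian length: least significant byte first.
  out->length = static_cast<uint32_t>(p[1]) |
                (static_cast<uint32_t>(p[2]) << 8) |
                (static_cast<uint32_t>(p[3]) << 16);
  return true;
}
```

A reader would call this at the start of every chunk, then consume exactly "length" further bytes before parsing the next header.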
25
+
26
+
27
+ 2. File type identification
28
+
29
+ The following identifiers for this format are recommended where appropriate.
30
+ However, note that none have been registered officially, so this is only to
31
+ be taken as a guideline. We use "Snappy framed" to distinguish between this
32
+ format and raw Snappy data.
33
+
34
+ File extension: .sz
35
+ MIME type: application/x-snappy-framed
36
+ HTTP Content-Encoding: x-snappy-framed
37
+
38
+
39
+ 3. Checksum format
40
+
41
+ Some chunks have data protected by a checksum (the ones that do will say so
42
+ explicitly). The checksums are always masked CRC-32Cs.
43
+
44
+ A description of CRC-32C can be found in RFC 3720, section 12.1, with
45
+ examples in section B.4.
46
+
47
+ Checksums are not stored directly, but masked, as checksumming data and
48
+ then its own checksum can be problematic. The masking is the same as used
49
+ in Apache Hadoop: rotate the checksum right by 15 bits, then add the constant
50
+ 0xa282ead8 (using wraparound as normal for unsigned integers). This is
51
+ equivalent to the following C code:
52
+
53
+   uint32_t mask_checksum(uint32_t x) {
54
+     return ((x >> 15) | (x << 17)) + 0xa282ead8;
55
+   }
56
+
57
+ Note that the masking is reversible.
58
+
59
+ The checksum is always stored as a four-byte integer, in little-endian byte order.
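Because the masking is reversible, a reader can recover the plain CRC-32C from the stored value. The sketch below shows one way to do that in C++; MaskChecksum mirrors mask_checksum from the text, while UnmaskChecksum and LoadMaskedChecksum are hypothetical helper names introduced only for this example. Computing the CRC-32C itself is outside the scope of the sketch.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same transformation as mask_checksum in the text: rotate right by
// 15 bits, then add the constant with unsigned wraparound.
inline uint32_t MaskChecksum(uint32_t x) {
  return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

// Inverse transformation: subtract the constant (with wraparound),
// then rotate left by 15 bits.
inline uint32_t UnmaskChecksum(uint32_t masked) {
  uint32_t rotated = masked - 0xa282ead8u;
  return (rotated << 15) | (rotated >> 17);
}

// Reads a stored checksum: four bytes, least significant byte first.
inline uint32_t LoadMaskedChecksum(const unsigned char* p) {
  assert(p != NULL);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

A decoder can either unmask the stored value and compare it with the CRC-32C it computed, or mask its own CRC-32C and compare that with the stored value; both checks are equivalent.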
60
+
61
+
62
+ 4. Chunk types
63
+
64
+ The currently supported chunk types are described below. The list may
65
+ be extended in the future.
66
+
67
+
68
+ 4.1. Stream identifier (chunk type 0xff)
69
+
70
+ The stream identifier is always the first element in the stream.
71
+ It is exactly six bytes long and contains "sNaPpY" in ASCII. This means that
72
+ a valid Snappy framed stream always starts with the bytes
73
+
74
+ 0xff 0x06 0x00 0x00 0x73 0x4e 0x61 0x50 0x70 0x59
75
+
76
+ The stream identifier chunk can come multiple times in the stream besides
77
+ the first; if such a chunk shows up, it should simply be ignored, assuming
78
+ it has the right length and contents. This allows for easy concatenation of
79
+ compressed files without the need for re-framing.
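A quick check for the ten leading bytes listed above could look like the following C++ sketch. The names kStreamIdentifierChunk and StartsWithStreamIdentifier are assumptions made for this example and do not come from the Snappy sources.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Chunk type 0xff, length 6 (little-endian), then "sNaPpY" in ASCII.
static const unsigned char kStreamIdentifierChunk[10] = {
    0xff, 0x06, 0x00, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59};

// Returns true if "data" begins with a complete stream identifier chunk.
inline bool StartsWithStreamIdentifier(const unsigned char* data,
                                       size_t size) {
  assert(data != NULL || size == 0);
  return size >= sizeof(kStreamIdentifierChunk) &&
         memcmp(data, kStreamIdentifierChunk,
                sizeof(kStreamIdentifierChunk)) == 0;
}
```

The same comparison can be reused mid-stream to recognize the repeated identifier chunks that appear when framed files are concatenated.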
80
+
81
+
82
+ 4.2. Compressed data (chunk type 0x00)
83
+
84
+ Compressed data chunks contain a normal Snappy compressed bitstream;
85
+ see the compressed format specification. The compressed data is preceded by
86
+ the masked CRC-32C (see section 3) of the _uncompressed_ data.
87
+
88
+ Note that the data portion of the chunk, i.e., the compressed contents,
89
+ can be at most 16777211 bytes (2^24 - 1, minus the four-byte checksum).
90
+ However, we place an additional restriction that the uncompressed data
91
+ in a chunk must be no longer than 65536 bytes. This allows consumers to
92
+ easily use small fixed-size buffers.
93
+
94
+
95
+ 4.3. Uncompressed data (chunk type 0x01)
96
+
97
+ Uncompressed data chunks allow a compressor to send uncompressed,
98
+ raw data; this is useful if, for instance, incompressible or
99
+ near-incompressible data is detected, and faster decompression is desired.
100
+
101
+ As in the compressed chunks, the data is preceded by its own masked
102
+ CRC-32C (see section 3).
103
+
104
+ An uncompressed data chunk, like compressed data chunks, should contain
105
+ no more than 65536 data bytes, so the maximum legal chunk length with the
106
+ checksum is 65540.
107
+
108
+
109
+ 4.4. Padding (chunk type 0xfe)
110
+
111
+ Padding chunks allow a compressor to increase the size of the data stream
112
+ so that it complies with external demands, e.g. that the total number of
113
+ bytes is a multiple of some value.
114
+
115
+ All bytes of the padding chunk, except the chunk byte itself and the length,
116
+ should be zero, but decompressors must not try to interpret or verify the
117
+ padding data in any way.
118
+
119
+
120
+ 4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)
121
+
122
+ These are reserved for future expansion. A decoder that sees such a chunk
123
+ should immediately return an error, as it must assume it cannot decode the
124
+ stream correctly.
125
+
126
+ Future versions of this specification may define meanings for these chunks.
127
+
128
+
129
+ 4.6. Reserved skippable chunks (chunk types 0x80-0xfd)
130
+
131
+ These are also reserved for future expansion, but unlike the chunks
132
+ described in 4.5, a decoder seeing these must skip them and continue
133
+ decoding.
134
+
135
+ Future versions of this specification may define meanings for these chunks.
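The chunk type ranges from section 4 can be summarized as a small dispatch function. The C++ sketch below is only an illustration: the ChunkAction enumeration and ClassifyChunk are hypothetical names, and treating padding (0xfe) the same as reserved skippable chunks is a choice made here because a decompressor ignores the contents of both.

```cpp
#include <assert.h>

// Hypothetical classification of what a decoder does with a chunk.
enum ChunkAction {
  kDecodeCompressed,    // 0x00: masked CRC-32C followed by Snappy data
  kDecodeUncompressed,  // 0x01: masked CRC-32C followed by raw data
  kStreamIdentifier,    // 0xff: verify length and contents, then ignore
  kSkip,                // 0x80-0xfe: reserved skippable chunks and padding
  kError                // 0x02-0x7f: reserved unskippable chunks
};

// Maps a chunk identifier byte to the action a decoder should take.
inline ChunkAction ClassifyChunk(unsigned int type) {
  assert(type <= 0xffu);  // the chunk identifier is a single byte
  if (type == 0x00u) return kDecodeCompressed;
  if (type == 0x01u) return kDecodeUncompressed;
  if (type == 0xffu) return kStreamIdentifier;
  if (type <= 0x7fu) return kError;  // remaining values are 0x02-0x7f
  return kSkip;                      // remaining values are 0x80-0xfe
}
```

Keeping the error case explicit matters: a decoder that silently skipped an unskippable chunk could produce truncated output without reporting a problem.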
@@ -0,0 +1,90 @@
1
+ // Copyright 2011 Martin Gieseking <martin.gieseking@uos.de>.
2
+ //
3
+ // Redistribution and use in source and binary forms, with or without
4
+ // modification, are permitted provided that the following conditions are
5
+ // met:
6
+ //
7
+ // * Redistributions of source code must retain the above copyright
8
+ // notice, this list of conditions and the following disclaimer.
9
+ // * Redistributions in binary form must reproduce the above
10
+ // copyright notice, this list of conditions and the following disclaimer
11
+ // in the documentation and/or other materials provided with the
12
+ // distribution.
13
+ // * Neither the name of Google Inc. nor the names of its
14
+ // contributors may be used to endorse or promote products derived from
15
+ // this software without specific prior written permission.
16
+ //
17
+ // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
18
+ // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
19
+ // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
20
+ // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
21
+ // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
22
+ // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
23
+ // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
24
+ // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
25
+ // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
26
+ // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27
+ // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28
+
29
+ #include "snappy.h"
30
+ #include "snappy-c.h"
31
+
32
+ extern "C" {
33
+
34
+ snappy_status snappy_compress(const char* input,
35
+ size_t input_length,
36
+ char* compressed,
37
+ size_t *compressed_length) {
38
+ if (*compressed_length < snappy_max_compressed_length(input_length)) {
39
+ return SNAPPY_BUFFER_TOO_SMALL;
40
+ }
41
+ snappy::RawCompress(input, input_length, compressed, compressed_length);
42
+ return SNAPPY_OK;
43
+ }
44
+
45
+ snappy_status snappy_uncompress(const char* compressed,
46
+ size_t compressed_length,
47
+ char* uncompressed,
48
+ size_t* uncompressed_length) {
49
+ size_t real_uncompressed_length;
50
+ if (!snappy::GetUncompressedLength(compressed,
51
+ compressed_length,
52
+ &real_uncompressed_length)) {
53
+ return SNAPPY_INVALID_INPUT;
54
+ }
55
+ if (*uncompressed_length < real_uncompressed_length) {
56
+ return SNAPPY_BUFFER_TOO_SMALL;
57
+ }
58
+ if (!snappy::RawUncompress(compressed, compressed_length, uncompressed)) {
59
+ return SNAPPY_INVALID_INPUT;
60
+ }
61
+ *uncompressed_length = real_uncompressed_length;
62
+ return SNAPPY_OK;
63
+ }
64
+
65
+ size_t snappy_max_compressed_length(size_t source_length) {
66
+ return snappy::MaxCompressedLength(source_length);
67
+ }
68
+
69
+ snappy_status snappy_uncompressed_length(const char *compressed,
70
+ size_t compressed_length,
71
+ size_t *result) {
72
+ if (snappy::GetUncompressedLength(compressed,
73
+ compressed_length,
74
+ result)) {
75
+ return SNAPPY_OK;
76
+ } else {
77
+ return SNAPPY_INVALID_INPUT;
78
+ }
79
+ }
80
+
81
+ snappy_status snappy_validate_compressed_buffer(const char *compressed,
82
+ size_t compressed_length) {
83
+ if (snappy::IsValidCompressedBuffer(compressed, compressed_length)) {
84
+ return SNAPPY_OK;
85
+ } else {
86
+ return SNAPPY_INVALID_INPUT;
87
+ }
88
+ }
89
+
90
+ } // extern "C"
@@ -0,0 +1,138 @@
1
+ /*
2
+ * Copyright 2011 Martin Gieseking <martin.gieseking@uos.de>.
3
+ *
4
+ * Redistribution and use in source and binary forms, with or without
5
+ * modification, are permitted provided that the following conditions are
6
+ * met:
7
+ *
8
+ * * Redistributions of source code must retain the above copyright
9
+ * notice, this list of conditions and the following disclaimer.
10
+ * * Redistributions in binary form must reproduce the above
11
+ * copyright notice, this list of conditions and the following disclaimer
12
+ * in the documentation and/or other materials provided with the
13
+ * distribution.
14
+ * * Neither the name of Google Inc. nor the names of its
15
+ * contributors may be used to endorse or promote products derived from
16
+ * this software without specific prior written permission.
17
+ *
18
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
19
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
20
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
21
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
22
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
23
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
24
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
25
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
26
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
27
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29
+ *
30
+ * Plain C interface (a wrapper around the C++ implementation).
31
+ */
32
+
33
+ #ifndef THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_
34
+ #define THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_
35
+
36
+ #ifdef __cplusplus
37
+ extern "C" {
38
+ #endif
39
+
40
+ #include <stddef.h>
41
+
42
+ /*
43
+ * Return values; see the documentation for each function to know
44
+ * what each can return.
45
+ */
46
+ typedef enum {
47
+ SNAPPY_OK = 0,
48
+ SNAPPY_INVALID_INPUT = 1,
49
+ SNAPPY_BUFFER_TOO_SMALL = 2
50
+ } snappy_status;
51
+
52
+ /*
53
+ * Takes the data stored in "input[0..input_length-1]" and stores
54
+ * it in the array pointed to by "compressed".
55
+ *
56
+ * <compressed_length> signals the space available in "compressed".
57
+ * If it is not at least equal to "snappy_max_compressed_length(input_length)",
58
+ * SNAPPY_BUFFER_TOO_SMALL is returned. After successful compression,
59
+ * <compressed_length> contains the true length of the compressed output,
60
+ * and SNAPPY_OK is returned.
61
+ *
62
+ * Example:
63
+ * size_t output_length = snappy_max_compressed_length(input_length);
64
+ * char* output = (char*)malloc(output_length);
65
+ * if (snappy_compress(input, input_length, output, &output_length)
66
+ * == SNAPPY_OK) {
67
+ * ... Process(output, output_length) ...
68
+ * }
69
+ * free(output);
70
+ */
71
+ snappy_status snappy_compress(const char* input,
72
+ size_t input_length,
73
+ char* compressed,
74
+ size_t* compressed_length);
75
+
76
+ /*
77
+ * Given data in "compressed[0..compressed_length-1]" generated by
78
+ * calling the snappy_compress routine, this routine stores
79
+ * the uncompressed data to
80
+ * uncompressed[0..uncompressed_length-1].
81
+ * Returns failure (a value not equal to SNAPPY_OK) if the message
82
+ * is corrupted and could not be decompressed.
83
+ *
84
+ * <uncompressed_length> signals the space available in "uncompressed".
85
+ * If it is not at least equal to the value returned by
86
+ * snappy_uncompressed_length for this stream, SNAPPY_BUFFER_TOO_SMALL
87
+ * is returned. After successful decompression, <uncompressed_length>
88
+ * contains the true length of the decompressed output.
89
+ *
90
+ * Example:
91
+ * size_t output_length;
92
+ * if (snappy_uncompressed_length(input, input_length, &output_length)
93
+ * != SNAPPY_OK) {
94
+ * ... fail ...
95
+ * }
96
+ * char* output = (char*)malloc(output_length);
97
+ * if (snappy_uncompress(input, input_length, output, &output_length)
98
+ * == SNAPPY_OK) {
99
+ * ... Process(output, output_length) ...
100
+ * }
101
+ * free(output);
102
+ */
103
+ snappy_status snappy_uncompress(const char* compressed,
104
+ size_t compressed_length,
105
+ char* uncompressed,
106
+ size_t* uncompressed_length);
107
+
108
+ /*
109
+ * Returns the maximal size of the compressed representation of
110
+ * input data that is "source_length" bytes in length.
111
+ */
112
+ size_t snappy_max_compressed_length(size_t source_length);
113
+
114
+ /*
115
+ * REQUIRES: "compressed[]" was produced by snappy_compress()
116
+ * Returns SNAPPY_OK and stores the length of the uncompressed data in
117
+ * *result normally. Returns SNAPPY_INVALID_INPUT on parsing error.
118
+ * This operation takes O(1) time.
119
+ */
120
+ snappy_status snappy_uncompressed_length(const char* compressed,
121
+ size_t compressed_length,
122
+ size_t* result);
123
+
124
+ /*
125
+ * Check if the contents of "compressed[]" can be uncompressed successfully.
126
+ * Does not return the uncompressed data. Returns SNAPPY_OK if they can,
126
+ * or SNAPPY_INVALID_INPUT if they cannot.
128
+ * Takes time proportional to compressed_length, but is usually at least a
129
+ * factor of four faster than actual decompression.
130
+ */
131
+ snappy_status snappy_validate_compressed_buffer(const char* compressed,
132
+ size_t compressed_length);
133
+
134
+ #ifdef __cplusplus
135
+ } // extern "C"
136
+ #endif
137
+
138
+ #endif /* THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_ */
@@ -0,0 +1,224 @@
1
+ // Copyright 2008 Google Inc. All Rights Reserved.
2
+ //
3
+ // Redistribution and use in source and binary forms, with or without
4
+ // modification, are permitted provided that the following conditions are
5
+ // met:
6
+ //
7
+ // * Redistributions of source code must retain the above copyright
8
+ // notice, this list of conditions and the following disclaimer.
9
+ // * Redistributions in binary form must reproduce the above
10
+ // copyright notice, this list of conditions and the following disclaimer
11
+ // in the documentation and/or other materials provided with the
12
+ // distribution.
13
+ // * Neither the name of Google Inc. nor the names of its
14
+ // contributors may be used to endorse or promote products derived from
15
+ // this software without specific prior written permission.
16
+ //
17
+ // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
18
+ // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
19
+ // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
20
+ // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
21
+ // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
22
+ // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
23
+ // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
24
+ // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
25
+ // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
26
+ // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27
+ // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28
+ //
29
+ // Internals shared between the Snappy implementation and its unittest.
30
+
31
+ #ifndef THIRD_PARTY_SNAPPY_SNAPPY_INTERNAL_H_
32
+ #define THIRD_PARTY_SNAPPY_SNAPPY_INTERNAL_H_
33
+
34
+ #include "snappy-stubs-internal.h"
35
+
36
+ namespace snappy {
37
+ namespace internal {
38
+
39
+ class WorkingMemory {
40
+ public:
41
+ WorkingMemory() : large_table_(NULL) { }
42
+ ~WorkingMemory() { delete[] large_table_; }
43
+
44
+ // Allocates and clears a hash table using memory in "*this",
45
+ // stores the number of buckets in "*table_size" and returns a pointer to
46
+ // the base of the hash table.
47
+ uint16* GetHashTable(size_t input_size, int* table_size);
48
+
49
+ private:
50
+ uint16 small_table_[1<<10]; // 2KB
51
+ uint16* large_table_; // Allocated only when needed
52
+
53
+ // No copying
54
+ WorkingMemory(const WorkingMemory&);
55
+ void operator=(const WorkingMemory&);
56
+ };
57
+
58
+ // Flat array compression that does not emit the "uncompressed length"
59
+ // prefix. Compresses "input" string to the "*op" buffer.
60
+ //
61
+ // REQUIRES: "input_length <= kBlockSize"
62
+ // REQUIRES: "op" points to an array of memory that is at least
63
+ // "MaxCompressedLength(input_length)" in size.
64
+ // REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.
65
+ // REQUIRES: "table_size" is a power of two
66
+ //
67
+ // Returns an "end" pointer into "op" buffer.
68
+ // "end - op" is the compressed size of "input".
69
+ char* CompressFragment(const char* input,
70
+ size_t input_length,
71
+ char* op,
72
+ uint16* table,
73
+ const int table_size);
74
+
75
+ // Find the largest n such that
76
+ //
77
+ // s1[0,n-1] == s2[0,n-1]
78
+ // and n <= (s2_limit - s2).
79
+ //
80
+ // Return make_pair(n, n < 8).
81
+ // Does not read *s2_limit or beyond.
82
+ // Does not read *(s1 + (s2_limit - s2)) or beyond.
83
+ // Requires that s2_limit >= s2.
84
+ //
85
+ // Separate implementation for 64-bit, little-endian cpus.
86
+ #if !defined(SNAPPY_IS_BIG_ENDIAN) && \
87
+ (defined(ARCH_K8) || defined(ARCH_PPC) || defined(ARCH_ARM))
88
+ static inline std::pair<size_t, bool> FindMatchLength(const char* s1,
89
+ const char* s2,
90
+ const char* s2_limit) {
91
+ assert(s2_limit >= s2);
92
+ size_t matched = 0;
93
+
94
+ // This block isn't necessary for correctness; we could just start looping
95
+ // immediately. As an optimization though, it is useful. It creates some not
96
+ // uncommon code paths that determine, without extra effort, whether the match
97
+ // length is less than 8. In short, we are hoping to avoid a conditional
98
+ // branch, and perhaps get better code layout from the C++ compiler.
99
+ if (SNAPPY_PREDICT_TRUE(s2 <= s2_limit - 8)) {
100
+ uint64 a1 = UNALIGNED_LOAD64(s1);
101
+ uint64 a2 = UNALIGNED_LOAD64(s2);
102
+ if (a1 != a2) {
103
+ return std::pair<size_t, bool>(Bits::FindLSBSetNonZero64(a1 ^ a2) >> 3,
104
+ true);
105
+ } else {
106
+ matched = 8;
107
+ s2 += 8;
108
+ }
109
+ }
110
+
111
+ // Find out how long the match is. We loop over the data 64 bits at a
112
+ // time until we find a 64-bit block that doesn't match; then we find
113
+ // the first non-matching bit and use that to calculate the total
114
+ // length of the match.
115
+ while (SNAPPY_PREDICT_TRUE(s2 <= s2_limit - 8)) {
116
+ if (UNALIGNED_LOAD64(s2) == UNALIGNED_LOAD64(s1 + matched)) {
117
+ s2 += 8;
118
+ matched += 8;
119
+ } else {
120
+ uint64 x = UNALIGNED_LOAD64(s2) ^ UNALIGNED_LOAD64(s1 + matched);
121
+ int matching_bits = Bits::FindLSBSetNonZero64(x);
122
+ matched += matching_bits >> 3;
123
+ assert(matched >= 8);
124
+ return std::pair<size_t, bool>(matched, false);
125
+ }
126
+ }
127
+ while (SNAPPY_PREDICT_TRUE(s2 < s2_limit)) {
128
+ if (s1[matched] == *s2) {
129
+ ++s2;
130
+ ++matched;
131
+ } else {
132
+ return std::pair<size_t, bool>(matched, matched < 8);
133
+ }
134
+ }
135
+ return std::pair<size_t, bool>(matched, matched < 8);
136
+ }
137
+ #else
138
+ static inline std::pair<size_t, bool> FindMatchLength(const char* s1,
139
+ const char* s2,
140
+ const char* s2_limit) {
141
+ // Implementation based on the x86-64 version, above.
142
+ assert(s2_limit >= s2);
143
+ int matched = 0;
144
+
145
+ while (s2 <= s2_limit - 4 &&
146
+ UNALIGNED_LOAD32(s2) == UNALIGNED_LOAD32(s1 + matched)) {
147
+ s2 += 4;
148
+ matched += 4;
149
+ }
150
+ if (LittleEndian::IsLittleEndian() && s2 <= s2_limit - 4) {
151
+ uint32 x = UNALIGNED_LOAD32(s2) ^ UNALIGNED_LOAD32(s1 + matched);
152
+ int matching_bits = Bits::FindLSBSetNonZero(x);
153
+ matched += matching_bits >> 3;
154
+ } else {
155
+ while ((s2 < s2_limit) && (s1[matched] == *s2)) {
156
+ ++s2;
157
+ ++matched;
158
+ }
159
+ }
160
+ return std::pair<size_t, bool>(matched, matched < 8);
161
+ }
162
+ #endif
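Both variants above are optimized forms of a plain byte-by-byte comparison. The C++ sketch below spells out that baseline under the same contract (largest n with s1[0,n-1] == s2[0,n-1] and n <= s2_limit - s2, paired with n < 8). FindMatchLengthReference is a hypothetical name and is not part of Snappy; it may be handy as an oracle when testing the word-at-a-time versions.

```cpp
#include <assert.h>
#include <stddef.h>
#include <utility>

// Reference version: compares one byte at a time and never reads
// *s2_limit or *(s1 + (s2_limit - s2)) or anything beyond them.
inline std::pair<size_t, bool> FindMatchLengthReference(
    const char* s1, const char* s2, const char* s2_limit) {
  assert(s2_limit >= s2);
  size_t matched = 0;
  while (s2 + matched < s2_limit && s1[matched] == s2[matched]) {
    ++matched;
  }
  // The second member reports whether the match is shorter than 8 bytes.
  return std::pair<size_t, bool>(matched, matched < 8);
}
```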
163
+
164
+ // Lookup tables for decompression code. Give --snappy_dump_decompression_table
165
+ // to the unit test to recompute char_table.
166
+
167
+ enum {
168
+ LITERAL = 0,
169
+ COPY_1_BYTE_OFFSET = 1, // 3 bits of length + 3 bits of offset in opcode
170
+ COPY_2_BYTE_OFFSET = 2,
171
+ COPY_4_BYTE_OFFSET = 3
172
+ };
173
+ static const int kMaximumTagLength = 5; // COPY_4_BYTE_OFFSET plus the actual offset.
174
+
175
+ // Data stored per entry in lookup table:
176
+ // Range Bits-used Description
177
+ // ------------------------------------
178
+ // 1..64 0..7 Literal/copy length encoded in opcode byte
179
+ // 0..7 8..10 Copy offset encoded in opcode byte / 256
180
+ // 0..4 11..13 Extra bytes after opcode
181
+ //
182
+ // We use eight bits for the length even though 7 would have sufficed
183
+ // for efficiency reasons:
184
+ // (1) Extracting a byte is faster than a bit-field
185
+ // (2) It properly aligns copy offset so we do not need a <<8
186
+ static const uint16 char_table[256] = {
187
+ 0x0001, 0x0804, 0x1001, 0x2001, 0x0002, 0x0805, 0x1002, 0x2002,
188
+ 0x0003, 0x0806, 0x1003, 0x2003, 0x0004, 0x0807, 0x1004, 0x2004,
189
+ 0x0005, 0x0808, 0x1005, 0x2005, 0x0006, 0x0809, 0x1006, 0x2006,
190
+ 0x0007, 0x080a, 0x1007, 0x2007, 0x0008, 0x080b, 0x1008, 0x2008,
191
+ 0x0009, 0x0904, 0x1009, 0x2009, 0x000a, 0x0905, 0x100a, 0x200a,
192
+ 0x000b, 0x0906, 0x100b, 0x200b, 0x000c, 0x0907, 0x100c, 0x200c,
193
+ 0x000d, 0x0908, 0x100d, 0x200d, 0x000e, 0x0909, 0x100e, 0x200e,
194
+ 0x000f, 0x090a, 0x100f, 0x200f, 0x0010, 0x090b, 0x1010, 0x2010,
195
+ 0x0011, 0x0a04, 0x1011, 0x2011, 0x0012, 0x0a05, 0x1012, 0x2012,
196
+ 0x0013, 0x0a06, 0x1013, 0x2013, 0x0014, 0x0a07, 0x1014, 0x2014,
197
+ 0x0015, 0x0a08, 0x1015, 0x2015, 0x0016, 0x0a09, 0x1016, 0x2016,
198
+ 0x0017, 0x0a0a, 0x1017, 0x2017, 0x0018, 0x0a0b, 0x1018, 0x2018,
199
+ 0x0019, 0x0b04, 0x1019, 0x2019, 0x001a, 0x0b05, 0x101a, 0x201a,
200
+ 0x001b, 0x0b06, 0x101b, 0x201b, 0x001c, 0x0b07, 0x101c, 0x201c,
201
+ 0x001d, 0x0b08, 0x101d, 0x201d, 0x001e, 0x0b09, 0x101e, 0x201e,
202
+ 0x001f, 0x0b0a, 0x101f, 0x201f, 0x0020, 0x0b0b, 0x1020, 0x2020,
203
+ 0x0021, 0x0c04, 0x1021, 0x2021, 0x0022, 0x0c05, 0x1022, 0x2022,
204
+ 0x0023, 0x0c06, 0x1023, 0x2023, 0x0024, 0x0c07, 0x1024, 0x2024,
205
+ 0x0025, 0x0c08, 0x1025, 0x2025, 0x0026, 0x0c09, 0x1026, 0x2026,
206
+ 0x0027, 0x0c0a, 0x1027, 0x2027, 0x0028, 0x0c0b, 0x1028, 0x2028,
207
+ 0x0029, 0x0d04, 0x1029, 0x2029, 0x002a, 0x0d05, 0x102a, 0x202a,
208
+ 0x002b, 0x0d06, 0x102b, 0x202b, 0x002c, 0x0d07, 0x102c, 0x202c,
209
+ 0x002d, 0x0d08, 0x102d, 0x202d, 0x002e, 0x0d09, 0x102e, 0x202e,
210
+ 0x002f, 0x0d0a, 0x102f, 0x202f, 0x0030, 0x0d0b, 0x1030, 0x2030,
211
+ 0x0031, 0x0e04, 0x1031, 0x2031, 0x0032, 0x0e05, 0x1032, 0x2032,
212
+ 0x0033, 0x0e06, 0x1033, 0x2033, 0x0034, 0x0e07, 0x1034, 0x2034,
213
+ 0x0035, 0x0e08, 0x1035, 0x2035, 0x0036, 0x0e09, 0x1036, 0x2036,
214
+ 0x0037, 0x0e0a, 0x1037, 0x2037, 0x0038, 0x0e0b, 0x1038, 0x2038,
215
+ 0x0039, 0x0f04, 0x1039, 0x2039, 0x003a, 0x0f05, 0x103a, 0x203a,
216
+ 0x003b, 0x0f06, 0x103b, 0x203b, 0x003c, 0x0f07, 0x103c, 0x203c,
217
+ 0x0801, 0x0f08, 0x103d, 0x203d, 0x1001, 0x0f09, 0x103e, 0x203e,
218
+ 0x1801, 0x0f0a, 0x103f, 0x203f, 0x2001, 0x0f0b, 0x1040, 0x2040
219
+ };
220
+
221
+ } // end namespace internal
222
+ } // end namespace snappy
223
+
224
+ #endif // THIRD_PARTY_SNAPPY_SNAPPY_INTERNAL_H_
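The bit layout documented for char_table (bits 0..7 length, bits 8..10 copy offset / 256, bits 11..13 extra bytes) can be unpacked as in the following C++ sketch. TagInfo and DecodeTableEntry are hypothetical names for this illustration; the masks are derived from the comment in the header and are an assumption about how an entry would be consumed, not a copy of the decompressor.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical unpacked form of one char_table entry.
struct TagInfo {
  uint32_t length;       // bits 0..7: literal/copy length from the opcode
  uint32_t offset_high;  // bits 8..10 kept in place, i.e. already * 256
  uint32_t extra_bytes;  // bits 11..13: bytes that follow the opcode
};

inline TagInfo DecodeTableEntry(uint16_t entry) {
  assert((entry >> 14) == 0);  // bits 14..15 are unused in the layout
  TagInfo info;
  info.length = entry & 0xffu;
  // No shift needed: the offset bits already sit at bit 8 and above.
  info.offset_high = entry & 0x700u;
  info.extra_bytes = static_cast<uint32_t>(entry) >> 11;
  return info;
}
```

For example, the table entry at opcode 0x21 is 0x0904, which unpacks to a copy of length 4, high offset bits worth 256, and one extra byte holding the low offset bits.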