RubyGems - snappy-ruby - Versions diffs - 0.1.0 - Mend

snappy-ruby 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42) hide show

checksums.yaml +7 -0
data/LICENSE +21 -0
data/README.md +77 -0
data/Rakefile +12 -0
data/ext/snappy/extconf.rb +83 -0
data/ext/snappy/snappy-src/AUTHORS +1 -0
data/ext/snappy/snappy-src/BUILD.bazel +211 -0
data/ext/snappy/snappy-src/CMakeLists.txt +467 -0
data/ext/snappy/snappy-src/CONTRIBUTING.md +31 -0
data/ext/snappy/snappy-src/COPYING +54 -0
data/ext/snappy/snappy-src/MODULE.bazel +23 -0
data/ext/snappy/snappy-src/NEWS +215 -0
data/ext/snappy/snappy-src/README.md +165 -0
data/ext/snappy/snappy-src/WORKSPACE +27 -0
data/ext/snappy/snappy-src/WORKSPACE.bzlmod +0 -0
data/ext/snappy/snappy-src/cmake/SnappyConfig.cmake.in +33 -0
data/ext/snappy/snappy-src/cmake/config.h.in +75 -0
data/ext/snappy/snappy-src/config.h +78 -0
data/ext/snappy/snappy-src/docs/README.md +72 -0
data/ext/snappy/snappy-src/format_description.txt +110 -0
data/ext/snappy/snappy-src/framing_format.txt +135 -0
data/ext/snappy/snappy-src/snappy-c.cc +90 -0
data/ext/snappy/snappy-src/snappy-c.h +138 -0
data/ext/snappy/snappy-src/snappy-internal.h +444 -0
data/ext/snappy/snappy-src/snappy-sinksource.cc +121 -0
data/ext/snappy/snappy-src/snappy-sinksource.h +182 -0
data/ext/snappy/snappy-src/snappy-stubs-internal.cc +42 -0
data/ext/snappy/snappy-src/snappy-stubs-internal.h +531 -0
data/ext/snappy/snappy-src/snappy-stubs-public.h +60 -0
data/ext/snappy/snappy-src/snappy-stubs-public.h.in +63 -0
data/ext/snappy/snappy-src/snappy-test.cc +503 -0
data/ext/snappy/snappy-src/snappy-test.h +342 -0
data/ext/snappy/snappy-src/snappy.cc +2666 -0
data/ext/snappy/snappy-src/snappy.h +257 -0
data/ext/snappy/snappy-src/snappy_test_data.cc +57 -0
data/ext/snappy/snappy-src/snappy_test_data.h +68 -0
data/ext/snappy/snappy-src/snappy_test_tool.cc +471 -0
data/ext/snappy/snappy-src/snappy_unittest.cc +1023 -0
data/ext/snappy/snappy.c +282 -0
data/lib/snappy/snappy.so +0 -0
data/lib/snappy.rb +5 -0
metadata +142 -0

data/ext/snappy/snappy-src/format_description.txt ADDED Viewed

@@ -0,0 +1,110 @@
+Snappy compressed format description
+Last revised: 2011-10-05
+This is not a formal specification, but should suffice to explain most
+relevant parts of how the Snappy format works. It is originally based on
+text by Zeev Tarantov.
+Snappy is a LZ77-type compressor with a fixed, byte-oriented encoding.
+There is no entropy encoder backend nor framing layer -- the latter is
+assumed to be handled by other parts of the system.
+This document only describes the format, not how the Snappy compressor nor
+decompressor actually works. The correctness of the decompressor should not
+depend on implementation details of the compressor, and vice versa.
+1. Preamble
+The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
+stored as a little-endian varint. Varints consist of a series of bytes,
+where the lower 7 bits are data and the upper bit is set iff there are
+more bytes to be read. In other words, an uncompressed length of 64 would
+be stored as 0x40, and an uncompressed length of 2097150 (0x1FFFFE)
+would be stored as 0xFE 0xFF 0x7F.
+2. The compressed stream itself
+There are two types of elements in a Snappy stream: Literals and
+copies (backreferences). There is no restriction on the order of elements,
+except that the stream naturally cannot start with a copy. (Having
+two literals in a row is never optimal from a compression point of
+view, but nevertheless fully permitted.) Each element starts with a tag byte,
+and the lower two bits of this tag byte signal what type of element will
+follow:
+  00: Literal
+  01: Copy with 1-byte offset
+  10: Copy with 2-byte offset
+  11: Copy with 4-byte offset
+The interpretation of the upper six bits are element-dependent.
+2.1. Literals (00)
+Literals are uncompressed data stored directly in the byte stream.
+The literal length is stored differently depending on the length
+of the literal:
+ - For literals up to and including 60 bytes in length, the upper
+   six bits of the tag byte contain (len-1). The literal follows
+   immediately thereafter in the bytestream.
+ - For longer literals, the (len-1) value is stored after the tag byte,
+   little-endian. The upper six bits of the tag byte describe how
+   many bytes are used for the length; 60, 61, 62 or 63 for
+   1-4 bytes, respectively. The literal itself follows after the
+   length.
+2.2. Copies
+Copies are references back into previous decompressed data, telling
+the decompressor to reuse data it has previously decoded.
+They encode two values: The _offset_, saying how many bytes back
+from the current position to read, and the _length_, how many bytes
+to copy. Offsets of zero can be encoded, but are not legal;
+similarly, it is possible to encode backreferences that would
+go past the end of the block (offset > current decompressed position),
+which is also nonsensical and thus not allowed.
+As in most LZ77-based compressors, the length can be larger than the offset,
+yielding a form of run-length encoding (RLE). For instance,
+"xababab" could be encoded as
+  <literal: "xab"> <copy: offset=2 length=4>
+Note that since the current Snappy compressor works in 32 kB
+blocks and does not do matching across blocks, it will never produce
+a bitstream with offsets larger than about 32768. However, the
+decompressor should not rely on this, as it may change in the future.
+There are several different kinds of copy elements, depending on
+the amount of bytes to be copied (length), and how far back the
+data to be copied is (offset).
+2.2.1. Copy with 1-byte offset (01)
+These elements can encode lengths between [4..11] bytes and offsets
+between [0..2047] bytes. (len-4) occupies three bits and is stored
+in bits [2..4] of the tag byte. The offset occupies 11 bits, of which the
+upper three are stored in the upper three bits ([5..7]) of the tag byte,
+and the lower eight are stored in a byte following the tag byte.
+2.2.2. Copy with 2-byte offset (10)
+These elements can encode lengths between [1..64] and offsets from
+[0..65535]. (len-1) occupies six bits and is stored in the upper
+six bits ([2..7]) of the tag byte. The offset is stored as a
+little-endian 16-bit integer in the two bytes following the tag byte.
+2.2.3. Copy with 4-byte offset (11)
+These are like the copies with 2-byte offsets (see previous subsection),
+except that the offset is stored as a 32-bit integer instead of a
+16-bit integer (and thus will occupy four bytes).

data/ext/snappy/snappy-src/framing_format.txt ADDED Viewed

@@ -0,0 +1,135 @@
+Snappy framing format description
+Last revised: 2013-10-25
+This format decribes a framing format for Snappy, allowing compressing to
+files or streams that can then more easily be decompressed without having
+to hold the entire stream in memory. It also provides data checksums to
+help verify integrity. It does not provide metadata checksums, so it does
+not protect against e.g. all forms of truncations.
+Implementation of the framing format is optional for Snappy compressors and
+decompressor; it is not part of the Snappy core specification.
+1. General structure
+The file consists solely of chunks, lying back-to-back with no padding
+in between. Each chunk consists first a single byte of chunk identifier,
+then a three-byte little-endian length of the chunk in bytes (from 0 to
+16777215, inclusive), and then the data if any. The four bytes of chunk
+header is not counted in the data length.
+The different chunk types are listed below. The first chunk must always
+be the stream identifier chunk (see section 4.1, below). The stream
+ends when the file ends -- there is no explicit end-of-file marker.
+2. File type identification
+The following identifiers for this format are recommended where appropriate.
+However, note that none have been registered officially, so this is only to
+be taken as a guideline. We use "Snappy framed" to distinguish between this
+format and raw Snappy data.
+  File extension:         .sz
+  MIME type:              application/x-snappy-framed
+  HTTP Content-Encoding:  x-snappy-framed
+3. Checksum format
+Some chunks have data protected by a checksum (the ones that do will say so
+explicitly). The checksums are always masked CRC-32Cs.
+A description of CRC-32C can be found in RFC 3720, section 12.1, with
+examples in section B.4.
+Checksums are not stored directly, but masked, as checksumming data and
+then its own checksum can be problematic. The masking is the same as used
+in Apache Hadoop: Rotate the checksum by 15 bits, then add the constant
+0xa282ead8 (using wraparound as normal for unsigned integers). This is
+equivalent to the following C code:
+  uint32_t mask_checksum(uint32_t x) {
+    return ((x >> 15) | (x << 17)) + 0xa282ead8;
+  }
+Note that the masking is reversible.
+The checksum is always stored as a four bytes long integer, in little-endian.
+4. Chunk types
+The currently supported chunk types are described below. The list may
+be extended in the future.
+4.1. Stream identifier (chunk type 0xff)
+The stream identifier is always the first element in the stream.
+It is exactly six bytes long and contains "sNaPpY" in ASCII. This means that
+a valid Snappy framed stream always starts with the bytes
+  0xff 0x06 0x00 0x00 0x73 0x4e 0x61 0x50 0x70 0x59
+The stream identifier chunk can come multiple times in the stream besides
+the first; if such a chunk shows up, it should simply be ignored, assuming
+it has the right length and contents. This allows for easy concatenation of
+compressed files without the need for re-framing.
+4.2. Compressed data (chunk type 0x00)
+Compressed data chunks contain a normal Snappy compressed bitstream;
+see the compressed format specification. The compressed data is preceded by
+the CRC-32C (see section 3) of the _uncompressed_ data.
+Note that the data portion of the chunk, i.e., the compressed contents,
+can be at most 16777211 bytes (2^24 - 1, minus the checksum).
+However, we place an additional restriction that the uncompressed data
+in a chunk must be no longer than 65536 bytes. This allows consumers to
+easily use small fixed-size buffers.
+4.3. Uncompressed data (chunk type 0x01)
+Uncompressed data chunks allow a compressor to send uncompressed,
+raw data; this is useful if, for instance, uncompressible or
+near-incompressible data is detected, and faster decompression is desired.
+As in the compressed chunks, the data is preceded by its own masked
+CRC-32C (see section 3).
+An uncompressed data chunk, like compressed data chunks, should contain
+no more than 65536 data bytes, so the maximum legal chunk length with the
+checksum is 65540.
+4.4. Padding (chunk type 0xfe)
+Padding chunks allow a compressor to increase the size of the data stream
+so that it complies with external demands, e.g. that the total number of
+bytes is a multiple of some value.
+All bytes of the padding chunk, except the chunk byte itself and the length,
+should be zero, but decompressors must not try to interpret or verify the
+padding data in any way.
+4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)
+These are reserved for future expansion. A decoder that sees such a chunk
+should immediately return an error, as it must assume it cannot decode the
+stream correctly.
+Future versions of this specification may define meanings for these chunks.
+4.6. Reserved skippable chunks (chunk types 0x80-0xfd)
+These are also reserved for future expansion, but unlike the chunks
+described in 4.5, a decoder seeing these must skip them and continue
+decoding.
+Future versions of this specification may define meanings for these chunks.

data/ext/snappy/snappy-src/snappy-c.cc ADDED Viewed

@@ -0,0 +1,90 @@
+// Copyright 2011 Martin Gieseking <martin.gieseking@uos.de>.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#include "snappy.h"
+#include "snappy-c.h"
+extern "C" {
+snappy_status snappy_compress(const char* input,
+                              size_t input_length,
+                              char* compressed,
+                              size_t *compressed_length) {
+  if (*compressed_length < snappy_max_compressed_length(input_length)) {
+    return SNAPPY_BUFFER_TOO_SMALL;
+  }
+  snappy::RawCompress(input, input_length, compressed, compressed_length);
+  return SNAPPY_OK;
+}
+snappy_status snappy_uncompress(const char* compressed,
+                                size_t compressed_length,
+                                char* uncompressed,
+                                size_t* uncompressed_length) {
+  size_t real_uncompressed_length;
+  if (!snappy::GetUncompressedLength(compressed,
+                                     compressed_length,
+                                     &real_uncompressed_length)) {
+    return SNAPPY_INVALID_INPUT;
+  }
+  if (*uncompressed_length < real_uncompressed_length) {
+    return SNAPPY_BUFFER_TOO_SMALL;
+  }
+  if (!snappy::RawUncompress(compressed, compressed_length, uncompressed)) {
+    return SNAPPY_INVALID_INPUT;
+  }
+  *uncompressed_length = real_uncompressed_length;
+  return SNAPPY_OK;
+}
+size_t snappy_max_compressed_length(size_t source_length) {
+  return snappy::MaxCompressedLength(source_length);
+}
+snappy_status snappy_uncompressed_length(const char *compressed,
+                                         size_t compressed_length,
+                                         size_t *result) {
+  if (snappy::GetUncompressedLength(compressed,
+                                    compressed_length,
+                                    result)) {
+    return SNAPPY_OK;
+  } else {
+    return SNAPPY_INVALID_INPUT;
+  }
+}
+snappy_status snappy_validate_compressed_buffer(const char *compressed,
+                                                size_t compressed_length) {
+  if (snappy::IsValidCompressedBuffer(compressed, compressed_length)) {
+    return SNAPPY_OK;
+  } else {
+    return SNAPPY_INVALID_INPUT;
+  }
+}
+}  // extern "C"

data/ext/snappy/snappy-src/snappy-c.h ADDED Viewed

@@ -0,0 +1,138 @@
+/*
+ * Copyright 2011 Martin Gieseking <martin.gieseking@uos.de>.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *     * Neither the name of Google Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Plain C interface (a wrapper around the C++ implementation).
+ */
+#ifndef THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_
+#define THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+#include <stddef.h>
+/*
+ * Return values; see the documentation for each function to know
+ * what each can return.
+ */
+typedef enum {
+  SNAPPY_OK = 0,
+  SNAPPY_INVALID_INPUT = 1,
+  SNAPPY_BUFFER_TOO_SMALL = 2
+} snappy_status;
+/*
+ * Takes the data stored in "input[0..input_length-1]" and stores
+ * it in the array pointed to by "compressed".
+ *
+ * <compressed_length> signals the space available in "compressed".
+ * If it is not at least equal to "snappy_max_compressed_length(input_length)",
+ * SNAPPY_BUFFER_TOO_SMALL is returned. After successful compression,
+ * <compressed_length> contains the true length of the compressed output,
+ * and SNAPPY_OK is returned.
+ *
+ * Example:
+ *   size_t output_length = snappy_max_compressed_length(input_length);
+ *   char* output = (char*)malloc(output_length);
+ *   if (snappy_compress(input, input_length, output, &output_length)
+ *       == SNAPPY_OK) {
+ *     ... Process(output, output_length) ...
+ *   }
+ *   free(output);
+ */
+snappy_status snappy_compress(const char* input,
+                              size_t input_length,
+                              char* compressed,
+                              size_t* compressed_length);
+/*
+ * Given data in "compressed[0..compressed_length-1]" generated by
+ * calling the snappy_compress routine, this routine stores
+ * the uncompressed data to
+ *   uncompressed[0..uncompressed_length-1].
+ * Returns failure (a value not equal to SNAPPY_OK) if the message
+ * is corrupted and could not be decrypted.
+ *
+ * <uncompressed_length> signals the space available in "uncompressed".
+ * If it is not at least equal to the value returned by
+ * snappy_uncompressed_length for this stream, SNAPPY_BUFFER_TOO_SMALL
+ * is returned. After successful decompression, <uncompressed_length>
+ * contains the true length of the decompressed output.
+ *
+ * Example:
+ *   size_t output_length;
+ *   if (snappy_uncompressed_length(input, input_length, &output_length)
+ *       != SNAPPY_OK) {
+ *     ... fail ...
+ *   }
+ *   char* output = (char*)malloc(output_length);
+ *   if (snappy_uncompress(input, input_length, output, &output_length)
+ *       == SNAPPY_OK) {
+ *     ... Process(output, output_length) ...
+ *   }
+ *   free(output);
+ */
+snappy_status snappy_uncompress(const char* compressed,
+                                size_t compressed_length,
+                                char* uncompressed,
+                                size_t* uncompressed_length);
+/*
+ * Returns the maximal size of the compressed representation of
+ * input data that is "source_length" bytes in length.
+ */
+size_t snappy_max_compressed_length(size_t source_length);
+/*
+ * REQUIRES: "compressed[]" was produced by snappy_compress()
+ * Returns SNAPPY_OK and stores the length of the uncompressed data in
+ * *result normally. Returns SNAPPY_INVALID_INPUT on parsing error.
+ * This operation takes O(1) time.
+ */
+snappy_status snappy_uncompressed_length(const char* compressed,
+                                         size_t compressed_length,
+                                         size_t* result);
+/*
+ * Check if the contents of "compressed[]" can be uncompressed successfully.
+ * Does not return the uncompressed data; if so, returns SNAPPY_OK,
+ * or if not, returns SNAPPY_INVALID_INPUT.
+ * Takes time proportional to compressed_length, but is usually at least a
+ * factor of four faster than actual decompression.
+ */
+snappy_status snappy_validate_compressed_buffer(const char* compressed,
+                                                size_t compressed_length);
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+#endif  /* THIRD_PARTY_SNAPPY_OPENSOURCE_SNAPPY_C_H_ */