bentley_mcilroy 0.0.1
- data/LICENSE +21 -0
- data/README.md +133 -0
- data/bentley_mcilroy.gemspec +14 -0
- data/lib/bentley_mcilroy.rb +236 -0
- data/lib/rolling_hash.rb +101 -0
- data/rakefile +11 -0
- data/test/bentley_mcilroy_test.rb +99 -0
- data/test/rolling_hash_test.rb +20 -0
- data/test/test_helper.rb +1 -0
- metadata +90 -0
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
(MIT License)

Copyright (c) 2013 Adam Prescott

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,133 @@
A Ruby implementation of Bentley-McIlroy's data compression scheme to encode
compressed versions of strings, and compute deltas between source and target.

Note the compression and delta encodings are simply represented with Ruby
objects, and are independent of any particular binary format.

The fingerprinting algorithm is the rolling hash frequently used for Rabin-Karp
string matching.

# Usage

To compress a string, pass the input and block size.

    codec = BentleyMcIlroy::Codec
    codec.compress("aaaaaa", 3)     #=> ["a", [0, 5]]
    codec.compress("abcabcabc", 3)  #=> ["abc", [0, 6]]
    codec.compress("xabcdabcdy", 2) #=> ["xabcda", [2, 3], "y"]
    codec.compress("xabcdabcdy", 1) #=> ["xabcd", [1, 4], "y"]

# Modes of operation

This library supports two modes of operation: compression and delta encoding.
With compression, a single input is compressed. With delta encoding, there is a
(non-empty) source and a target, and the result is a delta which can be
used to reconstruct the target, given the source. Compression is a special
case of delta encoding where there is no source.

With compression, the source data is everything to the left of the position we've
reached along the string. With delta encoding, the source data is fixed for the
entire time we move left-to-right through the target string.

Compression:

    codec.compress("aaaaaa", 3)     #=> ["a", [0, 5]]
    codec.compress("abcabcabc", 3)  #=> ["abc", [0, 6]]
    codec.compress("xabcdabcdy", 2) #=> ["xabcda", [2, 3], "y"]
    codec.compress("xabcdabcdy", 1) #=> ["xabcd", [1, 4], "y"]

Delta encoding is similar:

    codec.encode("abcd", "xabcdyabcdz", 1) #=> ["x", [0, 4], "y", [0, 4], "z"]
    codec.encode("xyz", "xyz", 3)          #=> []

To decompress:

    codec.decompress(["xabcd", [1, 4], "y"]) #=> "xabcdabcdy"

To decode a delta against a source:

    codec.decode("abcd", ["x", [0, 4], "y", [0, 4], "z"]) #=> "xabcdyabcdz"

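The copy semantics of decompression and decoding can be sketched as two standalone functions. This is a simplified illustration (hypothetical top-level `decompress`/`decode` helpers, not the gem's `Codec` methods, though it mirrors their behavior): a plain string in the sequence is appended as-is, while an `[index, length]` pair copies either from the output built so far, or from the fixed source.

```ruby
# Minimal sketch of the decoding semantics described above.
# decompress copies refer back into the output built so far;
# decode copies refer into a fixed, separate source string.

def decompress(sequence)
  sequence.inject("") do |result, item|
    if item.is_a?(Array)
      index, length = item
      # copy one character at a time, so a copy can overlap the
      # region it is itself producing (e.g. ["a", [0, 5]])
      length.times { |k| result << result[index + k, 1] }
      result
    else
      result << item
    end
  end
end

def decode(source, delta)
  delta.inject("") do |result, item|
    if item.is_a?(Array)
      index, length = item
      result << source[index, length]
    else
      result << item
    end
  end
end

decompress(["xabcd", [1, 4], "y"])              #=> "xabcdabcdy"
decode("abcd", ["x", [0, 4], "y", [0, 4], "z"]) #=> "xabcdyabcdz"
```

Copying character by character matters for decompression: in `["a", [0, 5]]` the copy reads characters it has just written, which is how a one-character literal expands into `"aaaaaa"`.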
# About Bentley-McIlroy

The Bentley-McIlroy compression scheme is an algorithm for compressing a
string by finding long common substrings. The algorithm and its properties
are described in greater detail in their [1999 paper][bentley-mcilroy paper]. The technique, with a
source dictionary and a target string, is used in Google's implementation of
a VCDIFF encoder, [open-vcdiff][open-vcdiff project], as part of encoding deltas.

[bentley-mcilroy paper]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.8470&rep=rep1&type=pdf
[open-vcdiff project]: http://code.google.com/p/open-vcdiff/

To give a brief summary, the algorithm works by fixing a window of block size
b, then sliding it over the string, storing the fingerprint of every b-th
window. These stored fingerprints are then used to detect repetitions later
on in the string.

The algorithm in pseudocode, as given in the paper, is:

    initialize fp
    for (i = b; i < n; i++)
        if (i % b == 0)
            store(fp, i)
        update fp to include a[i] and exclude a[i-b]
        checkformatch(fp, i)

In the algorithm above, `checkformatch(fp, i)` looks up the fingerprint `fp` in a
hash table and then encodes a match if one is found.

`checkformatch(fp, i)` is the core piece of this algorithm, and "encodes a
match" is not fully described in the paper. The rest of the algorithm simply
describes moving through the string with a sliding window, looking at
substrings and storing fingerprints whenever we cross a block boundary.

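The `store(fp, i)` half of that loop can be sketched standalone. As a simplifying assumption, `store_windows` (a hypothetical helper) keys the table on the window's text itself rather than on a rolling fingerprint; structurally it records exactly the same thing, namely where each b-th window starts:

```ruby
# Build the table of every b-th window of the input, as in the
# store(fp, i) step of the pseudocode. Real implementations key the
# table on a rolling fingerprint; using the window text as the key
# keeps this sketch short without changing the structure.
def store_windows(text, b)
  table = Hash.new { |h, k| h[k] = [] }
  position = 0
  while position + b <= text.length
    table[text[position, b]] << position
    position += b # only block-aligned windows are stored
  end
  table
end

table = store_windows("xabcdabcdy", 2)
table["bc"] #=> [2, 6]
table["xa"] #=> [0]
```

With b = 2 only the windows at positions 0, 2, 4, 6, 8 (`"xa"`, `"bc"`, `"da"`, `"bc"`, `"dy"`) are stored, which is why the later `"ab"` at an odd offset finds no entry but `"bc"` does, as in the worked example below.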
As described in the paper, suppose b = 100 and that the current block matches
block 56 (i.e., bytes 5600 through to 5699). This current block could then be
encoded as <5600,100>.

There are two similar improvements which can be made, so as to prevent
`"ababab"` from compressing into `"ab<0,2><0,2>"`, both of which are also in the
paper. When we know that the current block matches block 56, we can extend
the match as far back as possible, not exceeding b - 1 bytes. Similarly, we
can move the match as far forward as possible, without limitation.

The reason there is a limit of b - 1 bytes when moving backwards is that if
there were more to match beyond b - 1 bytes, it would've been found in a
previous iteration of the loop.

This library implementation moves matches forward, but does not move matches
backwards.

To be more explicit about what extending the match means, consider

    xabcdabcdy (the string)
    0123456789 (indices)

with a block size of b = 2. Moving left to right, the fingerprints of `"xa"`,
`"ab"`, `"bc"`, ..., are computed, but only `"xa"`, `"bc"`, `"da"`, ... are stored. When
`"ab"` is seen at `5..6`, there is no corresponding entry in the hash table, so
nothing is done, yet. On the next substring of length 2, `"bc"`, at positions
`6..7`, there _is_ a corresponding entry in the hash table, so there's a match,
which we could encode as `<2, 2>`, say. However, we'd like to _actually_ produce
`<1, 4>`, which is more efficient. So starting with `<2, 2>`, we move the match
back 1 character for both the `"bc"` at `6..7` and the `"bc"` at `2..3`, then check
if `1..3` matches `5..7`, which it does. This is moving the match backwards.

For moving the match forwards, simply do the same thing. Check if `1..4` matches
`6..8`, which it does. `1..5` does not match `6..9`, so we use `<1, 4>` and we're done.

The resulting string, with backward- and forward-extension, is `xabcd<1, 4>y`. In
the case of no backward extension, it is `xabcda<2, 3>y`.

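The extension just walked through can be sketched as one standalone function. `extend_match` is a hypothetical helper for illustration only (the gem itself extends matches forwards but not backwards): given a block match of length `len` between a stored occurrence at `src` and the current occurrence at `tgt`, it widens the match backwards by at most b - 1 characters, then forwards without limit.

```ruby
# Widen a block match of length len between source position src and a
# later occurrence at tgt: at most b - 1 characters backwards, then as
# far forwards as the characters keep matching. Returns [start, length].
def extend_match(text, src, tgt, len, b)
  # backward extension, limited to b - 1 characters
  steps = 0
  while steps < b - 1 && src > 0 && tgt > 0 && text[src - 1] == text[tgt - 1]
    src  -= 1
    tgt  -= 1
    len  += 1
    steps += 1
  end
  # forward extension, unlimited; the copy may overlap the region it
  # produces, as in the decompression semantics
  while tgt + len < text.length && text[src + len] == text[tgt + len]
    len += 1
  end
  [src, len]
end

# "bc" at 6..7 matches the stored "bc" at 2..3, with b = 2:
extend_match("xabcdabcdy", 2, 6, 2, 2) #=> [1, 4]
```

This reproduces the example above: `<2, 2>` is first pulled back one character to cover `1..3` against `5..7`, then pushed forward one character to give `<1, 4>`.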
# License

Copyright (c) Adam Prescott, released under the MIT license. See the license file.

# TODO

    compress("abcaaaaaa", 1) -> ["abc", [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]

Can this be fixed to be `["abc", [0, 1], [3, 5]]`? Essentially following the paper
and picking the longest match on a clash (here, index 0 and index 3 are hit for
index 4, but index 3 leads to a better result when the match is extended forward).
data/bentley_mcilroy.gemspec
ADDED
@@ -0,0 +1,14 @@
Gem::Specification.new do |s|
  s.name = "bentley_mcilroy"
  s.version = "0.0.1"
  s.authors = ["Adam Prescott"]
  s.email = ["adam@aprescott.com"]
  s.homepage = "https://github.com/aprescott/bentley_mcilroy"
  s.summary = "Bentley-McIlroy compression scheme implementation in Ruby."
  s.description = "A compression scheme using the Bentley-McIlroy data compression technique of finding long common substrings."
  s.files = Dir["{lib/**/*,test/**/*}"] + %w[LICENSE README.md bentley_mcilroy.gemspec rakefile]
  s.test_files = Dir["test/*"]
  s.require_path = "lib"
  s.add_development_dependency "rake"
  s.add_development_dependency "rspec"
end
data/lib/bentley_mcilroy.rb
ADDED
@@ -0,0 +1,236 @@
require "rolling_hash"

module BentleyMcIlroy
  # A fixed block of text, appearing in the original text at one of
  # 0..b-1, b..2b-1, 2b..3b-1, ...
  class Block
    attr_reader :text, :position

    def initialize(text, position)
      @text = text
      @position = position
    end

    def hash
      RollingHash.new.hash(text)
    end
  end

  # A container for the original text we're processing. Divides the text into
  # Block objects.
  class BlockSequencedText
    attr_reader :blocks, :text

    def initialize(text, block_size)
      @text = text
      @block_size = block_size
      @blocks = []

      # "onetwothree" -> ["one", "two", "thr", "ee"]
      @text.scan(/.(?:.?){#{@block_size - 1}}/).each.with_index do |text_block, index|
        @blocks << Block.new(text_block, index * @block_size)
      end
    end
  end

  # Look-up table with a #find method which finds an appropriate block and then
  # modifies the match to extend it to more characters.
  class BlockFingerprintTable
    def initialize(block_sequenced_text)
      @blocked_text = block_sequenced_text
      @hash = {}

      @blocked_text.blocks.each do |block|
        (@hash[block.hash] ||= []) << block
      end
    end

    def find_for_compress(fingerprint, block_size, target, position)
      source = @blocked_text.text
      find(fingerprint, block_size, source, target, position)
    end

    def find_for_diff(fingerprint, block_size, target)
      source = @blocked_text.text
      find(fingerprint, block_size, source, target)
    end

    private

    def find(fingerprint, block_size, source, target, position = nil)
      blocks = @hash[fingerprint]
      return nil unless blocks

      blocks.each do |block|
        next unless block.text == target[0, block_size]

        # in compression, since we don't have true source and target strings as
        # separate things, we have to ensure that we don't use a fingerprinted
        # block which appears _after_ the current position, otherwise
        #
        #   a<x, 0> with x > 0
        #
        # might happen, or similar. since blocks are ordered left to right in the
        # string, we can just return nil, because we know there's not going to be
        # a valid block for compression.
        if position && block.position >= position
          return nil
        end

        # we know that block matches, so cut it from the beginning,
        # so we can then see how much of the rest also matches
        source_match = source[block.position + block_size..-1]
        target_match = target[block_size..-1]

        # in a backwards extension, we can see how many of the characters before
        # +position+ (up to the previous block we covered, which is +limit+) match
        # the characters before block.position (up to b-1 of them). In other
        # words, we can find the maximum i such that
        #
        #   original_text[position-k, 1] == original_text[block.position-k, 1]
        #
        # for all k in {1, 2, ..., i}, where i <= b-1

        # it may be that the block we've matched on reaches to the end of the
        # string, in which case, bail
        if source_match.empty? || target_match.empty?
          return block
        end

        end_index = find_end_index(source_match, target_match)
        # extend using the characters after the matched block (source_match),
        # not the full source string
        match = produce_match(end_index, block, source_match)
        return match
      end

      nil
    end

    def find_end_index(source, target)
      end_index = 0
      any_match = false
      while end_index < source.length && end_index < target.length && source[end_index, 1] == target[end_index, 1]
        any_match = true
        end_index += 1
      end
      # undo the final increment, since that's where it failed the equality check
      end_index -= 1

      any_match ? end_index : nil
    end

    def produce_match(end_index, block, source)
      text = block.text
      if end_index # we have more to grab in the string
        text += source[0..end_index]
      end
      Block.new(text, block.position)
    end
  end

  class Codec
    def self.decompress(sequence)
      sequence.inject("") do |result, i|
        if i.is_a?(Array)
          index, length = i
          length.times do |k|
            result << result[index + k, 1]
          end
          result
        else
          result << i
        end
      end
    end

    def self.decode(source, delta)
      delta.inject("") do |result, i|
        if i.is_a?(Array)
          index, length = i
          result << source[index, length]
        else
          result << i
        end
      end
    end

    def self.compress(text, block_size)
      __compress_encode__(text, nil, block_size)
    end

    def self.encode(source, target, block_size)
      __compress_encode__(source, target, block_size)
    end

    private

    def self.__compress_encode__(source, target, block_size)
      return [] if source == target

      block_sequenced_text = BlockSequencedText.new(source, block_size)
      table = BlockFingerprintTable.new(block_sequenced_text)
      output = []
      buffer = ""
      current_hash = nil
      hasher = RollingHash.new

      mode = (target ? :diff : :compress)

      if mode == :compress
        # it's the source we're compressing, there is no target
        text = source
      else
        # it's the target we're compressing against the source
        text = target
      end

      position = 0
      while position < text.length
        if text.length - position < block_size
          # if there isn't a block-sized substring in the remaining text, stop.
          # note that we could add the buffer to the output here, but if block_size
          # is 1, text.length - position < 1 can't be true, so the final character
          # would go missing. so appending to the buffer goes below, outside the
          # while loop.
          break
        end

        # if we've recently found a block of text which matches and added that to
        # the output, current_hash will be reset to nil, so get the new hash. note
        # that we can't just use next_hash, because we might have skipped several
        # characters in one go, which breaks the rolling aspect of the hash
        if !current_hash
          current_hash = hasher.hash(text[position, block_size])
        else
          # position-1 is the previous position, + block_size to get the last
          # character of the current block
          current_hash = hasher.next_hash(text[position - 1 + block_size, 1])
        end

        match = target ? table.find_for_diff(current_hash, block_size, target[position..-1]) :
                         table.find_for_compress(current_hash, block_size, text[position..-1], position)

        if match
          if !buffer.empty?
            output << buffer
            buffer = ""
          end

          output << [match.position, match.text.length]
          position += match.text.length
          current_hash = nil
          # get a new hasher, because we've skipped over by match.text.length
          # characters, so the rolling hash's next_hash won't work
          hasher = RollingHash.new
        else
          buffer << text[position, 1]
          position += 1
        end
      end

      remainder = buffer + text[position..-1]
      output << remainder if !remainder.empty?
      output
    end
  end
end
data/lib/rolling_hash.rb
ADDED
@@ -0,0 +1,101 @@
if RUBY_VERSION < "1.9"
  class String
    def ord
      self[0]
    end
  end
end

# Rolling hash as used in Rabin-Karp.
#
#   hasher = RollingHash.new
#   hasher.hash("abc")    #=> 6432038
#   hasher.next_hash("d") #=> 6498345
#                          ||
#   hasher.hash("bcd")    #=> 6498345
class RollingHash
  def initialize(hash = {})
    hash = { :base => 257, # prime
             :mod  => 1000000007
           }.merge!(hash)
    @base = hash[:base]
    @mod = hash[:mod]
  end

  # Compute @base**power working modulo @mod
  def modulo_exp(power)
    self.class.modulo_exp(@base, power, @mod)
  end

  # Given a string "abc...xyz" with length len,
  # return the hash using @base as
  #
  #   "a".ord * @base**(len - 1) +
  #   "b".ord * @base**(len - 2) +
  #   ... +
  #   "y".ord * @base**(1) +
  #   "z".ord * @base**0 (== "z".ord)
  def hash(input)
    hash = 0
    characters = input.split("")
    input_length = characters.length

    characters.each_with_index do |character, index|
      hash += character.ord * modulo_exp(input_length - 1 - index) % @mod
      hash = hash % @mod
    end
    @prev_hash = hash
    @prev_input = input
    @highest_power = input_length - 1
    hash
  end

  # Returns the hash of (@prev_input[1..-1] + character)
  # by using @prev_hash, so that the sum turns from
  #
  #   "a".ord * @base**(len - 1) +
  #   "b".ord * @base**(len - 2) +
  #   ... +
  #   "y".ord * @base**(1) +
  #   "z".ord * @base**0 (== "z".ord)
  #
  # into
  #
  #   "b".ord * @base**(len - 1) +
  #   ... +
  #   "y".ord * @base**(2) +
  #   "z".ord * @base**1 +
  #   character.ord * @base**0
  def next_hash(character)
    # the leading value of the computed sum
    char_to_subtract = @prev_input.chars.first
    hash = @prev_hash

    # subtract the leading value
    hash = hash - char_to_subtract.ord * @base**@highest_power

    # shift everything over to the left by 1, and add the
    # new character as the lowest value
    hash = (hash * @base) + character.ord
    hash = hash % @mod

    # trim off the first character
    @prev_input.slice!(0)
    @prev_input << character
    @prev_hash = hash

    hash
  end

  private

  # Returns n**power but reduced modulo mod
  # at each step of the calculation.
  def self.modulo_exp(n, power, mod)
    value = 1
    power.times do
      value = (n * value) % mod
    end
    value
  end
end
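The rolling update above can be sanity-checked against a direct polynomial hash with a short standalone sketch. This is a reimplementation for illustration (hypothetical `poly_hash`/`roll` helpers, not the gem's API), using the same base and modulus as the defaults above, so the values match the class comment:

```ruby
BASE = 257           # prime base, as in RollingHash's default
MOD  = 1_000_000_007

# Direct polynomial hash of a whole string, reduced modulo MOD.
def poly_hash(s)
  s.each_char.inject(0) { |h, c| (h * BASE + c.ord) % MOD }
end

# Roll the hash of an n-character window one step: subtract the
# outgoing character's leading term, shift, and append the new one.
def roll(hash, outgoing, incoming, n)
  lead = outgoing.ord * BASE.pow(n - 1, MOD) % MOD
  ((hash - lead) * BASE + incoming.ord) % MOD
end

h = poly_hash("abc")                 #=> 6432038
roll(h, "a", "d", 3)                 #=> 6498345
roll(h, "a", "d", 3) == poly_hash("bcd") #=> true
```

The rolled value equals the direct hash of the next window because subtracting the leading term and multiplying by the base shifts every remaining term up one power, exactly as the `next_hash` comment describes.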
data/test/bentley_mcilroy_test.rb
ADDED
@@ -0,0 +1,99 @@
require "test_helper"

describe BentleyMcIlroy::Codec do
  describe ".compress" do
    it "compresses strings" do
      codec = BentleyMcIlroy::Codec
      str = "aaaaaaaaaaaaaaaaaaaaaaa"

      (1..10).each { |i| codec.compress(str, i).should == [str[0, 1], [0, str.length-1]] }

      codec.compress("abcabcabc", 3).should == ["abc", [0, 6]]
      codec.compress("abababab", 2).should == ["ab", [0, 6]]
      codec.compress("abcdefabc", 3).should == ["abcdef", [0, 3]]
      codec.compress("abcdefabcdef", 3).should == ["abcdef", [0, 6]]
      codec.compress("abcabcabc", 2).should == ["abc", [0, 6]]
      codec.compress("xabcdabcdy", 2).should == ["xabcda", [2, 3], "y"]
      codec.compress("xabcdabcdy", 1).should == ["xabcd", [1, 4], "y"]
      codec.compress("xabcabcy", 2).should == ["xabca", [2, 2], "y"]
    end

    # "aaaa" should compress down to ["a", [0, 3]]
    it "picks the longest match on clashes"

    #           11
    # 0123 45678901
    # encode("xaby", "abababab", 1) would be more efficiently encoded as
    #
    #   ["x", [1, 2], [4, 6]]
    #
    # where [4, 6] refers to the decoded target itself, in the style of
    # VCDIFF. See RFC 3284 section 3, where COPY 4, 4 + COPY 12, 24 is used.
    #
    # this should probably only be allowed with a flag or something.
    #
    # note that compress is more efficient for this type of input,
    # since the "source" is everything to the left of the current position:
    #
    #   compress("abababab", 1) #=> ["ab", [0, 6]]
    it "can refer to its own target"

    it "handles binary" do
      codec = BentleyMcIlroy::Codec
      str = ("\x52\303\x66" * 3)
      str.force_encoding("BINARY") if str.respond_to?(:force_encoding)

      codec.compress(str, 3).should == ["\x52\303\x66", [0, 6]]
    end
  end

  describe ".decompress" do
    it "converts arrays representing compressed strings into the full string" do
      codec = BentleyMcIlroy::Codec
      codec.decompress(["abc", [0, 6]]).should == "abcabcabc"
      codec.decompress(["abcdef", [0, 3]]).should == "abcdefabc"
      codec.decompress(["xabcda", [2, 3], "y"]).should == "xabcdabcdy"
      codec.decompress(["xabcd", [1, 4], "y"]).should == "xabcdabcdy"
      codec.decompress(["xabca", [2, 2], "y"]).should == "xabcabcy"
    end

    it "round-trips with the compression method" do
      codec = BentleyMcIlroy::Codec
      %w[aaaaaaaaa abcabcabcabc abababab abcdefabc abcdefabcdef abcabcabc xabcdabcdy xabcabcy].each do |s|
        (1..4).each do |n|
          codec.decompress(codec.compress(s, n)).should == s
        end
      end
    end
  end

  describe ".encode" do
    it "encodes strings" do
      codec = BentleyMcIlroy::Codec
      codec.encode("abcdef", "defghiabc", 3).should == [[3, 3], "ghi", [0, 3]]
      codec.encode("abcdef", "defghiabc", 2).should == ["d", [4, 2], "ghi", [0, 3]]
      codec.encode("abcdef", "defghiabc", 1).should == [[3, 3], "ghi", [0, 3]]
      codec.encode("abc", "d", 3).should == ["d"]
      codec.encode("abc", "defghi", 3).should == ["defghi"]
      codec.encode("abcdef", "abcdef", 3).should == []
      codec.encode("abc", "abcdef", 3).should == [[0, 3], "def"]
      codec.encode("aaaaa", "aaaaaaaaaa", 3).should == [[0, 5], [0, 5]]
    end
  end

  describe ".decode" do
    it "applies the given delta to the given source" do
      codec = BentleyMcIlroy::Codec
      codec.decode("aaaaa", [[0, 5], [0, 5]]).should == "aaaaaaaaaa"
      codec.decode("abcdef", [[3, 3], "ghi", [0, 3]]).should == "defghiabc"
    end

    it "round-trips with the delta method" do
      codec = BentleyMcIlroy::Codec
      (1..4).each do |n|
        codec.decode("abcdef", codec.encode("abcdef", "defghiabc", n)).should == "defghiabc"
      end
    end
  end
end
data/test/rolling_hash_test.rb
ADDED
@@ -0,0 +1,20 @@
require "test_helper"

describe RollingHash do
  describe "#hash(input)" do
    it "hashes the input using a polynomial" do
      hasher = RollingHash.new
      hasher.hash("abc").should == 6432038
      hasher.hash("bcd").should == 6498345
    end
  end

  describe "#next_hash(next_input)" do
    it "takes the previous hash and the given next input and computes the new hash" do
      hasher = RollingHash.new
      hasher.hash("abc")
      new_h = hasher.next_hash("d")
      new_h.should == RollingHash.new.hash("bcd")
    end
  end
end
data/test/test_helper.rb
ADDED
@@ -0,0 +1 @@
require "bentley_mcilroy"
metadata
ADDED
@@ -0,0 +1,90 @@
--- !ruby/object:Gem::Specification
name: bentley_mcilroy
version: !ruby/object:Gem::Version
  version: 0.0.1
prerelease:
platform: ruby
authors:
- Adam Prescott
autorequire:
bindir: bin
cert_chain: []
date: 2013-09-09 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
description: A compression scheme using the Bentley-McIlroy data compression technique
  of finding long common substrings.
email:
- adam@aprescott.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/bentley_mcilroy.rb
- lib/rolling_hash.rb
- test/test_helper.rb
- test/bentley_mcilroy_test.rb
- test/rolling_hash_test.rb
- LICENSE
- README.md
- bentley_mcilroy.gemspec
- rakefile
homepage: https://github.com/aprescott/bentley_mcilroy
licenses: []
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 1.8.24
signing_key:
specification_version: 3
summary: Bentley-McIlroy compression scheme implementation in Ruby.
test_files:
- test/test_helper.rb
- test/bentley_mcilroy_test.rb
- test/rolling_hash_test.rb