RubyGems - fossilize - Versions diffs - 1.0.0 - Mend

fossilize 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

data/.gitignore +18 -0
data/Gemfile +4 -0
data/LICENSE +22 -0
data/README.md +105 -0
data/Rakefile +2 -0
data/bin/fossilize +9 -0
data/ext/fossilize/extconf.rb +7 -0
data/ext/fossilize/fossilize.c +626 -0
data/fossilize.gemspec +27 -0
data/lib/fossilize/delta.rb +13 -0
data/lib/fossilize/ring_buffer.rb +27 -0
data/lib/fossilize/version.rb +3 -0
data/lib/fossilize.rb +161 -0
data/spec/fossilize_spec.rb +35 -0
data/spec/spec_helper.rb +7 -0
metadata +129 -0

data/.gitignore ADDED

@@ -0,0 +1,18 @@
+*.gem
+*.rbc
+.bundle
+.config
+coverage
+InstalledFiles
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+# YARD artifacts
+.yardoc
+_yardoc
+doc/

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in fossilize.gemspec
+gemspec

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2012 Mark Anthony Gibbins
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,105 @@
+# Fossilize
+Fossilize is an FFI-powered C-extension for Ruby that interfaces with the delta encoding algorithm
+created by D. Richard Hipp for the [FOSSIL SCM project][fossil]. It enables a Ruby program to quickly (and I mean quickly) generate a delta between files and strings, as well as apply those deltas.
+Deltas can be created between a Ruby File object and a String and vice-versa, so you can read in some JSON from a remote server as a String, create a delta from your local File copy and then apply that delta to your local copy to merge the differences.
+**The project is currently considered a work-in-progress.**
+[fossil]: http://www.fossil-scm.org
+## Why use Fossilize?
+The algorithm itself is based on rsync and is a form of [Delta Encoding][de] (sometimes called Delta Compression). A delta encoding algorithm is designed to analyse two pieces of data and produce a delta (the differences between them) as an encoded string.
+Here, I'll give an example. If I give the following two strings to the algorithm:
+    xiy needs to get a job!
+    maybe xiy needs to get a *real* job!
+It spits out the following *delta string* (I sense sarcasm in its tone):
+    _
+    6:maybe J@0,B:*real* job!1rx1Az;
+Although explaining the format of the delta string is out of the scope of this internets page, you can see why this algorithm is so damn cool.
+For more info on the algorithm, see the excellent documentation over [here][delta-format].
+Git uses a similar algorithm to only store the changes to tracked files between revisions. However, the deltas created by Git can sometimes be huge.
+### Real World Examples
+As a real world example, here are the differences between `ruby/ruby/array.c@e3efce` and its previous commit (you can see the diff [here](https://github.com/ruby/ruby/commit/e3efce6df1aa691e17c59f442b35b4fd129d3a13#array.c)).
+    WBD
+    N86@0,g:rb_random_ulong_limited((randgen), (max)-1)U@OvG,8:shuffle!H@OGW,_B@N9S,4:long49@Nii,F:i = RAND_UPTO(lb@NnL,4:<= iK@Nn~,N@Ntk,H:	}
+    	return ptr[i]b@3Rx,33@Npg,H:RAND_UPTO(len - iG@L00,8:k = len;~@NtF,C:len < k) {
+    	R@O1i,K@DZ~,_:n; ++i) {
+    		if (rnds[i] >= len) {
+    		I@F9W,C:new2(0);
+    		}H@QGF,U@NrC,16@Nu_,7:rnds[0]r@Nv~,L:rnds[0];
+    	j = rnds[1]1F@Nxf,Z:rnds[0];
+    	j = rnds[1];
+    	k = rnds[2]40@N~E,I:rnds[0];
+    	for (i=1J@OBW,B:k = rnds[i]86X@O4S,32IfMm;
+In terms of diffing, it wouldn't be hard to parse this delta and determine where in a file modifications took place at a *per-character level* as opposed to the traditional per-line approach.
+### Things Fossilize is good at:
+1. Patching - the only data to transmit is the delta string.
+2. Diffing - although the format isn't human readable, it wouldn't be hard to make it so. Unlike *diff*, the algorithm compares by character, not by line.
+3. Syncing - my project `wormhole` uses this algorithm to ensure it only syncs the updated portions of my files, not the entire thing.
+### Things Fossilize is *not* good at:
+1. Comparing differences between two almost completely different pieces of data (see [Wikipedia][de]).
+2. Binary diffing - while it can successfully create and apply binary patches, there are algorithms and tools better designed for these types of files. `bsdiff` will create a 2kb diff whereas Fossilize will create a 4kb diff. This is probably due to the fact that the algorithm uses base64 encoding of plain-text which ends up with binary artefacts popping up in the diff. Although they don't make it into the output, it's obviously better to use something like bsdiff. It would however be possible to modify the algorithm to use a different "mode" for binary files that uses binary encoding instead.
+[de]: http://en.wikipedia.org/wiki/Delta_encoding
+[delta-format]: http://www.fossil-scm.org/xfer/doc/trunk/www/delta_format.wiki
+## Installation
+Add this line to your application's Gemfile:
+    gem 'fossilize'
+And then execute:
+    $ bundle install
+Or install it yourself as:
+    $ gem install fossilize
+## Licensing
+Fossilize is distributed under the MIT License.
+Fossil (and the Fossil delta encoding algorithm included within) are distributed under the Simplified BSD License/FreeBSD License:
+    Copyright (c) 2006 D. Richard Hipp
+    This program is free software; you can redistribute it and/or
+    modify it under the terms of the Simplified BSD License (also
+    known as the "2-Clause License" or "FreeBSD License".)
+    This program is distributed in the hope that it will be useful,
+    but without any warranty; without even the implied warranty of
+    merchantability or fitness for a particular purpose.
+    Author contact information:
+       drh@hwaci.com
+       http://www.hwaci.com/drh/
+## Contributing
+1. Fork it
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Added some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request

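The delta string shown in the README can be read by hand once the digit alphabet is known. The following is a minimal, unofficial Ruby sketch that splits the README's example delta into its size header, commands and checksum. `decode_int` and `parse_delta` are hypothetical helper names and are not part of the gem; the digit alphabet is copied from `zDigits` in ext/fossilize/fossilize.c, and the parser assumes an ASCII-only delta.

```ruby
# Digit alphabet used by Fossil deltas (copied from zDigits in fossilize.c).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
NUMBER = /\A[0-9A-Za-z_~]+/

# Hypothetical helper: decode one base-64 number, most significant digit first.
def decode_int(text)
  text.each_char.reduce(0) { |value, ch| (value << 6) + DIGITS.index(ch) }
end

# Hypothetical helper: split a delta into [target_size, commands, checksum].
# Stricter than the C code: a command without a leading number is rejected.
def parse_delta(delta)
  header, body = delta.split("\n", 2)
  commands = []
  pos = 0
  while pos < body.size
    digits = body[pos..-1][NUMBER]
    raise ArgumentError, "number expected at #{pos}" if digits.nil?
    pos += digits.size
    count = decode_int(digits)
    op = body[pos]
    pos += 1
    case op
    when "@"                         # NNN@MMM, copies NNN bytes from offset MMM
      offset = body[pos..-1][NUMBER]
      raise ArgumentError, "offset expected at #{pos}" if offset.nil?
      pos += offset.size + 1         # also skip the trailing ","
      commands << [:copy, count, decode_int(offset)]
    when ":"                         # NNN:TTTTT inserts NNN literal bytes
      commands << [:insert, body[pos, count]]
      pos += count
    when ";"                         # NNN; is the final checksum
      return [decode_int(header), commands, count]
    else
      raise ArgumentError, "unknown delta operator #{op.inspect}"
    end
  end
  raise ArgumentError, "unterminated delta"
end

size, commands, checksum = parse_delta("_\n6:maybe J@0,B:*real* job!1rx1Az;")
```

For the README example this gives a 36-byte target built from an insert of `maybe `, a 19-byte copy from offset 0 of the source, and an insert of `*real* job!`.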
data/Rakefile ADDED

@@ -0,0 +1,2 @@
+#!/usr/bin/env rake
+require "bundler/gem_tasks"

data/bin/fossilize ADDED

@@ -0,0 +1,9 @@
+#!/usr/bin/env ruby
+require "fossilize"
+delta = Fossilize.create(ARGV[0], ARGV[1])
+output = Fossilize.apply(ARGV[0], delta)
+File.write("fossilized_#{ARGV[0]}", output)

data/ext/fossilize/extconf.rb ADDED

@@ -0,0 +1,7 @@
+require 'mkmf'
+dir_config 'fossilize'
+RbConfig::MAKEFILE_CONFIG['CC'] = 'gcc'
+RbConfig::MAKEFILE_CONFIG['CXX'] = 'g++'
+create_makefile 'fossilize/fossilize'

data/ext/fossilize/fossilize.c ADDED

@@ -0,0 +1,626 @@
+/*
+** Copyright (c) 2006 D. Richard Hipp
+**
+** This program is free software; you can redistribute it and/or
+** modify it under the terms of the Simplified BSD License (also
+** known as the "2-Clause License" or "FreeBSD License".)
+** This program is distributed in the hope that it will be useful,
+** but without any warranty; without even the implied warranty of
+** merchantability or fitness for a particular purpose.
+**
+** Author contact information:
+**   drh@hwaci.com
+**   http://www.hwaci.com/drh/
+**
+*******************************************************************************
+**
+** This module implements the delta compress algorithm.
+**
+** Though developed specifically for fossil, the code in this file
+** is generally applicable and is thus easily separated from the
+** fossil source code base.  Nothing in this file depends on anything
+** else in fossil.
+*/
+#include <stdio.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+/*
+** Macros for turning debugging printfs on and off
+*/
+#if 0
+# define DEBUG1(X) X
+#else
+# define DEBUG1(X)
+#endif
+#if 0
+#define DEBUG2(X) X
+/*
+** For debugging:
+** Print 16 characters of text from zBuf
+*/
+static const char *print16(const char *z){
+  int i;
+  static char zBuf[20];
+  for(i=0; i<16; i++){
+    if( z[i]>=0x20 && z[i]<=0x7e ){
+      zBuf[i] = z[i];
+    }else{
+      zBuf[i] = '.';
+    }
+  }
+  zBuf[i] = 0;
+  return zBuf;
+}
+#else
+# define DEBUG2(X)
+#endif
+// #if INTERFACE
+/*
+** The "u32" type must be an unsigned 32-bit integer.  Adjust this
+*/
+typedef unsigned int u32;
+/*
+** Must be a 16-bit value
+*/
+typedef short int s16;
+typedef unsigned short int u16;
+// #endif /* INTERFACE */
+/*
+** The width of a hash window in bytes.  The algorithm only works if this
+** is a power of 2.
+*/
+#define NHASH 16
+/*
+** The current state of the rolling hash.
+**
+** z[] holds the values that have been hashed.  z[] is a circular buffer.
+** z[i] is the first entry and z[(i+NHASH-1)%NHASH] is the last entry of
+** the window.
+**
+** Hash.a is the sum of all elements of hash.z[].  Hash.b is a weighted
+** sum.  Hash.b is z[i]*NHASH + z[i+1]*(NHASH-1) + ... + z[i+NHASH-1]*1.
+** (Each index for z[] should be modulo NHASH, of course.  The %NHASH operator
+** is omitted in the prior expression for brevity.)
+*/
+typedef struct hash hash;
+struct hash {
+  u16 a, b;         /* Hash values */
+  u16 i;            /* Start of the hash window */
+  char z[NHASH];    /* The values that have been hashed */
+};
+/*
+** Malloc and free wrappers; on failure they print "out of memory" and return NULL
+*/
+void *fossil_malloc(size_t n){
+  void *p = malloc(n==0 ? 1 : n);
+  if( p==0 ) printf("out of memory");
+  return p;
+}
+void fossil_free(void *p){
+  free(p);
+}
+void *fossil_realloc(void *p, size_t n){
+  p = realloc(p, n);
+  if( p==0 ) printf("out of memory");
+  return p;
+}
+/*
+** Initialize the rolling hash using the first NHASH characters of z[]
+*/
+static void hash_init(hash *pHash, const char *z){
+  u16 a, b, i;
+  a = b = 0;
+  for(i=0; i<NHASH; i++){
+    a += z[i];
+    b += (NHASH-i)*z[i];
+    pHash->z[i] = z[i];
+  }
+  pHash->a = a & 0xffff;
+  pHash->b = b & 0xffff;
+  pHash->i = 0;
+}
+/*
+** Advance the rolling hash by a single character "c"
+*/
+static void hash_next(hash *pHash, int c){
+  u16 old = pHash->z[pHash->i];
+  pHash->z[pHash->i] = c;
+  pHash->i = (pHash->i+1)&(NHASH-1);
+  pHash->a = pHash->a - old + c;
+  pHash->b = pHash->b - NHASH*old + pHash->a;
+}
+/*
+** Return a 32-bit hash value
+*/
+static u32 hash_32bit(hash *pHash){
+  return (pHash->a & 0xffff) | (((u32)(pHash->b & 0xffff))<<16);
+}
+/*
+** Write a base-64 integer into the given buffer.
+*/
+static void putInt(unsigned int v, char **pz){
+  static const char zDigits[] =
+    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~";
+  /*  123456789 123456789 123456789 123456789 123456789 123456789 123 */
+  int i, j;
+  char zBuf[20];
+  if( v==0 ){
+    *(*pz)++ = '0';
+    return;
+  }
+  for(i=0; v>0; i++, v>>=6){
+    zBuf[i] = zDigits[v&0x3f];
+  }
+  for(j=i-1; j>=0; j--){
+    *(*pz)++ = zBuf[j];
+  }
+}
+/*
+** Read bytes from *pz and convert them into a positive integer.  When
+** finished, leave *pz pointing to the first character past the end of
+** the integer.  The *pLen parameter holds the length of the string
+** in *pz and is decremented once for each character in the integer.
+*/
+static unsigned int getInt(const char **pz, int *pLen){
+  static const signed char zValue[] = {
+    -1, -1, -1, -1, -1, -1, -1, -1,   -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1,   -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1,   -1, -1, -1, -1, -1, -1, -1, -1,
+     0,  1,  2,  3,  4,  5,  6,  7,    8,  9, -1, -1, -1, -1, -1, -1,
+    -1, 10, 11, 12, 13, 14, 15, 16,   17, 18, 19, 20, 21, 22, 23, 24,
+    25, 26, 27, 28, 29, 30, 31, 32,   33, 34, 35, -1, -1, -1, -1, 36,
+    -1, 37, 38, 39, 40, 41, 42, 43,   44, 45, 46, 47, 48, 49, 50, 51,
+    52, 53, 54, 55, 56, 57, 58, 59,   60, 61, 62, -1, -1, -1, 63, -1,
+  };
+  unsigned int v = 0;
+  int c;
+  unsigned char *z = (unsigned char*)*pz;
+  unsigned char *zStart = z;
+  while( (c = zValue[0x7f&*(z++)])>=0 ){
+     v = (v<<6) + c;
+  }
+  z--;
+  *pLen -= z - zStart;
+  *pz = (char*)z;
+  return v;
+}
+/*
+** Return the number of digits in the base-64 representation of a positive integer
+*/
+static int digit_count(int v){
+  int i, x;
+  for(i=1, x=64; v>=x; i++, x <<= 6){}
+  return i;
+}
+/*
+** Compute a 32-bit checksum on the N-byte buffer.  Return the result.
+*/
+static unsigned int checksum(const char *zIn, size_t N){
+  const unsigned char *z = (const unsigned char *)zIn;
+  unsigned sum0 = 0;
+  unsigned sum1 = 0;
+  unsigned sum2 = 0;
+  unsigned sum3 = 0;
+  while(N >= 16){
+    sum0 += ((unsigned)z[0] + z[4] + z[8] + z[12]);
+    sum1 += ((unsigned)z[1] + z[5] + z[9] + z[13]);
+    sum2 += ((unsigned)z[2] + z[6] + z[10]+ z[14]);
+    sum3 += ((unsigned)z[3] + z[7] + z[11]+ z[15]);
+    z += 16;
+    N -= 16;
+  }
+  while(N >= 4){
+    sum0 += z[0];
+    sum1 += z[1];
+    sum2 += z[2];
+    sum3 += z[3];
+    z += 4;
+    N -= 4;
+  }
+  sum3 += (sum2 << 8) + (sum1 << 16) + (sum0 << 24);
+  switch(N){
+    case 3:   sum3 += (z[2] << 8);
+    case 2:   sum3 += (z[1] << 16);
+    case 1:   sum3 += (z[0] << 24);
+    default:  ;
+  }
+  return sum3;
+}
+/*
+** Create a new delta.
+**
+** The delta is written into a preallocated buffer, zDelta, which
+** should be at least 60 bytes longer than the target file, zOut.
+** The delta string will be NUL-terminated, but it might also contain
+** embedded NUL characters if either the zSrc or zOut files are
+** binary.  This function returns the length of the delta string
+** in bytes, excluding the final NUL terminator character.
+**
+** Output Format:
+**
+** The delta begins with a base64 number followed by a newline.  This
+** number is the number of bytes in the TARGET file.  Thus, given a
+** delta file z, a program can compute the size of the output file
+** simply by reading the first line and decoding the base-64 number
+** found there.  The delta_output_size() routine does exactly this.
+**
+** After the initial size number, the delta consists of a series of
+** literal text segments and commands to copy from the SOURCE file.
+** A copy command looks like this:
+**
+**     NNN@MMM,
+**
+** where NNN is the number of bytes to be copied and MMM is the offset
+** into the source file of the first byte (both base-64).   If NNN is 0
+** it means copy the rest of the input file.  Literal text is like this:
+**
+**     NNN:TTTTT
+**
+** where NNN is the number of bytes of text (base-64) and TTTTT is the text.
+**
+** The last term is of the form
+**
+**     NNN;
+**
+** In this case, NNN is a 32-bit bigendian checksum of the output file
+** that can be used to verify that the delta applied correctly.  All
+** numbers are in base-64.
+**
+** Pure text files generate a pure text delta.  Binary files generate a
+** delta that may contain some binary data.
+**
+** Algorithm:
+**
+** The encoder first builds a hash table to help it find matching
+** patterns in the source file.  16-byte chunks of the source file
+** sampled at evenly spaced intervals are used to populate the hash
+** table.
+**
+** Next we begin scanning the target file using a sliding 16-byte
+** window.  The hash of the 16-byte window in the target is used to
+** search for a matching section in the source file.  When a match
+** is found, a copy command is added to the delta.  An effort is
+** made to extend the matching section to regions that come before
+** and after the 16-byte hash window.  A copy command is only issued
+** if the result would use less space than just quoting the text
+** literally. Literal text is added to the delta for sections that
+** do not match or which can not be encoded efficiently using copy
+** commands.
+*/
+int delta_create(
+  const char *zSrc,      /* The source or pattern file */
+  unsigned int lenSrc,   /* Length of the source file */
+  const char *zOut,      /* The target file */
+  unsigned int lenOut,   /* Length of the target file */
+  char *zDelta           /* Write the delta into this buffer */
+){
+  int i, base;
+  char *zOrigDelta = zDelta;
+  hash h;
+  int nHash;                 /* Number of hash table entries */
+  int *landmark;             /* Primary hash table */
+  int *collide;              /* Collision chain */
+  int lastRead = -1;         /* Last byte of zSrc read by a COPY command */
+  /* Add the target file size to the beginning of the delta
+  */
+  putInt(lenOut, &zDelta);
+  *(zDelta++) = '\n';
+  /* If the source file is very small, it means that we have no
+  ** chance of ever doing a copy command.  Just output a single
+  ** literal segment for the entire target and exit.
+  */
+  if( lenSrc<=NHASH ){
+    putInt(lenOut, &zDelta);
+    *(zDelta++) = ':';
+    memcpy(zDelta, zOut, lenOut);
+    zDelta += lenOut;
+    putInt(checksum(zOut, lenOut), &zDelta);
+    *(zDelta++) = ';';
+    return zDelta - zOrigDelta;
+  }
+  /* Compute the hash table used to locate matching sections in the
+  ** source file.
+  */
+  nHash = lenSrc/NHASH;
+  collide = (int*)fossil_malloc( nHash*2*sizeof(int) );
+  landmark = &collide[nHash];
+  memset(landmark, -1, nHash*sizeof(int));
+  memset(collide, -1, nHash*sizeof(int));
+  for(i=0; i<lenSrc-NHASH; i+=NHASH){
+    int hv;
+    hash_init(&h, &zSrc[i]);
+    hv = hash_32bit(&h) % nHash;
+    collide[i/NHASH] = landmark[hv];
+    landmark[hv] = i/NHASH;
+  }
+  /* Begin scanning the target file and generating copy commands and
+  ** literal sections of the delta.
+  */
+  base = 0;    /* We have already generated everything before zOut[base] */
+  while( base+NHASH<lenOut ){
+    int iSrc, iBlock;
+    unsigned int bestCnt, bestOfst=0, bestLitsz=0;
+    hash_init(&h, &zOut[base]);
+    i = 0;     /* Trying to match a landmark against zOut[base+i] */
+    bestCnt = 0;
+    while( 1 ){
+      int hv;
+      int limit = 250;
+      hv = hash_32bit(&h) % nHash;
+      DEBUG2( printf("LOOKING: %4d [%s]\n", base+i, print16(&zOut[base+i])); )
+      iBlock = landmark[hv];
+      while( iBlock>=0 && (limit--)>0 ){
+        /*
+        ** The hash window has identified a potential match against
+        ** landmark block iBlock.  But we need to investigate further.
+        **
+        ** Look for a region in zOut that matches zSrc. Anchor the search
+        ** at zSrc[iSrc] and zOut[base+i].  Do not include anything prior to
+        ** zOut[base] or after zOut[outLen] nor anything after zSrc[srcLen].
+        **
+        ** Set cnt equal to the length of the match and set ofst so that
+        ** zSrc[ofst] is the first element of the match.  litsz is the number
+        ** of characters between zOut[base] and the beginning of the match.
+        ** sz will be the overhead (in bytes) needed to encode the copy
+        ** command.  Only generate copy command if the overhead of the
+        ** copy command is less than the amount of literal text to be copied.
+        */
+        int cnt, ofst, litsz;
+        int j, k, x, y;
+        int sz;
+        /* Beginning at iSrc, match forwards as far as we can.  j counts
+        ** the number of characters that match */
+        iSrc = iBlock*NHASH;
+        for(j=0, x=iSrc, y=base+i; x<lenSrc && y<lenOut; j++, x++, y++){
+          if( zSrc[x]!=zOut[y] ) break;
+        }
+        j--;
+        /* Beginning at iSrc-1, match backwards as far as we can.  k counts
+        ** the number of characters that match */
+        for(k=1; k<iSrc && k<=i; k++){
+          if( zSrc[iSrc-k]!=zOut[base+i-k] ) break;
+        }
+        k--;
+        /* Compute the offset and size of the matching region */
+        ofst = iSrc-k;
+        cnt = j+k+1;
+        litsz = i-k;  /* Number of bytes of literal text before the copy */
+        DEBUG2( printf("MATCH %d bytes at %d: [%s] litsz=%d\n",
+                        cnt, ofst, print16(&zSrc[ofst]), litsz); )
+        /* sz will hold the number of bytes needed to encode the "insert"
+        ** command and the copy command, not counting the "insert" text */
+        sz = digit_count(i-k)+digit_count(cnt)+digit_count(ofst)+3;
+        if( cnt>=sz && cnt>bestCnt ){
+          /* Remember this match only if it is the best so far and it
+          ** does not increase the file size */
+          bestCnt = cnt;
+          bestOfst = iSrc-k;
+          bestLitsz = litsz;
+          DEBUG2( printf("... BEST SO FAR\n"); )
+        }
+        /* Check the next matching block */
+        iBlock = collide[iBlock];
+      }
+      /* We have a copy command that does not cause the delta to be larger
+      ** than a literal insert.  So add the copy command to the delta.
+      */
+      if( bestCnt>0 ){
+        if( bestLitsz>0 ){
+          /* Add an insert command before the copy */
+          putInt(bestLitsz,&zDelta);
+          *(zDelta++) = ':';
+          memcpy(zDelta, &zOut[base], bestLitsz);
+          zDelta += bestLitsz;
+          base += bestLitsz;
+          DEBUG2( printf("insert %d\n", bestLitsz); )
+        }
+        base += bestCnt;
+        putInt(bestCnt, &zDelta);
+        *(zDelta++) = '@';
+        putInt(bestOfst, &zDelta);
+        DEBUG2( printf("copy %d bytes from %d\n", bestCnt, bestOfst); )
+        *(zDelta++) = ',';
+        if( bestOfst + bestCnt -1 > lastRead ){
+          lastRead = bestOfst + bestCnt - 1;
+          DEBUG2( printf("lastRead becomes %d\n", lastRead); )
+        }
+        bestCnt = 0;
+        break;
+      }
+      /* If we reach this point, it means no match is found so far */
+      if( base+i+NHASH>=lenOut ){
+        /* We have reached the end of the file and have not found any
+        ** matches.  Do an "insert" for everything that does not match */
+        putInt(lenOut-base, &zDelta);
+        *(zDelta++) = ':';
+        memcpy(zDelta, &zOut[base], lenOut-base);
+        zDelta += lenOut-base;
+        base = lenOut;
+        break;
+      }
+      /* Advance the hash by one character.  Keep looking for a match */
+      hash_next(&h, zOut[base+i+NHASH]);
+      i++;
+    }
+  }
+  /* Output a final "insert" record to get all the text at the end of
+  ** the file that does not match anything in the source file.
+  */
+  if( base<lenOut ){
+    putInt(lenOut-base, &zDelta);
+    *(zDelta++) = ':';
+    memcpy(zDelta, &zOut[base], lenOut-base);
+    zDelta += lenOut-base;
+  }
+  /* Output the final checksum record. */
+  putInt(checksum(zOut, lenOut), &zDelta);
+  *(zDelta++) = ';';
+  free(collide);
+  return zDelta - zOrigDelta;
+}
+/*
+** Return the size (in bytes) of the output from applying
+** a delta.
+**
+** This routine is provided so that a procedure that is able
+** to call delta_apply() can learn how much space is required
+** for the output and hence allocate no more space than is really
+** needed.
+*/
+int delta_output_size(const char *zDelta, int lenDelta){
+  int size;
+  size = getInt(&zDelta, &lenDelta);
+  if( *zDelta!='\n' ){
+    /* ERROR: size integer not terminated by "\n" */
+    return -1;
+  }
+  return size;
+}
+/*
+** Apply a delta.
+**
+** The output buffer should be big enough to hold the whole output
+** file and a NUL terminator at the end.  The delta_output_size()
+** routine will determine this size for you.
+**
+** The delta string should be null-terminated.  But the delta string
+** may contain embedded NUL characters (if the input and output are
+** binary files) so we also have to pass in the length of the delta in
+** the lenDelta parameter.
+**
+** This function returns the size of the output file in bytes (excluding
+** the final NUL terminator character).  Except, if the delta string is
+** malformed or intended for use with a source file other than zSrc,
+** then this routine returns -1.
+**
+** Refer to the delta_create() documentation above for a description
+** of the delta file format.
+*/
+int delta_apply(
+  const char *zSrc,      /* The source or pattern file */
+  int lenSrc,            /* Length of the source file */
+  const char *zDelta,    /* Delta to apply to the pattern */
+  int lenDelta,          /* Length of the delta */
+  char *zOut             /* Write the output into this preallocated buffer */
+){
+  unsigned int limit;
+  unsigned int total = 0;
+#ifndef FOSSIL_OMIT_DELTA_CKSUM_TEST
+  char *zOrigOut = zOut;
+#endif
+  limit = getInt(&zDelta, &lenDelta);
+  if( *zDelta!='\n' ){
+    /* ERROR: size integer not terminated by "\n" */
+    printf("ERROR: size integer not terminated by \"\\n\"\n");
+    return -1;
+  }
+  zDelta++; lenDelta--;
+  while( *zDelta && lenDelta>0 ){
+    unsigned int cnt, ofst;
+    cnt = getInt(&zDelta, &lenDelta);
+    switch( zDelta[0] ){
+      case '@': {
+        zDelta++; lenDelta--;
+        ofst = getInt(&zDelta, &lenDelta);
+        if( lenDelta>0 && zDelta[0]!=',' ){
+          /* ERROR: copy command not terminated by ',' */
+          printf("ERROR: copy command not terminated by ','\n");
+          return -1;
+        }
+        zDelta++; lenDelta--;
+        DEBUG1( printf("COPY %d from %d\n", cnt, ofst); )
+        total += cnt;
+        if( total>limit ){
+          /* ERROR: copy exceeds output file size */
+          printf("ERROR: copy exceeds output file size\n");
+          return -1;
+        }
+        if( ofst+cnt > lenSrc ){
+          /* ERROR: copy extends past end of input */
+          printf("ERROR: copy extends past end of input: ofst: %d, cnt: %d, lenSrc: %d \n", ofst, cnt, lenSrc);
+          return -1;
+        }
+        memcpy(zOut, &zSrc[ofst], cnt);
+        zOut += cnt;
+        break;
+      }
+      case ':': {
+        zDelta++; lenDelta--;
+        total += cnt;
+        if( total>limit ){
+          /* ERROR:  insert command gives an output larger than predicted */
+          printf("ERROR:  insert command gives an output larger than predicted\n");
+          return -1;
+        }
+        DEBUG1( printf("INSERT %d\n", cnt); )
+        if( cnt>lenDelta ){
+          /* ERROR: insert count exceeds size of delta */
+          printf("ERROR: insert count exceeds size of delta\n");
+          return -1;
+        }
+        memcpy(zOut, zDelta, cnt);
+        zOut += cnt;
+        zDelta += cnt;
+        lenDelta -= cnt;
+        break;
+      }
+      case ';': {
+        zDelta++; lenDelta--;
+        zOut[0] = 0;
+#ifndef FOSSIL_OMIT_DELTA_CKSUM_TEST
+        if( cnt!=checksum(zOrigOut, total) ){
+          /* ERROR:  bad checksum */
+          return -1;
+        }
+#endif
+        if( total!=limit ){
+          /* ERROR: generated size does not match predicted size */
+          printf("ERROR: generated size does not match predicted size\n");
+          return -1;
+        }
+        return total;
+      }
+      default: {
+        /* ERROR: unknown delta operator */
+        printf("ERROR: unknown delta operator\n");
+        return -1;
+      }
+    }
+  }
+  /* ERROR: unterminated delta */
+  printf("ERROR: unterminated delta\n");
+  return -1;
+}
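The control flow of `delta_apply()` above can be followed more easily in a high-level language. What follows is a rough pure-Ruby sketch of the same loop, not the gem's implementation: `apply_delta` and `DIGIT_VALUES` are hypothetical names, the function returns nil where the C code returns -1, and the trailing checksum is read but deliberately not verified here.

```ruby
# Byte value => digit value, built from the zDigits alphabet in fossilize.c.
DIGIT_VALUES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
               .each_byte.with_index.to_h

# Hypothetical counterpart of delta_apply(); nil signals a malformed delta.
def apply_delta(source, delta)
  src = source.b                     # work on raw bytes, as the C code does
  data = delta.b
  pos = 0
  read_int = lambda do               # same role as getInt()
    value = 0
    while (digit = DIGIT_VALUES[data.getbyte(pos)])
      value = (value << 6) + digit
      pos += 1
    end
    value
  end

  limit = read_int.call              # first line: size of the target
  return nil unless data[pos] == "\n"
  pos += 1
  out = "".b
  while pos < data.size
    count = read_int.call
    op = data[pos]
    pos += 1
    case op
    when "@"                         # copy count bytes from the source
      offset = read_int.call
      return nil unless data[pos] == ","
      pos += 1
      return nil if offset + count > src.size
      out << src[offset, count]
    when ":"                         # insert count literal bytes
      return nil if pos + count > data.size
      out << data[pos, count]
      pos += count
    when ";"                         # checksum record ends the delta
      return out.size == limit ? out : nil
    else
      return nil                     # unknown delta operator
    end
    return nil if out.size > limit   # output larger than predicted
  end
  nil                                # unterminated delta
end

result = apply_delta("xiy needs to get a job!",
                     "_\n6:maybe J@0,B:*real* job!1rx1Az;")
```

The example call uses the source string and delta from the README. The result is a binary-encoded String, which mirrors the byte-oriented C code.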

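The `checksum()` routine in fossilize.c amounts to adding up the buffer as big-endian 32-bit words, with the last partial word zero-padded, modulo 2^32. A short Ruby sketch of that reading is given below; `fossil_checksum` is a hypothetical name and the code is an illustration rather than part of the gem.

```ruby
# Hypothetical pure-Ruby reading of checksum() in fossilize.c.
def fossil_checksum(data)
  bytes = data.b.bytes
  full = bytes.size - bytes.size % 4          # bytes covered by whole words
  sums = [0, 0, 0, 0]                         # sum0..sum3 in the C code
  bytes.first(full).each_with_index { |byte, i| sums[i % 4] += byte }
  total = sums[3] + (sums[2] << 8) + (sums[1] << 16) + (sums[0] << 24)
  # The 1 to 3 leftover bytes fill the high end of one final word.
  bytes.drop(full).each_with_index { |byte, i| total += byte << (24 - 8 * i) }
  total & 0xffffffff                          # emulate unsigned 32-bit wraparound
end

readme_sum = fossil_checksum("maybe xiy needs to get a *real* job!")
```

For the README's target string (with the literal `*real*` that appears in its delta) this yields 0x76F012BE, which is the value the final `1rx1Az;` record of that delta decodes to.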
data/fossilize.gemspec ADDED

@@ -0,0 +1,27 @@
+# -*- encoding: utf-8 -*-
+require File.expand_path('../lib/fossilize/version', __FILE__)
+Gem::Specification.new do |gem|
+  gem.authors       = ["Mark Anthony Gibbins"]
+  gem.email         = ["xiy3x0@gmail.com"]
+  gem.description   = %q{A ruby extension to the Fossil delta compression algorithm written
+                      by D. Richard Hipp for the Fossil SCM project.}
+  gem.summary       = %q{Delta compression for Ruby using the Fossil delta
+                      compression algorithm.}
+  gem.homepage      = "http://github.com/xiy/fossilize"
+  gem.files         = `git ls-files`.split($\)
+  gem.executables   = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+  gem.test_files    = gem.files.grep(%r{^(test|spec|features)/})
+  gem.name          = "fossilize"
+  gem.require_paths = ["lib"]
+  gem.version       = Fossilize::VERSION
+  gem.extensions = ['ext/fossilize/extconf.rb']
+  gem.add_runtime_dependency('ffi')
+  gem.add_runtime_dependency('digest-crc')
+  gem.add_development_dependency('rspec', '2.8.0')
+  gem.add_development_dependency('tomdoc')
+end

data/lib/fossilize/delta.rb ADDED

@@ -0,0 +1,13 @@
+require 'digest-crc/crc32'
+module Fossilize
+  module Delta
+    attr_accessor :origin, :target, :chunks
+    NHASH = 16
+    def self.process_origin(origin)
+    end
+  end
+end

data/lib/fossilize/ring_buffer.rb ADDED

@@ -0,0 +1,27 @@
+module Fossilize
+  class RingBuffer < Array
+    alias_method :array_push, :push
+    alias_method :array_element, :[]
+    def initialize(size)
+      @ring_size = size
+      super(size)
+    end
+    def push(element)
+      if length == @ring_size
+        shift # lose the oldest element
+      end
+      array_push element
+    end
+    # Access elements in the RingBuffer
+    #
+    # offset will be typically negative!
+    #
+    def [](offset = 0)
+      return self.array_element(- 1 + offset)
+    end
+  end
+end

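The `NHASH` constant in delta.rb and the ring buffer above match the circular `z[]` window of the rolling hash in fossilize.c, which suggests (this is an assumption) that a pure-Ruby port was planned. Here is one possible sketch of that rolling hash. `RollingHash` and `roll` are hypothetical names, and bytes are treated as unsigned, whereas the C code stores them in a plain `char` array.

```ruby
NHASH = 16 # window width in bytes; must be a power of 2

# Hypothetical Ruby port of hash_init / hash_next / hash_32bit.
class RollingHash
  def initialize(window)
    @z = window.bytes.first(NHASH)            # circular buffer of hashed bytes
    raise ArgumentError, "need #{NHASH} bytes" unless @z.size == NHASH
    @i = 0                                    # index of the oldest byte
    @a = @z.sum & 0xffff                      # plain sum of the window
    weighted = @z.each_with_index.map { |byte, idx| (NHASH - idx) * byte }
    @b = weighted.sum & 0xffff                # weighted sum of the window
  end

  # Slide the window forward by one byte.
  def roll(byte)
    old = @z[@i]
    @z[@i] = byte
    @i = (@i + 1) & (NHASH - 1)
    @a = (@a - old + byte) & 0xffff
    @b = (@b - NHASH * old + @a) & 0xffff     # uses the updated @a, as in C
    self
  end

  # Combine both 16-bit halves into one 32-bit value.
  def value
    @a | (@b << 16)
  end
end

text = "The quick brown fox jumps over the lazy dog"
hash = RollingHash.new(text[0, NHASH])
hash.roll(text.getbyte(NHASH))                # window is now text[1, 16]
```

The point of the update rule is that rolling one byte costs constant time yet produces the same value as hashing the shifted window from scratch, which is what lets the encoder test every position of the target cheaply.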
data/lib/fossilize/version.rb ADDED

@@ -0,0 +1,3 @@
+module Fossilize
+  VERSION = "1.0.0"
+end

data/lib/fossilize.rb ADDED

@@ -0,0 +1,161 @@
+require "fossilize/version"
+require "ffi"
+# Access
+#   Public Module
+#
+# Summary
+#   Provides an interface through Ruby-FFI to the delta encoding algorithm
+#   written for the Fossil SCM project by D. Richard Hipp. All methods are module methods.
+#
+# Examples
+#
+#   delta = Fossilize.create(file1, file2)
+#   output = Fossilize.apply(file1, delta)
+#
+module Fossilize
+  extend FFI::Library
+  ffi_lib File.expand_path("../../ext/fossilize/fossilize.#{RbConfig::CONFIG['DLEXT']}", __FILE__)
+  attach_function :delta_create, [:pointer, :int, :pointer, :int, :pointer], :int
+  attach_function :delta_output_size, [:pointer, :int], :int
+  attach_function :delta_apply, [:pointer, :int, :pointer, :int, :pointer], :int
+  # Access
+  #   Public Module Method
+  #
+  # Summary
+  #   Creates a delta of two strings using the Fossil delta encoding algorithm.
+  #
+  # Parameters
+  #   old - The old string.
+  #   new - The new string to create the delta from.
+  #
+  # Returns a String that represents the deltaed differences between the two Strings.
+  #
+  # Examples
+  #
+  #   # Create the delta between two strings
+  #   Fossilize.create("Hello World!", "Hello Everyone!")
+  #
+  #   # Create the delta between two files (note the passing of a File object)
+  #   source = File.new("README.md", "r")
+  #   target = File.new("README_new.md", "r")
+  #   Fossilize.create(source, target)
+  #
+  #   # You can also create a delta between a file and a string (the arguments are interchangeable)
+  #   Fossilize.create(source, "This is the new README for Fossilize!")
+  #
+  def self.create(source, target)
+    # Because this method can accept three different types of parameter (path, String or File)
+    # we need to do a sanity check on the input parameters.
+    source_string = check_input(source)
+    target_string = check_input(target)
+    # Create native memory buffers for the input Strings (sized in bytes, not characters).
+    source_ptr = FFI::MemoryPointer.new(:char, source_string.bytesize)
+    source_ptr.put_bytes(0, source_string)
+    target_ptr = FFI::MemoryPointer.new(:char, target_string.bytesize)
+    target_ptr.put_bytes(0, target_string)
+    # Create a bare string to hold the delta returned from the C function that's the
+    # size of the target + 60 (according to the Fossil source docs).
+    delta = (' ' * (target_string.bytesize + 60))
+    # Create the delta, retaining the size of the delta output, and return only the
+    # bytes the C function reported writing (strip! returns nil when nothing is stripped).
+    delta_size = delta_create(source_ptr, source_ptr.size, target_ptr, target_ptr.size, delta)
+    return delta.byteslice(0, delta_size)
+  end
+  # Access
+  #   Public Module Method
+  #
+  # Summary
+  #   Applies a delta string to another string.
+  #
+  # Parameters
+  #   source - The old string to apply the delta string to.
+  #   delta - The delta string created using *create*.
+  #
+  # Returns a new unified string created by applying the delta to the source
+  # if successful. The algorithm returns -1 as the output_size if the delta was
+  # not created from the given source or is malformed. In this case, this method raises MalformedDeltaError.
+  #
+  # Examples
+  #
+  #   # original is a String or File object
+  #   result = Fossilize.apply(original, delta)
+  #
+  def self.apply(source, delta)
+    # Check the input types first
+    source_string = check_input(source)
+    delta_string = check_input(delta)
+    # Get the eventual size of the deltaed file and create a string to hold it
+    expected_output_size = delta_output_size(delta_string, delta_string.size)
+    puts "expected = #{expected_output_size}"
+    # The algorithm will return -1 as the output size if there was an error
+    if expected_output_size == -1
+      raise MalformedDeltaError, "Was this delta intended for this string/file?"
+      return nil
+    end
+    # Create an empty string that is at-least the output_size given by *delta_output_size*
+    output = "\0" * expected_output_size
+    # Apply the delta to the old file to produce the merged result
+    output_size = delta_apply(source_string, source_string.size, delta_string, delta_string.size, output)
+    if output_size != expected_output_size
+      raise DeltaApplicationError,
+      "Output was #{output_size}, but I expected #{expected_output_size}!"
+      return nil
+    end
+    return output.byteslice(0, output_size)
+  end
+  private
+  # Access
+  #   Private Module Method
+  #
+  # Summary
+  #   Checks that the input given to the delta methods is sane, i.e. whether it can be passed
+  #   to the C extension without any problems. The algorithm itself expects a String no matter what.
+  #
+  # Parameters
+  #   input - The input to perform the sanity check on.
+  #
+  # Returns a String as read from a File object (either through a path or a File object) or a
+  # direct String as passed to the calling method.
+  def self.check_input(input)
+    raise ArgumentError, "source or target input was nil!" if input.nil?
+    raise ArgumentError,
+      "Only Strings (including file paths) and File objects can be used to create deltas." if
+      (!input.instance_of? String and !input.instance_of? File)
+    # Now we know the input is a valid type, we need to check exactly what type it is.
+    # We know straight away if the input is a File object, just read from it.
+    if input.instance_of? File
+      input_string = input.read
+    elsif input.instance_of? String
+      # We can determine quickly if the input is a path to a file or merely a String object
+      # by checking if it actually exists. Sneaky.
+      if File.exist?(input)
+        input_string = File.read(input)
+      else
+        return input
+      end
+    end
+    return input_string
+  end
+end
+class MalformedDeltaError < StandardError; end
+class DeltaApplicationError < StandardError; end
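
The `create` and `apply` methods above both hand an over-allocated String to a C function and then trim it. The following is a minimal pure-Ruby sketch of why trimming by the reported byte count tends to be safer than `String#strip!`. `fake_native_write`, `read_by_size` and `read_by_strip` are hypothetical names used only for illustration; they are not part of fossilize and no native library is loaded.

```ruby
# Hypothetical stand-in for a native function: copies +payload+ into the
# caller-supplied +buffer+ and returns the number of bytes written.
# (ASCII payloads only, so character and byte offsets coincide.)
def fake_native_write(buffer, payload)
  buffer[0, payload.bytesize] = payload
  payload.bytesize
end

# Trim an over-allocated buffer using the byte count the call reported.
def read_by_size(payload, slack)
  buffer = " " * (payload.bytesize + slack)
  written = fake_native_write(buffer, payload)
  buffer.byteslice(0, written)
end

# Trim the same kind of buffer with String#strip! instead.
def read_by_strip(payload, slack)
  buffer = " " * (payload.bytesize + slack)
  fake_native_write(buffer, payload)
  buffer.strip!
end

exact   = read_by_size("text \n", 60)   # keeps the payload's own trailing whitespace
nothing = read_by_strip("delta;", 0)    # nil, because strip! had nothing to remove
lossy   = read_by_strip("text \n", 60)  # "text", the payload's whitespace is lost
```

`String#strip!` returns `nil` when it changes nothing and also removes whitespace that belongs to the payload, so a size-based slice is the more predictable choice whenever the native call reports how much it wrote.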

data/spec/fossilize_spec.rb ADDED

@@ -0,0 +1,35 @@
+require "spec_helper"
+require "fossilize"
+describe Fossilize do
+  describe "#create" do
+    it "should return a valid delta string from two String arguments" do
+      Fossilize.create("Test", "Test String").should == "B\nB:Test String3U9pwb;"
+    end
+    it "should return a valid delta string from two File arguments" do
+      f1 = File.open('Gemfile')
+      f2 = File.open('Rakefile')
+      Fossilize.create(f1, f2).should ==
+        "l\nl:#!/usr/bin/env rake\nrequire \"bundler/gem_tasks\"\nhrAtN;"
+    end
+    it "should return a valid delta when the arguments contain a file path" do
+      puts Fossilize.create('README.md', 'ext/fossil_delta/extconf.rb')
+    end
+    it "should return a valid delta when the arguments are mixed" do
+      puts Fossilize.create('/Users/xiy/.zshrc', 'Is this in the README?')
+    end
+  end
+  describe "#check_input" do
+    it "should raise an ArgumentError if the input is an invalid type" do
+      expect { Fossilize.check_input(1) }.to raise_error ArgumentError
+    end
+    it "should raise an ArgumentError if the input is nil" do
+      expect { Fossilize.check_input(nil) }.to raise_error ArgumentError
+    end
+  end
+end
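
The path and mixed-argument examples in the spec above only print their result and depend on files present on the author's machine. As a rough sketch, the same input dispatch can be exercised without the native extension by using a temporary file. `normalize_input` below is a hypothetical standalone helper that mirrors the behaviour described for `Fossilize.check_input`; it is not the gem's code, and it deliberately uses `File.file?` so that directory paths are treated as plain strings.

```ruby
require "tempfile"

# Hypothetical helper (an assumption for illustration): returns the String
# that would be handed to the delta functions.
def normalize_input(input)
  case input
  when File
    input.read
  when String
    # A String naming an existing regular file is read from disk;
    # anything else is used as literal content.
    File.file?(input) ? File.read(input) : input
  else
    raise ArgumentError, "expected a String or File, got #{input.class}"
  end
end

Tempfile.create("fossilize-sketch") do |tmp|
  tmp.write("from disk")
  tmp.flush
  from_path = normalize_input(tmp.path)                       # file contents
  from_file = File.open(tmp.path) { |f| normalize_input(f) }  # same contents
end
literal = normalize_input("plain text")                       # returned unchanged
```

Because a String is treated as a path whenever a matching file exists, callers who need literal content that happens to look like a path would have to pass it another way; that ambiguity is inherent to this style of dispatch.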

data/spec/spec_helper.rb ADDED

@@ -0,0 +1,7 @@
+require 'rspec/expectations'
+RSpec.configure do |config|
+  config.treat_symbols_as_metadata_keys_with_true_values = true
+  config.run_all_when_everything_filtered = true
+  config.filter_run :focus
+end

metadata ADDED

@@ -0,0 +1,129 @@
+--- !ruby/object:Gem::Specification
+name: fossilize
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+  prerelease:
+platform: ruby
+authors:
+- Mark Anthony Gibbins
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2012-10-15 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: ffi
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: digest-crc
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - '='
+      - !ruby/object:Gem::Version
+        version: 2.8.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - '='
+      - !ruby/object:Gem::Version
+        version: 2.8.0
+- !ruby/object:Gem::Dependency
+  name: tomdoc
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+description: ! "A ruby extension to the Fossil delta compression algorithm written\n
+  \                     by D. Richard Hipp for the Fossil SCM project."
+email:
+- xiy3x0@gmail.com
+executables:
+- fossilize
+extensions:
+- ext/fossilize/extconf.rb
+extra_rdoc_files: []
+files:
+- .gitignore
+- Gemfile
+- LICENSE
+- README.md
+- Rakefile
+- bin/fossilize
+- ext/fossilize/extconf.rb
+- ext/fossilize/fossilize.c
+- fossilize.gemspec
+- lib/fossilize.rb
+- lib/fossilize/delta.rb
+- lib/fossilize/ring_buffer.rb
+- lib/fossilize/version.rb
+- spec/fossilize_spec.rb
+- spec/spec_helper.rb
+homepage: http://github.com/xiy/fossilize
+licenses: []
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ! '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 1.8.24
+signing_key:
+specification_version: 3
+summary: Delta compression for Ruby using the Fossil delta compression algorithm.
+test_files:
+- spec/fossilize_spec.rb
+- spec/spec_helper.rb