RubyGems - jaro_winkler - Versions diffs - 1.3.2.beta → 1.3.2.beta2 - Mend

jaro_winkler 1.3.2.beta → 1.3.2.beta2

Files changed (17)

checksums.yaml +4 -4
data/README.md +15 -4
data/Rakefile +1 -1
data/benchmark/native.txt +9 -9
data/ext/jaro_winkler/adj_matrix.c +25 -2
data/ext/jaro_winkler/adj_matrix.h +5 -4
data/ext/jaro_winkler/codepoints.c +29 -0
data/ext/jaro_winkler/codepoints.h +17 -0
data/ext/jaro_winkler/distance.c +12 -77
data/ext/jaro_winkler/distance.h +1 -1
data/ext/jaro_winkler/extconf.rb +1 -1
data/ext/jaro_winkler/jaro_winkler.c +5 -5
data/ext/jaro_winkler/murmur_hash2.c +64 -64
data/jaro_winkler.gemspec +4 -0
data/lib/jaro_winkler/version.rb +1 -1
data/spec/jaro_winkler_spec.rb +17 -16
metadata +61 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 9a90ee9e013479d3c47c7cd632646921f87af4fc
-  data.tar.gz: 30d065fc0728d3fda6db84af71584f7a60b038cb
+  metadata.gz: 6e3750b9af1a515ee78a01b77192fd9eb2697f56
+  data.tar.gz: c1473c5a327eda3fc5973935dca50f584399d5d0
 SHA512:
-  metadata.gz: d29510e81e2ab5510a85360321e77444363531a2786c5f5dd6213514c0f5d97232ac5d19b25eb0fbd3bc9972dc255e326b7f600c2254cdb2fe6ed7be20cd76e9
-  data.tar.gz: 5cbb9e3167a42f86ecd6b93dfd9fe21aca7e599e7faf492e9c8f59b109958b157faff0601f1b8078ee6b3a0e92165eda9cab36d6d8eb9d7c5935f3828da18d74
+  metadata.gz: 615204f1ab3906d01e44d92b685751004887c339bee5fc35633c26f928b2d7e3d07159085b7122683eba21b3c56cabfdbf28687a7d95aa3815445693a9a9979a
+  data.tar.gz: 63601a6358c4a600c3ef258f465b38944ce346e48720613342da7b258663fefc1d51df984e2bab97b245c6e0ef0ecfcaf76ff3ad5b3b0281a9bd748b4f7d3951

data/README.md CHANGED

@@ -33,7 +33,9 @@ weight      | number  | 0.1     | A constant scaling factor for how much the sco
 threshold   | number  | 0.7     | The prefix bonus is only added when the compared strings have a Jaro distance above the threshold.
 adj_table   | boolean | false   | The option is used to give partial credit for characters that may be errors due to known phonetic or character recognition errors. A typical example is to match the letter "O" with the number "0".
-## Default Adjusting Table
+# About Adjusting Table
+## Default Table
 ```
 ['A', 'E'], ['A', 'I'], ['A', 'O'], ['A', 'U'], ['B', 'V'], ['E', 'I'], ['E', 'O'], ['E', 'U'], ['I', 'O'], ['I', 'U'],
@@ -42,9 +44,9 @@ adj_table   | boolean | false   | The option is used to give partial credit for
 ['1', 'I'], ['1', 'L'], ['0', 'O'], ['0', 'Q'], ['C', 'K'], ['G', 'J'], ['E', ' '], ['Y', ' '], ['S', ' ']
 ```
-## How Adjusting Table Work
+## How it works?
-origin formula:
+Original Formula:
 ![origin](https://chart.googleapis.com/chart?cht=tx&chs&chl=%5Cbegin%7Bcases%7D0%26%7B%5Ctext%7Bif%20%7Dm%3D0%7D%5C%5C%5Cfrac%7B1%7D%7B3%7D(%5Cfrac%7Bm%7D%7B%5Cleft%7Cs1%5Cright%7C%7D%2B%5Cfrac%7Bm%7D%7B%5Cleft%7Cs2%5Cright%7C%7D%2B%5Cfrac%7Bm-t%7D%7Bm%7D)%26%5Ctext%7Bothers%7D%5Cend%7Bcases%7D)
@@ -53,7 +55,7 @@ where
 - `m` is the number of matching characters.
 - `t` is half the number of transpositions.
-with adjusting table:
+With Adjusting Table:
 ![adj](https://chart.googleapis.com/chart?cht=tx&chs&chl=%5Cbegin%7Bcases%7D0%26%5Ctext%7Bif%20%7Dm%3D0%5C%5C%5Cfrac%7B1%7D%7B3%7D(%5Cfrac%7B%5Cfrac%7Bs%7D%7B10%7D%2Bm%7D%7B%5Cleft%7Cs1%5Cright%7C%7D%2B%5Cfrac%7B%5Cfrac%7Bs%7D%7B10%7D%2Bm%7D%7B%5Cleft%7Cs2%5Cright%7C%7D%2B%5Cfrac%7Bm-t%7D%7Bm%7D)%26%5Ctext%7Bothers%7D%5Cend%7Bcases%7D)
@@ -61,6 +63,15 @@ where
 - `s` is the number of nonmatching but similar characters.
+## Difference Between v1.3.1 And v1.3.2.beta
+Version     | Algorithm
+----------- | -----------------------------------------------------------------------
+v1.3.1      | One linked list to store sparse matrix and iterate to find similar character.
+v1.3.2.beta | One hash table with multiple linked lists for collision handling.
+In theory, the latter should work more efficient than the former (more test data needed).
 # Why This?
 There is also another similar gem named [fuzzy-string-match](https://github.com/kiyoka/fuzzy-string-match) which both provides C and Ruby version as well.
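
The two README formulas above can be checked numerically. The following is a minimal Ruby sketch, not code from the gem: `jaro_from_counts` is a hypothetical helper that takes the string lengths and the counts `m`, `t`, and `s` directly, rather than deriving them from the strings.

```ruby
# Hypothetical helper (not part of the jaro_winkler API).
# len1, len2: lengths of the two strings
# m: number of matching characters
# t: half the number of transpositions
# s: number of nonmatching but similar characters (0 = original formula)
def jaro_from_counts(len1, len2, m, t, s = 0)
  return 0.0 if m.zero?
  adjusted = m + s / 10.0 # each similar pair adds one tenth of a match
  (adjusted / len1 + adjusted / len2 + (m - t).to_f / m) / 3.0
end

# 'martha' vs 'marhta': 6 matches, one swapped pair, so t = 1.
puts jaro_from_counts(6, 6, 6, 1).round(4) # => 0.9444

# Same lengths, 5 exact matches plus 1 similar pair, no transpositions.
puts jaro_from_counts(6, 6, 5, 0, 1).round(4)
```

With `s = 0` the adjusted form reduces to the original formula, which is why a single helper covers both cases.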

data/Rakefile CHANGED

@@ -27,7 +27,7 @@ task :compare do
   require 'fuzzystringmatch'
   require 'hotwater'
   require 'amatch'
-  @ary = [['henka', 'henkan'], ['al', 'al'], ['martha', 'marhta'], ['jones', 'johnson'], ['abcvwxyz', 'cabvwxyz'], ['dwayne', 'duane'], ['dixon', 'dicksonx'], ['fvie', 'ten']]
+  @ary = [['henka', 'henkan'], ['al', 'al'], ['martha', 'marhta'], ['jones', 'johnson'], ['abcvwxyz', 'cabvwxyz'], ['dwayne', 'duane'], ['dixon', 'dicksonx'], ['fvie', 'ten'], ['San Francisco', 'Santa Monica']]
   table = []
   table << %w[str_1 str_2 jaro_winkler fuzzystringmatch hotwater amatch]
   table << %w[--- --- --- --- --- ---]

data/benchmark/native.txt CHANGED

@@ -1,12 +1,12 @@
 Rehearsal ----------------------------------------------------
-jaro_winkler       0.350000   0.000000   0.350000 (  0.358591)
-fuzzystringmatch   0.360000   0.020000   0.380000 (  0.381666)
-hotwater           0.340000   0.000000   0.340000 (  0.337789)
-amatch             1.010000   0.000000   1.010000 (  1.010946)
-------------------------------------------- total: 2.080000sec
+jaro_winkler       0.350000   0.000000   0.350000 (  0.348383)
+fuzzystringmatch   0.330000   0.020000   0.350000 (  0.354850)
+hotwater           0.280000   0.000000   0.280000 (  0.278819)
+amatch             0.980000   0.000000   0.980000 (  0.983325)
+------------------------------------------- total: 1.960000sec
                        user     system      total        real
-jaro_winkler       0.350000   0.010000   0.360000 (  0.345293)
-fuzzystringmatch   0.140000   0.000000   0.140000 (  0.138711)
-hotwater           0.310000   0.000000   0.310000 (  0.306498)
-amatch             0.960000   0.000000   0.960000 (  0.961509)
+jaro_winkler       0.330000   0.000000   0.330000 (  0.331923)
+fuzzystringmatch   0.140000   0.000000   0.140000 (  0.135655)
+hotwater           0.280000   0.000000   0.280000 (  0.276728)
+amatch             0.930000   0.010000   0.940000 (  0.932943)

data/ext/jaro_winkler/adj_matrix.c CHANGED

@@ -1,8 +1,16 @@
 #include <stdlib.h>
 #include "adj_matrix.h"
+#include "codepoints.h"
+const char *DEFAULT_ADJ_TABLE[] = {
+  "A","E", "A","I", "A","O", "A","U", "B","V", "E","I", "E","O", "E","U", "I","O", "I","U", "O","U",
+  "I","Y", "E","Y", "C","G", "E","F", "W","U", "W","V", "X","K", "S","Z", "X","S", "Q","C", "U","V",
+  "M","N", "L","I", "Q","O", "P","R", "I","J", "2","Z", "5","S", "8","B", "1","I", "1","L", "0","O",
+  "0","Q", "C","K", "G","J", "E"," ", "Y"," ", "S"," "
+};
 extern unsigned int MurmurHash2(const void * key, int len, unsigned int seed);
-static void node_free(Node *head);
+inline void node_free(Node *head);
 AdjMatrix* adj_matrix_new(unsigned int length){
   AdjMatrix *matrix = malloc(sizeof(AdjMatrix));
@@ -42,7 +50,7 @@ char adj_matrix_find(AdjMatrix *matrix, unsigned long long x, unsigned long long
   }
 }
-static void node_free(Node *head){
+inline void node_free(Node *head){
   if(head == NULL) return;
   node_free(head->next);
   free(head);
@@ -59,4 +67,19 @@ void adj_matrix_free(AdjMatrix *matrix){
   }
   free(matrix->table);
   free(matrix);
+}
+AdjMatrix* adj_matrix_default(){
+  static char first_time = 1;
+  static AdjMatrix *ret_matrix;
+  if(first_time){
+    ret_matrix = adj_matrix_new(ADJ_MATRIX_DEFAULT_LENGTH);
+    int length = sizeof(DEFAULT_ADJ_TABLE) / sizeof(char*);
+    for(int i = 0; i < length; i += 2){
+      UnicodeHash h1 = unicode_hash_new(DEFAULT_ADJ_TABLE[i]), h2 = unicode_hash_new(DEFAULT_ADJ_TABLE[i + 1]);
+      adj_matrix_add(ret_matrix, h1.code, h2.code);
+    }
+    first_time = 0;
+  }
+  return ret_matrix;
 }
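
The README describes the adjusting table as one hash table with linked lists for collision handling. A rough Ruby sketch of that idea follows, under stated assumptions: `PairTable` is a hypothetical class, Ruby's built-in `Array#hash` stands in for MurmurHash2, plain arrays stand in for the linked lists, and pairs are treated as unordered (the diff does not show whether `adj_matrix_find` does the same).

```ruby
# Hypothetical illustration of a chained hash table for character pairs.
class PairTable
  def initialize(size = 16)
    # One bucket per slot; each bucket holds every pair that hashes there.
    @buckets = Array.new(size) { [] }
  end

  def add(x, y)
    key = normalize(x, y)
    bucket = bucket_for(key)
    bucket << key unless bucket.include?(key)
  end

  def include?(x, y)
    key = normalize(x, y)
    bucket_for(key).include?(key)
  end

  private

  # Sort the pair so (x, y) and (y, x) share one entry (an assumption).
  def normalize(x, y)
    [x, y].sort
  end

  # Ruby's % returns a non-negative result for a positive modulus,
  # so a negative hash value still maps to a valid bucket index.
  def bucket_for(key)
    @buckets[key.hash % @buckets.length]
  end
end

table = PairTable.new
table.add('0', 'O')
table.add('1', 'L')
puts table.include?('O', '0') # => true
puts table.include?('A', 'B') # => false
```

Lookup cost depends on bucket length rather than on the total number of pairs, which is the efficiency argument the README makes against a single linked list.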

data/ext/jaro_winkler/adj_matrix.h CHANGED

@@ -13,9 +13,10 @@ typedef struct{
   unsigned int length;
 } AdjMatrix;
-AdjMatrix* adj_matrix_new(unsigned int length);
-void adj_matrix_add(AdjMatrix *matrix, unsigned long long x, unsigned long long y);
-char adj_matrix_find(AdjMatrix *matrix, unsigned long long x, unsigned long long y);
-void adj_matrix_free(AdjMatrix *matrix);
+AdjMatrix* adj_matrix_new    (unsigned int length);
+void       adj_matrix_add    (AdjMatrix *matrix, unsigned long long x, unsigned long long y);
+char       adj_matrix_find   (AdjMatrix *matrix, unsigned long long x, unsigned long long y);
+void       adj_matrix_free   (AdjMatrix *matrix);
+AdjMatrix* adj_matrix_default();
 #endif /* ADJ_MATRIX_H */

data/ext/jaro_winkler/codepoints.c ADDED

@@ -0,0 +1,29 @@
+#include <string.h>
+#include <stdlib.h>
+#include "codepoints.h"
+UnicodeHash unicode_hash_new(const char *str){
+  UnicodeHash ret = {};
+  unsigned char first_char = str[0];
+  if(first_char >= 252) ret.byte_length = 6;      // 1111110x
+  else if(first_char >= 248) ret.byte_length = 5; // 111110xx
+  else if(first_char >= 240) ret.byte_length = 4; // 11110xxx
+  else if(first_char >= 224) ret.byte_length = 3; // 1110xxxx
+  else if(first_char >= 192) ret.byte_length = 2; // 110xxxxx
+  else ret.byte_length = 1;
+  memcpy(&ret.code, str, ret.byte_length);
+  return ret;
+}
+Codepoints codepoints_new(const char *str, int byte_len){
+  Codepoints ret = {};
+  ret.ary = malloc(byte_len * sizeof(long long));
+  ret.length = 0;
+  for(int i = 0; i < byte_len;){
+    UnicodeHash hash = unicode_hash_new(str + i);
+    ret.ary[ret.length] = hash.code;
+    ret.length++;
+    i += hash.byte_length;
+  }
+  return ret;
+}
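
`unicode_hash_new` derives the byte length of a UTF-8 sequence from its first byte, and `codepoints_new` uses that length to step through the string. The Ruby sketch below mirrors that logic for illustration only; `utf8_sequence_length` and `split_utf8` are hypothetical names. The thresholds are the ones in the C code, including the 5- and 6-byte forms from the older UTF-8 definition, which RFC 3629 no longer permits.

```ruby
# Hypothetical mirror of the lead-byte thresholds in unicode_hash_new.
def utf8_sequence_length(first_byte)
  if    first_byte >= 252 then 6 # 1111110x
  elsif first_byte >= 248 then 5 # 111110xx
  elsif first_byte >= 240 then 4 # 11110xxx
  elsif first_byte >= 224 then 3 # 1110xxxx
  elsif first_byte >= 192 then 2 # 110xxxxx
  else 1                         # ASCII (continuation bytes also land here)
  end
end

# Split a string into one byte group per character, stepping by the
# length read from each lead byte, as codepoints_new does.
def split_utf8(str)
  bytes = str.bytes
  groups = []
  i = 0
  while i < bytes.length
    len = utf8_sequence_length(bytes[i])
    groups << bytes[i, len]
    i += len
  end
  groups
end

# "a" (1 byte), U+00E9 (2 bytes), U+6F22 (3 bytes)
p split_utf8("a\u00E9\u6F22").map(&:length) # => [1, 2, 3]
```

Like the C version, this trusts the input to be well-formed UTF-8; a stray continuation byte is treated as a one-byte character.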

data/ext/jaro_winkler/codepoints.h ADDED

@@ -0,0 +1,17 @@
+#ifndef CODEPOINTS_H
+#define CODEPOINTS_H 1
+typedef struct{
+  unsigned long long code;
+  unsigned int byte_length;
+} UnicodeHash;
+typedef struct{
+  unsigned long long *ary;
+  int length;
+} Codepoints;
+UnicodeHash unicode_hash_new(const char *str);
+Codepoints  codepoints_new  (const char *str, int byte_len);
+#endif /* CODEPOINTS_H */

data/ext/jaro_winkler/distance.c CHANGED

@@ -1,30 +1,9 @@
-#include <string.h>
 #include <stdlib.h>
 #include <ctype.h>
 #include "distance.h"
+#include "codepoints.h"
 #include "adj_matrix.h"
-typedef struct{
-  unsigned long long code;
-  unsigned int byte_length;
-} UnicodeHash;
-typedef struct{
-  unsigned long long *ary;
-  int length;
-} Codepoints;
-const char *DEFAULT_ADJ_TABLE[] = {
-  "A","E", "A","I", "A","O", "A","U", "B","V", "E","I", "E","O", "E","U", "I","O", "I","U", "O","U",
-  "I","Y", "E","Y", "C","G", "E","F", "W","U", "W","V", "X","K", "S","Z", "X","S", "Q","C", "U","V",
-  "M","N", "L","I", "Q","O", "P","R", "I","J", "2","Z", "5","S", "8","B", "1","I", "1","L", "0","O",
-  "0","Q", "C","K", "G","J", "E"," ", "Y"," ", "S"," "
-};
-static UnicodeHash unicode_hash_new(const char *str);
-static Codepoints codepoints_new(const char *str, int byte_len);
-static AdjMatrix* adj_matrix_default();
 Option option_new(){
   Option opt;
   opt.ignore_case = opt.adj_table = 0;
@@ -33,12 +12,9 @@ Option option_new(){
   return opt;
 }
-double c_distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option opt){
-  // set default option if NULL passed
-  int free_opt_flag = 0;
-  Codepoints code_ary_1 = codepoints_new(s1, s1_byte_len);
-  Codepoints code_ary_2 = codepoints_new(s2, s2_byte_len);
+double distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option opt){
+  Codepoints code_ary_1 = codepoints_new(s1, s1_byte_len),
+             code_ary_2 = codepoints_new(s2, s2_byte_len);
   if(opt.ignore_case){
     for(int i = 0; i < code_ary_1.length; ++i) if(code_ary_1.ary[i] < 256 && islower(code_ary_1.ary[i])) code_ary_1.ary[i] -= 32;
@@ -54,19 +30,19 @@ double c_distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option o
   // Compute jaro distance
   int window_size = code_ary_2.length / 2 - 1;
   if(window_size < 0) window_size = 0;
-  double matches     = 0.0;
-  double sim_matches = 0.0;
-  int transpositions = 0;
-  int previous_index = -1;
-  int max_index      = code_ary_2.length - 1;
+  double matches     = 0.0,
+         sim_matches = 0.0;
+  int transpositions = 0,
+      previous_index = -1,
+      max_index      = code_ary_2.length - 1;
   for(int i = 0; i < code_ary_1.length; i++){
     int left  = i - window_size;
     int right = i + window_size;
     if(left  < 0) left = 0;
     if(right > max_index) right = max_index;
-    char matched     = 0;
-    char found       = 0;
-    char sim_matched = 0;
+    char matched     = 0,
+         found       = 0,
+         sim_matched = 0;
     for(int j = left; j <= right; j++){
       if(code_ary_1.ary[i] == code_ary_2.ary[j]){
         matched = 1;
@@ -97,45 +73,4 @@ double c_distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option o
   }
   free(code_ary_1.ary); free(code_ary_2.ary);
   return jaro_distance < threshold ? jaro_distance : jaro_distance + ((prefix * weight) * (1 - jaro_distance));
-}
-static UnicodeHash unicode_hash_new(const char *str){
-  UnicodeHash ret = {};
-  unsigned char first_char = str[0];
-  if(first_char >= 252) ret.byte_length = 6;      // 1111110x
-  else if(first_char >= 248) ret.byte_length = 5; // 111110xx
-  else if(first_char >= 240) ret.byte_length = 4; // 11110xxx
-  else if(first_char >= 224) ret.byte_length = 3; // 1110xxxx
-  else if(first_char >= 192) ret.byte_length = 2; // 110xxxxx
-  else ret.byte_length = 1;
-  memcpy(&ret.code, str, ret.byte_length);
-  return ret;
-}
-static Codepoints codepoints_new(const char *str, int byte_len){
-  Codepoints ret = {};
-  ret.ary = calloc(byte_len, sizeof(long long));
-  int count = 0;
-  for(int i = 0; i < byte_len;){
-    UnicodeHash hash = unicode_hash_new(str + i);
-    ret.ary[count] = hash.code;
-    count++;
-    i += hash.byte_length;
-  }
-  ret.length += count;
-  return ret;
-}
-static AdjMatrix* adj_matrix_default(){
-  static char first_time = 1;
-  static AdjMatrix *ret_matrix;
-  if(first_time){
-    ret_matrix = adj_matrix_new(ADJ_MATRIX_DEFAULT_LENGTH);
-    for(int i = 0; i < 78; i += 2){
-      UnicodeHash h1 = unicode_hash_new(DEFAULT_ADJ_TABLE[i]), h2 = unicode_hash_new(DEFAULT_ADJ_TABLE[i + 1]);
-      adj_matrix_add(ret_matrix, h1.code, h2.code);
-    }
-    first_time = 0;
-  }
-  return ret_matrix;
 }
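
The unchanged `return` statement in `distance` applies the Winkler prefix bonus only when the Jaro value reaches the threshold. A small Ruby sketch of that final step follows, assuming a hypothetical helper `winkler_adjust` and taking the common-prefix length as an input instead of computing it; the defaults are the `weight` and `threshold` values listed in the README table.

```ruby
# Hypothetical helper mirroring the last line of distance():
#   jaro < threshold ? jaro : jaro + prefix * weight * (1 - jaro)
def winkler_adjust(jaro, prefix, weight: 0.1, threshold: 0.7)
  return jaro if jaro < threshold
  jaro + prefix * weight * (1 - jaro)
end

# 'martha' vs 'marhta': m = 6, t = 1, both lengths 6, common prefix "mar".
jaro = (6.0 / 6 + 6.0 / 6 + (6 - 1) / 6.0) / 3.0
puts winkler_adjust(jaro, 3).round(4) # => 0.9611

# Below the threshold the value passes through unchanged.
puts winkler_adjust(0.5, 3) # => 0.5
```

The first result agrees with the 0.9611 expected for this pair in `spec/jaro_winkler_spec.rb`.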

data/ext/jaro_winkler/distance.h CHANGED

@@ -6,7 +6,7 @@ typedef struct{
   char ignore_case, adj_table;
 } Option;
-double c_distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option opt);
+double distance(char *s1, int s1_byte_len, char *s2, int s2_byte_len, Option opt);
 Option option_new();
 #endif /* DISTANCE_H */

data/ext/jaro_winkler/extconf.rb CHANGED

@@ -1,3 +1,3 @@
 require "mkmf"
-$CFLAGS << ' -std=gnu99' if Gem.win_platform?
+$CFLAGS << ' -std=c99 '
 create_makefile("jaro_winkler/jaro_winkler")

data/ext/jaro_winkler/jaro_winkler.c CHANGED

@@ -13,15 +13,15 @@ VALUE rb_distance(int argc, VALUE *argv, VALUE self){
   rb_scan_args(argc, argv, "2:", &s1, &s2, &opt);
   Option c_opt = option_new();
   if(TYPE(opt) == T_HASH){
-    VALUE weight      = rb_hash_aref(opt, ID2SYM(rb_intern("weight")));
-    VALUE threshold   = rb_hash_aref(opt, ID2SYM(rb_intern("threshold")));
-    VALUE ignore_case = rb_hash_aref(opt, ID2SYM(rb_intern("ignore_case")));
-    VALUE adj_table   = rb_hash_aref(opt, ID2SYM(rb_intern("adj_table")));
+    VALUE weight      = rb_hash_aref(opt, ID2SYM(rb_intern("weight"))),
+          threshold   = rb_hash_aref(opt, ID2SYM(rb_intern("threshold"))),
+          ignore_case = rb_hash_aref(opt, ID2SYM(rb_intern("ignore_case"))),
+          adj_table   = rb_hash_aref(opt, ID2SYM(rb_intern("adj_table")));
     if(!NIL_P(weight)) c_opt.weight = NUM2DBL(weight);
     if(c_opt.weight > 0.25) rb_raise(rb_eRuntimeError, "Scaling factor should not exceed 0.25, otherwise the distance can become larger than 1.");
     if(!NIL_P(threshold)) c_opt.threshold     = NUM2DBL(threshold);
     if(!NIL_P(ignore_case)) c_opt.ignore_case = (TYPE(ignore_case)  == T_FALSE || NIL_P(ignore_case)) ? 0 : 1;
     if(!NIL_P(adj_table)) c_opt.adj_table     = (TYPE(adj_table)    == T_FALSE || NIL_P(adj_table))   ? 0 : 1;
   }
-  return rb_float_new(c_distance(StringValuePtr(s1), RSTRING_LEN(s1), StringValuePtr(s2), RSTRING_LEN(s2), c_opt));
+  return rb_float_new(distance(StringValuePtr(s1), RSTRING_LEN(s1), StringValuePtr(s2), RSTRING_LEN(s2), c_opt));
 }

data/ext/jaro_winkler/murmur_hash2.c CHANGED

@@ -1,64 +1,64 @@
-//-----------------------------------------------------------------------------
-// MurmurHash2, by Austin Appleby
-// Note - This code makes a few assumptions about how your machine behaves -
-// 1. We can read a 4-byte value from any address without crashing
-// 2. sizeof(int) == 4
-// And it has a few limitations -
-// 1. It will not work incrementally.
-// 2. It will not produce the same results on little-endian and big-endian
-//    machines.
-unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
-{
-	// 'm' and 'r' are mixing constants generated offline.
-	// They're not really 'magic', they just happen to work well.
-	const unsigned int m = 0x5bd1e995;
-	const int r = 24;
-	// Initialize the hash to a 'random' value
-	unsigned int h = seed ^ len;
-	// Mix 4 bytes at a time into the hash
-	const unsigned char * data = (const unsigned char *)key;
-	while(len >= 4)
-	{
-		unsigned int k = *(unsigned int *)data;
-		k *= m;
-		k ^= k >> r;
-		k *= m;
-		h *= m;
-		h ^= k;
-		data += 4;
-		len -= 4;
-	}
-	// Handle the last few bytes of the input array
-	switch(len)
-	{
-	case 3: h ^= data[2] << 16;
-	case 2: h ^= data[1] << 8;
-	case 1: h ^= data[0];
-	        h *= m;
-	};
-	// Do a few final mixes of the hash to ensure the last few
-	// bytes are well-incorporated.
-	h ^= h >> 13;
-	h *= m;
-	h ^= h >> 15;
-	return h;
-}
+//-----------------------------------------------------------------------------
+// MurmurHash2, by Austin Appleby
+// Note - This code makes a few assumptions about how your machine behaves -
+// 1. We can read a 4-byte value from any address without crashing
+// 2. sizeof(int) == 4
+// And it has a few limitations -
+// 1. It will not work incrementally.
+// 2. It will not produce the same results on little-endian and big-endian
+//    machines.
+unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
+{
+  // 'm' and 'r' are mixing constants generated offline.
+  // They're not really 'magic', they just happen to work well.
+  const unsigned int m = 0x5bd1e995;
+  const int r = 24;
+  // Initialize the hash to a 'random' value
+  unsigned int h = seed ^ len;
+  // Mix 4 bytes at a time into the hash
+  const unsigned char * data = (const unsigned char *)key;
+  while(len >= 4)
+  {
+    unsigned int k = *(unsigned int *)data;
+    k *= m;
+    k ^= k >> r;
+    k *= m;
+    h *= m;
+    h ^= k;
+    data += 4;
+    len -= 4;
+  }
+  // Handle the last few bytes of the input array
+  switch(len)
+  {
+  case 3: h ^= data[2] << 16;
+  case 2: h ^= data[1] << 8;
+  case 1: h ^= data[0];
+          h *= m;
+  };
+  // Do a few final mixes of the hash to ensure the last few
+  // bytes are well-incorporated.
+  h ^= h >> 13;
+  h *= m;
+  h ^= h >> 15;
+  return h;
+}

data/jaro_winkler.gemspec CHANGED

@@ -23,4 +23,8 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "bundler", "~> 1.7"
   spec.add_development_dependency "rake", "~> 10.0"
   spec.add_development_dependency "rake-compiler"
+  spec.add_development_dependency "rspec"
+  spec.add_development_dependency "fuzzy-string-match"
+  spec.add_development_dependency "hotwater"
+  spec.add_development_dependency "amatch"
 end

data/lib/jaro_winkler/version.rb CHANGED

@@ -1,3 +1,3 @@
 module JaroWinkler
-  VERSION = "1.3.2.beta"
+  VERSION = "1.3.2.beta2"
 end

data/spec/jaro_winkler_spec.rb CHANGED

@@ -5,22 +5,23 @@ include JaroWinkler
 shared_examples 'common' do |strategy|
   it 'works' do
     ary = [
-      ['henka'       , 'henkan'      , 0.9667] ,
-      ['al'          , 'al'          , 1.0]    ,
-      ['martha'      , 'marhta'      , 0.9611] ,
-      ['jones'       , 'johnson'     , 0.8323] ,
-      ['abcvwxyz'    , 'cabvwxyz'    , 0.9583] ,
-      ['dwayne'      , 'duane'       , 0.8400] ,
-      ['dixon'       , 'dicksonx'    , 0.8133] ,
-      ['fvie'        , 'ten'         , 0.0]    ,
-      ['tony'        , 'tony'        , 1.0]    ,
-      ['tonytonyjan' , 'tonytonyjan' , 1.0]    ,
-      ['x'           , 'x'           , 1.0]    ,
-      [''            , ''            , 0.0]    ,
-      ['tony'        , ''            , 0.0]    ,
-      [''            , 'tony'        , 0.0]    ,
-      ['tonytonyjan' , 'tony'        , 0.8727] ,
-      ['tony'        , 'tonytonyjan' , 0.8727]
+      ['henka'         , 'henkan'       , 0.9667] ,
+      ['al'            , 'al'           , 1.0]    ,
+      ['martha'        , 'marhta'       , 0.9611] ,
+      ['jones'         , 'johnson'      , 0.8323] ,
+      ['abcvwxyz'      , 'cabvwxyz'     , 0.9583] ,
+      ['dwayne'        , 'duane'        , 0.8400] ,
+      ['dixon'         , 'dicksonx'     , 0.8133] ,
+      ['fvie'          , 'ten'          , 0.0]    ,
+      ['tony'          , 'tony'         , 1.0]    ,
+      ['tonytonyjan'   , 'tonytonyjan'  , 1.0]    ,
+      ['x'             , 'x'            , 1.0]    ,
+      [''              , ''             , 0.0]    ,
+      ['tony'          , ''             , 0.0]    ,
+      [''              , 'tony'         , 0.0]    ,
+      ['tonytonyjan'   , 'tony'         , 0.8727] ,
+      ['tony'          , 'tonytonyjan'  , 0.8727] ,
+      ['San Francisco' , 'Santa Monica' , 0.8180]
     ]
     ary.each do |s1, s2, ans|
       expect(send(strategy, s1, s2)).to be_within(0.0001).of(ans)

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: jaro_winkler
 version: !ruby/object:Gem::Version
-  version: 1.3.2.beta
+  version: 1.3.2.beta2
 platform: ruby
 authors:
 - Jian Weihang
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-09-11 00:00:00.000000000 Z
+date: 2014-10-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -52,6 +52,62 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: fuzzy-string-match
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: hotwater
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: amatch
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: It's a implementation of Jaro-Winkler distance algorithm, it uses C extension
   and will fallback to pure Ruby version in JRuby. Both implementation supports UTF-8
   string.
@@ -74,6 +130,8 @@ files:
 - benchmark/pure.txt
 - ext/jaro_winkler/adj_matrix.c
 - ext/jaro_winkler/adj_matrix.h
+- ext/jaro_winkler/codepoints.c
+- ext/jaro_winkler/codepoints.h
 - ext/jaro_winkler/distance.c
 - ext/jaro_winkler/distance.h
 - ext/jaro_winkler/extconf.rb
@@ -107,7 +165,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: 1.3.1
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.1
+rubygems_version: 2.4.2
 signing_key:
 specification_version: 4
 summary: Ruby & C implementation of Jaro-Winkler distance algorithm which both support