aurelian-ruby-ahocorasick 0.4.4 → 0.4.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3) hide show
  1. data/README.textile +25 -13
  2. data/ext/ruby-ahocorasick.c +3 -7
  3. metadata +1 -1
data/README.textile CHANGED
@@ -1,30 +1,36 @@
1
1
  h1. This is a work in progress.
2
2
 
3
- h3. Introduction
3
+
4
+ h2. Introduction
4
5
 
5
6
  This library is a "Ruby":http://ruby-lang.org extension, a wrapper around the "Aho-Corasick":http://en.wikipedia.org/wiki/Aho-Corasick_algorithm implementation in C, found in "Strmat":http://www.cs.ucdavis.edu/~gusfield/strmat.html package.
6
7
 
7
8
  The source code (ac.c and ac.h) was "adapted" from Strmat. In fact, I've changed only 3-4 lines of code from the original implementation so it will feat my needs: search needed to return the current position in the searched string.
8
9
 
9
- h3. Okay, so what's the idea?
10
+
11
+ h2. Okay, so what's the idea?
10
12
 
11
13
  Having a dictionary of known sentences (note: not *words*!), this kick ass algorithm can find individual patterns in an incoming stream of data. Kinda Fast.
12
14
 
13
15
  The algorithm has 2 stages: one where an internal tree in being build from the given dictionary leaving the search to the second step.
14
16
 
15
- h3. Okay, so where can I use this?
17
+
18
+ h2. Okay, so where can I use this?
16
19
 
17
20
  Well, you can do some crazy things with it, like, you can lookup for DNA patterns or maybe analyze network sequences (read: strange and maybe proprietary network protocols), or domestic stuff like building contextual links on your blog posts to enrich your users experience.
18
21
 
19
- h3. Okay, so how can I install it?
20
22
 
21
- h4. Rubygems - Development Version
23
+ h2. Okay, so how can I install it?
24
+
25
+
26
+ h3. Rubygems - Development Version
22
27
 
23
28
  <pre>
24
29
  gem install aurelian-ruby-ahocorasick --source=http://gems.github.com
25
30
  </pre>
26
31
 
27
- h4. Build it from source
32
+
33
+ h3. Build it from source
28
34
 
29
35
  <pre>
30
36
  $ git clone git://github.com/aurelian/ruby-ahocorasick.git
@@ -37,18 +43,21 @@ To build and install the gem on your machine (run with sudo if needed):
37
43
  $ rake install
38
44
  </pre>
39
45
 
40
- @rake -T@ should list other cool tasks, like @rake spec@ to run the rspec test.
46
+ @rake -T@ will list other cool tasks.
41
47
 
42
- h4. Rubygems - Stable Version
48
+
49
+ h3. Rubygems - Stable Version
43
50
 
44
51
  There's no stable version right now.
45
52
 
46
- h5. Notes
53
+
54
+ h4. Notes
47
55
 
48
56
  It's known to work / compile / install on Ubuntu 8.04 and Mac OS 10.4.*. It should work out of the box if you have gcc around.
49
57
  Unfortunately I don't have a Windows PC around nor required knowledge about Microsoft compliers.
50
58
 
51
- h3. Okay, so how do I use it?
59
+
60
+ h2. Okay, so how do I use it?
52
61
 
53
62
  <pre>
54
63
  require 'ahocorasick'
@@ -68,11 +77,13 @@ h3. Okay, so how do I use it?
68
77
 
69
78
  You can get some API reference on the "wiki":http://github.com/aurelian/ruby-ahocorasick/wikis.
70
79
 
71
- h3. Bugs? Suggestions? Ideas? Patches?
80
+
81
+ h2. Bugs? Suggestions? Ideas? Patches?
72
82
 
73
83
  For now, just use the email address.
74
84
 
75
- h3. Additional Reading
85
+
86
+ h2. Additional Reading
76
87
 
77
88
  Other suffix - tree implementations:
78
89
 
@@ -82,7 +93,8 @@ Other suffix - tree implementations:
82
93
  * "Keyword Prospector":http://latimes.rubyforge.org/keyword_prospector/rdoc/
83
94
  * "libstree":http://www.cl.cam.ac.uk/~cpk25/libstree/
84
95
 
85
- h3. License
96
+
97
+ h2. License
86
98
 
87
99
  (c) 2008 - Aurelian Oancea, < oancea at gmail dot com >
88
100
 
@@ -131,12 +131,11 @@ rb_kwt_make(VALUE self)
131
131
  static VALUE
132
132
  rb_kwt_find_all(int argc, VALUE *argv, VALUE self)
133
133
  {
134
- char * result; // itermediate result
135
134
  char * remain; // returned by ac_search, the remaing text to search
136
- int lgt, id, ends_at; // filled in by ac_search, the id, length and ends_at position
137
- int starts_at;
135
+ int lgt, id, ends_at, starts_at; // filled in by ac_search: the length of the result, the id, and starts_at/ends_at position
138
136
  VALUE v_result; // one result, as hash
139
137
  VALUE v_results; // all the results, an array
138
+
140
139
  VALUE v_search; // search string, function argument
141
140
  struct kwt_struct_data *kwt_data;
142
141
 
@@ -166,11 +165,8 @@ rb_kwt_find_all(int argc, VALUE *argv, VALUE self)
166
165
  rb_hash_aset( v_result, sym_id, INT2FIX(id) );
167
166
  rb_hash_aset( v_result, sym_starts_at, INT2FIX( ends_at - lgt - 1 ) );
168
167
  rb_hash_aset( v_result, sym_ends_at, INT2FIX( ends_at - 1 ) );
169
- result = (char*) malloc (sizeof(char)*lgt);
170
- sprintf( result, "%.*s", lgt, remain);
171
- rb_hash_aset( v_result, sym_value, rb_str_new(result, lgt) );
168
+ rb_hash_aset( v_result, sym_value, rb_str_new(remain, lgt) );
172
169
  rb_ary_push( v_results, v_result );
173
- free(result);
174
170
  }
175
171
  // reopen the tree
176
172
  kwt_data->is_frozen= 0;
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: aurelian-ruby-ahocorasick
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.4
4
+ version: 0.4.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Aurelian Oancea