tyler-trie 0.2.1 → 0.2.2

data/README.textile ADDED
@@ -0,0 +1,102 @@
1
+ h1. Trie
2
+
3
+ This is a Ruby binding for libdatrie, a double-array trie implemented in C. It is a disk-based trie, so memory usage is minimal, but it's still quite fast.
4
+
5
+
6
+ h2. What is a trie?
7
+
8
+ I suck at explaining things. Wikipedia doesn't. http://wikipedia.org/wiki/Trie.
9
+
10
+ But in short, a trie is a data structure that holds strings in a tree. So if you inserted the words 'arc', 'ark', and 'ape' into a trie, you could visualize it like this:
11
+
12
+ <pre>
13
+     p - e
14
+   /
15
+ a - r - c
16
+       \
17
+         k
18
+ </pre>
19
+
20
+ It's easy to see how this can have pretty neat implications for things like searching through lists of strings, sorting lists of strings, and things like spelling correction and autocompletion.
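To make the picture above concrete, here is a minimal sketch of the same idea in plain Ruby. It is not how libdatrie stores anything (that is a double-array layout on disk); it is a toy built from nested Hashes, and the names <code>insert</code>, <code>words_with_prefix</code>, <code>collect</code>, and <code>END_OF_WORD</code> are made up for illustration.

```ruby
# Toy in-memory trie built from nested Hashes. Illustration only: this is
# NOT libdatrie's representation, and every name here is hypothetical.
END_OF_WORD = :end # marker key meaning "a word stops at this node"

# Add +word+ to the trie, creating one nested Hash per character.
def insert(trie, word)
  node = trie
  word.each_char { |ch| node = (node[ch] ||= {}) }
  node[END_OF_WORD] = true
end

# Return every stored word that starts with +prefix+ (empty array if none).
def words_with_prefix(trie, prefix)
  node = trie
  prefix.each_char do |ch|
    node = node[ch]
    return [] if node.nil?
  end
  collect(node, prefix)
end

# Depth-first walk below +node+, gathering complete words.
def collect(node, so_far)
  found = []
  found << so_far if node[END_OF_WORD]
  node.each do |key, child|
    next if key == END_OF_WORD
    found.concat(collect(child, so_far + key))
  end
  found
end

trie = {}
%w[arc ark ape].each { |word| insert(trie, word) }
puts words_with_prefix(trie, 'ar').inspect
```

Looking up a prefix only touches the nodes along that prefix plus the subtree below it, which is why this shape suits autocompletion.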
21
+
22
+
23
+ h2. Tutorial
24
+
25
+ Let's go through building a simple autocompleter using Trie. The very first thing you'll want to do is create a directory for your trie's data to be held in. Remember, this is a disk-based trie so having a place to store the files is important.
26
+
27
+ <pre><code>
28
+ trie = Trie.new('your-directory')
29
+ </code></pre>
30
+
31
+ When you call <code>Trie.new</code> for the first time with a directory as the first argument, it will create three files: 'trie.br', 'trie.tl', and 'trie.sbm'. 'trie.br' and 'trie.tl' are binary files corresponding to the two arrays that represent the trie structure itself and the tails and data for the strings, respectively. You probably don't want to mess with these directly; use the library for that. 'trie.sbm' controls which characters are valid in the trie. Look into the libdatrie documentation for more details.
32
+
33
+ Anyway. So we've created our blank trie. Now, since we're creating an autocompleter, we'll need to add some words into it. We do that simply with the add method.
34
+
35
+ <pre><code>
36
+ words.each do |word|
37
+ trie.add word
38
+ end
39
+ </code></pre>
40
+
41
+ Or if you have some integer data to store along with the words, such as weights or scores of some kind, you'd do it like so...
42
+
43
+ <pre><code>
44
+ words_and_weights.each do |word,weight|
45
+ trie.add word, weight
46
+ end
47
+ </code></pre>
48
+
49
+ Great, so we've populated our trie with some words. (Note the limitations section below.) Let's make sure those words are really there.
50
+
51
+ <pre><code>
52
+ trie.has_key?('widget') #=> true
53
+
54
+ trie.get('widget') #=> -1 or your value
55
+
56
+ trie.get('not-in-the-trie') #=> nil
57
+ </code></pre>
58
+
59
+ If you didn't enter a value to go along with the word, calling <code>get</code> with it will return -1.
60
+
61
+ Okay, great. We have our populated trie and we've confirmed that the keys are in there. Let's make an autocompleter! For this we'll need to use the <code>children</code> method. We'll do this as a simple Rails action, with the assumption you've initialized the trie into <code>TRIE</code>.
62
+
63
+ <pre><code>
64
+ def autocomplete
65
+ children = TRIE.children(params[:prefix])
66
+
67
+ respond_to do |format|
68
+ format.js { render(:string => JSON.dump(children)) }
69
+ format.yaml { render(:string => YAML.dump(children)) }
70
+ end
71
+ end
72
+ </code></pre>
73
+
74
+ Yep, that's it.
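If you stored weights with your words, you may want to rank the completions before rendering them. Below is a minimal sketch in plain Ruby, assuming the input is an array of [word, weight] pairs (the shape the <code>children_with_values</code> spec in this release checks for). <code>rank_completions</code> and its <code>limit</code> parameter are hypothetical names, not part of the gem.

```ruby
# Hypothetical helper, not part of the trie gem.
# pairs: array of [word, weight] tuples, e.g. [['abc', 2], ['abcd', 4]].
# Returns up to +limit+ words, heaviest first; ties are broken alphabetically.
def rank_completions(pairs, limit = 10)
  pairs
    .sort_by { |word, weight| [-weight, word] }
    .first(limit)
    .map { |word, _weight| word }
end

puts rank_completions([['abc', 2], ['abcd', 4]]).inspect
```

Words added without a value carry -1, so with this ordering they sort after anything that has a positive weight.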
75
+
76
+ There are, of course, some more interesting and advanced ways to use a trie. For instance, this snippet takes a string, then walks down the trie, noting each word it finds along the way.
77
+
78
+ <pre><code>
79
+ word = 'forestry'
80
+ node = trie.root
81
+
82
+ word.split('').each do |char|
83
+ break unless node.walk!(char)
84
+ if node.terminal?
85
+ puts "Found me a word: #{node.full_state}"
86
+ end
87
+ end
88
+ </code></pre>
89
+
90
+ By calling <code>root</code> on a Trie object, you get a TrieNode pointed at the root of the trie. You can then use this node to walk the trie and learn things about each word along the way.
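If you want to see the walk without a trie on disk, here is a rough stand-in in plain Ruby. <code>SketchNode</code>, <code>build</code>, and <code>words_along</code> are hypothetical names. The class only mimics the three calls used in the snippet above (<code>walk!</code>, <code>terminal?</code>, <code>full_state</code>), and it assumes <code>walk!</code> returns something falsy when the character cannot be followed, which is what the <code>break unless</code> in that snippet suggests.

```ruby
# Hypothetical stand-in for the gem's TrieNode, backed by nested Hashes.
# It only imitates walk!, terminal? and full_state for illustration.
class SketchNode
  attr_reader :full_state

  def initialize(root)
    @node = root
    @full_state = ''
  end

  # Step to the child for +char+. Returns nil and stays put if there is
  # no such child (assumed behaviour, see the lead-in).
  def walk!(char)
    child = @node[char]
    return nil unless child

    @node = child
    @full_state += char
    self
  end

  # True when a stored word ends exactly at the current position.
  def terminal?
    @node.key?(:end)
  end
end

# Build the nested-Hash trie that SketchNode walks over.
def build(words)
  root = {}
  words.each do |word|
    node = root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  root
end

# Walk +text+ one character at a time, collecting each word passed on the way.
def words_along(root, text)
  node = SketchNode.new(root)
  found = []
  text.each_char do |char|
    break unless node.walk!(char)
    found << node.full_state if node.terminal?
  end
  found
end

root = build(%w[for forest forestry fork])
puts words_along(root, 'forestry').inspect
```

The walk stops as soon as a character has no matching branch, so the cost is bounded by the length of the input string rather than the number of stored words.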
91
+
92
+
93
+ h2. Limitations
94
+
95
+ By default, libdatrie supports only 32767 words in a trie, and only 16-bit integers for the value that goes along with each inserted string. This certainly makes sense for some purposes on some platforms... but I want to be able to enter bajillions of words with large bits of data associated. So I've forked the project to switch both the indexes and the data to 32-bit. You can enter... a lot of information now. You can find my fork at http://github.com/tyler/libdatrie.
96
+
97
+ h2. Bugs
98
+
99
+ Saving to disk doesn't work correctly. Not sure why... maybe related to my libdatrie changes.
100
+
101
+
102
+ Copyright (c) 2008 Tyler McMullen. See LICENSE for details.
data/VERSION.yml CHANGED
@@ -1,4 +1,4 @@
1
1
  ---
2
- patch: 1
2
+ patch: 2
3
3
  major: 0
4
4
  minor: 2
data/ext/trie/trie.c CHANGED
@@ -95,7 +95,7 @@ static VALUE trie_add(VALUE self, VALUE args) {
95
95
 
96
96
  VALUE key;
97
97
  key = RARRAY(args)->ptr[0];
98
- int32 value = size == 2 ? NUM2INT(RARRAY(args)->ptr[1]) : TRIE_DATA_ERROR;
98
+ TrieData value = size == 2 ? NUM2INT(RARRAY(args)->ptr[1]) : TRIE_DATA_ERROR;
99
99
 
100
100
  const TrieChar *sb_key = stringToTrieChar(key);
101
101
 
@@ -160,7 +160,7 @@ static VALUE trie_children(VALUE self, VALUE prefix) {
160
160
  TrieChar *iterator = (TrieChar*)sb_prefix;
161
161
  while(*iterator != '\0') {
162
162
  if(!sb_trie_state_is_walkable(state, *iterator))
163
- return Qnil;
163
+ return rb_ary_new();
164
164
  sb_trie_state_walk(state, *iterator);
165
165
  iterator++;
166
166
  }
@@ -173,6 +173,72 @@ static VALUE trie_children(VALUE self, VALUE prefix) {
173
173
  sb_trie_state_free(state);
174
174
  return children;
175
175
  }
176
+ static void walk_all_paths_with_values(VALUE children, SBTrieState *state, char *prefix) {
177
+ int c;
178
+ for(c = 1; c < TRIE_CHAR_MAX; c++) {
179
+ if(sb_trie_state_is_walkable(state,c)) {
180
+ SBTrieState *next_state = sb_trie_state_clone(state);
181
+ sb_trie_state_walk(next_state, (TrieChar)c);
182
+
183
+ char *word = (char*)malloc(strlen(prefix) + 2);
184
+ sprintf(word, "%s%c", prefix, c); /* append c as one char; does not depend on the byte order of int */
185
+
186
+ if(sb_trie_state_is_terminal(next_state)) {
187
+ SBTrieState *end_state = sb_trie_state_clone(next_state);
188
+ sb_trie_state_walk(end_state, '\0');
189
+
190
+ VALUE tuple = rb_ary_new();
191
+ rb_ary_push(tuple, rb_str_new2(word));
192
+ TrieData trie_data = sb_trie_state_get_data(end_state);
193
+ rb_ary_push(tuple, INT2FIX(trie_data));
194
+ rb_ary_push(children, tuple);
195
+
196
+ sb_trie_state_free(end_state);
197
+ }
198
+
199
+ walk_all_paths_with_values(children, next_state, word);
200
+
201
+ sb_trie_state_free(next_state);
202
+ }
203
+ }
204
+ }
205
+
206
+ static VALUE trie_children_with_values(VALUE self, VALUE prefix) {
207
+ SBTrie *sb_trie;
208
+ Data_Get_Struct(self, SBTrie, sb_trie);
209
+
210
+ const TrieChar *sb_prefix = stringToTrieChar(prefix);
211
+
212
+ VALUE children = rb_ary_new();
213
+
214
+ SBTrieState *state = sb_trie_root(sb_trie);
215
+
216
+ TrieChar *iterator = (TrieChar*)sb_prefix;
217
+ while(*iterator != '\0') {
218
+ if(!sb_trie_state_is_walkable(state, *iterator))
219
+ return rb_ary_new();
220
+ sb_trie_state_walk(state, *iterator);
221
+ iterator++;
222
+ }
223
+
224
+ if(sb_trie_state_is_terminal(state)) {
225
+ SBTrieState *end_state = sb_trie_state_clone(state);
226
+ sb_trie_state_walk(end_state, '\0');
227
+
228
+ VALUE tuple = rb_ary_new();
229
+ rb_ary_push(tuple, prefix);
230
+ TrieData trie_data = sb_trie_state_get_data(end_state);
231
+ rb_ary_push(tuple, INT2FIX(trie_data));
232
+ rb_ary_push(children, tuple);
233
+
234
+ sb_trie_state_free(end_state);
235
+ }
236
+
237
+ walk_all_paths_with_values(children, state, (char*)sb_prefix);
238
+
239
+ sb_trie_state_free(state);
240
+ return children;
241
+ }
176
242
 
177
243
  static VALUE trie_walk_to_terminal(VALUE self, VALUE args) {
178
244
  SBTrie *sb_trie;
@@ -334,6 +400,7 @@ void Init_trie() {
334
400
  rb_define_method(cTrie, "delete", trie_delete, 1);
335
401
  rb_define_method(cTrie, "close", trie_close, 0);
336
402
  rb_define_method(cTrie, "children", trie_children, 1);
403
+ rb_define_method(cTrie, "children_with_values", trie_children_with_values, 1);
337
404
  rb_define_method(cTrie, "walk_to_terminal", trie_walk_to_terminal, -2);
338
405
  rb_define_method(cTrie, "root", trie_root, 0);
339
406
  rb_define_method(cTrie, "save", trie_save, 0);
data/spec/trie_spec.rb CHANGED
@@ -76,7 +76,7 @@ describe Trie do
76
76
  end
77
77
 
78
78
  it 'returns nil if prefix does not exist' do
79
- @trie.children('ajsodij').should be_nil
79
+ @trie.children('ajsodij').should == []
80
80
  end
81
81
 
82
82
  it 'includes the prefix if the prefix is a word' do
@@ -87,6 +87,31 @@ describe Trie do
87
87
  end
88
88
  end
89
89
 
90
+ describe :children_with_values do
91
+ before :each do
92
+ @trie.add('abc',2)
93
+ @trie.add('abcd',4)
94
+ end
95
+
96
+ it 'returns all words with values beginning with a given prefix' do
97
+ children = @trie.children_with_values('ab')
98
+ children.size.should == 2
99
+ children.should include(['abc',2])
100
+ children.should include(['abcd',4])
101
+ end
102
+
103
+ it 'returns an empty array if prefix does not exist' do
104
+ @trie.children_with_values('ajsodij').should == []
105
+ end
106
+
107
+ it 'includes the prefix if the prefix is a word' do
108
+ children = @trie.children_with_values('abc')
109
+ children.size.should == 2
110
+ children.should include(['abc',2])
111
+ children.should include(['abcd',4])
112
+ end
113
+ end
114
+
90
115
  describe :walk_to_terminal do
91
116
  it 'returns the first word found along a path' do
92
117
  @trie.add 'anderson'
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: tyler-trie
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.2.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tyler McMullen
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-03-11 00:00:00 -07:00
12
+ date: 2009-03-14 00:00:00 -07:00
13
13
  default_executable:
14
14
  dependencies: []
15
15
 
@@ -22,6 +22,7 @@ extensions:
22
22
  extra_rdoc_files: []
23
23
 
24
24
  files:
25
+ - README.textile
25
26
  - VERSION.yml
26
27
  - lib/trie.rb
27
28
  - spec/test-trie