hamster 0.1.2 → 0.1.5

Files changed (39)
  1. data/History.rdoc +13 -1
  2. data/README.rdoc +72 -25
  3. data/TODO +1 -0
  4. data/lib/hamster.rb +4 -1
  5. data/lib/hamster/hash.rb +66 -0
  6. data/lib/hamster/list.rb +105 -0
  7. data/lib/hamster/set.rb +65 -0
  8. data/lib/hamster/stack.rb +53 -0
  9. data/lib/hamster/trie.rb +45 -28
  10. data/lib/hamster/version.rb +1 -1
  11. data/spec/hamster/hash/copying_spec.rb +25 -0
  12. data/spec/hamster/hash/each_spec.rb +47 -0
  13. data/spec/hamster/hash/empty_spec.rb +17 -0
  14. data/spec/hamster/hash/eql_spec.rb +20 -0
  15. data/spec/hamster/hash/get_spec.rb +22 -0
  16. data/spec/hamster/hash/has_key_spec.rb +21 -0
  17. data/spec/hamster/hash/put_spec.rb +93 -0
  18. data/spec/hamster/hash/remove_spec.rb +113 -0
  19. data/spec/hamster/list/accessor_spec.rb +26 -0
  20. data/spec/hamster/list/car_spec.rb +17 -0
  21. data/spec/hamster/list/copying_spec.rb +25 -0
  22. data/spec/hamster/list/each_spec.rb +45 -0
  23. data/spec/hamster/list/empty_spec.rb +18 -0
  24. data/spec/hamster/list/eql_spec.rb +22 -0
  25. data/spec/hamster/list/map_spec.rb +43 -0
  26. data/spec/hamster/list/reduce_spec.rb +31 -0
  27. data/spec/hamster/stack/copying_spec.rb +25 -0
  28. data/spec/hamster/stack/empty_spec.rb +17 -0
  29. data/spec/hamster/stack/eql_spec.rb +20 -0
  30. data/spec/hamster/stack/push_spec.rb +42 -0
  31. metadata +28 -12
  32. data/lib/hamster/entry.rb +0 -18
  33. data/spec/hamster/trie/each_spec.rb +0 -51
  34. data/spec/hamster/trie/empty_spec.rb +0 -22
  35. data/spec/hamster/trie/enumerable_spec.rb +0 -13
  36. data/spec/hamster/trie/get_spec.rb +0 -26
  37. data/spec/hamster/trie/has_key_spec.rb +0 -25
  38. data/spec/hamster/trie/put_spec.rb +0 -97
  39. data/spec/hamster/trie/remove_spec.rb +0 -115
data/History.rdoc CHANGED
@@ -1,4 +1,16 @@
1
- === 0.1.2 / 2009-10-23
1
+ === 0.1.5 / 2009-11-05
2
+
3
+ * Add some examples
4
+
5
+ === 0.1.4 / 2009-10-26
6
+
7
+ * Simplify and share Trie between Hash and Set
8
+
9
+ === 0.1.3 / 2009-10-26
10
+
11
+ * All known issues fixed.
12
+
13
+ === 0.1.2 / 2009-10-25
2
14
 
3
15
  * Fixed all but one outstanding issue with #remove
4
16
 
data/README.rdoc CHANGED
@@ -1,45 +1,92 @@
1
1
  = Hamster
2
2
 
3
- Hash Array Mapped Tries (HAMT) for Ruby (see http://lamp.epfl.ch/papers/idealhashtrees.pdf).
3
+ Hamster started out as an implementation of Hash Array Mapped Tries (HAMT) for Ruby (see http://lamp.epfl.ch/papers/idealhashtrees.pdf) and has since expanded to include implementations of other Persistent Data Structures (see http://en.wikipedia.org/wiki/Persistent_data_structure) such as Sets, Lists, Stacks, etc.
4
4
 
5
- Why do you care?
5
+ == Huh?
6
6
 
7
- HAMTs are hash tables with one really neat property: their structure enables you to perform very efficient copy-on-write operations. For example:
7
+ Persistent data structures have a really neat property: very efficient copy-on-write operations. That allows you to create immutable data-structures that only need copying when something changes. For example:
8
8
 
9
- trie = Hamster::Trie.new
9
+ require 'hamster'
10
10
 
11
- trie.put("Name", "Simon")
12
- trie.get("Name") # => nil
11
+ hash = Hamster::Hash.new
13
12
 
14
- Huh? That's not much use!
13
+ hash.put("Name", "Simon")
14
+ hash.has_key?("Name") # => false
15
+ hash.get("Name") # => nil
15
16
 
16
- Remember, each instance of a trie is immutable. #put creates an efficient copy containing the modifications. So, let's try that again:
17
+ == Double Huh? That's not much use!
17
18
 
18
- trie = Hamster::Trie.new
19
+ Whoops! Remember, each call to <tt>#put</tt> creates an efficient copy containing the modifications, leaving the original unmodified. So, unlike Ruby's built-in <tt>Hash</tt>, all Hamster classes follow Command-Query-Separation (see http://martinfowler.com/bliki/CommandQuerySeparation.html) and return the modified copy of themselves after any mutating operation. Let's try that again:
19
20
 
20
- trie = trie.put("Name", "Simon")
21
- trie.get("Name") # => "Simon"
21
+ require 'hamster'
22
22
 
23
- The same goes for remove:
23
+ original = Hamster::Hash.new
24
+ copy = original.put("Name", "Simon")
24
25
 
25
- trie = Hamster::Trie.new
26
+ original.get("Name") # => nil
27
+ copy.get("Name") # => "Simon"
26
28
 
27
- trie = trie.put("Name", "Simon")
28
- trie = trie.put("Gender", "Male")
29
- trie = trie.remove("Name")
30
- trie.get("Name") # => nil
31
- trie.get("Gender") # => "Male"
29
+ The same goes for <tt>#remove</tt>:
32
30
 
33
- So tell me again why I care?
31
+ require 'hamster'
34
32
 
35
- As mentioned earlier, HAMTs perform a copy whenever they are modified meaning there is never any chance that two threads could be modifying the same instance at any one time. And, because they are very efficient copies, you don't need to worry about using up gobs of heap space in the process.
33
+ original = Hamster::Hash.new
34
+ original = original.put("Name", "Simon")
35
+ copy = original.remove("Name")
36
+
37
+ original.get("Name") # => "Simon"
38
+ copy.get("Name") # => nil
36
39
 
37
- Thats nice but I don't really have multi-threading issues.
40
+ == Oh, I get it. Cool. But I still don't understand why I should care?
38
41
 
39
- OK, how about transactional memory:
42
+ As mentioned earlier, persistent data structures perform a copy whenever they are modified, meaning there is never any chance that two threads could be modifying the same instance at any one time. And, because they are very efficient copies, you don't need to worry about using up gobs of heap space in the process.
40
43
 
41
- Need an example
44
+ == OK, that sounds mildly interesting. What's the downside--there's always a downside?
42
45
 
43
- So what's the downside?
46
+ There's a potential performance hit when compared with MRI's built-in, native, hand-crafted C-code implementation of <tt>Hash</tt>. For example:
44
47
 
45
- The downside is that because the implementation is pure Ruby, MRI's built-in, native, hand-crafted C-code implementation of Hash is 10-times faster!
48
+ require 'hamster'
49
+
50
+ hash = Hamster::Hash.new
51
+ (1..10000).each { |i| hash = hash.put(i, i) } # => 0.05s
52
+ (1..10000).each { |i| hash.get(i) } # => 0.008s
53
+
54
+ versus
55
+
56
+ hash = {}
57
+ (1..10000).each { |i| hash[i] = i } # => 0.004s
58
+ (1..10000).each { |i| hash[i] } # => 0.001s
59
+
60
+ == That seems pretty bad?
61
+
62
+ Well, yes and no. The previous comparison wasn't really fair. Sure, if all you want to do is replace your existing uses of <tt>Hash</tt> in single-threaded environments, then don't even bother. However, if you need something that can be used efficiently in concurrent environments where multiple threads are accessing--reading AND writing--the contents, things get much better.
63
+
64
+ == Do you have a better example?
65
+
66
+ A more realistic comparison might look like:
67
+
68
+ require 'hamster'
69
+
70
+ hash = Hamster::Hash.new
71
+ (1..10000).each { |i| hash = hash.put(i, i) } # => 0.05s
72
+ (1..10000).each { |i| hash.get(i) } # => 0.008s
73
+
74
+ versus
75
+
76
+ hash = {}
77
+ (1..10000).each { |i|
78
+ hash = hash.dup
79
+ hash[i] = i
80
+ } # => 19.8s
81
+
82
+ (1..10000).each { |i| hash[i] } # => 0.001s
83
+
84
+ Impressive huh? What's even better--or worse, depending on your perspective--is that after all that, the native <tt>Hash</tt> version still isn't thread-safe and still requires some synchronisation around it, slowing it down even further!
85
+
86
+ The <tt>Hamster::Hash</tt> version, on the other hand, was unchanged from the original whilst remaining inherently thread-safe, and roughly 400 times faster at writing (0.05s versus 19.8s)!
87
+
88
+ == Sure, but as you say, you still need synchronisation so why bother with the copying?
89
+
90
+ Well, I could show you one but I'd have to re-write--or at least wrap--most <tt>Hash</tt> methods to make it generic, or at least write some application-specific code that synchronised using a <tt>Mutex</tt> and ... well ... it's hard, I always make mistakes, I always end up with weird edge cases and race conditions so, I'll leave that as an exercise for you :)
91
+
92
+ == And that, my friends, is why you might want to use one :)
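The calling convention the README describes (every write returns a new object, the receiver stays as it was) can be sketched without the gem. `TinyPersistentHash` below is a hypothetical stand-in, not part of Hamster: it copies the whole underlying Hash on each write, so it shows the semantics only and none of the structural sharing that makes a HAMT cheap.

```ruby
# Hypothetical stand-in for illustration only. It copies the whole
# underlying Hash on every write, so it demonstrates the "writes return
# a new instance" convention but not an efficient persistent structure.
class TinyPersistentHash
  def initialize(data = {})
    @data = data.dup.freeze
  end

  def get(key)
    @data[key]
  end

  # Returns a new instance; the receiver is left untouched.
  def put(key, value)
    self.class.new(@data.merge(key => value))
  end

  # Returns a new instance without the key, or self if the key is absent.
  def remove(key)
    return self unless @data.key?(key)
    self.class.new(@data.reject { |k, _| k.eql?(key) })
  end
end

empty   = TinyPersistentHash.new
named   = empty.put("Name", "Simon")
removed = named.remove("Name")
```

Here `empty` and `named` stay usable after the later calls, which is the property the README relies on for sharing instances between threads.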
data/TODO CHANGED
@@ -0,0 +1 @@
1
+ * Implement Enumerable methods such that they return instances of the class (or List as a last resort) rather than Array.
data/lib/hamster.rb CHANGED
@@ -1,3 +1,6 @@
1
- require 'hamster/entry'
2
1
  require 'hamster/trie'
2
+ require 'hamster/list'
3
+ require 'hamster/stack'
4
+ require 'hamster/set'
5
+ require 'hamster/hash'
3
6
  require 'hamster/version'
data/lib/hamster/hash.rb ADDED
@@ -0,0 +1,66 @@
1
+ module Hamster
2
+
3
+ class Hash
4
+
5
+ def initialize(trie = Trie.new)
6
+ @trie = trie
7
+ end
8
+
9
+ # Returns the number of key-value pairs in the hash.
10
+ def size
11
+ @trie.size
12
+ end
13
+
14
+ # Returns <tt>true</tt> if the hash contains no key-value pairs.
15
+ def empty?
16
+ @trie.empty?
17
+ end
18
+
19
+ # Returns <tt>true</tt> if the given key is present in the hash.
20
+ def has_key?(key)
21
+ @trie.has_key?(key)
22
+ end
23
+
24
+ # Retrieves the value corresponding to the given key. If not found, returns <tt>nil</tt>.
25
+ def get(key)
26
+ @trie.get(key)
27
+ end
28
+
29
+ # Returns a copy of <tt>self</tt> with the given value associated with the key.
30
+ def put(key, value)
31
+ self.class.new(@trie.put(key, value))
32
+ end
33
+
34
+ # Returns a copy of <tt>self</tt> with the given key/value pair removed. If not found, returns <tt>self</tt>.
35
+ def remove(key)
36
+ copy = @trie.remove(key)
37
+ if copy.equal?(@trie)
38
+ self
39
+ else
40
+ self.class.new(copy)
41
+ end
42
+ end
43
+
44
+ # Calls <tt>block</tt> once for each entry in the hash, passing the key-value pair as parameters.
45
+ # Returns <tt>self</tt>
46
+ def each
47
+ block_given? or return enum_for(__method__)
48
+ @trie.each { |key, value| yield key, value }
49
+ self
50
+ end
51
+
52
+ # Returns <tt>true</tt> if <tt>other</tt> is the same object, or an instance of the same class with equal contents. <tt>eql?</tt> is synonymous with <tt>==</tt>
53
+ def eql?(other)
54
+ equal?(other) || (self.class.equal?(other.class) && @trie.eql?(other.instance_eval{@trie}))
55
+ end
56
+ alias :== :eql?
57
+
58
+ # Returns <tt>self</tt>
59
+ def dup
60
+ self
61
+ end
62
+ alias :clone :dup
63
+
64
+ end
65
+
66
+ end
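The `block_given? or return enum_for(__method__)` line in `each` above is a common Ruby idiom: called without a block, the method returns an `Enumerator` over itself instead of failing. A minimal sketch, using a hypothetical `Pairs` class that is not part of Hamster:

```ruby
# Hypothetical container used only to demonstrate the idiom.
class Pairs
  def initialize(pairs)
    @pairs = pairs
  end

  def each
    # Without a block, hand back an Enumerator bound to this method.
    block_given? or return enum_for(__method__)
    @pairs.each { |key, value| yield key, value }
    self
  end
end

pairs      = Pairs.new([["a", 1], ["b", 2]])
enumerator = pairs.each                       # no block given
collected  = enumerator.to_a                  # pairs come back as two-element arrays
returned   = pairs.each { |_key, _value| }    # block given, returns the receiver
```

Returning `self` in the block form keeps calls chainable, matching the "Returns self" note in the comments above.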
data/lib/hamster/list.rb ADDED
@@ -0,0 +1,105 @@
1
+ module Hamster
2
+
3
+ class List
4
+
5
+ include Enumerable
6
+
7
+ def initialize(head = nil, tail = self)
8
+ @head = head
9
+ @tail = tail
10
+ end
11
+
12
+ # Returns <tt>true</tt> if the list contains no items.
13
+ def empty?
14
+ @tail.equal?(self)
15
+ end
16
+
17
+ # Returns the number of items in the list.
18
+ def size
19
+ if empty?
20
+ 0
21
+ else
22
+ @tail.size + 1
23
+ end
24
+ end
25
+
26
+ # Returns the first item.
27
+ def car
28
+ @head
29
+ end
30
+
31
+ # Returns a copy of <tt>self</tt> without the first item.
32
+ def cdr
33
+ @tail
34
+ end
35
+
36
+ # Returns a copy of <tt>self</tt> with the given item as the head.
37
+ def cons(item)
38
+ self.class.new(item, self)
39
+ end
40
+
41
+ # Calls <tt>block</tt> once for each item in the list, passing the item as the only parameter.
42
+ # Returns <tt>self</tt>
43
+ def each
44
+ block_given? or return enum_for(__method__)
45
+ unless empty?
46
+ yield(@head)
47
+ @tail.each { |item| yield(item) }
48
+ end
49
+ self
50
+ end
51
+
52
+ # Returns <tt>true</tt> if . <tt>eql?</tt> is synonymous with <tt>==</tt>
53
+ def eql?(other)
54
+ blammo!
55
+ end
56
+ alias :== :eql?
57
+
58
+ # Returns <tt>self</tt>
59
+ def dup
60
+ self
61
+ end
62
+ alias :clone :dup
63
+
64
+ def map
65
+ if empty?
66
+ self
67
+ else
68
+ @tail.map { |item| yield(item) }.cons(yield(@head))
69
+ end
70
+ end
71
+
72
+ def reduce(memo)
73
+ if empty?
74
+ memo
75
+ else
76
+ @tail.reduce(yield(memo, @head)) { |memo, item| yield(memo, item) }
77
+ end
78
+ end
79
+
80
+ private
81
+
82
+ def method_missing(name, *args, &block)
83
+ if name.to_s =~ /^c([ad]+)r$/
84
+ accessor($1)
85
+ else
86
+ super
87
+ end
88
+ end
89
+
90
+ # Perform compositions of <tt>car</tt> and <tt>cdr</tt> operations. Their names consist of a 'c', followed by at
91
+ # least one 'a' or 'd', and finally an 'r'. The series of 'a's and 'd's in each function's name is chosen to
92
+ # identify the series of car and cdr operations that is performed by the function. The order in which the 'a's and
93
+ # 'd's appear is the inverse of the order in which the corresponding operations are performed.
94
+ def accessor(sequence)
95
+ sequence.split(//).reverse!.inject(self) do |memo, char|
96
+ case char
97
+ when "a" then memo.car
98
+ when "d" then memo.cdr
99
+ end
100
+ end
101
+ end
102
+
103
+ end
104
+
105
+ end
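The comment on `accessor` says the letters are applied in reverse order, which is easy to misread. A small worked example may help. `MiniList` below is a hypothetical cut-down cons list that mirrors the shape of the class above (an empty list is a cell whose tail is itself); it exposes `accessor` publicly instead of going through `method_missing`.

```ruby
# Hypothetical minimal cons list, loosely mirroring the List class above.
class MiniList
  attr_reader :car, :cdr

  def initialize(head = nil, tail = self)
    @car = head
    @cdr = tail
  end

  def cons(item)
    self.class.new(item, self)
  end

  # Applies the letters right to left: "ad" means cdr first, then car.
  def accessor(sequence)
    sequence.chars.reverse.inject(self) do |memo, char|
      char == "a" ? memo.car : memo.cdr
    end
  end
end

list   = MiniList.new.cons(3).cons(2).cons(1)   # the list (1 2 3)
second = list.accessor("ad")    # cadr:  car of the cdr
third  = list.accessor("add")   # caddr: car of the cdr of the cdr
rest   = list.accessor("dd")    # cddr:  what is left after two cdrs
```

So `cadr` reads as "the car of the cdr", which is the second item, and `cddr` hands back an existing tail rather than a fresh copy.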
data/lib/hamster/set.rb ADDED
@@ -0,0 +1,65 @@
1
+ module Hamster
2
+
3
+ class Set
4
+
5
+ def initialize(trie = Trie.new)
6
+ @trie = trie
7
+ end
8
+
9
+ # Returns the number of items in the set.
10
+ def size
11
+ @trie.size
12
+ end
13
+
14
+ # Returns <tt>true</tt> if the set contains no items.
15
+ def empty?
16
+ @trie.empty?
17
+ end
18
+
19
+ # Returns <tt>true</tt> if the given item is present in the set.
20
+ def include?(item)
21
+ @trie.has_key?(item)
22
+ end
23
+
24
+ # Returns a copy of <tt>self</tt> with the given item added. If already exists, returns <tt>self</tt>.
25
+ def add(item)
26
+ if include?(item)
27
+ self
28
+ else
29
+ self.class.new(@trie.put(item, nil))
30
+ end
31
+ end
32
+
33
+ # Returns a copy of <tt>self</tt> with the given item removed. If not found, returns <tt>self</tt>.
34
+ def remove(item)
35
+ copy = @trie.remove(item)
36
+ if copy.equal?(@trie)
37
+ self
38
+ else
39
+ self.class.new(copy)
40
+ end
41
+ end
42
+
43
+ # Calls <tt>block</tt> once for each item in the set, passing the item as the only parameter.
44
+ # Returns <tt>self</tt>
45
+ def each
46
+ block_given? or return enum_for(__method__)
47
+ @trie.each { |key, value| yield key }
48
+ self
49
+ end
50
+
51
+ # Returns <tt>true</tt> if <tt>other</tt> is the same object, or an instance of the same class with equal contents. <tt>eql?</tt> is synonymous with <tt>==</tt>
52
+ def eql?(other)
53
+ equal?(other) || (self.class.equal?(other.class) && @trie.eql?(other.instance_eval{@trie}))
54
+ end
55
+ alias :== :eql?
56
+
57
+ # Returns <tt>self</tt>
58
+ def dup
59
+ self
60
+ end
61
+ alias :clone :dup
62
+
63
+ end
64
+
65
+ end
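One interaction worth checking when reading this file alongside the trie: `add` stores each item with a `nil` value, while `Trie#has_key?` in this diff is written as `!! get(key)`. If that reading is right, a stored `nil` value is indistinguishable from a missing key. The sketch below uses a plain Ruby `Hash` as a stand-in for the trie (it is not Hamster code) to show the general pitfall.

```ruby
# Plain Ruby Hash standing in for the trie; this is not Hamster code.
store = { "apple" => nil }   # a set-like entry whose value is nil

truthy_lookup = !!store["apple"]      # treats the nil value as a miss
key_lookup    = store.key?("apple")   # checks for the key itself
```

Checking for the presence of the entry, rather than the truthiness of its value, avoids the ambiguity.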
data/lib/hamster/stack.rb ADDED
@@ -0,0 +1,53 @@
1
+ module Hamster
2
+
3
+ class Stack
4
+
5
+ def initialize(list = List.new)
6
+ @list = list
7
+ end
8
+
9
+ # Returns <tt>true</tt> if the stack contains no items.
10
+ def empty?
11
+ @list.empty?
12
+ end
13
+
14
+ # Returns the number of items on the stack.
15
+ def size
16
+ @list.size
17
+ end
18
+
19
+ # Returns the item at the top of the stack.
20
+ def top
21
+ @list.car
22
+ end
23
+
24
+ # Returns a copy of <tt>self</tt> with the given item as the new top
25
+ def push(item)
26
+ self.class.new(@list.cons(item))
27
+ end
28
+
29
+ # Returns a copy of <tt>self</tt> without the top item.
30
+ def pop
31
+ copy = @list.cdr
32
+ if !copy.equal?(@list)
33
+ self.class.new(copy)
34
+ else
35
+ self
36
+ end
37
+ end
38
+
39
+ # Returns <tt>true</tt> if <tt>other</tt> is the same object, or an instance of the same class with equal contents. <tt>eql?</tt> is synonymous with <tt>==</tt>
40
+ def eql?(other)
41
+ equal?(other) || (self.class.equal?(other.class) && @list.eql?(other.instance_eval{@list}))
42
+ end
43
+ alias :== :eql?
44
+
45
+ # Returns <tt>self</tt>
46
+ def dup
47
+ self
48
+ end
49
+ alias :clone :dup
50
+
51
+ end
52
+
53
+ end
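Because `push` and `pop` above delegate to `cons` and `cdr`, two stacks built from the same base can share that base instead of copying it. A minimal sketch of the idea, using a hypothetical `Cell` struct in place of Hamster's `List`:

```ruby
# Hypothetical minimal cell type; Hamster's own List is richer.
Cell = Struct.new(:head, :tail)

base  = Cell.new(:a, nil)     # a stack holding just :a
left  = Cell.new(:b, base)    # "push" :b onto base
right = Cell.new(:c, base)    # "push" :c onto the same base

# "Popping" is just following the tail pointer; nothing is copied.
popped_left  = left.tail
popped_right = right.tail
```

Both pops hand back the very same `base` object, which is why pushes and pops on a persistent stack cost constant time and memory.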
data/lib/hamster/trie.rb CHANGED
@@ -2,8 +2,6 @@ module Hamster
2
2
 
3
3
  class Trie
4
4
 
5
- include Enumerable
6
-
7
5
  def initialize(significant_bits = 0, entries = [], children = [])
8
6
  @significant_bits = significant_bits
9
7
  @entries = entries
@@ -12,8 +10,9 @@ module Hamster
12
10
 
13
11
  # Returns the number of key-value pairs in the trie.
14
12
  def size
15
- # TODO: This definitely won't scale!
16
- to_a.size
13
+ count = 0
14
+ each { count += 1 }
15
+ count
17
16
  end
18
17
 
19
18
  # Returns <tt>true</tt> if the trie contains no key-value pairs.
@@ -26,37 +25,30 @@ module Hamster
26
25
  !! get(key)
27
26
  end
28
27
 
29
- # Calls <tt>block</tt> once for each key in the trie, passing the key-value pair as parameters.
30
- # Returns <tt>self</tt>
28
+ # Calls <tt>block</tt> once for each entry in the trie, passing the key-value pair as parameters.
31
29
  def each
32
- block_given? or return enum_for(__method__)
33
30
  @entries.each { |entry| yield entry.key, entry.value if entry }
34
31
  @children.each do |child|
35
32
  child.each { |key, value| yield key, value } if child
36
33
  end
37
- self
38
34
  end
39
35
 
40
36
  # Returns a copy of <tt>self</tt> with the given value associated with the key.
41
37
  def put(key, value)
42
38
  index = index_for(key)
43
39
  entry = @entries[index]
44
-
45
40
  if entry && !entry.has_key?(key)
46
41
  children = @children.dup
47
42
  child = children[index]
48
-
49
43
  children[index] = if child
50
44
  child.put(key, value)
51
45
  else
52
46
  self.class.new(@significant_bits + 5).put!(key, value)
53
47
  end
54
-
55
48
  self.class.new(@significant_bits, @entries, children)
56
49
  else
57
50
  entries = @entries.dup
58
51
  entries[index] = Entry.new(key, value)
59
-
60
52
  self.class.new(@significant_bits, entries, @children)
61
53
  end
62
54
  end
@@ -65,12 +57,12 @@ module Hamster
65
57
  def get(key)
66
58
  index = index_for(key)
67
59
  entry = @entries[index]
68
- if entry
69
- if entry.has_key?(key)
70
- entry.value
71
- else
72
- child = @children[index]
73
- child.get(key) if child
60
+ if entry && entry.has_key?(key)
61
+ entry.value
62
+ else
63
+ child = @children[index]
64
+ if child
65
+ child.get(key)
74
66
  end
75
67
  end
76
68
  end
@@ -80,6 +72,12 @@ module Hamster
80
72
  remove!(key) || self
81
73
  end
82
74
 
75
+ # Returns <tt>true</tt> if . <tt>eql?</tt> is synonymous with <tt>==</tt>
76
+ def eql?(other)
77
+ blammo!
78
+ end
79
+ alias :== :eql?
80
+
83
81
  protected
84
82
 
85
83
  def put!(key, value)
@@ -90,17 +88,21 @@ module Hamster
90
88
  def remove!(key)
91
89
  index = index_for(key)
92
90
  entry = @entries[index]
93
- child = @children[index]
94
91
  if entry && entry.has_key?(key)
95
- entries = @entries.dup
96
- entries[index] = nil
97
- self.class.new(@significant_bits, entries, @children) unless size == 1
98
- elsif child
99
- new_child = child.remove!(key)
100
- if new_child != child
101
- children = @children.dup
102
- children[index] = new_child
103
- self.class.new(@significant_bits, @entries, children)
92
+ if size > 1
93
+ entries = @entries.dup
94
+ entries[index] = nil
95
+ self.class.new(@significant_bits, entries, @children)
96
+ end
97
+ else
98
+ child = @children[index]
99
+ if child
100
+ copy = child.remove!(key)
101
+ if !copy.equal?(child)
102
+ children = @children.dup
103
+ children[index] = copy
104
+ self.class.new(@significant_bits, @entries, children)
105
+ end
104
106
  end
105
107
  end
106
108
  end
@@ -111,6 +113,21 @@ module Hamster
111
113
  (key.hash.abs >> @significant_bits) & 31
112
114
  end
113
115
 
116
+ class Entry
117
+
118
+ attr_reader :key, :value
119
+
120
+ def initialize(key, value)
121
+ @key = key
122
+ @value = value
123
+ end
124
+
125
+ def has_key?(key)
126
+ @key.eql?(key)
127
+ end
128
+
129
+ end
130
+
114
131
  end
115
132
 
116
133
  end
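`index_for` picks one of 32 slots by shifting the key's hash right by `@significant_bits` and masking with 31, and `put` creates child tries with `@significant_bits + 5`. In other words, each level of the trie consumes the next five bits of the hash. The helper below is a hypothetical illustration of that path for a given integer hash code; `slot_path` is not part of Hamster.

```ruby
# Returns the slot (0..31) chosen at each trie level for an integer hash
# code, consuming 5 bits per level, in the same way as
# (key.hash.abs >> @significant_bits) & 31 in the diff above.
def slot_path(hash_code, levels)
  (0...levels).map { |level| (hash_code.abs >> (level * 5)) & 31 }
end

# 1234 is 0b00001_00110_10010, so the slots are read off from the right.
path = slot_path(1234, 3)
```

Two keys only end up in the same child when their hashes agree on every five-bit group consumed so far, which keeps the trie shallow for well-distributed hashes.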