mapredus 0.0.1 → 0.0.2
- data/README.md +45 -30
- data/lib/mapredus.rb +2 -1
- data/lib/mapredus/default_classes.rb +54 -0
- data/lib/mapredus/filesystem.rb +7 -7
- data/lib/mapredus/keys.rb +0 -7
- data/lib/mapredus/outputter.rb +0 -10
- data/lib/mapredus/process.rb +113 -58
- data/lib/mapredus/support.rb +11 -11
- data/spec/helper_classes.rb +8 -65
- data/spec/mapredus_spec.rb +46 -17
- metadata +5 -4
data/README.md
CHANGED
@@ -15,7 +15,7 @@ Goals:
 * simple M/R-style programming for existing Ruby projects
 * low cost of entry (no need for a dedicated cluster)
 
-
+if you are looking for a high-performance MapReduce implementation
 that can meet your big data needs, try Hadoop.
 
 
@@ -25,26 +25,31 @@ Using MapRedus
 MapRedus uses Resque to handle the processes that it runs, and redis
 to keep a store for the values/data produced.
 
-Workers for a MapRedus process
+Workers for a MapRedus process are Resque workers. Refer to the
 Resque worker documentation to see how to load the necessary
 environment for your worker to be able to run mapreduce processes. An
 example is also located in the tests.
 
 ### Attaching a mapreduce process to a class
-
-
-
+
+You will often want to define a mapreduce process that does some
+operations on data within a class. The process should have an
+inputter, mapper, reducer, finalizer, and outputter defined. By
+default a process will have the specifications shown below. There is
+also an example of how to do this in the tests.
+
     class GetWordCount < MapRedus::Process
-
-
-
-
-
-
-
-
-
-
+      inputter MapRedus::WordStream
+      mapper MapRedus::WordCounter
+      reducer MapRedus::Adder
+      finalizer MapRedus::ToRedisHash
+      outputter MapRedus::RedisHasher
+      ordered false
+    end
+
+    class GetCharCount < MapRedus::Process
+      inputter MapRedus::CharStream
+      mapper MapRedus::CharCounter
     end
 
     class Job
@@ -91,10 +96,10 @@ example:
       end
     end
 
-In this example, the
+In this example, the input stream calls yield to output a mapredus
 file number and the value that is saved to file (in redis). The
-mapper's map function calls yield to emit the key value pair for
-storage in redis. The reducer's reduce function acts similarly.
+mapper's `map` function calls yield to emit the key value pair for
+storage in redis. The reducer's `reduce` function acts similarly.
 
 The finalizer runs whatever needs to be run when a process completes,
 an example:
@@ -102,7 +107,7 @@ an example:
     class Finalizer < MapRedus::Finalizer
       def self.finalize(process)
        process.each_key_reduced_value do |key, value|
-          process.outputter.encode(process.
+          process.outputter.encode(process.result_key, key, value)
        end
        ...
        < set off a new mapredus process to use this stored data >
@@ -127,6 +132,16 @@ hash.
 The default Outputter makes no changes to original result, and tries
 to store that directly into redis as a string.
 
+Working Locally
+---------------
+
+MapRedus uses Bundler to manage dependencies. With Bundler installed:
+
+    bundle install
+
+You should now be able to run tests and do all other tasks with
+`rake`.
+
 Running Tests
 -------------
 
@@ -136,15 +151,15 @@ tests (you'll need to have bundler installed)
 
 Requirements
 ------------
-Bundler (this will install all the requirements below)
-Redis
-RedisSupport
-Resque
-Resque-scheduler
+* Bundler (this will install all the requirements below)
+* Redis
+* RedisSupport
+* Resque
+* Resque-scheduler
 
 ### Notes
-Instead of calling
-to produce a key value pair/value you call yield
+Instead of calling `emit_intermediate`/`emit` in your map/reduce
+to produce a key value pair/value you call `yield`, which will call
 emit_intermediate/emit for you. This gives flexibility in using
 Mapper/Reducer classes especially in testing.
 
@@ -198,17 +213,17 @@ not necessarily in the given order
 
 * think about the following logic
 
-  if a reducer starts working on a key after all maps have finished
+  + if a reducer starts working on a key after all maps have finished
     then when it is done the work on that key is finished forever
 
-  this would imply a process finishes when all map tasks have
+  + this would imply a process finishes when all map tasks have
     finished and all reduce tasks that start after the map tasks have
     finished
 
-  if a reducer started before all map tasks were finished, then load
+  + if a reducer started before all map tasks were finished, then load
    its reduced result back onto the value list
 
-  if the reducer started after all map tasks finished, then emit the
+  + if the reducer started after all map tasks finished, then emit the
    result
 
 Note on Patches/Pull Requests
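The README's Notes section says mappers call `yield` instead of `emit_intermediate`, and that this makes Mapper classes easy to exercise outside the framework. A minimal plain-Ruby sketch of that contract (standalone; no redis or mapredus required — the collecting block here stands in for what the framework does with each emitted pair):

```ruby
# Standalone sketch of the yield-based map contract described in the
# README. The framework normally supplies the block (routing each pair
# into redis); here a plain Hash collects the emitted pairs instead.
class WordCounter
  def self.map(map_data)
    map_data.split(/\W/).each do |word|
      next if word.empty?
      yield(word.downcase, 1)  # emit_intermediate is called for you
    end
  end
end

counts = Hash.new(0)
WordCounter.map("the cat and the hat") { |word, one| counts[word] += one }
counts  # => {"the"=>2, "cat"=>1, "and"=>1, "hat"=>1}
```

This is exactly the testing flexibility the Notes section refers to: the same class runs unchanged under Resque or under a unit test that passes its own block.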
data/lib/mapredus.rb
CHANGED
@@ -95,7 +95,6 @@ module MapRedus
 end
 
 require 'mapredus/keys'
-require 'mapredus/process'
 require 'mapredus/filesystem'
 require 'mapredus/master'
 require 'mapredus/mapper'
@@ -104,3 +103,5 @@ require 'mapredus/finalizer'
 require 'mapredus/support'
 require 'mapredus/outputter'
 require 'mapredus/inputter'
+require 'mapredus/default_classes'
+require 'mapredus/process'
data/lib/mapredus/default_classes.rb
ADDED
@@ -0,0 +1,54 @@
+module MapRedus
+  class WordStream < InputStream
+    def self.scan(data_object)
+      #
+      # The data_object should be a reference to an object that is
+      # stored on your system. The scanner is used to break up what you
+      # need from the object into manageable pieces for the mapper. In
+      # this example, the data object is a reference to a redis string.
+      #
+      test_string = FileSystem.get(data_object)
+
+      test_string.split.each_slice(10).each_with_index do |word_set, i|
+        yield(i, word_set.join(" "))
+      end
+    end
+  end
+
+  class WordCounter < Mapper
+    def self.map(map_data)
+      map_data.split(/\W/).each do |word|
+        next if word.empty?
+        yield(word.downcase, 1)
+      end
+    end
+  end
+
+  class Adder < Reducer
+    def self.reduce(value_list)
+      yield( value_list.reduce(0) { |r, v| r += v.to_i } )
+    end
+  end
+
+  class ToRedisHash < Finalizer
+    def self.finalize(process)
+      process.each_key_reduced_value do |key, value|
+        process.outputter.encode(process.result_key, key, value)
+      end
+    end
+  end
+
+  class RedisHasher < Outputter
+    def self.keys(result_key)
+      FileSystem.hkeys(result_key)
+    end
+
+    def self.encode(result_key, k, v)
+      FileSystem.hset(result_key, k, v)
+    end
+
+    def self.decode(result_key, k)
+      FileSystem.hget(result_key, k)
+    end
+  end
+end
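The `Adder` reducer in the new default_classes.rb folds a list of values into one integer and yields it once; the `to_i` matters because redis hands list values back as strings. A standalone check of that fold (no redis involved — the block plays the framework's role of storing the reduced value):

```ruby
# Standalone copy of the Adder fold from default_classes.rb. Values
# arrive as strings (redis stores everything as strings), so each one
# is converted with to_i before summing.
class Adder
  def self.reduce(value_list)
    yield(value_list.reduce(0) { |r, v| r + v.to_i })
  end
end

Adder.reduce(["1", "2", "3"]) { |sum| puts sum }  # prints 6
```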
data/lib/mapredus/filesystem.rb
CHANGED
@@ -24,20 +24,20 @@ module MapRedus
     # Setup locks on results using RedisSupport lock functionality
     #
     # Examples
-    #   FileSystem::has_lock?(
+    #   FileSystem::has_lock?(key)
     #   # => true or false
     #
     # Returns true if there's a lock
-    def self.has_lock?(
-      MapRedus.has_redis_lock?( RedisKey.result_cache(
+    def self.has_lock?(key)
+      MapRedus.has_redis_lock?( RedisKey.result_cache(key) )
     end
 
-    def self.acquire_lock(
-      MapRedus.acquire_redis_lock_nonblock( RedisKey.result_cache(
+    def self.acquire_lock(key)
+      MapRedus.acquire_redis_lock_nonblock( RedisKey.result_cache(key), 60 * 60 )
     end
 
-    def self.release_lock(
-      MapRedus.release_redis_lock( RedisKey.result_cache(
+    def self.release_lock(key)
+      MapRedus.release_redis_lock( RedisKey.result_cache(key) )
    end
  end
end
data/lib/mapredus/keys.rb
CHANGED
@@ -36,13 +36,6 @@ module MapRedus
     #
     redis_key :temp, "mapredus:process:PID:temp_reduce_key:HASHED_KEY:UNIQUE_REDUCE_HOSTNAME:UNIQUE_REDUCE_PROCESS_ID"
 
-    # If we want to hold on to our final data we have a key to put that data in
-    # In normal map reduce we would just be outputting files
-    #
-    redis_key :result, "mapredus:process:PID:result"
-    redis_key :result_cache, "mapredus:result:KEYNAME"
-
-
     #### USED WITHIN master.rb ####
 
     # Keeps track of the current slaves (by appending "1" to a redis list)
data/lib/mapredus/outputter.rb
CHANGED
@@ -29,14 +29,4 @@ module MapRedus
       FileSystem.set(result_key, Helper.encode(o))
     end
   end
-
-  class RedisHasher < Outputter
-    def self.encode(result_key, k, v)
-      FileSystem.hset(result_key, k, v)
-    end
-
-    def self.decode(result_key, k)
-      FileSystem.hget(result_key, k)
-    end
-  end
 end
data/lib/mapredus/process.rb
CHANGED
@@ -6,16 +6,19 @@ module MapRedus
   # the value of the redis object is a json object which contains:
   #
   # {
+  #   inputter : inputstreamclass,
   #   mapper : mapclass,
   #   reducer : reduceclass,
   #   finalizer : finalizerclass,
+  #   outputter : outputterclass,
   #   partitioner : <not supported>,
   #   combiner : <not supported>,
   #   ordered : true_or_false ## ensures ordering keys from the map output --> [ order, key, value ],
   #   synchronous : true_or_false ## runs the process synchronously or not (generally used for testing)
   #   result_timeout : length of time a result is saved ## 3600 * 24
-  #
+  #   key_args : arguments to be added to the key location of the result save (cache location)
   #   state : the current state of the process (shouldn't be set by the process and starts off as nil)
+  #   type : the original process class ( currently this is needed so we can have namespaces for the result_cache keys )
   # }
   #
   # The user has the ability in subclassing this class to create extra features if needed
@@ -24,7 +27,7 @@ module MapRedus
     # Public: Keep track of information that may show up as the redis json value
     # This is so we know exactly what might show up in the json hash
     READERS = [:pid]
-    ATTRS = [:inputter, :mapper, :reducer, :finalizer, :outputter, :ordered, :synchronous, :result_timeout, :
+    ATTRS = [:inputter, :mapper, :reducer, :finalizer, :outputter, :ordered, :synchronous, :result_timeout, :key_args, :state, :type]
     READERS.each { |r| attr_reader r }
     ATTRS.each { |a| attr_accessor a }
@@ -42,10 +45,11 @@ module MapRedus
       @ordered = json_helper(json_info, :ordered)
       @synchronous = json_helper(json_info, :synchronous)
       @result_timeout = json_helper(json_info, :result_timeout) || DEFAULT_TIME
-      @
+      @key_args = json_helper(json_info, :key_args) || []
       @state = json_helper(json_info, :state) || NOT_STARTED
       @outputter = json_helper(json_info, :outputter)
       @outputter = @outputter ? Helper.class_get(@outputter) : MapRedus::Outputter
+      @type = Helper.class_get(json_helper(json_info, :type) || Process)
     end
 
     def json_helper(json_info, key)
@@ -174,7 +178,7 @@ module MapRedus
     #
     # Examples
     #   emit_intermediate(key, value)
-    #   # =>
+    #   # => if an ordering is required
     #   emit_intermediate(rank, key, value)
     #
     # Returns true on success.
@@ -197,6 +201,16 @@ module MapRedus
       true
     end
 
+    # The emission associated with a reduce. Currently all reduced
+    # values are pushed onto a redis list. It may be the case that we
+    # want to directly use a different redis type given the kind of
+    # reduce we are doing. Often a reduce only returns one value, so
+    # instead of a rpush, we should do a set.
+    #
+    # Examples
+    #   emit(key, reduced_value)
+    #
+    # Returns "OK" on success.
     def emit(key, reduce_val)
       hashed_key = Helper.hash(key)
       FileSystem.rpush( ProcessInfo.reduce(@pid, hashed_key), reduce_val )
@@ -207,32 +221,6 @@ module MapRedus
       FileSystem.get( ProcessInfo.hash_to_key(@pid, hashed_key) ) == key.to_s )
     end
 
-    # Saves the result to the specified keyname, using the specified outputter
-    #
-    # Example
-    #   (mapreduce:process:result:KEYNAME)
-    #   OR
-    #   process:pid:result
-    #
-    # The client must ensure the the result will not be affected when to_s is applied
-    # since redis stores all values as strings
-    #
-    # Returns true on success.
-    def save_result(result)
-      res = @outputter.encode(result)
-      FileSystem.save(ProcessInfo.result(@pid), res)
-      FileSystem.save(ProcessInfo.result_cache(@keyname), res, @result_timeout) if @keyname
-      true
-    end
-
-    def get_saved_result
-      @outputter.decode(Process.get_saved_result(@keyname))
-    end
-
-    def delete_saved_result
-      Process.delete_saved_result(@keyname)
-    end
-
     # Keys that the map operation produced
     #
     # Examples
@@ -248,11 +236,6 @@ module MapRedus
       end
     end
 
-    def num_values(key)
-      hashed_key = Helper.hash(key)
-      FileSystem.llen( ProcessInfo.map(@pid, hashed_key) )
-    end
-
     # values that the map operation produced, for a key
     #
     # Examples
@@ -265,6 +248,10 @@ module MapRedus
       FileSystem.lrange( ProcessInfo.map(@pid, hashed_key), 0, -1 )
     end
 
+    def num_values(key)
+      hashed_key = Helper.hash(key)
+      FileSystem.llen( ProcessInfo.map(@pid, hashed_key) )
+    end
 
     # values that the reduce operation produced, for a key
     #
@@ -278,37 +265,104 @@ module MapRedus
       FileSystem.lrange( ProcessInfo.reduce(@pid, hashed_key), 0, -1 )
     end
 
-
-
-
-
-
-
-
-
-
+    def result_key(*args)
+      Helper.class_get(@type).result_key(*[@key_args, args].flatten)
+    end
+
+    def self.result_key(*args)
+      ProcessInfo.send( "#{self.to_s.gsub(/\W/,"_")}_result_cache", *args )
+    end
+
+    def self.set_result_key(key_struct)
+      MapRedus.redefine_redis_key( "#{self.to_s.gsub(/\W/,"_")}_result_cache", key_struct )
+    end
+
+    # Create sets up a process to be run with the given specification.
+    # It saves the information in the FileSystem and returns an
+    # instance of the process that run should be called on when
+    # running is desired.
     #
-    #
-
+    # Example
+    #   process = MapRedus::Process.create
+    #   process.run
+    #
+    # Returns an instance of the process
+    def self.create
       new_pid = get_available_pid
-
-
-
-
-
+      specification = ATTRS.inject({}) do |ret, attr|
+        ret[attr] = send(attr)
+        ret
+      end
+      specification[:type] = self
+      self.new(new_pid, specification).save
     end
-
-
-
+
+    # This defines the attributes to be associated with a MapRedus process
+    # This will allow us to subclass a Process, creating a new specification
+    # by specifying what say the inputter should equal
+    #
+    # Example
+    #   class AnswerDistribution < MapRedus::Process
+    #     inputter = JudgmentStream
+    #     mapper = ResponseFrequencyMap
+    #     reducer = Adder
+    #     finalizer = AnswerCount
+    #     outputter = MapRedus::RedisHasher
+    #   end
+    class << self; attr_reader *ATTRS; end
+
+    # Setter/Getter method definitions to set/get the attribute for
+    # the class. In the getter if it is not defined (nil) then return
+    # the default attribute defined in MapRedus::Process.
+    #
+    # Example
+    #   class AnswerDistribution < MapRedus::Process
+    #     inputter JudgmentStream
+    #     mapper ResponseFrequency
+    #   end
+    #   AnswerDistribution.reducer.should == Adder
+    ATTRS.each do |attr|
+      (class << self; self; end).send(:define_method, attr) do |*one_arg|
+        attribute = "@#{attr}"
+        case one_arg.size
+        when 0
+          instance_variable_get(attribute) || MapRedus::Process.instance_variable_get(attribute)
+        when 1
+          instance_variable_set(attribute, one_arg.first)
+        else
+          raise ArgumentError.new("wrong number of arguments (#{one_arg.size}) when zero or one arguments were expected")
+        end
+      end
    end
 
+    # Default attributes for the process class. All other attributes
+    # are nil by default.
+    inputter WordStream
+    mapper WordCounter
+    reducer Adder
+    finalizer ToRedisHash
+    outputter RedisHasher
+    type Process
+
+    # This function returns all the redis keys produced associated
+    # with a process's process id.
+    #
+    # Example
+    #   Process.info(17)
+    #
+    # Returns an array of keys associated with the process id.
     def self.info(pid)
       FileSystem.keys(ProcessInfo.pid(pid) + "*")
     end
 
+    # Returns an instance of the process class given the process id.
+    # If no such process id exists returns nil.
+    #
+    # Example
+    #   process = Process.open(17)
     def self.open(pid)
       spec = Helper.decode( FileSystem.get(ProcessInfo.pid(pid)) )
-      spec &&
+      spec && self.new( pid, spec )
     end
 
     # Find out what map reduce processes are out there
@@ -331,12 +385,13 @@ module MapRedus
       FileSystem.incrby(ProcessInfo.processes_count, 1 + rand(20))
     end
 
-    # Given a result
+    # Given arguments for a result key, delete the result from the
+    # filesystem.
     #
     # Examples
     #   Process.delete_saved_result(key)
-    def self.delete_saved_result(
-      FileSystem.del(
+    def self.delete_saved_result(*key_args)
+      FileSystem.del( result_key(*key_args) )
     end
 
     # Remove redis keys associated with this process if the Master isn't working.
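The ATTRS metaprogramming added to process.rb gives each `Process` subclass per-class settings (`inputter`, `mapper`, ...) that fall back to the parent's defaults when unset. A reduced standalone sketch of the same getter/setter trick (class and attribute names here are illustrative, not the gem's):

```ruby
# Reduced sketch of the one-method getter/setter DSL defined over ATTRS
# in process.rb: calling with zero args reads the setting (falling back
# to the base class default), calling with one arg writes it.
class BaseProcess
  ATTRS = [:mapper, :reducer]

  ATTRS.each do |attr|
    (class << self; self; end).send(:define_method, attr) do |*one_arg|
      ivar = "@#{attr}"
      case one_arg.size
      when 0
        instance_variable_get(ivar) || BaseProcess.instance_variable_get(ivar)
      when 1
        instance_variable_set(ivar, one_arg.first)
      else
        raise ArgumentError, "expected zero or one arguments"
      end
    end
  end

  # defaults, set through the generated writers
  mapper  :default_mapper
  reducer :default_reducer
end

class WordCount < BaseProcess
  mapper :word_counter   # override only one attribute
end

WordCount.mapper   # => :word_counter
WordCount.reducer  # => :default_reducer (falls back to the base class)
```

The fallback works because instance variables live on each class object separately: `WordCount` has no `@reducer` of its own, so the getter reads the one stored on `BaseProcess`.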
data/lib/mapredus/support.rb
CHANGED
@@ -4,7 +4,6 @@ module MapRedus
   class DuplicateProcessDefinitionError < MapRedusRunnerError ; end
 
   class Runner
-    attr_reader :process
     def initialize(class_name)
       @class = class_name
     end
@@ -14,7 +13,7 @@ module MapRedus
       if self.respond_to?(mr_process)
         self.send(mr_process, *args, &block)
       else
-        super(method, *args, &block)
+        super(method, *args, &block)
       end
     end
   end
@@ -24,7 +23,7 @@ module MapRedus
   end
 
   module ClassMethods
-    def mapreduce_process( process_name, mapredus_process_class, result_store
+    def mapreduce_process( process_name, mapredus_process_class, result_store )
       runner_self = Runner
       class_name = self.to_s.gsub(/\W/,"_")
 
@@ -34,17 +33,18 @@ module MapRedus
         raise DuplicateProcessDefintionError
       end
 
-
-      RedisSupport.redis_key( keyname, result_store )
+      mapredus_process_class.set_result_key( result_store )
 
-      runner_self.send( :define_method, global_process_name ) do |data,
-
-
-
+      runner_self.send( :define_method, global_process_name ) do |data, key_arguments|
+        process = mapredus_process_class.create
+        process.update(:key_args => key_arguments)
+        process.run(data)
+        process
      end
 
-      runner_self.send( :define_method, "#{global_process_name}_result" ) do
-
+      runner_self.send( :define_method, "#{global_process_name}_result" ) do |key_arguments, *outputter_args|
+        key = mapredus_process_class.result_key( *key_arguments )
+        mapredus_process_class.outputter.decode( key, *outputter_args)
      end
    end
  end
end
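support.rb wires per-process methods onto its `Runner` at declaration time with `define_method`, so each attached mapreduce process gets a named entry point. A toy standalone version of that dynamic-definition pattern (the `register` method and `char_count` name are invented for illustration; the gem's real wiring goes through `mapreduce_process`):

```ruby
# Toy version of the dynamic method wiring in support.rb: a class-level
# declaration defines an instance method at load time via define_method,
# capturing the given block as the method body.
class Runner
  def self.register(process_name, &body)
    define_method(process_name, &body)
  end
end

Runner.register(:char_count) { |data| data.each_char.tally }
Runner.new.char_count("aab")  # => {"a"=>2, "b"=>1}
```

Because the block is a closure, the real gem uses this to bake `mapredus_process_class` into each generated runner method without any lookup table.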
data/spec/helper_classes.rb
CHANGED
@@ -1,49 +1,3 @@
-class GetCharCount < MapRedus::Process
-  EXPECTED_ANSWER = {"k"=>2, "v"=>1, " "=>54, ","=>3, "w"=>7, "a"=>17, "l"=>12, "b"=>2, "m"=>4, "c"=>3, "."=>2, "y"=>3, "n"=>18, "D"=>1, "d"=>15, "o"=>13, "p"=>14, "e"=>34, "f"=>6, "r"=>13, "g"=>6, "S"=>1, "s"=>12, "h"=>19, "H"=>1, "t"=>20, "i"=>16, "u"=>5, "j"=>1}
-  def self.specification
-    {
-      :inputter => CharStream,
-      :mapper => CharCounter,
-      :reducer => Adder,
-      :finalizer => ToRedisHash,
-      :outputter => MapRedus::RedisHasher,
-      :ordered => false
-    }
-  end
-end
-
-class GetWordCount < MapRedus::Process
-  TEST = "He pointed his finger in friendly jest and went over to the parapet laughing to himself. Stephen Dedalus stepped up, followed him wearily halfway and sat down on the edge of the gunrest, watching him still as he propped his mirror on the parapet, dipped the brush in the bowl and lathered cheeks and neck."
-  EXPECTED_ANSWER = {"gunrest"=>1, "over"=>1, "still"=>1, "of"=>1, "him"=>2, "and"=>4, "bowl"=>1, "himself"=>1, "went"=>1, "friendly"=>1, "finger"=>1, "propped"=>1, "cheeks"=>1, "dipped"=>1, "down"=>1, "wearily"=>1, "up"=>1, "stepped"=>1, "dedalus"=>1, "to"=>2, "in"=>2, "sat"=>1, "the"=>6, "pointed"=>1, "as"=>1, "followed"=>1, "stephen"=>1, "laughing"=>1, "his"=>2, "he"=>2, "brush"=>1, "jest"=>1, "neck"=>1, "mirror"=>1, "edge"=>1, "on"=>2, "parapet"=>2, "lathered"=>1, "watching"=>1, "halfway"=>1}
-  def self.specification
-    {
-      :inputter => WordStream,
-      :mapper => WordCounter,
-      :reducer => Adder,
-      :finalizer => ToRedisHash,
-      :outputter => MapRedus::RedisHasher,
-      :ordered => false,
-      :keyname => "test:result"
-    }
-  end
-end
-
-class WordStream < MapRedus::InputStream
-  def self.scan(data_object)
-    #
-    # The data_object should be a reference to an object that is
-    # stored on your system. The scanner is used to break up what you
-    # need from the object into manageable pieces for the mapper. In
-    # this example, the data object is a reference to a redis string.
-    #
-    test_string = MapRedus::FileSystem.get(data_object)
-
-    test_string.split.each_slice(10).each_with_index do |word_set, i|
-      yield(i, word_set.join(" "))
-    end
-  end
-end
-
 class CharStream < MapRedus::InputStream
   def self.scan(data_object)
     test_string = MapRedus::FileSystem.get(data_object)
@@ -56,15 +10,6 @@ class CharStream < MapRedus::InputStream
   end
 end
 
-class WordCounter < MapRedus::Mapper
-  def self.map(map_data)
-    map_data.split(/\W/).each do |word|
-      next if word.empty?
-      yield(word.downcase, 1)
-    end
-  end
-end
-
 class CharCounter < MapRedus::Mapper
   def self.map(map_data)
     map_data.each_char do |char|
@@ -73,18 +18,16 @@ class CharCounter < MapRedus::Mapper
   end
 end
 
-class
-
-
-
+class GetCharCount < MapRedus::Process
+  EXPECTED_ANSWER = {"k"=>2, "v"=>1, " "=>54, ","=>3, "w"=>7, "a"=>17, "l"=>12, "b"=>2, "m"=>4, "c"=>3, "."=>2, "y"=>3, "n"=>18, "D"=>1, "d"=>15, "o"=>13, "p"=>14, "e"=>34, "f"=>6, "r"=>13, "g"=>6, "S"=>1, "s"=>12, "h"=>19, "H"=>1, "t"=>20, "i"=>16, "u"=>5, "j"=>1}
+  inputter CharStream
+  mapper CharCounter
 end
 
-class
-
-
-
-end
-end
+class GetWordCount < MapRedus::Process
+  TEST = "He pointed his finger in friendly jest and went over to the parapet laughing to himself. Stephen Dedalus stepped up, followed him wearily halfway and sat down on the edge of the gunrest, watching him still as he propped his mirror on the parapet, dipped the brush in the bowl and lathered cheeks and neck."
+  EXPECTED_ANSWER = {"gunrest"=>1, "over"=>1, "still"=>1, "of"=>1, "him"=>2, "and"=>4, "bowl"=>1, "himself"=>1, "went"=>1, "friendly"=>1, "finger"=>1, "propped"=>1, "cheeks"=>1, "dipped"=>1, "down"=>1, "wearily"=>1, "up"=>1, "stepped"=>1, "dedalus"=>1, "to"=>2, "in"=>2, "sat"=>1, "the"=>6, "pointed"=>1, "as"=>1, "followed"=>1, "stephen"=>1, "laughing"=>1, "his"=>2, "he"=>2, "brush"=>1, "jest"=>1, "neck"=>1, "mirror"=>1, "edge"=>1, "on"=>2, "parapet"=>2, "lathered"=>1, "watching"=>1, "halfway"=>1}
+  set_result_key "test:result"
 end
 
 class Document
data/spec/mapredus_spec.rb
CHANGED
@@ -8,13 +8,41 @@ describe "MapRedus" do
     MapRedus::FileSystem.setnx("wordstream:test", GetWordCount::TEST)
   end
 
+  it "sets up the correct default classes" do
+    MapRedus::Process.inputter.should == MapRedus::WordStream
+    MapRedus::Process.mapper.should == MapRedus::WordCounter
+    MapRedus::Process.reducer.should == MapRedus::Adder
+    MapRedus::Process.finalizer.should == MapRedus::ToRedisHash
+    MapRedus::Process.outputter.should == MapRedus::RedisHasher
+
+    GetWordCount.result_key.should == "test:result"
+    GetWordCount.inputter.should == MapRedus::WordStream
+    GetWordCount.mapper.should == MapRedus::WordCounter
+    GetWordCount.reducer.should == MapRedus::Adder
+    GetWordCount.finalizer.should == MapRedus::ToRedisHash
+    GetWordCount.outputter.should == MapRedus::RedisHasher
+
+    GetCharCount.inputter.should == CharStream
+    GetCharCount.mapper.should == CharCounter
+    GetCharCount.reducer.should == MapRedus::Adder
+    GetCharCount.finalizer.should == MapRedus::ToRedisHash
+    GetCharCount.outputter.should == MapRedus::RedisHasher
+  end
+
   it "creates a process successfully" do
     process = GetWordCount.open(@process.pid)
 
-    process.inputter.should == WordStream
-    process.mapper.should == WordCounter
-    process.reducer.should == Adder
-    process.finalizer.should == ToRedisHash
+    process.inputter.should == MapRedus::WordStream
+    process.mapper.should == MapRedus::WordCounter
+    process.reducer.should == MapRedus::Adder
+    process.finalizer.should == MapRedus::ToRedisHash
+    process.outputter.should == MapRedus::RedisHasher
+
+    process = GetCharCount.create
+    process.inputter.should == CharStream
+    process.mapper.should == CharCounter
+    process.reducer.should == MapRedus::Adder
+    process.finalizer.should == MapRedus::ToRedisHash
     process.outputter.should == MapRedus::RedisHasher
   end
 
@@ -22,6 +50,7 @@ describe "MapRedus" do
     ##
     ## In general map reduce shouldn't be running operations synchronously
     ##
+    @process.class.should == GetWordCount
     @process.run("wordstream:test", synchronously = true)
     @process.map_keys.size.should == GetWordCount::EXPECTED_ANSWER.size
 
@@ -31,7 +60,7 @@ describe "MapRedus" do
     end
 
     @process.each_key_reduced_value do |key, value|
-      @process.outputter.decode(@process.
+      @process.outputter.decode(@process.result_key, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
     end
   end
 
@@ -46,7 +75,7 @@ describe "MapRedus" do
     end
 
     @process.each_key_reduced_value do |key, value|
-      @process.outputter.decode(@process.
+      @process.outputter.decode(@process.result_key, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
     end
   end
 end
@@ -158,8 +187,8 @@ describe "MapRedus Process" do
 
   it "emit_intermediate on an ordered process" do
     @process.update(:ordered => true)
-    @process.emit_intermediate(1, "number", "one")
     @process.emit_intermediate(2, "place", "two")
+    @process.emit_intermediate(1, "number", "one")
     res = []
     @process.each_key_nonreduced_value do |key, value|
       res << [key, value]
@@ -207,14 +236,14 @@ describe "MapRedus Master" do
   end
 
   it "handles slaves (enslaving) correctly" do
-    MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
-    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "test"], "class"=>"WordCounter"}]
+    MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
+    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "test"], "class"=>"MapRedus::WordCounter"}]
     MapRedus::Master.slaves(@process.pid).should == ["1"]
   end
 
   it "handles slaves (freeing) correctly" do
-    MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
-    MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
+    MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
+    MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
 
     MapRedus::Master.slaves(@process.pid).should == ["1", "1"]
 
@@ -237,12 +266,12 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
     @process.update(:state => MapRedus::INPUT_MAP_IN_PROGRESS)
     @process.state.should == MapRedus::INPUT_MAP_IN_PROGRESS
     @process.inputter.perform(@process.pid, "wordstream:test")
-    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, 0], "class"=>"WordCounter"}]
+    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, 0], "class"=>"MapRedus::WordCounter"}]
    Resque.pop(:mapredus)
    @process.mapper.perform(@process.pid, 0)
    @process.reload
    @process.state.should == MapRedus::REDUCE_IN_PROGRESS
-    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "data"], "class"=>"Adder"}]
+    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "data"], "class"=>"MapRedus::Adder"}]
  end
 
  it "runs a reduce correctly proceeding to the correct next state" do
@@ -252,7 +281,7 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
    @process.reducer.perform(@process.pid, "data")
    @process.reload
    @process.state.should == MapRedus::FINALIZER_IN_PROGRESS
-    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid], "class"=>"ToRedisHash"}]
+    Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid], "class"=>"MapRedus::ToRedisHash"}]
  end
 
  it "should test that the finalizer correctly saves" do
@@ -268,7 +297,7 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
  end
end
 
-describe "
+describe "MapRedus Support" do
  before(:each) do
    MapRedus::FileSystem.flushall
    @doc = Document.new(10)
@@ -285,11 +314,11 @@ describe "MapReduce Support" do
    work_off
 
    GetCharCount::EXPECTED_ANSWER.keys.each do |char|
-      @doc.mapreduce.char_count_result(char).should == GetCharCount::EXPECTED_ANSWER[char].to_s
+      @doc.mapreduce.char_count_result([@doc.id], char).should == GetCharCount::EXPECTED_ANSWER[char].to_s
    end
 
    other_answer.keys.each do |char|
-      @other_doc.mapreduce.char_count_result(char).should == other_answer[char].to_s
+      @other_doc.mapreduce.char_count_result([@other_doc.id], char).should == other_answer[char].to_s
    end
  end
end
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: mapredus
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 27
   prerelease: false
   segments:
   - 0
   - 0
-  -
-  version: 0.0.
+  - 2
+  version: 0.0.2
 platform: ruby
 authors:
 - John Le
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-07-
+date: 2010-07-09 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -89,6 +89,7 @@ extra_rdoc_files:
 - README.md
 files:
 - lib/mapredus.rb
+- lib/mapredus/default_classes.rb
 - lib/mapredus/filesystem.rb
 - lib/mapredus/finalizer.rb
 - lib/mapredus/inputter.rb