mapredus 0.0.1 → 0.0.2

data/README.md CHANGED
@@ -15,7 +15,7 @@ Goals:
15
15
  * simple M/R-style programming for existing Ruby projects
16
16
  * low cost of entry (no need for a dedicated cluster)
17
17
 
18
- If you are looking for a high-performance MapReduce implementation
18
+ if you are looking for a high-performance MapReduce implementation
19
19
  that can meet your big data needs, try Hadoop.
20
20
 
21
21
 
@@ -25,26 +25,31 @@ Using MapRedus
25
25
  MapRedus uses Resque to handle the processes that it runs, and redis
26
26
  to keep a store for the values/data produced.
27
27
 
28
- Workers for a MapRedus process, are Resque workers. Refer to the
28
+ Workers for a MapRedus process are Resque workers. Refer to the
29
29
  Resque worker documentation to see how to load the necessary
30
30
  environment for your worker to be able to run mapreduce processs. An
31
31
  example is also located in the tests.
32
32
 
33
33
  ### Attaching a mapreduce process to a class
34
- Often times you'll want to define a mapreduce process that does
35
- operation on data within a class. Here is how this looks. There is
36
- also an example of this in the tests.
34
+
35
+ You will often want to define a mapreduce process that does some
36
+ operations on data within a class. The process should have an
37
+ inputter, mapper, reducer, finalizer, and outputter defined. By
38
+ default a process will have the specifications shown below. There is
39
+ also an example of how to do this in the tests.
40
+
37
41
  class GetWordCount < MapRedus::Process
38
- def self.specification
39
- {
40
- :inputter => WordStream,
41
- :mapper => WordCounter,
42
- :reducer => Adder,
43
- :finalizer => ToRedisHash,
44
- :outputter => MapRedus::RedisHasher,
45
- :ordered => false
46
- }
47
- end
42
+ inputter MapRedus::WordStream
43
+ mapper MapRedus::WordCounter
44
+ reducer MapRedus::Adder
45
+ finalizer MapRedus::ToRedisHash
46
+ outputter MapRedus::RedisHasher
47
+ ordered false
48
+ end
49
+
50
+ class GetCharCount < MapRedus::Process
51
+ inputter MapRedus::CharStream
52
+ mapper MapRedus::CharCounter
48
53
  end
49
54
 
50
55
  class Job
@@ -91,10 +96,10 @@ example:
91
96
  end
92
97
  end
93
98
 
94
- In this example, the inputt stream calls yield to output a mapredus
99
+ In this example, the input stream calls yield to output a mapredus
95
100
  file number and a the value that is saved to file (in redis). The
96
- mapper's map function calls yield to emit the key value pair for
97
- storage in redis. The reducer's reduce function acts similarly.
101
+ mapper's `map` function calls yield to emit the key value pair for
102
+ storage in redis. The reducer's `reduce` function acts similarly.
98
103
 
99
104
  The finalizer runs whatever needs to be run when a process completes,
100
105
  an example:
@@ -102,7 +107,7 @@ an example:
102
107
  class Finalizer < MapRedus::Finalizer
103
108
  def self.finalize(process)
104
109
  process.each_key_reduced_value do |key, value|
105
- process.outputter.encode(process.keyname, key, value)
110
+ process.outputter.encode(process.result_key, key, value)
106
111
  end
107
112
  ...
108
113
  < set off a new mapredus process to use this stored data >
@@ -127,6 +132,16 @@ hash.
127
132
  The default Outputter makes no changes to original result, and tries
128
133
  to store that directly into redis as a string.
129
134
 
135
+ Working Locally
136
+ ---------------
137
+
138
+ MapRedus uses Bundler to manage dependencies. With Bundler installed:
139
+
140
+ bundle install
141
+
142
+ You should now be able to run tests and do all other tasks with
143
+ `rake`.
144
+
130
145
  Running Tests
131
146
  -------------
132
147
 
@@ -136,15 +151,15 @@ tests (you'll need to have bundler installed)
136
151
 
137
152
  Requirements
138
153
  ------------
139
- Bundler (this will install all the requirements below)
140
- Redis
141
- RedisSupport
142
- Resque
143
- Resque-scheduler
154
+ * Bundler (this will install all the requirements below)
155
+ * Redis
156
+ * RedisSupport
157
+ * Resque
158
+ * Resque-scheduler
144
159
 
145
160
  ### Notes
146
- Instead of calling "emit_intermediate"/"emit" in your map/reduce
147
- to produce a key value pair/value you call yield, which will call
161
+ Instead of calling `emit_intermediate`/`emit` in your map/reduce
162
+ to produce a key value pair/value you call `yield`, which will call
148
163
  emit_intermediate/emit for you. This gives flexibility in using
149
164
  Mapper/Reducer classes especially in testing.
150
165
 
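The note above says mappers and reducers emit by calling `yield`. To illustrate why that helps in testing, here is a copy of the `WordCounter` mapper from this release exercised with two different blocks. One block just collects the pairs, as a test might. The other appends them into a plain Hash that stands in for redis. The Hash store is hypothetical; it is not how MapRedus persists intermediate data.

```ruby
# Copy of the WordCounter mapper from default_classes.rb (outside the
# MapRedus module). It emits pairs with yield and never touches storage.
class WordCounter
  def self.map(map_data)
    map_data.split(/\W/).each do |word|
      next if word.empty?
      yield(word.downcase, 1)
    end
  end
end

# In a test, the block can simply collect what the mapper emits.
pairs = []
WordCounter.map("The cat saw the dog") { |key, value| pairs << [key, value] }
p pairs # [["the", 1], ["cat", 1], ["saw", 1], ["the", 1], ["dog", 1]]

# A framework can pass a different block that stores the pairs instead.
# Here a Hash of lists stands in (hypothetically) for redis.
store = Hash.new { |hash, key| hash[key] = [] }
WordCounter.map("The cat saw the dog") { |key, value| store[key] << value }
p store["the"] # [1, 1]
```

The mapper is identical in both cases. Only the block decides where the output goes.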
@@ -198,17 +213,17 @@ not necessarily in the given order
198
213
 
199
214
  * think about the following logic
200
215
 
201
- if a reducer starts working on a key after all maps have finished
216
+ + if a reducer starts working on a key after all maps have finished
202
217
  then when it is done the work on that key is finished forerver
203
218
 
204
- this would imply a process finishes when all map tasks have
219
+ + this would imply a process finishes when all map tasks have
205
220
  finished and all reduce tasks that start after the map tasks have
206
221
  finished
207
222
 
208
- if a reducer started before all map tasks were finished, then load
223
+ + if a reducer started before all map tasks were finished, then load
209
224
  its reduced result back onto the value list
210
225
 
211
- if the reducer started after all map tasks finished, then emit the
226
+ + if the reducer started after all map tasks finished, then emit the
212
227
  result
213
228
 
214
229
  Note on Patches/Pull Requests
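The README's TODO sketches a rule for when a reduce result is final. Below is a minimal Ruby model of that rule. It assumes integer values, and it uses a boolean `maps_done_at_start` flag recording whether every map task had finished when the reduce began. Both `reduce_step` and the flag are hypothetical names, not MapRedus API.

```ruby
# Hypothetical model of the reducer rule from the TODO list.
# Returns [:emit, result] when the result is final, or
# [:push_back, new_value_list] when more map output may still arrive.
def reduce_step(values, maps_done_at_start)
  reduced = values.sum
  if maps_done_at_start
    [:emit, reduced]         # no map can add values for this key any more
  else
    [:push_back, [reduced]]  # partial sum goes back onto the value list
  end
end

value_list = [1, 1]
action, payload = reduce_step(value_list, false) # reducer started too early
value_list = payload if action == :push_back     # list is now [2]
value_list << 1                                  # a late map task adds a value
action, result = reduce_step(value_list, true)   # maps are all done now
puts "#{action}: #{result}"                      # emit: 3
```

Pushing the partial sum back is only safe because addition is associative. A reducer that is not associative would need a different rule.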
data/lib/mapredus.rb CHANGED
@@ -95,7 +95,6 @@ module MapRedus
95
95
  end
96
96
 
97
97
  require 'mapredus/keys'
98
- require 'mapredus/process'
99
98
  require 'mapredus/filesystem'
100
99
  require 'mapredus/master'
101
100
  require 'mapredus/mapper'
@@ -104,3 +103,5 @@ require 'mapredus/finalizer'
104
103
  require 'mapredus/support'
105
104
  require 'mapredus/outputter'
106
105
  require 'mapredus/inputter'
106
+ require 'mapredus/default_classes'
107
+ require 'mapredus/process'
@@ -0,0 +1,54 @@
1
+ module MapRedus
2
+ class WordStream < InputStream
3
+ def self.scan(data_object)
4
+ #
5
+ # The data_object should be a reference to an object that is
6
+ # stored on your system. The scanner is used to break up what you
7
+ # need from the object into manageable pieces for the mapper. In
8
+ # this example, the data object is a reference to a redis string.
9
+ #
10
+ test_string = FileSystem.get(data_object)
11
+
12
+ test_string.split.each_slice(10).each_with_index do |word_set, i|
13
+ yield(i, word_set.join(" "))
14
+ end
15
+ end
16
+ end
17
+
18
+ class WordCounter < Mapper
19
+ def self.map(map_data)
20
+ map_data.split(/\W/).each do |word|
21
+ next if word.empty?
22
+ yield(word.downcase, 1)
23
+ end
24
+ end
25
+ end
26
+
27
+ class Adder < Reducer
28
+ def self.reduce(value_list)
29
+ yield( value_list.reduce(0) { |r, v| r += v.to_i } )
30
+ end
31
+ end
32
+
33
+ class ToRedisHash < Finalizer
34
+ def self.finalize(process)
35
+ process.each_key_reduced_value do |key, value|
36
+ process.outputter.encode(process.result_key, key, value)
37
+ end
38
+ end
39
+ end
40
+
41
+ class RedisHasher < Outputter
42
+ def self.keys(result_key)
43
+ FileSystem.hkeys(result_key)
44
+ end
45
+
46
+ def self.encode(result_key, k, v)
47
+ FileSystem.hset(result_key, k, v)
48
+ end
49
+
50
+ def self.decode(result_key, k)
51
+ FileSystem.hget(result_key, k)
52
+ end
53
+ end
54
+ end
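The new `default_classes.rb` wires up an inputter, a mapper and a reducer that talk to each other only through `yield`. The sketch below re-declares those three classes outside the `MapRedus` module and drives them with a hypothetical synchronous `run_word_count` helper. Plain Hashes replace redis and Resque, so this shows the data flow only, not how jobs are queued or where results are stored.

```ruby
# Stand-ins for the default classes, run against plain Ruby objects.
class WordStream
  # Break the input into numbered chunks of 10 words for the mapper.
  def self.scan(text)
    text.split.each_slice(10).each_with_index do |word_set, i|
      yield(i, word_set.join(" "))
    end
  end
end

class WordCounter
  def self.map(map_data)
    map_data.split(/\W/).each do |word|
      next if word.empty?
      yield(word.downcase, 1)
    end
  end
end

class Adder
  # to_i mirrors the original: values come back from redis as strings.
  def self.reduce(value_list)
    yield(value_list.reduce(0) { |r, v| r + v.to_i })
  end
end

# Hypothetical driver: scan -> map -> group by key -> reduce.
def run_word_count(text)
  intermediate = Hash.new { |hash, key| hash[key] = [] }
  WordStream.scan(text) do |_chunk_id, chunk|
    # Store values as strings, as redis would.
    WordCounter.map(chunk) { |key, value| intermediate[key] << value.to_s }
  end

  result = {}
  intermediate.each do |key, values|
    Adder.reduce(values) { |reduced| result[key] = reduced }
  end
  result
end

counts = run_word_count("the cat and the hat and the bat")
p counts["the"] # 3
```

In MapRedus itself, the finalizer (`ToRedisHash`) would then hand each reduced pair to the outputter rather than returning a Hash.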
@@ -24,20 +24,20 @@ module MapRedus
24
24
  # Setup locks on results using RedisSupport lock functionality
25
25
  #
26
26
  # Examples
27
- # FileSystem::has_lock?(keyname)
27
+ # FileSystem::has_lock?(key)
28
28
  # # => true or false
29
29
  #
30
30
  # Returns true if there's a lock
31
- def self.has_lock?(keyname)
32
- MapRedus.has_redis_lock?( RedisKey.result_cache(keyname) )
31
+ def self.has_lock?(key)
32
+ MapRedus.has_redis_lock?( RedisKey.result_cache(key) )
33
33
  end
34
34
 
35
- def self.acquire_lock(keyname)
36
- MapRedus.acquire_redis_lock_nonblock( RedisKey.result_cache(keyname), 60 * 60 )
35
+ def self.acquire_lock(key)
36
+ MapRedus.acquire_redis_lock_nonblock( RedisKey.result_cache(key), 60 * 60 )
37
37
  end
38
38
 
39
- def self.release_lock(keyname)
40
- MapRedus.release_redis_lock( RedisKey.result_cache(keyname) )
39
+ def self.release_lock(key)
40
+ MapRedus.release_redis_lock( RedisKey.result_cache(key) )
41
41
  end
42
42
  end
43
43
  end
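`FileSystem.acquire_lock` now takes a result key and requests a non-blocking lock that expires after an hour (`60 * 60`). The class below is a hypothetical in-memory model of that behaviour, with an injectable clock so expiry can be tested. It is not the RedisSupport implementation, which keeps the lock in redis and works across processes.

```ruby
# Hypothetical in-memory model of a non-blocking, expiring lock per key.
class ExpiringLocks
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @expires_at = {} # key => absolute expiry time
  end

  # True while an unexpired lock exists for the key.
  def has_lock?(key)
    expiry = @expires_at[key]
    !expiry.nil? && expiry > @clock.call
  end

  # Non-blocking: returns false immediately instead of waiting.
  def acquire_lock(key, timeout = 60 * 60)
    return false if has_lock?(key)
    @expires_at[key] = @clock.call + timeout
    true
  end

  def release_lock(key)
    @expires_at.delete(key)
  end
end

now = 0.0
locks = ExpiringLocks.new(-> { now }) # fake clock for the example
first  = locks.acquire_lock("example:result")
second = locks.acquire_lock("example:result") # still held
now += 3601                                   # an hour and a second later
third  = locks.acquire_lock("example:result") # old lock has expired
p [first, second, third] # [true, false, true]
```

The expiry keeps a crashed holder from blocking a result key forever. The trade-off is that a job running longer than the timeout can lose its lock.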
data/lib/mapredus/keys.rb CHANGED
@@ -36,13 +36,6 @@ module MapRedus
36
36
  #
37
37
  redis_key :temp, "mapredus:process:PID:temp_reduce_key:HASHED_KEY:UNIQUE_REDUCE_HOSTNAME:UNIQUE_REDUCE_PROCESS_ID"
38
38
 
39
- # If we want to hold on to our final data we have a key to put that data in
40
- # In normal map reduce we would just be outputting files
41
- #
42
- redis_key :result, "mapredus:process:PID:result"
43
- redis_key :result_cache, "mapredus:result:KEYNAME"
44
-
45
-
46
39
  #### USED WITHIN master.rb ####
47
40
 
48
41
  # Keeps track of the current slaves (by appending "1" to a redis list)
@@ -29,14 +29,4 @@ module MapRedus
29
29
  FileSystem.set(result_key, Helper.encode(o))
30
30
  end
31
31
  end
32
-
33
- class RedisHasher < Outputter
34
- def self.encode(result_key, k, v)
35
- FileSystem.hset(result_key, k, v)
36
- end
37
-
38
- def self.decode(result_key, k)
39
- FileSystem.hget(result_key, k)
40
- end
41
- end
42
32
  end
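In this release `RedisHasher` moves from `outputter.rb` into `default_classes.rb` and gains a `keys` method. The sketch below contrasts it with the default `Outputter`, which stores the whole encoded result as one string, using an in-memory store. `FakeStore`, both module names, and the use of JSON for encoding are assumptions for illustration. The real default goes through `Helper.encode`, whose format is not shown here. The store is passed explicitly instead of using the global `FileSystem`.

```ruby
require "json"

# In-memory stand-in for redis: keys hold strings or field => string hashes.
class FakeStore
  def initialize; @data = {}; end
  def set(key, value); @data[key] = value.to_s; end
  def get(key); @data[key]; end
  def hset(key, field, value); (@data[key] ||= {})[field.to_s] = value.to_s; end
  def hget(key, field); (@data[key] || {})[field.to_s]; end
  def hkeys(key); (@data[key] || {}).keys; end
end

# Whole-result strategy: encode once, store as a single string value.
module StringOutputter
  def self.encode(store, result_key, result)
    store.set(result_key, JSON.generate(result))
  end

  def self.decode(store, result_key)
    JSON.parse(store.get(result_key))
  end
end

# Per-key strategy, shaped like RedisHasher: one hash field per reduced key.
module HashOutputter
  def self.encode(store, result_key, k, v); store.hset(result_key, k, v); end
  def self.decode(store, result_key, k); store.hget(result_key, k); end
  def self.keys(store, result_key); store.hkeys(result_key); end
end

store = FakeStore.new
StringOutputter.encode(store, "res:a", { "the" => 3 })
HashOutputter.encode(store, "res:b", "the", 3)

decoded = StringOutputter.decode(store, "res:a")
p decoded["the"]                             # 3
p HashOutputter.decode(store, "res:b", "the") # "3"
```

The per-key layout lets a caller read a single count without decoding the whole result. This is why the test suite can call `decode(result_key, key)` one key at a time. The values come back as strings, hence the `.to_i` in the specs.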
@@ -6,16 +6,19 @@ module MapRedus
6
6
  # the value of the redis object is a json object which contains:
7
7
  #
8
8
  # {
9
+ # inputter : inputstreamclass,
9
10
  # mapper : mapclass,
10
11
  # reducer : reduceclass,
11
12
  # finalizer : finalizerclass,
13
+ # outputter : outputterclass,
12
14
  # partitioner : <not supported>,
13
15
  # combiner : <not supported>,
14
16
  # ordered : true_or_false ## ensures ordering keys from the map output --> [ order, key, value ],
15
17
  # synchronous : true_or_false ## runs the process synchronously or not (generally used for testing)
16
18
  # result_timeout : lenght of time a result is saved ## 3600 * 24
17
- # keyname : the location to the save the result of the process (cache location)
19
+ # key_args : arguments to be added to the key location of the result save (cache location)
18
20
  # state : the current state of the process (shouldn't be set by the process and starts off as nil)
21
+ # type : the original process class ( currently this is needed so we can have namespaces for the result_cache keys )
19
22
  # }
20
23
  #
21
24
  # The user has the ability in subclassing this class to create extra features if needed
@@ -24,7 +27,7 @@ module MapRedus
24
27
  # Public: Keep track of information that may show up as the redis json value
25
28
  # This is so we know exactly what might show up in the json hash
26
29
  READERS = [:pid]
27
- ATTRS = [:inputter, :mapper, :reducer, :finalizer, :outputter, :ordered, :synchronous, :result_timeout, :keyname, :state]
30
+ ATTRS = [:inputter, :mapper, :reducer, :finalizer, :outputter, :ordered, :synchronous, :result_timeout, :key_args, :state, :type]
28
31
  READERS.each { |r| attr_reader r }
29
32
  ATTRS.each { |a| attr_accessor a }
30
33
 
@@ -42,10 +45,11 @@ module MapRedus
42
45
  @ordered = json_helper(json_info, :ordered)
43
46
  @synchronous = json_helper(json_info, :synchronous)
44
47
  @result_timeout = json_helper(json_info, :result_timeout) || DEFAULT_TIME
45
- @keyname = json_helper(json_info, :keyname)
48
+ @key_args = json_helper(json_info, :key_args) || []
46
49
  @state = json_helper(json_info, :state) || NOT_STARTED
47
50
  @outputter = json_helper(json_info, :outputter)
48
51
  @outputter = @outputter ? Helper.class_get(@outputter) : MapRedus::Outputter
52
+ @type = Helper.class_get(json_helper(json_info, :type) || Process)
49
53
  end
50
54
 
51
55
  def json_helper(json_info, key)
@@ -174,7 +178,7 @@ module MapRedus
174
178
  #
175
179
  # Examples
176
180
  # emit_intermediate(key, value)
177
- # # =>
181
+ # # => if an ordering is required
178
182
  # emit_intermediate(rank, key, value)
179
183
  #
180
184
  # Returns the true on success.
@@ -197,6 +201,16 @@ module MapRedus
197
201
  true
198
202
  end
199
203
 
204
+ # The emission associated with a reduce. Currently all reduced
205
+ # values are pushed onto a redis list. It may be the case that we
206
+ # want to directly use a different redis type given the kind of
207
+ # reduce we are doing. Often a reduce only returns one value, so
208
+ # instead of a rpush, we should do a set.
209
+ #
210
+ # Examples
211
+ # emit(key, reduced_value)
212
+ #
213
+ # Returns "OK" on success.
200
214
  def emit(key, reduce_val)
201
215
  hashed_key = Helper.hash(key)
202
216
  FileSystem.rpush( ProcessInfo.reduce(@pid, hashed_key), reduce_val )
@@ -207,32 +221,6 @@ module MapRedus
207
221
  FileSystem.get( ProcessInfo.hash_to_key(@pid, hashed_key) ) == key.to_s )
208
222
  end
209
223
 
210
- # Saves the result to the specified keyname, using the specified outputter
211
- #
212
- # Example
213
- # (mapreduce:process:result:KEYNAME)
214
- # OR
215
- # process:pid:result
216
- #
217
- # The client must ensure the the result will not be affected when to_s is applied
218
- # since redis stores all values as strings
219
- #
220
- # Returns true on success.
221
- def save_result(result)
222
- res = @outputter.encode(result)
223
- FileSystem.save(ProcessInfo.result(@pid), res)
224
- FileSystem.save(ProcessInfo.result_cache(@keyname), res, @result_timeout) if @keyname
225
- true
226
- end
227
-
228
- def get_saved_result
229
- @outputter.decode(Process.get_saved_result(@keyname))
230
- end
231
-
232
- def delete_saved_result
233
- Process.delete_saved_result(@keyname)
234
- end
235
-
236
224
  # Keys that the map operation produced
237
225
  #
238
226
  # Examples
@@ -248,11 +236,6 @@ module MapRedus
248
236
  end
249
237
  end
250
238
 
251
- def num_values(key)
252
- hashed_key = Helper.hash(key)
253
- FileSystem.llen( ProcessInfo.map(@pid, hashed_key) )
254
- end
255
-
256
239
  # values that the map operation produced, for a key
257
240
  #
258
241
  # Examples
@@ -265,6 +248,10 @@ module MapRedus
265
248
  FileSystem.lrange( ProcessInfo.map(@pid, hashed_key), 0, -1 )
266
249
  end
267
250
 
251
+ def num_values(key)
252
+ hashed_key = Helper.hash(key)
253
+ FileSystem.llen( ProcessInfo.map(@pid, hashed_key) )
254
+ end
268
255
 
269
256
  # values that the reduce operation produced, for a key
270
257
  #
@@ -278,37 +265,104 @@ module MapRedus
278
265
  FileSystem.lrange( ProcessInfo.reduce(@pid, hashed_key), 0, -1 )
279
266
  end
280
267
 
281
- # Map and Reduce are strings naming the Mapper and Reducer
282
- # classes we want to run our map reduce with.
283
- #
284
- # For instance
285
- # Mapper = "Mapper"
286
- # Reducer = "Reducer"
287
- #
288
- # Default finalizer
289
- # "MapRedus::Finalizer"
268
+ def result_key(*args)
269
+ Helper.class_get(@type).result_key(*[@key_args, args].flatten)
270
+ end
271
+
272
+ def self.result_key(*args)
273
+ ProcessInfo.send( "#{self.to_s.gsub(/\W/,"_")}_result_cache", *args )
274
+ end
275
+
276
+ def self.set_result_key(key_struct)
277
+ MapRedus.redefine_redis_key( "#{self.to_s.gsub(/\W/,"_")}_result_cache", key_struct )
278
+ end
279
+
280
+ # Create sets up a process to be run with the given specification.
281
+ # It saves the information in the FileSystem and returns an
282
+ # instance of the process that run should be called on when
283
+ # running is desired.
290
284
  #
291
- # Returns the new process id.
292
- def self.create( *args )
285
+ # Example
286
+ # process = MapRedus::Process.create
287
+ # process.run
288
+ #
289
+ # Returns an instance of the process
290
+ def self.create
293
291
  new_pid = get_available_pid
294
-
295
- spec = specification(*args)
296
- return nil unless spec
297
-
298
- Process.new(new_pid, spec).save
292
+ specification = ATTRS.inject({}) do |ret, attr|
293
+ ret[attr] = send(attr)
294
+ ret
295
+ end
296
+ specification[:type] = self
297
+ self.new(new_pid, specification).save
299
298
  end
300
-
301
- def self.specification(*args)
302
- raise ProcessSpecificationError
299
+
300
+ # This defines the attributes to be associated with a MapRedus process
301
+ # This will allow us to subclass a Process, creating a new specification
302
+ # by specifying what say the inputter should equal
303
+ #
304
+ # Example
305
+ # class AnswerDistribution < MapRedus::Process
306
+ # inputter = JudgmentStream
307
+ # mapper = ResponseFrequencyMap
308
+ # reducer = Adder
309
+ # finalizer = AnswerCount
310
+ # outputter = MapRedus::RedisHasher
311
+ # end
312
+ class << self; attr_reader *ATTRS; end
313
+
314
+ # Setter/Getter method definitions to set/get the attribute for
315
+ # the class. In the getter if it is not defined (nil) then return
316
+ # the default attribute defined in MapRedus::Process.
317
+ #
318
+ # Example
319
+ # class AnswerDistribution < MapRedus::Process
320
+ # inputter JudgmentStream
321
+ # mapper ResponseFrequency
322
+ # end
323
+ # AnswerDistribution.reducer.should == Adder
324
+ ATTRS.each do |attr|
325
+ (class << self; self; end).send(:define_method, attr) do |*one_arg|
326
+ attribute = "@#{attr}"
327
+ case one_arg.size
328
+ when 0
329
+ instance_variable_get(attribute) || MapRedus::Process.instance_variable_get(attribute)
330
+ when 1
331
+ instance_variable_set(attribute, one_arg.first)
332
+ else
333
+ raise ArgumentError.new("wrong number of arguments (#{one_arg.size}) when zero or one arguments were expected")
334
+ end
335
+ end
303
336
  end
304
337
 
338
+ # Default attributes for the process class. All other attributes
339
+ # are nil by default.
340
+ inputter WordStream
341
+ mapper WordCounter
342
+ reducer Adder
343
+ finalizer ToRedisHash
344
+ outputter RedisHasher
345
+ type Process
346
+
347
+ # This function returns all the redis keys produced associated
348
+ # with a process's process id.
349
+ #
350
+ # Example
351
+ # Process.info(17)
352
+ #
353
+ # Returns an array of keys associated with the process id.
305
354
  def self.info(pid)
306
355
  FileSystem.keys(ProcessInfo.pid(pid) + "*")
307
356
  end
308
357
 
358
+ # Returns an instance of the process class given the process id.
359
+ # If no such process id exists returns nil.
360
+ #
361
+ # Example
362
+ # process = Process.open(17)
309
363
  def self.open(pid)
310
364
  spec = Helper.decode( FileSystem.get(ProcessInfo.pid(pid)) )
311
- spec && Process.new( pid, spec )
365
+ spec && self.new( pid, spec )
312
366
  end
313
367
 
314
368
  # Find out what map reduce processes are out there
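Processes are now configured with class-level macros (`inputter WordStream`, `mapper WordCounter`, ...). Their getters fall back to the defaults set on `MapRedus::Process`, and `create` builds its specification from those values plus `type`. The sketch below reproduces the pattern with a stripped-down, hypothetical `BaseProcess` that uses symbols instead of real classes.

```ruby
# Hypothetical reduction of the class-macro pattern in MapRedus::Process.
class BaseProcess
  ATTRS = [:inputter, :mapper, :reducer, :ordered]

  ATTRS.each do |attr|
    # One class method per attribute: with one argument it sets the value on
    # the receiving class; with none it reads it, falling back to BaseProcess.
    define_singleton_method(attr) do |*one_arg|
      ivar = "@#{attr}"
      case one_arg.size
      when 0
        value = instance_variable_get(ivar)
        value.nil? ? BaseProcess.instance_variable_get(ivar) : value
      when 1
        instance_variable_set(ivar, one_arg.first)
      else
        raise ArgumentError, "wrong number of arguments (#{one_arg.size} for 0..1)"
      end
    end
  end

  # Collect the effective attributes, as Process.create does before saving.
  def self.specification
    ATTRS.each_with_object({}) { |attr, spec| spec[attr] = send(attr) }.merge(type: self)
  end

  # Defaults, declared with the macros themselves.
  inputter :word_stream
  mapper :word_counter
  reducer :adder
end

class CharCount < BaseProcess
  inputter :char_stream
  mapper :char_counter
end

p CharCount.inputter # :char_stream
p CharCount.reducer  # :adder (falls back to BaseProcess)
```

A design note: the diff's getter uses `value || default`, so an explicit `ordered false` on a subclass reads back as the base value (`nil` there). The sketch checks `nil?` instead so that `false` survives. In both versions, the fallback goes straight to the base class, so a subclass of a subclass does not inherit its parent's settings.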
@@ -331,12 +385,13 @@ module MapRedus
331
385
  FileSystem.incrby(ProcessInfo.processes_count, 1 + rand(20))
332
386
  end
333
387
 
334
- # Given a result keyname, delete the result
388
+ # Given a arguments for a result key, delete the result from the
389
+ # filesystem.
335
390
  #
336
391
  # Examples
337
392
  # Process.delete_saved_result(key)
338
- def self.delete_saved_result(keyname)
339
- FileSystem.del( ProcessInfo.result_cache(keyname) )
393
+ def self.delete_saved_result(*key_args)
394
+ FileSystem.del( result_key(*key_args) )
340
395
  end
341
396
 
342
397
  # Remove redis keys associated with this process if the Master isn't working.
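Result locations are now tied to the process class. `set_result_key` registers a key template under a name derived from the class: non-word characters become `_`, and `_result_cache` is appended. `result_key` then fills that template from `key_args` followed by any extra arguments. The sketch imitates this with a hypothetical `KEY_TEMPLATES` registry. It assumes placeholders are all-caps segments such as `PID` or `DOC_ID`, filled by the arguments in order, as the templates in `keys.rb` suggest. The real substitution is done by RedisSupport and may differ.

```ruby
# Hypothetical registry standing in for RedisSupport key definitions.
KEY_TEMPLATES = {}

def redefine_key(name, template)
  KEY_TEMPLATES[name] = template
end

# ASSUMPTION: ALL-CAPS segments are placeholders, filled left to right.
def build_key(name, *args)
  KEY_TEMPLATES.fetch(name).split(":").map do |part|
    next part unless part.match?(/\A[A-Z_]+\z/)
    raise ArgumentError, "missing value for #{part}" if args.empty?
    args.shift.to_s
  end.join(":")
end

# Per-class key name, derived the same way as in Process.result_key.
def result_key_name(klass)
  "#{klass.to_s.gsub(/\W/, "_")}_result_cache"
end

module Reports; class CharCount; end; end
class PlainCount; end

redefine_key(result_key_name(Reports::CharCount), "charcount:DOC_ID:result")
redefine_key(result_key_name(PlainCount), "test:result")

stored_key_args = [42] # plays the role of a process's @key_args
key = build_key(result_key_name(Reports::CharCount), *[stored_key_args, []].flatten)
p result_key_name(Reports::CharCount) # "Reports__CharCount_result_cache"
p key                                 # "charcount:42:result"
```

Because the registered name includes the class, two process classes can each define their own result layout without clashing. This is why `type` is now saved with each process.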
@@ -4,7 +4,6 @@ module MapRedus
4
4
  class DuplicateProcessDefinitionError < MapRedusRunnerError ; end
5
5
 
6
6
  class Runner
7
- attr_reader :process
8
7
  def initialize(class_name)
9
8
  @class = class_name
10
9
  end
@@ -14,7 +13,7 @@ module MapRedus
14
13
  if self.respond_to?(mr_process)
15
14
  self.send(mr_process, *args, &block)
16
15
  else
17
- super(method, *args, &block)
16
+ super(method, *args, &block)
18
17
  end
19
18
  end
20
19
  end
@@ -24,7 +23,7 @@ module MapRedus
24
23
  end
25
24
 
26
25
  module ClassMethods
27
- def mapreduce_process( process_name, mapredus_process_class, result_store, opts = {})
26
+ def mapreduce_process( process_name, mapredus_process_class, result_store )
28
27
  runner_self = Runner
29
28
  class_name = self.to_s.gsub(/\W/,"_")
30
29
 
@@ -34,17 +33,18 @@ module MapRedus
34
33
  raise DuplicateProcessDefintionError
35
34
  end
36
35
 
37
- keyname = "mapredus_key_#{global_process_name}"
38
- RedisSupport.redis_key( keyname, result_store )
36
+ mapredus_process_class.set_result_key( result_store )
39
37
 
40
- runner_self.send( :define_method, global_process_name ) do |data, *var|
41
- @process = mapredus_process_class.create
42
- @process.update(:keyname => RedisSupport::Keys.send( keyname, *var ))
43
- @process.run(data)
38
+ runner_self.send( :define_method, global_process_name ) do |data, key_arguments|
39
+ process = mapredus_process_class.create
40
+ process.update(:key_args => key_arguments)
41
+ process.run(data)
42
+ process
44
43
  end
45
44
 
46
- runner_self.send( :define_method, "#{global_process_name}_result" ) do |*outputter_args|
47
- @process.outputter.decode(@process.keyname, *outputter_args)
45
+ runner_self.send( :define_method, "#{global_process_name}_result" ) do |key_arguments, *outputter_args|
46
+ key = mapredus_process_class.result_key( *key_arguments )
47
+ mapredus_process_class.outputter.decode( key, *outputter_args)
48
48
  end
49
49
  end
50
50
  end
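With this change the `*_result` reader no longer reads `@process.keyname`, which held whatever the last run on that runner had set. Instead it takes the key arguments explicitly and asks the process class for the key. A self-contained sketch of that shape follows. `FakeCharCount`, its key layout and its storage are hypothetical stand-ins for a MapRedus process and its outputter.

```ruby
# Hypothetical process stand-in: runs synchronously, stores per-key results.
class FakeCharCount
  RESULTS = {} # stands in for the outputter's storage

  def self.result_key(*key_args)
    "charcount:#{key_args.join(':')}:result" # assumed key layout
  end

  def self.run(text, key_args)
    counts = {}
    text.each_char { |c| counts[c] = counts.fetch(c, 0) + 1 }
    RESULTS[result_key(*key_args)] = counts
  end

  def self.decode(key, field)
    value = RESULTS.fetch(key, {})[field]
    value && value.to_s # redis-style string, or nil when absent
  end
end

class Runner
  # Defines `name` (run) and `name_result` (read) on Runner instances.
  def self.mapreduce_process(name, process_class)
    define_method(name) do |data, key_arguments|
      process_class.run(data, key_arguments)
    end

    # The reader takes the key arguments itself, so it does not depend on
    # which run happened last on this runner.
    define_method("#{name}_result") do |key_arguments, *outputter_args|
      key = process_class.result_key(*key_arguments)
      process_class.decode(key, *outputter_args)
    end
  end
end

Runner.mapreduce_process(:char_count, FakeCharCount)
runner = Runner.new
runner.char_count("hello", [1])
runner.char_count("abc", [2])
p runner.char_count_result([1], "l") # "2"
p runner.char_count_result([2], "l") # nil
```

Under the old design, the second run would have replaced the runner's `@process`, so reading the first document's result through the same runner returned the wrong data. The updated spec passes `[@doc.id]` for this reason.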
@@ -1,49 +1,3 @@
1
- class GetCharCount < MapRedus::Process
2
- EXPECTED_ANSWER = {"k"=>2, "v"=>1, " "=>54, ","=>3, "w"=>7, "a"=>17, "l"=>12, "b"=>2, "m"=>4, "c"=>3, "."=>2, "y"=>3, "n"=>18, "D"=>1, "d"=>15, "o"=>13, "p"=>14, "e"=>34, "f"=>6, "r"=>13, "g"=>6, "S"=>1, "s"=>12, "h"=>19, "H"=>1, "t"=>20, "i"=>16, "u"=>5, "j"=>1}
3
- def self.specification
4
- {
5
- :inputter => CharStream,
6
- :mapper => CharCounter,
7
- :reducer => Adder,
8
- :finalizer => ToRedisHash,
9
- :outputter => MapRedus::RedisHasher,
10
- :ordered => false
11
- }
12
- end
13
- end
14
-
15
- class GetWordCount < MapRedus::Process
16
- TEST = "He pointed his finger in friendly jest and went over to the parapet laughing to himself. Stephen Dedalus stepped up, followed him wearily halfway and sat down on the edge of the gunrest, watching him still as he propped his mirror on the parapet, dipped the brush in the bowl and lathered cheeks and neck."
17
- EXPECTED_ANSWER = {"gunrest"=>1, "over"=>1, "still"=>1, "of"=>1, "him"=>2, "and"=>4, "bowl"=>1, "himself"=>1, "went"=>1, "friendly"=>1, "finger"=>1, "propped"=>1, "cheeks"=>1, "dipped"=>1, "down"=>1, "wearily"=>1, "up"=>1, "stepped"=>1, "dedalus"=>1, "to"=>2, "in"=>2, "sat"=>1, "the"=>6, "pointed"=>1, "as"=>1, "followed"=>1, "stephen"=>1, "laughing"=>1, "his"=>2, "he"=>2, "brush"=>1, "jest"=>1, "neck"=>1, "mirror"=>1, "edge"=>1, "on"=>2, "parapet"=>2, "lathered"=>1, "watching"=>1, "halfway"=>1}
18
- def self.specification
19
- {
20
- :inputter => WordStream,
21
- :mapper => WordCounter,
22
- :reducer => Adder,
23
- :finalizer => ToRedisHash,
24
- :outputter => MapRedus::RedisHasher,
25
- :ordered => false,
26
- :keyname => "test:result"
27
- }
28
- end
29
- end
30
-
31
- class WordStream < MapRedus::InputStream
32
- def self.scan(data_object)
33
- #
34
- # The data_object should be a reference to an object that is
35
- # stored on your system. The scanner is used to break up what you
36
- # need from the object into manageable pieces for the mapper. In
37
- # this example, the data object is a reference to a redis string.
38
- #
39
- test_string = MapRedus::FileSystem.get(data_object)
40
-
41
- test_string.split.each_slice(10).each_with_index do |word_set, i|
42
- yield(i, word_set.join(" "))
43
- end
44
- end
45
- end
46
-
47
1
  class CharStream < MapRedus::InputStream
48
2
  def self.scan(data_object)
49
3
  test_string = MapRedus::FileSystem.get(data_object)
@@ -56,15 +10,6 @@ class CharStream < MapRedus::InputStream
56
10
  end
57
11
  end
58
12
 
59
- class WordCounter < MapRedus::Mapper
60
- def self.map(map_data)
61
- map_data.split(/\W/).each do |word|
62
- next if word.empty?
63
- yield(word.downcase, 1)
64
- end
65
- end
66
- end
67
-
68
13
  class CharCounter < MapRedus::Mapper
69
14
  def self.map(map_data)
70
15
  map_data.each_char do |char|
@@ -73,18 +18,16 @@ class CharCounter < MapRedus::Mapper
73
18
  end
74
19
  end
75
20
 
76
- class Adder < MapRedus::Reducer
77
- def self.reduce(value_list)
78
- yield( value_list.reduce(0) { |r, v| r += v.to_i } )
79
- end
21
+ class GetCharCount < MapRedus::Process
22
+ EXPECTED_ANSWER = {"k"=>2, "v"=>1, " "=>54, ","=>3, "w"=>7, "a"=>17, "l"=>12, "b"=>2, "m"=>4, "c"=>3, "."=>2, "y"=>3, "n"=>18, "D"=>1, "d"=>15, "o"=>13, "p"=>14, "e"=>34, "f"=>6, "r"=>13, "g"=>6, "S"=>1, "s"=>12, "h"=>19, "H"=>1, "t"=>20, "i"=>16, "u"=>5, "j"=>1}
23
+ inputter CharStream
24
+ mapper CharCounter
80
25
  end
81
26
 
82
- class ToRedisHash < MapRedus::Finalizer
83
- def self.finalize(process)
84
- process.each_key_reduced_value do |key, value|
85
- process.outputter.encode(process.keyname, key, value)
86
- end
87
- end
27
+ class GetWordCount < MapRedus::Process
28
+ TEST = "He pointed his finger in friendly jest and went over to the parapet laughing to himself. Stephen Dedalus stepped up, followed him wearily halfway and sat down on the edge of the gunrest, watching him still as he propped his mirror on the parapet, dipped the brush in the bowl and lathered cheeks and neck."
29
+ EXPECTED_ANSWER = {"gunrest"=>1, "over"=>1, "still"=>1, "of"=>1, "him"=>2, "and"=>4, "bowl"=>1, "himself"=>1, "went"=>1, "friendly"=>1, "finger"=>1, "propped"=>1, "cheeks"=>1, "dipped"=>1, "down"=>1, "wearily"=>1, "up"=>1, "stepped"=>1, "dedalus"=>1, "to"=>2, "in"=>2, "sat"=>1, "the"=>6, "pointed"=>1, "as"=>1, "followed"=>1, "stephen"=>1, "laughing"=>1, "his"=>2, "he"=>2, "brush"=>1, "jest"=>1, "neck"=>1, "mirror"=>1, "edge"=>1, "on"=>2, "parapet"=>2, "lathered"=>1, "watching"=>1, "halfway"=>1}
30
+ set_result_key "test:result"
88
31
  end
89
32
 
90
33
  class Document
@@ -8,13 +8,41 @@ describe "MapRedus" do
8
8
  MapRedus::FileSystem.setnx("wordstream:test", GetWordCount::TEST)
9
9
  end
10
10
 
11
+ it "has sets up the correct default classes" do
12
+ MapRedus::Process.inputter.should == MapRedus::WordStream
13
+ MapRedus::Process.mapper.should == MapRedus::WordCounter
14
+ MapRedus::Process.reducer.should == MapRedus::Adder
15
+ MapRedus::Process.finalizer.should == MapRedus::ToRedisHash
16
+ MapRedus::Process.outputter.should == MapRedus::RedisHasher
17
+
18
+ GetWordCount.result_key.should == "test:result"
19
+ GetWordCount.inputter.should == MapRedus::WordStream
20
+ GetWordCount.mapper.should == MapRedus::WordCounter
21
+ GetWordCount.reducer.should == MapRedus::Adder
22
+ GetWordCount.finalizer.should == MapRedus::ToRedisHash
23
+ GetWordCount.outputter.should == MapRedus::RedisHasher
24
+
25
+ GetCharCount.inputter.should == CharStream
26
+ GetCharCount.mapper.should == CharCounter
27
+ GetCharCount.reducer.should == MapRedus::Adder
28
+ GetCharCount.finalizer.should == MapRedus::ToRedisHash
29
+ GetCharCount.outputter.should == MapRedus::RedisHasher
30
+ end
31
+
11
32
  it "creates a process successfully" do
12
33
  process = GetWordCount.open(@process.pid)
13
34
 
14
- process.inputter.should == WordStream
15
- process.mapper.should == WordCounter
16
- process.reducer.should == Adder
17
- process.finalizer.should == ToRedisHash
35
+ process.inputter.should == MapRedus::WordStream
36
+ process.mapper.should == MapRedus::WordCounter
37
+ process.reducer.should == MapRedus::Adder
38
+ process.finalizer.should == MapRedus::ToRedisHash
39
+ process.outputter.should == MapRedus::RedisHasher
40
+
41
+ process = GetCharCount.create
42
+ process.inputter.should == CharStream
43
+ process.mapper.should == CharCounter
44
+ process.reducer.should == MapRedus::Adder
45
+ process.finalizer.should == MapRedus::ToRedisHash
18
46
  process.outputter.should == MapRedus::RedisHasher
19
47
  end
20
48
 
@@ -22,6 +50,7 @@ describe "MapRedus" do
22
50
  ##
23
51
  ## In general map reduce shouldn't be running operations synchronously
24
52
  ##
53
+ @process.class.should == GetWordCount
25
54
  @process.run("wordstream:test", synchronously = true)
26
55
  @process.map_keys.size.should == GetWordCount::EXPECTED_ANSWER.size
27
56
 
@@ -31,7 +60,7 @@ describe "MapRedus" do
31
60
  end
32
61
 
33
62
  @process.each_key_reduced_value do |key, value|
34
- @process.outputter.decode(@process.keyname, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
63
+ @process.outputter.decode(@process.result_key, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
35
64
  end
36
65
  end
37
66
 
@@ -46,7 +75,7 @@ describe "MapRedus" do
46
75
  end
47
76
 
48
77
  @process.each_key_reduced_value do |key, value|
49
- @process.outputter.decode(@process.keyname, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
78
+ @process.outputter.decode(@process.result_key, key).to_i.should == GetWordCount::EXPECTED_ANSWER[key]
50
79
  end
51
80
  end
52
81
  end
@@ -158,8 +187,8 @@ describe "MapRedus Process" do
158
187
 
159
188
  it "emit_intermediate on an ordered process" do
160
189
  @process.update(:ordered => true)
161
- @process.emit_intermediate(1, "number", "one")
162
190
  @process.emit_intermediate(2, "place", "two")
191
+ @process.emit_intermediate(1, "number", "one")
163
192
  res = []
164
193
  @process.each_key_nonreduced_value do |key, value|
165
194
  res << [key, value]
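The ordered-process spec now emits rank 2 before rank 1, so it only passes if iteration follows the rank rather than the emission order. Below is a hypothetical in-memory model of that contract. How MapRedus actually stores ranked entries in redis is not shown in this diff.

```ruby
# Hypothetical model of an ordered process's intermediate store.
class OrderedIntermediate
  def initialize
    @entries = [] # [rank, sequence, key, value]
  end

  def emit_intermediate(rank, key, value)
    # The sequence number keeps ties in emission order (sort_by is not stable).
    @entries << [rank, @entries.size, key, value]
    true
  end

  def each_key_nonreduced_value
    @entries.sort_by { |rank, seq, _key, _value| [rank, seq] }
            .each { |_rank, _seq, key, value| yield(key, value) }
  end
end

store = OrderedIntermediate.new
store.emit_intermediate(2, "place", "two")
store.emit_intermediate(1, "number", "one")

res = []
store.each_key_nonreduced_value { |key, value| res << [key, value] }
p res # [["number", "one"], ["place", "two"]]
```

Emitting in ascending order, as the old spec did, would pass even if ranks were ignored. The swapped order is what makes the test meaningful.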
@@ -207,14 +236,14 @@ describe "MapRedus Master" do
207
236
  end
208
237
 
209
238
  it "handles slaves (enslaving) correctly" do
210
- MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
211
- Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "test"], "class"=>"WordCounter"}]
239
+ MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
240
+ Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "test"], "class"=>"MapRedus::WordCounter"}]
212
241
  MapRedus::Master.slaves(@process.pid).should == ["1"]
213
242
  end
214
243
 
215
244
  it "handles slaves (freeing) correctly" do
216
- MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
217
- MapRedus::Master.enslave(@process, WordCounter, @process.pid, "test")
245
+ MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
246
+ MapRedus::Master.enslave(@process, MapRedus::WordCounter, @process.pid, "test")
218
247
 
219
248
  MapRedus::Master.slaves(@process.pid).should == ["1", "1"]
220
249
 
@@ -237,12 +266,12 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
237
266
  @process.update(:state => MapRedus::INPUT_MAP_IN_PROGRESS)
238
267
  @process.state.should == MapRedus::INPUT_MAP_IN_PROGRESS
239
268
  @process.inputter.perform(@process.pid, "wordstream:test")
240
- Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, 0], "class"=>"WordCounter"}]
269
+ Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, 0], "class"=>"MapRedus::WordCounter"}]
241
270
  Resque.pop(:mapredus)
242
271
  @process.mapper.perform(@process.pid, 0)
243
272
  @process.reload
244
273
  @process.state.should == MapRedus::REDUCE_IN_PROGRESS
245
- Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "data"], "class"=>"Adder"}]
274
+ Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid, "data"], "class"=>"MapRedus::Adder"}]
246
275
  end
247
276
 
248
277
  it "runs a reduce correctly proceeding to the correct next state" do
@@ -252,7 +281,7 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
252
281
  @process.reducer.perform(@process.pid, "data")
253
282
  @process.reload
254
283
  @process.state.should == MapRedus::FINALIZER_IN_PROGRESS
255
- Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid], "class"=>"ToRedisHash"}]
284
+ Resque.peek(:mapredus, 0, -1).should == [{"args"=>[@process.pid], "class"=>"MapRedus::ToRedisHash"}]
256
285
  end
257
286
 
258
287
  it "should test that the finalizer correctly saves" do
@@ -268,7 +297,7 @@ describe "MapRedus Mapper/Reducer/Finalizer" do
268
297
  end
269
298
  end
270
299
 
271
- describe "MapReduce Support" do
300
+ describe "MapRedus Support" do
272
301
  before(:each) do
273
302
  MapRedus::FileSystem.flushall
274
303
  @doc = Document.new(10)
@@ -285,11 +314,11 @@ describe "MapReduce Support" do
285
314
  work_off
286
315
 
287
316
  GetCharCount::EXPECTED_ANSWER.keys.each do |char|
288
- @doc.mapreduce.char_count_result(char).should == GetCharCount::EXPECTED_ANSWER[char].to_s
317
+ @doc.mapreduce.char_count_result([@doc.id], char).should == GetCharCount::EXPECTED_ANSWER[char].to_s
289
318
  end
290
319
 
291
320
  other_answer.keys.each do |char|
292
- @other_doc.mapreduce.char_count_result(char).should == other_answer[char].to_s
321
+ @other_doc.mapreduce.char_count_result([@other_doc.id], char).should == other_answer[char].to_s
293
322
  end
294
323
  end
295
324
  end
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: mapredus
3
3
  version: !ruby/object:Gem::Version
4
- hash: 29
4
+ hash: 27
5
5
  prerelease: false
6
6
  segments:
7
7
  - 0
8
8
  - 0
9
- - 1
10
- version: 0.0.1
9
+ - 2
10
+ version: 0.0.2
11
11
  platform: ruby
12
12
  authors:
13
13
  - John Le
@@ -16,7 +16,7 @@ autorequire:
16
16
  bindir: bin
17
17
  cert_chain: []
18
18
 
19
- date: 2010-07-06 00:00:00 -07:00
19
+ date: 2010-07-09 00:00:00 -07:00
20
20
  default_executable:
21
21
  dependencies:
22
22
  - !ruby/object:Gem::Dependency
@@ -89,6 +89,7 @@ extra_rdoc_files:
89
89
  - README.md
90
90
  files:
91
91
  - lib/mapredus.rb
92
+ - lib/mapredus/default_classes.rb
92
93
  - lib/mapredus/filesystem.rb
93
94
  - lib/mapredus/finalizer.rb
94
95
  - lib/mapredus/inputter.rb