map-reduce-ruby 2.0.0 → 2.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 68967e02da6776738d27e48228374396dc613f4ee4a48f08268b71c92764eaa0
- data.tar.gz: 0f0633adb6f2c51617ea3234284b30433da4a9ceb25d167ce8eaef2527dc09db
+ metadata.gz: 4f3eeb7739b733f1abdf325ddfcf5a4c42fb257edd837952781246fcda9f48fb
+ data.tar.gz: f40eb08a341fc522c8f7043b74ac1c47fb00af491a107f51a6a76ca67f389629
  SHA512:
- metadata.gz: d27e08793121dd81d4f8cc887f350d30ec9ea4039dad3490a4c7969164413cf575bdba0aa4c767097fc4af124aa3bb17b55f6781ff03919ef7d36164b3b14fa6
- data.tar.gz: 71165ebbab647370de51c709b34f8e5cc8348300a41809354472ac125ef542255ec767a4613046cb964443b698fa7ec5dc8334d40df2a64728822aab0b170b43
+ metadata.gz: 6e91d7a1f55f0b89d333b317ecedc7978ca29a709dfad403906f6f2835b121a0a49afb29d03f55d8d7c8aa5c1b70628400404b7bf4d590046ae2e66ed48d7abc
+ data.tar.gz: cc4524aec895d935b70548163e5818f8725c5a05985e55f9b8d9330c34491968bb3ec54f02466ac0748eeb130f47a40d07baefa2627b04ed10dbac1c9638b1af
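The `checksums.yaml` entries above are the SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. A `.gem` file is a plain tar archive, so both can be extracted with any tar tool. Below is a minimal sketch of checking an extracted archive against one of these digests with Ruby's standard `digest` library. The helper name `checksum_matches?` and any file paths you pass to it are hypothetical:

```ruby
require "digest"

# Hypothetical helper: compares a file's digest with a hex string copied
# from a checksums.yaml entry (SHA256 or SHA512).
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end

  # Digest::Class.file reads the file incrementally rather than all at once.
  digest_class.file(path).hexdigest == expected_hex.downcase
end
```

For example, calling `checksum_matches?("data.tar.gz", expected)` with `expected` set to the `+ data.tar.gz` SHA256 value above should return `true` for an unmodified 2.1.0 archive.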
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
  # CHANGELOG
 
+ ## v2.1.0
+
+ * Do not reduce in `MapReduce::Mapper` when no `reduce` implementation is given
+
  ## v2.0.0
 
  * [BREAKING] Keys are no longer automatically converted to json before using
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- map-reduce-ruby (2.0.0)
+ map-reduce-ruby (2.1.0)
  json
  lazy_priority_queue
 
data/README.md CHANGED
@@ -7,8 +7,7 @@ than memory map-reduce jobs by using your local disk and some arbitrary storage
  layer like s3. You can specify how much memory you are willing to offer and
  MapReduce will use its buffers accordingly. Finally, you can use your already
  existing background job system like `sidekiq` or one of its various
- alternatives. Finally, your keys and values can be everything that can be
- serialized as json.
+ alternatives.
 
  ## Installation
 
@@ -30,9 +29,7 @@ Or install it yourself as:
 
  Any map-reduce job consists of an implementation of your `map` function, your
  `reduce` function and worker code. So let's start with an implementation for a
- word count map-reduce task which fetches txt documents from the web. Please
- note that your keys and values can be everything that can be serialized as
- json, but nothing else.
+ word count map-reduce task which fetches txt documents from the web.
 
  ```ruby
  class WordCounter
@@ -68,8 +65,8 @@ class WordCountMapper
  end
  ```
 
- Please note that `MapReduce::HashPartitioner.new(16)` states that we want split
- the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
+ Please note that `MapReduce::HashPartitioner.new(16)` states that we want to
+ split the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
  worker code to run the reduce part:
 
  ```ruby
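The README says `MapReduce::HashPartitioner.new(16)` assigns every key to one of 16 partitions, and the mapper hunk below calls it as `@partitioner.call(pair[0])`. The following is a hypothetical stand-in with the same `call(key)` interface, assuming a CRC32-of-JSON hash; the gem's actual hash function may differ:

```ruby
require "json"
require "zlib"

# Hypothetical stand-in for MapReduce::HashPartitioner; the gem's own hash
# function may differ. It only mirrors the call(key) -> partition interface.
class StableHashPartitioner
  def initialize(num_partitions)
    raise ArgumentError, "num_partitions must be positive" unless num_partitions.positive?

    @num_partitions = num_partitions
  end

  # Serialize the key to JSON so strings, numbers and arrays hash the same way
  # everywhere, then use CRC32, which (unlike Object#hash) is stable across processes.
  def call(key)
    Zlib.crc32(JSON.generate(key)) % @num_partitions
  end
end
```

A deterministic checksum matters here because Ruby seeds `String#hash` randomly per process, so two workers using `key.hash % 16` could send the same key to different partitions.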
@@ -71,7 +71,10 @@ module MapReduce
 
  partitions = {}
 
- reduce_chunk(k_way_merge(@chunks), @implementation).each do |pair|
+ chunk = k_way_merge(@chunks)
+ chunk = reduce_chunk(chunk, @implementation) if @implementation.respond_to?(:reduce)
+
+ chunk.each do |pair|
  partition = @partitioner.call(pair[0])
 
  (partitions[partition] ||= Tempfile.new).puts(JSON.generate(pair))
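The hunk above makes the reduce step optional: `reduce_chunk` only runs when `@implementation.respond_to?(:reduce)`. The in-memory sketch below illustrates that duck-typed check. `TinyMapper` is hypothetical, and it assumes that `map` yields key/value pairs and that `reduce(key, value1, value2)` combines two values for the same key. The real mapper buffers chunks in tempfiles instead of an array.

```ruby
# Hypothetical, in-memory illustration of the v2.1.0 change; the real
# MapReduce::Mapper buffers to tempfiles and merges chunks from disk.
class TinyMapper
  def initialize(implementation)
    @implementation = implementation
    @pairs = []
  end

  # Collects every [key, value] pair the implementation yields.
  def map(*args)
    @implementation.map(*args) { |key, value| @pairs << [key, value] }
  end

  # Returns pairs sorted by key, reduced only if the implementation
  # responds to #reduce (the same duck-typed check as in the diff).
  def pairs
    sorted = @pairs.sort_by(&:first)
    return sorted unless @implementation.respond_to?(:reduce)

    # Group consecutive pairs with equal keys and fold their values.
    sorted.chunk_while { |a, b| a[0] == b[0] }.map do |group|
      key = group[0][0]
      value = group.map(&:last).reduce { |acc, v| @implementation.reduce(key, acc, v) }
      [key, value]
    end
  end
end
```

With a word-count implementation that defines `reduce`, mapping `"a b a"` yields `[["a", 2], ["b", 1]]`; with one that only defines `map`, all three `[word, 1]` pairs are kept, which is the behaviour the v2.1.0 CHANGELOG entry describes.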
@@ -20,6 +20,16 @@ module MapReduce
  def k_way_merge(files)
  return enum_for(:k_way_merge, files) unless block_given?
 
+ if files.size == 1
+ files.first.each_line do |line|
+ yield(JSON.parse(line))
+ end
+
+ files.each(&:rewind)
+
+ return
+ end
+
  queue = PriorityQueue.new
 
  files.each_with_index do |file, index|
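`k_way_merge` combines several sorted chunk files into one sorted stream, and v2.1.0 adds a shortcut that streams a single file directly because there is nothing to merge. The following is a simplified sketch of the same idea over in-memory arrays. The module name `SortedMerge` is hypothetical, and it swaps the gem's `PriorityQueue` (from `lazy_priority_queue`) for a linear scan of the current heads, which costs O(k) per pair instead of O(log k):

```ruby
# Hypothetical sketch of a k-way merge over already-sorted arrays of
# [key, value] pairs. The gem reads JSON lines from tempfiles and uses a
# priority queue; this version scans the current head of every source.
module SortedMerge
  module_function

  def merge_sorted(sources)
    return enum_for(:merge_sorted, sources) unless block_given?

    # Shortcut mirroring v2.1.0: a single source is already sorted.
    if sources.size == 1
      sources.first.each { |pair| yield(pair) }
      return
    end

    cursors = sources.map(&:each) # one external enumerator per source
    heads = cursors.map { |cursor| next_or_nil(cursor) }

    loop do
      # Pick the source whose current head has the smallest key.
      index = nil
      heads.each_with_index do |head, i|
        next if head.nil?

        index = i if index.nil? || head[0] < heads[index][0]
      end
      break if index.nil? # every source is exhausted

      yield(heads[index])
      heads[index] = next_or_nil(cursors[index])
    end
  end

  # Returns the next element of an external enumerator, or nil at the end.
  def next_or_nil(enum)
    enum.next
  rescue StopIteration
    nil
  end
end
```

Ties go to the earlier source because the scan only switches on a strictly smaller key, so pairs with equal keys keep their source order.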
@@ -1,3 +1,3 @@
  module MapReduce
- VERSION = "2.0.0"
+ VERSION = "2.1.0"
  end
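The version constant moves from 2.0.0 to 2.1.0, a minor release. A quick check with RubyGems' own `Gem::Version` and `Gem::Requirement` classes shows how common pessimistic constraints treat the new release:

```ruby
require "rubygems"

old_version = Gem::Version.new("2.0.0")
new_version = Gem::Version.new("2.1.0")

# Gem::Version compares segment by segment, so 2.1.0 sorts after 2.0.0.
puts new_version > old_version # prints true

# "~> 2.0" allows >= 2.0 and < 3.0; "~> 2.0.0" allows >= 2.0.0 and < 2.1.
puts Gem::Requirement.new("~> 2.0").satisfied_by?(new_version)   # prints true
puts Gem::Requirement.new("~> 2.0.0").satisfied_by?(new_version) # prints false
```

So a `Gemfile` entry such as `gem "map-reduce-ruby", "~> 2.0"` would pick up 2.1.0 on `bundle update`, whereas `"~> 2.0.0"` stays on the 2.0.x series.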
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: map-reduce-ruby
  version: !ruby/object:Gem::Version
- version: 2.0.0
+ version: 2.1.0
  platform: ruby
  authors:
  - Benjamin Vetter
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-09-22 00:00:00.000000000 Z
+ date: 2022-10-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec