map-reduce-ruby 2.0.0 → 2.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 68967e02da6776738d27e48228374396dc613f4ee4a48f08268b71c92764eaa0
-  data.tar.gz: 0f0633adb6f2c51617ea3234284b30433da4a9ceb25d167ce8eaef2527dc09db
+  metadata.gz: 4f3eeb7739b733f1abdf325ddfcf5a4c42fb257edd837952781246fcda9f48fb
+  data.tar.gz: f40eb08a341fc522c8f7043b74ac1c47fb00af491a107f51a6a76ca67f389629
 SHA512:
-  metadata.gz: d27e08793121dd81d4f8cc887f350d30ec9ea4039dad3490a4c7969164413cf575bdba0aa4c767097fc4af124aa3bb17b55f6781ff03919ef7d36164b3b14fa6
-  data.tar.gz: 71165ebbab647370de51c709b34f8e5cc8348300a41809354472ac125ef542255ec767a4613046cb964443b698fa7ec5dc8334d40df2a64728822aab0b170b43
+  metadata.gz: 6e91d7a1f55f0b89d333b317ecedc7978ca29a709dfad403906f6f2835b121a0a49afb29d03f55d8d7c8aa5c1b70628400404b7bf4d590046ae2e66ed48d7abc
+  data.tar.gz: cc4524aec895d935b70548163e5818f8725c5a05985e55f9b8d9330c34491968bb3ec54f02466ac0748eeb130f47a40d07baefa2627b04ed10dbac1c9638b1af
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
 # CHANGELOG
 
+## v2.1.0
+
+* Do not reduce in `MapReduce::Mapper` when no `reduce` implementation is given
+
 ## v2.0.0
 
 * [BREAKING] Keys are no longer automatically converted to json before using
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    map-reduce-ruby (2.0.0)
+    map-reduce-ruby (2.1.0)
       json
       lazy_priority_queue
 
data/README.md CHANGED
@@ -7,8 +7,7 @@ than memory map-reduce jobs by using your local disk and some arbitrary storage
 layer like s3. You can specify how much memory you are willing to offer and
 MapReduce will use its buffers accordingly. Finally, you can use your already
 existing background job system like `sidekiq` or one of its various
-alternatives. Finally, your keys and values can be everything that can be
-serialized as json.
+alternatives.
 
 ## Installation
 
@@ -30,9 +29,7 @@ Or install it yourself as:
 
 Any map-reduce job consists of an implementation of your `map` function, your
 `reduce` function and worker code. So let's start with an implementation for a
-word count map-reduce task which fetches txt documents from the web. Please
-note that your keys and values can be everything that can be serialized as
-json, but nothing else.
+word count map-reduce task which fetches txt documents from the web.
 
 ```ruby
 class WordCounter
@@ -68,8 +65,8 @@ class WordCountMapper
 end
 ```
 
-Please note that `MapReduce::HashPartitioner.new(16)` states that we want split
-the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
+Please note that `MapReduce::HashPartitioner.new(16)` states that we want to
+split the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
 worker code to run the reduce part:
 
 ```ruby
@@ -71,7 +71,10 @@ module MapReduce
 
       partitions = {}
 
-      reduce_chunk(k_way_merge(@chunks), @implementation).each do |pair|
+      chunk = k_way_merge(@chunks)
+      chunk = reduce_chunk(chunk, @implementation) if @implementation.respond_to?(:reduce)
+
+      chunk.each do |pair|
         partition = @partitioner.call(pair[0])
 
         (partitions[partition] ||= Tempfile.new).puts(JSON.generate(pair))
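
The hunk above makes the reduce step optional: the merged chunk is passed through `reduce_chunk` only when the implementation responds to `reduce`. A minimal standalone sketch of that pattern follows. The helper `maybe_reduce` and both example classes are hypothetical names introduced for illustration, and the three-argument `reduce(key, value1, value2)` signature is an assumption rather than something this diff shows.

```ruby
# Hypothetical helper: combines adjacent pairs that share a key, but only
# when the implementation defines `reduce`. Expects pairs sorted by key.
def maybe_reduce(sorted_pairs, implementation)
  # Map-only implementations get their pairs back unchanged.
  return sorted_pairs unless implementation.respond_to?(:reduce)

  sorted_pairs
    .chunk_while { |left, right| left[0] == right[0] } # group equal keys
    .map do |group|
      key = group.first[0]
      values = group.map { |pair| pair[1] }

      # Fold all values for this key with the user-supplied reduce.
      [key, values.reduce { |memo, value| implementation.reduce(key, memo, value) }]
    end
end

# Assumed signature: reduce(key, value1, value2).
class CountingImplementation
  def reduce(_key, count1, count2)
    count1 + count2
  end
end

# An implementation without `reduce`, i.e. a map-only job.
class MapOnlyImplementation
end

pairs = [["a", 1], ["a", 2], ["b", 1]]

reduced = maybe_reduce(pairs, CountingImplementation.new)
unreduced = maybe_reduce(pairs, MapOnlyImplementation.new)
```

With the counting implementation the two `"a"` pairs collapse into one; with the map-only implementation the pairs pass through untouched, which matches the v2.1.0 changelog entry.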
@@ -20,6 +20,16 @@ module MapReduce
     def k_way_merge(files)
       return enum_for(:k_way_merge, files) unless block_given?
 
+      if files.size == 1
+        files.first.each_line do |line|
+          yield(JSON.parse(line))
+        end
+
+        files.each(&:rewind)
+
+        return
+      end
+
       queue = PriorityQueue.new
 
       files.each_with_index do |file, index|
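
The added branch in `k_way_merge` is a fast path: a single sorted file needs no priority queue, so its lines are parsed and yielded in order and the file is rewound afterwards. The sketch below reproduces that idea in isolation. `merge_sorted_files` is a hypothetical name, and the multi-file fallback is a simple in-memory sort used as a stand-in, not the library's `PriorityQueue` based merge.

```ruby
require "json"
require "tempfile"

# Hypothetical stand-in for a k-way merge over files that each contain one
# JSON-encoded [key, value] pair per line, sorted by key.
def merge_sorted_files(files)
  return enum_for(:merge_sorted_files, files) unless block_given?

  # Fast path: one file is already in order, so stream it directly.
  if files.size == 1
    files.first.each_line { |line| yield(JSON.parse(line)) }
    files.each(&:rewind) # leave the file readable for later passes
    return
  end

  # Simplified fallback (assumption): load everything and sort in memory.
  pairs = files.flat_map { |file| file.each_line.map { |line| JSON.parse(line) } }
  files.each(&:rewind)
  pairs.sort_by { |pair| pair[0] }.each { |pair| yield(pair) }
end

# Usage: a single temp file takes the fast path.
file = Tempfile.new
file.puts(JSON.generate(["a", 1]))
file.puts(JSON.generate(["b", 2]))
file.rewind

merged = merge_sorted_files([file]).to_a
merged_again = merge_sorted_files([file]).to_a # works because of the rewind
```

Rewinding after the read matters because callers may iterate the same file again, for example when the enumerator is consumed more than once.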
@@ -1,3 +1,3 @@
 module MapReduce
-  VERSION = "2.0.0"
+  VERSION = "2.1.0"
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: map-reduce-ruby
 version: !ruby/object:Gem::Version
-  version: 2.0.0
+  version: 2.1.0
 platform: ruby
 authors:
 - Benjamin Vetter
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-09-22 00:00:00.000000000 Z
+date: 2022-10-24 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec