map-reduce-ruby 2.0.0 → 2.1.0

Files changed (8)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 68967e02da6776738d27e48228374396dc613f4ee4a48f08268b71c92764eaa0
-  data.tar.gz: 0f0633adb6f2c51617ea3234284b30433da4a9ceb25d167ce8eaef2527dc09db
+  metadata.gz: 4f3eeb7739b733f1abdf325ddfcf5a4c42fb257edd837952781246fcda9f48fb
+  data.tar.gz: f40eb08a341fc522c8f7043b74ac1c47fb00af491a107f51a6a76ca67f389629
 SHA512:
-  metadata.gz: d27e08793121dd81d4f8cc887f350d30ec9ea4039dad3490a4c7969164413cf575bdba0aa4c767097fc4af124aa3bb17b55f6781ff03919ef7d36164b3b14fa6
-  data.tar.gz: 71165ebbab647370de51c709b34f8e5cc8348300a41809354472ac125ef542255ec767a4613046cb964443b698fa7ec5dc8334d40df2a64728822aab0b170b43
+  metadata.gz: 6e91d7a1f55f0b89d333b317ecedc7978ca29a709dfad403906f6f2835b121a0a49afb29d03f55d8d7c8aa5c1b70628400404b7bf4d590046ae2e66ed48d7abc
+  data.tar.gz: cc4524aec895d935b70548163e5818f8725c5a05985e55f9b8d9330c34491968bb3ec54f02466ac0748eeb130f47a40d07baefa2627b04ed10dbac1c9638b1af

data/CHANGELOG.md CHANGED

@@ -1,5 +1,9 @@
 # CHANGELOG
+## v2.1.0
+* Do not reduce in `MapReduce::Mapper` when no `reduce` implementation is given
 ## v2.0.0
 * [BREAKING] Keys are no longer automatically converted to json before using

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    map-reduce-ruby (2.0.0)
+    map-reduce-ruby (2.1.0)
       json
       lazy_priority_queue

data/README.md CHANGED

@@ -7,8 +7,7 @@ than memory map-reduce jobs by using your local disk and some arbitrary storage
 layer like s3. You can specify how much memory you are willing to offer and
 MapReduce will use its buffers accordingly. Finally, you can use your already
 existing background job system like `sidekiq` or one of its various
-alternatives. Finally, your keys and values can be everything that can be
-serialized as json.
+alternatives.
 ## Installation
@@ -30,9 +29,7 @@ Or install it yourself as:
 Any map-reduce job consists of an implementation of your `map` function, your
 `reduce` function and worker code. So let's start with an implementation for a
-word count map-reduce task which fetches txt documents from the web. Please
-note that your keys and values can be everything that can be serialized as
-json, but nothing else.
+word count map-reduce task which fetches txt documents from the web.
 ```ruby
 class WordCounter
@@ -68,8 +65,8 @@ class WordCountMapper
 end
 ```
-Please note that `MapReduce::HashPartitioner.new(16)` states that we want split
-the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
+Please note that `MapReduce::HashPartitioner.new(16)` states that we want to
+split the dataset into 16 partitions (i.e. 0, 1, ... 15). Finally, we need some
 worker code to run the reduce part:
 ```ruby

data/lib/map_reduce/mapper.rb CHANGED

@@ -71,7 +71,10 @@ module MapReduce
       partitions = {}
-      reduce_chunk(k_way_merge(@chunks), @implementation).each do |pair|
+      chunk = k_way_merge(@chunks)
+      chunk = reduce_chunk(chunk, @implementation) if @implementation.respond_to?(:reduce)
+      chunk.each do |pair|
         partition = @partitioner.call(pair[0])
         (partitions[partition] ||= Tempfile.new).puts(JSON.generate(pair))
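
The guard added in this hunk skips the reduce step when the implementation object does not define `reduce`, so a map-only implementation passes its merged pairs through unchanged. The following is a minimal, self-contained sketch of that pattern, not the gem's code: `reduce_sorted`, `finalize`, `MapOnly` and `Counter` are hypothetical names, and the three-argument `reduce(key, value1, value2)` signature is an assumption.

```ruby
# Hypothetical stand-in for a reduce step: combines consecutive pairs that
# share a key in an already sorted stream of [key, value] pairs.
def reduce_sorted(pairs, implementation)
  return enum_for(:reduce_sorted, pairs, implementation) unless block_given?

  last = nil
  pairs.each do |key, value|
    if last.nil?
      last = [key, value]
    elsif last[0] == key
      # Assumed signature: reduce(key, value1, value2)
      last = [key, implementation.reduce(key, last[1], value)]
    else
      yield last
      last = [key, value]
    end
  end
  yield last unless last.nil?
end

# Mirrors the guard from the diff: only reduce when `reduce` is defined.
def finalize(pairs, implementation)
  pairs = reduce_sorted(pairs, implementation) if implementation.respond_to?(:reduce)
  pairs.to_a
end

# Map-only implementation (hypothetical): no `reduce` method at all.
class MapOnly
  def map(key)
    yield(key, 1)
  end
end

# Same mapper plus a `reduce` that sums values (hypothetical).
class Counter < MapOnly
  def reduce(_key, count1, count2)
    count1 + count2
  end
end

SORTED = [["a", 1], ["a", 1], ["b", 1]].freeze
MAP_ONLY_RESULT = finalize(SORTED, MapOnly.new)
REDUCED_RESULT = finalize(SORTED, Counter.new)
```

With the map-only implementation both `"a"` pairs survive, while the implementation with `reduce` collapses them into a single pair.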

data/lib/map_reduce/mergeable.rb CHANGED

@@ -20,6 +20,16 @@ module MapReduce
     def k_way_merge(files)
       return enum_for(:k_way_merge, files) unless block_given?
+      if files.size == 1
+        files.first.each_line do |line|
+          yield(JSON.parse(line))
+        end
+        files.each(&:rewind)
+        return
+      end
       queue = PriorityQueue.new
       files.each_with_index do |file, index|
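
The new branch is a fast path: with a single sorted file there is nothing to merge, so the lines can be streamed directly without building a priority queue. The sketch below illustrates the idea under stated assumptions. `merge_sorted` is a hypothetical name, the general path is a naive "pick the smallest head" merge rather than the gem's `PriorityQueue`-based one, and rewinding at the end of the general path is also an assumption.

```ruby
require "json"
require "stringio"

# Streams parsed JSON lines from already sorted IO objects in key order.
def merge_sorted(files)
  return enum_for(:merge_sorted, files) unless block_given?

  # Fast path, as in the diff: a single file is already in order.
  if files.size == 1
    files.first.each_line { |line| yield(JSON.parse(line)) }
    files.each(&:rewind)
    return
  end

  # General path (simplified stand-in): keep one parsed head per file and
  # repeatedly emit the head with the smallest key.
  heads = files.map { |file| (line = file.gets) && JSON.parse(line) }

  until heads.all?(&:nil?)
    index = heads.each_index.reject { |i| heads[i].nil? }.min_by { |i| heads[i][0] }
    yield heads[index]
    line = files[index].gets
    heads[index] = line && JSON.parse(line)
  end

  files.each(&:rewind)
end

single = StringIO.new(%(["a",1]\n["b",2]\n))
merge_sorted([single]).each { |pair| puts pair.inspect }
```

Because the method rewinds the inputs before returning, the same IO objects can be read again afterwards, which is what the `files.each(&:rewind)` call in the fast path preserves.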

data/lib/map_reduce/version.rb CHANGED

@@ -1,3 +1,3 @@
 module MapReduce
-  VERSION = "2.0.0"
+  VERSION = "2.1.0"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: map-reduce-ruby
 version: !ruby/object:Gem::Version
-  version: 2.0.0
+  version: 2.1.0
 platform: ruby
 authors:
 - Benjamin Vetter
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-09-22 00:00:00.000000000 Z
+date: 2022-10-24 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec