enumerating 1.0.0

data/.gitignore ADDED
@@ -0,0 +1,4 @@
+ *.gem
+ .bundle
+ Gemfile.lock
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ source :rubygems
+
+ gemspec
+
+ gem "rake"
+ gem "rspec", ">= 2"
data/README.markdown ADDED
@@ -0,0 +1,101 @@
+ Enumerating
+ ===========
+
+ Lazy "filtering" and transforming
+ ---------------------------------
+
+ Enumerating extends Enumerable with "lazy" versions of various common operations:
+
+ * `#selecting` selects elements that pass a test block (like `Enumerable#select`)
+ * `#rejecting` selects elements that fail a test block (like `Enumerable#reject`)
+ * `#collecting` applies a transforming block to each element (like `Enumerable#collect`)
+ * `#uniqing` discards duplicates (like `Enumerable#uniq`)
+
+ We say the "...ing" variants are "lazy", because they defer per-element processing until the result is used. They return Enumerable "result proxy" objects, rather than Arrays, and only perform the actual filtering (or transformation) as the result proxy is enumerated.
+
+ Perhaps an example would help. Consider the following snippet:
+
+     >> (1..10).collect { |x| puts "#{x}^2 = #{x*x}"; x*x }.take_while { |x| x < 20 }
+     1^2 = 1
+     2^2 = 4
+     3^2 = 9
+     4^2 = 16
+     5^2 = 25
+     6^2 = 36
+     7^2 = 49
+     8^2 = 64
+     9^2 = 81
+     10^2 = 100
+     => [1, 4, 9, 16]
+
+ Here we use plain old `#collect` to square a bunch of numbers, and then grab the ones less than 20. We can do the same thing using `#collecting`, rather than `#collect`:
+
+     >> (1..10).collecting { |x| puts "#{x}^2 = #{x*x}"; x*x }.take_while { |x| x < 20 }
+     1^2 = 1
+     2^2 = 4
+     3^2 = 9
+     4^2 = 16
+     5^2 = 25
+     => [1, 4, 9, 16]
+
+ Same result, but notice how only the first five inputs were ever squared; just enough to find the first result above 20.
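For comparison only, and not part of Enumerating: Ruby 2.0 and later ship a built-in `Enumerable#lazy` that shows the same effect. The sketch below reruns the README's example and counts how often the squaring block is called. The method name `squares_below_twenty` and the `calls` counter are illustrative, not taken from the gem.

```ruby
# Runs the square-then-take_while example and returns
# [result, number of times the squaring block ran].
# Assumes Ruby 2.0+ for the built-in Enumerable#lazy.
def squares_below_twenty(lazy)
  calls = 0
  source = lazy ? (1..10).lazy : (1..10)
  result = source.collect { |x| calls += 1; x * x }.take_while { |x| x < 20 }.to_a
  [result, calls]
end

p squares_below_twenty(false)  # eager: all ten inputs are squared
p squares_below_twenty(true)   # lazy: squaring stops after the fifth input
```

Both calls return the same values; only the number of block invocations differs, which is the saving that `#collecting` provides on Rubies without `Enumerable#lazy`.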
+
+ Lazy pipelines
+ --------------
+
+ By combining two or more of the lazy operations provided by Enumerating, you can create an efficient "pipeline", e.g.
+
+     # enumerate all users
+     users = User.to_enum(:find_each)
+
+     # where first and last names start with the same letter
+     users = users.selecting { |u| u.first_name[0] == u.last_name[0] }
+
+     # grab their company (weeding out duplicates)
+     companies = users.collecting(&:company).uniqing
+
+     # resolve
+     companies.to_a #=> ["Disney"]
+
+ Because each element passes through every step before the next element is read, and no intermediate collections (Arrays) are created, you can efficiently operate on large (or even infinite) Enumerable collections.
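To make the "infinite" claim concrete, here is a minimal sketch that runs without installing the gem. `SketchFilter` is a hypothetical stand-in modelled on the `Enumerating::Filter` class in this diff, and the two methods are cut-down copies of `#selecting` and `#collecting`. The infinite Range assumes Ruby 1.9.2 or later.

```ruby
# Stand-in for Enumerating::Filter: wraps a generator block and only runs it
# when the result is enumerated.
class SketchFilter
  include Enumerable

  def initialize(&generator)
    @generator = generator
  end

  def each
    return to_enum unless block_given?
    @generator.call(proc { |x| yield x })
  end
end

module Enumerable
  # Cut-down copies of the gem's lazy operations, for illustration only.
  def selecting
    SketchFilter.new { |output| each { |e| output.call(e) if yield(e) } }
  end

  def collecting
    SketchFilter.new { |output| each { |e| output.call(yield(e)) } }
  end
end

# An endless source: the pipeline below never tries to materialise it.
def first_even_squares(count)
  (1..Float::INFINITY).selecting { |n| n.even? }.collecting { |n| n * n }.first(count)
end

p first_even_squares(3)
```

With the eager `select` and `collect`, the same pipeline would never return, because `select` would try to build an Array from an endless Range.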
+
+ Lazy combination of Enumerables
+ -------------------------------
+
+ Enumerating also provides some interesting ways to combine several Enumerable collections to create a new collection. Again, these operate "lazily".
+
+ `Enumerating.zipping` pulls elements from a number of collections in parallel, yielding each group.
+
+     array1 = [1,3,6]
+     array2 = [2,4,7]
+     Enumerating.zipping(array1, array2) # generates: [1,2], [3,4], [6,7]
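The idea behind lazy zipping can be sketched with Ruby's external enumerators (`Enumerator#next`, Ruby 1.9+). This is not the gem's `Zipper` class: `lazy_zip` is a hypothetical helper, and it deliberately differs by counting exhausted inputs rather than testing for an all-`nil` group, so that genuine `nil` elements are not mistaken for the end of input.

```ruby
# Yields one group per step, padding exhausted inputs with nil, and stops
# once every input is exhausted. Nothing is read until the result is enumerated.
def lazy_zip(*enumerables)
  Enumerator.new do |out|
    sources = enumerables.map(&:to_enum)
    loop do
      exhausted = 0
      group = sources.map do |source|
        begin
          source.next
        rescue StopIteration
          exhausted += 1
          nil
        end
      end
      break if exhausted == sources.size
      out << group
    end
  end
end

p lazy_zip([1, 3, 6], [2, 4, 7], [5, 8]).to_a
p lazy_zip(%w(a b c), (1..Float::INFINITY)).first(2)
```

The second call zips against an endless Range and still returns, since only two groups are ever requested.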
+
+ `Enumerating.merging` merges multiple collections, preserving sort-order. The inputs are assumed to be sorted already.
+
+     array1 = [1,4,5]
+     array2 = [2,3,6]
+     Enumerating.merging(array1, array2) # generates: 1, 2, 3, 4, 5, 6
+
+ Variant `Enumerating.merging_by` uses a block to determine sort-order.
+
+     array1 = %w(a dd cccc)
+     array2 = %w(eee bbbbb)
+     Enumerating.merging_by(array1, array2) { |x| x.length }
+     # generates: %w(a dd eee cccc bbbbb)
+
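The merge can be pictured as repeatedly peeking at the head of every input and taking the smallest. The sketch below follows that idea with `Enumerator#peek` and `#next` (Ruby 1.9+). `lazy_merge` is a hypothetical helper written for illustration, in the spirit of the `Merger` class in this diff rather than a copy of it.

```ruby
# Lazily merges already-sorted inputs. An optional block supplies the sort key,
# as with Enumerating.merging_by.
def lazy_merge(*enumerables, &key)
  key ||= proc { |item| item }
  Enumerator.new do |out|
    sources = enumerables.map(&:to_enum)
    loop do
      # Drop inputs that have nothing left to offer.
      sources.reject! do |source|
        begin
          source.peek
          false
        rescue StopIteration
          true
        end
      end
      break if sources.empty?
      # Take the next item from whichever input has the smallest head.
      out << sources.min_by { |source| key.call(source.peek) }.next
    end
  end
end

p lazy_merge([1, 4, 5], [2, 3, 6]).to_a
p lazy_merge(%w(a dd cccc), %w(eee bbbbb)) { |x| x.length }.to_a
```

Because only the head of each input is inspected, one of the inputs can be endless as long as the caller takes a finite number of results.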
+ Same but different
+ ------------------
+
+ There are numerous similar implementations of lazy operations on Enumerables. A nod, in particular, to:
+
+ * Greg Spurrier's gem "`lazing`" (from which Enumerating borrows the convention of using "..ing" to name lazy methods)
+ * `Enumerable#defer` from the Ruby Facets library
+
+ In the end, though, I felt the world deserved another. Enumerating's selling point is that its basic (filtering/transforming) operations work on any Ruby, whereas most of the other implementations depend on the availability of Ruby 1.9's "`Enumerator`". Enumerating has been tested on:
+
+ * MRI 1.8.6
+ * MRI 1.8.7
+ * MRI 1.9.2
+ * JRuby 1.5.3
+ * JRuby 1.6.0
+ * Rubinius 1.2.3
data/Rakefile ADDED
@@ -0,0 +1,13 @@
+ require "rake"
+
+ require 'bundler'
+ Bundler::GemHelper.install_tasks
+
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new do |t|
+   t.pattern = 'spec/**/*_spec.rb'
+   t.rspec_opts = ["--colour", "--format", "nested"]
+ end
+
+ task "default" => "spec"
data/benchmarks/pipeline_bench.rb ADDED
@@ -0,0 +1,71 @@
+ require 'benchmark'
+
+ $: << File.expand_path("../../lib", __FILE__)
+
+ RUBY19 = RUBY_VERSION =~ /^1\.9/
+
+ if RUBY19
+   require "lazing"
+   module Enumerable
+     alias :lazing_select :selecting
+     alias :lazing_collect :collecting
+   end
+ end
+
+ require "enumerating"
+
+ require 'facets'
+
+ array = (1..100000).to_a
+
+ # Test scenario:
+ # - select the even numbers
+ # - square them
+ # - grab the first 10, 100 or 1000 results (or all of them)
+
+ printf "%-30s", "IMPLEMENTATION"
+ printf "%12s", "take(10)"
+ printf "%12s", "take(100)"
+ printf "%12s", "take(1000)"
+ printf "%12s", "to_a"
+ puts ""
+
+ def measure(&block)
+   begin
+     printf "%12.5f", Benchmark.realtime(&block)
+   rescue
+     printf "%12s", "n/a"
+   end
+ end
+
+ def benchmark(description, control_result = nil)
+   result = nil
+   printf "%-30s", description
+   measure { yield.take(10).to_a }
+   measure { yield.take(100).to_a }
+   measure { result = yield.take(1000).to_a }
+   measure { yield.to_a }
+   puts ""
+   unless control_result.nil? || result == control_result
+     raise "unexpected result from '#{description}'"
+   end
+   result
+ end
+
+ @control = benchmark "conventional" do
+   array.select { |x| x.even? }.collect { |x| x*x }
+ end
+
+ benchmark "enumerating", @control do
+   array.selecting { |x| x.even? }.collecting { |x| x*x }
+ end
+
+ if RUBY19
+   benchmark "lazing", @control do
+     array.lazing_select { |x| x.even? }.lazing_collect { |x| x*x }
+   end
+ end
+
+ benchmark "facets Enumerable#defer", @control do
+   array.defer.select { |x| x.even? }.collect { |x| x*x }
+ end
data/enumerating.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "enumerating/version"
+
+ Gem::Specification.new do |s|
+
+   s.name = "enumerating"
+   s.version = Enumerating::VERSION.dup
+   s.platform = Gem::Platform::RUBY
+   s.authors = ["Mike Williams"]
+   s.email = "mdub@dogbiscuit.org"
+   s.homepage = "http://github.com/mdub/enumerating"
+
+   s.summary = %{Lazy filtering/transforming of Enumerable collections}
+   s.description = %{Enumerating extends Enumerable with "lazy" versions of various operations, allowing streamed processing of large (or even infinite) collections. Even in Ruby 1.8.x.}
+
+   s.files = `git ls-files`.split("\n")
+   s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.require_paths = ["lib"]
+
+ end
data/lib/enumerating.rb ADDED
@@ -0,0 +1,2 @@
+ require 'enumerating/filtering'
+ require 'enumerating/mixing'
data/lib/enumerating/filtering.rb ADDED
@@ -0,0 +1,68 @@
+ require 'set'
+
+ module Enumerating
+
+   class Filter
+
+     include Enumerable
+
+     def initialize(&generator)
+       @generator = generator
+     end
+
+     def each
+       return to_enum unless block_given?
+       yielder = proc { |x| yield x }
+       @generator.call(yielder)
+     end
+
+   end
+
+ end
+
+ module Enumerable
+
+   def collecting
+     Enumerating::Filter.new do |output|
+       each do |element|
+         output.call yield(element)
+       end
+     end
+   end
+
+   def selecting
+     Enumerating::Filter.new do |output|
+       each do |element|
+         output.call(element) if yield(element)
+       end
+     end
+   end
+
+   def rejecting
+     Enumerating::Filter.new do |output|
+       each do |element|
+         output.call(element) unless yield(element)
+       end
+     end
+   end
+
+   def uniqing
+     Enumerating::Filter.new do |output|
+       seen = Set.new
+       each do |element|
+         output.call(element) if seen.add?(element)
+       end
+     end
+   end
+
+   def uniqing_by
+     Enumerating::Filter.new do |output|
+       seen = Set.new
+       each do |element|
+         output.call(element) if seen.add?(yield element)
+       end
+     end
+   end
+
+ end
+
data/lib/enumerating/merging.rb ADDED
@@ -0,0 +1,70 @@
+ module Enumerating
+
+   class Merger
+
+     include Enumerable
+
+     def initialize(enumerables, &transformer)
+       @enumerables = enumerables
+       @transformer = transformer
+     end
+
+     def each(&block)
+       return to_enum unless block_given?
+       Generator.new(@enumerables.map(&:to_enum), @transformer).each(&block)
+     end
+
+     class Generator
+
+       def initialize(enumerators, transformer)
+         @enumerators = enumerators
+         @transformer = transformer
+       end
+
+       def each
+         while true do
+           discard_empty_enumerators
+           break if @enumerators.empty?
+           yield next_enumerator.next
+         end
+       end
+
+       private
+
+       def discard_empty_enumerators
+         @enumerators.delete_if do |e|
+           begin
+             e.peek
+             false
+           rescue StopIteration
+             true
+           end
+         end
+       end
+
+       def next_enumerator
+         @enumerators.min_by { |enumerator| transform(enumerator.peek) }
+       end
+
+       def transform(item)
+         return item unless @transformer
+         @transformer.call(item)
+       end
+
+     end
+
+   end
+
+ end
+
+ class << Enumerating
+
+   def merging(*enumerables)
+     Enumerating::Merger.new(enumerables)
+   end
+
+   def merging_by(*enumerables, &block)
+     Enumerating::Merger.new(enumerables, &block)
+   end
+
+ end
data/lib/enumerating/mixing.rb ADDED
@@ -0,0 +1,2 @@
+ require 'enumerating/merging'
+ require 'enumerating/zipping'
data/lib/enumerating/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Enumerating
+   VERSION = "1.0.0".freeze
+ end
data/lib/enumerating/zipping.rb ADDED
@@ -0,0 +1,36 @@
+ module Enumerating
+
+   class Zipper
+
+     include Enumerable
+
+     def initialize(enumerables)
+       @enumerables = enumerables
+     end
+
+     def each
+       enumerators = @enumerables.map(&:to_enum)
+       while true
+         chunk = enumerators.map do |enumerator|
+           begin
+             enumerator.next
+           rescue StopIteration
+             nil
+           end
+         end
+         break if chunk.all?(&:nil?)
+         yield chunk
+       end
+     end
+
+   end
+
+ end
+
+ class << Enumerating
+
+   def zipping(*enumerables)
+     Enumerating::Zipper.new(enumerables)
+   end
+
+ end
data/spec/enumerating/filtering_spec.rb ADDED
@@ -0,0 +1,74 @@
+ require "spec_helper"
+
+ module Enumerable
+
+   unless method_defined?(:first)
+     def first
+       each do |first_item|
+         return first_item
+       end
+     end
+   end
+
+ end
+
+ describe Enumerable do
+
+   describe "#collecting" do
+
+     it "transforms items" do
+       [1,2,3].collecting { |x| x * 2 }.to_a.should == [2,4,6]
+     end
+
+     it "is lazy" do
+       [1,2,3].with_time_bomb.collecting { |x| x * 2 }.first.should == 2
+     end
+
+   end
+
+   describe "#selecting" do
+
+     it "excludes items that don't pass the predicate" do
+       (1..6).selecting { |x| x%2 == 0 }.to_a.should == [2,4,6]
+     end
+
+     it "is lazy" do
+       (1..6).with_time_bomb.selecting { |x| x%2 == 0 }.first.should == 2
+     end
+
+   end
+
+   describe "#rejecting" do
+
+     it "excludes items that do pass the predicate" do
+       (1..6).rejecting { |x| x%2 == 0 }.to_a.should == [1,3,5]
+     end
+
+     it "is lazy" do
+       (1..6).with_time_bomb.rejecting { |x| x%2 == 0 }.first.should == 1
+     end
+
+   end
+
+   describe "#uniqing" do
+
+     it "removes duplicates" do
+       [1,3,2,4,3,5,4,6].uniqing.to_a.should == [1,3,2,4,5,6]
+     end
+
+     it "is lazy" do
+       [1,2,3].with_time_bomb.uniqing.first.should == 1
+     end
+
+   end
+
+   describe "#uniqing_by" do
+
+     it "uses the block to derive identity" do
+       @array = %w(A1 A2 B1 A3 C1 B2 C2)
+       @array.uniqing_by { |s| s[0,1] }.to_a.should == %w(A1 B1 C1)
+     end
+
+   end
+
+ end
data/spec/enumerating/merging_spec.rb ADDED
@@ -0,0 +1,35 @@
+ require "spec_helper"
+
+ describe Enumerating, :needs_enumerators => true do
+
+   describe ".merging" do
+
+     it "merges multiple Enumerators" do
+       @array1 = [1,3,6]
+       @array2 = [2,4,7]
+       @array3 = [5,8]
+       @merge = Enumerating.merging(@array1, @array2, @array3)
+       @merge.to_a.should == [1,2,3,4,5,6,7,8]
+     end
+
+     it "is lazy" do
+       @enum1 = [1,3]
+       @enum2 = [2,4].with_time_bomb
+       @merge = Enumerating.merging(@enum1, @enum2)
+       @merge.take(4).should == [1,2,3,4]
+     end
+
+   end
+
+   describe ".merging_by" do
+
+     it "uses the block to determine order" do
+       @array1 = %w(cccc dd a)
+       @array2 = %w(eeeee bbb)
+       @merge = Enumerating.merging_by(@array1, @array2) { |s| -s.length }
+       @merge.to_a.should == %w(eeeee cccc bbb dd a)
+     end
+
+   end
+
+ end
data/spec/enumerating/zipping_spec.rb ADDED
@@ -0,0 +1,22 @@
+ require "spec_helper"
+
+ describe Enumerating, :needs_enumerators => true do
+
+   describe "#zipping" do
+
+     it "zips together multiple Enumerables" do
+       @array1 = [1,3,6]
+       @array2 = [2,4,7]
+       @array3 = [5,8]
+       @zip = Enumerating.zipping(@array1, @array2, @array3)
+       @zip.to_a.should == [[1,2,5], [3,4,8], [6,7,nil]]
+     end
+
+     it "is lazy" do
+       @zip = Enumerating.zipping(%w(a b c), [1,2].with_time_bomb)
+       @zip.take(2).should == [["a", 1], ["b", 2]]
+     end
+
+   end
+
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,37 @@
+ require "rubygems"
+
+ require 'rspec'
+
+ RSpec.configure do |config|
+   unless defined?(::Enumerator)
+     config.filter_run_excluding :needs_enumerators => true
+   end
+ end
+
+ class Boom < StandardError; end
+
+ class WithTimeBomb
+
+   include Enumerable
+
+   def initialize(source)
+     @source = source
+   end
+
+   def each(&block)
+     @source.each(&block)
+     raise Boom
+   end
+
+ end
+
+ module Enumerable
+
+   # extend an Enumerable to throw an exception after last element
+   def with_time_bomb
+     WithTimeBomb.new(self)
+   end
+
+ end
+
+ require "enumerating"
metadata ADDED
@@ -0,0 +1,72 @@
+ --- !ruby/object:Gem::Specification
+ name: enumerating
+ version: !ruby/object:Gem::Version
+   prerelease:
+   version: 1.0.0
+ platform: ruby
+ authors:
+ - Mike Williams
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2011-05-05 00:00:00 Z
+ dependencies: []
+
+ description: Enumerating extends Enumerable with "lazy" versions of various operations, allowing streamed processing of large (or even infinite) collections. Even in Ruby 1.8.x.
+ email: mdub@dogbiscuit.org
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - .gitignore
+ - Gemfile
+ - README.markdown
+ - Rakefile
+ - benchmarks/pipeline_bench.rb
+ - enumerating.gemspec
+ - lib/enumerating.rb
+ - lib/enumerating/filtering.rb
+ - lib/enumerating/merging.rb
+ - lib/enumerating/mixing.rb
+ - lib/enumerating/version.rb
+ - lib/enumerating/zipping.rb
+ - spec/enumerating/filtering_spec.rb
+ - spec/enumerating/merging_spec.rb
+ - spec/enumerating/zipping_spec.rb
+ - spec/spec_helper.rb
+ homepage: http://github.com/mdub/enumerating
+ licenses: []
+
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.7.2
+ signing_key:
+ specification_version: 3
+ summary: Lazy filtering/transforming of Enumerable collections
+ test_files:
+ - spec/enumerating/filtering_spec.rb
+ - spec/enumerating/merging_spec.rb
+ - spec/enumerating/zipping_spec.rb
+ - spec/spec_helper.rb