crunchpipe 0.0.1beta1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/.gitignore ADDED
@@ -0,0 +1,19 @@
+ ## MAC OS
+ .DS_Store
+
+ ## TEXTMATE
+ *.tmproj
+ tmtags
+
+ ## EMACS
+ *~
+ \#*
+ .\#*
+
+ ## VIM
+ *.swp
+
+ ## PROJECT
+ .bundle
+ coverage*
+ pkg*
data/.rvmrc ADDED
@@ -0,0 +1 @@
+ rvm use 1.9.2@CrunchPipe --create
data/Gemfile ADDED
@@ -0,0 +1,17 @@
+ # A sample Gemfile
+ source "http://rubygems.org"
+
+ gem 'rake'
+ gem 'parallel'
+ gem 'ruby-debug19'
+
+ group :development do
+ gem 'ruby-debug19'
+ end
+
+ group :test do
+ gem 'rspec'
+ gem 'simplecov'
+ end
+
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,49 @@
+ PATH
+ remote: .
+ specs:
+ crunchpipe (0.0.1beta1)
+
+ GEM
+ remote: http://rubygems.org/
+ specs:
+ archive-tar-minitar (0.5.2)
+ columnize (0.3.4)
+ diff-lcs (1.1.3)
+ linecache19 (0.5.12)
+ ruby_core_source (>= 0.1.4)
+ multi_json (1.0.3)
+ parallel (0.5.9)
+ rake (0.9.2)
+ rspec (2.6.0)
+ rspec-core (~> 2.6.0)
+ rspec-expectations (~> 2.6.0)
+ rspec-mocks (~> 2.6.0)
+ rspec-core (2.6.4)
+ rspec-expectations (2.6.0)
+ diff-lcs (~> 1.1.2)
+ rspec-mocks (2.6.0)
+ ruby-debug-base19 (0.11.25)
+ columnize (>= 0.3.1)
+ linecache19 (>= 0.5.11)
+ ruby_core_source (>= 0.1.4)
+ ruby-debug19 (0.11.6)
+ columnize (>= 0.3.1)
+ linecache19 (>= 0.5.11)
+ ruby-debug-base19 (>= 0.11.19)
+ ruby_core_source (0.1.5)
+ archive-tar-minitar (>= 0.5.2)
+ simplecov (0.5.3)
+ multi_json (~> 1.0.3)
+ simplecov-html (~> 0.5.3)
+ simplecov-html (0.5.3)
+
+ PLATFORMS
+ ruby
+
+ DEPENDENCIES
+ crunchpipe!
+ parallel
+ rake
+ rspec
+ ruby-debug19
+ simplecov
data/README.md ADDED
@@ -0,0 +1,74 @@
+ CrunchPipe
+ ==========
+
+ CrunchPipe is a library for creating and coordinating modular
+ computation pipelines. Computation can take place in parallel, and data
+ sources are kept separate from the computation itself, leading to
+ modular and maintainable programs.
+
+ The Basics
+ ----------
+
+ CrunchPipe utilizes computation pipelines connected to streams to
+ model the processing of data.
+
+
+
+ /--------------\
+ | Input Stream |
+ \--------------/
+
+ ||
+ \/
+
+ /----------\
+ | Pipeline |
+ |----------|
+ | Op 1 |
+ |----------|
+ | Op 2 |
+ |----------|
+ | Op 3 |
+ \----------/
+
+ ||
+ \/
+
+ /---------------\
+ | Output Stream |
+ \---------------/
+
+ Streams
+ ----------
+
+ Streams are the sources and sinks of data. You create a stream and add
+ elements to it. All pipelines connected to the stream will be alerted
+ when data is added to a stream. Pipelines also write their finished
+ results to a stream, which can, optionally, have other pipelines
+ connected to it. Since streams are also data sinks, streams can be
+ provided with the means to save the results of computation in an
+ abstract and general way.
+
+ Pipelines
+ ----------
+
+ Pipelines represent computational processes. When a pipeline is
+ created, you can bind an arbitrary number of transformations to it in
+ the form of blocks to create an "assembly line" of operations to be
+ performed on data. Pipelines are connected to streams and will be
+ notified when new data is available. Each new element from the stream
+ will be run through the bound operations in the order in which they
+ were bound to the pipeline. However, the elements obtained from
+ streams can be processed in parallel (using threads or processes),
+ which can improve performance. Because the order in which operations
+ are applied is preserved, it is the elements from the stream, not the
+ operations, that are processed in parallel. The parallelism is
+ encapsulated within the pipeline, freeing the developer from the
+ concerns traditionally associated with writing parallel code.
+
+
+ ToDo
+ ----------
+
+ * Get specs passing, dammit
+ * Improved DSL
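The wiring described in the README can be sketched end to end with the classes this release adds under lib/crunchpipe. This is a minimal, illustrative sketch, not a file shipped in the gem; it assumes the gem is installed and loadable via `require 'crunchpipe'`:

```ruby
require 'crunchpipe'

input  = CrunchPipe::Stream.new
output = CrunchPipe::Stream.new

# A non-parallel pipeline with a single bound operation; each bound
# block must accept exactly one argument (enforced by Pipeline.check_arity).
pipeline = CrunchPipe::Pipeline.new(:parallel => false)
pipeline.bind {|element| element * 2 }

pipeline < input    # subscribe the pipeline to the input stream (Observable)
pipeline | output   # route the pipeline's results to the output stream

# End points are the terminal data sinks; the block receives each batch of results.
output > CrunchPipe::DataEndPoint.new {|data| puts "got #{data.inspect}" }

input.add [1, 2, 3]  # => prints: got [2, 4, 6]
```

The same wiring, driven by a DataProvider and with :parallel => true, appears in examples/complete_hello_world.rb below.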
data/Rakefile ADDED
@@ -0,0 +1,14 @@
+
+ require "rubygems"
+ require "bundler/setup"
+
+ require 'rake'
+ require 'rspec/core/rake_task'
+ require 'bundler/gem_tasks'
+
+ task :default => :spec
+
+ desc "Run all examples"
+ RSpec::Core::RakeTask.new(:spec) do |t|
+ t.rspec_opts = '--format documentation --color'
+ end
data/configure ADDED
@@ -0,0 +1,9 @@
+ #!/usr/bin/env bash
+ # Configure build environment.
+ #
+ # @author Benjamin Oakes <hello@benjaminoakes.com>
+
+ echo "[$0] starting"
+ gem install bundler --version '~> 1.0.21' --no-rdoc --no-ri
+ bundle install
+ echo "[$0] finished"
data/crunchpipe.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "crunchpipe/version"
+
+ Gem::Specification.new do |s|
+ s.name = "crunchpipe"
+ s.version = Crunchpipe::VERSION
+ s.authors = ["yonkeltron"]
+ s.email = ["yonkeltron@gmail.com"]
+ s.homepage = "https://github.com/yonkeltron/CrunchPipe"
+ s.summary = %q{A library for modular, pipeline-based computation}
+ s.description = %q{Using the data-pipeline pattern loosely-based on dataflow programming, CrunchPipe helps you to write modular, cohesive, loosely-coupled programs for computation with optional features for parallelization.}
+ s.has_rdoc = false
+
+ s.rubyforge_project = "crunchpipe"
+
+ s.files = `git ls-files`.split("\n")
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+ s.require_paths = ["lib"]
+ end
data/examples/complete_hello_world.rb ADDED
@@ -0,0 +1,25 @@
+ require_relative '../lib/crunchpipe'
+
+ provider = CrunchPipe::DataProvider.new
+
+ input_stream = CrunchPipe::Stream.new
+
+ pipeline = CrunchPipe::Pipeline.new(:parallel => true)
+
+ pipeline.bind do |element|
+ puts "--- Processing #{element}..."
+ element + 1
+ end
+
+ output_stream = CrunchPipe::Stream.new
+
+ end_point = CrunchPipe::DataEndPoint.new do |data|
+ puts "+++ End point got #{data}"
+ end
+
+ provider | input_stream
+ pipeline < input_stream
+ pipeline | output_stream
+ output_stream > end_point
+
+ provider.provide([1,2,3,4,5])
data/lib/crunchpipe.rb ADDED
@@ -0,0 +1,12 @@
+ require_relative './crunchpipe/pipeline'
+ require_relative './crunchpipe/stream'
+ require_relative './crunchpipe/data_end_point'
+ require_relative './crunchpipe/data_provider'
+
+ module CrunchPipe
+ class InvalidProcessorError < Exception
+ end
+
+ class MissingActionError < Exception
+ end
+ end
data/lib/crunchpipe/data_end_point.rb ADDED
@@ -0,0 +1,17 @@
+ module CrunchPipe
+ class DataEndPoint
+
+ def initialize(&block)
+ if block
+ @default_action_block = block
+ else
+ fail MissingActionError
+ end
+ end
+
+ def receive(data)
+ @default_action_block.yield data
+ end
+
+ end
+ end
data/lib/crunchpipe/data_provider.rb ADDED
@@ -0,0 +1,22 @@
+ module CrunchPipe
+ class DataProvider
+
+ attr_reader :output_streams
+
+ def initialize(stream = nil)
+ if stream
+ @output_streams = [stream]
+ else
+ @output_streams = []
+ end
+ end
+
+ def |(stream)
+ @output_streams.push stream
+ end
+
+ def provide(data)
+ @output_streams.each {|stream| stream.add data }
+ end
+ end
+ end
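A DataProvider simply fans whatever it is given out to every attached stream. A small sketch (illustrative only, not part of the packaged files) of attaching two streams:

```ruby
require 'crunchpipe'

provider = CrunchPipe::DataProvider.new
left     = CrunchPipe::Stream.new
right    = CrunchPipe::Stream.new

# DataProvider#| appends a stream to @output_streams.
provider | left
provider | right

left  > CrunchPipe::DataEndPoint.new {|d| puts "left:  #{d.inspect}" }
right > CrunchPipe::DataEndPoint.new {|d| puts "right: #{d.inspect}" }

# DataProvider#provide calls Stream#add on every attached stream,
# so both end points receive the same array.
provider.provide [1, 2, 3]
```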
data/lib/crunchpipe/pipeline.rb ADDED
@@ -0,0 +1,63 @@
+ require 'parallel'
+
+ module CrunchPipe
+ class Pipeline
+ attr_accessor :parallel
+ attr_reader :processors, :default_sink
+
+ def initialize(args)
+ @parallel = args[:parallel]
+ @processors = []
+ @default_sink = nil
+ end
+
+ def bind(processor = nil, &block)
+ if block_given?
+ Pipeline.check_arity(block)
+ @processors.push block
+ elsif processor
+ Pipeline.check_arity(processor)
+ @processors.push processor
+ end
+ end
+
+ def update(stream, elements)
+ if @parallel
+ results = Parallel.map(elements) {|element| process element }
+ else
+ results = elements.map {|element| process element }
+ end
+
+ @default_sink.add results if @default_sink
+
+ results
+ end
+
+ def process(element)
+ result = element
+ @processors.each do |processor|
+ result = processor.yield(result)
+ end
+
+ result
+ end
+
+ def <(stream)
+ stream.add_observer(self)
+ end
+
+ def |(stream)
+ @default_sink = stream
+ end
+
+ def self.check_arity(processor)
+ unless processor.is_a?(Proc)
+ fail CrunchPipe::InvalidProcessorError, "Processor must be a Proc but was a #{processor.class}"
+ end
+
+ unless processor.arity == 1
+ fail CrunchPipe::InvalidProcessorError, "Processor must take 1 argument but instead takes #{processor.arity}"
+ end
+ end
+ end
+ end
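Pipeline.check_arity, defined above, accepts only single-argument Procs; anything else raises CrunchPipe::InvalidProcessorError. A quick illustration (not part of the packaged files):

```ruby
require 'crunchpipe'

pipeline = CrunchPipe::Pipeline.new(:parallel => false)
pipeline.bind {|x| x.to_s }  # fine: the block takes exactly one argument

begin
  CrunchPipe::Pipeline.check_arity('panda')             # not a Proc
rescue CrunchPipe::InvalidProcessorError => e
  puts e.message  # Processor must be a Proc but was a String
end

begin
  CrunchPipe::Pipeline.check_arity(Proc.new {|a, b| })  # arity 2
rescue CrunchPipe::InvalidProcessorError => e
  puts e.message  # Processor must take 1 argument but instead takes 2
end
```

Note that both error classes in lib/crunchpipe.rb inherit from Exception rather than StandardError, so a bare rescue will not catch them; rescue the class explicitly as above.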
data/lib/crunchpipe/stream.rb ADDED
@@ -0,0 +1,21 @@
+ require 'thread'
+ require 'observer'
+
+ module CrunchPipe
+ class Stream
+ include Observable
+
+ attr_reader :default_end_point
+
+ def add(elements = [])
+ changed
+ notify_observers(self, elements)
+ @default_end_point.receive elements if @default_end_point
+ end
+
+ def >(end_point)
+ @default_end_point = end_point
+ end
+
+ end
+ end
data/lib/crunchpipe/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Crunchpipe
+ VERSION = "0.0.1beta1"
+ end
data/spec/data_end_point_spec.rb ADDED
@@ -0,0 +1,39 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::DataEndPoint do
+
+ describe '.new' do
+ context 'given a block' do
+ it 'does not throw' do
+ lambda {
+ CrunchPipe::DataEndPoint.new {}
+ }.should_not raise_error
+ end
+ end
+
+ context 'given no block' do
+ it 'throws' do
+ lambda {
+ CrunchPipe::DataEndPoint.new
+ }.should raise_error(CrunchPipe::MissingActionError)
+ end
+ end
+ end
+
+ describe '#receive' do
+ it 'yields data to the block' do
+ results = []
+
+ data_end_point = CrunchPipe::DataEndPoint.new do |data|
+ results.push data
+ end
+
+ fake_data = [1,2,3]
+
+ data_end_point.receive(fake_data)
+
+ results.should include(fake_data)
+ end
+ end
+
+ end
data/spec/data_provider_spec.rb ADDED
@@ -0,0 +1,44 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::DataProvider do
+ let(:stream) { stub(CrunchPipe::Stream) }
+ describe '.new' do
+ context 'given no output stream' do
+ it 'sets an empty array for output streams' do
+ provider = CrunchPipe::DataProvider.new
+ provider.output_streams.should == []
+ end
+ end
+
+ context 'given an output stream' do
+ it 'the output stream list contains the stream' do
+ provider = CrunchPipe::DataProvider.new(stream)
+ provider.output_streams.should include(stream)
+ end
+ end
+ end
+
+ describe '#|' do
+ it 'adds stream to output stream list' do
+ provider = CrunchPipe::DataProvider.new
+ provider | stream
+ provider.output_streams.should include(stream)
+ end
+ end
+
+ describe '#provide' do
+ it 'sends data to registered streams' do
+ streams = (0..5).map { stub(CrunchPipe::Stream, :add => true) }
+ provider = CrunchPipe::DataProvider.new
+
+ fake_data = [1,2,3,4,5]
+
+ streams.each do |stream|
+ provider | stream
+ stream.should_receive(:add).with(fake_data)
+ end
+
+ provider.provide fake_data
+ end
+ end
+ end
data/spec/pipeline_spec.rb ADDED
@@ -0,0 +1,163 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::Pipeline do
+ before(:each) do
+ @pipeline_name = 'panda'
+ @pipeline_parallel = false
+ @pipeline = CrunchPipe::Pipeline.new(:parallel => @pipeline_parallel)
+ end
+
+ context 'initialization' do
+ it "sets parallel flag" do
+ @pipeline.parallel.should == @pipeline_parallel
+ end
+
+ it 'sets an empty processor array' do
+ @pipeline.processors.should == []
+ end
+ end
+
+ describe "#bind" do
+ context 'given a proc' do
+ it 'adds proc to pipeline' do
+ processor = Proc.new {|a|}
+ expect {
+ @pipeline.bind processor
+ }.to change(@pipeline.processors, :count).by(1)
+
+ @pipeline.processors.should include(processor)
+ end
+ end
+
+ context 'given a block' do
+ it 'adds block to pipeline' do
+ expect {
+ @pipeline.bind do |i|
+ end
+ }.to change(@pipeline.processors, :count).by(1)
+ end
+ end
+ end
+
+ describe '#<' do
+ let(:fake_source) do
+ stub(:add_observer => nil,
+ :delete_observer => nil)
+ end
+
+ it 'subscribes to source' do
+ fake_source.should_receive(:add_observer).with(@pipeline)
+ @pipeline < fake_source
+ end
+ end
+
+ describe '#|' do
+ it 'sets stream as default sink' do
+ fake_sink = 'Fake Stream'
+ @pipeline | fake_sink
+ @pipeline.default_sink.should == fake_sink
+ end
+ end
+
+ describe '#process' do
+ context 'given a single processor' do
+ it 'runs element through the processors' do
+ @pipeline.bind do |elem|
+ elem + 1
+ end
+
+ @pipeline.process(1).should == 2
+ end
+ end
+
+ context 'given multiple processors' do
+ it 'runs element through all processors' do
+ n = 5
+
+ n.times do
+ @pipeline.bind lambda {|elem|
+ elem + 1
+ }
+ end
+
+ @pipeline.processors.count.should == n
+
+ @pipeline.process(1).should == n+1
+ end
+ end
+ end
+
+ describe '#update' do
+ let(:data) { [1,1,1,1] }
+ let(:output) { stub(CrunchPipe::Stream, :add_observer => true, :add => true) }
+
+ before(:each) do
+ @pipeline.bind lambda {|elem|
+ elem + 1
+ }
+
+ @pipeline | output
+
+ @pipeline.parallel = false
+ end
+
+ context 'given a non-parallel pipeline' do
+ it 'processes all elements' do
+ @pipeline.should_receive(:process).with(1).exactly(data.length).times.and_return(2)
+ @pipeline.update(output, data)
+ end
+
+ it 'adds results to output stream' do
+ output.should_receive(:add).with(data.map {|i| i + 1 })
+ @pipeline.update(output, data)
+ end
+ end
+
+ context 'given a parallel pipeline' do
+ before(:each) do
+ @pipeline.parallel = true
+ end
+
+ it 'processes all elements in parallel' do
+ Parallel.should_receive(:map)
+ @pipeline.update(output, data)
+ end
+ end
+ end
+
+ describe '.check_arity' do
+ context 'given a non-proc' do
+ it 'throws' do
+ lambda {
+ CrunchPipe::Pipeline.check_arity('Panda')
+ }.should raise_error
+ end
+ end
+
+ context 'given a Proc' do
+ context 'with an arity of 0' do
+ it 'throws' do
+ lambda {
+ CrunchPipe::Pipeline.check_arity(Proc.new {})
+ }.should raise_error
+ end
+ end
+
+ context 'with an arity of 1' do
+ it 'does not throw' do
+ lambda {
+ CrunchPipe::Pipeline.check_arity(Proc.new {|a|})
+ }.should_not raise_error
+ end
+ end
+
+ context 'with an arity greater than 1' do
+ it 'throws' do
+ lambda {
+ CrunchPipe::Pipeline.check_arity(Proc.new {|a,b|})
+ }.should raise_error
+ end
+ end
+ end
+ end
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,5 @@
+ require 'simplecov'
+ SimpleCov.start
+
+ require_relative File.join(File.dirname(__FILE__), '..', 'lib', 'crunchpipe')
+ require 'parallel'
data/spec/stream_spec.rb ADDED
@@ -0,0 +1,49 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::Stream do
+ before(:each) do
+ @stream = CrunchPipe::Stream.new
+ end
+
+ it 'is observable' do
+ @stream.class.should include(Observable)
+ end
+
+ describe '#add' do
+ let(:pipeline) { stub(CrunchPipe::Pipeline, :update => true)}
+ let(:data) { [1,1,1,1] }
+ let(:fake_end_point) { fake_end_point = stub(CrunchPipe::DataEndPoint, :receive => true) }
+
+ it 'notifies observers' do
+ @stream.add_observer(pipeline)
+
+ pipeline.should_receive(:update).exactly(1).times
+
+ @stream.add data
+ end
+
+ context 'given a default_end_point' do
+ it 'calls receive on the end point with the elements' do
+ @stream > fake_end_point
+ fake_end_point.should_receive(:receive).with(data)
+ @stream.add data
+ end
+ end
+
+ context 'given no default_end_point' do
+ it 'does not pass elements to end point' do
+ fake_end_point.should_not_receive(:receive)
+ @stream.add data
+ end
+ end
+ end
+
+ describe '#>' do
+ it 'sets the default end point' do
+ fake_end_point = "Fake End Point"
+ @stream > fake_end_point
+ @stream.default_end_point.should == fake_end_point
+ end
+ end
+ end
+
metadata ADDED
@@ -0,0 +1,70 @@
+ --- !ruby/object:Gem::Specification
+ name: crunchpipe
+ version: !ruby/object:Gem::Version
+ version: 0.0.1beta1
+ prerelease: 5
+ platform: ruby
+ authors:
+ - yonkeltron
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2011-10-16 00:00:00.000000000Z
+ dependencies: []
+ description: Using the data-pipeline pattern loosely-based on dataflow programming,
+ CrunchPipe helps you to write modular, cohesive, loosely-coupled programs for computation
+ with optional features for parallelization.
+ email:
+ - yonkeltron@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - .rvmrc
+ - Gemfile
+ - Gemfile.lock
+ - README.md
+ - Rakefile
+ - configure
+ - crunchpipe.gemspec
+ - examples/complete_hello_world.rb
+ - lib/crunchpipe.rb
+ - lib/crunchpipe/data_end_point.rb
+ - lib/crunchpipe/data_provider.rb
+ - lib/crunchpipe/pipeline.rb
+ - lib/crunchpipe/stream.rb
+ - lib/crunchpipe/version.rb
+ - spec/data_end_point_spec.rb
+ - spec/data_provider_spec.rb
+ - spec/pipeline_spec.rb
+ - spec/spec_helper.rb
+ - spec/stream_spec.rb
+ homepage: https://github.com/yonkeltron/CrunchPipe
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ segments:
+ - 0
+ hash: -269882111
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>'
+ - !ruby/object:Gem::Version
+ version: 1.3.1
+ requirements: []
+ rubyforge_project: crunchpipe
+ rubygems_version: 1.8.11
+ signing_key:
+ specification_version: 3
+ summary: A library for modular, pipeline-based computation
+ test_files: []