crunchpipe 0.0.1beta1

data/.gitignore ADDED
@@ -0,0 +1,19 @@
+ ## MAC OS
+ .DS_Store
+
+ ## TEXTMATE
+ *.tmproj
+ tmtags
+
+ ## EMACS
+ *~
+ \#*
+ .\#*
+
+ ## VIM
+ *.swp
+
+ ## PROJECT
+ .bundle
+ coverage*
+ pkg*
data/.rvmrc ADDED
@@ -0,0 +1 @@
+ rvm use 1.9.2@CrunchPipe --create
data/Gemfile ADDED
@@ -0,0 +1,16 @@
+ # A sample Gemfile
+ source "http://rubygems.org"
+
+ gem 'rake'
+ gem 'parallel'
+
+ group :development do
+   gem 'ruby-debug19'
+ end
+
+ group :test do
+   gem 'rspec'
+   gem 'simplecov'
+ end
+
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,49 @@
+ PATH
+   remote: .
+   specs:
+     crunchpipe (0.0.1beta1)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     archive-tar-minitar (0.5.2)
+     columnize (0.3.4)
+     diff-lcs (1.1.3)
+     linecache19 (0.5.12)
+       ruby_core_source (>= 0.1.4)
+     multi_json (1.0.3)
+     parallel (0.5.9)
+     rake (0.9.2)
+     rspec (2.6.0)
+       rspec-core (~> 2.6.0)
+       rspec-expectations (~> 2.6.0)
+       rspec-mocks (~> 2.6.0)
+     rspec-core (2.6.4)
+     rspec-expectations (2.6.0)
+       diff-lcs (~> 1.1.2)
+     rspec-mocks (2.6.0)
+     ruby-debug-base19 (0.11.25)
+       columnize (>= 0.3.1)
+       linecache19 (>= 0.5.11)
+       ruby_core_source (>= 0.1.4)
+     ruby-debug19 (0.11.6)
+       columnize (>= 0.3.1)
+       linecache19 (>= 0.5.11)
+       ruby-debug-base19 (>= 0.11.19)
+     ruby_core_source (0.1.5)
+       archive-tar-minitar (>= 0.5.2)
+     simplecov (0.5.3)
+       multi_json (~> 1.0.3)
+       simplecov-html (~> 0.5.3)
+     simplecov-html (0.5.3)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   crunchpipe!
+   parallel
+   rake
+   rspec
+   ruby-debug19
+   simplecov
data/README.md ADDED
@@ -0,0 +1,74 @@
+ CrunchPipe
+ ==========
+
+ CrunchPipe is a library for creating and coordinating modular
+ computation pipelines. Computation can take place in parallel, and data
+ sources are kept separate from the computation itself, leading to
+ modular and maintainable programs.
+
+ The Basics
+ ----------
+
+ CrunchPipe uses computation pipelines connected to streams to
+ model the processing of data.
+
+
+
+     /--------------\
+     | Input Stream |
+     \--------------/
+
+           ||
+           \/
+
+       /----------\
+       | Pipeline |
+       |----------|
+       |   Op 1   |
+       |----------|
+       |   Op 2   |
+       |----------|
+       |   Op 3   |
+       \----------/
+
+           ||
+           \/
+
+     /---------------\
+     | Output Stream |
+     \---------------/
+
+ Streams
+ ----------
+
+ Streams are the sources and sinks of data. You create a stream and add
+ elements to it. All pipelines connected to a stream are alerted
+ when data is added to it. Pipelines also write their finished
+ results to a stream, which can, optionally, have other pipelines
+ connected to it. Since streams are also data sinks, streams can be
+ provided with the means to save the results of computation in an
+ abstract and general way.
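The notification mechanism is Ruby's standard `Observable` module, which `CrunchPipe::Stream` mixes in. A minimal self-contained sketch of the idea — `MiniStream` and `Collector` are illustrative names, not CrunchPipe's API:

```ruby
require 'observer'

# A stream is just an Observable: adding elements marks it changed
# and pushes the new elements to every registered observer.
class MiniStream
  include Observable

  def add(elements)
    changed
    notify_observers(self, elements) # every connected pipeline is alerted
  end
end

# A stand-in "pipeline" that simply records what it is told about.
class Collector
  attr_reader :seen

  def initialize
    @seen = []
  end

  # Observable calls #update on each observer.
  def update(_stream, elements)
    @seen.concat(elements)
  end
end

stream = MiniStream.new
collector = Collector.new
stream.add_observer(collector)

stream.add([1, 2, 3])
collector.seen # => [1, 2, 3]
```

Because both sides only speak through `add_observer`/`update`, the data source never needs to know what computation is attached downstream.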
+
+ Pipelines
+ ----------
+
+ Pipelines represent computational processes. When a pipeline is
+ created, you can bind an arbitrary number of transformations to it in
+ the form of blocks, creating an "assembly line" of operations to be
+ performed on data. Pipelines are connected to streams and are
+ notified when new data is available. Each new element from the stream
+ is run through the bound operations in the order in which they
+ were bound to the pipeline. The elements obtained from
+ streams can, however, be processed in parallel (with threads or
+ processes), leading to performance improvements. Since the order of
+ operation application is preserved, it is the elements from the stream
+ that are processed in parallel. The parallelism is encapsulated within
+ the pipeline, freeing the developer from the concerns traditionally
+ associated with writing parallel code.
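The "assembly line" can be sketched with plain procs. This is a simplified stand-in for what `Pipeline#bind` and `Pipeline#process` do, with illustrative names, not CrunchPipe's actual classes:

```ruby
# Operations run in bound order on each element; elements are independent
# of one another, which is what makes per-element parallelism safe.
operations = []
operations << ->(x) { x + 1 }  # Op 1
operations << ->(x) { x * 10 } # Op 2

# Run one element through every bound operation, in order.
process = ->(element) { operations.reduce(element) { |acc, op| op.call(acc) } }

# Sequential processing...
[1, 2, 3].map { |e| process.call(e) } # => [20, 30, 40]

# ...or per-element parallelism with threads; result order is preserved
# because each thread handles exactly one element.
threads = [1, 2, 3].map { |e| Thread.new { process.call(e) } }
threads.map(&:value) # => [20, 30, 40]
```

Because each element runs through the whole chain independently, swapping the sequential `map` for per-element threads (or, as this gem does, the `parallel` gem) does not change the results.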
+
+
+ ToDo
+ ----------
+
+ * Get specs passing, dammit
+ * Improved DSL
data/Rakefile ADDED
@@ -0,0 +1,14 @@
+
+ require "rubygems"
+ require "bundler/setup"
+
+ require 'rake'
+ require 'rspec/core/rake_task'
+ require 'bundler/gem_tasks'
+
+ task :default => :spec
+
+ desc "Run all examples"
+ RSpec::Core::RakeTask.new(:spec) do |t|
+   t.rspec_opts = '--format documentation --color'
+ end
data/configure ADDED
@@ -0,0 +1,9 @@
+ #!/usr/bin/env bash
+ # Configure build environment.
+ #
+ # @author Benjamin Oakes <hello@benjaminoakes.com>
+
+ echo "[$0] starting"
+ gem install bundler --version '~> 1.0.21' --no-rdoc --no-ri
+ bundle install
+ echo "[$0] finished"
data/crunchpipe.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "crunchpipe/version"
+
+ Gem::Specification.new do |s|
+   s.name        = "crunchpipe"
+   s.version     = Crunchpipe::VERSION
+   s.authors     = ["yonkeltron"]
+   s.email       = ["yonkeltron@gmail.com"]
+   s.homepage    = "https://github.com/yonkeltron/CrunchPipe"
+   s.summary     = %q{A library for modular, pipeline-based computation}
+   s.description = %q{Using the data-pipeline pattern loosely-based on dataflow programming, CrunchPipe helps you to write modular, cohesive, loosely-coupled programs for computation with optional features for parallelization.}
+   s.has_rdoc    = false
+
+   s.rubyforge_project = "crunchpipe"
+
+   s.files         = `git ls-files`.split("\n")
+   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+ end
data/examples/complete_hello_world.rb ADDED
@@ -0,0 +1,25 @@
+ require_relative '../lib/crunchpipe'
+
+ provider = CrunchPipe::DataProvider.new
+
+ input_stream = CrunchPipe::Stream.new
+
+ pipeline = CrunchPipe::Pipeline.new(:parallel => true)
+
+ pipeline.bind do |element|
+   puts "--- Processing #{element}..."
+   element + 1
+ end
+
+ output_stream = CrunchPipe::Stream.new
+
+ end_point = CrunchPipe::DataEndPoint.new do |data|
+   puts "+++ End point got #{data}"
+ end
+
+ provider | input_stream
+ pipeline < input_stream # Pipeline defines #< (not #<<) for subscribing to a stream
+ pipeline | output_stream
+ output_stream > end_point
+
+ provider.provide([1,2,3,4,5])
data/lib/crunchpipe.rb ADDED
@@ -0,0 +1,14 @@
+ require_relative './crunchpipe/pipeline'
+ require_relative './crunchpipe/stream'
+ require_relative './crunchpipe/data_end_point'
+ require_relative './crunchpipe/data_provider'
+
+ module CrunchPipe
+   # Inherit from StandardError (not Exception) so these errors are
+   # caught by a bare `rescue`.
+   class InvalidProcessorError < StandardError
+   end
+
+   class MissingActionError < StandardError
+   end
+ end
data/lib/crunchpipe/data_end_point.rb ADDED
@@ -0,0 +1,17 @@
+ module CrunchPipe
+   class DataEndPoint
+
+     def initialize(&block)
+       if block
+         @default_action_block = block
+       else
+         fail MissingActionError
+       end
+     end
+
+     def receive(data)
+       @default_action_block.call data
+     end
+
+   end
+ end
data/lib/crunchpipe/data_provider.rb ADDED
@@ -0,0 +1,22 @@
+ module CrunchPipe
+   class DataProvider
+
+     attr_reader :output_streams
+
+     def initialize(stream = nil)
+       if stream
+         @output_streams = [stream]
+       else
+         @output_streams = []
+       end
+     end
+
+     def |(stream)
+       @output_streams.push stream
+     end
+
+     def provide(data)
+       @output_streams.each {|stream| stream.add data }
+     end
+   end
+ end
data/lib/crunchpipe/pipeline.rb ADDED
@@ -0,0 +1,63 @@
+ require 'parallel'
+
+ module CrunchPipe
+   class Pipeline
+     attr_accessor :parallel
+     attr_reader :processors, :default_sink
+
+     def initialize(args)
+       @parallel = args[:parallel]
+       @processors = []
+       @default_sink = nil
+     end
+
+     def bind(processor = nil, &block)
+       if block_given?
+         Pipeline.check_arity(block)
+         @processors.push block
+       elsif processor
+         Pipeline.check_arity(processor)
+         @processors.push processor
+       end
+     end
+
+     def update(stream, elements)
+       if @parallel
+         results = Parallel.map(elements) {|element| process element }
+       else
+         results = elements.map {|element| process element }
+       end
+
+       @default_sink.add results if @default_sink
+
+       results
+     end
+
+     def process(element)
+       result = element
+       @processors.each do |processor|
+         result = processor.call(result)
+       end
+
+       result
+     end
+
+     def <(stream)
+       stream.add_observer(self)
+     end
+
+     def |(stream)
+       @default_sink = stream
+     end
+
+     def self.check_arity(processor)
+       unless processor.is_a?(Proc)
+         fail CrunchPipe::InvalidProcessorError, "Processor must be a Proc but was a #{processor.class}"
+       end
+
+       unless processor.arity == 1
+         fail CrunchPipe::InvalidProcessorError, "Processor must take 1 argument but instead takes #{processor.arity}"
+       end
+     end
+   end
+ end
data/lib/crunchpipe/stream.rb ADDED
@@ -0,0 +1,21 @@
+ require 'thread'
+ require 'observer'
+
+ module CrunchPipe
+   class Stream
+     include Observable
+
+     attr_reader :default_end_point
+
+     def add(elements = [])
+       changed
+       notify_observers(self, elements)
+       @default_end_point.receive elements if @default_end_point
+     end
+
+     def >(end_point)
+       @default_end_point = end_point
+     end
+
+   end
+ end
data/lib/crunchpipe/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Crunchpipe
+   VERSION = "0.0.1beta1"
+ end
data/spec/data_end_point_spec.rb ADDED
@@ -0,0 +1,39 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::DataEndPoint do
+
+   describe '.new' do
+     context 'given a block' do
+       it 'does not throw' do
+         lambda {
+           CrunchPipe::DataEndPoint.new {}
+         }.should_not raise_error
+       end
+     end
+
+     context 'given no block' do
+       it 'throws' do
+         lambda {
+           CrunchPipe::DataEndPoint.new
+         }.should raise_error(CrunchPipe::MissingActionError)
+       end
+     end
+   end
+
+   describe '#receive' do
+     it 'yields data to the block' do
+       results = []
+
+       data_end_point = CrunchPipe::DataEndPoint.new do |data|
+         results.push data
+       end
+
+       fake_data = [1,2,3]
+
+       data_end_point.receive(fake_data)
+
+       results.should include(fake_data)
+     end
+   end
+
+ end
data/spec/data_provider_spec.rb ADDED
@@ -0,0 +1,44 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::DataProvider do
+   let(:stream) { stub(CrunchPipe::Stream) }
+   describe '.new' do
+     context 'given no output stream' do
+       it 'sets an empty array for output streams' do
+         provider = CrunchPipe::DataProvider.new
+         provider.output_streams.should == []
+       end
+     end
+
+     context 'given an output stream' do
+       it 'includes the stream in the output stream list' do
+         provider = CrunchPipe::DataProvider.new(stream)
+         provider.output_streams.should include(stream)
+       end
+     end
+   end
+
+   describe '#|' do
+     it 'adds stream to output stream list' do
+       provider = CrunchPipe::DataProvider.new
+       provider | stream
+       provider.output_streams.should include(stream)
+     end
+   end
+
+   describe '#provide' do
+     it 'sends data to registered streams' do
+       streams = (0..5).map { stub(CrunchPipe::Stream, :add => true) }
+       provider = CrunchPipe::DataProvider.new
+
+       fake_data = [1,2,3,4,5]
+
+       streams.each do |stream|
+         provider | stream
+         stream.should_receive(:add).with(fake_data)
+       end
+
+       provider.provide fake_data
+     end
+   end
+ end
data/spec/pipeline_spec.rb ADDED
@@ -0,0 +1,163 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::Pipeline do
+   before(:each) do
+     @pipeline_name = 'panda'
+     @pipeline_parallel = false
+     @pipeline = CrunchPipe::Pipeline.new(:parallel => @pipeline_parallel)
+   end
+
+   context 'initialization' do
+     it "sets parallel flag" do
+       @pipeline.parallel.should == @pipeline_parallel
+     end
+
+     it 'sets an empty processor array' do
+       @pipeline.processors.should == []
+     end
+   end
+
+   describe "#bind" do
+     context 'given a proc' do
+       it 'adds proc to pipeline' do
+         processor = Proc.new {|a|}
+         expect {
+           @pipeline.bind processor
+         }.to change(@pipeline.processors, :count).by(1)
+
+         @pipeline.processors.should include(processor)
+       end
+     end
+
+     context 'given a block' do
+       it 'adds block to pipeline' do
+         expect {
+           @pipeline.bind do |i|
+           end
+         }.to change(@pipeline.processors, :count).by(1)
+       end
+     end
+   end
+
+   describe '#<' do
+     let(:fake_source) do
+       stub(:add_observer => nil,
+            :delete_observer => nil)
+     end
+
+     it 'subscribes to source' do
+       fake_source.should_receive(:add_observer).with(@pipeline)
+       @pipeline < fake_source
+     end
+   end
+
+   describe '#|' do
+     it 'sets stream as default sink' do
+       fake_sink = 'Fake Stream'
+       @pipeline | fake_sink
+       @pipeline.default_sink.should == fake_sink
+     end
+   end
+
+   describe '#process' do
+     context 'given a single processor' do
+       it 'runs element through the processors' do
+         @pipeline.bind do |elem|
+           elem + 1
+         end
+
+         @pipeline.process(1).should == 2
+       end
+     end
+
+     context 'given multiple processors' do
+       it 'runs element through all processors' do
+         n = 5
+
+         n.times do
+           @pipeline.bind lambda {|elem|
+             elem + 1
+           }
+         end
+
+         @pipeline.processors.count.should == n
+
+         @pipeline.process(1).should == n+1
+       end
+     end
+   end
+
+   describe '#update' do
+     let(:data) { [1,1,1,1] }
+     let(:output) { stub(CrunchPipe::Stream, :add_observer => true, :add => true) }
+
+     before(:each) do
+       @pipeline.bind lambda {|elem|
+         elem + 1
+       }
+
+       @pipeline | output
+
+       @pipeline.parallel = false
+     end
+
+     context 'given a non-parallel pipeline' do
+       it 'processes all elements' do
+         @pipeline.should_receive(:process).with(1).exactly(data.length).times.and_return(2)
+         @pipeline.update(output, data)
+       end
+
+       it 'adds results to output stream' do
+         output.should_receive(:add).with(data.map {|i| i + 1 })
+         @pipeline.update(output, data)
+       end
+     end
+
+     context 'given a parallel pipeline' do
+       before(:each) do
+         @pipeline.parallel = true
+       end
+
+       it 'processes all elements in parallel' do
+         Parallel.should_receive(:map)
+         @pipeline.update(output, data)
+       end
+     end
+   end
+
+   describe '.check_arity' do
+     context 'given a non-proc' do
+       it 'throws' do
+         lambda {
+           CrunchPipe::Pipeline.check_arity('Panda')
+         }.should raise_error
+       end
+     end
+
+     context 'given a Proc' do
+       context 'with an arity of 0' do
+         it 'throws' do
+           lambda {
+             CrunchPipe::Pipeline.check_arity(Proc.new {})
+           }.should raise_error
+         end
+       end
+
+       context 'with an arity of 1' do
+         it 'does not throw' do
+           lambda {
+             CrunchPipe::Pipeline.check_arity(Proc.new {|a|})
+           }.should_not raise_error
+         end
+       end
+
+       context 'with an arity greater than 1' do
+         it 'throws' do
+           lambda {
+             CrunchPipe::Pipeline.check_arity(Proc.new {|a,b|})
+           }.should raise_error
+         end
+       end
+     end
+   end
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,5 @@
+ require 'simplecov'
+ SimpleCov.start
+
+ require_relative '../lib/crunchpipe'
+ require 'parallel'
data/spec/stream_spec.rb ADDED
@@ -0,0 +1,49 @@
+ require 'spec_helper'
+
+ describe CrunchPipe::Stream do
+   before(:each) do
+     @stream = CrunchPipe::Stream.new
+   end
+
+   it 'is observable' do
+     @stream.class.should include(Observable)
+   end
+
+   describe '#add' do
+     let(:pipeline) { stub(CrunchPipe::Pipeline, :update => true) }
+     let(:data) { [1,1,1,1] }
+     let(:fake_end_point) { stub(CrunchPipe::DataEndPoint, :receive => true) }
+
+     it 'notifies observers' do
+       @stream.add_observer(pipeline)
+
+       pipeline.should_receive(:update).exactly(1).times
+
+       @stream.add data
+     end
+
+     context 'given a default_end_point' do
+       it 'calls receive on the end point with the elements' do
+         @stream > fake_end_point
+         fake_end_point.should_receive(:receive).with(data)
+         @stream.add data
+       end
+     end
+
+     context 'given no default_end_point' do
+       it 'does not pass elements to end point' do
+         fake_end_point.should_not_receive(:receive)
+         @stream.add data
+       end
+     end
+   end
+
+   describe '#>' do
+     it 'sets the default end point' do
+       fake_end_point = "Fake End Point"
+       @stream > fake_end_point
+       @stream.default_end_point.should == fake_end_point
+     end
+   end
+ end
+
metadata ADDED
@@ -0,0 +1,70 @@
+ --- !ruby/object:Gem::Specification
+ name: crunchpipe
+ version: !ruby/object:Gem::Version
+   version: 0.0.1beta1
+   prerelease: 5
+ platform: ruby
+ authors:
+ - yonkeltron
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2011-10-16 00:00:00.000000000Z
+ dependencies: []
+ description: Using the data-pipeline pattern loosely-based on dataflow programming,
+   CrunchPipe helps you to write modular, cohesive, loosely-coupled programs for computation
+   with optional features for parallelization.
+ email:
+ - yonkeltron@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - .rvmrc
+ - Gemfile
+ - Gemfile.lock
+ - README.md
+ - Rakefile
+ - configure
+ - crunchpipe.gemspec
+ - examples/complete_hello_world.rb
+ - lib/crunchpipe.rb
+ - lib/crunchpipe/data_end_point.rb
+ - lib/crunchpipe/data_provider.rb
+ - lib/crunchpipe/pipeline.rb
+ - lib/crunchpipe/stream.rb
+ - lib/crunchpipe/version.rb
+ - spec/data_end_point_spec.rb
+ - spec/data_provider_spec.rb
+ - spec/pipeline_spec.rb
+ - spec/spec_helper.rb
+ - spec/stream_spec.rb
+ homepage: https://github.com/yonkeltron/CrunchPipe
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: -269882111
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>'
+     - !ruby/object:Gem::Version
+       version: 1.3.1
+ requirements: []
+ rubyforge_project: crunchpipe
+ rubygems_version: 1.8.11
+ signing_key:
+ specification_version: 3
+ summary: A library for modular, pipeline-based computation
+ test_files: []