andromeda 0.1 → 0.1.2

data/.gitignore CHANGED
@@ -1,6 +1,7 @@
  .idea
  .bundle
  .yardoc
+ .rbx
  doc
  db
  pkg
data/.rvmrc CHANGED
@@ -1,2 +1,3 @@
  rvm --create gemset use andromeda
+ export RBXOPT="$RBXOPT -X19 -v"
  export JRUBY_OPTS="$JRUBY_OPTS -Xcompat.version=1.9"
@@ -0,0 +1,49 @@
+ # CHANGELOG for andromeda
+
+ *Note* Not all versions are released gems; many version numbers exist only in the GitHub repository.
+
+
+ ## 0.1.2 Architecture Refactoring
+
+ * via(:emit), Spot::>>, entry/dest, enter/exit separation
+ * ConnectorBase, post_data clean up
+ * (meth_|attr_)spot queries
+ * Tested with rbx, mri, and jruby
+ * Renaming and reorganization of architecture:
+ Stages are now called plans, chunks are now called data, opts are now called tags, and dests are now called spots. Construction of Pools and state management (i.e. copying) of Plans and Tags has been factored into two new abstractions: Guards (state management, track selection), and Tracks (Executors/Thread pools).
+ * Reorganization of modules (Plan is toplevel + Kit, Sync, Cmd, Atom)
+ * New code: error.rb, copy_clone.rb, sugar.rb
+ * Beginning docs: CHANGELOG, ROADMAP
+ * Cleaned up output in irb considerably
+ * Wrote helper support for testing: Atom::(Var, Region, FillOnce, Combiner)
+ * Renamed Command to Cmd and moved into Cmd:: module
+ * Added guide nick names to Guides.self
+
+
+ ## 0.1.1 Architecture Refactoring
+
+ * Commands have support for comments
+ * Some work left to do for chunking
+ * emit is protected now
+ * exit as default "emitter" for on_enter (allow overloading in subclasses)
+ * Tested with rubinius
+ * Set pool from other stage
+ * Added globally shared single pool
+ * Andromeda.reload! + maruku installed as fallback for yard by default
+ * Added FileChunker and FileReader to helpers
+ * Added signals (keeping track of dests not intended for map/filter by Transf etc)
+ * Renamed Bases to Stages (was talking about stages all the time anyhow)
+ * Polished logging/error catching and helpers
+ * PoolSupport.num_processors caches num_cpus value (don't trust Facter that much to be quick)
+ * Made >> chainable and added chunk_val for easier mapping
+ * Added chunk keys for map reduce like handling
+ * Added GathererBase and a plain Reducer to helpers.rb
+ * Overhauled Join for concurrent synchronization
+ * Avoids cloning in single-threaded scenarios for shared state in gatherers
+ * Modified thread pool creation to happen on init if possible
+ * Added trace_pool for debugging which pools get used by whom
+
+
+ ## 0.1 (Release)
+
+ Initial Version
data/Gemfile CHANGED
@@ -1,22 +1,30 @@
  source "http://rubygems.org"
 
  gem 'json', '>=1.6.5'
- gem 'threadpool'
- gem 'facter'
  gem 'atomic'
+ gem 'facter'
+ gem 'statval'
+ gem 'threadpool'
 
  group :development do
  gem 'rake'
- gem 'redcarpet', :require => false
- gem 'yard', :require => false
- gem 'irbtools', :require => false
- end
-
- group :jruby do
- gem 'maruku'
+ gem 'yard'
+ gem 'irbtools'
  end
 
  group :test do
  gem 'rspec', '2.6.0'
  gem 'simplecov'
  end
+
+ platforms :ruby do
+ gem 'redcarpet'
+ end
+
+ platforms :rbx do
+ gem 'redcarpet'
+ end
+
+ platforms :jruby do
+ gem 'maruku'
+ end
@@ -2,6 +2,7 @@ GEM
  remote: http://rubygems.org/
  specs:
  atomic (1.0.0)
+ atomic (1.0.0-java)
  awesome_print (1.0.2)
  boson (1.1.1)
  clipboard (1.0.1)
@@ -36,6 +37,7 @@ GEM
  wirb (>= 0.4.2)
  zucker (>= 12.1)
  json (1.6.6)
+ json (1.6.6-java)
  maruku (0.6.0)
  syntax (>= 1.0.0)
  method_locator (0.0.4)
@@ -61,6 +63,7 @@ GEM
  simplecov-html (0.5.3)
  sketches (0.1.1)
  spoon (0.0.1)
+ statval (0.1.2)
  syntax (1.0.0)
  threadpool (0.1.0.1)
  unicode-display_width (0.1.1)
@@ -69,6 +72,7 @@ GEM
  zucker (12.1)
 
  PLATFORMS
+ java
  ruby
 
  DEPENDENCIES
@@ -81,5 +85,6 @@ DEPENDENCIES
  redcarpet
  rspec (= 2.6.0)
  simplecov
+ statval
  threadpool
  yard
@@ -17,5 +17,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,28 +1,163 @@
  # andromeda
 
- Andromeda is a ultra light weight multicore stream processing framework based on a small dataflow DSL
+ Andromeda is a lightweight framework for complex event processing on multicore architectures. Andromeda users construct networks of plans that are interconnected via endpoint spots, describe how plans are scheduled onto threads, and process data by feeding data events to the resulting structure.
 
- It is currently untested and undocumented.
+ It currently comes without tests, but the core architecture is stable (i.e. the concepts have been fleshed out).
 
- Below is an example that writes events to a file and reads them back in, to give an idea of what it does:
+ ## Example
+
+ Below is an example that writes events to a file and reads them back such that the JSON gets parsed in parallel, to give an idea of what it does:
 
  require 'andromeda'
- w = Andromeda::CommandoWriter.new path: '/tmp/some_file'
- w << (Commando.new :test)
- w << (Commando.new :test, weight: 40)
- w << (Commando.new :test, height: 20)
+
+ # Enter scope 'Andromeda' in irb
+ cb Andromeda
+
+ # Write Cmd instances to a log file, nothing fancy here
+ w = Cmd::Writer.new path: '/tmp/some_file'
+ w << :open
+ w << (Cmd::Cmd.new_input :test)
+ w << (Cmd::Cmd.new_input :test, weight: 40)
+ w << (Cmd::Cmd.new_input :test, height: 20)
  w << :close
 
- r = Andromeda::CommandParser.new path: '/tmp/some_file'
- # make r process events using a global thread pool of num_cpus threads
- r.pool = :global
- t = Andromeda::Tee.new
- # make r output to t
- r >> t
- # start reading
- r << :start
+ r = Cmd::Reader.new path: '/tmp/some_file'
+ p = Cmd::Parser.new
+ t = Kit::Tee.new
+ s = Sync::ScopeWaiter.new
+ # Connect the processing steps (Plans)
+ s >> r >> p >> t
+ # Make the reader run on a separate single thread
+ r.guide = Guides.single
+ # Set multicore processing behaviour to parse Cmds in parallel
+ p.guide = Guides.shared_pool
+ # Set logger to execute in sending thread (i.e. Parser)
+ t.guide = Guides.local
+ # Start reading and wait till processing finishes
+ s << :start
  # t will log to a Logger.new(STDERR) by default
 
  There is much more, dig the source, luke!
 
+ *Note* All active development happens on the devel branch, cf. boggle/devel, too.
+
+ ## Installation
+
+ gem install andromeda
+
+ ## Requirements
+
+ Any ruby that has working atomic and threadpool gems should do.
+
+ Effectively, that is rubinius, jruby and mri ruby (if the provided threading of mri ruby is enough for your purpose).
+
+ ## Online Docs
+
+ Docs for the latest released gems can be found at:
+
+ http://rubydoc.info/gems/andromeda
+
+ ## Overview
+
+ ### Key Concepts: Spots, Plans, Guides, and Tracks
+
+ Andromeda works by sending data as events over a network of interconnected event handler endpoints (called spots). Each spot is implemented in a container object that is called its plan. A plan can contain multiple spots, either in the form of event handling method spots (on_name methods of the plan) or as attribute spots that point to spots in other plans. Each plan has a default entry spot, a default exit spot, and an optional spot attribute called errors for signaling exceptions. Plans are connected with each other by assigning spot references to a plan's spot attributes.
+
+ Event handling is initiated by sending data to a plan's start spot (a special spot that encapsulates the plan's entry spot). Sending data to a spot is called method spot activation.
+
+ During processing, andromeda distinguishes between two kinds of state: plan state and tag state. Plan state is the state of the concrete plan instance that contains an event handling method spot, prior to its activation. Tag state is state that gets passed along between spots as a side effect of event handling.
+
+ Each plan is associated with a guide. First, guides control if and how plan instances are *copied* (or locked) prior to method spot activation to ensure isolated state access. Second, guides assign each method spot activation to a track that describes how and where (on which thread) it actually gets executed.
+
+ Out of the box, andromeda supports various guides: single thread (per plan or globally shared), thread pool (per plan or globally shared), execution in the current thread, and spawning of a new thread per data event.
+
+ To sum up, plans are factory objects that describe the instantiation of concrete data processing networks, as guided by their associated guide objects and according to the rules of the underlying executing tracks.
+
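The guide/track split described above can be illustrated in plain Ruby. The sketch below is not andromeda's Track code. `LocalTrack`, `SpawnTrack`, `PoolTrack`, `submit` and `shutdown` are hypothetical names, and only the standard library (`Thread`, `Queue`, `Mutex`) is used. It models three of the execution strategies listed above: the current thread, a new thread per data event, and a fixed thread pool. A pool of size 1 stands in for a single dedicated thread.

```ruby
require 'thread'

# Hypothetical track that runs an activation inline, in the sending thread.
class LocalTrack
  def submit(&job)
    job.call
  end

  def shutdown; end
end

# Hypothetical track that spawns a fresh thread per data event.
class SpawnTrack
  def initialize
    @threads = []
    @lock = Mutex.new
  end

  def submit(&job)
    thread = Thread.new(&job)
    @lock.synchronize { @threads << thread }
  end

  # Wait until every spawned thread has finished.
  def shutdown
    @lock.synchronize { @threads.dup }.each(&:join)
  end
end

# Hypothetical track backed by a fixed pool of worker threads; with a
# size of 1 it behaves like a dedicated single-thread track.
class PoolTrack
  STOP = Object.new

  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        loop do
          job = @queue.pop
          break if job.equal?(STOP)
          job.call
        end
      end
    end
  end

  def submit(&job)
    @queue << job
  end

  # Jobs queued before shutdown still run (FIFO), then each worker stops.
  def shutdown
    @workers.size.times { @queue << STOP }
    @workers.each(&:join)
  end
end

# The same ten activations, executed on three different tracks.
results = Queue.new
[LocalTrack.new, SpawnTrack.new, PoolTrack.new(4)].each do |track|
  10.times { |i| track.submit { results << i * i } }
  track.shutdown
end
```

Callers only see `submit`, so a guide-like object could swap one track for another without touching the activation code. That is the kind of decoupling the README attributes to guides.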
+ ### Quick Usage Example
+
+ class MyPlan < Andromeda::Plan
+ attr_spot :a, :b
+ meth_spot :alternative
+
+ def data_key(name, data) ; data end
+
+ def on_enter(key, val)
+ exit << val
+ end
+
+ def on_alternative(key, val)
+ return (a << val) if key == :a
+ return (b << val) if key == :b
+ signal_error ArgumentError.new("Unknown key: #{key}")
+ end
+ end
+
+ p = MyPlan.new
+ p.guide = Andromeda::Guides.shared_pool
+ p >> Andromeda::Kit::Tee.new(nick: :red)
+ p.a = Andromeda::Kit::Tee.new(nick: :green)
+ p.b = Andromeda::Kit::Tee.new(nick: :blue)
+
+ p << :a # logs to :red
+ p << :b # logs to :red
+ p.alternative << :a # logs to :green
+ p.alternative << :b # logs to :blue
+ p.alternative << :c # logs error
+
+
+ ### Event handling details
+
+ Data processing starts when a data object is submitted to a spot. Processing
+ happens mainly in two steps: preprocessing in the sending thread, and actual
+ execution (processing) on the target track.
+
+ #### Preprocessing
+
+ During preprocessing, the data object may be mapped, split into a routing key and an actual data value, used to modify the set of associated tags, and finally filtered out before sending. Furthermore, the key may be used to switch the target spot name and track label. All of these steps are optional. Their purpose is to push preprocessing work into the sending thread and so avoid unnecessary thread context switches.
+
+ Please consult the documentation of class Plan to discover the preprocessing methods that are available for overriding in subclasses.
+
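The preprocessing pipeline can be sketched as follows, assuming each step is an overridable method. Only `data_key(name, data)` appears in the README (in the Quick Usage Example). `map_data`, `data_val`, `selects?`, `post` and the `{ key:, val: }` hash convention are hypothetical. They stand in for whatever the Plan class actually provides, and a `Queue` stands in for the target track.

```ruby
require 'thread'

# Hypothetical plan skeleton: every preprocessing step runs in the thread
# that calls #post; only the final hand-off reaches the "track" (a Queue).
class SketchPlan
  attr_reader :outbox

  def initialize
    @outbox = Queue.new
  end

  # Optional mapping of the raw data object (identity by default).
  def map_data(name, data)
    data
  end

  # Split into a routing key and a value. Hashes with :key/:val are an
  # assumed convention for this sketch only.
  def data_key(name, data)
    data.is_a?(Hash) ? data[:key] : data
  end

  def data_val(name, data)
    data.is_a?(Hash) ? data[:val] : data
  end

  # Filter hook: returning false drops the event before any hand-off.
  def selects?(name, key, val)
    true
  end

  def post(name, data)
    data = map_data(name, data)
    key = data_key(name, data)
    val = data_val(name, data)
    return false unless selects?(name, key, val)
    @outbox << [name, key, val]
    true
  end
end

# Subclass that discards negative values while still in the sender.
class PositiveOnly < SketchPlan
  def selects?(name, key, val)
    val >= 0
  end
end

plan = PositiveOnly.new
kept = plan.post(:enter, { key: :a, val: 3 })
dropped = plan.post(:enter, { key: :b, val: -1 })
```

The filtered event never touches the queue. In a real network, that saves a thread context switch for every dropped event.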
+ #### Execution/Processing
+
+ Prior to execution, the plan's guide selects a track for spot activation, packs the plan (i.e. copies/freezes/locks its plan state as necessary), and optionally modifies the associated incoming tags. Finally, the method spot gets activated by calling the spot's method on the packed plan inside the chosen track with the accumulated tags (plan tags and incoming tags).
+
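Packing a plan can mean copying its state or locking it. Below is a minimal sketch of both strategies, assuming a plan whose only state is an integer counter. `CounterPlan`, `CopyingPacker` and `LockingPacker` are hypothetical, and the sketch does not reproduce andromeda's copy_clone.rb or guide code.

```ruby
require 'thread'

# Hypothetical plan with mutable plan state.
class CounterPlan
  attr_reader :count

  def initialize
    @count = 0
  end

  def on_enter(val)
    @count += val
  end
end

# Strategy 1: activate a private copy; the shared original is never mutated.
class CopyingPacker
  def activate(plan, val)
    plan.dup.on_enter(val)
  end
end

# Strategy 2: activate the shared plan, but serialize activations on a lock.
class LockingPacker
  def initialize
    @lock = Mutex.new
  end

  def activate(plan, val)
    @lock.synchronize { plan.on_enter(val) }
  end
end

copied = CounterPlan.new
copier = CopyingPacker.new
copy_results = Array.new(8) { Thread.new { copier.activate(copied, 1) } }.map(&:value)

locked = CounterPlan.new
locker = LockingPacker.new
Array.new(8) { Thread.new { locker.activate(locked, 1) } }.each(&:join)
```

With copying, every activation starts from the same snapshot and the original stays at 0. With locking, all eight activations accumulate on one instance. `dup` is a shallow copy, so nested mutable state (arrays, hashes) would still be shared between copies and would need a deeper copy or freezing.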
+ #### Tags
+
+ Each method activation is associated with a set of tags (a hash) that contains optional parameters. The tags may be modified by the spot method
+ and are passed on whenever a spot method activates another.
+
+ Andromeda provides a small set of reserved default tags that should not be overwritten:
+
+ * `tags[:name]`: the final target spot name
+ * `tags[:scope]`: an Atom::Region instance that is used to wait for completion of processing (cf. below)
+ * `tags[:label]`: the label passed on to the guide to select the track for execution (usually identical to name)
+ * `tags[:mark]`: used for xor-mark based tracking of event flow
+
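The README only states that `tags[:mark]` is used for xor-mark based tracking. The general technique is also what Apache Storm's acker uses. Every event gets a random 64-bit id, and a ledger xors that id in twice: once when the event is emitted and once when it has been processed. Since `x ^ x == 0`, the ledger returns to zero once every emitted event has been processed, apart from a negligible chance of accidental collisions. `XorLedger` is a hypothetical sketch of that idea, not andromeda's implementation.

```ruby
require 'securerandom'
require 'thread'

# Hypothetical ledger tracking one tree of events by xor-ing 64-bit ids.
class XorLedger
  def initialize
    @mark = 0
    @lock = Mutex.new
  end

  # Random non-zero 64-bit id for a newly emitted event.
  def self.new_id
    SecureRandom.random_number(2**64 - 1) + 1
  end

  # Called once when an event is emitted and once when it is processed.
  def toggle(id)
    @lock.synchronize { @mark ^= id }
  end

  def complete?
    @lock.synchronize { @mark.zero? }
  end
end

ledger = XorLedger.new
root = XorLedger.new_id
ledger.toggle(root)                        # root event emitted

# While processing the root, two child events are emitted. Registering
# them before acknowledging the root keeps the mark from reaching 0 early.
children = Array.new(2) { XorLedger.new_id }
children.each { |id| ledger.toggle(id) }
ledger.toggle(root)                        # root processed
done_after_root = ledger.complete?         # children still pending here

children.each { |id| ledger.toggle(id) }   # children processed
```

The ledger only needs constant space per event tree, however many events flow through it. This is the main appeal of the xor approach over keeping a set of pending ids.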
+ #### Waiting for event handling completion
+
+ Waiting for event handling completion can be achieved by using a special wrapper plan (cf. Sync::ScopeWaiter). This is implemented using an atomic counter (cf. Atom::Region).
+
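The code of Sync::ScopeWaiter and Atom::Region is not part of this diff. The underlying idea can still be sketched with the standard library: a counter goes up when an activation enters a scope and down when it leaves, and a waiter blocks until it drops to zero. `CountingRegion` and its methods are hypothetical. The sketch uses a `Mutex` and a `ConditionVariable` rather than the atomic gem, because a blocking wait needs some form of notification.

```ruby
require 'thread'

# Hypothetical region counting in-flight activations of one scope.
class CountingRegion
  def initialize
    @count = 0
    @lock = Mutex.new
    @empty = ConditionVariable.new
  end

  def enter
    @lock.synchronize { @count += 1 }
  end

  def leave
    @lock.synchronize do
      @count -= 1
      @empty.broadcast if @count.zero?
    end
  end

  # Block the caller until every activation that entered has left.
  def wait_until_empty
    @lock.synchronize do
      @empty.wait(@lock) until @count.zero?
    end
  end
end

region = CountingRegion.new
finished = Queue.new

# Enter in the sending thread *before* the hand-off and leave when done;
# a follow-up send would have to enter before its parent leaves.
5.times do |i|
  region.enter
  Thread.new do
    begin
      sleep(0.01 * i)
      finished << i
    ensure
      region.leave
    end
  end
end

region.wait_until_empty
```

Entering before the hand-off matters. If a worker thread incremented the counter itself, the waiter could observe zero before the work had even been scheduled.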
+ #### Performance
+
+ Andromeda's event handling mechanism is powerful but carries some performance overhead due to its state management. It was written for larger events (e.g. array slices) that user plans iterate over, and is not intended for processing massive numbers of small events. YMMV.
+
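To see why larger events help, consider a sketch where the per-event overhead is just a queue hand-off to a worker thread. `process_in_worker` is a hypothetical helper, not part of andromeda. The same 10,000 numbers are sent either one per event or in slices of 1,000 (an arbitrary size), and the worker counts the hand-offs.

```ruby
require 'thread'

# Send every element of `events` to one worker thread through a Queue and
# return [sum of all values, number of hand-offs]. Each queue entry is one
# "event"; the push/pop hand-off is the per-event cost being counted.
def process_in_worker(events)
  queue = Queue.new
  worker = Thread.new do
    total = 0
    handoffs = 0
    while (event = queue.pop) != :done
      handoffs += 1
      # A slice is iterated locally by the receiver; a single number is
      # wrapped so both cases share one code path.
      Array(event).each { |x| total += x }
    end
    [total, handoffs]
  end
  events.each { |e| queue << e }
  queue << :done
  worker.value
end

data = (1..10_000).to_a

# One event per element: 10_000 hand-offs.
small = process_in_worker(data)

# One event per slice of 1_000 elements: 10 hand-offs.
large = process_in_worker(data.each_slice(1_000).to_a)
```

Both runs compute the same result, but the sliced run pays the hand-off cost 1,000 times less often. In andromeda, each hand-off would additionally include guide and track selection, plan packing and tag handling.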
+ #### Correctness
+
+ Andromeda provides guides to ensure that state is only accessed by a single thread, or that it is locked appropriately otherwise. However, this only works if you assign correct guides to your plans. Please read and understand the documentation of the various available guides to make sure that no unintended concurrent access of plans takes place.
+
+ In addition, look at the provided plan implementations for example code.
+
+ ## Remarks
+
+ ### Inspiration
+
+ Andromeda takes inspiration from several existing approaches / techniques in the area of concurrent programming.
+
+ * actor model: state encapsulation
+ * event processing: preprocessing in sender thread, large events
+ * libdispatch: abstracting over used queues / thread pool
+ * join calculus: Sync::Sync
+
+ ### Status
 
+ Alpha at best.
@@ -0,0 +1,73 @@
+ # Roadmap
+
+ This document contains planning steps and ideas for the future of andromeda.
+
+ ## Short-Term Todos
+
+ * Test with macruby, figure out if rubinius pre-compilation should be added
+ * Convert old Pool code into Guards
+ * Convert Kit into Plans
+ * Convert Command into Plans, moving it into a submodule
+ * Test scope, tags, threading in IRB
+ * Subscopes
+
+ ### Write Test-Cases
+
+ This needs to be done as soon as the general API has matured enough, i.e.
+ around when andromeda is re-used by neoscout.
+
+ ### Write Docs
+
+ * Get started on stable calls in the API first
+ * Complete as time goes by
+ * Add high-level description to README.md
+ * Add examples to README.md
+ * Add link to yardocs to README.md as soon as that makes sense
+ * Figure out yardoc methods for documenting meth_spot and attr_spot
+
+ ### Write a better DSL for connecting plans
+
+ * connect
+ * Arrow Calculus via Kit comes to mind
+ * More operators like '>>': Add multiple via splitter, join results etc.
+ * This needs more practical experience with the framework first.
+
+ ### Implement map_reduce.rb
+
+ ### Implement ActorGuide
+
+ ### Implement csv.rb
+
+ * map statval over everything that looks like a number
+
+ ## Long-Term Ideas
+
+ ### Implement more synchronization primitives
+
+ * Buffered join
+
+ ### Implement zmq.rb
+
+ ### Implement network visualization using GraphViz
+
+ * Really should use an abstract graph builder interface
+
+ ### Implement additional connectors
+
+ * TCP
+ * Syslog
+ * EventMachine
+
+ ## Open Issues
+
+ ### Avoid memory leaks
+
+ I'm undecided on this, but spots could be cached instead of being recreated
+ on intern using the yet to be written Atom::* vars.
+
+ ## Far, far in the future
+
+ Add automatic distribution support.
+
+ It should not be that hard. In the end this is just a mildly interesting graph transformation on the topology, a bit of rewiring, and some support code to run stuff on remote machines. Ah, maybe we just use capistrano for that. Of course, that would be static only. Dynamic job submission is a different story, as is at-most-once messaging (i.e. transactionality).
+
data/Rakefile CHANGED
@@ -5,6 +5,7 @@ require 'rspec/core/rake_task'
 
  require 'yard'
  require 'yard/rake/yardoc_task'
+ require File.dirname(__FILE__) + '/yard_extensions/andromeda'
 
  desc 'Run all rspecs'
  RSpec::Core::RakeTask.new(:spec) do |spec|
@@ -15,15 +16,12 @@ end
 
  desc 'Run yardoc over project sources'
  YARD::Rake::YardocTask.new(:ydoc) do |t|
- t.options = ['--verbose']
+ t.options = ['--verbose']
  t.files = ['lib/**/*.rb', '-', 'README.md', 'AUTHORS', 'LICENSE.txt']
+ t.files << 'CHANGELOG.md'
+ t.files << 'ROADMAP.md'
  end
 
- #RDoc::Task.new(:rdoc) do |rdoc|
- # # rdoc.main = "README.rdoc"
- # rdoc.rdoc_files.include("lib/**/*.rb")
- #end
-
  desc 'Run irb in project environment'
  task :console do
  require 'irb'
@@ -5,8 +5,8 @@ require 'andromeda/version'
  Gem::Specification.new do |s|
  s.name = 'andromeda'
  s.version = Andromeda::VERSION
- s.summary = 'Ultra light weight multicore stream processing framework based on a dataflow DSL'
- s.description = s.summary
+ s.summary = 'lightweight framework for complex event processing based on a dataflow DSL'
+ s.description = 'Andromeda is a lightweight framework for complex event processing on multicore architectures. Andromeda users construct networks of plans that are interconnected via endpoint spots, describe how plans are scheduled onto threads, and process data by feeding data events to the resulting structure.'
  s.author = 'Stefan Plantikow'
  s.email = 'stefanp@moviepilot.com'
  s.homepage = 'https://github.com/moviepilot/andromeda'
@@ -1,14 +1,57 @@
+ require 'rubygems'
+
+ require 'set'
  require 'json'
  require 'logger'
- require 'threadpool'
- require 'facter'
+ require 'delegate'
+ require 'singleton'
+
+ require 'atomic'
  require 'thread'
+ require 'facter'
+ require 'threadpool'
  Facter.loadfacts
 
- require 'andromeda/id'
- require 'andromeda/pools'
- require 'andromeda/scope'
- require 'andromeda/andromeda'
- require 'andromeda/helpers'
- require 'andromeda/join'
- require 'andromeda/commando'
+ require 'andromeda/version'
+
+ module Andromeda
+
+ def self.files
+ f = []
+ f << 'andromeda/impl/to_s'
+ f << 'andromeda/impl/atom'
+ f << 'andromeda/impl/xor_id'
+ f << 'andromeda/impl/class_attr'
+ f << 'andromeda/impl/proto_plan'
+
+ f << 'andromeda/id'
+ f << 'andromeda/atom'
+ f << 'andromeda/error'
+ f << 'andromeda/copy_clone'
+ f << 'andromeda/guide_track'
+ f << 'andromeda/pool_guide'
+
+ f << 'andromeda/spot'
+ f << 'andromeda/plan'
+ f << 'andromeda/sync'
+ f << 'andromeda/sugar'
+
+ f << 'andromeda/kit'
+ f << 'andromeda/cmd'
+ f << 'andromeda/map_reduce'
+ f
+ end
+
+ def self.load_relative(f)
+ path = "#{File.join(File.dirname(caller[0]), f)}.rb"
+ load path
+ end
+
+ def self.reload!
+ files.each { |f| load_relative f }
+ end
+
+ end
+
+ Andromeda.files.each { |f| require f }
+