deadly_serious 0.3.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 68d3c2f2fb290436ae2543707c16063f4747b7c4
4
- data.tar.gz: 76d915c5453302c81cd2b9a0cb7171824212f340
3
+ metadata.gz: 3dfdbc7c45af05380d213ab4011d4228e2e8b6af
4
+ data.tar.gz: 089e715ec1cc97af1b01b43d6bc8133a33353bb3
5
5
  SHA512:
6
- metadata.gz: 9796184360d511d9c3dc54ef94b31d965adbd25c54f347866118f3b6afd3b8beb73ce3a1a669c026bd6fc8426f8df0be7ff2f9deb25cdd9e5e29a5100d4a57e4
7
- data.tar.gz: 27c775a1c743c74c34c707d1176302ccf9bd23a347d606dbe3e19b699953e871d7ca9f72dd9fab47c94d5747e2eb39570dbac7e4cdb84791959d363d3b2a34d9
6
+ metadata.gz: e1e87aca37f80261b0ab8f0d7e1786d5fa20d224bed7277a0eb72833200f9a47b198f888b1d77a6e15af3a2a479237d1715dccff0b460fafffe934012179e316
7
+ data.tar.gz: b5df200c7451bf4d058dea4b791b013c93c5ed0a5dbb3715a41deecee859c7a4249c42b5145a28fa3a8d6e881a46d9dffc046939f61a5d90492fc51e52485535
data/README.md CHANGED
@@ -1,10 +1,26 @@
1
1
  # DeadlySerious
2
2
 
3
- Flow Based Programming Engine mechanically sympathetic to \*nix.
3
+ Flow Based Programming Maestro!
4
4
 
5
- Flow Based Programming engine that relies on named pipes and Linux processes (sorry, it not works on Windows right now). That means it uses "mechanical sympathy" with the Operating System, i.e., the S.O. is *part* of the program, it's not something *below* it.
5
+ This relies on [*named pipes*](http://linux.die.net/man/7/fifo) and *Linux processes* to create a program. Each component runs as a separate Linux process, and they exchange information through pipes.
6
6
 
7
- **REQUIRES** Ruby 2.0 and a \*nix based SO (tested on *Ubuntu* and *Arch Linux*)
7
+ **REQUIRES** Ruby 2.0 and a \*nix based OS (Operating System, tested on *Ubuntu* and *Arch Linux*)
8
+
9
+ Unlike [NoFlo](http://noflojs.org), this is not a real engine. It just "orchestrates" Linux processes and pipes to create a flow-based system.
10
+
11
+ Overall, it's slower than a normal ruby program (the pipes add some overhead). However, there are 4 points where this approach is pretty interesting:
12
+
13
+ 1. High modifiability:
14
+ * The interface between components is tiny and very clear: just a stream of characters. I usually use CSV or JSON when I need more structure than that.
15
+ * You can connect Ruby processes to anything else that deals with STDIN, STDOUT or files (which includes shell commands, of course).
16
+ 2. Cheap parallelism and distributed computation:
17
+ * Each component runs as a separate process. The OS is in charge here (and it does an amazing job running things in parallel).
18
+ * As any shell command can be used as a component, you can use a simple [ncat](http://nmap.org/ncat) (or something similar) to distribute jobs between different boxes.
19
+ * It's really easy to avoid deadlocks and race conditions with the FBP paradigm.
20
+ 3. Low memory footprint
21
+ * As each component usually processes things as they appear in the pipe, it's easy to crunch tons of data using very little memory. Notable exceptions are components that need to accumulate data before processing it, like "sort".
22
+ 4. Very easy to reason about (personal opinion):
23
+ * Of course, this is not a merit of this gem, but of Flow Based Programming in general. I dare say (oh, blasphemy!) that Object Oriented and Functional programming paradigms are good ONLY for tiny systems. They make a huge mess on big ones (#prontofalei).
8
24
 
9
25
  ## Installation
10
26
 
@@ -22,7 +38,237 @@ Or install it yourself as:
22
38
 
23
39
  ## Usage
24
40
 
25
- TODO: Write usage instructions here
41
+ ### Basic pipeline
42
+
43
+ Create a class that will orchestrate the pipeline:
44
+
45
+ ```ruby
46
+ #!/usr/bin/env ruby
47
+ # Assuming you are using RVM
48
+
49
+ class Pipeline < DeadlySerious::Engine::Spawner
50
+ def run_pipeline
51
+ # Here comes the code
52
+ end
53
+ end
54
+
55
+ # This line will allow you to run
56
+ # it directly from the shell.
57
+ #
58
+ # Please note that you fire the pipeline by
59
+ # calling "run" not "run_pipeline".
60
+ Pipeline.new.run if __FILE__ == $0
61
+ ```
62
+
63
+ You can spawn processes the following way:
64
+
65
+ ```ruby
66
+ class Pipeline < DeadlySerious::Engine::Spawner
67
+ def run_pipeline
68
+
69
+ spawn_process(YourComponentClass,
70
+ readers: ['>an_awesome_text_file.txt'], # reads from a file
71
+ writers: ['your_first_output_pipe']) # outputs to a pipe
72
+
73
+ spawn_process(YourOtherComponentClass,
74
+ readers: ['your_first_output_pipe'],
75
+ writers: ['more_pipe1', 'more_pipe2'])
76
+
77
+ end
78
+ end
79
+ ```
80
+
81
+ A component is any class with a "run" method and two named parameters "readers" and "writers":
82
+
83
+ ```ruby
84
+ class EchoComponent
85
+ # "readers" and "writers" are both Arrays of IO objects.
86
+ def run(readers: [], writers: [])
87
+ reader = readers.first
88
+ writer = writers.first
89
+
90
+ reader.each_line do |line|
91
+ writer << line
92
+ end
93
+ end
94
+ end
95
+ ```
96
+
97
+ ### Pipes and files
98
+
99
+ The parameters you receive in the "def run(readers: [], writers: [])" method are [**IO**](http://www.ruby-doc.org/core-2.0/IO.html) objects.
100
+
101
+ They are already opened when they are passed to your component, and they are properly closed when your component is done.
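
Because a component only sees IO-like objects, it can be exercised without spawning any process. The following is a minimal sketch, not part of the gem: `UpcaseComponent` is a hypothetical component, and `StringIO` stands in for the pipes that the Spawner would normally open and close.

```ruby
require 'stringio'

# Hypothetical component, for illustration only: upcases every line it reads.
class UpcaseComponent
  def run(readers: [], writers: [])
    reader = readers.first
    writer = writers.first

    reader.each_line do |line|
      writer << line.upcase
    end
  end
end

# StringIO responds to #each_line and #<<, so it can replace real pipes here.
input  = StringIO.new("hello\nworld\n")
output = StringIO.new

UpcaseComponent.new.run(readers: [input], writers: [output])
puts output.string
```

The same class could then be handed to `spawn_process` unchanged, since it relies only on `each_line` and `<<`.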
102
+
103
+ In the Pipeline class, readers and writers are just pipe names *or* file names. If you want to read or write to a file instead of a pipe, prepend its name with ">", like this:
104
+
105
+ ```ruby
106
+ spawn_process(YourComponentClass,
107
+ readers: ['>an_awesome_text_file.txt'], # reads from a file
108
+ writers: ['your_first_output_pipe']) # outputs to a pipe
109
+
110
+ spawn_process(YourComponentClass,
111
+ readers: ['an_awesome_pipe'], # reads from a pipe
112
+ writers: ['>your_first_output_file']) # outputs to a file
113
+ ```
114
+
115
+ Files are read and created in the "./data" directory, "." being the directory where you fired the program.
116
+
117
+ Pipes are created in the '/tmp/deadly_serious/<pid>/' directory and live only while the program runs. Once it's done, the directory is deleted.
118
+
119
+ ### Shell commands
120
+
121
+ Spawning shell commands is as simple as this:
122
+
123
+ ```ruby
124
+ spawn_command('cat ((>a_file_in_data_dir.csv)) | grep wabba > ((some_pipe))')
125
+ spawn_command('cat ((some_pipe)) > ((>my_own_output_file.txt))')
126
+ ```
127
+
128
+ The "((" and "))" are replaced by the actual pipe (or file) path before execution.
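
To make the substitution concrete, here is a rough sketch of how such placeholders could be expanded. It is an assumption about the behavior described above, not the gem's actual code: `expand_placeholders` is a hypothetical helper, and the directories passed in are example values.

```ruby
# Hypothetical helper (not the gem's code): expands "((name))" placeholders.
# Names starting with ">" are treated as files under data_dir,
# everything else as pipes under pipe_dir.
def expand_placeholders(command, data_dir:, pipe_dir:)
  command.gsub(/\(\((.*?)\)\)/) do
    name = Regexp.last_match(1)
    if name.start_with?('>')
      File.join(data_dir, name[1..-1])
    else
      File.join(pipe_dir, name)
    end
  end
end

expanded = expand_placeholders(
  'cat ((>a_file_in_data_dir.csv)) | grep wabba > ((some_pipe))',
  data_dir: './data',
  pipe_dir: '/tmp/deadly_serious/1234'
)
puts expanded
```

Required keyword arguments (`data_dir:` with no default) need Ruby 2.1 or later; on Ruby 2.0 the sketch would need default values.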
129
+
130
+ ### Preserve pipe directory and change data directory
131
+
132
+ In the Pipeline class (the one you extended from Engine::Spawner), you can override the "initialize" method to pass some parameters:
133
+
134
+ ```ruby
135
+ class Pipeline < DeadlySerious::Engine::Spawner
136
+ def initialize
137
+ super(
138
+ data_dir: './data', # Files directory
139
+ pipe_dir: "/tmp/deadly_serious/#{Process.pid}", # Pipes directory
140
+ preserve_pipe_dir: false) # Keep the pipes directory after execution finishes?
141
+ end
142
+ end
143
+ ```
144
+
145
+ You can override any of them. The values shown above are the defaults.
146
+
147
+ ### JSON integration
148
+
149
+ Yet to be explained.
150
+
151
+ ### Pre-made components
152
+
153
+ * Source components
154
+ * Splitter
155
+ * Joiner
156
+
157
+ Yet to be explained.
158
+
159
+ ### Examples
160
+
161
+ Here is a simple program to deal with the "Telegram Problem" as first described by Peter Naur.
162
+
163
+ > Write a program that takes a number **w**, then accepts lines of text and outputs lines of text, where the output lines have as many words as possible but are never longer than **w** characters. Words may not be split, but you may assume that no single word is too long for a line.
164
+
165
+ ```ruby
166
+ require 'deadly_serious'
167
+
168
+ module TelegramProblem
169
+ LINE_SIZE = 80
170
+ PARAGRAPH_PACKET = 'CONTROL PACKET: paragraph'
171
+ EOL_PACKET = 'CONTROL PACKET: end of line'
172
+
173
+ # Break text into words and "END OF LINE" packets (EOL)
174
+ class WordSplitter
175
+ def run(readers: [], writers: [])
176
+ reader = readers.first
177
+ writer = writers.first
178
+
179
+ reader.each do |line|
180
+ line.chomp!
181
+ line.scan(/$|\S+/) do |word|
182
+ packet = (word == '' ? EOL_PACKET : word)
183
+ writer << packet << "\n"
184
+ end
185
+ end
186
+ end
187
+ end
188
+
189
+ # Transform a double "end of line" into a "paragraph"
190
+ class EolToParagraph
191
+ def run(readers: [], writers: [])
192
+ reader = readers.first
193
+ writer = writers.first
194
+
195
+ last2 = ''
196
+ last1 = ''
197
+ reader.each do |packet|
198
+ packet.chomp!
199
+
200
+ if packet == EOL_PACKET
201
+ last2 = last1
202
+ last1 = packet
203
+ next
204
+ end
205
+
206
+ if last1 == EOL_PACKET && last2 == EOL_PACKET
207
+ writer << PARAGRAPH_PACKET << "\n"
208
+ end
209
+
210
+ writer << packet << "\n"
211
+ last2 = last1
212
+ last1 = packet
213
+ end
214
+ end
215
+ end
216
+
217
+ # Join words
218
+ class SentenceJoiner
219
+ def run(readers: [], writers: [])
220
+ reader = readers.first
221
+ writer = writers.first
222
+
223
+ line_size = 0
224
+
225
+ reader.each do |packet|
226
+ packet.chomp!
227
+
228
+ if packet == PARAGRAPH_PACKET
229
+ writer << "\n\n"
230
+ line_size = 0
231
+ next
232
+ end
233
+
234
+ if line_size + packet.size + 1 <= LINE_SIZE
235
+ writer << ' ' if line_size > 0
236
+ writer << packet
237
+ line_size += packet.size + 1
238
+ else
239
+ writer << "\n"
240
+ writer << packet
241
+ line_size = packet.size
242
+ end
243
+ end
244
+ writer << "\n" unless line_size == 0
245
+ end
246
+ end
247
+
248
+ class Pipeline < DeadlySerious::Engine::Spawner
249
+ include DeadlySerious
250
+ def run_pipeline
251
+ spawn_process(WordSplitter, readers: ['>war_and_peace.txt'], writers: ['words_and_eol'])
252
+ spawn_process(EolToParagraph, readers: ['words_and_eol'], writers: ['just_words'])
253
+ spawn_process(SentenceJoiner, readers: ['just_words'], writers: ['>output.data'])
254
+ end
255
+ end
256
+ end
257
+
258
+ if __FILE__ == $0
259
+ TelegramProblem::Pipeline.new.run
260
+ end
261
+ ```
262
+
263
+ Check the "examples" directory for other examples of use. There's even a version of this Telegram Problem program made without pipes and such, but using the same logic. I made it to get a feel for the overhead of using pipes.
264
+
265
+ In my findings, the overhead is roughly 100% in this very simple problem (same time, 2x cpu). Considering that each of the components above is *really* simple (just splitting and joining words, an "if", and 2 pipes), I don't find the overhead a big deal. However, more tests are needed.
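
For comparison, the core greedy line-filling logic can be written as a single in-process method. This is a minimal sketch, not the version shipped in the "examples" directory: `wrap_words` is a hypothetical name, and paragraph handling is left out on purpose.

```ruby
# Greedy line filling: pack as many words as fit into `width` characters.
# Hypothetical helper for illustration; it ignores paragraph breaks.
def wrap_words(text, width)
  lines = []
  current = ''

  text.split.each do |word|
    if current.empty?
      current = word.dup
    elsif current.size + 1 + word.size <= width
      # The word still fits, counting the separating space.
      current << ' ' << word
    else
      lines << current
      current = word.dup
    end
  end

  lines << current unless current.empty?
  lines
end

wrapped = wrap_words('the quick brown fox jumps', 10)
puts wrapped
```

As in the problem statement, the sketch assumes that no single word is longer than `width`.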
266
+
267
+ ## Future features (a.k.a. "The Wishlist")
268
+
269
+ * Socket connectors (pipe things through net)
270
+ * Remote coordination (create and run remote components from a master box)
271
+ * More pre-made components (using C?)
26
272
 
27
273
  ## Contributing
28
274
 
@@ -20,6 +20,7 @@ Gem::Specification.new do |spec|
20
20
 
21
21
  spec.add_development_dependency 'bundler', '~> 1.3'
22
22
  spec.add_development_dependency 'rake'
23
+ spec.add_development_dependency 'yard'
23
24
 
24
25
  spec.add_dependency 'json'
25
26
  end
@@ -1,26 +1,28 @@
1
+ require 'socket'
2
+
1
3
  module DeadlySerious
2
4
  module Engine
3
- class Channel
4
- def initialize(name, data_dir: nil, pipe_dir: nil)
5
- matcher = name.match(/^(>?)(.*)$/)
6
- @type = matcher[1] == '>' ? :file : :pipe
7
- name = matcher[2]
8
- @io_name = if @type == :file
9
- "#{data_dir}/#{name}"
10
- else
11
- "#{pipe_dir}/#{name}"
12
- end
13
- end
14
-
15
- # Create a pipe or file (acording to name)
16
- # and returns the full name of the thing created.
17
- def create
18
- return @io_name if File.exist?(@io_name)
19
- if @type == :file
20
- `touch #{@io_name}`
5
+ # Fake class, it's actually a factory ¬¬
6
+ module Channel
7
+ def self.new(name, data_dir: nil, pipe_dir: nil)
8
+ matcher = name.match(/^(>)?(.*?)(?:(:)(\d{1,5}))?$/)
9
+ if matcher[1] == '>'
10
+ FileChannel.new(matcher[2], data_dir)
11
+ elsif matcher[3] == ':'
12
+ SocketChannel.new(matcher[2], matcher[4].to_i)
21
13
  else
22
- `mkfifo #{@io_name}`
14
+ PipeChannel.new(matcher[2], pipe_dir)
23
15
  end
16
+ end
17
+ end
18
+
19
+ class FileChannel
20
+ def initialize(name, directory)
21
+ @io_name = File.join(directory, name)
22
+ end
23
+
24
+ def create
25
+ `touch #{@io_name}` unless File.exist?(@io_name)
24
26
  @io_name
25
27
  end
26
28
 
@@ -34,5 +36,53 @@ module DeadlySerious
34
36
  open(@io_name, 'w')
35
37
  end
36
38
  end
39
+
40
+ class PipeChannel
41
+ def initialize(name, directory)
42
+ @io_name = File.join(directory, name)
43
+ end
44
+
45
+ def create
46
+ `mkfifo #{@io_name}` unless File.exist?(@io_name)
47
+ @io_name
48
+ end
49
+
50
+ def open_reader
51
+ fail %(Pipe "#{@io_name}" not found) unless File.exist?(@io_name)
52
+ open(@io_name, 'r')
53
+ end
54
+
55
+ def open_writer
56
+ fail %(Pipe "#{@io_name}" not found) unless File.exist?(@io_name)
57
+ open(@io_name, 'w')
58
+ end
59
+ end
60
+
61
+ class SocketChannel
62
+ def initialize(host, port)
63
+ @host, @port = host, port
64
+ @retry_counter = 3
65
+ end
66
+
67
+ def create
68
+ # Do nothing
69
+ end
70
+
71
+ def open_reader
72
+ TCPSocket.new(@host, @port)
73
+ rescue Exception => e
74
+ @retry_counter -= 1
75
+ if @retry_counter > 0
76
+ sleep 1 and retry
77
+ else
78
+ raise e
79
+ end
80
+ end
81
+
82
+ def open_writer
83
+ server = TCPServer.new(@port)
84
+ server.accept
85
+ end
86
+ end
37
87
  end
38
88
  end
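
The new name pattern in `Channel.new` decides which channel class is built. The sketch below reuses that same regular expression to show how a few names are classified; `classify` and `CHANNEL_NAME` are hypothetical names used only for illustration, and the return values are plain arrays rather than the gem's channel objects.

```ruby
# Same pattern as in Channel.new; the constant name is hypothetical.
CHANNEL_NAME = /^(>)?(.*?)(?:(:)(\d{1,5}))?$/

# Hypothetical helper: reports which kind of channel a name would map to.
def classify(name)
  m = name.match(CHANNEL_NAME)
  if m[1] == '>'
    [:file, m[2]]
  elsif m[3] == ':'
    [:socket, m[2], m[4].to_i]
  else
    [:pipe, m[2]]
  end
end

p classify('>war_and_peace.txt')
p classify('localhost:5555')
p classify('words_and_eol')
```

One consequence of the pattern worth keeping in mind: a file name that ends in a colon followed by digits, such as `>log:80`, has that suffix captured as a port, so only `log` is passed on as the file name.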
@@ -16,48 +16,12 @@ module DeadlySerious
16
16
  end
17
17
  end
18
18
 
19
- def self.dasherize(a_string)
20
- a_string.gsub(/(.)([A-Z])/, '\1-\2').downcase.gsub(/\W+/, '-')
21
- end
22
-
23
- def set_process_name(name)
24
- $0 = "ruby #{self.class.dasherize(name)}"
25
- end
26
-
27
- def channel_for(pipe_name)
28
- Channel.new(pipe_name, data_dir: @data_dir, pipe_dir: @pipe_dir)
29
- end
30
-
31
- def create_pipe(pipe_name)
32
- channel_for(pipe_name).create
33
- end
34
-
35
- def read_pipe(pipe_name)
36
- channel_for(pipe_name).open_reader
37
- end
38
-
39
- def write_pipe(pipe_name)
40
- channel = channel_for(pipe_name)
41
- return channel.open_writer unless block_given?
42
-
43
- channel.open_writer do |io|
44
- yield io
45
- end
46
- end
47
-
48
- def fork_it
49
- @ids << fork do
50
- yield
51
- end
52
- end
53
-
54
- def wait_children
55
- @ids.each { |id| Process.wait(id) }
56
- end
57
-
58
- def kill_children
59
- @ids.each { |id| Process.kill('SIGTERM', id) }
19
+ def run
20
+ run_pipeline
60
21
  wait_children
22
+ rescue Exception => e
23
+ kill_children
24
+ raise e
61
25
  end
62
26
 
63
27
  def spawn_source(a_class, *args, writer: self.class.dasherize(a_class.name))
@@ -94,12 +58,58 @@ module DeadlySerious
94
58
  @ids << spawn(command)
95
59
  end
96
60
 
97
- def run
98
- run_pipeline
61
+ private
62
+
63
+ # @!group Process Control
64
+
65
+ def fork_it
66
+ @ids << fork do
67
+ yield
68
+ end
69
+ end
70
+
71
+ def wait_children
72
+ @ids.each { |id| Process.wait(id) }
73
+ end
74
+
75
+ def kill_children
76
+ @ids.each { |id| Process.kill('SIGTERM', id) }
99
77
  wait_children
100
- rescue Exception => e
101
- kill_children
102
- raise e
78
+ end
79
+
80
+ def set_process_name(name)
81
+ $0 = "ruby #{self.class.dasherize(name)}"
82
+ end
83
+
84
+ # @!endgroup
85
+ # @!group Channel Helpers
86
+
87
+ def channel_for(pipe_name)
88
+ Channel.new(pipe_name, data_dir: @data_dir, pipe_dir: @pipe_dir)
89
+ end
90
+
91
+ def create_pipe(pipe_name)
92
+ channel_for(pipe_name).create
93
+ end
94
+
95
+ def read_pipe(pipe_name)
96
+ channel_for(pipe_name).open_reader
97
+ end
98
+
99
+ def write_pipe(pipe_name)
100
+ channel = channel_for(pipe_name)
101
+ return channel.open_writer unless block_given?
102
+
103
+ channel.open_writer do |io|
104
+ yield io
105
+ end
106
+ end
107
+
108
+ # @!endgroup
109
+ # @!group Minor Helpers
110
+
111
+ def self.dasherize(a_string)
112
+ a_string.gsub(/(.)([A-Z])/, '\1-\2').downcase.gsub(/\W+/, '-')
103
113
  end
104
114
  end
105
115
  end
@@ -1,3 +1,3 @@
1
1
  module DeadlySerious
2
- VERSION = "0.3.0"
2
+ VERSION = "0.4.0"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: deadly_serious
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ronie Uliana
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2013-08-10 00:00:00.000000000 Z
11
+ date: 2013-08-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - '>='
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: yard
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
41
55
  - !ruby/object:Gem::Dependency
42
56
  name: json
43
57
  requirement: !ruby/object:Gem::Requirement
@@ -105,3 +119,4 @@ summary: Flow Based Programming engine that relies on named pipes and Linux proc
105
119
  with the Operating System, i.e., the S.O. is *part* of the program, it's not something
106
120
  *below* it.
107
121
  test_files: []
122
+ has_rdoc: