deadly_serious 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/README.md +250 -4
- data/deadly_serious.gemspec +1 -0
- data/lib/deadly_serious/engine/channel.rb +69 -19
- data/lib/deadly_serious/engine/spawner.rb +56 -46
- data/lib/deadly_serious/version.rb +1 -1
- metadata +17 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3dfdbc7c45af05380d213ab4011d4228e2e8b6af
+  data.tar.gz: 089e715ec1cc97af1b01b43d6bc8133a33353bb3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e1e87aca37f80261b0ab8f0d7e1786d5fa20d224bed7277a0eb72833200f9a47b198f888b1d77a6e15af3a2a479237d1715dccff0b460fafffe934012179e316
+  data.tar.gz: b5df200c7451bf4d058dea4b791b013c93c5ed0a5dbb3715a41deecee859c7a4249c42b5145a28fa3a8d6e881a46d9dffc046939f61a5d90492fc51e52485535
data/README.md
CHANGED
@@ -1,10 +1,26 @@
 # DeadlySerious
 
-Flow Based Programming
+Flow Based Programming Maestro!
 
-
+This relies on [*named pipes*](http://linux.die.net/man/7/fifo) and *Linux processes* to create a program. Each component runs as a separate Linux process, and they exchange information through pipes.
 
-**REQUIRES** Ruby 2.0 and a \*nix based
+**REQUIRES** Ruby 2.0 and a \*nix based OS (Operating System, tested on *Ubuntu* and *Arch Linux*)
+
+Unlike [NoFlo](http://noflojs.org), this is not a real engine. It just "orchestrates" Linux processes and pipes to create a flow based system.
+
+Overall, it's slower than a normal Ruby program (the pipes add some overhead). However, there are 4 points where this approach is pretty interesting:
+
+1. High modifiability:
+    * The interfaces between components are tiny and very clear: they are just streams of characters. I usually use CSV or JSON when I need more structure than that.
+    * You can connect a Ruby process to anything else that deals with STDIN, STDOUT or files (which includes shell commands, of course).
+2. Cheap parallelism and distributed computation:
+    * Each component runs as a separate process. The OS is in charge here (and it does an amazing job running things in parallel).
+    * As any shell command can be used as a component, you can use a simple [ncat](http://nmap.org/ncat) (or something similar) to distribute jobs between different boxes.
+    * It's really easy to avoid deadlocks and race conditions with the FBP paradigm.
+3. Low memory footprint:
+    * As each component usually processes things as they appear in the pipe, it's easy to crunch tons of data using very little memory. Notable exceptions are components that need to accumulate things before processing, like "sort".
+4. Very easy to reason about (personal opinion):
+    * Of course, this is not a merit of this gem, but of Flow Based Programming in general. I dare say (oh, blasphemy!) that the Object Oriented and Functional programming paradigms are good ONLY for tiny systems. They make a huge mess on big ones (there, I said it).
 
 ## Installation
 
@@ -22,7 +38,237 @@ Or install it yourself as:
 
 ## Usage
 
-
+### Basic pipeline
+
+Create a class that will orchestrate the pipeline:
+
+```ruby
+#!/usr/bin/env ruby
+# Assuming you are using RVM
+
+class Pipeline < DeadlySerious::Engine::Spawner
+  def run_pipeline
+    # Here comes the code
+  end
+end
+
+# This line will allow you to run
+# it directly from the shell.
+#
+# Please note that you fire the pipeline
+# by calling "run", not "run_pipeline".
+Pipeline.new.run if __FILE__ == $0
+```
+
+You can spawn processes the following way:
+
+```ruby
+class Pipeline < DeadlySerious::Engine::Spawner
+  def run_pipeline
+
+    spawn_process(YourComponentClass,
+                  readers: ['>an_awesome_text_file.txt'], # reads from a file
+                  writers: ['your_first_output_pipe']) # outputs to a pipe
+
+    spawn_process(YourOtherComponentClass,
+                  readers: ['your_first_output_pipe'],
+                  writers: ['more_pipe1', 'more_pipe2'])
+
+  end
+end
+```
+
+A component is any class with a "run" method that takes two named parameters, "readers" and "writers":
+
+```ruby
+class EchoComponent
+  # "readers" and "writers" are both Arrays of IO objects.
+  def run(readers: [], writers: [])
+    reader = readers.first
+    writer = writers.first
+
+    reader.each_line do |line|
+      writer << line
+    end
+  end
+end
+```
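Because a component only relies on `each_line` and `<<`, it can be exercised without any pipes. The sketch below is not part of the gem: it feeds the `EchoComponent` from the README through in-memory `StringIO` objects, which stand in for the real IO objects the spawner would pass. The class is repeated so the snippet runs on its own.

```ruby
require 'stringio'

# Same component as in the README, repeated so this snippet is self-contained.
class EchoComponent
  def run(readers: [], writers: [])
    reader = readers.first
    writer = writers.first

    reader.each_line do |line|
      writer << line
    end
  end
end

# In-memory stand-ins for the pipes (an assumption made for testing;
# the spawner normally hands over already opened IO objects).
input  = StringIO.new("first line\nsecond line\n")
output = StringIO.new

EchoComponent.new.run(readers: [input], writers: [output])

puts output.string
```

Keeping components this small is what makes them easy to check in isolation before wiring them into a pipeline.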
+
+### Pipes and files
+
+The parameters you receive in the "def run(readers: [], writers: [])" method are [**IO**](http://www.ruby-doc.org/core-2.0/IO.html) objects.
+
+They are already open when they are passed to your component, and they are properly closed when your component is done.
+
+In the Pipeline class, readers and writers are just pipe names *or* file names. If you want to read from or write to a file instead of a pipe, prefix its name with ">", like this:
+
+```ruby
+spawn_process(YourComponentClass,
+              readers: ['>an_awesome_text_file.txt'], # reads from a file
+              writers: ['your_first_output_pipe']) # outputs to a pipe
+
+spawn_process(YourComponentClass,
+              readers: ['an_awesome_pipe'], # reads from a pipe
+              writers: ['>your_first_output_file']) # outputs to a file
+```
+
+Files are read and created in the "./data" directory, "." being the directory where you started the program.
+
+Pipes are created in the '/tmp/deadly_serious/<pid>/' directory and they live only while the program runs. Once it's done, the directory is deleted.
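The naming rule can be pictured as a small lookup. The helper below is hypothetical (the name `resolve_channel_path` and its signature are not part of the gem) and only mirrors the convention described in the README: a leading ">" selects the data directory, anything else lands in the pipe directory. Names of the form `host:port`, which the 0.4.0 channel factory routes to sockets, are deliberately left out of this sketch.

```ruby
# Hypothetical helper illustrating the README's naming convention.
# It only computes paths; it does not create files or pipes.
def resolve_channel_path(name, data_dir: './data', pipe_dir: "/tmp/deadly_serious/#{Process.pid}")
  if name.start_with?('>')
    File.join(data_dir, name[1..-1]) # ">file.txt" maps to a file in the data directory
  else
    File.join(pipe_dir, name)        # "some_pipe" maps to a named pipe in the pipe directory
  end
end

puts resolve_channel_path('>war_and_peace.txt')
puts resolve_channel_path('words_and_eol', pipe_dir: '/tmp/deadly_serious/1234')
```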
+
+### Shell commands
+
+Spawning shell commands is as simple as this:
+
+```ruby
+spawn_command('cat ((>a_file_in_data_dir.csv)) | grep wabba > ((some_pipe))')
+spawn_command('cat ((some_pipe)) > ((>my_own_output_file.txt))')
+```
+
+Each name wrapped in "((" and "))" is replaced by the actual pipe (or file) path before execution.
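To make the substitution concrete, here is a minimal sketch of how such placeholders could be expanded. It is an assumption about the behaviour described here, not the gem's actual code: `expand_placeholders` is a made-up name, and the directories are passed in explicitly.

```ruby
# Hypothetical sketch of the "((name))" substitution; the gem's real
# implementation may differ.
def expand_placeholders(command, data_dir:, pipe_dir:)
  command.gsub(/\(\((.*?)\)\)/) do
    name = Regexp.last_match(1)
    if name.start_with?('>')
      File.join(data_dir, name[1..-1]) # file placeholder
    else
      File.join(pipe_dir, name)        # pipe placeholder
    end
  end
end

command = 'cat ((>a_file_in_data_dir.csv)) | grep wabba > ((some_pipe))'
expanded = expand_placeholders(command, data_dir: './data', pipe_dir: '/tmp/deadly_serious/1234')
puts expanded
```

The shell only ever sees plain paths, so any command that can read or write a file can take part in the pipeline.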
+
+### Preserve pipe directory and change data directory
+
+In the Pipeline class (the one you extended from Engine::Spawner), you can override the "initialize" method to pass some parameters:
+
+```ruby
+class Pipeline < DeadlySerious::Engine::Spawner
+  def initialize
+    super(
+      data_dir: './data', # Files directory
+      pipe_dir: "/tmp/deadly_serious/#{Process.pid}", # Pipes directory
+      preserve_pipe_dir: false) # Keep the pipes directory after execution finishes?
+  end
+end
+```
+
+You can override any of them. The values shown above are the defaults.
+
+### JSON integration
+
+Yet to be explained.
+
+### Pre-made components
+
+* Source components
+* Splitter
+* Joiner
+
+Yet to be explained.
+
+### Examples
+
+Here is a simple program that deals with the "Telegram Problem", as first described by Peter Naur.
+
+> Write a program that takes a number **w**, then accepts lines of text and outputs lines of text, where the output lines have as many words as possible but are never longer than **w** characters. Words may not be split, but you may assume that no single word is too long for a line.
+
+```ruby
+require 'deadly_serious'
+
+module TelegramProblem
+  LINE_SIZE = 80
+  PARAGRAPH_PACKET = 'CONTROL PACKET: paragraph'
+  EOL_PACKET = 'CONTROL PACKET: end of line'
+
+  # Break text into words and "END OF LINE" packets (EOL)
+  class WordSplitter
+    def run(readers: [], writers: [])
+      reader = readers.first
+      writer = writers.first
+
+      reader.each do |line|
+        line.chomp!
+        line.scan(/$|\S+/) do |word|
+          packet = (word == '' ? EOL_PACKET : word)
+          writer << packet << "\n"
+        end
+      end
+    end
+  end
+
+  # Transform a double "end of line" into a "paragraph"
+  class EolToParagraph
+    def run(readers: [], writers: [])
+      reader = readers.first
+      writer = writers.first
+
+      last2 = ''
+      last1 = ''
+      reader.each do |packet|
+        packet.chomp!
+
+        if packet == EOL_PACKET
+          last2 = last1
+          last1 = packet
+          next
+        end
+
+        if last1 == EOL_PACKET && last2 == EOL_PACKET
+          writer << PARAGRAPH_PACKET << "\n"
+        end
+
+        writer << packet << "\n"
+        last2 = last1
+        last1 = packet
+      end
+    end
+  end
+
+  # Join words
+  class SentenceJoiner
+    def run(readers: [], writers: [])
+      reader = readers.first
+      writer = writers.first
+
+      line_size = 0
+
+      reader.each do |packet|
+        packet.chomp!
+
+        if packet == PARAGRAPH_PACKET
+          writer << "\n\n"
+          line_size = 0
+          next
+        end
+
+        if line_size + packet.size + 1 <= LINE_SIZE
+          writer << ' ' if line_size > 0
+          writer << packet
+          line_size += packet.size + 1
+        else
+          writer << "\n"
+          writer << packet
+          line_size = packet.size
+        end
+      end
+      writer << "\n" unless line_size == 0
+    end
+  end
+
+  class Pipeline < DeadlySerious::Engine::Spawner
+    include DeadlySerious
+    def run_pipeline
+      spawn_process(WordSplitter, readers: ['>war_and_peace.txt'], writers: ['words_and_eol'])
+      spawn_process(EolToParagraph, readers: ['words_and_eol'], writers: ['just_words'])
+      spawn_process(SentenceJoiner, readers: ['just_words'], writers: ['>output.data'])
+    end
+  end
+end
+
+if __FILE__ == $0
+  TelegramProblem::Pipeline.new.run
+end
+```
+
+Check the "examples" directory for other examples of use. There's even a version of this Telegram Problem program made without pipes and such, but using the same logic. I made it to get a "feel" for the overhead of using pipes.
+
+In my findings, the overhead is roughly 100% in this very simple problem (same time, 2x CPU). Considering that each of the components above is *really* simple (just splitting and joining words, an "if", and 2 pipes), I don't consider the overhead a big deal. However, I need more tests.
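For a rough idea of what a pipe-free variant looks like, the line-filling rule can be written as a single method. This is an independent sketch, not the version shipped in the "examples" directory: the name `fill_lines` is made up for illustration, paragraph handling is left out, and the width check is a plain greedy fit rather than a copy of `SentenceJoiner`.

```ruby
# Greedy line filling: pack as many words as fit into lines of at most
# `width` characters, never splitting a word.
def fill_lines(text, width)
  lines = []
  current = ''

  text.split.each do |word|
    if current.empty?
      current = word.dup
    elsif current.size + 1 + word.size <= width
      current << ' ' << word
    else
      lines << current
      current = word.dup
    end
  end

  lines << current unless current.empty?
  lines
end

puts fill_lines('the quick brown fox jumps over the lazy dog', 15)
```

Comparing the run time of something like this against the three-process pipeline is one way to get a feel for the cost of the pipes themselves.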
+
+## Future features (a.k.a. "The Wishlist")
+
+* Socket connectors (pipe things through the net)
+* Remote coordination (creating and running remote components from a master box)
+* More pre-made components (using C?)
 
 ## Contributing
 
data/deadly_serious.gemspec
CHANGED
@@ -1,26 +1,28 @@
+require 'socket'
+
 module DeadlySerious
   module Engine
-    class
-
-
-
-
-
-
-
-          "#{pipe_dir}/#{name}"
-        end
-      end
-
-      # Create a pipe or file (acording to name)
-      # and returns the full name of the thing created.
-      def create
-        return @io_name if File.exist?(@io_name)
-        if @type == :file
-          `touch #{@io_name}`
+    # Fake class, it's actually a factory ¬¬
+    module Channel
+      def self.new(name, data_dir: nil, pipe_dir: nil)
+        matcher = name.match(/^(>)?(.*?)(?:(:)(\d{1,5}))?$/)
+        if matcher[1] == '>'
+          FileChannel.new(matcher[2], data_dir)
+        elsif matcher[3] == ':'
+          SocketChannel.new(matcher[2], matcher[4].to_i)
         else
-
+          PipeChannel.new(matcher[2], pipe_dir)
         end
+      end
+    end
+
+    class FileChannel
+      def initialize(name, directory)
+        @io_name = File.join(directory, name)
+      end
+
+      def create
+        `touch #{@io_name}` unless File.exist?(@io_name)
         @io_name
       end
 
@@ -34,5 +36,53 @@ module DeadlySerious
         open(@io_name, 'w')
       end
     end
+
+    class PipeChannel
+      def initialize(name, directory)
+        @io_name = File.join(directory, name)
+      end
+
+      def create
+        `mkfifo #{@io_name}` unless File.exist?(@io_name)
+        @io_name
+      end
+
+      def open_reader
+        fail %(Pipe "#{@io_name}" not found) unless File.exist?(@io_name)
+        open(@io_name, 'r')
+      end
+
+      def open_writer
+        fail %(Pipe "#{@io_name}" not found) unless File.exist?(@io_name)
+        open(@io_name, 'w')
+      end
+    end
+
+    class SocketChannel
+      def initialize(host, port)
+        @host, @port = host, port
+        @retry_counter = 3
+      end
+
+      def create
+        # Do nothing
+      end
+
+      def open_reader
+        TCPSocket.new(@host, @port)
+      rescue Exception => e
+        @retry_counter -= 1
+        if @retry_counter > 0
+          sleep 1 and retry
+        else
+          raise e
+        end
+      end
+
+      def open_writer
+        server = TCPServer.new(@port)
+        server.accept
+      end
+    end
   end
 end
@@ -16,48 +16,12 @@ module DeadlySerious
         end
       end
 
-      def
-
-      end
-
-      def set_process_name(name)
-        $0 = "ruby #{self.class.dasherize(name)}"
-      end
-
-      def channel_for(pipe_name)
-        Channel.new(pipe_name, data_dir: @data_dir, pipe_dir: @pipe_dir)
-      end
-
-      def create_pipe(pipe_name)
-        channel_for(pipe_name).create
-      end
-
-      def read_pipe(pipe_name)
-        channel_for(pipe_name).open_reader
-      end
-
-      def write_pipe(pipe_name)
-        channel = channel_for(pipe_name)
-        return channel.open_writer unless block_given?
-
-        channel.open_writer do |io|
-          yield io
-        end
-      end
-
-      def fork_it
-        @ids << fork do
-          yield
-        end
-      end
-
-      def wait_children
-        @ids.each { |id| Process.wait(id) }
-      end
-
-      def kill_children
-        @ids.each { |id| Process.kill('SIGTERM', id) }
+      def run
+        run_pipeline
         wait_children
+      rescue Exception => e
+        kill_children
+        raise e
       end
 
       def spawn_source(a_class, *args, writer: self.class.dasherize(a_class.name))
@@ -94,12 +58,58 @@ module DeadlySerious
         @ids << spawn(command)
       end
 
-
-
+      private
+
+      # @!group Process Control
+
+      def fork_it
+        @ids << fork do
+          yield
+        end
+      end
+
+      def wait_children
+        @ids.each { |id| Process.wait(id) }
+      end
+
+      def kill_children
+        @ids.each { |id| Process.kill('SIGTERM', id) }
         wait_children
-
-
-
+      end
+
+      def set_process_name(name)
+        $0 = "ruby #{self.class.dasherize(name)}"
+      end
+
+      # @!endgroup
+      # @!group Channel Helpers
+
+      def channel_for(pipe_name)
+        Channel.new(pipe_name, data_dir: @data_dir, pipe_dir: @pipe_dir)
+      end
+
+      def create_pipe(pipe_name)
+        channel_for(pipe_name).create
+      end
+
+      def read_pipe(pipe_name)
+        channel_for(pipe_name).open_reader
+      end
+
+      def write_pipe(pipe_name)
+        channel = channel_for(pipe_name)
+        return channel.open_writer unless block_given?
+
+        channel.open_writer do |io|
+          yield io
+        end
+      end
+
+      # @!endgroup
+      # @!group Minor Helpers
+
+      def self.dasherize(a_string)
+        a_string.gsub(/(.)([A-Z])/, '\1-\2').downcase.gsub(/\W+/, '-')
       end
     end
   end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: deadly_serious
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Ronie Uliana
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-08-
+date: 2013-08-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -38,6 +38,20 @@ dependencies:
     - - '>='
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: json
   requirement: !ruby/object:Gem::Requirement
@@ -105,3 +119,4 @@ summary: Flow Based Programming engine that relies on named pipes and Linux proc
   with the Operating System, i.e., the S.O. is *part* of the program, it's not something
   *below* it.
 test_files: []
+has_rdoc: