fbp 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f960d9070fecbed44e1f00cf0101600ba45ae40e
4
+ data.tar.gz: 421aaed3fb2d274f77654b085362d9078fb6aff8
5
+ SHA512:
6
+ metadata.gz: 3be973bebeaca66839b8e3927895c2e872e547a83abf35615a39b4ab0b8f2a5ea6a56f633b111a923e7d96fa5ea4f61d475e8eaaa5a49af3ccca7604f0af334b
7
+ data.tar.gz: 940deecb561721366fed0ecaaa39af8cf95c7fd681441b8f3cf999b0aff1ae05d7a0614c1bb09431d8a7fba0c0ae4ec458fd8031dd50619b685a973a2dfbd731
data/lib/fbp.rb ADDED
@@ -0,0 +1,464 @@
1
+ require "fbp/version"
2
+ require "fbp/fpb-thread-pool"
3
+
4
+ =begin rdoc
5
+ == Description
6
+ The Fbp module provides support for Flow Based Programming for the Ruby
7
+ language.
8
+
9
+ Flow Based Programming is described in the Book:
10
+ <b>Flow-Based Programming, 2nd Edition: A New Approach to Application Development</b>
11
+ Written by J. Paul Morrison
12
+ ISBN-10: 1451542321
13
+ ISBN-13: 978-1451542325
14
+
15
+ Documentation about Flow Based Programming can also be found
16
+ at http://www.jpaulmorrison.com/fbp/
17
+
18
+ == Discussion
19
+ The main idea behind Flow Based Programming is that data should flow from
20
+ one asynchronous processing unit to another asynchronous processing unit.
21
+ These asynchronous processing units together would form a network that would
22
+ constitute an application. One of the many benefits of this programming
23
+ model is that it makes it much easier to create multi-threaded and multi-process
24
+ applications. It also fosters the creation of small reusable components that
25
+ may be used in multiple applications. J. Paul Morrison does a great job of
26
+ describing the benefits of this programming model and I highly recommend reading
27
+ his book on the subject.
28
+
29
+ Recently developers have rediscovered Flow Based Programming as a way to deal
30
+ with managing multi-threaded applications and to deal with the complexity of
31
+ web development. NoFlo is a project that provides a Flow Based Programming
32
+ system for JavaScript.
33
+
34
+ While I have used the precepts of Flow Based Programming over the years, there
35
+ was not a unified way to do Flow Based Programming. Given that Ruby is one of the
36
+ main development languages for web development, it seemed the time was ripe for creating
37
+ a Flow Based Programming system for Ruby.
38
+
39
+ == Concepts
40
+ One of the basic concepts of FBP is an Information Packet (IP). An IP defines the data
41
+ that flows between asynchronous processing units. For Ruby Fbp an IP is a hash object.
42
+ This allows for a common data type with infinite variations.
43
+
44
+ For the Ruby implementation, each asynchronous unit is implemented by a Node object.
45
+ Each Node object runs on a thread to ensure asynchronous execution. To keep the
46
+ number of threads to a reasonable amount, Node objects execute on a thread in a
47
+ thread pool managed by the Node_Pool class, which in turn uses the Pool class developed
48
+ by Kim Burgestrand (kim@burgestrand.se).
49
+
50
+ Nodes define an input queue and at least one output channel. The input queue of a
51
+ node uses a Queue object to manage the IPs that come into the node. While every node
52
+ is run on a thread, no processing occurs on that thread until an item has been pushed
53
+ into the input queue of a node. This is possible because the Queue class will block
54
+ until it can get an item out of the queue.
55
+
56
+ Nodes may need to be parameterized. In J. Paul Morrison's book, the data used to do
57
+ parameterization is called an Initial Information Packet (IIP). For Ruby an IIP is a
58
+ hash object. A Node typically takes an IIP as an optional argument when creating a
59
+ new instance of an object. An IIP may also come to a node from its input queue;
60
+ doing so means that the parameterization of a node can come from an upstream node.
61
+
62
+ By default Node instances have a single output channel named :output. For many Nodes
63
+ this is sufficient. There are however some nodes that need to have multiple output
64
+ channels. An example would be a Selector_node. A Selector_node takes IPs and compares
65
+ a value in the IP with a preset value. Given the selector record, the IP either matches
66
+ the criteria or not. Whether or not the incoming IP matches the preset value
67
+ determines which output channel receives it. So while the default is that there is only a single
68
+ output channel, Nodes may have any number of output channels. For Nodes that do have
69
+ multiple output channels, it is possible to write to every output channel by
70
+ specifying the output channel :all.
71
+
72
+ == Current Implementation
73
+ The current release must be considered a proof of concept. Only a handful of Nodes have
74
+ been created and many more would need to be created before significant applications
75
+ can be made. Over time more nodes will be developed but there are enough nodes to
76
+ start looking at Flow Based Programming for Ruby.
77
+ =end
78
+ module Fbp
79
+
80
+ protected
81
+ def self.create(name) #:nodoc:
82
+ class_object = nil
83
+
84
+ class_object = name if name.is_a? Class
85
+ if class_object.nil?
86
+ begin
87
+ name_str = name.to_s
88
+ class_object = Kernel.const_get(name_str)
89
+ rescue
90
+ class_object = nil
91
+ end
92
+ return nil if class_object.nil?
93
+ end
94
+
95
+ instance = nil
96
+ begin
97
+ instance = class_object.new()
98
+ rescue
99
+ instance = nil
100
+ end
101
+
102
+ instance
103
+ end
104
+
105
+ =begin rdoc
106
+ === Description
107
+ The Node class defines the base class for all Ruby Fbp Nodes.
108
+ It defines the basic operations of all Nodes.
109
+ === Discussion
110
+ When a new Node subclass is made it will need to at least
111
+ override the do_node_work method. If the subclass requires an IIP
112
+ the subclass will also need to override the is_ready_to_run? method.
113
+
114
+ The Node class basic behavior is to write its incoming IPs to its
115
+ single output channel.
116
+
117
+ === Node object life cycle
118
+ When a node is created it is quiescent. This is necessary as most nodes need
119
+ other nodes in a network of nodes before they are useful. Once all of the
120
+ nodes have been created that are needed for an application, they need to
121
+ be placed into a network using the register_for_output_from_node method
122
+ to have the output of one node become the input to another node. Once the
123
+ network has been made, the first node in the network needs to be executed.
124
+ Calling the execute method on the first node will ensure that all nodes
125
+ in the network will also be executed. When a node is executed, it will
126
+ be sent an IP {:start => true}. This means that when a new node subclass
127
+ is made it needs to know that it will receive this IP in its
128
+ do_node_work method. Once a node is executing, it will block until
129
+ an IP is pushed into the input queue of the node unless the node has the
130
+ :requires_input option set to false. If that is the case the node will not
131
+ block but will send an IP to the node of the form {:continue => true} for
132
+ the node to process. The ability not to block is useful for nodes like
133
+ the Test_file_reader_node node that reads input from a file and
134
+ creates IPs for downstream nodes to process.
135
+
136
+ Each subclass of the Node class must override the do_node_work method. If the node has
137
+ not completed its work then the do_node_work method should return true.
138
+ If the node has completed its work it should return false. When a node
139
+ has completed its work, it will send an IP of {:completed => true}.
140
+ This :completed IP is handled in a special way. It is pushed into the
141
+ output_queue attribute of the Node object. This allows the wait_until_completed
142
+ method to block until all of the work of a node is completed. The {:completed => true}
143
+ IP is also sent to every output channel for a node telling all of the
144
+ downstream nodes that their upstream node has finished.
145
+
146
+ === Example Usage
147
+ # Need to require the Fbp gem
148
+ require 'fbp'
149
+ # First Set the number of threads that should be used for this solution
150
+ Fbp::num_threads = 5
151
+ # Make the thread pool that will be used to run the nodes in the application
152
+ Fbp::make_pool
153
+ # Make the nodes needed for the application
154
+ read_node = Fbp::Test_file_reader_node.new(File.expand_path('~/input.txt'))
155
+ write_node = Fbp::Text_file_writer_node.new(File.expand_path('~/output.txt'))
156
+ # Hook up the nodes into the network needed for the application
157
+ write_node.register_for_output_from_node(read_node)
158
+ # Execute the first node in the network
159
+ read_node.execute
160
+ # Wait for the network to complete its work by checking to see if the last node
161
+ # in the network has completed
162
+ write_node.wait_until_completed
163
+ # With the work completed shutdown the thread pool
164
+ Fbp::shutdown
165
+ =end
166
+ class Node
167
+
168
+ # The output attribute is a Hash that maps each output channel
169
+ # name to the Array of Nodes registered on that channel
170
+ attr_accessor :output
171
+
172
+ # The executing attribute specifies if a Node is executing.
173
+ attr_reader :executing
174
+
175
+ # The options attribute holds the IIP data for a Node.
176
+ attr_reader :options
177
+
178
+ protected
179
+ # The input attribute is the Queue instance that holds incoming
180
+ # IPs for this Node.
181
+ attr_accessor :input
182
+
183
+ public
184
+ def initialize() #:nodoc:
185
+ # Initialize the input and output queues
186
+ @input_queue = Queue.new
187
+ @output_queue = Queue.new
188
+
189
+ channel_output = Array.new
190
+ @output = {:output => channel_output}
191
+
192
+ @mutex = Mutex.new
193
+
194
+ # Provide for parameterization for a node
195
+ @options = Hash.new
196
+ @options[:output] = :all
197
+ @options[:requires_input] = true
198
+
199
+ # Initialize the state variables for a node
200
+ @executing = false
201
+ @in_transaction = false
202
+ @continue_processing = true
203
+
204
+ # Provide for transaction support
205
+ @transactions_must_queue = true
206
+ @transaction_queue = Array.new
207
+ end
208
+ =begin rdoc
209
+ The do_node_work method is where the work of a Node is done.
210
+ Each subclass of the Node object will need to override
211
+ this method to implement the behavior of the node. The
212
+ base behavior of this method is to write its incoming IPs
213
+ to its single output channel.
214
+ =end
215
+ def do_node_work(args)
216
+ write_to_output(args)
217
+ return false if args.has_key?(:stop)
218
+ true
219
+ end
220
+
221
+ =begin rdoc
222
+ The is_ready_to_run? method is used to ensure that a Node has
223
+ received its required IIP before it is allowed to process incoming
224
+ IPs. Each subclass of the Node class that needs an IIP before it
225
+ can process IPs needs to override this method and check to see if
226
+ all of the required options have been received. If all of the
227
+ required options have been set then this method should return true
228
+ otherwise it should return false. The default behavior of the Node
229
+ class is to simply return true.
230
+ =end
231
+ def is_ready_to_run?()
232
+ true
233
+ end
234
+
235
+ =begin rdoc
236
+ The execute method will start the execution of this node on one of
237
+ the threads in the thread pool. It will block until data
238
+ comes into the input queue unless the node has the :requires_input
239
+ option set to false. If the :requires_input option is set to false then
240
+ the input queue will be checked but if there are no IPs to process in the queue
241
+ an IP will be created of the form {:continue => true} and that will be sent to
242
+ the node for processing.
243
+ =end
244
+ def execute
245
+ return if @executing || !Fbp.has_pool
246
+
247
+ @output.each do |key, channel|
248
+ next if channel.nil?
249
+ channel.each {|n| n.execute if !n.executing}
250
+ end
251
+
252
+ write_to_input({:start => true})
253
+ @executing = true
254
+ Fbp.schedule do
255
+ while @continue_processing
256
+ @mutex.synchronize do
257
+ if @options[:requires_input]
258
+ @continue_processing = should_continue?(@input_queue.pop)
259
+ else
260
+ begin
261
+ ip = @input_queue.pop(true) # non blocking
262
+ rescue
263
+ ip = nil
264
+ end
265
+ ip = ip.nil? ? {:continue => true} : ip
266
+ @continue_processing = should_continue?(ip)
267
+ end
268
+ end
269
+ end
270
+ @executing = false
271
+ @output.each_key {|channel| write_to_output({:completed => true}, channel)}
272
+ end
273
+ end
274
+
275
+ =begin rdoc
276
+ The write_to_input method will push an IP onto the input queue of a Node.
277
+ =end
278
+ def write_to_input(obj)
279
+ @input_queue << obj if !obj.nil?
280
+ end
281
+
282
+ =begin rdoc
283
+ The write_to_output method will write an IP into the input queue of all of
284
+ the nodes in the specified output channel. The default output
285
+ channel is the :all channel which writes the IP to every channel.
286
+ =end
287
+ def write_to_output(result, output_channel = :all)
288
+ return if result.nil?
289
+ @output_queue << result if result.has_key? :completed
290
+
291
+ # Get the channels that will be written to
292
+ channels = nil
293
+ if :all == output_channel
294
+ channels = @output.keys
295
+ else
296
+ channels = [output_channel]
297
+ end
298
+
299
+ # With the channels chosen, iterate over each node and write to
300
+ # that node's input queue
301
+ channels.each do |channel_key|
302
+ next if channel_key.nil?
303
+ channel_array = @output[channel_key]
304
+ next if channel_array.nil?
305
+ channel_array.each do |a_node|
306
+ next if a_node.nil?
307
+ a_node.write_to_input(result)
308
+ end
309
+ end
310
+
311
+ end
312
+ =begin rdoc
313
+ The register_for_output_from_node method is how networks of nodes are created.
314
+ A downstream node will register with an upstream node for the upstream
315
+ node's output on a specific output channel. The default output channel is the
316
+ :output channel. Calling this method will place the calling object into the array
317
+ of nodes in the upstream node's output channel. When the upstream node
318
+ writes out its output, it will write it to all of the input queues of all of
319
+ the downstream nodes that have registered for the output of the upstream node.
320
+ =end
321
+ def register_for_output_from_node(node, output_channel = :output)
322
+ return if node.nil?
323
+ channel = node.output[output_channel]
324
+ if channel.nil?
325
+ channel = Array.new
326
+ node.output[output_channel] = channel
327
+ end
328
+ node.output[output_channel] << self if !node.output[output_channel].include? self
329
+ end
330
+
331
+ =begin rdoc
332
+ The unregister_for_output_from_node method will remove this node from
333
+ an output channel of an upstream node.
334
+ =end
335
+ def unregister_for_output_from_node(node, output_channel = :output)
336
+ return if node.nil?
337
+ channel = node.output[output_channel]
338
+ node.output[output_channel] = node.output[output_channel] - [self] if !channel.nil?
339
+ end
340
+
341
+ def merge_options!(options) #:nodoc:
342
+ @options.merge!(options)
343
+ end
344
+
345
+ def set_option (key, value) #:nodoc:
346
+ return if key.nil?
347
+ @options[key] = value
348
+ end
349
+
350
+ def clean_option(key) #:nodoc:
351
+ return if key.nil?
352
+ @options.delete key
353
+ end
354
+
355
+ =begin rdoc
356
+ The stop method will send an IP to this node of the form {:stop => true}.
357
+ The default behavior is to stop the execution of the node,
358
+ though subclasses of the Node class could change that behavior.
359
+ =end
360
+ def stop
361
+ write_to_input({:stop => true})
362
+ end
363
+
364
+ =begin rdoc
365
+ The wait_until_completed method provides a way to wait until a node has completed
366
+ its work. This is needed as all of the work of the nodes in a network of
367
+ nodes is done asynchronously. Typically this is called on the last node
368
+ in a network of nodes to ensure that all processing has completed. This
369
+ method works by waiting on the output_queue which is only written to when
370
+ all work of the node has completed. The Queue instance will cause the
371
+ calling thread to block until an IP has been placed into the output
372
+ queue.
373
+ =end
374
+ def wait_until_completed
375
+ @output_queue.pop
376
+ end
377
+
378
+ protected
379
+ def should_continue?(args) #:nodoc:
380
+ # If the IP contains the :stop key then stop execution
381
+ return false if args.has_key? :stop
382
+
383
+ # Options (IIP) support.
384
+ # If this node requires an IIP before it can process IPs
385
+ # check to see if all of the required IIPs have been received
386
+ # by the node. This is done by calling is_ready_to_run?
387
+ # Each node that requires IIP(s) before it can process IPs must
388
+ # re-implement the is_ready_to_run? method and have that
389
+ # method determine if all required options have been set.
390
+ #
391
+ # If an IP is sent before all of the required options then the
392
+ # node is put into a transaction and all of the IPs will be cached
393
+ # in order until all of the required options have been set. Once
394
+ # all of the required options have been set then the end
395
+ # of transaction is sent and the cached IPs will be processed.
396
+ if !is_ready_to_run?()
397
+ if args.has_key? :option
398
+ @options.merge!(args[:option])
399
+ args.delete :option
400
+ end
401
+ args[:end_transaction] = true if is_ready_to_run? && @in_transaction
402
+ args[:begin_transaction] = true if !is_ready_to_run? && !@in_transaction
403
+ end
404
+
405
+ # If the IP signals the beginning of a transaction, ensure that this node
406
+ # is not already in a transaction. Nested transactions are not supported.
407
+ # Also if the node is marked as a one shot that is incompatible with
408
+ # being in a transaction
409
+ return false if args.has_key?(:begin_transaction) && @in_transaction
410
+
411
+ # If the IP signals the end of a transaction and the node is not in a
412
+ # transaction, this is a programming error
413
+ return false if args.has_key?(:end_transaction) && !@in_transaction
414
+
415
+ # If the IP signals the beginning of a transaction then mark this node
416
+ # as being in a transaction and if there is no other data in the IP
417
+ # then return signaling that the node should continue
418
+ if args.has_key?(:begin_transaction)
419
+ @in_transaction = true
420
+ args.delete :begin_transaction
421
+ return true if args.empty?
422
+ end
423
+
424
+ # If the IP signals the end of a transaction then see if there is any other
425
+ # data in the IP. If there is any additional data add it to the
426
+ # transaction queue before processing the transaction. Once processed mark
427
+ # this node as not being in a transaction
428
+ if args.has_key?(:end_transaction) || (args.has_key?(:completed) && !@transaction_queue.empty?)
429
+ args.delete :end_transaction
430
+ @transaction_queue << args if !args.empty?
431
+ args.clear
432
+ args[:ips] = @transaction_queue.clone if !@transaction_queue.empty?
433
+ @transaction_queue.clear
434
+ @in_transaction = false
435
+ return true if !args.has_key? :ips
436
+ end
437
+
438
+ # If the node is in a transaction add the IP to the transaction queue,
439
+ # otherwise tell the node to process the IP
440
+ result = true
441
+
442
+ if @in_transaction
443
+ @transaction_queue << args
444
+ else
445
+ result = do_node_work(args)
446
+ end
447
+
448
+ result
449
+ end
450
+ end
451
+ end
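The transaction logic in should_continue? caches IPs that arrive before a node has its required options and then releases them together under the :ips key. A minimal standalone sketch of that idea follows. The BufferingStage class, its :path option, and the receive method are hypothetical names used only for illustration; they are not part of the gem, and the sketch leaves out threads and the explicit :begin_transaction and :end_transaction markers.

```ruby
# Hypothetical stand-in for a node that needs a :path option before it can work.
class BufferingStage
  # Everything that would have been handed to do_node_work, in order.
  attr_reader :delivered

  def initialize
    @options = {}
    @pending = []    # IPs cached while the required option is missing
    @delivered = []
  end

  # Plays the role that is_ready_to_run? plays in the Node class.
  def ready?
    @options.key?(:path)
  end

  def receive(ip)
    ip = ip.dup
    # An :option entry parameterizes the stage and is removed from the IP.
    @options.merge!(ip.delete(:option)) if ip.key?(:option)

    if !ready?
      # Still waiting for the option: cache the IP in arrival order.
      @pending << ip unless ip.empty?
    elsif @pending.empty?
      # Ready and nothing cached: pass the IP straight through.
      @delivered << ip unless ip.empty?
    else
      # Just became ready: release the cached IPs as one batch.
      @pending << ip unless ip.empty?
      @delivered << { ips: @pending.dup }
      @pending.clear
    end
  end
end

stage = BufferingStage.new
stage.receive({ line: "a" })
stage.receive({ line: "b" })
stage.receive({ option: { path: "/tmp/out.txt" } })
stage.receive({ line: "c" })
p stage.delivered
```

In this run the first two IPs are held back, released as a single batch once the :path option arrives, and the third IP is delivered on its own.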
452
+
453
+ require "fbp/selector-node"
454
+ require "fbp/concatenate-node"
455
+ require "fbp/assign-node"
456
+ require "fbp/encode-node"
457
+ require "fbp/decode-node"
458
+ require "fbp/sort-node"
459
+ require "fbp/counter-node"
460
+ require "fbp/text_file_reader_node"
461
+ require "fbp/text_file_writer_node"
462
+ require "fbp/flow-node"
463
+ require "fbp/aggregator-node"
464
+
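The Node life cycle described in the rdoc above (a blocking input queue, a work method that subclasses override, and a completion signal that wait_until_completed blocks on) can be sketched without the gem. The MiniNode, Upcase, and Collector classes below are hypothetical and deliberately simplified: each node gets its own thread rather than a pooled one, and the {:stop => true} IP is simply forwarded downstream, whereas the gem sends {:completed => true} to downstream nodes.

```ruby
require "thread" # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical miniature of the Node idea; not the gem's own class.
class MiniNode
  def initialize
    @input = Queue.new       # incoming IPs
    @done = Queue.new        # receives one item when the node finishes
    @downstream = []
  end

  # Register another node to receive this node's output.
  def feed(other)
    @downstream << other
    self
  end

  def push(ip)
    @input << ip
  end

  # Subclasses override this and return the IP to forward.
  def work(ip)
    ip
  end

  def start
    Thread.new do
      loop do
        ip = @input.pop      # blocks until an IP arrives
        if ip.key?(:stop)
          @downstream.each { |node| node.push(ip) }
          break
        end
        out = work(ip)
        @downstream.each { |node| node.push(out) }
      end
      @done << { completed: true }
    end
    self
  end

  def wait_until_completed
    @done.pop                # blocks until the node's loop has ended
  end
end

class Upcase < MiniNode
  def work(ip)
    { text: ip[:text].upcase }
  end
end

class Collector < MiniNode
  attr_reader :seen

  def initialize
    super
    @seen = []
  end

  def work(ip)
    @seen << ip[:text]
    ip
  end
end

upcase = Upcase.new
collector = Collector.new
upcase.feed(collector)
collector.start
upcase.start
upcase.push({ text: "hello" })
upcase.push({ text: "world" })
upcase.push({ stop: true })
collector.wait_until_completed
p collector.seen # prints ["HELLO", "WORLD"]
```

Waiting on the last node is enough here because the stop IP travels through the queues behind the data IPs, so the collector cannot finish before it has seen everything upstream produced.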