qswarm 0.0.21 → 1.0.0

data/README.md CHANGED
@@ -1 +1,264 @@
1
- Work in progress
1
+ # Qswarm - stream processing for Ruby #
2
+
3
+ Qswarm is a Ruby DSL for manipulating real-time streams of messages. It defines three basic concepts - [connections](#connections), [sources](#sources), and [sinks](#sinks). Connections emit messages which sources can catch, sinking them to other connections. In this way you can construct data flows between systems and transform messages in-flight with Ruby.
4
+
5
+ Install with:
6
+
7
+ ```sh
8
+ gem install qswarm
9
+ ```
10
+
11
+ ## Agent ##
12
+
13
+ Use the `agent` command to wrap a set of DSL commands.
14
+
15
+ ```ruby
16
+ agent :bob do
17
+ ...
18
+ end
19
+ ```
20
+
21
+ Alternatively you could save each agent in a separate file and use a process manager such as [supervisord][], [god][], or [bluepill][] to manage the swarm.
22
+
23
+ ## Connections ##
24
+
25
+
26
+ Use `connect` to set up connections to services. Currently [Logger](#logger), [AMQP](#amqp), [XMPP](#xmpp), and [Twitter](#twitter) are supported. You can also pass an optional block which will be executed once the connection is set up.
27
+
28
+ ### Logger ###
29
+
30
+ ```ruby
31
+ connect :mylog,
32
+ :type => :logger,
33
+ :filename => 'foo.log'
34
+ ```
35
+
36
+ Logger is a very simple connection type which can be used to append a stream of messages to a file. It can only `sink` messages (i.e. it does not emit any data) and it accepts no arguments to `sink`.
37
+
38
+ ### AMQP ###
39
+
40
+ ```ruby
41
+ connect :messages,
42
+ :type => :amqp,
43
+ :uri => 'guest:guest@localhost:5672/',
44
+ :exchange_type => :headers,
45
+ :exchange_name => 'myexchange',
46
+ :exchange_args => { :durable => true },
47
+ :queue_args => { :auto_delete => true, :durable => true, :exclusive => true },
48
+ :subscribe_args => { :exclusive => false, :ack => false },
49
+ :bind_args => {},
50
+ :prefetch => 0,
51
+ :bind => 'foo.bar.#',
52
+ :format => :json
53
+ ```
54
+
55
+ This sets up an AMQP connection called `:messages` using the credentials in `:uri` (user:pass@host:port/vhost) and creates the exchange if it doesn't already exist (using `:exchange_args`). If a routing key is passed with `:bind` then a queue will be created, named with the dotted concatenation of the agent name and the connection name (e.g. `bob.messages`), and bound to the specified exchange. You can pass `:uniq => true` if you want a UUID appended to the queue name to make it unique for situations such as load balancing. At the moment you can't bind a queue to an exchange without specifying a routing key in `:bind`. You can pass configuration to the binding with `:bind_args`. Similarly, `:queue_args` allows you to pass configuration options to the queue creation. Defaults for the `*_args` options are as in the example.
56
+
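As a rough illustration of the queue naming described above, the sketch below builds a name from the agent and connection names. The helper `queue_name` is hypothetical and not part of Qswarm, and the dot used before the UUID is an assumption; the README only says that a UUID is appended.

```ruby
require 'securerandom'

# Hypothetical helper, not part of Qswarm: builds a queue name from the
# agent name and the connection name, joined with a dot.
def queue_name(agent, connection, uniq = false)
  name = "#{agent}.#{connection}"
  # Assumption: the UUID is appended after a dot; the real separator may differ.
  name += ".#{SecureRandom.uuid}" if uniq
  name
end

puts queue_name(:bob, :messages)        # bob.messages
puts queue_name(:bob, :messages, true)  # bob.messages.<random UUID>
```

With a unique name each agent process gets its own queue, whereas processes sharing the plain `bob.messages` name would consume from the same queue.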
57
+ The agent is automatically subscribed to the created queue and you can pass `:subscribe_args` to configure the subscription. If you specified `:ack` to be true then you can use `:prefetch` to specify how many messages you want to have from the queue at a time.
58
+
59
+ The `:format` argument determines what Qswarm does with the payloads it receives and how it transforms messages to be sent, see section [Payload](#payload).
60
+
61
+ AMQP sets the following headers for `source` to use as [guards](#filters-and-guards):
62
+
63
+ * `:routing_key`
64
+ * Any headers from a headers exchange will be passed verbatim
65
+
66
+ AMQP supports the following arguments to `sink`:
67
+
68
+ * `:routing_key` - the key to post the message under
69
+ * `:headers` - a Hash that will be used instead of the payload.headers for posting to headers exchanges
70
+
71
+ ### XMPP ###
72
+
73
+ ```ruby
74
+ connect :hipchat,
75
+ :type => :xmpp,
76
+ :jid => '54321_123456@chat.hipchat.com',
77
+ :real_name => 'My bot',
78
+ :channel => ['54321_lounge@conf.hipchat.com', '54321_chat@conf.hipchat.com'],
79
+ :password => 'foobar'
80
+ ```
81
+
82
+ The above example connects to an XMPP service called `:hipchat` using the JID and password provided. The `:real_name` will be used when joining groupchat rooms and for some services (like Hipchat) must exactly match your registered name, including case. The script will automatically join the groupchat channel(s) specified in `:channel` and will use this channel list for sinks which don't specify a channel destination. XMPP support is provided using the [Blather][] library, which means that you can include Blather DSL in connect's block to implement bot behaviours. This block will execute once the connection to the XMPP server has been established (when_ready).
83
+
84
+ Currently there is no support for using `source` with an XMPP connection (i.e. you can only talk, not listen), so the Blather DSL is your only option if you want interactivity at the moment.
85
+
86
+ XMPP supports the following arguments to `sink`:
87
+
88
+ * `:channel` - an Array or String of the groupchat channel(s) to sink the message to (will join if not already present)
89
+
90
+ ### Twitter ###
91
+
92
+ ```ruby
93
+ connect :tweetstream,
94
+ :type => :twitter,
95
+ :consumer_key => 'YOURKEYHERE',
96
+ :consumer_secret => 'YOURSECRETHERE',
97
+ :oauth_token => 'YOURTOKENHERE',
98
+ :oauth_token_secret => 'YOURSECRETHERE',
99
+ :track => {
100
+ :colours => ['red', 'green', 'blue'],
101
+ :feelings => ['happy', 'sad'],
102
+ :tech => ['ruby', 'python'],
103
+ },
104
+ :follow => {
105
+ :tech => [11987892]
106
+ },
107
+ :list => {
108
+ :flibbertigibbets => { 'Scobleizer' => 'most-influential-in-tech' }
109
+ }
110
+ ```
111
+
112
+ A Twitter connection uses the [Tweetstream][Tweetstream gem] and [Twitter][Twitter gem] gems and requires OAuth credentials which you can get from [dev.twitter.com][Twitter auth]. The Twitter API gives you three options: you can `:track` keywords in the global tweet stream, you can `:follow` the full stream of particular users (by Twitter ID, as Tweetstream doesn't let you use handles), or you can get updates from everyone included on a `:list` (this uses the REST API and the list is polled every minute).
113
+
114
+ You specify groups (:colours/:feelings/:tech/:flibbertigibbets above) to allow for easy filtering later on with [guards](#filters-and-guards). Twitter messages are always JSON so there is no `:format` option for connect.
115
+
116
+ Twitter would set the following example headers for `source` to use depending on the `:type` that generated the message.
117
+
118
+ * **:type** => 'track', **:group** => 'colours|feelings|tech', **:matches** => [red|green|blue|happy|sad|ruby|python]
119
+ * **:type** => 'follow', **:group** => 'tech', **:user_id** => 11987892
120
+ * **:type** => 'list', **:group** => 'flibbertigibbets', **:user_id** => 'Scobleizer', **:slug** => 'most-influential-in-tech'
121
+
122
+ There is currently no support to `sink` messages to a Twitter connection - i.e. you cannot Tweet.
123
+
124
+
125
+ ## Payload ##
126
+
127
+ Payload is a Hash containing the following data by default:
128
+
129
+ * payload.raw
130
+ * payload.data
131
+ * payload.format (set by arguments to the originating connect)
132
+ * payload.headers
133
+
134
+ When a message is received from a connection, it is accessible in the DSL with `payload`. The `:format` option in a `connect` declares the format of messages emitted by this connection and determines the processing that will be applied to the raw payload. If `:format` is `:json`, `payload.data` will be set to a Ruby Hash created by `JSON.parse(payload.raw, :symbolize_names => true)`. If `:format` is `:xml`, `payload.data` will be set to `Nokogiri::XML(payload.raw)`. If `:format` is `:raw`, `payload.data` will equal `payload.raw`.
135
+
136
+ Sinks can also set `:format` to define the reverse conversion, as messages are converted back from their Ruby objects for transmission. If no `:format` is supplied, a `sink` defaults to the value specified in the `connect`.
137
+
138
+ Some connection types add `payload.headers` which will contain a Ruby Hash of relevant data.
139
+
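The `:format` handling described in this section can be sketched roughly as follows. The helpers `decode` and `encode` are hypothetical names used only for illustration. Only `:json` and `:raw` are shown so that the sketch runs with the standard library; `:xml` would go through Nokogiri as described above.

```ruby
require 'json'

# Hypothetical helpers, not Qswarm's API.

# Incoming message: derive payload.data from payload.raw.
def decode(raw, format)
  case format
  when :json then JSON.parse(raw, :symbolize_names => true)
  else raw # :raw is passed through unchanged
  end
end

# Outgoing message: derive the wire representation from payload.data.
def encode(data, format)
  case format
  when :json then JSON.generate(data)
  else data # :raw is passed through unchanged
  end
end

data = decode('{"user":{"name":"Bob"},"text":"hello"}', :json)
puts data[:user][:name]   # Bob
puts encode(data, :json)  # {"user":{"name":"Bob"},"text":"hello"}
```

Because the JSON keys are symbolized on the way in, blocks can use `payload.data[:user][:name]` rather than string keys.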
140
+ ## Filters and Guards ##
141
+
142
+ You can use `before` and `after` as filters which will execute on receipt of a message from a specified connection. They will execute before or after your `source` commands. This example creates a plain text format of a tweet, expanding twitter handles, which can then be used in all `source` blocks.
143
+
144
+ ```ruby
145
+ before :tweetstream do
146
+ @pp = "<#{payload.data[:user][:name]}/#{payload.data[:user][:screen_name]}> #{payload.data[:text]}"
147
+ payload.data[:entities][:user_mentions].each do |u|
148
+ @pp.gsub!(/@#{u[:screen_name]}/,"<#{u[:name]}/#{u[:screen_name]}>")
149
+ end
150
+ end
151
+ ```
152
+
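The ordering of filters around `source` blocks can be pictured with a toy dispatcher. This is a simplified sketch and not Qswarm's implementation: `MiniAgent` is a hypothetical class, it handles a single connection, and it ignores guards entirely.

```ruby
# Toy dispatcher for illustration only: shows the order in which before
# filters, source blocks, and after filters run for one message.
class MiniAgent
  attr_reader :log

  def initialize
    @filters  = { :before => [], :after => [] }
    @handlers = []
    @log      = []
  end

  def before(&block); @filters[:before] << block; end
  def after(&block);  @filters[:after]  << block; end
  def source(&block); @handlers << block; end

  # Deliver one message: before filters, then every source, then after filters.
  def emit(message)
    (@filters[:before] + @handlers + @filters[:after]).each do |block|
      @log << block.call(message)
    end
  end
end

agent = MiniAgent.new
agent.source { |m| "source saw #{m}" }
agent.after  { |m| "after saw #{m}" }
agent.before { |m| "before saw #{m}" }
agent.emit('hello')
puts agent.log
```

Registration order does not matter here: the `before` block always runs first, which is why a value prepared in a `before` filter is available in every `source` block.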
153
+ Guards (shamelessly copied from [Blather][]) allow you to put conditional execution on `before`, `after`, and `source` blocks by filtering on data passed in `payload.headers`. Please note that header values are always Strings not Symbols.
154
+
155
+ The types of guards are:
156
+
157
+ ```ruby
158
+ # Hash with any value
159
+ # Equivalent to payload.headers[:body] == 'exit'
160
+ source :messages, :body => 'exit'
161
+
162
+ # Hash with regular expression
163
+ # Equivalent to payload.headers[:body].match /exit/
164
+ source :messages, :body => /exit/
165
+
166
+ # Hash with array
167
+ # Equivalent to ['gone', 'forbidden'].include?(payload.headers[:name])
168
+ source :messages, :name => ['gone', 'forbidden']
169
+
170
+ # Proc
171
+ # Calls the proc passing in payload.headers
172
+ # Checks that the ID is divisible by 3
173
+ source :messages, proc { |header| header[:id] % 3 == 0 }
174
+
175
+ # Array
176
+ # Using arrays with the previous types effectively turns the guard into
177
+ # an OR statement.
178
+ # Equivalent to payload.headers[:body] == 'foo' || payload.headers[:body] == 'baz'
179
+ source :messages, [{:body => 'foo'}, {:body => 'baz'}]
180
+ ```
181
+
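To make the guard semantics concrete, here is a simplified matcher written for illustration. `guards_pass?` is a hypothetical name and headers are modelled as a plain Hash; the sketch assumes that several guards on one block must all pass, while the alternatives inside an Array guard are ORed.

```ruby
# Simplified guard matcher for illustration only, not Qswarm's code.
def guards_pass?(guards, headers)
  return true if guards.empty?
  guards.all? do |guard|
    case guard
    when Hash
      # Every key in the Hash must match its header value.
      guard.all? do |key, test|
        value = headers[key]
        case test
        when Regexp then !(test =~ value.to_s).nil?
        when Array  then test.include?(value)
        else test == value
        end
      end
    when Array
      # Any one of the alternatives may match (OR).
      guard.any? { |alternative| guards_pass?([alternative], headers) }
    when Proc
      guard.call(headers) ? true : false
    else
      false
    end
  end
end

headers = { :body => 'exit now', :name => 'gone', :id => 9 }
puts guards_pass?([{ :body => /exit/ }], headers)                      # true
puts guards_pass?([{ :name => ['gone', 'forbidden'] }], headers)       # true
puts guards_pass?([proc { |h| h[:id] % 3 == 0 }], headers)             # true
puts guards_pass?([[{ :body => 'foo' }, { :body => 'baz' }]], headers) # false
```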
182
+ ## Sources ##
183
+
184
+ Sources listen to messages from connections and process them using their blocks, which are executed on receipt of a message. The headers available for [guards](#filters-and-guards) will be dependent on the connection that sent the message. All sources that match will receive the message and execute.
185
+
186
+ ```ruby
187
+ source :tweetstream, :type => 'follow', :user_id => 224662544 do
188
+ if payload.data[:text].match(/A14/)
189
+ ...
190
+ end
191
+ end
192
+ ```
193
+
194
+ The above will listen to messages from the `:tweetstream` connection. The [guards](#filters-and-guards) will eliminate any tweet which doesn't come from a `:follow` (rather than `:track` or `:list`) or whose user doesn't match the provided ID, which happens to be the Highways Agency Twitter account for East of England travel news. The pattern match for A14 is done in the block because the tweet text isn't available in the headers.
195
+
196
+ ## Sinks ##
197
+
198
+ Sinks publish the output of their blocks to connections. Here's an example of sinking a text message generated from a connection sending XML messages.
199
+
200
+ ```ruby
201
+ sink :hipchat,
202
+ :format => :xml,
203
+ :channel => ['12345_errors@conf.hipchat.com'] do
204
+
205
+ message = payload.data.at_xpath('error')['message']
206
+ "*** ERROR: " + message[0..140] + (message.size > 140 ? ' ... ' : ' ' ) + payload.headers.to_s
207
+ end
208
+ ```
209
+
210
+ In this case the `:format` argument isn't really needed because a return payload is specified by the block, but if the block were absent Qswarm would use it to know it needed to do a `payload.data.to_xml` before sending to the connection. You can have multiple sinks in a single source block that will all process the same payload.
211
+
212
+ ## Full Example ##
213
+
214
+ ```ruby
215
+ agent :bob do
216
+ connect :hipchat,
217
+ :type => :xmpp,
218
+ :jid => '54321_123456@chat.hipchat.com',
219
+ :channel => ['54321_lounge@conf.hipchat.com', '54321_chat@conf.hipchat.com'],
220
+ :password => 'foobar'
221
+
222
+ connect :tweetstream,
223
+ :type => :twitter,
224
+ :consumer_key => 'YOURKEYHERE',
225
+ :consumer_secret => 'YOURSECRETHERE',
226
+ :oauth_token => 'YOURTOKENHERE',
227
+ :oauth_token_secret => 'YOURSECRETHERE',
228
+ :track => {
229
+ :colours => ['red', 'green', 'blue'],
230
+ :feelings => ['happy', 'sad'],
231
+ :tech => ['ruby', 'python'],
232
+ },
233
+ :follow => {
234
+ :tech => [11987892]
235
+ },
236
+ :list => {
237
+ :flibbertigibbets => { 'Scobleizer' => 'most-influential-in-tech' }
238
+ }
239
+
240
+ source :tweetstream, :type => %w( follow list ) do
241
+ sink :hipchat,
242
+ :channel => '54321_influencers@conf.hipchat.com'
243
+ end
244
+
245
+ source :tweetstream, :group => 'tech' do
246
+ sink :hipchat,
247
+ :channel => '54321_cool_stuff@conf.hipchat.com'
248
+ end
249
+ end
250
+ ```
251
+
252
+ More examples can be found in these blog posts:
253
+
254
+ * [Stream processing in Ruby](http://ecafe.org/blog/2013/12/13/stream-processing-in-ruby.html)
255
+
256
+ ----
257
+
258
+ [supervisord]: http://supervisord.org
259
+ [god]: http://godrb.com
260
+ [bluepill]: https://github.com/bluepill-rb/bluepill
261
+ [Blather]: https://github.com/adhearsion/blather
262
+ [Tweetstream gem]: https://github.com/tweetstream/tweetstream
263
+ [Twitter gem]: https://github.com/sferik/twitter
264
+ [Twitter auth]: https://dev.twitter.com/docs/auth/tokens-devtwittercom
data/bin/qswarm CHANGED
@@ -3,14 +3,17 @@
3
3
  $stdout.sync = true
4
4
 
5
5
  require 'qswarm'
6
+ require 'trollop'
6
7
 
7
- usage = "#{$0} <configuration file>"
8
- abort usage unless config = ARGV.shift
8
+ opts = Trollop::options do
9
+ opt :debug, "Turn log level up to DEBUG"
10
+ end
9
11
 
10
- $graylog2_facility = config.split('.')[0]
11
- $graylog2_host = ARGV.shift || 'localhost'
12
+ abort "Usage: #{$0} <configuration file>" unless config = ARGV.shift
12
13
 
14
+ if opts[:debug]
15
+ Qswarm.logger.level = Logger::DEBUG
16
+ end
13
17
 
14
- swarm = Qswarm::Swarm.load config
15
- swarm.log.level = Logger::DEBUG
18
+ swarm = Qswarm::Swarm.new(config)
16
19
  swarm.run
@@ -1,3 +1,17 @@
1
- require "qswarm/version"
2
- require 'qswarm/loggable'
3
- require 'qswarm/swarm'
1
+ %w[
2
+ qswarm/version
3
+ qswarm/dsl
4
+ qswarm/swarm
5
+ qswarm/agent
6
+ qswarm/connection
7
+ logger
8
+ ].each { |r| require r }
9
+
10
+ module Qswarm
11
+ @@logger = nil
12
+ class << self
13
+ def logger
14
+ @@logger ||= Logger.new($stdout).tap {|logger| logger.level = Logger::INFO }
15
+ end
16
+ end
17
+ end
@@ -1,65 +1,185 @@
1
- require 'qswarm/broker'
2
- require 'qswarm/listener'
1
+ require 'ostruct'
2
+ require 'json'
3
+ require 'nokogiri'
3
4
 
4
5
  module Qswarm
5
6
  class Agent
6
- include Qswarm::Loggable
7
+ include Qswarm::DSL
7
8
 
8
9
  attr_reader :swarm, :name
10
+ dsl :connect, :before, :after, :source, :sink, :emit, :agent, :payload
9
11
 
10
- def initialize(swarm, name, args, &block)
12
+ def initialize(swarm, name, args = nil, &block)
11
13
  @swarm = swarm
12
- @name = name.to_s
13
- @brokers = {}
14
- @listeners = []
14
+ @name = name
15
+ @clients = {}
15
16
  @args = args
17
+ @filters = {}
18
+ @handlers = {}
19
+ @payload = nil
16
20
 
17
- unless args.nil?
18
- case args.delete :type
19
- when :esper
20
- require 'java'
21
-
22
- require 'esper-4.5.0/esper-4.5.0.jar'
23
- require 'esper-4.5.0/esper/lib/commons-logging-1.1.1.jar'
24
- require 'esper-4.5.0/esper/lib/antlr-runtime-3.2.jar'
25
- require 'esper-4.5.0/esper/lib/cglib-nodep-2.2.jar'
26
- require 'esper-4.5.0/esper/lib/log4j-1.2.16.jar'
27
-
28
- include_class 'com.espertech.esper.client.EPRuntime'
29
- include_class 'com.espertech.esper.client.EPServiceProviderManager'
30
- include_class 'com.espertech.esper.client.EPServiceProvider'
31
- include_class 'com.espertech.esper.client.EPStatement'
32
-
33
- include_class 'com.espertech.esper.client.UpdateListener'
34
- include_class 'com.espertech.esper.client.EventBean'
35
- include_class 'org.apache.commons.logging.Log'
36
- include_class 'org.apache.commons.logging.LogFactory'
37
- end
21
+ dsl_call(&block)
22
+ end
23
+
24
+ def agent
25
+ self
26
+ end
27
+
28
+ def payload
29
+ @payload
30
+ end
31
+
32
+ # Connects to a data stream
33
+ #
34
+ # @param name [String] the name of the connection
35
+ # @param args [Hash] arguments for the connection
36
+ # @param &block [Proc] a block which is passed to the client constructor
37
+ def connect(name, args = nil, &block)
38
+ raise "Connection '#{name.inspect}' is already registered" if @clients[name]
39
+
40
+ if !args.nil? && !args[:type].nil?
41
+ Qswarm.logger.info "[#{@name.inspect}] Registering #{args[:type].inspect} connection #{name.inspect}"
42
+ require "qswarm/connections/#{args[:type].downcase}"
43
+ @clients[name] = eval("Qswarm::Connections::#{args[:type].capitalize}").new(self, name, args, &block)
44
+ else
45
+ Qswarm.logger.info "[#{@name.inspect}] Registering default connection #{name.inspect}"
46
+ @clients[name] = Qswarm::Connection.new(self, name, args, &block)
38
47
  end
48
+ end
49
+
50
+ def before(connection, *guards, &block)
51
+ Qswarm.logger.info "[#{@name.inspect}] Registering :before filter for #{connection.inspect}/#{guards.inspect}"
39
52
 
40
- self.instance_eval(&block)
53
+ [*connection].each do |c|
54
+ register_filter :before, c, *guards, &block
55
+ end
41
56
  end
42
57
 
43
- def listen(name, args = nil, &block)
44
- logger.info "Registering listener: #{name}"
45
- @listeners << Qswarm::Listener.new(self, name, args, &block)
58
+ def after(connection, *guards, &block)
59
+ Qswarm.logger.info "[#{@name.inspect}] Registering :after filter for #{connection.inspect}/#{guards.inspect}"
60
+
61
+ [*connection].each do |c|
62
+ register_filter :after, c, *guards, &block
63
+ end
46
64
  end
47
65
 
48
- def broker(name, &block)
49
- logger.info "Registering broker: #{name}"
50
- @brokers[name] = Qswarm::Broker.new(name, &block)
66
+ def source(connection, *guards, &block)
67
+ Qswarm.logger.info "[#{@name.inspect}] Registering handler for #{connection.inspect}/#{guards.inspect}"
68
+
69
+ [*connection].each do |c|
70
+ register_handler c, *guards, &block
71
+ end
51
72
  end
52
73
 
53
- def get_broker(name)
54
- @brokers[name] || @swarm.get_broker(name)
74
+ def sink(connection, args = nil, &block)
75
+ Qswarm.logger.debug "[#{@name.inspect}] Sink #{connection.inspect} received #{@payload.inspect}"
76
+
77
+ # Payload from DSL parent context overridden by arguments and block locally to sink
78
+ p = @payload.dup
79
+ p.data = args[:data] unless args.nil? || args[:data].nil?
80
+ p.data = dsl_call(&block) if block_given?
81
+
82
+ [*connection].each do |c|
83
+ # Update raw from the current data
84
+ p.raw = case args[:format].nil? ? @clients[c].format : args[:format]
85
+ when :json
86
+ JSON.generate(p.data)
87
+ when :xml
88
+ p.data.to_xml
89
+ else # raw
90
+ p.data
91
+ end unless args.nil?
92
+
93
+ @clients[c].sink(args, p)
94
+ end
55
95
  end
56
96
 
57
- def bind
58
- logger.info "Binding to exchange"
97
+ def emit(connection, args = nil, &block)
98
+ # Need to set @payload for access by the dsl_call when handlers are run
99
+ # Overwriting the parent's @payload here will break when nesting emit in source, which will lose the payload originating from the connection
100
+ @payload = args[:payload] unless args.nil? || args[:payload].nil?
101
+ @payload = dsl_call(&block) if block_given?
102
+
103
+ Qswarm.logger.debug "[#{@name.inspect}] Connection #{connection.inspect} emitting #{@payload.inspect}"
104
+
105
+ @payload.data = case payload.format
106
+ when :json
107
+ JSON.parse(@payload.raw, :symbolize_names => true)
108
+ when :xml
109
+ Nokogiri::XML(@payload.raw)
110
+ else # :raw
111
+ @payload.raw
112
+ end
113
+
114
+ [*connection].each do |c|
115
+ run_filters :before, c
116
+ call_handlers c
117
+ run_filters :after, c
118
+ end
59
119
  end
60
120
 
61
121
  def run
62
- @listeners.map { |l| l.run }
122
+ @clients.each { |name, client| client.run }
123
+ end
124
+
125
+ private
126
+
127
+ def run_filters(type, connection)
128
+ return if @filters[type].nil?
129
+ @filters[type].each do |guards, client, block|
130
+ next if client != connection
131
+ dsl_call(&block) unless guarded?(guards, OpenStruct.new(@payload.headers))
132
+ end
133
+ end
134
+
135
+ def call_handlers(connection)
136
+ return if !handlers = @handlers[connection]
137
+ handlers.each do |guards, block|
138
+ if !guarded?(guards, OpenStruct.new(@payload.headers))
139
+ Qswarm.logger.debug "[#{@name.inspect}] Source #{connection.inspect} received #{@payload.inspect}"
140
+ dsl_call(&block)
141
+ end
142
+ end
143
+ end
144
+
145
+ def register_filter(type, client, *guards, &block)
146
+ raise "Invalid filter: #{type}. Must be :before or :after" unless [:before, :after].include? type
147
+ @filters[type] ||= []
148
+ @filters[type] << [guards, client, block]
149
+ end
150
+
151
+ def register_handler(client, *guards, &block)
152
+ @handlers[client] ||= []
153
+ @handlers[client] << [guards, block]
63
154
  end
155
+
156
+ def guarded?(guards, data)
157
+ return false if guards.nil? || guards.empty?
158
+ guards.find do |guard|
159
+ case guard
160
+ when Symbol
161
+ !data.__send__(guard)
162
+ when Array
163
+ # return FALSE if any item is TRUE
164
+ !guard.detect { |condition| !guarded?([condition], data) }
165
+ when Hash
166
+ # return FALSE unless any inequality is found
167
+ guard.find do |method, test|
168
+ value = data.__send__(method)
169
+ # last_match is the only method found unique to Regexp classes
170
+ if test.class.respond_to?(:last_match)
171
+ !(test =~ value.to_s)
172
+ elsif test.is_a?(Array)
173
+ !test.include? value
174
+ else
175
+ test != value
176
+ end
177
+ end
178
+ when Proc
179
+ !guard.call(data)
180
+ end
181
+ end
182
+ end
183
+
64
184
  end
65
185
  end