flamingo 0.1 → 0.2.0
- data/README.md +115 -20
- data/examples/Rakefile +15 -8
- data/examples/flamingo.yml +19 -7
- data/lib/flamingo.rb +9 -0
- data/lib/flamingo/daemon/child_process.rb +15 -0
- data/lib/flamingo/daemon/flamingod.rb +54 -7
- data/lib/flamingo/daemon/trap_keeper.rb +22 -0
- data/lib/flamingo/daemon/wader_process.rb +51 -2
- data/lib/flamingo/dispatch_event.rb +17 -16
- data/lib/flamingo/stream_params.rb +15 -16
- data/lib/flamingo/subscription.rb +8 -11
- data/lib/flamingo/version.rb +1 -1
- data/lib/flamingo/wader.rb +127 -35
- metadata +22 -4
data/README.md
CHANGED
@@ -2,9 +2,13 @@ Flamingo
|
|
2
2
|
========
|
3
3
|
Flamingo is a resque-based system for handling the Twitter Streaming API.
|
4
4
|
|
5
|
-
This is *early alpha* code.
|
6
|
-
|
7
|
-
|
5
|
+
This is *early alpha* code. Parts of it are graceful, like the curve of a
|
6
|
+
flamingo's neck: it capably processes the multiple high-volume sample and filter
|
7
|
+
streams that power tweetreach.com. Many parts of it are ungainly, like a
|
8
|
+
flamingo's knees: this is early code, and it will change rapidly. And parts of
|
9
|
+
it are mired in muck, like a flamingo's feet: it has too few tests, and surely
|
10
|
+
some configuration we forgot to tell you about. That said, it does work: give it
|
11
|
+
a try if you have the need.
|
8
12
|
|
9
13
|
Dependencies
|
10
14
|
------------
|
@@ -13,6 +17,8 @@ Dependencies
|
|
13
17
|
* sinatra
|
14
18
|
* twitter-stream
|
15
19
|
* yajl-ruby
|
20
|
+
* active_support
|
21
|
+
* redis-namespace
|
16
22
|
|
17
23
|
By default, the `resque` gem installs the latest 2.x `redis` gem, so if
|
18
24
|
you are using Redis 1.x, you may want to swap it out.
|
@@ -30,44 +36,133 @@ Getting Started
|
|
30
36
|
1. Install the gem
|
31
37
|
sudo gem install flamingo
|
32
38
|
|
33
|
-
2. Create a config file (see `examples/flamingo.yml`) with at least a username
|
39
|
+
2. Create a config file (see `examples/flamingo.yml`) with at least a username
|
40
|
+
and password. You can store this in ~/flamingo.yml or specify it on the
|
41
|
+
commandline (see below)
|
34
42
|
|
35
|
-
username:
|
43
|
+
username: SCREEN_NAME
|
36
44
|
password: PASSWORD
|
37
|
-
|
45
|
+
|
46
|
+
# should be "filter" or "sample", probably.
|
47
|
+
# Set the track terms for "filter" from the flamingo console (see README)
|
48
|
+
stream: filter
|
49
|
+
|
38
50
|
logging:
|
39
|
-
dest:
|
40
|
-
level:
|
51
|
+
dest: /tmp/flamingo.log
|
52
|
+
level: DEBUG
|
53
|
+
|
54
|
+
redis:
|
55
|
+
host: 0.0.0.0:6379
|
56
|
+
web:
|
57
|
+
host: 0.0.0.0:4711
|
41
58
|
|
42
59
|
`LOGLEVEL` is one of the following:
|
43
60
|
`DEBUG` < `INFO` < `WARN` < `ERROR` < `FATAL` < `UNKNOWN`
|
44
61
|
|
45
|
-
3. Start the Redis server
|
62
|
+
3. Start the Redis server, and (optionally) open the resque web dashboard:
|
46
63
|
|
47
64
|
$ redis-server
|
65
|
+
$ resque-web
|
66
|
+
|
67
|
+
4. To set your tracking terms and start the queue subscription, jump into the `flamingo` client (installed during `gem install`):
|
48
68
|
|
49
|
-
|
69
|
+
$ flamingo path/to/flamingo.yml
|
70
|
+
|
71
|
+
5. This is a regular-old irb console, so anything ruby goes. First, register the terms you'd like to search on. This doesn't have a direct effect: it just pokes the values into the database so that the wader knows what to listen for.
|
50
72
|
|
51
|
-
$ flamingo
|
52
73
|
>> s = Stream.get(:filter)
|
53
|
-
>> s.params[:track] =
|
54
|
-
|
74
|
+
>> s.params[:track] = ["@cheaptweet", "austin", "#etsy"]
|
75
|
+
|
76
|
+
For now, use those three actual terms -- they'll give you a nice, testable receipt rate that is neither too slow ('...is this thing on?') nor torrential (you can watch the stream from your terminal window). Also note that you don't have to escape the tracking terms: twitter-stream will handle all that.
|
77
|
+
|
78
|
+
6. Your second task from the flamingo console is to route the incoming tweets onto a queue -- in this case the EXAMPLE queue. This is used by the flamingod we'll start next but has no direct effect now.
|
79
|
+
|
80
|
+
>> Subscription.new('EXAMPLE').save
|
55
81
|
|
56
|
-
|
82
|
+
7. Start the Flamingo Daemon (`flamingod` installed during `gem install`), and also start watching its log file:
|
57
83
|
|
58
|
-
$ flamingod -c
|
84
|
+
$ flamingod -c path/to/flamingo.yml
|
85
|
+
$ tail -f /tmp/flamingo.log
|
59
86
|
|
87
|
+
If things go well, you'll see something like
|
88
|
+
|
89
|
+
[2010-07-20 05:58:07, INFO] - Loaded config file from flamingo.yml
|
90
|
+
[2010-07-20 05:58:07, INFO] - Flamingod starting children
|
91
|
+
[2010-07-20 05:58:07, INFO] - Flamingod starting new wader
|
92
|
+
[2010-07-20 05:58:07, INFO] - Flamingod starting new dispatcher
|
93
|
+
[2010-07-20 05:58:07, INFO] - Flamingod starting new web server
|
94
|
+
[2010-07-20 05:58:07, INFO] - Starting wader on pid=91008 under pid=91003
|
95
|
+
[2010-07-20 05:58:07, INFO] - Starting dispatcher on pid=91009 under pid=91003
|
96
|
+
[2010-07-20 05:58:12, INFO] - Listening on stream: /1/statuses/filter.json?track=%23etsy,austin,cheaptweet
|
97
|
+
... short initial delay ....
|
98
|
+
[2010-07-20 05:58:42, DEBUG] - Wader dispatched event
|
99
|
+
[2010-07-20 05:58:42, DEBUG] - Put job on subscription queue EXAMPLE for {"text":"If you ever visit Austin make sure to go to Torchy's Tacos",...
|
100
|
+
|
101
|
+
On the resque-web dashboard, you should see a queue come up called EXAMPLE, with jobs accruing. It will show 0 of 0 workers working: let's fix that.
|
60
102
|
|
61
|
-
|
103
|
+
8. You'll consume those events with a resque worker, something like the following but more audacious:
|
62
104
|
|
63
105
|
class HandleFlamingoEvent
|
64
|
-
|
65
106
|
# type: One of "tweet" or "delete"
|
66
107
|
# event: a hash of the json data from twitter
|
67
108
|
def self.perform(type,event)
|
68
|
-
# Do stuff with the data
|
109
|
+
# Do stuff with the data, probably something more interesting than this:
|
110
|
+
puts [type, event].inspect
|
69
111
|
end
|
70
|
-
|
71
112
|
end
|
113
|
+
|
114
|
+
9. Start the worker task (see `examples/Rakefile`):
|
72
115
|
|
73
|
-
$ QUEUE=
|
116
|
+
$ QUEUE=EXAMPLE rake -f examples/Rakefile resque:work
|
117
|
+
|
118
|
+
Two things should now happen:
|
119
|
+
* The pent-up jobs from the EXAMPLE queue should spray across your console
|
120
|
+
* The resque dashboard should show the queue being emptied as a result
|
121
|
+
|
122
|
+
|
123
|
+
|
124
|
+
Overview
|
125
|
+
--------
|
126
|
+
|
127
|
+
Flamingo uses EventMachine, sinatra and the twitter-stream API library to
|
128
|
+
efficiently route and process stream and dispatch events. Here are the
|
129
|
+
components of the flamingo flock:
|
130
|
+
|
131
|
+
*flamingo daemon (flamingod)*
|
132
|
+
|
133
|
+
Coordinates the wader process (initiates stream request, pushes each response
|
134
|
+
into the queue), the Sinatra webserver (handles subscriptions and changing
|
135
|
+
stream parameters), and a set of dispatchers (routes responses).
|
136
|
+
|
137
|
+
You can control flamingod with the following signals:
|
138
|
+
|
139
|
+
* TERM and INT will kill the flamingod parent process, and signal each child with TERM
|
140
|
+
* USR1 will restart the wader gracefully. This is used to change stream parameters
|
141
|
+
|
142
|
+
*wader*
|
143
|
+
|
144
|
+
The wader process starts the stream and dispatches stream responses as they arrive into a Resque queue.
|
145
|
+
|
146
|
+
*web server*
|
147
|
+
|
148
|
+
The flamingo webserver code creates and manages stream requests using a
|
149
|
+
lightweight Sinatra responder.
|
150
|
+
|
151
|
+
*workers*
|
152
|
+
|
153
|
+
This is the part you write. These are standard resque workers, living on one or
|
154
|
+
many machines, doing anything that your heart can imagine and your fingers can
|
155
|
+
code.
|
156
|
+
|
157
|
+
|
158
|
+
TODO
|
159
|
+
-----
|
160
|
+
* OAuth instructions
|
161
|
+
|
162
|
+
|
163
|
+
Flamingo
|
164
|
+
--------
|
165
|
+
|
166
|
+
Here is a photo of a flamingo:
|
167
|
+
|
168
|
+
![Flamingo!](http://farm4.static.flickr.com/3438/3302580937_0ec540b73e_z_d.jpg "Flamingo Photo by William Warby, CC-BY License: http://www.flickr.com/photos/wwarby/3302580937 :: photo taken 21 Feb 2009 in Dagnall, England.")
|
data/examples/Rakefile
CHANGED
@@ -1,15 +1,22 @@
|
|
1
|
-
#
|
2
|
-
# to STDOUT
|
1
|
+
#
|
2
|
+
# Simple example: reads from a subscription queue, writes the events to STDOUT
|
3
|
+
#
|
3
4
|
# Usage (from this directory):
|
4
|
-
# $ QUEUE=
|
5
|
+
# $ QUEUE=EXAMPLE rake resque:work
|
5
6
|
|
6
7
|
require 'rubygems'
|
7
8
|
require 'resque/tasks'
|
8
9
|
|
9
10
|
class HandleFlamingoEvent
|
10
|
-
|
11
|
-
|
12
|
-
|
11
|
+
|
12
|
+
#
|
13
|
+
# type: One of "tweet" or "delete"
|
14
|
+
# event: a hash of the json data from twitter
|
15
|
+
#
|
16
|
+
#def self.perform(type, event_info, event)
|
17
|
+
def self.perform(type, event)
|
18
|
+
# Do stuff with the data, probably something more interesting than this:
|
19
|
+
puts [type, event].inspect
|
13
20
|
end
|
14
|
-
|
15
|
-
end
|
21
|
+
|
22
|
+
end
|
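A slightly fuller worker than the one in the Rakefile, as a minimal sketch rather than the gem's own code. It assumes, as the comments in the hunk say, that `type` is "tweet" or "delete" and that `event` is the parsed JSON hash. Because Resque serializes job arguments as JSON, hash keys arrive as strings. The summary strings and the assumed shape of the delete payload are illustrative only.

```ruby
# Sketch of a Resque worker for Flamingo events (hypothetical handler logic).
class HandleFlamingoEvent
  # Returns a one-line summary; a real worker would store or forward the event.
  def self.perform(type, event)
    summary =
      case type.to_s
      when "tweet"
        # Keys are strings because Resque round-trips arguments through JSON.
        user = event.fetch("user", {})["screen_name"] || "unknown"
        "tweet from @#{user}: #{event["text"]}"
      when "delete"
        # Assumed payload shape: {"status" => {"id" => ..., "user_id" => ...}}
        status = event.fetch("status", {})
        "delete of status #{status["id"]}"
      else
        "unhandled event type #{type}"
      end
    puts summary
    summary
  end
end
```

Returning the summary keeps the handler easy to exercise from irb without a running queue.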
data/examples/flamingo.yml
CHANGED
@@ -1,10 +1,22 @@
|
|
1
|
-
username:
|
2
|
-
password:
|
3
|
-
|
1
|
+
username: SCREEN_NAME
|
2
|
+
password: PASSWORD
|
3
|
+
|
4
|
+
# either "filter" or "sample"
|
5
|
+
# For filter, set the terms to track in the flamingo console (see README.md)
|
6
|
+
stream: filter
|
7
|
+
|
8
|
+
# Point the logs where you like.
|
9
|
+
# Should change the log level from DEBUG to INFO before you deploy: allowed levels are
|
10
|
+
# DEBUG < INFO < WARN < ERROR < FATAL < UNKNOWN
|
4
11
|
logging:
|
5
|
-
dest:
|
6
|
-
level:
|
12
|
+
dest: /tmp/flamingo.log
|
13
|
+
level: DEBUG
|
14
|
+
|
15
|
+
# Where is the redis server the flamingod processes should connect to?
|
7
16
|
redis:
|
8
|
-
host:
|
17
|
+
host: 0.0.0.0:6379
|
18
|
+
|
19
|
+
# What port and interface should the flamingod web_server listen on?
|
20
|
+
# use 0.0.0.0 for all interfaces, 127.0.0.1 to listen on only localhost
|
9
21
|
web:
|
10
|
-
host:
|
22
|
+
host: 0.0.0.0:4711
|
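To make the config layout concrete, here is a small sketch that reads the same keys with Ruby's standard YAML library. How Flamingo itself parses the `host` values is not shown in this diff, so `split_host` and `log_level` are hypothetical helpers; the only fact relied on is that Ruby's `Logger` defines the severities in the order DEBUG < INFO < WARN < ERROR < FATAL < UNKNOWN.

```ruby
require "yaml"
require "logger"

# Example config text, mirroring examples/flamingo.yml.
CONFIG_YAML = <<~YAML
  username: SCREEN_NAME
  password: PASSWORD
  stream: filter
  logging:
    dest: /tmp/flamingo.log
    level: DEBUG
  redis:
    host: 0.0.0.0:6379
  web:
    host: 0.0.0.0:4711
YAML

# Hypothetical helper: split "host:port" into a host string and an integer port.
def split_host(value)
  host, port = value.split(":", 2)
  [host, Integer(port)]
end

# Hypothetical helper: map a level name onto Ruby's Logger severity constants.
def log_level(name)
  Logger.const_get(name.upcase)
end

config = YAML.safe_load(CONFIG_YAML)
redis_host, redis_port = split_host(config["redis"]["host"])
level = log_level(config["logging"]["level"])
puts "redis at #{redis_host} port #{redis_port}, log level #{level}"
```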
data/lib/flamingo.rb
CHANGED
@@ -17,6 +17,7 @@ require 'flamingo/stream_params'
|
|
17
17
|
require 'flamingo/stream'
|
18
18
|
require 'flamingo/subscription'
|
19
19
|
require 'flamingo/wader'
|
20
|
+
require 'flamingo/daemon/trap_keeper'
|
20
21
|
require 'flamingo/daemon/pid_file'
|
21
22
|
require 'flamingo/daemon/child_process'
|
22
23
|
require 'flamingo/daemon/dispatcher_process'
|
@@ -37,6 +38,10 @@ module Flamingo
|
|
37
38
|
logger.info "Loaded config file from #{config_file}"
|
38
39
|
end
|
39
40
|
|
41
|
+
def config=(config)
|
42
|
+
@config = config
|
43
|
+
end
|
44
|
+
|
40
45
|
def config
|
41
46
|
@config
|
42
47
|
end
|
@@ -98,6 +103,10 @@ module Flamingo
|
|
98
103
|
@logger ||= new_logger
|
99
104
|
end
|
100
105
|
|
106
|
+
def logger=(logger)
|
107
|
+
@logger = logger
|
108
|
+
end
|
109
|
+
|
101
110
|
private
|
102
111
|
def root_dir
|
103
112
|
File.expand_path(File.dirname(__FILE__)+'/..')
|
data/lib/flamingo/daemon/child_process.rb
CHANGED
@@ -1,6 +1,10 @@
|
|
1
1
|
module Flamingo
|
2
2
|
module Daemon
|
3
3
|
class ChildProcess
|
4
|
+
|
5
|
+
# For process-scoping of traps
|
6
|
+
include TrapKeeper
|
7
|
+
|
4
8
|
attr_accessor :pid
|
5
9
|
|
6
10
|
def kill(sig)
|
@@ -8,6 +12,17 @@ module Flamingo
|
|
8
12
|
end
|
9
13
|
alias_method :signal, :kill
|
10
14
|
|
15
|
+
def running?
|
16
|
+
# Borrowed from daemons gem
|
17
|
+
Process.kill(0, pid)
|
18
|
+
return true
|
19
|
+
rescue Errno::ESRCH
|
20
|
+
return false
|
21
|
+
rescue ::Exception
|
22
|
+
# for example on EPERM (process exists but does not belong to us)
|
23
|
+
return true
|
24
|
+
end
|
25
|
+
|
11
26
|
def start
|
12
27
|
self.pid = fork { run }
|
13
28
|
end
|
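The `running?` check above relies on signal 0, which delivers nothing and only asks the kernel whether the pid exists. A standalone sketch of the same idea follows; `process_running?` is a hypothetical name, and unlike the hunk it rescues `Errno::EPERM` explicitly instead of every `Exception`.

```ruby
# Sketch of a liveness check using signal 0: no signal is delivered,
# the kernel only verifies that the pid exists and may be signaled.
def process_running?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

child = fork { exit!(0) } # short-lived child; exit! skips at_exit hooks
Process.wait(child)       # reap it so the pid is really gone
puts "self:  #{process_running?(Process.pid)}"
puts "child: #{process_running?(child)}"
```

Note that an exited child that has not yet been reaped still counts as existing, which is why the sketch waits on it before probing.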
data/lib/flamingo/daemon/flamingod.rb
CHANGED
@@ -1,7 +1,25 @@
|
|
1
1
|
module Flamingo
|
2
2
|
module Daemon
|
3
|
+
#
|
4
|
+
# Flamingod is the main overseer of the Flamingo flock.
|
5
|
+
#
|
6
|
+
# Starts three sets of children:
|
7
|
+
#
|
8
|
+
# * A wader process: initiates stream request, pushes each response into the queue
|
9
|
+
# * A Sinatra server: lightweight responder to create and manage subscriptions
|
10
|
+
# * A set of dispatchers: worker processes that handle each stream response.
|
11
|
+
#
|
12
|
+
# You can control the flamingod with the following signals:
|
13
|
+
#
|
14
|
+
# * TERM and INT will kill the flamingod parent process, and signal each
|
15
|
+
# child with TERM
|
16
|
+
# * USR1 will restart the wader gracefully.
|
17
|
+
#
|
3
18
|
class Flamingod
|
4
|
-
|
19
|
+
|
20
|
+
# For process-scoping of traps
|
21
|
+
include TrapKeeper
|
22
|
+
|
5
23
|
def exit_signaled?
|
6
24
|
@exit_signaled
|
7
25
|
end
|
@@ -40,14 +58,27 @@ module Flamingo
|
|
40
58
|
end
|
41
59
|
|
42
60
|
def restart_wader
|
43
|
-
|
44
|
-
|
61
|
+
if @wader
|
62
|
+
Flamingo.logger.info "Flamingod restarting wader pid=#{@wader.pid} with SIGINT"
|
63
|
+
@wader.kill("INT")
|
64
|
+
else
|
65
|
+
Flamingo.logger.info "Wader is not started. Attempting to start new wader."
|
66
|
+
@wader = start_new_wader
|
67
|
+
end
|
45
68
|
end
|
46
69
|
|
47
70
|
def signal_children(sig)
|
48
71
|
pids = (children.map {|c| c.pid}).join(",")
|
49
72
|
Flamingo.logger.info "Flamingod sending SIG#{sig} to pids=#{pids}"
|
50
|
-
children.each
|
73
|
+
children.each do |child|
|
74
|
+
if child.running?
|
75
|
+
begin
|
76
|
+
child.signal(sig)
|
77
|
+
rescue => e
|
78
|
+
Flamingo.logger.info "Failure sending SIG#{sig} to child #{child.pid}: #{e}"
|
79
|
+
end
|
80
|
+
end
|
81
|
+
end
|
51
82
|
end
|
52
83
|
|
53
84
|
def terminate!
|
@@ -57,7 +88,7 @@ module Flamingo
|
|
57
88
|
end
|
58
89
|
|
59
90
|
def children
|
60
|
-
[@wader,@web_server] + @dispatchers
|
91
|
+
([@wader,@web_server] + @dispatchers).compact
|
61
92
|
end
|
62
93
|
|
63
94
|
def start_children
|
@@ -67,12 +98,17 @@ module Flamingo
|
|
67
98
|
@web_server = start_new_web_server
|
68
99
|
end
|
69
100
|
|
101
|
+
#
|
102
|
+
# Unless signaled externally, waits in an endless loop. If any child
|
103
|
+
# process terminates, it restarts that process.
|
104
|
+
# TODO Needs intelligent behavior so we don't get endless loops
|
70
105
|
def wait_on_children()
|
71
106
|
until exit_signaled?
|
72
107
|
child_pid = Process.wait(-1)
|
108
|
+
child_status = $?
|
73
109
|
unless exit_signaled?
|
74
|
-
if @wader.pid == child_pid
|
75
|
-
|
110
|
+
if @wader && @wader.pid == child_pid
|
111
|
+
handle_wader_exit(child_status)
|
76
112
|
elsif @web_server.pid == child_pid
|
77
113
|
@web_server = start_new_web_server
|
78
114
|
elsif (to_delete = @dispatchers.find{|d| d.pid == child_pid})
|
@@ -84,6 +120,17 @@ module Flamingo
|
|
84
120
|
end
|
85
121
|
end
|
86
122
|
end
|
123
|
+
|
124
|
+
def handle_wader_exit(status)
|
125
|
+
if WaderProcess.fatal_exit?(status)
|
126
|
+
Flamingo.logger.error "Wader exited with status "+
|
127
|
+
"#{status.exitstatus} and cannot be automatically restarted"
|
128
|
+
$stderr.write("Wader exited with fatal error. Check the log.")
|
129
|
+
terminate!
|
130
|
+
else
|
131
|
+
@wader = start_new_wader
|
132
|
+
end
|
133
|
+
end
|
87
134
|
|
88
135
|
def run_as_daemon
|
89
136
|
pid_file = PidFile.new
|
data/lib/flamingo/daemon/trap_keeper.rb
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Daemon
|
3
|
+
module TrapKeeper
|
4
|
+
|
5
|
+
# Use instead of Kernel.trap to ensure that only the process that
|
6
|
+
# originally registered the trap has its block executed. This is necessary
|
7
|
+
# for cases where we fork after setting up traps since the child process
|
8
|
+
# gets the traps from the parent.
|
9
|
+
def trap(signal,&block)
|
10
|
+
owner_pid = Process.pid
|
11
|
+
Kernel.trap(signal) do
|
12
|
+
if Process.pid == owner_pid
|
13
|
+
block.call
|
14
|
+
end
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
module_function :trap
|
19
|
+
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
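The pid check in `TrapKeeper#trap` can be illustrated without signals at all. The sketch below wraps a block so it only runs in the process that created it, then calls the wrapper from both a forked child and the parent. `pid_scoped` is a hypothetical helper, not part of the gem.

```ruby
# Sketch: wrap a block so it only runs in the process that created it.
def pid_scoped(&block)
  owner_pid = Process.pid
  lambda do
    block.call if Process.pid == owner_pid
  end
end

reader, writer = IO.pipe
handler = pid_scoped { writer.write("ran in #{Process.pid}\n") }

child = fork do
  handler.call # inherited by the child, but owner_pid differs: no-op
  exit!(0)     # exit! skips at_exit hooks inherited from the parent
end
Process.wait(child)

handler.call   # same process that registered it: the block runs
writer.close
puts reader.read
```

Only the parent's line should appear on the pipe, which is the behavior the module wants for trap blocks registered before a fork.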
data/lib/flamingo/daemon/wader_process.rb
CHANGED
@@ -1,6 +1,29 @@
|
|
1
1
|
module Flamingo
|
2
2
|
module Daemon
|
3
3
|
class WaderProcess < ChildProcess
|
4
|
+
|
5
|
+
# Exit codes
|
6
|
+
EXIT_CLEAN = 0
|
7
|
+
|
8
|
+
# Non-fatal exit code - For transient network errors where a retry is
|
9
|
+
# likely to resolve the problem
|
10
|
+
EXIT_UNKNOWN_ERROR = 001
|
11
|
+
EXIT_MAX_RECONNECTS = 002
|
12
|
+
EXIT_SERVER_UNAVAILABLE = 003
|
13
|
+
|
14
|
+
# 1XX is a fatal exit code - Human intervention or a configuration change
|
15
|
+
# is necessary to get the wader started
|
16
|
+
EXIT_FATAL_RANGE = 100..199
|
17
|
+
EXIT_AUTHENTICATION = 100
|
18
|
+
EXIT_UNKNOWN_STREAM = 101
|
19
|
+
EXIT_INVALID_PARAMS = 102
|
20
|
+
|
21
|
+
class << self
|
22
|
+
def fatal_exit?(status)
|
23
|
+
status && EXIT_FATAL_RANGE.include?(status.exitstatus)
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
4
27
|
def register_signal_handlers
|
5
28
|
trap("INT") { stop }
|
6
29
|
end
|
@@ -16,13 +39,39 @@ module Flamingo
|
|
16
39
|
|
17
40
|
@wader = Flamingo::Wader.new(screen_name,password,stream)
|
18
41
|
Flamingo.logger.info "Starting wader on pid=#{Process.pid} under pid=#{Process.ppid}"
|
19
|
-
|
20
|
-
|
42
|
+
|
43
|
+
exit_code = EXIT_CLEAN
|
44
|
+
begin
|
45
|
+
@wader.run
|
46
|
+
rescue => e
|
47
|
+
exit_code = error_exit_code(e)
|
48
|
+
end
|
49
|
+
|
50
|
+
Flamingo.logger.info "Wader pid=#{Process.pid} exited with code #{exit_code}"
|
51
|
+
exit(exit_code)
|
21
52
|
end
|
22
53
|
|
23
54
|
def stop
|
24
55
|
@wader.stop
|
25
56
|
end
|
57
|
+
|
58
|
+
private
|
59
|
+
def error_exit_code(ex)
|
60
|
+
case ex
|
61
|
+
when Flamingo::Wader::AuthenticationError
|
62
|
+
then EXIT_AUTHENTICATION
|
63
|
+
when Flamingo::Wader::UnknownStreamError
|
64
|
+
then EXIT_UNKNOWN_STREAM
|
65
|
+
when Flamingo::Wader::InvalidParametersError
|
66
|
+
then EXIT_INVALID_PARAMS
|
67
|
+
when Flamingo::Wader::MaxReconnectsExceededError
|
68
|
+
then EXIT_MAX_RECONNECTS
|
69
|
+
when Flamingo::Wader::ServerUnavailableError
|
70
|
+
then EXIT_SERVER_UNAVAILABLE
|
71
|
+
else
|
72
|
+
EXIT_UNKNOWN_ERROR
|
73
|
+
end
|
74
|
+
end
|
26
75
|
end
|
27
76
|
end
|
28
77
|
end
|
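The exit-code convention above (0 to 3 restartable, 100 to 199 fatal) can be tried in isolation. This is a sketch under the assumption that the parent only inspects `Process::Status#exitstatus`, as `fatal_exit?` does in the hunk; `status_of_child_exiting_with` is a hypothetical helper for the demo.

```ruby
# Sketch: classify a child's exit status the way WaderProcess.fatal_exit? does.
EXIT_FATAL_RANGE = 100..199 # same range as in the hunk above

def fatal_exit?(status)
  !status.nil? && EXIT_FATAL_RANGE.include?(status.exitstatus)
end

# Fork a child that exits with the given code and return its Process::Status.
def status_of_child_exiting_with(code)
  pid = fork { exit!(code) }
  _, status = Process.wait2(pid)
  status
end

[0, 2, 100, 102].each do |code|
  status = status_of_child_exiting_with(code)
  action = fatal_exit?(status) ? "give up" : "restart"
  puts "exit #{code}: #{action}"
end
```

Reserving a numeric range for fatal codes lets the supervisor decide without knowing every individual error constant.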
data/lib/flamingo/dispatch_event.rb
CHANGED
@@ -1,40 +1,41 @@
|
|
1
1
|
module Flamingo
|
2
2
|
class DispatchEvent
|
3
|
-
|
3
|
+
|
4
4
|
@queue = :flamingo
|
5
5
|
@parser = Yajl::Parser.new(:symbolize_keys => true)
|
6
|
-
|
6
|
+
|
7
7
|
class << self
|
8
|
-
|
8
|
+
|
9
|
+
#
|
10
|
+
# TODO Track stats including: tweets per second and last tweet time
|
11
|
+
# TODO Provide some first-level check for repeated status ids
|
12
|
+
# TODO Consider subscribers for receiving particular terms - do the heavy
|
13
|
+
# lifting of parsing tweets and delivering them to particular subscribers
|
14
|
+
# TODO Consider window of tweets (approx 3 seconds) and sort before
|
15
|
+
# dispatching to improve in-order delivery (helps with "k-sorted")
|
16
|
+
#
|
9
17
|
def perform(event_json)
|
10
|
-
#TODO Track stats including: tweets per second and last tweet time
|
11
|
-
#TODO Provide some first-level check for repeated status ids
|
12
|
-
#TODO Consider subscribers for receiving particular terms - do the heavy
|
13
|
-
# lifting of parsing tweets and delivering them to particular subscribers
|
14
|
-
#TODO Consider window of tweets (approx 3 seconds) and sort before
|
15
|
-
# dispatching to improve in-order delivery (helps with "k-sorted")
|
16
18
|
type, event = typed_event(parse(event_json))
|
17
|
-
# Flamingo.logger.info Flamingo.router.destinations(type,event).inspect
|
18
19
|
Subscription.all.each do |sub|
|
19
20
|
Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)
|
20
21
|
Flamingo.logger.debug "Put job on subscription queue #{sub.name} for #{event_json}"
|
21
22
|
end
|
22
23
|
end
|
23
|
-
|
24
|
+
|
24
25
|
def parse(json)
|
25
26
|
@parser.parse(json)
|
26
27
|
end
|
27
|
-
|
28
|
+
|
28
29
|
def typed_event(event)
|
29
30
|
if event[:delete]
|
30
|
-
[:delete,event[:delete]]
|
31
|
+
[:delete, event[:delete]]
|
31
32
|
elsif event[:link]
|
32
|
-
[:link,event[:link]]
|
33
|
+
[:link, event[:link]]
|
33
34
|
else
|
34
|
-
[:tweet,event]
|
35
|
+
[:tweet, event]
|
35
36
|
end
|
36
37
|
end
|
37
|
-
|
38
|
+
|
38
39
|
end
|
39
40
|
end
|
40
41
|
end
|
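To see how `typed_event` sorts incoming messages, here is a standalone sketch. It swaps in the standard library's JSON parser with `symbolize_names: true` in place of Yajl, so it runs without extra gems; the sample payloads are made up for illustration.

```ruby
require "json"

# Sketch of DispatchEvent.typed_event using the stdlib JSON parser
# (the gem itself uses Yajl with :symbolize_keys => true).
def typed_event(event)
  if event[:delete]
    [:delete, event[:delete]]
  elsif event[:link]
    [:link, event[:link]]
  else
    [:tweet, event]
  end
end

def parse(json)
  JSON.parse(json, symbolize_names: true)
end

samples = [
  '{"text":"hello","id":1}',
  '{"delete":{"status":{"id":1,"user_id":2}}}'
]
samples.each do |json|
  type, event = typed_event(parse(json))
  puts "#{type}: #{event.inspect}"
end
```

Anything that is not a delete or link notice falls through to `:tweet`, so unrecognized message kinds would also be delivered as tweets.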
data/lib/flamingo/stream_params.rb
CHANGED
@@ -1,41 +1,42 @@
|
|
1
1
|
module Flamingo
|
2
|
-
|
2
|
+
#
|
3
|
+
# Facade for redis:
|
4
|
+
# database object that behaves like a hash
|
5
|
+
#
|
3
6
|
class StreamParams
|
4
|
-
|
5
7
|
include Enumerable
|
6
|
-
|
7
8
|
attr_accessor :stream_name
|
8
|
-
|
9
|
+
|
9
10
|
def initialize(stream_name)
|
10
11
|
self.stream_name = stream_name
|
11
12
|
end
|
12
|
-
|
13
|
+
|
13
14
|
def set(key,*values)
|
14
15
|
delete(key)
|
15
16
|
add(key,*values)
|
16
17
|
end
|
17
|
-
|
18
|
+
|
18
19
|
def []=(key,values)
|
19
20
|
values = [values] unless values.is_a?(Array)
|
20
21
|
set(key,*values)
|
21
22
|
end
|
22
|
-
|
23
|
+
|
23
24
|
def add(key,*values)
|
24
25
|
values.each do |value|
|
25
26
|
Flamingo.redis.sadd redis_key(key), value
|
26
27
|
end
|
27
28
|
end
|
28
|
-
|
29
|
+
|
29
30
|
def remove(key,*values)
|
30
31
|
values.each do |value|
|
31
32
|
Flamingo.redis.srem redis_key(key), value
|
32
33
|
end
|
33
34
|
end
|
34
|
-
|
35
|
+
|
35
36
|
def delete(key)
|
36
37
|
Flamingo.redis.del redis_key(key)
|
37
38
|
end
|
38
|
-
|
39
|
+
|
39
40
|
def get(key)
|
40
41
|
Flamingo.redis.smembers redis_key(key)
|
41
42
|
end
|
@@ -53,22 +54,20 @@ module Flamingo
|
|
53
54
|
h
|
54
55
|
end
|
55
56
|
end
|
56
|
-
|
57
|
+
|
57
58
|
def each
|
58
59
|
keys.each do |key|
|
59
60
|
yield(key,get(key))
|
60
61
|
end
|
61
62
|
end
|
62
|
-
|
63
|
+
|
63
64
|
private
|
64
65
|
def redis_key_pattern
|
65
66
|
"streams/#{stream_name}?*"
|
66
67
|
end
|
67
|
-
|
68
|
+
|
68
69
|
def redis_key(key)
|
69
70
|
"streams/#{stream_name}?#{key}"
|
70
71
|
end
|
71
|
-
|
72
72
|
end
|
73
|
-
|
74
|
-
end
|
73
|
+
end
|
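The hash-like behavior of `StreamParams` comes from Redis set commands keyed as `streams/<stream>?<param>`. The sketch below reproduces that with an in-memory stand-in so it runs without a Redis server. `FakeRedis` and `SketchStreamParams` are hypothetical names, and only the four commands used in the hunk are imitated.

```ruby
require "set"

# In-memory stand-in for the Redis set commands StreamParams relies on
# (sadd, srem, del, smembers). Hypothetical; for illustration only.
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, value)
    @sets[key].add(value)
  end

  def srem(key, value)
    @sets[key].delete(value)
  end

  def del(key)
    @sets.delete(key)
  end

  def smembers(key)
    @sets.key?(key) ? @sets[key].to_a : []
  end
end

# Trimmed-down StreamParams that takes the store as an argument.
class SketchStreamParams
  def initialize(stream_name, redis)
    @stream_name = stream_name
    @redis = redis
  end

  # Assignment replaces the whole set, as set(key, *values) does in the hunk.
  def []=(key, values)
    values = [values] unless values.is_a?(Array)
    @redis.del(redis_key(key))
    values.each { |value| @redis.sadd(redis_key(key), value) }
  end

  def add(key, *values)
    values.each { |value| @redis.sadd(redis_key(key), value) }
  end

  def get(key)
    @redis.smembers(redis_key(key))
  end

  private

  def redis_key(key)
    "streams/#{@stream_name}?#{key}"
  end
end

params = SketchStreamParams.new(:filter, FakeRedis.new)
params[:track] = ["austin", "#etsy"]
params.add(:track, "austin", "@cheaptweet") # duplicate "austin" is ignored
puts params.get(:track).sort.inspect
```

Because the values live in a set, adding a term twice is harmless and the order of terms is not preserved.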
data/lib/flamingo/subscription.rb
CHANGED
@@ -1,9 +1,9 @@
|
|
1
1
|
module Flamingo
|
2
|
-
|
2
|
+
#
|
3
|
+
# Track stream subscriptions in the Redis db.
|
4
|
+
#
|
3
5
|
class Subscription
|
4
|
-
|
5
6
|
class << self
|
6
|
-
|
7
7
|
def all
|
8
8
|
Flamingo.redis.smembers("subscriptions").map do |name|
|
9
9
|
new(name)
|
@@ -15,25 +15,22 @@ module Flamingo
|
|
15
15
|
Subscription.new(name)
|
16
16
|
end
|
17
17
|
end
|
18
|
-
|
19
18
|
end
|
20
|
-
|
19
|
+
|
21
20
|
attr_accessor :name
|
22
|
-
|
23
21
|
def initialize(name)
|
24
|
-
self.name = name
|
22
|
+
self.name = name
|
25
23
|
end
|
26
|
-
|
24
|
+
|
27
25
|
def save
|
28
26
|
Flamingo.logger.info("Adding #{name} to subscriptions")
|
29
27
|
Flamingo.redis.sadd("subscriptions",name)
|
30
28
|
end
|
31
|
-
|
29
|
+
|
32
30
|
def delete
|
33
31
|
Flamingo.logger.info("Removing #{name} from subscriptions")
|
34
32
|
Flamingo.redis.srem("subscriptions",name)
|
35
33
|
end
|
36
|
-
|
34
|
+
|
37
35
|
end
|
38
|
-
|
39
36
|
end
|
data/lib/flamingo/version.rb
CHANGED
data/lib/flamingo/wader.rb
CHANGED
@@ -1,57 +1,149 @@
|
|
1
1
|
module Flamingo
|
2
2
|
class Wader
|
3
3
|
|
4
|
-
|
4
|
+
class WaderError < StandardError
|
5
|
+
end
|
6
|
+
|
7
|
+
class HttpStatusError < WaderError
|
8
|
+
|
9
|
+
attr_accessor :code
|
10
|
+
|
11
|
+
def initialize(message,code)
|
12
|
+
super(message)
|
13
|
+
self.code = code
|
14
|
+
end
|
15
|
+
end
|
16
|
+
|
17
|
+
# Errors from certain HTTP Statuses
|
18
|
+
class AuthenticationError < HttpStatusError; end
|
19
|
+
class UnknownStreamError < HttpStatusError; end
|
20
|
+
class InvalidParametersError < HttpStatusError; end
|
5
21
|
|
22
|
+
# Fatal error from too many reconnection attempts
|
23
|
+
class MaxReconnectsExceededError < WaderError; end
|
24
|
+
|
25
|
+
# Raised if the server is just not available, e.g. Twitter is down
|
26
|
+
class ServerUnavailableError < WaderError; end
|
27
|
+
|
28
|
+
attr_accessor :screen_name, :password, :stream, :connection,
|
29
|
+
:server_unavailable_max_retries,
|
30
|
+
:server_unavailable_wait,
|
31
|
+
:server_unavailable_retries
|
32
|
+
|
6
33
|
def initialize(screen_name,password,stream)
|
7
34
|
self.screen_name = screen_name
|
8
35
|
self.password = password
|
9
36
|
self.stream = stream
|
37
|
+
self.server_unavailable_max_retries = 5
|
38
|
+
self.server_unavailable_wait = 60
|
10
39
|
end
|
11
|
-
|
40
|
+
|
41
|
+
#
|
42
|
+
# The main EventMachine run loop
|
43
|
+
#
|
44
|
+
# Start the stream listener (using twitter-stream, http://github.com/voloko/twitter-stream)
|
45
|
+
# Listen for responses and errors;
|
46
|
+
# dispatch each for later handling
|
47
|
+
#
|
12
48
|
def run
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
49
|
+
self.server_unavailable_retries = 0
|
50
|
+
begin
|
51
|
+
connect_and_run
|
52
|
+
rescue => e
|
53
|
+
# This is largely to get around a bug in Twitter-Stream that should
|
54
|
+
# be fixed in the next release. If the server is just not there on
|
55
|
+
# the first try, it blows up. Hopefully this code can be removed after
|
56
|
+
# that release.
|
57
|
+
Flamingo.logger.warn "Failure initiating connection. Most likely "+
|
58
|
+
"because server is unavailable.\n#{e}\n#{e.backtrace.join("\n\t")}"
|
59
|
+
if server_unavailable_retries < server_unavailable_max_retries
|
60
|
+
sleep(server_unavailable_wait)
|
61
|
+
self.server_unavailable_retries += 1
|
62
|
+
retry
|
63
|
+
else
|
64
|
+
raise ServerUnavailableError.new
|
19
65
|
end
|
20
|
-
|
21
|
-
|
22
|
-
dispatch_error(:generic,message)
|
23
|
-
end
|
24
|
-
|
25
|
-
connection.on_reconnect do |timeout, retries|
|
26
|
-
dispatch_error(:reconnection,
|
27
|
-
"Will reconnect after #{timeout}. Retry \##{retries}",
|
28
|
-
{:timeout=>timeout,:retries=>retries}
|
29
|
-
)
|
30
|
-
end
|
31
|
-
|
32
|
-
connection.on_max_reconnects do |timeout, retries|
|
33
|
-
dispatch_error(:fatal,
|
34
|
-
"Failed to reconnect after #{retries} retries",
|
35
|
-
{:timeout=>timeout,:retries=>retries}
|
36
|
-
)
|
37
|
-
end
|
38
|
-
end
|
66
|
+
end
|
67
|
+
raise @error if @error
|
39
68
|
end
|
40
|
-
|
69
|
+
|
70
|
+
def retries
|
71
|
+
if connection
|
72
|
+
# This is weird but necessary because twitter-stream increments the
|
73
|
+
# reconnect_retries a bit oddly. They are incremented prior to the
|
74
|
+
# actual reconnect which means that the last reconnect_retries value
|
75
|
+
# is 1 more than the real value.
|
76
|
+
rs = connection.reconnect_retries
|
77
|
+
rs == 0 ? 0 : rs - 1
|
78
|
+
else
|
79
|
+
0
|
80
|
+
end
|
81
|
+
end
|
82
|
+
|
41
83
|
def stop
|
42
|
-
connection
|
84
|
+
if connection
|
85
|
+
connection.stop
|
86
|
+
end
|
43
87
|
EM.stop
|
44
88
|
end
|
45
89
|
|
46
90
|
private
|
47
|
-
def
|
48
|
-
|
49
|
-
|
91
|
+
def connect_and_run
|
92
|
+
EventMachine::run do
|
93
|
+
self.connection = stream.connect(:auth=>"#{screen_name}:#{password}")
|
94
|
+
Flamingo.logger.info("Listening on stream: #{stream.path}")
|
95
|
+
|
96
|
+
connection.each_item do |event_json|
|
97
|
+
dispatch_event(event_json)
|
98
|
+
end
|
99
|
+
|
100
|
+
connection.on_error do |message|
|
101
|
+
handle_connection_error(message)
|
102
|
+
end
|
103
|
+
|
104
|
+
connection.on_reconnect do |timeout, retries|
|
105
|
+
Flamingo.logger.warn "Failed to connect. Will reconnect after "+
|
106
|
+
"#{timeout}. Retry \##{retries}"
|
107
|
+
end
|
108
|
+
|
109
|
+
connection.on_max_reconnects do |timeout, retries|
|
110
|
+
stop_and_raise!(MaxReconnectsExceededError.new(
|
111
|
+
"Failed to reconnect after #{retries-1} retries"
|
112
|
+
))
|
113
|
+
end
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
# Decides what to do with specific connection errors. For explanations
|
118
|
+
# of various HTTP status codes from the Streaming API, see:
|
119
|
+
# http://dev.twitter.com/pages/streaming_api_response_codes
|
120
|
+
def handle_connection_error(message)
|
121
|
+
code = connection.code # HTTP status code
|
122
|
+
if [401,403].include?(code)
|
123
|
+
stop_and_raise!(AuthenticationError.new(message,code))
|
124
|
+
elsif code == 404
|
125
|
+
stop_and_raise!(UnknownStreamError.new(message,code))
|
126
|
+
elsif [406,413,416].include?(code)
|
127
|
+
stop_and_raise!(InvalidParametersError.new(message,code))
|
128
|
+
elsif code && code > 0
|
129
|
+
Flamingo.logger.warn "Received non-fatal HTTP status #{code} with "+
|
130
|
+
"message \"#{message}\". Will retry."
|
131
|
+
else
|
132
|
+
Flamingo.logger.warn "Unknown connection error: #{message}. "+
|
133
|
+
"Will retry."
|
134
|
+
end
|
50
135
|
end
|
51
136
|
|
52
|
-
def
|
53
|
-
Flamingo.logger.error "
|
54
|
-
|
137
|
+
def stop_and_raise!(error)
|
138
|
+
Flamingo.logger.error "Stopping wader due to error: #{error}"
|
139
|
+
stop
|
140
|
+
@error = error
|
141
|
+
end
|
142
|
+
|
143
|
+
def dispatch_event(event_json)
|
144
|
+
Flamingo.logger.debug "Wader dispatched event"
|
145
|
+
Resque.enqueue(Flamingo::DispatchEvent, event_json)
|
55
146
|
end
|
147
|
+
|
56
148
|
end
|
57
149
|
end
|
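The bounded retry in `Wader#run` (sleep, increment, `retry`, then raise once the limit is reached) is a general Ruby pattern. A minimal sketch, with the caveat that `run_with_retries` and its keyword arguments are hypothetical and that the top-level `ServerUnavailableError` here is a stand-in for `Flamingo::Wader::ServerUnavailableError`:

```ruby
# Stand-in for Flamingo::Wader::ServerUnavailableError.
class ServerUnavailableError < StandardError; end

# Sketch of the retry loop in Wader#run: retry a failing block a bounded
# number of times, sleeping between attempts, then raise a dedicated error.
def run_with_retries(max_retries:, wait:)
  retries = 0
  begin
    yield retries
  rescue StandardError => e
    if retries >= max_retries
      raise ServerUnavailableError, "gave up after #{retries} retries: #{e.message}"
    end
    sleep(wait)
    retries += 1
    retry
  end
end

# Usage: fail twice, then succeed (wait is 0 only to keep the demo fast).
result = run_with_retries(max_retries: 5, wait: 0) do |attempt|
  raise IOError, "connection refused" if attempt < 2
  "connected on attempt #{attempt}"
end
puts result
```

The gem's defaults in the hunk are 5 retries with a 60 second wait, so a server that stays down leads to a `ServerUnavailableError` after roughly five minutes.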
metadata
CHANGED
@@ -1,12 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: flamingo
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 23
|
5
5
|
prerelease: false
|
6
6
|
segments:
|
7
7
|
- 0
|
8
|
-
-
|
9
|
-
|
8
|
+
- 2
|
9
|
+
- 0
|
10
|
+
version: 0.2.0
|
10
11
|
platform: ruby
|
11
12
|
authors:
|
12
13
|
- Hayes Davis
|
@@ -15,7 +16,7 @@ autorequire:
|
|
15
16
|
bindir: bin
|
16
17
|
cert_chain: []
|
17
18
|
|
18
|
-
date: 2010-
|
19
|
+
date: 2010-08-01 00:00:00 -05:00
|
19
20
|
default_executable:
|
20
21
|
dependencies:
|
21
22
|
- !ruby/object:Gem::Dependency
|
@@ -130,6 +131,22 @@ dependencies:
|
|
130
131
|
version: 2.1.0
|
131
132
|
type: :runtime
|
132
133
|
version_requirements: *id007
|
134
|
+
- !ruby/object:Gem::Dependency
|
135
|
+
name: mockingbird
|
136
|
+
prerelease: false
|
137
|
+
requirement: &id008 !ruby/object:Gem::Requirement
|
138
|
+
none: false
|
139
|
+
requirements:
|
140
|
+
- - ">="
|
141
|
+
- !ruby/object:Gem::Version
|
142
|
+
hash: 27
|
143
|
+
segments:
|
144
|
+
- 0
|
145
|
+
- 1
|
146
|
+
- 0
|
147
|
+
version: 0.1.0
|
148
|
+
type: :development
|
149
|
+
version_requirements: *id008
|
133
150
|
description: " Flamingo makes it easy to wade through the Twitter Streaming API by \n handling all connectivity and resource management for you. You just tell \n it what to track and consume the information in a resque queue. \n\n Flamingo isn't a traditional ruby gem. You don't require it into your code.\n Instead, it's designed to run as a daemon like redis or mysql. It provides \n a REST interface to change the parameters sent to the Twitter Streaming \n resource. All events from the streaming API are placed on a resque job \n queue where your application can process them.\n\n"
|
134
151
|
email: hayes@appozite.com
|
135
152
|
executables:
|
@@ -148,6 +165,7 @@ files:
|
|
148
165
|
- lib/flamingo/daemon/dispatcher_process.rb
|
149
166
|
- lib/flamingo/daemon/flamingod.rb
|
150
167
|
- lib/flamingo/daemon/pid_file.rb
|
168
|
+
- lib/flamingo/daemon/trap_keeper.rb
|
151
169
|
- lib/flamingo/daemon/wader_process.rb
|
152
170
|
- lib/flamingo/daemon/web_server_process.rb
|
153
171
|
- lib/flamingo/dispatch_error.rb
|