flamingo 0.3.1 → 0.4.0
- data/README.md +30 -18
- data/examples/flamingo.yml +11 -0
- data/lib/flamingo.rb +29 -7
- data/lib/flamingo/daemon/dispatcher_process.rb +15 -6
- data/lib/flamingo/daemon/flamingod.rb +1 -0
- data/lib/flamingo/dispatch_queue.rb +30 -0
- data/lib/flamingo/dispatcher.rb +98 -0
- data/lib/flamingo/logging/event_log.rb +78 -0
- data/lib/flamingo/logging/utils.rb +22 -0
- data/lib/flamingo/stats/connection.rb +59 -0
- data/lib/flamingo/stats/events.rb +52 -0
- data/lib/flamingo/stats/rate_counter.rb +38 -0
- data/lib/flamingo/stream.rb +8 -4
- data/lib/flamingo/version.rb +2 -2
- data/lib/flamingo/wader.rb +4 -2
- metadata +12 -6
- data/lib/flamingo/dispatch_event.rb +0 -51
data/README.md
CHANGED
@@ -1,14 +1,19 @@
 Flamingo
 ========
-Flamingo is a
-
-
-
-
-
-
-
-
+Flamingo is a service for connecting to and processing events from the Twitter
+Streaming API. Here are the highlights:
+
+* It runs as a daemon that you communicate with via a REST API interface.
+* Handles all the work of intelligently managing connections to the
+  Streaming API (handling things like backoffs and reconnects).
+* Stream events (tweets) can be stored directly to a file on disk via the
+  built in event log functionality. This is useful for collecting data for
+  further batch processing of incoming data via hadoop, for example.
+* Stream events can be placed on a Resque queue for downstream processing. This
+  is an easy way to connect your application logic for processing tweets.
+* It provides helpful metrics like stream rates, event counts, limit information
+  available via the REST endpoint /meta.json.
+* It supports a minimal configuration REPL via the flamingo command.
 
 Dependencies
 ------------
@@ -17,6 +22,11 @@ few dependencies and they are very specific. We plan to have fewer dependencies
 and be more liberal with versions soon. Right now these gems and versions are
 what is working well in production for us.
 
+Caveat Emptor
+-------------
+This is *alpha* code. However, it processes multiple high-volume streams
+in production as part of TweetReach.com.
+
 Getting Started
 ---------------
 1. Install the gem
@@ -63,7 +73,7 @@ commandline (see below)
 
 6. Your second task from the flamingo console is to route the incoming tweets onto a queue -- in this case the EXAMPLE queue. This is used by the flamingod we'll start next but has no direct effect now.
 
->> Subscription.new('
+>> Subscription.new('example').save
 
 7. Start the Flamingo Daemon (`flamingod` installed during `gem install`), and also start watching its log file:
 
@@ -80,11 +90,8 @@ commandline (see below)
 [2010-07-20 05:58:07, INFO] - Starting wader on pid=91008 under pid=91003
 [2010-07-20 05:58:07, INFO] - Starting dispatcher on pid=91009 under pid=91003
 [2010-07-20 05:58:12, INFO] - Listening on stream: /1/statuses/filter.json?track=%23etsy,austin,cheaptweet
-... short initial delay ....
-[2010-07-20 05:58:42, DEBUG] - Wader dispatched event
-[2010-07-20 05:58:42, DEBUG] - Put job on subscription queue EXAMPLE for {"text":If you ever visit Austin make sure to go to Torchy's Tacos",...
 
-On the resque-web dashboard, you should see a queue come up called
+On the resque-web dashboard, you should see a queue come up called example, with jobs accruing. There will only be 0 of 0 workers working: let's fix that
 
 8. You'll consume those events with a resque worker, something like the following but more audacious:
 
@@ -99,10 +106,10 @@ commandline (see below)
 
 9. Start the worker task (see `examples/Rakefile`):
 
-$ QUEUE=
+$ QUEUE=example rake -t examples/Rakefile resque:work
 
 Two things should now happen:
-* The pent-up jobs from the
+* The pent-up jobs from the example queue should spray across your console
 * The resque dashboard should show the queue being emptied as a result
 
 10. Interact with your running flamingod instance via the REST API (by default it is on port 4711)
@@ -122,7 +129,7 @@ components of the flamingo flock:
 
 Coordinates the wader process (initiates stream request, pushes each response
 into the queue), the Sinatra webserver (handles subscriptions and changing
-stream parameters), and a
+stream parameters), and a dispatcher (routes events to subscribers).
 
 You can control flamingod with the following signals:
 
@@ -131,7 +138,12 @@ You can control flamingod with the following signals:
 
 *wader*
 
-The wader process starts the stream and
+The wader process starts the stream and queues events as they arrive into a redis list.
+
+*dispatcher*
+
+The dispatcher process retrieves events from the dispatch queue, writes them to
+the event log (if configured) and to any subscriptions (if configured).
 
 *web server*
 
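Reviewer note on README step 8: the dispatcher added in this release enqueues jobs with `Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)`, so a consumer only needs a class of that name with a class-level `perform`. The following is a minimal sketch of such a worker, not the example shipped in the README. The return strings and the `@queue` value are illustrative assumptions, and the JSON round trip is simulated with the standard library so the sketch runs without Resque or Redis.

```ruby
require 'json'

# Hypothetical consumer for the jobs the dispatcher enqueues.
# Resque invokes the class-level `perform` with the job's arguments.
class HandleFlamingoEvent
  # Only needed if you enqueue with Resque.enqueue yourself; the
  # dispatcher names the queue explicitly via Resque::Job.create.
  @queue = :example

  def self.perform(type, event)
    # Arguments arrive after a JSON round trip, so `type` is a String
    # and hash keys are Strings, not Symbols.
    case type.to_s
    when "tweet"  then "tweet: #{event['text']}"
    when "delete" then "delete notice"
    when "limit"  then "limit notice: #{event.values.first} skipped"
    else               "unhandled event type: #{type}"
    end
  end
end

# Simulate the JSON round trip Resque performs on job arguments.
args = JSON.parse(JSON.generate(["tweet", { "text" => "hello from Austin" }]))
puts HandleFlamingoEvent.perform(*args)
```

A real worker would replace the returned strings with application logic; the point here is only the argument shape that reaches `perform`.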
data/examples/flamingo.yml
CHANGED
@@ -11,6 +11,17 @@ stream: filter
 logging:
   dest: /tmp/flamingo.log
   level: DEBUG
+
+  # Event logging (optional)
+  # Allows you to log the raw JSON of stream events to a set of rotating files
+  # stored in the directory you specify. Size is the maximum number of events
+  # that will be written to a log file before it is rotated. If size is omitted
+  # no rotation will be done. If you expect a high volume stream, set this number
+  # to something relatively large or you will end up with lots of small log files.
+  # 10000-100000 is probably a good place to start.
+  event:
+    dir: /tmp/flamingo_events
+    size: 100000
 
 # Where is the redis server the flamingod processes should connect to?
 # By default, all keys are namespaced wih "flamingo". May be changed
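Reviewer note on the new `event` settings: the sketch below shows one way to read them and apply the "no size means no rotation" rule described in the comments. It assumes `event` is nested under `logging`, which is inferred from the `config.logging.event` lookup in `lib/flamingo.rb`. The helper name `event_log_settings` is hypothetical, and plain YAML parsing stands in for Flamingo's own config class.

```ruby
require 'yaml'

# Assumed nesting: `event` sits under `logging`, matching the
# `config.logging.event` lookup in lib/flamingo.rb.
SAMPLE_CONFIG = <<~YAML
  logging:
    dest: /tmp/flamingo.log
    level: DEBUG
    event:
      dir: /tmp/flamingo_events
      size: 100000
YAML

# Returns [dir, size], or nil when event logging is not configured.
# A missing size falls back to 0, which the event log treats as
# "never rotate".
def event_log_settings(config)
  event = (config["logging"] || {})["event"]
  return nil unless event
  [event["dir"], event["size"] || 0]
end

config = YAML.safe_load(SAMPLE_CONFIG)
p event_log_settings(config)
```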
data/lib/flamingo.rb
CHANGED
@@ -11,8 +11,15 @@ require 'sinatra/base'
 
 require 'flamingo/version'
 require 'flamingo/config'
+require 'flamingo/logging/formatter'
+require 'flamingo/logging/utils'
+require 'flamingo/logging/event_log'
 require 'flamingo/meta'
-require 'flamingo/
+require 'flamingo/stats/rate_counter'
+require 'flamingo/stats/events'
+require 'flamingo/stats/connection'
+require 'flamingo/dispatch_queue'
+require 'flamingo/dispatcher'
 require 'flamingo/stream_params'
 require 'flamingo/stream'
 require 'flamingo/subscription'
@@ -24,7 +31,6 @@ require 'flamingo/daemon/dispatcher_process'
 require 'flamingo/daemon/web_server_process'
 require 'flamingo/daemon/wader_process'
 require 'flamingo/daemon/flamingod'
-require 'flamingo/logging/formatter'
 require 'flamingo/web/server'
 
 module Flamingo
@@ -64,7 +70,7 @@ module Flamingo
     def redis=(server)
       host, port, db = server.split(':')
       redis = Redis.new(:host => host, :port => port,
-        :thread_safe
+        :thread_safe=>true, :db => db)
       @redis = Redis::Namespace.new(namespace, :redis => redis)
 
       # Ensure resque is configured to use this redis as well
@@ -92,21 +98,37 @@ module Flamingo
     end
 
     def dispatch_queue
-      @dispatch_queue ||=
+      @dispatch_queue ||= DispatchQueue.new(redis)
     end
 
     def meta
-      @meta ||=
+      @meta ||= Meta.new(redis)
+    end
+
+    def event_stats
+      @event_stats ||= Stats::Events.new
+    end
+
+    def connection_stats
+      @connection_stats ||= Stats::Connection.new
+    end
+
+    def new_event_log
+      if event_config = config.logging.event(nil)
+        Logging::EventLog.new(event_config.dir,event_config.size(0))
+      else
+        nil
+      end
     end
 
     # Intended to be called after a fork so that we don't have
     # issues with shared file descriptors, sockets, etc
     def reconnect!
-      reconnect_redis_client(@redis)
-      reconnect_redis_client(Resque.redis)
       # Reload logger
       logger.close
       self.logger = new_logger
+      reconnect_redis_client(@redis)
+      reconnect_redis_client(Resque.redis)
     end
 
     private
data/lib/flamingo/daemon/dispatcher_process.rb
CHANGED
@@ -1,15 +1,24 @@
 module Flamingo
   module Daemon
     class DispatcherProcess < ChildProcess
+
       def run
-
-
-
-          $0 = "flamingod-dispatcher"
-        end
+        register_signal_handlers
+        $0 = "flamingod-dispatcher"
+        @dispatcher = Flamingo::Dispatcher.new
         Flamingo.logger.info "Starting dispatcher on pid=#{Process.pid} under pid=#{Process.ppid}"
-
+        @dispatcher.run
       end
+
+      def stop
+        @dispatcher.stop
+      end
+
+      def register_signal_handlers
+        trap("INT") { stop }
+        trap("TERM") { stop }
+      end
+
     end
   end
 end
data/lib/flamingo/dispatch_queue.rb
ADDED
@@ -0,0 +1,30 @@
+module Flamingo
+  class DispatchQueue
+
+    attr_accessor :redis
+
+    def initialize(redis)
+      self.redis = redis
+      @queue_name = "queue:dispatch"
+    end
+
+    def enqueue(event)
+      redis.rpush(@queue_name,event)
+    end
+
+    def dequeue
+      redis.lpop(@queue_name)
+    end
+
+    def page(page_num,page_size=20)
+      start_index = page_num*page_size
+      end_index = start_index+page_size-1
+      redis.lrange(@queue_name,start_index,end_index)
+    end
+
+    def size
+      redis.llen(@queue_name)
+    end
+
+  end
+end
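Reviewer note on `DispatchQueue`: pushing with `rpush` and popping with `lpop` gives first-in, first-out ordering, and `page` maps a zero-based page number onto an inclusive index range. The sketch below exercises those semantics without a Redis server. `FakeRedis` and `SketchDispatchQueue` are hypothetical stand-ins written for this note; the second is a condensed copy of the class in the diff so the example runs on its own.

```ruby
# In-memory stand-in for the four Redis list commands DispatchQueue
# uses. Hypothetical test double, not part of the gem.
class FakeRedis
  def initialize
    @lists = Hash.new { |h, k| h[k] = [] }
  end

  def rpush(key, value)
    @lists[key].push(value.to_s).size
  end

  def lpop(key)
    @lists[key].shift
  end

  # Like Redis LRANGE, both indexes are inclusive.
  def lrange(key, first, last)
    @lists[key][first..last] || []
  end

  def llen(key)
    @lists[key].size
  end
end

# Condensed copy of the class in the diff, so the sketch runs alone.
class SketchDispatchQueue
  def initialize(redis)
    @redis = redis
    @queue_name = "queue:dispatch"
  end

  def enqueue(event)
    @redis.rpush(@queue_name, event)
  end

  def dequeue
    @redis.lpop(@queue_name)
  end

  def page(page_num, page_size = 20)
    start_index = page_num * page_size
    @redis.lrange(@queue_name, start_index, start_index + page_size - 1)
  end

  def size
    @redis.llen(@queue_name)
  end
end

queue = SketchDispatchQueue.new(FakeRedis.new)
%w[a b c d e].each { |e| queue.enqueue(e) }
puts queue.dequeue   # oldest event comes out first
p queue.page(1, 2)   # second page of the four remaining events
```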
data/lib/flamingo/dispatcher.rb
ADDED
@@ -0,0 +1,98 @@
+module Flamingo
+  class Dispatcher
+
+    def initialize
+      @shutdown = false
+    end
+
+    def stop
+      @shutdown = true
+    end
+
+    def run(wait_time=0.5)
+      init_event_log
+      while(!@shutdown) do
+        if event = next_event
+          dispatch(event)
+        else
+          if wait_time == 0
+            stop
+          else
+            wait(wait_time)
+          end
+        end
+      end
+    end
+
+    private
+    def next_event
+      Flamingo.dispatch_queue.dequeue
+    end
+
+    def meta
+      Flamingo.meta
+    end
+
+    def logger
+      Flamingo.logger
+    end
+
+    def init_event_log
+      @event_log = Flamingo.new_event_log
+    end
+
+    def event_log
+      @event_log
+    end
+
+    def wait(time=0.5)
+      sleep(time) unless @shutdown
+    end
+
+    def dispatch(event_json)
+      type, event = typed_event(parse(event_json))
+      update_stats(type,event)
+      if event_log
+        event_log << event_json
+      end
+      if type == :limit
+        handle_limit(event)
+      end
+      Subscription.all.each do |sub|
+        Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)
+      end
+    rescue => e
+      handle_error(event_json,e)
+    end
+
+    def update_stats(type, event)
+      Flamingo.event_stats.event!(type)
+    end
+
+    def handle_error(event_json,error)
+      Logging::Utils.log_error(logger,
+        "Failure dispatching event: #{event_json}",error)
+    end
+
+    def handle_limit(event)
+      skipped = event.values.first
+      Flamingo.connection_stats.limited!(skipped)
+      logger.warn "Rate limited: #{skipped} skipped"
+    end
+
+    def parse(json)
+      Yajl::Parser.parse(json,:symbolize_keys=>true)
+    end
+
+    def typed_event(event)
+      # Events with one {key: value} pair are used as control events from
+      # Twitter. These include limit, delete, scrub_geo and others.
+      if event.size == 1
+        event.shift
+      else
+        [:tweet, event]
+      end
+    end
+
+  end
+end
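Reviewer note on `typed_event`: the new rule classifies any single-pair hash as a control event named by its key, replacing the explicit `:delete` and `:link` checks in the removed `DispatchEvent`. The sketch below reproduces that rule in isolation. It substitutes the standard library `JSON` parser for Yajl, which is a swap made for this note only, and the sample payloads are made up.

```ruby
require 'json'

# Stand-in for Dispatcher#parse; uses stdlib JSON instead of Yajl.
def parse(json)
  JSON.parse(json, symbolize_names: true)
end

# Same rule as Dispatcher#typed_event: a single-pair hash is a control
# event named by its key; anything else is treated as a tweet.
# Note that Hash#shift removes the pair from `event`.
def typed_event(event)
  event.size == 1 ? event.shift : [:tweet, event]
end

p typed_event(parse('{"limit":{"track":1234}}'))
p typed_event(parse('{"id":1,"text":"hello"}'))
```

For a limit notice, the second element is the inner hash, which is why `handle_limit` can read the skipped count with `event.values.first`.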
data/lib/flamingo/logging/event_log.rb
ADDED
@@ -0,0 +1,78 @@
+module Flamingo
+  module Logging
+    class EventLog
+
+      attr_accessor :dir, :max_size
+
+      def initialize(dir,size=10000)
+        self.dir = dir
+        self.max_size = size
+        @rotations = 0
+        rotate!
+        unless open?
+          raise "Failure opening log file"
+        end
+      end
+
+      def append(event)
+        if should_rotate?
+          rotate!
+        end
+        @log << "#{event}\n"
+        @event_count += 1
+      end
+      alias_method :<<, :append
+
+      def open?
+        !@log.nil?
+      end
+
+      private
+      def should_rotate?
+        max_size > 0 && @event_count >= max_size
+      end
+
+      def rotate!
+        close_current_log
+        open_new_log
+        if open?
+          symlink_current_log
+          update_counters
+        end
+      end
+
+      def update_counters
+        @event_count = 0
+        @rotations += 1
+      end
+
+      def close_current_log
+        @log.close if @log
+      rescue => e
+        Logging::Utils.log_error(Flamingo.logger,
+          "Failure closing event log #{@log_filename}",e)
+      end
+
+      def open_new_log
+        @log_filename = File.expand_path(File.join(dir,log_filename))
+        @log = File.open(@log_filename,'a')
+        @log.sync = true #Immediately flush all output
+      rescue => e
+        Logging::Utils.log_error(Flamingo.logger,
+          "Failure opening event log #{@log_filename}",e)
+        @log = nil
+      end
+
+      def log_filename
+        ts = Time.now.strftime("%Y%m%d-%H%M%S")
+        "event-#{ts}-#{@rotations}.log"
+      end
+
+      def symlink_current_log
+        current_log = File.expand_path(File.join(dir,"event.log"))
+        `ln -fs #{@log_filename} #{current_log}`
+      end
+
+    end
+  end
+end
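Reviewer note on `EventLog` rotation: files rotate by event count rather than by byte size, and a `max_size` of 0 disables rotation. The sketch below is a simplified stand-in that keeps only the counting and file naming, leaving out the symlink and error handling, and writes to a temporary directory so it runs anywhere. `TinyEventLog` and `demo_rotation` are hypothetical names used only here.

```ruby
require 'tmpdir'

# Simplified stand-in for Flamingo::Logging::EventLog: rotates after
# max_size events; a max_size of 0 disables rotation.
class TinyEventLog
  attr_reader :files

  def initialize(dir, max_size = 10000)
    @dir = dir
    @max_size = max_size
    @rotations = 0
    @files = []
    rotate!
  end

  def append(event)
    rotate! if @max_size > 0 && @event_count >= @max_size
    @log << "#{event}\n"
    @event_count += 1
  end
  alias_method :<<, :append

  def close
    @log.close if @log
    @log = nil
  end

  private

  def rotate!
    close
    ts = Time.now.strftime("%Y%m%d-%H%M%S")
    # The rotation counter keeps names unique within the same second.
    path = File.join(@dir, "event-#{ts}-#{@rotations}.log")
    @log = File.open(path, 'a')
    @log.sync = true
    @files << path
    @event_count = 0
    @rotations += 1
  end
end

# Writes five events with a limit of two per file and returns the
# number of lines that ended up in each file.
def demo_rotation
  Dir.mktmpdir do |dir|
    log = TinyEventLog.new(dir, 2)
    5.times { |i| log << %({"id":#{i}}) }
    log.close
    log.files.map { |f| File.readlines(f).size }
  end
end

p demo_rotation
```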
data/lib/flamingo/logging/utils.rb
ADDED
@@ -0,0 +1,22 @@
+module Flamingo
+  module Logging
+    module Utils
+
+      def log_error(logger, msg, e)
+        logger.error msg
+        logger.error error_trace(e,2)
+      end
+
+      def error_trace(e,indent=0)
+        space = " "*indent
+        err = "#{space}#{e.class.name}: #{e.message}\n"
+        space = " "*(indent+2)
+        err << "#{space}#{e.backtrace.join("\n#{space}")}\n"
+        err
+      end
+
+      extend self
+
+    end
+  end
+end
data/lib/flamingo/stats/connection.rb
ADDED
@@ -0,0 +1,59 @@
+module Flamingo
+  module Stats
+    class Connection
+
+      START_TIME = "conn:start:time"
+      START_EVENT_COUNT = "conn:start:event_count"
+      START_TWEET_COUNT = "conn:start:tweet_count"
+      LIMIT_COUNT = "conn:limit:count"
+      LIMIT_TIME = "conn:limit:time"
+      COVERAGE = "conn:coverage"
+
+      def connected!
+        meta.set(START_TIME,Time.now.to_i)
+        meta.set(START_EVENT_COUNT,event_stats.all_count)
+        meta.set(START_TWEET_COUNT,event_stats.tweet_count)
+        meta.set(COVERAGE,100)
+        meta.delete(LIMIT_COUNT)
+        meta.delete(LIMIT_TIME)
+      end
+
+      def limited!(count)
+        meta.set(LIMIT_COUNT,count)
+        meta.set(LIMIT_TIME,Time.now.to_i)
+        meta.set(COVERAGE,coverage_rate)
+      end
+
+      def received_tweets
+        event_stats.tweet_count - (meta.get(START_TWEET_COUNT) || 0)
+      end
+
+      def skipped_tweets
+        meta.get(LIMIT_COUNT) || 0
+      end
+
+      def received_events
+        event_stats.all_count - (meta.get(START_EVENT_COUNT) || 0)
+      end
+
+      def coverage_rate
+        received = received_tweets
+        possible_tweets = received + skipped_tweets
+        if possible_tweets == 0
+          0
+        else
+          (received / possible_tweets.to_f)*100
+        end
+      end
+
+      def meta
+        Flamingo.meta
+      end
+
+      def event_stats
+        Flamingo.event_stats
+      end
+
+    end
+  end
+end
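Reviewer note on `coverage_rate`: coverage is the share of tweets received since the connection started, out of received plus skipped, expressed as a percentage. The sketch below isolates that arithmetic as a free function with explicit arguments. The function signature is an assumption made for illustration, since the real method reads both counts from `Flamingo.meta`.

```ruby
# Coverage as computed in Stats::Connection#coverage_rate:
# received / (received + skipped) * 100, or 0 when nothing was possible.
def coverage_rate(received, skipped)
  possible = received + skipped
  return 0 if possible == 0
  (received / possible.to_f) * 100
end

puts coverage_rate(750, 250)
puts coverage_rate(0, 0)
```

One detail worth noticing when reading the diff: `connected!` stores 100 as the initial coverage, while `coverage_rate` itself returns 0 when no tweets have been received or skipped.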
data/lib/flamingo/stats/events.rb
ADDED
@@ -0,0 +1,52 @@
+module Flamingo
+  module Stats
+    class Events
+
+      ALL_COUNT = "events:all_count"
+      RATE = "events:rate"
+      LAST_TIME = "events:last_time"
+      TYPE_COUNT = "events:%s_count"
+      TWEET_COUNT = TYPE_COUNT % [:tweet]
+
+      def initialize
+        @rate_counter = Flamingo::Stats::RateCounter.new(10) do |eps|
+          meta.set(RATE,eps)
+          logger.debug "%.3f eps" % [eps]
+        end
+      end
+
+      def event!(type)
+        @rate_counter.event!
+        meta.incr(ALL_COUNT)
+        meta.set(LAST_TIME,Time.now.to_i)
+        meta.incr(TYPE_COUNT % [type])
+      end
+
+      def all_count
+        meta.get(ALL_COUNT) || 0
+      end
+
+      def last_time
+        meta.get(LAST_TIME)
+      end
+
+      def type_count(type)
+        meta.get(TYPE_COUNT % [type]) || 0
+      end
+
+      def tweet_count
+        type_count(:tweet)
+      end
+
+      private
+      def logger
+        Flamingo.logger
+      end
+
+      def meta
+        Flamingo.meta
+      end
+
+    end
+  end
+end
data/lib/flamingo/stats/rate_counter.rb
ADDED
@@ -0,0 +1,38 @@
+module Flamingo
+  module Stats
+
+    # Simple counter for measuring stream rates in events per second
+    class RateCounter
+
+      attr_accessor :rate, :callback
+
+      def initialize(sample_duration=60, &block)
+        @sample_duration = sample_duration
+        self.callback = block
+        start_sample
+      end
+
+      def event!
+        @count += 1
+        if (diff = (now - @sample_start_time)) >= @sample_duration
+          self.rate = (@count / diff.to_f)
+          if callback
+            callback.call(rate)
+          end
+          start_sample
+        end
+      end
+
+      private
+      def now
+        Time.now.to_i
+      end
+
+      def start_sample
+        @sample_start_time = now
+        @count = 0
+      end
+
+    end
+  end
+end
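Reviewer note on `RateCounter`: the rate is recomputed only when an event arrives after the sample window has elapsed, so the callback fires on the first event at or past the window boundary rather than on a timer. The sketch below shows that behavior with a controllable clock. `SketchRateCounter` and its `clock` parameter are hypothetical additions for this note; the class in the diff always reads `Time.now.to_i`.

```ruby
# Variant of Stats::RateCounter with an injectable clock so the
# sampling behavior can be shown without sleeping.
class SketchRateCounter
  attr_reader :rate

  def initialize(sample_duration = 60, clock = -> { Time.now.to_i }, &block)
    @sample_duration = sample_duration
    @clock = clock
    @callback = block
    start_sample
  end

  def event!
    @count += 1
    diff = @clock.call - @sample_start_time
    if diff >= @sample_duration
      # Events per second over the elapsed window.
      @rate = @count / diff.to_f
      @callback.call(@rate) if @callback
      start_sample
    end
  end

  private

  def start_sample
    @sample_start_time = @clock.call
    @count = 0
  end
end

fake_time = 0
reported = []
counter = SketchRateCounter.new(10, -> { fake_time }) { |eps| reported << eps }

4.times { counter.event! }  # four events at t=0, window not yet elapsed
fake_time = 10
counter.event!              # fifth event closes the 10 second window
p reported
```

A consequence of this design is that an idle stream leaves the last reported rate in place until the next event arrives.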
data/lib/flamingo/stream.rb
CHANGED
@@ -53,12 +53,16 @@ module Flamingo
         :name=>name,:resource=>resource,:params=>params.all
       )
     end
+
+    def query
+      params.map{|key,value| "#{key}=#{param_value(value)}" }.join("&")
+    end
 
-
-
-
-    end
+    def to_s
+      "#{path}?#{query}"
+    end
 
+    private
     def param_value(val)
       case val
       when String then CGI.escape(val)
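Reviewer note on `Stream#query` and `Stream#to_s`: together they produce the path-plus-query string that the wader now logs before connecting. The sketch below rebuilds that string with free functions. Only the `String` branch of `param_value` is visible in this diff, so the array handling here is an assumption based on the `track=%23etsy,austin,cheaptweet` log line in the README, and `stream_to_s` is a hypothetical name.

```ruby
require 'cgi'

# Hypothetical stand-in for Stream#param_value. Only the String branch
# appears in the diff; joining arrays with commas is an assumption
# based on the README log line.
def param_value(val)
  case val
  when String then CGI.escape(val)
  when Array  then val.map { |v| param_value(v) }.join(",")
  else             val.to_s
  end
end

# Mirrors Stream#query: key=value pairs joined with "&".
def query(params)
  params.map { |key, value| "#{key}=#{param_value(value)}" }.join("&")
end

# Mirrors Stream#to_s: path, "?", then the query string.
def stream_to_s(path, params)
  "#{path}?#{query(params)}"
end

puts stream_to_s("/1/statuses/filter.json",
                 "track" => ["#etsy", "austin", "cheaptweet"])
```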
data/lib/flamingo/version.rb
CHANGED
@@ -1,3 +1,3 @@
 module Flamingo
-  Version = VERSION = '0.
-end
+  Version = VERSION = '0.4.0'
+end
data/lib/flamingo/wader.rb
CHANGED
@@ -90,8 +90,10 @@ module Flamingo
     private
     def connect_and_run
       EventMachine::run do
+        Flamingo.logger.info("Connecting to stream: #{stream}")
         self.connection = stream.connect(:auth=>"#{screen_name}:#{password}")
-        Flamingo.logger.info("
+        Flamingo.logger.info("Connected to stream")
+        Flamingo.connection_stats.connected!
 
         connection.each_item do |event_json|
           dispatch_event(event_json)
@@ -141,7 +143,7 @@ module Flamingo
     end
 
     def dispatch_event(event_json)
-
+      Flamingo.dispatch_queue.enqueue(event_json)
     end
 
   end
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: flamingo
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 15
   prerelease: false
   segments:
   - 0
-  -
-  -
-  version: 0.
+  - 4
+  - 0
+  version: 0.4.0
 platform: ruby
 authors:
 - Hayes Davis
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date:
+date: 2011-02-10 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -262,9 +262,15 @@ files:
 - lib/flamingo/daemon/trap_keeper.rb
 - lib/flamingo/daemon/wader_process.rb
 - lib/flamingo/daemon/web_server_process.rb
-- lib/flamingo/
+- lib/flamingo/dispatch_queue.rb
+- lib/flamingo/dispatcher.rb
+- lib/flamingo/logging/event_log.rb
 - lib/flamingo/logging/formatter.rb
+- lib/flamingo/logging/utils.rb
 - lib/flamingo/meta.rb
+- lib/flamingo/stats/connection.rb
+- lib/flamingo/stats/events.rb
+- lib/flamingo/stats/rate_counter.rb
 - lib/flamingo/stream.rb
 - lib/flamingo/stream_params.rb
 - lib/flamingo/subscription.rb
data/lib/flamingo/dispatch_event.rb
REMOVED
@@ -1,51 +0,0 @@
-module Flamingo
-  class DispatchEvent
-
-    @parser = Yajl::Parser.new(:symbolize_keys => true)
-
-    class << self
-
-      def queue
-        Flamingo.dispatch_queue
-      end
-
-      def meta
-        Flamingo.meta
-      end
-
-      #
-      # TODO Track stats including: tweets per second and last tweet time
-      # TODO Provide some first-level check for repeated status ids
-      # TODO Consider subscribers for receiving particular terms - do the heavy
-      # lifting of parsing tweets and delivering them to particular subscribers
-      # TODO Consider window of tweets (approx 3 seconds) and sort before
-      # dispatching to improve in-order delivery (helps with "k-sorted")
-      #
-      def perform(event_json)
-        meta.incr("events:all_count")
-        meta.set("events:last_time",Time.now.utc.to_i)
-        type, event = typed_event(parse(event_json))
-        meta.incr("events:#{type}_count")
-        Subscription.all.each do |sub|
-          Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)
-          Flamingo.logger.debug "Put job on subscription queue #{sub.name}\n#{event_json}"
-        end
-      end
-
-      def parse(json)
-        @parser.parse(json)
-      end
-
-      def typed_event(event)
-        if event[:delete]
-          [:delete, event[:delete]]
-        elsif event[:link]
-          [:link, event[:link]]
-        else
-          [:tweet, event]
-        end
-      end
-
-    end
-  end
-end