flamingo 0.3.1 → 0.4.0
- data/README.md +30 -18
- data/examples/flamingo.yml +11 -0
- data/lib/flamingo.rb +29 -7
- data/lib/flamingo/daemon/dispatcher_process.rb +15 -6
- data/lib/flamingo/daemon/flamingod.rb +1 -0
- data/lib/flamingo/dispatch_queue.rb +30 -0
- data/lib/flamingo/dispatcher.rb +98 -0
- data/lib/flamingo/logging/event_log.rb +78 -0
- data/lib/flamingo/logging/utils.rb +22 -0
- data/lib/flamingo/stats/connection.rb +59 -0
- data/lib/flamingo/stats/events.rb +52 -0
- data/lib/flamingo/stats/rate_counter.rb +38 -0
- data/lib/flamingo/stream.rb +8 -4
- data/lib/flamingo/version.rb +2 -2
- data/lib/flamingo/wader.rb +4 -2
- metadata +12 -6
- data/lib/flamingo/dispatch_event.rb +0 -51
data/README.md
CHANGED
@@ -1,14 +1,19 @@
|
|
1
1
|
Flamingo
|
2
2
|
========
|
3
|
-
Flamingo is a
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
3
|
+
Flamingo is a service for connecting to and processing events from the Twitter
|
4
|
+
Streaming API. Here are the highlights:
|
5
|
+
|
6
|
+
* It runs as a daemon that you communicate with via a REST API interface.
|
7
|
+
* Handles all the work of intelligently managing connections to the
|
8
|
+
Streaming API (handling things like backoffs and reconnects).
|
9
|
+
* Stream events (tweets) can be stored directly to a file on disk via the
|
10
|
+
built in event log functionality. This is useful for collecting data for
|
11
|
+
further batch processing of incoming data via hadoop, for example.
|
12
|
+
* Stream events can be placed on a Resque queue for downstream processing. This
|
13
|
+
is an easy way to connect your application logic for processing tweets.
|
14
|
+
* It provides helpful metrics like stream rates, event counts, limit information
|
15
|
+
available via the REST endpoint /meta.json.
|
16
|
+
* It supports a minimal configuration REPL via the flamingo command.
|
12
17
|
|
13
18
|
Dependencies
|
14
19
|
------------
|
@@ -17,6 +22,11 @@ few dependencies and they are very specific. We plan to have fewer dependencies
|
|
17
22
|
and be more liberal with versions soon. Right now these gems and versions are
|
18
23
|
what is working well in production for us.
|
19
24
|
|
25
|
+
Caveat Emptor
|
26
|
+
-------------
|
27
|
+
This is *alpha* code. However, it processes multiple high-volume streams
|
28
|
+
in production as part of TweetReach.com.
|
29
|
+
|
20
30
|
Getting Started
|
21
31
|
---------------
|
22
32
|
1. Install the gem
|
@@ -63,7 +73,7 @@ commandline (see below)
|
|
63
73
|
|
64
74
|
6. Your second task from the flamingo console is to route the incoming tweets onto a queue -- in this case the EXAMPLE queue. This is used by the flamingod we'll start next but has no direct effect now.
|
65
75
|
|
66
|
-
>> Subscription.new('
|
76
|
+
>> Subscription.new('example').save
|
67
77
|
|
68
78
|
7. Start the Flamingo Daemon (`flamingod` installed during `gem install`), and also start watching its log file:
|
69
79
|
|
@@ -80,11 +90,8 @@ commandline (see below)
|
|
80
90
|
[2010-07-20 05:58:07, INFO] - Starting wader on pid=91008 under pid=91003
|
81
91
|
[2010-07-20 05:58:07, INFO] - Starting dispatcher on pid=91009 under pid=91003
|
82
92
|
[2010-07-20 05:58:12, INFO] - Listening on stream: /1/statuses/filter.json?track=%23etsy,austin,cheaptweet
|
83
|
-
... short initial delay ....
|
84
|
-
[2010-07-20 05:58:42, DEBUG] - Wader dispatched event
|
85
|
-
[2010-07-20 05:58:42, DEBUG] - Put job on subscription queue EXAMPLE for {"text":If you ever visit Austin make sure to go to Torchy's Tacos",...
|
86
93
|
|
87
|
-
On the resque-web dashboard, you should see a queue come up called
|
94
|
+
On the resque-web dashboard, you should see a queue come up called example, with jobs accruing. There will only be 0 of 0 workers working: let's fix that
|
88
95
|
|
89
96
|
8. You'll consume those events with a resque worker, something like the following but more audacious:
|
90
97
|
|
@@ -99,10 +106,10 @@ commandline (see below)
|
|
99
106
|
|
100
107
|
9. Start the worker task (see `examples/Rakefile`):
|
101
108
|
|
102
|
-
$ QUEUE=
|
109
|
+
$ QUEUE=example rake -t examples/Rakefile resque:work
|
103
110
|
|
104
111
|
Two things should now happen:
|
105
|
-
* The pent-up jobs from the
|
112
|
+
* The pent-up jobs from the example queue should spray across your console
|
106
113
|
* The resque dashboard should show the queue being emptied as a result
|
107
114
|
|
108
115
|
10. Interact with your running flamingod instance via the REST API (by default it is on port 4711)
|
@@ -122,7 +129,7 @@ components of the flamingo flock:
|
|
122
129
|
|
123
130
|
Coordinates the wader process (initiates stream request, pushes each response
|
124
131
|
into the queue), the Sinatra webserver (handles subscriptions and changing
|
125
|
-
stream parameters), and a
|
132
|
+
stream parameters), and a dispatcher (routes events to subscribers).
|
126
133
|
|
127
134
|
You can control flamingod with the following signals:
|
128
135
|
|
@@ -131,7 +138,12 @@ You can control flamingod with the following signals:
|
|
131
138
|
|
132
139
|
*wader*
|
133
140
|
|
134
|
-
The wader process starts the stream and
|
141
|
+
The wader process starts the stream and queues events as they arrive into a redis list.
|
142
|
+
|
143
|
+
*dispatcher*
|
144
|
+
|
145
|
+
The dispatcher process retrieves events from the dispatch queue, writes them to
|
146
|
+
the event log (if configured) and to any subscriptions (if configured).
|
135
147
|
|
136
148
|
*web server*
|
137
149
|
|
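Reviewer note: README step 8 says events are consumed with a resque worker, and the new dispatcher enqueues jobs under the class name "HandleFlamingoEvent" with a type and an event. A minimal sketch of such a worker follows. Only the class name and the two arguments come from the diff; the helper `summarize_tweet`, the return strings, and the assumption that arguments arrive with string keys (Resque passes job arguments through JSON) are illustrative.

```ruby
# Hypothetical worker. The class name matches the job class string used by
# the dispatcher; the body is an illustrative assumption, not package code.
class HandleFlamingoEvent
  class << self
    # Resque calls perform with the arguments handed to Resque::Job.create.
    # They are assumed to arrive JSON-decoded, so type and hash keys are Strings.
    def perform(type, event)
      case type.to_s
      when "tweet"
        summarize_tweet(event)
      when "delete", "limit", "scrub_geo"
        "control event: #{type}"
      else
        "unhandled event type: #{type}"
      end
    end

    private

    # Hypothetical helper: builds a one-line summary of a tweet hash.
    def summarize_tweet(event)
      user = event["user"] ? event["user"]["screen_name"] : "unknown"
      "#{user}: #{event["text"]}"
    end
  end
end

puts HandleFlamingoEvent.perform("tweet", "text" => "hello", "user" => { "screen_name" => "example" })
```

In a real deployment the class would be loaded by the rake task from step 9 so that the worker process can resolve the job class by name.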
data/examples/flamingo.yml
CHANGED
@@ -11,6 +11,17 @@ stream: filter
|
|
11
11
|
logging:
|
12
12
|
dest: /tmp/flamingo.log
|
13
13
|
level: DEBUG
|
14
|
+
|
15
|
+
# Event logging (optional)
|
16
|
+
# Allows you to log the raw JSON of stream events to a set of rotating files
|
17
|
+
# stored in the directory you specify. Size is the maximum number of events
|
18
|
+
# that will be written to a log file before it is rotated. If size is omitted
|
19
|
+
# no rotation will be done. If you expect a high volume stream, set this number
|
20
|
+
# to something relatively large or you will end up with lots of small log files.
|
21
|
+
# 10000-100000 is probably a good place to start.
|
22
|
+
event:
|
23
|
+
dir: /tmp/flamingo_events
|
24
|
+
size: 100000
|
14
25
|
|
15
26
|
# Where is the redis server the flamingod processes should connect to?
|
16
27
|
# By default, all keys are namespaced wih "flamingo". May be changed
|
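Reviewer note: the new event block can be read back with Ruby's standard YAML library. The sketch below assumes that `event` nests under `logging` (suggested by `config.logging.event` in lib/flamingo.rb, since the rendered diff has lost its indentation) and uses a hypothetical helper `rotation_size` to mirror the documented rule that an omitted size means no rotation.

```ruby
require 'yaml'

# Nesting is an assumption: the diff view dropped the YAML indentation.
SAMPLE = <<~YAML
  logging:
    dest: /tmp/flamingo.log
    level: DEBUG
    event:
      dir: /tmp/flamingo_events
      size: 100000
YAML

# Hypothetical helper: a missing size falls back to 0, which the event log
# treats as "never rotate" (see should_rotate? in event_log.rb).
def rotation_size(event_config)
  event_config.fetch("size", 0)
end

event = YAML.safe_load(SAMPLE)["logging"]["event"]
puts event["dir"]
puts rotation_size(event)
puts rotation_size("dir" => "/tmp/flamingo_events")
```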
data/lib/flamingo.rb
CHANGED
@@ -11,8 +11,15 @@ require 'sinatra/base'
|
|
11
11
|
|
12
12
|
require 'flamingo/version'
|
13
13
|
require 'flamingo/config'
|
14
|
+
require 'flamingo/logging/formatter'
|
15
|
+
require 'flamingo/logging/utils'
|
16
|
+
require 'flamingo/logging/event_log'
|
14
17
|
require 'flamingo/meta'
|
15
|
-
require 'flamingo/
|
18
|
+
require 'flamingo/stats/rate_counter'
|
19
|
+
require 'flamingo/stats/events'
|
20
|
+
require 'flamingo/stats/connection'
|
21
|
+
require 'flamingo/dispatch_queue'
|
22
|
+
require 'flamingo/dispatcher'
|
16
23
|
require 'flamingo/stream_params'
|
17
24
|
require 'flamingo/stream'
|
18
25
|
require 'flamingo/subscription'
|
@@ -24,7 +31,6 @@ require 'flamingo/daemon/dispatcher_process'
|
|
24
31
|
require 'flamingo/daemon/web_server_process'
|
25
32
|
require 'flamingo/daemon/wader_process'
|
26
33
|
require 'flamingo/daemon/flamingod'
|
27
|
-
require 'flamingo/logging/formatter'
|
28
34
|
require 'flamingo/web/server'
|
29
35
|
|
30
36
|
module Flamingo
|
@@ -64,7 +70,7 @@ module Flamingo
|
|
64
70
|
def redis=(server)
|
65
71
|
host, port, db = server.split(':')
|
66
72
|
redis = Redis.new(:host => host, :port => port,
|
67
|
-
:thread_safe
|
73
|
+
:thread_safe=>true, :db => db)
|
68
74
|
@redis = Redis::Namespace.new(namespace, :redis => redis)
|
69
75
|
|
70
76
|
# Ensure resque is configured to use this redis as well
|
@@ -92,21 +98,37 @@ module Flamingo
|
|
92
98
|
end
|
93
99
|
|
94
100
|
def dispatch_queue
|
95
|
-
@dispatch_queue ||=
|
101
|
+
@dispatch_queue ||= DispatchQueue.new(redis)
|
96
102
|
end
|
97
103
|
|
98
104
|
def meta
|
99
|
-
@meta ||=
|
105
|
+
@meta ||= Meta.new(redis)
|
106
|
+
end
|
107
|
+
|
108
|
+
def event_stats
|
109
|
+
@event_stats ||= Stats::Events.new
|
110
|
+
end
|
111
|
+
|
112
|
+
def connection_stats
|
113
|
+
@connection_stats ||= Stats::Connection.new
|
114
|
+
end
|
115
|
+
|
116
|
+
def new_event_log
|
117
|
+
if event_config = config.logging.event(nil)
|
118
|
+
Logging::EventLog.new(event_config.dir,event_config.size(0))
|
119
|
+
else
|
120
|
+
nil
|
121
|
+
end
|
100
122
|
end
|
101
123
|
|
102
124
|
# Intended to be called after a fork so that we don't have
|
103
125
|
# issues with shared file descriptors, sockets, etc
|
104
126
|
def reconnect!
|
105
|
-
reconnect_redis_client(@redis)
|
106
|
-
reconnect_redis_client(Resque.redis)
|
107
127
|
# Reload logger
|
108
128
|
logger.close
|
109
129
|
self.logger = new_logger
|
130
|
+
reconnect_redis_client(@redis)
|
131
|
+
reconnect_redis_client(Resque.redis)
|
110
132
|
end
|
111
133
|
|
112
134
|
private
|
data/lib/flamingo/daemon/dispatcher_process.rb
CHANGED
@@ -1,15 +1,24 @@
|
|
1
1
|
module Flamingo
|
2
2
|
module Daemon
|
3
3
|
class DispatcherProcess < ChildProcess
|
4
|
+
|
4
5
|
def run
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
$0 = "flamingod-dispatcher"
|
9
|
-
end
|
6
|
+
register_signal_handlers
|
7
|
+
$0 = "flamingod-dispatcher"
|
8
|
+
@dispatcher = Flamingo::Dispatcher.new
|
10
9
|
Flamingo.logger.info "Starting dispatcher on pid=#{Process.pid} under pid=#{Process.ppid}"
|
11
|
-
|
10
|
+
@dispatcher.run
|
12
11
|
end
|
12
|
+
|
13
|
+
def stop
|
14
|
+
@dispatcher.stop
|
15
|
+
end
|
16
|
+
|
17
|
+
def register_signal_handlers
|
18
|
+
trap("INT") { stop }
|
19
|
+
trap("TERM") { stop }
|
20
|
+
end
|
21
|
+
|
13
22
|
end
|
14
23
|
end
|
15
24
|
end
|
data/lib/flamingo/dispatch_queue.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
module Flamingo
|
2
|
+
class DispatchQueue
|
3
|
+
|
4
|
+
attr_accessor :redis
|
5
|
+
|
6
|
+
def initialize(redis)
|
7
|
+
self.redis = redis
|
8
|
+
@queue_name = "queue:dispatch"
|
9
|
+
end
|
10
|
+
|
11
|
+
def enqueue(event)
|
12
|
+
redis.rpush(@queue_name,event)
|
13
|
+
end
|
14
|
+
|
15
|
+
def dequeue
|
16
|
+
redis.lpop(@queue_name)
|
17
|
+
end
|
18
|
+
|
19
|
+
def page(page_num,page_size=20)
|
20
|
+
start_index = page_num*page_size
|
21
|
+
end_index = start_index+page_size-1
|
22
|
+
redis.lrange(@queue_name,start_index,end_index)
|
23
|
+
end
|
24
|
+
|
25
|
+
def size
|
26
|
+
redis.llen(@queue_name)
|
27
|
+
end
|
28
|
+
|
29
|
+
end
|
30
|
+
end
|
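Reviewer note: DispatchQueue is a FIFO over a redis list (rpush at the tail, lpop at the head) with zero-based paging. The sketch below reproduces the page arithmetic against an in-memory stand-in so it runs without a redis server. `FakeRedisList` and `page_bounds` are hypothetical names introduced here; only the four list commands and the index formula come from the diff.

```ruby
# Hypothetical in-memory stand-in for the four redis list commands that
# DispatchQueue calls. Not part of the package.
class FakeRedisList
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def rpush(key, value)
    @lists[key].push(value)
    @lists[key].size
  end

  def lpop(key)
    @lists[key].shift
  end

  # As with redis LRANGE, both indexes are inclusive.
  def lrange(key, start_index, end_index)
    @lists[key][start_index..end_index] || []
  end

  def llen(key)
    @lists[key].size
  end
end

# Same index arithmetic as DispatchQueue#page: pages are zero-based and
# the end index is inclusive.
def page_bounds(page_num, page_size = 20)
  start_index = page_num * page_size
  [start_index, start_index + page_size - 1]
end

redis = FakeRedisList.new
5.times { |i| redis.rpush("queue:dispatch", "event-#{i}") }

first, last = page_bounds(1, 2) # second page, two entries per page
puts redis.lrange("queue:dispatch", first, last).inspect
puts redis.lpop("queue:dispatch") # the oldest entry leaves first
puts redis.llen("queue:dispatch")
```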
data/lib/flamingo/dispatcher.rb
ADDED
@@ -0,0 +1,98 @@
|
|
1
|
+
module Flamingo
|
2
|
+
class Dispatcher
|
3
|
+
|
4
|
+
def initialize
|
5
|
+
@shutdown = false
|
6
|
+
end
|
7
|
+
|
8
|
+
def stop
|
9
|
+
@shutdown = true
|
10
|
+
end
|
11
|
+
|
12
|
+
def run(wait_time=0.5)
|
13
|
+
init_event_log
|
14
|
+
while(!@shutdown) do
|
15
|
+
if event = next_event
|
16
|
+
dispatch(event)
|
17
|
+
else
|
18
|
+
if wait_time == 0
|
19
|
+
stop
|
20
|
+
else
|
21
|
+
wait(wait_time)
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
private
|
28
|
+
def next_event
|
29
|
+
Flamingo.dispatch_queue.dequeue
|
30
|
+
end
|
31
|
+
|
32
|
+
def meta
|
33
|
+
Flamingo.meta
|
34
|
+
end
|
35
|
+
|
36
|
+
def logger
|
37
|
+
Flamingo.logger
|
38
|
+
end
|
39
|
+
|
40
|
+
def init_event_log
|
41
|
+
@event_log = Flamingo.new_event_log
|
42
|
+
end
|
43
|
+
|
44
|
+
def event_log
|
45
|
+
@event_log
|
46
|
+
end
|
47
|
+
|
48
|
+
def wait(time=0.5)
|
49
|
+
sleep(time) unless @shutdown
|
50
|
+
end
|
51
|
+
|
52
|
+
def dispatch(event_json)
|
53
|
+
type, event = typed_event(parse(event_json))
|
54
|
+
update_stats(type,event)
|
55
|
+
if event_log
|
56
|
+
event_log << event_json
|
57
|
+
end
|
58
|
+
if type == :limit
|
59
|
+
handle_limit(event)
|
60
|
+
end
|
61
|
+
Subscription.all.each do |sub|
|
62
|
+
Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)
|
63
|
+
end
|
64
|
+
rescue => e
|
65
|
+
handle_error(event_json,e)
|
66
|
+
end
|
67
|
+
|
68
|
+
def update_stats(type, event)
|
69
|
+
Flamingo.event_stats.event!(type)
|
70
|
+
end
|
71
|
+
|
72
|
+
def handle_error(event_json,error)
|
73
|
+
Logging::Utils.log_error(logger,
|
74
|
+
"Failure dispatching event: #{event_json}",error)
|
75
|
+
end
|
76
|
+
|
77
|
+
def handle_limit(event)
|
78
|
+
skipped = event.values.first
|
79
|
+
Flamingo.connection_stats.limited!(skipped)
|
80
|
+
logger.warn "Rate limited: #{skipped} skipped"
|
81
|
+
end
|
82
|
+
|
83
|
+
def parse(json)
|
84
|
+
Yajl::Parser.parse(json,:symbolize_keys=>true)
|
85
|
+
end
|
86
|
+
|
87
|
+
def typed_event(event)
|
88
|
+
# Events with one {key: value} pair are used as control events from
|
89
|
+
# Twitter. These include limit, delete, scrub_geo and others.
|
90
|
+
if event.size == 1
|
91
|
+
event.shift
|
92
|
+
else
|
93
|
+
[:tweet, event]
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
end
|
98
|
+
end
|
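Reviewer note: the new `typed_event` classifies any single-key hash as a control event and everything else as a tweet, replacing the explicit `:delete` / `:link` checks in the removed DispatchEvent. The standalone copy below shows the rule on two sample payloads; the payload contents are illustrative assumptions.

```ruby
# Standalone copy of the classification rule in Dispatcher#typed_event.
def typed_event(event)
  if event.size == 1
    event.shift # Hash#shift removes and returns the first [key, value] pair
  else
    [:tweet, event]
  end
end

# Sample payloads (illustrative), already parsed with symbolized keys.
limit_notice = { :limit => { :track => 1234 } }
tweet        = { :id => 1, :text => "hello", :user => { :screen_name => "example" } }

type, payload = typed_event(limit_notice)
puts type.inspect
puts payload.values.first # the skipped count that handle_limit reads
puts typed_event(tweet).first.inspect
puts limit_notice.empty?  # shift mutated the original hash
```

One consequence worth noting: because `Hash#shift` mutates its receiver, the parsed hash is empty after classification, so callers should use the returned pair rather than the original hash.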
data/lib/flamingo/logging/event_log.rb
ADDED
@@ -0,0 +1,78 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Logging
|
3
|
+
class EventLog
|
4
|
+
|
5
|
+
attr_accessor :dir, :max_size
|
6
|
+
|
7
|
+
def initialize(dir,size=10000)
|
8
|
+
self.dir = dir
|
9
|
+
self.max_size = size
|
10
|
+
@rotations = 0
|
11
|
+
rotate!
|
12
|
+
unless open?
|
13
|
+
raise "Failure opening log file"
|
14
|
+
end
|
15
|
+
end
|
16
|
+
|
17
|
+
def append(event)
|
18
|
+
if should_rotate?
|
19
|
+
rotate!
|
20
|
+
end
|
21
|
+
@log << "#{event}\n"
|
22
|
+
@event_count += 1
|
23
|
+
end
|
24
|
+
alias_method :<<, :append
|
25
|
+
|
26
|
+
def open?
|
27
|
+
!@log.nil?
|
28
|
+
end
|
29
|
+
|
30
|
+
private
|
31
|
+
def should_rotate?
|
32
|
+
max_size > 0 && @event_count >= max_size
|
33
|
+
end
|
34
|
+
|
35
|
+
def rotate!
|
36
|
+
close_current_log
|
37
|
+
open_new_log
|
38
|
+
if open?
|
39
|
+
symlink_current_log
|
40
|
+
update_counters
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
def update_counters
|
45
|
+
@event_count = 0
|
46
|
+
@rotations += 1
|
47
|
+
end
|
48
|
+
|
49
|
+
def close_current_log
|
50
|
+
@log.close if @log
|
51
|
+
rescue => e
|
52
|
+
Logging::Utils.log_error(Flamingo.logger,
|
53
|
+
"Failure closing event log #{@log_filename}",e)
|
54
|
+
end
|
55
|
+
|
56
|
+
def open_new_log
|
57
|
+
@log_filename = File.expand_path(File.join(dir,log_filename))
|
58
|
+
@log = File.open(@log_filename,'a')
|
59
|
+
@log.sync = true #Immediately flush all output
|
60
|
+
rescue => e
|
61
|
+
Logging::Utils.log_error(Flamingo.logger,
|
62
|
+
"Failure opening event log #{@log_filename}",e)
|
63
|
+
@log = nil
|
64
|
+
end
|
65
|
+
|
66
|
+
def log_filename
|
67
|
+
ts = Time.now.strftime("%Y%m%d-%H%M%S")
|
68
|
+
"event-#{ts}-#{@rotations}.log"
|
69
|
+
end
|
70
|
+
|
71
|
+
def symlink_current_log
|
72
|
+
current_log = File.expand_path(File.join(dir,"event.log"))
|
73
|
+
`ln -fs #{@log_filename} #{current_log}`
|
74
|
+
end
|
75
|
+
|
76
|
+
end
|
77
|
+
end
|
78
|
+
end
|
data/lib/flamingo/logging/utils.rb
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Logging
|
3
|
+
module Utils
|
4
|
+
|
5
|
+
def log_error(logger, msg, e)
|
6
|
+
logger.error msg
|
7
|
+
logger.error error_trace(e,2)
|
8
|
+
end
|
9
|
+
|
10
|
+
def error_trace(e,indent=0)
|
11
|
+
space = " "*indent
|
12
|
+
err = "#{space}#{e.class.name}: #{e.message}\n"
|
13
|
+
space = " "*(indent+2)
|
14
|
+
err << "#{space}#{e.backtrace.join("\n#{space}")}\n"
|
15
|
+
err
|
16
|
+
end
|
17
|
+
|
18
|
+
extend self
|
19
|
+
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
data/lib/flamingo/stats/connection.rb
ADDED
@@ -0,0 +1,59 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Stats
|
3
|
+
class Connection
|
4
|
+
|
5
|
+
START_TIME = "conn:start:time"
|
6
|
+
START_EVENT_COUNT = "conn:start:event_count"
|
7
|
+
START_TWEET_COUNT = "conn:start:tweet_count"
|
8
|
+
LIMIT_COUNT = "conn:limit:count"
|
9
|
+
LIMIT_TIME = "conn:limit:time"
|
10
|
+
COVERAGE = "conn:coverage"
|
11
|
+
|
12
|
+
def connected!
|
13
|
+
meta.set(START_TIME,Time.now.to_i)
|
14
|
+
meta.set(START_EVENT_COUNT,event_stats.all_count)
|
15
|
+
meta.set(START_TWEET_COUNT,event_stats.tweet_count)
|
16
|
+
meta.set(COVERAGE,100)
|
17
|
+
meta.delete(LIMIT_COUNT)
|
18
|
+
meta.delete(LIMIT_TIME)
|
19
|
+
end
|
20
|
+
|
21
|
+
def limited!(count)
|
22
|
+
meta.set(LIMIT_COUNT,count)
|
23
|
+
meta.set(LIMIT_TIME,Time.now.to_i)
|
24
|
+
meta.set(COVERAGE,coverage_rate)
|
25
|
+
end
|
26
|
+
|
27
|
+
def received_tweets
|
28
|
+
event_stats.tweet_count - (meta.get(START_TWEET_COUNT) || 0)
|
29
|
+
end
|
30
|
+
|
31
|
+
def skipped_tweets
|
32
|
+
meta.get(LIMIT_COUNT) || 0
|
33
|
+
end
|
34
|
+
|
35
|
+
def received_events
|
36
|
+
event_stats.all_count - (meta.get(START_EVENT_COUNT) || 0)
|
37
|
+
end
|
38
|
+
|
39
|
+
def coverage_rate
|
40
|
+
received = received_tweets
|
41
|
+
possible_tweets = received + skipped_tweets
|
42
|
+
if possible_tweets == 0
|
43
|
+
0
|
44
|
+
else
|
45
|
+
(received / possible_tweets.to_f)*100
|
46
|
+
end
|
47
|
+
end
|
48
|
+
|
49
|
+
def meta
|
50
|
+
Flamingo.meta
|
51
|
+
end
|
52
|
+
|
53
|
+
def event_stats
|
54
|
+
Flamingo.event_stats
|
55
|
+
end
|
56
|
+
|
57
|
+
end
|
58
|
+
end
|
59
|
+
end
|
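Reviewer note: `coverage_rate` reports the percentage of tweets received out of those that could have been received since the connection started, where skipped tweets come from the latest limit notice. A worked version of the arithmetic, as a free function with plain integer arguments (an assumption; the original reads both values through `meta` and `event_stats`):

```ruby
# Same formula as Stats::Connection#coverage_rate, with the inputs passed in.
def coverage_rate(received, skipped)
  possible = received + skipped
  return 0 if possible == 0
  (received / possible.to_f) * 100
end

puts coverage_rate(900, 100) # 900 of 1000 possible tweets were received
puts coverage_rate(0, 0)     # nothing received or skipped yet
```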
data/lib/flamingo/stats/events.rb
ADDED
@@ -0,0 +1,52 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Stats
|
3
|
+
class Events
|
4
|
+
|
5
|
+
ALL_COUNT = "events:all_count"
|
6
|
+
RATE = "events:rate"
|
7
|
+
LAST_TIME = "events:last_time"
|
8
|
+
TYPE_COUNT = "events:%s_count"
|
9
|
+
TWEET_COUNT = TYPE_COUNT % [:tweet]
|
10
|
+
|
11
|
+
def initialize
|
12
|
+
@rate_counter = Flamingo::Stats::RateCounter.new(10) do |eps|
|
13
|
+
meta.set(RATE,eps)
|
14
|
+
logger.debug "%.3f eps" % [eps]
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
def event!(type)
|
19
|
+
@rate_counter.event!
|
20
|
+
meta.incr(ALL_COUNT)
|
21
|
+
meta.set(LAST_TIME,Time.now.to_i)
|
22
|
+
meta.incr(TYPE_COUNT % [type])
|
23
|
+
end
|
24
|
+
|
25
|
+
def all_count
|
26
|
+
meta.get(ALL_COUNT) || 0
|
27
|
+
end
|
28
|
+
|
29
|
+
def last_time
|
30
|
+
meta.get(LAST_TIME)
|
31
|
+
end
|
32
|
+
|
33
|
+
def type_count(type)
|
34
|
+
meta.get(TYPE_COUNT % [type]) || 0
|
35
|
+
end
|
36
|
+
|
37
|
+
def tweet_count
|
38
|
+
type_count(:tweet)
|
39
|
+
end
|
40
|
+
|
41
|
+
private
|
42
|
+
def logger
|
43
|
+
Flamingo.logger
|
44
|
+
end
|
45
|
+
|
46
|
+
def meta
|
47
|
+
Flamingo.meta
|
48
|
+
end
|
49
|
+
|
50
|
+
end
|
51
|
+
end
|
52
|
+
end
|
data/lib/flamingo/stats/rate_counter.rb
ADDED
@@ -0,0 +1,38 @@
|
|
1
|
+
module Flamingo
|
2
|
+
module Stats
|
3
|
+
|
4
|
+
# Simple counter for measuring stream rates in events per second
|
5
|
+
class RateCounter
|
6
|
+
|
7
|
+
attr_accessor :rate, :callback
|
8
|
+
|
9
|
+
def initialize(sample_duration=60, &block)
|
10
|
+
@sample_duration = sample_duration
|
11
|
+
self.callback = block
|
12
|
+
start_sample
|
13
|
+
end
|
14
|
+
|
15
|
+
def event!
|
16
|
+
@count += 1
|
17
|
+
if (diff = (now - @sample_start_time)) >= @sample_duration
|
18
|
+
self.rate = (@count / diff.to_f)
|
19
|
+
if callback
|
20
|
+
callback.call(rate)
|
21
|
+
end
|
22
|
+
start_sample
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
private
|
27
|
+
def now
|
28
|
+
Time.now.to_i
|
29
|
+
end
|
30
|
+
|
31
|
+
def start_sample
|
32
|
+
@sample_start_time = now
|
33
|
+
@count = 0
|
34
|
+
end
|
35
|
+
|
36
|
+
end
|
37
|
+
end
|
38
|
+
end
|
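Reviewer note: RateCounter only evaluates the sample window when an event arrives, so the rate is the event count divided by the elapsed whole seconds at the first event on or after the window boundary. The sketch below is a variant with an injectable clock so the behavior can be exercised without sleeping. `SketchRateCounter` and its `clock` parameter are hypothetical; the windowing logic follows the hunk above.

```ruby
# Variant of Stats::RateCounter with an injectable clock (an assumption made
# for testability; the original calls Time.now.to_i directly).
class SketchRateCounter
  attr_reader :rate

  def initialize(sample_duration, clock, &callback)
    @sample_duration = sample_duration
    @clock = clock
    @callback = callback
    start_sample
  end

  # Counts one event; closes the window if enough time has elapsed.
  def event!
    @count += 1
    elapsed = @clock.call - @sample_start_time
    return if elapsed < @sample_duration
    @rate = @count / elapsed.to_f
    @callback.call(@rate) if @callback
    start_sample
  end

  private

  def start_sample
    @sample_start_time = @clock.call
    @count = 0
  end
end

fake_now = 0
rates = []
counter = SketchRateCounter.new(10, lambda { fake_now }) { |eps| rates << eps }

30.times { counter.event! } # 30 events while the fake clock stands at 0
fake_now = 10
counter.event!              # the 31st event closes the 10-second window
puts rates.inspect          # prints [3.1]
```

Because the check runs inside `event!`, a stream that goes quiet never closes its window, and the last published rate stays in place until the next event arrives.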
data/lib/flamingo/stream.rb
CHANGED
@@ -53,12 +53,16 @@ module Flamingo
|
|
53
53
|
:name=>name,:resource=>resource,:params=>params.all
|
54
54
|
)
|
55
55
|
end
|
56
|
+
|
57
|
+
def query
|
58
|
+
params.map{|key,value| "#{key}=#{param_value(value)}" }.join("&")
|
59
|
+
end
|
56
60
|
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
end
|
61
|
+
def to_s
|
62
|
+
"#{path}?#{query}"
|
63
|
+
end
|
61
64
|
|
65
|
+
private
|
62
66
|
def param_value(val)
|
63
67
|
case val
|
64
68
|
when String then CGI.escape(val)
|
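Reviewer note: the new `query` and `to_s` methods are what produce the stream path logged by the wader. The sketch below rebuilds the query string for the track terms shown in the README log sample. Only the String branch of `param_value` is visible in this diff, so the Array and fallback branches here are assumptions chosen to reproduce that log line; the free functions are hypothetical stand-ins for the Stream methods.

```ruby
require 'cgi'

# Hypothetical stand-in for Stream#param_value. Only the String branch is
# visible in the diff; the Array and else branches are assumptions.
def param_value(val)
  case val
  when String then CGI.escape(val)
  when Array  then val.map { |v| param_value(v) }.join(",")
  else CGI.escape(val.to_s)
  end
end

# Same shape as the new Stream#query, taking params as an argument.
def query(params)
  params.map { |key, value| "#{key}=#{param_value(value)}" }.join("&")
end

params = { :track => ["#etsy", "austin", "cheaptweet"] }
puts "/1/statuses/filter.json?#{query(params)}"
```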
data/lib/flamingo/version.rb
CHANGED
@@ -1,3 +1,3 @@
|
|
1
1
|
module Flamingo
|
2
|
-
Version = VERSION = '0.
|
3
|
-
end
|
2
|
+
Version = VERSION = '0.4.0'
|
3
|
+
end
|
data/lib/flamingo/wader.rb
CHANGED
@@ -90,8 +90,10 @@ module Flamingo
|
|
90
90
|
private
|
91
91
|
def connect_and_run
|
92
92
|
EventMachine::run do
|
93
|
+
Flamingo.logger.info("Connecting to stream: #{stream}")
|
93
94
|
self.connection = stream.connect(:auth=>"#{screen_name}:#{password}")
|
94
|
-
Flamingo.logger.info("
|
95
|
+
Flamingo.logger.info("Connected to stream")
|
96
|
+
Flamingo.connection_stats.connected!
|
95
97
|
|
96
98
|
connection.each_item do |event_json|
|
97
99
|
dispatch_event(event_json)
|
@@ -141,7 +143,7 @@ module Flamingo
|
|
141
143
|
end
|
142
144
|
|
143
145
|
def dispatch_event(event_json)
|
144
|
-
|
146
|
+
Flamingo.dispatch_queue.enqueue(event_json)
|
145
147
|
end
|
146
148
|
|
147
149
|
end
|
metadata
CHANGED
@@ -1,13 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: flamingo
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 15
|
5
5
|
prerelease: false
|
6
6
|
segments:
|
7
7
|
- 0
|
8
|
-
-
|
9
|
-
-
|
10
|
-
version: 0.
|
8
|
+
- 4
|
9
|
+
- 0
|
10
|
+
version: 0.4.0
|
11
11
|
platform: ruby
|
12
12
|
authors:
|
13
13
|
- Hayes Davis
|
@@ -16,7 +16,7 @@ autorequire:
|
|
16
16
|
bindir: bin
|
17
17
|
cert_chain: []
|
18
18
|
|
19
|
-
date:
|
19
|
+
date: 2011-02-10 00:00:00 -08:00
|
20
20
|
default_executable:
|
21
21
|
dependencies:
|
22
22
|
- !ruby/object:Gem::Dependency
|
@@ -262,9 +262,15 @@ files:
|
|
262
262
|
- lib/flamingo/daemon/trap_keeper.rb
|
263
263
|
- lib/flamingo/daemon/wader_process.rb
|
264
264
|
- lib/flamingo/daemon/web_server_process.rb
|
265
|
-
- lib/flamingo/
|
265
|
+
- lib/flamingo/dispatch_queue.rb
|
266
|
+
- lib/flamingo/dispatcher.rb
|
267
|
+
- lib/flamingo/logging/event_log.rb
|
266
268
|
- lib/flamingo/logging/formatter.rb
|
269
|
+
- lib/flamingo/logging/utils.rb
|
267
270
|
- lib/flamingo/meta.rb
|
271
|
+
- lib/flamingo/stats/connection.rb
|
272
|
+
- lib/flamingo/stats/events.rb
|
273
|
+
- lib/flamingo/stats/rate_counter.rb
|
268
274
|
- lib/flamingo/stream.rb
|
269
275
|
- lib/flamingo/stream_params.rb
|
270
276
|
- lib/flamingo/subscription.rb
|
data/lib/flamingo/dispatch_event.rb
DELETED
@@ -1,51 +0,0 @@
|
|
1
|
-
module Flamingo
|
2
|
-
class DispatchEvent
|
3
|
-
|
4
|
-
@parser = Yajl::Parser.new(:symbolize_keys => true)
|
5
|
-
|
6
|
-
class << self
|
7
|
-
|
8
|
-
def queue
|
9
|
-
Flamingo.dispatch_queue
|
10
|
-
end
|
11
|
-
|
12
|
-
def meta
|
13
|
-
Flamingo.meta
|
14
|
-
end
|
15
|
-
|
16
|
-
#
|
17
|
-
# TODO Track stats including: tweets per second and last tweet time
|
18
|
-
# TODO Provide some first-level check for repeated status ids
|
19
|
-
# TODO Consider subscribers for receiving particular terms - do the heavy
|
20
|
-
# lifting of parsing tweets and delivering them to particular subscribers
|
21
|
-
# TODO Consider window of tweets (approx 3 seconds) and sort before
|
22
|
-
# dispatching to improve in-order delivery (helps with "k-sorted")
|
23
|
-
#
|
24
|
-
def perform(event_json)
|
25
|
-
meta.incr("events:all_count")
|
26
|
-
meta.set("events:last_time",Time.now.utc.to_i)
|
27
|
-
type, event = typed_event(parse(event_json))
|
28
|
-
meta.incr("events:#{type}_count")
|
29
|
-
Subscription.all.each do |sub|
|
30
|
-
Resque::Job.create(sub.name, "HandleFlamingoEvent", type, event)
|
31
|
-
Flamingo.logger.debug "Put job on subscription queue #{sub.name}\n#{event_json}"
|
32
|
-
end
|
33
|
-
end
|
34
|
-
|
35
|
-
def parse(json)
|
36
|
-
@parser.parse(json)
|
37
|
-
end
|
38
|
-
|
39
|
-
def typed_event(event)
|
40
|
-
if event[:delete]
|
41
|
-
[:delete, event[:delete]]
|
42
|
-
elsif event[:link]
|
43
|
-
[:link, event[:link]]
|
44
|
-
else
|
45
|
-
[:tweet, event]
|
46
|
-
end
|
47
|
-
end
|
48
|
-
|
49
|
-
end
|
50
|
-
end
|
51
|
-
end
|