girl_friday 0.9.1 → 0.9.2
- data/.gitignore +1 -0
- data/.rvmrc +2 -1
- data/.travis.yml +8 -0
- data/Gemfile +1 -0
- data/History.md +12 -2
- data/README.md +11 -9
- data/TODO.md +0 -1
- data/examples/batch.rb +42 -0
- data/examples/pipeline.rb +89 -0
- data/girl_friday.gemspec +1 -0
- data/lib/girl_friday/actor.rb +20 -13
- data/lib/girl_friday/batch.rb +47 -0
- data/lib/girl_friday/error_handler.rb +1 -1
- data/lib/girl_friday/persistence.rb +4 -5
- data/lib/girl_friday/server.rb +12 -13
- data/lib/girl_friday/timed_queue.rb +48 -0
- data/lib/girl_friday/version.rb +1 -1
- data/lib/girl_friday/work_queue.rb +37 -8
- data/lib/girl_friday.rb +6 -9
- data/server/public/css/style.css +10 -0
- data/server/public/js/girl_friday.js +55 -0
- data/server/public/js/jquery.min.js +18 -0
- data/server/views/index.erb +9 -19
- data/test/helper.rb +2 -1
- data/test/test_batch.rb +23 -0
- data/test/test_girl_friday.rb +31 -18
- data/test/test_girl_friday_immediately.rb +49 -0
- metadata +27 -3
data/.gitignore
CHANGED
data/.rvmrc
CHANGED
data/.travis.yml
ADDED
data/Gemfile
CHANGED
data/History.md
CHANGED
@@ -1,14 +1,24 @@
 Changes
 ================
 
+0.9.2
+---------
+
+* Remove use of weakrefs to track queue instances, use ObjectSpace
+  instead.
+* Add support for Batch operations, providing an easy way to fan out
+  operations and then collect results when completed.
+* Added WorkQueue.immediate! and WorkQueue.queue! to switch background processing off and back on respectively. Nice to use when testing. (jc00ke, ryanlecompte)
+* Added some ajax updates to the girl\_friday status server. (jc00ke)
+
 0.9.1
 ---------
 
 * Lazy initialize the worker actors to avoid dead thread problems with Unicorn forking processes.
-* Add initial pass at
+* Add initial pass at girl\_friday Rack server (see wiki). It's awful looking, trust me, help wanted.
 
 
 0.9.0
 ---------
 
-* Initial release
+* Initial release
data/README.md
CHANGED
@@ -1,29 +1,29 @@
-
+girl\_friday
 ====================
 
-Have a task you want to get done sometime soon but don't want to do it yourself? Give it to
+Have a task you want to get done sometime soon but don't want to do it yourself? Give it to girl\_friday! From wikipedia:
 
 > The term Man Friday has become an idiom, still in mainstream usage, to describe an especially faithful servant or
 > one's best servant or right-hand man. The female equivalent is Girl Friday. The title of the movie His Girl Friday
 > alludes to it and may have popularized it.
 
-
+girl\_friday is a Ruby library for performing asynchronous tasks. Often times you don't want to block a web response by performing some task, like sending an email, so you can just use this gem to perform it in the background. It works with any Ruby application, including Rails 3 applications.
 
 
 Installation
 ------------------
 
-We recommend using [JRuby 1.6+](http://jruby.org) or [Rubinius 2.0+](http://rubini.us) with
+We recommend using [JRuby 1.6+](http://jruby.org) or [Rubinius 2.0+](http://rubini.us) with girl\_friday. Both are excellent options for executing Ruby these days.
 
     gem install girl_friday
 
-
+girl\_friday does not support Ruby 1.8 (MRI) because of its poor threading support. Ruby 1.9 will work reasonably well if you use gems that release the GIL for network I/O (mysql2 is a good example of this, do **not** use the original mysql gem).
 
 
 Usage
 --------------------
 
-Put
+Put girl\_friday in your Gemfile:
 
     gem 'girl_friday'
 
@@ -46,17 +46,19 @@ The msg parameter to push is just a Hash whose contents are completely up to you
 
 Your message processing block should **not** access any instance data or variables outside of the block. That's shared mutable state and dangerous to touch! I also strongly recommend your queue processor block be **VERY** short, ideally just a method call or two. You can unit test those methods easily but not the processor block itself.
 
+You can call `GirlFriday::WorkQueue.immediate!` to process jobs immediately, which is helpful when testing. `GirlFriday::WorkQueue.queue!` will revert this & jobs will be processed by actors.
+
 
 More Detail
 --------------------
 
-Please see the [
+Please see the [girl\_friday wiki](https://github.com/mperham/girl_friday/wiki) for more detail and advanced options and tuning. You'll find details on queue persistence with Redis, implementing clean shutdown, querying runtime metrics and SO MUCH MORE!
 
 
 Thanks
 --------------------
 
-[Carbon Five](http://carbonfive.com), I write and maintain
+[Carbon Five](http://carbonfive.com), I write and maintain girl\_friday on their clock.
 
 This gem contains a copy of the Rubinius Actor API, modified to work on any Ruby VM. Thanks to Evan Phoenix, MenTaLguY and the Rubinius project for permission to use and distribute this code.
 
@@ -64,4 +66,4 @@ This gem contains a copy of the Rubinius Actor API, modified to work on any Ruby
 Author
 --------------------
 
-Mike Perham, [@mperham](https://twitter.com/mperham), [mikeperham.com](http://mikeperham.com)
+Mike Perham, [@mperham](https://twitter.com/mperham), [mikeperham.com](http://mikeperham.com)
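The README change above describes switching between immediate and background processing. A minimal sketch of that idea follows, assuming a stand-in class: `TinyQueue`, its `drain` method, and the thread-per-job background mode are hypothetical and are not girl_friday's implementation (the real queue hands work to actors). Only the method names `immediate!` and `queue!` come from the README text.

```ruby
# Hypothetical stand-in for a work queue; illustrates the mode switch only.
class TinyQueue
  @immediate = false

  class << self
    # Run pushed jobs inline on the caller's thread (handy in tests).
    def immediate!
      @immediate = true
    end

    # Return to background processing.
    def queue!
      @immediate = false
    end

    def immediate?
      @immediate
    end
  end

  def initialize(&processor)
    @processor = processor
    @threads = []
  end

  def push(msg, &callback)
    if self.class.immediate?
      result = @processor.call(msg)
      callback.call(result) if callback
    else
      @threads << Thread.new do
        result = @processor.call(msg)
        callback.call(result) if callback
      end
    end
    self
  end

  # Wait for background jobs to finish; exists only for this sketch.
  def drain
    @threads.each(&:join)
    @threads.clear
  end
end

queue = TinyQueue.new { |msg| msg[:n] * 2 }

TinyQueue.immediate!
inline_results = []
queue.push(:n => 21) { |r| inline_results << r }
# The job ran synchronously, so inline_results is already populated here.

TinyQueue.queue!
background_results = Queue.new
queue.push(:n => 4) { |r| background_results << r }
queue.drain
```

In a test suite the switch would typically be flipped once in the test helper, so assertions can run right after `push` without waiting on another thread.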
data/TODO.md
CHANGED
data/examples/batch.rb
ADDED
@@ -0,0 +1,42 @@
+require 'girl_friday'
+require 'open-uri'
+require 'benchmark'
+require 'nokogiri'
+
+class UrlProcessor
+  URLS = %w(http://www.bing.com http://www.google.com http://www.yahoo.com)
+
+  def parallel
+    batch = GirlFriday::Batch.new(URLS, :size => 3) do |url|
+      html = open(url)
+      doc = Nokogiri::HTML(html.read)
+      doc.css('span').count
+    end
+    p URLS.zip(batch.results)
+  end
+
+  def serial
+    results = URLS.map do |url|
+      html = open(url)
+      doc = Nokogiri::HTML(html.read)
+      doc.css('span').count
+    end
+    p URLS.zip(results)
+  end
+end
+
+# Expected output:
+# [["http://www.bing.com", 24], ["http://www.google.com", 8], ["http://www.yahoo.com", 172]]
+#
+# Benchmark results:
+# serial 1.231000 0.000000 1.231000 ( 1.231000)
+# parallel 0.447000 0.000000 0.447000 ( 0.447000)
+
+processor = UrlProcessor.new
+Benchmark.bm(25) do |x|
+  %w(serial parallel).each do |op|
+    x.report(op) do
+      processor.send(op.to_sym)
+    end
+  end
+end
data/examples/pipeline.rb
ADDED
@@ -0,0 +1,89 @@
+require 'open-uri'
+require 'nokogiri'
+require 'girl_friday'
+
+
+##
+# In this example, we use girl_friday to implement a processing pipeline
+# for scraping large images from a website. Given a URL, we want to fetch
+# the HTML for that URL, find all the images, download those images, discard images
+# which do not meet a size heuristic and save the ones that match. This
+# processing is I/O-heavy and perfect for breaking into many threads.
+#
+# A processing pipeline is just a series of linked processing steps.
+# We create a girl_friday queue for each step, sized appropriately for how few/many parallel worker threads we want for that step.
+# The process_xxx methods implement the actual logic for the step.
+# The finish_xxx methods pass the result to the next step in the pipeline.
+#
+class ImagePipeline
+  def initialize
+    @download_html = GirlFriday::Queue.new(:download_html, :size => 5, &method(:process_html))
+    @extract_imgs = GirlFriday::Queue.new(:extract, :size => 2, &method(:process_extract))
+    @download_imgs = GirlFriday::Queue.new(:download_imgs, :size => 10, &method(:process_imgs))
+    @thumb = GirlFriday::Queue.new(:thumb_imgs, :size => 5, &method(:process_thumb))
+  end
+
+  def process(url)
+    log "Pushing #{url}"
+    @download_html.push({ :url => url }, &method(:finish_html))
+  end
+
+  private
+
+  def process_html(msg)
+    msg.merge(:htmlfile => open(msg[:url]))
+  end
+
+  def finish_html(result)
+    @extract_imgs.push(result, &method(:finish_extract))
+  end
+
+  def process_extract(msg)
+    doc = Nokogiri::HTML(msg[:htmlfile].read)
+    doc.css('img[src]').map{|n| n['src']}.select { |url| url =~ /^#{msg[:url]}/ }
+  end
+
+  def finish_extract(result)
+    result.each do |imgurl|
+      @download_imgs.push(imgurl, &method(:finish_imgs))
+    end
+  end
+
+  def process_imgs(msg)
+    log "Fetching image: #{msg}"
+    imgfile = open msg
+    return if imgfile.size < 20_000 # ignore images less than 20k
+    result = `identify #{imgfile.path}`
+    log "Image: #{result}"
+    return unless result =~ /(\d+)x(\d+)\+0\+0/
+    return if Integer($1) + Integer($2) < 500
+    # Passed all our heuristics, pass it on!
+    imgfile
+  end
+
+  def finish_imgs(result)
+    return unless result
+    @thumb.push(result, &method(:finish_thumb))
+  end
+
+  def process_thumb(msg)
+    FileUtils.cp msg.path, Time.now.to_f.to_s
+    msg.path
+  end
+
+  def finish_thumb(result)
+    log "Finished image at #{result}"
+  end
+
+  def log(msg)
+    print "#{Thread.current}: #{msg}\n"
+  end
+end
+
+
+pipeline = ImagePipeline.new
+pipeline.process 'http://blog.carbonfive.com'
+
+loop do
+  sleep 1
+end
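The pipeline example above chains one queue per step, with each step's callback pushing into the next queue. The same shape can be sketched with only the Ruby standard library. The `Stage` class, its `shutdown` method, and the `:stop` sentinel are hypothetical names introduced for illustration, and the stages do trivial string work instead of network I/O.

```ruby
# Hypothetical Stage: an inbox plus a fixed number of worker threads.
class Stage
  def initialize(size, &work)
    @inbox = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # :stop is a sentinel telling one worker to exit.
        while (job = @inbox.pop) != :stop
          msg, callback = job
          callback.call(work.call(msg))
        end
      end
    end
  end

  # Enqueue a message; the callback receives the stage's result.
  def push(msg, &callback)
    @inbox << [msg, callback]
  end

  # Let queued jobs finish, then stop every worker.
  def shutdown
    @workers.size.times { @inbox << :stop }
    @workers.each(&:join)
  end
end

done = Queue.new
measure = Stage.new(2) { |word| [word, word.length] }
shout = Stage.new(3) { |word| word.upcase }

%w(fetch parse save).each do |word|
  # The first stage's callback feeds the second stage.
  shout.push(word) do |upper|
    measure.push(upper) { |pair| done << pair }
  end
end

# Stages are shut down in pipeline order so no result is dropped.
shout.shutdown
measure.shutdown
pipeline_results = Array.new(done.size) { done.pop }.sort
```

Sizing each stage separately is the point of the design: a slow, I/O-bound step can get more workers than a cheap one.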
data/girl_friday.gemspec
CHANGED
data/lib/girl_friday/actor.rb
CHANGED
@@ -53,7 +53,7 @@ class Actor
     @@registered_lock = Queue.new
     @@registered = {}
     @@registered_lock << nil
-
+
     def current
       Thread.current[:__current_actor__] ||= private_new
     end
@@ -113,7 +113,7 @@ class Actor
       recipient.notify_exited(current, reason)
       self
     end
-
+
     # Link the current Actor to another one.
     def link(actor)
       current = self.current
@@ -121,7 +121,7 @@ class Actor
       actor.notify_link current
       self
     end
-
+
     # Unlink the current Actor from another one
     def unlink(actor)
       current = self.current
@@ -259,12 +259,19 @@ class Actor
     begin
       raise @interrupts.shift unless @interrupts.empty?
 
-
-        message = @mailbox
+      if @mailbox.size > 0
+        message = @mailbox.shift
         action = filter.action_for(message)
-
-          @mailbox
-
+        unless action
+          @mailbox << message
+          for i in 1...(@mailbox.size)
+            message = @mailbox[i]
+            action = filter.action_for(message)
+            if action
+              @mailbox.delete_at(i)
+              break
+            end
+          end
         end
       end
 
@@ -303,7 +310,7 @@ class Actor
       action.call message
     end
   end
-
+
   # Notify this actor that it's now linked to the given one; this is not
   # intended to be used directly except by actor implementations. Most
   # users will want to use Actor.link instead.
@@ -322,7 +329,7 @@ class Actor
     actor.notify_exited(self, exit_reason) unless alive
     self
   end
-
+
   # Notify this actor that it's now unlinked from the given one; this is
   # not intended to be used directly except by actor implementations. Most
   # users will want to use Actor.unlink instead.
@@ -337,7 +344,7 @@ class Actor
     end
     self
   end
-
+
   # Notify this actor that one of the Actors it's linked to has exited;
   # this is not intended to be used directly except by actor implementations.
   # Most users will want to use Actor.send_exit instead.
@@ -399,7 +406,7 @@ class Actor
     end
   end
   private :check_thread
-
+
   def _trap_exit=(value) #:nodoc:
     check_thread
     @lock.pop
@@ -410,7 +417,7 @@ class Actor
       @lock << nil
     end
   end
-
+
   def _trap_exit #:nodoc:
     check_thread
     @lock.pop
data/lib/girl_friday/batch.rb
ADDED
@@ -0,0 +1,47 @@
+module GirlFriday
+
+  ##
+  # Batch represents a set of operations which can be processed
+  # concurrently. Asking for the results of the batch acts as a barrier:
+  # the calling thread will block until all operations have completed.
+  # Results are guaranteed to be returned in the
+  # same order as the operations are given.
+  #
+  # Internally a girl_friday queue is created which limits the
+  # number of concurrent operations based on the :size option.
+  #
+  # TODO Errors are not handled well at all.
+  class Batch
+    def initialize(enumerable, options, &block)
+      @queue = GirlFriday::Queue.new(:batch, options, &block)
+      @complete = 0
+      @size = enumerable.count
+      @results = Array.new(@size)
+      @lock = Mutex.new
+      @condition = ConditionVariable.new
+      start(enumerable)
+    end
+
+    def results(timeout=nil)
+      @lock.synchronize do
+        @condition.wait(@lock, timeout) if @complete != @size
+        @results
+      end
+    end
+
+    private
+
+    def start(operations)
+      operations.each_with_index do |packet, index|
+        @queue.push(packet) do |result|
+          @lock.synchronize do
+            @complete += 1
+            @results[index] = result
+            @condition.signal if @complete == @size
+          end
+        end
+      end
+    end
+
+  end
+end
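The barrier described in the Batch comments (block in `results` until every operation has reported back, and return results in input order) can be tried without the gem. The sketch below is a hypothetical `MiniBatch` that runs one plain thread per item instead of a sized girl_friday queue, so it does not limit concurrency. Unlike the class in the diff, it also re-checks the completion count in a loop after each wake-up, a defensive variation and not what the diff does.

```ruby
# Hypothetical MiniBatch: same Mutex + ConditionVariable barrier,
# but one plain thread per item (no :size limit).
class MiniBatch
  def initialize(enumerable, &block)
    items = enumerable.to_a
    @size = items.size
    @complete = 0
    @results = Array.new(@size)
    @lock = Mutex.new
    @condition = ConditionVariable.new

    items.each_with_index do |item, index|
      Thread.new do
        value = block.call(item)
        @lock.synchronize do
          # Storing by index keeps results in input order.
          @results[index] = value
          @complete += 1
          @condition.signal if @complete == @size
        end
      end
    end
  end

  # Blocks until all items are done, or until the optional timeout elapses.
  def results(timeout = nil)
    deadline = timeout && (Time.now + timeout)
    @lock.synchronize do
      until @complete == @size
        remaining = deadline && (deadline - Time.now)
        break if remaining && remaining <= 0
        @condition.wait(@lock, remaining)
      end
      @results
    end
  end
end

# Items finish out of order (1 first, 3 last) but results keep input order.
batch = MiniBatch.new([3, 1, 2]) { |n| sleep(n * 0.01); n * n }
squares = batch.results
```

As in the original, an exception raised inside the block is not handled: the item never counts as complete, so `results` without a timeout would block.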
data/lib/girl_friday/persistence.rb
CHANGED
@@ -10,16 +10,16 @@ module GirlFriday
         @backlog << work
       end
       alias_method :<<, :push
-
+
       def pop
-        @backlog.
+        @backlog.shift
       end
-
+
       def size
         @backlog.size
       end
     end
-
+
     class Redis
       def initialize(name, options)
         @opts = options
@@ -53,4 +53,3 @@ module GirlFriday
     end
   end
 end
-
data/lib/girl_friday/server.rb
CHANGED
@@ -16,24 +16,23 @@ module GirlFriday
     set :views, "#{basedir}/views"
     set :public, "#{basedir}/public"
     set :static, true
-
+
     helpers do
       include Rack::Utils
       alias_method :h, :escape_html
 
-      def
-
-
-
-
-
-
-        end
+      def url_path(*path_parts)
+        [path_prefix, path_parts].join('/').squeeze('/')
+      end
+      alias_method :u, :url_path
+
+      def path_prefix
+        request.env['SCRIPT_NAME']
       end
     end
-
-    get '
-      redirect
+
+    get '/?' do
+      redirect url_path('status')
     end
 
     get '/status' do
@@ -46,4 +45,4 @@ module GirlFriday
       GirlFriday.status.to_json
     end
   end
-end
+end
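The new `url_path` helper in server.rb builds links relative to wherever the Rack app is mounted. Its join-and-squeeze behavior can be checked in isolation with the standalone variant below. As an assumption for illustration, `path_prefix` is passed in as an argument here instead of being read from `request.env['SCRIPT_NAME']`, and the mount points are made up.

```ruby
# Standalone variant of the helper; the prefix is an explicit argument
# here instead of coming from the Rack request.
def url_path(path_prefix, *path_parts)
  # Array#join flattens the nested parts array; squeeze collapses any
  # doubled slashes left over from the join.
  [path_prefix, path_parts].join('/').squeeze('/')
end

mounted = url_path('/girl_friday', 'status')        # app mounted under a prefix
root    = url_path('', 'status')                    # app mounted at the root
nested  = url_path('/admin/', '/status', 'queues')  # stray slashes in the input
```

Because `squeeze('/')` collapses every run of slashes, the helper suits path-only strings; a full URL with `http://` would lose its double slash.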