sidekiq-paquet 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 2a8cb3d7307887cc9c34999050e7460e6515f381
-   data.tar.gz: 6097a9b12ffc1546b784c2b519d5687da3017418
+   metadata.gz: 2a3886887de8e637e18bd9908831371f37ac7ea9
+   data.tar.gz: c424dee95bc830a67aad1b22581ea8b4924ea4a6
  SHA512:
-   metadata.gz: dc9e033a09c25ef09aae43065f9b92329acf33225c640d2737b4ef0dcc112681c5c436938206c0b67c4864d43ad8479a6533b78295ec05b8c1978cf60f216cc9
-   data.tar.gz: c970fde44941632e1d115dd88a5bbdfaf65e44f005ca13bcf356305d042283a705ca9ee1df46a5bd585de6a18b3fef92762a00c1b8ef5ab1b07e03fc0c9920c4
+   metadata.gz: 57cc001b4f3335cf0b53c712d62bd02f266b502589256b5456aa8d437ca952868f58f239c83deacca4e6b108a86789730feb5d5f0065ae3d50d6613025df3b48
+   data.tar.gz: 25d88e4e1ebcc4e956838e928575dfb325acbb1312a91309810278143c1d214c9128b285d5c128dab336a729115b6478d161471c86612055f00f662d65072e0d
data/.travis.yml CHANGED
@@ -1,6 +1,8 @@
  language: ruby
+ cache: bundler
  services:
    - redis-server
  rvm:
-   - 2.2.2
+   - 2.2.4
+   - 2.3.0
  before_install: gem install bundler -v 1.10.6
data/README.md CHANGED
@@ -9,17 +9,17 @@ Useful for grouping background API calls or intensive database inserts coming fr
  gem install 'sidekiq-paquet'
  ```
  
- sidekiq-bulk requires Sidekiq 4+.
+ sidekiq-paquet requires Sidekiq 4+.
  
  ## Usage
  
- Add `bulk: true` option to your worker's `sidekiq_options` to have jobs processed in bulk. The size of the bulk can be configured per worker. If not specified, the `Sidekiq::Paquet.options[:default_bulk_size]` is used.
+ Add the `bundled: true` option to your worker's `sidekiq_options` to have jobs processed in bulk. The size of the bundle can be configured per worker. If not specified, `Sidekiq::Paquet.options[:default_bundle_size]` is used.
  
  ```ruby
  class ElasticIndexerWorker
    include Sidekiq::Worker
  
-   sidekiq_options bulk: true, bulk_size: 100
+   sidekiq_options bundled: true, bundle_size: 100
  
    def perform(*values)
      # Perform work with the array of values
@@ -27,7 +27,7 @@ class ElasticIndexerWorker
  end
  ```
  
- Instead of being processed by Sidekiq, jobs will be stored into a separate queue and periodically, a poller will retrieve them by slice of `bulk_size` and enqueue a regular Sidekiq job with that bulk as argument.
+ Instead of being processed by Sidekiq right away, jobs are stored in a separate queue; periodically, a separate thread picks up this internal queue, slices `bundle_size` elements into an array and enqueues a regular Sidekiq job with that bundle as its argument.
  Thus, your worker will only be invoked with an array of values, never with single values themselves.
  
  For example, if you call `perform_async` twice on the previous worker
@@ -41,23 +41,23 @@ the worker instance will receive these values as a single argument
  
  ```ruby
  [
-   { delete: { _index: 'users', _id: 1, _type: 'user' } },
-   { delete: { _index: 'users', _id: 2, _type: 'user' } }
+   [{ delete: { _index: 'users', _id: 1, _type: 'user' } }],
+   [{ delete: { _index: 'users', _id: 2, _type: 'user' } }]
  ]
  ```
  
- Every time polling happens, `sidekiq-paquet` will try to process all your workers marked for bulk. If you want to limit the time between two polling per worker, you can pass the `bulk_minimum_interval` option to sidekiq options.
+ Every time flushing happens, `sidekiq-paquet` will try to process all your workers marked as bundled. If you want to limit the time between two flushes of a worker, you can pass the `minimum_execution_interval` option in your sidekiq options.
  
  ## Configuration
  
  You can change global configuration by modifying the `Sidekiq::Paquet.options` hash.
  
  ```
- Sidekiq::Paquet.options[:default_bulk_size] = 500 # Default is 100
- Sidekiq::Paquet.options[:average_bulk_flush_interval] = 30 # Default is 15
+ Sidekiq::Paquet.options[:default_bundle_size] = 500 # Default is 100
+ Sidekiq::Paquet.options[:average_flush_interval] = 30 # Default is 15
  ```
  
- The `average_bulk_flush_interval` represent the average time elapsed between two polling of values. This scales with the number of sidekiq processes you're running. So if you have 5 sidekiq processes, and set the `average_bulk_flush_interval` to 15, each process will check for new bulk jobs every 75 seconds -- so that in average, the bulk queue will be checked every 15 seconds.
+ The `average_flush_interval` represents the average time elapsed between two flushes of values. It scales with the number of Sidekiq processes you're running: if you have 5 Sidekiq processes and set `average_flush_interval` to 15, each process will check for new bundled jobs every 75 seconds, so that on average the bundles queue is checked every 15 seconds.
  
  ## Contributing
  
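To make the bundling flow described in the README concrete, here is a minimal sketch of the enqueue side, reusing the README's `ElasticIndexerWorker`; the `minimum_execution_interval` value is illustrative, not a documented default:

```ruby
class ElasticIndexerWorker
  include Sidekiq::Worker

  # bundled/bundle_size come from the README above; the 60s interval is an example value
  sidekiq_options bundled: true, bundle_size: 100, minimum_execution_interval: 60

  def perform(*values)
    # values is the array of argument arrays collected since the last flush
  end
end

# Enqueue as usual; the client middleware diverts the args into Redis instead of a queue.
ElasticIndexerWorker.perform_async(delete: { _index: 'users', _id: 1, _type: 'user' })
ElasticIndexerWorker.perform_async(delete: { _index: 'users', _id: 2, _type: 'user' })
# After the next flush, a single Sidekiq job is pushed carrying both argument sets.
```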
data/lib/sidekiq/paquet/bundle.rb ADDED
@@ -0,0 +1,70 @@
+ module Sidekiq
+   module Paquet
+     class Bundle
+
+       def self.append(item)
+         worker_name = item['class'.freeze]
+         args = item.fetch('args'.freeze, [])
+
+         Sidekiq.redis do |conn|
+           conn.multi do
+             conn.zadd('bundles'.freeze, 0, worker_name, nx: true)
+             conn.rpush("bundle:#{worker_name}", Sidekiq.dump_json(args))
+           end
+         end
+       end
+
+       def self.enqueue_jobs
+         now = Time.now.to_f
+         Sidekiq.redis do |conn|
+           workers = conn.zrangebyscore('bundles'.freeze, '-inf', now)
+
+           workers.each do |worker|
+             klass = worker.constantize
+             opts = klass.get_sidekiq_options
+             min_interval = opts['minimum_execution_interval'.freeze]
+
+             items = conn.lrange("bundle:#{worker}", 0, -1)
+             items.map! { |i| Sidekiq.load_json(i) }
+
+             items.each_slice(opts.fetch('bundle_size'.freeze, Sidekiq::Paquet.options[:default_bundle_size])) do |vs|
+               Sidekiq::Client.push(
+                 'class' => worker,
+                 'queue' => opts['queue'.freeze],
+                 'args' => vs
+               )
+             end
+
+             conn.ltrim("bundle:#{worker}", items.size, -1)
+             conn.zadd('bundles'.freeze, now + min_interval, worker) if min_interval
+           end
+         end
+       end
+
+       def initialize(name)
+         @lname = "bundle:#{name}"
+       end
+
+       def queue
+         worker_name.constantize.get_sidekiq_options['queue'.freeze]
+       end
+
+       def worker_name
+         @lname.split(':').last
+       end
+
+       def size
+         Sidekiq.redis { |c| c.llen(@lname) }
+       end
+
+       def items
+         Sidekiq.redis { |c| c.lrange(@lname, 0, -1) }
+       end
+
+       def clear
+         Sidekiq.redis { |c| c.del(@lname) }
+       end
+
+     end
+   end
+ end
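The instance methods at the bottom of the class back the web view added later in this release. A small sketch of how they read the pending data (the worker name is the illustrative one from the README):

```ruby
# Bundle.append is normally invoked by the client middleware rather than called directly.
bundle = Sidekiq::Paquet::Bundle.new('ElasticIndexerWorker')

bundle.size        # => length of the 'bundle:ElasticIndexerWorker' Redis list
bundle.items       # => raw JSON strings, one per buffered perform_async call
bundle.queue       # => the queue configured in the worker's sidekiq_options
bundle.worker_name # => "ElasticIndexerWorker"
bundle.clear       # => deletes the pending list
```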
data/lib/sidekiq/paquet/flusher.rb ADDED
@@ -0,0 +1,51 @@
+ require 'concurrent/timer_task'
+
+ module Sidekiq
+   module Paquet
+     class Flusher
+
+       def initialize
+         @task = Concurrent::TimerTask.new(
+           execution_interval: execution_interval) { Bundle.enqueue_jobs }
+       end
+
+       def start
+         Sidekiq.logger.info('Starting paquet flusher')
+         @task.execute
+       end
+
+       def shutdown
+         Sidekiq.logger.info('Paquet flusher exiting...')
+         @task.shutdown
+       end
+
+       private
+
+       # To avoid having all processes flushing at the same time, randomize
+       # the execution interval between 0.5x and 1.5x the scaled interval, so
+       # that on average the interval is respected.
+       #
+       def execution_interval
+         avg = scaled_interval.to_f
+         avg * rand + avg / 2
+       end
+
+       # Scale the interval with the number of Sidekiq processes running. Each
+       # one is going to run a flusher instance. If you have 10 processes and an
+       # average flush interval of 10s, it means one process is flushing every
+       # second, which is wasteful and defeats the purpose of bundling.
+       #
+       # To avoid this, we scale the average flush interval with the number of
+       # Sidekiq processes running, i.e. instead of flushing every 10s, let every
+       # process flush every 100 seconds.
+       #
+       def scaled_interval
+         Sidekiq::Paquet.options[:flush_interval] ||= begin
+           pcount = Sidekiq::ProcessSet.new.size
+           pcount = 1 if pcount == 0 # Maybe raise here
+           pcount * Sidekiq::Paquet.options[:average_flush_interval]
+         end
+       end
+     end
+   end
+ end
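As a worked example of the scaling and randomization above, assuming 5 Sidekiq processes and the default `average_flush_interval` of 15 seconds (numbers are illustrative; at runtime the real values come from `Sidekiq::ProcessSet` and `Sidekiq::Paquet.options`):

```ruby
average_flush_interval = 15
process_count          = 5

scaled = process_count * average_flush_interval  # => 75 seconds per process

# Each flusher then sleeps a random duration in [scaled / 2, scaled * 1.5),
# i.e. between 37.5 and 112.5 seconds here, so the cluster as a whole still
# flushes roughly every 15 seconds on average.
interval = scaled * rand + scaled / 2.0
```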
data/lib/sidekiq/paquet/middleware.rb CHANGED
@@ -2,8 +2,8 @@ module Sidekiq
    module Paquet
      class Middleware
        def call(worker, item, queue, redis_pool = nil)
-         if item['bulk'.freeze]
-           Batch.append(item)
+         if item['bundled'.freeze]
+           Bundle.append(item)
            false
          else
            yield
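Returning false from a Sidekiq client middleware stops the job from being pushed to its normal queue, which is how bundled jobs get diverted into Redis instead. For reference, a client middleware like this one is typically registered as below; the actual wiring lives in lib/sidekiq/paquet.rb and is not part of this hunk:

```ruby
# Sketch of a standard Sidekiq client middleware registration (not the gem's own code).
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add Sidekiq::Paquet::Middleware
  end
end
```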
data/lib/sidekiq/paquet/version.rb CHANGED
@@ -1,5 +1,5 @@
  module Sidekiq
    module Paquet
-     VERSION = '0.1.1'
+     VERSION = '0.2.0'
    end
  end
data/lib/sidekiq/paquet/views/index.erb ADDED
@@ -0,0 +1,21 @@
+ <h3><%= t('Queues') %></h3>
+
+ <div class="table_container">
+   <table class="table table-hover table-bordered table-striped table-white">
+     <thead>
+       <th><%= t('Worker') %></th>
+       <th><%= t('Queue') %></th>
+       <th><%= t('Size') %></th>
+       <th><%= t('Actions') %></th>
+     </thead>
+     <% @lists.each do |list| %>
+       <tr>
+         <td><%= list.worker_name %></td>
+         <td><%= list.queue %></td>
+         <td><%= number_with_delimiter(list.size) %> </td>
+         <td width="20%">
+         </td>
+       </tr>
+     <% end %>
+   </table>
+ </div>
data/lib/sidekiq/paquet/web.rb ADDED
@@ -0,0 +1,20 @@
+ require 'sidekiq/web'
+
+ module Sidekiq
+   module Paquet
+     module Web
+       VIEWS = File.expand_path('views', File.dirname(__FILE__))
+
+       def self.registered(app)
+         app.get '/paquet' do
+           @lists = Sidekiq.redis { |c| c.zrange('bundles', 0, -1) }.map { |n| Bundle.new(n) }
+           erb File.read(File.join(VIEWS, 'index.erb'))
+         end
+       end
+
+     end
+   end
+ end
+
+ Sidekiq::Web.register(Sidekiq::Paquet::Web)
+ Sidekiq::Web.tabs['Bundles'] = 'paquet'
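Requiring this file wherever `Sidekiq::Web` is mounted is enough to expose the new tab; a minimal config.ru sketch (the Redis URL is an assumption, not something the gem sets):

```ruby
# config.ru -- illustrative; requiring sidekiq/paquet/web registers the
# 'Bundles' tab and the /paquet route with Sidekiq::Web.
require 'sidekiq'
require 'sidekiq/web'
require 'sidekiq/paquet/web'

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch('REDIS_URL', 'redis://localhost:6379/0') }
end

run Sidekiq::Web
```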
data/lib/sidekiq/paquet.rb CHANGED
@@ -1,18 +1,19 @@
+ require 'concurrent/scheduled_task'
+
  require 'sidekiq'
  require 'sidekiq/paquet/version'
  
- require 'sidekiq/paquet/list'
- require 'sidekiq/paquet/batch'
+ require 'sidekiq/paquet/bundle'
  require 'sidekiq/paquet/middleware'
- require 'sidekiq/paquet/poller'
+ require 'sidekiq/paquet/flusher'
  
  module Sidekiq
    module Paquet
      DEFAULTS = {
-       default_bulk_size: 100,
-       bulk_flush_interval: nil,
-       average_bulk_flush_interval: 15,
-       dynamic_interval_scaling: true
+       default_bundle_size: 100,
+       flush_interval: nil,
+       average_flush_interval: 15,
+       initial_wait: 10
      }
  
      def self.options
@@ -22,6 +23,10 @@ module Sidekiq
      def self.options=(opts)
        @options = opts
      end
+
+     def self.initial_wait
+       options[:initial_wait] + (5 * rand)
+     end
    end
  end
  
@@ -37,11 +42,13 @@ Sidekiq.configure_server do |config|
    end
  
    config.on(:startup) do
-     config.options[:bulk_poller] = Sidekiq::Paquet::Poller.new
-     config.options[:bulk_poller].start
+     config.options[:paquet_flusher] = Sidekiq::Paquet::Flusher.new
+     Concurrent::ScheduledTask.execute(Sidekiq::Paquet.initial_wait) {
+       config.options[:paquet_flusher].start
+     }
    end
  
    config.on(:shutdown) do
-     config.options[:bulk_poller].terminate
+     config.options[:paquet_flusher].shutdown
    end
  end
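Since `initial_wait` is now part of DEFAULTS, it can be tuned alongside the other options before the server boots; a minimal sketch (the initializer path is illustrative):

```ruby
# config/initializers/sidekiq_paquet.rb (path is an example)
Sidekiq::Paquet.options[:default_bundle_size]    = 500
Sidekiq::Paquet.options[:average_flush_interval] = 30
Sidekiq::Paquet.options[:initial_wait]           = 20  # first flush starts ~20-25s after boot
```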
data/sidekiq-paquet.gemspec CHANGED
@@ -19,8 +19,10 @@ Gem::Specification.new do |spec|
    spec.require_paths = ["lib"]
  
    spec.add_dependency "sidekiq", ">= 4"
+   spec.add_dependency "concurrent-ruby", "~> 1.0"
  
    spec.add_development_dependency "bundler", "~> 1.10"
    spec.add_development_dependency "rake", "~> 10.0"
    spec.add_development_dependency "minitest"
+   spec.add_development_dependency "redis-namespace", "~> 1.5"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sidekiq-paquet
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.2.0
  platform: ruby
  authors:
  - ccocchi
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2016-03-18 00:00:00.000000000 Z
+ date: 2016-03-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: sidekiq
@@ -24,6 +24,20 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '4'
+ - !ruby/object:Gem::Dependency
+   name: concurrent-ruby
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
  - !ruby/object:Gem::Dependency
    name: bundler
    requirement: !ruby/object:Gem::Requirement
@@ -66,6 +80,20 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: redis-namespace
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.5'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.5'
  description:
  email:
  - cocchi.c@gmail.com
@@ -81,12 +109,13 @@ files:
  - README.md
  - Rakefile
  - lib/sidekiq/paquet.rb
- - lib/sidekiq/paquet/batch.rb
- - lib/sidekiq/paquet/list.rb
+ - lib/sidekiq/paquet/bundle.rb
+ - lib/sidekiq/paquet/flusher.rb
  - lib/sidekiq/paquet/middleware.rb
- - lib/sidekiq/paquet/poller.rb
  - lib/sidekiq/paquet/version.rb
- - sidekiq-bulk.gemspec
+ - lib/sidekiq/paquet/views/index.erb
+ - lib/sidekiq/paquet/web.rb
+ - sidekiq-paquet.gemspec
  homepage: https://github.com/ccocchi/sidekiq-paquet
  licenses:
  - MIT
data/lib/sidekiq/paquet/batch.rb DELETED
@@ -1,46 +0,0 @@
- module Sidekiq
-   module Paquet
-     module Batch
-
-       def self.append(item)
-         worker_name = item['class'.freeze]
-         args = item.fetch('args'.freeze, [])
-
-         Sidekiq.redis do |conn|
-           conn.multi do
-             conn.zadd('bulks'.freeze, 0, worker_name, nx: true)
-             conn.rpush("bulk:#{worker_name}", Sidekiq.dump_json(args))
-           end
-         end
-       end
-
-       def self.enqueue_jobs
-         now = Time.now.to_f
-         Sidekiq.redis do |conn|
-           workers = conn.zrangebyscore('bulks'.freeze, '-inf', now)
-
-           workers.each do |worker|
-             klass = worker.constantize
-             opts = klass.get_sidekiq_options
-             min_interval = opts['bulk_minimum_interval'.freeze]
-
-             items = conn.lrange("bulk:#{worker}", 0, -1)
-             items.map! { |i| Sidekiq.load_json(i) }
-
-             items.each_slice(opts.fetch('bulk_size'.freeze, Sidekiq::Paquet.options[:default_bulk_size])) do |vs|
-               Sidekiq::Client.push(
-                 'class' => worker,
-                 'queue' => opts['queue'.freeze],
-                 'args' => vs
-               )
-             end
-
-             conn.ltrim("bulk:#{worker}", items.size, -1)
-             conn.zadd('bulks'.freeze, now + min_interval, worker) if min_interval
-           end
-         end
-       end
-
-     end
-   end
- end
data/lib/sidekiq/paquet/list.rb DELETED
@@ -1,21 +0,0 @@
- module Sidekiq
-   module Paquet
-     class List
-       def initialize(name)
-         @lname = "bulk:#{name}"
-       end
-
-       def size
-         Sidekiq.redis { |c| c.llen(@lname) }
-       end
-
-       def items
-         Sidekiq.redis { |c| c.lrange(@lname, 0, -1) }
-       end
-
-       def clear
-         Sidekiq.redis { |c| c.del(@lname) }
-       end
-     end
-   end
- end
data/lib/sidekiq/paquet/poller.rb DELETED
@@ -1,84 +0,0 @@
- require 'sidekiq/util'
- require 'sidekiq/scheduled'
-
- module Sidekiq
-   module Paquet
-     class Poller < Sidekiq::Scheduled::Poller
-
-       def initialize
-         @sleeper = ConnectionPool::TimedStack.new
-         @done = false
-       end
-
-       def start
-         @thread ||= safe_thread('bulk') do
-           initial_wait
-
-           while !@done
-             enqueue
-             wait
-           end
-           Sidekiq.logger.info('Bulk exiting...')
-         end
-       end
-
-       def enqueue
-         begin
-           Batch.enqueue_jobs
-         rescue => ex
-           # Most likely a problem with redis networking.
-           # Punt and try again at the next interval
-           logger.error ex.message
-           logger.error ex.backtrace.first
-         end
-       end
-
-       private
-
-       # Calculates a random interval that is ±50% the desired average.
-       def random_poll_interval
-         avg = poll_interval_average.to_f
-         avg * rand + avg / 2
-       end
-
-       # We do our best to tune the poll interval to the size of the active Sidekiq
-       # cluster. If you have 30 processes and poll every 15 seconds, that means one
-       # Sidekiq is checking Redis every 0.5 seconds - way too often for most people
-       # and really bad if the retry or scheduled sets are large.
-       #
-       # Instead try to avoid polling more than once every 15 seconds. If you have
-       # 30 Sidekiq processes, we'll poll every 30 * 15 or 450 seconds.
-       # To keep things statistically random, we'll sleep a random amount between
-       # 225 and 675 seconds for each poll or 450 seconds on average. Otherwise restarting
-       # all your Sidekiq processes at the same time will lead to them all polling at
-       # the same time: the thundering herd problem.
-       #
-       # We only do this if poll_interval_average is unset (the default).
-       def poll_interval_average
-         if Sidekiq::Paquet.options[:dynamic_interval_scaling]
-           scaled_poll_interval
-         else
-           Sidekiq::Paquet.options[:bulk_flush_interval] ||= scaled_poll_interval
-         end
-       end
-
-       # Calculates an average poll interval based on the number of known Sidekiq processes.
-       # This minimizes a single point of failure by dispersing check-ins but without taxing
-       # Redis if you run many Sidekiq processes.
-       def scaled_poll_interval
-         pcount = Sidekiq::ProcessSet.new.size
-         pcount = 1 if pcount == 0
-         pcount * Sidekiq::Paquet.options[:average_bulk_flush_interval]
-       end
-
-       def initial_wait
-         # Have all processes sleep between 5-15 seconds. 10 seconds
-         # to give time for the heartbeat to register (if the poll interval is going to be calculated by the number
-         # of workers), and 5 random seconds to ensure they don't all hit Redis at the same time.
-         total = INITIAL_WAIT + (15 * rand)
-         @sleeper.pop(total)
-       rescue Timeout::Error
-       end
-     end
-   end
- end