sqewer 1.0.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 5d7500a5cfddbad5db77d066880dc9c4db85b505
+   data.tar.gz: ed15e8c5303a3dd9067cfeb52754af09a1b4b3f5
+ SHA512:
+   metadata.gz: 391b04bba9100e0360b44407ab8cb81790d550c4d4ebb14921b9ef46498038b5cadb245a1fe5c4d910193c9e6a1c69bbd9184ea5631451f17c0d065ba9081bd9
+   data.tar.gz: cb9d4cde98e3b157d5527f106efce4deb370e0ab7712e113e9189fab2db6b41a94b13ac2c2dce7f2fc411e0de2b539cdb74864bcde462f2f52832ea85ce0d6de
data/.gitlab-ci.yml ADDED
@@ -0,0 +1,12 @@
+ rake:
+   script:
+     - git submodule update --init
+     - ls -la
+     - gem install bundler
+     - bundle config --global jobs 4
+     - bundle config --global path /cache/gems
+     - bundle config build.nokogiri "--use-system-libraries --with-xml2-include=/usr/include/libxml2"
+     - bundle check || bundle install
+     - bundle exec rake
+   tags:
+     - ruby
data/.yardopts ADDED
@@ -0,0 +1 @@
+ --markup markdown
data/DETAILS.md ADDED
@@ -0,0 +1,180 @@
+ A more in-depth explanation of the library's systems follows below.
+
+ ## Job storage
+
+ Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:
+
+     {"_job_class": "MyJob", "_job_params": null}
+
+ When the worker picks up this ticket, it will do the following:
+
+     job = MyJob.new
+     job.run
+
+ So the smallest job class has to be instantiatable, and has to respond to the `run` message.
+
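+ A minimal class satisfying that contract might look like this (a sketch; the class name is illustrative):
+
+     class MyJob
+       def run
+         # do the actual work here
+       end
+     end
+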
+ ## Jobs with arguments and parameters
+
+ Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
+ directly translated to keyword arguments of the job constructor. With a job ticket like this:
+
+     {
+       "_job_class": "MyJob",
+       "_job_params": {"ids": [1,2,3]}
+     }
+
+ the worker will instantiate your `MyJob` class with the `ids:` keyword argument:
+
+     job = MyJob.new(ids: [1,2,3])
+     job.run
+
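+ For that ticket to round-trip, the job class needs a matching keyword constructor. A minimal sketch:
+
+     class MyJob
+       def initialize(ids:)
+         @ids = ids
+       end
+
+       def run
+         # work through @ids here
+       end
+     end
+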
+ Note that at this point only arguments that are raw JSON types are supported:
+
+ * Hash
+ * Array
+ * Numeric
+ * String
+ * nil/false/true
+
+ If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer`.
+
+ ## Jobs spawning dependent jobs
+
+ If your `run` method on the job object accepts arguments (has non-zero `arity`), the `ExecutionContext` will
+ be passed to the `run` method:
+
+     job = MyJob.new(ids: [1,2,3])
+     job.run(execution_context)
+
+ The execution context has some useful methods:
+
+ * `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
+ * `submit!`, for submitting more jobs to the same queue.
+
+ A job submitting a subsequent job could look like this:
+
+     class MyJob
+       def run(ctx)
+         ...
+         ctx.submit!(DeferredCleanupJob.new)
+       end
+     end
+
+ ## Job submission
+
+ In general, a job object that needs some arguments for instantiation must return a Hash from its `to_h` method. The hash must
+ include all the keyword arguments needed to instantiate the job when executing. For example:
+
+     class SendMail
+       def initialize(to:, body:)
+         ...
+       end
+
+       def run
+         ...
+       end
+
+       def to_h
+         {to: @to, body: @body}
+       end
+     end
+
+ Or, if you are using the `ks` gem (https://rubygems.org/gems/ks), you could inherit your job from it:
+
+     class SendMail < Ks.strict(:to, :body)
+       def run
+         ...
+       end
+     end
+
+ ## Job marshaling
+
+ By default, jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can
+ override that object if you need to handle job tickets that come from external sources and do not necessarily
+ conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:
+
+     class CustomSerializer < Sqewer::Serializer
+       # Overridden so that we can instantiate a custom job
+       # from the AWS notification payload.
+       # Return nil and the job will simply be deleted from the queue.
+       def unserialize(message_blob)
+         message = JSON.load(message_blob)
+         return if message['Service'] # AWS test
+         return HandleS3Notification.new(message) if message['Records']
+
+         super # as default
+       end
+     end
+
+ Or you can override the serialization method to add some metadata to the job ticket on job submission:
+
+     class CustomSerializer < Sqewer::Serializer
+       def serialize(job_object)
+         json_blob = super
+         parsed = JSON.load(json_blob)
+         parsed['_submitter_host'] = Socket.gethostname
+         JSON.dump(parsed)
+       end
+     end
+
+ If you return `nil` from your `unserialize` method the job will not be executed,
+ but will just be deleted from the SQS queue.
+
+ ## Starting and running the worker
+
+ The very minimal executable for running jobs would be this:
+
+     #!/usr/bin/env ruby
+     require 'my_application'
+     Sqewer::CLI.run
+
+ This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable.
+
+ You can also run a worker without signal handling, for example in test
+ environments. Note that the worker is asynchronous: it has worker threads
+ that do all the operations by themselves.
+
+     worker = Sqewer::Worker.new
+     worker.start
+     # ...and once you are done testing
+     worker.stop
+
+ ## Configuring the worker
+
+ One of the reasons this library exists is that sometimes you need to configure more
+ than is usually assumed to be possible. For example, you might want to use a special
+ logging library:
+
+     worker = Sqewer::Worker.new(logger: MyCustomLogger.new)
+
+ Or you might want a different job serializer/deserializer (for instance, if you want to handle
+ S3 bucket notifications coming into the same queue):
+
+     worker = Sqewer::Worker.new(serializer: CustomSerializer.new)
+
+ The `Sqewer::CLI` module that you run from the commandline handler application accepts the
+ same options as the `Worker` constructor, so everything stays configurable.
+
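+ Putting it together, a commandline handler with a custom-configured worker might look like
+ this (a sketch, using the `start` entry point defined in `lib/sqewer/cli.rb` below;
+ `MyCustomLogger` is the illustrative logger from above):
+
+     #!/usr/bin/env ruby
+     require 'my_application'
+
+     worker = Sqewer::Worker.new(logger: MyCustomLogger.new, serializer: CustomSerializer.new)
+     Sqewer::CLI.start(worker)
+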
+ ## Execution and serialization wrappers (middleware)
+
+ You can wrap job processing in middleware. A full-featured middleware class looks like this:
+
+     class MyWrapper
+       # Surrounds the job instantiation from the string coming from SQS.
+       def around_deserialization(serializer, msg_id, msg_payload)
+         # msg_id is the receipt handle, msg_payload is the message body string
+         yield
+       end
+
+       # Surrounds the actual job execution
+       def around_execution(job, context)
+         # job is the actual job you will be running, context is the ExecutionContext.
+         yield
+       end
+     end
+
+ You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:
+
+     stack = Sqewer::MiddlewareStack.new
+     stack << MyWrapper.new
+     w = Sqewer::Worker.new(middleware_stack: stack)
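+
+ As a concrete illustration, here is a sketch of a wrapper that logs how long each job takes
+ (it assumes a wrapper may implement only the hooks it needs):
+
+     class TimingWrapper
+       # Surrounds job execution and logs the elapsed wall-clock time,
+       # whether the job succeeds or raises.
+       def around_execution(job, context)
+         started_at = Time.now
+         yield
+       ensure
+         context.logger.info "#{job.inspect} took #{Time.now - started_at} seconds"
+       end
+     end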
data/FAQ.md ADDED
@@ -0,0 +1,54 @@
+ # FAQ
+
+ This document tries to answer some questions that may arise when reading or using the library. Hopefully
+ it can provide some answers as to how things are put together.
+
+ ## Why no ActiveJob?
+
+ An adapter will be added in the future.
+
+ ## Why separate `new` and `run` methods instead of just `perform`?
+
+ Because the job needs access to the execution context of the worker. It turned out that keeping the context
+ in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
+ to enqueue subsequent jobs and to get access to loggers (and other context-sensitive objects). Therefore
+ it makes more sense to offer jobs access to the execution context, and to make a job a command object.
+
+ Also, jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
+ to save those parameters in instance variables or in struct members.
+
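+ For instance (an illustrative class, not part of the library):
+
+     class ThumbnailJob
+       def initialize(source_url:)
+         @source_url = source_url # saved once, used by the smaller methods below
+       end
+
+       def run(ctx)
+         download_original
+         generate_thumbnail
+         ctx.submit!(CleanupJob.new)
+       end
+
+       private
+
+       def download_original
+         # fetches @source_url
+       end
+
+       def generate_thumbnail
+         # resizes the downloaded file
+       end
+     end
+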
+ ## Why keyword constructors for jobs?
+
+ Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety,
+ by checking for missing keywords and by allowing default keyword argument values. Also, we already have some
+ products that use those job formats. Some have dozens of classes of jobs, all with those signatures and tests.
+
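+ The arity safety is easy to demonstrate (an illustrative class, not part of the library):
+
+     class ResizeImage
+       def initialize(width:, height: 100) # height has a default, width is required
+         @width, @height = width, height
+       end
+     end
+
+     ResizeImage.new(height: 50)
+     # => raises ArgumentError (missing keyword: width)
+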
+ ## Why no weighted queues?
+
+ Because very often, when you want to split the queues servicing one application, it means that you do not have enough
+ capacity to serve all of the job _types_ in a timely manner. Then you try to assign priority to separate jobs,
+ whereas in fact what you need are jobs that execute _roughly_ at the same speed - so that your workers do not
+ stall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most
+ products using this library, was a very bad idea (more workload for deployment).
+
+ ## Why so many configurable components?
+
+ Because sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap in your
+ own implementation instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product
+ needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
+ Yet another product needs to manage database connections when running the jobs. Have 3-4 of those, and a
+ pretty substantial union of required features will start to emerge. Do not fear - most classes of the library
+ have a magic `.default` method which will liberate you from most complexities.
+
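+ For example, `Sqewer::Worker.default` and `Sqewer::Submitter.default` (the latter is what
+ `Sqewer.submit!` delegates to, as shown in `lib/sqewer.rb` below) hand you fully wired-up
+ objects without any constructor arguments.
+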
+ ## Why multithreading for workers?
+
+ Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
+ network-IO bound. In that situation it makes more sense to use threads that switch quickly, instead of burdening
+ the operating system with too many processes. An optional feature for one-process-per-job is going to be added
+ soon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.
+
+ ## Why no Celluloid?
+
+ Because I found that a producer-consumer model with a thread pool works quite well, and can be built from
+ the Ruby standard library alone.
+
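+ A minimal illustration of that model, using only the standard library (a sketch of the general
+ idea, not Sqewer's actual implementation):
+
+     require 'thread'
+
+     queue = Queue.new # the standard library Queue is already thread-safe
+
+     # The consumers: a small pool of worker threads popping jobs off the queue.
+     pool = 4.times.map do
+       Thread.new do
+         while (job = queue.pop)
+           job.call
+         end
+       end
+     end
+
+     # The producer pushes callables onto the queue...
+     10.times { |i| queue.push(-> { puts "performing job #{i}" }) }
+
+     # ...and one nil per thread shuts the pool down cleanly.
+     4.times { queue.push(nil) }
+     pool.each(&:join)
+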
data/Gemfile ADDED
@@ -0,0 +1,18 @@
+ source "http://rubygems.org"
+
+ gem 'aws-sdk', '~> 2'
+ gem 'very_tiny_state_machine', '~> 1'
+ gem 'hash_tools'
+ gem 'exceptional_fork'
+
+ group :development do
+   gem 'ks'
+   gem 'dotenv'
+   gem 'rake'
+   gem "rspec", "~> 3.2.0"
+   gem 'simplecov', :require => false
+   # gem "autospec" -> I would love to have this, but it wants capybara-webkit too :-(
+   gem "rdoc", "~> 3.12"
+   gem "bundler", "~> 1.0"
+   gem "jeweler", "~> 2.0.1"
+ end
data/README.md ADDED
@@ -0,0 +1,69 @@
+ An AWS SQS-based queue processor, for highly distributed job engines.
+
+ ## The shortest introduction possible
+
+ In your environment, set `SQS_QUEUE_URL`. Then, define a job class:
+
+     class MyJob
+       def run
+         File.open('output', 'a') { ... }
+       end
+     end
+
+ Then submit the job:
+
+     Sqewer.submit!(MyJob.new)
+
+ and to start processing, in your commandline handler:
+
+     #!/usr/bin/env ruby
+     require 'my_application'
+     Sqewer::CLI.run
+
+ To add arguments to the job:
+
+     class JobWithArgs
+       include Sqewer::SimpleJob
+       attr_accessor :times
+
+       def run
+         ...
+       end
+     end
+     ...
+     Sqewer.submit!(JobWithArgs.new(times: 20))
+
+ Submitting jobs from other jobs (the job will go to the same queue the parent job came from):
+
+     class MyJob
+       def run(worker_context)
+         ...
+         worker_context.submit!(CleanupJob.new)
+       end
+     end
+
+ The messages will only be deleted from SQS once the job execution completes without raising an exception.
+
+ ## Detailed usage instructions
+
+ For more detailed usage information, see [DETAILS.md](./DETAILS.md)
+
+ ## Frequently asked questions (A.K.A. _why is it done this way_)
+
+ Please see [FAQ.md](./FAQ.md). It explains some of the decisions behind the library in greater detail.
+
+ ## Contributing to the library
+
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
+ * Check out the issue tracker to make sure someone hasn't already requested and/or contributed it.
+ * Fork the project.
+ * Start a feature/bugfix branch.
+ * Commit and push until you are happy with your contribution.
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
+ * Run your tests against a _real_ SQS queue. You will need your tests to have permissions to create and delete SQS queues.
+ * Please try not to mess with the Rakefile, version, or history. If you want your own version, or it is otherwise necessary, that is fine, but please isolate the change to its own commit so I can cherry-pick around it.
+
+ ## Copyright
+
+ Copyright (c) 2016 WeTransfer. See LICENSE.txt for further details.
+
data/Rakefile ADDED
@@ -0,0 +1,41 @@
+ # encoding: utf-8
+
+ require 'rubygems'
+ require 'bundler'
+ begin
+   Bundler.setup(:default, :development)
+ rescue Bundler::BundlerError => e
+   $stderr.puts e.message
+   $stderr.puts "Run `bundle install` to install missing gems"
+   exit e.status_code
+ end
+ require 'rake'
+ require_relative 'lib/sqewer/version'
+ require 'jeweler'
+ Jeweler::Tasks.new do |gem|
+   # gem is a Gem::Specification... see http://guides.rubygems.org/specification-reference/ for more options
+   gem.version = Sqewer::VERSION
+   gem.name = "sqewer"
+   gem.homepage = "https://gitlab.wetransfer.net/julik/sqewer"
+   gem.license = "MIT"
+   gem.description = %Q{Process jobs from SQS}
+   gem.summary = %Q{A full-featured library for all them worker needs}
+   gem.email = "me@julik.nl"
+   gem.authors = ["Julik Tarkhanov"]
+   # dependencies defined in Gemfile
+ end
+ Jeweler::RubygemsDotOrgTasks.new
+
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+   spec.pattern = FileList['spec/**/*_spec.rb']
+ end
+
+ # desc "Code coverage detail"
+ # task :simplecov do
+ #   ENV['COVERAGE'] = "true"
+ #   Rake::Task['spec'].execute
+ # end
+
+ task :default => :spec
@@ -0,0 +1,6 @@
+ # Dotenv is used when running tests for the library. You may choose to use Dotenv
+ # for your main production application as well. When you use Sqewer as a library,
+ # Dotenv is not required, and .env is not going to be force-loaded for you.
+ AWS_ACCESS_KEY_ID=secret
+ AWS_SECRET_ACCESS_KEY=secret
+ AWS_REGION=eu-west-1
data/lib/sqewer.rb ADDED
@@ -0,0 +1,11 @@
+ # The enclosing module for the library
+ module Sqewer
+   Dir.glob(__dir__ + '/**/*.rb').each { |p| require p unless p == __FILE__ }
+
+   # Shortcut access to Submitter#submit!.
+   #
+   # @see {Sqewer::Submitter#submit!}
+   def self.submit!(*jobs, **options)
+     Sqewer::Submitter.default.submit!(*jobs, **options)
+   end
+ end
data/lib/sqewer/atomic_counter.rb ADDED
@@ -0,0 +1,22 @@
+ require 'thread'
+
+ # Maintains a thread-safe counter wrapped in a Mutex.
+ class Sqewer::AtomicCounter
+   def initialize
+     @m, @v = Mutex.new, 0
+   end
+
+   # Returns the current value of the counter
+   #
+   # @return [Fixnum] the current value of the counter
+   def to_i
+     @m.synchronize { @v + 0 }
+   end
+
+   # Increments the counter
+   #
+   # @return [Fixnum] the current value of the counter
+   def increment!
+     @m.synchronize { @v += 1 }
+   end
+ end
data/lib/sqewer/cli.rb ADDED
@@ -0,0 +1,44 @@
+ module Sqewer::CLI
+   # Start the commandline handler, and set up a centralized signal handler that reacts
+   # to USR1 and TERM to do a soft-terminate on the worker.
+   #
+   # @param worker[Sqewer::Worker] the worker to start. Must respond to `#start` and `#stop`
+   # @return [void]
+   def start(worker = Sqewer::Worker.default)
+     # Use a self-pipe to accumulate signals in a central location
+     self_read, self_write = IO.pipe
+     %w(INT TERM USR1 USR2 TTIN).each do |sig|
+       begin
+         trap(sig) { self_write.puts(sig) }
+       rescue ArgumentError
+         # Signal not supported
+       end
+     end
+
+     begin
+       worker.start
+       # The worker is non-blocking, so in the main CLI process we select() on the signal
+       # pipe and handle the signal in a centralized fashion
+       while (readable_io = IO.select([self_read]))
+         signal = readable_io.first[0].gets.strip
+         handle_signal(worker, signal)
+       end
+     rescue Interrupt
+       worker.stop
+       exit 1
+     end
+   end
+
+   def handle_signal(worker, sig)
+     case sig
+     when 'USR1', 'TERM'
+       worker.stop
+       exit 0
+     # when 'TTIN' # a good place to print the worker status
+     else
+       raise Interrupt
+     end
+   end
+
+   extend self
+ end