chutzen 0.8.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: e078b40512684bbdd04c341199d45ed740749194fa8a1154cc3760afdd9cefeb
+   data.tar.gz: 2eca64fa5545fc9c7722ab06fd5456458b31e7347248bb67f30237616c1c24e7
+ SHA512:
+   metadata.gz: 2f53fd4ea333c556f30143fc580b8b5c0fb9d2450bf093c29e9c60f97147836c280b6b999ed33c13ff1c807004bb7b132f994fea0881d4ee4f3730300a974cd8
+   data.tar.gz: 943189ce50907777c0d663e4c755277dabe35fbbeadcded292c53dd657b0a766f55aff5d88dc3cee6b47e94bf932694b0849f88fac1fa02c43e6f905c95548c5
data/README.md ADDED
@@ -0,0 +1,244 @@
+ # Chutzen
+
+ Chutzen is a little toolkit to help with implementing batch processing. It uses Sidekiq as a job queue for incoming jobs and as a notification queue for outgoing notifications.
+
+ ## Enqueueing a job
+
+ Let's look at an example app called Squirrel that wants to collect GIFs from the internet.
+
+ ```ruby
+ Chutzen.perform_async(
+   'notify' => { 'class' => 'Squirrel::Result' },
+   'result' => { 'id' => 42 },
+   'commands' => [
+     {
+       'execute' => [
+         'curl',
+         {
+           '-o' => 'food.gif'
+         },
+         'https://example.com/food.gif'
+       ],
+       'skip_when' => '12 > 24',
+       'fail_when' => { 'read_timeout' => 2.5 },
+       'result' => {
+         'food' => 'found'
+       }
+     }
+   ]
+ )
+ ```
+
+ ### Dynamic expressions
+
+ Some properties in the job description accept dynamic expressions. Expressions are layered like an onion: queries inside a template inside a query.
+
+ Let's start with a complex example and then peel the onion.
+
+     ${video.width} < ${minimal_width}
+
+ Any expression is first evaluated as a *template*. Every value between `${` and `}` is evaluated as a *query*, and its result is substituted into the string. The resulting string can then be evaluated again as a *query* to produce a value. So it's queries, inside a template, inside a query.
+
+ Let's take the following job *dictionary* as an example:
+
+     { 'minimal_width' => 512, 'video' => { 'width' => 1080 } }
+
+ In step one, we replace every variable expression in the template with its *query* result.
+
+     video.width => 1080
+     minimal_width => 512
+
+ So the resulting string is:
+
+     1080 < 512
+
+ In certain properties (e.g. `skip_when`), the resulting string is then evaluated as a *query*.
+
+     1080 < 512 => false
+
+ The documentation below tells you when a property is treated as a *template* to produce a string and when it is treated as a *query* to return a (boolean) value.
+
+ ### Variable expressions
+
+ A variable expression starts with `${` and ends with `}`. The expression can be anything valid in the Chutzen query language (see below).
+
+ A literal expression starts with `@` and is replaced by the entire unparsed output from a command.
+
+ ```ruby
+ {
+   'duration' => 'The duration is: ${ffprobe.format.duration}',
+   #=> {"duration":"The duration is: 55433.234"}
+   'ffprobe_output' => '@ffprobe'
+   #=> {"ffprobe_output":{"streams":[…],"format":…}}
+ }
+ ```
+
+ ### The Chutzen query language
+
+ The primary purpose of the query language is to select values from nested hashes and arrays.
+
+ A query digs into a mixed structure of hashes and arrays. Numeric values reference the n-th item in an array, and strings are used as keys in a hash. Path segments are separated by dots.
+
+ For example, the following Chutzen expression is equivalent to the Ruby expression below it:
+
+     streams.0.height
+     dictionary['streams'][0]['height']
+
+ If you don't want to reference the entire path, you can prefix an expression with `//`. This performs a depth-first search for the first occurrence of that key in the structure. Deep references with an array index generally don't make much sense.
+
+ For example, given the following structure:
+
+     {
+       'streams' => [
+         { 'height' => 12, 'width' => 12, 'codec' => 'H.264' },
+         { 'height' => 42 }
+       ]
+     }
+
+ These are valid queries with their results:
+
+     streams.1.height => 42
+     streams.0.codec => 'H.264'
+     streams.1.codec => nil
+     //height => 12
+
+ Identifiers reference a value in the structure. You can also compare and combine expressions with the `=`, `&`, `|`, `<`, and `>` operators.
+
+     12 = 12 => true
+     streams.1.height = 65 => false
+     //height = 12 => true
+     streams.0.codec = 'MOV' => false
+     streams.0.height > 12 => false
+     true & false => false
+     true | false => true
+     streams.1.codec = undefined => true
+
+ ### Job description
+
+ * notify: A hash of options used to enqueue a Sidekiq worker that reports back after a command was performed. This section must include at least a *class*, but may also include any other valid Sidekiq job setting, like *queue*. Note that the *args* setting will be overwritten with the payload.
+ * result: A hash that is sent as a payload with each notification. You can use this to send a reference to the job so you can always associate results and errors with the job. Interpreted as *template*.
+ * commands: A list of command hashes which must be performed to complete the job.
+
+ ### Command description
+
+ * execute: A string or list which describes a command to be executed, see below for details. Interpreted as *template*.
+ * skip_when: Describes when to skip the command. Interpreted as *boolean query*.
+ * perform_when: Describes when to perform the command. Inverse of `skip_when`, for convenience. Interpreted as *boolean query*.
+ * fail_when: Describes when to kill the command (e.g. in case of a timeout), see below for details.
+ * merge: Hash instructing that an output stream has to be stored in the global job dictionary, see below for details.
+ * result: Hash with values to use as the payload for the notify worker. Values may include variables. Note that the payload is encoded as JSON to increase interoperability and to allow for easier backwards compatibility when the notification payload changes. Properties are interpreted as *template*.
+ * remember: Merge keys into the global job dictionary. Values are interpreted as *template* and the result is then evaluated as *query*.
+ * optional: When set to true, the command does not raise an exception when the executed command exits with a non-zero exit code or is killed by *fail_when*, and processing continues with the next command.
+
+ ### Execute description
+
+ When *execute* contains a string, it is executed verbatim. When it contains an Array, its elements are joined into one command, and Hashes inside the Array are joined to generate valid switches with their values. Arrays and Hashes are supported to make it easier to write code that merges repeating elements into a job description.
+
+ ### Fail when description
+
+ The *fail_when* section can contain any of the following options:
+
+ * read_timeout: Checks whether stdout or stderr grows and kills the command when neither does for the specified number of seconds.
+
+ ### Merge description
+
+ Currently this supports one of two streams: *stdout* or *stderr*. The value of either key **must** be *json*, *string*, or *integer*. This setting causes the command to save the output of the indicated stream and parse it as the indicated format. The parsed value is added to the global job dictionary, where it can be referenced in queries in subsequent commands.
+
+ Use the *as* key to set the name under which the value is stored in the dictionary. By default the name of the executed binary is used, which is not always useful.
+
+ ```ruby
+ { 'merge' => { 'stdout' => 'integer', 'as' => 'user_count' } }
+ ```
+
+ ### Remember description
+
+ Merges additional data into the global job dictionary. This is useful when you want to avoid repeating complex expressions across commands or want to build nested boolean expressions.
+
+ ```ruby
+ {
+   'merge' => { 'stdout' => 'string', 'as' => 'result' },
+   'remember' => {
+     'missing' => "result = 'missing'",
+     'blank' => "result = 'blank'",
+     'do_nothing' => 'missing | blank'
+   }
+ },
+ {
+   'skip_when' => 'do_nothing'
+ }
+ ```
+
+ Using `remember` you can store an undefined value. A Chutzen expression uses the special identifier `undefined` to express this value. This works similarly to `true` and `false`.
+
+ ```ruby
+ {
+   'remember' => {
+     'count' => 'undefined',
+     'work' => 'undefined = count'
+   }
+ }
+ ```
+
+ ## Receiving notifications
+
+ The arguments passed to *notify* are used to schedule a Sidekiq job back on the queue so your application can process the result.
+
+ ```ruby
+ require 'json'
+ require 'sidekiq'
+
+ module Squirrel
+   class Result
+     include Sidekiq::Worker
+
+     # The payload arrives as a JSON string.
+     def perform(payload)
+       args = JSON.parse(payload)
+       # … do something with the args
+     end
+   end
+ end
+ ```
+
+ ## Safely stopping Chutzen
+
+ Chutzen is basically a Sidekiq process with some extra trimmings. In many cases it will be started in a way similar to this:
+
+     sidekiq --queue chutzen --require ./lib/chutzen.rb
+
+ Sidekiq works best with small fire-and-forget tasks written in Ruby. Chutzen is more about batch processing and generally starts additional processes in the same process group.
+
+ For example:
+
+     3630 ? Ssl 290:59 sidekiq 6.1.2 chutzen [2 of 2 busy]
+     1346100 ? S 0:00 \_ sh -c ffmpeg
+     1346101 ? RLl 64:19 | \_ ffmpeg
+
+ You can tell Sidekiq to stop accepting new jobs by sending it the `TSTP` signal. The intended result is that it finishes the current job without starting new ones, so you can eventually restart Sidekiq safely. In our example that would be:
+
+     kill -TSTP 3630
+
+ The downside of this approach is that it also stops the ffmpeg process, because all processes in the process group receive the same signal. So everything stops and nothing ever finishes.
+
+ The solution is to send a Chutzen signal.
+
+ We tell Sidekiq to listen to an additional ‘control’ queue, which is just a regular queue used to send signal jobs to a specific host.
+
+     sidekiq --queue chutzen --queue chutzen-13-production \
+       --require ./lib/chutzen.rb
+
+ Now we can schedule the `Chutzen::Signal` worker through the `emit_signal` convenience method.
+
+     Chutzen.emit_signal('stop', queue: 'chutzen-13-production')
+
+ The downside of using a job is that multiple signals can be enqueued, which may keep stopping Chutzen.
+
+     20.times { Chutzen.emit_signal('stop', queue: 'chutzen-13') }
+
+ That is why a `Chutzen::Signal` job is, by default, only processed within 30 seconds after it was enqueued. If you don't like the default, the expiration time can be expressed in a number of ways.
+
+     Chutzen.emit_signal('stop', queue: 'c', expires_in: 10) # seconds
+     Chutzen.emit_signal('stop', queue: 'c', expires_at: Time.now + 3600)
+     Chutzen::Signal.
+       set(queue: 'chutzen-5-test').
+       perform_async('stop', (Time.now + 10).to_i)
+
+ ## New Relic integrations
+
+ Chutzen can send custom metrics and exceptions to New Relic. The New Relic gem is loaded ‘dynamically’, so you have to make sure it's installed for this to work. You also have to provide a valid `config/newrelic.yml` file relative to the working directory where Chutzen is started.
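The README above describes queries as dot-separated paths into nested hashes and arrays, a `//` prefix that triggers a depth-first search, and templates that substitute `${…}` queries into a string. The following is a minimal sketch of those lookup rules in plain Ruby, assuming the semantics described in the README. The method names `dig_path`, `deep_find`, `query`, and `render_template` are hypothetical; this is not Chutzen's `Expression` or `Template` class, it treats everything after `//` as a single key, and it does not evaluate operators such as `=` or `<`.

```ruby
# Hypothetical sketch of the README's lookup rules; not Chutzen's implementation.
# Operators such as =, <, & and | are intentionally not handled.

# Follows a dot-separated path: numeric segments index arrays, other segments are hash keys.
def dig_path(data, path)
  path.split('.').reduce(data) do |node, segment|
    case node
    when Array then segment.match?(/\A\d+\z/) ? node[segment.to_i] : nil
    when Hash then node[segment]
    end
  end
end

# Depth-first search for the first value stored under +key+ anywhere in the structure.
def deep_find(node, key)
  children =
    case node
    when Hash
      return node[key] if node.key?(key)

      node.values
    when Array then node
    else []
    end
  children.each do |child|
    found = deep_find(child, key)
    return found unless found.nil?
  end
  nil
end

# Paths prefixed with // use the depth-first search instead of a full path lookup.
def query(data, expression)
  if expression.start_with?('//')
    deep_find(data, expression.delete_prefix('//'))
  else
    dig_path(data, expression)
  end
end

# Replaces every ${...} in the template with the string form of its query result.
def render_template(template, data)
  template.gsub(/\$\{([^}]*)\}/) { query(data, Regexp.last_match(1).strip).to_s }
end

data = {
  'minimal_width' => 512,
  'video' => { 'width' => 1080 },
  'streams' => [{ 'height' => 12, 'codec' => 'H.264' }, { 'height' => 42 }]
}
p query(data, 'streams.1.height')                            # prints 42
p query(data, '//height')                                    # prints 12
p render_template('${video.width} < ${minimal_width}', data) # prints "1080 < 512"
```

In the README's terms, the rendered string `1080 < 512` would then be evaluated as a query in properties such as `skip_when`; that last step is left out of this sketch.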
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module Chutzen
+   # Applies a data hash with values of Chutzen expressions to a destination Dictionary using a
+   # deep merge.
+   class Apply
+     def initialize(target, lookup)
+       @target = target
+       @lookup = lookup
+     end
+
+     def merge(source, path = [])
+       case source
+       when Hash
+         merge_hash(source, path)
+       when Array
+         merge_array(source, path)
+       else
+         merge_scalar(source, path)
+       end
+     end
+
+     private
+
+     def merge_hash(source, path)
+       source.each do |key, value|
+         merge(value, path + [key])
+       end
+     end
+
+     def merge_array(source, path)
+       source.each_with_index do |value, index|
+         merge(value, path + [index])
+       end
+     end
+
+     # Strings are first rendered as a template.
+     def merge_scalar(expression, path)
+       expression = Chutzen::Template.new(expression, @lookup).result if expression.is_a?(String)
+
+       merge_expression(expression, path)
+     end
+
+     # Rendered strings are then evaluated as a query and stored at their path in the target.
+     def merge_expression(expression, path)
+       expression = Chutzen::Expression.new(expression, @lookup).result if expression.is_a?(String)
+
+       @target.bury!(path, expression)
+     end
+   end
+ end
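`Apply#merge` recursively walks hashes and arrays, extends the path with every key or index it passes, and hands each scalar to `@target.bury!(path, value)`. `Dictionary#bury!` is not part of this diff, so the sketch below assumes it creates missing containers along the path; it uses a hypothetical `bury` helper on plain hashes and a block in place of Chutzen's template and expression evaluation, only to show how the paths are built and applied.

```ruby
# Hypothetical stand-in for Dictionary#bury! (not shown in this diff): creates the
# intermediate containers along +path+ and stores +value+ at the end. An Integer in the
# next path segment creates an Array, anything else creates a Hash.
def bury(target, path, value)
  *parents, last = path
  node = parents.each_with_index.reduce(target) do |current, (segment, index)|
    current[segment] ||= path[index + 1].is_a?(Integer) ? [] : {}
  end
  node[last] = value
  target
end

# Mirrors the shape of Apply#merge: recurse into hashes and arrays, extend the path with
# each key or index, transform every scalar, and bury the result at its path.
def apply(target, source, path = [], &transform)
  case source
  when Hash
    source.each { |key, value| apply(target, value, path + [key], &transform) }
  when Array
    source.each_with_index { |value, index| apply(target, value, path + [index], &transform) }
  else
    bury(target, path, transform.call(source))
  end
  target
end

# The block stands in for Chutzen's template and expression evaluation.
result = apply({}, { 'food' => 'found', 'sizes' => [1, 2] }) do |value|
  value.is_a?(String) ? value.upcase : value
end
# result == { 'food' => 'FOUND', 'sizes' => [1, 2] }
```

A design point worth noting: `Command#merge_remember_into_dictionary` passes the same dictionary as both target and lookup, so values buried early in a `remember` hash appear to be visible to later entries. That is what lets `do_nothing` refer to `missing` and `blank` in the README's remember example.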
@@ -0,0 +1,42 @@
+ # frozen_string_literal: true
+
+ module Chutzen
+   class Command
+     # Raised when execution of a command fails.
+     class ExecutionFailed < Chutzen::StandardError
+       # Instance of Command that triggered the error.
+       attr_reader :command
+
+       def initialize(message, command: nil)
+         super(message)
+         @command = command
+       end
+
+       def as_json
+         { 'error' => details }
+       end
+
+       private
+
+       def details
+         {
+           'message' => message,
+           'command' => @command&.to_s,
+           'stdout_tail' => last(@command&.stdout),
+           'stderr_tail' => last(@command&.stderr),
+           'exit_status' => @command&.exit_status&.exitstatus
+         }.compact
+       end
+
+       # Returns at most the last 1024 characters written to the stream, or nil when the
+       # stream is missing or blank.
+       def last(io)
+         return unless io
+
+         io.rewind
+         content = io.read.strip
+         return if content.empty?
+
+         content[-1024..] || content
+       end
+     end
+   end
+ end
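`ExecutionFailed#last` keeps the error payload small by reporting only the tail of each captured stream. The snippet below reproduces that slicing on a `StringIO` to show why the `|| content` fallback is needed: with Ruby 2.6+ endless ranges, `content[-1024..]` returns `nil` when the string is shorter than 1024 characters. The `tail` helper and its `limit` parameter are hypothetical, extracted only for illustration.

```ruby
require 'stringio'

# Same slicing as ExecutionFailed#last, as a standalone (hypothetical) helper.
def tail(io, limit = 1024)
  return unless io

  io.rewind
  content = io.read.strip
  return if content.empty?

  # A negative start index beyond the string's length yields nil, hence the fallback.
  content[-limit..] || content
end

p tail(StringIO.new("boom\n"))        # prints "boom"
p tail(StringIO.new('x' * 2000)).size # prints 1024
p tail(StringIO.new("  \n"))          # prints nil
```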
@@ -0,0 +1,216 @@
+ # frozen_string_literal: true
+
+ require 'json'
+ require 'open3'
+ require 'stringio'
+
+ module Chutzen
+   # Holds a description for a command and executes it.
+   class Command
+     ALLOW_EVAL = /\A[\d<>=\s]+\z/.freeze
+
+     autoload :ExecutionFailed, 'chutzen/command/execution_failed'
+
+     # Returns a StringIO object with all data the command wrote to stdout.
+     attr_reader :stdout
+     # Returns a StringIO object with all data the command wrote to stderr.
+     attr_reader :stderr
+     # Returns a Process::Status object after the command has run.
+     attr_reader :exit_status
+
+     # Creates a new command with a description hash, dictionary, and the
+     # current work path.
+     #
+     #   Command.new(
+     #     { 'execute' => 'ls -al' },
+     #     dictionary: dictionary,
+     #     work_path: Dir.pwd
+     #   )
+     def initialize(description, dictionary:, work_path:)
+       defaults
+       description.each do |name, value|
+         instance_variable_set("@#{name}", value)
+       end
+       @dictionary = dictionary
+       @work_path = work_path
+       @stdout = StringIO.new
+       @stderr = StringIO.new
+     end
+
+     # Returns true when the command should be performed but does not need to exit
+     # successfully.
+     def optional?
+       @optional
+     end
+
+     # Joins all the arguments from the execute section to build a command
+     # that can be executed by a shell.
+     def to_s
+       Template.new(Shell.join(@execute), @dictionary).result
+     end
+
+     # Attempts to return the name of the binary that is being executed.
+     def name
+       to_s.split(' ')[0].split('/').last
+     end
+
+     # Runs the command based on its description. Returns the command itself for
+     # convenience, or nil when the command was skipped.
+     def perform
+       return if skip?
+
+       result = @execute ? perform_command : self
+       merge_remember_into_dictionary
+       result
+     end
+
+     def perform_command
+       trace_execution do
+         # We use the TSTP signal to instruct Sidekiq to stop reading from the queue and the TERM
+         # signal to kill and re-queue running jobs. Unfortunately the command opened by popen would
+         # also receive the signal and stop or terminate the command. When stopped the command would
+         # never complete and when killed it would probably fail the job.
+         #
+         # Chutzen can't trap the signals because it runs in the same process. We don't want to wrap
+         # the command in a runner because that's even more inconvenient.
+         #
+         # As a tradeoff we start the command in its own process group. The upside is that it
+         # ignores the signals of its parent process. The downside is that it may orphan the
+         # process when Sidekiq stops.
+         Open3.popen3(to_s, chdir: @work_path, pgroup: true) do |stdin, stdout, stderr, thread|
+           stdin.close_write
+           monitor(stdout, stderr, thread)
+           @exit_status = thread.value
+         end
+         verify_exit_status
+         merge_output_into_dictionary
+         self
+       end
+     end
+
+     # Returns the result expressed by the result section but with its variables
+     # instantiated.
+     def result
+       return nil unless @result
+
+       result = Dictionary.new
+       Apply.new(result, @dictionary).merge(@result)
+       result.to_hash
+     end
+
+     # Returns the value expressed by the skip_when section but with its
+     # variables instantiated.
+     def skip_when
+       return nil unless @skip_when
+
+       Expression.new(Template.new(@skip_when, @dictionary).result, @dictionary)
+     end
+
+     # Returns the value expressed by the perform_when section but with its
+     # variables instantiated.
+     def perform_when
+       return nil unless @perform_when
+
+       Expression.new(Template.new(@perform_when, @dictionary).result, @dictionary)
+     end
+
+     private
+
+     def defaults
+       @execute = nil
+       @optional = false
+       @merge = nil
+       @result = nil
+       @remember = nil
+       @fail_when = nil
+       @skip_when = nil
+       @perform_when = nil
+     end
+
+     def skip?
+       if skip_when
+         skip_when.result
+       elsif perform_when
+         !perform_when.result
+       else
+         false
+       end
+     end
+
+     # Copies the command's output to our own stdout/stderr and to the captured StringIO
+     # objects until both streams are exhausted, while the watcher enforces fail_when.
+     def monitor(stdout, stderr, thread)
+       watcher = Watcher.new(fail_when: @fail_when, files: [@stdout, @stderr])
+       wait_until_done(
+         Demux.new(
+           stdout, $stdout, @stdout, select_timeout: watcher.select_timeout
+         ),
+         Demux.new(
+           stderr, $stderr, @stderr, select_timeout: watcher.select_timeout
+         ),
+         watcher
+       )
+     rescue Chutzen::Watcher::Error => e
+       Process.kill('KILL', thread.pid)
+       raise ExecutionFailed.new(e.message, command: self) unless optional?
+     end
+
+     def wait_until_done(*operations)
+       operations.each(&:tick) until operations.all?(&:done?)
+     end
+
+     # Raises an exception when the command was required to succeed but didn't.
+     def verify_exit_status
+       return if optional?
+       return if exit_status.success?
+
+       raise ExecutionFailed.new('Failed to execute command.', command: self)
+     end
+
+     # Parses the requested output streams and adds them to the dictionary hash.
+     def merge_output_into_dictionary
+       return unless @merge
+
+       streams = @merge.dup
+       as = streams.delete('as') || name
+       streams.each do |stream, format|
+         @dictionary.merge!(as => cast_data(format, send(stream).string.strip))
+       end
+     end
+
+     # Evaluates expressions and adds them to the dictionary hash.
+     def merge_remember_into_dictionary
+       return unless @remember
+
+       Apply.new(@dictionary, @dictionary).merge(@remember)
+     end
+
+     def cast_data(format, data)
+       case format
+       when 'json'
+         JSON.parse(data)
+       when 'integer'
+         data.to_i
+       when 'string'
+         data
+       else
+         raise(
+           ArgumentError,
+           "Only `string', `json', and `integer' are currently supported for merging."
+         )
+       end
+     end
+
+     def trace_execution(&block)
+       if defined?(::NewRelic)
+         NewRelic::Agent::MethodTracer.trace_execution_scoped(
+           transaction_scope,
+           &block
+         )
+       else
+         yield
+       end
+     end
+
+     def transaction_scope
+       %W[Custom/Command/#{name}]
+     end
+   end
+ end
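`Command#perform_command` starts the command with `Open3.popen3(..., pgroup: true)`, which puts the child in a new process group so it does not share Sidekiq's group. The sketch below checks that effect in isolation, assuming a POSIX system (the `pgroup` spawn option is not available on Windows). It starts a Ruby child instead of a shell command so the child can report its own process group; `child_process_group` is a hypothetical helper and the demuxing and watcher logic is omitted.

```ruby
require 'open3'
require 'rbconfig'

# Starts a child the same way Command#perform_command does (new process group, stdin
# closed) and asks the child which process group it belongs to.
def child_process_group
  script = 'print Process.getpgrp'
  Open3.popen3(RbConfig.ruby, '-e', script, pgroup: true) do |stdin, stdout, stderr, thread|
    stdin.close_write
    # Reading sequentially is fine for tiny outputs; Chutzen uses IO.select via Demux so
    # that neither pipe can fill up and block the child.
    output = stdout.read
    stderr.read
    [output.to_i, thread.pid, thread.value]
  end
end

child_pgid, child_pid, status = child_process_group
puts "parent group: #{Process.getpgrp}"
puts "child group:  #{child_pgid} (child pid #{child_pid}, success: #{status.success?})"
```

With `pgroup: true` the child becomes the leader of its own group, so its group id equals its pid and differs from the parent's group id.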
@@ -0,0 +1,35 @@
+ # frozen_string_literal: true
+
+ module Chutzen
+   # Reads from one stream and writes to a set of other streams.
+   #
+   #   demux = Demux.new(stdin, file, file, socket, stdout)
+   #   until demux.done?
+   #     demux.tick
+   #   end
+   class Demux
+     BUFFER_SIZE = 1024
+
+     def initialize(input, *output, select_timeout: nil)
+       @input = input
+       @output = output
+       @select_timeout = select_timeout
+       @done = false
+     end
+
+     def done?
+       @done
+     end
+
+     # Waits up to select_timeout seconds for input and copies whatever is available to all
+     # outputs. Marks the demux as done when the input reaches end-of-file or is closed.
+     def tick
+       readable = IO.select([@input], nil, nil, @select_timeout)
+       return unless readable
+
+       buffer = readable.first.first.read_nonblock(BUFFER_SIZE, exception: false)
+       return if buffer == :wait_readable
+
+       @done = buffer.nil?
+       @output.each { |stream| stream.write(buffer) } unless @done
+     rescue IOError
+       @done = true
+     end
+   end
+ end
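`Demux` copies everything from one readable IO into any number of writable ones, one `tick` at a time, and `Command#wait_until_done` ticks several of them until all report `done?`. The sketch below drives that loop with two `IO.pipe`s standing in for a child's stdout and stderr. `TeeSketch` is a hypothetical class with the same `tick`/`done?` interface, not the gem's `Demux`, and it assumes Ruby 2.3+ for `read_nonblock(..., exception: false)`.

```ruby
require 'stringio'

# Hypothetical stand-in with the same tick/done? interface as Chutzen::Demux.
class TeeSketch
  BUFFER_SIZE = 1024

  def initialize(input, *outputs, select_timeout: nil)
    @input = input
    @outputs = outputs
    @select_timeout = select_timeout
    @done = false
  end

  def done?
    @done
  end

  # Copies at most BUFFER_SIZE bytes per call; end-of-file marks the tee as done.
  def tick
    return unless IO.select([@input], nil, nil, @select_timeout)

    chunk = @input.read_nonblock(BUFFER_SIZE, exception: false)
    return if chunk == :wait_readable

    if chunk.nil?
      @done = true
    else
      @outputs.each { |stream| stream.write(chunk) }
    end
  end
end

# Two pipes stand in for a child's stdout and stderr; closing the writers signals EOF.
out_r, out_w = IO.pipe
err_r, err_w = IO.pipe
out_w.write("hello\n" * 500)
err_w.write("warning\n")
out_w.close
err_w.close

captured_out = StringIO.new
mirror_out = StringIO.new
captured_err = StringIO.new
operations = [
  TeeSketch.new(out_r, captured_out, mirror_out, select_timeout: 0.1),
  TeeSketch.new(err_r, captured_err, select_timeout: 0.1)
]
# Same loop shape as Command#wait_until_done.
operations.each(&:tick) until operations.all?(&:done?)
puts "stdout: #{captured_out.string.bytesize} bytes, stderr: #{captured_err.string.inspect}"
```

The select timeout matters because every operation is ticked in turn: an idle stream only delays the loop by at most `select_timeout` seconds before the next stream (or the watcher) gets its turn.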