RubyGems - chutzen - Versions diffs - 0.8.0 - Mend

chutzen 0.8.0

Files changed (25)

checksums.yaml +7 -0
data/README.md +244 -0
data/lib/chutzen/apply.rb +49 -0
data/lib/chutzen/command/execution_failed.rb +42 -0
data/lib/chutzen/command.rb +216 -0
data/lib/chutzen/demux.rb +35 -0
data/lib/chutzen/dictionary.rb +135 -0
data/lib/chutzen/expression/lookup_error.rb +30 -0
data/lib/chutzen/expression/syntax_error.rb +28 -0
data/lib/chutzen/expression.rb +74 -0
data/lib/chutzen/expression_parser.rb +47 -0
data/lib/chutzen/job.rb +80 -0
data/lib/chutzen/notification.rb +32 -0
data/lib/chutzen/runtime_error.rb +21 -0
data/lib/chutzen/shell.rb +27 -0
data/lib/chutzen/signal.rb +49 -0
data/lib/chutzen/standard_error.rb +10 -0
data/lib/chutzen/template/syntax_error.rb +28 -0
data/lib/chutzen/template.rb +39 -0
data/lib/chutzen/template_parser.rb +17 -0
data/lib/chutzen/version.rb +5 -0
data/lib/chutzen/watcher.rb +111 -0
data/lib/chutzen/worker.rb +16 -0
data/lib/chutzen.rb +87 -0
metadata +136 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: e078b40512684bbdd04c341199d45ed740749194fa8a1154cc3760afdd9cefeb
+  data.tar.gz: 2eca64fa5545fc9c7722ab06fd5456458b31e7347248bb67f30237616c1c24e7
+SHA512:
+  metadata.gz: 2f53fd4ea333c556f30143fc580b8b5c0fb9d2450bf093c29e9c60f97147836c280b6b999ed33c13ff1c807004bb7b132f994fea0881d4ee4f3730300a974cd8
+  data.tar.gz: 943189ce50907777c0d663e4c755277dabe35fbbeadcded292c53dd657b0a766f55aff5d88dc3cee6b47e94bf932694b0849f88fac1fa02c43e6f905c95548c5

data/README.md ADDED

@@ -0,0 +1,244 @@
+# Chutzen
+Chutzen is a little toolkit to help with implementing batch processing. It uses Sidekiq as a job queue for incoming jobs and as a notification queue for outgoing notifications.
+## Enqueueing a job
+Let's look at an example app called Squirrel that wants to collect GIFs from the internet.
+```ruby
+Chutzen.perform_async(
+  'notify' => { 'class' => 'Squirrel::Result' },
+  'result' => { 'id' => 42 },
+  'commands' => [
+    {
+      'execute' => [
+        'curl',
+        {
+          '-o' => 'food.gif'
+        },
+        'https://example.com/food.gif'
+      ],
+      'skip_when' => '12 > 24',
+      'fail_when' => { 'read_timeout' => 2.5 },
+      'result' => {
+        'food' => 'found'
+      }
+    }
+  ]
+)
+```
+### Dynamic expressions
+Some properties in the job description accept dynamic expressions. Expressions are an onion of expression, template, and expressions.
+Let's start with a complex example and then peel the onion.
+    ${video.width} < ${minimal_width}
+Any expression is first evaluated as a *template*. All values between the `${}` notation are evaluated as a *query* and then replaced in the string. The resulting string is again evaluated as a *query* and results in a value. So it's queries, inside a template, inside a query.
+Let's take the following job *dictionary* as an example:
+    { 'minimal_width' => 512, 'video' => { 'width' => 1080 } }
+In step one, we replace all the variable expressions in the template with their *query* result.
+    video.width => 1080
+    minimal_width => 512
+So the resulting string is
+    1080 < 512
+In certain properties (e.g. `skip_when`) the resulting string is evaluated as a *query*.
+    1080 < 512 => false
+The documentation below will tell you when a property is treated as a *template* to produce a string and when a property is treated as a *query* to return a (boolean) value.
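The two evaluation stages described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Chutzen's parser: `render_template` and `evaluate_comparison` are hypothetical helpers, and the second stage only understands integer `<` and `>` comparisons.

```ruby
# Hypothetical helpers; not Chutzen's actual Template or Expression classes.

# Stage one: replace every ${path} with the value found at that dotted path.
def render_template(template, dictionary)
  template.gsub(/\$\{([^}]+)\}/) do
    path = Regexp.last_match(1).split('.')
    dictionary.dig(*path).to_s
  end
end

# Stage two: evaluate a tiny "left < right" or "left > right" integer comparison.
def evaluate_comparison(expression)
  left, operator, right = expression.split
  case operator
  when '<' then Integer(left) < Integer(right)
  when '>' then Integer(left) > Integer(right)
  else raise ArgumentError, "unsupported operator: #{operator.inspect}"
  end
end

dictionary = { 'minimal_width' => 512, 'video' => { 'width' => 1080 } }
rendered = render_template('${video.width} < ${minimal_width}', dictionary)
skip = evaluate_comparison(rendered)

puts rendered # prints "1080 < 512"
puts skip     # prints "false"
```

Keeping the stages separate mirrors the onion: the template stage only produces a string, and the query stage only sees that string.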
+### Variable expressions
+A variable expression starts with `${` and ends with `}`. The expression can be anything valid in the Chutzen query language (see below).
+A literal expression starts with `@` and is replaced by the entire unparsed output from a command.
+```
+{
+  'duration' => 'The duration is: ${ffprobe.format.duration}',
+  #=> {"duration":"The duration is: 55433.234"}
+  'ffprobe_output' => '@ffprobe'
+  #=> {"ffprobe_output":{"streams":[…],"format":…}}
+}
+```
+### The Chutzen query language
+The primary purpose of the query language is to select values from nested hashes and arrays.
+The concept of the language is that you dig into a mixed structure of hashes and arrays. Numeric values reference the n-th item in an array. Strings are used as the key in a hash. The query is separated by dots.
+For example, the following Chutzen expression is the same as the Ruby expression below it:
+    streams.0.height
+    dictionary['streams'][0]['height']
+If you don't want to reference the entire path you can prefix an expression with `//`; this will perform a depth-first search for the first occurrence of that key in the structure. Deep references with an array index generally don't make a lot of sense.
+For example, given the following structure.
+    {
+      'streams' => [
+        { 'height' => 12, 'width' => 12, 'codec' => 'H.264' },
+        { 'height' => 42 }
+      ]
+    }
+These are valid queries with their result.
+    streams.1.height => 42
+    streams.0.codec => 'H.264'
+    streams.1.codec => nil
+    //height => 12
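The lookups listed above can be reproduced with a small sketch. The helpers `dig_path`, `deep_find`, and `query` are hypothetical names used for illustration; the real lookup lives in Chutzen's dictionary and may behave differently in edge cases.

```ruby
# Hypothetical helpers; not Chutzen's actual Dictionary implementation.

# Follows a dotted path; all-digit segments index into arrays.
def dig_path(structure, query)
  query.split('.').reduce(structure) do |current, segment|
    case current
    when Hash then current[segment]
    when Array then segment.match?(/\A\d+\z/) ? current[segment.to_i] : nil
    end
  end
end

# Depth-first search for the first occurrence of a key.
def deep_find(structure, key)
  case structure
  when Hash
    return structure[key] if structure.key?(key)

    structure.each_value do |value|
      found = deep_find(value, key)
      return found unless found.nil?
    end
  when Array
    structure.each do |value|
      found = deep_find(value, key)
      return found unless found.nil?
    end
  end
  nil
end

# Dispatches on the `//` prefix described above.
def query(structure, expression)
  if expression.start_with?('//')
    deep_find(structure, expression.delete_prefix('//'))
  else
    dig_path(structure, expression)
  end
end

data = {
  'streams' => [
    { 'height' => 12, 'width' => 12, 'codec' => 'H.264' },
    { 'height' => 42 }
  ]
}

puts query(data, 'streams.1.height')        # prints "42"
puts query(data, 'streams.1.codec').inspect # prints "nil"
puts query(data, '//height')                # prints "12"
```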
+Identifiers reference a value in the structure. You can also compare expressions with the `=`, `&`, `|`, `<`, and `>` operators.
+    12 = 12 => true
+    streams.1.height = 65 => false
+    //height = 12 => true
+    streams.0.codec = 'MOV' => false
+    streams.0.height > 12 => false
+    true & false => false
+    true | false => true
+    streams.1.codec = undefined => true
+### Job description
+* notify: A hash of options used to enqueue a Sidekiq worker to report back after a command was performed. This section must include at least a *class*, but may also include all other valid Sidekiq job settings like *queue*. Note that the *args* setting will be overwritten with the payload.
+* result: A hash that is sent as a payload with each notification. You can use this to send a reference to the job so you can always associate results and errors with the job. Interpreted as *template*.
+* commands: A list of command hashes which must be performed to complete the job.
+### Command description
+* execute: A string or list which describes a command to be executed, see below for details. Interpreted as *template*.
+* skip_when: Describes when to skip the command. Interpreted as *boolean query*.
+* perform_when: Describes when to perform the command. Inverse of `skip_when` for convenience. Interpreted as *boolean query*.
+* fail_when: Describes when to kill the command (i.e. in case of a timeout).
+* merge: Hash to instruct that an output stream has to be stored in the global job dictionary, see below for details.
+* result: Hash with values to use as the payload for the notify worker. Values may include variables. Note that the payload will be encoded as JSON to increase interoperability and allow for easier backwards compatibility in case of notification payload changes. Properties are interpreted as *template*.
+* remember: Merge keys into the global job dictionary. Properties are interpreted as *template*.
+* optional: When set to true the command will not throw an exception in case of a non-zero exit code of the executed command and it will continue processing the next command.
+### Execute description
+When *execute* contains a string it will be passed to be executed verbatim. An Array will be joined and Hash values in the Array will be joined to generate valid switch values. Arrays and Hashes are supported to make it easier to write code that merges repeating elements in a job description.
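One plausible reading of the joining rules above, as a sketch. `join_command` is a hypothetical helper; the real logic lives in `Chutzen::Shell`, which is not shown here and may quote or order switches differently.

```ruby
# Hypothetical helper; an assumption about how joining could work, not Chutzen::Shell.
def join_command(execute)
  case execute
  when String then execute
  when Array then execute.map { |part| join_command(part) }.join(' ')
  when Hash then execute.map { |switch, value| "#{switch} #{value}" }.join(' ')
  else execute.to_s
  end
end

command = join_command(
  ['curl', { '-o' => 'food.gif' }, 'https://example.com/food.gif']
)
puts command # prints "curl -o food.gif https://example.com/food.gif"
```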
+### Fail when description
+The *fail_when* section can contain any of the following options:
+* read_timeout: Check if the stdout or stderr grows and kill the command if this doesn't happen for the specified number of seconds.
+### Merge description
+Currently this only supports one of two streams: *stdout* or *stderr*. The value of either section **must** be *json*, *string*, or *integer*. This setting causes the command to save the output of the indicated stream and parse it as the indicated format. The parsed value will be added to the global job dictionary which can be referenced in queries in subsequent commands.
+You can use the *as* section to set the name for the dictionary key in case the command name is not useful.
+```ruby
+{ 'merge' => { 'stdout' => 'integer', 'as' => 'user_count' } }
+```
+### Remember description
+Merges additional data into the global job dictionary. This is useful when you want to prevent repeating complex expressions across jobs or perform nested boolean expressions.
+```ruby
+{
+  'merge' => { 'stdout' => 'string', 'as' => 'result' },
+  'remember' => {
+    'missing' => "result = 'missing'",
+    'blank' => "result = 'blank'",
+    'do_nothing' => 'missing | blank'
+  }
+},
+{
+  'skip_when' => 'do_nothing'
+}
+```
+Using `remember` you can store an undefined value. A Chutzen expression uses the special identifier `undefined` to express this value. This works similarly to `true` and `false`.
+```ruby
+{
+  'remember' => {
+    'count' => 'undefined',
+    'work' => 'undefined = count'
+  }
+}
+```
+## Receiving notifications
+The arguments passed to *notify* will be used to schedule a Sidekiq job back on the queue so your application can process the result.
+```ruby
+module Squirrel
+  class Result
+    include Sidekiq::Worker
+    def perform(payload)
+      args = JSON.parse(payload)
+      # … do something with the args
+    end
+  end
+end
+```
+## Safely stopping Chutzen
+Chutzen is basically a Sidekiq process with some extra trimming. In many cases it will be started in a way similar to this:
+    sidekiq --queue chutzen --require ./lib/chutzen.rb
+Sidekiq works optimally with small fire-and-forget tasks written in Ruby. Chutzen is more about batch processing and generally starts additional processes in the same process group.
+For example:
+       3630 ?        Ssl  290:59 sidekiq 6.1.2 chutzen [2 of 2 busy]
+    1346100 ?        S      0:00  \_ sh -c ffmpeg
+    1346101 ?        RLl   64:19  |   \_ ffmpeg
+You can tell Sidekiq to stop accepting new jobs by sending it the `TSTP` signal. The intended result is that it finishes the current job without starting new jobs so you can eventually restart Sidekiq safely. In our example that would be:
+    kill -TSTP 3630
+The downside of this approach is that it also stops the ffmpeg process because all processes in the process group receive the same signal. So everything stops and nothing ever finishes.
+The solution is to send a Chutzen signal.
+We tell Sidekiq to listen to an additional ‘control’ queue, which is basically just a regular queue used to send signal jobs to a specific host.
+    sidekiq --queue chutzen --queue chutzen-13-production \
+      --require ./lib/chutzen.rb
+Now we can schedule the `Chutzen::Signal` worker through the `emit_signal` convenience method.
+    Chutzen.emit_signal('stop', queue: 'chutzen-13-production')
+Downside of using a job is that you can enqueue multiple signals which may keep stopping Chutzen.
+    20.times { Chutzen.emit_signal('stop', queue: 'chutzen-13') }
+That is why a `Chutzen::Signal` worker will, by default, only process within 30 seconds after it was enqueued. The expiration time can be expressed in a number of ways if you don't like the default.
+    Chutzen.emit_signal('stop', queue: 'c', expires_in: 10) # seconds
+    Chutzen.emit_signal('stop', queue: 'c', expires_at: Time.now + 3600)
+    Chutzen::Signal.
+      set(queue: 'chutzen-5-test').
+      perform_async('stop', (Time.now + 10).to_i)
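The expiry rule can be illustrated with a small sketch. `expiration_timestamp` and `expired?` are hypothetical helpers and `DEFAULT_EXPIRES_IN` is an assumed constant; `Chutzen::Signal` itself is not shown here and may compute this differently.

```ruby
# Hypothetical sketch of the expiry rule; not the Chutzen::Signal implementation.
DEFAULT_EXPIRES_IN = 30 # seconds, matching the default described above

# Turns either option into an integer Unix timestamp.
def expiration_timestamp(now: Time.now, expires_in: nil, expires_at: nil)
  return expires_at.to_i if expires_at

  now.to_i + (expires_in || DEFAULT_EXPIRES_IN)
end

# A signal picked up after its timestamp would be ignored.
def expired?(timestamp, now: Time.now)
  now.to_i > timestamp
end

enqueued_at = Time.at(1_000)
stamp = expiration_timestamp(now: enqueued_at)

puts stamp                                 # prints "1030"
puts expired?(stamp, now: enqueued_at + 5)  # prints "false"
puts expired?(stamp, now: enqueued_at + 31) # prints "true"
```

Passing the timestamp as a plain integer matches the `perform_async('stop', (Time.now + 10).to_i)` form shown above.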
+## New Relic integrations
+Chutzen can send custom metrics and exceptions to New Relic. The New Relic gem is ‘dynamically’ loaded so you have to make sure it's installed for this to work. You also have to provide a valid `config/newrelic.yml` file relative to the working directory where Chutzen is started.

data/lib/chutzen/apply.rb ADDED

@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+module Chutzen
+  # Applies a data hash with values of Chutzen expressions to a destination Dictionary using a
+  # deep merge.
+  class Apply
+    def initialize(target, lookup)
+      @target = target
+      @lookup = lookup
+    end
+    def merge(source, path = [])
+      case source
+      when Hash
+        merge_hash(source, path)
+      when Array
+        merge_array(source, path)
+      else
+        merge_scalar(source, path)
+      end
+    end
+    private
+    def merge_hash(source, path)
+      source.each do |key, value|
+        merge(value, path + [key])
+      end
+    end
+    def merge_array(source, path)
+      source.each_with_index do |value, index|
+        merge(value, path + [index])
+      end
+    end
+    def merge_scalar(expression, path)
+      expression = Chutzen::Template.new(expression, @lookup).result if expression.is_a?(String)
+      merge_expression(expression, path)
+    end
+    def merge_expression(expression, path)
+      expression = Chutzen::Expression.new(expression, @lookup).result if expression.is_a?(String)
+      @target.bury!(path, expression)
+    end
+  end
+end
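The traversal in `Apply#merge` walks nested hashes and arrays and hands every scalar to the target together with its path. A minimal stand-in that only collects those paths might look like this; `each_leaf` is a hypothetical name, and it yields the pairs instead of evaluating expressions or calling `bury!`.

```ruby
# Hypothetical stand-in for the traversal in Apply#merge; it yields
# [path, value] pairs instead of calling Dictionary#bury!.
def each_leaf(source, path = [], &block)
  case source
  when Hash
    source.each { |key, value| each_leaf(value, path + [key], &block) }
  when Array
    source.each_with_index { |value, index| each_leaf(value, path + [index], &block) }
  else
    yield path, source
  end
end

leaves = []
each_leaf({ 'video' => { 'sizes' => [1080, 720] }, 'name' => 'food' }) do |path, value|
  leaves << [path, value]
end

leaves.each { |path, value| puts "#{path.inspect} => #{value.inspect}" }
```

Hash keys and array indexes both end up as path segments, which is why the target needs a path-aware writer rather than a plain `Hash#merge`.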

data/lib/chutzen/command/execution_failed.rb ADDED

@@ -0,0 +1,42 @@
+# frozen_string_literal: true
+module Chutzen
+  class Command
+    # Raised when execution of a command fails.
+    class ExecutionFailed < Chutzen::StandardError
+      # Instance of Command that triggered the error.
+      attr_reader :command
+      def initialize(message, command: nil)
+        super(message)
+        @command = command
+      end
+      def as_json
+        { 'error' => details }
+      end
+      private
+      def details
+        {
+          'message' => message,
+          'command' => @command&.to_s,
+          'stdout_tail' => last(@command&.stdout),
+          'stderr_tail' => last(@command&.stderr),
+          'exit_status' => @command&.exit_status&.exitstatus
+        }.compact
+      end
+      def last(io)
+        return unless io
+        io.rewind
+        content = io.read.strip
+        return if content.empty?
+        content[-1024..] || content
+      end
+    end
+  end
+end
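The tail logic in `last` relies on `content[-1024..]` returning `nil` when the content is shorter than 1024 characters, hence the `|| content` fallback. A standalone copy of that logic shows the three cases; the method name `tail` and the `limit` parameter are introduced here for illustration only.

```ruby
require 'stringio'

# Standalone copy of the tail logic; `tail` and `limit` are illustrative names.
def tail(io, limit = 1024)
  return unless io

  io.rewind
  content = io.read.strip
  return if content.empty?

  # A negative start beyond the string length yields nil, so fall back to everything.
  content[-limit..] || content
end

short = tail(StringIO.new("  boom\n"))
long = tail(StringIO.new('x' * 2000))
blank = tail(StringIO.new('   '))

puts short         # prints "boom"
puts long.length   # prints "1024"
puts blank.inspect # prints "nil"
```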

data/lib/chutzen/command.rb ADDED

@@ -0,0 +1,216 @@
+# frozen_string_literal: true
+require 'open3'
+require 'stringio'
+module Chutzen
+  # Holds a description for a command and executes it.
+  class Command
+    ALLOW_EVAL = /\A[\d<>=\s]+\z/.freeze
+    autoload :ExecutionFailed, 'chutzen/command/execution_failed'
+    # Returns a StringIO object with all data the command wrote to stdout.
+    attr_reader :stdout
+    # Returns a StringIO object with all data the command wrote to stderr.
+    attr_reader :stderr
+    # Returns a Process::Status object after the command has run.
+    attr_reader :exit_status
+    # Creates a new command with a description hash, dictionary, and the
+    # current work path.
+    #
+    #   Command.new(
+    #     { 'execute' => 'ls -al' },
+    #     dictionary: dictionary,
+    #     work_path: Dir.pwd
+    #   )
+    def initialize(description, dictionary:, work_path:)
+      defaults
+      description.each do |name, value|
+        instance_variable_set("@#{name}", value)
+      end
+      @dictionary = dictionary
+      @work_path = work_path
+      @stdout = StringIO.new
+      @stderr = StringIO.new
+    end
+    # Return true when the command should be performed but does not need to exit
+    # successfully.
+    def optional?
+      @optional
+    end
+    # Joins all the arguments from the execute sections to build a command
+    # that can be executed by a shell.
+    def to_s
+      Template.new(Shell.join(@execute), @dictionary).result
+    end
+    # Attempts to return the name of the binary that is being executed.
+    def name
+      to_s.split(' ')[0].split('/').last
+    end
+    # Runs the command based on its description. Returns the command itself for
+    # convenience, or nil when the command is skipped.
+    def perform
+      return if skip?
+      result = @execute ? perform_command : self
+      merge_remember_into_dictionary
+      result
+    end
+    def perform_command
+      trace_execution do
+        # We use the TSTP signal to instruct Sidekiq to stop reading from the queue and the TERM
+        # signal to kill and re-queue running jobs. Unfortunately the command opened by popen would
+        # also receive the signal and stop or terminate the command. When stopped the command would
+        # never complete and when killed it would probably fail the job.
+        #
+        # Chutzen can't trap the signals because it runs in the same process. We don't want to wrap
+        # the command in a runner because that's even more inconvenient.
+        #
+        # As a tradeoff we start the command in its own process group. Upside is that it ignores the
+        # signals of its parent process. Downside is that it may orphan the process when Sidekiq
+        # stops.
+        Open3.popen3(to_s, chdir: @work_path, pgroup: true) do |stdin, stdout, stderr, thread|
+          stdin.close_write
+          monitor(stdout, stderr, thread)
+          @exit_status = thread.value
+        end
+        verify_exit_status
+        merge_output_into_dictionary
+        self
+      end
+    end
+    # Returns the result expressed by the result section but with its variables
+    # instantiated.
+    def result
+      return nil unless @result
+      result = Dictionary.new
+      Apply.new(result, @dictionary).merge(@result)
+      result.to_hash
+    end
+    # Returns the value expressed by the skip_when section but with its
+    # variables instantiated.
+    def skip_when
+      return nil unless @skip_when
+      Expression.new(Template.new(@skip_when, @dictionary).result, @dictionary)
+    end
+    # Returns the value expressed by the perform_when section but with its
+    # variables instantiated.
+    def perform_when
+      return nil unless @perform_when
+      Expression.new(Template.new(@perform_when, @dictionary).result, @dictionary)
+    end
+    private
+    def defaults
+      @execute = nil
+      @optional = false
+      @merge = nil
+      @result = nil
+      @remember = nil
+      @fail_when = nil
+      @skip_when = nil
+      @perform_when = nil
+    end
+    def skip?
+      if skip_when
+        skip_when.result
+      elsif perform_when
+        !perform_when.result
+      else
+        false
+      end
+    end
+    def monitor(stdout, stderr, thread)
+      watcher = Watcher.new(fail_when: @fail_when, files: [@stdout, @stderr])
+      wait_until_done(
+        Demux.new(
+          stdout, $stdout, @stdout, select_timeout: watcher.select_timeout
+        ),
+        Demux.new(
+          stderr, $stderr, @stderr, select_timeout: watcher.select_timeout
+        ),
+        watcher
+      )
+    rescue Chutzen::Watcher::Error => e
+      Process.kill('KILL', thread.pid)
+      raise ExecutionFailed.new(e.message, command: self) unless optional?
+    end
+    def wait_until_done(*operations)
+      operations.each(&:tick) until operations.all?(&:done?)
+    end
+    # Raises an exception when the job was required to succeed but didn't.
+    def verify_exit_status
+      return if optional?
+      return if exit_status.success?
+      raise ExecutionFailed.new('Failed to execute command.', command: self)
+    end
+    # Adds the result hash from the command into the dictionary hash.
+    def merge_output_into_dictionary
+      return unless @merge
+      streams = @merge.dup
+      as = streams.delete('as') || name
+      streams.each do |stream, format|
+        @dictionary.merge!(as => cast_data(format, send(stream).string.strip))
+      end
+    end
+    # Evaluates expressions and adds them to the dictionary hash.
+    def merge_remember_into_dictionary
+      return unless @remember
+      Apply.new(@dictionary, @dictionary).merge(@remember)
+    end
+    def cast_data(format, data)
+      case format
+      when 'json'
+        JSON.parse(data)
+      when 'integer'
+        data.to_i
+      when 'string'
+        data
+      else
+        raise(
+          ArgumentError,
+          "Only `string', `json', and `integer' are currently supported for merging."
+        )
+      end
+    end
+    def trace_execution(&block)
+      if defined?(::NewRelic)
+        NewRelic::Agent::MethodTracer.trace_execution_scoped(
+          transaction_scope,
+          &block
+        )
+      else
+        yield
+      end
+    end
+    def transaction_scope
+      %W[Custom/Command/#{name}]
+    end
+  end
+end
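The `pgroup: true` option used in `perform_command` can be tried in isolation. This sketch assumes a Unix-like system with `sh`, `sleep`, and `echo` available; the command string and variable names are illustrative and not part of Chutzen.

```ruby
require 'open3'

output = nil
status = nil
child_pgid = nil

# pgroup: true makes the child the leader of a new process group, so signals
# sent to the parent's process group are not delivered to it.
Open3.popen3('sleep 0.2; echo hello', pgroup: true) do |stdin, stdout, _stderr, thread|
  stdin.close
  child_pgid = Process.getpgid(thread.pid) # read while the child is still running
  output = stdout.read
  status = thread.value
end

puts output                            # prints "hello"
puts status.success?                   # prints "true"
puts child_pgid != Process.getpgrp     # true when the child has its own group
```

The trade-off named in the comments above still applies: a child in its own group keeps running if the parent goes away.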

data/lib/chutzen/demux.rb ADDED

@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+module Chutzen
+  # Reads from one stream and writes to a set of other streams.
+  #
+  #   demux = Demux.new(stdin, file, file, socket, stdout)
+  #   until demux.done?
+  #     demux.tick
+  #   end
+  class Demux
+    BUFFER_SIZE = 1024
+    def initialize(input, *output, select_timeout: nil)
+      @input = input
+      @output = output
+      @select_timeout = select_timeout
+      @done = false
+    end
+    def done?
+      @done
+    end
+    def tick
+      readable = IO.select([@input], nil, nil, @select_timeout)
+      return unless readable
+      buffer = readable.first.first.read_nonblock(BUFFER_SIZE)
+      @done = buffer.nil?
+      @output.each { |stream| stream.write(buffer) }
+    rescue IOError
+      @done = true
+    end
+  end
+end
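The read loop in `Demux#tick` can be exercised without a child process by using a pipe. This is a sketch with illustrative names and a deliberately tiny buffer; it relies on `read_nonblock` raising `EOFError` (a subclass of `IOError`) at end of file, which is what the `rescue IOError` above catches.

```ruby
require 'stringio'

reader, writer = IO.pipe
writer.write('hello world')
writer.close

outputs = [StringIO.new, StringIO.new]
done = false

until done
  # Wait up to 0.1 seconds for data, like the select_timeout in Demux.
  next unless IO.select([reader], nil, nil, 0.1)

  begin
    buffer = reader.read_nonblock(4) # small buffer to force several reads
    outputs.each { |stream| stream.write(buffer) }
  rescue EOFError
    done = true
  end
end
reader.close

puts outputs.map(&:string).inspect # prints ["hello world", "hello world"]
```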