opener-daemons 1.3.0 → 2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: db9deedcb0722c93dce8dbf63951426670ccfc31
4
- data.tar.gz: 0c915961e981220c36f5668d89420d12eea6a615
3
+ metadata.gz: 84d367a150e9b90e3cef7c94b1d90db4791309f7
4
+ data.tar.gz: 4af1be3dd9992df37774dc7ee6f417c3343d044c
5
5
  SHA512:
6
- metadata.gz: bd96be27c2fd2c3a312afdb304b903262bbfee59d3a97a46c492aae903d744928e132afbfd91cca2069d2cf68fd3ea587537269a0268ad520146c17f59e4924c
7
- data.tar.gz: e460109a333e013dfbfe54228348f030475914406d8cc7faed4af9e1196cf36f04d8b07b55df2902161c6e6a82a18e957dd94a7fe0caf8e0fe70cebf47422392
6
+ metadata.gz: 7a45c4af290e89c800ee4a702f62f4ffb159cdc86952da55f93769747335cd44351f61d46b4dac558a573e16b1b33b38b208d0ceb3a0fbbc8454fe05cbd37f9e
7
+ data.tar.gz: 4cb8f013c3349958b89cfdbe2287ec4b275976e5c729df15fcd8302ab63a338ae095fcabc87734c38d4fb36160d2d5cfe7ff57707a6b8488ecac1abc27711cb6
data/README.md CHANGED
@@ -1,127 +1,169 @@
1
- # Opener::Daemons
1
+ # OpeNER Daemons
2
2
 
3
- This GEM is part of the OpeNER project, which is the NLP toolchain for the rest
4
- of us. This particular GEM makes is possible that al OpeNER components can
5
- actually be launched as deamons reading from and push to Amazon SQS queues.
3
+ This Gem makes it possible for OpeNER components to be used as a daemon using
4
+ Amazon SQS and Amazon S3. SQS is used for job input while S3 is used for storing
5
+ results. Daemons only take URLs as input, they don't allow text to be specified
6
+ directly due to size restrictions of SQS (a maximum of 256 KB).
6
7
 
7
- ## Installation
8
+ ## Usage
8
9
 
9
- Add this line to your application's Gemfile:
10
+ Create an executable file `bin/<component>-daemon`, for example
11
+ `bin/language-identifier-daemon`, with the following content:
10
12
 
11
- gem 'opener-daemons'
13
+ #!/usr/bin/env ruby
14
+ require 'opener/daemons'
12
15
 
13
- And then execute:
16
+ controller = Opener::Daemons::Controller.new(
17
+ :name => 'opener-<component>',
18
+ :exec_path => File.expand_path('../../exec/<component>.rb', __FILE__)
19
+ )
14
20
 
15
- $ bundle
21
+ controller.run
16
22
 
17
- Or install it yourself as:
23
+ Replace `<component>` with the name of the component. For example, for the
24
+ language identifier this would result in the following:
18
25
 
19
- $ gem install opener-daemons
26
+ #!/usr/bin/env ruby
27
+ require 'opener/daemons'
20
28
 
29
+ controller = Opener::Daemons::Controller.new(
30
+ :name => 'opener-language-identifier',
31
+ :exec_path => File.expand_path('../../exec/language-identifier.rb', __FILE__)
32
+ )
21
33
 
22
- ## SQS
34
+ controller.run
23
35
 
24
- The Opener-daemon GEM uses Amazon SQS service as a message service. In order for
25
- this to work properly you need an Amazon AWS account and you need to set the
26
- following 3 environment variables in your shell:
36
+ Next, create an executable file `exec/<component>.rb`, for example
37
+ `exec/language-identifier.rb`, with the following content:
27
38
 
28
- ```
29
- export AWS_ACCESS_KEY_ID='...'
30
- export AWS_SECRET_ACCESS_KEY='...'
31
- export AWS_REGION='...' #e.g. eu-west-1
32
- ```
39
+ #!/usr/bin/env ruby
40
+ require 'opener/daemons'
33
41
 
34
- To see how you specify which queues to use, checkout the usage section below.
42
+ require_relative '../lib/opener/<component>'
35
43
 
36
- ## Implementation
44
+ daemon = Opener::Daemons::Daemon.new(Opener::<constant>)
37
45
 
38
- This Gem is intended for use with other OpeNER components. In order to turn a
39
- component in to a Daemon you have to do the following:
46
+ daemon.start
40
47
 
41
- Add the opener-daemons gem to the gemspec of your component (or the the Gemfile
42
- if your component is not a gem).
48
+ Replace `<component>` with the component name, replace `<constant>` with the
49
+ corresponding constant. For example, for the language identifier:
43
50
 
44
- ```
45
- gem.add_dependency 'opener-daemons'
46
- ```
47
51
 
48
- Create a file in the bin/ directory of the component. Following the OpeNER
49
- naming conventions that will be something like this (e.g. the
50
- language-identifier). This file provides you with the option to launch a daemon
51
- from the command line.
52
+ #!/usr/bin/env ruby
53
+ require 'opener/daemons'
52
54
 
53
- touch bin/language-identifier-daemon
55
+ require_relative '../lib/opener/language_identifier'
54
56
 
55
- Then add the following lines and replace the language-identifier with your own
56
- component:
57
+ daemon = Opener::Daemons::Daemon.new(Opener::LanguageIdentifier)
57
58
 
58
- ```ruby
59
- #!/usr/bin/env ruby
60
- require 'rubygems'
61
- require 'opener/daemons'
59
+ daemon.start
62
60
 
63
- exec_path = File.expand_path("exec/language-identifier.rb")
64
- Opener::Daemons::Controller.new(:name=>"language-identifier",
65
- :exec_path=>exec_path)
66
- ```
61
+ If the component takes extra arguments, such as a resource path, these should be
62
+ set in the `initialize` method of the component using the actual environment
63
+ variables.
67
64
 
68
- After that you have to create a file that does the actual work in an "exec"
69
- directory. From the root of your component do this:
65
+ ## Requirements
70
66
 
71
- ```
72
- mkdir exec
73
- touch exec/language-identifier.rb
74
- ```
67
+ * A supported Ruby version (see below)
68
+ * Amazon SQS
69
+ * Amazon S3
75
70
 
76
- Then copy paste the following code into that file, replacing the
77
- "language-identifier" parts with your own component.
71
+ The following Ruby versions are supported:
78
72
 
79
- ```ruby
80
- require 'opener/daemons'
81
- require 'opener/language_identifier'
73
+ | Ruby | Required | Recommended |
74
+ |:---------|:--------------|:------------|
75
+ | MRI | >= 1.9.3 | >= 2.1.4 |
76
+ | Rubinius | >= 2.2 | >= 2.3.0 |
77
+ | JRuby | >= 1.7 | >= 1.7.16 |
82
78
 
83
- options = Opener::OptParser.parse!(ARGV)
84
- daemon = Opener::Daemon.new(Opener::LanguageIdentifier, options)
85
- daemon.start
86
- ```
79
+ ## Installation
87
80
 
88
- Now you should be able to launch yourself a LanguageIdentifier daemon. Check out
89
- the exact usage of the daemon by typing:
81
+ Install it from RubyGems:
90
82
 
91
- ```
92
- bin/language-identifier-daemon -h
93
- ```
83
+ gem install opener-daemons
94
84
 
95
- ## Usage
85
+ Or using Bundler:
86
+
87
+ # add this to your Gemfile
88
+ gem 'opener-daemons'
89
+
90
+ # then run this
91
+ bundle install
92
+
93
+ ## Job Format
94
+
95
+ Jobs should be serialized as JSON and should adhere to the JSON schema
96
+ definition [schema/sqs_input.json](schema/sqs_input.json). In short, a job is a
97
+ JSON object with the following fields:
98
+
99
+ * `input_url`: the input URL
100
+ * `callbacks`: an array of URLs
101
+ * `identifier`: a unique identifier to use for the file stored in S3, if no
102
+ value is given an identifier will be generated automatically
103
+ * `metadata`: an object containing arbitrary metadata, will be passed to every
104
+ callback URL
105
+
106
+ An example:
107
+
108
+ {
109
+ "input_url": "http://example.com/my-kaf.xml",
110
+ "callbacks": ["http://example.com/my-callback"],
111
+ "identifier": "foo123",
112
+ "metadata": {
113
+ "customer_id": 123
114
+ }
115
+ }
116
+
117
+ For more specific details see the schema.
118
+
119
+ ## Output
120
+
121
+ Daemon output is stored in an Amazon S3 bucket, output files are named
122
+ `<identifier>.xml` where `<identifier>` is the unique identifier of the
123
+ document. The content type of these documents is set to `application/xml`.
124
+ Metadata associated with the job (as specified in the `metadata` field) is saved
125
+ as metadata of the S3 object.
126
+
127
+ Callback URLs will receive the URL of an uploaded document, _not_ the actual
128
+ content itself. The S3 URLs are only valid for a limited time (currently 1 hour)
129
+ so callbacks must ensure they can process the input within that time limit.
130
+
131
+ ## Monitoring
132
+
133
+ Components using this Gem can measure performance using New Relic and report
134
+ errors using Rollbar. To support this the following two environment variables
135
+ must be set:
136
+
137
+ * `NEWRELIC_TOKEN`
138
+ * `ROLLBAR_TOKEN`
139
+
140
+ For New Relic the application names will be `opener-<component>` where
141
+ `<component>` is the component name, as defined by a component itself. If one of
142
+ these environment variables is not set the corresponding feature is disabled.
143
+
144
+ ## CLI Options
145
+
146
+ Each daemon takes a set of options that can be used to configure the input
147
+ queue, the S3 bucket and so forth. For an up to date list of these options and
148
+ their descriptions run a daemon using the `--help` option.
149
+
150
+ Some of these options set environment variables that can be used by components,
151
+ these are as following:
152
+
153
+ * `input`: sets the input queue in the `INPUT_QUEUE` variable
154
+ * `threads`: sets the amount of threads to use in the `DAEMON_THREADS` variable
155
+ * `bucket`: sets the S3 bucket to use for output documents in the
156
+ `OUTPUT_BUCKET` variable
157
+
158
+ ## Amazon Environment Variables
159
+
160
+ To properly configure the daemons for Amazon you should set the following
161
+ environment variables:
162
+
163
+ * `AWS_ACCESS_KEY_ID`
164
+ * `AWS_SECRET_ACCESS_KEY`
165
+ * `AWS_REGION`
96
166
 
97
- Once you implemented the daemon you can use the -h option to get usage
98
- information. It will look like this, with the "language-identifier" strings
99
- replaced by your own component.
100
-
101
- ```
102
- Usage: language-identifier.rb <start|stop|restart> [options]
103
-
104
- Specific options:
105
- -i, --input INPUT_QUEUE_NAME Input queue name
106
- -o, --output OUTPUT_QUEUE_NAME Output queue name
107
- -b, --batch-size BATCH_SIZE Request x messages at once where x is between 1 and 10
108
- -w, --workers NUMBER number of worker thread
109
- -r, --readers NUMBER number of reader threads
110
- -p, --writers NUMBER number of writer / pusher threads
111
- --log FILENAME Filename and path of logfile. Defaults to STDOUT
112
- --pid FILENAME Filename and path of pidfile. Defaults to /var/run/{filename}.rb
113
- --pidpath DIRNAME Directory where to put the PID file. Is Overwritten by --pid if that option is present
114
- --debug Turn on debug log level
115
-
116
- Common options:
117
- -h, --help Show this message
118
- ```
119
-
120
-
121
- ## Contributing
122
-
123
- 1. Fork it ( http://github.com/opener-project/opener-daemons/fork )
124
- 2. Create your feature branch (`git checkout -b my-new-feature`)
125
- 3. Commit your changes (`git commit -am 'Add some feature'`)
126
- 4. Push to the branch (`git push origin my-new-feature`)
127
- 5. Create new Pull Request
167
+ If you're running this daemon on an EC2 instance then the first two environment
168
+ variables will be set automatically if the instance has an associated IAM
169
+ profile. The `AWS_REGION` variable must _always_ be set.
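Note on the README's "Job Format" section: the job body can be sketched in Ruby as below. This is only an illustration of the fields the README lists. `build_job` and `SQS_MAX_BYTES` are hypothetical names that are not part of opener-daemons, and `schema/sqs_input.json` remains the authority on which fields are required. The sketch assumes only `identifier` may be left out, since that is the one field the README describes as optional.

```ruby
require 'json'

# Hypothetical helper, not part of opener-daemons. It builds the JSON body of
# a job from the fields listed in the README.
SQS_MAX_BYTES = 256 * 1024 # the SQS size restriction mentioned in the README

def build_job(input_url, callbacks: [], identifier: nil, metadata: {})
  job = {
    'input_url' => input_url,
    'callbacks' => callbacks,
    'metadata'  => metadata
  }

  # The README states an identifier is generated when none is given.
  job['identifier'] = identifier if identifier

  body = JSON.generate(job)

  if body.bytesize > SQS_MAX_BYTES
    raise ArgumentError, "job is #{body.bytesize} bytes, over the SQS limit"
  end

  body
end

body = build_job(
  'http://example.com/my-kaf.xml',
  callbacks:  ['http://example.com/my-callback'],
  identifier: 'foo123',
  metadata:   { 'customer_id' => 123 }
)

puts body
```

Sending the resulting string to the input queue is left out on purpose, because the diff does not show which SQS client calls the project uses.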
@@ -0,0 +1,52 @@
1
+ module Opener
2
+ module Daemons
3
+ ##
4
+ # Configuration object for storing details about a single job.
5
+ #
6
+ # @!attribute [r] component
7
+ # @return [Class]
8
+ #
9
+ # @!attribute [r] input_url
10
+ # @return [String]
11
+ #
12
+ # @!attribute [r] callbacks
13
+ # @return [Array]
14
+ #
15
+ # @!attribute [r] metadata
16
+ # @return [Hash]
17
+ #
18
+ class Configuration
19
+ attr_reader :component, :input_url, :callbacks, :metadata
20
+
21
+ ##
22
+ # @param [Class] component
23
+ # @param [Hash] options
24
+ #
25
+ # @option options [String] :input_url
26
+ # @option options [String] :identifier
27
+ # @option options [Array] :callbacks
28
+ # @option options [Hash] :metadata
29
+ #
30
+ def initialize(component, options = {})
31
+ @component = component
32
+
33
+ options.each do |key, value|
34
+ instance_variable_set("@#{key}", value) if respond_to?(key)
35
+ end
36
+
37
+ @callbacks ||= []
38
+ @metadata ||= {}
39
+ end
40
+
41
+ ##
42
+ # Returns the identifier of the document. If no identifier was given a
43
+ # unique one will be generated instead.
44
+ #
45
+ # @return [String]
46
+ #
47
+ def identifier
48
+ return @identifier ||= SecureRandom.hex
49
+ end
50
+ end # Configuration
51
+ end # Daemons
52
+ end # Opener
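Note on the new `Configuration` class: the sketch below reproduces it in condensed form, outside the `Opener::Daemons` namespace, so that its behaviour can be tried in isolation. The explicit `require 'securerandom'` is an addition for the standalone example; the hunk above does not show where the gem loads it. `String` stands in for a real component class, and `:bogus` is a made-up key.

```ruby
require 'securerandom'

# Condensed stand-in for the Configuration class in the hunk above,
# reproduced here only so the example runs on its own.
class Configuration
  attr_reader :component, :input_url, :callbacks, :metadata

  def initialize(component, options = {})
    @component = component

    # A key is stored only if the object responds to a method of that name,
    # so unknown keys are silently ignored.
    options.each do |key, value|
      instance_variable_set("@#{key}", value) if respond_to?(key)
    end

    @callbacks ||= []
    @metadata  ||= {}
  end

  # Generated lazily and memoized, so repeated calls return the same value.
  def identifier
    @identifier ||= SecureRandom.hex
  end
end

config = Configuration.new(
  String, # placeholder for a component class
  :input_url => 'http://example.com/my-kaf.xml',
  :bogus     => 1
)

puts config.input_url
puts config.callbacks.inspect
puts config.identifier
```

One consequence of the `respond_to?` check is that any option key matching a public method name is accepted, including `:component` itself, which would overwrite the first argument.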
@@ -1,166 +1,160 @@
1
- #
2
- # Original Idea by Charles Nutter
3
- # Copied from: https://gist.github.com/ik5/448884
4
- # Then adjusted.
5
- #
6
-
7
- require 'rubygems'
8
- require 'opener/daemons'
9
- require 'spoon'
10
-
11
1
  module Opener
12
2
  module Daemons
3
+ ##
4
+ # CLI controller for a component.
5
+ #
6
+ # @!attribute [r] name
7
+ # The name of the daemon.
8
+ # @return [String]
9
+ #
10
+ # @!attribute [r] exec_path
11
+ # The path to the script to daemonize.
12
+ # @return [String]
13
+ #
13
14
  class Controller
14
15
  attr_reader :name, :exec_path
15
16
 
16
- def initialize(options={})
17
+ ##
18
+ # @param [Hash] options
19
+ #
20
+ # @option options [String] :name
21
+ # @option options [String] :exec_path
22
+ #
23
+ def initialize(options = {})
24
+ @name = options.fetch(:name)
17
25
  @exec_path = options.fetch(:exec_path)
18
- @name = determine_name(options[:name])
19
- read_commandline
20
26
  end
21
27
 
22
- def determine_name(name)
23
- return identify(name) unless name.nil?
24
- get_name_from_exec_path
25
- end
28
+ ##
29
+ # Runs the CLI
30
+ #
31
+ # @param [Array] argv CLI arguments to parse.
32
+ #
33
+ def run(argv = ARGV)
34
+ slop = configure_slop
26
35
 
27
- def get_name_from_exec_path
28
- File.basename(exec_path, ".rb")
36
+ slop.parse(argv)
29
37
  end
30
38
 
31
- def read_commandline
32
- if ARGV[0] == 'start'
33
- start
34
- elsif ARGV[0] == 'stop'
35
- stop
36
- elsif ARGV[0] == 'restart'
37
- stop
38
- start
39
- else
40
- start_foreground
39
+ ##
40
+ # @return [Slop]
41
+ #
42
+ def configure_slop
43
+ parser = OptionParser.new(name)
44
+
45
+ parser.parser.run do |opts, args|
46
+ command = args.shift
47
+ new_args = args.reject { |arg| arg == '--' }
48
+
49
+ case command
50
+ when 'start'
51
+ start_background(opts, new_args)
52
+ when 'stop'
53
+ stop(opts)
54
+ when 'restart'
55
+ stop(opts)
56
+ start_background(opts, new_args)
57
+ else
58
+ start_foreground(opts, new_args)
59
+ end
41
60
  end
42
- end
43
61
 
44
- def options
45
- return @options if @options
46
- args = ARGV.dup
47
- @options = Opener::Daemons::OptParser.pre_parse!(args)
62
+ return parser
48
63
  end
49
64
 
50
-
51
- def pid_path
52
- return @pid_path unless @pid_path.nil?
53
- @pid_path = if options[:pid]
54
- File.expand_path(@options[:pid])
55
- elsif options[:pidpath]
56
- File.expand_path(File.join(@options[:pidpath], "#{name}.pid"))
57
- else
58
- "/var/run/#{File.basename($0, ".rb")}.pid"
59
- end
65
+ ##
66
+ # Runs the daemon in the foreground.
67
+ #
68
+ # @param [Slop] options
69
+ # @param [Array] argv
70
+ #
71
+ def start_foreground(options, argv = [])
72
+ exec(setup_env(options), exec_path, *argv)
60
73
  end
61
74
 
62
- def create_pid(pid)
63
- begin
64
- open(pid_path, 'w') do |f|
65
- f.puts pid
66
- end
67
- rescue => e
68
- STDERR.puts "Error: Unable to open #{pid_path} for writing:\n\t" +
69
- "(#{e.class}) #{e.message}"
70
- exit!
71
- end
72
- end
75
+ ##
76
+ # Starts the daemon in the background.
77
+ #
78
+ # @param [Slop] options
79
+ # @param [Array] argv
80
+ #
81
+ def start_background(options, argv = [])
82
+ pidfile = Pidfile.new(options[:pidfile])
83
+ pid = Process.spawn(
84
+ setup_env(options),
85
+ exec_path,
86
+ *argv,
87
+ :out => :close,
88
+ :err => :close,
89
+ :in => :close
90
+ )
91
+
92
+ pidfile.write(pid)
73
93
 
74
- def get_pid
75
- pid = false
76
94
  begin
77
- open(pid_path, 'r') do |f|
78
- pid = f.readline
79
- pid = pid.to_s.gsub(/[^0-9]/,'')
80
- end
81
- rescue => e
82
- STDOUT.puts "Info: Unable to open #{pid_path} for reading while checking for existing PID:\n\t" +
83
- "(#{e.class}) #{e.message}"
84
- end
95
+ # Wait until the process has _actually_ started.
96
+ Timeout.timeout(options[:wait]) { sleep(0.5) until pidfile.alive? }
85
97
 
98
+ puts "Process with Pidfile #{pidfile.read} started"
99
+ rescue Timeout::Error
100
+ pidfile.unlink
86
101
 
87
- if pid
88
- return pid.to_i
89
- else
90
- return nil
102
+ abort "Failed to start the process after #{options[:wait]} seconds"
91
103
  end
92
104
  end
93
105
 
94
- def remove_pidfile
95
- begin
96
- File.unlink(pid_path)
97
- rescue => e
98
- STDERR.puts "ERROR: Unable to unlink #{path}:\n\t" +
99
- "(#{e.class}) #{e.message}"
100
- exit
101
- end
102
- end
106
+ ##
107
+ # Stops the daemon.
108
+ #
109
+ # @param [Slop] options
110
+ #
111
+ def stop(options)
112
+ pidfile = Pidfile.new(options[:pidfile])
103
113
 
104
- def process_exists?
105
- begin
106
- pid = get_pid
107
- return false unless pid
108
- Process.kill(0, pid)
109
- true
110
- rescue Errno::ESRCH, TypeError # "PID is NOT running or is zombied
111
- false
112
- rescue Errno::EPERM
113
- STDERR.puts "No permission to query #{pid}!";
114
- false
115
- rescue => e
116
- STDERR.puts "Error: Unable to determine status for pid: #{pid}.\n\t" +
117
- "(#{e.class}) #{e.message}"
118
- false
119
- end
120
- end
114
+ if pidfile.alive?
115
+ id = pidfile.read
121
116
 
122
- def stop
123
- begin
124
- pid = get_pid
125
- STDOUT.puts "Stopping pid: #{pid}"
126
- while true do
127
- Process.kill("TERM", pid)
128
- Process.wait(pid)
129
- sleep(0.1)
130
- end
131
- rescue Errno::ESRCH # no more process to kill
132
- remove_pidfile
133
- STDOUT.puts 'Stopped the process'
134
- rescue => e
135
- STDERR.puts "Unable to terminate process: (#{e.class}) #{e.message}"
117
+ pidfile.terminate
118
+ pidfile.unlink
119
+
120
+ puts "Process with Pidfile #{id.inspect} terminated"
121
+ else
122
+ abort 'Process already terminated or you are not allowed to terminate it'
136
123
  end
137
124
  end
138
125
 
139
- def start
140
- if process_exists?
141
- STDERR.puts "Error: The process #{name} already running. Restarting the process"
142
- stop
126
+ ##
127
+ # Returns a Hash containing the various environment variables to set for
128
+ # the daemon (on top of the current environment variables).
129
+ #
130
+ # @param [Slop] options
131
+ # @return [Hash]
132
+ #
133
+ def setup_env(options)
134
+ newrelic_config = File.expand_path(
135
+ '../../../../config/newrelic.yml',
136
+ __FILE__
137
+ )
138
+
139
+ env = ENV.to_hash.merge(
140
+ 'INPUT_QUEUE' => options[:input].to_s,
141
+ 'DAEMON_THREADS' => options[:threads].to_s,
142
+ 'OUTPUT_BUCKET' => options[:bucket].to_s,
143
+ 'NRCONFIG' => newrelic_config,
144
+ 'APP_ROOT' => File.expand_path('../../../../', __FILE__),
145
+ 'APP_NAME' => name
146
+ )
147
+
148
+ if !env['RAILS_ENV'] and env['RACK_ENV']
149
+ env['RAILS_ENV'] = env['RACK_ENV']
143
150
  end
144
151
 
145
- STDOUT.puts "Starting DAEMON"
146
- pid = Spoon.spawnp exec_path, *ARGV
147
- STDOUT.puts "Started DAEMON"
148
- create_pid(pid)
149
- begin
150
- Process.setsid
151
- rescue Errno::EPERM => e
152
- STDERR.puts "Process.setsid not permitted on this platform, not critical. Continuing normal operations.\n\t (#{e.class}) #{e.message}"
152
+ unless options[:'disable-syslog']
153
+ env['ENABLE_SYSLOG'] = 'true'
153
154
  end
154
- File::umask(0)
155
- end
156
-
157
- def start_foreground
158
- exec [exec_path, ARGV].flatten.join(" ")
159
- end
160
155
 
161
- def identify(string)
162
- string.gsub(/\W/,"-").gsub("--","-")
156
+ return env
163
157
  end
164
- end
165
- end
166
- end
158
+ end # Controller
159
+ end # Daemons
160
+ end # Opener
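Note on `Controller#setup_env`: the environment handling can be approximated by the standalone function below. It is a sketch, not the gem's code. `build_env` is a hypothetical name, a plain Hash stands in for the Slop options object, `base_env` replaces `ENV` so the example has no side effects, and the `NRCONFIG` and `APP_ROOT` entries are left out because they depend on the gem's file layout.

```ruby
# Stand-alone approximation of Controller#setup_env (hypothetical helper).
def build_env(name, options, base_env = ENV.to_hash)
  # Values are converted with to_s because the environment Hash passed to
  # Kernel#exec and Process.spawn must map Strings to Strings (or nil).
  env = base_env.merge(
    'INPUT_QUEUE'    => options[:input].to_s,
    'DAEMON_THREADS' => options[:threads].to_s,
    'OUTPUT_BUCKET'  => options[:bucket].to_s,
    'APP_NAME'       => name
  )

  # RAILS_ENV falls back to RACK_ENV when only the latter is set.
  env['RAILS_ENV'] = env['RACK_ENV'] if !env['RAILS_ENV'] && env['RACK_ENV']

  # Syslog is enabled unless explicitly disabled.
  env['ENABLE_SYSLOG'] = 'true' unless options[:'disable-syslog']

  env
end

env = build_env(
  'opener-language-identifier',
  { :input => 'my-queue', :threads => 2, :bucket => 'my-bucket' },
  { 'RACK_ENV' => 'production' }
)

puts env.inspect
```

The queue and bucket names here are placeholders. In the real controller the resulting Hash is passed as the first argument to `exec` or `Process.spawn`, which applies it on top of the child's environment.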