opener-daemons 1.3.0 → 2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: db9deedcb0722c93dce8dbf63951426670ccfc31
- data.tar.gz: 0c915961e981220c36f5668d89420d12eea6a615
+ metadata.gz: 84d367a150e9b90e3cef7c94b1d90db4791309f7
+ data.tar.gz: 4af1be3dd9992df37774dc7ee6f417c3343d044c
  SHA512:
- metadata.gz: bd96be27c2fd2c3a312afdb304b903262bbfee59d3a97a46c492aae903d744928e132afbfd91cca2069d2cf68fd3ea587537269a0268ad520146c17f59e4924c
- data.tar.gz: e460109a333e013dfbfe54228348f030475914406d8cc7faed4af9e1196cf36f04d8b07b55df2902161c6e6a82a18e957dd94a7fe0caf8e0fe70cebf47422392
+ metadata.gz: 7a45c4af290e89c800ee4a702f62f4ffb159cdc86952da55f93769747335cd44351f61d46b4dac558a573e16b1b33b38b208d0ceb3a0fbbc8454fe05cbd37f9e
+ data.tar.gz: 4cb8f013c3349958b89cfdbe2287ec4b275976e5c729df15fcd8302ab63a338ae095fcabc87734c38d4fb36160d2d5cfe7ff57707a6b8488ecac1abc27711cb6
data/README.md CHANGED
@@ -1,127 +1,169 @@
- # Opener::Daemons
+ # OpeNER Daemons
 
- This GEM is part of the OpeNER project, which is the NLP toolchain for the rest
- of us. This particular GEM makes is possible that al OpeNER components can
- actually be launched as deamons reading from and push to Amazon SQS queues.
+ This Gem makes it possible for OpeNER components to be used as a daemon using
+ Amazon SQS and Amazon S3. SQS is used for job input while S3 is used for storing
+ results. Daemons only take URLs as input, they don't allow text to be specified
+ directly due to size restrictions of SQS (a maximum of 256 KB).
 
- ## Installation
+ ## Usage
 
- Add this line to your application's Gemfile:
+ Create an executable file `bin/<component>-daemon`, for example
+ `bin/language-identifier-daemon`, with the following content:
 
-     gem 'opener-daemons'
+     #!/usr/bin/env ruby
+     require 'opener/daemons'
 
- And then execute:
+     controller = Opener::Daemons::Controller.new(
+       :name => 'opener-<component>',
+       :exec_path => File.expand_path('../../exec/<component>.rb', __FILE__)
+     )
 
-     $ bundle
+     controller.run
 
- Or install it yourself as:
+ Replace `<component>` with the name of the component. For example, for the
+ language identifier this would result in the following:
 
-     $ gem install opener-daemons
+     #!/usr/bin/env ruby
+     require 'opener/daemons'
 
+     controller = Opener::Daemons::Controller.new(
+       :name => 'opener-language-identifier',
+       :exec_path => File.expand_path('../../exec/language-identifier.rb', __FILE__)
+     )
 
- ## SQS
+     controller.run
 
- The Opener-daemon GEM uses Amazon SQS service as a message service. In order for
- this to work properly you need an Amazon AWS account and you need to set the
- following 3 environment variables in your shell:
+ Next, create an executable file `exec/<component>.rb`, for example
+ `exec/language-identifier.rb`, with the following content:
 
- ```
- export AWS_ACCESS_KEY_ID='...'
- export AWS_SECRET_ACCESS_KEY='...'
- export AWS_REGION='...' #e.g. eu-west-1
- ```
+     #!/usr/bin/env ruby
+     require 'opener/daemons'
 
- To see how you specify which queues to use, checkout the usage section below.
+     require_relative '../lib/opener/<component>'
 
- ## Implementation
+     daemon = Opener::Daemons::Daemon.new(Opener::<constant>)
 
- This Gem is intended for use with other OpeNER components. In order to turn a
- component in to a Daemon you have to do the following:
+     daemon.start
 
- Add the opener-daemons gem to the gemspec of your component (or the the Gemfile
- if your component is not a gem).
+ Replace `<component>` with the component name, replace `<constant>` with the
+ corresponding constant. For example, for the language identifier:
 
- ```
- gem.add_dependency 'opener-daemons'
- ```
 
- Create a file in the bin/ directory of the component. Following the OpeNER
- naming conventions that will be something like this (e.g. the
- language-identifier). This file provides you with the option to launch a daemon
- from the command line.
+     #!/usr/bin/env ruby
+     require 'opener/daemons'
 
-     touch bin/language-identifier-daemon
+     require_relative '../lib/opener/language_identifier'
 
- Then add the following lines and replace the language-identifier with your own
- component:
+     daemon = Opener::Daemons::Daemon.new(Opener::LanguageIdentifier)
 
- ```ruby
- #!/usr/bin/env ruby
- require 'rubygems'
- require 'opener/daemons'
+     daemon.start
 
- exec_path = File.expand_path("exec/language-identifier.rb")
- Opener::Daemons::Controller.new(:name=>"language-identifier",
-                                 :exec_path=>exec_path)
- ```
+ If the component takes extra arguments, such as a resource path, these should be
+ set in the `initialize` method of the component using the actual environment
+ variables.
 
- After that you have to create a file that does the actual work in an "exec"
- directory. From the root of your component do this:
+ ## Requirements
 
- ```
- mkdir exec
- touch exec/language-identifier.rb
- ```
+ * A supported Ruby version (see below)
+ * Amazon SQS
+ * Amazon S3
 
- Then copy paste the following code into that file, replacing the
- "language-identifier" parts with your own component.
+ The following Ruby versions are supported:
 
- ```ruby
- require 'opener/daemons'
- require 'opener/language_identifier'
+ | Ruby     | Required | Recommended |
+ |:---------|:---------|:------------|
+ | MRI      | >= 1.9.3 | >= 2.1.4    |
+ | Rubinius | >= 2.2   | >= 2.3.0    |
+ | JRuby    | >= 1.7   | >= 1.7.16   |
 
- options = Opener::OptParser.parse!(ARGV)
- daemon = Opener::Daemon.new(Opener::LanguageIdentifier, options)
- daemon.start
- ```
+ ## Installation
 
- Now you should be able to launch yourself a LanguageIdentifier daemon. Check out
- the exact usage of the daemon by typing:
+ Install it from RubyGems:
 
- ```
- bin/language-identifier-daemon -h
- ```
+     gem install opener-daemons
 
- ## Usage
+ Or using Bundler:
+
+     # add this to your Gemfile
+     gem 'opener-daemons'
+
+     # then run this
+     bundle install
+
+ ## Job Format
+
+ Jobs should be serialized as JSON and should adhere to the JSON schema
+ definition [schema/sqs_input.json](schema/sqs_input.json). In short, a job is a
+ JSON object with the following fields:
+
+ * `input_url`: the input URL
+ * `callbacks`: an array of URLs
+ * `identifier`: a unique identifier to use for the file stored in S3, if no
+   value is given an identifier will be generated automatically
+ * `metadata`: an object containing arbitrary metadata, will be passed to every
+   callback URL
+
+ An example:
+
+     {
+         "input_url": "http://example.com/my-kaf.xml",
+         "callbacks": ["http://example.com/my-callback"],
+         "identifier": "foo123",
+         "metadata": {
+             "customer_id": 123
+         }
+     }
+
+ For more specific details see the schema.
+
+ ## Output
+
+ Daemon output is stored in an Amazon S3 bucket, output files are named
+ `<identifier>.xml` where `<identifier>` is the unique identifier of the
+ document. The content type of these documents is set to `application/xml`.
+ Metadata associated with the job (as specified in the `metadata` field) is saved
+ as metadata of the S3 object.
+
+ Callback URLs will receive the URL of an uploaded document, _not_ the actual
+ content itself. The S3 URLs are only valid for a limited time (currently 1 hour)
+ so callbacks must ensure they can process the input within that time limit.
+
+ ## Monitoring
+
+ Components using this Gem can measure performance using New Relic and report
+ errors using Rollbar. To support this the following two environment variables
+ must be set:
+
+ * `NEWRELIC_TOKEN`
+ * `ROLLBAR_TOKEN`
+
+ For New Relic the application names will be `opener-<component>` where
+ `<component>` is the component name, as defined by a component itself. If one of
+ these environment variables is not set the corresponding feature is disabled.
+
+ ## CLI Options
+
+ Each daemon takes a set of options that can be used to configure the input
+ queue, the S3 bucket and so forth. For an up to date list of these options and
+ their descriptions run a daemon using the `--help` option.
+
+ Some of these options set environment variables that can be used by components,
+ these are as following:
+
+ * `input`: sets the input queue in the `INPUT_QUEUE` variable
+ * `threads`: sets the amount of threads to use in the `DAEMON_THREADS` variable
+ * `bucket`: sets the S3 bucket to use for output documents in the
+   `OUTPUT_BUCKET` variable
+
+ ## Amazon Environment Variables
+
+ To properly configure the daemons for Amazon you should set the following
+ environment variables:
+
+ * `AWS_ACCESS_KEY_ID`
+ * `AWS_SECRET_ACCESS_KEY`
+ * `AWS_REGION`
 
- Once you implemented the daemon you can use the -h option to get usage
- information. It will look like this, with the "language-identifier" strings
- replaced by your own component.
-
- ```
- Usage: language-identifier.rb <start|stop|restart> [options]
-
- Specific options:
-     -i, --input INPUT_QUEUE_NAME Input queue name
-     -o, --output OUTPUT_QUEUE_NAME Output queue name
-     -b, --batch-size BATCH_SIZE Request x messages at once where x is between 1 and 10
-     -w, --workers NUMBER number of worker thread
-     -r, --readers NUMBER number of reader threads
-     -p, --writers NUMBER number of writer / pusher threads
-     --log FILENAME Filename and path of logfile. Defaults to STDOUT
-     --pid FILENAME Filename and path of pidfile. Defaults to /var/run/{filename}.rb
-     --pidpath DIRNAME Directory where to put the PID file. Is Overwritten by --pid if that option is present
-     --debug Turn on debug log level
-
- Common options:
-     -h, --help Show this message
- ```
-
-
- ## Contributing
-
- 1. Fork it ( http://github.com/opener-project/opener-daemons/fork )
- 2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Add some feature'`)
- 4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ If you're running this daemon on an EC2 instance then the first two environment
+ variables will be set automatically if the instance has an associated IAM
+ profile. The `AWS_REGION` variable must _always_ be set.
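As an illustration of the job format described in the new README, the following Ruby sketch builds a job body and checks it against the 256 KB SQS limit the README mentions. It is a minimal sketch, not part of the gem: the helper name `build_job` and the constant `SQS_MAX_BYTES` are hypothetical, and `schema/sqs_input.json` remains the authoritative definition of the format.

```ruby
require 'json'

# Hypothetical constant: the SQS size limit quoted in the README (256 KB).
SQS_MAX_BYTES = 256 * 1024

# Hypothetical helper: serializes a job using the four fields listed in the
# README's "Job Format" section. The identifier is optional; when it is
# omitted the README says the daemon generates one itself.
def build_job(input_url, callbacks: [], identifier: nil, metadata: {})
  job = {
    'input_url' => input_url,
    'callbacks' => callbacks,
    'metadata'  => metadata
  }

  job['identifier'] = identifier if identifier

  body = JSON.generate(job)

  # Reject bodies that would not fit in a single SQS message.
  if body.bytesize > SQS_MAX_BYTES
    raise ArgumentError, "job is #{body.bytesize} bytes, limit is #{SQS_MAX_BYTES}"
  end

  body
end

body = build_job(
  'http://example.com/my-kaf.xml',
  callbacks:  ['http://example.com/my-callback'],
  identifier: 'foo123',
  metadata:   { 'customer_id' => 123 }
)

puts body
```

Sending the resulting string to the input queue is left out on purpose, since the diff does not show which SQS client calls the gem relies on.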
@@ -0,0 +1,52 @@
+ module Opener
+   module Daemons
+     ##
+     # Configuration object for storing details about a single job.
+     #
+     # @!attribute [r] component
+     #  @return [Class]
+     #
+     # @!attribute [r] input_url
+     #  @return [String]
+     #
+     # @!attribute [r] callbacks
+     #  @return [Array]
+     #
+     # @!attribute [r] metadata
+     #  @return [Hash]
+     #
+     class Configuration
+       attr_reader :component, :input_url, :callbacks, :metadata
+
+       ##
+       # @param [Class] component
+       # @param [Hash] options
+       #
+       # @option options [String] :input_url
+       # @option options [String] :identifier
+       # @option options [Array] :callbacks
+       # @option options [Hash] :metadata
+       #
+       def initialize(component, options = {})
+         @component = component
+
+         options.each do |key, value|
+           instance_variable_set("@#{key}", value) if respond_to?(key)
+         end
+
+         @callbacks ||= []
+         @metadata ||= {}
+       end
+
+       ##
+       # Returns the identifier of the document. If no identifier was given a
+       # unique one will be generated instead.
+       #
+       # @return [String]
+       #
+       def identifier
+         return @identifier ||= SecureRandom.hex
+       end
+     end # Configuration
+   end # Daemons
+ end # Opener
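The behaviour of the new `Configuration` class can be tried in isolation with the sketch below. The class body is copied from the added file so the example runs on its own. Two things are assumptions: the explicit `require 'securerandom'` (the diff does not show where the gem loads it) and the `FakeComponent` class, which is a hypothetical stand-in for a real component such as `Opener::LanguageIdentifier`.

```ruby
# Assumption: the gem loads SecureRandom elsewhere; required here so the
# sketch is self-contained.
require 'securerandom'

module Opener
  module Daemons
    # Copy of the class added in this release, without the YARD comments.
    class Configuration
      attr_reader :component, :input_url, :callbacks, :metadata

      def initialize(component, options = {})
        @component = component

        # Only keys that match a public method are stored; others are skipped.
        options.each do |key, value|
          instance_variable_set("@#{key}", value) if respond_to?(key)
        end

        @callbacks ||= []
        @metadata  ||= {}
      end

      def identifier
        return @identifier ||= SecureRandom.hex
      end
    end
  end
end

# Hypothetical stand-in for a real component class.
class FakeComponent; end

config = Opener::Daemons::Configuration.new(
  FakeComponent,
  :input_url => 'http://example.com/my-kaf.xml',
  :unknown   => 'ignored'
)

puts config.input_url
puts config.callbacks.inspect
puts config.identifier
puts config.instance_variable_defined?(:@unknown)
```

Because `identifier` memoizes its value, repeated calls on the same object return the same generated string, which is what lets it serve as a stable name for the S3 object.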
@@ -1,166 +1,160 @@
- #
- # Original Idea by Charles Nutter
- # Copied from: https://gist.github.com/ik5/448884
- # Then adjusted.
- #
-
- require 'rubygems'
- require 'opener/daemons'
- require 'spoon'
-
  module Opener
    module Daemons
+     ##
+     # CLI controller for a component.
+     #
+     # @!attribute [r] name
+     #  The name of the daemon.
+     #  @return [String]
+     #
+     # @!attribute [r] exec_path
+     #  The path to the script to daemonize.
+     #  @return [String]
+     #
      class Controller
        attr_reader :name, :exec_path
 
-       def initialize(options={})
+       ##
+       # @param [Hash] options
+       #
+       # @option options [String] :name
+       # @option options [String] :exec_path
+       #
+       def initialize(options = {})
+         @name = options.fetch(:name)
          @exec_path = options.fetch(:exec_path)
-         @name = determine_name(options[:name])
-         read_commandline
        end
 
-       def determine_name(name)
-         return identify(name) unless name.nil?
-         get_name_from_exec_path
-       end
+       ##
+       # Runs the CLI
+       #
+       # @param [Array] argv CLI arguments to parse.
+       #
+       def run(argv = ARGV)
+         slop = configure_slop
 
-       def get_name_from_exec_path
-         File.basename(exec_path, ".rb")
+         slop.parse(argv)
        end
 
-       def read_commandline
-         if ARGV[0] == 'start'
-           start
-         elsif ARGV[0] == 'stop'
-           stop
-         elsif ARGV[0] == 'restart'
-           stop
-           start
-         else
-           start_foreground
+       ##
+       # @return [Slop]
+       #
+       def configure_slop
+         parser = OptionParser.new(name)
+
+         parser.parser.run do |opts, args|
+           command = args.shift
+           new_args = args.reject { |arg| arg == '--' }
+
+           case command
+           when 'start'
+             start_background(opts, new_args)
+           when 'stop'
+             stop(opts)
+           when 'restart'
+             stop(opts)
+             start_background(opts, new_args)
+           else
+             start_foreground(opts, new_args)
+           end
          end
-       end
 
-       def options
-         return @options if @options
-         args = ARGV.dup
-         @options = Opener::Daemons::OptParser.pre_parse!(args)
+         return parser
        end
 
-
-       def pid_path
-         return @pid_path unless @pid_path.nil?
-         @pid_path = if options[:pid]
-           File.expand_path(@options[:pid])
-         elsif options[:pidpath]
-           File.expand_path(File.join(@options[:pidpath], "#{name}.pid"))
-         else
-           "/var/run/#{File.basename($0, ".rb")}.pid"
-         end
+       ##
+       # Runs the daemon in the foreground.
+       #
+       # @param [Slop] options
+       # @param [Array] argv
+       #
+       def start_foreground(options, argv = [])
+         exec(setup_env(options), exec_path, *argv)
        end
 
-       def create_pid(pid)
-         begin
-           open(pid_path, 'w') do |f|
-             f.puts pid
-           end
-         rescue => e
-           STDERR.puts "Error: Unable to open #{pid_path} for writing:\n\t" +
-             "(#{e.class}) #{e.message}"
-           exit!
-         end
-       end
+       ##
+       # Starts the daemon in the background.
+       #
+       # @param [Slop] options
+       # @param [Array] argv
+       #
+       def start_background(options, argv = [])
+         pidfile = Pidfile.new(options[:pidfile])
+         pid = Process.spawn(
+           setup_env(options),
+           exec_path,
+           *argv,
+           :out => :close,
+           :err => :close,
+           :in => :close
+         )
+
+         pidfile.write(pid)
 
-       def get_pid
-         pid = false
          begin
-           open(pid_path, 'r') do |f|
-             pid = f.readline
-             pid = pid.to_s.gsub(/[^0-9]/,'')
-           end
-         rescue => e
-           STDOUT.puts "Info: Unable to open #{pid_path} for reading while checking for existing PID:\n\t" +
-             "(#{e.class}) #{e.message}"
-         end
+           # Wait until the process has _actually_ started.
+           Timeout.timeout(options[:wait]) { sleep(0.5) until pidfile.alive? }
 
+           puts "Process with Pidfile #{pidfile.read} started"
+         rescue Timeout::Error
+           pidfile.unlink
 
-         if pid
-           return pid.to_i
-         else
-           return nil
+           abort "Failed to start the process after #{options[:wait]} seconds"
          end
        end
 
-       def remove_pidfile
-         begin
-           File.unlink(pid_path)
-         rescue => e
-           STDERR.puts "ERROR: Unable to unlink #{path}:\n\t" +
-             "(#{e.class}) #{e.message}"
-           exit
-         end
-       end
+       ##
+       # Stops the daemon.
+       #
+       # @param [Slop] options
+       #
+       def stop(options)
+         pidfile = Pidfile.new(options[:pidfile])
 
-       def process_exists?
-         begin
-           pid = get_pid
-           return false unless pid
-           Process.kill(0, pid)
-           true
-         rescue Errno::ESRCH, TypeError # "PID is NOT running or is zombied
-           false
-         rescue Errno::EPERM
-           STDERR.puts "No permission to query #{pid}!";
-           false
-         rescue => e
-           STDERR.puts "Error: Unable to determine status for pid: #{pid}.\n\t" +
-             "(#{e.class}) #{e.message}"
-           false
-         end
-       end
+         if pidfile.alive?
+           id = pidfile.read
 
-       def stop
-         begin
-           pid = get_pid
-           STDOUT.puts "Stopping pid: #{pid}"
-           while true do
-             Process.kill("TERM", pid)
-             Process.wait(pid)
-             sleep(0.1)
-           end
-         rescue Errno::ESRCH # no more process to kill
-           remove_pidfile
-           STDOUT.puts 'Stopped the process'
-         rescue => e
-           STDERR.puts "Unable to terminate process: (#{e.class}) #{e.message}"
+           pidfile.terminate
+           pidfile.unlink
+
+           puts "Process with Pidfile #{id.inspect} terminated"
+         else
+           abort 'Process already terminated or you are not allowed to terminate it'
          end
        end
 
-       def start
-         if process_exists?
-           STDERR.puts "Error: The process #{name} already running. Restarting the process"
-           stop
+       ##
+       # Returns a Hash containing the various environment variables to set for
+       # the daemon (on top of the current environment variables).
+       #
+       # @param [Slop] options
+       # @return [Hash]
+       #
+       def setup_env(options)
+         newrelic_config = File.expand_path(
+           '../../../../config/newrelic.yml',
+           __FILE__
+         )
+
+         env = ENV.to_hash.merge(
+           'INPUT_QUEUE' => options[:input].to_s,
+           'DAEMON_THREADS' => options[:threads].to_s,
+           'OUTPUT_BUCKET' => options[:bucket].to_s,
+           'NRCONFIG' => newrelic_config,
+           'APP_ROOT' => File.expand_path('../../../../', __FILE__),
+           'APP_NAME' => name
+         )
+
+         if !env['RAILS_ENV'] and env['RACK_ENV']
+           env['RAILS_ENV'] = env['RACK_ENV']
          end
 
-         STDOUT.puts "Starting DAEMON"
-         pid = Spoon.spawnp exec_path, *ARGV
-         STDOUT.puts "Started DAEMON"
-         create_pid(pid)
-         begin
-           Process.setsid
-         rescue Errno::EPERM => e
-           STDERR.puts "Process.setsid not permitted on this platform, not critical. Continuing normal operations.\n\t (#{e.class}) #{e.message}"
+         unless options[:'disable-syslog']
+           env['ENABLE_SYSLOG'] = 'true'
          end
-         File::umask(0)
-       end
-
-       def start_foreground
-         exec [exec_path, ARGV].flatten.join(" ")
-       end
 
-       def identify(string)
-         string.gsub(/\W/,"-").gsub("--","-")
+         return env
        end
-     end
-   end
- end
+     end # Controller
+   end # Daemons
+ end # Opener
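The environment handling in `Controller#setup_env` can be sketched outside the class as follows. This is a simplified illustration rather than the gem's code: the function name `daemon_env` is hypothetical, a plain Hash stands in for the Slop options object, the queue and bucket names are made up, and the path-dependent `NRCONFIG` and `APP_ROOT` entries are left out.

```ruby
# Hypothetical stand-alone version of the merging done in setup_env.
# base_env plays the role of ENV.to_hash; options is a plain Hash here.
def daemon_env(base_env, options, name)
  env = base_env.merge(
    'INPUT_QUEUE'    => options[:input].to_s,
    'DAEMON_THREADS' => options[:threads].to_s,
    'OUTPUT_BUCKET'  => options[:bucket].to_s,
    'APP_NAME'       => name
  )

  # Mirror RACK_ENV into RAILS_ENV when only RACK_ENV is set.
  if !env['RAILS_ENV'] && env['RACK_ENV']
    env['RAILS_ENV'] = env['RACK_ENV']
  end

  # Syslog stays enabled unless the disable-syslog option is given.
  env['ENABLE_SYSLOG'] = 'true' unless options[:'disable-syslog']

  env
end

env = daemon_env(
  { 'RACK_ENV' => 'production' },
  { :input => 'example-input-queue', :threads => 4, :bucket => 'example-bucket' },
  'opener-language-identifier'
)

puts env.inspect
```

All option values are converted with `to_s` because environment variables passed to `exec` and `Process.spawn` must be strings, which is also why a missing option ends up as an empty string rather than `nil`.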