opener-webservice 2.0.0 → 2.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 02a6433aa01c06b1251256710c706a59e45586b2
4
- data.tar.gz: 6dc2b0270835171445461234f8123fb13e267d36
3
+ metadata.gz: 32302b9a2aac215c69670cd55e4c28d22901df9c
4
+ data.tar.gz: d24fb9f1fc3113ff70685bd1c2ce440eced844c6
5
5
  SHA512:
6
- metadata.gz: 733000a6cd389bcfbdfd0865432cc45c4896d4564a653fa7ad5a14a123d1cf5eb28ab0202e13fa47daa4e5f2b4a64f88f753abf3046d71fc11d7198329468c02
7
- data.tar.gz: cfd032408fe8d59f391577d029eb96880684eb93ff61f4920f6264ac9ac214bed9d869b52b81d612c95ac386f552e43d8c5055e2024aa82e52a4597a778285fc
6
+ metadata.gz: 5c96ce12b9449ee865ee208180fed338a7bdb9077741cecc23a618cd31fe499570a0fa750f905ca1f17ebebb09294a9a431fa83d7c43910a6890a83779c68285
7
+ data.tar.gz: 602204c799cca06cf486f1fb44c1f6d2ea2a8f53417916313a08e7e80a035ef2e6f4c2eb75f784ce0f3ac7389c121851b041ef8b7a628493f2e1aaec2334663a
data/README.md CHANGED
@@ -1,29 +1,147 @@
1
- # Opener::Webservice
1
+ # Opener Webservice
2
2
 
3
- TODO: Write a gem description
3
+ This Gem makes it possible for OpeNER components to be used as a webservice.
4
+ Input can be passed directly or using a URL; the latter allows for greater data
5
+ sizes to be processed. Webservices can be chained together using callback URLs,
6
+ each passing its output to the next callback. Output can either be passed
7
+ directly, or as a URL pointing to a document in Amazon S3.
4
8
 
5
- ## Installation
9
+ ## Usage
6
10
 
7
- Add this line to your application's Gemfile:
11
+ Create an executable file `bin/<component>-server`, for example
12
+ `bin/language-identifier-server`, with the following content:
8
13
 
9
- gem 'opener-webservice'
14
+ #!/usr/bin/env ruby
10
15
 
11
- And then execute:
16
+ require 'opener/webservice'
12
17
 
13
- $ bundle
18
+ parser = Opener::Webservice::OptionParser.new(
19
+ 'opener-<component>',
20
+ File.expand_path('../../config.ru', __FILE__)
21
+ )
14
22
 
15
- Or install it yourself as:
23
+ parser.run
16
24
 
17
- $ gem install opener-webservice
25
+ Replace `<component>` with the name of the component. For example, for the
26
+ language identifier this would result in the following:
18
27
 
19
- ## Usage
28
+ #!/usr/bin/env ruby
29
+
30
+ require 'opener/webservice'
31
+
32
+ parser = Opener::Webservice::OptionParser.new(
33
+ 'opener-language-identifier',
34
+ File.expand_path('../../config.ru', __FILE__)
35
+ )
36
+
37
+ parser.run
38
+
39
+ Next, create a `config.ru` file in the root directory of the component. It
40
+ should have the following content:
41
+
42
+ require File.expand_path('../lib/opener/<component>', __FILE__)
43
+ require File.expand_path('../lib/opener/<component>/server', __FILE__)
44
+
45
+ run Opener::<constant>::Server
46
+
47
+ Replace `<component>` with the component name and replace `<constant>` with the
48
+ corresponding constant. For example, for the language identifier:
49
+
50
+ require File.expand_path('../lib/opener/language_identifier', __FILE__)
51
+ require File.expand_path('../lib/opener/language_identifier/server', __FILE__)
52
+
53
+ run Opener::LanguageIdentifier::Server
54
+
55
+ ## Input
56
+
57
+ To submit data, send a POST request to the root URL of a webservice. The request
58
+ body can either be a set of POST fields, or a JSON object. In both cases the
59
+ following fields can be set:
60
+
61
+ * `input`: direct input to process
62
+ * `input_url`: a URL to a document to download and process
63
+ * `callbacks`: an array of callback URLs to send output to
64
+ * `error_callback`: a URL to send errors to
65
+ * `request_id`: a custom request ID/identifier to associate with the document
66
+ * `metadata`: an arbitrary metadata object to associate with a document, only
67
+ supported when using JSON input as POST fields can't represent key/values.
68
+
69
+ Any other parameters are ignored _but_ passed along to the next callback (if
70
+ any).
71
+
72
+ To use JSON input, set the `Content-Type` header to `application/json` when
73
+ submitting data.
74
+
75
+ If no callback URLs are specified the data is processed synchronously and the
76
+ response will be whatever output the underlying component returned (usually
77
+ KAF).
78
+
79
+ When using a callback URL the response will be a JSON object containing:
80
+
81
+ * `request_id`: the generated (or manually specified) request ID/identifier
82
+ * `output_url`: the URL that will contain the end output after all callbacks
83
+ have been processed
84
+
85
+ If an error occurs the output URL will _not_ contain the document. Instead, a
86
+ POST request is executed using the URL in the `error_callback` field. This URL
87
+ receives the following parameters:
88
+
89
+ * `request_id`: the ID of the request/document that failed
90
+ * `error`: the error message
91
+
92
+ ## Requirements
93
+
94
+ * A supported Ruby version (see below)
95
+ * Amazon S3 (only when one wants to store output in S3)
96
+
97
+ The following Ruby versions are supported:
98
+
99
+ | Ruby | Required | Recommended |
100
+ |:---------|:--------------|:------------|
101
+ | MRI | >= 1.9.3 | >= 2.1.4 |
102
+ | Rubinius | >= 2.2 | >= 2.3.0 |
103
+ | JRuby | >= 1.7 | >= 1.7.16 |
104
+
105
+ Note that various components use JRuby, thus they won't work on MRI and
106
+ Rubinius.
107
+
108
+ ## S3 Support
109
+
110
+ To enable storing of output on Amazon S3, specify the `--bucket` option when
111
+ running the CLI. Also make sure that the following environment variables are
112
+ set:
113
+
114
+ * `AWS_ACCESS_KEY_ID`
115
+ * `AWS_SECRET_ACCESS_KEY`
116
+ * `AWS_REGION`
117
+
118
+ If you're running this daemon on an EC2 instance then the first two environment
119
+ variables will be set automatically if the instance has an associated IAM
120
+ profile. The `AWS_REGION` variable must _always_ be set.
121
+
122
+ Output files are named `<identifier>.xml` where `<identifier>` is the unique
123
+ identifier of the document. The content type of these documents is set to
124
+ `application/xml`. Metadata associated with the job (as specified in the
125
+ `metadata` field) is saved as metadata of the S3 object.
126
+
127
+ The S3 URLs are only valid for a limited time (currently 1 hour) so callbacks
128
+ must ensure they can process the input within that time limit.
129
+
130
+ To use custom identifiers for documents, specify a unique value in the
131
+ `request_id` parameter when submitting data. Existing documents using the same
132
+ identifier will be _overwritten_, so make sure your identifiers are truly
133
+ unique. Default identifiers are generated using Ruby's `SecureRandom.hex`
134
+ method.
135
+
136
+ ## Monitoring
20
137
 
21
- TODO: Write usage instructions here
138
+ Components using this Gem can measure performance using New Relic and report
139
+ errors using Rollbar. To support this the following two environment variables
140
+ must be set:
22
141
 
23
- ## Contributing
142
+ * `NEWRELIC_TOKEN`
143
+ * `ROLLBAR_TOKEN`
24
144
 
25
- 1. Fork it
26
- 2. Create your feature branch (`git checkout -b my-new-feature`)
27
- 3. Commit your changes (`git commit -am 'Add some feature'`)
28
- 4. Push to the branch (`git push origin my-new-feature`)
29
- 5. Create new Pull Request
145
+ For New Relic the application names will be `opener-<component>` where
146
+ `<component>` is the component name, as defined by a component itself. If one of
147
+ these environment variables is not set the corresponding feature is disabled.
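The request format described under "Input" in the README above can be sketched as follows. This is a minimal illustration and not part of the gem: the service address, callback URLs and field values are placeholders, and the request is only built here, not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical address of a running component webservice.
uri = URI.parse('http://localhost:9292/')

# Field names as listed in the README; all values are placeholders.
payload = {
  'input'          => 'This is an English sentence.',
  'callbacks'      => ['http://example.com/next-component'],
  'error_callback' => 'http://example.com/errors',
  'request_id'     => 'doc-123',
  'metadata'       => {'customer' => 'example'}
}

# JSON input requires the Content-Type header to be application/json.
request = Net::HTTP::Post.new(uri.request_uri)
request['Content-Type'] = 'application/json'
request.body = JSON.generate(payload)

# To actually submit the document (requires a running service):
#
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   puts response.body
```

Because `callbacks` is set, the README indicates that the response would be a JSON object with `request_id` and `output_url` rather than the component output itself.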
@@ -0,0 +1,90 @@
1
+ module Opener
2
+ module Webservice
3
+ ##
4
+ # Module for storing global configuration settings such as whether or not to
5
+ # enable authentication.
6
+ #
7
+ module Configuration
8
+ ##
9
+ # Returns `true` if authentication should be enabled.
10
+ #
11
+ # @return [TrueClass|FalseClass]
12
+ #
13
+ def self.authentication?
14
+ return !!authentication_endpoint && !authentication_endpoint.empty?
15
+ end
16
+
17
+ ##
18
+ # Returns the authentication endpoint to use.
19
+ #
20
+ # @return [String]
21
+ #
22
+ def self.authentication_endpoint
23
+ return ENV['AUTHENTICATION_ENDPOINT']
24
+ end
25
+
26
+ ##
27
+ # Returns the field name of the authentication secret.
28
+ #
29
+ # @return [String]
30
+ #
31
+ def self.authentication_secret
32
+ return ENV['AUTHENTICATION_SECRET'] || 'secret'
33
+ end
34
+
35
+ ##
36
+ # Returns the field name of the authentication token.
37
+ #
38
+ # @return [String]
39
+ #
40
+ def self.authentication_token
41
+ return ENV['AUTHENTICATION_TOKEN'] || 'token'
42
+ end
43
+
44
+ ##
45
+ # Name of the S3 bucket to store output in.
46
+ #
47
+ # @return [String]
48
+ #
49
+ def self.output_bucket
50
+ return ENV['OUTPUT_BUCKET']
51
+ end
52
+
53
+ ##
54
+ # Returns `true` if Syslog should be enabled.
55
+ #
56
+ # @return [TrueClass|FalseClass]
57
+ #
58
+ def self.syslog?
59
+ return !!ENV['ENABLE_SYSLOG'] && !ENV['ENABLE_SYSLOG'].empty?
60
+ end
61
+
62
+ ##
63
+ # Returns `true` if Rollbar error tracking should be enabled.
64
+ #
65
+ # @return [TrueClass|FalseClass]
66
+ #
67
+ def self.rollbar?
68
+ return !!ENV['ROLLBAR_TOKEN']
69
+ end
70
+
71
+ ##
72
+ # Configures Rollbar.
73
+ #
74
+ def self.configure_rollbar
75
+ Rollbar.configure do |config|
76
+ config.access_token = ENV['ROLLBAR_TOKEN']
77
+ config.enabled = rollbar?
78
+ config.environment = environment
79
+ end
80
+ end
81
+
82
+ ##
83
+ # @return [String]
84
+ #
85
+ def self.environment
86
+ return ENV['RACK_ENV'] || ENV['RAILS_ENV']
87
+ end
88
+ end # Configuration
89
+ end # Webservice
90
+ end # Opener
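The predicates `authentication?` and `syslog?` in the hunk above treat an unset variable and an empty string the same way, whereas `rollbar?` only checks for presence. A small stand-alone sketch of that difference, using a plain Hash in place of `ENV`; the helper names `non_empty?` and `set?` are made up for illustration and do not exist in the gem:

```ruby
# Mirrors `!!value && !value.empty?` as used by authentication? and syslog?.
def non_empty?(env, key)
  value = env[key]
  !!value && !value.empty?
end

# Mirrors `!!ENV[...]` as used by rollbar?.
def set?(env, key)
  !!env[key]
end

# Stand-in for ENV with two variables exported as empty strings.
env = {'AUTHENTICATION_ENDPOINT' => '', 'ROLLBAR_TOKEN' => ''}

auth_enabled    = non_empty?(env, 'AUTHENTICATION_ENDPOINT') # false: empty counts as unset
rollbar_enabled = set?(env, 'ROLLBAR_TOKEN')                 # true: an empty String is truthy
syslog_enabled  = non_empty?(env, 'ENABLE_SYSLOG')           # false: key is missing
```

Read this way, an empty `ROLLBAR_TOKEN` would still make `rollbar?` return `true`, so leaving the variable unset entirely appears to be the safer way to disable the feature.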
@@ -0,0 +1,29 @@
1
+ module Opener
2
+ module Webservice
3
+ ##
4
+ # Class for handling error messages that occur when processing a document.
5
+ #
6
+ # @!attribute [r] http
7
+ # @return [HTTPClient]
8
+ #
9
+ class ErrorHandler
10
+ attr_reader :http
11
+
12
+ def initialize
13
+ @http = HTTPClient.new
14
+ end
15
+
16
+ ##
17
+ # @param [StandardError] error
18
+ # @param [String] request_id
19
+ # @param [String] url
20
+ #
21
+ def submit(error, request_id, url)
22
+ http.post(
23
+ url,
24
+ :body => {:error => error.message, :request_id => request_id}
25
+ )
26
+ end
27
+ end # ErrorHandler
28
+ end # Webservice
29
+ end # Opener
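The shape of the request made by `ErrorHandler#submit` above can be inspected without a network by swapping in a client that records calls. In this sketch `RecordingClient` and `error_payload` are hypothetical helpers written for the example; only the body layout (`:error` and `:request_id`) is taken from the diff, and the URL is a placeholder.

```ruby
# Hypothetical stand-in for HTTPClient that records calls instead of sending them.
class RecordingClient
  attr_reader :requests

  def initialize
    @requests = []
  end

  def post(url, options = {})
    @requests << [url, options]
  end
end

# Builds the same body Hash that ErrorHandler#submit passes to http.post.
def error_payload(error, request_id)
  {:error => error.message, :request_id => request_id}
end

client = RecordingClient.new
error  = RuntimeError.new('Failed to download input')

client.post('http://example.com/errors', :body => error_payload(error, 'doc-123'))

url, options = client.requests.first
```

The receiving side of an `error_callback` therefore only needs to read two fields, `error` and `request_id`, which matches the parameters listed in the README.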
@@ -0,0 +1,43 @@
1
+ module Opener
2
+ module Webservice
3
+ ##
4
+ # Extracts the KAF/text input to use from a set of input parameters.
5
+ #
6
+ # @!attribute [r] http
7
+ # @return [HTTPClient]
8
+ #
9
+ class InputExtractor
10
+ attr_reader :http
11
+
12
+ def initialize
13
+ @http = HTTPClient.new
14
+ end
15
+
16
+ ##
17
+ # @param [Hash] options
18
+ #
19
+ # @option options [String] input_url A URL to download input from.
20
+ # @option options [String] input The direct input to process.
21
+ #
22
+ # @return [String]
23
+ #
24
+ # @raise [RuntimeError] Raised when the input could not be downloaded.
25
+ #
26
+ def extract(options)
27
+ if options['input_url']
28
+ resp = http.get(options['input_url'], :follow_redirect => true)
29
+
30
+ unless resp.ok?
31
+ raise "Failed to download input from #{options['input_url']}"
32
+ end
33
+
34
+ input = resp.body
35
+ else
36
+ input = options['input']
37
+ end
38
+
39
+ return input
40
+ end
41
+ end # InputExtractor
42
+ end # Webservice
43
+ end # Opener
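`InputExtractor#extract` above checks `input_url` first, so a URL takes precedence when both fields are supplied. The sketch below restates that branching with the download step injected as a lambda, so it runs without HTTP; `extract_input` and `fake_fetch` are illustrative names that are not part of the gem.

```ruby
# Same branching as InputExtractor#extract, with the download step injected.
def extract_input(options, fetch)
  if options['input_url']
    fetch.call(options['input_url'])
  else
    options['input']
  end
end

# Hypothetical downloader that returns canned content instead of using HTTP.
fake_fetch = lambda { |url| "<KAF>downloaded from #{url}</KAF>" }

both    = {'input' => 'direct text', 'input_url' => 'http://example.com/doc.kaf'}
direct  = {'input' => 'direct text'}
symbols = {:input => 'direct text'}

from_url     = extract_input(both, fake_fetch)    # the URL wins over direct input
from_direct  = extract_input(direct, fake_fetch)  # falls back to 'input'
from_symbols = extract_input(symbols, fake_fetch) # nil: keys must be Strings
```

The last case shows why the parameters are expected to pass through `InputSanitizer#prepare_parameters` first, which converts every key to a String.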
@@ -0,0 +1,65 @@
1
+ module Opener
2
+ module Webservice
3
+ ##
4
+ # Sanitizes raw Sinatra input and component options.
5
+ #
6
+ class InputSanitizer
7
+ ##
8
+ # Returns a Hash containing cleaned up pairs based on the input
9
+ # parameters. The keys of the returned Hash are String instances to
10
+ # prevent Symbol DOS attacks.
11
+ #
12
+ # @param [Hash] input
13
+ # @return [Hash]
14
+ #
15
+ def prepare_parameters(input)
16
+ sanitized = {}
17
+
18
+ input.each do |key, value|
19
+ # Sinatra/Rack uses "on" for checked checkboxes.
20
+ if value == 'true' or value == 'on'
21
+ value = true
22
+ elsif value == 'false'
23
+ value = false
24
+ end
25
+
26
+ sanitized[key.to_s] = value
27
+ end
28
+
29
+ # Strip empty callback URLs (= default form values).
30
+ if sanitized['callbacks']
31
+ sanitized['callbacks'].reject! { |url| url.nil? || url.empty? }
32
+ end
33
+
34
+ if sanitized['error_callback'] and sanitized['error_callback'].empty?
35
+ sanitized.delete('error_callback')
36
+ end
37
+
38
+ return sanitized
39
+ end
40
+
41
+ ##
42
+ # Returns a Hash containing the whitelisted options to pass to a
43
+ # component. Since components use Symbols for their options this Hash uses
44
+ # Symbols for its keys.
45
+ #
46
+ # @param [Hash] input
47
+ # @param [Array] accepted The accepted parameter names.
48
+ # @return [Hash]
49
+ #
50
+ def whitelist_options(input, accepted)
51
+ whitelisted = {}
52
+
53
+ input.each do |key, value|
54
+ sym_key = key.to_sym
55
+
56
+ if accepted.include?(sym_key)
57
+ whitelisted[sym_key] = value
58
+ end
59
+ end
60
+
61
+ return whitelisted
62
+ end
63
+ end # InputSanitizer
64
+ end # Webservice
65
+ end # Opener
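To make the two sanitizing steps above concrete, the sketch below restates them as top-level methods and runs them on sample input. The logic follows the diff, condensed slightly; the input values and the accepted option `:kaf` are assumptions chosen for the example.

```ruby
# Condensed restatement of InputSanitizer#prepare_parameters.
def prepare_parameters(input)
  sanitized = {}

  input.each do |key, value|
    # Sinatra/Rack uses "on" for checked checkboxes.
    if value == 'true' || value == 'on'
      value = true
    elsif value == 'false'
      value = false
    end

    sanitized[key.to_s] = value
  end

  # Strip empty callback URLs (default form values).
  if sanitized['callbacks']
    sanitized['callbacks'].reject! { |url| url.nil? || url.empty? }
  end

  if sanitized['error_callback'] && sanitized['error_callback'].empty?
    sanitized.delete('error_callback')
  end

  sanitized
end

# Condensed restatement of InputSanitizer#whitelist_options.
def whitelist_options(input, accepted)
  input.each_with_object({}) do |(key, value), whitelisted|
    sym_key = key.to_sym
    whitelisted[sym_key] = value if accepted.include?(sym_key)
  end
end

params = prepare_parameters({
  :kaf             => 'on',
  'callbacks'      => ['', 'http://example.com/next'],
  'error_callback' => ''
})

options = whitelist_options(params, [:kaf])
```

Here `params` ends up with String keys only, with `kaf` coerced to `true`, the empty callback removed and the empty `error_callback` dropped, while `options` contains just `:kaf`.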
@@ -0,0 +1,175 @@
1
+ module Opener
2
+ module Webservice
3
+ ##
4
+ # Slop wrapper for parsing webservice options and passing them to Puma.
5
+ #
6
+ # @!attribute [r] name
7
+ # The name of the component.
8
+ # @return [String]
9
+ #
10
+ # @!attribute [r] rackup
11
+ # Path to the config.ru to use.
12
+ # @return [String]
13
+ #
14
+ # @!attribute [r] parser
15
+ # @return [Slop]
16
+ #
17
+ class OptionParser
18
+ attr_reader :name, :rackup, :parser
19
+
20
+ ##
21
+ # Mapping of environment variables and Slop options.
22
+ #
23
+ # @return [Hash]
24
+ #
25
+ ENV_OPTIONS = {
26
+ 'OUTPUT_BUCKET' => :bucket,
27
+ 'AUTHENTICATION_TOKEN' => :token,
28
+ 'AUTHENTICATION_SECRET' => :secret,
29
+ 'AUTHENTICATION_ENDPOINT' => :authentication
30
+ }
31
+
32
+ ##
33
+ # @param [String] name
34
+ # @param [String] rackup
35
+ #
36
+ def initialize(name, rackup)
37
+ @name = name
38
+ @rackup = rackup
39
+ @parser = configure_slop
40
+ end
41
+
42
+ def parse(*args)
43
+ parser.parse(*args)
44
+ end
45
+
46
+ ##
47
+ # Parses the given CLI options and starts Puma.
48
+ #
49
+ # @param [Array] argv
50
+ #
51
+ def run(argv = ARGV)
52
+ parser.parse(argv)
53
+ end
54
+
55
+ ##
56
+ # @return [Slop]
57
+ #
58
+ def configure_slop
59
+ outer = self
60
+ server_name = "#{name}-server"
61
+ cli_name = server_name.gsub('opener-', '')
62
+
63
+ return Slop.new(:strict => false, :indent => 2) do
64
+ banner "Usage: #{cli_name} [RACKUP] [OPTIONS]"
65
+
66
+ separator <<-EOF.chomp
67
+
68
+ About:
69
+
70
+ Runs the OpeNER component as a webservice using Puma. For example:
71
+
72
+ language-identifier-server --daemon
73
+
74
+ This would start a language identifier server in the background.
75
+
76
+ Environment Variables:
77
+
78
+ These daemons make use of Amazon SQS queues and other Amazon services. In
79
+ order to use these services you should make sure the following environment
80
+ variables are set:
81
+
82
+ * AWS_ACCESS_KEY_ID
83
+ * AWS_SECRET_ACCESS_KEY
84
+ * AWS_REGION
85
+
86
+ If you're running this daemon on an EC2 instance then the first two
87
+ environment variables will be set automatically if the instance has an
88
+ associated IAM profile. The AWS_REGION variable must _always_ be set.
89
+
90
+ Optionally you can also set the following extra variables:
91
+
92
+ * NEWRELIC_TOKEN: when set the daemon will send profiling data to New Relic
93
+ using this token. The application name will be "#{server_name}".
94
+
95
+ * ROLLBAR_TOKEN: when set the daemon will report errors to Rollbar using
96
+ this token. You can freely use this in combination with NEWRELIC_TOKEN.
97
+
98
+ Puma Options:
99
+
100
+ This webserver uses Puma under the hood, but defines its own CLI options.
101
+ All unrecognized options are passed to the Puma CLI. For more information
102
+ on the available options for Puma, run `#{cli_name} --puma-help`.
103
+ EOF
104
+
105
+ separator "\nOptions:\n"
106
+
107
+ on :h, :help, 'Shows this help message' do
108
+ abort to_s
109
+ end
110
+
111
+ on :'puma-help', 'Shows the options of Puma' do
112
+ Puma::CLI.new(['--help']).run
113
+
114
+ abort
115
+ end
116
+
117
+ on :b=,
118
+ :bucket=,
119
+ 'The S3 bucket to store output in',
120
+ :as => String
121
+
122
+ on :authentication=,
123
+ 'An authentication endpoint to use',
124
+ :as => String
125
+
126
+ on :secret=,
127
+ 'Parameter name for the authentication secret',
128
+ :as => String
129
+
130
+ on :token=,
131
+ 'Parameter name for the authentication token',
132
+ :as => String
133
+
134
+ on :'disable-syslog', 'Disables Syslog logging (enabled by default)'
135
+
136
+ run do |opts, args|
137
+ puma_args = [outer.rackup] + args
138
+
139
+ ENV['APP_NAME'] = outer.name
140
+ ENV['APP_ROOT'] = File.expand_path('../../../../', __FILE__)
141
+ ENV['NRCONFIG'] = File.join(ENV['APP_ROOT'], 'config/newrelic.yml')
142
+
143
+ ENV_OPTIONS.each do |key, opt|
144
+ ENV[key] = opts[opt]
145
+ end
146
+
147
+ unless opts[:'disable-syslog']
148
+ ENV['ENABLE_SYSLOG'] = '1'
149
+ end
150
+
151
+ if !ENV['RAILS_ENV'] and ENV['RACK_ENV']
152
+ ENV['RAILS_ENV'] = ENV['RACK_ENV']
153
+ end
154
+
155
+ if ENV['NEWRELIC_TOKEN']
156
+ NewRelic::Control.instance.init_plugin
157
+
158
+ # Enable the GC profiler for New Relic.
159
+ GC::Profiler.enable
160
+ end
161
+
162
+ Configuration.configure_rollbar
163
+
164
+ # Puma on JRuby does some weird stuff with forking/exec. As a result
165
+ # of this we *have to* update ARGV as otherwise running Puma as a
166
+ # daemon does not work.
167
+ ARGV.replace(puma_args)
168
+
169
+ Puma::CLI.new(puma_args).run
170
+ end
171
+ end
172
+ end
173
+ end # OptionParser
174
+ end # Webservice
175
+ end # Opener
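One consequence of the `ENV_OPTIONS` loop in the `run` block above is worth spelling out: assigning `nil` to an `ENV` key removes that variable in Ruby. The sketch below models the parsed options with a plain Hash, under the assumption that Slop returns `nil` for options that were not passed on the command line; the bucket name and token value are placeholders.

```ruby
# Copy of the mapping from the diff.
ENV_OPTIONS = {
  'OUTPUT_BUCKET'           => :bucket,
  'AUTHENTICATION_TOKEN'    => :token,
  'AUTHENTICATION_SECRET'   => :secret,
  'AUTHENTICATION_ENDPOINT' => :authentication
}

# Stand-in for the parsed Slop options: only --bucket was given.
opts = {:bucket => 'my-output-bucket'}

# A value exported in the shell before starting the server.
ENV['AUTHENTICATION_TOKEN'] = 'api_token'

# Same loop as in the run block.
ENV_OPTIONS.each do |key, opt|
  ENV[key] = opts[opt]
end

bucket        = ENV['OUTPUT_BUCKET']             # value taken from the CLI option
token_present = ENV.key?('AUTHENTICATION_TOKEN') # assigning nil removed the variable
```

Under that assumption, values for these four settings need to be passed as CLI options; a variable exported beforehand would be cleared, and `Configuration.authentication_token` would fall back to its default of `'token'`.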