opener-webservice 2.0.0 → 2.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 02a6433aa01c06b1251256710c706a59e45586b2
-  data.tar.gz: 6dc2b0270835171445461234f8123fb13e267d36
+  metadata.gz: 32302b9a2aac215c69670cd55e4c28d22901df9c
+  data.tar.gz: d24fb9f1fc3113ff70685bd1c2ce440eced844c6
 SHA512:
-  metadata.gz: 733000a6cd389bcfbdfd0865432cc45c4896d4564a653fa7ad5a14a123d1cf5eb28ab0202e13fa47daa4e5f2b4a64f88f753abf3046d71fc11d7198329468c02
-  data.tar.gz: cfd032408fe8d59f391577d029eb96880684eb93ff61f4920f6264ac9ac214bed9d869b52b81d612c95ac386f552e43d8c5055e2024aa82e52a4597a778285fc
+  metadata.gz: 5c96ce12b9449ee865ee208180fed338a7bdb9077741cecc23a618cd31fe499570a0fa750f905ca1f17ebebb09294a9a431fa83d7c43910a6890a83779c68285
+  data.tar.gz: 602204c799cca06cf486f1fb44c1f6d2ea2a8f53417916313a08e7e80a035ef2e6f4c2eb75f784ce0f3ac7389c121851b041ef8b7a628493f2e1aaec2334663a
data/README.md CHANGED
@@ -1,29 +1,147 @@
-# Opener::Webservice
+# Opener Webservice
 
-TODO: Write a gem description
+This Gem makes it possible for OpeNER components to be used as a webservice.
+Input can be passed directly or using a URL; the latter allows for greater data
+sizes to be processed. Webservices can be chained together using callback URLs,
+each passing its output to the next callback. Output can either be passed
+directly, or as a URL pointing to a document in Amazon S3.
 
-## Installation
+## Usage
 
-Add this line to your application's Gemfile:
+Create an executable file `bin/<component>-server`, for example
+`bin/language-identifier-server`, with the following content:
 
-    gem 'opener-webservice'
+    #!/usr/bin/env ruby
 
-And then execute:
+    require 'opener/webservice'
 
-    $ bundle
+    parser = Opener::Webservice::OptionParser.new(
+      'opener-<component>',
+      File.expand_path('../../config.ru', __FILE__)
+    )
 
-Or install it yourself as:
+    parser.run
 
-    $ gem install opener-webservice
+Replace `<component>` with the name of the component. For example, for the
+language identifier this would result in the following:
 
-## Usage
+    #!/usr/bin/env ruby
+
+    require 'opener/webservice'
+
+    parser = Opener::Webservice::OptionParser.new(
+      'opener-language-identifier',
+      File.expand_path('../../config.ru', __FILE__)
+    )
+
+    parser.run
+
+Next, create a `config.ru` file in the root directory of the component. It
+should have the following content:
+
+    require File.expand_path('../lib/opener/<component>', __FILE__)
+    require File.expand_path('../lib/opener/<component>/server', __FILE__)
+
+    run Opener::<constant>::Server
+
+Replace `<component>` with the component name and `<constant>` with the
+corresponding constant. For example, for the language identifier:
+
+    require File.expand_path('../lib/opener/language_identifier', __FILE__)
+    require File.expand_path('../lib/opener/language_identifier/server', __FILE__)
+
+    run Opener::LanguageIdentifier::Server
+
+## Input
+
+To submit data, send a POST request to the root URL of a webservice. The request
+body can either be a set of POST fields, or a JSON object. In both cases the
+following fields can be set:
+
+* `input`: direct input to process
+* `input_url`: a URL to a document to download and process
+* `callbacks`: an array of callback URLs to send output to
+* `error_callback`: a URL to send errors to
+* `request_id`: a custom request ID/identifier to associate with the document
+* `metadata`: an arbitrary metadata object to associate with a document, only
+  supported when using JSON input as POST fields can't represent key/values.
+
+Any other parameters are ignored _but_ passed along to the next callback (if
+any).
+
+To use JSON input, set the `Content-Type` header to `application/json` when
+submitting data.
+
+If no callback URLs are specified the data is processed synchronously; the
+response will be whatever output the underlying component returned (usually
+KAF).
+
+When using a callback URL the response will be a JSON object containing:
+
+* `request_id`: the generated (or manually specified) request ID/identifier
+* `output_url`: the URL that will contain the end output after all callbacks
+  have been processed
+
+If an error occurs the output URL will _not_ contain the document; instead a
+POST request is executed using the URL in the `error_callback` field. This URL
+receives the following parameters:
+
+* `request_id`: the ID of the request/document that failed
+* `error`: the error message
+
+## Requirements
+
+* A supported Ruby version (see below)
+* Amazon S3 (only when one wants to store output in S3)
+
+The following Ruby versions are supported:
+
+| Ruby     | Required      | Recommended |
+|:---------|:--------------|:------------|
+| MRI      | >= 1.9.3      | >= 2.1.4    |
+| Rubinius | >= 2.2        | >= 2.3.0    |
+| JRuby    | >= 1.7        | >= 1.7.16   |
+
+Note that various components use JRuby, thus they won't work on MRI and
+Rubinius.
+
+## S3 Support
+
+To enable storing of output on Amazon S3, specify the `--bucket` option when
+running the CLI. Also make sure that the following environment variables are
+set:
+
+* `AWS_ACCESS_KEY_ID`
+* `AWS_SECRET_ACCESS_KEY`
+* `AWS_REGION`
+
+If you're running this daemon on an EC2 instance then the first two environment
+variables will be set automatically if the instance has an associated IAM
+profile. The `AWS_REGION` variable must _always_ be set.
+
+Output files are named `<identifier>.xml` where `<identifier>` is the unique
+identifier of the document. The content type of these documents is set to
+`application/xml`. Metadata associated with the job (as specified in the
+`metadata` field) is saved as metadata of the S3 object.
+
+The S3 URLs are only valid for a limited time (currently 1 hour) so callbacks
+must ensure they can process the input within that time limit.
+
+To use custom identifiers for documents, specify a unique value in the
+`request_id` parameter when submitting data. Existing documents using the same
+identifier will be _overwritten_, so make sure your identifiers are truly
+unique. Default identifiers are generated using Ruby's `SecureRandom.hex`
+method.
+
+## Monitoring
 
-TODO: Write usage instructions here
+Components using this Gem can measure performance using New Relic and report
+errors using Rollbar. To support this the following two environment variables
+must be set:
 
-## Contributing
+* `NEWRELIC_TOKEN`
+* `ROLLBAR_TOKEN`
 
-1. Fork it
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+For New Relic the application names will be `opener-<component>` where
+`<component>` is the component name, as defined by a component itself. If one of
+these environment variables is not set the corresponding feature is disabled.
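
The request format described in the README's Input section can be sketched with Ruby's standard library. This is a minimal illustration, not code from the gem: the `SERVICE_URL` value, the `build_request` helper and the metadata contents are assumptions made for the example. The request is only built here, not sent.

```ruby
require 'json'
require 'net/http'
require 'securerandom'
require 'uri'

# Hypothetical endpoint; replace with the root URL of a running component.
SERVICE_URL = URI('http://localhost:9292/')

# Hypothetical helper: builds (but does not send) a JSON POST request that
# carries the fields documented in the README.
def build_request(uri, text, callbacks = [])
  payload = {
    'input'      => text,
    'request_id' => SecureRandom.hex,          # same generator as the default IDs
    'metadata'   => { 'source' => 'example' }  # only representable in JSON input
  }

  # Without callbacks the service processes the input synchronously.
  payload['callbacks'] = callbacks unless callbacks.empty?

  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload)
  request
end

request = build_request(SERVICE_URL, 'This is an English sentence.')

# To actually send it (requires a running server):
#
#   response = Net::HTTP.start(SERVICE_URL.host, SERVICE_URL.port) do |http|
#     http.request(request)
#   end
```

With a non-empty `callbacks` array the response would be the JSON object described above (`request_id` and `output_url`) rather than the component's output.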
@@ -0,0 +1,90 @@
+module Opener
+  module Webservice
+    ##
+    # Module for storing global configuration settings such as whether or not to
+    # enable authentication.
+    #
+    module Configuration
+      ##
+      # Returns `true` if authentication should be enabled.
+      #
+      # @return [TrueClass|FalseClass]
+      #
+      def self.authentication?
+        return !!authentication_endpoint && !authentication_endpoint.empty?
+      end
+
+      ##
+      # Returns the authentication endpoint to use.
+      #
+      # @return [String]
+      #
+      def self.authentication_endpoint
+        return ENV['AUTHENTICATION_ENDPOINT']
+      end
+
+      ##
+      # Returns the field name of the authentication secret.
+      #
+      # @return [String]
+      #
+      def self.authentication_secret
+        return ENV['AUTHENTICATION_SECRET'] || 'secret'
+      end
+
+      ##
+      # Returns the field name of the authentication token.
+      #
+      # @return [String]
+      #
+      def self.authentication_token
+        return ENV['AUTHENTICATION_TOKEN'] || 'token'
+      end
+
+      ##
+      # Name of the S3 bucket to store output in.
+      #
+      # @return [String]
+      #
+      def self.output_bucket
+        return ENV['OUTPUT_BUCKET']
+      end
+
+      ##
+      # Returns `true` if Syslog should be enabled.
+      #
+      # @return [TrueClass|FalseClass]
+      #
+      def self.syslog?
+        return !!ENV['ENABLE_SYSLOG'] && !ENV['ENABLE_SYSLOG'].empty?
+      end
+
+      ##
+      # Returns `true` if Rollbar error tracking should be enabled.
+      #
+      # @return [TrueClass|FalseClass]
+      #
+      def self.rollbar?
+        return !!ENV['ROLLBAR_TOKEN']
+      end
+
+      ##
+      # Configures Rollbar.
+      #
+      def self.configure_rollbar
+        Rollbar.configure do |config|
+          config.access_token = ENV['ROLLBAR_TOKEN']
+          config.enabled      = rollbar?
+          config.environment  = environment
+        end
+      end
+
+      ##
+      # @return [String]
+      #
+      def self.environment
+        return ENV['RACK_ENV'] || ENV['RAILS_ENV']
+      end
+    end # Configuration
+  end # Webservice
+end # Opener
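
The predicates in `Configuration` treat empty strings differently: `authentication?` and `syslog?` require a non-empty value, while `rollbar?` only checks that the variable exists. The sketch below restates both checks outside the gem to make that difference visible. The `present?` helper and the plain Hash standing in for `ENV` are assumptions made for the example.

```ruby
# `present?` is a hypothetical helper that restates the check used by
# `authentication?` and `syslog?`; it is not part of the gem.
def present?(value)
  !!value && !value.empty?
end

# A plain Hash stands in for ENV so the example has no side effects.
env = { 'AUTHENTICATION_ENDPOINT' => '', 'ROLLBAR_TOKEN' => '' }

# Empty string: treated as "not configured".
auth_enabled = present?(env['AUTHENTICATION_ENDPOINT'])

# Missing key: also treated as "not configured".
syslog_enabled = present?(env['ENABLE_SYSLOG'])

# Same check as `rollbar?`: an empty string still counts as set.
rollbar_enabled = !!env['ROLLBAR_TOKEN']
```

Under these inputs `auth_enabled` and `syslog_enabled` are `false` while `rollbar_enabled` is `true`, so an exported but empty `ROLLBAR_TOKEN` would appear to leave Rollbar enabled.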
@@ -0,0 +1,29 @@
+module Opener
+  module Webservice
+    ##
+    # Class for handling error messages that occur when processing a document.
+    #
+    # @!attribute [r] http
+    #  @return [HTTPClient]
+    #
+    class ErrorHandler
+      attr_reader :http
+
+      def initialize
+        @http = HTTPClient.new
+      end
+
+      ##
+      # @param [StandardError] error
+      # @param [String] request_id
+      # @param [String] url
+      #
+      def submit(error, request_id, url)
+        http.post(
+          url,
+          :body => {:error => error.message, :request_id => request_id}
+        )
+      end
+    end # ErrorHandler
+  end # Webservice
+end # Opener
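
On the receiving side, an error callback endpoint has to decode the two fields that `ErrorHandler#submit` posts. The sketch below assumes the Hash body arrives form-encoded (`application/x-www-form-urlencoded`), which is how a Hash body is normally sent but is not stated in the source. `parse_error_callback` is a hypothetical helper, and the sample body is built locally instead of being received over HTTP.

```ruby
require 'uri'

# Hypothetical helper: decodes a form-encoded error callback body into the
# two documented fields.
def parse_error_callback(body)
  fields = URI.decode_www_form(body).to_h

  { :request_id => fields['request_id'], :error => fields['error'] }
end

# Simulated callback body, encoded the way a form POST would encode it.
body = URI.encode_www_form(
  'error'      => 'Failed to download input from http://example.com/doc',
  'request_id' => 'abc123'
)

parsed = parse_error_callback(body)
```

In a real deployment the body would come from the web framework's request object; only the decoding step is shown here.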
@@ -0,0 +1,43 @@
+module Opener
+  module Webservice
+    ##
+    # Extracts the KAF/text input to use from a set of input parameters.
+    #
+    # @!attribute [r] http
+    #  @return [HTTPClient]
+    #
+    class InputExtractor
+      attr_reader :http
+
+      def initialize
+        @http = HTTPClient.new
+      end
+
+      ##
+      # @param [Hash] options
+      #
+      # @option options [String] input_url A URL to download input from.
+      # @option options [String] input The direct input to process.
+      #
+      # @return [String]
+      #
+      # @raise [RuntimeError] Raised when the input could not be downloaded.
+      #
+      def extract(options)
+        if options['input_url']
+          resp = http.get(options['input_url'], :follow_redirect => true)
+
+          unless resp.ok?
+            raise "Failed to download input from #{options['input_url']}"
+          end
+
+          input = resp.body
+        else
+          input = options['input']
+        end
+
+        return input
+      end
+    end # InputExtractor
+  end # Webservice
+end # Opener
@@ -0,0 +1,65 @@
+module Opener
+  module Webservice
+    ##
+    # Sanitizes raw Sinatra input and component options.
+    #
+    class InputSanitizer
+      ##
+      # Returns a Hash containing cleaned up pairs based on the input
+      # parameters. The keys of the returned Hash are String instances to
+      # prevent Symbol DOS attacks.
+      #
+      # @param [Hash] input
+      # @return [Hash]
+      #
+      def prepare_parameters(input)
+        sanitized = {}
+
+        input.each do |key, value|
+          # Sinatra/Rack uses "on" for checked checkboxes.
+          if value == 'true' or value == 'on'
+            value = true
+          elsif value == 'false'
+            value = false
+          end
+
+          sanitized[key.to_s] = value
+        end
+
+        # Strip empty callback URLs (= default form values).
+        if sanitized['callbacks']
+          sanitized['callbacks'].reject! { |url| url.nil? || url.empty? }
+        end
+
+        if sanitized['error_callback'] and sanitized['error_callback'].empty?
+          sanitized.delete('error_callback')
+        end
+
+        return sanitized
+      end
+
+      ##
+      # Returns a Hash containing the whitelisted options to pass to a
+      # component. Since components use Symbols for their options this Hash uses
+      # Symbols for its keys.
+      #
+      # @param [Hash] input
+      # @param [Array] accepted The accepted parameter names.
+      # @return [Hash]
+      #
+      def whitelist_options(input, accepted)
+        whitelisted = {}
+
+        input.each do |key, value|
+          sym_key = key.to_sym
+
+          if accepted.include?(sym_key)
+            whitelisted[sym_key] = value
+          end
+        end
+
+        return whitelisted
+      end
+    end # InputSanitizer
+  end # Webservice
+end # Opener
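
To see what the two `InputSanitizer` methods do to a typical form submission, the sketch below restates their logic as standalone methods so that it runs without the gem installed. The restatement is condensed and is not the gem's code. The option names `:kaf` and `:probs` are hypothetical, chosen only for the example.

```ruby
# Condensed restatement of InputSanitizer#prepare_parameters.
def prepare_parameters(input)
  sanitized = {}

  input.each do |key, value|
    value = true  if value == 'true' || value == 'on'
    value = false if value == 'false'

    sanitized[key.to_s] = value
  end

  if sanitized['callbacks']
    sanitized['callbacks'].reject! { |url| url.nil? || url.empty? }
  end

  if sanitized['error_callback'] && sanitized['error_callback'].empty?
    sanitized.delete('error_callback')
  end

  sanitized
end

# Condensed restatement of InputSanitizer#whitelist_options.
def whitelist_options(input, accepted)
  input.each_with_object({}) do |(key, value), whitelisted|
    sym_key = key.to_sym

    whitelisted[sym_key] = value if accepted.include?(sym_key)
  end
end

# Raw parameters as a form might submit them: a checked checkbox, an empty
# default callback field and an empty error callback.
raw = {
  :kaf             => 'on',
  'probs'          => 'false',
  'callbacks'      => ['', 'http://example.com/next'],
  'error_callback' => ''
}

params  = prepare_parameters(raw)
options = whitelist_options(params, [:kaf, :probs])
```

Here `params` ends up with String keys, boolean values for `kaf` and `probs`, a single callback URL and no `error_callback`, while `options` keeps only the two whitelisted entries under Symbol keys.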
@@ -0,0 +1,175 @@
+module Opener
+  module Webservice
+    ##
+    # Slop wrapper for parsing webservice options and passing them to Puma.
+    #
+    # @!attribute [r] name
+    #  The name of the component.
+    #  @return [String]
+    #
+    # @!attribute [r] rackup
+    #  Path to the config.ru to use.
+    #  @return [String]
+    #
+    # @!attribute [r] parser
+    #  @return [Slop]
+    #
+    class OptionParser
+      attr_reader :name, :rackup, :parser
+
+      ##
+      # Mapping of environment variables and Slop options.
+      #
+      # @return [Hash]
+      #
+      ENV_OPTIONS = {
+        'OUTPUT_BUCKET'           => :bucket,
+        'AUTHENTICATION_TOKEN'    => :token,
+        'AUTHENTICATION_SECRET'   => :secret,
+        'AUTHENTICATION_ENDPOINT' => :authentication
+      }
+
+      ##
+      # @param [String] name
+      # @param [String] rackup
+      #
+      def initialize(name, rackup)
+        @name   = name
+        @rackup = rackup
+        @parser = configure_slop
+      end
+
+      def parse(*args)
+        parser.parse(*args)
+      end
+
+      ##
+      # Parses the given CLI options and starts Puma.
+      #
+      # @param [Array] argv
+      #
+      def run(argv = ARGV)
+        parser.parse(argv)
+      end
+
+      ##
+      # @return [Slop]
+      #
+      def configure_slop
+        outer       = self
+        server_name = "#{name}-server"
+        cli_name    = server_name.gsub('opener-', '')
+
+        return Slop.new(:strict => false, :indent => 2) do
+          banner "Usage: #{cli_name} [RACKUP] [OPTIONS]"
+
+          separator <<-EOF.chomp
+
+About:
+
+Runs the OpeNER component as a webservice using Puma. For example:
+
+language-identifier-server --daemon
+
+This would start a language identifier server in the background.
+
+Environment Variables:
+
+These daemons make use of Amazon SQS queues and other Amazon services. In
+order to use these services you should make sure the following environment
+variables are set:
+
+* AWS_ACCESS_KEY_ID
+* AWS_SECRET_ACCESS_KEY
+* AWS_REGION
+
+If you're running this daemon on an EC2 instance then the first two
+environment variables will be set automatically if the instance has an
+associated IAM profile. The AWS_REGION variable must _always_ be set.
+
+Optionally you can also set the following extra variables:
+
+* NEWRELIC_TOKEN: when set the daemon will send profiling data to New Relic
+using this token. The application name will be "#{server_name}".
+
+* ROLLBAR_TOKEN: when set the daemon will report errors to Rollbar using
+this token. You can freely use this in combination with NEWRELIC_TOKEN.
+
+Puma Options:
+
+This webserver uses Puma under the hood, but defines its own CLI options.
+All unrecognized options are passed to the Puma CLI. For more information
+on the available options for Puma, run `#{cli_name} --puma-help`.
+          EOF
+
+          separator "\nOptions:\n"
+
+          on :h, :help, 'Shows this help message' do
+            abort to_s
+          end
+
+          on :'puma-help', 'Shows the options of Puma' do
+            Puma::CLI.new(['--help']).run
+
+            abort
+          end
+
+          on :b=,
+            :bucket=,
+            'The S3 bucket to store output in',
+            :as => String
+
+          on :authentication=,
+            'An authentication endpoint to use',
+            :as => String
+
+          on :secret=,
+            'Parameter name for the authentication secret',
+            :as => String
+
+          on :token=,
+            'Parameter name for the authentication token',
+            :as => String
+
+          on :'disable-syslog', 'Disables Syslog logging (enabled by default)'
+
+          run do |opts, args|
+            puma_args = [outer.rackup] + args
+
+            ENV['APP_NAME'] = outer.name
+            ENV['APP_ROOT'] = File.expand_path('../../../../', __FILE__)
+            ENV['NRCONFIG'] = File.join(ENV['APP_ROOT'], 'config/newrelic.yml')
+
+            ENV_OPTIONS.each do |key, opt|
+              ENV[key] = opts[opt]
+            end
+
+            unless opts[:'disable-syslog']
+              ENV['ENABLE_SYSLOG'] = '1'
+            end
+
+            if !ENV['RAILS_ENV'] and ENV['RACK_ENV']
+              ENV['RAILS_ENV'] = ENV['RACK_ENV']
+            end
+
+            if ENV['NEWRELIC_TOKEN']
+              NewRelic::Control.instance.init_plugin
+
+              # Enable the GC profiler for New Relic.
+              GC::Profiler.enable
+            end
+
+            Configuration.configure_rollbar
+
+            # Puma on JRuby does some weird stuff with forking/exec. As a result
+            # of this we *have to* update ARGV as otherwise running Puma as a
+            # daemon does not work.
+            ARGV.replace(puma_args)
+
+            Puma::CLI.new(puma_args).run
+          end
+        end
+      end
+    end # OptionParser
+  end # Webservice
+end # Opener
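
One consequence of the `ENV_OPTIONS` loop is worth illustrating: in Ruby, assigning `nil` to an `ENV` key deletes that variable. Assuming Slop returns `nil` for options that were not passed (an assumption, not something the diff states), a value exported in the shell would be cleared unless the matching CLI option is also given. The sketch below uses a plain Hash named `opts` as a stand-in for the Slop result. The guarded loop is a possible alternative, not the gem's behaviour.

```ruby
ENV_OPTIONS = { 'OUTPUT_BUCKET' => :bucket }

# Stand-in for the parsed Slop options when --bucket was not passed.
opts = {}

# Case 1: the loop as written in the diff. Assigning nil removes the variable.
ENV['OUTPUT_BUCKET'] = 'bucket-from-environment'

ENV_OPTIONS.each do |key, opt|
  ENV[key] = opts[opt]
end

after_plain = ENV['OUTPUT_BUCKET']

# Case 2: a guarded variant (hypothetical) that only overrides the
# environment when the option was actually given.
ENV['OUTPUT_BUCKET'] = 'bucket-from-environment'

ENV_OPTIONS.each do |key, opt|
  ENV[key] = opts[opt] if opts[opt]
end

after_guarded = ENV['OUTPUT_BUCKET']
```

After the first loop `after_plain` is `nil`. After the guarded loop `after_guarded` still holds the exported value. Whether the overwrite is intended (CLI options as the single source of truth) cannot be determined from the diff alone.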