wukong-storm 0.0.2 → 0.1.0

data/.gitignore CHANGED
@@ -1,2 +1,59 @@
- pkg/
+ ## OS
+ .DS_Store
+ Icon
+ nohup.out
+ .bak
+
+ *.pem
+
+ ## EDITORS
+ \#*
+ .\#*
+ \#*\#
+ *~
+ *.swp
+ REVISION
+ TAGS*
+ tmtags
+ *_flymake.*
+ *_flymake
+ *.tmproj
+ .project
+ .settings
+
+ ## COMPILED
+ a.out
+ *.o
+ *.pyc
+ *.so
+
+ ## OTHER SCM
+ .bzr
+ .hg
+ .svn
+
+ ## PROJECT::GENERAL
+
+ log/*
+ tmp/*
+ pkg/*
+
+ coverage
+ rdoc
+ doc
+ pkg
+ .rake_test_cache
+ .bundle
+ .yardoc
+
+ .vendor
+
+ ## PROJECT::SPECIFIC
+
+ old/*
+ docpages
+ away
+
+ .rbx
  Gemfile.lock
+ Backup*of*.numbers
data/.yardopts ADDED
@@ -0,0 +1,5 @@
+ --readme README.md
+ --markup markdown
+ -
+ LICENSE.md
+ README.md
data/Gemfile CHANGED
@@ -5,4 +5,6 @@ gemspec
  group :development do
    gem 'rake', '~> 0.9'
    gem 'rspec', '~> 2'
+   gem 'yard'
+   gem 'redcarpet'
  end
data/LICENSE.md ADDED
@@ -0,0 +1,95 @@
1
+ # License for Wukong-Storm
2
+
3
+ The wukong code is __Copyright (c) 2011, 2012 Infochimps, Inc__
4
+
5
+ This code is licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an **AS IS** BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
10
+
11
+ __________________________________________________________________________
12
+
13
+ # Apache License
14
+
15
+
16
+ Apache License
17
+ Version 2.0, January 2004
18
+ http://www.apache.org/licenses/
19
+
20
+ _TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION_
21
+
22
+ ## 1. Definitions.
23
+
24
+ * **License** shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
25
+
26
+ * **Licensor** shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
27
+
28
+ * **Legal Entity** shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
29
+
30
+ * **You** (or **Your**) shall mean an individual or Legal Entity exercising permissions granted by this License.
31
+
32
+ * **Source** form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
33
+
34
+ * **Object** form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
35
+
36
+ * **Work** shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
37
+
38
+ * **Derivative Works** shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
39
+
40
+ * **Contribution** shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
41
+
42
+ * **Contributor** shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
43
+
44
+ ## 2. Grant of Copyright License.
45
+
46
+ Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
47
+
48
+ ## 3. Grant of Patent License.
49
+
50
+ Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
51
+
52
+ ## 4. Redistribution.
53
+
54
+ You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
55
+
56
+ - (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
57
+ - (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
58
+ - (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
59
+ - (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
60
+
61
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
62
+
63
+ ## 5. Submission of Contributions.
64
+
65
+ Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
66
+
67
+ ## 6. Trademarks.
68
+
69
+ This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
70
+
71
+ ## 7. Disclaimer of Warranty.
72
+
73
+ Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
74
+
75
+ ## 8. Limitation of Liability.
76
+
77
+ In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
78
+
79
+ ## 9. Accepting Warranty or Additional Liability.
80
+
81
+ While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
82
+
83
+ _END OF TERMS AND CONDITIONS_
84
+
85
+ ## APPENDIX: How to apply the Apache License to your work.
86
+
87
+ To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets `[]` replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.
88
+
89
+ > Copyright [yyyy] [name of copyright owner]
90
+ >
91
+ > Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
92
+ >
93
+ > http://www.apache.org/licenses/LICENSE-2.0
94
+ >
95
+ > Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
data/README.md CHANGED
@@ -12,13 +12,13 @@ a storm or trident topology.

  wu-storm operates over STDIN and STDOUT and has a one-to-one message guarantee.
  For example, when using an identity processor, wu-storm, given an event 'foo', will return
- 'foo|'. The '|' character is the specified End-Of-File delimiter.
+ 'foo\n|\n'. The '|' character is the specified End-Of-File delimiter.

- If there is ever a suppressed error in pricessing, or a skipped record for any reason,
- wu-storm will still respond with a '|', signifying an empty return event.
+ If there is ever a suppressed error in processing, or a skipped record for any reason,
+ wu-storm will still respond with a '|\n', signifying an empty return event.

  If there are multiple messages that have resulted from a single event, wu-storm will return
- them newline separated, followed by the delimite, e.g. 'foo\nbar\nbaz|'.
+ them newline separated, followed by the delimite, e.g. 'foo\nbar\nbaz\n|\n'.


  Params:
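The framing described in this README change can be illustrated from the consumer side. The following is a minimal sketch and not part of wukong-storm: `read_batches` is a hypothetical helper that groups the lines written by wu-storm into one array per input event, assuming the delimiter arrives on a line of its own as in the 0.1.0 examples (`'foo\n|\n'`).

```ruby
require 'stringio'

# Hypothetical helper (not part of wukong-storm): groups wu-storm output
# into one array of records per input event. A delimiter line closes the
# current batch; an immediately repeated delimiter yields an empty batch.
def read_batches(io, delimiter = '|')
  batches = []
  current = []
  io.each_line do |line|
    record = line.chomp
    if record == delimiter
      batches << current
      current = []
    else
      current << record
    end
  end
  batches
end

# Three events: one record, no records (skipped or errored), three records.
output = StringIO.new("foo\n|\n|\nfoo\nbar\nbaz\n|\n")
p read_batches(output)  # => [["foo"], [], ["foo", "bar", "baz"]]
```

The README examples use `'|'`, while the settings added in this release define the delimiter default as `'---'`, so a consumer would pass whichever value wu-storm was actually started with.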
data/bin/wu-storm CHANGED
@@ -1,52 +1,4 @@
  #!/usr/bin/env ruby
- require 'wukong-storm'
- require 'configliere'
-
- Settings.use(:commandline)
- Settings.define :run, description: 'Name of the processor or dataflow to use. Defaults to basename of the given path', flag: 'r'
- Settings.define :delimiter, description: 'The EOF specifier when returning events', default: '|', flag: 't'
-
- def Settings.usage() "usage: #{File.basename($0)} PROCESSOR|FLOW [...--param=value...]" ; end
-
- Settings.description = <<'EOF'
- wu-storm is a commandline tool for running Wukong processors and flows in
- a storm or trident topology.
-
- wu-storm operates over STDIN and STDOUT and has a one-to-one message guarantee.
- For example, when using an identity processor, wu-storm, given an event 'foo', will return
- 'foo|'. The '|' character is the specified End-Of-File delimiter.
-
- If there is ever a suppressed error in pricessing, or a skipped record for any reason,
- wu-storm will still respond with a '|', signifying an empty return event.

- If there are multiple messages that have resulted from a single event, wu-storm will return
- them newline separated, followed by the delimite, e.g. 'foo\nbar\nbaz|'.
- EOF
-
- require 'wukong/boot'
- Wukong.boot!(Settings)
-
- runnable = Settings.rest.first
-
- case
- when runnable.nil?
-   Settings.dump_help
-   exit(1)
- when Wukong.registry.registered?(runnable.to_sym)
-   processor = runnable
- when File.exist?(runnable)
-   load runnable
-   processor = Settings.run || File.basename(runnable, '.rb')
- else
-   Settings.dump_help
-   exit(1)
- end
-
- begin
-   EM.run do
-     Wu::StormRunner.start(processor.to_sym, Settings)
-   end
- rescue Wu::Error => e
-   $stderr.puts e.message
-   exit(1)
- end
+ require 'wukong-storm'
+ Wukong::Storm::StormRunner.run
data/lib/wukong-storm.rb CHANGED
@@ -1,3 +1,36 @@
  require 'wukong'
+
+ module Wukong
+
+   # Connects Wukong to Storm.
+   module Storm
+
+     include Plugin
+
+     # Configure the given settings object for use with Wukong::Storm.
+     #
+     # @param [Configliere::Param] settings the settings to configure
+     # @param [String] program the name of the currently executing program
+     def self.configure settings, program
+       return unless program == 'wu-storm'
+       settings.define :zookeepers_servers, description: 'storm.zookeeper.servers'
+       settings.define :zookeepers_port, description: 'storm.zookeeper.port'
+       settings.define :local_dir, description: 'storm.local.dir'
+       settings.define :scheduler, description: 'storm.scheduler'
+       settings.define :cluster_mode, description: 'storm.cluster.mode'
+       settings.define :local_hostname, description: 'storm.local.hostname'
+       settings.define :run, description: 'Name of the processor or dataflow to use. Defaults to basename of the given path', flag: 'r'
+       settings.define :delimiter, description: 'Emitted as a single record to mark the end of the batch ', default: '---', flag: 't'
+     end
+
+     # Boots the Wukong::Storm plugin.
+     #
+     # @param [Configliere::Param] settings the settings to boot from
+     # @param [String] root the root directory to boot in
+     def self.boot settings, root
+     end
+
+   end
+ end
+
  require 'wukong-storm/runner'
- # require 'wukong-storm/configuration'
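The `configure` hook above only registers its settings when the executing program is `wu-storm`. The guard can be sketched in isolation as follows. Both `FakeSettings` (a stand-in that merely records what gets defined) and `ExamplePlugin` are hypothetical names introduced for illustration; neither is part of Wukong or Configliere, and `'some-other-program'` is just an arbitrary non-matching name.

```ruby
# Hypothetical stand-in for a Configliere-style settings object. It only
# records the names and options passed to #define.
class FakeSettings
  attr_reader :defined

  def initialize
    @defined = {}
  end

  def define(name, options = {})
    @defined[name] = options
  end
end

# Hypothetical plugin module mirroring the program-name guard in the diff.
module ExamplePlugin
  def self.configure(settings, program)
    # Leave the settings untouched for every program except wu-storm.
    return unless program == 'wu-storm'
    settings.define :delimiter, description: 'Marks the end of a batch', default: '---', flag: 't'
  end
end

other = FakeSettings.new
ExamplePlugin.configure(other, 'some-other-program')
p other.defined.keys  # => []

storm = FakeSettings.new
ExamplePlugin.configure(storm, 'wu-storm')
p storm.defined.keys  # => [:delimiter]
```

Under this pattern, options such as `delimiter` would only appear in the help and parsing of the one command that uses them.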
@@ -0,0 +1,58 @@
+ module Wukong
+   module Storm
+
+     # A driver to connect events passed in over STDIN to STDOUT.
+     # Differs from the vanilla Wukong::Local::LocalDriver in some
+     # Storm-specific ways.
+     class StormDriver < EM::P::LineAndTextProtocol
+       include DriverMethods
+
+       attr_accessor :dataflow, :settings
+
+       def self.start(label, settings = {})
+         EM.attach($stdin, self, label, settings)
+       end
+
+       def initialize(label, settings)
+         super
+         @settings = settings
+         @dataflow = construct_dataflow(label, settings)
+         @messages = []
+       end
+
+       def post_init
+         setup_dataflow
+       end
+
+       def receive_line line
+         driver.send_through_dataflow(line)
+         send_messages
+       rescue => e
+         raise Wukong::Error.new(e)
+         EM.stop
+       end
+
+       def send_messages
+         # message newline message newline message delimiter
+         # message newline message newline message newline delimiter newline
+         @messages.each do |message|
+           $stdout.write(message)
+           $stdout.write("\n")
+         end
+         $stdout.write(settings.delimiter)
+         $stdout.write("\n")
+         $stdout.flush
+         @messages.clear
+       end
+
+       def unbind
+         EM.stop
+       end
+
+       def setup() ; end
+       def process(record) @messages << record ; end
+       def stop() ; end
+
+     end
+   end
+ end
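The output framing in `send_messages` (each message followed by a newline, then the delimiter followed by a newline) can be tried without EventMachine. This is a minimal sketch under that assumption: `write_batch` is a hypothetical stand-alone function, not part of wukong-storm, and it writes to a `StringIO` rather than `$stdout`.

```ruby
require 'stringio'

# Hypothetical stand-alone version of the framing used by
# StormDriver#send_messages: every message gets its own line, and the
# delimiter is written on a line of its own to close the batch.
def write_batch(io, messages, delimiter)
  messages.each do |message|
    io.write(message)
    io.write("\n")
  end
  io.write(delimiter)
  io.write("\n")
  io.flush
end

out = StringIO.new
write_batch(out, %w[foo bar baz], '---')  # an event that produced three records
write_batch(out, [], '---')               # an event that produced nothing
p out.string  # => "foo\nbar\nbaz\n---\n---\n"
```

Because an empty batch still emits the delimiter line, a reader sees one delimiter per input event. Separately, in `receive_line` as shown in the hunk, the `EM.stop` that follows `raise` cannot run, since `raise` leaves the method first.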
@@ -1,45 +1,40 @@
+ require_relative('driver')
+
  module Wukong
-   class StormRunner < EM::P::LineAndTextProtocol
-     include DriverMethods
+   module Storm

-     attr_accessor :dataflow, :settings
+     # Implements the runner for wu-storm.
+     class StormRunner < Wukong::Local::LocalRunner

-     def self.start(label, settings = {})
-       EM.attach($stdin, self, label, settings)
-     end
+       include Wukong::Logging

-     def initialize(label, settings)
-       super
-       @settings = settings
-       @dataflow = construct_dataflow(label, settings)
-       @messages = []
-     end
-
-     def post_init
-       setup_dataflow
-     end
+       usage "PROCESSOR|FLOW"

-     def receive_line line
-       driver.send_through_dataflow(line)
-       send_messages
-     rescue => e
-       $stderr.puts e.message
-       EM.stop
-     end
+       description <<-EOF.gsub(/^ {8}/,'')
+         wu-storm is a commandline tool for running Wukong processors and flows
+         in a storm or trident topology.

-     def send_messages
-       $stdout.write(@messages.join("\n") + settings.delimiter)
-       $stdout.flush
-       @messages.clear
-     end
+         wu-storm operates over STDIN and STDOUT and has a one-to-one message
+         guarantee. For example, when using an identity processor, wu-storm,
+         given an event 'foo', will return 'foo|'. The '|' character is the
+         specified End-Of-File delimiter.

-     def unbind
-       EM.stop
-     end
+         If there is ever a suppressed error in pricessing, or a skipped record
+         for any reason, wu-storm will still respond with a '|', signifying an
+         empty return event.

-     def setup() ; end
-     def process(record) @messages << record ; end
-     def stop() ; end
+         If there are multiple messages that have resulted from a single event,
+         wu-storm will return them newline separated, followed by the
+         delimiter, e.g. 'foo\nbar\nbaz|'.
+       EOF

+       # :nodoc:
+       def driver
+         StormDriver
+       end
+
+     end
    end
  end
+
+
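The new runner builds its help text with `<<-EOF.gsub(/^ {8}/,'')`, which lets the heredoc body sit at the indentation of the surrounding code and then strips eight leading spaces from each line. A minimal sketch of the same idiom follows, with shortened placeholder text:

```ruby
# The body is indented by eight spaces in the source; gsub removes exactly
# that prefix from every line that has it. Blank lines are left untouched.
text = <<-EOF.gsub(/^ {8}/, '')
        wu-storm is a commandline tool for running Wukong processors and flows
        in a storm or trident topology.

        wu-storm operates over STDIN and STDOUT.
EOF

puts text
```

On Ruby 2.3 and later, the squiggly heredoc `<<~EOF` removes common leading indentation without the explicit `gsub`.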
@@ -1,326 +1,321 @@
1
- module Wukong
2
- module Storm
3
-
4
- Configuration = Configliere::Param.new unless defined? Configuration
5
-
6
- Configuration.define :zookeepers_servers, description: 'storm.zookeeper.servers'
7
- Configuration.define :zookeepers_port, description: 'storm.zookeeper.port'
8
- Configuration.define :local_dir, description: 'storm.local.dir'
9
- Configuration.define :scheduler, description: 'storm.scheduler'
10
- Configuration.define :cluster_mode, description: 'storm.cluster.mode'
11
- Configuration.define :local_hostname, description: 'storm.local.hostname'
12
-
13
- /**
14
- * Whether or not to use ZeroMQ for messaging in local mode. If this is set
15
- * to false, then Storm will use a pure-Java messaging system. The purpose
16
- * of this flag is to make it easy to run Storm in local mode by eliminating
17
- * the need for native dependencies, which can be difficult to install.
18
- *
19
- * Defaults to false.
20
- */
21
- public static String STORM_LOCAL_MODE_ZMQ = "storm.local.mode.zmq";
22
-
23
- /**
24
- * The root location at which Storm stores data in ZooKeeper.
25
- */
26
- public static String STORM_ZOOKEEPER_ROOT = "storm.zookeeper.root";
27
-
28
- /**
29
- * The session timeout for clients to ZooKeeper.
30
- */
31
- public static String STORM_ZOOKEEPER_SESSION_TIMEOUT = "storm.zookeeper.session.timeout";
32
-
33
- /**
34
- * The connection timeout for clients to ZooKeeper.
35
- */
36
- public static String STORM_ZOOKEEPER_CONNECTION_TIMEOUT = "storm.zookeeper.connection.timeout";
37
-
38
-
39
- /**
40
- * The number of times to retry a Zookeeper operation.
41
- */
42
- public static String STORM_ZOOKEEPER_RETRY_TIMES="storm.zookeeper.retry.times";
43
-
44
- /**
45
- * The interval between retries of a Zookeeper operation.
46
- */
47
- public static String STORM_ZOOKEEPER_RETRY_INTERVAL="storm.zookeeper.retry.interval";
48
-
49
- /**
50
- * The Zookeeper authentication scheme to use, e.g. "digest". Defaults to no authentication.
51
- */
52
- public static String STORM_ZOOKEEPER_AUTH_SCHEME="storm.zookeeper.auth.scheme";
53
-
54
- /**
55
- * A string representing the payload for Zookeeper authentication. It gets serialized using UTF-8 encoding during authentication.
56
- */
57
- public static String STORM_ZOOKEEPER_AUTH_PAYLOAD="storm.zookeeper.auth.payload";
58
-
59
- /**
60
- * The id assigned to a running topology. The id is the storm name with a unique nonce appended.
61
- */
62
- public static String STORM_ID = "storm.id";
63
-
64
- /**
65
- * The host that the master server is running on.
66
- */
67
- public static String NIMBUS_HOST = "nimbus.host";
68
-
69
- /**
70
- * Which port the Thrift interface of Nimbus should run on. Clients should
71
- * connect to this port to upload jars and submit topologies.
72
- */
73
- public static String NIMBUS_THRIFT_PORT = "nimbus.thrift.port";
74
-
75
-
76
- /**
77
- * This parameter is used by the storm-deploy project to configure the
78
- * jvm options for the nimbus daemon.
79
- */
80
- public static String NIMBUS_CHILDOPTS = "nimbus.childopts";
81
-
82
-
83
- /**
84
- * How long without heartbeating a task can go before nimbus will consider the
85
- * task dead and reassign it to another location.
86
- */
87
- public static String NIMBUS_TASK_TIMEOUT_SECS = "nimbus.task.timeout.secs";
88
-
89
-
90
- /**
91
- * How often nimbus should wake up to check heartbeats and do reassignments. Note
92
- * that if a machine ever goes down Nimbus will immediately wake up and take action.
93
- * This parameter is for checking for failures when there's no explicit event like that
94
- * occuring.
95
- */
96
- public static String NIMBUS_MONITOR_FREQ_SECS = "nimbus.monitor.freq.secs";
97
-
98
- /**
99
- * How often nimbus should wake the cleanup thread to clean the inbox.
100
- * @see NIMBUS_INBOX_JAR_EXPIRATION_SECS
101
- */
102
- public static String NIMBUS_CLEANUP_INBOX_FREQ_SECS = "nimbus.cleanup.inbox.freq.secs";
103
-
104
- /**
105
- * The length of time a jar file lives in the inbox before being deleted by the cleanup thread.
106
- *
107
- * Probably keep this value greater than or equal to NIMBUS_CLEANUP_INBOX_JAR_EXPIRATION_SECS.
108
- * Note that the time it takes to delete an inbox jar file is going to be somewhat more than
109
- * NIMBUS_CLEANUP_INBOX_JAR_EXPIRATION_SECS (depending on how often NIMBUS_CLEANUP_FREQ_SECS
110
- * is set to).
111
- * @see NIMBUS_CLEANUP_FREQ_SECS
112
- */
113
- public static String NIMBUS_INBOX_JAR_EXPIRATION_SECS = "nimbus.inbox.jar.expiration.secs";
114
-
115
- /**
116
- * How long before a supervisor can go without heartbeating before nimbus considers it dead
117
- * and stops assigning new work to it.
118
- */
119
- public static String NIMBUS_SUPERVISOR_TIMEOUT_SECS = "nimbus.supervisor.timeout.secs";
120
-
121
- /**
122
- * A special timeout used when a task is initially launched. During launch, this is the timeout
123
- * used until the first heartbeat, overriding nimbus.task.timeout.secs.
124
- *
125
- * <p>A separate timeout exists for launch because there can be quite a bit of overhead
126
- * to launching new JVM's and configuring them.</p>
127
- */
128
- public static String NIMBUS_TASK_LAUNCH_SECS = "nimbus.task.launch.secs";
129
-
130
- /**
131
- * Whether or not nimbus should reassign tasks if it detects that a task goes down.
132
- * Defaults to true, and it's not recommended to change this value.
133
- */
134
- public static String NIMBUS_REASSIGN = "nimbus.reassign";
135
-
136
- /**
137
- * During upload/download with the master, how long an upload or download connection is idle
138
- * before nimbus considers it dead and drops the connection.
139
- */
140
- public static String NIMBUS_FILE_COPY_EXPIRATION_SECS = "nimbus.file.copy.expiration.secs";
141
-
142
- /**
143
- * A custom class that implements ITopologyValidator that is run whenever a
144
- * topology is submitted. Can be used to provide business-specific logic for
145
- * whether topologies are allowed to run or not.
146
- */
147
- public static String NIMBUS_TOPOLOGY_VALIDATOR = "nimbus.topology.validator";
148
-
149
-
150
- /**
151
- * Storm UI binds to this port.
152
- */
153
- public static String UI_PORT = "ui.port";
154
-
155
- /**
156
- * Childopts for Storm UI Java process.
157
- */
158
- public static String UI_CHILDOPTS = "ui.childopts";
159
-
160
-
161
- /**
162
- * List of DRPC servers so that the DRPCSpout knows who to talk to.
163
- */
164
- public static String DRPC_SERVERS = "drpc.servers";
165
-
166
- /**
167
- * This port is used by Storm DRPC for receiving DPRC requests from clients.
168
- */
169
- public static String DRPC_PORT = "drpc.port";
170
-
171
- /**
172
- * This port on Storm DRPC is used by DRPC topologies to receive function invocations and send results back.
173
- */
174
- public static String DRPC_INVOCATIONS_PORT = "drpc.invocations.port";
175
-
176
- /**
177
- * The timeout on DRPC requests within the DRPC server. Defaults to 10 minutes. Note that requests can also
178
- * timeout based on the socket timeout on the DRPC client, and separately based on the topology message
179
- * timeout for the topology implementing the DRPC function.
180
- */
181
- public static String DRPC_REQUEST_TIMEOUT_SECS = "drpc.request.timeout.secs";
182
-
183
- /**
184
- * the metadata configed on the supervisor
185
- */
186
- public static String SUPERVISOR_SCHEDULER_META = "supervisor.scheduler.meta";
187
- /**
188
- * A list of ports that can run workers on this supervisor. Each worker uses one port, and
189
- * the supervisor will only run one worker per port. Use this configuration to tune
190
- * how many workers run on each machine.
191
- */
192
- public static String SUPERVISOR_SLOTS_PORTS = "supervisor.slots.ports";
193
-
194
-
195
-
196
- /**
197
- * This parameter is used by the storm-deploy project to configure the
198
- * jvm options for the supervisor daemon.
199
- */
200
- public static String SUPERVISOR_CHILDOPTS = "supervisor.childopts";
201
-
202
-
203
- /**
204
- * How long a worker can go without heartbeating before the supervisor tries to
205
- * restart the worker process.
206
- */
207
- public static String SUPERVISOR_WORKER_TIMEOUT_SECS = "supervisor.worker.timeout.secs";
208
-
209
-
210
- /**
211
- * How long a worker can go without heartbeating during the initial launch before
212
- * the supervisor tries to restart the worker process. This value override
213
- * supervisor.worker.timeout.secs during launch because there is additional
214
- * overhead to starting and configuring the JVM on launch.
215
- */
216
- public static String SUPERVISOR_WORKER_START_TIMEOUT_SECS = "supervisor.worker.start.timeout.secs";
217
-
218
-
219
- /**
220
- * Whether or not the supervisor should launch workers assigned to it. Defaults
221
- * to true -- and you should probably never change this value. This configuration
222
- * is used in the Storm unit tests.
223
- */
224
- public static String SUPERVISOR_ENABLE = "supervisor.enable";
225
-
226
-
227
- /**
228
- * how often the supervisor sends a heartbeat to the master.
229
- */
230
- public static String SUPERVISOR_HEARTBEAT_FREQUENCY_SECS = "supervisor.heartbeat.frequency.secs";
231
-
232
-
233
- /**
234
- * How often the supervisor checks the worker heartbeats to see if any of them
235
- * need to be restarted.
236
- */
237
- public static String SUPERVISOR_MONITOR_FREQUENCY_SECS = "supervisor.monitor.frequency.secs";
238
-
239
- /**
240
- * The jvm opts provided to workers launched by this supervisor. All "%ID%" substrings are replaced
241
- * with an identifier for this worker.
242
- */
243
- public static String WORKER_CHILDOPTS = "worker.childopts";
244
-
245
-
246
- /**
247
- * How often this worker should heartbeat to the supervisor.
248
- */
249
- public static String WORKER_HEARTBEAT_FREQUENCY_SECS = "worker.heartbeat.frequency.secs";
250
-
251
- /**
252
- * How often a task should heartbeat its status to the master.
253
- */
254
- public static String TASK_HEARTBEAT_FREQUENCY_SECS = "task.heartbeat.frequency.secs";
255
-
256
-
257
- /**
258
- * How often a task should sync its connections with other tasks (if a task is
259
- * reassigned, the other tasks sending messages to it need to refresh their connections).
260
- * In general though, when a reassignment happens other tasks will be notified
261
- * almost immediately. This configuration is here just in case that notification doesn't
262
- * come through.
263
- */
264
- public static String TASK_REFRESH_POLL_SECS = "task.refresh.poll.secs";
265
-
266
-
267
-
268
- /**
269
- * True if Storm should timeout messages or not. Defaults to true. This is meant to be used
270
- * in unit tests to prevent tuples from being accidentally timed out during the test.
271
- */
272
- public static String TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS = "topology.enable.message.timeouts";
273
-
274
- /**
275
- * When set to true, Storm will log every message that's emitted.
276
- */
277
- public static String TOPOLOGY_DEBUG = "topology.debug";
278
-
279
-
280
- /**
281
- * Whether or not the master should optimize topologies by running multiple
282
- * tasks in a single thread where appropriate.
283
- */
284
- public static String TOPOLOGY_OPTIMIZE = "topology.optimize";
285
-
286
- /**
287
- * How many processes should be spawned around the cluster to execute this
288
- * topology. Each process will execute some number of tasks as threads within
289
- * them. This parameter should be used in conjunction with the parallelism hints
290
- * on each component in the topology to tune the performance of a topology.
291
- */
292
- public static String TOPOLOGY_WORKERS = "topology.workers";
293
-
294
- /**
295
- * How many instances to create for a spout/bolt. A task runs on a thread with zero or more
296
- * other tasks for the same spout/bolt. The number of tasks for a spout/bolt is always
297
- * the same throughout the lifetime of a topology, but the number of executors (threads) for
298
- * a spout/bolt can change over time. This allows a topology to scale to more or less resources
299
- * without redeploying the topology or violating the constraints of Storm (such as a fields grouping
300
- * guaranteeing that the same value goes to the same task).
301
- */
302
- public static String TOPOLOGY_TASKS = "topology.tasks";
303
-
304
- /**
305
- * How many executors to spawn for ackers.
306
- *
307
- * <p>If this is set to 0, then Storm will immediately ack tuples as soon
308
- * as they come off the spout, effectively disabling reliability.</p>
309
- */
310
- public static String TOPOLOGY_ACKER_EXECUTORS = "topology.acker.executors";
311
-
312
-
313
- /**
314
- * The maximum amount of time given to the topology to fully process a message
315
- * emitted by a spout. If the message is not acked within this time frame, Storm
316
- * will fail the message on the spout. Some spout implementations will then replay
317
- * the message at a later time.
318
- */
319
- public static String TOPOLOGY_MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
320
-
321
- /**
322
- * A list of serialization registrations for Kryo ( http://code.google.com/p/kryo/ ),
323
- * the underlying serialization framework for Storm. A serialization can either
1
+ /**
2
+ This file contains a bunch of Storm settings ripped from the Storm
3
+ Java code. Some of these might need to be wrapped at the Ruby layer by
4
+ this plugin for added configurability. Others may not.
5
+ */
6
+
7
+
8
+ /**
9
+ * Whether or not to use ZeroMQ for messaging in local mode. If this is set
10
+ * to false, then Storm will use a pure-Java messaging system. The purpose
11
+ * of this flag is to make it easy to run Storm in local mode by eliminating
12
+ * the need for native dependencies, which can be difficult to install.
13
+ *
14
+ * Defaults to false.
15
+ */
16
+ public static String STORM_LOCAL_MODE_ZMQ = "storm.local.mode.zmq";
17
+
18
+ /**
19
+ * The root location at which Storm stores data in ZooKeeper.
20
+ */
21
+ public static String STORM_ZOOKEEPER_ROOT = "storm.zookeeper.root";
22
+
23
+ /**
24
+ * The session timeout for clients to ZooKeeper.
25
+ */
26
+ public static String STORM_ZOOKEEPER_SESSION_TIMEOUT = "storm.zookeeper.session.timeout";
27
+
28
+ /**
29
+ * The connection timeout for clients to ZooKeeper.
30
+ */
31
+ public static String STORM_ZOOKEEPER_CONNECTION_TIMEOUT = "storm.zookeeper.connection.timeout";
32
+
33
+
34
+ /**
35
+ * The number of times to retry a Zookeeper operation.
36
+ */
37
+ public static String STORM_ZOOKEEPER_RETRY_TIMES="storm.zookeeper.retry.times";
38
+
39
+ /**
40
+ * The interval between retries of a Zookeeper operation.
41
+ */
42
+ public static String STORM_ZOOKEEPER_RETRY_INTERVAL="storm.zookeeper.retry.interval";
43
+
44
+ /**
45
+ * The Zookeeper authentication scheme to use, e.g. "digest". Defaults to no authentication.
46
+ */
47
+ public static String STORM_ZOOKEEPER_AUTH_SCHEME="storm.zookeeper.auth.scheme";
48
+
49
+ /**
50
+ * A string representing the payload for Zookeeper authentication. It gets serialized using UTF-8 encoding during authentication.
51
+ */
52
+ public static String STORM_ZOOKEEPER_AUTH_PAYLOAD="storm.zookeeper.auth.payload";
53
+
54
+ /**
55
+ * The id assigned to a running topology. The id is the storm name with a unique nonce appended.
56
+ */
57
+ public static String STORM_ID = "storm.id";
58
+
59
+ /**
60
+ * The host that the master server is running on.
61
+ */
62
+ public static String NIMBUS_HOST = "nimbus.host";
63
+
64
+ /**
65
+ * Which port the Thrift interface of Nimbus should run on. Clients should
66
+ * connect to this port to upload jars and submit topologies.
67
+ */
68
+ public static String NIMBUS_THRIFT_PORT = "nimbus.thrift.port";
69
+
70
+
71
+ /**
72
+ * This parameter is used by the storm-deploy project to configure the
73
+ * jvm options for the nimbus daemon.
74
+ */
75
+ public static String NIMBUS_CHILDOPTS = "nimbus.childopts";
76
+
77
+
78
+ /**
79
+ * How long without heartbeating a task can go before nimbus will consider the
80
+ * task dead and reassign it to another location.
81
+ */
82
+ public static String NIMBUS_TASK_TIMEOUT_SECS = "nimbus.task.timeout.secs";
83
+
84
+
85
+ /**
86
+ * How often nimbus should wake up to check heartbeats and do reassignments. Note
87
+ * that if a machine ever goes down Nimbus will immediately wake up and take action.
88
+ * This parameter is for checking for failures when there's no explicit event like that
89
+ * occurring.
90
+ */
91
+ public static String NIMBUS_MONITOR_FREQ_SECS = "nimbus.monitor.freq.secs";
92
+
93
+ /**
94
+ * How often nimbus should wake the cleanup thread to clean the inbox.
95
+ * @see NIMBUS_INBOX_JAR_EXPIRATION_SECS
96
+ */
97
+ public static String NIMBUS_CLEANUP_INBOX_FREQ_SECS = "nimbus.cleanup.inbox.freq.secs";
98
+
99
+ /**
100
+ * The length of time a jar file lives in the inbox before being deleted by the cleanup thread.
101
+ *
102
+ * Probably keep this value greater than or equal to NIMBUS_CLEANUP_INBOX_JAR_EXPIRATION_SECS.
103
+ * Note that the time it takes to delete an inbox jar file is going to be somewhat more than
104
+ * NIMBUS_CLEANUP_INBOX_JAR_EXPIRATION_SECS (depending on how often NIMBUS_CLEANUP_FREQ_SECS
105
+ * is set to).
106
+ * @see NIMBUS_CLEANUP_FREQ_SECS
107
+ */
108
+ public static String NIMBUS_INBOX_JAR_EXPIRATION_SECS = "nimbus.inbox.jar.expiration.secs";
109
+
110
+ /**
111
+ * How long a supervisor can go without heartbeating before nimbus considers it dead
112
+ * and stops assigning new work to it.
113
+ */
114
+ public static String NIMBUS_SUPERVISOR_TIMEOUT_SECS = "nimbus.supervisor.timeout.secs";
115
+
116
+ /**
117
+ * A special timeout used when a task is initially launched. During launch, this is the timeout
118
+ * used until the first heartbeat, overriding nimbus.task.timeout.secs.
119
+ *
120
+ * <p>A separate timeout exists for launch because there can be quite a bit of overhead
121
+ * to launching new JVMs and configuring them.</p>
122
+ */
123
+ public static String NIMBUS_TASK_LAUNCH_SECS = "nimbus.task.launch.secs";
124
+
125
+ /**
126
+ * Whether or not nimbus should reassign tasks if it detects that a task goes down.
127
+ * Defaults to true, and it's not recommended to change this value.
128
+ */
129
+ public static String NIMBUS_REASSIGN = "nimbus.reassign";
130
+
131
+ /**
132
+ * During upload/download with the master, how long an upload or download connection is idle
133
+ * before nimbus considers it dead and drops the connection.
134
+ */
135
+ public static String NIMBUS_FILE_COPY_EXPIRATION_SECS = "nimbus.file.copy.expiration.secs";
136
+
137
+ /**
138
+ * A custom class that implements ITopologyValidator that is run whenever a
139
+ * topology is submitted. Can be used to provide business-specific logic for
140
+ * whether topologies are allowed to run or not.
141
+ */
142
+ public static String NIMBUS_TOPOLOGY_VALIDATOR = "nimbus.topology.validator";
143
+
144
+
145
+ /**
146
+ * Storm UI binds to this port.
147
+ */
148
+ public static String UI_PORT = "ui.port";
149
+
150
+ /**
151
+ * Childopts for Storm UI Java process.
152
+ */
153
+ public static String UI_CHILDOPTS = "ui.childopts";
154
+
155
+
156
+ /**
157
+ * List of DRPC servers so that the DRPCSpout knows who to talk to.
158
+ */
159
+ public static String DRPC_SERVERS = "drpc.servers";
160
+
161
+ /**
162
+ * This port is used by Storm DRPC for receiving DRPC requests from clients.
163
+ */
164
+ public static String DRPC_PORT = "drpc.port";
165
+
166
+ /**
167
+ * This port on Storm DRPC is used by DRPC topologies to receive function invocations and send results back.
168
+ */
169
+ public static String DRPC_INVOCATIONS_PORT = "drpc.invocations.port";
170
+
171
+ /**
172
+ * The timeout on DRPC requests within the DRPC server. Defaults to 10 minutes. Note that requests can also
173
+ * time out based on the socket timeout on the DRPC client, and separately based on the topology message
174
+ * timeout for the topology implementing the DRPC function.
175
+ */
176
+ public static String DRPC_REQUEST_TIMEOUT_SECS = "drpc.request.timeout.secs";
177
+
178
+ /**
179
+ * The metadata configured on the supervisor.
180
+ */
181
+ public static String SUPERVISOR_SCHEDULER_META = "supervisor.scheduler.meta";
182
+ /**
183
+ * A list of ports that can run workers on this supervisor. Each worker uses one port, and
184
+ * the supervisor will only run one worker per port. Use this configuration to tune
185
+ * how many workers run on each machine.
186
+ */
187
+ public static String SUPERVISOR_SLOTS_PORTS = "supervisor.slots.ports";
188
+
189
+
190
+
191
+ /**
192
+ * This parameter is used by the storm-deploy project to configure the
193
+ * jvm options for the supervisor daemon.
194
+ */
195
+ public static String SUPERVISOR_CHILDOPTS = "supervisor.childopts";
196
+
197
+
198
+ /**
199
+ * How long a worker can go without heartbeating before the supervisor tries to
200
+ * restart the worker process.
201
+ */
202
+ public static String SUPERVISOR_WORKER_TIMEOUT_SECS = "supervisor.worker.timeout.secs";
203
+
204
+
205
+ /**
206
+ * How long a worker can go without heartbeating during the initial launch before
207
+ * the supervisor tries to restart the worker process. This value overrides
208
+ * supervisor.worker.timeout.secs during launch because there is additional
209
+ * overhead to starting and configuring the JVM on launch.
210
+ */
211
+ public static String SUPERVISOR_WORKER_START_TIMEOUT_SECS = "supervisor.worker.start.timeout.secs";
212
+
213
+
214
+ /**
215
+ * Whether or not the supervisor should launch workers assigned to it. Defaults
216
+ * to true -- and you should probably never change this value. This configuration
217
+ * is used in the Storm unit tests.
218
+ */
219
+ public static String SUPERVISOR_ENABLE = "supervisor.enable";
220
+
221
+
222
+ /**
223
+ * How often the supervisor sends a heartbeat to the master.
224
+ */
225
+ public static String SUPERVISOR_HEARTBEAT_FREQUENCY_SECS = "supervisor.heartbeat.frequency.secs";
226
+
227
+
228
+ /**
229
+ * How often the supervisor checks the worker heartbeats to see if any of them
230
+ * need to be restarted.
231
+ */
232
+ public static String SUPERVISOR_MONITOR_FREQUENCY_SECS = "supervisor.monitor.frequency.secs";
233
+
234
+ /**
235
+ * The jvm opts provided to workers launched by this supervisor. All "%ID%" substrings are replaced
236
+ * with an identifier for this worker.
237
+ */
238
+ public static String WORKER_CHILDOPTS = "worker.childopts";
239
+
240
+
241
+ /**
242
+ * How often this worker should heartbeat to the supervisor.
243
+ */
244
+ public static String WORKER_HEARTBEAT_FREQUENCY_SECS = "worker.heartbeat.frequency.secs";
245
+
246
+ /**
247
+ * How often a task should heartbeat its status to the master.
248
+ */
249
+ public static String TASK_HEARTBEAT_FREQUENCY_SECS = "task.heartbeat.frequency.secs";
250
+
251
+
252
+ /**
253
+ * How often a task should sync its connections with other tasks (if a task is
254
+ * reassigned, the other tasks sending messages to it need to refresh their connections).
255
+ * In general though, when a reassignment happens other tasks will be notified
256
+ * almost immediately. This configuration is here just in case that notification doesn't
257
+ * come through.
258
+ */
259
+ public static String TASK_REFRESH_POLL_SECS = "task.refresh.poll.secs";
260
+
261
+
262
+
263
+ /**
264
+ * Whether or not Storm should time out messages. Defaults to true. This is meant to be used
265
+ * in unit tests to prevent tuples from being accidentally timed out during the test.
266
+ */
267
+ public static String TOPOLOGY_ENABLE_MESSAGE_TIMEOUTS = "topology.enable.message.timeouts";
268
+
269
+ /**
270
+ * When set to true, Storm will log every message that's emitted.
271
+ */
272
+ public static String TOPOLOGY_DEBUG = "topology.debug";
273
+
274
+
275
+ /**
276
+ * Whether or not the master should optimize topologies by running multiple
277
+ * tasks in a single thread where appropriate.
278
+ */
279
+ public static String TOPOLOGY_OPTIMIZE = "topology.optimize";
280
+
281
+ /**
282
+ * How many processes should be spawned around the cluster to execute this
283
+ * topology. Each process will execute some number of tasks as threads within
284
+ * it. This parameter should be used in conjunction with the parallelism hints
285
+ * on each component in the topology to tune the performance of a topology.
286
+ */
287
+ public static String TOPOLOGY_WORKERS = "topology.workers";
288
+
289
+ /**
290
+ * How many instances to create for a spout/bolt. A task runs on a thread with zero or more
291
+ * other tasks for the same spout/bolt. The number of tasks for a spout/bolt is always
292
+ * the same throughout the lifetime of a topology, but the number of executors (threads) for
293
+ * a spout/bolt can change over time. This allows a topology to scale to more or fewer resources
294
+ * without redeploying the topology or violating the constraints of Storm (such as a fields grouping
295
+ * guaranteeing that the same value goes to the same task).
296
+ */
297
+ public static String TOPOLOGY_TASKS = "topology.tasks";
298
+
299
+ /**
300
+ * How many executors to spawn for ackers.
301
+ *
302
+ * <p>If this is set to 0, then Storm will immediately ack tuples as soon
303
+ * as they come off the spout, effectively disabling reliability.</p>
304
+ */
305
+ public static String TOPOLOGY_ACKER_EXECUTORS = "topology.acker.executors";
306
+
307
+
308
+ /**
309
+ * The maximum amount of time given to the topology to fully process a message
310
+ * emitted by a spout. If the message is not acked within this time frame, Storm
311
+ * will fail the message on the spout. Some spout implementations will then replay
312
+ * the message at a later time.
313
+ */
314
+ public static String TOPOLOGY_MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
315
+
316
+ /**
317
+ * A list of serialization registrations for Kryo ( http://code.google.com/p/kryo/ ),
318
+ * the underlying serialization framework for Storm. A serialization can either
324
319
  * be the name of a class (in which case Kryo will automatically create a serializer for the class
325
320
  * that saves all the object's fields), or an implementation of com.esotericsoftware.kryo.Serializer.
326
321
  *
@@ -670,7 +665,3 @@ module Wukong
670
665
  return ret;
671
666
  }
672
667
  }
673
-
674
-
675
- end
676
- end