redstorm 0.1.0

data/CHANGELOG.md ADDED
File without changes
data/LICENSE.md ADDED
@@ -0,0 +1,13 @@
1
+ # License
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,116 @@
1
+ # RedStorm v0.1.0 - JRuby on Storm
2
+
3
+ RedStorm provides the JRuby integration for the [Storm][storm] distributed realtime computation system.
4
+
5
+ ## disclaimer/limitations
6
+
7
+ The current Ruby interface is **very** similar to the Java interface. A more idiomatic Ruby interface will be added as I better understand the various usage patterns.
8
+
9
+ ## dependencies
10
+
11
+ This has been tested on OS X 10.6.8 and Linux 10.04 using Storm 0.5.4 and JRuby 1.6.5.
12
+
13
+ ## installation
14
+ ``` sh
15
+ $ gem install redstorm
16
+ ```
17
+
18
+ ## usage
19
+
20
+ The currently supported usage pattern is to start your new Storm project in an empty directory, install the RedStorm gem and follow the steps below. There are no layout constraints for your project. The `target/` directory will be created by RedStorm in the root of your project.
21
+
22
+ ### initial setup
23
+
24
+ Install RedStorm dependencies; from your project root directory execute:
25
+
26
+ ``` sh
27
+ $ redstorm install
28
+ ```
29
+
30
+ The `install` command will install all Java jar dependencies in `target/dependency` using [ruby-maven][ruby-maven], then generate and compile the Java bindings in `target/classes`.
31
+
32
+ ### run in local mode
33
+
34
+ Create a topology class that implements the `start` method. The *underscore* file name `topology_class_file_name.rb` **MUST** correspond to the *CamelCase* name of the class it defines, here `TopologyClassFileName`.
35
+
36
+ ``` sh
37
+ $ redstorm topology_class_file_name.rb
38
+ ```
39
+
40
+ **See examples below** to run examples in local mode or on a production cluster.
41
+
42
+ ### run on production cluster
43
+
44
+ - generate `target/cluster-topology.jar`. This jar file will include everything in your project directory plus the required dependencies from the `target/` directory:
45
+
46
+ ``` sh
47
+ $ redstorm jar
48
+ ```
49
+
50
+ - submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm `bin/` directory in your path:
51
+
52
+ ``` sh
53
+ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher topology_class_file_name.rb
54
+ ```
55
+
56
+ Basically you must follow the [Storm instructions](https://github.com/nathanmarz/storm/wiki) to [set up a production cluster](https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster) and [submit your topology to the cluster](https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster).
57
+
58
+
59
+ ## examples
60
+
61
+ Install the example files into `examples/`:
62
+
63
+ ``` sh
64
+ $ redstorm examples
65
+ ```
66
+
67
+ ### local mode
68
+
69
+ ``` sh
70
+ $ redstorm examples/local_exclamation_topology.rb
71
+ $ redstorm examples/local_exclamation_topology2.rb
72
+ $ redstorm examples/local_word_count_topology.rb
73
+ ```
74
+
75
+ This next example requires the use of a [Redis][redis] server on `localhost:6379`
76
+
77
+ ``` sh
78
+ $ redstorm examples/local_redis_word_count_topology.rb
79
+ ```
80
+
81
+ Using `redis-cli`, push words into the `test` list and watch Storm pick them up
82
+
83
+ ### production cluster
84
+
85
+ The only example compatible with a production cluster is `examples/cluster_word_count_topology.rb`
86
+
87
+ - generate the `target/cluster-topology.jar`
88
+
89
+ ``` sh
90
+ $ redstorm jar
91
+ ```
92
+
93
+ - submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm `bin/` directory in your path:
94
+
95
+ ``` sh
96
+ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher examples/cluster_word_count_topology.rb
97
+ ```
98
+
99
+ Basically you must follow the [Storm instructions](https://github.com/nathanmarz/storm/wiki) to [set up a production cluster](https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster) and [submit your topology to the cluster](https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster).
100
+
101
+
102
+ ## author
103
+ Colin Surprenant, [@colinsurprenant][twitter], [colin.surprenant@needium.com][needium], [colin.surprenant@gmail.com][gmail], [http://github.com/colinsurprenant][github]
104
+
105
+ ## license
106
+ Apache License, Version 2.0. See the LICENSE.md file.
107
+
108
+ [needium]: colin.surprenant@needium.com
109
+ [gmail]: colin.surprenant@gmail.com
110
+ [twitter]: http://twitter.com/colinsurprenant
111
+ [github]: http://github.com/colinsurprenant
112
+ [rvm]: http://beginrescueend.com/
113
+ [storm]: https://github.com/nathanmarz/storm
114
+ [jruby]: http://jruby.org/
115
+ [ruby-maven]: https://github.com/mkristian/ruby-maven
116
+ [redis]: http://redis.io/
data/Rakefile ADDED
@@ -0,0 +1,127 @@
1
+ require 'ant'
2
+
3
+ begin
4
+ # will work from gem, since lib dir is in gem require_paths
5
+ require 'red_storm'
6
+ rescue LoadError
7
+ # will work within RedStorm dev project
8
+ $:.unshift './lib'
9
+ require 'red_storm'
10
+ end
11
+
12
+ CWD = Dir.pwd
13
+ TARGET_DIR = "#{CWD}/target"
14
+ TARGET_SRC_DIR = "#{TARGET_DIR}/src"
15
+ TARGET_CLASSES_DIR = "#{TARGET_DIR}/classes"
16
+ TARGET_DEPENDENCY_DIR = "#{TARGET_DIR}/dependency"
17
+ TARGET_DEPENDENCY_UNPACKED_DIR = "#{TARGET_DIR}/dependency-unpacked"
18
+ TARGET_CLUSTER_JAR = "#{TARGET_DIR}/cluster-topology.jar"
19
+
20
+ JAVA_SRC_DIR = "#{RedStorm::REDSTORM_HOME}/src/main"
21
+ JRUBY_SRC_DIR = "#{RedStorm::REDSTORM_HOME}/lib/red_storm"
22
+
23
+ SRC_EXAMPLES = "#{RedStorm::REDSTORM_HOME}/examples"
24
+ DST_EXAMPLES = "#{CWD}/examples"
25
+
26
+ task :default => [:clean, :build]
27
+
28
+ task :launch, :class_file do |t, args|
29
+ system("java -cp \"#{TARGET_CLASSES_DIR}:#{TARGET_DEPENDENCY_DIR}/*\" redstorm.TopologyLauncher #{args[:class_file]}")
30
+ end
31
+
32
+ task :clean do
33
+ ant.delete :dir => TARGET_DIR
34
+ end
35
+
36
+ task :clean_jar do
37
+ ant.delete :file => TARGET_CLUSTER_JAR
38
+ end
39
+
40
+ task :setup do
41
+ ant.mkdir :dir => TARGET_DIR
42
+ ant.mkdir :dir => TARGET_CLASSES_DIR
43
+ ant.mkdir :dir => TARGET_SRC_DIR
44
+ ant.path :id => 'classpath' do
45
+ fileset :dir => TARGET_DEPENDENCY_DIR
46
+ fileset :dir => TARGET_CLASSES_DIR
47
+ end
48
+ end
49
+
50
+ task :install => [:deps, :build] do
51
+ puts("\nRedStorm install completed. All dependencies installed in #{TARGET_DIR}")
52
+ end
53
+
54
+ task :unpack do
55
+ system("rmvn dependency:unpack -f #{RedStorm::REDSTORM_HOME}/pom.xml -DoutputDirectory=#{TARGET_DEPENDENCY_UNPACKED_DIR}")
56
+ end
57
+
58
+ task :jar => [:unpack, :clean_jar] do
59
+ ant.jar :destfile => TARGET_CLUSTER_JAR do
60
+ fileset :dir => TARGET_CLASSES_DIR
61
+ fileset :dir => TARGET_DEPENDENCY_UNPACKED_DIR
62
+ fileset :dir => CWD do
63
+ exclude :name => "target/**/*"
64
+ end
65
+ manifest do
66
+ attribute :name => "Main-Class", :value => "redstorm.TopologyLauncher"
67
+ end
68
+ end
69
+ puts("\nRedStorm jar completed. Generated jar file #{TARGET_CLUSTER_JAR}")
70
+ end
71
+
72
+ task :examples do
73
+ if File.identical?(SRC_EXAMPLES, DST_EXAMPLES)
74
+ STDERR.puts("error: cannot copy examples into itself")
75
+ exit(1)
76
+ end
77
+ if File.exist?(DST_EXAMPLES)
78
+ STDERR.puts("error: directory #{DST_EXAMPLES} already exists")
79
+ exit(1)
80
+ end
81
+
82
+ puts("copying examples into #{DST_EXAMPLES}")
83
+ system("mkdir #{DST_EXAMPLES}")
84
+ system("cp -r #{SRC_EXAMPLES}/* #{DST_EXAMPLES}")
85
+ puts("\nRedStorm examples completed. All examples copied in #{DST_EXAMPLES}")
86
+ end
87
+
88
+ task :deps do
89
+ system("rmvn dependency:copy-dependencies -f #{RedStorm::REDSTORM_HOME}/pom.xml -DoutputDirectory=#{TARGET_DEPENDENCY_DIR}")
90
+ end
91
+
92
+ task :build => :setup do
93
+ # compile the JRuby proxy classes to Java
94
+ build_jruby("#{JRUBY_SRC_DIR}/proxy")
95
+
96
+ # compile the generated Java proxy classes
97
+ build_java_dir("#{TARGET_SRC_DIR}")
98
+
99
+ # generate the JRuby topology launcher
100
+ build_jruby("#{JRUBY_SRC_DIR}/topology_launcher.rb")
101
+
102
+ # compile the Java proxy classes (JRubyBolt, JRubySpout)
103
+ build_java_dir("#{JAVA_SRC_DIR}")
104
+
105
+ # compile the generated Java topology launcher
106
+ build_java_dir("#{TARGET_SRC_DIR}")
107
+ end
108
+
109
+ def build_java_dir(source_folder)
110
+ puts("\n--> Compiling Java")
111
+ ant.javac(
112
+ :srcdir => source_folder,
113
+ :destdir => TARGET_CLASSES_DIR,
114
+ :classpathref => 'classpath',
115
+ :source => "1.6",
116
+ :target => "1.6",
117
+ :debug => "yes",
118
+ :includeantruntime => "no",
119
+ :verbose => false,
120
+ :listfiles => true
121
+ )
122
+ end
123
+
124
+ def build_jruby(source_path)
125
+ puts("\n--> Compiling JRuby")
126
+ system("cd #{RedStorm::REDSTORM_HOME}; jrubyc -t #{TARGET_SRC_DIR} --verbose --java -c \"#{TARGET_DEPENDENCY_DIR}/storm-0.5.4.jar\" -c \"#{TARGET_CLASSES_DIR}\" #{source_path}")
127
+ end
data/bin/redstorm ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'rubygems'
4
+
5
+ begin
6
+ # will work from gem, since lib dir is in gem require_paths
7
+ require 'red_storm'
8
+ rescue LoadError
9
+ # will work within RedStorm dev project
10
+ $:.unshift './lib'
11
+ require 'red_storm'
12
+ end
13
+
14
+ RedStorm::Application.new.run(ARGV.dup)
data/examples/cluster_word_count_topology.rb ADDED
@@ -0,0 +1,18 @@
1
+ require 'examples/random_sentence_spout'
2
+ require 'examples/split_sentence_bolt'
3
+ require 'examples/word_count_bolt'
4
+
5
+ class ClusterWordCountTopology
6
+ def start(base_class_path)
7
+ builder = TopologyBuilder.new
8
+ builder.setSpout(1, JRubySpout.new(base_class_path, "RandomSentenceSpout"), 5)
9
+ builder.setBolt(2, JRubyBolt.new(base_class_path, "SplitSentenceBolt"), 4).shuffleGrouping(1)
10
+ builder.setBolt(3, JRubyBolt.new(base_class_path, "WordCountBolt"), 4).fieldsGrouping(2, Fields.new("word"))
11
+
12
+ conf = Config.new
13
+ conf.setDebug(true)
14
+ conf.setNumWorkers(20)
15
+ conf.setMaxSpoutPending(1000)
16
+ StormSubmitter.submitTopology("word-count", conf, builder.createTopology)
17
+ end
18
+ end
data/examples/exclamation_bolt.rb ADDED
@@ -0,0 +1,14 @@
1
+ class ExclamationBolt
2
+ def prepare(conf, context, collector)
3
+ @collector = collector
4
+ end
5
+
6
+ def execute(tuple)
7
+ @collector.emit(tuple, Values.new(tuple.getString(0) + "!!!"))
8
+ @collector.ack(tuple)
9
+ end
10
+
11
+ def declare_output_fields(declarer)
12
+ declarer.declare(Fields.new("word"))
13
+ end
14
+ end
data/examples/local_exclamation_topology.rb ADDED
@@ -0,0 +1,23 @@
1
+ java_import 'backtype.storm.testing.TestWordSpout'
2
+ require 'examples/exclamation_bolt'
3
+
4
+ # this example topology uses the Storm TestWordSpout and our own JRuby ExclamationBolt
5
+
6
+ class LocalExclamationTopology
7
+ def start(base_class_path)
8
+ builder = TopologyBuilder.new
9
+
10
+ builder.setSpout(1, TestWordSpout.new, 10)
11
+ builder.setBolt(2, JRubyBolt.new(base_class_path, "ExclamationBolt"), 3).shuffleGrouping(1)
12
+ builder.setBolt(3, JRubyBolt.new(base_class_path, "ExclamationBolt"), 2).shuffleGrouping(2)
13
+
14
+ conf = Config.new
15
+ conf.setDebug(true)
16
+
17
+ cluster = LocalCluster.new
18
+ cluster.submitTopology("test", conf, builder.createTopology)
19
+ sleep(5)
20
+ cluster.killTopology("test")
21
+ cluster.shutdown
22
+ end
23
+ end
data/examples/local_exclamation_topology2.rb ADDED
@@ -0,0 +1,37 @@
1
+ java_import 'backtype.storm.testing.TestWordSpout'
2
+
3
+ class ExclamationBolt2
4
+ def prepare(conf, context, collector)
5
+ @collector = collector
6
+ end
7
+
8
+ def execute(tuple)
9
+ @collector.emit(tuple, Values.new(tuple.getString(0) + "!!!"))
10
+ @collector.ack(tuple)
11
+ end
12
+
13
+ def declare_output_fields(declarer)
14
+ declarer.declare(Fields.new("word"))
15
+ end
16
+ end
17
+
18
+ # this example topology uses the Storm TestWordSpout and the JRuby ExclamationBolt2 defined above
19
+
20
+ class LocalExclamationTopology2
21
+ def start(base_class_path)
22
+ builder = TopologyBuilder.new
23
+
24
+ builder.setSpout(1, TestWordSpout.new, 10)
25
+ builder.setBolt(2, JRubyBolt.new(base_class_path, "ExclamationBolt2"), 3).shuffleGrouping(1)
26
+ builder.setBolt(3, JRubyBolt.new(base_class_path, "ExclamationBolt2"), 2).shuffleGrouping(2)
27
+
28
+ conf = Config.new
29
+ conf.setDebug(true)
30
+
31
+ cluster = LocalCluster.new
32
+ cluster.submitTopology("test", conf, builder.createTopology)
33
+ sleep(5)
34
+ cluster.killTopology("test")
35
+ cluster.shutdown
36
+ end
37
+ end
data/examples/local_redis_word_count_topology.rb ADDED
@@ -0,0 +1,58 @@
1
+ require 'redis'
2
+ require 'thread'
3
+ require 'examples/word_count_bolt'
4
+
5
+ # RedisWordSpout reads the Redis queue "test" on localhost:6379
6
+ # and emits each word item popped from the queue.
7
+ class RedisWordSpout
8
+ def open(conf, context, collector)
9
+ @collector = collector
10
+ @q = Queue.new
11
+ @redis_reader = detach_redis_reader
12
+ end
13
+
14
+ def next_tuple
15
+ # per doc nextTuple should not block, and sleep a bit when there's no data to process.
16
+ if @q.size > 0
17
+ @collector.emit(Values.new(@q.pop))
18
+ else
19
+ sleep(0.1)
20
+ end
21
+ end
22
+
23
+ def declare_output_fields(declarer)
24
+ declarer.declare(Fields.new("word"))
25
+ end
26
+
27
+ private
28
+
29
+ def detach_redis_reader
30
+ Thread.new do
31
+ Thread.current.abort_on_exception = true
32
+
33
+ redis = Redis.new(:host => "localhost", :port => 6379)
34
+ loop do
35
+ if data = redis.blpop("test", 0)
36
+ @q << data[1]
37
+ end
38
+ end
39
+ end
40
+ end
41
+ end
42
+
43
+ class LocalRedisWordCountTopology
44
+ def start(base_class_path)
45
+ builder = TopologyBuilder.new
46
+ builder.setSpout(1, JRubySpout.new(base_class_path, "RedisWordSpout"), 1)
47
+ builder.setBolt(2, JRubyBolt.new(base_class_path, "WordCountBolt"), 3).fieldsGrouping(1, Fields.new("word"))
48
+
49
+ conf = Config.new
50
+ conf.setDebug(true)
51
+ conf.setMaxTaskParallelism(3)
52
+
53
+ cluster = LocalCluster.new
54
+ cluster.submitTopology("redis-word-count", conf, builder.createTopology)
55
+ sleep(600)
56
+ cluster.shutdown
57
+ end
58
+ end
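The non-blocking polling used by `RedisWordSpout#next_tuple` can be tried without Storm or Redis. What follows is only a minimal sketch in plain Ruby: `QueuePoller`, `next_word` and `wait_for_producer` are hypothetical names that are not part of RedStorm, and a simple producer thread stands in for the Redis reader thread.

```ruby
require 'thread'

# Hypothetical stand-in for RedisWordSpout: a background thread fills a
# Queue, and next_word polls it without ever blocking the caller for long.
class QueuePoller
  def initialize(words)
    @q = Queue.new
    # The producer thread plays the role of the Redis reader thread.
    @producer = Thread.new { words.each { |w| @q << w } }
  end

  # Returns the next word, or nil (after a short sleep) when the queue is empty.
  def next_word
    @q.pop(true) # non-blocking pop: raises ThreadError when the queue is empty
  rescue ThreadError
    sleep(0.01)
    nil
  end

  # Only needed to make this demo deterministic.
  def wait_for_producer
    @producer.join
  end
end

poller = QueuePoller.new(%w[storm jruby])
poller.wait_for_producer
p [poller.next_word, poller.next_word, poller.next_word]
```

Using the non-blocking form of `pop` avoids the small race between checking `@q.size` and popping, although with a single consumer, as in the example spout, either approach behaves the same.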
data/examples/local_word_count_topology.rb ADDED
@@ -0,0 +1,21 @@
1
+ require 'examples/random_sentence_spout'
2
+ require 'examples/split_sentence_bolt'
3
+ require 'examples/word_count_bolt'
4
+
5
+ class LocalWordCountTopology
6
+ def start(base_class_path)
7
+ builder = TopologyBuilder.new
8
+ builder.setSpout(1, JRubySpout.new(base_class_path, "RandomSentenceSpout"), 5)
9
+ builder.setBolt(2, JRubyBolt.new(base_class_path, "SplitSentenceBolt"), 8).shuffleGrouping(1)
10
+ builder.setBolt(3, JRubyBolt.new(base_class_path, "WordCountBolt"), 12).fieldsGrouping(2, Fields.new("word"))
11
+
12
+ conf = Config.new
13
+ conf.setDebug(true)
14
+ conf.setMaxTaskParallelism(3)
15
+
16
+ cluster = LocalCluster.new
17
+ cluster.submitTopology("word-count", conf, builder.createTopology)
18
+ sleep(5)
19
+ cluster.shutdown
20
+ end
21
+ end
data/examples/random_sentence_spout.rb ADDED
@@ -0,0 +1,26 @@
1
+ class RandomSentenceSpout
2
+ attr_reader :is_distributed
3
+
4
+ def initialize
5
+ @is_distributed = true
6
+ @sentences = [
7
+ "the cow jumped over the moon",
8
+ "an apple a day keeps the doctor away",
9
+ "four score and seven years ago",
10
+ "snow white and the seven dwarfs",
11
+ "i am at two with nature"
12
+ ]
13
+ end
14
+
15
+ def open(conf, context, collector)
16
+ @collector = collector
17
+ end
18
+
19
+ def next_tuple
20
+ @collector.emit(Values.new(@sentences[rand(@sentences.length)]))
21
+ end
22
+
23
+ def declare_output_fields(declarer)
24
+ declarer.declare(Fields.new("word"))
25
+ end
26
+ end
data/examples/split_sentence_bolt.rb ADDED
@@ -0,0 +1,13 @@
1
+ class SplitSentenceBolt
2
+ def prepare(conf, context, collector)
3
+ @collector = collector
4
+ end
5
+
6
+ def execute(tuple)
7
+ tuple.getString(0).split(" ").each {|w| @collector.emit(Values.new(w)) }
8
+ end
9
+
10
+ def declare_output_fields(declarer)
11
+ declarer.declare(Fields.new("word"))
12
+ end
13
+ end
data/examples/word_count_bolt.rb ADDED
@@ -0,0 +1,19 @@
1
+ class WordCountBolt
2
+ def initialize
3
+ @counts = Hash.new{|h, k| h[k] = 0}
4
+ end
5
+
6
+ def prepare(conf, context, collector)
7
+ @collector = collector
8
+ end
9
+
10
+ def execute(tuple)
11
+ word = tuple.getString(0)
12
+ @counts[word] += 1
13
+ @collector.emit(Values.new(word, @counts[word]))
14
+ end
15
+
16
+ def declare_output_fields(declarer)
17
+ declarer.declare(Fields.new("word", "count"))
18
+ end
19
+ end
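The splitting and counting done by `SplitSentenceBolt` and `WordCountBolt` can be checked outside of Storm. This is a minimal sketch in plain Ruby, assuming a hypothetical `count_words` helper that is not part of RedStorm; it only illustrates how `Hash.new { |h, k| h[k] = 0 }` keeps a running count per word.

```ruby
# Hypothetical helper, not part of RedStorm: splits a sentence the way
# SplitSentenceBolt does and keeps running counts the way WordCountBolt does.
def count_words(sentence)
  counts = Hash.new { |h, k| h[k] = 0 } # unknown words start at 0
  sentence.split(" ").each { |word| counts[word] += 1 }
  counts
end

counts = count_words("the cow jumped over the moon")
puts counts["the"]
puts counts["moon"]
```

In the real topology each `WordCountBolt` task holds its own `@counts` hash, which is why the examples route tuples to it with `fieldsGrouping` on `"word"`.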
data/lib/red_storm.rb ADDED
@@ -0,0 +1,6 @@
1
+ module RedStorm
2
+ REDSTORM_HOME = File.expand_path(File.dirname(__FILE__) + '/..') unless defined?(REDSTORM_HOME)
3
+ end
4
+
5
+ require 'red_storm/version'
6
+ require 'red_storm/application'
data/lib/red_storm/application.rb ADDED
@@ -0,0 +1,20 @@
1
+ require 'rake'
2
+
3
+ class RedStorm::Application
4
+
5
+ def run(args)
6
+ if args.size == 1 && File.exist?(args.first)
7
+ load("#{RedStorm::REDSTORM_HOME}/Rakefile")
8
+ Rake::Task['launch'].invoke(args.first)
9
+ else
10
+ task = args.shift
11
+ if ["install", "examples", "jar"].include?(task)
12
+ load("#{RedStorm::REDSTORM_HOME}/Rakefile")
13
+ Rake::Task[task].invoke(args)
14
+ else
15
+ puts("\nUsage: redstorm install|examples|jar|topology_class_file_name")
16
+ exit(1)
17
+ end
18
+ end
19
+ end
20
+ end
data/lib/red_storm/proxy/bolt.rb ADDED
@@ -0,0 +1,55 @@
1
+ require 'java'
2
+
3
+ java_import 'backtype.storm.task.OutputCollector'
4
+ java_import 'backtype.storm.task.TopologyContext'
5
+ java_import 'backtype.storm.topology.IRichBolt'
6
+ java_import 'backtype.storm.topology.OutputFieldsDeclarer'
7
+ java_import 'backtype.storm.tuple.Tuple'
8
+ java_import 'backtype.storm.tuple.Fields'
9
+ java_import 'backtype.storm.tuple.Values'
10
+ java_import 'java.util.Map'
11
+
12
+ java_package 'redstorm.proxy'
13
+
14
+ # the Bolt class is a proxy to the real bolt to avoid having to deal with all the
15
+ # Java artifacts when creating a bolt.
16
+ #
17
+ # The real bolt class implementation must define these methods:
18
+ # - prepare(conf, context, collector)
19
+ # - execute(tuple)
20
+ # - declare_output_fields(declarer)
21
+ #
22
+ # and optionally:
23
+ # - cleanup
24
+ #
25
+ class Bolt
26
+ java_implements IRichBolt
27
+
28
+ java_signature 'IRichBolt (String base_class_path, String real_bolt_class_name)'
29
+ def initialize(base_class_path, real_bolt_class_name)
30
+ @real_bolt = Object.module_eval(real_bolt_class_name).new
31
+ rescue NameError
32
+ require base_class_path
33
+ @real_bolt = Object.module_eval(real_bolt_class_name).new
34
+ end
35
+
36
+ java_signature 'void prepare(Map, TopologyContext, OutputCollector)'
37
+ def prepare(conf, context, collector)
38
+ @real_bolt.prepare(conf, context, collector)
39
+ end
40
+
41
+ java_signature 'void execute(Tuple)'
42
+ def execute(tuple)
43
+ @real_bolt.execute(tuple)
44
+ end
45
+
46
+ java_signature 'void cleanup()'
47
+ def cleanup
48
+ @real_bolt.cleanup if @real_bolt.respond_to?(:cleanup)
49
+ end
50
+
51
+ java_signature 'void declareOutputFields(OutputFieldsDeclarer)'
52
+ def declareOutputFields(declarer)
53
+ @real_bolt.declare_output_fields(declarer)
54
+ end
55
+ end
data/lib/red_storm/proxy/spout.rb ADDED
@@ -0,0 +1,73 @@
1
+ require 'java'
2
+
3
+ java_import 'backtype.storm.spout.SpoutOutputCollector'
4
+ java_import 'backtype.storm.task.TopologyContext'
5
+ java_import 'backtype.storm.topology.IRichSpout'
6
+ java_import 'backtype.storm.topology.OutputFieldsDeclarer'
7
+ java_import 'backtype.storm.tuple.Tuple'
8
+ java_import 'backtype.storm.tuple.Fields'
9
+ java_import 'backtype.storm.tuple.Values'
10
+ java_import 'java.util.Map'
11
+
12
+ java_package 'redstorm.proxy'
13
+
14
+ # the Spout class is a proxy to the real spout to avoid having to deal with all the
15
+ # Java artifacts when creating a spout.
16
+ #
17
+ # The real spout class implementation must define these methods:
18
+ # - open(conf, context, collector)
19
+ # - next_tuple
20
+ # - declare_output_fields(declarer)
21
+ #
22
+ # and optionally:
23
+ # - is_distributed (defaults to false when not defined)
24
+ # - ack(msg_id)
25
+ # - fail(msg_id)
26
+ # - close
27
+ #
28
+ class Spout
29
+ java_implements IRichSpout
30
+
31
+ java_signature 'IRichSpout (String base_class_path, String real_spout_class_name)'
32
+ def initialize(base_class_path, real_spout_class_name)
33
+ @real_spout = Object.module_eval(real_spout_class_name).new
34
+ rescue NameError
35
+ require base_class_path
36
+ @real_spout = Object.module_eval(real_spout_class_name).new
37
+ end
38
+
39
+ java_signature 'boolean isDistributed()'
40
+ def isDistributed
41
+ @real_spout.respond_to?(:is_distributed) ? @real_spout.is_distributed : false
42
+ end
43
+
44
+ java_signature 'void open(Map, TopologyContext, SpoutOutputCollector)'
45
+ def open(conf, context, collector)
46
+ @real_spout.open(conf, context, collector)
47
+ end
48
+
49
+ java_signature 'void close()'
50
+ def close
51
+ @real_spout.close if @real_spout.respond_to?(:close)
52
+ end
53
+
54
+ java_signature 'void nextTuple()'
55
+ def nextTuple
56
+ @real_spout.next_tuple
57
+ end
58
+
59
+ java_signature 'void ack(Object)'
60
+ def ack(msg_id)
61
+ @real_spout.ack(msg_id) if @real_spout.respond_to?(:ack)
62
+ end
63
+
64
+ java_signature 'void fail(Object)'
65
+ def fail(msg_id)
66
+ @real_spout.fail(msg_id) if @real_spout.respond_to?(:fail)
67
+ end
68
+
69
+ java_signature 'void declareOutputFields(OutputFieldsDeclarer)'
70
+ def declareOutputFields(declarer)
71
+ @real_spout.declare_output_fields(declarer)
72
+ end
73
+ end
data/lib/red_storm/topology_launcher.rb ADDED
@@ -0,0 +1,49 @@
1
+ require 'java'
2
+ require 'rubygems'
3
+
4
+ begin
5
+ # will work from gem, since lib dir is in gem require_paths
6
+ require 'red_storm/version'
7
+ rescue LoadError
8
+ # will work within RedStorm dev project
9
+ $:.unshift './lib'
10
+ require 'red_storm/version'
11
+ end
12
+
13
+
14
+ java_import 'backtype.storm.Config'
15
+ java_import 'backtype.storm.LocalCluster'
16
+ java_import 'backtype.storm.StormSubmitter'
17
+ java_import 'backtype.storm.topology.TopologyBuilder'
18
+ java_import 'backtype.storm.tuple.Fields'
19
+ java_import 'backtype.storm.tuple.Tuple'
20
+ java_import 'backtype.storm.tuple.Values'
21
+
22
+ java_import 'redstorm.storm.jruby.JRubyBolt'
23
+ java_import 'redstorm.storm.jruby.JRubySpout'
24
+
25
+ java_package 'redstorm'
26
+
27
+ # TopologyLauncher is the application entry point when launching a topology. Basically it will
28
+ # call require on the specified Ruby topology/project class file path and call its start method
29
+ class TopologyLauncher
30
+
31
+ java_signature 'void main(String[])'
32
+ def self.main(args)
33
+ unless args.size > 0
34
+ puts("Usage: redstorm topology_class_file_name")
35
+ exit(1)
36
+ end
37
+ class_path = args[0]
38
+ clazz = camel_case(class_path.split('/').last.split('.').first)
39
+ puts("redstorm v#{RedStorm::VERSION} launching #{clazz}")
40
+ require class_path
41
+ Object.module_eval(clazz).new.start(class_path)
42
+ end
43
+
44
+ private
45
+
46
+ def self.camel_case(s)
47
+ s.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
48
+ end
49
+ end
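The launcher above derives the topology class name from the file name, which is why the underscore file name must match the CamelCase class name. A minimal sketch of that derivation in plain Ruby follows; `camel_case` is copied from `TopologyLauncher`, while `topology_class_name` is a hypothetical helper added here only for illustration.

```ruby
# Same transformation as TopologyLauncher.camel_case, copied here so it can
# run without Storm or JRuby.
def camel_case(s)
  s.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
end

# Hypothetical helper: derive the class name from a topology file path,
# mirroring what TopologyLauncher.main does with its first argument.
def topology_class_name(path)
  camel_case(path.split('/').last.split('.').first)
end

puts topology_class_name("examples/local_word_count_topology.rb")
puts topology_class_name("examples/local_exclamation_topology2.rb")
```

Only the base name of the path is used, so the directory a topology file lives in does not affect the class name that is looked up.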
data/lib/red_storm/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module RedStorm
2
+ VERSION = '0.1.0'
3
+ end
data/pom.xml ADDED
@@ -0,0 +1,69 @@
1
+ <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3
+ <modelVersion>4.0.0</modelVersion>
4
+
5
+ <groupId>redstorm</groupId>
6
+ <artifactId>redstorm</artifactId>
7
+ <version>0.1.0</version>
8
+ <name>RedStorm JRuby on Storm</name>
9
+
10
+ <properties>
11
+ <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
12
+ </properties>
13
+
14
+ <repositories>
15
+ <repository>
16
+ <id>clojars</id>
17
+ <url>http://clojars.org/repo/</url>
18
+ </repository>
19
+
20
+ <repository>
21
+ <id>central</id>
22
+ <url>http://repo1.maven.org/maven2</url>
23
+ </repository>
24
+ </repositories>
25
+
26
+ <dependencies>
27
+ <dependency>
28
+ <groupId>storm</groupId>
29
+ <artifactId>storm</artifactId>
30
+ <version>0.5.4</version>
31
+ </dependency>
32
+
33
+ <dependency>
34
+ <groupId>org.jruby</groupId>
35
+ <artifactId>jruby-complete</artifactId>
36
+ <version>1.6.5</version>
37
+ </dependency>
38
+ </dependencies>
39
+
40
+ <build>
41
+ <plugins>
42
+ <plugin>
43
+ <groupId>org.apache.maven.plugins</groupId>
44
+ <artifactId>maven-dependency-plugin</artifactId>
45
+ <version>2.3</version>
46
+ <configuration>
47
+ <artifactItems>
48
+ <artifactItem>
49
+ <groupId>org.jruby</groupId>
50
+ <artifactId>jruby-complete</artifactId>
51
+ <version>1.6.5</version>
52
+ <type>jar</type>
53
+ <overWrite>false</overWrite>
54
+ </artifactItem>
55
+ </artifactItems>
56
+ </configuration>
57
+ <executions>
58
+ <execution>
59
+ <id>unpack</id>
60
+ <goals>
61
+ <goal>unpack</goal>
62
+ </goals>
63
+ </execution>
64
+ </executions>
65
+ </plugin>
66
+ </plugins>
67
+ </build>
68
+
69
+ </project>
data/src/main/redstorm/storm/jruby/JRubyBolt.java ADDED
@@ -0,0 +1,70 @@
1
+ package redstorm.storm.jruby;
2
+
3
+ import backtype.storm.task.OutputCollector;
4
+ import backtype.storm.task.TopologyContext;
5
+ import backtype.storm.topology.IRichBolt;
6
+ import backtype.storm.topology.OutputFieldsDeclarer;
7
+ import backtype.storm.tuple.Tuple;
8
+ import java.util.Map;
9
+
10
+ /**
11
+ * the JRubyBolt class is a simple proxy class to the actual bolt implementation in JRuby.
12
+ * this proxy is required to bypass the serialization/deserialization process when dispatching
13
+ * the bolts to the workers. JRuby does not yet support serialization from Java
14
+ * (Java serialization call on a JRuby class).
15
+ *
16
+ * Note that the JRuby bolt proxy class is instantiated in the prepare method which is called after
17
+ * deserialization at the worker and in the declareOutputFields method which is called once before
18
+ * serialization at topology creation.
19
+ */
20
+ public class JRubyBolt implements IRichBolt {
21
+ IRichBolt _proxyBolt;
22
+ String _realBoltClassName;
23
+ String _baseClassPath;
24
+ /**
25
+ * create a new JRubyBolt
26
+ *
27
+ * @param baseClassPath the topology/project base JRuby class file path
28
+ * @param realBoltClassName the fully qualified JRuby bolt implementation class name
29
+ */
30
+ public JRubyBolt(String baseClassPath, String realBoltClassName) {
31
+ _baseClassPath = baseClassPath;
32
+ _realBoltClassName = realBoltClassName;
33
+ }
34
+
35
+ @Override
36
+ public void prepare(final Map stormConf, final TopologyContext context, final OutputCollector collector) {
37
+ // create instance of the jruby class here, after deserialization in the workers.
38
+ _proxyBolt = newProxyBolt(_baseClassPath, _realBoltClassName);
39
+ _proxyBolt.prepare(stormConf, context, collector);
40
+ }
41
+
42
+ @Override
43
+ public void execute(Tuple input) {
44
+ _proxyBolt.execute(input);
45
+ }
46
+
47
+ @Override
48
+ public void cleanup() {
49
+ _proxyBolt.cleanup();
50
+ }
51
+
52
+ @Override
53
+ public void declareOutputFields(OutputFieldsDeclarer declarer) {
54
+ // declareOutputFields is executed in the topology creation time, before serialisation.
55
+ // do not set the _proxyBolt instance variable here to avoid JRuby serialization
56
+ // issues. Just create tmp bolt instance to call declareOutputFields.
57
+ IRichBolt bolt = newProxyBolt(_baseClassPath, _realBoltClassName);
58
+ bolt.declareOutputFields(declarer);
59
+ }
60
+
61
+ private static IRichBolt newProxyBolt(String baseClassPath, String realBoltClassName) {
62
+ try {
63
+ redstorm.proxy.Bolt proxy = new redstorm.proxy.Bolt(baseClassPath, realBoltClassName);
64
+ return proxy;
65
+ }
66
+ catch (Exception e) {
67
+ throw new RuntimeException(e);
68
+ }
69
+ }
70
+ }
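The idea behind `JRubyBolt`, keeping only strings in the serialized object and creating the real bolt after deserialization, can be illustrated in plain Ruby. This is a sketch under stated assumptions, not the RedStorm implementation: `LazyBoltProxy` and `UpcaseBolt` are hypothetical classes, the collector is a plain array, and `Marshal` stands in for the Java serialization that Storm performs when shipping a bolt to a worker.

```ruby
# Hypothetical bolt implementation standing in for a user-defined JRuby bolt.
class UpcaseBolt
  def prepare(collector)
    @collector = collector
  end

  def execute(word)
    @collector << word.upcase
  end
end

# Hypothetical proxy: holds only the class name (a String), so the proxy
# itself stays trivially serializable until prepare is called.
class LazyBoltProxy
  def initialize(real_bolt_class_name)
    @real_bolt_class_name = real_bolt_class_name
  end

  def prepare(collector)
    # Instantiate the real bolt late, i.e. after "deserialization".
    @real_bolt = Object.const_get(@real_bolt_class_name).new
    @real_bolt.prepare(collector)
  end

  def execute(word)
    @real_bolt.execute(word)
  end

  def cleanup
    # cleanup is optional on the real bolt, as in redstorm.proxy.Bolt.
    @real_bolt.cleanup if @real_bolt.respond_to?(:cleanup)
  end
end

# Marshal stands in for Java serialization when shipping the bolt to a worker.
shipped = Marshal.load(Marshal.dump(LazyBoltProxy.new("UpcaseBolt")))
out = []
shipped.prepare(out)
shipped.execute("storm")
shipped.cleanup
p out
```

Because the proxy carries nothing but a class name until `prepare` runs, the serialization step never has to handle an instance of the user's bolt class.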
data/src/main/redstorm/storm/jruby/JRubySpout.java ADDED
@@ -0,0 +1,90 @@
1
+ package redstorm.storm.jruby;
2
+
3
+ import backtype.storm.spout.SpoutOutputCollector;
4
+ import backtype.storm.task.TopologyContext;
5
+ import backtype.storm.topology.IRichSpout;
6
+ import backtype.storm.topology.OutputFieldsDeclarer;
7
+ import backtype.storm.tuple.Tuple;
8
+ import java.util.Map;
9
+
10
+ /**
11
+ * the JRubySpout class is a simple proxy class to the actual spout implementation in JRuby.
12
+ * this proxy is required to bypass the serialization/deserialization process when dispatching
13
+ * the spout to the workers. JRuby does not yet support serialization from Java
14
+ * (Java serialization call on a JRuby class).
15
+ *
16
+ * Note that the JRuby spout proxy class is instantiated in the open method which is called after
17
+ * deserialization at the worker and in both the declareOutputFields and isDistributed methods which
18
+ * are called once before serialization at topology creation.
19
+ */
20
+ public class JRubySpout implements IRichSpout {
21
+ IRichSpout _proxySpout;
22
+ String _realSpoutClassName;
23
+ String _baseClassPath;
24
+
25
+ /**
26
+ * create a new JRubySpout
27
+ *
28
+ * @param baseClassPath the topology/project base JRuby class file path
29
+ * @param realSpoutClassName the fully qualified JRuby spout implementation class name
30
+ */
31
+ public JRubySpout(String baseClassPath, String realSpoutClassName) {
32
+ _baseClassPath = baseClassPath;
33
+ _realSpoutClassName = realSpoutClassName;
34
+ }
35
+
36
+ @Override
37
+ public boolean isDistributed() {
38
+ // isDistributed is executed in the topology creation time before serialisation.
39
+ // do not set the _proxySpout instance variable here to avoid JRuby serialization
40
+ // issues. Just create tmp spout instance to call isDistributed.
41
+ IRichSpout spout = newProxySpout(_baseClassPath, _realSpoutClassName);
42
+ return spout.isDistributed();
43
+ }
44
+
45
+ @Override
46
+ public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
47
+ // create instance of the jruby proxy class here, after deserialization in the workers.
48
+ _proxySpout = newProxySpout(_baseClassPath, _realSpoutClassName);
49
+ _proxySpout.open(conf, context, collector);
50
+ }
51
+
52
+ @Override
53
+ public void close() {
54
+ _proxySpout.close();
55
+ }
56
+
57
+ @Override
58
+ public void nextTuple() {
59
+ _proxySpout.nextTuple();
60
+ }
61
+
62
+ @Override
63
+ public void ack(Object msgId) {
64
+ _proxySpout.ack(msgId);
65
+ }
66
+
67
+ @Override
68
+ public void fail(Object msgId) {
69
+ _proxySpout.fail(msgId);
70
+ }
71
+
72
+ @Override
73
+ public void declareOutputFields(OutputFieldsDeclarer declarer) {
74
+ // declareOutputFields is executed in the topology creation time before serialisation.
75
+ // do not set the _proxySpout instance variable here to avoid JRuby serialization
76
+ // issues. Just create tmp spout instance to call declareOutputFields.
77
+ IRichSpout spout = newProxySpout(_baseClassPath, _realSpoutClassName);
78
+ spout.declareOutputFields(declarer);
79
+ }
80
+
81
+ private static IRichSpout newProxySpout(String baseClassPath, String realSpoutClassName) {
82
+ try {
83
+ redstorm.proxy.Spout proxy = new redstorm.proxy.Spout(baseClassPath, realSpoutClassName);
84
+ return proxy;
85
+ }
86
+ catch (Exception e) {
87
+ throw new RuntimeException(e);
88
+ }
89
+ }
90
+ }
metadata ADDED
@@ -0,0 +1,109 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: redstorm
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.1.0
6
+ platform: ruby
7
+ authors:
8
+ - Colin Surprenant
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-11-07 00:00:00 Z
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: rubyforge
17
+ prerelease: false
18
+ requirement: &id001 !ruby/object:Gem::Requirement
19
+ none: false
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: "0"
24
+ type: :development
25
+ version_requirements: *id001
26
+ - !ruby/object:Gem::Dependency
27
+ name: rake
28
+ prerelease: false
29
+ requirement: &id002 !ruby/object:Gem::Requirement
30
+ none: false
31
+ requirements:
32
+ - - ~>
33
+ - !ruby/object:Gem::Version
34
+ version: 0.9.2
35
+ type: :runtime
36
+ version_requirements: *id002
37
+ - !ruby/object:Gem::Dependency
38
+ name: ruby-maven
39
+ prerelease: false
40
+ requirement: &id003 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ version: 3.0.3.0.28.5
46
+ type: :runtime
47
+ version_requirements: *id003
48
+ description: JRuby integration for the Storm distributed realtime computation system
49
+ email:
50
+ - colin.surprenant@gmail.com
51
+ executables:
52
+ - redstorm
53
+ extensions: []
54
+
55
+ extra_rdoc_files: []
56
+
57
+ files:
58
+ - lib/red_storm.rb
59
+ - lib/red_storm/application.rb
60
+ - lib/red_storm/topology_launcher.rb
61
+ - lib/red_storm/version.rb
62
+ - lib/red_storm/proxy/bolt.rb
63
+ - lib/red_storm/proxy/spout.rb
64
+ - examples/cluster_word_count_topology.rb
65
+ - examples/exclamation_bolt.rb
66
+ - examples/local_exclamation_topology.rb
67
+ - examples/local_exclamation_topology2.rb
68
+ - examples/local_redis_word_count_topology.rb
69
+ - examples/local_word_count_topology.rb
70
+ - examples/random_sentence_spout.rb
71
+ - examples/split_sentence_bolt.rb
72
+ - examples/word_count_bolt.rb
73
+ - src/main/redstorm/storm/jruby/JRubyBolt.java
74
+ - src/main/redstorm/storm/jruby/JRubySpout.java
75
+ - bin/redstorm
76
+ - Rakefile
77
+ - pom.xml
78
+ - README.md
79
+ - CHANGELOG.md
80
+ - LICENSE.md
81
+ homepage: https://github.com/colinsurprenant/redstorm
82
+ licenses: []
83
+
84
+ post_install_message:
85
+ rdoc_options: []
86
+
87
+ require_paths:
88
+ - lib
89
+ required_ruby_version: !ruby/object:Gem::Requirement
90
+ none: false
91
+ requirements:
92
+ - - ">="
93
+ - !ruby/object:Gem::Version
94
+ version: "0"
95
+ required_rubygems_version: !ruby/object:Gem::Requirement
96
+ none: false
97
+ requirements:
98
+ - - ">="
99
+ - !ruby/object:Gem::Version
100
+ version: 1.3.0
101
+ requirements: []
102
+
103
+ rubyforge_project: redstorm
104
+ rubygems_version: 1.8.9
105
+ signing_key:
106
+ specification_version: 3
107
+ summary: JRuby on Storm
108
+ test_files: []
109
+