redstorm 0.6.6 → 0.7.0.beta1

Files changed (46)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +6 -1
  3. data/README.md +8 -7
  4. data/examples/dsl/exclamation_topology.rb +2 -4
  5. data/examples/dsl/exclamation_topology2.rb +4 -5
  6. data/examples/dsl/hello_world_topology.rb +7 -0
  7. data/examples/dsl/kafka_topology.rb +5 -1
  8. data/examples/dsl/redis_word_count_topology.rb +5 -9
  9. data/examples/dsl/ruby_version_topology.rb +2 -0
  10. data/examples/dsl/word_count_topology.rb +4 -5
  11. data/examples/trident/word_count_query.rb +33 -0
  12. data/examples/trident/word_count_topology.rb +153 -0
  13. data/ivy/storm_dependencies.xml +1 -1
  14. data/ivy/topology_dependencies.xml +3 -2
  15. data/lib/red_storm.rb +5 -2
  16. data/lib/red_storm/configurator.rb +12 -0
  17. data/lib/red_storm/dsl/batch_bolt.rb +34 -0
  18. data/lib/red_storm/dsl/batch_committer_bolt.rb +9 -0
  19. data/lib/red_storm/dsl/batch_spout.rb +53 -0
  20. data/lib/red_storm/dsl/bolt.rb +7 -2
  21. data/lib/red_storm/dsl/output_collector.rb +8 -0
  22. data/lib/red_storm/dsl/spout.rb +3 -1
  23. data/lib/red_storm/dsl/topology.rb +2 -2
  24. data/lib/red_storm/dsl/tuple.rb +2 -0
  25. data/lib/red_storm/topology_launcher.rb +14 -10
  26. data/lib/red_storm/version.rb +1 -1
  27. data/redstorm.gemspec +1 -0
  28. data/src/main/redstorm/storm/jruby/JRubyBatchBolt.java +53 -35
  29. data/src/main/redstorm/storm/jruby/JRubyBatchSpout.java +77 -42
  30. data/src/main/redstorm/storm/jruby/JRubyBolt.java +54 -34
  31. data/src/main/redstorm/storm/jruby/JRubySpout.java +62 -40
  32. data/src/main/redstorm/storm/jruby/JRubyTransactionalBolt.java +57 -35
  33. data/src/main/redstorm/storm/jruby/JRubyTransactionalCommitterBolt.java +6 -17
  34. data/src/main/redstorm/storm/jruby/JRubyTransactionalCommitterSpout.java +14 -26
  35. data/src/main/redstorm/storm/jruby/JRubyTransactionalSpout.java +60 -37
  36. data/src/main/redstorm/storm/jruby/JRubyTridentFunction.java +66 -0
  37. metadata +16 -23
  38. data/lib/red_storm/proxy/batch_bolt.rb +0 -63
  39. data/lib/red_storm/proxy/batch_committer_bolt.rb +0 -52
  40. data/lib/red_storm/proxy/batch_spout.rb +0 -59
  41. data/lib/red_storm/proxy/bolt.rb +0 -63
  42. data/lib/red_storm/proxy/proxy_function.rb +0 -40
  43. data/lib/red_storm/proxy/spout.rb +0 -87
  44. data/lib/red_storm/proxy/transactional_committer_spout.rb +0 -47
  45. data/lib/red_storm/proxy/transactional_spout.rb +0 -46
  46. data/src/main/redstorm/storm/jruby/JRubyProxyFunction.java +0 -51
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d7b22ba82aa7cc0135889207e53253b16c129efb
4
+ data.tar.gz: ae678b9666da95da30f4f2a23a344d0fd44df022
5
+ SHA512:
6
+ metadata.gz: 1bb9e682e7b053f7a2df1c043ab5821df427cf62bfcd26e1281253574f097e88f623bc93db9e73ff1494c865ea2ea3e61e9c7004a0b6c17819dd5f903676f1cc
7
+ data.tar.gz: bb6148070c8de87f56339c517e6ed8eed4d873a35db5212dc98b720d929fdfe9ba960dd51273995bfee0b2d6a473e4399f7c8c0e0abbe54e2a6bf65d0bcd27f0
@@ -97,4 +97,9 @@
97
97
  - [issue #76](https://github.com/colinsurprenant/redstorm/issues/76) - avoid shelling out to storm jar command for cluster submission
98
98
 
99
99
  # 0.6.6, 07-25-2013
100
- - updated example Kafka topology for new dependencies for Storm KafkaSpout
100
+ - updated example Kafka topology for new dependencies for Storm KafkaSpout
101
+
102
+ # 0.7.0.beta1, 03-2014
103
+ - refactored the proxy classes for better performance
104
+ - Storm 0.9.1-incubating and JRuby 1.7.11
105
+ - added Trident example in `examples/trident/`
data/README.md CHANGED
@@ -1,6 +1,9 @@
1
1
  # RedStorm - JRuby on Storm
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/redstorm.png)](http://badge.fury.io/rb/redstorm)
3
4
  [![build status](https://secure.travis-ci.org/colinsurprenant/redstorm.png)](http://travis-ci.org/colinsurprenant/redstorm)
5
+ [![Code Climate](https://codeclimate.com/github/colinsurprenant/redstorm.png)](https://codeclimate.com/github/colinsurprenant/redstorm)
6
+ [![Coverage Status](https://coveralls.io/repos/colinsurprenant/redstorm/badge.png?branch=master)](https://coveralls.io/r/colinsurprenant/redstorm?branch=master)
4
7
 
5
8
  RedStorm provides a Ruby DSL using JRuby integration for the [Storm](https://github.com/nathanmarz/storm/) distributed realtime computation system.
6
9
 
@@ -13,17 +16,15 @@ Check also these related projects:
13
16
 
14
17
  ## Documentation
15
18
 
16
- <!--
17
- ---
18
- This is the documentation for the **current 0.6.6-beta2 version of RedStorm** - the **[latest released Gem is v0.6.5](https://github.com/colinsurprenant/redstorm/wiki/RedStorm-Gem-v0.6.5-Documentation)**
19
-
20
- ---
21
- -->
22
19
  Chances are new versions of RedStorm will introduce changes that will break compatibility or change the development workflow. To prevent out-of-sync documentation, version-specific documentation is kept [in the wiki](https://github.com/colinsurprenant/redstorm/wiki) when necessary.
23
20
 
24
21
  ## Dependencies
25
22
 
26
- Tested on **OSX 10.8.3** and **Ubuntu Linux 12.10** using **Storm 0.9.0-wip16** and **JRuby 1.7.4** and **OpenJDK 7**
23
+ #### Stable 0.6.6
24
+ - Tested on **OSX 10.8.3**, **Ubuntu Linux 12.10** using **Storm 0.9.0-wip16**, **JRuby 1.7.4**, **OpenJDK 7**
25
+
26
+ #### Current 0.7.0.beta1
27
+ - Tested on **OSX 10.9.1**, **Ubuntu Linux 12.10** using **Storm 0.9.1-incubating**, **JRuby 1.7.11**, **OpenJDK 7**
27
28
 
28
29
  ## Installation
29
30
 
@@ -24,10 +24,8 @@ module RedStorm
24
24
  configure do |env|
25
25
  debug false
26
26
  max_task_parallelism 4
27
- if env == :cluster
28
- num_workers 4
29
- max_spout_pending(1000)
30
- end
27
+ num_workers 1
28
+ max_spout_pending 1000
31
29
  end
32
30
 
33
31
  on_submit do |env|
@@ -21,15 +21,14 @@ module RedStorm
21
21
 
22
22
  bolt ExclamationBolt, :id => :ExclamationBolt2, :parallelism => 2 do
23
23
  source ExclamationBolt, :shuffle
24
+ debug true
24
25
  end
25
26
 
26
27
  configure do |env|
27
- debug true
28
+ debug false
28
29
  max_task_parallelism 4
29
- if env == :cluster
30
- num_workers 4
31
- max_spout_pending(1000)
32
- end
30
+ num_workers 1
31
+ max_spout_pending 1000
33
32
  end
34
33
 
35
34
  on_submit do |env|
@@ -19,4 +19,11 @@ class HelloWorldTopology < RedStorm::DSL::Topology
19
19
  bolt HelloWorldBolt do
20
20
  source HelloWorldSpout, :global
21
21
  end
22
+
23
+ configure do
24
+ debug false
25
+ max_task_parallelism 4
26
+ num_workers 1
27
+ max_spout_pending 1000
28
+ end
22
29
  end
@@ -48,10 +48,14 @@ class KafkaTopology < RedStorm::DSL::Topology
48
48
  bolt SplitStringBolt do
49
49
  output_fields :word
50
50
  source KafkaSpout, :shuffle
51
+ debug true
51
52
  end
52
53
 
53
54
  configure do |env|
54
- debug true
55
+ debug false
56
+ max_task_parallelism 4
57
+ num_workers 1
58
+ max_spout_pending 1000
55
59
  end
56
60
 
57
61
  on_submit do |env|
@@ -39,19 +39,15 @@ module RedStorm
39
39
  spout RedisWordSpout
40
40
 
41
41
  bolt WordCountBolt, :parallelism => 2 do
42
+ debug true
42
43
  source RedisWordSpout, :fields => ["word"]
43
44
  end
44
45
 
45
46
  configure do |env|
46
- debug true
47
- case env
48
- when :local
49
- max_task_parallelism 2
50
- when :cluster
51
- max_task_parallelism 2
52
- num_workers 2
53
- max_spout_pending(1000)
54
- end
47
+ debug false
48
+ max_task_parallelism 2
49
+ num_workers 1
50
+ max_spout_pending 1000
55
51
  end
56
52
  end
57
53
  end
@@ -24,6 +24,8 @@ module RedStorm
24
24
  spout VersionSpout
25
25
 
26
26
  configure do |env|
27
+ max_task_parallelism 1
28
+ num_workers 1
27
29
  debug false
28
30
  end
29
31
 
@@ -13,16 +13,15 @@ module RedStorm
13
13
  end
14
14
 
15
15
  bolt WordCountBolt, :parallelism => 2 do
16
+ debug true
16
17
  source SplitSentenceBolt, :fields => ["word"]
17
18
  end
18
19
 
19
20
  configure :word_count do |env|
20
- debug true
21
+ debug false
21
22
  max_task_parallelism 4
22
- if env == :cluster
23
- num_workers 6
24
- max_spout_pending(1000)
25
- end
23
+ num_workers 1
24
+ max_spout_pending 1000
26
25
  end
27
26
 
28
27
  on_submit do |env|
@@ -0,0 +1,33 @@
1
+ require "red_storm"
2
+ require "json"
3
+
4
+ java_import "backtype.storm.utils.DRPCClient"
5
+
6
+ # Usage:
7
+ #
8
+ # This is a DRPC client that will query a Storm cluster trident drpc topology.
9
+ # See the trident word_count_topology.rb for running the drpc topology.
10
+ #
11
+ # Edit the host and port below.
12
+
13
+ module Example
14
+
15
+ # this is not a topology, the redstorm topology_launcher will launch any class with the
16
+ # start method in the correct storm environment
17
+
18
+ class TridentWordCountQuery
19
+ RedStorm::Configuration.topology_class = self
20
+
21
+ def start(env)
22
+ puts("TridentWordCountQuery starting")
23
+
24
+ client = DRPCClient.new("localhost", 3772)
25
+ loop do
26
+ json_result = client.execute("words", "cat the dog jumped")
27
+ puts("DRPC execute=#{JSON.parse(json_result)[0][0]}")
28
+
29
+ sleep(2)
30
+ end
31
+ end
32
+ end
33
+ end
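A note on the `JSON.parse(json_result)[0][0]` expression in the query client above: a minimal plain-Ruby sketch, assuming the DRPC call returns a JSON-encoded list of result tuples with the summed count as the first field of the first tuple. The payload `"[[6]]"` and the helper name `drpc_sum` are hypothetical and only illustrate the indexing; no Storm cluster is involved.

```ruby
require "json"

# Hypothetical helper: extract the first field of the first result tuple.
# Returns nil when the result list is empty instead of raising.
def drpc_sum(json_result)
  tuples = JSON.parse(json_result)
  first_tuple = tuples.first
  first_tuple && first_tuple.first
end

# Assumed payload shape: one tuple holding one field (the sum).
puts("DRPC execute=#{drpc_sum('[[6]]')}")
```

Guarding against an empty result avoids a `NoMethodError` on `nil` if the topology has not produced any state yet.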
@@ -0,0 +1,153 @@
1
+ require 'red_storm'
2
+ require 'json'
3
+
4
+ java_import "backtype.storm.LocalCluster"
5
+ java_import "backtype.storm.LocalDRPC"
6
+ java_import "backtype.storm.StormSubmitter"
7
+ java_import "backtype.storm.generated.StormTopology"
8
+ java_import "backtype.storm.tuple.Fields"
9
+ java_import "backtype.storm.tuple.Values"
10
+ java_import "storm.trident.TridentState"
11
+ java_import "storm.trident.TridentTopology"
12
+ java_import "storm.trident.operation.BaseFunction"
13
+ java_import "storm.trident.operation.TridentCollector"
14
+ java_import "storm.trident.operation.builtin.Count"
15
+ java_import "storm.trident.operation.builtin.FilterNull"
16
+ java_import "storm.trident.operation.builtin.MapGet"
17
+ java_import "storm.trident.operation.builtin.Sum"
18
+ java_import "storm.trident.testing.FixedBatchSpout"
19
+ java_import "storm.trident.testing.MemoryMapState"
20
+ java_import "storm.trident.tuple.TridentTuple"
21
+
22
+ java_import 'redstorm.storm.jruby.JRubyTridentFunction'
23
+
24
+ REQUIRE_PATH = Pathname.new(__FILE__).relative_path_from(Pathname.new(RedStorm::BASE_PATH)).to_s
25
+
26
+ # Usage:
27
+ #
28
+ # Local mode:
29
+ #
30
+ # $ redstorm install
31
+ # $ redstorm examples
32
+ # $ redstorm local examples/trident/word_count_topology.rb
33
+ #
34
+ # Cluster mode:
35
+ #
36
+ # $ redstorm install
37
+ # $ redstorm examples
38
+ # $ redstorm jar examples
39
+ # $ redstorm cluster examples/trident/word_count_topology.rb
40
+ #
41
+ # After submission, wait a bit for topology to startup and launch the drpc query example:
42
+ # Edit word_count_query.rb to set the host/port of your cluster drpc daemon.
43
+ #
44
+ # $ redstorm local examples/trident/word_count_query.rb
45
+
46
+ module Examples
47
+ class TridentSplit
48
+
49
+ def execute(tuple, collector)
50
+ tuple[0].split(" ").each do |word|
51
+ collector.emit(Values.new(word))
52
+ end
53
+ end
54
+
55
+ def prepare(conf, context); end
56
+ def cleanup;end
57
+ end
58
+
59
+ class TridentWordCountTopology
60
+ RedStorm::Configuration.topology_class = self
61
+
62
+ def build_topology(local_drpc)
63
+ spout = FixedBatchSpout.new(
64
+ Fields.new("sentence"), 3,
65
+ Values.new("the cow jumped over the moon"),
66
+ Values.new("the man went to the store and bought some candy"),
67
+ Values.new("four score and seven years ago"),
68
+ Values.new("how many apples can you eat"),
69
+ Values.new("to be or not to be the person")
70
+ )
71
+ spout.cycle = true
72
+
73
+ topology = TridentTopology.new
74
+
75
+ stream = topology.new_stream("spout1", spout)
76
+ .parallelism_hint(3)
77
+ .each(
78
+ Fields.new("sentence"),
79
+ JRubyTridentFunction.new(REQUIRE_PATH, "Examples::TridentSplit"),
80
+ Fields.new("word")
81
+ )
82
+ .groupBy(
83
+ Fields.new("word")
84
+ )
85
+ .persistentAggregate(
86
+ MemoryMapState::Factory.new,
87
+ Count.new,
88
+ Fields.new("count")
89
+ )
90
+ .parallelism_hint(3)
91
+
92
+ # topology.newDRPCStream("words", drpc)
93
+ topology.newDRPCStream("words", local_drpc)
94
+ .each(
95
+ Fields.new("args"),
96
+ JRubyTridentFunction.new(REQUIRE_PATH, "Examples::TridentSplit"),
97
+ Fields.new("word")
98
+ )
99
+ .groupBy(
100
+ Fields.new("word")
101
+ )
102
+ .stateQuery(
103
+ stream,
104
+ Fields.new("word"),
105
+ MapGet.new,
106
+ Fields.new("count")
107
+ )
108
+ .each(
109
+ Fields.new("count"),
110
+ FilterNull.new
111
+ )
112
+ .aggregate(
113
+ Fields.new("count"),
114
+ Sum.new,
115
+ Fields.new("sum")
116
+ )
117
+
118
+ topology.build
119
+ end
120
+
121
+ def display_drpc(client)
122
+ loop do
123
+ sleep(2)
124
+
125
+ json_result = client.execute("words", "cat the dog jumped")
126
+ puts("DRPC execute=#{JSON.parse(json_result)[0][0]}")
127
+ end
128
+ end
129
+
130
+ def start(env)
131
+ conf = Backtype::Config.new
132
+ conf.debug = false
133
+ conf.max_spout_pending = 20
134
+
135
+ case env
136
+ when :local
137
+ local_drpc = LocalDRPC.new
138
+ submitter = LocalCluster.new
139
+ conf.num_workers = 1 # set to 1 in local, see https://issues.apache.org/jira/browse/STORM-113
140
+ when :cluster
141
+ local_drpc = nil
142
+ submitter = StormSubmitter
143
+ conf.put("drpc.servers", ["localhost"])
144
+ conf.num_workers = 3
145
+ end
146
+
147
+ submitter.submit_topology("trident_word_count", conf, build_topology(local_drpc));
148
+
149
+ display_drpc(local_drpc) if local_drpc
150
+ end
151
+ end
152
+
153
+ end
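What the Trident topology above computes can be sketched in plain Ruby, without Storm: split sentences into words, keep a per-word count, then answer a query by summing the counts of the query's words and skipping unknown ones. This is only an illustration under the assumption of a single pass over the five fixed sentences; the real spout cycles, so live counts keep growing. The names `split_words`, `word_counts`, and `query_sum` are hypothetical.

```ruby
SENTENCES = [
  "the cow jumped over the moon",
  "the man went to the store and bought some candy",
  "four score and seven years ago",
  "how many apples can you eat",
  "to be or not to be the person"
].freeze

# Stand-in for TridentSplit: one word per space-separated token.
def split_words(sentence)
  sentence.split(" ")
end

# Stand-in for groupBy("word") plus persistentAggregate with Count.
def word_counts(sentences)
  sentences.each_with_object({}) do |sentence, counts|
    split_words(sentence).each { |word| counts[word] = (counts[word] || 0) + 1 }
  end
end

# Stand-in for the DRPC stream: look up each word, drop misses, sum the rest.
def query_sum(counts, args)
  found = split_words(args).map { |word| counts[word] } # nil for unknown words
  found.compact.inject(0, :+)                            # filter nils, then sum
end

puts(query_sum(word_counts(SENTENCES), "cat the dog jumped")) # prints 6
```

In one pass "the" appears 5 times and "jumped" once, while "cat" and "dog" never appear, hence 6.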
@@ -2,7 +2,7 @@
2
2
  <ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven">
3
3
  <info organisation="redstorm" module="storm-deps"/>
4
4
  <dependencies>
5
- <dependency org="storm" name="storm" rev="0.9.0-wip16" conf="default" transitive="true" />
5
+ <dependency org="org.apache.storm" name="storm-core" rev="0.9.1-incubating" conf="default" transitive="true" />
6
6
  <override org="org.slf4j" module="slf4j-log4j12" rev="1.6.3"/>
7
7
  </dependencies>
8
8
  </ivy-module>
@@ -2,11 +2,12 @@
2
2
  <ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven">
3
3
  <info organisation="redstorm" module="topology-deps"/>
4
4
  <dependencies>
5
- <dependency org="org.jruby" name="jruby-core" rev="1.7.4" conf="default" transitive="true"/>
5
+ <dependency org="org.jruby" name="jruby-core" rev="1.7.11" conf="default" transitive="true"/>
6
+ <dependency org="org.jruby" name="jruby-stdlib" rev="1.7.11" conf="default" transitive="true"/>
6
7
 
7
8
  <!-- explicitly specify jffi to also fetch the native jar. make sure to update jffi version matching jruby-core version -->
8
9
  <!-- this is the only way I found using Ivy to fetch the native jar -->
9
- <dependency org="com.github.jnr" name="jffi" rev="1.2.5" conf="default" transitive="true">
10
+ <dependency org="com.github.jnr" name="jffi" rev="1.2.7" conf="default" transitive="true">
10
11
  <artifact name="jffi" type="jar" />
11
12
  <artifact name="jffi" type="jar" m:classifier="native"/>
12
13
  </dependency>
@@ -1,10 +1,13 @@
1
- require 'rubygems'
2
-
3
1
  require 'red_storm/version'
4
2
  require 'red_storm/environment'
5
3
  require 'red_storm/configuration'
4
+ require 'red_storm/configurator'
6
5
  require 'red_storm/dsl/bolt'
6
+ require 'red_storm/dsl/batch_bolt'
7
+ require 'red_storm/dsl/batch_committer_bolt'
7
8
  require 'red_storm/dsl/spout'
9
+ require 'red_storm/dsl/batch_spout'
8
10
  require 'red_storm/dsl/topology'
9
11
  require 'red_storm/dsl/drpc_topology'
10
12
  require 'red_storm/dsl/tuple'
13
+ require 'red_storm/dsl/output_collector'
@@ -1,3 +1,15 @@
1
+ # This hack gets rid of the "Use RbConfig instead of obsolete and deprecated Config"
2
+ # deprecation warning that is triggered by "java_import 'backtype.storm.Config'".
3
+ begin
4
+ Object.send :remove_const, :Config
5
+ Config = RbConfig
6
+ rescue NameError
7
+ end
8
+
9
+ module Backtype
10
+ java_import 'backtype.storm.Config'
11
+ end
12
+
1
13
  module RedStorm
2
14
 
3
15
  class Configurator
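The `remove_const` guard at the top of this file relies on `remove_const` raising `NameError` when the constant does not exist, in which case the reassignment is skipped. A minimal plain-Ruby sketch of that behavior follows; `Sandbox` and `replace_const` are hypothetical names used only for illustration, and nothing here touches the real `Config` constant.

```ruby
module Sandbox
  LEGACY = :old
end

# Swap a constant's value without triggering an "already initialized
# constant" warning. Returns false, and defines nothing, when the
# constant was not present to begin with.
def replace_const(mod, name, value)
  mod.send(:remove_const, name) # remove_const is private, hence send
  mod.const_set(name, value)
  true
rescue NameError
  false
end

puts(replace_const(Sandbox, :LEGACY, :new))
puts(Sandbox::LEGACY.inspect)
puts(replace_const(Sandbox, :MISSING, 1))
```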
@@ -0,0 +1,34 @@
1
+ module RedStorm
2
+ module DSL
3
+
4
+ class BatchBolt < Bolt
5
+ attr_reader :id
6
+
7
+ def self.java_proxy; "Java::RedstormStormJruby::JRubyBatchBolt"; end
8
+
9
+ def self.on_finish_batch(method_name = nil, &on_finish_batch_block)
10
+ body = block_given? ? on_finish_batch_block : lambda {self.send((method_name || :on_finish_batch).to_sym)}
11
+ define_method(:on_finish_batch, body)
12
+ end
13
+
14
+ def prepare(config, context, collector, id)
15
+ @collector = collector
16
+ @context = context
17
+ @config = config
18
+ @id = id
19
+
20
+ on_init
21
+ end
22
+
23
+ def finish_batch
24
+ on_finish_batch
25
+ end
26
+
27
+ private
28
+
29
+ # default noop optional dsl callbacks
30
+ def on_finish_batch; end
31
+
32
+ end
33
+ end
34
+ end
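The `on_finish_batch` class macro above accepts either a block or a method name and turns it into an instance method. A self-contained sketch of that pattern in plain Ruby, with no Storm or JRuby dependency, is shown below. `MiniBatchBolt`, `BlockBolt`, `NamedBolt`, and `DefaultBolt` are hypothetical stand-ins, and unlike the code in the diff this sketch raises when neither a block nor a name is given.

```ruby
class MiniBatchBolt
  # Class-level macro: register the callback as a block or by method name.
  def self.on_finish_batch(method_name = nil, &block)
    raise ArgumentError, "block or method name required" if block.nil? && method_name.nil?
    body = block || lambda { send(method_name.to_sym) }
    define_method(:on_finish_batch, body) # body runs with self = the instance
  end

  attr_reader :log

  def initialize
    @log = []
  end

  # Entry point a framework would call; delegates to the DSL callback.
  def finish_batch
    on_finish_batch
  end

  private

  # Default no-op callback, used when a subclass registers nothing.
  def on_finish_batch; end
end

class BlockBolt < MiniBatchBolt
  on_finish_batch { @log << :block_called }
end

class NamedBolt < MiniBatchBolt
  on_finish_batch :flush

  def flush
    @log << :flush_called
  end
end

class DefaultBolt < MiniBatchBolt; end

bolt = BlockBolt.new
bolt.finish_batch
puts(bolt.log.inspect)
```

Because `define_method` evaluates the block in the instance's context, the block can read and write instance variables such as `@log` directly.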