redstorm 0.1.1 → 0.2.0

Files changed (31)
  1. data/CHANGELOG.md +7 -0
  2. data/README.md +363 -32
  3. data/Rakefile +10 -125
  4. data/bin/redstorm +1 -0
  5. data/examples/{cluster_word_count_topology.rb → native/cluster_word_count_topology.rb} +4 -4
  6. data/examples/{exclamation_bolt.rb → native/exclamation_bolt.rb} +0 -0
  7. data/examples/{local_exclamation_topology.rb → native/local_exclamation_topology.rb} +2 -2
  8. data/examples/{local_exclamation_topology2.rb → native/local_exclamation_topology2.rb} +1 -1
  9. data/examples/{local_redis_word_count_topology.rb → native/local_redis_word_count_topology.rb} +2 -2
  10. data/examples/{local_word_count_topology.rb → native/local_word_count_topology.rb} +4 -4
  11. data/examples/{random_sentence_spout.rb → native/random_sentence_spout.rb} +0 -0
  12. data/examples/{split_sentence_bolt.rb → native/split_sentence_bolt.rb} +0 -0
  13. data/examples/{word_count_bolt.rb → native/word_count_bolt.rb} +0 -0
  14. data/examples/simple/exclamation_bolt.rb +6 -0
  15. data/examples/simple/exclamation_topology.rb +36 -0
  16. data/examples/simple/exclamation_topology2.rb +41 -0
  17. data/examples/simple/random_sentence_spout.rb +18 -0
  18. data/examples/simple/redis_word_count_topology.rb +54 -0
  19. data/examples/simple/split_sentence_bolt.rb +29 -0
  20. data/examples/simple/word_count_bolt.rb +15 -0
  21. data/examples/simple/word_count_topology.rb +34 -0
  22. data/lib/red_storm.rb +3 -0
  23. data/lib/red_storm/application.rb +20 -13
  24. data/lib/red_storm/simple_bolt.rb +106 -0
  25. data/lib/red_storm/simple_spout.rb +136 -0
  26. data/lib/red_storm/simple_topology.rb +191 -0
  27. data/lib/red_storm/topology_launcher.rb +10 -7
  28. data/lib/red_storm/version.rb +1 -1
  29. data/lib/tasks/red_storm.rake +151 -0
  30. data/pom.xml +1 -1
  31. metadata +24 -12
data/CHANGELOG.md CHANGED
@@ -3,3 +3,10 @@
 
  # 0.1.1, 11-10-2011
  - issue #1 cannot find redstorm gem when using rbenv
+
+ # 0.2.0, 11-16-2011
+ - issue #2 redstorm examples fails when directory examples already exists
+ - new *simple* DSL
+ - examples using simple DSL
+ - redstorm command usage syntax change
+ - more doc in README
data/README.md CHANGED
@@ -1,25 +1,31 @@
- # RedStorm v0.1.1 - JRuby on Storm
+ # RedStorm v0.2.0 - JRuby on Storm
 
  RedStorm provides the JRuby integration for the [Storm][storm] distributed realtime computation system.
 
- ## disclaimer/limitations
+ ## Changes from 0.1.x
 
- The current Ruby interface is **very** similar to the Java interface. A more idiomatic Ruby interface will be be addded, as I better understand the various usage patterns.
+ - This release introduces the *simple* DSL. Topology, Spout and Bolt classes can inherit from the SimpleTopology, SimpleSpout and SimpleBolt classes, which provide a very clean and concise DSL. See [examples/simple](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple).
+ - Use the same SimpleTopology class for a local development cluster or a remote production cluster.
+ - The `redstorm` command has a new syntax.
 
- ## dependencies
+ ## Dependencies
 
  This has been tested on OSX 10.6.8 and Linux 10.04 using Storm 0.5.4 and JRuby 1.6.5.
 
- ## installation
+ ## Installation
  ``` sh
  $ gem install redstorm
  ```
 
- ## usage
+ ## Usage overview
 
- The currently supported usage pattern is to start your new Storm project in an empty directory, install the RedStorm gem and follow the steps below. There is no layout constrains for your project. The `target/` directory will be created by RedStorm in the root of your project.
+ - create a new empty project directory.
+ - install the [RedStorm gem](http://rubygems.org/gems/redstorm).
+ - create a subdirectory that will contain your sources.
+ - perform the initial setup as described below to install the dependencies in the `target/` subdirectory of your project directory.
+ - run your topology in local mode and/or on a production cluster as described below.
 
- ### initial setup
+ ### Initial setup
 
  Install RedStorm dependencies; from your project root directory execute:
 
@@ -29,80 +35,405 @@ $ redstorm install
 
  The `install` command will install all Java jar dependencies using [ruby-maven][ruby-maven] in `target/dependency` and generate & compile the Java bindings in `target/classes`.
 
- ### run in local mode
+ ***DON'T PANIC***, it's Maven. The first time you run `$ redstorm install`, Maven will take a few minutes to resolve dependencies and will then download and install the dependency jar files.
 
- Create a topology class that implements the `start` method. The *underscore* topology_class_file_name.rb **MUST** correspond to its *CamelCase* class name.
+ ### Run in local mode
+
+ Create a topology class. The *underscore* topology_class_file_name.rb **MUST** correspond to its *CamelCase* class name.
 
  ``` sh
- $ redstorm topology_class_file_name.rb
+ $ redstorm local <path/to/topology_class_file_name.rb>
  ```
 
  **See examples below** to run examples in local mode or on a production cluster.
 
- ### run on production cluster
+ ### Run on production cluster
 
- - generate `target/cluster-topology.jar`. This jar file will include everything in your project directory plus the required dependencies from the `target/` directory:
+ - generate `target/cluster-topology.jar`. This jar file will include your sources directory plus the required dependencies from the `target/` directory:
 
  ``` sh
- $ redstorm jar
+ $ redstorm jar <sources_directory>
  ```
 
- - submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm `bin/` directory in your path:
+ - submit the cluster topology jar file to the cluster. This assumes you have the Storm distribution installed and the Storm `bin/` directory in your path:
 
  ``` sh
- storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher topology_class_file_name.rb
+ $ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster <path/to/topology_class_file_name.rb>
  ```
 
  Basically you must follow the [Storm instructions](https://github.com/nathanmarz/storm/wiki) to [set up a production cluster](https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster) and [submit your topology to the cluster](https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster).
 
+ ## Examples
 
- ## examples
-
- Install the example files into `examples/`:
+ Install the [example files](https://github.com/colinsurprenant/redstorm/tree/master/examples) in your project. The `examples/` directory will be created in your project root directory.
 
  ``` sh
  $ redstorm examples
  ```
 
- ### local mode
+ All examples using the **simple DSL** are located in `examples/simple`. Examples using the standard Java interface are in `examples/native`.
+
+ ### Local mode
 
  ``` sh
- $ redstorm examples/local_exclamation_topology.rb
- $ redstorm examples/local_exclamation_topology2.rb
- $ redstorm examples/local_word_count_topology.rb
+ $ redstorm local examples/simple/exclamation_topology.rb
+ $ redstorm local examples/simple/exclamation_topology2.rb
+ $ redstorm local examples/simple/word_count_topology.rb
  ```
 
- This next example requires the use of a [Redis][redis] server on `localhost:6379`
+ This next example requires the [Redis gem](https://github.com/ezmobius/redis-rb) and a [Redis][redis] server running on `localhost:6379`.
 
  ``` sh
- $ redstorm examples/local_redis_word_count_topology.rb
+ $ redstorm local examples/simple/redis_word_count_topology.rb
  ```
 
  Using `redis-cli`, push words into the `test` list and watch Storm pick them up.
 
- ### production cluster
+ ### Production cluster
 
- The only example compatible with a production cluster is `examples/cluster_word_count_topology.rb`
+ All examples using the **simple DSL** can also run on a production cluster. The only **native** example compatible with a production cluster is the [ClusterWordCountTopology](https://github.com/colinsurprenant/redstorm/tree/master/examples/native/cluster_word_count_topology.rb).
 
- genererate the `target/cluster-topology.jar`
+ generate the `target/cluster-topology.jar` and include the `examples/` directory.
 
  ``` sh
- $ redstorm jar
+ $ redstorm jar examples
  ```
 
  - submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm `bin/` directory in your path:
 
  ``` sh
- storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher examples/cluster_word_count_topology.rb
+ $ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster examples/simple/word_count_topology.rb
  ```
 
  Basically you must follow the [Storm instructions](https://github.com/nathanmarz/storm/wiki) to [set up a production cluster](https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster) and [submit your topology to the cluster](https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster).
 
+ ## DSL usage
+
+ Your project can be written as a single file containing all spout, bolt and topology classes, or each class can live in its own file; your choice. There are [many examples](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple) for the *simple* DSL.
+
+ The DSL uses a **callback metaphor** to attach code to the topology/spout/bolt execution contexts using `on_*` DSL constructs (e.g. `on_submit`, `on_send`, ...). When using `on_*`, you can attach your code in 3 different ways:
+
+ - using a code block
+
+ ```ruby
+ on_receive(:ack => true, :anchor => true) {|tuple| do_something_with(tuple)}
+
+ on_receive :ack => true, :anchor => true do |tuple|
+   do_something_with(tuple)
+ end
+ ```
+
+ - defining the corresponding method
+
+ ```ruby
+ on_receive :ack => true, :anchor => true
+ def on_receive(tuple)
+   do_something_with(tuple)
+ end
+ ```
+
+ - defining an arbitrary method
+
+ ```ruby
+ on_receive :my_method, :ack => true, :anchor => true
+ def my_method(tuple)
+   do_something_with(tuple)
+ end
+ ```
+
+ The [example SplitSentenceBolt](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/split_sentence_bolt.rb) shows the 3 different coding styles.
+
+ ### Topology DSL
+
+ Normally Storm topology components are assigned and referenced using numeric ids. In the SimpleTopology DSL **ids are optional**. By default, the DSL uses the component class name as an implicit symbolic id, and bolt source ids can refer to these implicit ids. The DSL automatically resolves and assigns numeric ids upon topology submission. If two components are of the same class, creating a conflict, their ids can be defined explicitly using a numeric value, a symbol or a string. Numeric values are used as-is at topology submission, while symbols and strings are resolved and assigned a numeric id.
+
+ ```ruby
+ require 'red_storm'
+
+ class MyTopology < RedStorm::SimpleTopology
+
+   spout spout_class, options
+
+   bolt bolt_class, options do
+     source source_id, grouping
+     ...
+   end
+
+   configure topology_name do |env|
+     config_attribute value
+     ...
+   end
+
+   on_submit do |env|
+     ...
+   end
+ end
+ ```
+
+ #### spout statement
+
+ ```ruby
+ spout spout_class, options
+ ```
+
+ - `spout_class` — spout Ruby class
+ - `options`
+   - `:id` — spout explicit id (**default** is spout class name)
+   - `:parallelism` — spout parallelism (**default** is 1)
+
+ #### bolt statement
+
+ ```ruby
+ bolt bolt_class, options do
+   source source_id, grouping
+   ...
+ end
+ ```
+
+ - `bolt_class` — bolt Ruby class
+ - `options`
+   - `:id` — bolt explicit id (**default** is bolt class name)
+   - `:parallelism` — bolt parallelism (**default** is 1)
+ - `source_id` — source id reference. Can be the source class name if unique, or the explicit id if defined
+ - `grouping`
+   - `:fields => ["field", ...]` — fieldsGrouping using fields on the source_id
+   - `:shuffle` — shuffleGrouping on the source_id
+   - `:global` — globalGrouping on the source_id
+   - `:none` — noneGrouping on the source_id
+   - `:all` — allGrouping on the source_id
+   - `:direct` — directGrouping on the source_id
+
+ #### configure statement
+
+ ```ruby
+ configure topology_name do |env|
+   config_attribute value
+   ...
+ end
+ ```
+
+ The `configure` statement is **optional**.
+
+ - `topology_name` — alternate topology name (**default** is topology class name)
+ - `env` — set to `:local` or `:cluster`, so you can set environment-specific configurations
+ - `config_attribute` — the Storm Config attribute name. See the Storm `Config` class for the complete list. The attribute name corresponds to the Java setter method name without the `set` prefix, with the remainder converted from CamelCase to underscore. Ex.: `setMaxTaskParallelism` is `:max_task_parallelism`.
+   - `:debug`
+   - `:max_task_parallelism`
+   - `:num_workers`
+   - `:max_spout_pending`
+   - ...
+
+ #### on_submit statement
+
+ ```ruby
+ on_submit do |env|
+   ...
+ end
+ ```
+
+ The `on_submit` statement is **optional**. Use it to execute code after the topology submission.
+
+ - `env` — set to `:local` or `:cluster`
+
+ For example, you can use `on_submit` to shut down the LocalCluster after some time. The LocalCluster instance is available using the `cluster` method.
+
+ ```ruby
+ on_submit do |env|
+   if env == :local
+     sleep(5)
+     cluster.shutdown
+   end
+ end
+ ```
+
+ #### Examples
+
+ - [ExclamationTopology](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/exclamation_topology.rb)
+ - [ExclamationTopology2](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/exclamation_topology2.rb)
+ - [WordCountTopology](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/word_count_topology.rb)
+ - [RedisWordCountTopology](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/redis_word_count_topology.rb)
+
+ ### Spout DSL
+
+ ```ruby
+ require 'red_storm'
+
+ class MySpout < RedStorm::SimpleSpout
+   set spout_attribute => value
+   ...
+
+   output_fields :field, ...
+
+   on_send options do
+     ...
+   end
+
+   on_init do
+     ...
+   end
+
+   on_close do
+     ...
+   end
+
+   on_ack do |msg_id|
+     ...
+   end
+
+   on_fail do |msg_id|
+     ...
+   end
+ end
+ ```
+
+ #### set statement
+
+ ```ruby
+ set spout_attribute => value
+ ```
+
+ The `set` statement is **optional**. Use it to set spout-specific attributes.
+
+ - `spout_attribute`
+   - `:is_distributed` — set to `true` for a distributed spout (**default** is `false`)
+
+ #### output_fields statement
+
+ ```ruby
+ output_fields :field, ...
+ ```
+
+ Define the output fields for this spout.
+
+ - `:field` — the field name, can be a symbol or a string.
+
+ #### on_send statement
+
+ ```ruby
+ on_send options do
+   ...
+ end
+ ```
+
+ `on_send` relates to the Java spout `nextTuple` method and is called periodically by Storm to allow the spout to output a tuple. When using auto-emit (default), the block return value will be emitted automatically. A single value will be emitted as a single-field tuple. An array of values `[a, b]` will be emitted as a multi-field tuple. Normally a spout [should only output a single tuple per on_send invocation](https://groups.google.com/forum/#!topic/storm-user/SGwih7vPiDE/discussion).
+
+ - `options`
+   - `:emit` — set to `false` to disable auto-emit (**default** is `true`)
+
+ #### on_init statement
+
+ ```ruby
+ on_init do
+   ...
+ end
+ ```
+
+ `on_init` relates to the Java spout `open` method. When `on_init` is called, the `config`, `context` and `collector` are set to return the Java spout config `Map`, `TopologyContext` and `SpoutOutputCollector`.
+
+ #### on_close statement
+
+ ```ruby
+ on_close do
+   ...
+ end
+ ```
+
+ `on_close` relates to the Java spout `close` method.
+
+ #### on_ack statement
+
+ ```ruby
+ on_ack do |msg_id|
+   ...
+ end
+ ```
+
+ `on_ack` relates to the Java spout `ack` method.
+
+ #### on_fail statement
+
+ ```ruby
+ on_fail do |msg_id|
+   ...
+ end
+ ```
+
+ `on_fail` relates to the Java spout `fail` method.
+
+ #### Examples
+
+ - [RandomSentenceSpout](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/random_sentence_spout.rb)
+ - [RedisWordSpout](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/redis_word_count_topology.rb)
+
+ ### Bolt DSL
+
+ ```ruby
+ require 'red_storm'
+
+ class MyBolt < RedStorm::SimpleBolt
+   output_fields :field, ...
+
+   on_receive options do
+     ...
+   end
+
+   on_init do
+     ...
+   end
+
+   on_close do
+     ...
+   end
+ end
+ ```
+
+ #### on_receive statement
+
+ ```ruby
+ on_receive options do
+   ...
+ end
+ ```
+
+ `on_receive` relates to the Java bolt `execute` method and is called by Storm whenever a tuple is received. When using auto-emit, the block return value will be emitted automatically. A single value will be emitted as a single-field tuple. An array of values `[a, b]` will be emitted as one multi-field tuple. An array of arrays `[[a, b], [c, d]]` will be emitted as multiple multi-field tuples. When not using auto-emit, the `unanchored_emit(value, ...)` and `anchored_emit(tuple, value, ...)` methods can be used to emit a single tuple. When using auto-anchor (disabled by default), the emitted tuples will be anchored to the received tuple. When using auto-ack (disabled by default), the received tuple will be acked after emitting the return value. When not using auto-ack, the `ack(tuple)` method can be used to ack the tuple.
+
+ Note that setting auto-ack and auto-anchor is possible **only** when auto-emit is enabled.
+
+ - `options`
+   - `:emit` — set to `false` to disable auto-emit (**default** is `true`)
+   - `:ack` — set to `true` to enable auto-ack (**default** is `false`)
+   - `:anchor` — set to `true` to enable auto-anchor (**default** is `false`)
+
+ #### on_init statement
+
+ ```ruby
+ on_init do
+   ...
+ end
+ ```
+
+ `on_init` relates to the Java bolt `prepare` method. When `on_init` is called, the `config`, `context` and `collector` are set to return the Java bolt config `Map`, `TopologyContext` and `OutputCollector`.
+
+ #### on_close statement
+
+ ```ruby
+ on_close do
+   ...
+ end
+ ```
+
+ `on_close` relates to the Java bolt `cleanup` method.
+
+ #### Examples
+
+ - [ExclamationBolt](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/exclamation_bolt.rb)
+ - [SplitSentenceBolt](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/split_sentence_bolt.rb)
+ - [WordCountBolt](https://github.com/colinsurprenant/redstorm/tree/master/examples/simple/word_count_bolt.rb)
 
- ## author
+ ## Author
  Colin Surprenant, [@colinsurprenant][twitter], [colin.surprenant@needium.com][needium], [colin.surprenant@gmail.com][gmail], [http://github.com/colinsurprenant][github]
 
- ## license
+ ## License
  Apache License, Version 2.0. See the LICENSE.md file.
 
  [needium]: colin.surprenant@needium.com
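The README's local-mode rule says the *underscore* topology file name **MUST** correspond to its *CamelCase* class name. A minimal plain-Ruby sketch of that correspondence follows. It assumes the rule is a simple underscore-to-CamelCase conversion of the file's base name. `expected_class_name` is a hypothetical helper, not part of RedStorm, and the sketch needs neither JRuby nor Storm.

```ruby
# Hypothetical helper (not part of RedStorm): derives the CamelCase class
# name that the README's naming rule expects for a topology file.
def expected_class_name(path)
  # "examples/simple/word_count_topology.rb" -> "word_count_topology"
  base = File.basename(path, ".rb")
  # "word_count_topology" -> "WordCountTopology"
  base.split("_").map(&:capitalize).join
end

puts expected_class_name("examples/simple/word_count_topology.rb")    # WordCountTopology
puts expected_class_name("examples/simple/exclamation_topology2.rb")  # ExclamationTopology2
```

A check like this can catch a mismatched file or class name before running `redstorm local`.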
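The *DSL usage* section lists three ways to attach code with `on_*`: a block, a method with the same name as the statement, or an arbitrary named method. The plain-Ruby imitation below shows how one class-level statement can dispatch to any of the three. It is a hypothetical, stripped-down sketch. `CallbackBase` and the bolt classes are invented for illustration. It is not RedStorm's `SimpleBolt` and does no emitting, anchoring or acking; it only records the options.

```ruby
# Hypothetical imitation of the on_* callback style (NOT RedStorm's SimpleBolt):
# one class-level statement records options and a handler, which can be a
# block, a same-named instance method, or an arbitrary named method.
class CallbackBase
  def self.on_receive(*args, &block)
    # Trailing hash (e.g. :ack => true) is treated as the options.
    @receive_options = args.last.is_a?(Hash) ? args.pop : {}
    # A block wins; otherwise use the given method name, else :on_receive.
    @receive_handler = block || args.first || :on_receive
  end

  class << self
    attr_reader :receive_options, :receive_handler
  end

  # Called once per tuple; dispatches to whichever handler was declared.
  def execute(tuple)
    handler = self.class.receive_handler
    handler.is_a?(Proc) ? instance_exec(tuple, &handler) : send(handler, tuple)
  end
end

# Style 1: code block.
class BlockBolt < CallbackBase
  on_receive(:ack => true) { |tuple| tuple.upcase }
end

# Style 2: instance method named like the statement.
class SameNameBolt < CallbackBase
  on_receive :ack => true
  def on_receive(tuple)
    tuple.reverse
  end
end

# Style 3: arbitrary method name.
class NamedBolt < CallbackBase
  on_receive :shout, :ack => true
  def shout(tuple)
    "#{tuple}!"
  end
end

p BlockBolt.new.execute("storm")     # "STORM"
p SameNameBolt.new.execute("storm")  # "mrots"
p NamedBolt.new.execute("storm")     # "storm!"
```

The class-level `on_receive` and the instance-level `on_receive` live in different namespaces (singleton class versus instance methods), which is why style 2 can reuse the statement's name.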
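The `configure` statement maps each attribute to a Storm `Config` setter: drop the `set` prefix and convert the rest from CamelCase to underscore. The sketch below shows that naming rule in both directions. `config_setter_name` and `config_attribute_name` are hypothetical helpers, not RedStorm's API. `FakeConfig` is a stand-in for Storm's Java `Config` object so the sketch runs without JRuby. Only the naming convention comes from the README; the direct attribute-to-setter forwarding is an assumption.

```ruby
# Hypothetical helpers (not RedStorm's API) for the configure naming rule:
# :max_task_parallelism <-> "setMaxTaskParallelism".
def config_setter_name(attribute)
  "set" + attribute.to_s.split("_").map(&:capitalize).join
end

def config_attribute_name(setter)
  setter.sub(/\Aset/, "").gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym
end

# Stand-in for Storm's Java Config object; records every set* call.
class FakeConfig
  attr_reader :calls

  def initialize
    @calls = {}
  end

  def method_missing(name, *args)
    return super unless name.to_s.start_with?("set")
    @calls[name.to_s] = args.first
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("set") || super
  end
end

# Roughly what a configure block containing `debug true` and `num_workers 4`
# could translate to, assuming attributes are forwarded to setters as-is.
config = FakeConfig.new
{ :debug => true, :num_workers => 4 }.each do |attribute, value|
  config.send(config_setter_name(attribute), value)
end

p config_setter_name(:max_task_parallelism)    # "setMaxTaskParallelism"
p config_attribute_name("setMaxSpoutPending")  # :max_spout_pending
```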
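The auto-emit rules for `on_receive` turn the block's return value into zero or more tuples. A single value becomes a single-field tuple, a flat array becomes one multi-field tuple, and an array of arrays becomes several tuples. Here is a plain-Ruby sketch of that mapping. `tuples_for` is a hypothetical name. Treating `nil` and an empty array as "emit nothing" is an assumption; the README does not state it.

```ruby
# Hypothetical sketch (not RedStorm's code) of how an on_receive return
# value could be turned into tuples under the README's auto-emit rules.
# Each tuple is represented as an Array of field values.
def tuples_for(value)
  # Assumption: nil emits nothing (not stated in the README).
  return [] if value.nil?
  # Single value -> one single-field tuple.
  return [[value]] unless value.is_a?(Array)
  # Array of arrays -> several tuples (assumption: [] also emits nothing).
  return value if value.all? { |v| v.is_a?(Array) }
  # Flat array of values -> one multi-field tuple.
  [value]
end

p tuples_for("storm!")                  # [["storm!"]]
p tuples_for(["storm", 3])              # [["storm", 3]]
p tuples_for([["the", 1], ["cow", 1]])  # [["the", 1], ["cow", 1]]

# A SplitSentenceBolt-style body returning one single-field tuple per word:
p tuples_for("the cow jumped".split.map { |word| [word] })
# [["the"], ["cow"], ["jumped"]]
```

The last case shows why a splitting bolt wraps each word in its own array. Returning the flat word list would, under these rules, produce one multi-field tuple instead of one tuple per word.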
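The implicit id scheme described under *Topology DSL* can be pictured as a small resolution pass. Class names, or explicit symbols and strings, become keys. Numeric ids are kept as-is. Every symbolic key gets the next unused integer. The sketch below illustrates the described behavior and is not RedStorm's code. Several details are assumptions of the sketch: starting at 1, treating `:foo` and `"foo"` as the same id, and raising on duplicates. `ReportBolt` is a hypothetical component.

```ruby
# Hypothetical sketch of SimpleTopology-style id resolution (not RedStorm's code).
# Each component is [class_name, explicit_id]; explicit_id is nil, an Integer,
# a Symbol or a String.
def resolve_ids(components)
  # Numeric ids are used as-is, so reserve them before assigning the others.
  used = components.map { |_, id| id }.grep(Integer)
  next_id = 1
  components.each_with_object({}) do |(klass, id), ids|
    # Implicit id is the class name; an explicit symbol/string replaces it.
    key = id.nil? ? klass.to_s : id.to_s
    raise ArgumentError, "duplicate component id: #{key}" if ids.key?(key)
    if id.is_a?(Integer)
      ids[key] = id
    else
      next_id += 1 while used.include?(next_id)
      ids[key] = next_id
      used << next_id
    end
  end
end

ids = resolve_ids([
  ["RandomSentenceSpout", nil],
  ["SplitSentenceBolt", nil],
  ["WordCountBolt", :counter_a],  # same class twice: explicit ids needed
  ["WordCountBolt", :counter_b],
  ["ReportBolt", 10]              # hypothetical bolt with a numeric id
])
# ids == {"RandomSentenceSpout" => 1, "SplitSentenceBolt" => 2,
#         "counter_a" => 3, "counter_b" => 4, "10" => 10}

# A bolt source referring to the SplitSentenceBolt class name would
# resolve through the same table:
p ids.fetch("SplitSentenceBolt")  # 2
```

Reserving numeric ids up front keeps an explicit numeric id from being handed out again to a symbolic component that appears earlier in the topology.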