rubberband_flamethrower 0.2.1 → 0.2.2

@@ -1,87 +1,51 @@
1
1
  = Rubberband Flamethrower
2
2
 
3
- Rubberband Flamethrower is a collection of scripts for dealing with faked Elastic Search data.
4
-
5
- This readme is mostly out of date because I am changing this code into a gem but I'm leaving it here and will edit when the gem conversion is completed.
6
-
7
- The main script "inserter.rb" is a script for rapidly inserting test data into Elastic Search. It inserts fake "tweet" type objects into a "twitter" index on a local Elastic Search server at localhost:9200. It runs in an infinite loop until you stop it.
8
-
9
- The script "retriever.rb" also runs in a constant loop, doing a search on the tweets type in the twitter index for all objects within the date range between 2 and 3 minutes ago and reports the number of objects found. This can be used to easily approximate the maximum speed obtainable for inserting to a local Elastic Search index for a given AWS box size.
3
+ Rubberband Flamethrower is a gem for inserting faked data into an Elastic Search server.
10
4
 
11
5
  == Pre-Requisites
12
6
 
13
- This script has been written with ruby 1.9.3 in mind. It requires two gems: httparty and active_support. Httparty is used to send commands to the Elastic Search server. Active Support is used for the JSON library in core_ext.
14
-
15
- == To run the inserter script:
16
-
17
- The script "inserter.rb" will create and store objects with a message, username, and postDate to a local Elastic Search index called "twitter" for the type "tweet". The message is composed of 6 to 16 random words and capped at 140 characters.
7
+ === Elastic Search
18
8
 
19
- cd /path/to/repository/rubberband_flamethrower
20
- ruby inserter.rb
9
+ You should install and have an Elastic Search node running before trying to use this gem
21
10
 
22
- There are several word lists in the "words" folder which come from SCOWL http://wordlist.sourceforge.net/scowl-readme
11
+ To download Elastic Search, unarchive, and start a node:
23
12
 
24
- The code uses the combination of the 10, 20, and 35 lists by default.
13
+ $ curl -k -L -o elasticsearch-0.20.6.tar.gz http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz
14
+ $ tar -zxvf elasticsearch-0.20.6.tar.gz
15
+ $ ./elasticsearch-0.20.6/bin/elasticsearch -f
25
16
 
26
- == To run the retriever:
17
+ === Ruby
27
18
 
28
- The retriever script is set up to run in a continuous loop and report the number of objects inserted into Elastic Search during the specified time period. It defaults to a span of one minute (really one minute and one second) that occurred between two and three minutes ago.
19
+ You do not need a rails project to use this gem though it is easier to use if you do. It has been designed with ruby 1.9.1 and above in mind. The sample method of the Array class is used in the code and was not a part of the 1.8.7 release.
29
20
 
30
- cd /path/to/repository/rubberband_flamethrower
31
- ruby retriever.rb
21
+ == Installation
32
22
 
33
- == Benchmarks
23
+ install the gem manually
34
24
 
35
- Running one AWS Small Instance. I installed and started Elastic Search and then cloned this project onto the box and started the script running in a screen session.
25
+ gem install rubberband_flamethrower
36
26
 
37
- Running only the insert script during the time period being sampled by the retriever with it set to the default one minute (and one second) span.
27
+ == Use
38
28
 
39
- A few samples from the run show that we are inserting anywhere from 6,000 to 13,000 documents in Elastic Search.
29
+ Insert into Elastic Search by using the gem through irb
40
30
 
41
- {"query":{"range":{"postDate":{"from":"20130401T20:50:24","to":"20130401T20:51:24"}}}}
42
- 11325
43
- {"query":{"range":{"postDate":{"from":"20130401T20:50:25","to":"20130401T20:51:25"}}}}
44
- 11317
45
- {"query":{"range":{"postDate":{"from":"20130401T21:08:01","to":"20130401T21:09:01"}}}}
46
- 6365
47
- {"query":{"range":{"postDate":{"from":"20130401T21:08:02","to":"20130401T21:09:02"}}}}
48
- 6328
49
- {"query":{"range":{"postDate":{"from":"20130401T21:12:33","to":"20130401T21:13:33"}}}}
50
- 12689
51
- {"query":{"range":{"postDate":{"from":"20130401T21:12:34","to":"20130401T21:13:34"}}}}
52
- 12684
31
+ require 'rubberband_flamethrower'
32
+ RubberbandFlamethrower.send_batch
53
33
 
54
- Running the insert script and retriever script during the time period being sampled by the retriever for a short period did not seem to greatly affect the number, though running the retriever for a long duration did greatly degrade the performance of the inserts and increased load average on the box.
34
+ The send_batch method can be configured by passing parameters. By default it will insert 5000 documents starting with document ID 1 into an Elastic Search index named "twitter" of type "tweet" into a server node located at http://localhost:9200.
55
35
 
56
- Running the retriever on a sample period of an hour (and one second) instead of one minute (and one second) we still see a fairly large variance in the number being inserted per hour and so far have only seen it increasing in time. This is a very small sample and too brief and investigation to draw any real conclusions yet.
36
+ There are 5 parameters accepted by the send_batch method, all of which have a default value if left blank. Here are the parameters in order with their default values: (how_many=5000, starting_id=1, server_url="http://localhost:9200", index="twitter", type="tweet")
57
37
 
58
- {"query":{"range":{"postDate":{"from":"20130401T20:18:43","to":"20130401T21:18:43"}}}}
59
- 377978
60
- {"query":{"range":{"postDate":{"from":"20130401T20:18:44","to":"20130401T21:18:44"}}}}
61
- 378122
62
- {"query":{"range":{"postDate":{"from":"20130401T20:30:59","to":"20130401T21:30:59"}}}}
63
- 462394
64
- {"query":{"range":{"postDate":{"from":"20130401T20:31:00","to":"20130401T21:31:00"}}}}
65
- 462503
66
- {"query":{"range":{"postDate":{"from":"20130401T20:34:19","to":"20130401T21:34:19"}}}}
67
- 485449
68
- {"query":{"range":{"postDate":{"from":"20130401T20:34:20","to":"20130401T21:34:20"}}}}
69
- 485577
70
- {"query":{"range":{"postDate":{"from":"20130401T20:40:10","to":"20130401T21:40:10"}}}}
71
- 529912
72
- {"query":{"range":{"postDate":{"from":"20130401T20:40:11","to":"20130401T21:40:11"}}}}
73
- 529981
38
+ To Insert 10,000 instead of 5,000:
39
+ RubberbandFlamethrower.send_batch(10000)
74
40
 
41
+ To Insert 5,000 starting with the ID 5001
42
+ RubberbandFlamethrower.send_batch(5000,5001)
75
43
 
76
- == Things to Look Into Or Do/Notes
77
- - Does message length varying account for some of the variance in insert speed? How much?
78
- - What role does the size of the pool of words used to construct the random sentences play in the time it takes to insert into the index?
79
- - What happens when we add more nodes to the Elastic Search cluster?
80
- - As the script runs longer and longer on the limited word set does it become increasingly faster or slower at indexing new content? To put it another way, does having a lot that already uses the same words make it faster or does it slow down because it now matches it up to that much more information.
81
- - How realistic is it to use Elastic Search nodes in the same manner as MySQL master/slave relationships where all selects are done from other nodes and all inserts on a master? Is this just common sense good practice, why didn't I see this mentioned anywhere?
82
- - These scripts need to be run from inside the rubberband_flamethrower directory because the script is using a relative link to the random words files.
83
- - Rescue error -> Errno::ECONNREFUSED -> in inserter.rb and report that the server is not running or not accepting connections, check your configuration
44
+ To Insert 2,000 starting with the ID 1 to a server located at http://es.com:9200
45
+ RubberbandFlamethrower.send_batch(2000,1,"http://es.com:9200")
84
46
 
47
+ To put your documents into an index named "facebook" instead of "twitter" with a type of "message" instead of "tweet"
48
+ RubberbandFlamethrower.send_batch(5000,1,"http://localhost:9200","facebook","message")
85
49
 
86
50
  == Contributing to rubberband_flamethrower
87
51
 
data/Rakefile CHANGED
@@ -56,7 +56,7 @@ require File.dirname(__FILE__) + "/lib/rubberband_flamethrower"
56
56
  namespace :rubberband_flamethrower do
57
57
  desc "Insert fake data into Elastic Search, use arguments to change defaults"
58
58
  task :fire, :how_many, :starting_id, :server_url, :index, :type do |t, args|
59
- args.with_defaults(:how_many => 1000, :starting_id => 1, :server_url => "http://localhost:9200", :index => "twitter", :type => "tweet")
59
+ args.with_defaults(:how_many => 5000, :starting_id => 1, :server_url => "http://localhost:9200", :index => "twitter", :type => "tweet")
60
60
  RubberbandFlamethrower.send_batch(args[:how_many].to_i, args[:starting_id].to_i, args[:server_url], args[:index], args[:type])
61
61
  end
62
62
  end
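
Note on the hunk above: values passed on the rake command line arrive as strings, while values filled in by with_defaults keep the type given in the hash, which is presumably why the task calls .to_i on how_many and starting_id. A minimal sketch of that behavior, assuming the rake library is installed and building a Rake::TaskArguments object by hand rather than running the task itself (the "10000" value is illustrative only):

```ruby
require 'rake'

# Names mirror the arguments declared by the :fire task. Only the first value
# is supplied, as it would be with `rake rubberband_flamethrower:fire[10000]`.
names  = [:how_many, :starting_id, :server_url, :index, :type]
values = ["10000"]

args = Rake::TaskArguments.new(names, values)
args.with_defaults(:how_many => 5000, :starting_id => 1,
                   :server_url => "http://localhost:9200",
                   :index => "twitter", :type => "tweet")

how_many    = args[:how_many].to_i     # supplied value, a String before .to_i
starting_id = args[:starting_id].to_i  # not supplied, so the Integer default is used
puts "how_many=#{how_many} starting_id=#{starting_id} index=#{args[:index]}"
```

Supplied values win over the defaults, so raising the default from 1000 to 5000 only affects invocations that omit how_many.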
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.2.1
1
+ 0.2.2
@@ -1,5 +1,5 @@
1
1
  module RubberbandFlamethrower
2
- def self.send_batch(how_many, starting_id, server_url, index, type)
2
+ def self.send_batch(how_many=5000, starting_id=1, server_url="http://localhost:9200", index="twitter", type="tweet")
3
3
  require "active_support/core_ext"
4
4
  require 'httparty'
5
5
  require File.dirname(__FILE__)+"/rubberband_flamethrower/data_generator.rb"
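
Note on the hunk above: the new defaults are positional, so an argument can only be overridden if every argument before it is also given. A minimal sketch, using a hypothetical stand-in module (FlamethrowerSketch is not part of the gem) that copies the 0.2.2 parameter list but returns its arguments instead of contacting an Elastic Search node:

```ruby
# Hypothetical stand-in: same parameter list as send_batch in 0.2.2, but it
# only reports the values it received and never touches the network.
module FlamethrowerSketch
  def self.send_batch(how_many=5000, starting_id=1, server_url="http://localhost:9200", index="twitter", type="tweet")
    { :how_many => how_many, :starting_id => starting_id,
      :server_url => server_url, :index => index, :type => type }
  end
end

# No arguments: every default applies.
p FlamethrowerSketch.send_batch
# One argument: only how_many changes.
p FlamethrowerSketch.send_batch(10000)
# Changing only the index still requires the three arguments before it.
p FlamethrowerSketch.send_batch(5000, 1, "http://localhost:9200", "facebook")
```

This is why the README example that switches the index and type repeats the default count, ID, and server URL before them.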
@@ -5,7 +5,7 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = "rubberband_flamethrower"
8
- s.version = "0.2.1"
8
+ s.version = "0.2.2"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Michael Orr"]
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rubberband_flamethrower
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.2.2
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -13,7 +13,7 @@ date: 2013-04-04 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: httparty
16
- requirement: &10372660 !ruby/object:Gem::Requirement
16
+ requirement: &21031020 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,10 +21,10 @@ dependencies:
21
21
  version: 0.10.2
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *10372660
24
+ version_requirements: *21031020
25
25
  - !ruby/object:Gem::Dependency
26
26
  name: activesupport
27
- requirement: &10372160 !ruby/object:Gem::Requirement
27
+ requirement: &21030520 !ruby/object:Gem::Requirement
28
28
  none: false
29
29
  requirements:
30
30
  - - ~>
@@ -32,10 +32,10 @@ dependencies:
32
32
  version: 3.2.13
33
33
  type: :runtime
34
34
  prerelease: false
35
- version_requirements: *10372160
35
+ version_requirements: *21030520
36
36
  - !ruby/object:Gem::Dependency
37
37
  name: rdoc
38
- requirement: &10371680 !ruby/object:Gem::Requirement
38
+ requirement: &21030040 !ruby/object:Gem::Requirement
39
39
  none: false
40
40
  requirements:
41
41
  - - ~>
@@ -43,10 +43,10 @@ dependencies:
43
43
  version: '3.12'
44
44
  type: :development
45
45
  prerelease: false
46
- version_requirements: *10371680
46
+ version_requirements: *21030040
47
47
  - !ruby/object:Gem::Dependency
48
48
  name: bundler
49
- requirement: &10371140 !ruby/object:Gem::Requirement
49
+ requirement: &21029540 !ruby/object:Gem::Requirement
50
50
  none: false
51
51
  requirements:
52
52
  - - ~>
@@ -54,10 +54,10 @@ dependencies:
54
54
  version: 1.3.0
55
55
  type: :development
56
56
  prerelease: false
57
- version_requirements: *10371140
57
+ version_requirements: *21029540
58
58
  - !ruby/object:Gem::Dependency
59
59
  name: jeweler
60
- requirement: &10370600 !ruby/object:Gem::Requirement
60
+ requirement: &21029000 !ruby/object:Gem::Requirement
61
61
  none: false
62
62
  requirements:
63
63
  - - ~>
@@ -65,7 +65,7 @@ dependencies:
65
65
  version: 1.8.4
66
66
  type: :development
67
67
  prerelease: false
68
- version_requirements: *10370600
68
+ version_requirements: *21029000
69
69
  description: Use to quickly fill up some indicies in Elastic Search and to retrieve
70
70
  statistics about insertion rates
71
71
  email: michael@cloudspace.com
@@ -115,7 +115,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
115
115
  version: '0'
116
116
  segments:
117
117
  - 0
118
- hash: 2499937689788223217
118
+ hash: 2029033906449151258
119
119
  required_rubygems_version: !ruby/object:Gem::Requirement
120
120
  none: false
121
121
  requirements: