couchpopulator 0.1.0

data/README.md ADDED
@@ -0,0 +1,64 @@
+ # CouchPopulator: The Idea
+ The idea behind this tool is to provide a framework for populating your [CouchDB][couchdb] instances with generated documents. It provides a pluggable system that makes it easy to write your own generators. The processes that invoke the generator and manage the insertion into CouchDB, which I call execution engines, are easily exchangeable as well. The default execution engine uses CouchDB's [`_bulk_docs`][bulk_api] API with a configurable chunk size, number of concurrent inserts and number of rounds.
+
+ # Warning
+ This project is in a very early state. I'm sure it has some serious bugs, and its interface and structure for writing your own generators and execution engines will definitely change (maybe significantly). I have only tested it with Ruby 1.8 on OS X 10.6.2.
+
+ **Use with caution!**
+
+ **BUT**: Please feel free to comment, fork or file a ticket with bugs and wishes. You can also drop me a message via GitHub, on Twitter ([@tisba](https://twitter.com/tisba)) or in [#couchdb on freenode](irc://irc.freenode.net/couchdb).
+
+
+ # Why?
+ *"there is tool xy already doing that"* - I don't care (okay, that's not true: I care, and I'm always eager to see how others implement stuff). I know that there are some tools providing dump/load support for CouchDB, but none of them is written in Ruby and none satisfied my needs (e.g. dynamically generating documents). Besides, I wanted to learn how to write such a tool and get more familiar with CouchDB.
+
+
+ # Getting Started
+
+ ## Gem
+
+     sudo gem install couchpopulator
+
+ ## Building the gem yourself
+
+     sudo gem install json trollop
+     git clone git@github.com:tisba/couchpopulator.git
+     cd couchpopulator
+     rake build
+
+ ## Getting help
+
+     couchpopulator --help
+
+ ## Custom Generators
+ Custom generators only need to implement one method. Have a look:
+
+     module Generators
+       class Example
+         class << self
+           def generate(count)
+             # ...heavy generating action goes here...
+             # return array of hashes (documents)
+           end
+         end
+       end
+     end
+
+ `generate(count)` should return an array of documents. Each document should be a hash that will be encoded as JSON. You can include any object type you want. CouchPopulator provides some "special" encoding support for Ruby's `Time` and `Date`. For your own classes you can provide `to_json` and `json_create`, which the [JSON gem][json_gem] uses to serialise and deserialise them properly.
+
+
+ ## Custom Execution Engines
+ Custom execution engines are instantiated with the options hash and need to implement `execute`; the standard engine additionally parses its own command-line options in `command_line_options`. See `executors/standard.rb` for an example.
+
+
+ # TODO
+ - Find out the best strategies for inserting docs into CouchDB and provide execution engines for different approaches
+ - Implement some more features, like dumping options for generated documents or loading dumped JSON docs into CouchDB
+ - Think about a test suite and implement it
+ - Hunt bugs, make it cleaner, make a gem, ...
+
+
+
+ [couchdb]: http://couchdb.apache.org
+ [bulk_api]: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
+ [json_gem]: http://flori.github.com/json/
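Following the generator contract described in the README, a custom generator might look like the minimal sketch below. The class name `Generators::BlogPost`, its fields and the `AUTHORS` list are hypothetical. The values are deterministic so the output is easy to check; a real generator would typically randomise them. The last lines only mimic how the standard executor wraps one chunk in a `{"docs" => [...]}` payload for `_bulk_docs`.

```ruby
require 'json'

# Hypothetical generator following the README's contract: a class-level
# generate(count) that returns an array of hashes, one per document.
module Generators
  class BlogPost
    AUTHORS = %w[alice bob carol]

    class << self
      def generate(count)
        Array.new(count) do |i|
          {
            "type"   => "post",
            "title"  => "Post ##{i + 1}",
            "author" => AUTHORS[i % AUTHORS.length],
            "views"  => i * 10
          }
        end
      end
    end
  end
end

# Wrap one chunk the same way executors/standard.rb does before posting it.
payload = { "docs" => Generators::BlogPost.generate(3) }.to_json
puts payload
```

If this is saved as `generators/blog_post.rb`, the initializer's lookup (`camelize_and_constantize("generators/blog_post")` plus the fallback `require`) should resolve it to `Generators::BlogPost` when the tool is run with `--generator blog_post`.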
data/bin/couchpopulator ADDED
@@ -0,0 +1,4 @@
+ #! /usr/bin/ruby
+ require 'couchpopulator'
+
+ CouchPopulator::Initializer.run
data/executors/standard.rb ADDED
@@ -0,0 +1,76 @@
+ require 'stringio'
+
+ module Executors
+   class Standard
+     def initialize(opts={})
+       @opts = opts.merge(command_line_options)
+     end
+
+     def command_line_options
+       help = StringIO.new
+
+       opts = Trollop.options do
+         version "StandardExecutor v0.1 (c) Sebastian Cohnen, 2009"
+         banner <<-BANNER
+ This is the StandardExecutor
+         BANNER
+         opt :docs_per_chunk, "Number of docs per chunk", :default => 2000
+         opt :concurrent_inserts, "Number of concurrent inserts", :default => 5
+         opt :rounds, "Number of rounds", :default => 2
+         opt :preflight, "Generate the docs, but don't write to couch. Use with ", :default => false
+         opt :help, "Show this message"
+
+         educate(help)
+       end
+
+       if opts[:help]
+         # StringIO#rewind returns 0, not the IO, so print the buffer via #string
+         puts help.string
+         exit
+       else
+         return opts
+       end
+     end
+
+     def execute
+       rounds = @opts[:rounds]
+       docs_per_chunk = @opts[:docs_per_chunk]
+       concurrent_inserts = @opts[:concurrent_inserts]
+       generator = @opts[:generator_klass]
+
+       log = @opts[:logger]
+       log << "CouchPopulator's default execution engine has been started."
+       log << "Using #{generator.to_s} for generating the documents."
+
+       total_docs = docs_per_chunk * concurrent_inserts * rounds
+       log << "Going to insert #{total_docs} generated docs into #{@opts[:couch_url]}"
+       log << "Using #{rounds} rounds of #{concurrent_inserts} concurrent inserts with #{docs_per_chunk} docs each"
+
+       start_time = Time.now
+
+       rounds.times do |round|
+         log << "Starting with round #{round + 1}"
+         concurrent_inserts.times do
+           fork do
+             # generate the payload for _bulk_docs
+             payload = ({"docs" => generator.generate(docs_per_chunk)}).to_json
+
+             if @opts[:generate_only]
+               log << "Generated chunk..."
+               puts payload
+             else
+               CurlAdapter::Invoker.new(@opts[:couch_url]).post(payload)
+             end
+           end
+         end
+         # wait for every child of this round before starting the next one
+         concurrent_inserts.times { Process.wait }
+       end
+
+       end_time = Time.now
+       duration = end_time - start_time
+
+       log << "Execution time: #{duration}s, inserted #{total_docs} docs"
+     end
+   end
+ end
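`Base#populate` only calls `@opts[:executor_klass].new(@opts).execute`, so an execution engine essentially needs a constructor that takes the options hash and an `execute` method. The sketch below relies on that assumption. `Executors::Sequential` is a hypothetical engine that neither forks nor talks to CouchDB. It writes each `_bulk_docs` payload to an IO, which can be handy for checking a generator. The `:out` option and the `Generators::Counter` stub are inventions for this example. Unlike the standard engine, it does not parse command-line options.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for a generator such as generators/example.rb.
module Generators
  class Counter
    def self.generate(count)
      Array.new(count) { |i| { "n" => i } }
    end
  end
end

module Executors
  # Hypothetical engine with the same contract as Executors::Standard
  # (built from the options hash, then #execute) but no fork and no CouchDB:
  # each round's _bulk_docs payload is written as one line to opts[:out].
  class Sequential
    def initialize(opts = {})
      @opts = { :docs_per_chunk => 10, :rounds => 1, :out => $stdout }.merge(opts)
    end

    # Returns the total number of generated documents.
    def execute
      generator = @opts[:generator_klass]
      total = 0
      @opts[:rounds].times do
        docs = generator.generate(@opts[:docs_per_chunk])
        @opts[:out].puts({ "docs" => docs }.to_json)
        total += docs.length
      end
      total
    end
  end
end

out = StringIO.new
n = Executors::Sequential.new(:generator_klass => Generators::Counter,
                              :docs_per_chunk => 3, :rounds => 2, :out => out).execute
puts n   # prints 6
```

Dropping `fork` makes the engine slower but deterministic, which suits debugging a generator more than it suits bulk loading.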
data/generators/example.rb ADDED
@@ -0,0 +1,16 @@
+ module Generators
+   class Example
+     class << self
+       def generate(count)
+         docs = []
+         count.times do
+           docs << {
+             "title" => "Example",
+             "created_at" => Time.now - (rand(7) * 60*60*24)
+           }
+         end
+         docs
+       end
+     end
+   end
+ end
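The example generator puts `Time` objects into its documents, and `lib/couchpopulator.rb` loads the JSON gem's `json/add/core` extensions. With those loaded, a `Time` should serialise to an object tagged with `"json_class": "Time"` rather than to a plain string, and that tagged form is what ends up in CouchDB. The sketch below shows the round trip with a fixed timestamp in place of `Time.now`, so that it is reproducible. It assumes only the standard `json` library.

```ruby
require 'json'
require 'json/add/core'   # Time, Date, Range, ... gain json_class-tagged encodings

created_at = Time.at(1_258_329_600)   # fixed timestamp standing in for Time.now
doc = { "title" => "Example", "created_at" => created_at }

encoded = doc.to_json
# the Time value is written as {"json_class":"Time","s":...,"n":...}
puts encoded

# create_additions lets the parser rebuild tagged objects
decoded = JSON.parse(encoded, :create_additions => true)
puts decoded["created_at"].class   # prints Time
```

Other CouchDB clients and views will see the tagged object, not a date string. If that matters, a generator can store `created_at.to_i` or a formatted string instead.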
data/lib/couchpopulator/base.rb ADDED
@@ -0,0 +1,24 @@
+ module CouchPopulator
+   class Base
+     def initialize(options)
+       @opts = options
+       @opts[:couch_url] = CouchHelper.get_full_couchurl options[:couch] unless options[:couch].nil?
+       @logger = options[:logger]
+     end
+
+     def populate
+       @opts[:logger] ||= @logger
+
+       @opts[:database] ||= database
+       @opts[:executor_klass].new(@opts).execute
+     end
+
+     def log(message)
+       @logger.log(message)
+     end
+
+     def database
+       URI.parse(@opts[:couch_url]).path unless @opts[:couch_url].nil?
+     end
+   end
+ end
data/lib/couchpopulator/couch_helper.rb ADDED
@@ -0,0 +1,28 @@
+ module CouchPopulator
+   # Borrowed from Rails
+   # http://github.com/rails/rails/blob/ea0e41d8fa5a132a2d2771e9785833b7663203ac/activesupport/lib/active_support/inflector.rb#L355
+   class CouchHelper
+     class << self
+       def get_full_couchurl(arg)
+         arg.match(/^https?:\/\//) ? arg : URI.join('http://127.0.0.1:5984/', arg).to_s
+       end
+
+       def couch_available?(couch_url)
+         # TODO this uri-thing is ugly :/
+         tmp = URI.parse(couch_url)
+         `curl --fail --request GET #{tmp.scheme}://#{tmp.host}:#{tmp.port} 2> /dev/null`
+         return $?.exitstatus == 0
+       end
+
+       def database_exists?(db_url)
+         `curl --fail --request GET #{db_url} 2> /dev/null`
+         return $?.exitstatus == 0
+       end
+
+       def create_db(url)
+         # no --write-out here: appending the status code would break JSON.parse
+         !!(JSON.parse(`curl --silent --request PUT #{url}`))["ok"]
+       end
+     end
+   end
+ end
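`CouchHelper` shells out to `curl` and inspects `$?`. If you would rather avoid the external process, a rough equivalent can be written with Ruby's standard `Net::HTTP`. This swaps in a different technique and is not what couchpopulator does. The module name `CouchCheck` is hypothetical, and `full_url` keeps the same resolution rule as `get_full_couchurl`.

```ruby
require 'net/http'
require 'uri'

# Hypothetical Net::HTTP-based counterpart to CouchHelper (not part of couchpopulator).
module CouchCheck
  DEFAULT_SERVER = 'http://127.0.0.1:5984/'

  # Same rule as CouchHelper.get_full_couchurl: full http(s) URLs are kept,
  # anything else is resolved against the local default server.
  def self.full_url(arg)
    arg =~ %r{\Ahttps?://} ? arg : URI.join(DEFAULT_SERVER, arg).to_s
  end

  # true if a GET on url answers with a 2xx status; false for any other
  # status and for connection errors or timeouts.
  def self.reachable?(url, timeout = 2)
    uri = URI.parse(url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == 'https')
    http.open_timeout = timeout
    http.read_timeout = timeout
    http.request(Net::HTTP::Get.new(uri.request_uri)).is_a?(Net::HTTPSuccess)
  rescue SystemCallError, SocketError, Timeout::Error
    false
  end
end

puts CouchCheck.full_url("mydb")   # prints http://127.0.0.1:5984/mydb
```

`CouchCheck.reachable?(CouchCheck.full_url("mydb"))` would then cover what `database_exists?` does. Calling it on the server root covers `couch_available?`. Note that `curl --fail` is slightly more lenient, because it also treats 3xx answers as success.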
data/lib/couchpopulator/initializer.rb ADDED
@@ -0,0 +1,94 @@
+ module CouchPopulator
+   class Initializer
+     class << self
+
+       def run
+         # process command line options
+         command_line_options
+
+         # Only check CouchDB availability when needed
+         unless command_line_options[:generate_only]
+           Trollop.die :couch, "You need to provide at least the database's name" if command_line_options[:couch].nil?
+
+           # Build the full CouchDB database url
+           couch_url = CouchHelper.get_full_couchurl(command_line_options[:couch])
+
+           # Check for availability of CouchDB
+           Trollop.die :couch, "#{couch_url} is not reachable or resource does not exist" unless CouchHelper.couch_available?(couch_url)
+
+           # create database on demand
+           if command_line_options[:create_db]
+             # TODO needs to be implemented properly
+             # CouchPopulator::CouchHelper.create_db(command_line_options[:couch])
+           else
+             Trollop.die :couch, "database #{couch_url} does not exist" unless CouchPopulator::CouchHelper.database_exists?(couch_url)
+           end
+         end
+
+         # Initialize CouchPopulator
+         options = ({:executor_klass => executor, :generator_klass => generator, :logger => CouchPopulator::Logger.new(command_line_options[:logfile])}).merge(command_line_options)
+         CouchPopulator::Base.new(options).populate
+       end
+
+       # Define some command-line options
+       def command_line_options
+         @command_line_options ||= Trollop.options do
+           version "v0.1 (c) Sebastian Cohnen, 2009"
+           banner <<-BANNER
+ This is a simple, yet powerful tool to import large numbers of on-the-fly generated documents into CouchDB.
+ It uses concurrency by spawning several curl subprocesses. Documents are generated on-the-fly.
+
+ See http://github.com/tisba/couchpopulator for more information.
+
+ Usage:
+ ./couchpopulator [OPTIONS] [executor [EXECUTOR-OPTIONS]]
+
+ To see the options of 'executor':
+ ./couchpopulator executor -h
+
+ OPTIONS:
+           BANNER
+           opt :couch, "URL of the CouchDB server. You can also provide only the name of the target DB; http://127.0.0.1:5984/ will be prepended automatically", :type => String
+           opt :create_db, "Create DB if needed.", :default => false
+           opt :generator, "Name of the generator-class to use", :default => "Example"
+           opt :generate_only, "Generate the docs and print them to stdout instead of writing them to couch", :default => false
+           opt :logfile, "Redirect info/debug output to specified file instead of stdout", :type => String, :default => ""
+           stop_on_unknown
+         end
+       end
+
+       # Get the requested generator or die
+       def generator
+         retried = false
+         @generator ||= begin
+           generator_klass = CouchPopulator::MiscHelper.camelize_and_constantize("generators/#{command_line_options[:generator]}")
+         rescue NameError
+           begin
+             require File.join(File.dirname(__FILE__), "../../generators/#{command_line_options[:generator]}.rb")
+           rescue LoadError; end # just catch, do nothing
+           retry if (retried = !retried)
+         ensure
+           Trollop.die :generator, "Generator must be set, a valid class-name and respond to generate(n)" if generator_klass.nil?
+           generator_klass
+         end
+       end
+
+       # Get the executor (defaults to standard) or die
+       def executor
+         retried = false
+         @executor ||= begin
+           executor_cmd ||= ARGV.shift || "standard"
+           executor_klass = CouchPopulator::MiscHelper.camelize_and_constantize("executors/#{executor_cmd}")
+         rescue NameError
+           begin
+             require File.join(File.dirname(__FILE__), "../../executors/#{executor_cmd}.rb")
+           rescue NameError, LoadError; end # just catch, do nothing
+           retry if (retried = !retried)
+         ensure
+           Trollop.die "Executor must be set and a valid class-name" if executor_klass.nil?
+           executor_klass
+         end
+       end
+     end
+   end
+ end
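`generator` and `executor` share a lookup-then-load pattern. They first try to resolve the class name. On `NameError`, they `require` the matching file from `generators/` or `executors/` and retry once. Stripped of Trollop and the `retried` toggle, the same idea can be sketched as below. `resolve_plugin`, its parameters and the `Plugins::Greeting` class are hypothetical. Unlike the original, this sketch lets the final `NameError` propagate instead of calling `Trollop.die`.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring Initializer#generator / #executor:
# resolve a constant path such as "Plugins::Greeting"; if that fails,
# require "<plugin_dir>/<file_name>.rb" once and try the lookup again.
def resolve_plugin(const_path, plugin_dir, file_name)
  loaded = false
  begin
    const_path.split('::').inject(Object) { |scope, name| scope.const_get(name) }
  rescue NameError
    raise if loaded          # second failure: give up
    loaded = true
    begin
      require File.join(plugin_dir, "#{file_name}.rb")
    rescue LoadError
      # missing file: the retried lookup raises NameError again
    end
    retry
  end
end

# Usage: the plugin class does not exist until its file is required.
Dir.mktmpdir do |dir|
  File.open(File.join(dir, "greeting.rb"), "w") do |f|
    f.puts "module Plugins; class Greeting; def self.generate(n); Array.new(n) { {} }; end; end; end"
  end
  klass = resolve_plugin("Plugins::Greeting", dir, "greeting")
  puts klass.generate(2).length   # prints 2
end
```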
data/lib/couchpopulator/misc_helper.rb ADDED
@@ -0,0 +1,62 @@
+ module CouchPopulator
+   class MiscHelper
+     class << self
+
+       def camelize_and_constantize(lower_case_and_underscored_word)
+         constantize(camelize(lower_case_and_underscored_word))
+       end
+
+       def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
+         if first_letter_in_uppercase
+           lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
+         else
+           # String#first is an ActiveSupport extension, so take the first character explicitly
+           lower_case_and_underscored_word.to_s[0, 1].downcase + camelize(lower_case_and_underscored_word)[1..-1]
+         end
+       end
+
+       # Ruby 1.9 introduces an inherit argument for Module#const_get and
+       # #const_defined? and changes their default behavior.
+       if Module.method(:const_get).arity == 1
+         # Tries to find a constant with the name specified in the argument string:
+         #
+         #   "Module".constantize     # => Module
+         #   "Test::Unit".constantize # => Test::Unit
+         #
+         # The name is assumed to be the one of a top-level constant, no matter whether
+         # it starts with "::" or not. No lexical context is taken into account:
+         #
+         #   C = 'outside'
+         #   module M
+         #     C = 'inside'
+         #     C               # => 'inside'
+         #     "C".constantize # => 'outside', same as ::C
+         #   end
+         #
+         # NameError is raised when the name is not in CamelCase or the constant is
+         # unknown.
+         def constantize(camel_cased_word)
+           names = camel_cased_word.split('::')
+           names.shift if names.empty? || names.first.empty?
+
+           constant = Object
+           names.each do |name|
+             constant = constant.const_defined?(name) ? constant.const_get(name) : constant.const_missing(name)
+           end
+           constant
+         end
+       else
+         def constantize(camel_cased_word) #:nodoc:
+           names = camel_cased_word.split('::')
+           names.shift if names.empty? || names.first.empty?
+
+           constant = Object
+           names.each do |name|
+             constant = constant.const_get(name, false) || constant.const_missing(name)
+           end
+           constant
+         end
+       end
+     end
+   end
+ end
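To illustrate what `camelize_and_constantize` does with the names the initializer builds, the sketch below reuses the two `camelize` regexes on `"executors/bulk_insert"`. The result is `Executors::BulkInsert`, which is then looked up. On Ruby 2.0 and later, `Object.const_get` accepts a namespaced `"A::B"` string directly and can stand in for the hand-written `constantize`. The `Executors::BulkInsert` class is hypothetical and exists only so the lookup has something to find.

```ruby
# Same two regexes as MiscHelper.camelize (first letter in uppercase):
# "/x" becomes "::X", and a leading letter or "_x" becomes upper case.
def camelize(path)
  path.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
end

module Executors
  class BulkInsert; end   # hypothetical executor class, for illustration only
end

name = camelize("executors/bulk_insert")
puts name                                               # prints Executors::BulkInsert

# Ruby >= 2.0: const_get resolves an "A::B" path itself, replacing constantize.
puts Object.const_get(name) == Executors::BulkInsert    # prints true
```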
data/lib/couchpopulator.rb ADDED
@@ -0,0 +1,13 @@
+ require 'rubygems'
+ require 'trollop'
+ require 'uri'
+
+ require 'json/add/rails'
+ require 'json/add/core'
+
+ require File.join(File.dirname(__FILE__), 'couchpopulator.rb')
+ require File.join(File.dirname(__FILE__), 'curl_adapter.rb')
+ require File.join(File.dirname(__FILE__), 'generator.rb')
+ require File.join(File.dirname(__FILE__), 'logger.rb')
+
+ Dir.glob(File.join(File.dirname(__FILE__), 'couchpopulator/*.rb')).each {|f| require f }
data/lib/curl_adapter.rb ADDED
@@ -0,0 +1,38 @@
+ class CurlAdapter
+   class Response
+     attr_reader :http_response_code
+     attr_reader :time_total
+
+     def initialize(http_response_code, time_total)
+       @http_response_code = http_response_code
+       @time_total = time_total
+     end
+
+     def inspect
+       "#{@http_response_code} #{@time_total} sec"
+     end
+   end
+
+   class Invoker
+     attr_reader :db_url
+
+     def initialize(db_url)
+       @db_url = db_url
+     end
+
+     def post(payload)
+       cmd = "curl -T - -X POST -H \"Content-Type: application/json\" #{@db_url}/_bulk_docs -w \"%{http_code} %{time_total}\" -o out.file 2> /dev/null"
+       # the block form closes the pipe and reaps the curl process afterwards
+       IO.popen(cmd, "w+") do |curl_io|
+         curl_io.puts payload
+         curl_io.close_write
+         CurlAdapter::Response.new(*curl_io.gets.split(" "))
+       end
+     end
+   end
+ end
+
+ # TODO:
+ # Keep-Alive with curl? That would be great...
+
+
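The TODO above asks for keep-alive. One way to get persistent connections without spawning `curl` for every chunk is Ruby's standard `Net::HTTP`. Requests issued inside a single `Net::HTTP.start` block share one connection, as long as the server keeps it open. The sketch below is a hypothetical replacement for `CurlAdapter::Invoker`, not part of couchpopulator, and `bulk_docs_request` and `post_chunks` are invented names. It also sets `Content-Type: application/json`, which CouchDB's `_bulk_docs` endpoint generally expects.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical alternative to CurlAdapter::Invoker (not part of couchpopulator).

# Build a POST request for <db_url>/_bulk_docs carrying the given documents.
def bulk_docs_request(db_url, docs)
  uri = URI.parse("#{db_url}/_bulk_docs")
  request = Net::HTTP::Post.new(uri.request_uri)
  request['Content-Type'] = 'application/json'
  request.body = { "docs" => docs }.to_json
  [uri, request]
end

# Post every chunk over one Net::HTTP session, so a keep-alive capable server
# can serve all of them on the same TCP connection.
# Returns one [http_status, seconds] pair per chunk, like CurlAdapter::Response.
def post_chunks(db_url, chunks)
  uri = URI.parse(db_url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    chunks.map do |docs|
      _, request = bulk_docs_request(db_url, docs)
      started = Time.now
      response = http.request(request)
      [response.code.to_i, Time.now - started]
    end
  end
end

# Building a request does not need a running CouchDB:
_, req = bulk_docs_request("http://127.0.0.1:5984/mydb", [{ "title" => "Example" }])
puts req.path   # prints /mydb/_bulk_docs
```

With the standard executor's `fork` model, each child process would open its own session. The connection reuse therefore only helps when a process posts several chunks in a row.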
data/lib/generator.rb ADDED
@@ -0,0 +1,15 @@
+ # Some nasty Array-Monkeypatches
+ class Array
+   def rand
+     self[Kernel.rand(length)]
+   end unless method_defined?(:rand)
+
+   # works like rand, but returns n random elements from self
+   # if n >= self.length, self is returned
+   def randn(n)
+     return self if n >= length
+     # shuffle and take the first n: this picks n distinct positions and,
+     # unlike a loop rejecting repeated values, cannot hang on duplicates
+     sort_by { Kernel.rand }[0, n]
+   end unless method_defined?(:randn)
+ end
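Both monkeypatches predate `Array#sample`. On Ruby 1.9 and later that standard method covers the same ground without reopening `Array`. The tag list below is made up for illustration. One difference: when `n >= length`, `randn` returns the array unchanged, whereas `sample(n)` returns all elements in random order.

```ruby
TAGS = %w[couchdb ruby json http]   # hypothetical values for a generator

tag  = TAGS.sample        # one random element, like Array#rand above
tags = TAGS.sample(2)     # two elements from distinct positions, like randn(2)
all  = TAGS.sample(10)    # n >= length: every element, in random order

doc = { "tag" => tag, "tags" => tags }
p doc
```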
data/lib/logger.rb ADDED
@@ -0,0 +1,18 @@
+ module CouchPopulator
+   class Logger
+     def initialize(logfile)
+       @out = logfile.empty? ? $stdout : File.new(logfile, "a")
+       # write through immediately, so forked executor processes don't re-emit buffered lines
+       @out.sync = true
+     end
+
+     def log(message)
+       t = Time.now
+       @out << "#{t.strftime("%Y-%m-%d %H:%M:%S")}:#{t.usec} :: #{message} \n"
+     end
+
+     def <<(message)
+       log(message)
+     end
+   end
+ end
metadata ADDED
@@ -0,0 +1,91 @@
+ --- !ruby/object:Gem::Specification
+ name: couchpopulator
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Sebastian Cohnen
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-11-16 00:00:00 +01:00
+ default_executable: couchpopulator
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: json
+   type: :runtime
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 1.2.0
+     version:
+ - !ruby/object:Gem::Dependency
+   name: trollop
+   type: :runtime
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: "1.15"
+     version:
+ description: flexible tool for populating CouchDB with generated documents
+ email: sebastian.cohnen@gmx.net
+ executables:
+ - couchpopulator
+ extensions: []
+
+ extra_rdoc_files:
+ - README.md
+ files:
+ - bin/couchpopulator
+ - executors/standard.rb
+ - generators/example.rb
+ - lib/couchpopulator.rb
+ - lib/couchpopulator/base.rb
+ - lib/couchpopulator/couch_helper.rb
+ - lib/couchpopulator/initializer.rb
+ - lib/couchpopulator/misc_helper.rb
+ - lib/curl_adapter.rb
+ - lib/generator.rb
+ - lib/logger.rb
+ - README.md
+ has_rdoc: true
+ homepage: http://github.com/tisba/couchpopulator
+ licenses: []
+
+ post_install_message:
+ rdoc_options:
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ - lib
+ - generators
+ - executors
+ - lib
+ - generators
+ - executors
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.3.5
+ signing_key:
+ specification_version: 3
+ summary: flexible tool for populating CouchDB with generated documents
+ test_files: []
+