couchpopulator 0.1.0

data/README.md ADDED
@@ -0,0 +1,64 @@
1
+ # CouchPopulator: The Idea
2
+ The idea behind this tool is to provide a framework for populating your [CouchDB][couchdb] instances with generated documents. It provides a pluggable system that makes it easy to write your own generators. The process that invokes the generator and manages insertion into CouchDB, which I call an execution engine, is also easily exchangeable. The default execution engine uses CouchDB's [`bulk-docs`][bulk_api] API with a configurable chunk size, number of concurrent inserts, and total number of chunks to insert.
3
+
4
+ # Warning
5
+ This project is in a very early state. I'm sure it has some serious bugs, and its interface and structure for writing your own generators and execution engines will definitely change (maybe significantly). I have only tested it with Ruby 1.8 on OS X 10.6.2.
6
+
7
+ **Use with caution!**
8
+
9
+ **BUT**: Please feel free to comment, fork, or file a ticket with bugs and wishes. You may also drop me a message via GitHub, [@tisba](https://twitter.com/tisba) or at [@couchdb on freenode](irc://irc.freenode.net/couchdb).
10
+
11
+
12
+ # Why?
13
+ *"There is tool xy already doing that"* - I don't care (okay, that's not true; I care and I'm always eager to see how others implement stuff). I know that there are some tools providing dumping/loading support for CouchDB, but none is written in Ruby and none satisfied my needs (e.g. dynamically generating documents). Besides, I wanted to learn how to write such a tool and get more familiar with CouchDB.
14
+
15
+
16
+ # Getting Started
17
+
18
+ ## Gem
19
+
20
+ sudo gem install couchpopulator
21
+
22
+ ## Building the gem yourself
23
+
24
+ sudo gem install json trollop
25
+ git clone git@github.com:tisba/couchpopulator.git
26
+ cd couchpopulator
27
+ rake build
28
+
29
+ ## Getting help
30
+
31
+ couchpopulator --help
32
+
33
+ ## Custom Generators
34
+ Custom generators only need to implement one method. Have a look:
35
+
36
+ module Generators
37
+ class Example
38
+ class << self
39
+ def generate(count)
40
+ # ...heavy generating action goes here...
41
+ # return array of hashes (documents)
42
+ end
43
+ end
44
+ end
45
+ end
46
+
47
+ `generate(count)` should return an array of documents. Each document should be a hash that will be encoded as JSON. You can include any object type you want. CouchPopulator provides some "special" encoding support for Ruby's `Time` and `Date`. For your own objects you can provide `to_json` and `json_create`, which the [JSON gem][json_gem] uses to serialise and deserialise them properly.
48
+
49
+
50
+ ## Custom Execution Engines
51
+ Custom execution engines need to implement two methods, `command_line_options` and `execute`. See `executors/standard.rb` for an example.
52
+
53
+
54
+ # TODO
55
+ - Find out the best strategies for inserting docs into CouchDB and provide execution engines for different approaches
56
+ - Implement some more features, like dumping-options for generated documents or load dumped JSON docs to CouchDB
57
+ - Think about a test suite and implement it
58
+ - Hunt bugs, make it cleaner, make a gem, ...
59
+
60
+
61
+
62
+ [couchdb]: http://couchdb.apache.org
63
+ [bulk_api]: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
64
+ [json_gem]: http://flori.github.com/json/
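As an illustration of the README's Custom Generators section, here is a minimal sketch of a generator. The class name `Generators::User` and its fields are hypothetical and not part of couchpopulator; the only contract taken from the README is a class-level `generate(count)` that returns an array of hashes.

```ruby
require 'json'

module Generators
  # Hypothetical generator, for illustration only.
  class User
    NAMES = %w[alice bob carol].freeze

    class << self
      # Returns an array of `count` hashes; each hash becomes one CouchDB document.
      def generate(count)
        Array.new(count) do |i|
          {
            "type"       => "user",
            "name"       => NAMES[i % NAMES.length],
            # A plain integer timestamp, so no special JSON encoding is needed.
            "created_at" => Time.now.to_i
          }
        end
      end
    end
  end
end

# The standard executor wraps the documents in a {"docs" => [...]} payload.
puts({ "docs" => Generators::User.generate(3) }.to_json)
```

Assuming the file lookup in the initializer works as shown in this diff, such a class would presumably live in `generators/user.rb` and be selected with `--generator user`.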
@@ -0,0 +1,4 @@
1
+ #! /usr/bin/ruby
2
+ require 'couchpopulator'
3
+
4
+ CouchPopulator::Initializer.run
@@ -0,0 +1,72 @@
1
+ module Executors
2
+ class Standard
3
+ def initialize(opts={})
4
+ @opts = opts.merge(command_line_options)
5
+ end
6
+
7
+ def command_line_options
8
+ help = StringIO.new
9
+
10
+ opts = Trollop.options do
11
+ version "StandardExecutor v0.1 (c) Sebastian Cohnen, 2009"
12
+ banner <<-BANNER
13
+ This is the StandardExecutor
14
+ BANNER
15
+ opt :docs_per_chunk, "Number of docs per chunk", :default => 2000
16
+ opt :concurrent_inserts, "Number of concurrent inserts", :default => 5
17
+ opt :rounds, "Number of rounds", :default => 2
18
+ opt :preflight, "Generate the docs, but don't write to couch. Use with ", :default => false
19
+ opt :help, "Show this message"
20
+
21
+ educate(help)
22
+ end
23
+
24
+ if opts[:help]
25
+ puts help.string # StringIO#rewind returns 0, so help.rewind.read would raise NoMethodError
26
+ exit
27
+ else
28
+ return opts
29
+ end
30
+ end
31
+
32
+ def execute
33
+ rounds = @opts[:rounds]
34
+ docs_per_chunk = @opts[:docs_per_chunk]
35
+ concurrent_inserts = @opts[:concurrent_inserts]
36
+ generator = @opts[:generator_klass]
37
+
38
+ log = @opts[:logger]
39
+ log << "CouchPopulator's default execution engine has been started."
40
+ log << "Using #{generator.to_s} for generating the documents."
41
+
42
+ total_docs = docs_per_chunk * concurrent_inserts * rounds
43
+ log << "Going to insert #{total_docs} generated docs into #{@opts[:couch_url]}"
44
+ log << "Using #{rounds} rounds of #{concurrent_inserts} concurrent inserts with #{docs_per_chunk} docs each"
45
+
46
+ start_time = Time.now
47
+
48
+ rounds.times do |round|
49
+ log << "Starting with round #{round + 1}"
50
+ concurrent_inserts.times do
51
+ fork do
52
+ # generate payload for bulk_doc
53
+ payload = ({"docs" => generator.generate(docs_per_chunk)}).to_json
54
+
55
+ unless @opts[:generate_only]
56
+ result = CurlAdapter::Invoker.new(@opts[:couch_url]).post(payload)
57
+ else
58
+ log << "Generated chunk..."
59
+ puts payload
60
+ end
61
+ end
62
+ end
63
+ concurrent_inserts.times { Process.wait() }
64
+ end
65
+
66
+ end_time = Time.now
67
+ duration = end_time - start_time
68
+
69
+ log << "Execution time: #{duration}s, inserted #{total_docs}"
70
+ end
71
+ end
72
+ end
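The round structure of `execute` can be sketched in isolation. This is a simplified sketch, not the executor itself: `run_rounds` and `total_docs` are hypothetical names, the children do no real work, and it assumes a platform where `fork` is available. Unlike the original, which calls `Process.wait` once per child, the sketch waits on each recorded pid and collects the exit statuses.

```ruby
# Total number of documents one run inserts, as computed in Standard#execute.
def total_docs(docs_per_chunk, concurrent_inserts, rounds)
  docs_per_chunk * concurrent_inserts * rounds
end

# Hypothetical helper: fork `workers` children per round and wait for all
# of them before starting the next round. Returns the children's exit codes.
def run_rounds(rounds, workers)
  statuses = []
  rounds.times do
    pids = Array.new(workers) do
      fork do
        # A real executor would generate a chunk and POST it to _bulk_docs here.
        exit!(0)
      end
    end
    pids.each do |pid|
      _, status = Process.wait2(pid)
      statuses << status.exitstatus
    end
  end
  statuses
end

puts total_docs(2000, 5, 2)   # defaults of the standard executor: 20000
p run_rounds(2, 3)            # six children, each exiting with 0
```

Collecting exit statuses makes it possible to notice a failed child, which the `log` line at the end of `execute` currently cannot report.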
@@ -0,0 +1,16 @@
1
+ module Generators
2
+ class Example
3
+ class << self
4
+ def generate(count)
5
+ docs = []
6
+ count.times do
7
+ docs << {
8
+ "title" => "Example",
9
+ "created_at" => Time.now - (rand(7) * 60*60*24)
10
+ }
11
+ end
12
+ docs
13
+ end
14
+ end
15
+ end
16
+ end
@@ -0,0 +1,24 @@
1
+ module CouchPopulator
2
+ class Base
3
+ def initialize(options)
4
+ @opts = options
5
+ @opts[:couch_url] = CouchHelper.get_full_couchurl options[:couch] unless options[:couch].nil?
6
+ @logger = options[:logger]
7
+ end
8
+
9
+ def populate
10
+ @opts[:logger] ||= @logger
11
+
12
+ @opts[:database] ||= database
13
+ @opts[:executor_klass].new(@opts).execute
14
+ end
15
+
16
+ def log(message)
17
+ @logger.log(message)
18
+ end
19
+
20
+ def database
21
+ URI.parse(@opts[:couch_url]).path unless @opts[:couch_url].nil?
22
+ end
23
+ end
24
+ end
@@ -0,0 +1,27 @@
1
+ module CouchPopulator
2
+ # Borrowed from Rails
3
+ # http://github.com/rails/rails/blob/ea0e41d8fa5a132a2d2771e9785833b7663203ac/activesupport/lib/active_support/inflector.rb#L355
4
+ class CouchHelper
5
+ class << self
6
+ def get_full_couchurl(arg)
7
+ arg.match(/^https?:\/\//) ? arg : URI.join('http://127.0.0.1:5984/', arg).to_s
8
+ end
9
+
10
+ def couch_available? (couch_url)
11
+ # TODO this uri-thing is ugly :/
12
+ tmp = URI.parse(couch_url)
13
+ `curl --fail --request GET #{tmp.scheme}://#{tmp.host}:#{tmp.port} 2> /dev/null`
14
+ return $?.exitstatus == 0
15
+ end
16
+
17
+ def database_exists? (db_url)
18
+ `curl --fail --request GET #{db_url} 2> /dev/null`
19
+ return $?.exitstatus == 0
20
+ end
21
+
22
+ def create_db(url)
23
+ !!(JSON.parse(`curl --silent --request PUT #{url}`))["ok"]
24
+ end
25
+ end
26
+ end
27
+ end
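The URL handling in `get_full_couchurl` and `Base#database` can be tried on its own. The sketch below re-implements the expansion as a stand-alone function; `full_couch_url` and `DEFAULT_COUCH` are hypothetical names, and `example.org` is a placeholder host.

```ruby
require 'uri'

# Same default host that get_full_couchurl prepends.
DEFAULT_COUCH = 'http://127.0.0.1:5984/'

# Hypothetical stand-alone version: full URLs pass through untouched,
# bare database names are joined onto the default host.
def full_couch_url(arg)
  arg =~ %r{\Ahttps?://} ? arg : URI.join(DEFAULT_COUCH, arg).to_s
end

puts full_couch_url('mydb')                          # http://127.0.0.1:5984/mydb
puts full_couch_url('http://example.org:5984/mydb')  # http://example.org:5984/mydb
puts URI.parse(full_couch_url('mydb')).path          # /mydb
```

Note that the path returned by `URI.parse(...).path` keeps its leading slash, so `Base#database` yields `/mydb` rather than `mydb`.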
@@ -0,0 +1,94 @@
1
+ module CouchPopulator
2
+ class Initializer
3
+ class << self
4
+
5
+ def run
6
+ # process command line options
7
+ command_line_options
8
+
9
+ # Only check CouchDB availability when needed
10
+ unless command_line_options[:generate_only]
11
+ Trollop.die :couch, "You need at least to provide the database's name" if command_line_options[:couch].nil?
12
+
13
+ # Build the full CouchDB database url
14
+ couch_url = CouchHelper.get_full_couchurl(command_line_options[:couch])
15
+
16
+ # Check for availability of CouchDB
17
+ Trollop.die :couch, "#{couch_url} is not reachable or resource does not exist" unless CouchHelper.couch_available?(couch_url)
18
+
19
+ # create database on demand
20
+ if command_line_options[:create_db]
21
+ # TODO needs to be implemented properly
22
+ # CouchPopulator::CouchHelper.create_db(command_line_options[:couch])
23
+ else
24
+ CouchPopulator::CouchHelper.database_exists? couch_url
25
+ end
26
+ end
27
+
28
+ # Initialize CouchPopulator
29
+ options = ({:executor_klass => executor, :generator_klass => generator, :logger => CouchPopulator::Logger.new(command_line_options[:logfile])}).merge(command_line_options)
30
+ CouchPopulator::Base.new(options).populate
31
+ end
32
+
33
+ # Define some command-line options
34
+ def command_line_options
35
+ @command_line_options ||= Trollop.options do
36
+ version "v0.1 (c) Sebastian Cohnen, 2009"
37
+ banner <<-BANNER
38
+ This is a simple, yet powerful tool to import large numbers of on-the-fly generated documents into CouchDB.
39
+ It's using concurrency by spawning several curl subprocesses. Documents are generated on-the-fly.
40
+
41
+ See http://github.com/tisba/couchpopulator for more information.
42
+
43
+ Usage:
44
+ ./couchpopulator [OPTIONS] [executor [EXECUTOR-OPTIONS]]
45
+
46
+ To see which options an executor supports:
47
+ ./couchpopulator executor -h
48
+
49
+ OPTIONS:
50
+ BANNER
51
+ opt :couch, "URL of CouchDB Server. You can also provide the name of the target DB only, http://127.0.0.1:5984/ will be prepended automatically", :type => String
52
+ opt :create_db, "Create DB if needed.", :default => false
53
+ opt :generator, "Name of the generator-class to use", :default => "Example"
54
+ opt :generate_only, "Generate the docs, but don't write to couch; print them to stdout instead", :default => false
55
+ opt :logfile, "Redirect info/debug output to specified file instead of stdout", :type => String, :default => ""
56
+ stop_on_unknown
57
+ end
58
+ end
59
+
60
+ # Get the requested generator or die
61
+ def generator
62
+ retried = false
63
+ @generator ||= begin
64
+ generator_klass = CouchPopulator::MiscHelper.camelize_and_constantize("generators/#{command_line_options[:generator]}")
65
+ rescue NameError
66
+ begin
67
+ require File.join(File.dirname(__FILE__), "../../generators/#{command_line_options[:generator]}.rb")
68
+ rescue LoadError; end # just catch, do nothing
69
+ retry if (retried = !retried)
70
+ ensure
71
+ Trollop.die :generator, "Generator must be set, a valid class-name and respond to generate(n)" if generator_klass.nil?
72
+ generator_klass
73
+ end
74
+ end
75
+
76
+ # Get the executor (defaults to standard) or die
77
+ def executor
78
+ retried = false
79
+ @executor ||= begin
80
+ executor_cmd ||= ARGV.shift || "standard"
81
+ executor_klass = CouchPopulator::MiscHelper.camelize_and_constantize("executors/#{executor_cmd}")
82
+ rescue NameError
83
+ begin
84
+ require File.join(File.dirname(__FILE__), "../../executors/#{executor_cmd}.rb")
85
+ rescue NameError, LoadError; end # just catch, do nothing
86
+ retry if (retried = !retried)
87
+ ensure
88
+ Trollop.die "Executor must be set and a valid class-name" if executor_klass.nil?
89
+ executor_klass
90
+ end
91
+ end
92
+ end
93
+ end
94
+ end
@@ -0,0 +1,61 @@
1
+ module CouchPopulator
2
+ class MiscHelper
3
+ class << self
4
+
5
+ def camelize_and_constantize(lower_case_and_underscored_word)
6
+ constantize(camelize(lower_case_and_underscored_word))
7
+ end
8
+
9
+ def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
10
+ if first_letter_in_uppercase
11
+ lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
12
+ else
13
+ lower_case_and_underscored_word.to_s[0, 1].downcase + camelize(lower_case_and_underscored_word)[1..-1]
14
+ end
15
+ end
16
+
17
+ # Ruby 1.9 introduces an inherit argument for Module#const_get and
18
+ # #const_defined? and changes their default behavior.
19
+ if Module.method(:const_get).arity == 1
20
+ # Tries to find a constant with the name specified in the argument string:
21
+ #
22
+ # "Module".constantize # => Module
23
+ # "Test::Unit".constantize # => Test::Unit
24
+ #
25
+ # The name is assumed to be the one of a top-level constant, no matter whether
26
+ # it starts with "::" or not. No lexical context is taken into account:
27
+ #
28
+ # C = 'outside'
29
+ # module M
30
+ # C = 'inside'
31
+ # C # => 'inside'
32
+ # "C".constantize # => 'outside', same as ::C
33
+ # end
34
+ #
35
+ # NameError is raised when the name is not in CamelCase or the constant is
36
+ # unknown.
37
+ def constantize(camel_cased_word)
38
+ names = camel_cased_word.split('::')
39
+ names.shift if names.empty? || names.first.empty?
40
+
41
+ constant = Object
42
+ names.each do |name|
43
+ constant = constant.const_defined?(name) ? constant.const_get(name) : constant.const_missing(name)
44
+ end
45
+ constant
46
+ end
47
+ else
48
+ def constantize(camel_cased_word) #:nodoc:
49
+ names = camel_cased_word.split('::')
50
+ names.shift if names.empty? || names.first.empty?
51
+
52
+ constant = Object
53
+ names.each do |name|
54
+ constant = constant.const_get(name, false) || constant.const_missing(name)
55
+ end
56
+ constant
57
+ end
58
+ end
59
+ end
60
+ end
61
+ end
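To make the name resolution used by the initializer concrete, here is a small sketch of how a path such as `generators/example` turns into a constant. It reuses the two regular expressions from `camelize` above but replaces the version-dependent `constantize` with a plain `const_get` walk, which is an assumption suitable for current Ruby versions rather than the original code. The `Generators::Example` class is only a stand-in so the lookup has something to find.

```ruby
module Generators
  class Example; end # stand-in constant for the lookup
end

# "generators/example" -> "Generators::Example", "my_generator" -> "MyGenerator"
def camelize(path)
  path.to_s.gsub(%r{/(.?)}) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
end

# Walks the namespace from Object downwards; raises NameError for unknown names.
def constantize(name)
  name.split('::').reject(&:empty?).inject(Object) { |mod, part| mod.const_get(part) }
end

puts camelize('generators/example')            # Generators::Example
p constantize(camelize('generators/example'))  # Generators::Example
```

The `NameError` raised for an unknown constant is what the initializer rescues in order to `require` the matching file and retry once.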
@@ -0,0 +1,13 @@
1
+ require 'rubygems'
2
+ require 'trollop'
3
+ require 'uri'
4
+
5
+ require 'json/add/rails'
6
+ require 'json/add/core'
7
+
8
+ require File.join(File.dirname(__FILE__), 'couchpopulator.rb')
9
+ require File.join(File.dirname(__FILE__), 'curl_adapter.rb')
10
+ require File.join(File.dirname(__FILE__), 'generator.rb')
11
+ require File.join(File.dirname(__FILE__), 'logger.rb')
12
+
13
+ Dir.glob(File.join(File.dirname(__FILE__), 'couchpopulator/*.rb')).each {|f| require f }
@@ -0,0 +1,36 @@
1
+ class CurlAdapter
2
+ class Response
3
+ attr_reader :http_response_code
4
+ attr_reader :time_total
5
+
6
+ def initialize(http_response_code, time_total)
7
+ @http_response_code = http_response_code
8
+ @time_total = time_total
9
+ end
10
+
11
+ def inspect
12
+ "#{@http_response_code} #{@time_total} sec"
13
+ end
14
+ end
15
+
16
+ class Invoker
17
+ attr_reader :db_url
18
+
19
+ def initialize(db_url)
20
+ @db_url = db_url
21
+ end
22
+
23
+ def post(payload)
24
+ cmd = "curl -T - -X POST #{@db_url}/_bulk_docs -w\"%{http_code}\ %{time_total}\" -o out.file 2> /dev/null"
25
+ curl_io = IO.popen(cmd, "w+")
26
+ curl_io.puts payload
27
+ curl_io.close_write
28
+ result = CurlAdapter::Response.new(*curl_io.gets.split(" "))
29
+ end
30
+ end
31
+ end
32
+
33
+ # TODO:
34
+ # Keep-alive with curl? That would be great...
35
+
36
+
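`Invoker#post` reads a single line of the form `"<http_code> <time_total>"` from curl's `--write-out` output and passes both parts on as strings. A minimal sketch of parsing that line into typed values follows; `CurlResult` and `parse_write_out` are hypothetical names and not part of the gem.

```ruby
# Hypothetical value object; the original CurlAdapter::Response keeps both fields as strings.
CurlResult = Struct.new(:http_code, :time_total) do
  # True for any 2xx status code.
  def success?
    (200..299).cover?(http_code)
  end
end

# Parses a line such as "201 0.042" as produced by
# curl -w "%{http_code} %{time_total}".
def parse_write_out(line)
  code, seconds = line.to_s.split(" ")
  CurlResult.new(Integer(code, 10), Float(seconds))
end

result = parse_write_out("201 0.042")
puts result.http_code   # 201
puts result.time_total  # 0.042
puts result.success?    # true
```

With typed fields, an executor could count failed chunks instead of discarding the result of `post`.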
data/lib/generator.rb ADDED
@@ -0,0 +1,18 @@
1
+ # Some nasty Array-Monkeypatches
2
+ class Array
3
+ def rand
4
+ self[Kernel.rand(length)]
5
+ end unless Array.method_defined?(:rand)
6
+
7
+ # works like rand, but returns n random elements from self
8
+ # if n >= self.length, self is returned
9
+ def randn(n)
10
+ return self if n >= length
11
+ ret = []
12
+ while (ret.length < n) do
13
+ dummy = rand
14
+ ret << dummy unless ret.member?(dummy)
15
+ end
16
+ ret
17
+ end unless Array.method_defined?(:randn)
18
+ end
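On Ruby 1.9 and later, the core method `Array#sample` covers the use case of both patches, so a generator written for a current Ruby would not need them. A brief sketch, assuming such a Ruby version; the `tags` array is made up for illustration. One difference to keep in mind: `randn` de-duplicates by value, whereas `sample(n)` picks elements at distinct indices.

```ruby
tags = %w[couchdb ruby json http]

one = tags.sample      # a single random element, like Array#rand above
two = tags.sample(2)   # two elements from distinct positions, like randn(2)
all = tags.sample(10)  # n larger than the array: every element, in random order

puts one
p two
p all.sort
```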
data/lib/logger.rb ADDED
@@ -0,0 +1,16 @@
1
+ module CouchPopulator
2
+ class Logger
3
+ def initialize(logfile)
4
+ @out = logfile.empty? ? $stdout : File.new(logfile, "a")
5
+ end
6
+
7
+ def log(message)
8
+ t = Time.now
9
+ @out << "#{t.strftime("%Y-%m-%d %H:%M:%S")}:#{t.usec} :: #{message} \n"
10
+ end
11
+
12
+ def <<(message)
13
+ log(message)
14
+ end
15
+ end
16
+ end
metadata ADDED
@@ -0,0 +1,91 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: couchpopulator
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Sebastian Cohnen
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-11-16 00:00:00 +01:00
13
+ default_executable: couchpopulator
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: json
17
+ type: :runtime
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: 1.2.0
24
+ version:
25
+ - !ruby/object:Gem::Dependency
26
+ name: trollop
27
+ type: :runtime
28
+ version_requirement:
29
+ version_requirements: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: "1.15"
34
+ version:
35
+ description: flexible tool for populating CouchDB with generated documents
36
+ email: sebastian.cohnen@gmx.net
37
+ executables:
38
+ - couchpopulator
39
+ extensions: []
40
+
41
+ extra_rdoc_files:
42
+ - README.md
43
+ files:
44
+ - bin/couchpopulator
45
+ - executors/standard.rb
46
+ - generators/example.rb
47
+ - lib/couchpopulator.rb
48
+ - lib/couchpopulator/base.rb
49
+ - lib/couchpopulator/couch_helper.rb
50
+ - lib/couchpopulator/initializer.rb
51
+ - lib/couchpopulator/misc_helper.rb
52
+ - lib/curl_adapter.rb
53
+ - lib/generator.rb
54
+ - lib/logger.rb
55
+ - README.md
56
+ has_rdoc: true
57
+ homepage: http://github.com/tisba/couchpopulator
58
+ licenses: []
59
+
60
+ post_install_message:
61
+ rdoc_options:
62
+ - --charset=UTF-8
63
+ require_paths:
64
+ - lib
65
+ - lib
66
+ - generators
67
+ - executors
68
+ - lib
69
+ - generators
70
+ - executors
71
+ required_ruby_version: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: "0"
76
+ version:
77
+ required_rubygems_version: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - ">="
80
+ - !ruby/object:Gem::Version
81
+ version: "0"
82
+ version:
83
+ requirements: []
84
+
85
+ rubyforge_project:
86
+ rubygems_version: 1.3.5
87
+ signing_key:
88
+ specification_version: 3
89
+ summary: flexible tool for populating CouchDB with generated documents
90
+ test_files: []
91
+