wukong-load 0.0.2

data/.gitignore ADDED
@@ -0,0 +1,59 @@
1
+ ## OS
2
+ .DS_Store
3
+ Icon
4
+ nohup.out
5
+ .bak
6
+
7
+ *.pem
8
+
9
+ ## EDITORS
10
+ \#*
11
+ .\#*
12
+ \#*\#
13
+ *~
14
+ *.swp
15
+ REVISION
16
+ TAGS*
17
+ tmtags
18
+ *_flymake.*
19
+ *_flymake
20
+ *.tmproj
21
+ .project
22
+ .settings
23
+
24
+ ## COMPILED
25
+ a.out
26
+ *.o
27
+ *.pyc
28
+ *.so
29
+
30
+ ## OTHER SCM
31
+ .bzr
32
+ .hg
33
+ .svn
34
+
35
+ ## PROJECT::GENERAL
36
+
37
+ log/*
38
+ tmp/*
39
+ pkg/*
40
+
41
+ coverage
42
+ rdoc
43
+ doc
44
+ pkg
45
+ .rake_test_cache
46
+ .bundle
47
+ .yardoc
48
+
49
+ .vendor
50
+
51
+ ## PROJECT::SPECIFIC
52
+
53
+ old/*
54
+ docpages
55
+ away
56
+
57
+ .rbx
58
+ Gemfile.lock
59
+ Backup*of*.numbers
data/Gemfile ADDED
@@ -0,0 +1,8 @@
1
+ source :rubygems
2
+
3
+ gemspec
4
+
5
+ group :development do
6
+ gem 'rake', '~> 0.9'
7
+ gem 'rspec', '~> 2'
8
+ end
data/LICENSE.md ADDED
@@ -0,0 +1,95 @@
1
+ # License for Wukong
2
+
3
+ The wukong code is __Copyright (c) 2011, 2012 Infochimps, Inc__
4
+
5
+ This code is licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an **AS IS** BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
10
+
11
+ __________________________________________________________________________
12
+
13
+ # Apache License
14
+
15
+
16
+ Apache License
17
+ Version 2.0, January 2004
18
+ http://www.apache.org/licenses/
19
+
20
+ _TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION_
21
+
22
+ ## 1. Definitions.
23
+
24
+ * **License** shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
25
+
26
+ * **Licensor** shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
27
+
28
+ * **Legal Entity** shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
29
+
30
+ * **You** (or **Your**) shall mean an individual or Legal Entity exercising permissions granted by this License.
31
+
32
+ * **Source** form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
33
+
34
+ * **Object** form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
35
+
36
+ * **Work** shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
37
+
38
+ * **Derivative Works** shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
39
+
40
+ * **Contribution** shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
41
+
42
+ * **Contributor** shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
43
+
44
+ ## 2. Grant of Copyright License.
45
+
46
+ Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
47
+
48
+ ## 3. Grant of Patent License.
49
+
50
+ Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
51
+
52
+ ## 4. Redistribution.
53
+
54
+ You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
55
+
56
+ - (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
57
+ - (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
58
+ - (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
59
+ - (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
60
+
61
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
62
+
63
+ ## 5. Submission of Contributions.
64
+
65
+ Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
66
+
67
+ ## 6. Trademarks.
68
+
69
+ This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
70
+
71
+ ## 7. Disclaimer of Warranty.
72
+
73
+ Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
74
+
75
+ ## 8. Limitation of Liability.
76
+
77
+ In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
78
+
79
+ ## 9. Accepting Warranty or Additional Liability.
80
+
81
+ While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
82
+
83
+ _END OF TERMS AND CONDITIONS_
84
+
85
+ ## APPENDIX: How to apply the Apache License to your work.
86
+
87
+ To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets `[]` replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.
88
+
89
+ > Copyright [yyyy] [name of copyright owner]
90
+ >
91
+ > Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
92
+ >
93
+ > http://www.apache.org/licenses/LICENSE-2.0
94
+ >
95
+ > Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
data/README.md ADDED
@@ -0,0 +1,111 @@
1
+ # Wukong-Load
2
+
3
+ This Wukong plugin makes it easy to load data from the command line
4
+ into various data stores.
5
+
6
+ It is assumed that you will independently deploy and configure each
7
+ data store yourself (but see
8
+ [Ironfan](http://github.com/infochimps-labs/ironfan)). Once you've
9
+ done that, and once you've written some dataflows with
10
+ [Wukong](http://github.com/infochimps-labs/wukong/tree/3.0.0), you can
11
+ load them into your data stores with `wu-load`.
12
+
13
+ Wukong-Load is **not intended for production use**. It is meant as a
14
+ tool for quickly loading data into data stores over the command line
15
+ and is especially useful when developing flows with `wu-local`.
16
+
17
+ ## Installation & Setup
18
+
19
+ Wukong-Load can be installed as a RubyGem:
20
+
21
+ ```
22
+ $ sudo gem install wukong-load
23
+ ```
24
+
25
+ ## Usage
26
+
27
+ Wukong-Load provides a command-line program `wu-load` you can use to
28
+ load data fed in over STDIN. Get help on `wu-load` by running
29
+
30
+ ```
31
+ $ wu-load --help
32
+ ```
33
+
34
+ and get help for a specific data store with
35
+
36
+ ```
37
+ $ wu-load store_name --help
38
+ ```
39
+
40
+ Further details will depend on the data store you're writing to.
41
+
42
+ ### Elasticsearch Usage
43
+
44
+ Lets you load JSON-formatted records into an
45
+ [Elasticsearch](http://www.elasticsearch.org) database. See full
46
+ options with
47
+
48
+ ```
49
+ $ wu-load elasticsearch --help
50
+ ```
51
+
52
+ #### Expected Input
53
+
54
+ All input to `wu-load` should be newline-separated, JSON-formatted,
54
+ hash-like records. Some keys in each record will be interpreted as
55
+ metadata about the record or about how to route it within the
56
+ database, but the entire record will be written to the database
57
+ unmodified.
59
+
60
+ A record like the following (pretty-printed here for clarity; the real
61
+ record shouldn't contain newlines)
62
+
63
+ ```json
64
+ {
65
+ "_index": "publications",
66
+ "_es_type": "book",
67
+ "ISBN": "0553573403",
68
+ "title": "A Game of Thrones",
69
+ "author": "George R. R. Martin",
70
+ "description": "The first of half a hundred novels to come out since...",
71
+ ...
72
+ }
73
+ ```
74
+
75
+ will use the `_index` and `_es_type` fields as routing metadata (with the
76
+ default `--index_field` and `--es_type_field`), but the **whole** record will be written.
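If you are generating records yourself rather than with a Wukong dataflow, a minimal Ruby sketch of producing input in this shape follows. It assumes only the standard `json` library; the records and the `to_ndjson` helper are hypothetical. `JSON.generate` (unlike `JSON.pretty_generate`) never emits raw newlines, so each record lands on exactly one line even when a string value contains a newline.

```ruby
require 'json'

# Hypothetical sample records. The first carries the default routing
# fields '_index' and '_es_type'; the second has none, so wu-load would
# route it using the --index and --es_type defaults.
RECORDS = [
  { '_index' => 'publications', '_es_type' => 'book',
    'title' => 'A Game of Thrones',
    'description' => "Pretty text\nwith an embedded newline" },
  { 'text' => 'hi' }
].freeze

# JSON.generate escapes newlines inside strings as \n, so appending a
# single newline per record yields newline-separated JSON.
def to_ndjson(records)
  records.map { |record| JSON.generate(record) + "\n" }.join
end

print to_ndjson(RECORDS)
```

Saved as, say, `make_records.rb` (a hypothetical name), its output can be piped straight in with `ruby make_records.rb | wu-load elasticsearch`.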
77
+
78
+ #### Connecting
79
+
80
+ `wu-load` connects to a default host (`localhost`) and port (`9200`),
81
+ but you can change these:
82
+
83
+ ```
84
+ $ cat data.json | wu-load elasticsearch --host=10.122.123.124 --port=80
85
+ ```
86
+
87
+ All queries will be sent to this address.
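Roughly what `--host` and `--port` turn into is sketched below, assuming the behavior of `ElasticsearchLoader#setup` in this gem: any `http://` or `https://` prefix is stripped from the host, and an `https` prefix switches TLS on. `es_connection` is a hypothetical helper, not part of wu-load. Note that `Net::HTTP.new` only configures the client; no connection is opened until the first request.

```ruby
require 'net/http'

# Hypothetical helper mirroring the loader's setup: strip the scheme,
# build a Net::HTTP client, and enable TLS when the scheme was https.
def es_connection(host, port = 9200)
  bare = host.sub(%r{\Ahttps?://}, '')
  http = Net::HTTP.new(bare, port) # configures the client only; nothing is sent yet
  http.use_ssl = host.start_with?('https://')
  http
end

conn = es_connection('10.122.123.124', 80)
puts "#{conn.address}:#{conn.port} ssl=#{conn.use_ssl?}" # prints "10.122.123.124:80 ssl=false"
```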
88
+
89
+ #### Routing
90
+
91
+ Elasticsearch stores data in several *indices*, each of which contains
92
+ *documents* of various *types*.
93
+
94
+ `wu-load` loads each document into a default index (`wukong`) and type
95
+ (`streaming_record`), but you can change these:
96
+
97
+ ```
98
+ $ cat data.json | wu-load elasticsearch --host=10.123.123.123 --index=publication --es_type=book
99
+ ```
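The routing rule can be sketched as a small standalone Ruby object, assuming the logic in `ElasticsearchLoader#index_for`, `#es_type_for`, and `#create_path`: a value in the record's routing field (`_index` / `_es_type` by default, configurable with `--index_field` / `--es_type_field`) wins; otherwise the `--index` / `--es_type` defaults apply. The `Routing` struct is hypothetical and exists only for illustration.

```ruby
# Hypothetical stand-in for the loader's routing settings.
Routing = Struct.new(:index, :es_type, :index_field, :es_type_field) do
  # Per-record index if present, else the configured default.
  def index_for(record)
    record[index_field] || index
  end

  # Per-record type if present, else the configured default.
  def es_type_for(record)
    record[es_type_field] || es_type
  end

  # Collection path that new documents are POSTed to.
  def create_path(record)
    File.join('/', index_for(record).to_s, es_type_for(record).to_s)
  end
end

defaults = Routing.new('wukong', 'streaming_record', '_index', '_es_type')
flags    = Routing.new('publication', 'book', '_index', '_es_type')

puts defaults.create_path({ 'text' => 'hi' })                       # prints "/wukong/streaming_record"
puts flags.create_path({ 'text' => 'hi' })                          # prints "/publication/book"
puts defaults.create_path({ 'text' => 'hi', '_index' => 'media' })  # prints "/media/streaming_record"
```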
100
+
101
+ ##### Creates vs. Updates
102
+
103
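+ If an input document contains a value for the field `_id`, then that
104
+ value will be used as the ID of the record when written (an HTTP PUT),
105
+ possibly overwriting a record that already exists: an update. Documents without an ID are created (an HTTP POST), and Elasticsearch assigns their IDs.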
106
+
107
+ You can change which field supplies the Elasticsearch ID:
108
+
109
+ ```
110
+ $ cat data.json | wu-load elasticsearch --host=10.123.123.123 --index=media --es_type=books --id_field="ISBN"
111
+ ```
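The create-vs-update decision is sketched below with the standard `net/http` and `json` libraries, assuming the behavior of `ElasticsearchLoader#load`: a record with a value in the ID field becomes a PUT to `/index/type/id`, and anything else becomes a POST to `/index/type`. `build_request` is a hypothetical helper; it only constructs the request objects and never sends them.

```ruby
require 'json'
require 'net/http'

# Hypothetical sketch: choose PUT (known ID) or POST (no ID) and attach
# the whole record, ID field included, as the JSON body.
def build_request(record, id_field: '_id', index: 'wukong', es_type: 'streaming_record')
  base = File.join('/', index, es_type)
  id   = record[id_field]
  request =
    if id
      Net::HTTP::Put.new(File.join(base, id.to_s)) # update, or create at that ID
    else
      Net::HTTP::Post.new(base)                    # create; Elasticsearch assigns the ID
    end
  request.body = JSON.generate(record)
  request
end

req = build_request({ 'ISBN' => '0553573403', 'title' => 'A Game of Thrones' },
                    id_field: 'ISBN', index: 'media', es_type: 'books')
puts "#{req.method} #{req.path}" # prints "PUT /media/books/0553573403"
```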
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require 'bundler'
2
+ Bundler::GemHelper.install_tasks
3
+
4
+ require 'rspec/core/rake_task'
5
+ RSpec::Core::RakeTask.new(:specs)
6
+
7
+ require 'yard'
8
+ YARD::Rake::YardocTask.new
9
+
10
+ task :default => [:specs]
data/bin/wu-load ADDED
@@ -0,0 +1,50 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'wukong-load'
4
+ settings = Wukong::Load::Configuration
5
+ settings.use(:commandline)
6
+
7
+ settings.usage = "usage: #{File.basename($0)} DATA_STORE [ --param=value | -p value | --param | -p]"
8
+ settings.description = <<-EOF
9
+ wu-load is a tool for loading data from Wukong into data stores. It
10
+ supports multiple, pluggable data stores.
11
+
12
+ Supported data stores:
13
+
14
+ elasticsearch
15
+ hbase (planned)
16
+ mongodb (planned)
17
+ mysql (planned)
18
+
19
+ Get specific help for a data store with
20
+
21
+ $ wu-load store_name --help
22
+
23
+ Elasticsearch Usage:
24
+
25
+ Pass newline-separated, JSON-formatted records over STDIN:
26
+
27
+ $ cat data.json | wu-load elasticsearch
28
+
29
+ By default, wu-load attempts to write each input record to a local
30
+ Elasticsearch database. Records will be routed to a default
31
+ Elasticsearch index and type. Records with an '_id' field will be
32
+ considered updates. The rest will be creates. You can override these
33
+ options:
34
+
35
+ $ cat data.json | wu-load elasticsearch --host=10.123.123.123 --index=my_app --es_type=my_obj --id_field="doc_id"
36
+
37
+ Params:
38
+ --host=String Elasticsearch host, without HTTP prefix [Default: localhost]
39
+ --port=Integer Port on Elasticsearch host [Default: 9200]
40
+ --index=String Default Elasticsearch index for records [Default: wukong]
41
+ --es_type=String Default Elasticsearch type for records [Default: streaming_record]
42
+ --index_field=String Field in each record naming desired Elasticsearch index [Default: _index]
43
+ --es_type_field=String Field in each record naming desired Elasticsearch type [Default: _es_type]
44
+ --id_field=String Field in each record providing the ID of an existing Elasticsearch record to update [Default: _id]
45
+ EOF
46
+
47
+ require 'wukong/boot' ; Wukong.boot!(settings)
48
+
49
+ require 'wukong-load/runner'
50
+ Wukong::Load::Runner.run(settings)
data/lib/wukong-load.rb ADDED
@@ -0,0 +1,10 @@
1
+ require 'wukong'
2
+
3
+ module Wukong
4
+ # Loads data from the command-line into data stores.
5
+ module Load
6
+ end
7
+ end
8
+ require_relative 'wukong-load/version'
9
+ require_relative 'wukong-load/configuration'
10
+ require_relative 'wukong-load/elasticsearch'
data/lib/wukong-load/configuration.rb ADDED
@@ -0,0 +1,8 @@
1
+ module Wukong
2
+ module Load
3
+
4
+ # All local configuration for Wukong-Load lives within this object.
5
+ Configuration = Configliere::Param.new unless defined? Configuration
6
+
7
+ end
8
+ end
data/lib/wukong-load/elasticsearch.rb ADDED
@@ -0,0 +1,99 @@
1
+ # This should be extracted into Wonderdog and inserted via the Wukong
2
+ # plugin mechanism.
3
+
4
+ require_relative('loader')
5
+
6
+ module Wukong
7
+ module Load
8
+
9
+ # Loads data into Elasticsearch
10
+ class ElasticsearchLoader < Loader
11
+
12
+ field :host, String, :default => 'localhost'
13
+ field :port, Integer,:default => 9200
14
+ field :index, String, :default => 'wukong'
15
+ field :es_type, String, :default => 'streaming_record'
16
+ field :index_field, String, :default => '_index'
17
+ field :es_type_field, String, :default => '_es_type'
18
+ field :id_field, String, :default => '_id'
19
+
20
+ attr_accessor :connection
21
+
22
+ def setup
23
+ h = host.gsub(%r{^https?://},'')
24
+ log.debug("Connecting to Elasticsearch cluster at #{h}:#{port}...")
25
+ begin
26
+ self.connection = Net::HTTP.new(h, port)
27
+ self.connection.use_ssl = true if host =~ /^https/
28
+ rescue => e
29
+ raise Error.new(e.message)
30
+ end
31
+ end
32
+
33
+ def load record
34
+ id_for(record) ? request(Net::HTTP::Put, update_path(record), record) : request(Net::HTTP::Post, create_path(record), record)
35
+ end
36
+
37
+ def create_path record
38
+ File.join('/', index_for(record).to_s, es_type_for(record).to_s)
39
+ end
40
+
41
+ def update_path record
42
+ File.join('/', index_for(record).to_s, es_type_for(record).to_s, id_for(record).to_s)
43
+ end
44
+
45
+ def index_for record
46
+ record[index_field] || self.index
47
+ end
48
+
49
+ def es_type_for record
50
+ record[es_type_field] || self.es_type
51
+ end
52
+
53
+ def id_for record
54
+ record[id_field]
55
+ end
56
+
57
+ def request request_type, path, record
58
+ perform_request(create_request(request_type, path, record))
59
+ end
60
+
61
+ private
62
+
63
+ def create_request request_type, path, record
64
+ request_type.new(path).tap do |req|
65
+ req.body = MultiJson.dump(record)
66
+ end
67
+ end
68
+
69
+ def perform_request req
70
+ begin
71
+ response = connection.request(req)
72
+ status = response.code.to_i
73
+ if (200..201).include?(status)
74
+ log.info("#{req.class} #{req.path} #{status}")
75
+ else
76
+ handle_elasticsearch_error(status, response)
77
+ end
78
+ rescue => e
79
+ log.error("#{e.class} - #{e.message}")
80
+ end
81
+ end
82
+
83
+ def handle_elasticsearch_error status, response
84
+ begin
85
+ error = MultiJson.load(response.body)
86
+ log.error("#{response.code}: #{error['error']}")
87
+ rescue => e
88
+ log.error("Received a response code of #{status}: #{response.body}")
89
+ end
90
+ end
91
+
92
+ register :elasticsearch_loader
93
+
94
+ end
95
+ end
96
+ end
97
+
98
+
99
+
data/lib/wukong-load/loader.rb ADDED
@@ -0,0 +1,16 @@
1
+ module Wukong
2
+ module Load
3
+
4
+ # Base class from which to build Loaders.
5
+ class Loader < Wukong::Processor::FromJson
6
+
7
+ def process line
8
+ super(line) { |record| load(record) }
9
+ end
10
+
11
+ def load record
12
+ end
13
+
14
+ end
15
+ end
16
+ end
data/lib/wukong-load/runner.rb ADDED
@@ -0,0 +1,48 @@
1
+ module Wukong
2
+ module Load
3
+ class Runner
4
+
5
+ include Logging
6
+
7
+ def self.run settings
8
+ begin
9
+ new(settings).run
10
+ rescue Error => e
11
+ log.error(e.message)
12
+ exit(127)
13
+ end
14
+ end
15
+
16
+ attr_accessor :settings
17
+ def initialize settings
18
+ self.settings = settings
19
+ end
20
+
21
+ def args
22
+ settings.rest
23
+ end
24
+
25
+ def data_store_name
26
+ args.first
27
+ end
28
+
29
+ def processor_name
30
+ case data_store_name
31
+ when 'elasticsearch' then :elasticsearch_loader
32
+ when nil
33
+ settings.dump_help
34
+ exit(1)
35
+ else
36
+ raise Error.new("No loader defined for data store: #{data_store_name}")
37
+ end
38
+ end
39
+
40
+ def run
41
+ EM.run do
42
+ StupidServer.new(processor_name, settings).run!
43
+ end
44
+ end
45
+
46
+ end
47
+ end
48
+ end
data/lib/wukong-load/version.rb ADDED
@@ -0,0 +1,6 @@
1
+ module Wukong
2
+ module Load
3
+ # The current version of Wukong-Load
4
+ VERSION = '0.0.2'
5
+ end
6
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,7 @@
1
+ require 'wukong-load'
2
+ require 'wukong/spec_helpers'
3
+
4
+ RSpec.configure do |config|
5
+ config.mock_with :rspec
6
+ include Wukong::SpecHelpers
7
+ end
data/spec/wukong-load/elasticsearch_spec.rb ADDED
@@ -0,0 +1,140 @@
1
+ require 'spec_helper'
2
+
3
+ describe Wukong::Load::ElasticsearchLoader do
4
+
5
+ let(:record) { {'text' => 'hi' } }
6
+ let(:record_with_index) { {'text' => 'hi', '_index' => 'custom_index' } }
7
+ let(:record_with_custom_index) { {'text' => 'hi', '_custom_index' => 'custom_index' } }
8
+ let(:record_with_es_type) { {'text' => 'hi', '_es_type' => 'custom_es_type' } }
9
+ let(:record_with_custom_es_type) { {'text' => 'hi', '_custom_es_type' => 'custom_es_type' } }
10
+ let(:record_with_id) { {'text' => 'hi', '_id' => 'the_id' } }
11
+ let(:record_with_custom_id) { {'text' => 'hi', '_custom_id' => 'the_id' } }
12
+
13
+ it_behaves_like 'a processor', :named => :elasticsearch_loader
14
+
15
+ context "without an Elasticsearch available" do
16
+ before do
17
+ Net::HTTP.should_receive(:new).and_raise(StandardError)
18
+ end
19
+
20
+ it "raises an error on setup" do
21
+ expect { processor(:elasticsearch_loader).setup }.to raise_error(Wukong::Error)
22
+ end
23
+ end
24
+
25
+ context "routes" do
26
+ context "all records" do
27
+ it "to a default index" do
28
+ proc = processor(:elasticsearch_loader)
29
+ proc.index_for(record).should == proc.index
30
+ end
31
+ it "to a given index" do
32
+ processor(:elasticsearch_loader, :index => 'custom_index').index_for(record).should == 'custom_index'
33
+ end
34
+ it "to a default type" do
35
+ proc = processor(:elasticsearch_loader)
36
+ proc.es_type_for(record).should == proc.es_type
37
+ end
38
+ it "to a given type" do
39
+ processor(:elasticsearch_loader, :es_type => 'custom_es_type').es_type_for(record).should == 'custom_es_type'
40
+ end
41
+ end
42
+
43
+ context "records having a value for" do
44
+ it "default index field to the given index" do
45
+ processor(:elasticsearch_loader).index_for(record_with_index).should == 'custom_index'
46
+ end
47
+ it "given index field to the given index" do
48
+ processor(:elasticsearch_loader, :index_field => '_custom_index').index_for(record_with_custom_index).should == 'custom_index'
49
+ end
50
+ it "default type field to the given type" do
51
+ processor(:elasticsearch_loader).es_type_for(record_with_es_type).should == 'custom_es_type'
52
+ end
53
+ it "given type field to the given type" do
54
+ processor(:elasticsearch_loader, :es_type_field => '_custom_es_type').es_type_for(record_with_custom_es_type).should == 'custom_es_type'
55
+ end
56
+ end
57
+ end
58
+
59
+ context "detects IDs" do
60
+ it "based on the absence of a default ID field" do
61
+ processor(:elasticsearch_loader).id_for(record).should be_nil
62
+ end
63
+ it "based on the value of a default ID field" do
64
+ processor(:elasticsearch_loader).id_for(record_with_id).should == 'the_id'
65
+ end
66
+ it "based on the value of a custom ID field" do
67
+ processor(:elasticsearch_loader, :id_field => '_custom_id').id_for(record_with_custom_id).should == 'the_id'
68
+ end
69
+ end
70
+
71
+ context "having made a connection to the database" do
72
+
73
+ let(:connection) { double() }
74
+ let(:log) { double() }
75
+ subject { processor(:elasticsearch_loader) }
76
+ before do
77
+ Net::HTTP.should_receive(:new).and_return(connection)
78
+ subject.stub!(:log).and_return(log)
79
+ end
80
+
81
+
82
+ context "sends" do
83
+ it "create requests on a record without an ID" do
84
+ subject.should_receive(:request).with(Net::HTTP::Post, '/foo/bar', kind_of(Hash))
85
+ subject.load({'_index' => 'foo', '_es_type' => 'bar'})
86
+ end
87
+ it "update requests on a record with an ID" do
88
+ subject.should_receive(:request).with(Net::HTTP::Put, '/foo/bar/1', kind_of(Hash))
89
+ subject.load({'_index' => 'foo', '_es_type' => 'bar', '_id' => '1'})
90
+ end
91
+ end
92
+
93
+ context "receives" do
94
+
95
+ let(:ok) do
96
+ mock("Net::HTTPOK").tap do |response|
97
+ response.stub!(:code).and_return('200')
98
+ response.stub!(:body).and_return('{"ok": true}')
99
+ end
100
+ end
101
+ let(:created) do
102
+ mock("Net::HTTPCreated").tap do |response|
103
+ response.stub!(:code).and_return('201')
104
+ response.stub!(:body).and_return('{"created": true}')
105
+ end
106
+ end
107
+ let(:not_found) do
108
+ mock("Net::HTTPNotFound").tap do |response|
109
+ response.stub!(:code).and_return('404')
110
+ response.stub!(:body).and_return('{"error": "Not found"}')
111
+ end
112
+ end
113
+
114
+ context "201 Created" do
115
+ before { connection.should_receive(:request).with(kind_of(Net::HTTP::Post)).and_return(created) }
116
+ it "by logging an INFO message" do
117
+ log.should_receive(:info)
118
+ subject.load(record)
119
+ end
120
+ end
121
+
122
+ context "200 OK" do
123
+ before { connection.should_receive(:request).with(kind_of(Net::HTTP::Put)).and_return(ok) }
124
+ it "by logging an INFO message" do
125
+ log.should_receive(:info)
126
+ subject.load(record_with_id)
127
+ end
128
+ end
129
+
130
+ context "an error response from Elasticsearch" do
131
+ before { connection.should_receive(:request).with(kind_of(Net::HTTP::Post)).and_return(not_found) }
132
+ it "by logging an ERROR message" do
133
+ log.should_receive(:error)
134
+ subject.load(record)
135
+ end
136
+ end
137
+
138
+ end
139
+ end
140
+ end
data/wukong-load.gemspec ADDED
@@ -0,0 +1,30 @@
1
+ # -*- encoding: utf-8 -*-
2
+ require File.expand_path('../lib/wukong-load/version', __FILE__)
3
+
4
+ Gem::Specification.new do |gem|
5
+ gem.name = 'wukong-load'
6
+ gem.homepage = 'https://github.com/infochimps-labs/wukong-load'
7
+ gem.licenses = ["Apache 2.0"]
8
+ gem.email = 'coders@infochimps.com'
9
+ gem.authors = ['Infochimps', 'Philip (flip) Kromer', 'Travis Dempsey', 'Dhruv Bansal']
10
+ gem.version = Wukong::Load::VERSION
11
+
12
+ gem.summary = 'Load data produced by Wukong processors and dataflows into data stores.'
13
+ gem.description = <<-EOF
14
+ Lets you load data from the command-line into data stores like
15
+
16
+ * Elasticsearch
17
+ * MongoDB
18
+ * HBase
19
+ * MySQL
20
+
21
+ and others.
22
+ EOF
23
+
24
+ gem.files = `git ls-files`.split("\n")
25
+ gem.executables = ['wu-load']
26
+ gem.test_files = gem.files.grep(/^spec/)
27
+ gem.require_paths = ['lib']
28
+
29
+ gem.add_dependency('wukong', '3.0.0.pre3')
30
+ end
metadata ADDED
@@ -0,0 +1,84 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: wukong-load
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.2
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Infochimps
9
+ - Philip (flip) Kromer
10
+ - Travis Dempsey
11
+ - Dhruv Bansal
12
+ autorequire:
13
+ bindir: bin
14
+ cert_chain: []
15
+ date: 2012-12-18 00:00:00.000000000 Z
16
+ dependencies:
17
+ - !ruby/object:Gem::Dependency
18
+ name: wukong
19
+ requirement: !ruby/object:Gem::Requirement
20
+ none: false
21
+ requirements:
22
+ - - '='
23
+ - !ruby/object:Gem::Version
24
+ version: 3.0.0.pre3
25
+ type: :runtime
26
+ prerelease: false
27
+ version_requirements: !ruby/object:Gem::Requirement
28
+ none: false
29
+ requirements:
30
+ - - '='
31
+ - !ruby/object:Gem::Version
32
+ version: 3.0.0.pre3
33
+ description: ! " Lets you load data from the command-line into data stores like\n\n
34
+ \ * Elasticsearch\n * MongoDB\n * HBase\n * MySQL\n\nand others.\n"
35
+ email: coders@infochimps.com
36
+ executables:
37
+ - wu-load
38
+ extensions: []
39
+ extra_rdoc_files: []
40
+ files:
41
+ - .gitignore
42
+ - Gemfile
43
+ - LICENSE.md
44
+ - README.md
45
+ - Rakefile
46
+ - bin/wu-load
47
+ - lib/wukong-load.rb
48
+ - lib/wukong-load/configuration.rb
49
+ - lib/wukong-load/elasticsearch.rb
50
+ - lib/wukong-load/loader.rb
51
+ - lib/wukong-load/runner.rb
52
+ - lib/wukong-load/version.rb
53
+ - spec/spec_helper.rb
54
+ - spec/wukong-load/elasticsearch_spec.rb
55
+ - wukong-load.gemspec
56
+ homepage: https://github.com/infochimps-labs/wukong-load
57
+ licenses:
58
+ - Apache 2.0
59
+ post_install_message:
60
+ rdoc_options: []
61
+ require_paths:
62
+ - lib
63
+ required_ruby_version: !ruby/object:Gem::Requirement
64
+ none: false
65
+ requirements:
66
+ - - ! '>='
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ required_rubygems_version: !ruby/object:Gem::Requirement
70
+ none: false
71
+ requirements:
72
+ - - ! '>='
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubyforge_project:
77
+ rubygems_version: 1.8.23
78
+ signing_key:
79
+ specification_version: 3
80
+ summary: Load data produced by Wukong processors and dataflows into data stores.
81
+ test_files:
82
+ - spec/spec_helper.rb
83
+ - spec/wukong-load/elasticsearch_spec.rb
84
+ has_rdoc: