embulk-output-documentdb 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: cdf1c91d03258737edcf353cdfaa4ad87ddba0b2
4
+ data.tar.gz: 4a96c42c177b9693c66bfa412f720880073b16d7
5
+ SHA512:
6
+ metadata.gz: d632dcfc63aa637e55838ac223327f9b5fc884ba0684edcb408dfab5ab3882ef237950941c158c7c25544c7969ed3b611ebd537aa00c8a62b3431905ddf9e6ec
7
+ data.tar.gz: a1ded833cbd20cb47dcc417fdea5c3c37aedfbfe169d2008a631bc9c800953d2d8b1b136cac661416d352cc51c57b2d8e7849e3e9ff0ff88eb5d75b52720ec9f
data/.gitignore ADDED
@@ -0,0 +1,5 @@
1
+ *~
2
+ /pkg/
3
+ /tmp/
4
+ /.bundle/
5
+ /Gemfile.lock
data/ChangeLog ADDED
@@ -0,0 +1,4 @@
1
+ Release 0.1.0 - 2016/08/28
2
+
3
+ * Initial Release
4
+ * Add documentdb.rb + document tiny client libs
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org/'
2
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+
2
+ MIT License
3
+
4
+ Permission is hereby granted, free of charge, to any person obtaining
5
+ a copy of this software and associated documentation files (the
6
+ "Software"), to deal in the Software without restriction, including
7
+ without limitation the rights to use, copy, modify, merge, publish,
8
+ distribute, sublicense, and/or sell copies of the Software, and to
9
+ permit persons to whom the Software is furnished to do so, subject to
10
+ the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
19
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
20
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
21
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,114 @@
1
+ # Azure DocumentDB output plugin for Embulk
2
+
3
+ embulk-output-documentdb is an Embulk output plugin that dumps records to Azure DocumentDB.
4
+
5
+ ## Overview
6
+
7
+ * **Plugin type**: output
8
+ * **Load all or nothing**: no
9
+ * **Resume supported**: no
10
+ * **Cleanup supported**: yes
11
+
12
+ ## Installation
13
+
14
+ $ gem install embulk-output-documentdb
15
+
16
+ ## Configuration
17
+
18
+ ### DocumentDB
19
+
20
+ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account using the Azure portal, Azure Resource Manager templates, or the Azure command-line interface (CLI). You also need a database and a collection for embulk-output-documentdb to write records to; the plugin can create them for you (see auto\_create\_database and auto\_create\_collection below). The following articles explain each step:
21
+
22
+ * Create a DocumentDB database account using [the Azure portal](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-account/), or [Azure Resource Manager templates and Azure CLI](https://azure.microsoft.com/en-us/documentation/articles/documentdb-automation-resource-manager-cli/)
23
+ * [How to create a database for DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-database/)
24
+ * [Create a DocumentDB collection](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-collection/)
25
+ * [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/)
26
+
27
+ ### Plugin Configuration
28
+
29
+ ```yaml
30
+ out:
31
+   type: documentdb
32
+   docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
33
+   docdb_account_key: <your-docdb-account-key>
34
+   docdb_database: myembulkdb
35
+   docdb_collection: myembulkcoll
36
+   auto_create_database: true
37
+   auto_create_collection: true
38
+   partitioned_collection: false
39
+   key_column: id
40
+ ```
41
+
42
+ * **docdb\_endpoint (required)** - Azure DocumentDB Account endpoint URI
43
+ * **docdb\_account\_key (required)** - Azure DocumentDB Account key (master key). You must NOT set a read-only key
44
+ * **docdb\_database (required)** - DocumentDB database name
45
+ * **docdb\_collection (required)** - DocumentDB collection name
46
+ * **auto\_create\_database (optional)** - Default: true. If true, the DocumentDB database named by **docdb\_database** is created automatically if it does not exist
47
+ * **auto\_create\_collection (optional)** - Default: true. If true, the DocumentDB collection named by **docdb\_collection** is created automatically if it does not exist
48
+ * **partitioned\_collection (optional)** - Default: false. Set to true to create and/or write records to a partitioned collection; set to false for a single-partition collection
49
+ * **partition\_key (optional)** - Default: nil. Required for a partitioned collection (i.e., when partitioned\_collection is true); the partition key column must be present in the records
50
+ * **offer\_throughput (optional)** - Default: 10100. Throughput for the collection, in request units per second (RU/s). Only effective when a partitioned collection is newly created (i.e., both auto\_create\_collection and partitioned\_collection are true), in which case it must be at least 10100
51
+ * **key\_column (required)** - Column to be stored as the document's primary key. If the column is not named "id", it is renamed to "id"; its value is always stored as a string
52
+
53
+ ## Configuration examples
54
+
55
+ Here are two example configurations: one for a single-partition collection and one for a partitioned collection.
56
+
57
+ ### (1) Single-Partition Collection Case
58
+
59
+ ```yaml
60
+ out:
61
+   type: documentdb
62
+   docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
63
+   docdb_account_key: <your-docdb-account-key>
64
+   docdb_database: myembulkdb
65
+   docdb_collection: myembulkcoll
66
+   auto_create_database: true
67
+   auto_create_collection: true
68
+   partitioned_collection: false
69
+   key_column: id
70
+ ```
71
+
72
+ ### (2) Partitioned Collection Case
73
+
74
+ ```yaml
+ out:
75
+   type: documentdb
76
+   docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
77
+   docdb_account_key: <your-docdb-account-key>
78
+   docdb_database: myembulkdb
79
+   docdb_collection: myembulkcoll
80
+   auto_create_database: true
81
+   auto_create_collection: true
82
+   partitioned_collection: true
83
+   partition_key: account
84
+   offer_throughput: 10100
85
+   key_column: id
86
+ ```
87
+
88
+ ## Build, Install, and Run
89
+
90
+ ```
91
+ $ rake
92
+
93
+ $ embulk gem install pkg/embulk-output-documentdb-0.1.0.gem
94
+
95
+ $ embulk preview config.yml
96
+
97
+ $ embulk run config.yml
98
+
99
+ ```
100
+
101
+ ## Contributing
102
+
103
+ Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/embulk-output-documentdb.
104
+
105
+ ## Copyright
106
+
107
+ <table>
108
+ <tr>
109
+ <td>Copyright</td><td>Copyright (c) 2016- Yoichi Kawasaki</td>
110
+ </tr>
111
+ <tr>
112
+ <td>License</td><td>MIT</td>
113
+ </tr>
114
+ </table>
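The configuration rules in the README above can be expressed as a small validation routine. The following Ruby sketch is illustrative only. It assumes the `out:` section has already been loaded into a plain Hash with string keys. `validate_documentdb_config`, `DEFAULTS` and `REQUIRED` are hypothetical names; the plugin itself reads options through Embulk's `config.param` in `Documentdb.transaction`. The 10100 minimum mirrors `PARTITIONED_COLL_MIN_THROUGHPUT`.

```ruby
# Hypothetical helper: applies the README defaults and checks the documented rules.
MIN_PARTITIONED_THROUGHPUT = 10_100 # mirrors PARTITIONED_COLL_MIN_THROUGHPUT

DEFAULTS = {
  'auto_create_database'   => true,
  'auto_create_collection' => true,
  'partitioned_collection' => false,
  'partition_key'          => nil,
  'offer_throughput'       => MIN_PARTITIONED_THROUGHPUT
}.freeze

REQUIRED = %w[docdb_endpoint docdb_account_key docdb_database
              docdb_collection key_column].freeze

# Returns the merged config, or raises ArgumentError describing the first problem.
def validate_documentdb_config(raw)
  cfg = DEFAULTS.merge(raw)
  REQUIRED.each do |key|
    value = cfg[key]
    raise ArgumentError, "no #{key}" if value.nil? || value.to_s.empty?
  end
  if cfg['partitioned_collection']
    pk = cfg['partition_key']
    if pk.nil? || pk.empty?
      raise ArgumentError, 'partition_key must be set in partitioned collection mode'
    end
    # The minimum only matters when the plugin creates the collection itself.
    if cfg['auto_create_collection'] && cfg['offer_throughput'] < MIN_PARTITIONED_THROUGHPUT
      raise ArgumentError, "offer_throughput must be >= #{MIN_PARTITIONED_THROUGHPUT}"
    end
  end
  cfg
end
```

Collecting the rules in one place makes it easy to see that `partition_key` and `offer_throughput` are checked only in partitioned mode.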
data/Rakefile ADDED
@@ -0,0 +1,3 @@
1
+ require "bundler/gem_tasks"
2
+
3
+ task default: :build
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
data/embulk-output-documentdb.gemspec ADDED
@@ -0,0 +1,21 @@
1
+ # coding: utf-8
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = "embulk-output-documentdb"
5
+ spec.version = File.read("VERSION").strip
6
+ spec.authors = ["Yoichi Kawasaki"]
7
+ spec.email = ["yoichi.kawasaki@outlook.com"]
8
+ spec.summary = "Azure DocumentDB output plugin for Embulk"
9
+ spec.description = "Dumps records to Azure DocumentDB"
10
+ spec.licenses = ["MIT"]
11
+ spec.homepage = "https://github.com/yoichika/embulk-output-documentdb"
12
+
13
+ spec.files = `git ls-files`.split("\n")
14
+ spec.test_files = spec.files.grep(%r{^(test|spec)/})
15
+ spec.require_paths = ["lib"]
16
+
17
+ spec.add_dependency "rest-client"
18
+ spec.add_development_dependency 'embulk', ['>= 0.8.13']
19
+ spec.add_development_dependency 'bundler', ['>= 1.10.6']
20
+ spec.add_development_dependency 'rake', ['>= 10.0']
21
+ end
data/lib/embulk/output/documentdb.rb ADDED
@@ -0,0 +1,166 @@
1
+ module Embulk
2
+ module Output
3
+
4
+ require 'time'
5
+ require 'securerandom'
6
+ require_relative 'documentdb/client'
7
+ require_relative 'documentdb/partitioned_coll_client'
8
+ require_relative 'documentdb/header'
9
+ require_relative 'documentdb/resource'
10
+
11
+ class Documentdb < OutputPlugin
12
+ Plugin.register_output("documentdb", self)
13
+
14
+ def self.transaction(config, schema, count, &control)
15
+ # configuration code:
16
+ task = {
17
+ 'docdb_endpoint' => config.param('docdb_endpoint', :string),
18
+ 'docdb_account_key' => config.param('docdb_account_key', :string),
19
+ 'docdb_database' => config.param('docdb_database', :string),
20
+ 'docdb_collection' => config.param('docdb_collection', :string),
21
+ 'auto_create_database' => config.param('auto_create_database', :bool, :default => true),
22
+ 'auto_create_collection' => config.param('auto_create_collection',:bool, :default => true),
23
+ 'partitioned_collection' => config.param('partitioned_collection',:bool, :default => false),
24
+ 'partition_key' => config.param('partition_key', :string, :default => nil),
25
+ 'offer_throughput' => config.param('offer_throughput', :integer, :default => AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT),
26
+ 'key_column' => config.param('key_column', :string),
27
+ }
28
+ Embulk.logger.info "transaction start"
29
+ # param validation
30
+ raise ConfigError, 'no docdb_endpoint' if task['docdb_endpoint'].empty?
31
+ raise ConfigError, 'no docdb_account_key' if task['docdb_account_key'].empty?
32
+ raise ConfigError, 'no docdb_database' if task['docdb_database'].empty?
33
+ raise ConfigError, 'no docdb_collection' if task['docdb_collection'].empty?
34
+ raise ConfigError, 'no key_column' if task['key_column'].empty?
35
+
36
+ if task['partitioned_collection']
37
+ raise ConfigError, 'partition_key must be set in partitioned collection mode' if task['partition_key'].nil? || task['partition_key'].empty?
38
+ if (task['auto_create_collection'] && task['offer_throughput'] < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
39
+ raise ConfigError, sprintf("offer_throughput must be greater than or equal to %d",
40
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
41
+ end
42
+ end
43
+
44
+ # resumable output:
45
+ # resume(task, schema, count, &control)
46
+
47
+ # non-resumable output:
48
+ Embulk.logger.info "Documentdb output start"
49
+ task_reports = yield(task)
50
+ Embulk.logger.info "Documentdb output finished. Task reports = #{task_reports.to_json}"
51
+
52
+ next_config_diff = {}
53
+ return next_config_diff
54
+ end
55
+
56
+ #def self.resume(task, schema, count, &control)
57
+ # task_reports = yield(task)
58
+ #
59
+ # next_config_diff = {}
60
+ # return next_config_diff
61
+ #end
62
+
63
+
64
+ # init is called in initialize(task, schema, index)
65
+ def init
66
+ # initialization code:
67
+ @recordnum = 0
68
+ @successnum = 0
69
+
70
+ begin
71
+ @client = nil
72
+ if task['partitioned_collection']
73
+ @client = AzureDocumentDB::PartitionedCollectionClient.new(task['docdb_account_key'],task['docdb_endpoint'])
74
+ else
75
+ @client = AzureDocumentDB::Client.new(task['docdb_account_key'],task['docdb_endpoint'])
76
+ end
77
+
78
+ # initial operations for database
79
+ res = @client.find_databases_by_name(task['docdb_database'])
80
+ if( res[:body]["_count"].to_i == 0 )
81
+ raise "No database (#{task['docdb_database']})! Enable auto_create_database or create it yourself" if !task['auto_create_database']
82
+ # create new database as it doesn't exists
83
+ @client.create_database(task['docdb_database'])
84
+ end
85
+
86
+ # initial operations for collection
87
+ database_resource = @client.get_database_resource(task['docdb_database'])
88
+ res = @client.find_collections_by_name(database_resource, task['docdb_collection'])
89
+ if( res[:body]["_count"].to_i == 0 )
90
+ raise "No collection (#{task['docdb_collection']})! Enable auto_create_collection or create it yourself" if !task['auto_create_collection']
91
+ # create new collection as it doesn't exists
92
+ if task['partitioned_collection']
93
+ partition_key_paths = ["/#{task['partition_key']}"]
94
+ @client.create_collection(database_resource,
95
+ task['docdb_collection'], partition_key_paths, task['offer_throughput'])
96
+ else
97
+ @client.create_collection(database_resource, task['docdb_collection'])
98
+ end
99
+ end
100
+ @coll_resource = @client.get_collection_resource(database_resource, task['docdb_collection'])
101
+
102
+ rescue Exception =>ex
103
+ Embulk.logger.error { "Error: init: '#{ex}'" }
104
+ exit!
105
+ end
106
+ end
107
+
108
+
109
+ def close
110
+ end
111
+
112
+ # called for each page in each task
113
+ def add(page)
114
+ # output code:
115
+ page.each do |record|
116
+ hash = Hash[schema.names.zip(record)]
117
+ @recordnum += 1
118
+ if !hash.key?(@task['key_column'])
119
+ Embulk.logger.warn { "Skip Invalid Record: no key_column, data=>" + hash.to_json }
120
+ next
121
+ end
122
+ unique_doc_id = "#{hash[@task['key_column']]}"
123
+ if @task['key_column'] != 'id'
124
+ hash.delete(@task['key_column'])
125
+ end
126
+ # force primary key to be both named "id" and "string" type
127
+ hash['id'] = unique_doc_id
128
+
129
+ begin
130
+ if @task['partitioned_collection']
131
+ @client.create_document(@coll_resource, unique_doc_id, hash, @task['partition_key'])
132
+ else
133
+ @client.create_document(@coll_resource, unique_doc_id, hash)
134
+ end
135
+ @successnum += 1
136
+ rescue RestClient::ExceptionWithResponse => rcex
137
+ exdict = JSON.parse(rcex.response)
138
+ if exdict['code'] == 'Conflict'
139
+ Embulk.logger.error { "Duplicate Error: doc id (#{unique_doc_id}) already exists, data=>" + hash.to_json }
140
+ else
141
+ Embulk.logger.error { "RestClient Error: '#{rcex.response}', data=>" + hash.to_json }
142
+ end
143
+ rescue => ex
144
+ Embulk.logger.error { "UnknownError: '#{ex}', doc id=>#{unique_doc_id}, data=>" + hash.to_json }
145
+ end
146
+ end
147
+ end
148
+
149
+ def finish
150
+ end
151
+
152
+ def abort
153
+ end
154
+
155
+ def commit
156
+ task_report = {
157
+ "total_records" => @recordnum,
158
+ "success" => @successnum,
159
+ "skip_or_error" => (@recordnum - @successnum),
160
+ }
161
+ return task_report
162
+ end
163
+ end
164
+
165
+ end
166
+ end
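`Documentdb#add` turns each Embulk record into a document. It zips the column names with the values and forces the key column into a string `id`. Below is a standalone sketch of that mapping. It assumes records arrive as arrays aligned with the schema's column names. `record_to_document` is a hypothetical helper, not part of the plugin.

```ruby
require 'json'

# Hypothetical helper mirroring the mapping in Documentdb#add.
# Returns nil for records without the key column (the plugin logs and skips those).
def record_to_document(column_names, record, key_column)
  doc = Hash[column_names.zip(record)]
  return nil unless doc.key?(key_column)
  doc_id = doc[key_column].to_s            # "id" is always stored as a string
  doc.delete(key_column) if key_column != 'id'
  doc['id'] = doc_id                       # overwrites any existing "id" column
  doc
end

doc = record_to_document(%w[id account comment], [0, 21123, 'java'], 'id')
puts doc.to_json
# prints {"id":"0","account":21123,"comment":"java"}
```

As in the plugin, suppose `key_column` is something other than `id` and the record also has an `id` column. That `id` column is then replaced by the key value.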
data/lib/embulk/output/documentdb/client.rb ADDED
@@ -0,0 +1,167 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class Client
10
+
11
+ def initialize (master_key, url_endpoint)
12
+ @master_key = master_key
13
+ @url_endpoint = url_endpoint
14
+ @header = AzureDocumentDB::Header.new(@master_key)
15
+ end
16
+
17
+ def create_database (database_name)
18
+ url = "#{@url_endpoint}/dbs"
19
+ custom_headers = {'Content-Type' => 'application/json'}
20
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', custom_headers )
21
+ body_json = { 'id' => database_name }.to_json
22
+ res = RestClient.post( url, body_json, headers)
23
+ JSON.parse(res)
24
+ end
25
+
26
+ def find_databases_by_name (database_name)
27
+ query_params = []
28
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
29
+ query_params.push( {:name=>"@id", :value=> database_name } )
30
+ url = sprintf("%s/dbs", @url_endpoint )
31
+ res = _query(AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', url, query_text, query_params)
32
+ res
33
+ end
34
+
35
+ def get_database_resource (database_name)
36
+ resource = nil
37
+ res = find_databases_by_name (database_name)
38
+ if( res[:body]["_count"].to_i == 0 )
39
+ p "no #{database_name} database exists"
40
+ return resource
41
+ end
42
+ res[:body]['Databases'].select do |db|
43
+ if (db['id'] == database_name )
44
+ resource = AzureDocumentDB::DatabaseResource.new(db['_rid'])
45
+ end
46
+ end
47
+ resource
48
+ end
49
+
50
+ def create_collection(database_resource, collection_name, colls_options={}, custom_headers={} )
51
+ if !database_resource
52
+ raise ArgumentError.new 'No database_resource!'
53
+ end
54
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid )
55
+ custom_headers['Content-Type'] = 'application/json'
56
+ headers = @header.generate('post',
57
+ AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
58
+ database_resource.database_rid, custom_headers )
59
+ body = {'id' => collection_name }
60
+ colls_options.each{|k, v|
61
+ if k == 'indexingPolicy' || k == 'partitionKey'
62
+ body[k] = v
63
+ end
64
+ }
65
+ res = RestClient.post( url, body.to_json, headers)
66
+ JSON.parse(res)
67
+ end
68
+
69
+ def find_collections_by_name(database_resource, collection_name)
70
+ if !database_resource
71
+ raise ArgumentError.new 'No database_resource!'
72
+ end
73
+ ret = {}
74
+ query_params = []
75
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
76
+ query_params.push( {:name=>"@id", :value=> collection_name } )
77
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid)
78
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
79
+ database_resource.database_rid, url, query_text, query_params)
80
+ ret
81
+ end
82
+
83
+ def get_collection_resource (database_resource, collection_name)
84
+ _collection_rid = ''
85
+ if !database_resource
86
+ raise ArgumentError.new 'No database_resource!'
87
+ end
88
+ res = find_collections_by_name(database_resource, collection_name)
89
+ res[:body]['DocumentCollections'].select do |col|
90
+ if (col['id'] == collection_name )
91
+ _collection_rid = col['_rid']
92
+ end
93
+ end
94
+ if _collection_rid.empty?
95
+ p "no #{collection_name} collection exists"
96
+ return nil
97
+ end
98
+ AzureDocumentDB::CollectionResource.new(database_resource.database_rid, _collection_rid)
99
+ end
100
+
101
+ def create_document(collection_resource, document_id, document, custom_headers={} )
102
+ if !collection_resource
103
+ raise ArgumentError.new 'No collection_resource!'
104
+ end
105
+ if document['id'] && document_id != document['id']
106
+ raise ArgumentError.new "Document id mismatch error (#{document_id})!"
107
+ end
108
+ body = { 'id' => document_id }.merge document
109
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
110
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
111
+ custom_headers['Content-Type'] = 'application/json'
112
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
113
+ collection_resource.collection_rid, custom_headers )
114
+ res = RestClient.post( url, body.to_json, headers)
115
+ JSON.parse(res)
116
+ end
117
+
118
+ def find_documents(collection_resource, document_id, custom_headers={})
119
+ if !collection_resource
120
+ raise ArgumentError.new 'No collection_resource!'
121
+ end
122
+ ret = {}
123
+ query_params = []
124
+ query_text = "SELECT * FROM c WHERE c.id=@id"
125
+ query_params.push( {:name=>"@id", :value=> document_id } )
126
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
127
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
128
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
129
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
130
+ ret
131
+ end
132
+
133
+ def query_documents( collection_resource, query_text, query_params, custom_headers={} )
134
+ if !collection_resource
135
+ raise ArgumentError.new 'No collection_resource!'
136
+ end
137
+ ret = {}
138
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
139
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
140
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
141
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
142
+ ret
143
+ end
144
+
145
+ protected
146
+
147
+ def _query( resource_type, parent_resource_id, url, query_text, query_params, custom_headers={} )
148
+ query_specific_header = {
149
+ 'x-ms-documentdb-isquery' => 'True',
150
+ 'Content-Type' => 'application/query+json',
151
+ 'Accept' => 'application/json'
152
+ }
153
+ query_specific_header.merge! custom_headers
154
+ headers = @header.generate('post', resource_type, parent_resource_id, query_specific_header)
155
+ body_json = {
156
+ :query => query_text,
157
+ :parameters => query_params
158
+ }.to_json
159
+
160
+ res = RestClient.post( url, body_json, headers)
161
+ result = {
162
+ :header => res.headers,
163
+ :body => JSON.parse(res.body) }
164
+ return result
165
+ end
166
+ end
167
+ end
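`Client#_query` sends DocumentDB SQL queries as a POST whose JSON body carries the query text and its `@`-prefixed parameters. The sketch below assembles only the query-specific headers and the body; it omits the auth headers and makes no network call. `build_query_request` is a hypothetical name. Parameters are passed as a Hash rather than the plugin's array of `{:name, :value}` hashes, a choice made for brevity.

```ruby
require 'json'

# Hypothetical helper: the headers and body Client#_query would POST.
# Auth headers (x-ms-version, x-ms-date, authorization) are not included here.
def build_query_request(query_text, params, custom_headers = {})
  headers = {
    'x-ms-documentdb-isquery' => 'True',
    'Content-Type' => 'application/query+json',
    'Accept' => 'application/json'
  }.merge(custom_headers)
  body = {
    query: query_text,
    parameters: params.map { |name, value| { name: name, value: value } }
  }.to_json
  [headers, body]
end

headers, body = build_query_request('SELECT * FROM root r WHERE r.id=@id', { '@id' => 'myembulkdb' })
puts body
# prints {"query":"SELECT * FROM root r WHERE r.id=@id","parameters":[{"name":"@id","value":"myembulkdb"}]}
```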
data/lib/embulk/output/documentdb/constants.rb ADDED
@@ -0,0 +1,10 @@
1
+ module AzureDocumentDB
2
+ API_VERSION = '2015-12-16'.freeze
3
+ RESOURCE_TYPE_DATABASE='dbs'.freeze
4
+ RESOURCE_TYPE_COLLECTION='colls'.freeze
5
+ RESOURCE_TYPE_DOCUMENT='docs'.freeze
6
+ AUTH_TOKEN_VERSION = '1.0'.freeze
7
+ AUTH_TOKEN_TYPE_MASTER = 'master'.freeze
8
+ AUTH_TOKEN_TYPE_RESOURCE = 'resource'.freeze
9
+ PARTITIONED_COLL_MIN_THROUGHPUT = 10100.freeze
10
+ end
data/lib/embulk/output/documentdb/header.rb ADDED
@@ -0,0 +1,55 @@
1
+ require 'time'
2
+ require 'openssl'
3
+ require 'base64'
4
+ require 'erb'
5
+
6
+ module AzureDocumentDB
7
+
8
+ class Header
9
+
10
+ def initialize (master_key)
11
+ @master_key = master_key
12
+ end
13
+
14
+ def generate (verb, resource_type, parent_resource_id, api_specific_headers = {} )
15
+ headers = {}
16
+ utc_date = get_httpdate()
17
+ auth_token = generate_auth_token(verb, resource_type, parent_resource_id, utc_date )
18
+ default_headers = {
19
+ 'x-ms-version' => AzureDocumentDB::API_VERSION,
20
+ 'x-ms-date' => utc_date,
21
+ 'authorization' => auth_token
22
+ }.freeze
23
+ headers.merge!(default_headers)
24
+ headers.merge(api_specific_headers)
25
+ end
26
+
27
+ private
28
+
29
+ def generate_auth_token ( verb, resource_type, resource_id, utc_date)
30
+ payload = sprintf("%s\n%s\n%s\n%s\n%s\n",
31
+ verb,
32
+ resource_type,
33
+ resource_id,
34
+ utc_date,
35
+ "" )
36
+ sig = hmac_base64encode(payload)
37
+
38
+ ERB::Util.url_encode sprintf("type=%s&ver=%s&sig=%s",
39
+ AzureDocumentDB::AUTH_TOKEN_TYPE_MASTER,
40
+ AzureDocumentDB::AUTH_TOKEN_VERSION,
41
+ sig )
42
+ end
43
+
44
+ def get_httpdate
45
+ Time.now.httpdate
46
+ end
47
+
48
+ def hmac_base64encode( text )
49
+ key = Base64.urlsafe_decode64 @master_key
50
+ hmac = OpenSSL::HMAC.digest('sha256', key, text.downcase)
51
+ Base64.encode64(hmac).strip
52
+ end
53
+
54
+ end
55
+ end
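`Header#generate_auth_token` signs each request with the account's master key. Below is a minimal sketch of the same computation. It assumes a standard base64 master key; the plugin decodes the key with `Base64.urlsafe_decode64`, which yields the same bytes for such keys. `master_key_auth_token` is a hypothetical name. The date passed in must be the same value sent in the `x-ms-date` header, which is why `Header#generate` computes it once and uses it for both.

```ruby
require 'openssl'
require 'base64'
require 'erb'
require 'time'

# Hypothetical helper reproducing Header#generate_auth_token:
#   1. payload "verb\nresource_type\nresource_id\ndate\n\n", lowercased as a whole
#   2. HMAC-SHA256 over the payload, keyed with the base64-decoded master key
#   3. URL-encode "type=master&ver=1.0&sig=<base64 signature>"
def master_key_auth_token(master_key, verb, resource_type, resource_id, http_date)
  payload = "#{verb}\n#{resource_type}\n#{resource_id}\n#{http_date}\n\n".downcase
  key = Base64.strict_decode64(master_key)
  sig = Base64.strict_encode64(OpenSSL::HMAC.digest('sha256', key, payload))
  ERB::Util.url_encode("type=master&ver=1.0&sig=#{sig}")
end

# Example: token for a database query (parent resource id is '').
date = Time.now.httpdate
token = master_key_auth_token(Base64.strict_encode64('0' * 64), 'post', 'dbs', '', date)
```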
data/lib/embulk/output/documentdb/partitioned_coll_client.rb ADDED
@@ -0,0 +1,62 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class PartitionedCollectionClient < Client
10
+
11
+ def create_collection(database_resource, collection_name,
12
+ partition_key_paths, offer_throughput = AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT )
13
+
14
+ if (offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
15
+ raise ArgumentError.new sprintf("Offer throughput must be at least %d!",
16
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
17
+ end
18
+ if (partition_key_paths.length < 1 )
19
+ raise ArgumentError.new "No PartitionKey paths!"
20
+ end
21
+ colls_options = {
22
+ 'indexingPolicy' => { 'indexingMode' => "consistent", 'automatic'=>true },
23
+ 'partitionKey' => { "paths" => partition_key_paths, "kind" => "Hash" }
24
+ }
25
+ custom_headers= {'x-ms-offer-throughput' => offer_throughput }
26
+ super(database_resource, collection_name, colls_options, custom_headers)
27
+ end
28
+
29
+
30
+ def create_document(collection_resource, document_id, document, partitioned_key )
31
+ if partitioned_key.empty?
32
+ raise ArgumentError.new "No partitioned key!"
33
+ end
34
+ if !document.key?(partitioned_key)
35
+ raise ArgumentError.new "No partitioned key in your document!"
36
+ end
37
+ partitioned_key_value = document[partitioned_key]
38
+ custom_headers = {
39
+ 'x-ms-documentdb-partitionkey' => [partitioned_key_value].to_json
40
+ }
41
+ super(collection_resource, document_id, document, custom_headers)
42
+ end
43
+
44
+ def find_documents(collection_resource, document_id,
45
+ partitioned_key, partitioned_key_value, custom_headers={})
46
+ if !collection_resource
47
+ raise ArgumentError.new "No collection_resource!"
48
+ end
49
+ ret = {}
50
+ query_params = []
51
+ query_text = sprintf("SELECT * FROM c WHERE c.id=@id AND c.%s=@value", partitioned_key)
52
+ query_params.push( {:name=>"@id", :value=> document_id } )
53
+ query_params.push( {:name=>"@value", :value=> partitioned_key_value } )
54
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
55
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
56
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
57
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
58
+ ret
59
+ end
60
+
61
+ end
62
+ end
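`PartitionedCollectionClient#create_document` passes the document's partition key value in the `x-ms-documentdb-partitionkey` header, as a one-element JSON array. Below is a sketch of building that header. As we understand the service, the header value must match the document's value including its JSON type. A numeric column such as `account` should therefore be sent as `[21123]` rather than `["21123"]`. `partition_key_header` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical helper: builds the partition key header for one document.
def partition_key_header(document, partition_key)
  raise ArgumentError, 'No partitioned key!' if partition_key.nil? || partition_key.empty?
  raise ArgumentError, 'No partitioned key in your document!' unless document.key?(partition_key)
  # One-element JSON array; to_json keeps numbers unquoted and quotes strings.
  { 'x-ms-documentdb-partitionkey' => [document[partition_key]].to_json }
end

puts partition_key_header({ 'id' => '0', 'account' => 21123 }, 'account')['x-ms-documentdb-partitionkey']
# prints [21123]
```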
data/lib/embulk/output/documentdb/resource.rb ADDED
@@ -0,0 +1,40 @@
1
+ module AzureDocumentDB
2
+
3
+ class Resource
4
+ def initialize
5
+ @r = {}
6
+ end
7
+ protected
8
+ attr_accessor :r
9
+ end
10
+
11
+ class DatabaseResource < Resource
12
+
13
+ def initialize (database_rid)
14
+ super()
15
+ @r['database_rid'] = database_rid
16
+ end
17
+
18
+ def database_rid
19
+ @r['database_rid']
20
+ end
21
+ end
22
+
23
+ class CollectionResource < Resource
24
+
25
+ def initialize (database_rid, collection_rid)
26
+ super()
27
+ @r['database_rid'] = database_rid
28
+ @r['collection_rid'] = collection_rid
29
+ end
30
+
31
+ def database_rid
32
+ @r['database_rid']
33
+ end
34
+
35
+ def collection_rid
36
+ @r['collection_rid']
37
+ end
38
+ end
39
+
40
+ end
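The `DatabaseResource` and `CollectionResource` classes only carry `_rid` values. The client combines them with the endpoint into RID-based URLs such as `/dbs/{db_rid}/colls/{coll_rid}/docs`. Below is a sketch of that path construction; `docdb_url` is hypothetical. Unlike the plugin's `sprintf` calls, it strips a trailing `/` from the endpoint (the README examples end with `:443/`), so the result never contains `//dbs`.

```ruby
# Hypothetical helper building RID-based URLs in the same shape as the client.
# The trailing-slash handling is an addition of this sketch.
def docdb_url(endpoint, database_rid = nil, collection_rid = nil)
  url = endpoint.chomp('/') + '/dbs'     # databases feed
  return url unless database_rid
  url += "/#{database_rid}/colls"        # collections feed of one database
  return url unless collection_rid
  url + "/#{collection_rid}/docs"        # documents feed of one collection
end

puts docdb_url('https://example.documents.azure.com:443/', 'dbRid', 'collRid')
# prints https://example.documents.azure.com:443/dbs/dbRid/colls/collRid/docs
```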
data/samples/config-csv2docdb_partitionedcoll.yml ADDED
@@ -0,0 +1,33 @@
1
+ in:
2
+   type: file
3
+   path_prefix: samples/sample_01.csv
4
+   parser:
5
+     charset: UTF-8
6
+     newline: CRLF
7
+     type: csv
8
+     delimiter: ','
9
+     quote: '"'
10
+     escape: '"'
11
+     null_string: 'NULL'
12
+     trim_if_not_quoted: false
13
+     skip_header_lines: 1
14
+     allow_extra_columns: false
15
+     allow_optional_columns: false
16
+     columns:
17
+     - {name: id, type: long}
18
+     - {name: account, type: long}
19
+     - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
20
+     - {name: purchase, type: timestamp, format: '%Y%m%d'}
21
+     - {name: comment, type: string}
22
+ out:
23
+   type: documentdb
24
+   docdb_endpoint: https://yoichikademo1.documents.azure.com:443/
25
+   docdb_account_key: EMwUa3EzsAtJ1qYfzwo9nQ3xxxfsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
26
+   docdb_database: myembulkdb
27
+   docdb_collection: myembulkcoll
28
+   auto_create_database: true
29
+   auto_create_collection: true
30
+   partitioned_collection: true
31
+   partition_key: account
32
+   offer_throughput: 10100
33
+   key_column: id
data/samples/config-csv2docdb_singlecoll.yml ADDED
@@ -0,0 +1,31 @@
1
+ in:
2
+   type: file
3
+   path_prefix: samples/sample_01.csv
4
+   parser:
5
+     charset: UTF-8
6
+     newline: CRLF
7
+     type: csv
8
+     delimiter: ','
9
+     quote: '"'
10
+     escape: '"'
11
+     null_string: 'NULL'
12
+     trim_if_not_quoted: false
13
+     skip_header_lines: 1
14
+     allow_extra_columns: false
15
+     allow_optional_columns: false
16
+     columns:
17
+     - {name: id, type: long}
18
+     - {name: account, type: long}
19
+     - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
20
+     - {name: purchase, type: timestamp, format: '%Y%m%d'}
21
+     - {name: comment, type: string}
22
+ out:
23
+   type: documentdb
24
+   docdb_endpoint: https://yoichikademo1.documents.azure.com:443/
25
+   docdb_account_key: EMwUa3EzsAtJ1qYfzwo9nQ3xxxfsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
26
+   docdb_database: myembulkdb
27
+   docdb_collection: myembulkcoll
28
+   auto_create_database: true
29
+   auto_create_collection: true
30
+   partitioned_collection: false
31
+   key_column: id
data/samples/sample_01.csv ADDED
@@ -0,0 +1,6 @@
1
+ id,account,time,purchase,comment
2
+ 0,21123,2016-08-27 19:23:49,20160127,java
3
+ 1,32864,2016-08-27 19:23:49,20160127,embulk
4
+ 2,14824,2016-08-27 19:01:23,20160127,embulk jruby
5
+ 3,27559,2016-08-28 02:20:02,20160128,"Embulk ""csv"" parser plugin"
6
+ 4,11270,2016-08-29 11:54:36,20160129,NULL
metadata ADDED
@@ -0,0 +1,117 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: embulk-output-documentdb
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Yoichi Kawasaki
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-08-28 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rest-client
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: embulk
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: 0.8.13
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: 0.8.13
41
+ - !ruby/object:Gem::Dependency
42
+ name: bundler
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: 1.10.6
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: 1.10.6
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '10.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '10.0'
69
+ description: Dumps records to Azure DocumentDB
70
+ email:
71
+ - yoichi.kawasaki@outlook.com
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ChangeLog
78
+ - Gemfile
79
+ - LICENSE.txt
80
+ - README.md
81
+ - Rakefile
82
+ - VERSION
83
+ - embulk-output-documentdb.gemspec
84
+ - lib/embulk/output/documentdb.rb
85
+ - lib/embulk/output/documentdb/client.rb
86
+ - lib/embulk/output/documentdb/constants.rb
87
+ - lib/embulk/output/documentdb/header.rb
88
+ - lib/embulk/output/documentdb/partitioned_coll_client.rb
89
+ - lib/embulk/output/documentdb/resource.rb
90
+ - samples/config-csv2docdb_partitionedcoll.yml
91
+ - samples/config-csv2docdb_singlecoll.yml
92
+ - samples/sample_01.csv
93
+ homepage: https://github.com/yoichika/embulk-output-documentdb
94
+ licenses:
95
+ - MIT
96
+ metadata: {}
97
+ post_install_message:
98
+ rdoc_options: []
99
+ require_paths:
100
+ - lib
101
+ required_ruby_version: !ruby/object:Gem::Requirement
102
+ requirements:
103
+ - - ">="
104
+ - !ruby/object:Gem::Version
105
+ version: '0'
106
+ required_rubygems_version: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ requirements: []
112
+ rubyforge_project:
113
+ rubygems_version: 2.6.2
114
+ signing_key:
115
+ specification_version: 4
116
+ summary: Azure DocumentDB output plugin for Embulk
117
+ test_files: []