embulk-output-documentdb 0.1.0
Sign up to get free protection for your applications and to get access to all the features.
- checksums.yaml +7 -0
- data/.gitignore +5 -0
- data/ChangeLog +4 -0
- data/Gemfile +2 -0
- data/LICENSE.txt +21 -0
- data/README.md +114 -0
- data/Rakefile +3 -0
- data/VERSION +1 -0
- data/embulk-output-documentdb.gemspec +21 -0
- data/lib/embulk/output/documentdb.rb +166 -0
- data/lib/embulk/output/documentdb/client.rb +167 -0
- data/lib/embulk/output/documentdb/constants.rb +10 -0
- data/lib/embulk/output/documentdb/header.rb +55 -0
- data/lib/embulk/output/documentdb/partitioned_coll_client.rb +62 -0
- data/lib/embulk/output/documentdb/resource.rb +40 -0
- data/samples/config-csv2docdb_partitionedcoll.yml +33 -0
- data/samples/config-csv2docdb_singlecoll.yml +31 -0
- data/samples/sample_01.csv +6 -0
- metadata +117 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: cdf1c91d03258737edcf353cdfaa4ad87ddba0b2
|
4
|
+
data.tar.gz: 4a96c42c177b9693c66bfa412f720880073b16d7
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: d632dcfc63aa637e55838ac223327f9b5fc884ba0684edcb408dfab5ab3882ef237950941c158c7c25544c7969ed3b611ebd537aa00c8a62b3431905ddf9e6ec
|
7
|
+
data.tar.gz: a1ded833cbd20cb47dcc417fdea5c3c37aedfbfe169d2008a631bc9c800953d2d8b1b136cac661416d352cc51c57b2d8e7849e3e9ff0ff88eb5d75b52720ec9f
|
data/ChangeLog
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
|
2
|
+
MIT License
|
3
|
+
|
4
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
5
|
+
a copy of this software and associated documentation files (the
|
6
|
+
"Software"), to deal in the Software without restriction, including
|
7
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
8
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
9
|
+
permit persons to whom the Software is furnished to do so, subject to
|
10
|
+
the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be
|
13
|
+
included in all copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
16
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
17
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
18
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
19
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
20
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
21
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,114 @@
|
|
1
|
+
# Azure DocumentDB output plugin for Embulk
|
2
|
+
|
3
|
+
embulk-output-documentdb is an Embulk output plugin that dumps records to Azure DocumentDB
|
4
|
+
|
5
|
+
## Overview
|
6
|
+
|
7
|
+
* **Plugin type**: output
|
8
|
+
* **Load all or nothing**: no
|
9
|
+
* **Resume supported**: no
|
10
|
+
* **Cleanup supported**: yes
|
11
|
+
|
12
|
+
## Installation
|
13
|
+
|
14
|
+
$ gem install embulk-output-documentdb
|
15
|
+
|
16
|
+
## Configuration
|
17
|
+
|
18
|
+
### DocumentDB
|
19
|
+
|
20
|
+
To use Microsoft Azure DocumentDB, you must create a DocumentDB database account using either the Azure portal, Azure Resource Manager templates, or Azure command-line interface (CLI). In addition, you must have a database and a collection to which embulk-output-documentdb writes event-stream out. Here are instructions:
|
21
|
+
|
22
|
+
* Create a DocumentDB database account using [the Azure portal](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-account/), or [Azure Resource Manager templates and Azure CLI](https://azure.microsoft.com/en-us/documentation/articles/documentdb-automation-resource-manager-cli/)
|
23
|
+
* [How to create a database for DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-database/)
|
24
|
+
* [Create a DocumentDB collection](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-collection/)
|
25
|
+
* [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/)
|
26
|
+
|
27
|
+
## Configuration
|
28
|
+
|
29
|
+
```yaml
|
30
|
+
out:
|
31
|
+
type: documentdb
|
32
|
+
docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
|
33
|
+
docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
|
34
|
+
docdb_database: myembulkdb
|
35
|
+
docdb_collection: myembulkcoll
|
36
|
+
auto_create_database: true
|
37
|
+
auto_create_collection: true
|
38
|
+
partitioned_collection: false
|
39
|
+
key_column: id
|
40
|
+
```
|
41
|
+
|
42
|
+
* **docdb\_endpoint (required)** - Azure DocumentDB Account endpoint URI
|
43
|
+
* **docdb\_account\_key (required)** - Azure DocumentDB Account key (master key). You must NOT set a read-only key
|
44
|
+
* **docdb\_database (required)** - DocumentDB database name
|
45
|
+
* **docdb\_collection (required)** - DocumentDB collection name
|
46
|
+
* **auto\_create\_database (optional)** - Default:true. By default, DocumentDB database named **docdb\_database** will be automatically created if it does not exist
|
47
|
+
* **auto\_create\_collection (optional)** - Default:true. By default, DocumentDB collection named **docdb\_collection** will be automatically created if it does not exist
|
48
|
+
* **partitioned\_collection (optional)** - Default:false. Set true if you want to create and/or store records to partitioned collection. Set false for single-partition collection
|
49
|
+
* **partition\_key (optional)** - Default:nil. Partition key must be specified for a partitioned collection (partitioned\_collection set to be true)
|
50
|
+
* **offer\_throughput (optional)** - Default:10100. Throughput for the collection expressed in units of 100 request units per second. This is only effective when you newly create a partitioned collection (ie. Both auto\_create\_collection and partitioned\_collection are set to be true )
|
51
|
+
* **key\_column (required)** - Column name to be inserted to DocumentDB as primary key. If it's not named "id", the column name is converted into "id" (string).
|
52
|
+
|
53
|
+
## Configuration examples
|
54
|
+
|
55
|
+
Here are two example plugin configurations - one for a single-partition collection and one for a partitioned collection.
|
56
|
+
|
57
|
+
### (1) Single-Partition Collection Case
|
58
|
+
|
59
|
+
```yaml
|
60
|
+
out:
|
61
|
+
type: documentdb
|
62
|
+
docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
|
63
|
+
docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
|
64
|
+
docdb_database: myembulkdb
|
65
|
+
docdb_collection: myembulkcoll
|
66
|
+
auto_create_database: true
|
67
|
+
auto_create_collection: true
|
68
|
+
partitioned_collection: false
|
69
|
+
key_column: id
|
70
|
+
```
|
71
|
+
|
72
|
+
### (2) Partitioned Collection Case
|
73
|
+
|
74
|
+
```yaml
|
75
|
+
type: documentdb
|
76
|
+
docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
|
77
|
+
docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
|
78
|
+
docdb_database: myembulkdb
|
79
|
+
docdb_collection: myembulkcoll
|
80
|
+
auto_create_database: true
|
81
|
+
auto_create_collection: true
|
82
|
+
partitioned_collection: true
|
83
|
+
partition_key: account
|
84
|
+
offer_throughput: 10100
|
85
|
+
key_column: id
|
86
|
+
```
|
87
|
+
|
88
|
+
## Build, Install, and Run
|
89
|
+
|
90
|
+
```
|
91
|
+
$ rake
|
92
|
+
|
93
|
+
$ embulk gem install pkg/embulk-output-documentdb-0.1.0.gem
|
94
|
+
|
95
|
+
$ embulk preview config.yml
|
96
|
+
|
97
|
+
$ embulk run config.yml
|
98
|
+
|
99
|
+
```
|
100
|
+
|
101
|
+
## Contributing
|
102
|
+
|
103
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/embulk-output-documentdb.
|
104
|
+
|
105
|
+
## Copyright
|
106
|
+
|
107
|
+
<table>
|
108
|
+
<tr>
|
109
|
+
<td>Copyright</td><td>Copyright (c) 2016- Yoichi Kawasaki</td>
|
110
|
+
</tr>
|
111
|
+
<tr>
|
112
|
+
<td>License</td><td>MIT</td>
|
113
|
+
</tr>
|
114
|
+
</table>
|
data/Rakefile
ADDED
data/VERSION
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
0.1.0
|
@@ -0,0 +1,21 @@
|
|
1
|
+
# coding: utf-8
|
2
|
+
|
3
|
+
Gem::Specification.new do |spec|
|
4
|
+
spec.name = "embulk-output-documentdb"
|
5
|
+
spec.version = File.read("VERSION").strip
|
6
|
+
spec.authors = ["Yoichi Kawasaki"]
|
7
|
+
spec.email = ["yoichi.kawasaki@outlook.com"]
|
8
|
+
spec.summary = "Azure DocumentDB output plugin for Embulk"
|
9
|
+
spec.description = "Dumps records to Azure DocumentDB"
|
10
|
+
spec.licenses = ["MIT"]
|
11
|
+
spec.homepage = "https://github.com/yoichika/embulk-output-documentdb"
|
12
|
+
|
13
|
+
spec.files = `git ls-files`.split("\n")
|
14
|
+
spec.test_files = spec.files.grep(%r{^(test|spec)/})
|
15
|
+
spec.require_paths = ["lib"]
|
16
|
+
|
17
|
+
spec.add_dependency "rest-client"
|
18
|
+
spec.add_development_dependency 'embulk', ['>= 0.8.13']
|
19
|
+
spec.add_development_dependency 'bundler', ['>= 1.10.6']
|
20
|
+
spec.add_development_dependency 'rake', ['>= 10.0']
|
21
|
+
end
|
@@ -0,0 +1,166 @@
|
|
1
|
+
module Embulk
  module Output

    require 'time'
    require 'securerandom'
    require_relative 'documentdb/client'
    require_relative 'documentdb/partitioned_coll_client'
    require_relative 'documentdb/header'
    require_relative 'documentdb/resource'

    # Embulk output plugin that writes each input record as a JSON document
    # into an Azure DocumentDB collection (single-partition or partitioned).
    class Documentdb < OutputPlugin
      Plugin.register_output("documentdb", self)

      # Reads and validates the plugin configuration, then drives the
      # (non-resumable) output transaction.
      def self.transaction(config, schema, count, &control)
        # configuration code:
        task = {
          'docdb_endpoint'         => config.param('docdb_endpoint',         :string),
          'docdb_account_key'      => config.param('docdb_account_key',      :string),
          'docdb_database'         => config.param('docdb_database',         :string),
          'docdb_collection'       => config.param('docdb_collection',       :string),
          'auto_create_database'   => config.param('auto_create_database',   :bool,    :default => true),
          'auto_create_collection' => config.param('auto_create_collection', :bool,    :default => true),
          'partitioned_collection' => config.param('partitioned_collection', :bool,    :default => false),
          'partition_key'          => config.param('partition_key',          :string,  :default => nil),
          'offer_throughput'       => config.param('offer_throughput',       :integer, :default => AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT),
          'key_column'             => config.param('key_column',             :string),
        }
        Embulk.logger.info "transaction start"
        # param validation
        raise ConfigError, 'no docdb_endpoint'    if task['docdb_endpoint'].empty?
        raise ConfigError, 'no docdb_account_key' if task['docdb_account_key'].empty?
        raise ConfigError, 'no docdb_database'    if task['docdb_database'].empty?
        raise ConfigError, 'no docdb_collection'  if task['docdb_collection'].empty?
        raise ConfigError, 'no key_column'        if task['key_column'].empty?

        if task['partitioned_collection']
          # BUG FIX: the original tested `@partition_key.empty?`, but @partition_key
          # is never assigned in class scope (always nil), so this validation either
          # never fired or raised NoMethodError. Validate the configured value instead.
          if task['partition_key'].nil? || task['partition_key'].empty?
            raise ConfigError, 'partition_key must be set in partitioned collection mode'
          end
          if task['auto_create_collection'] && task['offer_throughput'] < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT
            raise ConfigError, sprintf("offer_throughput must be more than and equals to %s",
                                       AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
          end
        end

        # resumable output:
        # resume(task, schema, count, &control)

        # non-resumable output:
        Embulk.logger.info "Documentdb output start"
        task_reports = yield(task)
        Embulk.logger.info "Documentdb output finished. Task reports = #{task_reports.to_json}"

        next_config_diff = {}
        return next_config_diff
      end

      #def self.resume(task, schema, count, &control)
      #  task_reports = yield(task)
      #
      #  next_config_diff = {}
      #  return next_config_diff
      #end

      # init is called in initialize(task, schema, index).
      # Connects to DocumentDB and ensures database/collection exist
      # (creating them when the auto_create_* flags allow it).
      def init
        @recordnum  = 0  # total records seen by this task
        @successnum = 0  # records successfully written

        begin
          @client =
            if task['partitioned_collection']
              AzureDocumentDB::PartitionedCollectionClient.new(task['docdb_account_key'], task['docdb_endpoint'])
            else
              AzureDocumentDB::Client.new(task['docdb_account_key'], task['docdb_endpoint'])
            end

          # initial operations for database
          res = @client.find_databases_by_name(task['docdb_database'])
          if res[:body]["_count"].to_i == 0
            # BUG FIX: the original interpolated the undefined local `docdb_database`,
            # which raised NameError instead of this intended message.
            raise "No database (#{task['docdb_database']})! Enable auto_create_database or create it by yourself" if !task['auto_create_database']
            # create new database as it doesn't exist
            @client.create_database(task['docdb_database'])
          end

          # initial operations for collection
          database_resource = @client.get_database_resource(task['docdb_database'])
          res = @client.find_collections_by_name(database_resource, task['docdb_collection'])
          if res[:body]["_count"].to_i == 0
            # BUG FIX: same undefined-local interpolation as above (`docdb_collection`).
            raise "No collection (#{task['docdb_collection']})! Enable auto_create_collection or create it by yourself" if !task['auto_create_collection']
            # create new collection as it doesn't exist
            if task['partitioned_collection']
              partition_key_paths = ["/#{task['partition_key']}"]
              @client.create_collection(database_resource,
                      task['docdb_collection'], partition_key_paths, task['offer_throughput'])
            else
              @client.create_collection(database_resource, task['docdb_collection'])
            end
          end
          @coll_resource = @client.get_collection_resource(database_resource, task['docdb_collection'])

        rescue Exception => ex
          Embulk.logger.error { "Error: init: '#{ex}'" }
          exit!
        end
      end

      def close
      end

      # called for each page in each task; writes each record as one document
      def add(page)
        page.each do |record|
          hash = Hash[schema.names.zip(record)]
          @recordnum += 1
          if !hash.key?(@task['key_column'])
            Embulk.logger.warn { "Skip Invalid Record: no key_column, data=>" + hash.to_json }
            next
          end
          unique_doc_id = "#{hash[@task['key_column']]}"
          if @task['key_column'] != 'id'
            hash.delete(@task['key_column'])
          end
          # force primary key to be both named "id" and "string" type
          hash['id'] = unique_doc_id

          begin
            if @task['partitioned_collection']
              @client.create_document(@coll_resource, unique_doc_id, hash, @task['partition_key'])
            else
              @client.create_document(@coll_resource, unique_doc_id, hash)
            end
            @successnum += 1
          rescue RestClient::ExceptionWithResponse => rcex
            exdict = JSON.parse(rcex.response)
            if exdict['code'] == 'Conflict'
              Embulk.logger.error { "Duplicate Error: doc id (#{unique_doc_id}) already exists, data=>" + hash.to_json }
            else
              Embulk.logger.error { "RestClient Error: '#{rcex.response}', data=>" + hash.to_json }
            end
          rescue => ex
            Embulk.logger.error { "UnknownError: '#{ex}', doc id=>#{unique_doc_id}, data=>" + hash.to_json }
          end
        end
      end

      def finish
      end

      def abort
      end

      # Per-task summary reported back to Embulk after all pages are written.
      def commit
        task_report = {
          "total_records" => @recordnum,
          "success"       => @successnum,
          "skip_or_error" => (@recordnum - @successnum),
        }
        return task_report
      end
    end

  end
end
|
@@ -0,0 +1,167 @@
|
|
1
|
+
require 'rest-client'
require 'json'
require_relative 'constants'
require_relative 'header'
require_relative 'resource'

module AzureDocumentDB

  # Thin REST client for the Azure DocumentDB API. Covers database,
  # collection, and document creation plus parameterized SQL lookups.
  class Client

    def initialize(master_key, url_endpoint)
      @master_key = master_key
      @url_endpoint = url_endpoint
      @header = AzureDocumentDB::Header.new(@master_key)
    end

    # POSTs a new database and returns the parsed JSON response.
    def create_database(database_name)
      headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DATABASE, '',
                                 'Content-Type' => 'application/json')
      payload = { 'id' => database_name }.to_json
      response = RestClient.post("#{@url_endpoint}/dbs", payload, headers)
      JSON.parse(response)
    end

    # Queries databases whose id equals +database_name+.
    # Returns the raw query result hash ({:header, :body}).
    def find_databases_by_name(database_name)
      params = [{ :name => "@id", :value => database_name }]
      _query(AzureDocumentDB::RESOURCE_TYPE_DATABASE, '',
             sprintf("%s/dbs", @url_endpoint),
             "SELECT * FROM root r WHERE r.id=@id", params)
    end

    # Looks the database up and wraps its _rid in a DatabaseResource.
    # Returns nil when no such database exists.
    def get_database_resource(database_name)
      found = find_databases_by_name(database_name)
      if found[:body]["_count"].to_i == 0
        p "no #{database_name} database exists"
        return nil
      end
      resource = nil
      found[:body]['Databases'].select do |db|
        resource = AzureDocumentDB::DatabaseResource.new(db['_rid']) if db['id'] == database_name
      end
      resource
    end

    # Creates a collection under +database_resource+. Only the
    # 'indexingPolicy' and 'partitionKey' keys of +colls_options+ are honored.
    def create_collection(database_resource, collection_name, colls_options = {}, custom_headers = {})
      raise ArgumentError.new 'No database_resource!' unless database_resource
      url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid)
      custom_headers['Content-Type'] = 'application/json'
      headers = @header.generate('post',
                                 AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
                                 database_resource.database_rid, custom_headers)
      body = { 'id' => collection_name }
      colls_options.each do |key, value|
        body[key] = value if key == 'indexingPolicy' || key == 'partitionKey'
      end
      JSON.parse(RestClient.post(url, body.to_json, headers))
    end

    # Queries collections in the database whose id equals +collection_name+.
    def find_collections_by_name(database_resource, collection_name)
      raise ArgumentError.new 'No database_resource!' unless database_resource
      params = [{ :name => "@id", :value => collection_name }]
      _query(AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
             database_resource.database_rid,
             sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid),
             "SELECT * FROM root r WHERE r.id=@id", params)
    end

    # Resolves the collection's _rid into a CollectionResource (nil if absent).
    def get_collection_resource(database_resource, collection_name)
      raise ArgumentError.new 'No database_resource!' unless database_resource
      collection_rid = ''
      found = find_collections_by_name(database_resource, collection_name)
      found[:body]['DocumentCollections'].select do |col|
        collection_rid = col['_rid'] if col['id'] == collection_name
      end
      if collection_rid.empty?
        p "no #{collection_name} collection exists"
        return nil
      end
      AzureDocumentDB::CollectionResource.new(database_resource.database_rid, collection_rid)
    end

    # POSTs +document+ with primary key +document_id+ into the collection.
    # Raises when the document already carries a different 'id'.
    def create_document(collection_resource, document_id, document, custom_headers = {})
      raise ArgumentError.new 'No collection_resource!' unless collection_resource
      if document['id'] && document_id != document['id']
        raise ArgumentError.new "Document id mismatch error (#{document_id})!"
      end
      body = { 'id' => document_id }.merge document
      url = sprintf("%s/dbs/%s/colls/%s/docs",
                    @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
      custom_headers['Content-Type'] = 'application/json'
      headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
                                 collection_resource.collection_rid, custom_headers)
      JSON.parse(RestClient.post(url, body.to_json, headers))
    end

    # Queries documents in the collection whose id equals +document_id+.
    def find_documents(collection_resource, document_id, custom_headers = {})
      raise ArgumentError.new 'No collection_resource!' unless collection_resource
      params = [{ :name => "@id", :value => document_id }]
      _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
             collection_resource.collection_rid,
             sprintf("%s/dbs/%s/colls/%s/docs",
                     @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid),
             "SELECT * FROM c WHERE c.id=@id", params, custom_headers)
    end

    # Runs an arbitrary parameterized SQL query against the collection.
    def query_documents(collection_resource, query_text, query_params, custom_headers = {})
      raise ArgumentError.new 'No collection_resource!' unless collection_resource
      _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
             collection_resource.collection_rid,
             sprintf("%s/dbs/%s/colls/%s/docs",
                     @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid),
             query_text, query_params, custom_headers)
    end

    protected

    # Issues one parameterized query POST and returns
    # { :header => response headers, :body => parsed JSON body }.
    def _query(resource_type, parent_resource_id, url, query_text, query_params, custom_headers = {})
      query_headers = {
        'x-ms-documentdb-isquery' => 'True',
        'Content-Type' => 'application/query+json',
        'Accept' => 'application/json'
      }.merge(custom_headers)
      headers = @header.generate('post', resource_type, parent_resource_id, query_headers)
      payload = {
        :query => query_text,
        :parameters => query_params
      }.to_json

      response = RestClient.post(url, payload, headers)
      { :header => response.headers, :body => JSON.parse(response.body) }
    end
  end
end
|
@@ -0,0 +1,10 @@
|
|
1
|
+
# Protocol-level constants shared by the DocumentDB REST client classes.
module AzureDocumentDB
  # REST API version sent in the x-ms-version header.
  API_VERSION = '2015-12-16'.freeze

  # Resource-type segments used when signing requests.
  RESOURCE_TYPE_DATABASE   = 'dbs'.freeze
  RESOURCE_TYPE_COLLECTION = 'colls'.freeze
  RESOURCE_TYPE_DOCUMENT   = 'docs'.freeze

  # Auth token fields (master-key scheme).
  AUTH_TOKEN_VERSION       = '1.0'.freeze
  AUTH_TOKEN_TYPE_MASTER   = 'master'.freeze
  AUTH_TOKEN_TYPE_RESOURCE = 'resource'.freeze

  # Minimum offer throughput (RU/s) accepted for a partitioned collection.
  PARTITIONED_COLL_MIN_THROUGHPUT = 10100
end
|
@@ -0,0 +1,55 @@
|
|
1
|
+
require 'time'
require 'openssl'
require 'base64'
require 'erb'

module AzureDocumentDB

  # Builds the HTTP headers (authorization token, API version, request date)
  # required by every DocumentDB REST call, using the master-key auth scheme.
  class Header

    def initialize(master_key)
      @master_key = master_key
    end

    # Returns the standard DocumentDB headers merged with any
    # +api_specific_headers+ supplied by the caller.
    def generate(verb, resource_type, parent_resource_id, api_specific_headers = {})
      request_date = get_httpdate
      token = generate_auth_token(verb, resource_type, parent_resource_id, request_date)
      {
        'x-ms-version' => AzureDocumentDB::API_VERSION,
        'x-ms-date' => request_date,
        'authorization' => token
      }.merge(api_specific_headers)
    end

    private

    # Signs the canonical request payload with the master key and returns
    # the URL-encoded "type=...&ver=...&sig=..." authorization string.
    def generate_auth_token(verb, resource_type, resource_id, utc_date)
      payload = sprintf("%s\n%s\n%s\n%s\n%s\n",
                        verb,
                        resource_type,
                        resource_id,
                        utc_date,
                        "")
      signature = hmac_base64encode(payload)

      ERB::Util.url_encode sprintf("type=%s&ver=%s&sig=%s",
                                   AzureDocumentDB::AUTH_TOKEN_TYPE_MASTER,
                                   AzureDocumentDB::AUTH_TOKEN_VERSION,
                                   signature)
    end

    # RFC 1123 date for the x-ms-date header.
    def get_httpdate
      Time.now.httpdate
    end

    # SHA256-HMAC of the lowercased text with the base64-decoded master key,
    # base64-encoded and stripped of trailing whitespace.
    def hmac_base64encode(text)
      decoded_key = Base64.urlsafe_decode64 @master_key
      digest = OpenSSL::HMAC.digest('sha256', decoded_key, text.downcase)
      Base64.encode64(digest).strip
    end

  end
end
|
@@ -0,0 +1,62 @@
|
|
1
|
+
require 'rest-client'
require 'json'
require_relative 'constants'
require_relative 'header'
require_relative 'resource'

module AzureDocumentDB

  # Client specialization for partitioned collections: collection creation
  # carries a partition-key definition and a throughput offer, and document
  # operations carry the x-ms-documentdb-partitionkey header.
  class PartitionedCollectionClient < Client

    # Creates a partitioned collection with the given partition-key paths
    # and offer throughput (RU/s, at least PARTITIONED_COLL_MIN_THROUGHPUT).
    def create_collection(database_resource, collection_name,
            partition_key_paths, offer_throughput = AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT )

      if (offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
        # FIX: corrected the typo'd message ("Offeer thoughput need ...").
        raise ArgumentError.new sprintf("Offer throughput needs to be more than %d !",
                                        AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
      end
      if (partition_key_paths.length < 1 )
        raise ArgumentError.new "No PartitionKey paths!"
      end
      colls_options = {
        'indexingPolicy' => { 'indexingMode' => "consistent", 'automatic' => true },
        'partitionKey' => { "paths" => partition_key_paths, "kind" => "Hash" }
      }
      custom_headers = { 'x-ms-offer-throughput' => offer_throughput }
      super(database_resource, collection_name, colls_options, custom_headers)
    end

    # Stores +document+ keyed by +document_id+, routing it by the value of
    # +partitioned_key+ (which must be present in the document).
    def create_document(collection_resource, document_id, document, partitioned_key )
      # Robustness: the original called partitioned_key.empty? directly and
      # crashed with NoMethodError when the key was nil (its config default).
      if partitioned_key.nil? || partitioned_key.empty?
        raise ArgumentError.new "No partitioned key!"
      end
      if !document.key?(partitioned_key)
        raise ArgumentError.new "No partitioned key in your document!"
      end
      partitioned_key_value = document[partitioned_key]
      custom_headers = {
        'x-ms-documentdb-partitionkey' => "[\"#{partitioned_key_value}\"]"
      }
      super(collection_resource, document_id, document, custom_headers)
    end

    # Finds documents matching both the id and the partition-key value.
    def find_documents(collection_resource, document_id,
                       partitioned_key, partitioned_key_value, custom_headers={})
      if !collection_resource
        raise ArgumentError.new "No collection_resource!"
      end
      query_params = []
      query_text = sprintf("SELECT * FROM c WHERE c.id=@id AND c.%s=@value", partitioned_key)
      query_params.push( {:name=>"@id", :value=> document_id } )
      query_params.push( {:name=>"@value", :value=> partitioned_key_value } )
      url = sprintf("%s/dbs/%s/colls/%s/docs",
                    @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
      # BUG FIX: the original called `query(...)`, which is not defined anywhere
      # and raised NameError at runtime; the inherited helper is `_query`
      # (protected on Client, callable from this subclass).
      ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
                   collection_resource.collection_rid, url, query_text, query_params, custom_headers)
      ret
    end

  end
end
|
@@ -0,0 +1,40 @@
|
|
1
|
+
module AzureDocumentDB

  # Base class: stores resource identifiers (_rid values) in a hash that
  # subclasses expose through named accessors.
  class Resource
    def initialize
      @r = {}
    end
    protected
    attr_accessor :r
  end

  # Wraps the _rid of a DocumentDB database.
  class DatabaseResource < Resource

    def initialize(database_rid)
      super()
      @r['database_rid'] = database_rid
    end

    # _rid of the database.
    def database_rid
      @r['database_rid']
    end
  end

  # Wraps the _rid pair identifying a collection within a database.
  class CollectionResource < Resource

    def initialize(database_rid, collection_rid)
      super()
      @r['database_rid'] = database_rid
      @r['collection_rid'] = collection_rid
    end

    # _rid of the owning database.
    def database_rid
      @r['database_rid']
    end

    # _rid of the collection itself.
    def collection_rid
      @r['collection_rid']
    end
  end

end
|
@@ -0,0 +1,33 @@
|
|
1
|
+
in:
|
2
|
+
type: file
|
3
|
+
path_prefix: samples/sample_01.csv
|
4
|
+
parser:
|
5
|
+
charset: UTF-8
|
6
|
+
newline: CRLF
|
7
|
+
type: csv
|
8
|
+
delimiter: ','
|
9
|
+
quote: '"'
|
10
|
+
escape: '"'
|
11
|
+
null_string: 'NULL'
|
12
|
+
trim_if_not_quoted: false
|
13
|
+
skip_header_lines: 1
|
14
|
+
allow_extra_columns: false
|
15
|
+
allow_optional_columns: false
|
16
|
+
columns:
|
17
|
+
- {name: id, type: long}
|
18
|
+
- {name: account, type: long}
|
19
|
+
- {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
|
20
|
+
- {name: purchase, type: timestamp, format: '%Y%m%d'}
|
21
|
+
- {name: comment, type: string}
|
22
|
+
out:
|
23
|
+
type: documentdb
|
24
|
+
docdb_endpoint: https://yoichikademo1.documents.azure.com:443/
|
25
|
+
docdb_account_key: EMwUa3EzsAtJ1qYfzwo9nQ3xxxfsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
|
26
|
+
docdb_database: myembulkdb
|
27
|
+
docdb_collection: myembulkcoll
|
28
|
+
auto_create_database: true
|
29
|
+
auto_create_collection: true
|
30
|
+
partitioned_collection: true
|
31
|
+
partition_key: host
|
32
|
+
offer_throughput: 10100
|
33
|
+
key_column: id
|
@@ -0,0 +1,31 @@
|
|
1
|
+
in:
|
2
|
+
type: file
|
3
|
+
path_prefix: samples/sample_01.csv
|
4
|
+
parser:
|
5
|
+
charset: UTF-8
|
6
|
+
newline: CRLF
|
7
|
+
type: csv
|
8
|
+
delimiter: ','
|
9
|
+
quote: '"'
|
10
|
+
escape: '"'
|
11
|
+
null_string: 'NULL'
|
12
|
+
trim_if_not_quoted: false
|
13
|
+
skip_header_lines: 1
|
14
|
+
allow_extra_columns: false
|
15
|
+
allow_optional_columns: false
|
16
|
+
columns:
|
17
|
+
- {name: id, type: long}
|
18
|
+
- {name: account, type: long}
|
19
|
+
- {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
|
20
|
+
- {name: purchase, type: timestamp, format: '%Y%m%d'}
|
21
|
+
- {name: comment, type: string}
|
22
|
+
out:
|
23
|
+
type: documentdb
|
24
|
+
docdb_endpoint: https://yoichikademo1.documents.azure.com:443/
|
25
|
+
docdb_account_key: EMwUa3EzsAtJ1qYfzwo9nQ3xxxfsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
|
26
|
+
docdb_database: myembulkdb
|
27
|
+
docdb_collection: myembulkcoll
|
28
|
+
auto_create_database: true
|
29
|
+
auto_create_collection: true
|
30
|
+
partitioned_collection: false
|
31
|
+
key_column: id
|
@@ -0,0 +1,6 @@
|
|
1
|
+
id,account,time,purchase,comment
|
2
|
+
0,21123,2016-08-27 19:23:49,20160127,java
|
3
|
+
1,32864,2016-08-27 19:23:49,20160127,embulk
|
4
|
+
2,14824,2016-08-27 19:01:23,20160127,embulk jruby
|
5
|
+
3,27559,2016-08-28 02:20:02,20160128,"Embulk ""csv"" parser plugin"
|
6
|
+
4,11270,2016-08-29 11:54:36,20160129,NULL
|
metadata
ADDED
@@ -0,0 +1,117 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: embulk-output-documentdb
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.1.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Yoichi Kawasaki
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
date: 2016-08-28 00:00:00.000000000 Z
|
12
|
+
dependencies:
|
13
|
+
- !ruby/object:Gem::Dependency
|
14
|
+
name: rest-client
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
16
|
+
requirements:
|
17
|
+
- - ">="
|
18
|
+
- !ruby/object:Gem::Version
|
19
|
+
version: '0'
|
20
|
+
type: :runtime
|
21
|
+
prerelease: false
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
23
|
+
requirements:
|
24
|
+
- - ">="
|
25
|
+
- !ruby/object:Gem::Version
|
26
|
+
version: '0'
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: embulk
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - ">="
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: 0.8.13
|
34
|
+
type: :development
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - ">="
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: 0.8.13
|
41
|
+
- !ruby/object:Gem::Dependency
|
42
|
+
name: bundler
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
44
|
+
requirements:
|
45
|
+
- - ">="
|
46
|
+
- !ruby/object:Gem::Version
|
47
|
+
version: 1.10.6
|
48
|
+
type: :development
|
49
|
+
prerelease: false
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
51
|
+
requirements:
|
52
|
+
- - ">="
|
53
|
+
- !ruby/object:Gem::Version
|
54
|
+
version: 1.10.6
|
55
|
+
- !ruby/object:Gem::Dependency
|
56
|
+
name: rake
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
58
|
+
requirements:
|
59
|
+
- - ">="
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: '10.0'
|
62
|
+
type: :development
|
63
|
+
prerelease: false
|
64
|
+
version_requirements: !ruby/object:Gem::Requirement
|
65
|
+
requirements:
|
66
|
+
- - ">="
|
67
|
+
- !ruby/object:Gem::Version
|
68
|
+
version: '10.0'
|
69
|
+
description: Dumps records to Azure DocumentDB
|
70
|
+
email:
|
71
|
+
- yoichi.kawasaki@outlook.com
|
72
|
+
executables: []
|
73
|
+
extensions: []
|
74
|
+
extra_rdoc_files: []
|
75
|
+
files:
|
76
|
+
- ".gitignore"
|
77
|
+
- ChangeLog
|
78
|
+
- Gemfile
|
79
|
+
- LICENSE.txt
|
80
|
+
- README.md
|
81
|
+
- Rakefile
|
82
|
+
- VERSION
|
83
|
+
- embulk-output-documentdb.gemspec
|
84
|
+
- lib/embulk/output/documentdb.rb
|
85
|
+
- lib/embulk/output/documentdb/client.rb
|
86
|
+
- lib/embulk/output/documentdb/constants.rb
|
87
|
+
- lib/embulk/output/documentdb/header.rb
|
88
|
+
- lib/embulk/output/documentdb/partitioned_coll_client.rb
|
89
|
+
- lib/embulk/output/documentdb/resource.rb
|
90
|
+
- samples/config-csv2docdb_partitionedcoll.yml
|
91
|
+
- samples/config-csv2docdb_singlecoll.yml
|
92
|
+
- samples/sample_01.csv
|
93
|
+
homepage: https://github.com/yoichika/embulk-output-documentdb
|
94
|
+
licenses:
|
95
|
+
- MIT
|
96
|
+
metadata: {}
|
97
|
+
post_install_message:
|
98
|
+
rdoc_options: []
|
99
|
+
require_paths:
|
100
|
+
- lib
|
101
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
102
|
+
requirements:
|
103
|
+
- - ">="
|
104
|
+
- !ruby/object:Gem::Version
|
105
|
+
version: '0'
|
106
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
107
|
+
requirements:
|
108
|
+
- - ">="
|
109
|
+
- !ruby/object:Gem::Version
|
110
|
+
version: '0'
|
111
|
+
requirements: []
|
112
|
+
rubyforge_project:
|
113
|
+
rubygems_version: 2.6.2
|
114
|
+
signing_key:
|
115
|
+
specification_version: 4
|
116
|
+
summary: Azure DocumentDB output plugin for Embulk
|
117
|
+
test_files: []
|