
fluent-plugin-documentdb 0.1.2 → 0.2.0


Files changed (15)

checksums.yaml +4 -4
data/ChangeLog +5 -0
data/README.md +55 -5
data/VERSION +1 -1
data/conf/fluent-sample.conf +27 -0
data/fluent-plugin-documentdb.gemspec +3 -3
data/img/fluentd-azure-documentdb-collection.png +0 -0
data/lib/fluent/plugin/documentdb/client.rb +167 -0
data/lib/fluent/plugin/documentdb/constants.rb +10 -0
data/lib/fluent/plugin/documentdb/header.rb +55 -0
data/lib/fluent/plugin/documentdb/partitioned_coll_client.rb +62 -0
data/lib/fluent/plugin/documentdb/resource.rb +40 -0
data/lib/fluent/plugin/out_documentdb.rb +96 -69
data/test/plugin/test_documentdb.rb +8 -3
metadata +11 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ff6f03b3a3f55afffb9df25d70c21d151413a1e6
-  data.tar.gz: 4380b4c75f0eead0710388c6d39527634683adcb
+  metadata.gz: edf79934c835555db5a5718d55a4246307dd8f94
+  data.tar.gz: 323dc8a0a1b394e509fef89d7caba6895e4087f9
 SHA512:
-  metadata.gz: 529f06c3b973f340572a0e4fc35bf8ca4ca8a682a16f12e2e0a063f6d21427755e7c5068dbe0f56c7846bd30584f49be0efc95707ff14c564d753f2ad5e86c07
-  data.tar.gz: 73b1528c3957f2664414f3dfac50f37d5a005fa73360489e9fc9017f8db89fbdde10174d46a3e4e2187d1fd25ec0c4b91d5e6345ad02559ec189a16702e3496d
+  metadata.gz: ffc08ce92a70407e91124eb2a8e5178c257e5de657be392e734e7bb0e51e6caa7b0f95f3b0fb81da80284cc34beb316079b44a09eefe5db95f67588f96dd5db3
+  data.tar.gz: aacbf0fad60603916af0e56da8e0c7e41f6965a45dad7e31b99c851ec162ce8bd5c4fc95733c2dfcd4196ca4a1635d7127fba36ec50037a2557f55e3149be569

data/ChangeLog CHANGED

@@ -1,3 +1,8 @@
+Release 0.2.0 - 2016/08/17
+	* Support Partitioned Collection mode
+	* No longer depend on azure-documentdb-sdk; instead use a very small DocumentDB client library that is included in the plugin
 Release 0.1.2 - 2016/02/20
 	* Change gem package dependency option for azure-documentdb-sdk from add_development_dependency to add_dependency

data/README.md CHANGED

@@ -2,6 +2,10 @@
 fluent-plugin-documentdb is a fluent plugin to output to Azure DocumentDB
+![fluent-plugin-documentdb overview](https://github.com/yokawasa/fluent-plugin-documentdb/raw/master/img/fluentd-azure-documentdb-collection.png)
+[NEWS] As of fluent-plugin-documentdb 0.2.0, the plugin supports partitioned collections in addition to single-partition collections (see [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/#single-partition-and-partitioned-collections) for details on both collection types).
 ## Installation
     $ gem install fluent-plugin-documentdb
@@ -15,6 +19,7 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
  * Create a DocumentDB database account using [the Azure portal](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-account/), or [Azure Resource Manager templates and Azure CLI](https://azure.microsoft.com/en-us/documentation/articles/documentdb-automation-resource-manager-cli/)
  * [How to create a database for DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-database/)
  * [Create a DocumentDB collection](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-collection/)
+ * [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/)
 ### Fluentd - fluent.conf
@@ -27,6 +32,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
         docdb_collection mycollection
         auto_create_database true
         auto_create_collection true
+        partitioned_collection true
+        partition_key PARTITION_KEY
+        offer_throughput 10100
         time_format %s
         localtime false
         add_time_field true
@@ -41,6 +49,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
  * **docdb\_collection (required)** - DocumentDB collection name
  * **auto\_create\_database (optional)** - Default:true. By default, DocumentDB database named **docdb\_database** will be automatically created if it does not exist
  * **auto\_create\_collection (optional)** - Default:true. By default, DocumentDB collection named **docdb\_collection** will be automatically created if it does not exist
+ * **partitioned\_collection (optional)** - Default:false. Set to true to create and/or store records in a partitioned collection. Set to false for a single-partition collection
+ * **partition\_key (optional)** - Default:nil. A partition key must be specified for a partitioned collection (that is, when partitioned\_collection is set to true)
+ * **offer\_throughput (optional)** - Default:10100. Throughput for the collection expressed in units of 100 request units per second. This only takes effect when a new partitioned collection is created (i.e. both auto\_create\_collection and partitioned\_collection are set to true)
  * **localtime (optional)** - Default:false. By default, the time field is written in UTC (Coordinated Universal Time). Set localtime to true to use local time instead
  * **time\_format (optional)** -  Default:%s. Time format for the time field to be inserted. The default format is %s, which is Unix epoch time. For a more human-readable value, set this to %Y%m%d-%H:%M:%S, for example.
  * **add\_time\_field (optional)** - Default:true. Whether to insert a time field into each record
@@ -49,9 +60,11 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
  * **tag\_field\_name (optional)** - Default:tag. Tag field name to be inserted
-## Expected Records
+## Configuration examples
+fluent-plugin-documentdb automatically adds an **id** attribute in UUID format to each record, alongside the record's other attributes. In addition, it adds **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true, respectively. Two example configurations follow, one for a single-partition collection and one for a partitioned collection. In both, Fluentd reads an Apache access log as its source.
-fluent-plugin-documentdb will add **id** attribute which is UUID format and any other attributes of record automatically. In addition, it will add **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true respectively. For example if you read apache's access log via fluentd, structure of the record to inserted into documentdb will have been like this.
+### (1) Single-Partition Collection Case
 <u>fluent.conf</u>
@@ -68,7 +81,42 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
         docdb_endpoint https://yoichikademo.documents.azure.com:443/
         docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
         docdb_database mydb
-        docdb_collection mycollection
+        docdb_collection my-single-partition-collection
+        auto_create_database true
+        auto_create_collection true
+        partitioned_collection false
+        localtime true
+        time_format %Y%m%d-%H:%M:%S
+        add_time_field true
+        time_field_name time
+        add_tag_field true
+        tag_field_name tag
+    </match>
+### (2) Partitioned Collection Case
+<u>fluent.conf</u>
+    <source>
+        @type tail                          # input plugin
+        path /var/log/apache2/access.log   # monitoring file
+        pos_file /tmp/fluentd_pos_file     # position file
+        format apache                      # format
+        tag documentdb.access              # tag
+    </source>
+    <match documentdb.*>
+        @type documentdb
+        docdb_endpoint https://yoichikademo.documents.azure.com:443/
+        docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
+        docdb_database mydb
+        docdb_collection my-partitioned-collection
+        auto_create_database true
+        auto_create_collection true
+        partitioned_collection true
+        partition_key host
+        offer_throughput 10100
         localtime true
         time_format %Y%m%d-%H:%M:%S
         add_time_field true
@@ -77,6 +125,9 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
         tag_field_name tag
     </match>
+## Sample inputs and expected records
 An expected output record for sample input will be like this:
 <u>Sample Input (apache access log)</u>
@@ -122,8 +173,7 @@ An expected output record for sample input will be like this:
     $ ab -n 5 -c 2 http://localhost/foo/bar/test.html
 ## TODOs
- * Support documentdb sharding. See [How to partition data in DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-sharding/)
- * Support resource tokens access. See [Access Control on DocumentDB Resources](https://msdn.microsoft.com/en-us/library/azure/dn783368.aspx)
+ * Support automatic data expiration with TTL (Time-to-Live). See [Expire data in DocumentDB collections automatically with time to live](https://azure.microsoft.com/en-us/documentation/articles/documentdb-time-to-live/)
 ## Contributing
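
The partitioned-collection options added to the README interact: `partition_key` is only needed when `partitioned_collection` is true, and `offer_throughput` only matters when the collection is also auto-created. The rules can be sketched roughly as follows. This is an illustration only; `validate_partition_settings` and `MIN_THROUGHPUT` are hypothetical names, not part of the plugin, and the 10100 minimum is taken from the constants file in this release.

```ruby
# Hypothetical helper, not part of the plugin: mirrors the documented rules
# for the partitioned-collection options.
MIN_THROUGHPUT = 10100 # same value as PARTITIONED_COLL_MIN_THROUGHPUT in 0.2.0

def validate_partition_settings(partitioned:, partition_key: nil,
                                auto_create: true, offer_throughput: MIN_THROUGHPUT)
  # Single-partition mode ignores partition_key and offer_throughput.
  return :single_partition unless partitioned

  if partition_key.nil? || partition_key.empty?
    raise ArgumentError, 'partition_key must be set in partitioned collection mode'
  end
  # Throughput is only checked when the plugin may create the collection.
  if auto_create && offer_throughput < MIN_THROUGHPUT
    raise ArgumentError, "offer_throughput must be at least #{MIN_THROUGHPUT}"
  end
  :partitioned
end

puts validate_partition_settings(partitioned: true, partition_key: 'host')
```

With `partitioned: false` the helper returns without looking at the other options, which matches the README's statement that they apply only to partitioned collections.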

data/VERSION CHANGED

@@ -1 +1 @@
-0.1.2
+0.2.0

data/conf/fluent-sample.conf ADDED

@@ -0,0 +1,27 @@
+<source>
+    @type tail                         # input plugin
+    path /var/log/apache2/access.log   # monitoring file
+    pos_file /tmp/fluentd_pos_file     # position file
+    format apache                      # format
+    tag documentdb.access              # tag
+</source>
+<match documentdb.*>
+    @type documentdb
+    docdb_endpoint https://yoichikademo1.documents.azure.com:443/
+    docdb_account_key  EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
+    docdb_database mydb
+    docdb_collection mycollection
+    auto_create_database true
+    auto_create_collection true
+    partitioned_collection true
+    partition_key host
+    offer_throughput 10100
+    localtime true
+    time_format %Y%m%d-%H:%M:%S
+    add_time_field true
+    time_field_name time
+    add_tag_field true
+    tag_field_name tag
+</match>

data/fluent-plugin-documentdb.gemspec CHANGED

@@ -14,13 +14,13 @@ Gem::Specification.new do |gem|
   gem.has_rdoc       = false
   gem.files         = `git ls-files`.split("\n")
-  #gem.executables   = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
-  gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+  gem.executables   = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  #gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
   gem.test_files    = gem.files.grep(%r{^(test|gem|features)/})
   gem.require_paths = ["lib"]
   gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
-  gem.add_dependency "azure-documentdb-sdk"
+  gem.add_dependency "rest-client"
   gem.add_development_dependency "bundler", "~> 1.11"
   gem.add_development_dependency "rake", "~> 10.0"
   gem.add_development_dependency "test-unit"

data/img/fluentd-azure-documentdb-collection.png ADDED

Binary file

data/lib/fluent/plugin/documentdb/client.rb ADDED

@@ -0,0 +1,167 @@
+require 'rest-client'
+require 'json'
+require_relative 'constants'
+require_relative 'header'
+require_relative 'resource'
+module AzureDocumentDB
+  class Client
+    def initialize (master_key, url_endpoint)
+      @master_key = master_key
+      @url_endpoint = url_endpoint
+      @header = AzureDocumentDB::Header.new(@master_key)
+    end
+    def create_database (database_name)
+      url = "#{@url_endpoint}/dbs"
+      custom_headers = {'Content-Type' => 'application/json'}
+      headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', custom_headers )
+      body_json = { 'id' => database_name }.to_json
+      res = RestClient.post( url, body_json, headers)
+      JSON.parse(res)
+    end
+    def find_databases_by_name (database_name)
+      query_params = []
+      query_text = "SELECT * FROM root r WHERE r.id=@id"
+      query_params.push( {:name=>"@id", :value=> database_name } )
+      url = sprintf("%s/dbs", @url_endpoint )
+      res = _query(AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', url, query_text, query_params)
+      res
+    end
+    def get_database_resource (database_name)
+      resource  = nil
+      res = find_databases_by_name (database_name)
+      if( res[:body]["_count"].to_i == 0 )
+        p "no #{database_name} database exists"
+        return resource
+      end
+      res[:body]['Databases'].select do |db|
+        if (db['id'] == database_name )
+          resource = AzureDocumentDB::DatabaseResource.new(db['_rid'])
+        end
+      end
+      resource
+    end
+    def create_collection(database_resource, collection_name, colls_options={}, custom_headers={} )
+      if !database_resource
+        raise ArgumentError.new 'No database_resource!'
+      end
+      url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid )
+      custom_headers['Content-Type'] = 'application/json'
+      headers = @header.generate('post',
+            AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
+            database_resource.database_rid, custom_headers )
+      body = {'id' => collection_name }
+      colls_options.each{|k, v|
+        if k == 'indexingPolicy' || k == 'partitionKey'
+          body[k] = v
+        end
+      }
+      res = RestClient.post( url, body.to_json, headers)
+      JSON.parse(res)
+    end
+    def find_collections_by_name(database_resource, collection_name)
+      if !database_resource
+        raise ArgumentError.new 'No database_resource!'
+      end
+      ret = {}
+      query_params = []
+      query_text = "SELECT * FROM root r WHERE r.id=@id"
+      query_params.push( {:name=>"@id", :value=> collection_name } )
+      url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid)
+      ret = _query(AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
+                database_resource.database_rid, url, query_text, query_params)
+      ret
+    end
+    def get_collection_resource (database_resource, collection_name)
+      _collection_rid = ''
+      if !database_resource
+        raise ArgumentError.new 'No database_resource!'
+      end
+      res = find_collections_by_name(database_resource, collection_name)
+      res[:body]['DocumentCollections'].select do |col|
+        if (col['id'] == collection_name )
+          _collection_rid = col['_rid']
+        end
+      end
+      if _collection_rid.empty?
+        p "no #{collection_name} collection exists"
+        return nil
+      end
+      AzureDocumentDB::CollectionResource.new(database_resource.database_rid, _collection_rid)
+    end
+    def create_document(collection_resource, document_id, document, custom_headers={} )
+      if !collection_resource
+        raise ArgumentError.new 'No collection_resource!'
+      end
+      if document['id'] && document_id != document['id']
+        raise ArgumentError.new "Document id mismatch error (#{document_id})!"
+      end
+      body = { 'id' => document_id }.merge document
+      url = sprintf("%s/dbs/%s/colls/%s/docs",
+                  @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
+      custom_headers['Content-Type'] = 'application/json'
+      headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
+                                  collection_resource.collection_rid, custom_headers )
+      res = RestClient.post( url, body.to_json, headers)
+      JSON.parse(res)
+    end
+    def find_documents(collection_resource, document_id, custom_headers={})
+      if !collection_resource
+        raise ArgumentError.new 'No collection_resource!'
+      end
+      ret = {}
+      query_params = []
+      query_text = "SELECT * FROM c WHERE c.id=@id"
+      query_params.push( {:name=>"@id", :value=> document_id } )
+      url = sprintf("%s/dbs/%s/colls/%s/docs",
+              @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
+      ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
+              collection_resource.collection_rid, url, query_text, query_params, custom_headers)
+      ret
+    end
+    def query_documents( collection_resource, query_text, query_params, custom_headers={} )
+      if !collection_resource
+        raise ArgumentError.new 'No collection_resource!'
+      end
+      ret = {}
+      url = sprintf("%s/dbs/%s/colls/%s/docs",
+              @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
+      ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
+              collection_resource.collection_rid, url, query_text, query_params, custom_headers)
+      ret
+    end
+    protected
+    def _query( resource_type, parent_resource_id, url, query_text, query_params, custom_headers={} )
+      query_specific_header = {
+              'x-ms-documentdb-isquery' => 'True',
+              'Content-Type' => 'application/query+json',
+              'Accept' => 'application/json'
+             }
+      query_specific_header.merge! custom_headers
+      headers = @header.generate('post', resource_type, parent_resource_id, query_specific_header)
+      body_json = {
+              :query => query_text,
+              :parameters => query_params
+             }.to_json
+      res = RestClient.post( url, body_json, headers)
+      result = {
+          :header => res.headers,
+          :body => JSON.parse(res.body) }
+      return result
+    end
+  end
+end
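
Every lookup in client.rb goes through `_query`, which posts a parameterized SQL query as JSON. As a rough sketch of the body it sends (the helper name `build_query_body` is hypothetical and used only for illustration; no HTTP request is made here):

```ruby
require 'json'

# Build the JSON body for a parameterized DocumentDB SQL query, in the same
# shape client.rb posts: {"query": ..., "parameters": [{"name": ..., "value": ...}]}.
# build_query_body is a hypothetical helper name, not part of the plugin.
def build_query_body(query_text, params)
  parameters = params.map { |name, value| { name: name, value: value } }
  { query: query_text, parameters: parameters }.to_json
end

body = build_query_body('SELECT * FROM root r WHERE r.id=@id', '@id' => 'mydb')
puts body
```

Passing the name as a `@id` parameter rather than interpolating it into the query text keeps user-supplied database and collection names out of the SQL string.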

data/lib/fluent/plugin/documentdb/constants.rb ADDED

@@ -0,0 +1,10 @@
+module AzureDocumentDB
+  API_VERSION = '2015-12-16'.freeze
+  RESOURCE_TYPE_DATABASE='dbs'.freeze
+  RESOURCE_TYPE_COLLECTION='colls'.freeze
+  RESOURCE_TYPE_DOCUMENT='docs'.freeze
+  AUTH_TOKEN_VERSION = '1.0'.freeze
+  AUTH_TOKEN_TYPE_MASTER = 'master'.freeze
+  AUTH_TOKEN_TYPE_RESOURCE = 'resource'.freeze
+  PARTITIONED_COLL_MIN_THROUGHPUT = 10100.freeze
+end

data/lib/fluent/plugin/documentdb/header.rb ADDED

@@ -0,0 +1,55 @@
+require 'time'
+require 'openssl'
+require 'base64'
+require 'erb'
+module AzureDocumentDB
+  class Header
+    def initialize (master_key)
+      @master_key = master_key
+    end
+    def generate (verb, resource_type, parent_resource_id, api_specific_headers = {} )
+      headers = {}
+      utc_date = get_httpdate()
+      auth_token = generate_auth_token(verb, resource_type, parent_resource_id, utc_date )
+      default_headers = {
+            'x-ms-version' => AzureDocumentDB::API_VERSION,
+            'x-ms-date' => utc_date,
+            'authorization' => auth_token
+            }.freeze
+      headers.merge!(default_headers)
+      headers.merge(api_specific_headers)
+    end
+    private
+    def generate_auth_token ( verb, resource_type, resource_id, utc_date)
+      payload = sprintf("%s\n%s\n%s\n%s\n%s\n",
+                verb,
+                resource_type,
+                resource_id,
+                utc_date,
+                "" )
+      sig = hmac_base64encode(payload)
+      ERB::Util.url_encode sprintf("type=%s&ver=%s&sig=%s",
+                AzureDocumentDB::AUTH_TOKEN_TYPE_MASTER,
+                AzureDocumentDB::AUTH_TOKEN_VERSION,
+                sig )
+    end
+    def get_httpdate
+      Time.now.httpdate
+    end
+    def hmac_base64encode( text )
+      key = Base64.urlsafe_decode64 @master_key
+      hmac = OpenSSL::HMAC.digest('sha256', key, text.downcase)
+      Base64.encode64(hmac).strip
+    end
+  end
+end
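
The token built by header.rb is an HMAC-SHA256 signature over the lowercased verb, resource type, resource id, and date, wrapped in a URL-encoded `type=...&ver=...&sig=...` string. A standalone sketch of the same steps follows. It is an approximation, not the plugin's code: `master_auth_token` is a hypothetical name, it uses `Base64.decode64` and `Base64.strict_encode64` where the plugin uses `urlsafe_decode64` and `encode64(...).strip`, and the key is a dummy value rather than a real credential.

```ruby
require 'openssl'
require 'base64'
require 'erb'
require 'time'

# Illustrative stand-in for Header#generate_auth_token (hypothetical name).
# master_key_b64 is the Base64-encoded account key.
def master_auth_token(master_key_b64, verb, resource_type, resource_id, http_date)
  # Payload layout: verb, resource type, resource id, date, then an empty line.
  payload = "#{verb}\n#{resource_type}\n#{resource_id}\n#{http_date}\n\n"
  key = Base64.decode64(master_key_b64)
  # The whole payload is lowercased before signing, as in header.rb.
  digest = OpenSSL::HMAC.digest('sha256', key, payload.downcase)
  signature = Base64.strict_encode64(digest)
  ERB::Util.url_encode("type=master&ver=1.0&sig=#{signature}")
end

dummy_key = Base64.strict_encode64('not-a-real-key') # placeholder, not a credential
token = master_auth_token(dummy_key, 'post', 'docs', 'abc123', Time.now.httpdate)
puts token
```

Because the date is part of the signed payload, the same value has to be sent in the `x-ms-date` header, which is why `generate` computes it once and uses it for both.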

data/lib/fluent/plugin/documentdb/partitioned_coll_client.rb ADDED

@@ -0,0 +1,62 @@
+require 'rest-client'
+require 'json'
+require_relative 'constants'
+require_relative 'header'
+require_relative 'resource'
+module AzureDocumentDB
+  class PartitionedCollectionClient < Client
+    def create_collection(database_resource, collection_name,
+          partition_key_paths, offer_throughput = AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT )
+      if (offer_throughput <  AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
+        raise ArgumentError.new sprintf("Offer throughput must be at least %d!",
+                          AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
+      end
+      if (partition_key_paths.length < 1 )
+        raise ArgumentError.new "No PartitionKey paths!"
+      end
+      colls_options = {
+            'indexingPolicy' => { 'indexingMode' => "consistent", 'automatic'=>true },
+            'partitionKey' => { "paths" => partition_key_paths, "kind" => "Hash" }
+      }
+      custom_headers= {'x-ms-offer-throughput' => offer_throughput }
+      super(database_resource, collection_name, colls_options, custom_headers)
+    end
+    def create_document(collection_resource, document_id, document, partitioned_key )
+      if partitioned_key.empty?
+        raise ArgumentError.new "No partitioned key!"
+      end
+      if !document.key?(partitioned_key)
+        raise ArgumentError.new "No partitioned key in your document!"
+      end
+      partitioned_key_value = document[partitioned_key]
+      custom_headers = {
+          'x-ms-documentdb-partitionkey' => "[\"#{partitioned_key_value}\"]"
+        }
+      super(collection_resource, document_id, document, custom_headers)
+    end
+    def find_documents(collection_resource, document_id,
+                partitioned_key, partitioned_key_value, custom_headers={})
+      if !collection_resource
+        raise ArgumentError.new "No collection_resource!"
+      end
+      ret = {}
+      query_params = []
+      query_text = sprintf("SELECT * FROM c WHERE c.id=@id AND c.%s=@value", partitioned_key)
+      query_params.push( {:name=>"@id", :value=> document_id } )
+      query_params.push( {:name=>"@value", :value=> partitioned_key_value } )
+      url = sprintf("%s/dbs/%s/colls/%s/docs",
+              @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
+      ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
+              collection_resource.collection_rid, url, query_text, query_params, custom_headers)
+      ret
+    end
+  end
+end
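
When writing to a partitioned collection, the client sends the document's partition key value in the `x-ms-documentdb-partitionkey` header as a one-element JSON array. The plugin builds that string by interpolation; the sketch below uses `to_json` instead, which is a substitution on my part so that values containing quotes are escaped. `partition_key_header` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: build the x-ms-documentdb-partitionkey header for a
# document, given the name of the field used as the partition key.
def partition_key_header(document, partition_key)
  unless document.key?(partition_key)
    raise ArgumentError, 'No partition key in your document!'
  end
  # The header value is a JSON array holding the single key value.
  { 'x-ms-documentdb-partitionkey' => [document[partition_key]].to_json }
end

header = partition_key_header({ 'host' => 'web01', 'path' => '/' }, 'host')
puts header
```

A record that lacks the configured field raises instead of being written, which matches the check in `create_document` above.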

data/lib/fluent/plugin/documentdb/resource.rb ADDED

@@ -0,0 +1,40 @@
+module AzureDocumentDB
+  class Resource
+    def initialize
+      @r = {}
+    end
+    protected
+      attr_accessor :r
+  end
+  class DatabaseResource < Resource
+    def initialize (database_rid)
+      super()
+      @r['database_rid'] = database_rid
+    end
+    def database_rid
+      @r['database_rid']
+    end
+  end
+  class CollectionResource < Resource
+    def initialize (database_rid, collection_rid)
+      super()
+      @r['database_rid'] = database_rid
+      @r['collection_rid'] = collection_rid
+    end
+    def database_rid
+      @r['database_rid']
+    end
+    def collection_rid
+      @r['collection_rid']
+    end
+  end
+end

data/lib/fluent/plugin/out_documentdb.rb CHANGED

@@ -1,15 +1,21 @@
 # -*- coding: utf-8 -*-
 module Fluent
+  require 'fluent/plugin/documentdb/constants'
   class DocumentdbOutput < BufferedOutput
     Plugin.register_output('documentdb', self)
     def initialize
-        super
-        require 'documentdb'
-        require 'msgpack'
-        require 'time'
-        require 'securerandom'
+      super
+      require 'msgpack'
+      require 'time'
+      require 'securerandom'
+      require 'fluent/plugin/documentdb/client'
+      require 'fluent/plugin/documentdb/partitioned_coll_client'
+      require 'fluent/plugin/documentdb/header'
+      require 'fluent/plugin/documentdb/resource'
     end
     config_param :docdb_endpoint, :string
@@ -18,6 +24,9 @@ module Fluent
     config_param :docdb_collection, :string
     config_param :auto_create_database, :bool, :default => true
     config_param :auto_create_collection, :bool, :default => true
+    config_param :partitioned_collection, :bool, :default => false
+    config_param :partition_key, :string, :default => nil
+    config_param :offer_throughput, :integer, :default => AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT
     config_param :time_format, :string, :default => nil
     config_param :localtime, :bool, default: false
     config_param :add_time_field, :bool, :default => true
@@ -26,88 +35,106 @@ module Fluent
     config_param :tag_field_name, :string, :default => 'tag'
     def configure(conf)
-        super
-        raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
-        raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
-        raise ConfigError, 'no docdb_database' if @docdb_database.empty?
-        raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
-        if @add_time_field and @time_field_name.empty?
-            raise ConfigError, 'time_field_name is needed if add_time_field is true'
-        end
-        if @add_tag_field and @tag_field_name.empty?
-            raise ConfigError, 'tag_field_name is needed if add_tag_field is true'
+      super
+      raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
+      raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
+      raise ConfigError, 'no docdb_database' if @docdb_database.empty?
+      raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
+      if @add_time_field and @time_field_name.empty?
+        raise ConfigError, 'time_field_name must be set if add_time_field is true'
+      end
+      if @add_tag_field and @tag_field_name.empty?
+        raise ConfigError, 'tag_field_name must be set if add_tag_field is true'
+      end
+      if @partitioned_collection
+        raise ConfigError, 'partition_key must be set in partitioned collection mode' if @partition_key.nil? || @partition_key.empty?
+        if (@auto_create_collection &&
+              @offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
+          raise ConfigError, sprintf("offer_throughput must be greater than or equal to %s",
+                                 AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
         end
-        @timef = TimeFormatter.new(@time_format, @localtime)
+      end
+      @timef = TimeFormatter.new(@time_format, @localtime)
     end
     def start
-        super
+      super
-        begin
-            context = Azure::DocumentDB::Context.new @docdb_endpoint, @docdb_account_key
+      begin
-            ## initial operations for database
-            database = Azure::DocumentDB::Database.new context, RestClient
-            qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
-            qreq.parameters.add "@id", @docdb_database
-            query = database.query
-            qres = query.execute qreq
-            if( qres[:body]["_count"].to_i == 0 )
-                raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it by useself" if !@auto_create_database
-                # create new database as it doesn't exists
-                database.create @docdb_database
-            end
+        @client = nil
+        if @partitioned_collection
+          @client = AzureDocumentDB::PartitionedCollectionClient.new(@docdb_account_key,@docdb_endpoint)
+        else
+          @client = AzureDocumentDB::Client.new(@docdb_account_key,@docdb_endpoint)
+        end
-            ## initial operations for collection
-            collection = database.collection_for_name @docdb_database
-            qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
-            qreq.parameters.add "@id", @docdb_collection
-            query = collection.query
-            qres = query.execute qreq
-            if( qres[:body]["_count"].to_i == 0 )
-                raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it by useself" if !@auto_create_collection
-                # create new collection as it doesn't exists
-                collection.create @docdb_collection
-            end
-            @docdb = collection.document_for_name @docdb_collection
+        ## initial operations for database
+        res = @client.find_databases_by_name(@docdb_database)
+        if( res[:body]["_count"].to_i == 0 )
+          raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it yourself" if !@auto_create_database
+          # create new database as it doesn't exist
+          @client.create_database(@docdb_database)
+        end
-        rescue Exception =>ex
-            $log.fatal "Error: '#{ex}'"
-            exit!
+        ## initial operations for collection
+        database_resource = @client.get_database_resource(@docdb_database)
+        res = @client.find_collections_by_name(database_resource, @docdb_collection)
+        if( res[:body]["_count"].to_i == 0 )
+          raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it yourself" if !@auto_create_collection
+          # create new collection as it doesn't exist
+          if @partitioned_collection
+            partition_key_paths = ["/#{@partition_key}"]
+            @client.create_collection(database_resource,
+                        @docdb_collection, partition_key_paths, @offer_throughput)
+          else
+            @client.create_collection(database_resource, @docdb_collection)
+          end
         end
+        @coll_resource = @client.get_collection_resource(database_resource, @docdb_collection)
+      rescue Exception =>ex
+        $log.fatal "Error: '#{ex}'"
+        exit!
+      end
     end
     def shutdown
-        super
-        # destroy
+      super
+      # destroy
     end
     def format(tag, time, record)
-        record['id'] =  SecureRandom.uuid
-        if @add_time_field
-            record[@time_field_name] = @timef.format(time)
-        end
-        if @add_tag_field
-            record[@tag_field_name] = tag
-        end
-        record.to_msgpack
+      record['id'] =  SecureRandom.uuid
+      if @add_time_field
+        record[@time_field_name] = @timef.format(time)
+      end
+      if @add_tag_field
+        record[@tag_field_name] = tag
+      end
+      record.to_msgpack
     end
     def write(chunk)
-        records = []
-        chunk.msgpack_each { |record|
-            unique_doc_identifier = record["id"]
-            docdata = record.to_json
-            begin
-                @docdb.create unique_doc_identifier, docdata
-            rescue Exception => ex
-                $log.fatal "UnknownError: '#{ex}'"
-                            + ", uniqueid=>#{unique_doc_identifier}, data=>"
-                            + docdata.to_s
-            end
-        }
+      chunk.msgpack_each { |record|
+        unique_doc_identifier = record["id"]
+        begin
+          if @partitioned_collection
+            @client.create_document(@coll_resource, unique_doc_identifier, record, @partition_key)
+          else
+            @client.create_document(@coll_resource, unique_doc_identifier, record)
+          end
+        rescue RestClient::ExceptionWithResponse => rcex
+          exdict = JSON.parse(rcex.response)
+          if exdict['code'] == 'Conflict'
+            $log.fatal "Duplicate Error: document #{unique_doc_identifier} already exists, data=>" + record.to_json
+          else
+            $log.fatal "RestClient Error: '#{rcex.response}', data=>" + record.to_json
+          end
+        rescue => ex
+          $log.fatal "UnknownError: '#{ex}', uniqueid=>#{unique_doc_identifier}, data=>" + record.to_json
+        end
+      }
     end
   end
 end
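
The `format` method above decorates each record with an `id`, and optionally a time and a tag field, before buffering it. A simplified sketch of that step in plain Ruby follows. It is not the plugin's code: `decorate_record` is a hypothetical name, it always adds both fields, it returns a new hash rather than mutating the record, and it uses `Time#strftime` in place of Fluentd's `TimeFormatter`.

```ruby
require 'securerandom'
require 'time'

# Hypothetical stand-in for the plugin's format step: return a copy of the
# record with id, time, and tag fields added.
def decorate_record(record, tag, time, time_format: '%s', localtime: false,
                    time_field: 'time', tag_field: 'tag')
  t = Time.at(time)
  t = t.utc unless localtime # UTC unless local time is requested
  record.merge(
    'id' => SecureRandom.uuid,          # unique document id
    time_field => t.strftime(time_format),
    tag_field => tag
  )
end

rec = decorate_record({ 'host' => '::1' }, 'documentdb.access', 0,
                      time_format: '%Y%m%d-%H:%M:%S')
puts rec
```

Generating a fresh UUID per record means a retried chunk produces new ids, so the Conflict branch in `write` should rarely trigger for documents created this way.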

data/test/plugin/test_documentdb.rb CHANGED

@@ -6,12 +6,17 @@ class DocumentdbOutputTest < Test::Unit::TestCase
   end
   CONFIG = %[
-    docdb_endpoint DOCUMENTDB_ACCOUNT_ENDPOINT
-    docdb_account_key DOCUMENTDB_ACCOUNT_KEY
+    docdb_endpoint https://yoichikademo1.documents.azure.com:443
+    docdb_account_key EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
     docdb_database mydb
     docdb_collection mycollection
-    localtime true
+    auto_create_database true
+    auto_create_collection true
+    partitioned_collection true
+    partition_key host
+    offer_throughput 10100
     time_format %Y%m%d-%H:%M:%S
+    localtime false
     add_time_field true
     time_field_name time
     add_tag_field true

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-documentdb
 version: !ruby/object:Gem::Version
-  version: 0.1.2
+  version: 0.2.0
 platform: ruby
 authors:
 - Yoichi Kawasaki
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-02-20 00:00:00.000000000 Z
+date: 2016-08-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: fluentd
@@ -31,7 +31,7 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '2'
 - !ruby/object:Gem::Dependency
-  name: azure-documentdb-sdk
+  name: rest-client
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -100,7 +100,14 @@ files:
 - README.md
 - Rakefile
 - VERSION
+- conf/fluent-sample.conf
 - fluent-plugin-documentdb.gemspec
+- img/fluentd-azure-documentdb-collection.png
+- lib/fluent/plugin/documentdb/client.rb
+- lib/fluent/plugin/documentdb/constants.rb
+- lib/fluent/plugin/documentdb/header.rb
+- lib/fluent/plugin/documentdb/partitioned_coll_client.rb
+- lib/fluent/plugin/documentdb/resource.rb
 - lib/fluent/plugin/out_documentdb.rb
 - test/helper.rb
 - test/plugin/test_documentdb.rb
@@ -124,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.5.1
+rubygems_version: 2.6.2
 signing_key:
 specification_version: 4
 summary: Azure DocumentDB output plugin for Fluentd