fluent-plugin-documentdb 0.1.2 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ff6f03b3a3f55afffb9df25d70c21d151413a1e6
4
- data.tar.gz: 4380b4c75f0eead0710388c6d39527634683adcb
3
+ metadata.gz: edf79934c835555db5a5718d55a4246307dd8f94
4
+ data.tar.gz: 323dc8a0a1b394e509fef89d7caba6895e4087f9
5
5
  SHA512:
6
- metadata.gz: 529f06c3b973f340572a0e4fc35bf8ca4ca8a682a16f12e2e0a063f6d21427755e7c5068dbe0f56c7846bd30584f49be0efc95707ff14c564d753f2ad5e86c07
7
- data.tar.gz: 73b1528c3957f2664414f3dfac50f37d5a005fa73360489e9fc9017f8db89fbdde10174d46a3e4e2187d1fd25ec0c4b91d5e6345ad02559ec189a16702e3496d
6
+ metadata.gz: ffc08ce92a70407e91124eb2a8e5178c257e5de657be392e734e7bb0e51e6caa7b0f95f3b0fb81da80284cc34beb316079b44a09eefe5db95f67588f96dd5db3
7
+ data.tar.gz: aacbf0fad60603916af0e56da8e0c7e41f6965a45dad7e31b99c851ec162ce8bd5c4fc95733c2dfcd4196ca4a1635d7127fba36ec50037a2557f55e3149be569
data/ChangeLog CHANGED
@@ -1,3 +1,8 @@
1
+ Release 0.2.0 - 2016/08/17
2
+
3
+ * Support Partitioned Collection mode
4
+ * No longer depends on azure-documentdb-sdk; instead uses a tiny DocumentDB client library included in the plugin
5
+
1
6
  Release 0.1.2 - 2016/02/20
2
7
 
3
8
  * Change gem package dependency option for azure-documentdb-sdk from add_development_dependency to add_dependency
data/README.md CHANGED
@@ -2,6 +2,10 @@
2
2
 
3
3
  fluent-plugin-documentdb is a fluent plugin to output to Azure DocumentDB
4
4
 
5
+ ![fluent-plugin-documentdb overview](https://github.com/yokawasa/fluent-plugin-documentdb/raw/master/img/fluentd-azure-documentdb-collection.png)
6
+
7
+ [NEWS] As of fluent-plugin-documentdb 0.2.0, the plugin supports partitioned collections in addition to single-partition collections (see [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/#single-partition-and-partitioned-collections) for details on both collection types).
8
+
5
9
  ## Installation
6
10
 
7
11
  $ gem install fluent-plugin-documentdb
@@ -15,6 +19,7 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
15
19
  * Create a DocumentDB database account using [the Azure portal](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-account/), or [Azure Resource Manager templates and Azure CLI](https://azure.microsoft.com/en-us/documentation/articles/documentdb-automation-resource-manager-cli/)
16
20
  * [How to create a database for DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-database/)
17
21
  * [Create a DocumentDB collection](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-collection/)
22
+ * [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/)
18
23
 
19
24
 
20
25
  ### Fluentd - fluent.conf
@@ -27,6 +32,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
27
32
  docdb_collection mycollection
28
33
  auto_create_database true
29
34
  auto_create_collection true
35
+ partitioned_collection true
36
+ partition_key PARTITION_KEY
37
+ offer_throughput 10100
30
38
  time_format %s
31
39
  localtime false
32
40
  add_time_field true
@@ -41,6 +49,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
41
49
  * **docdb\_collection (required)** - DocumentDB collection name
42
50
  * **auto\_create\_database (optional)** - Default:true. By default, DocumentDB database named **docdb\_database** will be automatically created if it does not exist
43
51
  * **auto\_create\_collection (optional)** - Default:true. By default, DocumentDB collection named **docdb\_collection** will be automatically created if it does not exist
52
+ * **partitioned\_collection (optional)** - Default:false. Set to true to create and/or store records in a partitioned collection. Set to false for a single-partition collection
53
+ * **partition\_key (optional)** - Default:nil. A partition key must be specified for a partitioned collection (that is, when partitioned\_collection is set to true)
54
+ * **offer\_throughput (optional)** - Default:10100. Throughput for the collection, expressed in units of 100 request units per second. This only takes effect when a new partitioned collection is created (i.e. both auto\_create\_collection and partitioned\_collection are set to true)
44
55
  * **localtime (optional)** - Default:false. By default, time record is inserted with UTC (Coordinated Universal Time). This option allows to use local time if you set localtime true
45
56
  * **time\_format (optional)** - Default:%s. Time format for a time field to be inserted. Default format is %s, that is unix epoch time. If you want it to be more human readable, set this %Y%m%d-%H:%M:%S, for example.
46
57
  * **add\_time\_field (optional)** - Default:true. This option allows to insert a time field to record
@@ -49,9 +60,11 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
49
60
  * **tag\_field\_name (optional)** - Default:tag. Tag field name to be inserted
50
61
 
51
62
 
52
- ## Expected Records
63
+ ## Configuration examples
64
+
65
+ fluent-plugin-documentdb automatically adds an **id** attribute in UUID format to each record, alongside the record's own attributes. In addition, it adds **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true respectively. Two example plugin configurations follow: one for a single-partition collection and one for a partitioned collection. In both, the source fluentd reads is an Apache access log.
53
66
 
54
- fluent-plugin-documentdb will add **id** attribute which is UUID format and any other attributes of record automatically. In addition, it will add **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true respectively. For example if you read apache's access log via fluentd, structure of the record to inserted into documentdb will have been like this.
67
+ ### (1) Single-Partition Collection Case
55
68
 
56
69
  <u>fluent.conf</u>
57
70
 
@@ -68,7 +81,42 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
68
81
  docdb_endpoint https://yoichikademo.documents.azure.com:443/
69
82
  docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
70
83
  docdb_database mydb
71
- docdb_collection mycollection
84
+ docdb_collection my-single-partition-collection
85
+ auto_create_database true
86
+ auto_create_collection true
87
+ partitioned_collection false
88
+ localtime true
89
+ time_format %Y%m%d-%H:%M:%S
90
+ add_time_field true
91
+ time_field_name time
92
+ add_tag_field true
93
+ tag_field_name tag
94
+ </match>
95
+
96
+ ### (2) Partitioned Collection Case
97
+
98
+ <u>fluent.conf</u>
99
+
100
+ <source>
101
+ @type tail # input plugin
102
+ path /var/log/apache2/access.log # monitoring file
103
+ pos_file /tmp/fluentd_pos_file # position file
104
+ format apache # format
105
+ tag documentdb.access # tag
106
+ </source>
107
+
108
+ <match documentdb.*>
109
+ @type documentdb
110
+ docdb_endpoint https://yoichikademo.documents.azure.com:443/
111
+ docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
112
+ docdb_database mydb
113
+ docdb_collection my-partitioned-collection
114
+ auto_create_database true
115
+ auto_create_collection true
116
+ partitioned_collection true
117
+ partition_key host
118
+ offer_throughput 10100
72
120
  localtime true
73
121
  time_format %Y%m%d-%H:%M:%S
74
122
  add_time_field true
@@ -77,6 +125,9 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
77
125
  tag_field_name tag
78
126
  </match>
79
127
 
128
+
129
+ ## Sample inputs and expected records
130
+
80
131
  An expected output record for sample input will be like this:
81
132
 
82
133
  <u>Sample Input (apache access log)</u>
@@ -122,8 +173,7 @@ An expected output record for sample input will be like this:
122
173
  $ ab -n 5 -c 2 http://localhost/foo/bar/test.html
123
174
 
124
175
  ## TODOs
125
- * Support documentdb sharding. See [How to partition data in DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-sharding/)
126
- * Support resource tokens access. See [Access Control on DocumentDB Resources](https://msdn.microsoft.com/en-us/library/azure/dn783368.aspx)
176
+ * Support automatic data expiration with TTL (Time-to-Live). See [Expire data in DocumentDB collections automatically with time to live](https://azure.microsoft.com/en-us/documentation/articles/documentdb-time-to-live/)
127
177
 
128
178
  ## Contributing
129
179
 
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.2
1
+ 0.2.0
@@ -0,0 +1,27 @@
1
+ <source>
2
+ @type tail # input plugin
3
+ path /var/log/apache2/access.log # monitoring file
4
+ pos_file /tmp/fluentd_pos_file # position file
5
+ format apache # format
6
+ tag documentdb.access # tag
7
+ </source>
8
+
9
+ <match documentdb.*>
10
+ @type documentdb
11
+ docdb_endpoint https://yoichikademo1.documents.azure.com:443/
12
+ docdb_account_key EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
13
+ docdb_database mydb
14
+ docdb_collection mycollection
15
+ auto_create_database true
16
+ auto_create_collection true
17
+ partitioned_collection true
18
+ partition_key host
19
+ offer_throughput 10100
20
+ localtime true
21
+ time_format %Y%m%d-%H:%M:%S
22
+ add_time_field true
23
+ time_field_name time
24
+ add_tag_field true
25
+ tag_field_name tag
26
+ </match>
27
+
@@ -14,13 +14,13 @@ Gem::Specification.new do |gem|
14
14
  gem.has_rdoc = false
15
15
 
16
16
  gem.files = `git ls-files`.split("\n")
17
- #gem.executables = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
- gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
17
+ gem.executables = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ #gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
19
19
  gem.test_files = gem.files.grep(%r{^(test|gem|features)/})
20
20
  gem.require_paths = ["lib"]
21
21
 
22
22
  gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
23
- gem.add_dependency "azure-documentdb-sdk"
23
+ gem.add_dependency "rest-client"
24
24
  gem.add_development_dependency "bundler", "~> 1.11"
25
25
  gem.add_development_dependency "rake", "~> 10.0"
26
26
  gem.add_development_dependency "test-unit"
@@ -0,0 +1,167 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class Client
10
+
11
+ def initialize (master_key, url_endpoint)
12
+ @master_key = master_key
13
+ @url_endpoint = url_endpoint
14
+ @header = AzureDocumentDB::Header.new(@master_key)
15
+ end
16
+
17
+ def create_database (database_name)
18
+ url = "#{@url_endpoint}/dbs"
19
+ custom_headers = {'Content-Type' => 'application/json'}
20
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', custom_headers )
21
+ body_json = { 'id' => database_name }.to_json
22
+ res = RestClient.post( url, body_json, headers)
23
+ JSON.parse(res)
24
+ end
25
+
26
+ def find_databases_by_name (database_name)
27
+ query_params = []
28
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
29
+ query_params.push( {:name=>"@id", :value=> database_name } )
30
+ url = sprintf("%s/dbs", @url_endpoint )
31
+ res = _query(AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', url, query_text, query_params)
32
+ res
33
+ end
34
+
35
+ def get_database_resource (database_name)
36
+ resource = nil
37
+ res = find_databases_by_name (database_name)
38
+ if( res[:body]["_count"].to_i == 0 )
39
+ p "no #{database_name} database exists"
40
+ return resource
41
+ end
42
+ res[:body]['Databases'].select do |db|
43
+ if (db['id'] == database_name )
44
+ resource = AzureDocumentDB::DatabaseResource.new(db['_rid'])
45
+ end
46
+ end
47
+ resource
48
+ end
49
+
50
+ def create_collection(database_resource, collection_name, colls_options={}, custom_headers={} )
51
+ if !database_resource
52
+ raise ArgumentError.new 'No database_resource!'
53
+ end
54
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid )
55
+ custom_headers['Content-Type'] = 'application/json'
56
+ headers = @header.generate('post',
57
+ AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
58
+ database_resource.database_rid, custom_headers )
59
+ body = {'id' => collection_name }
60
+ colls_options.each{|k, v|
61
+ if k == 'indexingPolicy' || k == 'partitionKey'
62
+ body[k] = v
63
+ end
64
+ }
65
+ res = RestClient.post( url, body.to_json, headers)
66
+ JSON.parse(res)
67
+ end
68
+
69
+ def find_collections_by_name(database_resource, collection_name)
70
+ if !database_resource
71
+ raise ArgumentError.new 'No database_resource!'
72
+ end
73
+ ret = {}
74
+ query_params = []
75
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
76
+ query_params.push( {:name=>"@id", :value=> collection_name } )
77
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid)
78
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
79
+ database_resource.database_rid, url, query_text, query_params)
80
+ ret
81
+ end
82
+
83
+ def get_collection_resource (database_resource, collection_name)
84
+ _collection_rid = ''
85
+ if !database_resource
86
+ raise ArgumentError.new 'No database_resource!'
87
+ end
88
+ res = find_collections_by_name(database_resource, collection_name)
89
+ res[:body]['DocumentCollections'].select do |col|
90
+ if (col['id'] == collection_name )
91
+ _collection_rid = col['_rid']
92
+ end
93
+ end
94
+ if _collection_rid.empty?
95
+ p "no #{collection_name} collection exists"
96
+ return nil
97
+ end
98
+ AzureDocumentDB::CollectionResource.new(database_resource.database_rid, _collection_rid)
99
+ end
100
+
101
+ def create_document(collection_resource, document_id, document, custom_headers={} )
102
+ if !collection_resource
103
+ raise ArgumentError.new 'No collection_resource!'
104
+ end
105
+ if document['id'] && document_id != document['id']
106
+ raise ArgumentError.new "Document id mismatch error (#{document_id})!"
107
+ end
108
+ body = { 'id' => document_id }.merge document
109
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
110
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
111
+ custom_headers['Content-Type'] = 'application/json'
112
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
113
+ collection_resource.collection_rid, custom_headers )
114
+ res = RestClient.post( url, body.to_json, headers)
115
+ JSON.parse(res)
116
+ end
117
+
118
+ def find_documents(collection_resource, document_id, custom_headers={})
119
+ if !collection_resource
120
+ raise ArgumentError.new 'No collection_resource!'
121
+ end
122
+ ret = {}
123
+ query_params = []
124
+ query_text = "SELECT * FROM c WHERE c.id=@id"
125
+ query_params.push( {:name=>"@id", :value=> document_id } )
126
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
127
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
128
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
129
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
130
+ ret
131
+ end
132
+
133
+ def query_documents( collection_resource, query_text, query_params, custom_headers={} )
134
+ if !collection_resource
135
+ raise ArgumentError.new 'No collection_resource!'
136
+ end
137
+ ret = {}
138
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
139
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
140
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
141
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
142
+ ret
143
+ end
144
+
145
+ protected
146
+
147
+ def _query( resource_type, parent_resource_id, url, query_text, query_params, custom_headers={} )
148
+ query_specific_header = {
149
+ 'x-ms-documentdb-isquery' => 'True',
150
+ 'Content-Type' => 'application/query+json',
151
+ 'Accept' => 'application/json'
152
+ }
153
+ query_specific_header.merge! custom_headers
154
+ headers = @header.generate('post', resource_type, parent_resource_id, query_specific_header)
155
+ body_json = {
156
+ :query => query_text,
157
+ :parameters => query_params
158
+ }.to_json
159
+
160
+ res = RestClient.post( url, body_json, headers)
161
+ result = {
162
+ :header => res.headers,
163
+ :body => JSON.parse(res.body) }
164
+ return result
165
+ end
166
+ end
167
+ end
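
The `_query` helper above posts a parameterized SQL query as a JSON body. As a rough sketch of just that body-building step (no HTTP call is made, and `build_query_body` is a hypothetical name that does not appear in the plugin), the same structure can be produced like this:

```ruby
require 'json'

# Hypothetical helper, not part of the plugin: builds the JSON body that a
# parameterized DocumentDB query request carries, mirroring the
# :query / :parameters structure used in _query above.
def build_query_body(query_text, params)
  # Turn { '@id' => 'mydb' } into [{ name: '@id', value: 'mydb' }]
  parameters = params.map { |name, value| { name: name, value: value } }
  { query: query_text, parameters: parameters }.to_json
end

body = build_query_body('SELECT * FROM root r WHERE r.id=@id', { '@id' => 'mydb' })
puts body
```

Keeping the value in `parameters` rather than interpolating it into the query text means names containing quotes do not need manual escaping.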
@@ -0,0 +1,10 @@
1
+ module AzureDocumentDB
2
+ API_VERSION = '2015-12-16'.freeze
3
+ RESOURCE_TYPE_DATABASE='dbs'.freeze
4
+ RESOURCE_TYPE_COLLECTION='colls'.freeze
5
+ RESOURCE_TYPE_DOCUMENT='docs'.freeze
6
+ AUTH_TOKEN_VERSION = '1.0'.freeze
7
+ AUTH_TOKEN_TYPE_MASTER = 'master'.freeze
8
+ AUTH_TOKEN_TYPE_RESOURCE = 'resource'.freeze
9
+ PARTITIONED_COLL_MIN_THROUGHPUT = 10100.freeze
10
+ end
@@ -0,0 +1,55 @@
1
+ require 'time'
2
+ require 'openssl'
3
+ require 'base64'
4
+ require 'erb'
5
+
6
+ module AzureDocumentDB
7
+
8
+ class Header
9
+
10
+ def initialize (master_key)
11
+ @master_key = master_key
12
+ end
13
+
14
+ def generate (verb, resource_type, parent_resource_id, api_specific_headers = {} )
15
+ headers = {}
16
+ utc_date = get_httpdate()
17
+ auth_token = generate_auth_token(verb, resource_type, parent_resource_id, utc_date )
18
+ default_headers = {
19
+ 'x-ms-version' => AzureDocumentDB::API_VERSION,
20
+ 'x-ms-date' => utc_date,
21
+ 'authorization' => auth_token
22
+ }.freeze
23
+ headers.merge!(default_headers)
24
+ headers.merge(api_specific_headers)
25
+ end
26
+
27
+ private
28
+
29
+ def generate_auth_token ( verb, resource_type, resource_id, utc_date)
30
+ payload = sprintf("%s\n%s\n%s\n%s\n%s\n",
31
+ verb,
32
+ resource_type,
33
+ resource_id,
34
+ utc_date,
35
+ "" )
36
+ sig = hmac_base64encode(payload)
37
+
38
+ ERB::Util.url_encode sprintf("type=%s&ver=%s&sig=%s",
39
+ AzureDocumentDB::AUTH_TOKEN_TYPE_MASTER,
40
+ AzureDocumentDB::AUTH_TOKEN_VERSION,
41
+ sig )
42
+ end
43
+
44
+ def get_httpdate
45
+ Time.now.httpdate
46
+ end
47
+
48
+ def hmac_base64encode( text )
49
+ key = Base64.urlsafe_decode64 @master_key
50
+ hmac = OpenSSL::HMAC.digest('sha256', key, text.downcase)
51
+ Base64.encode64(hmac).strip
52
+ end
53
+
54
+ end
55
+ end
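
The token generation in the `Header` class above can be tried in isolation. The following is a minimal sketch that mirrors the steps shown in the source (including lowercasing the whole payload before signing); `build_auth_token` is a hypothetical name, and the key used here is a made-up placeholder, not a real account key:

```ruby
require 'time'
require 'openssl'
require 'base64'
require 'erb'

# Hypothetical stand-alone version of the signing steps in Header above.
def build_auth_token(master_key, verb, resource_type, resource_id, utc_date)
  # String to sign: verb, resource type, resource id, date and an empty
  # field, each terminated by a newline.
  payload = "#{verb}\n#{resource_type}\n#{resource_id}\n#{utc_date}\n\n"
  key = Base64.decode64(master_key)
  digest = OpenSSL::HMAC.digest('sha256', key, payload.downcase)
  signature = Base64.strict_encode64(digest)
  # The whole token is URL-encoded before it goes into the header.
  ERB::Util.url_encode("type=master&ver=1.0&sig=#{signature}")
end

fake_key = Base64.strict_encode64('not-a-real-key') # placeholder only
utc_date = Time.now.httpdate
token = build_auth_token(fake_key, 'post', 'dbs', '', utc_date)
puts token
```

The same `utc_date` string has to be sent in the `x-ms-date` header, since the service recomputes the signature from the headers it receives.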
@@ -0,0 +1,62 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class PartitionedCollectionClient < Client
10
+
11
+ def create_collection(database_resource, collection_name,
12
+ partition_key_paths, offer_throughput = AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT )
13
+
14
+ if (offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
15
+ raise ArgumentError.new sprintf("Offer throughput must be at least %d!",
16
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
17
+ end
18
+ if (partition_key_paths.length < 1 )
19
+ raise ArgumentError.new "No PartitionKey paths!"
20
+ end
21
+ colls_options = {
22
+ 'indexingPolicy' => { 'indexingMode' => "consistent", 'automatic'=>true },
23
+ 'partitionKey' => { "paths" => partition_key_paths, "kind" => "Hash" }
24
+ }
25
+ custom_headers= {'x-ms-offer-throughput' => offer_throughput }
26
+ super(database_resource, collection_name, colls_options, custom_headers)
27
+ end
28
+
29
+
30
+ def create_document(collection_resource, document_id, document, partitioned_key )
31
+ if partitioned_key.empty?
32
+ raise ArgumentError.new "No partitioned key!"
33
+ end
34
+ if !document.key?(partitioned_key)
35
+ raise ArgumentError.new "No partitioned key in your document!"
36
+ end
37
+ partitioned_key_value = document[partitioned_key]
38
+ custom_headers = {
39
+ 'x-ms-documentdb-partitionkey' => "[\"#{partitioned_key_value}\"]"
40
+ }
41
+ super(collection_resource, document_id, document, custom_headers)
42
+ end
43
+
44
+ def find_documents(collection_resource, document_id,
45
+ partitioned_key, partitioned_key_value, custom_headers={})
46
+ if !collection_resource
47
+ raise ArgumentError.new "No collection_resource!"
48
+ end
49
+ ret = {}
50
+ query_params = []
51
+ query_text = sprintf("SELECT * FROM c WHERE c.id=@id AND c.%s=@value", partitioned_key)
52
+ query_params.push( {:name=>"@id", :value=> document_id } )
53
+ query_params.push( {:name=>"@value", :value=> partitioned_key_value } )
54
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
55
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
56
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
57
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
58
+ ret
59
+ end
60
+
61
+ end
62
+ end
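
`create_document` above builds the `x-ms-documentdb-partitionkey` header by interpolating the key value into a string. One possible alternative, sketched here under the assumption that the header value is simply a JSON array holding the partition key value, is to let the JSON library do the quoting. `partition_key_header` is a hypothetical helper, not part of the plugin:

```ruby
require 'json'

# Hypothetical helper: returns the partition key header for a document.
# Serializing with to_json escapes quotes and backslashes in the value.
def partition_key_header(document, partition_key)
  raise ArgumentError, 'No partition key!' if partition_key.nil? || partition_key.empty?
  raise ArgumentError, 'No partition key in your document!' unless document.key?(partition_key)
  { 'x-ms-documentdb-partitionkey' => [document[partition_key]].to_json }
end

header = partition_key_header({ 'id' => 'abc', 'host' => '192.168.0.1' }, 'host')
puts header['x-ms-documentdb-partitionkey'] # prints ["192.168.0.1"]
```

For plain string values this yields the same header as the interpolated version; the difference only shows up for values that contain characters needing JSON escaping.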
@@ -0,0 +1,40 @@
1
+ module AzureDocumentDB
2
+
3
+ class Resource
4
+ def initialize
5
+ @r = {}
6
+ end
7
+ protected
8
+ attr_accessor :r
9
+ end
10
+
11
+ class DatabaseResource < Resource
12
+
13
+ def initialize (database_rid)
14
+ super()
15
+ @r['database_rid'] = database_rid
16
+ end
17
+
18
+ def database_rid
19
+ @r['database_rid']
20
+ end
21
+ end
22
+
23
+ class CollectionResource < Resource
24
+
25
+ def initialize (database_rid, collection_rid)
26
+ super()
27
+ @r['database_rid'] = database_rid
28
+ @r['collection_rid'] = collection_rid
29
+ end
30
+
31
+ def database_rid
32
+ @r['database_rid']
33
+ end
34
+
35
+ def collection_rid
36
+ @r['collection_rid']
37
+ end
38
+ end
39
+
40
+ end
@@ -1,15 +1,21 @@
1
1
  # -*- coding: utf-8 -*-
2
2
 
3
3
  module Fluent
4
+
5
+ require 'fluent/plugin/documentdb/constants'
6
+
4
7
  class DocumentdbOutput < BufferedOutput
5
8
  Plugin.register_output('documentdb', self)
6
9
 
7
10
  def initialize
8
- super
9
- require 'documentdb'
10
- require 'msgpack'
11
- require 'time'
12
- require 'securerandom'
11
+ super
12
+ require 'msgpack'
13
+ require 'time'
14
+ require 'securerandom'
15
+ require 'fluent/plugin/documentdb/client'
16
+ require 'fluent/plugin/documentdb/partitioned_coll_client'
17
+ require 'fluent/plugin/documentdb/header'
18
+ require 'fluent/plugin/documentdb/resource'
13
19
  end
14
20
 
15
21
  config_param :docdb_endpoint, :string
@@ -18,6 +24,9 @@ module Fluent
18
24
  config_param :docdb_collection, :string
19
25
  config_param :auto_create_database, :bool, :default => true
20
26
  config_param :auto_create_collection, :bool, :default => true
27
+ config_param :partitioned_collection, :bool, :default => false
28
+ config_param :partition_key, :string, :default => nil
29
+ config_param :offer_throughput, :integer, :default => AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT
21
30
  config_param :time_format, :string, :default => nil
22
31
  config_param :localtime, :bool, default: false
23
32
  config_param :add_time_field, :bool, :default => true
@@ -26,88 +35,106 @@ module Fluent
26
35
  config_param :tag_field_name, :string, :default => 'tag'
27
36
 
28
37
  def configure(conf)
29
- super
30
- raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
31
- raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
32
- raise ConfigError, 'no docdb_database' if @docdb_database.empty?
33
- raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
34
- if @add_time_field and @time_field_name.empty?
35
- raise ConfigError, 'time_field_name is needed if add_time_field is true'
36
- end
37
- if @add_tag_field and @tag_field_name.empty?
38
- raise ConfigError, 'tag_field_name is needed if add_tag_field is true'
38
+ super
39
+ raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
40
+ raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
41
+ raise ConfigError, 'no docdb_database' if @docdb_database.empty?
42
+ raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
43
+ if @add_time_field and @time_field_name.empty?
44
+ raise ConfigError, 'time_field_name must be set if add_time_field is true'
45
+ end
46
+ if @add_tag_field and @tag_field_name.empty?
47
+ raise ConfigError, 'tag_field_name must be set if add_tag_field is true'
48
+ end
49
+ if @partitioned_collection
50
+ raise ConfigError, 'partition_key must be set in partitioned collection mode' if @partition_key.nil? || @partition_key.empty?
51
+ if (@auto_create_collection &&
52
+ @offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
53
+ raise ConfigError, sprintf("offer_throughput must be greater than or equal to %s",
54
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
39
55
  end
40
-
41
- @timef = TimeFormatter.new(@time_format, @localtime)
56
+ end
57
+ @timef = TimeFormatter.new(@time_format, @localtime)
42
58
  end
43
59
 
44
60
  def start
45
- super
61
+ super
46
62
 
47
- begin
48
- context = Azure::DocumentDB::Context.new @docdb_endpoint, @docdb_account_key
63
+ begin
49
64
 
50
- ## initial operations for database
51
- database = Azure::DocumentDB::Database.new context, RestClient
52
- qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
53
- qreq.parameters.add "@id", @docdb_database
54
- query = database.query
55
- qres = query.execute qreq
56
- if( qres[:body]["_count"].to_i == 0 )
57
- raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it by useself" if !@auto_create_database
58
- # create new database as it doesn't exists
59
- database.create @docdb_database
60
- end
65
+ @client = nil
66
+ if @partitioned_collection
67
+ @client = AzureDocumentDB::PartitionedCollectionClient.new(@docdb_account_key,@docdb_endpoint)
68
+ else
69
+ @client = AzureDocumentDB::Client.new(@docdb_account_key,@docdb_endpoint)
70
+ end
61
71
 
62
- ## initial operations for collection
63
- collection = database.collection_for_name @docdb_database
64
- qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
65
- qreq.parameters.add "@id", @docdb_collection
66
- query = collection.query
67
- qres = query.execute qreq
68
- if( qres[:body]["_count"].to_i == 0 )
69
- raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it by useself" if !@auto_create_collection
70
- # create new collection as it doesn't exists
71
- collection.create @docdb_collection
72
- end
73
-
74
- @docdb = collection.document_for_name @docdb_collection
72
+ ## initial operations for database
73
+ res = @client.find_databases_by_name(@docdb_database)
74
+ if( res[:body]["_count"].to_i == 0 )
75
+ raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it yourself" if !@auto_create_database
76
+ # create a new database as it doesn't exist
77
+ @client.create_database(@docdb_database)
78
+ end
75
79
 
76
- rescue Exception =>ex
77
- $log.fatal "Error: '#{ex}'"
78
- exit!
80
+ ## initial operations for collection
81
+ database_resource = @client.get_database_resource(@docdb_database)
82
+ res = @client.find_collections_by_name(database_resource, @docdb_collection)
83
+ if( res[:body]["_count"].to_i == 0 )
84
+ raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it yourself" if !@auto_create_collection
85
+ # create a new collection as it doesn't exist
86
+ if @partitioned_collection
87
+ partition_key_paths = ["/#{@partition_key}"]
88
+ @client.create_collection(database_resource,
89
+ @docdb_collection, partition_key_paths, @offer_throughput)
90
+ else
91
+ @client.create_collection(database_resource, @docdb_collection)
92
+ end
79
93
  end
94
+ @coll_resource = @client.get_collection_resource(database_resource, @docdb_collection)
95
+
96
+ rescue Exception =>ex
97
+ $log.fatal "Error: '#{ex}'"
98
+ exit!
99
+ end
80
100
  end
81
101
 
82
102
  def shutdown
83
- super
84
- # destroy
103
+ super
104
+ # destroy
85
105
  end
86
106
 
87
107
  def format(tag, time, record)
88
- record['id'] = SecureRandom.uuid
89
- if @add_time_field
90
- record[@time_field_name] = @timef.format(time)
91
- end
92
- if @add_tag_field
93
- record[@tag_field_name] = tag
94
- end
95
- record.to_msgpack
108
+ record['id'] = SecureRandom.uuid
109
+ if @add_time_field
110
+ record[@time_field_name] = @timef.format(time)
111
+ end
112
+ if @add_tag_field
113
+ record[@tag_field_name] = tag
114
+ end
115
+ record.to_msgpack
96
116
  end
97
117
 
98
118
  def write(chunk)
99
- records = []
100
- chunk.msgpack_each { |record|
101
- unique_doc_identifier = record["id"]
102
- docdata = record.to_json
103
- begin
104
- @docdb.create unique_doc_identifier, docdata
105
- rescue Exception => ex
106
- $log.fatal "UnknownError: '#{ex}'"
107
- + ", uniqueid=>#{unique_doc_identifier}, data=>"
108
- + docdata.to_s
109
- end
110
- }
119
+ chunk.msgpack_each { |record|
120
+ unique_doc_identifier = record["id"]
121
+ begin
122
+ if @partitioned_collection
123
+ @client.create_document(@coll_resource, unique_doc_identifier, record, @partition_key)
124
+ else
125
+ @client.create_document(@coll_resource, unique_doc_identifier, record)
126
+ end
127
+ rescue RestClient::ExceptionWithResponse => rcex
128
+ exdict = JSON.parse(rcex.response)
129
+ if exdict['code'] == 'Conflict'
130
+ $log.fatal "Duplicate Error: document #{unique_doc_identifier} already exists, data=>" + record.to_json
131
+ else
132
+ $log.fatal "RestClient Error: '#{rcex.response}', data=>" + record.to_json
133
+ end
134
+ rescue => ex
135
+ $log.fatal "UnknownError: '#{ex}', uniqueid=>#{unique_doc_identifier}, data=>" + record.to_json
136
+ end
137
+ }
111
138
  end
112
139
  end
113
140
  end
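
The `format` step above adds the `id`, time and tag fields to each record before it is buffered. As a rough stand-alone illustration, the sketch below does the same to a plain hash. `decorate_record` and its keyword arguments are hypothetical names, and `Time#strftime` stands in for Fluentd's `TimeFormatter`, so results may differ from the plugin for formats the two handle differently:

```ruby
require 'securerandom'
require 'time'

# Hypothetical stand-alone version of the format step: returns a copy of
# the record with id, time and tag fields added.
def decorate_record(record, tag, epoch_time,
                    time_field_name: 'time', tag_field_name: 'tag',
                    time_format: '%Y%m%d-%H:%M:%S', localtime: false)
  decorated = record.dup
  decorated['id'] = SecureRandom.uuid
  t = Time.at(epoch_time)
  t = t.utc unless localtime # UTC unless localtime is requested
  decorated[time_field_name] = t.strftime(time_format)
  decorated[tag_field_name] = tag
  decorated
end

doc = decorate_record({ 'host' => '192.168.0.1' }, 'documentdb.access', 0)
puts doc
```

In partitioned collection mode the record must also carry the configured partition key field (`host` in the examples above), since `create_document` rejects documents without it.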
@@ -6,12 +6,17 @@ class DocumentdbOutputTest < Test::Unit::TestCase
6
6
  end
7
7
 
8
8
  CONFIG = %[
9
- docdb_endpoint DOCUMENTDB_ACCOUNT_ENDPOINT
10
- docdb_account_key DOCUMENTDB_ACCOUNT_KEY
9
+ docdb_endpoint https://yoichikademo1.documents.azure.com:443
10
+ docdb_account_key EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
11
11
  docdb_database mydb
12
12
  docdb_collection mycollection
13
- localtime true
13
+ auto_create_database true
14
+ auto_create_collection true
15
+ partitioned_collection true
16
+ partition_key host
17
+ offer_throughput 10100
14
18
  time_format %Y%m%d-%H:%M:%S
19
+ localtime false
15
20
  add_time_field true
16
21
  time_field_name time
17
22
  add_tag_field true
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-documentdb
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Yoichi Kawasaki
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-02-20 00:00:00.000000000 Z
11
+ date: 2016-08-18 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: fluentd
@@ -31,7 +31,7 @@ dependencies:
31
31
  - !ruby/object:Gem::Version
32
32
  version: '2'
33
33
  - !ruby/object:Gem::Dependency
34
- name: azure-documentdb-sdk
34
+ name: rest-client
35
35
  requirement: !ruby/object:Gem::Requirement
36
36
  requirements:
37
37
  - - ">="
@@ -100,7 +100,14 @@ files:
100
100
  - README.md
101
101
  - Rakefile
102
102
  - VERSION
103
+ - conf/fluent-sample.conf
103
104
  - fluent-plugin-documentdb.gemspec
105
+ - img/fluentd-azure-documentdb-collection.png
106
+ - lib/fluent/plugin/documentdb/client.rb
107
+ - lib/fluent/plugin/documentdb/constants.rb
108
+ - lib/fluent/plugin/documentdb/header.rb
109
+ - lib/fluent/plugin/documentdb/partitioned_coll_client.rb
110
+ - lib/fluent/plugin/documentdb/resource.rb
104
111
  - lib/fluent/plugin/out_documentdb.rb
105
112
  - test/helper.rb
106
113
  - test/plugin/test_documentdb.rb
@@ -124,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
124
131
  version: '0'
125
132
  requirements: []
126
133
  rubyforge_project:
127
- rubygems_version: 2.5.1
134
+ rubygems_version: 2.6.2
128
135
  signing_key:
129
136
  specification_version: 4
130
137
  summary: Azure DocumentDB output plugin for Fluentd