fluent-plugin-documentdb 0.1.2 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ff6f03b3a3f55afffb9df25d70c21d151413a1e6
4
- data.tar.gz: 4380b4c75f0eead0710388c6d39527634683adcb
3
+ metadata.gz: edf79934c835555db5a5718d55a4246307dd8f94
4
+ data.tar.gz: 323dc8a0a1b394e509fef89d7caba6895e4087f9
5
5
  SHA512:
6
- metadata.gz: 529f06c3b973f340572a0e4fc35bf8ca4ca8a682a16f12e2e0a063f6d21427755e7c5068dbe0f56c7846bd30584f49be0efc95707ff14c564d753f2ad5e86c07
7
- data.tar.gz: 73b1528c3957f2664414f3dfac50f37d5a005fa73360489e9fc9017f8db89fbdde10174d46a3e4e2187d1fd25ec0c4b91d5e6345ad02559ec189a16702e3496d
6
+ metadata.gz: ffc08ce92a70407e91124eb2a8e5178c257e5de657be392e734e7bb0e51e6caa7b0f95f3b0fb81da80284cc34beb316079b44a09eefe5db95f67588f96dd5db3
7
+ data.tar.gz: aacbf0fad60603916af0e56da8e0c7e41f6965a45dad7e31b99c851ec162ce8bd5c4fc95733c2dfcd4196ca4a1635d7127fba36ec50037a2557f55e3149be569
data/ChangeLog CHANGED
@@ -1,3 +1,8 @@
1
+ Release 0.2.0 - 2016/08/17
2
+
3
+ * Support Partitioned Collection mode
4
+ * No longer depends on azure-documentdb-sdk; instead uses a very small DocumentDB client library that is included in the plugin
5
+
1
6
  Release 0.1.2 - 2016/02/20
2
7
 
3
8
  * Change gem package dependency option for azure-documentdb-sdk from add_development_dependency to add_dependency
data/README.md CHANGED
@@ -2,6 +2,10 @@
2
2
 
3
3
  fluent-plugin-documentdb is a fluent plugin to output to Azure DocumentDB
4
4
 
5
+ ![fluent-plugin-documentdb overview](https://github.com/yokawasa/fluent-plugin-documentdb/raw/master/img/fluentd-azure-documentdb-collection.png)
6
+
7
+ [NEWS] As of fluent-plugin-documentdb 0.2.0, the plugin supports partitioned collections in addition to single-partition collections (see [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/#single-partition-and-partitioned-collections) for details on both kinds of collection).
8
+
5
9
  ## Installation
6
10
 
7
11
  $ gem install fluent-plugin-documentdb
@@ -15,6 +19,7 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
15
19
  * Create a DocumentDB database account using [the Azure portal](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-account/), or [Azure Resource Manager templates and Azure CLI](https://azure.microsoft.com/en-us/documentation/articles/documentdb-automation-resource-manager-cli/)
16
20
  * [How to create a database for DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-database/)
17
21
  * [Create a DocumentDB collection](https://azure.microsoft.com/en-us/documentation/articles/documentdb-create-collection/)
22
+ * [Partitioning and scaling in Azure DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-partition-data/)
18
23
 
19
24
 
20
25
  ### Fluentd - fluent.conf
@@ -27,6 +32,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
27
32
  docdb_collection mycollection
28
33
  auto_create_database true
29
34
  auto_create_collection true
35
+ partitioned_collection true
36
+ partition_key PARTITION_KEY
37
+ offer_throughput 10100
30
38
  time_format %s
31
39
  localtime false
32
40
  add_time_field true
@@ -41,6 +49,9 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
41
49
  * **docdb\_collection (required)** - DocumentDB collection name
42
50
  * **auto\_create\_database (optional)** - Default:true. By default, DocumentDB database named **docdb\_database** will be automatically created if it does not exist
43
51
  * **auto\_create\_collection (optional)** - Default:true. By default, DocumentDB collection named **docdb\_collection** will be automatically created if it does not exist
52
+ * **partitioned\_collection (optional)** - Default:false. Set true to create and/or store records in a partitioned collection. Set false for a single-partition collection
53
+ * **partition\_key (optional)** - Default:nil. Partition key must be specified for a partitioned collection (partitioned\_collection set to true)
54
+ * **offer\_throughput (optional)** - Default:10100. Throughput for the collection expressed in units of 100 request units per second. This is only effective when you newly create a partitioned collection (i.e. both auto\_create\_collection and partitioned\_collection are set to true)
44
55
  * **localtime (optional)** - Default:false. By default, time record is inserted with UTC (Coordinated Universal Time). This option allows to use local time if you set localtime true
45
56
  * **time\_format (optional)** - Default:%s. Time format for a time field to be inserted. Default format is %s, that is unix epoch time. If you want it to be more human readable, set this %Y%m%d-%H:%M:%S, for example.
46
57
  * **add\_time\_field (optional)** - Default:true. This option allows to insert a time field to record
@@ -49,9 +60,11 @@ To use Microsoft Azure DocumentDB, you must create a DocumentDB database account
49
60
  * **tag\_field\_name (optional)** - Default:tag. Tag field name to be inserted
50
61
 
51
62
 
52
- ## Expected Records
63
+ ## Configuration examples
64
+
65
+ fluent-plugin-documentdb automatically adds an **id** attribute in UUID format alongside the record's other attributes. In addition, it adds **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true, respectively. Two example plugin configurations follow: one for a single-partition collection and one for a partitioned collection. In both, the source that fluentd reads is an Apache access log.
53
66
 
54
- fluent-plugin-documentdb will add **id** attribute which is UUID format and any other attributes of record automatically. In addition, it will add **time** and **tag** attributes if **add_time_field** and **add_tag_field** are true respectively. For example if you read apache's access log via fluentd, structure of the record to inserted into documentdb will have been like this.
67
+ ### (1) Single-Partition Collection Case
55
68
 
56
69
  <u>fluent.conf</u>
57
70
 
@@ -68,7 +81,42 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
68
81
  docdb_endpoint https://yoichikademo.documents.azure.com:443/
69
82
  docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
70
83
  docdb_database mydb
71
- docdb_collection mycollection
84
+ docdb_collection my-single-partition-collection
85
+ auto_create_database true
86
+ auto_create_collection true
87
+ partitioned_collection false
88
+ localtime true
89
+ time_format %Y%m%d-%H:%M:%S
90
+ add_time_field true
91
+ time_field_name time
92
+ add_tag_field true
93
+ tag_field_name tag
94
+ </match>
95
+
96
+ ### (2) Partitioned Collection Case
97
+
98
+ <u>fluent.conf</u>
99
+
100
+ <source>
101
+ @type tail # input plugin
102
+ path /var/log/apache2/access.log # monitoring file
103
+ pos_file /tmp/fluentd_pos_file # position file
104
+ format apache # format
105
+ tag documentdb.access # tag
106
+ </source>
107
+
108
+ <match documentdb.*>
109
+ @type documentdb
110
+ docdb_endpoint https://yoichikademo.documents.azure.com:443/
111
+ docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
112
+ docdb_database mydb
113
+ docdb_collection my-partitioned-collection
114
+ auto_create_database true
115
+ auto_create_collection true
116
+ partitioned_collection true
117
+ partition_key host
118
+ offer_throughput 10100
119
+ auto_create_database
72
120
  localtime true
73
121
  time_format %Y%m%d-%H:%M:%S
74
122
  add_time_field true
@@ -77,6 +125,9 @@ fluent-plugin-documentdb will add **id** attribute which is UUID format and any
77
125
  tag_field_name tag
78
126
  </match>
79
127
 
128
+
129
+ ## Sample inputs and expected records
130
+
80
131
  An expected output record for sample input will be like this:
81
132
 
82
133
  <u>Sample Input (apache access log)</u>
@@ -122,8 +173,7 @@ An expected output record for sample input will be like this:
122
173
  $ ab -n 5 -c 2 http://localhost/foo/bar/test.html
123
174
 
124
175
  ## TODOs
125
- * Support documentdb sharding. See [How to partition data in DocumentDB](https://azure.microsoft.com/en-us/documentation/articles/documentdb-sharding/)
126
- * Support resource tokens access. See [Access Control on DocumentDB Resources](https://msdn.microsoft.com/en-us/library/azure/dn783368.aspx)
176
+ * Support automatic data expiration with TTL (Time-to-Live). See [Expire data in DocumentDB collections automatically with time to live](https://azure.microsoft.com/en-us/documentation/articles/documentdb-time-to-live/)
127
177
 
128
178
  ## Contributing
129
179
 
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.2
1
+ 0.2.0
@@ -0,0 +1,27 @@
1
+ <source>
2
+ @type tail # input plugin
3
+ path /var/log/apache2/access.log # monitoring file
4
+ pos_file /tmp/fluentd_pos_file # position file
5
+ format apache # format
6
+ tag documentdb.access # tag
7
+ </source>
8
+
9
+ <match documentdb.*>
10
+ @type documentdb
11
+ docdb_endpoint https://yoichikademo1.documents.azure.com:443/
12
+ docdb_account_key EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
13
+ docdb_database mydb
14
+ docdb_collection mycollection
15
+ auto_create_database true
16
+ auto_create_collection true
17
+ partitioned_collection true
18
+ partition_key host
19
+ offer_throughput 10100
20
+ localtime true
21
+ time_format %Y%m%d-%H:%M:%S
22
+ add_time_field true
23
+ time_field_name time
24
+ add_tag_field true
25
+ tag_field_name tag
26
+ </match>
27
+
@@ -14,13 +14,13 @@ Gem::Specification.new do |gem|
14
14
  gem.has_rdoc = false
15
15
 
16
16
  gem.files = `git ls-files`.split("\n")
17
- #gem.executables = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
- gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
17
+ gem.executables = gem.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ #gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
19
19
  gem.test_files = gem.files.grep(%r{^(test|gem|features)/})
20
20
  gem.require_paths = ["lib"]
21
21
 
22
22
  gem.add_dependency "fluentd", [">= 0.10.58", "< 2"]
23
- gem.add_dependency "azure-documentdb-sdk"
23
+ gem.add_dependency "rest-client"
24
24
  gem.add_development_dependency "bundler", "~> 1.11"
25
25
  gem.add_development_dependency "rake", "~> 10.0"
26
26
  gem.add_development_dependency "test-unit"
@@ -0,0 +1,167 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class Client
10
+
11
+ def initialize (master_key, url_endpoint)
12
+ @master_key = master_key
13
+ @url_endpoint = url_endpoint
14
+ @header = AzureDocumentDB::Header.new(@master_key)
15
+ end
16
+
17
+ def create_database (database_name)
18
+ url = "#{@url_endpoint}/dbs"
19
+ custom_headers = {'Content-Type' => 'application/json'}
20
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', custom_headers )
21
+ body_json = { 'id' => database_name }.to_json
22
+ res = RestClient.post( url, body_json, headers)
23
+ JSON.parse(res)
24
+ end
25
+
26
+ def find_databases_by_name (database_name)
27
+ query_params = []
28
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
29
+ query_params.push( {:name=>"@id", :value=> database_name } )
30
+ url = sprintf("%s/dbs", @url_endpoint )
31
+ res = _query(AzureDocumentDB::RESOURCE_TYPE_DATABASE, '', url, query_text, query_params)
32
+ res
33
+ end
34
+
35
+ def get_database_resource (database_name)
36
+ resource = nil
37
+ res = find_databases_by_name (database_name)
38
+ if( res[:body]["_count"].to_i == 0 )
39
+ p "no #{database_name} database exists"
40
+ return resource
41
+ end
42
+ res[:body]['Databases'].select do |db|
43
+ if (db['id'] == database_name )
44
+ resource = AzureDocumentDB::DatabaseResource.new(db['_rid'])
45
+ end
46
+ end
47
+ resource
48
+ end
49
+
50
+ def create_collection(database_resource, collection_name, colls_options={}, custom_headers={} )
51
+ if !database_resource
52
+ raise ArgumentError.new 'No database_resource!'
53
+ end
54
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid )
55
+ custom_headers['Content-Type'] = 'application/json'
56
+ headers = @header.generate('post',
57
+ AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
58
+ database_resource.database_rid, custom_headers )
59
+ body = {'id' => collection_name }
60
+ colls_options.each{|k, v|
61
+ if k == 'indexingPolicy' || k == 'partitionKey'
62
+ body[k] = v
63
+ end
64
+ }
65
+ res = RestClient.post( url, body.to_json, headers)
66
+ JSON.parse(res)
67
+ end
68
+
69
+ def find_collections_by_name(database_resource, collection_name)
70
+ if !database_resource
71
+ raise ArgumentError.new 'No database_resource!'
72
+ end
73
+ ret = {}
74
+ query_params = []
75
+ query_text = "SELECT * FROM root r WHERE r.id=@id"
76
+ query_params.push( {:name=>"@id", :value=> collection_name } )
77
+ url = sprintf("%s/dbs/%s/colls", @url_endpoint, database_resource.database_rid)
78
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_COLLECTION,
79
+ database_resource.database_rid, url, query_text, query_params)
80
+ ret
81
+ end
82
+
83
+ def get_collection_resource (database_resource, collection_name)
84
+ _collection_rid = ''
85
+ if !database_resource
86
+ raise ArgumentError.new 'No database_resource!'
87
+ end
88
+ res = find_collections_by_name(database_resource, collection_name)
89
+ res[:body]['DocumentCollections'].select do |col|
90
+ if (col['id'] == collection_name )
91
+ _collection_rid = col['_rid']
92
+ end
93
+ end
94
+ if _collection_rid.empty?
95
+ p "no #{collection_name} collection exists"
96
+ return nil
97
+ end
98
+ AzureDocumentDB::CollectionResource.new(database_resource.database_rid, _collection_rid)
99
+ end
100
+
101
+ def create_document(collection_resource, document_id, document, custom_headers={} )
102
+ if !collection_resource
103
+ raise ArgumentError.new 'No collection_resource!'
104
+ end
105
+ if document['id'] && document_id != document['id']
106
+ raise ArgumentError.new "Document id mismatch error (#{document_id})!"
107
+ end
108
+ body = { 'id' => document_id }.merge document
109
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
110
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
111
+ custom_headers['Content-Type'] = 'application/json'
112
+ headers = @header.generate('post', AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
113
+ collection_resource.collection_rid, custom_headers )
114
+ res = RestClient.post( url, body.to_json, headers)
115
+ JSON.parse(res)
116
+ end
117
+
118
+ def find_documents(collection_resource, document_id, custom_headers={})
119
+ if !collection_resource
120
+ raise ArgumentError.new 'No collection_resource!'
121
+ end
122
+ ret = {}
123
+ query_params = []
124
+ query_text = "SELECT * FROM c WHERE c.id=@id"
125
+ query_params.push( {:name=>"@id", :value=> document_id } )
126
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
127
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
128
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
129
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
130
+ ret
131
+ end
132
+
133
+ def query_documents( collection_resource, query_text, query_params, custom_headers={} )
134
+ if !collection_resource
135
+ raise ArgumentError.new 'No collection_resource!'
136
+ end
137
+ ret = {}
138
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
139
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
140
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
141
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
142
+ ret
143
+ end
144
+
145
+ protected
146
+
147
+ def _query( resource_type, parent_resource_id, url, query_text, query_params, custom_headers={} )
148
+ query_specific_header = {
149
+ 'x-ms-documentdb-isquery' => 'True',
150
+ 'Content-Type' => 'application/query+json',
151
+ 'Accept' => 'application/json'
152
+ }
153
+ query_specific_header.merge! custom_headers
154
+ headers = @header.generate('post', resource_type, parent_resource_id, query_specific_header)
155
+ body_json = {
156
+ :query => query_text,
157
+ :parameters => query_params
158
+ }.to_json
159
+
160
+ res = RestClient.post( url, body_json, headers)
161
+ result = {
162
+ :header => res.headers,
163
+ :body => JSON.parse(res.body) }
164
+ return result
165
+ end
166
+ end
167
+ end
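
The `_query` helper above posts a parameterized SQL query as a JSON body with `query` and `parameters` keys. A minimal sketch of that body construction follows; `build_query_body` is a hypothetical helper name, not part of the plugin, and it takes the parameters as a plain hash for brevity.

```ruby
require 'json'

# Hypothetical helper (not in the plugin): builds the JSON body that a
# parameterized DocumentDB query request carries.
def build_query_body(query_text, params = {})
  {
    query: query_text,
    # Each parameter becomes a {name, value} pair, as in find_databases_by_name.
    parameters: params.map { |name, value| { name: name, value: value } }
  }.to_json
end

body = build_query_body('SELECT * FROM root r WHERE r.id=@id', '@id' => 'mydb')
puts body
```

Keeping the value in `parameters` rather than interpolating it into the query text means the database name never has to be escaped by hand.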
@@ -0,0 +1,10 @@
1
+ module AzureDocumentDB
2
+ API_VERSION = '2015-12-16'.freeze
3
+ RESOURCE_TYPE_DATABASE='dbs'.freeze
4
+ RESOURCE_TYPE_COLLECTION='colls'.freeze
5
+ RESOURCE_TYPE_DOCUMENT='docs'.freeze
6
+ AUTH_TOKEN_VERSION = '1.0'.freeze
7
+ AUTH_TOKEN_TYPE_MASTER = 'master'.freeze
8
+ AUTH_TOKEN_TYPE_RESOURCE = 'resource'.freeze
9
+ PARTITIONED_COLL_MIN_THROUGHPUT = 10100.freeze
10
+ end
@@ -0,0 +1,55 @@
1
+ require 'time'
2
+ require 'openssl'
3
+ require 'base64'
4
+ require 'erb'
5
+
6
+ module AzureDocumentDB
7
+
8
+ class Header
9
+
10
+ def initialize (master_key)
11
+ @master_key = master_key
12
+ end
13
+
14
+ def generate (verb, resource_type, parent_resource_id, api_specific_headers = {} )
15
+ headers = {}
16
+ utc_date = get_httpdate()
17
+ auth_token = generate_auth_token(verb, resource_type, parent_resource_id, utc_date )
18
+ default_headers = {
19
+ 'x-ms-version' => AzureDocumentDB::API_VERSION,
20
+ 'x-ms-date' => utc_date,
21
+ 'authorization' => auth_token
22
+ }.freeze
23
+ headers.merge!(default_headers)
24
+ headers.merge(api_specific_headers)
25
+ end
26
+
27
+ private
28
+
29
+ def generate_auth_token ( verb, resource_type, resource_id, utc_date)
30
+ payload = sprintf("%s\n%s\n%s\n%s\n%s\n",
31
+ verb,
32
+ resource_type,
33
+ resource_id,
34
+ utc_date,
35
+ "" )
36
+ sig = hmac_base64encode(payload)
37
+
38
+ ERB::Util.url_encode sprintf("type=%s&ver=%s&sig=%s",
39
+ AzureDocumentDB::AUTH_TOKEN_TYPE_MASTER,
40
+ AzureDocumentDB::AUTH_TOKEN_VERSION,
41
+ sig )
42
+ end
43
+
44
+ def get_httpdate
45
+ Time.now.httpdate
46
+ end
47
+
48
+ def hmac_base64encode( text )
49
+ key = Base64.urlsafe_decode64 @master_key
50
+ hmac = OpenSSL::HMAC.digest('sha256', key, text.downcase)
51
+ Base64.encode64(hmac).strip
52
+ end
53
+
54
+ end
55
+ end
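
The token generation in `Header` can be read as three steps: build a newline-separated payload, sign its lowercased form with HMAC-SHA256 using the decoded master key, and URL-encode the resulting `type=...&ver=...&sig=...` string. The sketch below restates those steps as one standalone function. `master_auth_token` and `DUMMY_KEY` are hypothetical names introduced for illustration, and the key is a placeholder rather than a real account key.

```ruby
require 'openssl'
require 'base64'
require 'erb'
require 'time'

# Hypothetical standalone version of Header#generate_auth_token.
def master_auth_token(master_key, verb, resource_type, resource_id, utc_date)
  # Payload layout used above: verb, resource type, resource id, date, empty line.
  payload = "#{verb}\n#{resource_type}\n#{resource_id}\n#{utc_date}\n\n"
  key = Base64.decode64(master_key)
  # The payload is lowercased before signing, as in hmac_base64encode.
  digest = OpenSSL::HMAC.digest('sha256', key, payload.downcase)
  signature = Base64.strict_encode64(digest)
  ERB::Util.url_encode("type=master&ver=1.0&sig=#{signature}")
end

# Placeholder key for illustration only.
DUMMY_KEY = Base64.strict_encode64('not-a-real-key')
puts master_auth_token(DUMMY_KEY, 'post', 'dbs', '', Time.now.httpdate)
```

Because the date is part of the signed payload, the same value has to be sent in the `x-ms-date` header, which is why `generate` computes `utc_date` once and uses it for both.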
@@ -0,0 +1,62 @@
1
+ require 'rest-client'
2
+ require 'json'
3
+ require_relative 'constants'
4
+ require_relative 'header'
5
+ require_relative 'resource'
6
+
7
+ module AzureDocumentDB
8
+
9
+ class PartitionedCollectionClient < Client
10
+
11
+ def create_collection(database_resource, collection_name,
12
+ partition_key_paths, offer_throughput = AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT )
13
+
14
+ if (offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
15
+ raise ArgumentError.new sprintf("Offer throughput must be at least %d!",
16
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
17
+ end
18
+ if (partition_key_paths.length < 1 )
19
+ raise ArgumentError.new "No PartitionKey paths!"
20
+ end
21
+ colls_options = {
22
+ 'indexingPolicy' => { 'indexingMode' => "consistent", 'automatic'=>true },
23
+ 'partitionKey' => { "paths" => partition_key_paths, "kind" => "Hash" }
24
+ }
25
+ custom_headers= {'x-ms-offer-throughput' => offer_throughput }
26
+ super(database_resource, collection_name, colls_options, custom_headers)
27
+ end
28
+
29
+
30
+ def create_document(collection_resource, document_id, document, partitioned_key )
31
+ if partitioned_key.empty?
32
+ raise ArgumentError.new "No partitioned key!"
33
+ end
34
+ if !document.key?(partitioned_key)
35
+ raise ArgumentError.new "No partitioned key in your document!"
36
+ end
37
+ partitioned_key_value = document[partitioned_key]
38
+ custom_headers = {
39
+ 'x-ms-documentdb-partitionkey' => "[\"#{partitioned_key_value}\"]"
40
+ }
41
+ super(collection_resource, document_id, document, custom_headers)
42
+ end
43
+
44
+ def find_documents(collection_resource, document_id,
45
+ partitioned_key, partitioned_key_value, custom_headers={})
46
+ if !collection_resource
47
+ raise ArgumentError.new "No collection_resource!"
48
+ end
49
+ ret = {}
50
+ query_params = []
51
+ query_text = sprintf("SELECT * FROM c WHERE c.id=@id AND c.%s=@value", partitioned_key)
52
+ query_params.push( {:name=>"@id", :value=> document_id } )
53
+ query_params.push( {:name=>"@value", :value=> partitioned_key_value } )
54
+ url = sprintf("%s/dbs/%s/colls/%s/docs",
55
+ @url_endpoint, collection_resource.database_rid, collection_resource.collection_rid)
56
+ ret = _query(AzureDocumentDB::RESOURCE_TYPE_DOCUMENT,
57
+ collection_resource.collection_rid, url, query_text, query_params, custom_headers)
58
+ ret
59
+ end
60
+
61
+ end
62
+ end
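
`create_document` in the partitioned client builds the `x-ms-documentdb-partitionkey` header by interpolating the key's value into a JSON-array string. A slightly more defensive variant, sketched here under the assumption that the header should be a valid JSON array, serializes the value with `to_json` so that quotes and backslashes are escaped. `partition_key_header` is a hypothetical helper name, not part of the plugin.

```ruby
require 'json'

# Hypothetical helper (not in the plugin): returns the header value for
# x-ms-documentdb-partitionkey given a record and the partition key name.
def partition_key_header(document, partition_key)
  raise ArgumentError, 'No partition key!' if partition_key.nil? || partition_key.empty?
  raise ArgumentError, 'No partition key in your document!' unless document.key?(partition_key)
  # Serializing through JSON escapes quotes and backslashes in the value.
  [document[partition_key]].to_json
end

record = { 'host' => '192.168.0.10', 'path' => '/foo/bar/test.html' }
puts partition_key_header(record, 'host')  # prints ["192.168.0.10"]
```

One difference from the interpolated form: a numeric partition key value stays a JSON number here instead of being wrapped in quotes.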
@@ -0,0 +1,40 @@
1
+ module AzureDocumentDB
2
+
3
+ class Resource
4
+ def initialize
5
+ @r = {}
6
+ end
7
+ protected
8
+ attr_accessor :r
9
+ end
10
+
11
+ class DatabaseResource < Resource
12
+
13
+ def initialize (database_rid)
14
+ super()
15
+ @r['database_rid'] = database_rid
16
+ end
17
+
18
+ def database_rid
19
+ @r['database_rid']
20
+ end
21
+ end
22
+
23
+ class CollectionResource < Resource
24
+
25
+ def initialize (database_rid, collection_rid)
26
+ super()
27
+ @r['database_rid'] = database_rid
28
+ @r['collection_rid'] = collection_rid
29
+ end
30
+
31
+ def database_rid
32
+ @r['database_rid']
33
+ end
34
+
35
+ def collection_rid
36
+ @r['collection_rid']
37
+ end
38
+ end
39
+
40
+ end
@@ -1,15 +1,21 @@
1
1
  # -*- coding: utf-8 -*-
2
2
 
3
3
  module Fluent
4
+
5
+ require 'fluent/plugin/documentdb/constants'
6
+
4
7
  class DocumentdbOutput < BufferedOutput
5
8
  Plugin.register_output('documentdb', self)
6
9
 
7
10
  def initialize
8
- super
9
- require 'documentdb'
10
- require 'msgpack'
11
- require 'time'
12
- require 'securerandom'
11
+ super
12
+ require 'msgpack'
13
+ require 'time'
14
+ require 'securerandom'
15
+ require 'fluent/plugin/documentdb/client'
16
+ require 'fluent/plugin/documentdb/partitioned_coll_client'
17
+ require 'fluent/plugin/documentdb/header'
18
+ require 'fluent/plugin/documentdb/resource'
13
19
  end
14
20
 
15
21
  config_param :docdb_endpoint, :string
@@ -18,6 +24,9 @@ module Fluent
18
24
  config_param :docdb_collection, :string
19
25
  config_param :auto_create_database, :bool, :default => true
20
26
  config_param :auto_create_collection, :bool, :default => true
27
+ config_param :partitioned_collection, :bool, :default => false
28
+ config_param :partition_key, :string, :default => nil
29
+ config_param :offer_throughput, :integer, :default => AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT
21
30
  config_param :time_format, :string, :default => nil
22
31
  config_param :localtime, :bool, default: false
23
32
  config_param :add_time_field, :bool, :default => true
@@ -26,88 +35,106 @@ module Fluent
26
35
  config_param :tag_field_name, :string, :default => 'tag'
27
36
 
28
37
  def configure(conf)
29
- super
30
- raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
31
- raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
32
- raise ConfigError, 'no docdb_database' if @docdb_database.empty?
33
- raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
34
- if @add_time_field and @time_field_name.empty?
35
- raise ConfigError, 'time_field_name is needed if add_time_field is true'
36
- end
37
- if @add_tag_field and @tag_field_name.empty?
38
- raise ConfigError, 'tag_field_name is needed if add_tag_field is true'
38
+ super
39
+ raise ConfigError, 'no docdb_endpoint' if @docdb_endpoint.empty?
40
+ raise ConfigError, 'no docdb_account_key' if @docdb_account_key.empty?
41
+ raise ConfigError, 'no docdb_database' if @docdb_database.empty?
42
+ raise ConfigError, 'no docdb_collection' if @docdb_collection.empty?
43
+ if @add_time_field and @time_field_name.empty?
44
+ raise ConfigError, 'time_field_name must be set if add_time_field is true'
45
+ end
46
+ if @add_tag_field and @tag_field_name.empty?
47
+ raise ConfigError, 'tag_field_name must be set if add_tag_field is true'
48
+ end
49
+ if @partitioned_collection
50
+ raise ConfigError, 'partition_key must be set in partitioned collection mode' if @partition_key.nil? || @partition_key.empty?
51
+ if (@auto_create_collection &&
52
+ @offer_throughput < AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
53
+ raise ConfigError, sprintf("offer_throughput must be greater than or equal to %s",
54
+ AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT)
39
55
  end
40
-
41
- @timef = TimeFormatter.new(@time_format, @localtime)
56
+ end
57
+ @timef = TimeFormatter.new(@time_format, @localtime)
42
58
  end
43
59
 
44
60
  def start
45
- super
61
+ super
46
62
 
47
- begin
48
- context = Azure::DocumentDB::Context.new @docdb_endpoint, @docdb_account_key
63
+ begin
49
64
 
50
- ## initial operations for database
51
- database = Azure::DocumentDB::Database.new context, RestClient
52
- qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
53
- qreq.parameters.add "@id", @docdb_database
54
- query = database.query
55
- qres = query.execute qreq
56
- if( qres[:body]["_count"].to_i == 0 )
57
- raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it by useself" if !@auto_create_database
58
- # create new database as it doesn't exists
59
- database.create @docdb_database
60
- end
65
+ @client = nil
66
+ if @partitioned_collection
67
+ @client = AzureDocumentDB::PartitionedCollectionClient.new(@docdb_account_key,@docdb_endpoint)
68
+ else
69
+ @client = AzureDocumentDB::Client.new(@docdb_account_key,@docdb_endpoint)
70
+ end
61
71
 
62
- ## initial operations for collection
63
- collection = database.collection_for_name @docdb_database
64
- qreq = Azure::DocumentDB::QueryRequest.new "SELECT * FROM root r WHERE r.id=@id"
65
- qreq.parameters.add "@id", @docdb_collection
66
- query = collection.query
67
- qres = query.execute qreq
68
- if( qres[:body]["_count"].to_i == 0 )
69
- raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it by useself" if !@auto_create_collection
70
- # create new collection as it doesn't exists
71
- collection.create @docdb_collection
72
- end
73
-
74
- @docdb = collection.document_for_name @docdb_collection
72
+ ## initial operations for database
73
+ res = @client.find_databases_by_name(@docdb_database)
74
+ if( res[:body]["_count"].to_i == 0 )
75
+ raise "No database (#{docdb_database}) exists! Enable auto_create_database or create it yourself" if !@auto_create_database
76
+ # create new database as it doesn't exist
77
+ @client.create_database(@docdb_database)
78
+ end
75
79
 
76
- rescue Exception =>ex
77
- $log.fatal "Error: '#{ex}'"
78
- exit!
80
+ ## initial operations for collection
81
+ database_resource = @client.get_database_resource(@docdb_database)
82
+ res = @client.find_collections_by_name(database_resource, @docdb_collection)
83
+ if( res[:body]["_count"].to_i == 0 )
84
+ raise "No collection (#{docdb_collection}) exists! Enable auto_create_collection or create it yourself" if !@auto_create_collection
85
+ # create new collection as it doesn't exist
86
+ if @partitioned_collection
87
+ partition_key_paths = ["/#{@partition_key}"]
88
+ @client.create_collection(database_resource,
89
+ @docdb_collection, partition_key_paths, @offer_throughput)
90
+ else
91
+ @client.create_collection(database_resource, @docdb_collection)
92
+ end
79
93
  end
94
+ @coll_resource = @client.get_collection_resource(database_resource, @docdb_collection)
95
+
96
+ rescue Exception =>ex
97
+ $log.fatal "Error: '#{ex}'"
98
+ exit!
99
+ end
80
100
  end
81
101
 
82
102
  def shutdown
83
- super
84
- # destroy
103
+ super
104
+ # destroy
85
105
  end
86
106
 
87
107
  def format(tag, time, record)
88
- record['id'] = SecureRandom.uuid
89
- if @add_time_field
90
- record[@time_field_name] = @timef.format(time)
91
- end
92
- if @add_tag_field
93
- record[@tag_field_name] = tag
94
- end
95
- record.to_msgpack
108
+ record['id'] = SecureRandom.uuid
109
+ if @add_time_field
110
+ record[@time_field_name] = @timef.format(time)
111
+ end
112
+ if @add_tag_field
113
+ record[@tag_field_name] = tag
114
+ end
115
+ record.to_msgpack
96
116
  end
97
117
 
98
118
  def write(chunk)
99
- records = []
100
- chunk.msgpack_each { |record|
101
- unique_doc_identifier = record["id"]
102
- docdata = record.to_json
103
- begin
104
- @docdb.create unique_doc_identifier, docdata
105
- rescue Exception => ex
106
- $log.fatal "UnknownError: '#{ex}'"
107
- + ", uniqueid=>#{unique_doc_identifier}, data=>"
108
- + docdata.to_s
109
- end
110
- }
119
+ chunk.msgpack_each { |record|
120
+ unique_doc_identifier = record["id"]
121
+ begin
122
+ if @partitioned_collection
123
+ @client.create_document(@coll_resource, unique_doc_identifier, record, @partition_key)
124
+ else
125
+ @client.create_document(@coll_resource, unique_doc_identifier, record)
126
+ end
127
+ rescue RestClient::ExceptionWithResponse => rcex
128
+ exdict = JSON.parse(rcex.response)
129
+ if exdict['code'] == 'Conflict'
130
+ $log.fatal "Duplicate Error: document #{unique_doc_identifier} already exists, data=>" + record.to_json
131
+ else
132
+ $log.fatal "RestClient Error: '#{rcex.response}', data=>" + record.to_json
133
+ end
134
+ rescue => ex
135
+ $log.fatal "UnknownError: '#{ex}', uniqueid=>#{unique_doc_identifier}, data=>" + record.to_json
136
+ end
137
+ }
111
138
  end
112
139
  end
113
140
  end
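
The partitioned-collection checks in `configure` amount to two rules: a partition key is required whenever `partitioned_collection` is true, and `offer_throughput` must not be below the minimum when the collection may be auto-created. The sketch below isolates those rules in a plain function so they can be exercised without Fluentd. `validate_partition_settings` and `MIN_THROUGHPUT` are hypothetical names, and the explicit `nil` guard is an assumption made because `partition_key` defaults to `nil`.

```ruby
# Mirrors AzureDocumentDB::PARTITIONED_COLL_MIN_THROUGHPUT from constants.rb.
MIN_THROUGHPUT = 10100

# Hypothetical pure function (not in the plugin) restating the configure checks.
def validate_partition_settings(partitioned:, partition_key:, auto_create:, offer_throughput:)
  # Single-partition mode needs none of the partition settings.
  return :ok unless partitioned
  if partition_key.nil? || partition_key.empty?
    raise ArgumentError, 'partition_key must be set in partitioned collection mode'
  end
  # Throughput only matters when the plugin may create the collection itself.
  if auto_create && offer_throughput < MIN_THROUGHPUT
    raise ArgumentError, "offer_throughput must be greater than or equal to #{MIN_THROUGHPUT}"
  end
  :ok
end

puts validate_partition_settings(partitioned: true, partition_key: 'host',
                                 auto_create: true, offer_throughput: 10100)
```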
@@ -6,12 +6,17 @@ class DocumentdbOutputTest < Test::Unit::TestCase
6
6
  end
7
7
 
8
8
  CONFIG = %[
9
- docdb_endpoint DOCUMENTDB_ACCOUNT_ENDPOINT
10
- docdb_account_key DOCUMENTDB_ACCOUNT_KEY
9
+ docdb_endpoint https://yoichikademo1.documents.azure.com:443
10
+ docdb_account_key EMwUa3EzsAtJ1qYfzwo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
11
11
  docdb_database mydb
12
12
  docdb_collection mycollection
13
- localtime true
13
+ auto_create_database true
14
+ auto_create_collection true
15
+ partitioned_collection true
16
+ partition_key host
17
+ offer_throughput 10100
14
18
  time_format %Y%m%d-%H:%M:%S
19
+ localtime false
15
20
  add_time_field true
16
21
  time_field_name time
17
22
  add_tag_field true
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-documentdb
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Yoichi Kawasaki
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-02-20 00:00:00.000000000 Z
11
+ date: 2016-08-18 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: fluentd
@@ -31,7 +31,7 @@ dependencies:
31
31
  - !ruby/object:Gem::Version
32
32
  version: '2'
33
33
  - !ruby/object:Gem::Dependency
34
- name: azure-documentdb-sdk
34
+ name: rest-client
35
35
  requirement: !ruby/object:Gem::Requirement
36
36
  requirements:
37
37
  - - ">="
@@ -100,7 +100,14 @@ files:
100
100
  - README.md
101
101
  - Rakefile
102
102
  - VERSION
103
+ - conf/fluent-sample.conf
103
104
  - fluent-plugin-documentdb.gemspec
105
+ - img/fluentd-azure-documentdb-collection.png
106
+ - lib/fluent/plugin/documentdb/client.rb
107
+ - lib/fluent/plugin/documentdb/constants.rb
108
+ - lib/fluent/plugin/documentdb/header.rb
109
+ - lib/fluent/plugin/documentdb/partitioned_coll_client.rb
110
+ - lib/fluent/plugin/documentdb/resource.rb
104
111
  - lib/fluent/plugin/out_documentdb.rb
105
112
  - test/helper.rb
106
113
  - test/plugin/test_documentdb.rb
@@ -124,7 +131,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
124
131
  version: '0'
125
132
  requirements: []
126
133
  rubyforge_project:
127
- rubygems_version: 2.5.1
134
+ rubygems_version: 2.6.2
128
135
  signing_key:
129
136
  specification_version: 4
130
137
  summary: Azure DocumentDB output plugin for Fluentd