Nordes-logstash-input-azureblob 0.9.5.pre.1

checksums.yaml.gz ADDED
---
SHA1:
  metadata.gz: c95d8dd858297e06d69f2366d28e2846b2c35fb4
  data.tar.gz: 1a1921f5e3db235698d6c0387d4b615008ccc5c0
SHA512:
  metadata.gz: 1684e18df37bcf098189d61265a008d21d20bc3d768956347563529e2633b71744578142a8b2a27df072ac33ce13c5b9a0fd749e51021a0f79d54ba7caf263ad
  data.tar.gz: 05f3b519d14dd26fb8614f1c00af111f492bf05d5ad4bdffdcb0f6a3627ad4942d0ca5150a57cb1e5f49608f213f485bf1c40a519458253673d0b5cb991a0b99
data/CHANGELOG.md ADDED
## 2016.07.01
* Updated the *README.md*
* Implemented *sleep_time*
* Added *sincedb* parameter (uses an Azure table)
* Added *ignore_older* parameter (works like the file plugin)
* Added *start_position* parameter (works like the file plugin)
* Added *path_prefix* parameter (no wildcards accepted, uses the Azure blob API)
* Added some logs for debugging

### Not yet completed
* Removed the milestone (deprecated)

## 2016.05.05
* Made the plugin respect the Logstash shutdown signal.
* Updated the *logstash-core* runtime dependency requirement to '~> 2.0'.
* Updated the *logstash-devutils* development dependency requirement to '>= 0.0.16'.
data/Gemfile ADDED
source 'https://rubygems.org'
gemspec
data/LICENSE ADDED
Copyright (c) Microsoft. All rights reserved.
Microsoft would like to thank its contributors, a list
of whom are at http://aka.ms/entlib-contributors

Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You may
obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.
data/README.md ADDED
# Logstash input plugin for Azure Storage Blobs

## Summary
This plugin reads and parses data from Azure Storage Blobs.

## Installation
You can install this plugin using the Logstash "plugin" command ("logstash-plugin" for newer versions of Logstash):
```sh
logstash-plugin install logstash-input-azureblob
```

For more information, see the Logstash reference [Working with plugins](https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html).

## Configuration
### Required Parameters
__*storage_account_name*__

The Azure storage account name.

__*storage_access_key*__

The access key to the storage account.

__*container*__

The blob container name.

### Optional Parameters
__*codec*__

The codec used to decode the blob. By default, *json_lines* is selected. For plain log files, use *line* or another existing codec.

* **Default value:** *json_lines*

__*sleep_time*__

The sleep time before scanning for new data.

* **Default value:** *10* seconds
* **Note:** Does not seem to be implemented.

__*sincedb*__

The Azure table name used to keep track of what has been processed, similar to the sincedb of the file plugin. This defines the table name that will be used during processing. **IMPORTANT:** This will __not__ take into account any *.lock* files. It will also __not__ create any *.lock* files.

* **Default value:** No default value. If a value is defined, the plugin creates the *sincedb* table in the storage account.
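As an illustration, here is a minimal sketch of the tracking entity this table holds per blob. The entity shape matches what the plugin writes (see `process` in the plugin source below); the account name, key, container, and blob name are placeholders.

```ruby
require "azure"
require "base64"

# Placeholder credentials -- same settings the plugin configures.
Azure.configure do |config|
  config.storage_account_name = "mystorageaccount"
  config.storage_access_key   = "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
  config.storage_table_host   = "https://mystorageaccount.table.core.windows.net"
end

azure_table = Azure::Table::TableService.new
begin
  azure_table.create_table("sincedb")
rescue
  # The table already exists.
end

# One entity per blob: RowKey is the Base64-encoded blob name, and
# ByteOffset tracks how far into the blob the plugin has read.
entity = {
  "PartitionKey" => "mycontainer", # the watched container
  "RowKey"       => Base64.strict_encode64("path/to/blob.log"),
  "ByteOffset"   => 0,
  "ETag"         => nil,
  "BlobName"     => "path/to/blob.log"
}
azure_table.insert_or_merge_entity("sincedb", entity)
```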
__*ignore_older*__

When the plugin discovers a blob that was last modified before the
specified timespan in seconds, the blob is ignored. After its discovery,
if an ignored blob is modified, it is no longer ignored and any new data
is read. The default is 24 hours.

* **Default value:** *86400* (24 hours)

__*start_position*__

Choose where Logstash starts initially reading a blob: at the beginning or
at the end. The default behavior treats blobs like live streams and thus
starts at the end. If you have old data you want to import, set this
to 'beginning'.

This option only modifies *"first contact"* situations where a blob
is new and not seen before, **i.e.** blobs that don't have a current
position recorded in a sincedb read by Logstash. If a blob
has already been seen before, this option has no effect and the
position recorded in the sincedb will be used.

* **Possible values:** [beginning | end]
* **Dependency:** *sincedb* needs to be activated.
* **Default value:** *end*

__*path_prefix*__

Array of blob "path" prefixes. It defines the path prefixes to watch in the
blob container. Paths are defined by the blob name (e.g.: ["path/to/blob.log"]).
Regexes cannot be used here: filtering is done by prefix through the Azure blob API.

I recommend using path prefixes to speed up processing. *For example, a
Web App on Azure with IIS logging enabled creates one folder per hour. If
you keep the logs for a long retention period, the plugin will list them all
before keeping only the most recently modified ones.*

* **Default value:** *[""]*
***

### Example 1: Basic (out of the box)
Read from a blob (any type) and send it to Elasticsearch.
```
input
{
    azureblob
    {
        storage_account_name => "mystorageaccount"
        storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
        container => "mycontainer"
    }
}
output
{
    elasticsearch {
        hosts => "localhost"
        index => "logstash-azureblob-%{+YYYY-MM-dd}"
    }
}
```

#### What will it do
It will get the blob, create an empty lock file (512 bytes), and read the entire blob **only once**. Each iteration of the plugin will take a new file, create a new lock file for it, and push the original file to Elasticsearch. If any modifications are made to the blob afterwards, the new data will not be pushed to Elasticsearch. (*sincedb is not used in this situation.*)
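For reference, here is a minimal sketch of that lock convention, mirroring the plugin's `acquire_lock` method (shown in the plugin source below); the account name, key, container, and blob name are placeholders.

```ruby
require "azure"
require "securerandom"

# Placeholder credentials -- use your own account.
Azure.configure do |config|
  config.storage_account_name = "mystorageaccount"
  config.storage_access_key   = "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
end

azure_blob = Azure::Blob::BlobService.new

# The plugin marks "path/to/blob.log" as taken by creating an empty
# 512-byte page blob named "path/to/blob.log.lock" and leasing it
# for 60 seconds.
lock_name = "path/to/blob.log" + ".lock"
azure_blob.create_page_blob("mycontainer", lock_name, 512)
azure_blob.acquire_lease("mycontainer", lock_name,
                         { :duration => 60, :timeout => 10,
                           :proposed_lease_id => SecureRandom.uuid })
```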
### Example 2: Advanced (using sincedb and some other features)
Read from a blob, send it to Elasticsearch, and keep track of where we are in the file.

```
input
{
    azureblob
    {
        storage_account_name => "mystorageaccount"
        storage_access_key => "VGhpcyBpcyBhIGZha2Uga2V5Lg=="
        container => "mycontainer"
        codec => "line"         # Optional override => use line instead of json_lines
        sincedb => "sincedb"    # Optional => activate the sincedb in an Azure table
        sleep_time => 60        # Optional override => Azure IIS blobs are updated every minute
        ignore_older => 7200    # Optional override => 2 hours instead of 24 hours
        path_prefix => ["ServerA/iis/2016/06/30/", "ServerA/log4net/"]
        start_position => "end" # Optional override => first contact sets the sincedb to the end
    }
}
output
{
    elasticsearch {
        hosts => "localhost"
        index => "logstash-azureblob-%{+YYYY-MM-dd}"
    }
}
```

## More information
The source code of this plugin is hosted in the GitHub repo [Microsoft Azure Diagnostics with ELK](https://github.com/Azure/azure-diagnostics-tools). We welcome you to provide feedback and/or contribute to the project.
data/lib/logstash/inputs/azureblob.rb ADDED
# encoding: utf-8
require "logstash/inputs/base"
require "logstash/namespace"

require "azure"
require "base64"
require "securerandom"
require "set"

# Reads events from Azure Blobs
class LogStash::Inputs::Azureblob < LogStash::Inputs::Base
  # Define the plugin name
  config_name "azureblob"

  # Codec
  # *Possible values available at https://www.elastic.co/guide/en/logstash/current/codec-plugins.html
  # *Most used: json_lines, line, etc.
  default :codec, "json_lines"

  # storage_account_name
  # *Define the Azure Storage Account Name
  config :storage_account_name, :validate => :string, :required => true

  # storage_access_key
  # *Define the Azure Storage Access Key (available through the portal)
  config :storage_access_key, :validate => :string, :required => true

  # container
  # *Define the container to watch
  config :container, :validate => :string, :required => true

  # sleep_time
  # *Define the sleep time between scans for new data
  config :sleep_time, :validate => :number, :default => 10, :required => false

  # [New]
  # path_prefix
  # *Define the path prefixes in the container in order to not process everything
  config :path_prefix, :validate => :array, :default => [""], :required => false

  # sincedb
  # *Define the Azure Storage Table where we keep track of the blobs we're collecting.
  # *Important! The sincedb is scoped to the container we're watching.
  # *By default, the sincedb is not used, but I recommend it if files get updated.
  config :sincedb, :validate => :string, :required => false

  # ignore_older
  # When the plugin discovers a blob that was last modified
  # before the specified timespan in seconds, the blob is ignored.
  # After its discovery, if an ignored blob is modified it is no
  # longer ignored and any new data is read. The default is 24 hours.
  config :ignore_older, :validate => :number, :default => 24 * 60 * 60, :required => false

  # Choose where Logstash starts initially reading a blob: at the beginning or
  # at the end. The default behavior treats blobs like live streams and thus
  # starts at the end. If you have old data you want to import, set this
  # to 'beginning'.
  #
  # This option only modifies "first contact" situations where a blob
  # is new and not seen before, i.e. blobs that don't have a current
  # position recorded in a sincedb read by Logstash. If a blob
  # has already been seen before, this option has no effect and the
  # position recorded in the sincedb will be used.
  config :start_position, :validate => [ "beginning", "end" ], :default => "end", :required => false

  # Initialize the plugin
  def initialize(*args)
    super(*args)
  end # def initialize

  public
  def register
    Azure.configure do |config|
      config.storage_account_name = @storage_account_name
      config.storage_access_key = @storage_access_key
      config.storage_table_host = "https://#{@storage_account_name}.table.core.windows.net"
    end
    @azure_blob = Azure::Blob::BlobService.new

    if (@sincedb)
      @azure_table = Azure::Table::TableService.new
      init_wad_table
    end
  end # def register

  # Initialize the WAD table if we have a sincedb defined.
  def init_wad_table
    if (@sincedb)
      begin
        @azure_table.create_table(@sincedb) # Be sure that the table is properly named.
      rescue
        @logger.info("#{DateTime.now} Table #{@sincedb} already exists.")
      end
    end
  end # def init_wad_table

  # List the blob names in the container. If we have any path prefix defined, it will filter
  # the blob names from the list. The status of the blobs will be persisted in the WAD table.
  #
  # Returns the list of blobs to read from.
  def list_blobs
    blobs = Hash.new
    now_time = DateTime.now.new_offset(0)

    @logger.info("#{DateTime.now} Looking for blobs in #{path_prefix.length} paths (#{path_prefix.to_s})...")

    path_prefix.each do |prefix|
      continuation_token = nil
      loop do
        entries = @azure_blob.list_blobs(@container, { :timeout => 10, :marker => continuation_token, :prefix => prefix })
        entries.each do |entry|
          entry_last_modified = DateTime.parse(entry.properties[:last_modified]) # Normally in GMT 0
          elapsed_seconds = ((now_time - entry_last_modified) * 24 * 60 * 60).to_i
          if (elapsed_seconds <= @ignore_older)
            blobs[entry.name] = entry
          end
        end
        continuation_token = entries.continuation_token
        break if continuation_token.empty?
      end
    end

    @logger.info("#{DateTime.now} Finished looking for blobs. #{blobs.length} are queued as possible candidates with new data")

    return blobs
  end # def list_blobs

  # Acquire the lock on the blob. Default duration is 60 seconds with a timeout of 10 seconds.
  # *blob_name: Blob name to treat
  # Returns true if acquiring the lock works
  def acquire_lock(blob_name)
    @azure_blob.create_page_blob(@container, blob_name, 512)
    @azure_blob.acquire_lease(@container, blob_name, { :duration => 60, :timeout => 10, :proposed_lease_id => SecureRandom.uuid })

    return true

  # Shutdown signal for graceful shutdown in Logstash
  rescue LogStash::ShutdownSignal => e
    raise e
  rescue => e
    @logger.error("#{DateTime.now} Caught exception while locking", :exception => e)
    return false
  end # def acquire_lock

  # Do the official lock on the blob
  # *blobs: Hash of blobs to treat
  def lock_blob(blobs)
    # Take all the blobs without a lock file.
    real_blobs = blobs.select { |name, v| !name.end_with?(".lock") }

    # Return the first one not marked as locked + lock it.
    real_blobs.each do |blob_name, blob|
      if !blobs.keys.include?(blob_name + ".lock")
        if acquire_lock(blob_name + ".lock")
          return blob
        end
      end
    end

    return nil
  end # def lock_blob

  def list_sinceDbContainerEntities
    entities = Set.new

    #loop do
    continuation_token = nil

    entries = @azure_table.query_entities(@sincedb, { :filter => "PartitionKey eq '#{container}'", :continuation_token => continuation_token })
    entries.each do |entry|
      entities << entry
    end
    #continuation_token = entries.continuation_token
    #break if continuation_token.empty?
    #end

    return entities
  end # def list_sinceDbContainerEntities

  # Process the plugin and start watching.
  def process(output_queue)
    blobs = list_blobs

    # Use the Azure table in order to set the :start_range and :end_range.
    # When we do that, we shouldn't use the lock strategy, since we know where we are at. It would still be interesting in a multi-thread
    # environment.
    if (@sincedb)
      existing_entities = list_sinceDbContainerEntities

      blobs.each do |blob_name, blob_info|
        blob_name_encoded = Base64.strict_encode64(blob_info.name)
        entity_index = existing_entities.find_index { |entity| entity.properties["RowKey"] == blob_name_encoded }

        entity = {
          "PartitionKey" => @container,
          "RowKey" => blob_name_encoded,
          "ByteOffset" => 0, # First contact, start_position is beginning by default
          "ETag" => nil,
          "BlobName" => blob_info.name
        }

        if (entity_index)
          # Exists in the table
          found_entity = existing_entities.to_a[entity_index]
          entity["ByteOffset"] = found_entity.properties["ByteOffset"]
          entity["ETag"] = found_entity.properties["ETag"]
        elsif (@start_position == "end")
          # First contact
          entity["ByteOffset"] = blob_info.properties[:content_length]
        end

        if (entity["ETag"] == blob_info.properties[:etag])
          # Nothing to do...
          # @logger.info("#{DateTime.now} Blob already up to date #{blob_info.name}")
        else
          @logger.info("#{DateTime.now} Processing #{blob_info.name}")
          blob, content = @azure_blob.get_blob(@container, blob_info.name, { :start_range => entity["ByteOffset"], :end_range => blob_info.properties[:content_length] })

          @codec.decode(content) do |event|
            decorate(event) # We could also add the host name that read the blob to the event from here.
            # event["host"] = hostname...
            output_queue << event
          end

          # Update the entity with the latest information we used while processing the blob. If we have a crash,
          # we will re-process the last batch.
          entity["ByteOffset"] = blob_info.properties[:content_length]
          entity["ETag"] = blob_info.properties[:etag]
          @azure_table.insert_or_merge_entity(@sincedb, entity)
        end
      end
    else
      # Process the ones not yet processed. (The newly locked blob)
      blob_info = lock_blob(blobs)

      # Do what we were doing before
      return if !blob_info
      @logger.info("#{DateTime.now} Processing #{blob_info.name}")

      blob, content = @azure_blob.get_blob(@container, blob_info.name)
      @codec.decode(content) do |event|
        decorate(event) # We could also add the host name that read the blob to the event from here.
        # event["host"] = hostname...
        output_queue << event
      end
    end

  # Shutdown signal for graceful shutdown in Logstash
  rescue LogStash::ShutdownSignal => e
    raise e
  rescue => e
    @logger.error("#{DateTime.now} Oh My, An error occurred.", :exception => e)
  end # def process

  # Run the plugin (called directly by Logstash)
  public
  def run(output_queue)
    # Infinite processing loop.
    while !stop?
      process(output_queue)
      sleep sleep_time
    end # loop
  end # def run

  public
  def teardown
    # Nothing to do.
    @logger.info("Teardown")
  end # def teardown
end # class LogStash::Inputs::Azureblob
data/logstash-input-azureblob.gemspec ADDED
Gem::Specification.new do |s|
  s.name = 'Nordes-logstash-input-azureblob'
  s.version = '0.9.5-1'
  s.licenses = ['Apache License (2.0)']
  s.summary = "This plugin collects Microsoft Azure Diagnostics data from Azure Storage Blobs."
  s.description = "This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs."
  s.authors = ["Microsoft Corporation"]
  s.email = 'azdiag@microsoft.com'
  s.homepage = "https://github.com/Azure/azure-diagnostics-tools"
  s.require_paths = ["lib"]

  # Files
  s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','Gemfile','LICENSE']
  # Tests
  s.test_files = s.files.grep(%r{^(test|spec|features)/})

  # Special flag to let us know this is actually a logstash plugin
  s.metadata = { "logstash_plugin" => "true", "logstash_group" => "input" }

  # Gem dependencies
  s.add_runtime_dependency "logstash-core-plugin-api", ">= 1.60", "<= 2.99"
  s.add_runtime_dependency 'azure', '~> 0.7.1'
  s.add_development_dependency 'logstash-devutils'
end
data/spec/inputs/azureblob_spec.rb ADDED
require "logstash/devutils/rspec/spec_helper"
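The shipped spec file contains only the helper require. A minimal smoke-test sketch one could build on (illustrative only, not part of the gem; the credentials are placeholders) might look like:

```ruby
require "logstash/devutils/rspec/spec_helper"
require "logstash/inputs/azureblob"

describe LogStash::Inputs::Azureblob do
  let(:config) do
    {
      "storage_account_name" => "mystorageaccount",
      "storage_access_key"   => "VGhpcyBpcyBhIGZha2Uga2V5Lg==",
      "container"            => "mycontainer"
    }
  end

  # Only instantiates the plugin; registering would require a real account.
  it "can be instantiated from a config hash" do
    expect { described_class.new(config) }.not_to raise_error
  end
end
```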
metadata ADDED
--- !ruby/object:Gem::Specification
name: Nordes-logstash-input-azureblob
version: !ruby/object:Gem::Version
  version: 0.9.5.pre.1
platform: ruby
authors:
- Microsoft Corporation
autorequire:
bindir: bin
cert_chain: []
date: 2017-05-15 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '1.60'
    - - "<="
      - !ruby/object:Gem::Version
        version: '2.99'
  name: logstash-core-plugin-api
  prerelease: false
  type: :runtime
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '1.60'
    - - "<="
      - !ruby/object:Gem::Version
        version: '2.99'
- !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 0.7.1
  name: azure
  prerelease: false
  type: :runtime
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 0.7.1
- !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  name: logstash-devutils
  prerelease: false
  type: :development
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
description: This gem is a Logstash plugin. It reads and parses data from Azure Storage Blobs.
email: azdiag@microsoft.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- CHANGELOG.md
- Gemfile
- LICENSE
- README.md
- lib/logstash/inputs/azureblob.rb
- logstash-input-azureblob.gemspec
- spec/inputs/azureblob_spec.rb
homepage: https://github.com/Azure/azure-diagnostics-tools
licenses:
- Apache License (2.0)
metadata:
  logstash_plugin: 'true'
  logstash_group: input
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">"
    - !ruby/object:Gem::Version
      version: 1.3.1
requirements: []
rubyforge_project:
rubygems_version: 2.4.8
signing_key:
specification_version: 4
summary: This plugin collects Microsoft Azure Diagnostics data from Azure Storage Blobs.
test_files:
- spec/inputs/azureblob_spec.rb