logstash-input-azure_blob_storage 0.11.3 → 0.11.4
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -1
- data/README.md +47 -9
- data/lib/logstash/inputs/azure_blob_storage.rb +112 -52
- data/logstash-input-azure_blob_storage.gemspec +2 -2
- metadata +9 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 158d9ef3b7997fb3ec67f4e2278861ae367c3e4a73f362dc56f145482d802e34
+  data.tar.gz: 89f5b1bc848a97cbf31b1323aa64d021d86a05292d3d7d006994ad170666a37d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 80f12e364ba3fd81375d2b88d24567d92ec83decac371552e3a814194f6dcae2f1c6991ac87f50e0012a8cb177f67da92790d40a71af953b211e5043a1691170
+  data.tar.gz: 0e54b9c0b9f63737ef8046d362c47f1c20f2d9f702db0311993def976f1a40c14534c7fae9a7a90e098ce4b3bdd18d00517f420e9cc6c4b7810f3709aee797e1
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,13 @@
+## 0.11.4
+- fixed listing 3 times, rather than retrying to list max 3 times
+- added log entries for better tracing of which phase the application is in and how long it takes
+- removed the pipeline name from logfiles, logstash 7.6 and up have this in the log4j2 output by default now
+- moved initialization from register to run, should make logs more readable
+
+## 0.11.3
+- don't crash on a failed codec, e.g. gzip_lines could sometimes have a corrupted file?
+- fix the nextmarker loop so that more than 5000 files (or 15000 if faraday doesn't crash) can be listed
+
 ## 0.11.2
 - implemented path_filters to use path filtering like this **/*.log
 - implemented debug_until to debug only at the start of a pipeline until it has processed enough messages
@@ -10,7 +20,7 @@
 ## 0.11.0
 - implemented start_fresh to skip all previous logs and start monitoring new entries
 - fixed the timer, now properly sleeps the interval and checks again
--
+- work around for a Faraday Middleware v.s. Azure Storage Account bug in follow_redirect
 
 ## 0.10.6
 - fixed the rootcause of the codec check. Now compares the classname.
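The first 0.11.4 entry refers to the listing retry: the old code listed up to 3 times unconditionally, while the new code (see the list_blobs hunk in azure_blob_storage.rb below) attempts once and retries only on failure. A minimal runnable sketch of that begin/rescue/retry idiom, with a hypothetical fetch_listing standing in for the real Azure listing call:

```
# Minimal sketch of the begin/rescue/retry idiom behind the 0.11.4 fix.
# fetch_listing is a hypothetical stand-in for the real blob listing call.
$attempts = 0
def fetch_listing
  $attempts += 1
  raise "transient listing error" if $attempts < 3  # fail twice, succeed on the third try
  ["blob1", "blob2"]
end

def list_with_retries
  tries ||= 3
  begin
    fetch_listing
  rescue StandardError => e
    puts "caught: #{e.message}"
    retry if (tries -= 1) > 0   # decrement first, so at most 2 retries follow the initial attempt
  end
end

p list_with_retries  # => ["blob1", "blob2"] after two retried failures
```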
data/README.md
CHANGED
@@ -6,7 +6,7 @@ It is fully free and fully open source. The license is Apache 2.0, meaning you a
 
 ## Documentation
 
-All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+All logstash plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
 
 ## Need Help?
 
@@ -15,15 +15,57 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
 ## Purpose
 This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or accesslogs from App Services.
 [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
-
-After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON partial files will be have the header and tail will be added. They can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format. use source => message in the filter {} block.
-
+This
 ## Installation
 This plugin can be installed through logstash-plugin
 ```
 logstash-plugin install logstash-input-azure_blob_storage
 ```
 
+## Minimal Configuration
+The minimum configuration required as input is storageaccount, access_key and container.
+
+```
+input {
+    azure_blob_storage {
+        storageaccount => "yourstorageaccountname"
+        access_key => "Ba5e64c0d3=="
+        container => "insights-logs-networksecuritygroupflowevent"
+    }
+}
+```
+
+## Additional Configuration
+The registry_create_policy is used when the pipeline is started to either resume from the last known unprocessed file, or to start_fresh ignoring old files, or to start_over to process all the files from the beginning.
+
+interval defines the minimum time the registry should be saved to the registry file (by default 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+
+During the pipeline start the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually.
+
+## Running the pipeline
+The pipeline can be started in several ways.
+- On the commandline
+```
+/usr/share/logstash/bin/logstash -f /etc/logstash/pipeline.d/test.yml
+```
+- In the pipeline.yml
+```
+/etc/logstash/pipeline.yml
+pipe.id = test
+pipe.path = /etc/logstash/pipeline.d/test.yml
+```
+- As managed pipeline from Kibana
+
+Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To update a config file on a running instance started on the commandline you can add the argument --config.reload.automatic; if you modify the files that are in the pipeline.yml you can send a SIGHUP signal to reload the pipelines where the config was changed.
+[https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
+
+## Internal Working
+When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories of files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, partial files will have the header and tail added; both can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
+
+By default the root of the json message is named "message", so you can modify the content in the filter block.
+
+The configurations and the rest of the code are in [lib/logstash/inputs](https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs) and [azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10).
+
 ## Enabling NSG Flowlogs
 1. Enable Network Watcher in your regions
 2. Create Storage account per region
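The additional options described above can be combined with the minimal configuration; a sketch follows, with illustrative values and option names as defined in the plugin source further down. The filter block shows the source => message handling mentioned under Internal Working, using the standard logstash json filter:

```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        registry_create_policy => "resume"
        interval => 60
        path_filters => ['**/*.json']
    }
}
filter {
    json {
        source => "message"
    }
}
```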
@@ -39,7 +81,6 @@ logstash-plugin install logstash-input-azure_blob_storage
 - Access key (key1 or key2)
 
 ## Troubleshooting
-
 The default loglevel can be changed in global logstash.yml. On the info level, the plugin saves offsets to the registry every interval and logs statistics of processed events; the plugin will print for each pipeline the first 6 characters of the ID. The log level debug shows details of the number of events per (partial) file that are read.
 ```
 log.level
@@ -51,9 +92,7 @@ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/j
 ```
 
 
-## Configuration Examples
-The minimum configuration required as input is storageaccount, access_key and container.
-
+## Other Configuration Examples
 For nsgflowlogs, a simple configuration looks like this
 ```
 input {
@@ -85,7 +124,6 @@ output {
 }
 ```
 
-It's possible to specify the optional parameters to overwrite the defaults. The iplookup, use_redis and iplist parameters are used for additional information about the source and destination ip address. Redis can be used for caching the results and iplist is to configure an array of ip addresses.
 ```
 input {
     azure_blob_storage {

data/lib/logstash/inputs/azure_blob_storage.rb
CHANGED
@@ -39,6 +39,9 @@ config :container, :validate => :string, :default => 'insights-logs-networksecur
 # The default, `data/registry`, contains a Ruby Marshal Serialized Hash of the filename, the offset read so far and the filelength the last time a filelisting was done.
 config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'
 
+# If registry_local_path is set to a directory on the local server, the registry is saved there instead of the remote blob_storage
+config :registry_local_path, :validate => :string, :required => false
+
 # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
 # When set to `start_over`, all log files are processed from the beginning.
 # When set to `start_fresh`, it will read log files that are created or appended since the start of the pipeline.
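Because the registry is a Marshal-dumped Hash of filename => { :offset, :length }, a registry written to registry_local_path can be inspected with a few lines of Ruby; a minimal sketch (the path is illustrative — per the register code below, the real filename is the directory plus the pipeline id):

```
# Sketch: inspect a local registry file written by this plugin.
# The file is a Marshal dump of { filename => { :offset, :length } }.
path = '/var/lib/logstash/registry/main'   # illustrative: registry_local_path + "/" + pipeline id
registry = Marshal.load(File.read(path))
registry.each do |name, file|
  puts "#{name}: #{file[:offset]} of #{file[:length]} bytes processed"
end
```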
@@ -58,6 +61,9 @@ config :interval, :validate => :number, :default => 60
 # debug_until will, for a maximum amount of processed messages, show 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
 config :debug_until, :validate => :number, :default => 0, :required => false
 
+# debug_timer shows the time spent on activities
+config :debug_timer, :validate => :boolean, :default => false, :required => false
+
 # WAD IIS Grok Pattern
 #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
 
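Both debug options are set from the pipeline configuration like any other option; a short sketch (values illustrative):

```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        debug_until => 100   # log the numbered printouts for the first 100 processed messages
        debug_timer => true  # log how long the listing and saving phases take
    }
}
```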
@@ -90,12 +96,15 @@ config :path_filters, :validate => :array, :default => ['**/*'], :required => fa
 
 public
 def register
     @pipe_id = Thread.current[:name].split("[").last.split("]").first
-    @logger.info("=== "+config_name
-    #@logger.info("ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } / #{Gem.loaded_specs[config_name].version.to_s}")
+    @logger.info("=== #{config_name} #{Gem.loaded_specs["logstash-input-"+config_name].version.to_s} / #{@pipe_id} / #{@id[0,6]} / ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } ===")
     @logger.info("If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage")
     # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
     # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
+end
+
 
+
+def run(queue)
     # counter for all processed events since the start of this pipeline
     @processed = 0
     @regsaved = @processed
@@ -127,22 +136,38 @@ def register
 
     @registry = Hash.new
     if registry_create_policy == "resume"
-        @logger.info(@pipe_id+" resuming from registry")
         for counter in 1..3
             begin
-
-
+                if (!@registry_local_path.nil?)
+                    unless File.file?(@registry_local_path+"/"+@pipe_id)
+                        @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+                        #[0] headers [1] responsebody
+                        @logger.info("migrating from remote registry #{registry_path}")
+                    else
+                        if !Dir.exist?(@registry_local_path)
+                            FileUtils.mkdir_p(@registry_local_path)
+                        end
+                        @registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
+                        @logger.info("resuming from local registry #{registry_local_path+"/"+@pipe_id}")
+                    end
+                else
+                    @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+                    #[0] headers [1] responsebody
+                    @logger.info("resuming from remote registry #{registry_path}")
+                end
+                break
             rescue Exception => e
-                @logger.error(
+                @logger.error("caught: #{e.message}")
                 @registry.clear
-                @logger.error(
+                @logger.error("loading registry failed for attempt #{counter} of 3")
             end
         end
     end
     # read filelist and set offsets to file length to mark all the old files as done
     if registry_create_policy == "start_fresh"
-        @logger.info(@pipe_id+" starting fresh")
         @registry = list_blobs(true)
+        save_registry(@registry)
+        @logger.info("starting fresh, overwriting the registry to contain #{@registry.size} blobs/files")
     end
 
     @is_json = false
@@ -162,27 +187,32 @@ def register
     if file_tail
         @tail = file_tail
     end
-    @logger.info(
+    @logger.info("head will be: #{@head} and tail is set to #{@tail}")
 end
-end # def register
-
-
 
-def run(queue)
     newreg = Hash.new
     filelist = Hash.new
     worklist = Hash.new
-
+    @last = start = Time.now.to_i
+
+    # This is the main loop, it
+    # 1. Lists all the files in the remote storage account that match the path prefix
+    # 2. Filters on path_filters to only include files that match the directory and file glob (**/*.json)
+    # 3. Saves the listed files in a registry of known files and filesizes.
+    # 4. Lists all the files again, compares the registry with the new filelist and puts the delta in a worklist
+    # 5. Processes the worklist and puts all events in the logstash queue.
+    # 6. If there is time left, sleeps to complete the interval. If processing takes more than an interval, saves the registry and continues.
+    # 7. If the stop signal comes, finishes the current file, saves the registry and quits
     while !stop?
-        chrono = Time.now.to_i
         # load the registry, compare its offsets to the file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for the next loop
         # TODO: sort by timestamp ?
         #filelist.sort_by(|k,v|resource(k)[:date])
         worklist.clear
         filelist.clear
         newreg.clear
+
+        # Listing all the files
         filelist = list_blobs(false)
-        # registry.merge(filelist) {|key, :offset, :length| :offset.merge :length }
         filelist.each do |name, file|
             off = 0
             begin
@@ -193,13 +223,15 @@ def run(queue)
             newreg.store(name, { :offset => off, :length => file[:length] })
             if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
         end
-
         # Worklist is the subset of files where the already read offset is smaller than the file size
         worklist.clear
         worklist = newreg.select {|name,file| file[:offset] < file[:length]}
-
+        if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end
+
+        # Start of processing
+        # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
         worklist.each do |name, file|
-
+            start = Time.now.to_i
             if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
             size = 0
             if file[:offset] == 0
@@ -207,16 +239,16 @@
                 size = chunk.size
             else
                 chunk = partial_read_json(name, file[:offset], file[:length])
-                @logger.
+                @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
             end
             if logtype == "nsgflowlog" && @is_json
                 res = resource(name)
                 begin
                     fingjson = JSON.parse(chunk)
                     @processed += nsgflowlog(queue, fingjson)
-                    @logger.debug(
+                    @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
                 rescue JSON::ParserError
-                    @logger.error(
+                    @logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
                 end
             # TODO: Convert this to line based grokking.
             # TODO: ECS Compliance?
@@ -231,29 +263,32 @@
                         queue << event
                     end
                 rescue Exception => e
-                    @logger.error(
-                    @logger.debug(
+                    @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+                    @logger.debug("#{chunk}")
                 end
                 @processed += counter
             end
             @registry.store(name, { :offset => size, :length => file[:length] })
             # TODO add input plugin option to prevent connection cache
             @blob_client.client.reset_agents!
-            #@logger.info(
+            #@logger.info("name #{name} size #{size} len #{file[:length]}")
             # if stop? good moment to stop what we're doing
             if stop?
                 return
             end
-
-            now = Time.now.to_i
-            if ((now - chrono) > interval)
+            if ((Time.now.to_i - @last) > @interval)
                 save_registry(@registry)
-                chrono += interval
             end
         end
-
-
-
+        # The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
+        now = Time.now.to_i
+        if ((now - @last) > @interval)
+            save_registry(@registry)
+        end
+        sleeptime = interval - ((now - start) % interval)
+        if @debug_timer
+            @logger.info("going to sleep for #{sleeptime} seconds")
+        end
         Stud.stoppable_sleep(sleeptime) { stop? }
     end
 end
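The sleeptime formula in the hunk above aligns wakeups to interval boundaries even when a pass over the worklist takes longer than one interval; a worked example in plain Ruby:

```
# Worked example of: sleeptime = interval - ((now - start) % interval)
interval = 60
puts interval - (14 % interval)    # fast pass took 14s  => sleep 46s to finish the interval
puts interval - (140 % interval)   # slow pass took 140s => sleep 40s, landing on the next boundary
```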
@@ -338,51 +373,76 @@ end
 # list all blobs in the blobstore, set the offsets from the registry and return the filelist
 # inspired by: https://github.com/Azure-Samples/storage-blobs-ruby-quickstart/blob/master/example.rb
 def list_blobs(fill)
-
-
-
-
-
+    tries ||= 3
+    begin
+        return try_list_blobs(fill)
+    rescue Exception => e
+        @logger.error("caught: #{e.message} for list_blobs retries left #{tries}")
+        if (tries -= 1) > 0
+            retry
+        end
+    end
+end
+
+def try_list_blobs(fill)
+    # inspired by: http://blog.mirthlab.com/2012/05/25/cleanly-retrying-blocks-of-code-after-an-exception-in-ruby/
+    chrono = Time.now.to_i
+    files = Hash.new
+    nextMarker = nil
+    counter = 1
+    loop do
         blobs = @blob_client.list_blobs(container, { marker: nextMarker, prefix: @prefix})
         blobs.each do |blob|
             # FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
             # FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
             unless blob.name == registry_path
-
+                if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
                     length = blob.properties[:content_length].to_i
                     offset = 0
                     if fill
                         offset = length
                     end
                     files.store(blob.name, { :offset => offset, :length => length })
-
+                    if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
                 end
             end
         end
         nextMarker = blobs.continuation_token
         break unless nextMarker && !nextMarker.empty?
+        if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
+        counter += 1
+    end
+    if @debug_timer
+        @logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
     end
-    rescue Exception => e
-        @logger.error(@pipe_id+" caught: #{e.message} for attempt #{counter} of 3")
-        counter += 1
-    end
-end
     return files
 end
 
 # When events were processed after the last registry save, start a thread to update the registry file.
 def save_registry(filelist)
-    #
+    # Because of threading, processed values and regsaved are not thread safe, they can change as instance variable @! Most of the time this is fine because the registry is the last resort, but be careful about corner cases!
     unless @processed == @regsaved
         @regsaved = @processed
-
-
+        unless (@busy_writing_registry)
+            Thread.new {
                 begin
-                    @
+                    @busy_writing_registry = true
+                    unless (@registry_local_path)
+                        @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+                        @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
+                    else
+                        File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
+                        @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_local_path+"/"+@pipe_id}")
+                    end
+                    @busy_writing_registry = false
+                    @last = Time.now.to_i
                 rescue
-                    @logger.error(
+                    @logger.error("Oh my, registry write failed, do you have write access?")
                 end
             }
+        else
+            @logger.info("Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!")
+        end
     end
 end
 
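The FNM_PATHNAME and FNM_EXTGLOB flags used in try_list_blobs above change how the path_filters globs match; a small plain-Ruby demonstration:

```
# How the fnmatch flags in try_list_blobs behave
flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
p File.fnmatch?('**/*.log', 'dir/sub/file.log', flags)  # true
p File.fnmatch?('**/test', 'test', flags)               # true, FNM_PATHNAME lets **/ match zero directories
p File.fnmatch?('test{a,b,c}', 'testb', flags)          # true, FNM_EXTGLOB enables {a,b,c} alternatives
```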
@@ -394,13 +454,13 @@ def learn_encapsulation
     return if blob.nil?
     blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
     # TODO add check for empty blocks and log error that the header and footer can't be learned and must be set in the config
-    @logger.debug(
+    @logger.debug("using #{blob.name} to learn the json header and tail")
     @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
-    @logger.debug(
+    @logger.debug("learned header: #{@head}")
     length = blob.properties[:content_length].to_i
     offset = length - blocks.last.size
     @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
-    @logger.debug(
+    @logger.debug("learned tail: #{@tail}")
 end
 
 def resource(str)
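learn_encapsulation above reads the first and last committed block of a blob to learn the JSON head and tail; a partial read then becomes parseable again by gluing them around the chunk. A minimal sketch of that reassembly (strings illustrative, not the plugin's exact partial_read_json):

```
require 'json'
# Illustrative head/tail in the style of an NSG flow log blob
head  = '{"records":['
tail  = ']}'
chunk = '{"time":"2020-05-23T00:00:00Z","category":"NetworkSecurityGroupFlowEvent"}'
event = JSON.parse(head + chunk + tail)   # a valid JSON document again
puts event["records"].length              # => 1
```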
data/logstash-input-azure_blob_storage.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-input-azure_blob_storage'
-  s.version = '0.11.3'
+  s.version = '0.11.4'
   s.licenses = ['Apache-2.0']
   s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
   s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
   # Gem dependencies
   s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
   s.add_runtime_dependency 'stud', '~> 0.0.23'
-  s.add_runtime_dependency 'azure-storage-blob', '~> 1.
+  s.add_runtime_dependency 'azure-storage-blob', '~> 1.1'
   s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-azure_blob_storage
 version: !ruby/object:Gem::Version
-  version: 0.11.3
+  version: 0.11.4
 platform: ruby
 authors:
 - Jan Geertsma
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-
+date: 2020-05-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
   - !ruby/object:Gem::Version
     version: '2.1'
   name: logstash-core-plugin-api
-  prerelease: false
   type: :runtime
+  prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
@@ -31,8 +31,8 @@ dependencies:
   - !ruby/object:Gem::Version
     version: 0.0.23
   name: stud
-  prerelease: false
   type: :runtime
+  prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
@@ -43,15 +43,15 @@ dependencies:
   requirements:
   - - "~>"
   - !ruby/object:Gem::Version
-    version: '1.
+    version: '1.1'
   name: azure-storage-blob
-  prerelease: false
   type: :runtime
+  prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
     - !ruby/object:Gem::Version
-      version: '1.
+      version: '1.1'
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
   requirements:
@@ -62,8 +62,8 @@ dependencies:
   - !ruby/object:Gem::Version
     version: '1.0'
   name: logstash-devutils
-  prerelease: false
   type: :development
+  prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
@@ -112,8 +112,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-
-rubygems_version: 2.7.10
+rubygems_version: 3.0.6
 signing_key:
 specification_version: 4
 summary: This logstash plugin reads and parses data from Azure Storage Blobs.