logstash-input-azure_blob_storage 0.11.3 → 0.11.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 38eceb446d8dd92f8a0c86d6f7e48fb707babb0c611f437e98322272fcfea124
- data.tar.gz: ef970349f391a6809cdd91feabe88a5e2beb2c81402ad1ec0ac29ad7bdb0ed1e
+ metadata.gz: 158d9ef3b7997fb3ec67f4e2278861ae367c3e4a73f362dc56f145482d802e34
+ data.tar.gz: 89f5b1bc848a97cbf31b1323aa64d021d86a05292d3d7d006994ad170666a37d
  SHA512:
- metadata.gz: c0285b5459e65dd7b95766626f6336f6fc92a8a767b78912f9830e8b605beebe5a681f077c7b59b2cc1eb31861854e49ebf89a1372f8ac282fc5a537ad478d54
- data.tar.gz: 72a905d9d621a80333eeb06a69baa51b435b1efb66acdde4ab51ecf37c6f0aa388ec39af57e0991be1cc1121035212f5a4a9a966a798bfa2794048e8949f33f3
+ metadata.gz: 80f12e364ba3fd81375d2b88d24567d92ec83decac371552e3a814194f6dcae2f1c6991ac87f50e0012a8cb177f67da92790d40a71af953b211e5043a1691170
+ data.tar.gz: 0e54b9c0b9f63737ef8046d362c47f1c20f2d9f702db0311993def976f1a40c14534c7fae9a7a90e098ce4b3bdd18d00517f420e9cc6c4b7810f3709aee797e1
data/CHANGELOG.md CHANGED
@@ -1,3 +1,13 @@
+ ## 0.11.4
+ - fixed listing happening 3 times, rather than only retrying the listing a maximum of 3 times
+ - added log entries for better tracing of which phase the plugin is in and how long it takes
+ - removed the pipeline name from the log entries; logstash 7.6 and up include it through log4j2 by default
+ - moved initialization from register to run, which should make the logs more readable
+
+ ## 0.11.3
+ - don't crash on a failed codec, e.g. gzip_lines on a possibly corrupted file
+ - fixed the nextmarker loop so that more than 5000 files (or 15000 if faraday doesn't crash) can be listed
+
  ## 0.11.2
  - implemented path_filters to use path filtering like this **/*.log
  - implemented debug_until to debug only at the start of a pipeline until it has processed enough messages
@@ -10,7 +20,7 @@
  ## 0.11.0
  - implemented start_fresh to skip all previous logs and start monitoring new entries
  - fixed the timer, now it properly sleeps the interval and checks again
- - Work around for a Faraday Middleware v.s. Azure Storage Account bug in follow_redirect
+ - work around for a Faraday Middleware v.s. Azure Storage Account bug in follow_redirect
 
  ## 0.10.6
  - fixed the root cause of the codec check. Now the classname is compared.
data/README.md CHANGED
@@ -6,7 +6,7 @@ It is fully free and fully open source. The license is Apache 2.0, meaning you a
 
  ## Documentation
 
- All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+ All logstash plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
 
  ## Need Help?
 
@@ -15,15 +15,57 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
  ## Purpose
  This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or accesslogs from App Services.
  [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
-
- After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON partial files will be have the header and tail will be added. They can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format. use source => message in the filter {} block.
-
+ This is explained in more detail in the Internal Working section below.
  ## Installation
  This plugin can be installed through logstash-plugin
  ```
  logstash-plugin install logstash-input-azure_blob_storage
  ```
 
+ ## Minimal Configuration
+ The minimum configuration required as input is storageaccount, access_key and container.
+
+ ```
+ input {
+ azure_blob_storage {
+ storageaccount => "yourstorageaccountname"
+ access_key => "Ba5e64c0d3=="
+ container => "insights-logs-networksecuritygroupflowevent"
+ }
+ }
+ ```
+
+ ## Additional Configuration
+ The registry_create_policy is used when the pipeline is started to either resume from the last known unprocessed file, start_fresh to ignore previously processed files, or start_over to process all the files from the beginning.
+
+ interval defines the minimum time between saves of the registry to the registry file (by default 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+
+ During the pipeline start the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually.
+
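As a sketch of how these options fit together, the input block below combines them; the values are only illustrative, and registry_create_policy, interval and registry_path (with their defaults) are taken from the plugin source shown further down in this diff:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        registry_create_policy => "resume"
        registry_path => "data/registry.dat"
        interval => 60
    }
}
```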
+ ## Running the pipeline
+ The pipeline can be started in several ways.
+ - On the commandline
+ ```
+ /usr/share/logstash/bin/logstash -f /etc/logstash/pipeline.d/test.yml
+ ```
+ - In pipelines.yml
+ ```
+ /etc/logstash/pipelines.yml
+ - pipeline.id: test
+   path.config: "/etc/logstash/pipeline.d/test.yml"
+ ```
+ - As managed pipeline from Kibana
+
+ Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To have a running instance pick up changes to a config file, start logstash with the argument --config.reload.automatic; if you modify files that are referenced in pipelines.yml you can send a SIGHUP signal to reload the pipelines whose config was changed.
+ [https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
+
+ ## Internal Working
+ When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories and files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, the partial file is read from the saved offset to the filesize at the time of the file listing. If the codec is JSON, the header and tail are added to partial files; they can be configured. If logtype is nsgflowlog, the plugin will split the files into individual tuple events. The logtype wadiis may in the future be used to process grok formats to split files into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
+
+ By default the root of the json message is named "message", so you can modify the content in the filter block.
+
+ The configurations and the rest of the code are in [lib/logstash/inputs](https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs) and [azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10).
+
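For formats that the plugin does not split itself, the content of each (partial) file ends up in the "message" field, so a generic filter can take it from there. A minimal sketch (plain Logstash, not specific to this plugin) that parses that field as JSON:
```
filter {
    json {
        source => "message"
    }
}
```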
  ## Enabling NSG Flowlogs
  1. Enable Network Watcher in your regions
  2. Create Storage account per region
@@ -39,7 +81,6 @@ logstash-plugin install logstash-input-azure_blob_storage
  - Access key (key1 or key2)
 
  ## Troubleshooting
-
  The default loglevel can be changed in the global logstash.yml. On the info level, the plugin saves offsets to the registry every interval and logs statistics of processed events. For each pipeline it prints the first 6 characters of the pipeline ID; the log level debug shows details of the number of events per (partial) file that is read.
  ```
  log.level
@@ -51,9 +92,7 @@ curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/j
  ```
 
 
- ## Configuration Examples
- The minimum configuration required as input is storageaccount, access_key and container.
-
+ ## Other Configuration Examples
  For nsgflowlogs, a simple configuration looks like this
  ```
  input {
@@ -85,7 +124,6 @@ output {
  }
  ```
 
- It's possible to specify the optional parameters to overwrite the defaults. The iplookup, use_redis and iplist parameters are used for additional information about the source and destination ip address. Redis can be used for caching the results and iplist is to configure an array of ip addresses.
  ```
  input {
  azure_blob_storage {
data/lib/logstash/inputs/azure_blob_storage.rb CHANGED
@@ -39,6 +39,9 @@ config :container, :validate => :string, :default => 'insights-logs-networksecur
  # The default, `data/registry.dat`, contains a Ruby Marshal Serialized Hash of the filename, the offset read so far and the filelength the last time a filelisting was done.
  config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'
 
+ # If registry_local_path is set to a directory on the local server, the registry is saved there instead of the remote blob_storage
+ config :registry_local_path, :validate => :string, :required => false
+
  # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
  # When set to `start_over`, all log files are processed from the beginning.
  # When set to `start_fresh`, it will read log files that are created or appended since the start of the pipeline.
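As an illustration of the registry_local_path option added in this release, a pipeline could keep the registry on local disk instead of in the storage account; the directory below is only an example:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        registry_local_path => "/var/lib/logstash/plugins"
    }
}
```
With this set, the registry is written to a file named after the pipeline id in that directory, and an existing remote registry is migrated on the first start, as the register/run code below shows.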
@@ -58,6 +61,9 @@ config :interval, :validate => :number, :default => 60
  # debug_until will, for a maximum amount of processed messages, show 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
  config :debug_until, :validate => :number, :default => 0, :required => false
 
+ # debug_timer shows the time spent on activities
+ config :debug_timer, :validate => :boolean, :default => false, :required => false
+
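A sketch of combining the two debug helpers in a pipeline config; the values are arbitrary examples:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        debug_until => 100
        debug_timer => true
    }
}
```
debug_until keeps the numbered "1: list_blobs", "2: adding offsets" and "3: processing" log lines until 100 events have been processed, while debug_timer logs how long list_blobs and the sleep between intervals take, as the code further below shows.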
  # WAD IIS Grok Pattern
  #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
 
@@ -90,12 +96,15 @@ config :path_filters, :validate => :array, :default => ['**/*'], :required => fa
  public
  def register
  @pipe_id = Thread.current[:name].split("[").last.split("]").first
- @logger.info("=== "+config_name+" / "+@pipe_id+" / "+@id[0,6]+" ===")
- #@logger.info("ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } / #{Gem.loaded_specs[config_name].version.to_s}")
+ @logger.info("=== #{config_name} #{Gem.loaded_specs["logstash-input-"+config_name].version.to_s} / #{@pipe_id} / #{@id[0,6]} / ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } ===")
  @logger.info("If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage")
  # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
  # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
+ end
+
 
+
+ def run(queue)
  # counter for all processed events since the start of this pipeline
  @processed = 0
  @regsaved = @processed
@@ -127,22 +136,38 @@ def register
 
  @registry = Hash.new
  if registry_create_policy == "resume"
- @logger.info(@pipe_id+" resuming from registry")
  for counter in 1..3
  begin
- @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
- #[0] headers [1] responsebody
+ if (!@registry_local_path.nil?)
+ unless File.file?(@registry_local_path+"/"+@pipe_id)
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("migrating from remote registry #{registry_path}")
+ else
+ if !Dir.exist?(@registry_local_path)
+ FileUtils.mkdir_p(@registry_local_path)
+ end
+ @registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
+ @logger.info("resuming from local registry #{registry_local_path+"/"+@pipe_id}")
+ end
+ else
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("resuming from remote registry #{registry_path}")
+ end
+ break
  rescue Exception => e
- @logger.error(@pipe_id+" caught: #{e.message}")
+ @logger.error("caught: #{e.message}")
  @registry.clear
- @logger.error(@pipe_id+" loading registry failed for attempt #{counter} of 3")
+ @logger.error("loading registry failed for attempt #{counter} of 3")
  end
  end
  end
  # read filelist and set offsets to file length to mark all the old files as done
  if registry_create_policy == "start_fresh"
- @logger.info(@pipe_id+" starting fresh")
  @registry = list_blobs(true)
+ save_registry(@registry)
+ @logger.info("starting fresh, overwriting the registry to contain #{@registry.size} blobs/files")
  end
 
  @is_json = false
@@ -162,27 +187,32 @@ def register
  if file_tail
  @tail = file_tail
  end
- @logger.info(@pipe_id+" head will be: #{@head} and tail is set to #{@tail}")
+ @logger.info("head will be: #{@head} and tail is set to #{@tail}")
  end
- end # def register
-
-
 
- def run(queue)
  newreg = Hash.new
  filelist = Hash.new
  worklist = Hash.new
- # we can abort the loop if stop? becomes true
+ @last = start = Time.now.to_i
+
+ # This is the main loop, it
+ # 1. Lists all the files in the remote storage account that match the path prefix
+ # 2. Filters on path_filters to only include files that match the directory and file glob (**/*.json)
+ # 3. Saves the listed files in a registry of known files and filesizes.
+ # 4. Lists all the files again, compares the registry with the new filelist and puts the delta in a worklist
+ # 5. Processes the worklist and puts all events in the logstash queue.
+ # 6. If there is time left, sleeps to complete the interval. If processing takes more than an interval, saves the registry and continues.
+ # 7. If a stop signal comes, finishes the current file, saves the registry and quits
  while !stop?
- chrono = Time.now.to_i
  # load the registry, compare its offsets to the file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for next loop,
  # TODO: sort by timestamp ?
  #filelist.sort_by(|k,v|resource(k)[:date])
  worklist.clear
  filelist.clear
  newreg.clear
+
+ # Listing all the files
  filelist = list_blobs(false)
- # registry.merge(filelist) {|key, :offset, :length| :offset.merge :length }
  filelist.each do |name, file|
  off = 0
  begin
@@ -193,13 +223,15 @@ def run(queue)
  newreg.store(name, { :offset => off, :length => file[:length] })
  if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
  end
-
  # Worklist is the subset of files where the already read offset is smaller than the file size
  worklist.clear
  worklist = newreg.select {|name,file| file[:offset] < file[:length]}
- # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
+ if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end
+
+ # Start of processing
+ # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
  worklist.each do |name, file|
- #res = resource(name)
+ start = Time.now.to_i
  if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
  size = 0
  if file[:offset] == 0
@@ -207,16 +239,16 @@
  size=chunk.size
  else
  chunk = partial_read_json(name, file[:offset], file[:length])
- @logger.info(@pipe_id+" partial file #{name} from #{file[:offset]} to #{file[:length]}")
+ @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
  end
  if logtype == "nsgflowlog" && @is_json
  res = resource(name)
  begin
  fingjson = JSON.parse(chunk)
  @processed += nsgflowlog(queue, fingjson)
- @logger.debug(@pipe_id+" Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
+ @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
  rescue JSON::ParserError
- @logger.error(@pipe_id+" parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
+ @logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
  end
  # TODO: Convert this to line based grokking.
  # TODO: ECS Compliance?
@@ -231,29 +263,32 @@
  queue << event
  end
  rescue Exception => e
- @logger.error(@pipe_id+" codec exception: #{e.message} .. will continue and pretend this never happened")
- @logger.debug(@pipe_id+" #{chunk}")
+ @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+ @logger.debug("#{chunk}")
  end
  @processed += counter
  end
  @registry.store(name, { :offset => size, :length => file[:length] })
  # TODO add input plugin option to prevent connection cache
  @blob_client.client.reset_agents!
- #@logger.info(@pipe_id+" name #{name} size #{size} len #{file[:length]}")
+ #@logger.info("name #{name} size #{size} len #{file[:length]}")
  # if stop? good moment to stop what we're doing
  if stop?
  return
  end
- # save the registry past the regular intervals
- now = Time.now.to_i
- if ((now - chrono) > interval)
+ if ((Time.now.to_i - @last) > @interval)
  save_registry(@registry)
- chrono += interval
  end
  end
- # Save the registry and sleep until the remaining polling interval is over
- save_registry(@registry)
- sleeptime = interval - (Time.now.to_i - chrono)
+ # The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
+ now = Time.now.to_i
+ if ((now - @last) > @interval)
+ save_registry(@registry)
+ end
+ sleeptime = interval - ((now - start) % interval)
+ if @debug_timer
+ @logger.info("going to sleep for #{sleeptime} seconds")
+ end
  Stud.stoppable_sleep(sleeptime) { stop? }
  end
  end
@@ -338,51 +373,76 @@ end
  # list all blobs in the blobstore, set the offsets from the registry and return the filelist
  # inspired by: https://github.com/Azure-Samples/storage-blobs-ruby-quickstart/blob/master/example.rb
  def list_blobs(fill)
- files = Hash.new
- nextMarker = nil
- for counter in 1..3
- begin
- loop do
+ tries ||= 3
+ begin
+ return try_list_blobs(fill)
+ rescue Exception => e
+ @logger.error("caught: #{e.message} for list_blobs retries left #{tries}")
+ if (tries -= 1) > 0
+ retry
+ end
+ end
+ end
+
+ def try_list_blobs(fill)
+ # inspired by: http://blog.mirthlab.com/2012/05/25/cleanly-retrying-blocks-of-code-after-an-exception-in-ruby/
+ chrono = Time.now.to_i
+ files = Hash.new
+ nextMarker = nil
+ counter = 1
+ loop do
  blobs = @blob_client.list_blobs(container, { marker: nextMarker, prefix: @prefix})
  blobs.each do |blob|
  # FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
  # FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
  unless blob.name == registry_path
- if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
+ if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
  length = blob.properties[:content_length].to_i
  offset = 0
  if fill
  offset = length
  end
  files.store(blob.name, { :offset => offset, :length => length })
- if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
+ if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
  end
  end
  end
  nextMarker = blobs.continuation_token
  break unless nextMarker && !nextMarker.empty?
+ if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
+ counter+=1
+ end
+ if @debug_timer
+ @logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
  end
- rescue Exception => e
- @logger.error(@pipe_id+" caught: #{e.message} for attempt #{counter} of 3")
- counter += 1
- end
- end
  return files
  end
 
  # When events were processed after the last registry save, start a thread to update the registry file.
  def save_registry(filelist)
- # TODO because of threading, processed values and regsaved are not thread safe, they can change as instance variable @!
+ # Because of threading, processed values and regsaved are not thread safe, they can change as instance variable @! Most of the time this is fine because the registry is the last resort, but be careful about corner cases!
  unless @processed == @regsaved
  @regsaved = @processed
- @logger.info(@pipe_id+" processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
- Thread.new {
+ unless (@busy_writing_registry)
+ Thread.new {
  begin
- @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+ @busy_writing_registry = true
+ unless (@registry_local_path)
+ @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
+ else
+ File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_local_path+"/"+@pipe_id}")
+ end
+ @busy_writing_registry = false
+ @last = Time.now.to_i
  rescue
- @logger.error(@pipe_id+" Oh my, registry write failed, do you have write access?")
+ @logger.error("Oh my, registry write failed, do you have write access?")
  end
  }
+ else
+ @logger.info("Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!")
+ end
  end
  end
 
@@ -394,13 +454,13 @@ def learn_encapsulation
  return if blob.nil?
  blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
  # TODO add check for empty blocks and log error that the header and footer can't be learned and must be set in the config
- @logger.debug(@pipe_id+" using #{blob.name} to learn the json header and tail")
+ @logger.debug("using #{blob.name} to learn the json header and tail")
  @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
- @logger.debug(@pipe_id+" learned header: #{@head}")
+ @logger.debug("learned header: #{@head}")
  length = blob.properties[:content_length].to_i
  offset = length - blocks.last.size
  @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
- @logger.debug(@pipe_id+" learned tail: #{@tail}")
+ @logger.debug("learned tail: #{@tail}")
  end
 
  def resource(str)
data/logstash-input-azure_blob_storage.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.11.3'
+ s.version = '0.11.4'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
  # Gem dependencies
  s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
  s.add_runtime_dependency 'stud', '~> 0.0.23'
- s.add_runtime_dependency 'azure-storage-blob', '~> 1.0'
+ s.add_runtime_dependency 'azure-storage-blob', '~> 1.1'
  s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.11.3
+ version: 0.11.4
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-03-13 00:00:00.000000000 Z
+ date: 2020-05-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '2.1'
  name: logstash-core-plugin-api
- prerelease: false
  type: :runtime
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -31,8 +31,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: 0.0.23
  name: stud
- prerelease: false
  type: :runtime
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -43,15 +43,15 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: '1.1'
  name: azure-storage-blob
- prerelease: false
  type: :runtime
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: '1.1'
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
  requirements:
@@ -62,8 +62,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '1.0'
  name: logstash-devutils
- prerelease: false
  type: :development
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -112,8 +112,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.10
+ rubygems_version: 3.0.6
  signing_key:
  specification_version: 4
  summary: This logstash plugin reads and parses data from Azure Storage Blobs.