logstash-input-azure_blob_storage 0.12.7 → 0.12.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 6bc1a46c4c6ae533e05c83f0e7cb90715cad7390a5cedb9b6e023c46f2e620d1
- data.tar.gz: 520d7b5131a6b00b6de066a12cd93a99082c7af0bb7184df9f2bc9c8ca64babd
+ metadata.gz: 4714d163b8085f62c285af7e18cae4b0075e89ee11aa6e6f2a9e18a2fd0dde1a
+ data.tar.gz: 0ebb527c554c1b48d7c1d3cb4b17b4ecb8aaa7745dab55b6d0eaa22660722fa2
  SHA512:
- metadata.gz: 3c069008cfef9b08c4b9793b24538c9c8bdc217b64285626d3c9564a57584b237bfef90f4382e4b68366c2555b1b9a6e91d897951bbcc336b355eaefb310ce00
- data.tar.gz: ccb7ba1d556cec586872ebe1c94237b3223f484902218d3bff899993467b741519c521b9c08075f98328536cf31274cd2aa386f64458097b025bbef2841c486d
+ metadata.gz: c0696b1431363cd1e828340a54fe044eacceb6c494f8f054f0082777f9bf78512fd3a3c893cea063eedfe006f0fad973fc72ebe7af8f7f03bde290a89fb77b89
+ data.tar.gz: 41a63e6decb12a501528b349b3c88b04c083f09f645360381af0cfaa520853989a9425fe8bf5c30a69666c3c87887efe649cc622291afdea6118abf462b7af3e
data/CHANGELOG.md CHANGED
@@ -1,3 +1,8 @@
+ ## 0.12.8
+ - support append blobs (use codec json_lines and logtype raw)
+ - change the default head and tail to an empty string, unless the logtype is nsgflowlog
+ - cleanjson configuration parameter to clean the JSON stream of faulty characters and prevent parse errors
+ - catch ContainerNotFound, print an error message in the log and sleep for the interval time
 
  ## 0.12.7
  - rewrote partial_read; the occasional JSON parse errors should now be fixed by reading only committed blocks.
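The cleaning behaviour introduced in 0.12.8 can be sketched in plain Ruby. `clean_json_stream` below is a hypothetical helper, not the plugin's actual method; the idea is simply to drop characters that are invalid in the string's encoding before JSON parsing:

```ruby
require 'json'

# Drop characters that are invalid in the string's encoding, so stray
# bytes in an append blob do not break JSON parsing (a sketch of the idea).
def clean_json_stream(raw)
  raw.chars.select(&:valid_encoding?).join
end

# "\xFF" is not valid UTF-8, so JSON.parse would fail on the raw stream
dirty  = "{\"a\":1}\n\xFF{\"b\":2}\n"
clean  = clean_json_stream(dirty)
events = clean.each_line.map { |line| JSON.parse(line) }
```

Since dropping bytes is lossy, it makes sense that the plugin keeps this behind an opt-in flag.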
data/README.md CHANGED
@@ -8,6 +8,14 @@ For problems or feature requests with this specific plugin, raise a github issue
  This plugin can read from Azure Storage Blobs, for instance JSON diagnostics logs for NSG flow logs or LINE based accesslogs from App Services.
  [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
 
+ ## Alternatives
+ This plugin was inspired by the Azure diagnostics tools, but should work better for larger numbers of files. The configurations are not compatible: azureblob refers to the diagnostics tools plugin, while this plugin uses azure_blob_storage.
+ https://github.com/Azure/azure-diagnostics-tools/tree/master/Logstash/logstash-input-azureblob
+
+ There is also a Filebeat plugin that may work in the future:
+ https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-azure-blob-storage.html
+
+ ## Inner workings
  The plugin depends on the [Ruby library azure-storage-blob](https://rubygems.org/gems/azure-storage-blob/versions/1.1.0) from Microsoft, that depends on Faraday for the HTTPS connection to Azure.
 
  The plugin executes the following steps
@@ -184,6 +192,20 @@ output {
  }
  }
  ```
+
+ Another example, for json_lines on append blobs:
+ ```
+ input {
+   azure_blob_storage {
+     codec => json_lines {
+       delimiter => "\n"
+       charset => "UTF-8"
+     }
+     # the options below are optional
+     logtype => "raw"
+     append => true
+     cleanjson => true
+   }
+ }
+ ```
  The configuration documentation is in the first 100 lines of the code
  [GITHUB/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb)
 
@@ -228,5 +250,9 @@ filter {
  remove_field => ["timestamp"]
  }
  }
+
+ output {
+   stdout { codec => rubydebug }
+ }
  ```
 
@@ -26,7 +26,7 @@ require 'json'
  class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  config_name "azure_blob_storage"
 
- # If undefined, Logstash will complain, even if codec is unused. The codec for nsgflowlog is "json" and the for WADIIS and APPSERVICE is "line".
+ # If undefined, Logstash will complain, even if codec is unused. The codec for nsgflowlog is "json" ("json_lines" also works); for WADIIS and APPSERVICE it is "line".
  default :codec, "json"
 
  # logtype can be nsgflowlog, wadiis, appservice or raw. The default is raw, where files are read and added as one event. If the file grows, the next interval the file is read from the offset, so that the delta is sent as another event. In raw mode, further processing has to be done in the filter block. If the logtype is specified, this plugin will split and mutate and add individual events to the queue.
@@ -68,7 +68,7 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  # when set to `start_fresh`, it will read log files that are created or appended since this start of the pipeline.
  config :registry_create_policy, :validate => ['resume','start_over','start_fresh'], :required => false, :default => 'resume'
 
- # The interval is used to save the registry regularly, when new events have have been processed. It is also used to wait before listing the files again and substracting the registry of already processed files to determine the worklist.
+ # The interval is used to save the registry regularly when new events have been processed. It is also used to wait before listing the files again and subtracting the registry of already processed files to determine the worklist.
  # waiting time in seconds until processing the next batch. NSGFLOWLOGS append a block per minute, so use multiples of 60 seconds, 300 for 5 minutes, 600 for 10 minutes. The registry is also saved after every interval.
  # Partial reading starts from the offset and reads until the end, so the starting tag is prepended
  config :interval, :validate => :number, :default => 60
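The worklist mechanics described in the comment above can be sketched as follows. `build_worklist` is a hypothetical helper (the plugin builds its worklist inside the run loop), but it shows the core idea: a blob only needs work when its registered offset is behind its current length:

```ruby
# registry remembers how far each blob was read; the worklist is the
# current file list minus blobs whose offset already equals their length.
def build_worklist(filelist, registry)
  filelist.reject do |name, file|
    reg = registry[name]
    reg && reg[:offset] >= file[:length]
  end
end

filelist = {
  'log1.json' => { :length => 100 },
  'log2.json' => { :length => 250 }
}
registry = {
  'log1.json' => { :offset => 100, :length => 100 },  # fully processed
  'log2.json' => { :offset => 100, :length => 100 }   # grew since last pass
}
worklist = build_worklist(filelist, registry)
```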
@@ -95,10 +95,14 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
  config :skip_learning, :validate => :boolean, :default => false, :required => false
 
  # The string that starts the JSON. Only needed when the codec is JSON. When partial file are read, the result will not be valid JSON unless the start and end are put back. the file_head and file_tail are learned at startup, by reading the first file in the blob_list and taking the first and last block, this would work for blobs that are appended like nsgflowlogs. The configuration can be set to override the learning. In case learning fails and the option is not set, the default is to use the 'records' as set by nsgflowlogs.
- config :file_head, :validate => :string, :required => false, :default => '{"records":['
+ config :file_head, :validate => :string, :required => false, :default => ''
  # The string that ends the JSON
- config :file_tail, :validate => :string, :required => false, :default => ']}'
+ config :file_tail, :validate => :string, :required => false, :default => ''
 
+ # inspect the bytes and remove faulty characters
+ config :cleanjson, :validate => :boolean, :default => false, :required => false
+
+ config :append, :validate => :boolean, :default => false, :required => false
  # By default it will watch every file in the storage container. The prefix option is a simple filter that only processes files with a path that starts with that value.
  # For NSGFLOWLOGS a path starts with "resourceId=/". This would only be needed to exclude other paths that may be written in the same container. The registry file will be excluded.
  # You may also configure multiple paths. See an example on the <<array,Logstash configuration page>>.
@@ -118,6 +122,7 @@ public
  @logger.info("If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage")
  @busy_writing_registry = Mutex.new
  # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
+ # For now it's difficult because the plugin would then have to synchronize the worklist
  end
 
 
@@ -128,41 +133,10 @@ public
  @regsaved = @processed
 
  connect
-
  @registry = Hash.new
- if registry_create_policy == "resume"
- for counter in 1..3
- begin
- if (!@registry_local_path.nil?)
- unless File.file?(@registry_local_path+"/"+@pipe_id)
- @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
- #[0] headers [1] responsebody
- @logger.info("migrating from remote registry #{registry_path}")
- else
- if !Dir.exist?(@registry_local_path)
- FileUtils.mkdir_p(@registry_local_path)
- end
- @registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
- @logger.info("resuming from local registry #{registry_local_path+"/"+@pipe_id}")
- end
- else
- @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
- #[0] headers [1] responsebody
- @logger.info("resuming from remote registry #{registry_path}")
- end
- break
- rescue Exception => e
- @logger.error("caught: #{e.message}")
- @registry.clear
- @logger.error("loading registry failed for attempt #{counter} of 3")
- end
- end
- end
- # read filelist and set offsets to file length to mark all the old files as done
- if registry_create_policy == "start_fresh"
- @registry = list_blobs(true)
- save_registry()
- @logger.info("starting fresh, writing a clean registry to contain #{@registry.size} blobs/files")
+ load_registry()
+ @registry.each do |name, file|
+ @logger.info("offset: #{file[:offset]} length: #{file[:length]}")
  end
 
  @is_json = false
@@ -174,22 +148,29 @@ public
  @is_json_line = true
  end
  end
+
+
  @head = ''
  @tail = ''
- # if codec=json sniff one files blocks A and Z to learn file_head and file_tail
  if @is_json
+ # if codec=json, sniff one file's blocks A and Z to learn file_head and file_tail
+ if @logtype == 'nsgflowlog'
+ @head = '{"records":['
+ @tail = ']}'
+ end
  if file_head
  @head = file_head
  end
  if file_tail
  @tail = file_tail
  end
- if file_head and file_tail and !skip_learning
+ if !skip_learning
  learn_encapsulation
  end
- @logger.info("head will be: #{@head} and tail is set to #{@tail}")
+ @logger.info("head will be: '#{@head}' and tail is set to: '#{@tail}'")
  end
 
+
  filelist = Hash.new
  worklist = Hash.new
  @last = start = Time.now.to_i
@@ -206,24 +187,27 @@ public
  # load the registry, compare it's offsets to file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for next loop,
  # TODO: sort by timestamp ?
  #filelist.sort_by(|k,v|resource(k)[:date])
- worklist.clear
  filelist.clear
 
  # Listing all the files
  filelist = list_blobs(false)
+ if (@debug_until > @processed) then
+ @registry.each do |name, file|
+ @logger.info("#{name} offset: #{file[:offset]} length: #{file[:length]}")
+ end
+ end
  filelist.each do |name, file|
  off = 0
  if @registry.key?(name) then
- begin
- off = @registry[name][:offset]
- rescue Exception => e
- @logger.error("caught: #{e.message} while reading #{name}")
- end
+ begin
+ off = @registry[name][:offset]
+ rescue Exception => e
+ @logger.error("caught: #{e.message} while reading #{name}")
+ end
  end
  @registry.store(name, { :offset => off, :length => file[:length] })
  if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
  end
- # size nilClass when the list doesn't grow?!
 
  # clean registry of files that are not in the filelist
  @registry.each do |name,file|
@@ -242,14 +226,16 @@ public
 
  # Start of processing
  # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
+ # pool = Concurrent::FixedThreadPool.new(5) # 5 threads
+ #pool.post do
+ # some parallel work
+ #end
  if (worklist.size > 0) then
  worklist.each do |name, file|
  start = Time.now.to_i
  if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
  size = 0
  if file[:offset] == 0
- # This is where Sera4000 issue starts
- # For an append blob, reading full and crashing, retry, last_modified? ... lenght? ... committed? ...
  # length and skip reg value
  if (file[:length] > 0)
  begin
@@ -272,49 +258,86 @@ public
  delta_size = chunk.size - @head.length - 1
  end
 
- if logtype == "nsgflowlog" && @is_json
- # skip empty chunks
- unless chunk.nil?
- res = resource(name)
- begin
- fingjson = JSON.parse(chunk)
- @processed += nsgflowlog(queue, fingjson, name)
- @logger.debug("Processed #{res[:nsg]} #{@processed} events")
- rescue JSON::ParserError => e
- @logger.error("parse error #{e.message} on #{res[:nsg]} offset: #{file[:offset]} length: #{file[:length]}")
- if (@debug_until > @processed) then @logger.info("#{chunk}") end
- end
+ #
+ # TODO! ... split out the logtypes and use individual methods
+ # how does a byte array chunk from json_lines get translated to strings/json/events?
+ # should the byte array be converted to a multiline and then split? drawback: need to know the character set and linefeed characters
+ # how does the json_lines decoder work on byte arrays?
+ #
+ # so many questions
+
+ unless chunk.nil?
+ counter = 0
+ if @is_json
+ if logtype == "nsgflowlog"
+ res = resource(name)
+ begin
+ fingjson = JSON.parse(chunk)
+ @processed += nsgflowlog(queue, fingjson, name)
+ @logger.debug("Processed #{res[:nsg]} #{@processed} events")
+ rescue JSON::ParserError => e
+ @logger.error("parse error #{e.message} on #{res[:nsg]} offset: #{file[:offset]} length: #{file[:length]}")
+ if (@debug_until > @processed) then @logger.info("#{chunk}") end
+ end
+ else
+ begin
+ @codec.decode(chunk) do |event|
+ counter += 1
+ if @addfilename
+ event.set('filename', name)
+ end
+ decorate(event)
+ queue << event
+ end
+ @processed += counter
+ rescue Exception => e
+ @logger.error("codec exception: #{e.message} .. continue and pretend this never happened")
+ end
+ end
+ end
+
+ if logtype == "wadiis" && !@is_json
+ # TODO: Convert this to line based grokking.
+ @processed += wadiislog(queue, name)
  end
- # TODO: Convert this to line based grokking.
- elsif logtype == "wadiis" && !@is_json
- @processed += wadiislog(queue, name)
- else
- # Handle JSONLines format
- if !@chunk.nil? && @is_json_line
- newline_rindex = chunk.rindex("\n")
- if newline_rindex.nil?
- # No full line in chunk, skip it without updating the registry.
- # Expecting that the JSON line would be filled in at a subsequent iteration.
- next
+
+ if @is_json_line
+ # parse one line at a time and dump it in the chunk?
+ lines = chunk.to_s
+ if cleanjson
+ @logger.info("cleaning in progress")
+ lines = lines.chars.select(&:valid_encoding?).join
+ #lines.delete "\\"
+ #lines.scrub{|bytes| '<'+bytes.unpack('H*')[0]+'>' }
+ end
+ begin
+ @codec.decode(lines) do |event|
+ counter += 1
+ queue << event
+ end
+ @processed += counter
+ rescue Exception => e
+ # todo: fix codec_lines exception: no implicit conversion of Array into String
+ @logger.error("json_lines codec exception: #{e.message} .. continue and pretend this never happened")
  end
- chunk = chunk[0..newline_rindex]
- delta_size = chunk.size
  end
 
- counter = 0
- begin
- @codec.decode(chunk) do |event|
- counter += 1
- if @addfilename
- event.set('filename', name)
+ if !@is_json_line && !@is_json
+ if logtype == "wadiis"
+ # TODO: Convert this to line based grokking.
+ @processed += wadiislog(queue, name)
+ else
+ # Any other codec and logstyle
+ begin
+ @codec.decode(chunk) do |event|
+ counter += 1
+ queue << event
+ end
+ @processed += counter
+ rescue Exception => e
+ @logger.error("other codec exception: #{e.message} .. continue and pretend this never happened")
  end
- decorate(event)
- queue << event
  end
- @processed += counter
- rescue Exception => e
- @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
- @logger.debug("#{chunk}")
  end
  end
 
@@ -354,6 +377,24 @@ public
 
 
  private
+ def list_files
+ filelist = list_blobs(false)
+ filelist.each do |name, file|
+ off = 0
+ if @registry.key?(name) then
+ begin
+ off = @registry[name][:offset]
+ rescue Exception => e
+ @logger.error("caught: #{e.message} while reading #{name}")
+ end
+ end
+ @registry.store(name, { :offset => off, :length => file[:length] })
+ if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
+ end
+ return filelist
+ end
+ # size nilClass when the list doesn't grow?!
+
  def connect
  # Try in this order to access the storageaccount
  # 1. storageaccount / sas_token
@@ -384,11 +425,48 @@ private
  # end
  end
  end
+ # @registry_create_policy,@registry_local_path,@container,@registry_path
+ def load_registry()
+ if @registry_create_policy == "resume"
+ for counter in 1..3
+ begin
+ if (!@registry_local_path.nil?)
+ unless File.file?(@registry_local_path+"/"+@pipe_id)
+ @registry = Marshal.load(@blob_client.get_blob(@container, path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("migrating from remote registry #{path}")
+ else
+ if !Dir.exist?(@registry_local_path)
+ FileUtils.mkdir_p(@registry_local_path)
+ end
+ @registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
+ @logger.info("resuming from local registry #{@registry_local_path+"/"+@pipe_id}")
+ end
+ else
+ @registry = Marshal.load(@blob_client.get_blob(container, path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("resuming from remote registry #{path}")
+ end
+ break
+ rescue Exception => e
+ @logger.error("caught: #{e.message}")
+ @registry.clear
+ @logger.error("loading registry failed for attempt #{counter} of 3")
+ end
+ end
+ end
+ # read filelist and set offsets to file length to mark all the old files as done
+ if @registry_create_policy == "start_fresh"
+ @registry = list_blobs(true)
+ #save_registry()
+ @logger.info("starting fresh, with a clean registry containing #{@registry.size} blobs/files")
+ end
+ end
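The registry that load_registry restores is a plain Hash of blob name to offset/length, serialized with Marshal either to a local file or to a blob. A minimal local round-trip sketch (file and directory names are examples, not the plugin's paths):

```ruby
require 'tmpdir'
require 'fileutils'

# blob name => { :offset, :length }, the same shape the plugin stores
registry = { 'resourceId=/x/y.json' => { :offset => 120, :length => 4096 } }

dir  = Dir.mktmpdir
path = File.join(dir, 'registry.dat')

# Marshal is binary, so read and write in binary mode
File.open(path, 'wb') { |f| f.write(Marshal.dump(registry)) }
restored = Marshal.load(File.read(path, mode: 'rb'))

FileUtils.rm_rf(dir)
```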
 
  def full_read(filename)
  tries ||= 2
  begin
- return @blob_client.get_blob(container, filename)[1]
+ return @blob_client.get_blob(@container, filename)[1]
  rescue Exception => e
  @logger.error("caught: #{e.message} for full_read")
  if (tries -= 1) > 0
@@ -399,7 +477,7 @@ private
  end
  end
  begin
- chuck = @blob_client.get_blob(container, filename)[1]
+ chuck = @blob_client.get_blob(@container, filename)[1]
  end
  return chuck
  end
@@ -410,29 +488,45 @@ private
  # 3. strip comma
  # if json strip comma and fix head and tail
  size = 0
- blocks = @blob_client.list_blob_blocks(container, blobname)
- blocks[:committed].each do |block|
- size += block.size
- end
- # read the new blob blocks from the offset to the last committed size.
- # if it is json, fix the head and tail
- # crap committed block at the end is the tail, so must be substracted from the read and then comma stripped and tail added.
- # but why did I need a -1 for the length?? probably the offset starts at 0 and ends at size-1
 
- # should first check commit, read and the check committed again? no, only read the commited size
- # should read the full content and then substract json tail
+ begin
+ if @append
+ return @blob_client.get_blob(@container, blobname, start_range: offset-1)[1]
+ end
+ blocks = @blob_client.list_blob_blocks(@container, blobname)
+ blocks[:committed].each do |block|
+ size += block.size
+ end
+ # read the new blob blocks from the offset to the last committed size.
+ # if it is json, fix the head and tail
+ # the committed block at the end is the tail, so it must be subtracted from the read, then the comma stripped and the tail added.
+ # but why did I need a -1 for the length?? probably the offset starts at 0 and ends at size-1
+
+ # should we first check the commit, read, and then check committed again? no, only read the committed size
+ # should read the full content and then subtract the json tail
 
- if @is_json
- content = @blob_client.get_blob(container, blobname, start_range: offset-1, end_range: size-1)[1]
- if content.end_with?(@tail)
- return @head + strip_comma(content)
+ unless @is_json
+ return @blob_client.get_blob(@container, blobname, start_range: offset, end_range: size-1)[1]
  else
- @logger.info("Fixed a tail! probably new committed blocks started appearing!")
- # substract the length of the tail and add the tail, because the file grew.size was calculated as the block boundary, so replacing the last bytes with the tail should fix the problem
- return @head + strip_comma(content[0...-@tail.length]) + @tail
+ content = @blob_client.get_blob(@container, blobname, start_range: offset-1, end_range: size-1)[1]
+ if content.end_with?(@tail)
+ return @head + strip_comma(content)
+ else
+ @logger.info("Fixed a tail! probably new committed blocks started appearing!")
+ # subtract the length of the tail and add it back, because the file grew. size was calculated at the block boundary, so replacing the last bytes with the tail should fix the problem
+ return @head + strip_comma(content[0...-@tail.length]) + @tail
+ end
  end
- else
- content = @blob_client.get_blob(container, blobname, start_range: offset, end_range: size-1)[1]
+ rescue InvalidBlobType => ibt
+ @logger.error("caught #{ibt.message}. Setting BlobType to append")
+ @append = true
+ retry
+ rescue NoMethodError => nme
+ @logger.error("caught #{nme.message}. Setting append to true")
+ @append = true
+ retry
+ rescue Exception => e
+ @logger.error("caught #{e.message}")
  end
  end
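The head/tail repair that partial_read performs for JSON block blobs can be illustrated standalone. `reassemble` is a hypothetical helper: a partial read of an nsgflowlog blob yields `,{...},{...}` without the surrounding structure, and putting the learned head and tail back (plus stripping the leading comma, as the real strip_comma does) makes it parseable again:

```ruby
require 'json'

# Re-wrap a partially read JSON fragment with the learned head and tail,
# dropping the leading comma that separates appended records.
def reassemble(head, chunk, tail)
  head + chunk.sub(/\A,/, '') + tail
end

chunk  = ',{"time":"t1"},{"time":"t2"}'
json   = reassemble('{"records":[', chunk, ']}')
parsed = JSON.parse(json)
```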
 
@@ -532,26 +626,31 @@ private
  nextMarker = nil
  counter = 1
  loop do
- blobs = @blob_client.list_blobs(container, { marker: nextMarker, prefix: @prefix})
- blobs.each do |blob|
- # FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
- # FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
- unless blob.name == registry_path
- if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
- length = blob.properties[:content_length].to_i
- offset = 0
- if fill
- offset = length
+ begin
+ blobs = @blob_client.list_blobs(@container, { marker: nextMarker, prefix: @prefix})
+ blobs.each do |blob|
+ # FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
+ # FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
+ unless blob.name == registry_path
+ if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
+ length = blob.properties[:content_length].to_i
+ offset = 0
+ if fill
+ offset = length
+ end
+ files.store(blob.name, { :offset => offset, :length => length })
+ if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
  end
- files.store(blob.name, { :offset => offset, :length => length })
- if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
  end
  end
+ nextMarker = blobs.continuation_token
+ break unless nextMarker && !nextMarker.empty?
+ if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
+ counter+=1
+ rescue Exception => e
+ @logger.error("caught: #{e.message} while trying to list blobs")
+ return files
  end
- nextMarker = blobs.continuation_token
- break unless nextMarker && !nextMarker.empty?
- if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
- counter+=1
  end
  if @debug_timer
  @logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
571
670
  begin
572
671
  @busy_writing_registry.lock
573
672
  unless (@registry_local_path)
574
- @blob_client.create_block_blob(container, registry_path, regdump)
673
+ @blob_client.create_block_blob(@container, registry_path, regdump)
575
674
  @logger.info("processed #{@processed} events, saving #{regsize} blobs and offsets to remote registry #{registry_path}")
576
675
  else
577
676
  File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(regdump) }
@@ -597,20 +696,20 @@ private
  @logger.info("learn_encapsulation, this can be skipped by setting skip_learning => true. Or set both head_file and tail_file")
  # From one file, read first block and last block to learn head and tail
  begin
- blobs = @blob_client.list_blobs(container, { max_results: 3, prefix: @prefix})
+ blobs = @blob_client.list_blobs(@container, { max_results: 3, prefix: @prefix})
  blobs.each do |blob|
  unless blob.name == registry_path
  begin
- blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
+ blocks = @blob_client.list_blob_blocks(@container, blob.name)[:committed]
  if ['A00000000000000000000000000000000','QTAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.first.name)
  @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
- @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
+ @head = @blob_client.get_blob(@container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
  end
  if ['Z00000000000000000000000000000000','WjAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAw'].include?(blocks.last.name)
  @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
  length = blob.properties[:content_length].to_i
  offset = length - blocks.last.size
- @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
+ @tail = @blob_client.get_blob(@container, blob.name, start_range: offset, end_range: length-1)[1]
  @logger.debug("learned tail: #{@tail}")
  end
  rescue Exception => e
@@ -635,7 +734,9 @@ private
  def val(str)
  return str.split('=')[1]
  end
+ end # class LogStash::Inputs::AzureBlobStorage
 
+ # This is a start towards mapping NSG events to ECS fields ... it's complicated
  =begin
  def ecs(old)
  # https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
@@ -681,4 +782,3 @@ private
  return ecs
  end
  =end
- end # class LogStash::Inputs::AzureBlobStorage
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.12.7'
+ s.version = '0.12.9'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
@@ -24,5 +24,5 @@ EOF
  s.add_runtime_dependency 'stud', '~> 0.0.23'
  s.add_runtime_dependency 'azure-storage-blob', '~> 2', '>= 2.0.3'
  s.add_development_dependency 'logstash-devutils', '~> 2.4'
- s.add_development_dependency 'rubocop', '~> 1.48'
+ s.add_development_dependency 'rubocop', '~> 1.50'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.12.7
+ version: 0.12.9
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-04-02 00:00:00.000000000 Z
+ date: 2023-07-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -77,7 +77,7 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.48'
+ version: '1.50'
  name: rubocop
  prerelease: false
  type: :development
@@ -85,7 +85,7 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.48'
+ version: '1.50'
  description: " This gem is a Logstash plugin. It reads and parses data from Azure\
  \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
  \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\