logstash-input-azure_blob_storage 0.11.4 → 0.12.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 158d9ef3b7997fb3ec67f4e2278861ae367c3e4a73f362dc56f145482d802e34
- data.tar.gz: 89f5b1bc848a97cbf31b1323aa64d021d86a05292d3d7d006994ad170666a37d
+ metadata.gz: bcf097b26eafe13b09cbaca77a097c10cc6e429b51125c6e82f27e8057e6ccab
+ data.tar.gz: cf229e45283fc69d29d751b75c4fce42b432103ef49d7ec018dea810477d4b32
  SHA512:
- metadata.gz: 80f12e364ba3fd81375d2b88d24567d92ec83decac371552e3a814194f6dcae2f1c6991ac87f50e0012a8cb177f67da92790d40a71af953b211e5043a1691170
- data.tar.gz: 0e54b9c0b9f63737ef8046d362c47f1c20f2d9f702db0311993def976f1a40c14534c7fae9a7a90e098ce4b3bdd18d00517f420e9cc6c4b7810f3709aee797e1
+ metadata.gz: ad5a05a919398a665b70ee177ba2a43f53e74462cfa4b1afb308caa2d065ece6d923142b12f16c5385e3edc840cbe453c7937b5242dd697239a240c8295e4418
+ data.tar.gz: 6be87f645933465f9edc34b675d7dd7dd861bbf1990d9fe9919731da14fd450523baa5bafefbdd9fcad4ec9aef70528d258090c2c545d4879c27069a514384ec
data/CHANGELOG.md CHANGED
@@ -1,6 +1,29 @@
+ ## 0.12.0
+ - version 2 of azure-storage-blob
+ - the registry now saves only the current files, no longer keeping historical files
+
+ ## 0.11.7
+ - implemented skip_learning
+ - failed files are now ignored instead of retried
+
+ ## 0.11.6
+ - fixed the max_results in json head and tail learning
+ - broke out connection setup so it can be called again when connection exceptions occur
+ - deal better with skipping empty files
+
+ ## 0.11.5
+ - added optional addfilename to add the filename to the message
+ - NSGFLOWLOG version 2 uses 0 as value instead of NULL in src and dst values
+ - added connection exception handling when doing a full_read of files
+ - rewrote json header and footer learning to skip the registry file
+ - plumbing for emulator
+
  ## 0.11.4
  - fixed listing 3 times, rather than retrying to list max 3 times
- - added log entries for better tracing in which phase the application is now and how long it takes
+ - added option to migrate/save to a local registry
+ - rewrote interval timing
+ - reduced saving of the registry to at most once per interval, protecting against duplicate simultaneous writes
+ - added debug_timer for better tracing of how long operations take
  - removing pipeline name from logfiles, logstash 7.6 and up have this in the log4j2 by default now
  - moved initialization from register to run. should make logs more readable
 
data/README.md CHANGED
@@ -1,30 +1,34 @@
- # Logstash Plugin
+ # Logstash
 
- This is a plugin for [Logstash](https://github.com/elastic/logstash).
+ This is a plugin for [Logstash](https://github.com/elastic/logstash). It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. All logstash plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/). Need generic logstash help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
 
- It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
+ For problems or feature requests with this specific plugin, raise a github issue at [GITHUB/janmg/logstash-input-azure_blob_storage/](https://github.com/janmg/logstash-input-azure_blob_storage). Pull requests are also welcome after discussion through an issue.
 
- ## Documentation
-
- All logstash plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+ ## Purpose
+ This plugin can read from Azure Storage Blobs, for instance JSON diagnostics logs for NSG flow logs or LINE based access logs from App Services.
+ [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
 
- ## Need Help?
+ The plugin depends on the [Ruby library azure-storage-blob](https://rubygems.org/gems/azure-storage-blob/versions/1.1.0) from Microsoft, which depends on Faraday for the HTTPS connection to Azure.
 
- Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum. For real problems or feature requests, raise a github issue [GITHUB/janmg/logstash-input-azure_blob_storage/](https://github.com/janmg/logstash-input-azure_blob_storage). Pull requests will ionly be merged after discussion through an issue.
+ The plugin executes the following steps (a simplified sketch of the worklist logic follows the list)
+ 1. Lists all the files in the azure storage account where the path of the files matches pathprefix
+ 2. Filters on path_filters to only include files that match the directory and file glob (e.g. **/*.json)
+ 3. Saves the listed files in a registry of known files and filesizes (data/registry.dat on azure, or in a file on the logstash instance)
+ 4. Lists all the files again, compares the registry with the new filelist and puts the delta in a worklist
+ 5. Processes the worklist and puts all events in the logstash queue
+ 6. If there is time left, sleeps to complete the interval. If processing takes more than an interval, saves the registry and continues processing
+ 7. If logstash is stopped, a stop signal will try to finish the current file, save the registry and then quit
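
Steps 3 to 5 amount to a diff between the saved registry and a fresh listing. A minimal Ruby sketch of that selection, modelled on the worklist code further down in azure_blob_storage.rb; the data and variable names here are illustrative, not the plugin's exact implementation:
```
# Illustrative only: blob name => { offset: bytes already processed, length: current size }
registry = { "a.json" => { offset: 100, length: 100 },   # fully read in the last interval
             "b.json" => { offset:  80, length:  80 } }  # fully read in the last interval
listing  = { "a.json" => { offset: 0, length: 120 },     # grew by 20 bytes since then
             "b.json" => { offset: 0, length:  80 },
             "c.json" => { offset: 0, length:  10 } }    # new file

# carry the known offsets over; files never seen before start at offset 0
newreg = {}
listing.each do |name, file|
  off = registry.key?(name) ? registry[name][:offset] : 0
  newreg[name] = { offset: off, length: file[:length] }
end

# the worklist is every file with unread bytes left (same select as in the plugin)
worklist = newreg.select { |_name, file| file[:offset] < file[:length] }
puts worklist.keys.inspect   # => ["a.json", "c.json"]
```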
 
- ## Purpose
- This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or accesslogs from App Services.
- [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
- This
  ## Installation
  This plugin can be installed through logstash-plugin
  ```
- logstash-plugin install logstash-input-azure_blob_storage
+ /usr/share/logstash/bin/logstash-plugin install logstash-input-azure_blob_storage
  ```
 
  ## Minimal Configuration
  The minimum configuration required as input is storageaccount, access_key and container.
 
+ /etc/logstash/conf.d/test.conf
  ```
  input {
  azure_blob_storage {
@@ -36,23 +40,29 @@ input {
  ```
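
The hunk above cuts off the minimal example. Filled out with the placeholder values reused in the larger example further down (not values shipped with this release), it would look roughly like:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
    }
}
```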
 
  ## Additional Configuration
- The registry_create_policy is used when the pipeline is started to either resume from the last known unprocessed file, or to start_fresh ignoring old files or start_over to process all the files from the beginning.
+ The registry keeps track of the files in the storage account, their size and how many bytes have been processed. Files can grow and the added part will be processed as a partial file. The registry is saved to disk every interval.
+
+ The registry_create_policy determines at the start of the pipeline whether processing should resume from the last known unprocessed file, start_fresh ignoring old files and only processing events that arrive after the pipeline has started, or start_over to process all the files, ignoring the registry.
 
- interval defines the minimum time the registry should be saved to the registry file (by default 'data/registry.dat'), this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+ interval defines the minimum time the registry should be saved to the registry file (by default to 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
 
- During the pipeline start the plugin uses one file to learn how the JSON header and tail look like, they can also be configured manually.
+ When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id.
+
+ With registry_create_policy set to resume and registry_local_path set to a directory where the registry has not been created yet, the plugin loads the registry from the storage account and saves it on the local server. This allows for a migration to local storage (a configuration sketch follows below).
+
+ For pipelines that use the JSON codec or the JSON_LINE codec, the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually. With skip_learning the learning can be disabled.
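
A sketch of how these registry options combine in an input block; the option values are taken from the larger example further down and are illustrative, not recommendations:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        registry_create_policy => "resume"
        registry_local_path => "/usr/share/logstash/plugin"
        interval => 300
    }
}
```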
 
  ## Running the pipeline
  The pipeline can be started in several ways.
  - On the commandline
  ```
- /usr/share/logstash/bin/logtash -f /etc/logstash/pipeline.d/test.yml
+ /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
  ```
 
  - In the pipeline.yml
  ```
  /etc/logstash/pipeline.yml
  pipe.id = test
- pipe.path = /etc/logstash/pipeline.d/test.yml
+ pipe.path = /etc/logstash/conf.d/test.conf
  ```
  - As managed pipeline from Kibana
 
@@ -91,6 +101,9 @@ The log level of the plugin can be put into DEBUG through
  curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
  ```
 
+ Because logstash debug mode makes logstash very chatty, the option debug_until will log extra detail for the first given number of processed events and then stop debugging. One file can easily contain thousands of events. debug_until is useful for monitoring the start of the plugin and the processing of the first files.
+
+ debug_timer will show detailed information on how much time the listing of files took and how long the plugin will sleep to fill the interval before the listing and processing start again.
 
  ## Other Configuration Examples
  For nsgflowlogs, a simple configuration looks like this
@@ -116,6 +129,10 @@ filter {
  }
  }
 
+ output {
+ stdout { }
+ }
+
  output {
  elasticsearch {
  hosts => "elasticsearch"
@@ -123,21 +140,35 @@ output {
  }
  }
  ```
-
+ A more elaborate input configuration example
  ```
  input {
  azure_blob_storage {
+ codec => "json"
  storageaccount => "yourstorageaccountname"
  access_key => "Ba5e64c0d3=="
  container => "insights-logs-networksecuritygroupflowevent"
- codec => "json"
  logtype => "nsgflowlog"
  prefix => "resourceId=/"
+ path_filters => ['**/*.json']
+ addfilename => true
  registry_create_policy => "resume"
+ registry_local_path => "/usr/share/logstash/plugin"
  interval => 300
+ debug_timer => true
+ debug_until => 100
+ }
+ }
+
+ output {
+ elasticsearch {
+ hosts => "elasticsearch"
+ index => "nsg-flow-logs-%{+xxxx.ww}"
  }
  }
  ```
+ The configuration documentation is in the first 100 lines of the code
+ [GITHUB/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb)
 
  For WAD IIS and App Services the HTTP AccessLogs can be retrieved from a storage account as line based events and parsed through GROK. The date stamp can also be parsed with %{TIMESTAMP_ISO8601:log_timestamp}. For WAD IIS logfiles the container is wad-iis-logfiles. In the future grokking may happen already by the plugin.
  ```
@@ -176,7 +207,7 @@ filter {
  remove_field => ["subresponse"]
  remove_field => ["username"]
  remove_field => ["clientPort"]
- remove_field => ["port"]
+ remove_field => ["port"]
  remove_field => ["timestamp"]
  }
  }
data/lib/logstash/inputs/azure_blob_storage.rb CHANGED
@@ -25,6 +25,9 @@ config :storageaccount, :validate => :string, :required => false
  # DNS Suffix other then blob.core.windows.net
  config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'
 
+ # For development this can be used to emulate an accountstorage when not available from azure
+ #config :use_development_storage, :validate => :boolean, :required => false
+
  # The (primary or secondary) Access Key for the the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
  config :access_key, :validate => :password, :required => false
 
@@ -58,6 +61,9 @@ config :registry_create_policy, :validate => ['resume','start_over','start_fresh
  # Z00000000000000000000000000000000 2 ]}
  config :interval, :validate => :number, :default => 60
 
+ # add the filename into the events
+ config :addfilename, :validate => :boolean, :default => false, :required => false
+
  # debug_until will for a maximum amount of processed messages shows 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
  config :debug_until, :validate => :number, :default => 0, :required => false
 
@@ -67,6 +73,9 @@ config :debug_timer, :validate => :boolean, :default => false, :required => fals
  # WAD IIS Grok Pattern
  #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
 
+ # skip learning if you use json and don't want to learn the head and tail, but use either the defaults or configure them.
+ config :skip_learning, :validate => :boolean, :default => false, :required => false
+
  # The string that starts the JSON. Only needed when the codec is JSON. When partial file are read, the result will not be valid JSON unless the start and end are put back. the file_head and file_tail are learned at startup, by reading the first file in the blob_list and taking the first and last block, this would work for blobs that are appended like nsgflowlogs. The configuration can be set to override the learning. In case learning fails and the option is not set, the default is to use the 'records' as set by nsgflowlogs.
  config :file_head, :validate => :string, :required => false, :default => '{"records":['
  # The string that ends the JSON
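
If learning should be skipped, the head and tail can be set explicitly. A hedged sketch, assuming the nsgflowlog defaults (the '{"records":[' head shown above and a matching ']}' tail); adjust to your own blob layout:
```
input {
    azure_blob_storage {
        codec => "json"
        skip_learning => true
        file_head => '{"records":['
        file_tail => ']}'
        # plus storageaccount, access_key and container as in the README examples
    }
}
```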
@@ -109,30 +118,7 @@ def run(queue)
  @processed = 0
  @regsaved = @processed
 
- # Try in this order to access the storageaccount
- # 1. storageaccount / sas_token
- # 2. connection_string
- # 3. storageaccount / access_key
-
- unless connection_string.nil?
- conn = connection_string.value
- end
- unless sas_token.nil?
- unless sas_token.value.start_with?('?')
- conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
- else
- conn = sas_token.value
- end
- end
- unless conn.nil?
- @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
- else
- @blob_client = Azure::Storage::Blob::BlobService.create(
- storage_account_name: storageaccount,
- storage_dns_suffix: dns_suffix,
- storage_access_key: access_key.value,
- )
- end
+ connect
 
  @registry = Hash.new
  if registry_create_policy == "resume"
@@ -167,7 +153,7 @@ def run(queue)
  if registry_create_policy == "start_fresh"
  @registry = list_blobs(true)
  save_registry(@registry)
- @logger.info("starting fresh, overwriting the registry to contain #{@registry.size} blobs/files")
+ @logger.info("starting fresh, writing a clean registry to contain #{@registry.size} blobs/files")
  end
 
  @is_json = false
@@ -180,12 +166,14 @@ def run(queue)
  @tail = ''
  # if codec=json sniff one files blocks A and Z to learn file_head and file_tail
  if @is_json
- learn_encapsulation
  if file_head
- @head = file_head
+ @head = file_head
  end
  if file_tail
- @tail = file_tail
+ @tail = file_tail
+ end
+ if file_head and file_tail and !skip_learning
+ learn_encapsulation
  end
  @logger.info("head will be: #{@head} and tail is set to #{@tail}")
  end
@@ -223,33 +211,55 @@ def run(queue)
  newreg.store(name, { :offset => off, :length => file[:length] })
  if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
  end
+ # size nilClass when the list doesn't grow?!
  # Worklist is the subset of files where the already read offset is smaller than the file size
- worklist.clear
+ @registry = newreg
+ worklist.clear
+ chunk = nil
+
  worklist = newreg.select {|name,file| file[:offset] < file[:length]}
  if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end
 
  # Start of processing
  # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
- worklist.each do |name, file|
+ if (worklist.size > 0) then
+ worklist.each do |name, file|
  start = Time.now.to_i
  if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
  size = 0
  if file[:offset] == 0
- chunk = full_read(name)
- size=chunk.size
+ # This is where Sera4000 issue starts
+ # For an append blob, reading full and crashing, retry, last_modified? ... lenght? ... committed? ...
+ # length and skip reg value
+ if (file[:length] > 0)
+ begin
+ chunk = full_read(name)
+ size=chunk.size
+ rescue Exception => e
+ @logger.error("Failed to read #{name} because of: #{e.message} .. will continue, set file as read and pretend this never happened")
+ @logger.error("#{size} size and #{file[:length]} file length")
+ size = file[:length]
+ end
+ else
+ @logger.info("found a zero size file #{name}")
+ chunk = nil
+ end
  else
  chunk = partial_read_json(name, file[:offset], file[:length])
  @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
  end
  if logtype == "nsgflowlog" && @is_json
+ # skip empty chunks
+ unless chunk.nil?
  res = resource(name)
  begin
  fingjson = JSON.parse(chunk)
- @processed += nsgflowlog(queue, fingjson)
+ @processed += nsgflowlog(queue, fingjson, name)
  @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
  rescue JSON::ParserError
  @logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
  end
+ end
  # TODO: Convert this to line based grokking.
  # TODO: ECS Compliance?
  elsif logtype == "wadiis" && !@is_json
@@ -257,13 +267,17 @@ def run(queue)
  else
  counter = 0
  begin
- @codec.decode(chunk) do |event|
+ @codec.decode(chunk) do |event|
  counter += 1
+ if @addfilename
+ event.set('filename', name)
+ end
  decorate(event)
  queue << event
  end
  rescue Exception => e
  @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+ @registry.store(name, { :offset => file[:length], :length => file[:length] })
  @logger.debug("#{chunk}")
  end
  @processed += counter
@@ -279,6 +293,7 @@ def run(queue)
  if ((Time.now.to_i - @last) > @interval)
  save_registry(@registry)
  end
+ end
  end
  # The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
  now = Time.now.to_i
@@ -302,8 +317,54 @@ end
 
 
  private
+ def connect
+ # Try in this order to access the storageaccount
+ # 1. storageaccount / sas_token
+ # 2. connection_string
+ # 3. storageaccount / access_key
+
+ unless connection_string.nil?
+ conn = connection_string.value
+ end
+ unless sas_token.nil?
+ unless sas_token.value.start_with?('?')
+ conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
+ else
+ conn = sas_token.value
+ end
+ end
+ unless conn.nil?
+ @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
+ else
+ # unless use_development_storage?
+ @blob_client = Azure::Storage::Blob::BlobService.create(
+ storage_account_name: storageaccount,
+ storage_dns_suffix: dns_suffix,
+ storage_access_key: access_key.value,
+ )
+ # else
+ # @logger.info("not yet implemented")
+ # end
+ end
+ end
+
  def full_read(filename)
- return @blob_client.get_blob(container, filename)[1]
+ tries ||= 2
+ begin
+ return @blob_client.get_blob(container, filename)[1]
+ rescue Exception => e
+ @logger.error("caught: #{e.message} for full_read")
+ if (tries -= 1) > 0
+ if e.message == "Connection reset by peer"
+ connect
+ end
+ retry
+ end
+ end
+ begin
+ chunk = @blob_client.get_blob(container, filename)[1]
+ end
+ return chunk
  end
 
  def partial_read_json(filename, offset, length)
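
The precedence that connect tries above corresponds to three alternative ways of supplying credentials in the input block. A hedged sketch (use only one alternative; the option names are the plugin's, but every value is a placeholder and the SAS token / connection string shapes are just the usual Azure formats, not values from this release):
```
input {
    azure_blob_storage {
        container => "insights-logs-networksecuritygroupflowevent"
        # alternative 1: storage account name plus SAS token
        storageaccount => "yourstorageaccountname"
        sas_token => "?sv=2020-08-04&ss=b&srt=co&sp=rl&sig=REDACTED"
        # alternative 2: a full connection string instead
        # connection_string => "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=Ba5e64c0d3==;EndpointSuffix=core.windows.net"
        # alternative 3: storage account name plus access key
        # access_key => "Ba5e64c0d3=="
    }
}
```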
@@ -326,8 +387,7 @@ def strip_comma(str)
  end
 
 
-
- def nsgflowlog(queue, json)
+ def nsgflowlog(queue, json, name)
  count=0
  json["records"].each do |record|
  res = resource(record["resourceId"])
@@ -340,9 +400,16 @@ def nsgflowlog(queue, json)
  tups = tup.split(',')
  ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
  if (record["properties"]["Version"]==2)
+ tups[9] = 0 if tups[9].nil?
+ tups[10] = 0 if tups[10].nil?
+ tups[11] = 0 if tups[11].nil?
+ tups[12] = 0 if tups[12].nil?
  ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
  end
  @logger.trace(ev.to_s)
+ if @addfilename
+ ev.merge!( {:filename => name } )
+ end
  event = LogStash::Event.new('message' => ev.to_json)
  decorate(event)
  queue << event
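
For context on the tups[9..12] defaulting: a version 2 flow tuple carries extra fields (flow state plus packet/byte counters) that can be absent while a flow is still open, in which case the split leaves nils. A hedged Ruby sketch with made-up tuple values, not taken from a real log; the field layout simply mirrors the merge! call above:
```
# Illustrative only
v2_open = "1646126400,10.0.0.4,10.0.0.5,44931,443,T,O,A,B".split(',')
v2_open[9]  = 0 if v2_open[9].nil?    # src packets not reported yet
v2_open[10] = 0 if v2_open[10].nil?   # src bytes
v2_open[11] = 0 if v2_open[11].nil?   # dst packets
v2_open[12] = 0 if v2_open[12].nil?   # dst bytes
puts v2_open.inspect
# => ["1646126400", "10.0.0.4", "10.0.0.5", "44931", "443", "T", "O", "A", "B", 0, 0, 0, 0]
```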
@@ -429,10 +496,10 @@ def save_registry(filelist)
  @busy_writing_registry = true
  unless (@registry_local_path)
  @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
- @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to remote registry #{registry_path}")
  else
  File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
- @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_local_path+"/"+@pipe_id}")
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to local registry #{registry_local_path+"/"+@pipe_id}")
  end
  @busy_writing_registry = false
  @last = Time.now.to_i
@@ -446,21 +513,35 @@ def save_registry(filelist)
  end
  end
 
+
  def learn_encapsulation
+ @logger.info("learn_encapsulation, this can be skipped by setting skip_learning => true. Or set both file_head and file_tail")
  # From one file, read first block and last block to learn head and tail
- # If the blobstorage can't be found, an error from farraday middleware will come with the text
- # org.jruby.ext.set.RubySet cannot be cast to class org.jruby.RubyFixnum
- blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
- return if blob.nil?
- blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
- # TODO add check for empty blocks and log error that the header and footer can't be learned and must be set in the config
- @logger.debug("using #{blob.name} to learn the json header and tail")
- @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
- @logger.debug("learned header: #{@head}")
- length = blob.properties[:content_length].to_i
- offset = length - blocks.last.size
- @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
- @logger.debug("learned tail: #{@tail}")
+ begin
+ blobs = @blob_client.list_blobs(container, { max_results: 3, prefix: @prefix})
+ blobs.each do |blob|
+ unless blob.name == registry_path
+ begin
+ blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
+ if blocks.first.name.start_with?('A00')
+ @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
+ @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
+ end
+ if blocks.last.name.start_with?('Z00')
+ @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
+ length = blob.properties[:content_length].to_i
+ offset = length - blocks.last.size
+ @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
+ @logger.debug("learned tail: #{@tail}")
+ end
+ rescue Exception => e
+ @logger.info("learn json one of the attempts failed #{e.message}")
+ end
+ end
+ end
+ rescue Exception => e
+ @logger.info("learn json header and footer failed because #{e.message}")
+ end
  end
 
  def resource(str)
data/logstash-input-azure_blob_storage.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.11.4'
+ s.version = '0.12.0'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
  # Gem dependencies
  s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
  s.add_runtime_dependency 'stud', '~> 0.0.23'
- s.add_runtime_dependency 'azure-storage-blob', '~> 1.1'
- s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
+ s.add_runtime_dependency 'azure-storage-blob', '~> 2', '>= 2.0.3'
+ #s.add_development_dependency 'logstash-devutils', '~> 2'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.11.4
+ version: 0.12.0
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-05-23 00:00:00.000000000 Z
+ date: 2021-12-06 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '2.1'
  name: logstash-core-plugin-api
- type: :runtime
  prerelease: false
+ type: :runtime
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -31,8 +31,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: 0.0.23
  name: stud
- type: :runtime
  prerelease: false
+ type: :runtime
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -43,35 +43,21 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.1'
+ version: '2'
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 2.0.3
  name: azure-storage-blob
- type: :runtime
  prerelease: false
+ type: :runtime
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.1'
- - !ruby/object:Gem::Dependency
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 1.0.0
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.0'
- name: logstash-devutils
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
+ version: '2'
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.0.0
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.0'
+ version: 2.0.3
  description: " This gem is a Logstash plugin. It reads and parses data from Azure\
  \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
  \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\
@@ -112,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.6
+ rubygems_version: 3.1.6
  signing_key:
  specification_version: 4
  summary: This logstash plugin reads and parses data from Azure Storage Blobs.