logstash-input-azure_blob_storage 0.11.4 → 0.12.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +24 -1
- data/README.md +52 -21
- data/lib/logstash/inputs/azure_blob_storage.rb +133 -52
- data/logstash-input-azure_blob_storage.gemspec +3 -3
- metadata +12 -26
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bcf097b26eafe13b09cbaca77a097c10cc6e429b51125c6e82f27e8057e6ccab
+  data.tar.gz: cf229e45283fc69d29d751b75c4fce42b432103ef49d7ec018dea810477d4b32
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ad5a05a919398a665b70ee177ba2a43f53e74462cfa4b1afb308caa2d065ece6d923142b12f16c5385e3edc840cbe453c7937b5242dd697239a240c8295e4418
+  data.tar.gz: 6be87f645933465f9edc34b675d7dd7dd861bbf1990d9fe9919731da14fd450523baa5bafefbdd9fcad4ec9aef70528d258090c2c545d4879c27069a514384ec
data/CHANGELOG.md CHANGED
@@ -1,6 +1,29 @@
+## 0.12.0
+- version 2 of azure-storage
+- saving current files registry, not keeping historical files
+
+## 0.11.7
+- implemented skip_learning
+- start ignoring failed files and not retry
+
+## 0.11.6
+- fix in json head and tail learning the max_results
+- broke out connection setup in order to call it again if connection exceptions come
+- deal better with skipping of empty files.
+
+## 0.11.5
+- added optional addfilename to add filename in message
+- NSGFLOWLOG version 2 uses 0 as value instead of NULL in src and dst values
+- added connection exception handling when full_read files
+- rewritten json header footer learning to ignore learning from registry
+- plumbing for emulator
+
 ## 0.11.4
 - fixed listing 3 times, rather than retrying to list max 3 times
-- added
+- added option to migrate/save to using local registry
+- rewrote interval timing
+- reduced saving of registry to maximum once per interval, protect duplicate simultanious writes
+- added debug_timer for better tracing how long operations take
 - removing pipeline name from logfiles, logstash 7.6 and up have this in the log4j2 by default now
 - moved initialization from register to run. should make logs more readable

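The 0.12.0 entry "saving current files registry, not keeping historical files" refers to the registry handling shown later in this diff (`@registry = newreg` and the worklist selection in run). A minimal, standalone Ruby sketch of that behaviour, using hypothetical blob names and sizes rather than the plugin's own code:

```ruby
# Hypothetical old registry: one known file plus one that no longer exists in the container.
old_registry = {
  "resourceId=/a/PT1H.json" => { offset: 1000, length: 1000 },
  "resourceId=/deleted.json" => { offset: 500,  length: 500 }
}

# Freshly listed blobs and their current lengths (illustrative values).
listed = { "resourceId=/a/PT1H.json" => 1500, "resourceId=/b/PT1H.json" => 200 }

# Rebuild the registry from the current listing only, keeping known offsets;
# files that are no longer listed simply drop out (no historical entries kept).
new_registry = listed.map do |name, length|
  offset = old_registry.dig(name, :offset) || 0
  [name, { offset: offset, length: length }]
end.to_h

# The worklist is the subset of current files that still have unread bytes.
worklist = new_registry.select { |_name, file| file[:offset] < file[:length] }
puts worklist.keys.inspect
# => ["resourceId=/a/PT1H.json", "resourceId=/b/PT1H.json"]
```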
data/README.md CHANGED
@@ -1,30 +1,34 @@
-# Logstash
+# Logstash

-This is a plugin for [Logstash](https://github.com/elastic/logstash).
+This is a plugin for [Logstash](https://github.com/elastic/logstash). It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way. All logstash plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/). Need generic logstash help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.

-
+For problems or feature requests with this specific plugin, raise a github issue [GITHUB/janmg/logstash-input-azure_blob_storage/](https://github.com/janmg/logstash-input-azure_blob_storage). Pull requests will also be welcomed after discussion through an issue.

-##
-
-
+## Purpose
+This plugin can read from Azure Storage Blobs, for instance JSON diagnostics logs for NSG flow logs or LINE based accesslogs from App Services.
+[Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)

-
+The plugin depends on the [Ruby library azure-storage-blon](https://rubygems.org/gems/azure-storage-blob/versions/1.1.0) from Microsoft, that depends on Faraday for the HTTPS connection to Azure.

-
+The plugin executes the following steps
+1. Lists all the files in the azure storage account. where the path of the files are matching pathprefix
+2. Filters on path_filters to only include files that match the directory and file glob (e.g. **/*.json)
+3. Save the listed files in a registry of known files and filesizes. (data/registry.dat on azure, or in a file on the logstash instance)
+4. List all the files again and compare the registry with the new filelist and put the delta in a worklist
+5. Process the worklist and put all events in the logstash queue.
+6. if there is time left, sleep to complete the interval. If processing takes more than an inteval, save the registry and continue processing.
+7. If logstash is stopped, a stop signal will try to finish the current file, save the registry and than quit

-## Purpose
-This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or accesslogs from App Services.
-[Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
-This
 ## Installation
 This plugin can be installed through logstash-plugin
 ```
-logstash-plugin install logstash-input-azure_blob_storage
+/usr/share/logstash/bin/logstash-plugin install logstash-input-azure_blob_storage
 ```

 ## Minimal Configuration
 The minimum configuration required as input is storageaccount, access_key and container.

+/etc/logstash/conf.d/test.conf
 ```
 input {
 azure_blob_storage {
@@ -36,23 +40,29 @@ input {
 ```

 ## Additional Configuration
-The
+The registry keeps track of files in the storage account, their size and how many bytes have been processed. Files can grow and the added part will be processed as a partial file. The registry is saved todisk every interval.
+
+The registry_create_policy determines at the start of the pipeline if processing should resume from the last known unprocessed file, or to start_fresh ignoring old files and start only processing new events that came after the start of the pipeline. Or start_over to process all the files ignoring the registry.

-interval defines the minimum time the registry should be saved to the registry file (by default 'data/registry.dat'), this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+interval defines the minimum time the registry should be saved to the registry file (by default to 'data/registry.dat'), this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.

-
+When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id
+
+with registry_create_policy set to resume and the registry_local_path set to a directory where the registry isn't yet created, should load the registry from the storage account and save the registry on the local server. This allows for a migration to localstorage
+
+For pipelines that use the JSON codec or the JSON_LINE codec, the plugin uses one file to learn how the JSON header and tail look like, they can also be configured manually. Using skip_learning the learning can be disabled.

 ## Running the pipeline
 The pipeline can be started in several ways.
 - On the commandline
 ```
-/usr/share/logstash/bin/logtash -f /etc/logstash/
+/usr/share/logstash/bin/logtash -f /etc/logstash/conf.d/test.conf
 ```
 - In the pipeline.yml
 ```
 /etc/logstash/pipeline.yml
 pipe.id = test
-pipe.path = /etc/logstash/
+pipe.path = /etc/logstash/conf.d/test.conf
 ```
 - As managed pipeline from Kibana

@@ -91,6 +101,9 @@ The log level of the plugin can be put into DEBUG through
 curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
 ```

+Because logstash debug makes logstash very chatty, the option debug_until will for a number of processed events and stops debuging. One file can easily contain thousands of events. The debug_until is useful to monitor the start of the plugin and the processing of the first files.
+
+debug_timer will show detailed information on how much time listing of files took and how long the plugin will sleep to fill the interval and the listing and processing starts again.

 ## Other Configuration Examples
 For nsgflowlogs, a simple configuration looks like this
@@ -116,6 +129,10 @@ filter {
 }
 }

+output {
+stdout { }
+}
+
 output {
 elasticsearch {
 hosts => "elasticsearch"
@@ -123,21 +140,35 @@ output {
 }
 }
 ```
-
+A more elaborate input configuration example
 ```
 input {
 azure_blob_storage {
+codec => "json"
 storageaccount => "yourstorageaccountname"
 access_key => "Ba5e64c0d3=="
 container => "insights-logs-networksecuritygroupflowevent"
-codec => "json"
 logtype => "nsgflowlog"
 prefix => "resourceId=/"
+path_filters => ['**/*.json']
+addfilename => true
 registry_create_policy => "resume"
+registry_local_path => "/usr/share/logstash/plugin"
 interval => 300
+debug_timer => true
+debug_until => 100
+}
+}
+
+output {
+elasticsearch {
+hosts => "elasticsearch"
+index => "nsg-flow-logs-%{+xxxx.ww}"
 }
 }
 ```
+The configuration documentation is in the first 100 lines of the code
+[GITHUB/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb)

 For WAD IIS and App Services the HTTP AccessLogs can be retrieved from a storage account as line based events and parsed through GROK. The date stamp can also be parsed with %{TIMESTAMP_ISO8601:log_timestamp}. For WAD IIS logfiles the container is wad-iis-logfiles. In the future grokking may happen already by the plugin.
 ```
@@ -176,7 +207,7 @@ filter {
 remove_field => ["subresponse"]
 remove_field => ["username"]
 remove_field => ["clientPort"]
-remove_field => ["port"]
+remove_field => ["port"]
 remove_field => ["timestamp"]
 }
 }
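Steps 5–7 in the rewritten README above describe interval-based scheduling: process the worklist, save the registry, then sleep away whatever is left of the interval, or continue immediately when processing overran it. A simplified, standalone Ruby sketch of that loop; the interval value and the two method bodies are placeholders, not the plugin's real implementation:

```ruby
INTERVAL = 5 # seconds, illustrative only

def list_and_process_blobs
  sleep(rand(0..3)) # stand-in for listing the container and pushing events to the queue
end

def save_registry
  puts "registry saved at #{Time.now}"
end

2.times do
  start = Time.now.to_i
  list_and_process_blobs
  save_registry
  elapsed = Time.now.to_i - start
  # If there is time left, sleep to complete the interval; if processing took
  # longer than the interval, continue straight into the next listing pass.
  sleep(INTERVAL - elapsed) if elapsed < INTERVAL
end
```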
data/lib/logstash/inputs/azure_blob_storage.rb CHANGED
@@ -25,6 +25,9 @@ config :storageaccount, :validate => :string, :required => false
 # DNS Suffix other then blob.core.windows.net
 config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'

+# For development this can be used to emulate an accountstorage when not available from azure
+#config :use_development_storage, :validate => :boolean, :required => false
+
 # The (primary or secondary) Access Key for the the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
 config :access_key, :validate => :password, :required => false

@@ -58,6 +61,9 @@ config :registry_create_policy, :validate => ['resume','start_over','start_fresh
 # Z00000000000000000000000000000000 2 ]}
 config :interval, :validate => :number, :default => 60

+# add the filename into the events
+config :addfilename, :validate => :boolean, :default => false, :required => false
+
 # debug_until will for a maximum amount of processed messages shows 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
 config :debug_until, :validate => :number, :default => 0, :required => false

@@ -67,6 +73,9 @@ config :debug_timer, :validate => :boolean, :default => false, :required => fals
 # WAD IIS Grok Pattern
 #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'

+# skip learning if you use json and don't want to learn the head and tail, but use either the defaults or configure them.
+config :skip_learning, :validate => :boolean, :default => false, :required => false
+
 # The string that starts the JSON. Only needed when the codec is JSON. When partial file are read, the result will not be valid JSON unless the start and end are put back. the file_head and file_tail are learned at startup, by reading the first file in the blob_list and taking the first and last block, this would work for blobs that are appended like nsgflowlogs. The configuration can be set to override the learning. In case learning fails and the option is not set, the default is to use the 'records' as set by nsgflowlogs.
 config :file_head, :validate => :string, :required => false, :default => '{"records":['
 # The string that ends the JSON
@@ -109,30 +118,7 @@ def run(queue)
 @processed = 0
 @regsaved = @processed

-
-# 1. storageaccount / sas_token
-# 2. connection_string
-# 3. storageaccount / access_key
-
-unless connection_string.nil?
-conn = connection_string.value
-end
-unless sas_token.nil?
-unless sas_token.value.start_with?('?')
-conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
-else
-conn = sas_token.value
-end
-end
-unless conn.nil?
-@blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
-else
-@blob_client = Azure::Storage::Blob::BlobService.create(
-storage_account_name: storageaccount,
-storage_dns_suffix: dns_suffix,
-storage_access_key: access_key.value,
-)
-end
+connect

 @registry = Hash.new
 if registry_create_policy == "resume"
@@ -167,7 +153,7 @@ def run(queue)
 if registry_create_policy == "start_fresh"
 @registry = list_blobs(true)
 save_registry(@registry)
-@logger.info("starting fresh,
+@logger.info("starting fresh, writing a clean registry to contain #{@registry.size} blobs/files")
 end

 @is_json = false
@@ -180,12 +166,14 @@ def run(queue)
 @tail = ''
 # if codec=json sniff one files blocks A and Z to learn file_head and file_tail
 if @is_json
-learn_encapsulation
 if file_head
-
+@head = file_head
 end
 if file_tail
-
+@tail = file_tail
+end
+if file_head and file_tail and !skip_learning
+learn_encapsulation
 end
 @logger.info("head will be: #{@head} and tail is set to #{@tail}")
 end
@@ -223,33 +211,55 @@ def run(queue)
 newreg.store(name, { :offset => off, :length => file[:length] })
 if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
 end
+# size nilClass when the list doesn't grow?!
 # Worklist is the subset of files where the already read offset is smaller than the file size
-
+@registry = newreg
+worklist.clear
+chunk = nil
+
 worklist = newreg.select {|name,file| file[:offset] < file[:length]}
 if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end

 # Start of processing
 # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
-worklist.
+if (worklist.size > 0) then
+worklist.each do |name, file|
 start = Time.now.to_i
 if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
 size = 0
 if file[:offset] == 0
-
-
+# This is where Sera4000 issue starts
+# For an append blob, reading full and crashing, retry, last_modified? ... lenght? ... committed? ...
+# length and skip reg value
+if (file[:length] > 0)
+begin
+chunk = full_read(name)
+size=chunk.size
+rescue Exception => e
+@logger.error("Failed to read #{name} because of: #{e.message} .. will continue, set file as read and pretend this never happened")
+@logger.error("#{size} size and #{file[:length]} file length")
+size = file[:length]
+end
+else
+@logger.info("found a zero size file #{name}")
+chunk = nil
+end
 else
 chunk = partial_read_json(name, file[:offset], file[:length])
 @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
 end
 if logtype == "nsgflowlog" && @is_json
+# skip empty chunks
+unless chunk.nil?
 res = resource(name)
 begin
 fingjson = JSON.parse(chunk)
-@processed += nsgflowlog(queue, fingjson)
+@processed += nsgflowlog(queue, fingjson, name)
 @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
 rescue JSON::ParserError
 @logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
 end
+end
 # TODO: Convert this to line based grokking.
 # TODO: ECS Compliance?
 elsif logtype == "wadiis" && !@is_json
@@ -257,13 +267,17 @@ def run(queue)
 else
 counter = 0
 begin
-
+@codec.decode(chunk) do |event|
 counter += 1
+if @addfilename
+event.set('filename', name)
+end
 decorate(event)
 queue << event
 end
 rescue Exception => e
 @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+@registry.store(name, { :offset => file[:length], :length => file[:length] })
 @logger.debug("#{chunk}")
 end
 @processed += counter
@@ -279,6 +293,7 @@ def run(queue)
 if ((Time.now.to_i - @last) > @interval)
 save_registry(@registry)
 end
+end
 end
 # The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
 now = Time.now.to_i
@@ -302,8 +317,54 @@ end


 private
+def connect
+# Try in this order to access the storageaccount
+# 1. storageaccount / sas_token
+# 2. connection_string
+# 3. storageaccount / access_key
+
+unless connection_string.nil?
+conn = connection_string.value
+end
+unless sas_token.nil?
+unless sas_token.value.start_with?('?')
+conn = "BlobEndpoint=https://#{storageaccount}.#{dns_suffix};SharedAccessSignature=#{sas_token.value}"
+else
+conn = sas_token.value
+end
+end
+unless conn.nil?
+@blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
+else
+# unless use_development_storage?
+@blob_client = Azure::Storage::Blob::BlobService.create(
+storage_account_name: storageaccount,
+storage_dns_suffix: dns_suffix,
+storage_access_key: access_key.value,
+)
+# else
+# @logger.info("not yet implemented")
+# end
+end
+end
+
 def full_read(filename)
-
+tries ||= 2
+begin
+return @blob_client.get_blob(container, filename)[1]
+rescue Exception => e
+@logger.error("caught: #{e.message} for full_read")
+if (tries -= 1) > 0
+if e.message = "Connection reset by peer"
+connect
+end
+retry
+end
+end
+begin
+chuck = @blob_client.get_blob(container, filename)[1]
+end
+return chuck
 end

 def partial_read_json(filename, offset, length)
@@ -326,8 +387,7 @@ def strip_comma(str)
 end


-
-def nsgflowlog(queue, json)
+def nsgflowlog(queue, json, name)
 count=0
 json["records"].each do |record|
 res = resource(record["resourceId"])
@@ -340,9 +400,16 @@ def nsgflowlog(queue, json)
 tups = tup.split(',')
 ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
 if (record["properties"]["Version"]==2)
+tups[9] = 0 if tups[9].nil?
+tups[10] = 0 if tups[10].nil?
+tups[11] = 0 if tups[11].nil?
+tups[12] = 0 if tups[12].nil?
 ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
 end
 @logger.trace(ev.to_s)
+if @addfilename
+ev.merge!( {:filename => name } )
+end
 event = LogStash::Event.new('message' => ev.to_json)
 decorate(event)
 queue << event
@@ -429,10 +496,10 @@ def save_registry(filelist)
 @busy_writing_registry = true
 unless (@registry_local_path)
 @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
-@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
+@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to remote registry #{registry_path}")
 else
 File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
-@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_local_path+"/"+@pipe_id}")
+@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to local registry #{registry_local_path+"/"+@pipe_id}")
 end
 @busy_writing_registry = false
 @last = Time.now.to_i
@@ -446,21 +513,35 @@ def save_registry(filelist)
 end
 end

+
 def learn_encapsulation
+@logger.info("learn_encapsulation, this can be skipped by setting skip_learning => true. Or set both head_file and tail_file")
 # From one file, read first block and last block to learn head and tail
-
-
-
-
-
-
-
-
-
-
-
-
-
+begin
+blobs = @blob_client.list_blobs(container, { max_results: 3, prefix: @prefix})
+blobs.each do |blob|
+unless blob.name == registry_path
+begin
+blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
+if blocks.first.name.start_with?('A00')
+@logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
+@head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
+end
+if blocks.last.name.start_with?('Z00')
+@logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
+length = blob.properties[:content_length].to_i
+offset = length - blocks.last.size
+@tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
+@logger.debug("learned tail: #{@tail}")
+end
+rescue Exception => e
+@logger.info("learn json one of the attempts failed #{e.message}")
+end
+end
+end
+rescue Exception => e
+@logger.info("learn json header and footer failed because #{e.message}")
+end
 end

 def resource(str)
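The new full_read above wraps get_blob in a retry that reconnects when the error message is "Connection reset by peer" (the diff uses `=` where a comparison `==` appears to be intended, and leaves a second get_blob call after the begin/rescue). A cleaned-up, standalone sketch of that retry-with-reconnect pattern; FakeBlobClient is a stand-in for Azure::Storage::Blob::BlobService, not the real client:

```ruby
class FakeBlobClient
  @@calls = 0
  # Mimics BlobService#get_blob: returns [properties, content]; the first call
  # across all instances fails, to simulate a dropped connection.
  def get_blob(container, filename)
    @@calls += 1
    raise IOError, "Connection reset by peer" if @@calls == 1
    [{ name: filename }, "content of #{container}/#{filename}"]
  end
end

def connect
  @blob_client = FakeBlobClient.new
end

def full_read(container, filename)
  tries ||= 2
  begin
    @blob_client.get_blob(container, filename)[1]
  rescue StandardError => e
    warn "caught: #{e.message} for full_read"
    if (tries -= 1) > 0
      connect if e.message == "Connection reset by peer" # rebuild the client, then retry once
      retry
    end
    raise
  end
end

connect
puts full_read("insights-logs-networksecuritygroupflowevent", "PT1H.json")
# stderr: caught: Connection reset by peer for full_read
# stdout: content of insights-logs-networksecuritygroupflowevent/PT1H.json
```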
data/logstash-input-azure_blob_storage.gemspec CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |s|
 s.name = 'logstash-input-azure_blob_storage'
-s.version = '0.
+s.version = '0.12.0'
 s.licenses = ['Apache-2.0']
 s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
 s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
 # Gem dependencies
 s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
 s.add_runtime_dependency 'stud', '~> 0.0.23'
-s.add_runtime_dependency 'azure-storage-blob', '~>
-s.add_development_dependency 'logstash-devutils', '~>
+s.add_runtime_dependency 'azure-storage-blob', '~> 2', '>= 2.0.3'
+#s.add_development_dependency 'logstash-devutils', '~> 2'
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-azure_blob_storage
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.12.0
 platform: ruby
 authors:
 - Jan Geertsma
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-12-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
     - !ruby/object:Gem::Version
       version: '2.1'
   name: logstash-core-plugin-api
-  type: :runtime
   prerelease: false
+  type: :runtime
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
@@ -31,8 +31,8 @@ dependencies:
     - !ruby/object:Gem::Version
      version: 0.0.23
   name: stud
-  type: :runtime
   prerelease: false
+  type: :runtime
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
@@ -43,35 +43,21 @@ dependencies:
     requirements:
     - - "~>"
      - !ruby/object:Gem::Version
-        version: '
+        version: '2'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 2.0.3
   name: azure-storage-blob
-  type: :runtime
   prerelease: false
+  type: :runtime
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
      - !ruby/object:Gem::Version
-        version: '
-- !ruby/object:Gem::Dependency
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: 1.0.0
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.0'
-  name: logstash-devutils
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
+        version: '2'
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.0'
+        version: 2.0.3
 description: " This gem is a Logstash plugin. It reads and parses data from Azure\
   \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
   \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\
@@ -112,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: This logstash plugin reads and parses data from Azure Storage Blobs.