logstash-input-azure_blob_storage 0.11.0 → 0.11.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +40 -9
- data/README.md +53 -10
- data/lib/logstash/inputs/azure_blob_storage.rb +200 -83
- data/logstash-input-azure_blob_storage.gemspec +3 -3
- metadata +7 -28
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3d446aed971a95e6e17a27ed1e9ec8b141f939b53697fb9c332cfb130404745a
+data.tar.gz: 4a1321f6c6a30f6787d2133642ca23840371d6f4e18102cb775d345b09eb176a
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: b4f48a0bebcd6e3594584a4473b223838359d44e9ef591f958aa4c80c4c22953f6b0f708b19faeaf0517c66f47185bda4de75ab4e3618b23e2e7f23f71cb4bee
+data.tar.gz: 508cd39ea159a4655e590f46ad0108c3b6e6de95ed575c4456da0230bae73fb384ecb7697ed710e7afb1542fe01cbd8a62130acedcbf0ba9c3040ace1f9d76d0
data/CHANGELOG.md
CHANGED
@@ -1,24 +1,55 @@
+## 0.11.5
+- Added optional filename into the message
+- plumbing for emulator, start_over not learning from registry
+
+## 0.11.4
+- fixed listing 3 times, rather than retrying to list max 3 times
+- added option to migrate/save to using local registry
+- rewrote interval timing
+- reduced saving of registry to maximum once per interval, protect against duplicate simultaneous writes
+- added debug_timer for better tracing how long operations take
+- removing pipeline name from logfiles, logstash 7.6 and up have this in the log4j2 by default now
+- moved initialization from register to run. should make logs more readable
+
+## 0.11.3
+- don't crash on failed codec, e.g. gzip_lines could sometimes have a corrupted file?
+- fix nextmarker loop so that more than 5000 files (or 15000 if faraday doesn't crash) can be listed
+
+## 0.11.2
+- implemented path_filters to use path filtering like this **/*.log
+- implemented debug_until to debug only at the start of a pipeline until it processed enough messages
+
+## 0.11.1
+- copied changes from irnc fork (danke!)
+- fixed trying to load the registry, three times is the charm
+- logs are less chatty, changed info to debug
+
+## 0.11.0
+- implemented start_fresh to skip all previous logs and start monitoring new entries
+- fixed the timer, now properly sleep the interval and check again
+- work around for a Faraday Middleware v.s. Azure Storage Account bug in follow_redirect
+
 ## 0.10.6
-
+- fixed the rootcause of the checking the codec. Now compare the classname.
 
 ## 0.10.5
-
+- previous fix broke codec = "line"
 
 ## 0.10.4
-
+- fixed JSON parsing error for partial files because somehow (logstash 7?) @codec.is_a? doesn't work anymore
 
 ## 0.10.3
-
+- fixed issue-1 where iplookup configuration was removed, but still used
 - iplookup is now done by a separate plugin named logstash-filter-weblookup
 
 ## 0.10.2
 - moved iplookup to own plugin logstash-filter-lookup
 
 ## 0.10.1
-
-
-
+- implemented iplookup
+- fixed sas tokens (maybe)
+- introduced dns_suffix
 
 ## 0.10.0
-
-
+- plugin created with the logstash plugin generator
+- reimplemented logstash-input-azureblob with incompatible config and data/registry
data/README.md
CHANGED
@@ -6,7 +6,7 @@ It is fully free and fully open source. The license is Apache 2.0, meaning you a
 
 ## Documentation
 
-All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+All logstash plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
 
 ## Need Help?
 
@@ -15,15 +15,61 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
 ## Purpose
 This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or accesslogs from App Services.
 [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
-
-After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON partial files will be have the header and tail will be added. They can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format. use source => message in the filter {} block.
-
+This
 ## Installation
 This plugin can be installed through logstash-plugin
 ```
 logstash-plugin install logstash-input-azure_blob_storage
 ```
 
+## Minimal Configuration
+The minimum configuration required as input is storageaccount, access_key and container.
+
+```
+input {
+    azure_blob_storage {
+        storageaccount => "yourstorageaccountname"
+        access_key => "Ba5e64c0d3=="
+        container => "insights-logs-networksecuritygroupflowevent"
+    }
+}
+```
+
+## Additional Configuration
+The registry_create_policy is used when the pipeline is started to either resume from the last known unprocessed file, to start_fresh ignoring old files, or to start_over to process all the files from the beginning.
+
+interval defines the minimum time the registry should be saved to the registry file (by default 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+
+When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id
+
+With registry_create_policy set to resume and registry_local_path set to a directory where the registry isn't yet created, the registry is loaded from the storage account and saved on the local server.
+
+During the pipeline start for the JSON codec, the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually.
+
+## Running the pipeline
+The pipeline can be started in several ways.
+- On the commandline
+```
+/usr/share/logstash/bin/logstash -f /etc/logstash/pipeline.d/test.yml
+```
+- In the pipeline.yml
+```
+/etc/logstash/pipeline.yml
+pipe.id = test
+pipe.path = /etc/logstash/pipeline.d/test.yml
+```
+- As managed pipeline from Kibana
+
+Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To update a config file on a running instance on the commandline you can add the argument --config.reload.automatic, and if you modify the files that are in the pipeline.yml you can send a SIGHUP to reload the pipelines whose config was changed.
+[https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
+
+## Internal Working
+When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories of files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, partial files will have the header and tail added; they can be configured. If logtype is nsgflowlog, the plugin will do the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
+
+By default the root of the json message is named "message", so you can modify the content in the filter block.
+
+The configuration and the rest of the code are in [lib/logstash/inputs](https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs) and [azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10)
+
 ## Enabling NSG Flowlogs
 1. Enable Network Watcher in your regions
 2. Create Storage account per region
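The registry and interval options described in the Additional Configuration section above can be combined in a single input block. The following is only a sketch under assumed values; the storage account name, access key and local path are placeholders, not values from this release:

```
input {
  azure_blob_storage {
    storageaccount => "examplestorageaccount"       # placeholder
    access_key => "base64accountkey=="              # placeholder
    container => "insights-logs-networksecuritygroupflowevent"
    registry_create_policy => "resume"              # or start_over / start_fresh
    registry_path => "data/registry.dat"            # registry blob inside the container (default)
    registry_local_path => "/var/lib/logstash"      # keep the registry on the logstash server instead
    interval => 60                                  # save the registry at most once per interval
  }
}
```

With registry_local_path set, the registry is written to the local directory instead of back to the container, so the pipeline no longer needs write access to the storage account for bookkeeping.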
@@ -39,7 +85,6 @@ logstash-plugin install logstash-input-azure_blob_storage
 - Access key (key1 or key2)
 
 ## Troubleshooting
-
 The default loglevel can be changed in global logstash.yml. On the info level, the plugin saves offsets to the registry every interval and will log statistics of processed events (one ); the plugin will print for each pipeline the first 6 characters of the ID; in DEBUG the yml log level debug shows details of number of events per (partial) files that are read.
 ```
 log.level
@@ -50,10 +95,9 @@ The log level of the plugin can be put into DEBUG through
 curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
 ```
 
+Because debug also makes logstash chatty, there are also debug_timer and debug_until that can be used to print additional information on what the pipeline is doing and how long it takes. debug_until is the number of events until debug is disabled.
 
-## Configuration Examples
-The minimum configuration required as input is storageaccount, access_key and container.
-
+## Other Configuration Examples
 For nsgflowlogs, a simple configuration looks like this
 ```
 input {
@@ -85,7 +129,6 @@ output {
 }
 ```
 
-It's possible to specify the optional parameters to overwrite the defaults. The iplookup, use_redis and iplist parameters are used for additional information about the source and destination ip address. Redis can be used for caching the results and iplist is to configure an array of ip addresses.
 ```
 input {
 azure_blob_storage {
@@ -138,7 +181,7 @@ filter {
 remove_field => ["subresponse"]
 remove_field => ["username"]
 remove_field => ["clientPort"]
-remove_field => ["port"]
+remove_field => ["port"]
 remove_field => ["timestamp"]
 }
 }
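The README's Internal Working section notes that each event carries its JSON document in the "message" field. A filter sketch that expands it into top-level fields; the json filter is standard Logstash, and the port field names follow the nsgflowlog tuple keys used by this plugin:

```
filter {
  # parse the JSON document that the input placed in "message"
  json {
    source => "message"
  }
  # the flow tuple fields arrive as strings; convert the ports to integers
  mutate {
    convert => {
      "src_port" => "integer"
      "dst_port" => "integer"
    }
  }
}
```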
data/lib/logstash/inputs/azure_blob_storage.rb
CHANGED
@@ -25,6 +25,9 @@ config :storageaccount, :validate => :string, :required => false
 # DNS Suffix other then blob.core.windows.net
 config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'
 
+# For development this can be used to emulate an accountstorage when not available from azure
+#config :use_development_storage, :validate => :boolean, :required => false
+
 # The (primary or secondary) Access Key for the the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
 config :access_key, :validate => :password, :required => false
 
@@ -39,6 +42,9 @@ config :container, :validate => :string, :default => 'insights-logs-networksecur
 # The default, `data/registry`, it contains a Ruby Marshal Serialized Hash of the filename the offset read sofar and the filelength the list time a filelisting was done.
 config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'
 
+# If registry_local_path is set to a directory on the local server, the registry is save there instead of the remote blob_storage
+config :registry_local_path, :validate => :string, :required => false
+
 # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
 # When set to `start_over`, all log files are processed from begining.
 # when set to `start_fresh`, it will read log files that are created or appended since this start of the pipeline.
@@ -55,6 +61,13 @@ config :registry_create_policy, :validate => ['resume','start_over','start_fresh
 # Z00000000000000000000000000000000 2 ]}
 config :interval, :validate => :number, :default => 60
 
+config :addfilename, :validate => :boolean, :default => false, :required => false
+# debug_until will for a maximum amount of processed messages shows 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
+config :debug_until, :validate => :number, :default => 0, :required => false
+
+# debug_timer show time spent on activities
+config :debug_timer, :validate => :boolean, :default => false, :required => false
+
 # WAD IIS Grok Pattern
 #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'
 
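The hunk above introduces the new addfilename, debug_until and debug_timer options. A minimal input sketch that switches them on; the storage account name and key are placeholders:

```
input {
  azure_blob_storage {
    storageaccount => "examplestorageaccount"   # placeholder
    access_key => "base64accountkey=="          # placeholder
    container => "insights-logs-networksecuritygroupflowevent"
    addfilename => true    # add the blob name to every event
    debug_until => 100     # print the numbered trace lines for the first 100 processed events
    debug_timer => true    # log how long listing and registry saving take
  }
}
```

debug_until counts processed events, so it is a cheap way to watch a new pipeline start up without raising the global log level to debug or trace.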
@@ -76,23 +89,30 @@ config :file_tail, :validate => :string, :required => false, :default => ']}'
 # For NSGFLOWLOGS a path starts with "resourceId=/", but this would only be needed to exclude other files that may be written in the same container.
 config :prefix, :validate => :string, :required => false
 
+config :path_filters, :validate => :array, :default => ['**/*'], :required => false
+
+# TODO: Other feature requests
+# show file path in logger
+# add filepath as part of log message
+# option to keep registry on local disk
 
 
 public
 def register
 @pipe_id = Thread.current[:name].split("[").last.split("]").first
-@logger.info("=== "+config_name
-#@logger.info("ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } / #{Gem.loaded_specs[config_name].version.to_s}")
+@logger.info("=== #{config_name} #{Gem.loaded_specs["logstash-input-"+config_name].version.to_s} / #{@pipe_id} / #{@id[0,6]} / ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } ===")
 @logger.info("If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage")
 # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
 # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
+end
+
 
+
+def run(queue)
 # counter for all processed events since the start of this pipeline
 @processed = 0
 @regsaved = @processed
 
-#@buffer = FileWatch::BufferedTokenizer.new('\n')
-
 # Try in this order to access the storageaccount
 # 1. storageaccount / sas_token
 # 2. connection_string
@@ -111,31 +131,51 @@ def register
 unless conn.nil?
 @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
 else
+# unless use_development_storage?
 @blob_client = Azure::Storage::Blob::BlobService.create(
 storage_account_name: storageaccount,
 storage_dns_suffix: dns_suffix,
 storage_access_key: access_key.value,
 )
+# else
+# @logger.info("not yet implemented")
+# end
 end
 
 @registry = Hash.new
 if registry_create_policy == "resume"
-
-
-
-
-
-
-
+for counter in 1..3
+begin
+if (!@registry_local_path.nil?)
+unless File.file?(@registry_local_path+"/"+@pipe_id)
+@registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+#[0] headers [1] responsebody
+@logger.info("migrating from remote registry #{registry_path}")
+else
+if !Dir.exist?(@registry_local_path)
+FileUtils.mkdir_p(@registry_local_path)
+end
+@registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
+@logger.info("resuming from local registry #{registry_local_path+"/"+@pipe_id}")
+end
+else
+@registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+#[0] headers [1] responsebody
+@logger.info("resuming from remote registry #{registry_path}")
+end
+break
+rescue Exception => e
+@logger.error("caught: #{e.message}")
+@registry.clear
+@logger.error("loading registry failed for attempt #{counter} of 3")
+end
 end
 end
 # read filelist and set offsets to file length to mark all the old files as done
 if registry_create_policy == "start_fresh"
-@logger.info(@pipe_id+" starting fresh")
 @registry = list_blobs(true)
-
-
-#end
+save_registry(@registry)
+@logger.info("starting fresh, writing a clean the registry to contain #{@registry.size} blobs/files")
 end
 
 @is_json = false
@@ -155,27 +195,32 @@ def register
 if file_tail
 @tail = file_tail
 end
-@logger.info(
+@logger.info("head will be: #{@head} and tail is set to #{@tail}")
 end
-end # def register
-
 
-
-def run(queue)
 newreg = Hash.new
 filelist = Hash.new
 worklist = Hash.new
-
+@last = start = Time.now.to_i
+
+# This is the main loop, it
+# 1. Lists all the files in the remote storage account that match the path prefix
+# 2. Filters on path_filters to only include files that match the directory and file glob (**/*.json)
+# 3. Save the listed files in a registry of known files and filesizes.
+# 4. List all the files again and compare the registry with the new filelist and put the delta in a worklist
+# 5. Process the worklist and put all events in the logstash queue.
+# 6. if there is time left, sleep to complete the interval. If processing takes more than an inteval, save the registry and continue.
+# 7. If stop signal comes, finish the current file, save the registry and quit
 while !stop?
-
-#
-# TODO: sort by timestamp
+# load the registry, compare it's offsets to file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for next loop,
+# TODO: sort by timestamp ?
 #filelist.sort_by(|k,v|resource(k)[:date])
 worklist.clear
 filelist.clear
 newreg.clear
+
+# Listing all the files
 filelist = list_blobs(false)
-# registry.merge(filelist) {|key, :offset, :length| :offset.merge :length }
 filelist.each do |name, file|
 off = 0
 begin
@@ -184,31 +229,41 @@ def run(queue)
 off = 0
 end
 newreg.store(name, { :offset => off, :length => file[:length] })
+if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
 end
-
+# size nilClass when the list doesn't grow?!
 # Worklist is the subset of files where the already read offset is smaller than the file size
 worklist.clear
 worklist = newreg.select {|name,file| file[:offset] < file[:length]}
-
-
-
-
+if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end
+
+# Start of processing
+# This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
+if (worklist.size > 0) then
+worklist.each do |name, file|
+start = Time.now.to_i
+if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
 size = 0
 if file[:offset] == 0
-
-
+# This is where Sera4000 issue starts
+begin
+chunk = full_read(name)
+size=chunk.size
+rescue Exception => e
+@logger.error("Failed to read #{name} because of: #{e.message} .. will continue and pretend this never happened")
+end
 else
 chunk = partial_read_json(name, file[:offset], file[:length])
-@logger.debug(
+@logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
 end
 if logtype == "nsgflowlog" && @is_json
 res = resource(name)
 begin
 fingjson = JSON.parse(chunk)
-@processed += nsgflowlog(queue, fingjson)
-@logger.debug(
+@processed += nsgflowlog(queue, fingjson, name)
+@logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
 rescue JSON::ParserError
-@logger.error(
+@logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
 end
 # TODO: Convert this to line based grokking.
 # TODO: ECS Compliance?
@@ -216,29 +271,43 @@ def run(queue)
 @processed += wadiislog(queue, name)
 else
 counter = 0
-
+begin
+@codec.decode(chunk) do |event|
 counter += 1
+if @addfilename
+event.set('filename', name)
+end
 decorate(event)
 queue << event
+end
+rescue Exception => e
+@logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+@logger.debug("#{chunk}")
 end
 @processed += counter
 end
 @registry.store(name, { :offset => size, :length => file[:length] })
-
+# TODO add input plugin option to prevent connection cache
+@blob_client.client.reset_agents!
+#@logger.info("name #{name} size #{size} len #{file[:length]}")
 # if stop? good moment to stop what we're doing
 if stop?
 return
 end
-
-now = Time.now.to_i
-if ((now - chrono) > interval)
+if ((Time.now.to_i - @last) > @interval)
 save_registry(@registry)
-chrono += interval
 end
+end
+end
+# The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
+now = Time.now.to_i
+if ((now - @last) > @interval)
+save_registry(@registry)
+end
+sleeptime = interval - ((now - start) % interval)
+if @debug_timer
+@logger.info("going to sleep for #{sleeptime} seconds")
 end
-# Save the registry and sleep until the remaining polling interval is over
-save_registry(@registry)
-sleeptime = interval - (Time.now.to_i - chrono)
 Stud.stoppable_sleep(sleeptime) { stop? }
 end
 end
@@ -246,7 +315,9 @@ end
 def stop
 save_registry(@registry)
 end
-
+def close
+save_registry(@registry)
+end
 
 
 private
@@ -274,8 +345,7 @@ def strip_comma(str)
 end
 
 
-
-def nsgflowlog(queue, json)
+def nsgflowlog(queue, json, name)
 count=0
 json["records"].each do |record|
 res = resource(record["resourceId"])
@@ -288,9 +358,16 @@ def nsgflowlog(queue, json)
 tups = tup.split(',')
 ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
 if (record["properties"]["Version"]==2)
+tups[9] = 0 if tups[9].nil?
+tups[10] = 0 if tups[10].nil?
+tups[11] = 0 if tups[11].nil?
+tups[12] = 0 if tups[12].nil?
 ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
 end
 @logger.trace(ev.to_s)
+if @addfilename
+ev.merge!( {:filename => name } )
+end
 event = LogStash::Event.new('message' => ev.to_json)
 decorate(event)
 queue << event
@@ -321,67 +398,107 @@ end
 # list all blobs in the blobstore, set the offsets from the registry and return the filelist
 # inspired by: https://github.com/Azure-Samples/storage-blobs-ruby-quickstart/blob/master/example.rb
 def list_blobs(fill)
-
-
-
-
-
-
-
-
+tries ||= 3
+begin
+return try_list_blobs(fill)
+rescue Exception => e
+@logger.error("caught: #{e.message} for list_blobs retries left #{tries}")
+if (tries -= 1) > 0
+retry
+end
+end
+end
+
+def try_list_blobs(fill)
+# inspired by: http://blog.mirthlab.com/2012/05/25/cleanly-retrying-blocks-of-code-after-an-exception-in-ruby/
+chrono = Time.now.to_i
+files = Hash.new
+nextMarker = nil
+counter = 1
+loop do
 blobs = @blob_client.list_blobs(container, { marker: nextMarker, prefix: @prefix})
 blobs.each do |blob|
-
-
+# FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
+# FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
+unless blob.name == registry_path
+if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
 length = blob.properties[:content_length].to_i
-
+offset = 0
 if fill
 offset = length
-
+end
 files.store(blob.name, { :offset => offset, :length => length })
+if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
 end
+end
 end
 nextMarker = blobs.continuation_token
 break unless nextMarker && !nextMarker.empty?
-
-
-
-
-
+if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
+counter+=1
+end
+if @debug_timer
+@logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
+end
 return files
 end
 
 # When events were processed after the last registry save, start a thread to update the registry file.
 def save_registry(filelist)
-#
+# Because of threading, processed values and regsaved are not thread safe, they can change as instance variable @! Most of the time this is fine because the registry is the last resort, but be careful about corner cases!
 unless @processed == @regsaved
 @regsaved = @processed
-
-
+unless (@busy_writing_registry)
+Thread.new {
 begin
-@
+@busy_writing_registry = true
+unless (@registry_local_path)
+@blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to remote registry #{registry_path}")
+else
+File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
+@logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to local registry #{registry_local_path+"/"+@pipe_id}")
+end
+@busy_writing_registry = false
+@last = Time.now.to_i
 rescue
-@logger.error(
+@logger.error("Oh my, registry write failed, do you have write access?")
 end
 }
+else
+@logger.info("Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!")
+end
 end
 end
 
+
 def learn_encapsulation
 # From one file, read first block and last block to learn head and tail
-
-
-
-
-
-
-
-
-
-
-
-
-
+begin
+blobs = @blob_client.list_blobs(container, { maxresults: 3, prefix: @prefix})
+blobs.each do |blob|
+unless blob.name == registry_path
+begin
+blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
+if blocks.first.name.start_with?('A00')
+@logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
+@head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
+end
+if blocks.last.name.start_with?('Z00')
+@logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
+length = blob.properties[:content_length].to_i
+offset = length - blocks.last.size
+@tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
+@logger.debug("learned tail: #{@tail}")
+end
+rescue Exception => e
+@logger.info("learn json one of the attempts failed #{e.message}")
+end
+end
+end
+rescue Exception => e
+@logger.info("learn json header and footer failed because #{e.message}")
+end
 end
 
 def resource(str)
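learn_encapsulation above derives the JSON head and tail from the first and last committed block of a sample blob; when that guess fails they can be set by hand. A sketch with the file_tail default shown in this diff and an assumed file_head value matching NSG flow logs (account name and key are placeholders):

```
input {
  azure_blob_storage {
    storageaccount => "examplestorageaccount"   # placeholder
    access_key => "base64accountkey=="          # placeholder
    container => "insights-logs-networksecuritygroupflowevent"
    codec => "json"
    # wrap partial reads so each chunk parses as a complete JSON document
    file_head => '{"records":['                 # assumed value for NSG flow logs
    file_tail => ']}'                           # default shown in this plugin's config
  }
}
```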
data/logstash-input-azure_blob_storage.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |s|
 s.name = 'logstash-input-azure_blob_storage'
-s.version = '0.11.
+s.version = '0.11.5'
 s.licenses = ['Apache-2.0']
 s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
 s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
 # Gem dependencies
 s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
 s.add_runtime_dependency 'stud', '~> 0.0.23'
-s.add_runtime_dependency 'azure-storage-blob', '~> 1.
-s.add_development_dependency 'logstash-devutils', '~>
+s.add_runtime_dependency 'azure-storage-blob', '~> 1.1'
+#s.add_development_dependency 'logstash-devutils', '~> 2'
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-azure_blob_storage
 version: !ruby/object:Gem::Version
-version: 0.11.
+version: 0.11.5
 platform: ruby
 authors:
 - Jan Geertsma
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-12-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
 - !ruby/object:Gem::Version
 version: '2.1'
 name: logstash-core-plugin-api
-prerelease: false
 type: :runtime
+prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
@@ -31,8 +31,8 @@ dependencies:
 - !ruby/object:Gem::Version
 version: 0.0.23
 name: stud
-prerelease: false
 type: :runtime
+prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
@@ -43,35 +43,15 @@ dependencies:
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: '1.
+version: '1.1'
 name: azure-storage-blob
-prerelease: false
 type: :runtime
-version_requirements: !ruby/object:Gem::Requirement
-requirements:
-- - "~>"
-- !ruby/object:Gem::Version
-version: '1.0'
-- !ruby/object:Gem::Dependency
-requirement: !ruby/object:Gem::Requirement
-requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: 1.0.0
-- - "~>"
-- !ruby/object:Gem::Version
-version: '1.0'
-name: logstash-devutils
 prerelease: false
-type: :development
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
-- - ">="
-- !ruby/object:Gem::Version
-version: 1.0.0
 - - "~>"
 - !ruby/object:Gem::Version
-version: '1.
+version: '1.1'
 description: " This gem is a Logstash plugin. It reads and parses data from Azure\
 \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
 \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\
@@ -112,8 +92,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-
-rubygems_version: 2.7.9
+rubygems_version: 3.0.6
 signing_key:
 specification_version: 4
 summary: This logstash plugin reads and parses data from Azure Storage Blobs.