logstash-input-azure_blob_storage 0.11.0 → 0.11.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cb93fc423babf6bc4cd7b13b1280a27ea6156fa2aeebe69ab172d8e925940d2c
- data.tar.gz: ae92a22e56d87cc9d4d0f0b615460a8da33afb822aaa3f0d109cd3b86137ee53
+ metadata.gz: 3d446aed971a95e6e17a27ed1e9ec8b141f939b53697fb9c332cfb130404745a
+ data.tar.gz: 4a1321f6c6a30f6787d2133642ca23840371d6f4e18102cb775d345b09eb176a
  SHA512:
- metadata.gz: f5a311f322b04740a98182271e3074171feea7b7e899e3ec712b457182d91f34033cc73b1219042bad727f0c6566b1c3cf0ce362e5cc9d2b1e4d09a2029d5456
- data.tar.gz: 53072a976feddc171ad02960fdcec6612099caaade967c28a501ba1ca413f1dccdc071e050e322eefa84a266b0c9a9ed487a6d98a36c62480e573421b2fc27b7
+ metadata.gz: b4f48a0bebcd6e3594584a4473b223838359d44e9ef591f958aa4c80c4c22953f6b0f708b19faeaf0517c66f47185bda4de75ab4e3618b23e2e7f23f71cb4bee
+ data.tar.gz: 508cd39ea159a4655e590f46ad0108c3b6e6de95ed575c4456da0230bae73fb384ecb7697ed710e7afb1542fe01cbd8a62130acedcbf0ba9c3040ace1f9d76d0
data/CHANGELOG.md CHANGED
@@ -1,24 +1,55 @@
+ ## 0.11.5
+ - Added optional filename into the message
+ - plumbing for emulator, start_over not learning from registry
+
+ ## 0.11.4
+ - fixed listing 3 times, rather than retrying to list max 3 times
+ - added option to migrate/save to using local registry
+ - rewrote interval timing
+ - reduced saving of registry to maximum once per interval, protect against duplicate simultaneous writes
+ - added debug_timer for better tracing of how long operations take
+ - removed pipeline name from logfiles, logstash 7.6 and up have this in the log4j2 by default now
+ - moved initialization from register to run, should make logs more readable
+
+ ## 0.11.3
+ - don't crash on failed codec, e.g. gzip_lines could sometimes have a corrupted file?
+ - fix nextmarker loop so that more than 5000 files can be listed (or 15000 if faraday doesn't crash)
+
+ ## 0.11.2
+ - implemented path_filters to use path filtering like this **/*.log
+ - implemented debug_until to debug only at the start of a pipeline until it has processed enough messages
+
+ ## 0.11.1
+ - copied changes from the irnc fork (danke!)
+ - fixed trying to load the registry, three times is the charm
+ - logs are less chatty, changed info to debug
+
+ ## 0.11.0
+ - implemented start_fresh to skip all previous logs and start monitoring new entries
+ - fixed the timer, now properly sleeps the interval and checks again
+ - workaround for a Faraday Middleware vs. Azure Storage Account bug in follow_redirect
+
  ## 0.10.6
- - Fixed the rootcause of the checking the codec. Now compare the classname.
+ - fixed the rootcause of the checking the codec. Now compare the classname.

  ## 0.10.5
- - Previous fix broke codec = "line"
+ - previous fix broke codec = "line"

  ## 0.10.4
- - Fixed JSON parsing error for partial files because somehow (logstash 7?) @codec.is_a? doesn't work anymore
+ - fixed JSON parsing error for partial files because somehow (logstash 7?) @codec.is_a? doesn't work anymore

  ## 0.10.3
- - Fixed issue-1 where iplookup confguration was removed, but still used
+ - fixed issue-1 where iplookup configuration was removed, but still used
  - iplookup is now done by a separate plugin named logstash-filter-weblookup

  ## 0.10.2
  - moved iplookup to own plugin logstash-filter-lookup

  ## 0.10.1
- - Implemented iplookup
- - Fixed sas tokens (maybe)
- - Introduced dns_suffix
+ - implemented iplookup
+ - fixed sas tokens (maybe)
+ - introduced dns_suffix

  ## 0.10.0
- - Plugin created with the logstash plugin generator
- - Reimplemented logstash-input-azureblob with incompatible config and data/registry
+ - plugin created with the logstash plugin generator
+ - reimplemented logstash-input-azureblob with incompatible config and data/registry
data/README.md CHANGED
@@ -6,7 +6,7 @@ It is fully free and fully open source. The license is Apache 2.0, meaning you a

  ## Documentation

- All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
+ All logstash plugin documentation is placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).

  ## Need Help?

@@ -15,15 +15,61 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
  ## Purpose
  This plugin can read from Azure Storage Blobs, for instance diagnostics logs for NSG flow logs or access logs from App Services.
  [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
-
- After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) are read and processed. After all files are processed and at least one interval has passed a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON partial files will be have the header and tail will be added. They can be configured. If logtype is nsgflowlog, the plugin will process the splitting into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format. use source => message in the filter {} block.
-
+ This
  ## Installation
  This plugin can be installed through logstash-plugin
  ```
  logstash-plugin install logstash-input-azure_blob_storage
  ```

+ ## Minimal Configuration
+ The minimum configuration required as input is storageaccount, access_key and container.
+
+ ```
+ input {
+     azure_blob_storage {
+         storageaccount => "yourstorageaccountname"
+         access_key => "Ba5e64c0d3=="
+         container => "insights-logs-networksecuritygroupflowevent"
+     }
+ }
+ ```
+
+ ## Additional Configuration
+ The registry_create_policy is used when the pipeline is started to either resume from the last known unprocessed file, start_fresh to ignore old files, or start_over to process all the files from the beginning.
+
+ interval defines the minimum time between saves of the registry to the registry file (by default 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
+
+ When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id.
+
+ With registry_create_policy set to resume and registry_local_path set to a directory where the registry hasn't been created yet, the registry is loaded from the storage account and then saved on the local server.
+
+ During the pipeline start, for the JSON codec, the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually.
+
+ ## Running the pipeline
+ The pipeline can be started in several ways.
+ - On the commandline
+ ```
+ /usr/share/logstash/bin/logstash -f /etc/logstash/pipeline.d/test.yml
+ ```
+ - In the pipelines.yml
+ ```
+ # /etc/logstash/pipelines.yml
+ - pipeline.id: test
+   path.config: "/etc/logstash/pipeline.d/test.yml"
+ ```
+ - As managed pipeline from Kibana
+
+ Logstash itself (so not specific to this plugin) has a feature where multiple instances can run on the same system. The default TCP port is 9600, but if it's already in use it will use 9601 (and up). To reload a changed config file on a running instance, start it with the argument --config.reload.automatic; if you modify files that are referenced in pipelines.yml, you can send a SIGHUP to reload the pipelines whose config was changed.
+ [https://www.elastic.co/guide/en/logstash/current/reloading-config.html](https://www.elastic.co/guide/en/logstash/current/reloading-config.html)
+
+ ## Internal Working
+ When the plugin is started, it will read all the filenames and sizes in the blob store, excluding the directories of files that are excluded by the "path_filters". After every interval it will write a registry to the storageaccount to save the information of how many bytes per blob (file) have been read and processed. After all files are processed and at least one interval has passed, a new file list is generated and a worklist is constructed that will be processed. When a file has already been processed before, partial files are read from the offset to the filesize at the time of the file listing. If the codec is JSON, the header and tail are added to partial files; they can be configured. If logtype is nsgflowlog, the plugin will split the json into individual tuple events. The logtype wadiis may in the future be used to process the grok formats to split into log lines. Any other format is fed into the queue as one event per file or partial file. It's then up to the filter to split and mutate the file format.
+
+ By default the root of the json message is named "message", so you can modify the content in the filter block.
+
+ The configurations and the rest of the code are in [lib/logstash/inputs](https://github.com/janmg/logstash-input-azure_blob_storage/tree/master/lib/logstash/inputs) and [azure_blob_storage.rb](https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L10)
+
  ## Enabling NSG Flowlogs
  1. Enable Network Watcher in your regions
  2. Create Storage account per region
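Editor's note: as a minimal sketch of how the options described in the "Additional Configuration" section above can be combined, the block below uses the option names declared in this release (registry_create_policy, registry_local_path, interval); the storage account name, access key, container and local registry directory are illustrative placeholder values, not defaults.
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        registry_create_policy => "resume"
        registry_local_path => "/var/lib/logstash/registry"
        interval => 60
    }
}
```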
@@ -39,7 +85,6 @@ logstash-plugin install logstash-input-azure_blob_storage
  - Access key (key1 or key2)

  ## Troubleshooting
-
  The default loglevel can be changed in the global logstash.yml. On the info level, the plugin saves offsets to the registry every interval and logs statistics of the processed events; for each pipeline it prints the first 6 characters of the pipeline ID. On the debug level, the log shows details of the number of events per (partial) file that are read.
  ```
  log.level
@@ -50,10 +95,9 @@ The log level of the plugin can be put into DEBUG through
  curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.inputs.azureblobstorage" : "DEBUG"}'
  ```

+ Because debug also makes logstash chatty, there are also debug_timer and debug_until, which can be used to print additional information about what the pipeline is doing and how long it takes. debug_until sets the number of events after which this extra logging is disabled.

- ## Configuration Examples
- The minimum configuration required as input is storageaccount, access_key and container.
-
+ ## Other Configuration Examples
  For nsgflowlogs, a simple configuration looks like this
  ```
  input {
@@ -85,7 +129,6 @@ output {
  }
  ```

- It's possible to specify the optional parameters to overwrite the defaults. The iplookup, use_redis and iplist parameters are used for additional information about the source and destination ip address. Redis can be used for caching the results and iplist is to configure an array of ip addresses.
  ```
  input {
  azure_blob_storage {
@@ -138,7 +181,7 @@ filter {
  remove_field => ["subresponse"]
  remove_field => ["username"]
  remove_field => ["clientPort"]
- remove_field => ["port"]
+ remove_field => ["port"]
  remove_field => ["timestamp"]
  }
  }
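Editor's note: the debug_until, debug_timer and addfilename options introduced in the 0.11.x releases (see the Troubleshooting section above and the config declarations in the plugin source below) can be enabled per input as a lightweight alternative to raising the log level. A sketch with illustrative values:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        debug_until => 100
        debug_timer => true
        addfilename => true
    }
}
```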
data/lib/logstash/inputs/azure_blob_storage.rb CHANGED
@@ -25,6 +25,9 @@ config :storageaccount, :validate => :string, :required => false
  # DNS Suffix other than blob.core.windows.net
  config :dns_suffix, :validate => :string, :required => false, :default => 'core.windows.net'

+ # For development this can be used to emulate an accountstorage when not available from azure
+ #config :use_development_storage, :validate => :boolean, :required => false
+
  # The (primary or secondary) Access Key for the storage account. The key can be found in the portal.azure.com or through the azure api StorageAccounts/ListKeys. For example the PowerShell command Get-AzStorageAccountKey.
  config :access_key, :validate => :password, :required => false

@@ -39,6 +42,9 @@ config :container, :validate => :string, :default => 'insights-logs-networksecur
  # The default, `data/registry`, contains a Ruby Marshal serialized Hash of the filename, the offset read so far and the file length from the last time a file listing was done.
  config :registry_path, :validate => :string, :required => false, :default => 'data/registry.dat'

+ # If registry_local_path is set to a directory on the local server, the registry is saved there instead of the remote blob_storage
+ config :registry_local_path, :validate => :string, :required => false
+
  # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
  # When set to `start_over`, all log files are processed from the beginning.
  # When set to `start_fresh`, it will read log files that are created or appended since the start of the pipeline.
@@ -55,6 +61,13 @@ config :registry_create_policy, :validate => ['resume','start_over','start_fresh
  # Z00000000000000000000000000000000 2 ]}
  config :interval, :validate => :number, :default => 60

+ config :addfilename, :validate => :boolean, :default => false, :required => false
+ # debug_until will, for a maximum number of processed messages, show 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace
+ config :debug_until, :validate => :number, :default => 0, :required => false
+
+ # debug_timer shows the time spent on activities
+ config :debug_timer, :validate => :boolean, :default => false, :required => false
+
  # WAD IIS Grok Pattern
  #config :grokpattern, :validate => :string, :required => false, :default => '%{TIMESTAMP_ISO8601:log_timestamp} %{NOTSPACE:instanceId} %{NOTSPACE:instanceId2} %{IPORHOST:ServerIP} %{WORD:httpMethod} %{URIPATH:requestUri} %{NOTSPACE:requestQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:httpVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:host} %{NUMBER:httpStatus} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:sentBytes:int} %{NUMBER:receivedBytes:int} %{NUMBER:timeTaken:int}'

@@ -76,23 +89,30 @@ config :file_tail, :validate => :string, :required => false, :default => ']}'
  # For NSGFLOWLOGS a path starts with "resourceId=/", but this would only be needed to exclude other files that may be written in the same container.
  config :prefix, :validate => :string, :required => false

+ config :path_filters, :validate => :array, :default => ['**/*'], :required => false
+
+ # TODO: Other feature requests
+ # show file path in logger
+ # add filepath as part of log message
+ # option to keep registry on local disk


  public
  def register
  @pipe_id = Thread.current[:name].split("[").last.split("]").first
- @logger.info("=== "+config_name+" / "+@pipe_id+" / "+@id[0,6]+" ===")
- #@logger.info("ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } / #{Gem.loaded_specs[config_name].version.to_s}")
+ @logger.info("=== #{config_name} #{Gem.loaded_specs["logstash-input-"+config_name].version.to_s} / #{@pipe_id} / #{@id[0,6]} / ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL } ===")
  @logger.info("If this plugin doesn't work, please raise an issue in https://github.com/janmg/logstash-input-azure_blob_storage")
  # TODO: consider multiple readers, so add pipeline @id or use logstash-to-logstash communication?
  # TODO: Implement retry ... Error: Connection refused - Failed to open TCP connection to
+ end
+

+
+ def run(queue)
  # counter for all processed events since the start of this pipeline
  @processed = 0
  @regsaved = @processed

- #@buffer = FileWatch::BufferedTokenizer.new('\n')
-
  # Try in this order to access the storageaccount
  # 1. storageaccount / sas_token
  # 2. connection_string
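Editor's note: the path_filters and prefix options added above are applied when listing blobs (the File.fnmatch? call in list_blobs further down in this diff). A sketch restricting processing to JSON files under the NSG flow log prefix; both values are illustrative, not defaults:
```
input {
    azure_blob_storage {
        storageaccount => "yourstorageaccountname"
        access_key => "Ba5e64c0d3=="
        container => "insights-logs-networksecuritygroupflowevent"
        prefix => "resourceId=/"
        path_filters => ["**/*.json"]
    }
}
```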
@@ -111,31 +131,51 @@ def register
  unless conn.nil?
  @blob_client = Azure::Storage::Blob::BlobService.create_from_connection_string(conn)
  else
+ # unless use_development_storage?
  @blob_client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: storageaccount,
  storage_dns_suffix: dns_suffix,
  storage_access_key: access_key.value,
  )
+ # else
+ # @logger.info("not yet implemented")
+ # end
  end

  @registry = Hash.new
  if registry_create_policy == "resume"
- begin
- @logger.info(@pipe_id+" resuming from registry")
- @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
- #[0] headers [1] responsebody
- rescue
- @registry.clear
- @logger.error(@pipe_id+" loading registry failed, starting over")
+ for counter in 1..3
+ begin
+ if (!@registry_local_path.nil?)
+ unless File.file?(@registry_local_path+"/"+@pipe_id)
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("migrating from remote registry #{registry_path}")
+ else
+ if !Dir.exist?(@registry_local_path)
+ FileUtils.mkdir_p(@registry_local_path)
+ end
+ @registry = Marshal.load(File.read(@registry_local_path+"/"+@pipe_id))
+ @logger.info("resuming from local registry #{registry_local_path+"/"+@pipe_id}")
+ end
+ else
+ @registry = Marshal.load(@blob_client.get_blob(container, registry_path)[1])
+ #[0] headers [1] responsebody
+ @logger.info("resuming from remote registry #{registry_path}")
+ end
+ break
+ rescue Exception => e
+ @logger.error("caught: #{e.message}")
+ @registry.clear
+ @logger.error("loading registry failed for attempt #{counter} of 3")
+ end
  end
  end
  # read filelist and set offsets to file length to mark all the old files as done
  if registry_create_policy == "start_fresh"
- @logger.info(@pipe_id+" starting fresh")
  @registry = list_blobs(true)
- #tempreg.each do |name, file|
- # @registry.store(name, { :offset => file[:length], :length => file[:length] })
- #end
+ save_registry(@registry)
+ @logger.info("starting fresh, writing a clean the registry to contain #{@registry.size} blobs/files")
  end

  @is_json = false
@@ -155,27 +195,32 @@ def register
  if file_tail
  @tail = file_tail
  end
- @logger.info(@pipe_id+" head will be: #{@head} and tail is set to #{@tail}")
+ @logger.info("head will be: #{@head} and tail is set to #{@tail}")
  end
- end # def register
-

-
- def run(queue)
  newreg = Hash.new
  filelist = Hash.new
  worklist = Hash.new
- # we can abort the loop if stop? becomes true
+ @last = start = Time.now.to_i
+
+ # This is the main loop, it
+ # 1. Lists all the files in the remote storage account that match the path prefix
+ # 2. Filters on path_filters to only include files that match the directory and file glob (**/*.json)
+ # 3. Save the listed files in a registry of known files and filesizes.
+ # 4. List all the files again and compare the registry with the new filelist and put the delta in a worklist
+ # 5. Process the worklist and put all events in the logstash queue.
+ # 6. if there is time left, sleep to complete the interval. If processing takes more than an interval, save the registry and continue.
+ # 7. If stop signal comes, finish the current file, save the registry and quit
  while !stop?
- chrono = Time.now.to_i
- # load te registry, compare it's offsets to file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for next loop,
- # TODO: sort by timestamp
+ # load the registry, compare its offsets to file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for next loop,
+ # TODO: sort by timestamp ?
  #filelist.sort_by(|k,v|resource(k)[:date])
  worklist.clear
  filelist.clear
  newreg.clear
+
+ # Listing all the files
  filelist = list_blobs(false)
- # registry.merge(filelist) {|key, :offset, :length| :offset.merge :length }
  filelist.each do |name, file|
  off = 0
  begin
@@ -184,31 +229,41 @@ def run(queue)
  off = 0
  end
  newreg.store(name, { :offset => off, :length => file[:length] })
+ if (@debug_until > @processed) then @logger.info("2: adding offsets: #{name} #{off} #{file[:length]}") end
  end
-
+ # size nilClass when the list doesn't grow?!
  # Worklist is the subset of files where the already read offset is smaller than the file size
  worklist.clear
  worklist = newreg.select {|name,file| file[:offset] < file[:length]}
- # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
- worklist.each do |name, file|
- #res = resource(name)
- @logger.info(@pipe_id+" processing #{name} from #{file[:offset]} to #{file[:length]}")
+ if (worklist.size > 4) then @logger.info("worklist contains #{worklist.size} blobs") end
+
+ # Start of processing
+ # This would be ideal for threading since it's IO intensive, would be nice with a ruby native ThreadPool
+ if (worklist.size > 0) then
+ worklist.each do |name, file|
+ start = Time.now.to_i
+ if (@debug_until > @processed) then @logger.info("3: processing #{name} from #{file[:offset]} to #{file[:length]}") end
  size = 0
  if file[:offset] == 0
- chunk = full_read(name)
- size=chunk.size
+ # This is where Sera4000 issue starts
+ begin
+ chunk = full_read(name)
+ size=chunk.size
+ rescue Exception => e
+ @logger.error("Failed to read #{name} because of: #{e.message} .. will continue and pretend this never happened")
+ end
  else
  chunk = partial_read_json(name, file[:offset], file[:length])
- @logger.debug(@pipe_id+" partial file #{name} from #{file[:offset]} to #{file[:length]}")
+ @logger.debug("partial file #{name} from #{file[:offset]} to #{file[:length]}")
  end
  if logtype == "nsgflowlog" && @is_json
  res = resource(name)
  begin
  fingjson = JSON.parse(chunk)
- @processed += nsgflowlog(queue, fingjson)
- @logger.debug(@pipe_id+" Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
+ @processed += nsgflowlog(queue, fingjson, name)
+ @logger.debug("Processed #{res[:nsg]} [#{res[:date]}] #{@processed} events")
  rescue JSON::ParserError
- @logger.error(@pipe_id+" parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
+ @logger.error("parse error on #{res[:nsg]} [#{res[:date]}] offset: #{file[:offset]} length: #{file[:length]}")
  end
  # TODO: Convert this to line based grokking.
  # TODO: ECS Compliance?
@@ -216,29 +271,43 @@ def run(queue)
  @processed += wadiislog(queue, name)
  else
  counter = 0
- @codec.decode(chunk) do |event|
+ begin
+ @codec.decode(chunk) do |event|
  counter += 1
+ if @addfilename
+ event.set('filename', name)
+ end
  decorate(event)
  queue << event
+ end
+ rescue Exception => e
+ @logger.error("codec exception: #{e.message} .. will continue and pretend this never happened")
+ @logger.debug("#{chunk}")
  end
  @processed += counter
  end
  @registry.store(name, { :offset => size, :length => file[:length] })
- #@logger.info(@pipe_id+" name #{name} size #{size} len #{file[:length]}")
+ # TODO add input plugin option to prevent connection cache
+ @blob_client.client.reset_agents!
+ #@logger.info("name #{name} size #{size} len #{file[:length]}")
  # if stop? good moment to stop what we're doing
  if stop?
  return
  end
- # save the registry past the regular intervals
- now = Time.now.to_i
- if ((now - chrono) > interval)
+ if ((Time.now.to_i - @last) > @interval)
  save_registry(@registry)
- chrono += interval
  end
+ end
+ end
+ # The files that got processed after the last registry save need to be saved too, in case the worklist is empty for some intervals.
+ now = Time.now.to_i
+ if ((now - @last) > @interval)
+ save_registry(@registry)
+ end
+ sleeptime = interval - ((now - start) % interval)
+ if @debug_timer
+ @logger.info("going to sleep for #{sleeptime} seconds")
  end
- # Save the registry and sleep until the remaining polling interval is over
- save_registry(@registry)
- sleeptime = interval - (Time.now.to_i - chrono)
  Stud.stoppable_sleep(sleeptime) { stop? }
  end
  end
@@ -246,7 +315,9 @@ end
  def stop
  save_registry(@registry)
  end
-
+ def close
+ save_registry(@registry)
+ end


  private
@@ -274,8 +345,7 @@ def strip_comma(str)
  end


-
- def nsgflowlog(queue, json)
+ def nsgflowlog(queue, json, name)
  count=0
  json["records"].each do |record|
  res = resource(record["resourceId"])
@@ -288,9 +358,16 @@ def nsgflowlog(queue, json)
  tups = tup.split(',')
  ev = rule.merge({:unixtimestamp => tups[0], :src_ip => tups[1], :dst_ip => tups[2], :src_port => tups[3], :dst_port => tups[4], :protocol => tups[5], :direction => tups[6], :decision => tups[7]})
  if (record["properties"]["Version"]==2)
+ tups[9] = 0 if tups[9].nil?
+ tups[10] = 0 if tups[10].nil?
+ tups[11] = 0 if tups[11].nil?
+ tups[12] = 0 if tups[12].nil?
  ev.merge!( {:flowstate => tups[8], :src_pack => tups[9], :src_bytes => tups[10], :dst_pack => tups[11], :dst_bytes => tups[12]} )
  end
  @logger.trace(ev.to_s)
+ if @addfilename
+ ev.merge!( {:filename => name } )
+ end
  event = LogStash::Event.new('message' => ev.to_json)
  decorate(event)
  queue << event
@@ -321,67 +398,107 @@ end
  # list all blobs in the blobstore, set the offsets from the registry and return the filelist
  # inspired by: https://github.com/Azure-Samples/storage-blobs-ruby-quickstart/blob/master/example.rb
  def list_blobs(fill)
- files = Hash.new
- nextMarker = nil
- counter = 0
- loop do
- begin
- if (counter > 10)
- @logger.error(@pipe_id+" lets try again for the 10th time, why don't faraday and azure storage accounts not play nice together? it has something to do with follow_redirect and a missing authorization header?")
- end
+ tries ||= 3
+ begin
+ return try_list_blobs(fill)
+ rescue Exception => e
+ @logger.error("caught: #{e.message} for list_blobs retries left #{tries}")
+ if (tries -= 1) > 0
+ retry
+ end
+ end
+ end
+
+ def try_list_blobs(fill)
+ # inspired by: http://blog.mirthlab.com/2012/05/25/cleanly-retrying-blocks-of-code-after-an-exception-in-ruby/
+ chrono = Time.now.to_i
+ files = Hash.new
+ nextMarker = nil
+ counter = 1
+ loop do
  blobs = @blob_client.list_blobs(container, { marker: nextMarker, prefix: @prefix})
  blobs.each do |blob|
- # exclude the registry itself
- unless blob.name == registry_path
+ # FNM_PATHNAME is required so that "**/test" can match "test" at the root folder
+ # FNM_EXTGLOB allows you to use "test{a,b,c}" to match either "testa", "testb" or "testc" (closer to shell behavior)
+ unless blob.name == registry_path
+ if @path_filters.any? {|path| File.fnmatch?(path, blob.name, File::FNM_PATHNAME | File::FNM_EXTGLOB)}
  length = blob.properties[:content_length].to_i
- offset = 0
+ offset = 0
  if fill
  offset = length
- end
+ end
  files.store(blob.name, { :offset => offset, :length => length })
+ if (@debug_until > @processed) then @logger.info("1: list_blobs #{blob.name} #{offset} #{length}") end
  end
+ end
  end
  nextMarker = blobs.continuation_token
  break unless nextMarker && !nextMarker.empty?
- rescue Exception => e
- @logger.error(@pipe_id+" caught: #{e.message}")
- counter += 1
- end
- end
+ if (counter % 10 == 0) then @logger.info(" listing #{counter * 50000} files") end
+ counter+=1
+ end
+ if @debug_timer
+ @logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
+ end
  return files
  end

  # When events were processed after the last registry save, start a thread to update the registry file.
  def save_registry(filelist)
- # TODO because of threading, processed values and regsaved are not thread safe, they can change as instance variable @!
+ # Because of threading, processed values and regsaved are not thread safe, they can change as instance variable @! Most of the time this is fine because the registry is the last resort, but be careful about corner cases!
  unless @processed == @regsaved
  @regsaved = @processed
- @logger.info(@pipe_id+" processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
- Thread.new {
+ unless (@busy_writing_registry)
+ Thread.new {
  begin
- @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+ @busy_writing_registry = true
+ unless (@registry_local_path)
+ @blob_client.create_block_blob(container, registry_path, Marshal.dump(filelist))
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to remote registry #{registry_path}")
+ else
+ File.open(@registry_local_path+"/"+@pipe_id, 'w') { |file| file.write(Marshal.dump(filelist)) }
+ @logger.info("processed #{@processed} events, saving #{filelist.size} blobs and offsets to local registry #{registry_local_path+"/"+@pipe_id}")
+ end
+ @busy_writing_registry = false
+ @last = Time.now.to_i
  rescue
- @logger.error(@pipe_id+" Oh my, registry write failed, do you have write access?")
+ @logger.error("Oh my, registry write failed, do you have write access?")
  end
  }
+ else
+ @logger.info("Skipped writing the registry because previous write still in progress, it just takes long or may be hanging!")
+ end
  end
  end

+
  def learn_encapsulation
  # From one file, read first block and last block to learn head and tail
- # If the blobstorage can't be found, an error from farraday middleware will come with the text
- # org.jruby.ext.set.RubySet cannot be cast to class org.jruby.RubyFixnum
- # implement options ... prefix may ot exist!
- blob = @blob_client.list_blobs(container, { maxresults: 1, prefix: @prefix }).first
- return if blob.nil?
- blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
- @logger.debug(@pipe_id+" using #{blob.name} to learn the json header and tail")
- @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
- @logger.debug(@pipe_id+" learned header: #{@head}")
- length = blob.properties[:content_length].to_i
- offset = length - blocks.last.size
- @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
- @logger.debug(@pipe_id+" learned tail: #{@tail}")
+ begin
+ blobs = @blob_client.list_blobs(container, { maxresults: 3, prefix: @prefix})
+ blobs.each do |blob|
+ unless blob.name == registry_path
+ begin
+ blocks = @blob_client.list_blob_blocks(container, blob.name)[:committed]
+ if blocks.first.name.start_with?('A00')
+ @logger.debug("using #{blob.name}/#{blocks.first.name} to learn the json header")
+ @head = @blob_client.get_blob(container, blob.name, start_range: 0, end_range: blocks.first.size-1)[1]
+ end
+ if blocks.last.name.start_with?('Z00')
+ @logger.debug("using #{blob.name}/#{blocks.last.name} to learn the json footer")
+ length = blob.properties[:content_length].to_i
+ offset = length - blocks.last.size
+ @tail = @blob_client.get_blob(container, blob.name, start_range: offset, end_range: length-1)[1]
+ @logger.debug("learned tail: #{@tail}")
+ end
+ rescue Exception => e
+ @logger.info("learn json one of the attempts failed #{e.message}")
+ end
+ end
+ end
+ rescue Exception => e
+ @logger.info("learn json header and footer failed because #{e.message}")
+ end
  end

  def resource(str)
data/logstash-input-azure_blob_storage.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'logstash-input-azure_blob_storage'
- s.version = '0.11.0'
+ s.version = '0.11.5'
  s.licenses = ['Apache-2.0']
  s.summary = 'This logstash plugin reads and parses data from Azure Storage Blobs.'
  s.description = <<-EOF
@@ -22,6 +22,6 @@ EOF
  # Gem dependencies
  s.add_runtime_dependency 'logstash-core-plugin-api', '~> 2.1'
  s.add_runtime_dependency 'stud', '~> 0.0.23'
- s.add_runtime_dependency 'azure-storage-blob', '~> 1.0'
- s.add_development_dependency 'logstash-devutils', '~> 1.0', '>= 1.0.0'
+ s.add_runtime_dependency 'azure-storage-blob', '~> 1.1'
+ #s.add_development_dependency 'logstash-devutils', '~> 2'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-azure_blob_storage
  version: !ruby/object:Gem::Version
- version: 0.11.0
+ version: 0.11.5
  platform: ruby
  authors:
  - Jan Geertsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-11-15 00:00:00.000000000 Z
+ date: 2020-12-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -17,8 +17,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '2.1'
  name: logstash-core-plugin-api
- prerelease: false
  type: :runtime
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -31,8 +31,8 @@ dependencies:
  - !ruby/object:Gem::Version
  version: 0.0.23
  name: stud
- prerelease: false
  type: :runtime
+ prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
@@ -43,35 +43,15 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: '1.1'
  name: azure-storage-blob
- prerelease: false
  type: :runtime
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.0'
- - !ruby/object:Gem::Dependency
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 1.0.0
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.0'
- name: logstash-devutils
  prerelease: false
- type: :development
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 1.0.0
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: '1.1'
  description: " This gem is a Logstash plugin. It reads and parses data from Azure\
  \ Storage Blobs. The azure_blob_storage is a reimplementation to replace azureblob\
  \ from azure-diagnostics-tools/Logstash. It can deal with larger volumes and partial\
@@ -112,8 +92,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.9
+ rubygems_version: 3.0.6
  signing_key:
  specification_version: 4
  summary: This logstash plugin reads and parses data from Azure Storage Blobs.