logstash-filter-translate 3.1.0 → 3.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/docs/index.asciidoc +173 -33
- data/lib/logstash/filters/array_of_maps_value_update.rb +44 -0
- data/lib/logstash/filters/array_of_values_update.rb +37 -0
- data/lib/logstash/filters/dictionary/csv_file.rb +25 -0
- data/lib/logstash/filters/dictionary/file.rb +140 -0
- data/lib/logstash/filters/dictionary/json_file.rb +87 -0
- data/lib/logstash/filters/dictionary/memory.rb +31 -0
- data/lib/logstash/filters/dictionary/yaml_file.rb +24 -0
- data/lib/logstash/filters/dictionary/yaml_visitor.rb +42 -0
- data/lib/logstash/filters/fetch_strategy/file.rb +81 -0
- data/lib/logstash/filters/fetch_strategy/memory.rb +52 -0
- data/lib/logstash/filters/single_value_update.rb +33 -0
- data/lib/logstash/filters/translate.rb +54 -155
- data/logstash-filter-translate.gemspec +5 -1
- data/spec/filters/benchmark_rspec.rb +69 -0
- data/spec/filters/scheduling_spec.rb +200 -0
- data/spec/filters/translate_spec.rb +238 -45
- data/spec/filters/yaml_visitor_spec.rb +16 -0
- data/spec/fixtures/regex_dict.csv +4 -0
- data/spec/fixtures/regex_union_dict.csv +4 -0
- data/spec/fixtures/tag-map-dict.yml +21 -0
- data/spec/fixtures/tag-omap-dict.yml +21 -0
- data/spec/support/build_huge_dictionaries.rb +33 -0
- data/spec/support/rspec_wait_handler_helper.rb +38 -0
- metadata +87 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bda33e0807c4df1f6a144e456c86e59c538c7138d3d27c61c688a892dcc424df
+  data.tar.gz: 93724eb15e55f3e54e0ebf321189c4e4110031ff5656dc673cdac21b6e66f837
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f1cfc504590cd22466348a677184221674198e9aa066c314630766b8a16e744a744220447586593c3e42dcee4b88736509d052d7513cdbf19ae1aca35a44924
+  data.tar.gz: d3b2da31ff46f55d6459ea32a1e1fd015f90f4618663c35b1d142e612937244bc101d0dfa96f039ca947ee5d716b87999ad682fdd6a7873fdf09ed5a6829dbd9
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+## 3.2.0
+  - Add `iterate_on` setting to support fields that are arrays, see the docs
+    for detailed explanation.
+    [#66](https://github.com/logstash-plugins/logstash-filter-translate/issues/66)
+  - Add Rufus::Scheduler to provide asynchronous loading of dictionary.
+    [#65](https://github.com/logstash-plugins/logstash-filter-translate/issues/65)
+  - Re-organise code, yields performance improvement of around 360%
+
 ## 3.1.0
   - Add 'refresh_behaviour' to either 'merge' or 'replace' during a refresh #57
data/docs/index.asciidoc CHANGED
@@ -21,8 +21,8 @@ include::{include_path}/plugin_header.asciidoc[]
 ==== Description
 
 A general search and replace tool that uses a configured hash
-and/or a file to determine replacement values. Currently supported are
-YAML, JSON, and CSV files.
+and/or a file to determine replacement values. Currently supported are
+YAML, JSON, and CSV files. Each dictionary item is a key value pair.
 
 The dictionary entries can be specified in one of two ways: First,
 the `dictionary` configuration item may contain a hash representing
@@ -30,19 +30,53 @@ the mapping. Second, an external file (readable by logstash) may be specified
 in the `dictionary_path` configuration item. These two methods may not be used
 in conjunction; it will produce an error.
 
-Operationally,
-
-`regex` configuration item has been enabled), the
-
+Operationally, for each event, the value from the `field` setting is tested
+against the dictionary and if it matches exactly (or matches a regex when
+`regex` configuration item has been enabled), the matched value is put in
+the `destination` field, but on no match the `fallback` setting string is
+used instead.
 
-
-
-
-
+Example:
+[source,ruby]
+    filter {
+      translate {
+        field => "[http_status]"
+        destination => "[http_status_description]"
+        dictionary => {
+          "100" => "Continue"
+          "101" => "Switching Protocols"
+          "200" => "OK"
+          "500" => "Server Error"
+        }
+        fallback => "I'm a teapot"
+      }
+    }
+
+Occasionally, people find that they have a field with a variable sized array of
+values or objects that need some enrichment. The `iterate_on` setting helps in
+these cases.
 
 Alternatively, for simple string search and replacements for just a few values
 you might consider using the gsub function of the mutate filter.
 
+It is possible to provide multi-valued dictionary values. When using a YAML or
+JSON dictionary, you can have the value as a hash (map) or an array datatype.
+When using a CSV dictionary, multiple values in the translation must be
+extracted with another filter e.g. Dissect or KV. +
+Note that the `fallback` is a string so on no match the fallback setting needs
+to be formatted so that a filter can extract the multiple values to the correct fields.
+
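To illustrate the multi-valued CSV note above, a sketch of a pipeline that packs several values into one translation string and extracts them with a later `dissect` filter. The field names, separator, and dictionary path are all illustrative, not part of the diff:

```ruby
filter {
  translate {
    field => "[host]"
    destination => "[host_info]"
    dictionary_path => "/tmp/hosts.csv"   # rows like: web01,eu-west|production
    fallback => "unknown|unknown"         # same shape, so dissect still succeeds
  }
  dissect {
    mapping => { "host_info" => "%{region}|%{environment}" }
  }
}
```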
+File based dictionaries are loaded in a separate thread using a scheduler.
+If you set a `refresh_interval` of 300 seconds (5 minutes) or less then the
+modified time of the file is checked before reloading. Very large dictionaries
+are supported, internally tested at 100 000 key/values, and we minimise
+the impact on throughput by having the refresh in the scheduler thread.
+Any ongoing modification of the dictionary file should be done using a
+copy/edit/rename or create/rename mechanism to avoid the refresh code from
+processing half-baked dictionary content.
+
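The copy/edit/rename advice above can be sketched in plain Ruby; the rename is what keeps the refresh thread from ever seeing a partially written file. Paths and contents here are illustrative:

```ruby
require "fileutils"

# Work on a copy, then atomically rename it over the live dictionary so the
# scheduler thread never observes a half-written file.
live = "/tmp/dict.csv"
tmp  = "/tmp/dict.csv.tmp"

File.write(live, "100,Continue\n")           # pretend this is the live dictionary
FileUtils.cp(live, tmp)                      # copy
File.open(tmp, "a") { |f| f.puts "200,OK" }  # edit the copy
File.rename(tmp, live)                       # rename: atomic on the same filesystem
```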
 [id="plugins-{type}s-{plugin}-options"]
 ==== Translate Filter Configuration Options
 
@@ -57,6 +91,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
 | <<plugins-{type}s-{plugin}-exact>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-fallback>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-field>> |<<string,string>>|Yes
+| <<plugins-{type}s-{plugin}-iterate_on>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-override>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-refresh_interval>> |<<number,number>>|No
 | <<plugins-{type}s-{plugin}-regex>> |<<boolean,boolean>>|No
@@ -69,7 +104,7 @@ filter plugins.
 
 
 [id="plugins-{type}s-{plugin}-destination"]
-===== `destination` 
+===== `destination`
 
 * Value type is <<string,string>>
 * Default value is `"translation"`
@@ -77,10 +112,10 @@ filter plugins.
 The destination field you wish to populate with the translated code. The default
 is a field named `translation`. Set this to the same value as source if you want
 to do a substitution, in this case filter will allways succeed. This will clobber
-the old value of the source field! 
+the old value of the source field!
 
 [id="plugins-{type}s-{plugin}-dictionary"]
-===== `dictionary` 
+===== `dictionary`
 
 * Value type is <<hash,hash>>
 * Default value is `{}`
@@ -92,10 +127,10 @@ Example:
 [source,ruby]
     filter {
       translate {
-        dictionary => {
-          "100" => "Continue"
-          "101" => "Switching Protocols"
-          "merci" => "thank you"
+        dictionary => {
+          "100" => "Continue"
+          "101" => "Switching Protocols"
+          "merci" => "thank you"
           "old version" => "new version"
         }
       }
@@ -104,7 +139,7 @@ Example:
 NOTE: It is an error to specify both `dictionary` and `dictionary_path`.
 
 [id="plugins-{type}s-{plugin}-dictionary_path"]
-===== `dictionary_path` 
+===== `dictionary_path`
 
 * Value type is <<path,path>>
 * There is no default value for this setting.
@@ -122,12 +157,11 @@ NOTE: it is an error to specify both `dictionary` and `dictionary_path`.
 
 The currently supported formats are YAML, JSON, and CSV. Format selection is
 based on the file extension: `json` for JSON, `yaml` or `yml` for YAML, and
-`csv` for CSV. The
-
-as the original text, and the second column as the replacement.
+`csv` for CSV. The CSV format expects exactly two columns, with the first serving
+as the original text (lookup key), and the second column as the translation.
 
 [id="plugins-{type}s-{plugin}-exact"]
-===== `exact` 
+===== `exact`
 
 * Value type is <<boolean,boolean>>
 * Default value is `true`
@@ -148,10 +182,10 @@ will be also set to `bar`. However, if logstash receives an event with the `data
 set to `foofing`, the destination field will be set to `barfing`.
 
 Set both `exact => true` AND `regex => `true` if you would like to match using dictionary
-keys as regular expressions. A large dictionary could be expensive to match in this case. 
+keys as regular expressions. A large dictionary could be expensive to match in this case.
 
 [id="plugins-{type}s-{plugin}-fallback"]
-===== `fallback` 
+===== `fallback`
 
 * Value type is <<string,string>>
 * There is no default value for this setting.
@@ -169,19 +203,122 @@ then the destination field would still be populated, but with the value of `no m
 This configuration can be dynamic and include parts of the event using the `%{field}` syntax.
 
 [id="plugins-{type}s-{plugin}-field"]
-===== `field` 
+===== `field`
 
 * This is a required setting.
 * Value type is <<string,string>>
 * There is no default value for this setting.
 
 The name of the logstash event field containing the value to be compared for a
-match by the translate filter (e.g. `message`, `host`, `response_code`). 
+match by the translate filter (e.g. `message`, `host`, `response_code`).
 
 If this field is an array, only the first value will be used.
 
+[id="plugins-{type}s-{plugin}-iterate_on"]
+===== `iterate_on`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+When the value that you need to perform enrichment on is a variable sized array
+then specify the field name in this setting. This setting introduces two modes,
+1) when the value is an array of strings and 2) when the value is an array of
+objects (as in JSON object). +
+In the first mode, you should have the same field name in both `field` and
+`iterate_on`, the result will be an array added to the field specified in the
+`destination` setting. This array will have the looked up value (or the
+`fallback` value or nil) in same ordinal position as each sought value. +
+In the second mode, specify the field that has the array of objects in
+`iterate_on` then specify the field in each object that provides the sought value
+with `field` and the field to write the looked up value (or the `fallback` value)
+to with `destination`.
+
+For a dictionary of:
+[source,csv]
+    100,Yuki
+    101,Rupert
+    102,Ahmed
+    103,Kwame
+
+Example of Mode 1
+[source,ruby]
+    filter {
+      translate {
+        iterate_on => "[collaborator_ids]"
+        field => "[collaborator_ids]"
+        destination => "[collaborator_names]"
+        fallback => "Unknown"
+      }
+    }
+
+Before
+[source,json]
+    {
+      "collaborator_ids": [100,103,110,102]
+    }
+
+After
+[source,json]
+    {
+      "collaborator_ids": [100,103,110,102],
+      "collabrator_names": ["Yuki","Kwame","Unknown","Ahmed"]
+    }
+
+Example of Mode 2
+[source,ruby]
+    filter {
+      translate {
+        iterate_on => "[collaborators]"
+        field => "[id]"
+        destination => "[name]"
+        fallback => "Unknown"
+      }
+    }
+
+Before
+[source,json]
+    {
+      "collaborators": [
+        {
+          "id": 100
+        },
+        {
+          "id": 103
+        },
+        {
+          "id": 110
+        },
+        {
+          "id": 101
+        }
+      ]
+    }
+
+After
+[source,json]
+    {
+      "collaborators": [
+        {
+          "id": 100,
+          "name": "Yuki"
+        },
+        {
+          "id": 103,
+          "name": "Kwame"
+        },
+        {
+          "id": 110,
+          "name": "Unknown"
+        },
+        {
+          "id": 101,
+          "name": "Rupert"
+        }
+      ]
+    }
+
 [id="plugins-{type}s-{plugin}-override"]
-===== `override` 
+===== `override`
 
 * Value type is <<boolean,boolean>>
 * Default value is `false`
@@ -191,21 +328,22 @@ whether the filter should skip translation (default) or overwrite the target fie
 value with the new translation value.
 
 [id="plugins-{type}s-{plugin}-refresh_interval"]
-===== `refresh_interval` 
+===== `refresh_interval`
 
 * Value type is <<number,number>>
 * Default value is `300`
 
 When using a dictionary file, this setting will indicate how frequently
-(in seconds) logstash will check the dictionary file for updates.
+(in seconds) logstash will check the dictionary file for updates. +
+A value of zero or less will disable refresh.
 
 [id="plugins-{type}s-{plugin}-regex"]
-===== `regex` 
+===== `regex`
 
 * Value type is <<boolean,boolean>>
 * Default value is `false`
 
-If you'd like to treat dictionary keys as regular expressions, set `
+If you'd like to treat dictionary keys as regular expressions, set `regex => true`.
 Note: this is activated only when `exact => true`.
 
 [id="plugins-{type}s-{plugin}-refresh_behaviour"]
@@ -215,8 +353,10 @@ Note: this is activated only when `exact => true`.
 * Default value is `merge`
 
 When using a dictionary file, this setting indicates how the update will be executed.
-Setting this to `merge`
-
+Setting this to `merge` causes the new dictionary to be merged into the old one. This means
+same entry will be updated but entries that existed before but not in the new dictionary
+will remain after the merge; `replace` causes the whole dictionary to be replaced
+with a new one (deleting all entries of the old one on update).
 
 
 
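The `merge` vs `replace` semantics described above can be sketched with plain Ruby hashes (dictionary contents are illustrative):

```ruby
old_dict = { "100" => "Continue", "418" => "I'm a teapot" }
new_dict = { "100" => "Continue (updated)", "200" => "OK" }

# merge: updated entries win, but stale keys ("418") survive the refresh
merged = old_dict.merge(new_dict)
# replace: the old dictionary is discarded wholesale
replaced = new_dict

merged   # => {"100"=>"Continue (updated)", "418"=>"I'm a teapot", "200"=>"OK"}
replaced # => {"100"=>"Continue (updated)", "200"=>"OK"}
```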
data/lib/logstash/filters/array_of_maps_value_update.rb ADDED
@@ -0,0 +1,44 @@
+# encoding: utf-8
+
+module LogStash module Filters
+  class ArrayOfMapsValueUpdate
+    def initialize(iterate_on, field, destination, fallback, lookup)
+      @iterate_on = ensure_reference_format(iterate_on)
+      @field = ensure_reference_format(field)
+      @destination = ensure_reference_format(destination)
+      @fallback = fallback
+      @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+      @lookup = lookup
+    end
+
+    def test_for_inclusion(event, override)
+      event.include?(@iterate_on)
+    end
+
+    def update(event)
+      val = event.get(@iterate_on) # should be an array of hashes
+      source = Array(val)
+      matches = Array.new(source.size)
+      source.size.times do |index|
+        nested_field = "#{@iterate_on}[#{index}]#{@field}"
+        nested_destination = "#{@iterate_on}[#{index}]#{@destination}"
+        inner = event.get(nested_field)
+        next if inner.nil?
+        matched = [true, nil]
+        @lookup.fetch_strategy.fetch(inner, matched)
+        if matched.first
+          event.set(nested_destination, matched.last)
+          matches[index] = true
+        elsif @use_fallback
+          event.set(nested_destination, event.sprintf(@fallback))
+          matches[index] = true
+        end
+      end
+      return matches.any?
+    end
+
+    def ensure_reference_format(field)
+      field.start_with?("[") && field.end_with?("]") ? field : "[#{field}]"
+    end
+  end
+end end
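The class above walks an array of maps through nested field references on the Logstash `Event`. Its core behaviour can be sketched in plain Ruby (the helper name and data are illustrative; the `Event` API and fetch strategy are omitted):

```ruby
# For each map in the array, look up the value of `field` in the dictionary
# and write the translation (or the fallback) into `destination` in that map.
def translate_maps(maps, field, destination, dictionary, fallback)
  maps.each do |m|
    value = m[field]
    next if value.nil?                          # mirrors `next if inner.nil?`
    m[destination] = dictionary.fetch(value, fallback)
  end
end

collaborators = [{ "id" => 100 }, { "id" => 110 }]
translate_maps(collaborators, "id", "name", { 100 => "Yuki" }, "Unknown")
# collaborators => [{"id"=>100, "name"=>"Yuki"}, {"id"=>110, "name"=>"Unknown"}]
```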
data/lib/logstash/filters/array_of_values_update.rb ADDED
@@ -0,0 +1,37 @@
+# encoding: utf-8
+
+module LogStash module Filters
+  class ArrayOfValuesUpdate
+    def initialize(iterate_on, destination, fallback, lookup)
+      @iterate_on = iterate_on
+      @destination = destination
+      @fallback = fallback
+      @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+      @lookup = lookup
+    end
+
+    def test_for_inclusion(event, override)
+      # Skip translation in case @destination iterate_on already exists and @override is disabled.
+      return false if event.include?(@destination) && !override
+      event.include?(@iterate_on)
+    end
+
+    def update(event)
+      val = event.get(@iterate_on)
+      source = Array(val)
+      target = Array.new(source.size)
+      if @use_fallback
+        target.fill(event.sprintf(@fallback))
+      end
+      source.each_with_index do |inner, index|
+        matched = [true, nil]
+        @lookup.fetch_strategy.fetch(inner, matched)
+        if matched.first
+          target[index] = matched.last
+        end
+      end
+      event.set(@destination, target)
+      return target.any?
+    end
+  end
+end end
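The ordinal-position behaviour of this class can be sketched in plain Ruby, with data mirroring the Mode 1 docs example (the Logstash `Event` plumbing and fetch strategy are omitted):

```ruby
# Each source element is looked up; the result (or the fallback) lands at the
# same index in the target array.
def translate_array(values, dictionary, fallback)
  values.map { |v| dictionary.fetch(v, fallback) }
end

translate_array([100, 103, 110, 102],
                { 100 => "Yuki", 103 => "Kwame", 102 => "Ahmed" },
                "Unknown")
# => ["Yuki", "Kwame", "Unknown", "Ahmed"]
```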
data/lib/logstash/filters/dictionary/csv_file.rb ADDED
@@ -0,0 +1,25 @@
+# encoding: utf-8
+require "csv"
+
+module LogStash module Filters module Dictionary
+  class CsvFile < File
+
+    protected
+
+    def initialize_for_file_type
+      @io = StringIO.new("")
+      @csv = CSV.new(@io)
+    end
+
+    def read_file_into_dictionary
+      # low level CSV read that tries to create as
+      # few intermediate objects as possible
+      # this overwrites the value at key
+      IO.foreach(@dictionary_path, :mode => 'r:bom|utf-8') do |line|
+        @io.string = line
+        k,v = @csv.shift
+        @dictionary[k] = v
+      end
+    end
+  end
+end end end
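A behaviour-equivalent sketch of `read_file_into_dictionary` in plain Ruby, using `CSV.parse_line` per line instead of the class's shared `StringIO`/`CSV` allocation trick (the input data is illustrative):

```ruby
require "csv"

# Two-column CSV into a hash; a repeated key overwrites the earlier value,
# matching the "overwrites the value at key" comment above.
def read_csv_dictionary(lines)
  dictionary = {}
  lines.each do |line|
    key, value = CSV.parse_line(line)
    dictionary[key] = value
  end
  dictionary
end

read_csv_dictionary(["100,Continue\n", "101,Switching Protocols\n", "100,Still Continue\n"])
# => {"100"=>"Still Continue", "101"=>"Switching Protocols"}
```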