logstash-filter-translate 3.1.0 → 3.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/docs/index.asciidoc +173 -33
- data/lib/logstash/filters/array_of_maps_value_update.rb +44 -0
- data/lib/logstash/filters/array_of_values_update.rb +37 -0
- data/lib/logstash/filters/dictionary/csv_file.rb +25 -0
- data/lib/logstash/filters/dictionary/file.rb +140 -0
- data/lib/logstash/filters/dictionary/json_file.rb +87 -0
- data/lib/logstash/filters/dictionary/memory.rb +31 -0
- data/lib/logstash/filters/dictionary/yaml_file.rb +24 -0
- data/lib/logstash/filters/dictionary/yaml_visitor.rb +42 -0
- data/lib/logstash/filters/fetch_strategy/file.rb +81 -0
- data/lib/logstash/filters/fetch_strategy/memory.rb +52 -0
- data/lib/logstash/filters/single_value_update.rb +33 -0
- data/lib/logstash/filters/translate.rb +54 -155
- data/logstash-filter-translate.gemspec +5 -1
- data/spec/filters/benchmark_rspec.rb +69 -0
- data/spec/filters/scheduling_spec.rb +200 -0
- data/spec/filters/translate_spec.rb +238 -45
- data/spec/filters/yaml_visitor_spec.rb +16 -0
- data/spec/fixtures/regex_dict.csv +4 -0
- data/spec/fixtures/regex_union_dict.csv +4 -0
- data/spec/fixtures/tag-map-dict.yml +21 -0
- data/spec/fixtures/tag-omap-dict.yml +21 -0
- data/spec/support/build_huge_dictionaries.rb +33 -0
- data/spec/support/rspec_wait_handler_helper.rb +38 -0
- metadata +87 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bda33e0807c4df1f6a144e456c86e59c538c7138d3d27c61c688a892dcc424df
+  data.tar.gz: 93724eb15e55f3e54e0ebf321189c4e4110031ff5656dc673cdac21b6e66f837
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f1cfc504590cd22466348a677184221674198e9aa066c314630766b8a16e744a744220447586593c3e42dcee4b88736509d052d7513cdbf19ae1aca35a44924
+  data.tar.gz: d3b2da31ff46f55d6459ea32a1e1fd015f90f4618663c35b1d142e612937244bc101d0dfa96f039ca947ee5d716b87999ad682fdd6a7873fdf09ed5a6829dbd9
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,11 @@
+## 3.2.0
+  - Add `iterate_on` setting to support fields that are arrays, see the docs
+    for detailed explanation.
+    [#66](https://github.com/logstash-plugins/logstash-filter-translate/issues/66)
+  - Add Rufus::Scheduler to provide asynchronous loading of dictionary.
+    [#65](https://github.com/logstash-plugins/logstash-filter-translate/issues/65)
+  - Re-organise code, yields a performance improvement of around 360%
+
 ## 3.1.0
   - Add 'refresh_behaviour' to either 'merge' or 'replace' during a refresh #57
data/docs/index.asciidoc
CHANGED
@@ -21,8 +21,8 @@ include::{include_path}/plugin_header.asciidoc[]
 ==== Description
 
 A general search and replace tool that uses a configured hash
-and/or a file to determine replacement values. Currently supported are
-YAML, JSON, and CSV files.
+and/or a file to determine replacement values. Currently supported are
+YAML, JSON, and CSV files. Each dictionary item is a key value pair.
 
 The dictionary entries can be specified in one of two ways: First,
 the `dictionary` configuration item may contain a hash representing
@@ -30,19 +30,53 @@ the mapping. Second, an external file (readable by logstash) may be specified
 in the `dictionary_path` configuration item. These two methods may not be used
 in conjunction; it will produce an error.
 
-Operationally, if the event field specified in the `field` configuration
-matches the EXACT contents of a dictionary entry key (or matches a regex if the
-`regex` configuration item has been enabled), the field's value will be substituted
-with the matched key's value from the dictionary.
+Operationally, for each event, the value from the `field` setting is tested
+against the dictionary and if it matches exactly (or matches a regex when the
+`regex` configuration item has been enabled), the matched value is put in
+the `destination` field, but on no match the `fallback` setting string is
+used instead.
 
-By default, the translate filter will replace the contents of the
-matching event field (in-place). However, by using the `destination`
-configuration item, you may also specify a target event field to
-populate with the new translated value.
+Example:
+```
+[source,ruby]
+    filter {
+      translate {
+        field => "[http_status]"
+        destination => "[http_status_description]"
+        dictionary => {
+          "100" => "Continue"
+          "101" => "Switching Protocols"
+          "200" => "OK"
+          "500" => "Server Error"
+        }
+        fallback => "I'm a teapot"
+      }
+    }
+```
+
+Occasionally, people find that they have a field with a variable sized array of
+values or objects that need some enrichment. The `iterate_on` setting helps in
+these cases.
 
 Alternatively, for simple string search and replacements for just a few values
 you might consider using the gsub function of the mutate filter.
 
+It is possible to provide multi-valued dictionary values. When using a YAML or
+JSON dictionary, you can have the value as a hash (map) or an array datatype.
+When using a CSV dictionary, multiple values in the translation must be
+extracted with another filter e.g. Dissect or KV. +
+Note that the `fallback` is a string so on no match the fallback setting needs
+to be formatted so that a filter can extract the multiple values to the correct fields.
+
+File based dictionaries are loaded in a separate thread using a scheduler.
+If you set a `refresh_interval` of 300 seconds (5 minutes) or less then the
+modified time of the file is checked before reloading. Very large dictionaries
+are supported, internally tested at 100 000 key/values, and we minimise
+the impact on throughput by having the refresh in the scheduler thread.
+Any ongoing modification of the dictionary file should be done using a
+copy/edit/rename or create/rename mechanism to avoid the refresh code from
+processing half-baked dictionary content.
+
 [id="plugins-{type}s-{plugin}-options"]
 ==== Translate Filter Configuration Options
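To make the multi-valued note above concrete, here is a sketch of a YAML dictionary whose values are maps, and a config that uses it; the file path and field names are illustrative, not taken from the plugin docs:

```
# /etc/logstash/dict.yml -- each value is a map rather than a plain string
"200":
  label: "OK"
  class: "success"
"500":
  label: "Server Error"
  class: "error"
```

```
filter {
  translate {
    field => "[http_status]"
    destination => "[status_info]"   # receives the whole map, e.g. {"label" => "OK", "class" => "success"}
    dictionary_path => "/etc/logstash/dict.yml"
    # fallback must be a plain string; format it so a later kv filter can
    # split it into the same fields on the no-match path
    fallback => "label=unknown class=none"
  }
}
```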
@@ -57,6 +91,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
 | <<plugins-{type}s-{plugin}-exact>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-fallback>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-field>> |<<string,string>>|Yes
+| <<plugins-{type}s-{plugin}-iterate_on>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-override>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-refresh_interval>> |<<number,number>>|No
 | <<plugins-{type}s-{plugin}-regex>> |<<boolean,boolean>>|No
@@ -69,7 +104,7 @@ filter plugins.
 
 
 [id="plugins-{type}s-{plugin}-destination"]
-===== `destination` 
+===== `destination`
 
 * Value type is <<string,string>>
 * Default value is `"translation"`
@@ -77,10 +112,10 @@ filter plugins.
 The destination field you wish to populate with the translated code. The default
 is a field named `translation`. Set this to the same value as source if you want
 to do a substitution, in this case the filter will always succeed. This will clobber
-the old value of the source field! 
+the old value of the source field!
 
 [id="plugins-{type}s-{plugin}-dictionary"]
-===== `dictionary` 
+===== `dictionary`
 
 * Value type is <<hash,hash>>
 * Default value is `{}`
@@ -92,10 +127,10 @@ Example:
 [source,ruby]
     filter {
       translate {
-        dictionary => {
-          "100" => "Continue"
-          "101" => "Switching Protocols"
-          "merci" => "thank you"
+        dictionary => {
+          "100" => "Continue"
+          "101" => "Switching Protocols"
+          "merci" => "thank you"
           "old version" => "new version"
         }
       }
@@ -104,7 +139,7 @@ Example:
 NOTE: It is an error to specify both `dictionary` and `dictionary_path`.
 
 [id="plugins-{type}s-{plugin}-dictionary_path"]
-===== `dictionary_path` 
+===== `dictionary_path`
 
 * Value type is <<path,path>>
 * There is no default value for this setting.
@@ -122,12 +157,11 @@ NOTE: it is an error to specify both `dictionary` and `dictionary_path`.
 
 The currently supported formats are YAML, JSON, and CSV. Format selection is
 based on the file extension: `json` for JSON, `yaml` or `yml` for YAML, and
-`csv` for CSV. The
-CSV format expects exactly two columns, with the first serving
-as the original text, and the second column as the replacement.
+`csv` for CSV. The CSV format expects exactly two columns, with the first serving
+as the original text (lookup key), and the second column as the translation.
 
 [id="plugins-{type}s-{plugin}-exact"]
-===== `exact` 
+===== `exact`
 
 * Value type is <<boolean,boolean>>
 * Default value is `true`
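For instance, a two-column CSV dictionary and a config that loads it (the path and values here are illustrative):

```
# /etc/logstash/http_status.csv -- first column is the lookup key, second is the translation
100,Continue
101,Switching Protocols
200,OK
500,Server Error
```

```
filter {
  translate {
    field => "[http_status]"
    destination => "[http_status_description]"
    dictionary_path => "/etc/logstash/http_status.csv"  # CSV format selected by the .csv extension
  }
}
```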
@@ -148,10 +182,10 @@ will be also set to `bar`. However, if logstash receives an event with the `data
 set to `foofing`, the destination field will be set to `barfing`.
 
 Set both `exact => true` AND `regex => true` if you would like to match using dictionary
-keys as regular expressions. A large dictionary could be expensive to match in this case. 
+keys as regular expressions. A large dictionary could be expensive to match in this case.
 
 [id="plugins-{type}s-{plugin}-fallback"]
-===== `fallback` 
+===== `fallback`
 
 * Value type is <<string,string>>
 * There is no default value for this setting.
@@ -169,19 +203,122 @@ then the destination field would still be populated, but with the value of `no m
 This configuration can be dynamic and include parts of the event using the `%{field}` syntax.
 
 [id="plugins-{type}s-{plugin}-field"]
-===== `field` 
+===== `field`
 
 * This is a required setting.
 * Value type is <<string,string>>
 * There is no default value for this setting.
 
 The name of the logstash event field containing the value to be compared for a
-match by the translate filter (e.g. `message`, `host`, `response_code`). 
+match by the translate filter (e.g. `message`, `host`, `response_code`).
 
 If this field is an array, only the first value will be used.
 
+[id="plugins-{type}s-{plugin}-iterate_on"]
+===== `iterate_on`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+When the value that you need to perform enrichment on is a variable sized array
+then specify the field name in this setting. This setting introduces two modes:
+1) when the value is an array of strings and 2) when the value is an array of
+objects (as in JSON object). +
+In the first mode, you should have the same field name in both `field` and
+`iterate_on`; the result will be an array added to the field specified in the
+`destination` setting. This array will have the looked up value (or the
+`fallback` value or nil) in the same ordinal position as each sought value. +
+In the second mode, specify the field that has the array of objects in
+`iterate_on` then specify the field in each object that provides the sought value
+with `field` and the field to write the looked up value (or the `fallback` value)
+to with `destination`.
+
+For a dictionary of:
+[source,csv]
+    100,Yuki
+    101,Rupert
+    102,Ahmed
+    103,Kwame
+
+Example of Mode 1
+[source,ruby]
+    filter {
+      translate {
+        iterate_on => "[collaborator_ids]"
+        field => "[collaborator_ids]"
+        destination => "[collaborator_names]"
+        fallback => "Unknown"
+      }
+    }
+
+Before
+[source,json]
+    {
+      "collaborator_ids": [100,103,110,102]
+    }
+
+After
+[source,json]
+    {
+      "collaborator_ids": [100,103,110,102],
+      "collaborator_names": ["Yuki","Kwame","Unknown","Ahmed"]
+    }
+
+Example of Mode 2
+[source,ruby]
+    filter {
+      translate {
+        iterate_on => "[collaborators]"
+        field => "[id]"
+        destination => "[name]"
+        fallback => "Unknown"
+      }
+    }
+
+Before
+[source,json]
+    {
+      "collaborators": [
+        {
+          "id": 100
+        },
+        {
+          "id": 103
+        },
+        {
+          "id": 110
+        },
+        {
+          "id": 101
+        }
+      ]
+    }
+
+After
+[source,json]
+    {
+      "collaborators": [
+        {
+          "id": 100,
+          "name": "Yuki"
+        },
+        {
+          "id": 103,
+          "name": "Kwame"
+        },
+        {
+          "id": 110,
+          "name": "Unknown"
+        },
+        {
+          "id": 101,
+          "name": "Rupert"
+        }
+      ]
+    }
+
 [id="plugins-{type}s-{plugin}-override"]
-===== `override` 
+===== `override`
 
 * Value type is <<boolean,boolean>>
 * Default value is `false`
@@ -191,21 +328,22 @@ whether the filter should skip translation (default) or overwrite the target fie
 value with the new translation value.
 
 [id="plugins-{type}s-{plugin}-refresh_interval"]
-===== `refresh_interval` 
+===== `refresh_interval`
 
 * Value type is <<number,number>>
 * Default value is `300`
 
 When using a dictionary file, this setting will indicate how frequently
-(in seconds) logstash will check the dictionary file for updates.
+(in seconds) logstash will check the dictionary file for updates. +
+A value of zero or less will disable refresh.
 
 [id="plugins-{type}s-{plugin}-regex"]
-===== `regex` 
+===== `regex`
 
 * Value type is <<boolean,boolean>>
 * Default value is `false`
 
-If you'd like to treat dictionary keys as regular expressions, set `regex => true`. 
+If you'd like to treat dictionary keys as regular expressions, set `regex => true`.
 Note: this is activated only when `exact => true`.
 
 [id="plugins-{type}s-{plugin}-refresh_behaviour"]
|
|
215
353
|
* Default value is `merge`
|
216
354
|
|
217
355
|
When using a dictionary file, this setting indicates how the update will be executed.
|
218
|
-
Setting this to `merge`
|
219
|
-
|
356
|
+
Setting this to `merge` causes the new dictionary to be merged into the old one. This means
|
357
|
+
same entry will be updated but entries that existed before but not in the new dictionary
|
358
|
+
will remain after the merge; `replace` causes the whole dictionary to be replaced
|
359
|
+
with a new one (deleting all entries of the old one on update).
|
220
360
|
|
221
361
|
|
222
362
|
|
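Putting the refresh settings together, a sketch of a file-backed dictionary that is re-read on change (values illustrative):

```
filter {
  translate {
    field => "[user_id]"
    destination => "[user_name]"
    dictionary_path => "/etc/logstash/users.yml"
    refresh_interval => 60          # check the file's modified time every 60 seconds
    refresh_behaviour => "replace"  # drop entries that no longer appear in the file
  }
}
```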
data/lib/logstash/filters/array_of_maps_value_update.rb
ADDED
@@ -0,0 +1,44 @@
+# encoding: utf-8
+
+module LogStash module Filters
+  class ArrayOfMapsValueUpdate
+    def initialize(iterate_on, field, destination, fallback, lookup)
+      @iterate_on = ensure_reference_format(iterate_on)
+      @field = ensure_reference_format(field)
+      @destination = ensure_reference_format(destination)
+      @fallback = fallback
+      @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+      @lookup = lookup
+    end
+
+    def test_for_inclusion(event, override)
+      event.include?(@iterate_on)
+    end
+
+    def update(event)
+      val = event.get(@iterate_on) # should be an array of hashes
+      source = Array(val)
+      matches = Array.new(source.size)
+      source.size.times do |index|
+        nested_field = "#{@iterate_on}[#{index}]#{@field}"
+        nested_destination = "#{@iterate_on}[#{index}]#{@destination}"
+        inner = event.get(nested_field)
+        next if inner.nil?
+        matched = [true, nil]
+        @lookup.fetch_strategy.fetch(inner, matched)
+        if matched.first
+          event.set(nested_destination, matched.last)
+          matches[index] = true
+        elsif @use_fallback
+          event.set(nested_destination, event.sprintf(@fallback))
+          matches[index] = true
+        end
+      end
+      return matches.any?
+    end
+
+    def ensure_reference_format(field)
+      field.start_with?("[") && field.end_with?("]") ? field : "[#{field}]"
+    end
+  end
+end end
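To see how ArrayOfMapsValueUpdate walks the array, here is a minimal standalone sketch; StubEvent and the Struct-based lookup below are illustrative stand-ins for LogStash::Event and the plugin's internal lookup/fetch-strategy objects, not real plugin API:

```
require_relative "array_of_maps_value_update"  # hypothetical load path

# Stand-in for LogStash::Event: resolves references like "[a][0][b]".
class StubEvent
  def initialize(data)
    @data = data
  end

  def get(ref)
    keys(ref).reduce(@data) { |acc, k| acc.is_a?(Array) ? acc[k.to_i] : acc && acc[k] }
  end

  def set(ref, value)
    *path, last = keys(ref)
    parent = path.reduce(@data) { |acc, k| acc.is_a?(Array) ? acc[k.to_i] : acc[k] }
    parent[last] = value
  end

  def include?(ref)
    !get(ref).nil?
  end

  def sprintf(s)
    s
  end

  private

  def keys(ref)
    ref.scan(/\[([^\]]+)\]/).flatten
  end
end

# Stand-in for the lookup: fetch writes [found?, value] into the result pair.
Strategy = Struct.new(:dict) do
  def fetch(key, result)
    result[0] = dict.key?(key)
    result[1] = dict[key]
  end
end
Lookup = Struct.new(:fetch_strategy)

lookup  = Lookup.new(Strategy.new(100 => "Yuki", 103 => "Kwame"))
updater = LogStash::Filters::ArrayOfMapsValueUpdate.new(
  "[collaborators]", "[id]", "[name]", "Unknown", lookup
)
event = StubEvent.new("collaborators" => [{ "id" => 100 }, { "id" => 110 }])
updater.update(event)
# event data is now:
#   [{"id" => 100, "name" => "Yuki"}, {"id" => 110, "name" => "Unknown"}]
```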
data/lib/logstash/filters/array_of_values_update.rb
ADDED
@@ -0,0 +1,37 @@
+# encoding: utf-8
+
+module LogStash module Filters
+  class ArrayOfValuesUpdate
+    def initialize(iterate_on, destination, fallback, lookup)
+      @iterate_on = iterate_on
+      @destination = destination
+      @fallback = fallback
+      @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+      @lookup = lookup
+    end
+
+    def test_for_inclusion(event, override)
+      # Skip translation in case @destination already exists and override is disabled.
+      return false if event.include?(@destination) && !override
+      event.include?(@iterate_on)
+    end
+
+    def update(event)
+      val = event.get(@iterate_on)
+      source = Array(val)
+      target = Array.new(source.size)
+      if @use_fallback
+        target.fill(event.sprintf(@fallback))
+      end
+      source.each_with_index do |inner, index|
+        matched = [true, nil]
+        @lookup.fetch_strategy.fetch(inner, matched)
+        if matched.first
+          target[index] = matched.last
+        end
+      end
+      event.set(@destination, target)
+      return target.any?
+    end
+  end
+end end
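The fill-then-overwrite idiom in `update` is what keeps each translated value at the same ordinal position as its input; a condensed plain-Ruby illustration (dictionary contents invented for the example):

```
dict   = { "100" => "Yuki", "103" => "Kwame" }
source = ["100", "110", "103"]

target = Array.new(source.size).fill("Unknown")  # pre-fill every slot with the fallback
source.each_with_index { |k, i| target[i] = dict[k] if dict.key?(k) }

target  # => ["Yuki", "Unknown", "Kwame"]
```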
data/lib/logstash/filters/dictionary/csv_file.rb
ADDED
@@ -0,0 +1,25 @@
+# encoding: utf-8
+require "csv"
+
+module LogStash module Filters module Dictionary
+  class CsvFile < File
+
+    protected
+
+    def initialize_for_file_type
+      @io = StringIO.new("")
+      @csv = CSV.new(@io)
+    end
+
+    def read_file_into_dictionary
+      # low level CSV read that tries to create as
+      # few intermediate objects as possible
+      # this overwrites the value at key
+      IO.foreach(@dictionary_path, :mode => 'r:bom|utf-8') do |line|
+        @io.string = line
+        k,v = @csv.shift
+        @dictionary[k] = v
+      end
+    end
+  end
+end end end
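The `StringIO` re-use in `read_file_into_dictionary` is the notable detail: assigning to `@io.string` swaps in the current row and resets the read position, so a single CSV parser handles every line with full quoting rules but without allocating a new parser per row. The same trick in standalone Ruby (file name illustrative):

```
require "csv"
require "stringio"

io  = StringIO.new("")
csv = CSV.new(io)
dictionary = {}

IO.foreach("dict.csv", :mode => "r:bom|utf-8") do |line|
  io.string = line        # point the shared buffer at the current row; position resets to 0
  key, value = csv.shift  # parse one row with proper CSV quoting
  dictionary[key] = value
end
```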