logstash-filter-translate 3.1.0 → 3.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5e4b8c724a742d47aa89a6237e127be569f2675a19c5752e719918bff8489c77
- data.tar.gz: 406d7e0417251e10d1142d19027af9557f9a0993782bb16780bcdc86c943bd5e
+ metadata.gz: bda33e0807c4df1f6a144e456c86e59c538c7138d3d27c61c688a892dcc424df
+ data.tar.gz: 93724eb15e55f3e54e0ebf321189c4e4110031ff5656dc673cdac21b6e66f837
  SHA512:
- metadata.gz: e25757bc66a2b1afa15318161c6a8182f5258e9a8f395c870e0826224f6111bfa607950e678058a423aecb733f9f23edbbaed7778067f239269786afc76341d9
- data.tar.gz: c84f492fad3418db7e18333658691fe5a8109c6dfc15589e88415198d348cce4fa05d4b966762072387441dec06210d6c7bf5fcc92fe5bec94ffe1b5026eba9d
+ metadata.gz: 7f1cfc504590cd22466348a677184221674198e9aa066c314630766b8a16e744a744220447586593c3e42dcee4b88736509d052d7513cdbf19ae1aca35a44924
+ data.tar.gz: d3b2da31ff46f55d6459ea32a1e1fd015f90f4618663c35b1d142e612937244bc101d0dfa96f039ca947ee5d716b87999ad682fdd6a7873fdf09ed5a6829dbd9
CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 3.2.0
+ - Add `iterate_on` setting to support fields that are arrays; see the docs
+   for a detailed explanation.
+   [#66](https://github.com/logstash-plugins/logstash-filter-translate/issues/66)
+ - Add Rufus::Scheduler to provide asynchronous loading of the dictionary.
+   [#65](https://github.com/logstash-plugins/logstash-filter-translate/issues/65)
+ - Re-organise code, yielding a performance improvement of around 360%.
+
  ## 3.1.0
  - Add 'refresh_behaviour' to either 'merge' or 'replace' during a refresh #57

docs/index.asciidoc CHANGED
@@ -21,8 +21,8 @@ include::{include_path}/plugin_header.asciidoc[]
  ==== Description

  A general search and replace tool that uses a configured hash
- and/or a file to determine replacement values. Currently supported are
- YAML, JSON, and CSV files.
+ and/or a file to determine replacement values. Currently supported are
+ YAML, JSON, and CSV files. Each dictionary item is a key/value pair.

  The dictionary entries can be specified in one of two ways: First,
  the `dictionary` configuration item may contain a hash representing
@@ -30,19 +30,53 @@ the mapping. Second, an external file (readable by logstash) may be specified
  in the `dictionary_path` configuration item. These two methods may not be used
  in conjunction; it will produce an error.

- Operationally, if the event field specified in the `field` configuration
- matches the EXACT contents of a dictionary entry key (or matches a regex if
- `regex` configuration item has been enabled), the field's value will be substituted
- with the matched key's value from the dictionary.
+ Operationally, for each event, the value of the field named in the `field`
+ setting is looked up in the dictionary. If it matches exactly (or matches a
+ regex when the `regex` option is enabled), the matched entry's value is put
+ in the `destination` field; when there is no match, the `fallback` string,
+ if set, is used instead.

- By default, the translate filter will replace the contents of the
- maching event field (in-place). However, by using the `destination`
- configuration item, you may also specify a target event field to
- populate with the new translated value.
+ Example:
+ [source,ruby]
+     filter {
+       translate {
+         field => "[http_status]"
+         destination => "[http_status_description]"
+         dictionary => {
+           "100" => "Continue"
+           "101" => "Switching Protocols"
+           "200" => "OK"
+           "500" => "Server Error"
+         }
+         fallback => "I'm a teapot"
+       }
+     }
+
+ Occasionally, people find that they have a field with a variable-sized array of
+ values or objects that need some enrichment. The `iterate_on` setting helps in
+ these cases.

  Alternatively, for simple string search and replacements for just a few values
  you might consider using the gsub function of the mutate filter.

+ It is possible to provide multi-valued dictionary values. When using a YAML or
+ JSON dictionary, the value can be a hash (map) or an array datatype.
+ When using a CSV dictionary, multiple values in the translation must be
+ extracted with another filter, e.g. Dissect or KV, as sketched below. +
+ Note that the `fallback` is a string, so on no match the fallback setting needs
+ to be formatted so that a filter can extract the multiple values to the correct fields.
+
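+ For illustration only (not part of the shipped docs), a CSV dictionary whose
+ translation column packs two values, split afterwards with Dissect; the
+ `code`, `status_packed`, `status_text`, and `status_class` field names and
+ the file path are assumptions:
+ [source,csv]
+     100,Continue/Informational
+     200,OK/Success
+
+ [source,ruby]
+     filter {
+       translate {
+         field => "[code]"
+         destination => "[status_packed]"
+         dictionary_path => "/etc/logstash/status.csv"
+       }
+       dissect {
+         mapping => {
+           "status_packed" => "%{status_text}/%{status_class}"
+         }
+       }
+     }
+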
+ File-based dictionaries are loaded in a separate thread using a scheduler.
+ If you set a `refresh_interval` of 300 seconds (5 minutes) or less, the
+ file's modified time is checked before reloading. Very large dictionaries
+ are supported (internally tested at 100,000 key/value pairs), and the impact
+ on throughput is minimised by running the refresh in the scheduler thread.
+ Any ongoing modification of the dictionary file should be done using a
+ copy/edit/rename or create/rename mechanism, so that the refresh code never
+ processes half-written dictionary content.
+
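+ For example, a minimal Ruby sketch of the create/rename pattern mentioned
+ above (illustrative only; the path and keys are assumptions):
+ [source,ruby]
+     # Write the new dictionary to a temp file on the SAME filesystem, then
+     # rename it over the live file. rename(2) is atomic there, so the
+     # refresh thread never observes a half-written file.
+     require "yaml"
+     dict = { "100" => "Continue", "200" => "OK" }
+     tmp = "/etc/logstash/dict.yml.tmp"
+     File.write(tmp, dict.to_yaml)
+     File.rename(tmp, "/etc/logstash/dict.yml")
+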
  [id="plugins-{type}s-{plugin}-options"]
  ==== Translate Filter Configuration Options

@@ -57,6 +91,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-exact>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-fallback>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-field>> |<<string,string>>|Yes
+ | <<plugins-{type}s-{plugin}-iterate_on>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-override>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-refresh_interval>> |<<number,number>>|No
  | <<plugins-{type}s-{plugin}-regex>> |<<boolean,boolean>>|No
@@ -69,7 +104,7 @@ filter plugins.
  &nbsp;

  [id="plugins-{type}s-{plugin}-destination"]
- ===== `destination`
+ ===== `destination`

  * Value type is <<string,string>>
  * Default value is `"translation"`
@@ -77,10 +112,10 @@ filter plugins.
  The destination field you wish to populate with the translated code. The default
  is a field named `translation`. Set this to the same value as the source field if
  you want an in-place substitution; in this case the filter will always succeed. This will clobber
- the old value of the source field!
+ the old value of the source field!

  [id="plugins-{type}s-{plugin}-dictionary"]
- ===== `dictionary`
+ ===== `dictionary`

  * Value type is <<hash,hash>>
  * Default value is `{}`
@@ -92,10 +127,10 @@ Example:
  [source,ruby]
      filter {
        translate {
-         dictionary => {
-           "100" => "Continue",
-           "101" => "Switching Protocols",
-           "merci" => "thank you",
+         dictionary => {
+           "100" => "Continue"
+           "101" => "Switching Protocols"
+           "merci" => "thank you"
            "old version" => "new version"
          }
        }
@@ -104,7 +139,7 @@ Example:
  NOTE: It is an error to specify both `dictionary` and `dictionary_path`.

  [id="plugins-{type}s-{plugin}-dictionary_path"]
- ===== `dictionary_path`
+ ===== `dictionary_path`

  * Value type is <<path,path>>
  * There is no default value for this setting.
@@ -122,12 +157,11 @@ NOTE: it is an error to specify both `dictionary` and `dictionary_path`.

  The currently supported formats are YAML, JSON, and CSV. Format selection is
  based on the file extension: `json` for JSON, `yaml` or `yml` for YAML, and
- `csv` for CSV. The JSON format only supports simple key/value, unnested
- objects. The CSV format expects exactly two columns, with the first serving
- as the original text, and the second column as the replacement.
+ `csv` for CSV. The CSV format expects exactly two columns, with the first serving
+ as the original text (lookup key), and the second column as the translation.

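+ For instance (an illustrative sketch, not from the shipped docs), a YAML
+ dictionary file and a filter pointing at it might look like this, where the
+ path and the `code` field name are assumptions:
+ [source,yaml]
+     "100": Continue
+     "101": Switching Protocols
+     "merci": thank you
+
+ [source,ruby]
+     filter {
+       translate {
+         field => "[code]"
+         destination => "[code_description]"
+         dictionary_path => "/etc/logstash/codes.yml"
+       }
+     }
+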
  [id="plugins-{type}s-{plugin}-exact"]
- ===== `exact`
+ ===== `exact`

  * Value type is <<boolean,boolean>>
  * Default value is `true`
@@ -148,10 +182,10 @@ will be also set to `bar`. However, if logstash receives an event with the `data
  set to `foofing`, the destination field will be set to `barfing`.

  Set both `exact => true` AND `regex => true` if you would like to match using dictionary
- keys as regular expressions. A large dictionary could be expensive to match in this case.
+ keys as regular expressions. A large dictionary could be expensive to match in this case.

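+ As an illustration (not from the shipped docs), dictionary keys used as
+ regular expressions could look like this; the `useragent` field, the
+ `ua_family` destination, and the patterns are assumptions:
+ [source,ruby]
+     filter {
+       translate {
+         field => "[useragent]"
+         destination => "[ua_family]"
+         exact => true
+         regex => true
+         dictionary => {
+           "^Mozilla.*Firefox" => "Firefox"
+           "^Mozilla.*Chrome"  => "Chrome"
+         }
+       }
+     }
+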
  [id="plugins-{type}s-{plugin}-fallback"]
- ===== `fallback`
+ ===== `fallback`

  * Value type is <<string,string>>
  * There is no default value for this setting.
@@ -169,19 +203,122 @@ then the destination field would still be populated, but with the value of `no m
  This configuration can be dynamic and include parts of the event using the `%{field}` syntax.

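+ For example (illustrative only; the field names and path are assumptions), a
+ dynamic fallback built from the event itself:
+ [source,ruby]
+     filter {
+       translate {
+         field => "[error_code]"
+         destination => "[error_text]"
+         dictionary_path => "/etc/logstash/errors.yml"
+         fallback => "unknown code %{[error_code]}"
+       }
+     }
+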
  [id="plugins-{type}s-{plugin}-field"]
- ===== `field`
+ ===== `field`

  * This is a required setting.
  * Value type is <<string,string>>
  * There is no default value for this setting.

  The name of the logstash event field containing the value to be compared for a
- match by the translate filter (e.g. `message`, `host`, `response_code`).
+ match by the translate filter (e.g. `message`, `host`, `response_code`).

  If this field is an array, only the first value will be used.

+ [id="plugins-{type}s-{plugin}-iterate_on"]
+ ===== `iterate_on`
+
+ * Value type is <<string,string>>
+ * There is no default value for this setting.
+
+ When the value that you need to perform enrichment on is a variable-sized
+ array, specify the field name in this setting. This setting introduces two
+ modes: 1) when the value is an array of strings and 2) when the value is an
+ array of objects (as in a JSON object). +
+ In the first mode, you should have the same field name in both `field` and
+ `iterate_on`; the result will be an array added to the field specified in the
+ `destination` setting. This array will have the looked-up value (or the
+ `fallback` value, or nil) in the same ordinal position as each sought value. +
+ In the second mode, specify the field that has the array of objects in
+ `iterate_on`, then specify the field in each object that provides the sought
+ value with `field`, and the field to write the looked-up value (or the
+ `fallback` value) to with `destination`.
+
+ For a dictionary of:
+ [source,csv]
+     100,Yuki
+     101,Rupert
+     102,Ahmed
+     103,Kwame
+
+ Example of Mode 1
+ [source,ruby]
+     filter {
+       translate {
+         iterate_on => "[collaborator_ids]"
+         field => "[collaborator_ids]"
+         destination => "[collaborator_names]"
+         fallback => "Unknown"
+       }
+     }
+
+ Before
+ [source,json]
+     {
+       "collaborator_ids": [100,103,110,102]
+     }
+
+ After
+ [source,json]
+     {
+       "collaborator_ids": [100,103,110,102],
+       "collaborator_names": ["Yuki","Kwame","Unknown","Ahmed"]
+     }
+
+ Example of Mode 2
+ [source,ruby]
+     filter {
+       translate {
+         iterate_on => "[collaborators]"
+         field => "[id]"
+         destination => "[name]"
+         fallback => "Unknown"
+       }
+     }
+
+ Before
+ [source,json]
+     {
+       "collaborators": [
+         { "id": 100 },
+         { "id": 103 },
+         { "id": 110 },
+         { "id": 101 }
+       ]
+     }
+
+ After
+ [source,json]
+     {
+       "collaborators": [
+         { "id": 100, "name": "Yuki" },
+         { "id": 103, "name": "Kwame" },
+         { "id": 110, "name": "Unknown" },
+         { "id": 101, "name": "Rupert" }
+       ]
+     }
+
  [id="plugins-{type}s-{plugin}-override"]
- ===== `override`
+ ===== `override`

  * Value type is <<boolean,boolean>>
  * Default value is `false`
@@ -191,21 +328,22 @@ whether the filter should skip translation (default) or overwrite the target fie
  value with the new translation value.

  [id="plugins-{type}s-{plugin}-refresh_interval"]
- ===== `refresh_interval`
+ ===== `refresh_interval`

  * Value type is <<number,number>>
  * Default value is `300`

  When using a dictionary file, this setting will indicate how frequently
- (in seconds) logstash will check the dictionary file for updates.
+ (in seconds) logstash will check the dictionary file for updates. +
+ A value of zero or less will disable refresh.

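+ For example (illustrative; the field name and path are assumptions), to load
+ the file once at startup and never refresh it:
+ [source,ruby]
+     filter {
+       translate {
+         field => "[code]"
+         dictionary_path => "/etc/logstash/codes.yml"
+         refresh_interval => 0
+       }
+     }
+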
  [id="plugins-{type}s-{plugin}-regex"]
- ===== `regex`
+ ===== `regex`

  * Value type is <<boolean,boolean>>
  * Default value is `false`

- If you'd like to treat dictionary keys as regular expressions, set `exact => true`.
+ If you'd like to treat dictionary keys as regular expressions, set `regex => true`.
  Note: this is activated only when `exact => true`.

  [id="plugins-{type}s-{plugin}-refresh_behaviour"]
@@ -215,8 +353,10 @@ Note: this is activated only when `exact => true`.
  * Default value is `merge`

  When using a dictionary file, this setting indicates how the update will be executed.
- Setting this to `merge` leads to entries removed from the dictionary file being kept;
- `replace` deletes old entries on update.
+ Setting this to `merge` causes the new dictionary to be merged into the old one: an
+ existing entry is updated, but entries that existed before and are absent from the
+ new dictionary remain after the merge; `replace` causes the whole dictionary to be
+ replaced with the new one (deleting all entries of the old one on update).

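+ A minimal Ruby illustration of the difference between the two behaviours
+ (a sketch, not the plugin's actual code):
+ [source,ruby]
+     old_dict = { "100" => "Continue", "418" => "I'm a teapot" }
+     new_dict = { "100" => "Continue (edited)", "200" => "OK" }
+     old_dict.merge(new_dict)
+     # merge   => {"100"=>"Continue (edited)", "418"=>"I'm a teapot", "200"=>"OK"}
+     new_dict
+     # replace => {"100"=>"Continue (edited)", "200"=>"OK"}
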
lib/logstash/filters/array_of_maps_value_update.rb ADDED
@@ -0,0 +1,44 @@
+ # encoding: utf-8
+
+ module LogStash module Filters
+   class ArrayOfMapsValueUpdate
+     def initialize(iterate_on, field, destination, fallback, lookup)
+       @iterate_on = ensure_reference_format(iterate_on)
+       @field = ensure_reference_format(field)
+       @destination = ensure_reference_format(destination)
+       @fallback = fallback
+       @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+       @lookup = lookup
+     end
+
+     def test_for_inclusion(event, override)
+       event.include?(@iterate_on)
+     end
+
+     def update(event)
+       val = event.get(@iterate_on) # should be an array of hashes
+       source = Array(val)
+       matches = Array.new(source.size)
+       source.size.times do |index|
+         # build nested field references like "[collaborators][0][id]"
+         nested_field = "#{@iterate_on}[#{index}]#{@field}"
+         nested_destination = "#{@iterate_on}[#{index}]#{@destination}"
+         inner = event.get(nested_field)
+         next if inner.nil?
+         matched = [true, nil]
+         @lookup.fetch_strategy.fetch(inner, matched)
+         if matched.first
+           event.set(nested_destination, matched.last)
+           matches[index] = true
+         elsif @use_fallback
+           event.set(nested_destination, event.sprintf(@fallback))
+           matches[index] = true
+         end
+       end
+       return matches.any?
+     end
+
+     def ensure_reference_format(field)
+       # wrap bare field names in Logstash field-reference brackets
+       field.start_with?("[") && field.end_with?("]") ? field : "[#{field}]"
+     end
+   end
+ end end
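A rough usage sketch for this class (illustrative only; it assumes a
logstash-core runtime for `LogStash::Event` with the plugin's files on the
load path, and the lookup stub below is an assumption modelling the fetch
contract used above, where `fetch(key, result)` fills a two-slot
`[found, value]` array):

[source,ruby]
    require "logstash/event"

    FetchStub = Struct.new(:dict) do
      def fetch(key, result)
        result[0] = dict.key?(key)
        result[1] = dict[key]
      end
    end
    Lookup = Struct.new(:fetch_strategy)

    lookup  = Lookup.new(FetchStub.new({ 100 => "Yuki", 103 => "Kwame" }))
    updater = LogStash::Filters::ArrayOfMapsValueUpdate.new(
      "[collaborators]", "[id]", "[name]", "Unknown", lookup)

    event = LogStash::Event.new("collaborators" => [{ "id" => 100 }, { "id" => 110 }])
    updater.update(event)
    # collaborators is now [{ "id" => 100, "name" => "Yuki" },
    #                       { "id" => 110, "name" => "Unknown" }]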
lib/logstash/filters/array_of_values_update.rb ADDED
@@ -0,0 +1,37 @@
+ # encoding: utf-8
+
+ module LogStash module Filters
+   class ArrayOfValuesUpdate
+     def initialize(iterate_on, destination, fallback, lookup)
+       @iterate_on = iterate_on
+       @destination = destination
+       @fallback = fallback
+       @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+       @lookup = lookup
+     end
+
+     def test_for_inclusion(event, override)
+       # Skip translation in case @destination already exists and override is disabled.
+       return false if event.include?(@destination) && !override
+       event.include?(@iterate_on)
+     end
+
+     def update(event)
+       val = event.get(@iterate_on)
+       source = Array(val)
+       target = Array.new(source.size)
+       if @use_fallback
+         # prefill every slot with the fallback; successful lookups overwrite it
+         target.fill(event.sprintf(@fallback))
+       end
+       source.each_with_index do |inner, index|
+         matched = [true, nil]
+         @lookup.fetch_strategy.fetch(inner, matched)
+         if matched.first
+           target[index] = matched.last
+         end
+       end
+       event.set(@destination, target)
+       return target.any?
+     end
+   end
+ end end
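A similarly hedged sketch for the array-of-strings mode (same assumed lookup
contract as above; not the plugin's own test code):

[source,ruby]
    require "logstash/event"

    FetchStub = Struct.new(:dict) do
      def fetch(key, result)
        result[0] = dict.key?(key)
        result[1] = dict[key]
      end
    end
    Lookup = Struct.new(:fetch_strategy)

    updater = LogStash::Filters::ArrayOfValuesUpdate.new(
      "[ids]", "[names]", "Unknown", Lookup.new(FetchStub.new({ "100" => "Yuki" })))

    event = LogStash::Event.new("ids" => ["100", "110"])
    updater.update(event)
    # event.get("[names]") => ["Yuki", "Unknown"] - the fallback prefills every
    # slot and each successful lookup overwrites its ordinal position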
lib/logstash/filters/dictionary/csv_file.rb ADDED
@@ -0,0 +1,25 @@
+ # encoding: utf-8
+ require "csv"
+
+ module LogStash module Filters module Dictionary
+   class CsvFile < File
+
+     protected
+
+     def initialize_for_file_type
+       # reuse a single StringIO and CSV parser across all lines
+       @io = StringIO.new("")
+       @csv = CSV.new(@io)
+     end
+
+     def read_file_into_dictionary
+       # low-level CSV read that tries to create as few intermediate
+       # objects as possible; a repeated key overwrites the earlier value
+       IO.foreach(@dictionary_path, :mode => 'r:bom|utf-8') do |line|
+         @io.string = line
+         k, v = @csv.shift
+         @dictionary[k] = v
+       end
+     end
+   end
+ end end end