logstash-filter-translate 3.1.0 → 3.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5e4b8c724a742d47aa89a6237e127be569f2675a19c5752e719918bff8489c77
- data.tar.gz: 406d7e0417251e10d1142d19027af9557f9a0993782bb16780bcdc86c943bd5e
+ metadata.gz: bda33e0807c4df1f6a144e456c86e59c538c7138d3d27c61c688a892dcc424df
+ data.tar.gz: 93724eb15e55f3e54e0ebf321189c4e4110031ff5656dc673cdac21b6e66f837
  SHA512:
- metadata.gz: e25757bc66a2b1afa15318161c6a8182f5258e9a8f395c870e0826224f6111bfa607950e678058a423aecb733f9f23edbbaed7778067f239269786afc76341d9
- data.tar.gz: c84f492fad3418db7e18333658691fe5a8109c6dfc15589e88415198d348cce4fa05d4b966762072387441dec06210d6c7bf5fcc92fe5bec94ffe1b5026eba9d
+ metadata.gz: 7f1cfc504590cd22466348a677184221674198e9aa066c314630766b8a16e744a744220447586593c3e42dcee4b88736509d052d7513cdbf19ae1aca35a44924
+ data.tar.gz: d3b2da31ff46f55d6459ea32a1e1fd015f90f4618663c35b1d142e612937244bc101d0dfa96f039ca947ee5d716b87999ad682fdd6a7873fdf09ed5a6829dbd9
CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 3.2.0
+ - Add `iterate_on` setting to support fields that are arrays, see the docs
+   for detailed explanation.
+   [#66](https://github.com/logstash-plugins/logstash-filter-translate/issues/66)
+ - Add Rufus::Scheduler to provide asynchronous loading of dictionary.
+   [#65](https://github.com/logstash-plugins/logstash-filter-translate/issues/65)
+ - Re-organise code, yields performance improvement of around 360%
+
  ## 3.1.0
  - Add 'refresh_behaviour' to either 'merge' or 'replace' during a refresh #57

docs/index.asciidoc CHANGED
@@ -21,8 +21,8 @@ include::{include_path}/plugin_header.asciidoc[]
  ==== Description

  A general search and replace tool that uses a configured hash
- and/or a file to determine replacement values. Currently supported are
- YAML, JSON, and CSV files.
+ and/or a file to determine replacement values. Currently supported are
+ YAML, JSON, and CSV files. Each dictionary item is a key value pair.

  The dictionary entries can be specified in one of two ways: First,
  the `dictionary` configuration item may contain a hash representing
@@ -30,19 +30,53 @@ the mapping. Second, an external file (readable by logstash) may be specified
  in the `dictionary_path` configuration item. These two methods may not be used
  in conjunction; it will produce an error.

- Operationally, if the event field specified in the `field` configuration
- matches the EXACT contents of a dictionary entry key (or matches a regex if
- `regex` configuration item has been enabled), the field's value will be substituted
- with the matched key's value from the dictionary.
+ Operationally, for each event, the value from the `field` setting is tested
+ against the dictionary and, if it matches exactly (or matches a regex when
+ the `regex` configuration item has been enabled), the matched value is put in
+ the `destination` field; on no match the `fallback` setting string is
+ used instead.
 
- By default, the translate filter will replace the contents of the
- maching event field (in-place). However, by using the `destination`
- configuration item, you may also specify a target event field to
- populate with the new translated value.
+ Example:
+ [source,ruby]
+     filter {
+       translate {
+         field => "[http_status]"
+         destination => "[http_status_description]"
+         dictionary => {
+           "100" => "Continue"
+           "101" => "Switching Protocols"
+           "200" => "OK"
+           "500" => "Server Error"
+         }
+         fallback => "I'm a teapot"
+       }
+     }
+
+ Occasionally, people find that they have a field with a variable sized array of
+ values or objects that need some enrichment. The `iterate_on` setting helps in
+ these cases.

  Alternatively, for simple string search and replacements for just a few values
  you might consider using the gsub function of the mutate filter.

+ It is possible to provide multi-valued dictionary values. When using a YAML or
+ JSON dictionary, you can have the value as a hash (map) or an array datatype.
+ When using a CSV dictionary, multiple values in the translation must be
+ extracted with another filter, e.g. Dissect or KV (see the sketch below). +
+ Note that the `fallback` is a string, so on no match the fallback setting needs
+ to be formatted so that a filter can extract the multiple values to the correct fields.
+
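For illustration, a minimal sketch of the CSV case; the `geo.csv` path, the packed
`City|Timezone` value format, and all field names here are hypothetical, not part of
the plugin. The translation and the `fallback` share one packed format that a
subsequent Dissect filter splits into separate fields:

[source,ruby]
    filter {
      translate {
        field => "[country_code]"
        destination => "[geo_packed]"
        # hypothetical dictionary rows look like: nl,Amsterdam|CET
        dictionary_path => "/etc/logstash/geo.csv"
        # the fallback uses the same packed shape so dissect can always parse it
        fallback => "Unknown|Unknown"
      }
      dissect {
        mapping => { "geo_packed" => "%{city}|%{timezone}" }
      }
    }
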
+ File based dictionaries are loaded in a separate thread using a scheduler.
+ If you set a `refresh_interval` of 300 seconds (5 minutes) or less then the
+ modified time of the file is checked before reloading. Very large dictionaries
+ are supported, internally tested at 100 000 key/values, and the impact on
+ throughput is minimised by running the refresh in the scheduler thread.
+ Any ongoing modification of the dictionary file should be done using a
+ copy/edit/rename or create/rename mechanism so that the refresh code never
+ processes half-baked dictionary content.
+
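As a sketch of the copy/edit/rename approach (the paths and the `build_dictionary`
helper are hypothetical): write the complete new file first, then rename it over the
old one. A rename within the same filesystem is atomic on POSIX systems, so the
refresh thread sees either the old or the new dictionary, never a partial write.

[source,ruby]
    require "fileutils"

    dict_path = "/etc/logstash/dict.yml"   # hypothetical dictionary location
    tmp_path  = "#{dict_path}.tmp"

    File.write(tmp_path, build_dictionary) # 1. write the full new content elsewhere
    FileUtils.mv(tmp_path, dict_path)      # 2. rename over the original (atomic on POSIX)
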
  [id="plugins-{type}s-{plugin}-options"]
  ==== Translate Filter Configuration Options

@@ -57,6 +91,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-exact>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-fallback>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-field>> |<<string,string>>|Yes
+ | <<plugins-{type}s-{plugin}-iterate_on>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-override>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-refresh_interval>> |<<number,number>>|No
  | <<plugins-{type}s-{plugin}-regex>> |<<boolean,boolean>>|No
@@ -69,7 +104,7 @@ filter plugins.
  &nbsp;

  [id="plugins-{type}s-{plugin}-destination"]
- ===== `destination`
+ ===== `destination`

  * Value type is <<string,string>>
  * Default value is `"translation"`
@@ -77,10 +112,10 @@ filter plugins.
  The destination field you wish to populate with the translated code. The default
  is a field named `translation`. Set this to the same value as the source field if
  you want to do a substitution; in this case the filter will always succeed. This will clobber
- the old value of the source field!
+ the old value of the source field!

  [id="plugins-{type}s-{plugin}-dictionary"]
- ===== `dictionary`
+ ===== `dictionary`

  * Value type is <<hash,hash>>
  * Default value is `{}`
@@ -92,10 +127,10 @@ Example:
  [source,ruby]
      filter {
        translate {
-         dictionary => {
-           "100" => "Continue",
-           "101" => "Switching Protocols",
-           "merci" => "thank you",
+         dictionary => {
+           "100" => "Continue"
+           "101" => "Switching Protocols"
+           "merci" => "thank you"
            "old version" => "new version"
          }
        }
@@ -104,7 +139,7 @@ Example:
  NOTE: It is an error to specify both `dictionary` and `dictionary_path`.

  [id="plugins-{type}s-{plugin}-dictionary_path"]
- ===== `dictionary_path`
+ ===== `dictionary_path`

  * Value type is <<path,path>>
  * There is no default value for this setting.
@@ -122,12 +157,11 @@ NOTE: it is an error to specify both `dictionary` and `dictionary_path`.

  The currently supported formats are YAML, JSON, and CSV. Format selection is
  based on the file extension: `json` for JSON, `yaml` or `yml` for YAML, and
- `csv` for CSV. The JSON format only supports simple key/value, unnested
- objects. The CSV format expects exactly two columns, with the first serving
- as the original text, and the second column as the replacement.
+ `csv` for CSV. The CSV format expects exactly two columns, with the first serving
+ as the original text (lookup key), and the second column as the translation.

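For instance, a hypothetical `http_status.csv` usable as `dictionary_path`, with the
lookup key in the first column and the translation in the second:

[source,csv]
    100,Continue
    101,Switching Protocols
    200,OK
    500,Server Error
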
  [id="plugins-{type}s-{plugin}-exact"]
- ===== `exact`
+ ===== `exact`

  * Value type is <<boolean,boolean>>
  * Default value is `true`
@@ -148,10 +182,10 @@ will be also set to `bar`. However, if logstash receives an event with the `data
  set to `foofing`, the destination field will be set to `barfing`.

  Set both `exact => true` AND `regex => true` if you would like to match using dictionary
- keys as regular expressions. A large dictionary could be expensive to match in this case.
+ keys as regular expressions. A large dictionary could be expensive to match in this case.

  [id="plugins-{type}s-{plugin}-fallback"]
- ===== `fallback`
+ ===== `fallback`

  * Value type is <<string,string>>
  * There is no default value for this setting.
@@ -169,19 +203,122 @@ then the destination field would still be populated, but with the value of `no m
  This configuration can be dynamic and include parts of the event using the `%{field}` syntax.

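For example, a fallback that reports the value that failed to match (the field name
`lookup_value` is illustrative):

[source,ruby]
    fallback => "no match for %{lookup_value}"
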
  [id="plugins-{type}s-{plugin}-field"]
- ===== `field`
+ ===== `field`

  * This is a required setting.
  * Value type is <<string,string>>
  * There is no default value for this setting.

  The name of the logstash event field containing the value to be compared for a
- match by the translate filter (e.g. `message`, `host`, `response_code`).
+ match by the translate filter (e.g. `message`, `host`, `response_code`).

  If this field is an array, only the first value will be used.

+ [id="plugins-{type}s-{plugin}-iterate_on"]
+ ===== `iterate_on`
+
+ * Value type is <<string,string>>
+ * There is no default value for this setting.
+
+ When the value that you need to perform enrichment on is a variable sized array,
+ specify the field name in this setting. This setting introduces two modes:
+ 1) when the value is an array of strings, and 2) when the value is an array of
+ objects (as in a JSON object). +
+ In the first mode, you should have the same field name in both `field` and
+ `iterate_on`; the result will be an array added to the field specified in the
+ `destination` setting. This array will have the looked up value (or the
+ `fallback` value, or nil) in the same ordinal position as each sought value. +
+ In the second mode, specify the field that has the array of objects in
+ `iterate_on`, then specify the field in each object that provides the sought value
+ with `field`, and the field to write the looked up value (or the `fallback` value)
+ to with `destination`.
+
+ For a dictionary of:
+ [source,csv]
+     100,Yuki
+     101,Rupert
+     102,Ahmed
+     103,Kwame
+
+ Example of Mode 1
+ [source,ruby]
+     filter {
+       translate {
+         iterate_on => "[collaborator_ids]"
+         field => "[collaborator_ids]"
+         destination => "[collaborator_names]"
+         fallback => "Unknown"
+       }
+     }
+
+ Before
+ [source,json]
+     {
+       "collaborator_ids": [100,103,110,102]
+     }
+
+ After
+ [source,json]
+     {
+       "collaborator_ids": [100,103,110,102],
+       "collaborator_names": ["Yuki","Kwame","Unknown","Ahmed"]
+     }
+
+ Example of Mode 2
+ [source,ruby]
+     filter {
+       translate {
+         iterate_on => "[collaborators]"
+         field => "[id]"
+         destination => "[name]"
+         fallback => "Unknown"
+       }
+     }
+
+ Before
+ [source,json]
+     {
+       "collaborators": [
+         {
+           "id": 100
+         },
+         {
+           "id": 103
+         },
+         {
+           "id": 110
+         },
+         {
+           "id": 101
+         }
+       ]
+     }
+
+ After
+ [source,json]
+     {
+       "collaborators": [
+         {
+           "id": 100,
+           "name": "Yuki"
+         },
+         {
+           "id": 103,
+           "name": "Kwame"
+         },
+         {
+           "id": 110,
+           "name": "Unknown"
+         },
+         {
+           "id": 101,
+           "name": "Rupert"
+         }
+       ]
+     }
+
  [id="plugins-{type}s-{plugin}-override"]
- ===== `override`
+ ===== `override`

  * Value type is <<boolean,boolean>>
  * Default value is `false`
@@ -191,21 +328,22 @@ whether the filter should skip translation (default) or overwrite the target fie
  value with the new translation value.

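For instance (illustrative field names and values): if the event already carries a
`[code_text]` field, `override => false` (the default) leaves it untouched, while
`override => true` replaces it with the new translation.

[source,ruby]
    translate {
      field => "[code]"
      destination => "[code_text]"
      dictionary => { "1" => "one" }
      override => true
    }
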
  [id="plugins-{type}s-{plugin}-refresh_interval"]
- ===== `refresh_interval`
+ ===== `refresh_interval`

  * Value type is <<number,number>>
  * Default value is `300`

  When using a dictionary file, this setting will indicate how frequently
- (in seconds) logstash will check the dictionary file for updates.
+ (in seconds) logstash will check the dictionary file for updates. +
+ A value of zero or less will disable refresh.

  [id="plugins-{type}s-{plugin}-regex"]
- ===== `regex`
+ ===== `regex`

  * Value type is <<boolean,boolean>>
  * Default value is `false`

- If you'd like to treat dictionary keys as regular expressions, set `exact => true`.
+ If you'd like to treat dictionary keys as regular expressions, set `regex => true`.
  Note: this is activated only when `exact => true`.

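For example (the pattern and field names are illustrative): with `regex => true`, a
dictionary key such as `^5\d\d$` translates any 500-class status code.

[source,ruby]
    translate {
      field => "[http_status]"
      destination => "[status_class]"
      regex => true
      dictionary => { "^5\d\d$" => "Server Error" }
    }
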
  [id="plugins-{type}s-{plugin}-refresh_behaviour"]
@@ -215,8 +353,10 @@ Note: this is activated only when `exact => true`.
  * Default value is `merge`

  When using a dictionary file, this setting indicates how the update will be executed.
- Setting this to `merge` leads to entries removed from the dictionary file being kept;
- `replace` deletes old entries on update.
+ Setting this to `merge` causes the new dictionary to be merged into the old one. This means
+ the same entry will be updated, but entries that existed before and are absent from the new
+ dictionary will remain after the merge; `replace` causes the whole dictionary to be replaced
+ with a new one (deleting all entries of the old one on update).

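A worked illustration of the two behaviours, with hypothetical entries:

[source,ruby]
    # in-memory dictionary:  { "a" => "1", "b" => "2" }
    # new file contents:     { "a" => "9", "c" => "3" }
    # after merge:           { "a" => "9", "b" => "2", "c" => "3" }  # "b" is kept
    # after replace:         { "a" => "9", "c" => "3" }              # "b" is gone
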
lib/logstash/filters/array_of_maps_value_update.rb ADDED
@@ -0,0 +1,44 @@
+ # encoding: utf-8
+
+ module LogStash module Filters
+   class ArrayOfMapsValueUpdate
+     def initialize(iterate_on, field, destination, fallback, lookup)
+       @iterate_on = ensure_reference_format(iterate_on)
+       @field = ensure_reference_format(field)
+       @destination = ensure_reference_format(destination)
+       @fallback = fallback
+       @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+       @lookup = lookup
+     end
+
+     def test_for_inclusion(event, override)
+       # override is unused here
+       event.include?(@iterate_on)
+     end
+
+     def update(event)
+       val = event.get(@iterate_on) # should be an array of hashes
+       source = Array(val)
+       matches = Array.new(source.size)
+       source.size.times do |index|
+         # compose nested event references, e.g. "[collaborators][0][id]"
+         nested_field = "#{@iterate_on}[#{index}]#{@field}"
+         nested_destination = "#{@iterate_on}[#{index}]#{@destination}"
+         inner = event.get(nested_field)
+         next if inner.nil?
+         # matched acts as an in/out pair: the fetch strategy writes [found, value] into it
+         matched = [true, nil]
+         @lookup.fetch_strategy.fetch(inner, matched)
+         if matched.first
+           event.set(nested_destination, matched.last)
+           matches[index] = true
+         elsif @use_fallback
+           event.set(nested_destination, event.sprintf(@fallback))
+           matches[index] = true
+         end
+       end
+       return matches.any?
+     end
+
+     def ensure_reference_format(field)
+       field.start_with?("[") && field.end_with?("]") ? field : "[#{field}]"
+     end
+   end
+ end end
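For reference, a sketch of how the nested references above compose (the values are
illustrative; this is not part of the plugin's public API):

[source,ruby]
    iterate_on = "[collaborators]"
    field      = "[id]"
    index      = 0
    puts "#{iterate_on}[#{index}]#{field}"  # => "[collaborators][0][id]"
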
lib/logstash/filters/array_of_values_update.rb ADDED
@@ -0,0 +1,37 @@
+ # encoding: utf-8
+
+ module LogStash module Filters
+   class ArrayOfValuesUpdate
+     def initialize(iterate_on, destination, fallback, lookup)
+       @iterate_on = iterate_on
+       @destination = destination
+       @fallback = fallback
+       @use_fallback = !fallback.nil? # fallback is not nil, the user set a value in the config
+       @lookup = lookup
+     end
+
+     def test_for_inclusion(event, override)
+       # Skip translation if the @destination field already exists and override is disabled.
+       return false if event.include?(@destination) && !override
+       event.include?(@iterate_on)
+     end
+
+     def update(event)
+       val = event.get(@iterate_on)
+       source = Array(val)
+       target = Array.new(source.size)
+       if @use_fallback
+         # pre-fill every slot with the fallback; successful matches overwrite it below
+         target.fill(event.sprintf(@fallback))
+       end
+       source.each_with_index do |inner, index|
+         # matched acts as an in/out pair: the fetch strategy writes [found, value] into it
+         matched = [true, nil]
+         @lookup.fetch_strategy.fetch(inner, matched)
+         if matched.first
+           target[index] = matched.last
+         end
+       end
+       event.set(@destination, target)
+       return target.any?
+     end
+   end
+ end end
lib/logstash/filters/dictionary/csv_file.rb ADDED
@@ -0,0 +1,25 @@
+ # encoding: utf-8
+ require "csv"
+
+ module LogStash module Filters module Dictionary
+   class CsvFile < File
+
+     protected
+
+     def initialize_for_file_type
+       @io = StringIO.new("")
+       @csv = CSV.new(@io)
+     end
+
+     def read_file_into_dictionary
+       # low level CSV read that tries to create as
+       # few intermediate objects as possible
+       # this overwrites the value at key
+       IO.foreach(@dictionary_path, :mode => 'r:bom|utf-8') do |line|
+         @io.string = line
+         k, v = @csv.shift
+         @dictionary[k] = v
+       end
+     end
+   end
+ end end end
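For reference, a sketch of the parser-reuse trick above (the row values are
illustrative): re-pointing one StringIO at each line avoids allocating a new CSV
parser per row.

[source,ruby]
    require "csv"
    require "stringio"

    io  = StringIO.new("")
    csv = CSV.new(io)
    io.string = "100,Continue\n"
    p csv.shift  # => ["100", "Continue"]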