logstash-filter-dissect 1.0.8 → 1.0.9
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/Gemfile +8 -0
- data/README.md +29 -1
- data/VERSION +1 -1
- data/docs/index.asciidoc +213 -0
- data/lib/jruby-dissect-library_jars.rb +1 -1
- data/lib/logstash/filters/dissect.rb +5 -3
- data/logstash-filter-dissect.gemspec +1 -1
- data/spec/filters/dissect_spec.rb +52 -0
- metadata +4 -4
- data/vendor/jars/org/logstash/dissect/jruby-dissect-library/1.0.8/jruby-dissect-library-1.0.8.jar +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 214e5686732a713c4f413999fe85595782d0746c
+  data.tar.gz: 528f4a1d7afa2d6ea0bb7d8c55d49ce3342af913
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 57f4154b1bc1e3eb59622d33d9e08a706f2a603957b5910bde47bf8bbab23c0271890957ba302abec26b7b633a3ce60081144a0d1026072bbff59275ca1f7089
+  data.tar.gz: 87662b417269eb478b762f6376db447f329af697c1b7df129ee068426e1530accf331ab30d4e7023f3b69fef55a0d2e9a9f500b5aa45aed86ecd0dce320c7dce
data/CHANGELOG.md
CHANGED
data/Gemfile
CHANGED
@@ -1,3 +1,11 @@
 source 'https://rubygems.org'
+
 gemspec
 
+logstash_path = ENV["LOGSTASH_PATH"] || "../../logstash"
+use_logstash_source = ENV["LOGSTASH_SOURCE"] && ENV["LOGSTASH_SOURCE"].to_s == "1"
+
+if Dir.exist?(logstash_path) && use_logstash_source
+  gem 'logstash-core', :path => "#{logstash_path}/logstash-core"
+  gem 'logstash-core-plugin-api', :path => "#{logstash_path}/logstash-core-plugin-api"
+end
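The Gemfile additions above let the plugin resolve `logstash-core` from a local Logstash source tree instead of released gems. A hypothetical shell session (the checkout path is an assumption, not something stated in this diff):

```shell
# Hypothetical usage of the env vars the new Gemfile checks.
# LOGSTASH_PATH is assumed to point at a local Logstash git clone;
# LOGSTASH_SOURCE=1 opts in to resolving logstash-core from it.
export LOGSTASH_PATH="$HOME/src/logstash"
export LOGSTASH_SOURCE=1
# bundle install   # would now pick up logstash-core from $LOGSTASH_PATH
echo "building against: $LOGSTASH_PATH"
```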
data/README.md
CHANGED
@@ -1,3 +1,31 @@
+## Description
+
+Dissect filter is an alternative to Grok filter and can be used to extract structured fields from an unstructured line.
+However, if the structure of your text varies from line to line then Grok is more suitable. There is a hybrid case where Dissect can be used to de-structure the section of the line that is reliably repeated and then Grok can be used on the remaining field values with more regex predictability and less overall work to do.
+
+A set of fields and delimiters is called a *dissection*.
+
+The dissection is described using a set of `%{}` sections:
+....
+%{a} - %{b} - %{c}
+....
+
+A *field* is the text from `%` to `}` inclusive.
+
+A *delimiter* is the text between `}` and `%` characters. Delimiters can't contain these `}{%` characters.
+
+The config might look like this:
+
+```
+filter {
+  dissect {
+    mapping => {
+      "message" => "%{ts} %{+ts} %{+ts} %{src} %{} %{prog}[%{pid}]: %{msg}"
+    }
+  }
+}
+```
+
 ### NOTE
 Please read BUILD_INSTRUCTIONS.md
 
@@ -98,4 +126,4 @@ Programming is not a required skill. Whatever you've seen about open source and
 
 It is more important to the community that you are able to contribute.
 
-For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
+For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
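To make the field/delimiter notation in the README concrete, here is a minimal pure-Ruby sketch of the left-to-right capture it describes: text is captured up to each literal delimiter, and the remainder goes into the last field. This is illustrative only — the actual plugin is backed by the `jruby-dissect-library` Java jar and additionally handles append/indirect fields, conversions, and failure tagging — and the `dissect` helper name here is invented for this example.

```ruby
# Illustrative sketch (NOT the plugin's implementation): split text on the
# literal delimiters of a dissection pattern like "%{a} - %{b} - %{c}".
def dissect(pattern, text)
  fields = pattern.scan(/%\{(.*?)\}/).flatten          # field keys, in order
  delims = pattern.split(/%\{.*?\}/).reject(&:empty?)  # literal delimiters between fields
  values = []
  rest = text
  delims.each do |d|
    value, _, rest = rest.partition(d)                 # capture up to the next delimiter
    values << value
  end
  values << rest                                       # the last field takes the remainder
  fields.zip(values).reject { |k, _| k.empty? }.to_h   # drop empty %{} skip fields
end

puts dissect("%{a} - %{b} - %{c}", "foo - bar - baz").inspect
# keys a, b, c map to "foo", "bar", "baz"
```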
data/VERSION
CHANGED
@@ -1 +1 @@
-1.0.8
+1.0.9
data/docs/index.asciidoc
ADDED
@@ -0,0 +1,213 @@
+:plugin: dissect
+:type: filter
+
+///////////////////////////////////////////
+START - GENERATED VARIABLES, DO NOT EDIT!
+///////////////////////////////////////////
+:version: %VERSION%
+:release_date: %RELEASE_DATE%
+:changelog_url: %CHANGELOG_URL%
+:include_path: ../../../../logstash/docs/include
+///////////////////////////////////////////
+END - GENERATED VARIABLES, DO NOT EDIT!
+///////////////////////////////////////////
+
+[id="plugins-{type}-{plugin}"]
+
+=== Dissect filter plugin
+
+include::{include_path}/plugin_header.asciidoc[]
+
+==== Description
+
+The Dissect filter is a kind of split operation. Unlike a regular split operation where one delimiter is applied to the whole string, this operation applies a set of delimiters to a string value. +
+Dissect does not use regular expressions and is very fast. +
+However, if the structure of your text varies from line to line then Grok is more suitable. +
+There is a hybrid case where Dissect can be used to de-structure the section of the line that is reliably repeated and then Grok can be used on the remaining field values with more regex predictability and less overall work to do. +
+
+A set of fields and delimiters is called a *dissection*.
+
+The dissection is described using a set of `%{}` sections:
+....
+%{a} - %{b} - %{c}
+....
+
+A *field* is the text from `%` to `}` inclusive.
+
+A *delimiter* is the text between `}` and `%` characters.
+
+[NOTE]
+Delimiters can't contain these `}{%` characters.
+
+The config might look like this:
+....
+filter {
+  dissect {
+    mapping => {
+      "message" => "%{ts} %{+ts} %{+ts} %{src} %{} %{prog}[%{pid}]: %{msg}"
+    }
+  }
+}
+....
+When dissecting a string from left to right, text is captured up to the first delimiter - this captured text is stored in the first field. This is repeated for each field/delimiter pair thereafter until the last delimiter is reached, then *the remaining text is stored in the last field*. +
+
+*The Key:* +
+The key is the text between the `%{` and `}`, exclusive of the ?, +, & prefixes and the ordinal suffix. +
+`%{?aaa}` - key is `aaa` +
+`%{+bbb/3}` - key is `bbb` +
+`%{&ccc}` - key is `ccc` +
+
+*Normal field notation:* +
+The found value is added to the Event using the key. +
+`%{some_field}` - a normal field has no prefix or suffix
+
+*Skip field notation:* +
+The found value is stored internally but not added to the Event. +
+The key, if supplied, is prefixed with a `?`.
+
+`%{}` is an empty skip field.
+
+`%{?foo}` is a named skip field.
+
+*Append field notation:* +
+The value is appended to another value or stored if it's the first field seen. +
+The key is prefixed with a `+`. +
+The final value is stored in the Event using the key. +
+
+[NOTE]
+====
+The delimiter found before the field is appended with the value. +
+If no delimiter is found before the field, a single space character is used.
+====
+
+`%{+some_field}` is an append field. +
+`%{+some_field/2}` is an append field with an order modifier.
+
+An order modifier, `/digits`, allows one to reorder the append sequence. +
+e.g. for a text of `1 2 3 go`, this `%{+a/2} %{+a/1} %{+a/4} %{+a/3}` will build a key/value of `a => 2 1 go 3` +
+Append fields without an order modifier will append in declared order. +
+e.g. for a text of `1 2 3 go`, this `%{a} %{b} %{+a}` will build two key/values of `a => 1 3 go, b => 2` +
+
+*Indirect field notation:* +
+The found value is added to the Event using the found value of another field as the key. +
+The key is prefixed with a `&`. +
+`%{&some_field}` - an indirect field where the key is indirectly sourced from the value of `some_field`. +
+e.g. for a text of `error: some_error, some_description`, this `error: %{?err}, %{&err}` will build a key/value of `some_error => some_description`.
+
+[NOTE]
+For append and indirect fields, the key can refer to a field that already exists in the event before dissection.
+
+[NOTE]
+Use a Skip field if you do not want the indirection key/value stored.
+
+e.g. for a text of `google: 77.98`, this `%{?a}: %{&a}` will build a key/value of `google => 77.98`.
+
+[NOTE]
+===============================
+Append and indirect cannot be combined and will fail validation. +
+`%{+&something}` - will add a value to the `&something` key, probably not the intended outcome. +
+`%{&+something}` will add a value to the `+something` key, again probably unintended. +
+===============================
+
+*Delimiter repetition:* +
+In the source text, if a field has variable width padded with delimiters, the padding will be ignored. +
+e.g. for texts of:
+....
+00000043 ViewReceiver I
+000000b3 Peer I
+....
+with a dissection of `%{a} %{b} %{c}`; the padding is ignored, `event.get([c]) -> "I"`
+
+[NOTE]
+====
+You probably want to use this filter inside an `if` block. +
+This ensures that the event contains a field value with a suitable structure for the dissection.
+====
+
+For example...
+....
+filter {
+  if [type] == "syslog" or "syslog" in [tags] {
+    dissect {
+      mapping => {
+        "message" => "%{ts} %{+ts} %{+ts} %{src} %{} %{prog}[%{pid}]: %{msg}"
+      }
+    }
+  }
+}
+....
+
+[id="plugins-{type}s-{plugin}-options"]
+==== Dissect Filter Configuration Options
+
+This plugin supports the following configuration options plus the <<plugins-{type}s-{plugin}-common-options>> described later.
+
+[cols="<,<,<",options="header",]
+|=======================================================================
+|Setting |Input type|Required
+| <<plugins-{type}s-{plugin}-convert_datatype>> |<<hash,hash>>|No
+| <<plugins-{type}s-{plugin}-mapping>> |<<hash,hash>>|No
+| <<plugins-{type}s-{plugin}-tag_on_failure>> |<<array,array>>|No
+|=======================================================================
+
+Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
+filter plugins.
+
+
+
+[id="plugins-{type}s-{plugin}-convert_datatype"]
+===== `convert_datatype`
+
+* Value type is <<hash,hash>>
+* Default value is `{}`
+
+With this setting `int` and `float` datatype conversions can be specified. +
+These will be done after all `mapping` dissections have taken place. +
+Feel free to use this setting on its own without a `mapping` section. +
+
+For example
+[source, ruby]
+filter {
+  dissect {
+    convert_datatype => {
+      cpu => "float"
+      code => "int"
+    }
+  }
+}
+
+[id="plugins-{type}s-{plugin}-mapping"]
+===== `mapping`
+
+* Value type is <<hash,hash>>
+* Default value is `{}`
+
+A hash of dissections of `field => value` +
+A later dissection can be done on values from a previous dissection or they can be independent.
+
+For example
+[source, ruby]
+filter {
+  dissect {
+    mapping => {
+      "message" => "%{field1} %{field2} %{description}"
+      "description" => "%{field3} %{field4} %{field5}"
+    }
+  }
+}
+
+This is useful if you want to keep the field `description` but also
+dissect it some more.
+
+[id="plugins-{type}s-{plugin}-tag_on_failure"]
+===== `tag_on_failure`
+
+* Value type is <<array,array>>
+* Default value is `["_dissectfailure"]`
+
+Append values to the `tags` field when dissection fails
+
+
+
+[id="plugins-{type}s-{plugin}-common-options"]
+include::{include_path}/{type}.asciidoc[]
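The append/order-modifier rules documented above can be sketched in plain Ruby. This illustrates the documented reordering behavior only, not the plugin's Java implementation; `append_in_order` and its `[value, order]` capture pairs are invented for this example, and the join delimiter is assumed to be a single space as in the docs' default.

```ruby
# Illustrative sketch (NOT the plugin's implementation): reassemble values
# captured by append fields such as %{+a/2} %{+a/1} %{+a/4} %{+a/3}.
# Each capture is [value, order]; sorting by order and joining with the
# delimiter yields the final value stored under the key.
def append_in_order(captures)
  captures.sort_by { |_, order| order }.map(&:first).join(" ")
end

# For the text "1 2 3 go" against %{+a/2} %{+a/1} %{+a/4} %{+a/3},
# the captured [value, order] pairs would be:
captures = [["1", 2], ["2", 1], ["3", 4], ["go", 3]]
puts append_in_order(captures)   # prints: 2 1 go 3, matching the docs' a => 2 1 go 3
```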
data/lib/logstash/filters/dissect.rb
CHANGED
@@ -6,8 +6,6 @@ require "java"
 require "jruby-dissect-library_jars"
 require "jruby_dissector"
 
-# ==== *Dissect or how to de-structure text*
-#
 # The Dissect filter is a kind of split operation. Unlike a regular split operation where one delimiter is applied to the whole string, this operation applies a set of delimiters to a string value. +
 # Dissect does not use regular expressions and is very fast. +
 # However, if the structure of your text varies from line to line then Grok is more suitable. +
@@ -80,7 +78,7 @@ require "jruby_dissector"
 # The found value is added to the Event using the found value of another field as the key. +
 # The key is prefixed with a `&`. +
 # `%{&some_field}` - an indirect field where the key is indirectly sourced from the value of `some_field`. +
-# e.g. for a text of `error: some_error, some_description`, this `error: %{?err}, %{&err}` will build a key/value of `some_error =>
+# e.g. for a text of `error: some_error, some_description`, this `error: %{?err}, %{&err}` will build a key/value of `some_error => some_description`.
 #
 # [NOTE]
 # for append and indirect field the key can refer to a field that already exists in the event before dissection.
@@ -184,4 +182,8 @@ module LogStash module Filters class Dissect < LogStash::Filters::Base
     @dissector.dissect_multi(events, self)
     events
   end
+
+  def metric_increment(metric_name)
+    metric.increment(metric_name)
+  end
 end end end
data/logstash-filter-dissect.gemspec
CHANGED
@@ -12,7 +12,7 @@ Gem::Specification.new do |s|
   s.require_paths = ["lib", "vendor/jars"]
 
   # Files
-  s.files = Dir[
+  s.files = Dir["lib/**/*","spec/**/*","*.gemspec","*.md","CONTRIBUTORS","Gemfile","LICENSE","NOTICE.TXT", "vendor/jar-dependencies/**/*.jar", "vendor/jar-dependencies/**/*.rb", "VERSION", "docs/**/*"]
   # Tests
   s.test_files = s.files.grep(%r{^(test|spec|features)/})
 
data/spec/filters/dissect_spec.rb
CHANGED
@@ -222,4 +222,56 @@ describe LogStash::Filters::Dissect do
       end
     end
   end
+
+  describe "metrics tracking" do
+    let(:options) { { "mapping" => { "message" => "%{a} %{b}" } } }
+    subject { described_class.new(options) }
+
+    before(:each) { subject.register }
+
+    context "when match is successful" do
+      let(:event) { LogStash::Event.new("message" => "1 2") }
+
+      it "should increment the matches metric" do
+        expect(subject).to receive(:metric_increment).once.with(:matches)
+        subject.filter(event)
+      end
+    end
+
+    context "when match is not successful" do
+      let(:event) { LogStash::Event.new("message" => "") }
+
+      it "should increment the failures metric" do
+        expect(subject).to receive(:metric_increment).once.with(:failures)
+        subject.filter(event)
+      end
+    end
+  end
+
+  describe "Basic dissection" do
+    let(:options) { { "mapping" => { "message" => "%{a} %{b}" } } }
+    subject { described_class.new(options) }
+    let(:event) { LogStash::Event.new(event_data) }
+
+    before(:each) do
+      subject.register
+      subject.filter(event)
+    end
+
+    context "when no field" do
+      let(:event_data) { {} }
+      it "should not add tags to the event" do
+        expect(event.get("tags")).to be_nil
+      end
+    end
+
+    context "when field is empty" do
+      let(:event_data) { { "message" => "" } }
+      it "should add tags to the event" do
+        expect(event.get("tags")).to include("_dissectfailure")
+      end
+    end
+  end
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-filter-dissect
 version: !ruby/object:Gem::Version
-  version: 1.0.8
+  version: 1.0.9
 platform: ruby
 authors:
 - Elastic
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2017-06-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -87,12 +87,12 @@ files:
 - NOTICE.TXT
 - README.md
 - VERSION
+- docs/index.asciidoc
 - lib/jruby-dissect-library_jars.rb
 - lib/logstash/filters/dissect.rb
 - logstash-filter-dissect.gemspec
 - spec/filters/dissect_spec.rb
 - spec/spec_helper.rb
-- vendor/jars/org/logstash/dissect/jruby-dissect-library/1.0.8/jruby-dissect-library-1.0.8.jar
 homepage: http://www.elastic.co/guide/en/logstash/current/index.html
 licenses:
 - Apache License (2.0)
@@ -116,7 +116,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.8
 signing_key:
 specification_version: 4
 summary: This dissect filter will de-structure text into multiple fields.