logstash-codec-protobuf 1.2.1 → 1.2.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +3 -0
- data/README.md +73 -17
- data/docs/index.asciidoc +96 -23
- data/lib/logstash/codecs/protobuf.rb +262 -84
- data/logstash-codec-protobuf.gemspec +1 -1
- data/spec/codecs/{protobuf_spec.rb → pb2_spec.rb} +0 -0
- data/spec/codecs/{protobuf3_spec.rb → pb3_decode_spec.rb} +7 -75
- data/spec/codecs/pb3_encode_spec.rb +220 -0
- data/spec/helpers/pb3/ReservationEntry_pb.rb +64 -0
- data/spec/helpers/pb3/rum2_pb.rb +87 -0
- data/spec/helpers/pb3/rum3_pb.rb +87 -0
- data/spec/helpers/pb3/rum_pb.rb +87 -0
- metadata +16 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a4a059736022035d5e326d6cb27ddec13cd54da37dcb1d2f37a8076cb0179e37
+  data.tar.gz: 3402775465ce05bbbbc87a7367acc482cd7d2468c84e27ad07ef0ac4383e8a97
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4b1dea1ed7701b7e6e26e8226991bc718fab06302071f60b7c8b8b0b4b9a2b45d1d0809be6fb7a76a30af0c291181a5a77cb8a51cdf2a4c5e9ddc2596c0afc1e
+  data.tar.gz: 778c28f5ed7cfa1ed5bd54f175f8efec4dc74e0d93e1d47dc3ded5671a2e61c6baf4e2f9b6bf9e87e896324f3b2a3e22c3a3692195b62fcb23a2bc37815f26aa
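Only the recorded hashes changed here. For reference, they can be checked against a locally unpacked copy of the release; a minimal sketch (the member paths `checksums.yaml` and `metadata.gz` are assumptions about where the archive was extracted, and this script is not part of the gem):

```ruby
# Sanity-check an unpacked gem member against the checksums recorded above.
require 'digest'
require 'yaml'

checksums = YAML.load_file('checksums.yaml')
expected  = checksums['SHA256']['metadata.gz']
actual    = Digest::SHA256.file('metadata.gz').hexdigest

puts(actual == expected ? 'checksum OK' : "mismatch: got #{actual}")
```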
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -3,34 +3,46 @@
 This is a codec plugin for [Logstash](https://github.com/elastic/logstash) to parse protobuf messages.
 
 # Prerequisites and Installation
-
+
 * prepare your Ruby versions of the Protobuf definitions:
-  * For protobuf 2 use the [ruby-protoc compiler](https://github.com/codekitchen/ruby-protocol-buffers).
+  * For protobuf 2 use the [ruby-protoc compiler](https://github.com/codekitchen/ruby-protocol-buffers).
   * For protobuf 3 use the [official google protobuf compiler](https://developers.google.com/protocol-buffers/docs/reference/ruby-generated).
 * install the codec: `bin/logstash-plugin install logstash-codec-protobuf`
 * use the codec in your Logstash config file. See details below.
 
 ## Configuration
 
-
+There are two ways to specify the locations of the ruby protobuf definitions:
+* specify each class and their loading order using the configurations `include_path`. This option will soon be deprecated in favour of the autoloader.
+* specify the path to the main protobuf class, and a folder from which to load its dependencies, using `class_file` and `protobuf_root_directory`. The codec will detect the dependencies of each file and load them automatically.
+
+`include_path` (optional): an array of strings with filenames where logstash can find your protobuf definitions. Requires absolute paths. Please note that protobuf v2 files have the ending `.pb.rb` whereas files compiled for protobuf v3 end in `_pb.rb`. Cannot be used together with `protobuf_root_directory` or `class_file`.
+
+`protobuf_root_directory` (optional): Only to be used in combination with `class_file`. Absolute path to the directory that contains all compiled protobuf files. Cannot be used together with `include_path`.
+
+`class_file` (optional): Relative path to the ruby file that contains class_name. Only to be used in combination with `protobuf_root_directory`. Cannot be used together with `include_path`.
 
-`class_name` (required): the name of the protobuf class that is to be decoded or encoded. For protobuf 2 separate the modules with ::. For protobuf 3 use single dots.
+`class_name` (required): the name of the protobuf class that is to be decoded or encoded. For protobuf 2 separate the modules with ::. For protobuf 3 use single dots.
 
 `protobuf_version` (optional): set this to 3 if you want to use protobuf 3 definitions. Defaults to 2.
 
+`stop_on_error` (optional): Decoder only: will stop the entire pipeline upon discovery of a non decodable message. Deactivated by default.
+
+`pb3_encoder_autoconvert_types` (optional): Encoder only: will try to fix type mismatches between the protobuf definition and the actual data. Available for protobuf 3 only. Activated by default.
+
 ## Usage example: decoder
 
 Use this as a codec in any logstash input. Just provide the name of the class that your incoming objects will be encoded in, and specify the path to the compiled definition.
 Here's an example for a kafka input with protobuf 2:
 
 ```ruby
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
 
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals::Mammals::Unicorn"
     include_path => ['/path/to/pb_definitions/Animal.pb.rb', '/path/to/pb_definitions/Unicorn.pb.rb']
@@ -38,15 +50,15 @@ kafka
   }
 }
 ```
 
-Example for protobuf 3:
+Example for protobuf 3, manual class loading specification (deprecated):
 
 ```ruby
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals.Mammals.Unicorn"
     include_path => ['/path/to/pb_definitions/Animal_pb.rb', '/path/to/pb_definitions/Unicorn_pb.rb']
@@ -55,6 +67,25 @@ kafka
   }
 }
 ```
 
+Example for protobuf 3, automatic class loading specification:
+
+```ruby
+kafka
+{
+  topic_id => "..."
+  key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+}
+```
+In this example, all protobuf files must live in a subfolder of `/path/to/pb_definitions/`.
+
 For version 3 class names check the bottom of the generated protobuf ruby file. It contains lines like this:
 
 ```ruby
@@ -67,7 +98,7 @@ If you're using a kafka input please also set the deserializer classes as shown
 
 ### Class loading order
 
-Imagine you have the following protobuf version 2 relationship: class Unicorn lives in namespace Animal::Horse and uses another class Wings.
+Imagine you have the following protobuf version 2 relationship: class Unicorn lives in namespace Animal::Horse and uses another class Wings.
 
 ```ruby
 module Animal
@@ -91,7 +122,7 @@ Set the class name to the parent class:
 class_name => "Animal::Mammal::Unicorn"
 ```
 
-for protobuf 2. For protobuf 3 use
+for protobuf 2. For protobuf 3 use
 
 ```ruby
 class_name => "Animal.Mammal.Unicorn"
@@ -99,14 +130,35 @@ class_name => "Animal.Mammal.Unicorn"
 
 ## Usage example: encoder
 
-The configuration of the codec for encoding logstash events for a protobuf output is pretty much the same as for the decoder input usage as demonstrated above
+The configuration of the codec for encoding logstash events for a protobuf output is pretty much the same as for the decoder input usage as demonstrated above, with the following exception: when writing to the Kafka output,
+* do not set the `value_deserializer_class` or the `key_deserializer_class`.
+* do set the serializer class like so: `value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"`.
+
+Please be aware of the following:
 * the protobuf definition needs to contain all the fields that logstash typically adds to an event, in the corrent data type. Examples for this are `@timestamp` (string), `@version` (string), `host`, `path`, all of which depend on your input sources and filters aswell. If you do not want to add those fields to your protobuf definition then please use a [modify filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html) to [remove](https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-remove_field) the undesired fields.
 * object members starting with `@` are somewhat problematic in protobuf definitions. Therefore those fields will automatically be renamed to remove the at character. This also effects the important `@timestamp` field. Please name it just "timestamp" in your definition.
+* fields with a nil value will automatically be removed from the event. Empty fields will not be removed.
+* it is recommended to set the config option `pb3_encoder_autoconvert_types` to true. Otherwise any type mismatch between your data and the protobuf definition will cause an event to be lost. The auto typeconversion does not alter your data. It just tries to convert obviously identical data into the expected datatype, such as converting integers to floats where floats are expected, or "true" / "false" strings into booleans where booleans are expected.
 
+```ruby
+kafka
+{
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+  ...
+  value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"
+  }
+}
+```
 
 ## Troubleshooting
 
-### Protobuf 2
+### Decoder: Protobuf 2
 #### "uninitialized constant SOME_CLASS_NAME"
 
 If you include more than one definition class, consider the order of inclusion. This is especially relevant if you include whole directories. A definition might refer to another definition that is not loaded yet. In this case, please specify the files in the `include_path` variable in reverse order of reference. See 'Example with referenced definitions' above.
@@ -115,12 +167,16 @@ If you include more than one definition class, consider the order of inclusion.
 
 Maybe your protobuf definition does not fullfill the requirements and needs additional fields. Run logstash with the `--debug` flag and search for error messages.
 
-### Protobuf 3
+### Decoder: Protobuf 3
+
+#### NullPointerException
+
+Check for missing imports. There's a high probability that one of the imported classes has dependencies of its own and those are not being fully satisfied. To avoid this, consider using the autoloader feature by setting the configurations for `protobuf_root_directory` and `class_file`.
 
-
+### Encoder: Protobuf 3
 
-
+#### NullPointerException
 
-
+Check for missing imports. There's a high probability that one of the imported classes has dependencies of its own and those are not being fully satisfied. To avoid this, consider using the autoloader feature by setting the configurations for `protobuf_root_directory` and `class_file`.
 
 
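The `pb3_encoder_autoconvert_types` behaviour the README diff describes can be pictured with a small stand-alone sketch. This is a hypothetical helper, not the plugin's actual implementation (the real codec derives the expected type from the protobuf descriptor):

```ruby
# Hypothetical illustration of the auto type conversion: only obviously
# equivalent data is cast to the type the protobuf field expects.
def autoconvert(value, expected_type)
  case expected_type
  when :float   then value.is_a?(Integer) ? value.to_f : value  # 17    => 17.0
  when :string  then value.is_a?(Numeric) ? value.to_s : value  # 12345 => "12345"
  when :boolean then %w[true false].include?(value.to_s.downcase) ? value.to_s.downcase == "true" : value
  else value
  end
end

p autoconvert(17, :float)        # => 17.0
p autoconvert("true", :boolean)  # => true
p autoconvert("horse", :float)   # => "horse" (left alone; encoding would still fail)
```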
data/docs/index.asciidoc
CHANGED
@@ -28,29 +28,31 @@ For protobuf 3 use the https://developers.google.com/protocol-buffers/docs/refer
 
 The following shows a usage example (protobuf v2) for decoding events from a kafka stream:
 [source,ruby]
-kafka
+kafka
 {
-
-
-
-
-
-
-
-
+  topic_id => "..."
+  key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  codec => protobuf
+  {
+    class_name => "Animals::Mammals::Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn.pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+  }
 }
 
-
+Decoder usage example for protobuf v3:
 [source,ruby]
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals.Mammals.Unicorn"
-
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
     protobuf_version => 3
   }
 }
@@ -60,10 +62,29 @@ The codec can be used in input and output plugins. +
 When using the codec in the kafka input plugin please set the deserializer classes as shown above. +
 When using the codec in an output plugin:
 
-* make sure to include all the desired fields in the protobuf definition, including timestamp.
-Remove fields that are not part of the protobuf definition from the event by using the mutate filter.
+* make sure to include all the desired fields in the protobuf definition, including timestamp.
+Remove fields that are not part of the protobuf definition from the event by using the mutate filter. Encoding will fail if the event has fields which are not in the protobuf definition.
 * the `@` symbol is currently not supported in field names when loading the protobuf definitions for encoding. Make sure to call the timestamp field `timestamp`
 instead of `@timestamp` in the protobuf file. Logstash event fields will be stripped of the leading `@` before conversion.
+* fields with a nil value will automatically be removed from the event. Empty fields will not be removed.
+* it is recommended to set the config option `pb3_encoder_autoconvert_types` to true. Otherwise any type mismatch between your data and the protobuf definition will cause an event to be lost. The auto typeconversion does not alter your data. It just tries to convert obviously identical data into the expected datatype, such as converting integers to floats where floats are expected, or "true" / "false" strings into booleans where booleans are expected.
+* When writing to Kafka: set the serializer class: `value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"`
+
+Encoder usage example (protobufg v3):
+
+[source,ruby]
+kafka
+{
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+  value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"
+  }
+}
 
 
 [id="plugins-{type}s-{plugin}-options"]
@@ -73,14 +94,18 @@ When using the codec in an output plugin:
 |=======================================================================
 |Setting |Input type|Required
 | <<plugins-{type}s-{plugin}-class_name>> |<<string,string>>|Yes
-| <<plugins-{type}s-{plugin}-
+| <<plugins-{type}s-{plugin}-class_file>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-protobuf_root_directory>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-include_path>> |<<array,array>>|No
 | <<plugins-{type}s-{plugin}-protobuf_version>> |<<number,number>>|Yes
+| <<plugins-{type}s-{plugin}-stop_on_error>> |<<boolean,boolean>>|No
+| <<plugins-{type}s-{plugin}-pb3_encoder_autoconvert_types>> |<<boolean,boolean>>|No
 |=======================================================================
 
 
 
 [id="plugins-{type}s-{plugin}-class_name"]
-===== `class_name`
+===== `class_name`
 
 * This is a required setting.
 * Value type is <<string,string>>
@@ -99,18 +124,45 @@ For protobuf v3, you can copy the class name from the Descriptorpool registratio
 [source,ruby]
 Animals.Mammals.Unicorn = Google::Protobuf::DescriptorPool.generated_pool.lookup("Animals.Mammals.Unicorn").msgclass
 
-
 If your class references other definitions: you only have to add the name of the main class here.
 
+[id="plugins-{type}s-{plugin}-class_file"]
+===== `class_file`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+Absolute path to the directory that contains all compiled protobuf files. If the protobuf definitions are spread across multiple folders, this needs to point to the folder containing all those folders.
+
+[id="plugins-{type}s-{plugin}-protobuf_root_directory"]
+===== `protobuf_root_directory`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+Absolute path to the root directory that contains all referenced/used dependencies of the main class (`class_name`) or any of its dependencies. Must be used in combination with the `class_file` setting, and can not be used in combination with the legacy loading mechanism `include_path`.
+
+Example:
+
+[source]
+pb3
+├── header
+│   └── header_pb.rb
+├── messageA_pb.rb
+
+In this case `messageA_pb.rb` has an embedded message from `header/header_pb.rb`.
+If `class_file` is set to `messageA_pb.rb`, and `class_name` to `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3`, which includes both definitions.
+
+
 [id="plugins-{type}s-{plugin}-include_path"]
-===== `include_path`
+===== `include_path`
 
-* This is a required setting.
 * Value type is <<array,array>>
 * There is no default value for this setting.
 
-
-
+Legacy protobuf definition loading mechanism for backwards compatibility:
+List of absolute pathes to files with protobuf definitions.
+When using more than one file, make sure to arrange the files in reverse order of dependency so that each class is loaded before it is
 refered to by another.
 
 Example: a class _Unicorn_ referencing another protobuf class _Wings_
@@ -129,13 +181,34 @@ include_path => ['/path/to/pb_definitions/wings.pb.rb','/path/to/pb_definitions/
 
 Please note that protobuf v2 files have the ending `.pb.rb` whereas files compiled for protobuf v3 end in `_pb.rb`.
 
+Cannot be used together with `protobuf_root_directory` or `class_file`.
+
 [id="plugins-{type}s-{plugin}-protobuf_version"]
-===== `protobuf_version`
+===== `protobuf_version`
 
 * Value type is <<number,number>>
 * Default value is 2
 
 Protocol buffers version. Valid settings are 2, 3.
 
+[id="plugins-{type}s-{plugin}-stop_on_error"]
+===== `stop_on_error`
+
+* Value type is <<boolean,boolean>>
+* Default value is false
+
+Stop entire pipeline when encountering a non decodable message.
+
+[id="plugins-{type}s-{plugin}-pb3_encoder_autoconvert_types"]
+===== `pb3_encoder_autoconvert_types`
+
+* Value type is <<boolean,boolean>>
+* Default value is true
 
+Convert data types to match the protobuf definition (if possible).
+The protobuf encoder library is very strict with regards to data types. Example: an event has an integer field but the protobuf definition expects a float. This would lead to an exception and the event would be lost.
+This feature tries to convert the datatypes to the expectations of the protobuf definitions, without modifying the data whatsoever. Examples of conversions it might attempt:
+"true" string => true boolean
+17 int => 17.0 float
+12345 number => "12345" string
 
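The `class_file` / `protobuf_root_directory` combination documented above corresponds to a single resolution step in the codec source (visible in the diff below); a sketch of that path logic with placeholder paths:

```ruby
# How a relative class_file is resolved against protobuf_root_directory
# (paths are placeholders).
require 'pathname'

protobuf_root_directory = '/path/to/pb3'
class_file              = 'messageA_pb.rb'

class_file = "#{protobuf_root_directory}/#{class_file}" unless (Pathname.new class_file).absolute?
puts class_file  # => /path/to/pb3/messageA_pb.rb
# Requiring this file lets the autoloader pull in dependencies such as header/header_pb.rb.
```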
data/lib/logstash/codecs/protobuf.rb
CHANGED
@@ -92,8 +92,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # `class_file` and `include_path` cannot be used at the same time.
   config :class_file, :validate => :string, :default => '', :required => false
 
-  # Absolute path to the directory that contains all compiled protobuf files.
-  #
   # Absolute path to the root directory that contains all referenced/used dependencies
   # of the main class (`class_name`) or any of its dependencies.
   #
@@ -101,12 +99,12 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   #
   # pb3
   #   ├── header
-  #   │
+  #   │   └── header_pb.rb
   #   ├── messageA_pb.rb
   #
   # In this case `messageA_pb.rb` has an embedded message from `header/header_pb.rb`.
   # If `class_file` is set to `messageA_pb.rb`, and `class_name` to
-  # `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3
+  # `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3`, which includes
   # both definitions.
   config :protobuf_root_directory, :validate => :string, :required => false
 
@@ -128,12 +126,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # [source,ruby]
   # include_path => ['/path/to/protobuf/definitions/Wings.pb.rb','/path/to/protobuf/definitions/Unicorn.pb.rb']
   #
-  # When using the codec in an output plugin:
-  # * make sure to include all the desired fields in the protobuf definition, including timestamp.
-  # Remove fields that are not part of the protobuf definition from the event by using the mutate filter.
-  # * the @ symbol is currently not supported in field names when loading the protobuf definitions for encoding. Make sure to call the timestamp field "timestamp"
-  # instead of "@timestamp" in the protobuf file. Logstash event fields will be stripped of the leading @ before conversion.
-  #
   # `class_file` and `include_path` cannot be used at the same time.
   config :include_path, :validate => :array, :default => [], :required => false
 
@@ -145,6 +137,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # To tolerate faulty messages that cannot be decoded, set this to false. Otherwise the pipeline will stop upon encountering a non decipherable message.
   config :stop_on_error, :validate => :boolean, :default => false, :required => false
 
+  # Instruct the encoder to attempt converting data types to match the protobuf definitions. Available only for protobuf version 3.
+  config :pb3_encoder_autoconvert_types, :validate => :boolean, :default => true, :required => false
+
+
+
   attr_reader :execution_context
 
   # id of the pipeline whose events you want to read from.
@@ -156,6 +153,8 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     @metainfo_messageclasses = {}
     @metainfo_enumclasses = {}
     @metainfo_pb2_enumlist = []
+    @pb3_typeconversion_tag = "_protobuf_type_converted"
+
 
     if @include_path.length > 0 and not class_file.strip.empty?
       raise LogStash::ConfigurationError, "Cannot use `include_path` and `class_file` at the same time"
@@ -174,7 +173,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     end
 
     @class_file = "#{@protobuf_root_directory}/#{@class_file}" unless (Pathname.new @class_file).absolute? or @class_file.empty?
-
     # exclusive access while loading protobuf definitions
     Google::Protobuf::DescriptorPool.with_lock.synchronize do
       # load from `class_file`
@@ -184,6 +182,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
     if @protobuf_version == 3
       @pb_builder = Google::Protobuf::DescriptorPool.generated_pool.lookup(class_name).msgclass
+
     else
       @pb_builder = pb2_create_instance(class_name)
     end
@@ -222,11 +221,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
   def encode(event)
     if @protobuf_version == 3
-      protobytes =
+      protobytes = pb3_encode(event)
     else
-      protobytes =
+      protobytes = pb2_encode(event)
     end
-
+    @on_event.call(event, protobytes)
   end # def encode
 
 
@@ -256,101 +255,278 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     result
   end
 
-  def
-
-
+  def pb3_encode(event)
+
+    datahash = event.to_hash
+
+    is_recursive_call = !event.get('tags').nil? and event.get('tags').include? @pb3_typeconversion_tag
+    if is_recursive_call
+      datahash = pb3_remove_typeconversion_tag(datahash)
+    end
+    datahash = pb3_prepare_for_encoding(datahash)
+    if datahash.nil?
+      @logger.warn("Protobuf encoding error 4: empty data for event #{event.to_hash}")
+    end
+    if @pb_builder.nil?
+      @logger.warn("Protobuf encoding error 5: empty protobuf builder for class #{@class_name}")
+    end
+    pb_obj = @pb_builder.new(datahash)
     @pb_builder.encode(pb_obj)
+
   rescue ArgumentError => e
     k = event.to_hash.keys.join(", ")
-    @logger.
-
+    @logger.warn("Protobuf encoding error 1: Argument error (#{e.inspect}). Reason: probably mismatching protobuf definition. \
+      Required fields in the protobuf definition are: #{k} and fields must not begin with @ sign. The event has been discarded.")
+  rescue TypeError => e
+    pb3_handle_type_errors(event, e, is_recursive_call, datahash)
   rescue => e
-    @logger.
-    raise e
+    @logger.warn("Protobuf encoding error 3: #{e.inspect}. Event discarded. Input data: #{datahash}. The event has been discarded. Backtrace: #{e.backtrace}")
   end
 
 
-  def pb3_encode(datahash, class_name)
-    if datahash.is_a?(::Hash)
 
 
+  def pb3_handle_type_errors(event, e, is_recursive_call, datahash)
+    begin
+      if is_recursive_call
+        @logger.warn("Protobuf encoding error 2.1: Type error (#{e.inspect}). Some types could not be converted. The event has been discarded. Type mismatches: #{mismatches}.")
+      else
+        if @pb3_encoder_autoconvert_types
 
-
-
-
-          datahash = datahash.inject({}){|x,(k,v)| x[k.gsub(/@/,'').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x}
+          msg = "Protobuf encoding error 2.2: Type error (#{e.inspect}). Will try to convert the data types. Original data: #{datahash}"
+          @logger.warn(msg)
+          mismatches = pb3_get_type_mismatches(datahash, "", @class_name)
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+          msg = "Protobuf encoding info 2.2: Type mismatches found: #{mismatches}." # TODO remove
+          @logger.warn(msg)
+
+          event = pb3_convert_mismatched_types(event, mismatches)
+          # Add a (temporary) tag to handle the recursion stop
+          pb3_add_tag(event, @pb3_typeconversion_tag )
+          pb3_encode(event)
+        else
+          @logger.warn("Protobuf encoding error 2.3: Type error (#{e.inspect}). The event has been discarded. Try setting pb3_encoder_autoconvert_types => true for automatic type conversion.")
+        end
+      end
+    rescue TypeError => e
+      if @pb3_encoder_autoconvert_types
+        @logger.warn("Protobuf encoding error 2.4.1: (#{e.inspect}). Failed to convert data types. The event has been discarded. original data: #{datahash}")
+      else
+        @logger.warn("Protobuf encoding error 2.4.2: (#{e.inspect}). The event has been discarded.")
+      end
+    rescue => ex
+      @logger.warn("Protobuf encoding error 2.5: (#{e.inspect}). The event has been discarded. Auto-typecasting was on: #{@pb3_encoder_autoconvert_types}")
+    end
+  end # pb3_handle_type_errors
+
+  def pb3_get_type_mismatches(data, key_prefix, pb_class)
+    mismatches = []
+    data.to_hash.each do |key, value|
+      expected_type = pb3_get_expected_type(key, pb_class)
+      r = pb3_compare_datatypes(value, key, key_prefix, pb_class, expected_type)
+      mismatches.concat(r)
+    end # data.each
+    mismatches
+  end
+
+
+  def pb3_get_expected_type(key, pb_class)
+    pb_descriptor = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class)
+
+    if !pb_descriptor.nil?
+      pb_builder = pb_descriptor.msgclass
+      pb_obj = pb_builder.new({})
+      v = pb_obj.send(key)
+
+      if !v.nil?
+        v.class
+      else
+        nil
+      end
+    end
+  end
+
+  def pb3_compare_datatypes(value, key, key_prefix, pb_class, expected_type)
+    mismatches = []
+
+    if value.nil?
+      is_mismatch = false
+    else
+      case value
+      when ::Hash, Google::Protobuf::MessageExts
+
+        is_mismatch = false
+        descriptor = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class).lookup(key)
+        if descriptor.subtype != nil
+          class_of_nested_object = pb3_get_descriptorpool_name(descriptor.subtype.msgclass)
+          new_prefix = "#{key}."
+          recursive_mismatches = pb3_get_type_mismatches(value, new_prefix, class_of_nested_object)
+          mismatches.concat(recursive_mismatches)
+        end
+      when ::Array
+
+        expected_type = pb3_get_expected_type(key, pb_class)
+        is_mismatch = (expected_type != Google::Protobuf::RepeatedField)
+        child_type = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class).lookup(key).type
+        value.each_with_index do | v, i |
+          new_prefix = "#{key}."
+          recursive_mismatches = pb3_compare_datatypes(v, i.to_s, new_prefix, pb_class, child_type)
+          mismatches.concat(recursive_mismatches)
+          is_mismatch |= recursive_mismatches.any?
         end # do
-
+      else # is scalar data type
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+        is_mismatch = ! pb3_is_scalar_datatype_match(expected_type, value.class)
+      end # if
+    end # if value.nil?
+
+    if is_mismatch
+      mismatches << {"key" => "#{key_prefix}#{key}", "actual_type" => value.class, "expected_type" => expected_type, "value" => value}
+    end
+    mismatches
+  end
+
+  def pb3_remove_typeconversion_tag(data)
+    # remove the tag that we added to the event because
+    # the protobuf definition might not have a field for tags
+    data['tags'].delete(@pb3_typeconversion_tag)
+    if data['tags'].length == 0
+      data.delete('tags')
+    end
+    data
+  end
+
+  def pb3_get_descriptorpool_name(child_class)
+    # make instance
+    inst = child_class.new
+    # get the lookup name for the Descriptorpool
+    inst.class.descriptor.name
+  end
+
+  def pb3_is_scalar_datatype_match(expected_type, actual_type)
+    if expected_type == actual_type
+      true
+    else
+      e = expected_type.to_s.downcase.to_sym
+      a = actual_type.to_s.downcase.to_sym
+      case e
+      # when :string, :integer
+      when :string
+        a == e
+      when :integer
+        a == e
+      when :float
+        a == :float || a == :integer
+      end
+    end
+  end
+
+
+  def pb3_convert_mismatched_types_getter(struct, key)
+    if struct.is_a? ::Hash
+      struct[key]
+    else
+      struct.get(key)
+    end
+  end
+
+  def pb3_convert_mismatched_types_setter(struct, key, value)
+    if struct.is_a? ::Hash
+      struct[key] = value
+    else
+      struct.set(key, value)
+    end
+    struct
+  end
+
+  def pb3_add_tag(event, tag )
+    if event.get('tags').nil?
+      event.set('tags', [tag])
+    else
+      existing_tags = event.get('tags')
+      event.set("tags", existing_tags << tag)
+    end
+  end
+
+  # Due to recursion on nested fields in the event object this method might be given an event (1st call) or a hash (2nd .. nth call)
+  # First call will be the event object, child objects will be hashes.
+  def pb3_convert_mismatched_types(struct, mismatches)
+    mismatches.each do | m |
+      key = m['key']
+      expected_type = m['expected_type']
+      actual_type = m['actual_type']
+      if key.include? "." # the mismatch is in a child object
+        levels = key.split(/\./) # key is something like http_user_agent.minor_version and needs to be splitted.
+        key = levels[0]
+        sub_levels = levels.drop(1).join(".")
+        new_mismatches = [{"key"=>sub_levels, "actual_type"=>m["actual_type"], "expected_type"=>m["expected_type"]}]
+        value = pb3_convert_mismatched_types_getter(struct, key)
+        new_value = pb3_convert_mismatched_types(value, new_mismatches)
+        struct = pb3_convert_mismatched_types_setter(struct, key, new_value )
+      else
+        value = pb3_convert_mismatched_types_getter(struct, key)
+        begin
+          case expected_type.to_s
+          when "Integer"
+            case actual_type.to_s
+            when "String"
+              new_value = value.to_i
+            when "Float"
+              if value.floor == value # convert values like 2.0 to 2, but not 2.1
+                new_value = value.to_i
+              end
+            end
+          when "String"
+            new_value = value.to_s
+          when "Float"
+            new_value = value.to_f
+          when "Boolean","TrueClass", "FalseClass"
+            new_value = value.to_s.downcase == "true"
+          end
+          if !new_value.nil?
+            struct = pb3_convert_mismatched_types_setter(struct, key, new_value )
+          end
+        rescue Exception => ex
+          @logger.debug("Protobuf encoding error 5: Could not convert types for protobuf encoding: #{ex}")
         end
-
-
-
+      end # if key contains .
+    end # mismatches.each
+    struct
+  end
+
+  def pb3_prepare_for_encoding(datahash)
+    # 0) Remove empty fields.
+    datahash = datahash.select { |key, value| !value.nil? }
+
+    # Preparation: the data cannot be encoded until certain criteria are met:
+    # 1) remove @ signs from keys.
+    # 2) convert timestamps and other objects to strings
+    datahash = datahash.inject({}){|x,(k,v)| x[k.gsub(/@/,'').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x}
+
+    datahash.each do |key, value|
+      datahash[key] = pb3_prepare_for_encoding(value) if value.is_a?(Hash)
     end
+
     datahash
   end
 
-
-
+
+
+  def pb2_encode(event)
+    data = pb2_prepare_for_encoding(event.to_hash, @class_name)
     msg = @pb_builder.new(data)
     msg.serialize_to_string
   rescue NoMethodError => e
-    @logger.
+    @logger.warn("Encoding error 2. Probably mismatching protobuf definition. Required fields in the protobuf definition are: " + event.to_hash.keys.join(", ") + " and the timestamp field name must not include a @. ")
     raise e
   rescue => e
-    @logger.
+    @logger.warn("Encoding error 1: #{e.inspect}")
     raise e
   end
 
 
 
-  def
+  def pb2_prepare_for_encoding(datahash, class_name)
     if datahash.is_a?(::Hash)
       # Preparation: the data cannot be encoded until certain criteria are met:
       # 1) remove @ signs from keys.
@@ -368,11 +544,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
           # make this field an array/list of protobuf objects
           # value is a list of hashed complex objects, each of which needs to be protobuffed and
           # put back into the list.
-          original_value.map { |x|
+          original_value.map { |x| pb2_prepare_for_encoding(x, c) }
          original_value
         else
           proto_obj = pb2_create_instance(c)
-          proto_obj.new(
+          proto_obj.new(pb2_prepare_for_encoding(original_value, c)) # this line is reached in the colourtest for an enum. Enums should not be instantiated. Should enums even be in the messageclasses? I dont think so! TODO bug
         end # if is array
       end # if datahash_include
     end # do
@@ -383,7 +559,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
 
   def should_convert_to_string?(v)
-    !(v.is_a?(
+    !(v.is_a?(Integer) || v.is_a?(Float) || v.is_a?(::Hash) || v.is_a?(::Array) || [true, false].include?(v))
   end
 
 
@@ -394,6 +570,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
 
   def pb3_metadata_analyis(filename)
+
     regex_class_name = /\s*add_message "(?<name>.+?)" do\s+/ # TODO optimize both regexes for speed (negative lookahead)
     regex_pbdefs = /\s*(optional|repeated)(\s*):(?<name>.+),(\s*):(?<type>\w+),(\s*)(?<position>\d+)(, \"(?<enum_class>.*?)\")?/
     class_name = ""
@@ -493,6 +670,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
       require filename
     rescue Exception => e
       @logger.error("Unable to load file: #{filename}. Reason: #{e.inspect}")
+      raise e
     end
   end
 
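To see how the new encoder pieces fit together: `pb3_get_type_mismatches` emits records of key, actual type and expected type, and `pb3_convert_mismatched_types` walks them, splitting dotted keys to reach nested fields. A worked example with made-up data:

```ruby
# Made-up mismatch record of the shape produced by pb3_get_type_mismatches:
# a dotted key addresses a field inside a nested message.
mismatches = [
  { "key" => "http_user_agent.minor_version",
    "actual_type" => String, "expected_type" => Integer, "value" => "7" }
]
event = { "http_user_agent" => { "minor_version" => "7" } }

mismatches.each do |m|
  parent, child = m["key"].split(".", 2)            # "http_user_agent", "minor_version"
  event[parent][child] = event[parent][child].to_i  # the Integer <= String branch
end
p event  # => {"http_user_agent"=>{"minor_version"=>7}}
```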