logstash-codec-protobuf 1.2.1 → 1.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 48d78608a993ce87e16cefbf92995c27ea0f95b3a9af82f030d3bd7fc22a1e6d
-  data.tar.gz: 5f1e2aab280322aab0878eae12ee51f98f30257025ce8d50b0bac39a4edd793e
+  metadata.gz: a4a059736022035d5e326d6cb27ddec13cd54da37dcb1d2f37a8076cb0179e37
+  data.tar.gz: 3402775465ce05bbbbc87a7367acc482cd7d2468c84e27ad07ef0ac4383e8a97
 SHA512:
-  metadata.gz: 932aaff952bf4982d5192701eb3a3997416d76dc750516d53e48f08689f16f93bef0c734a2813f9d71f36ac8a3566fc8916af0c3e3b33b58714b33f40848b1f0
-  data.tar.gz: 76562195f5f05b10dc34695447ace2bd74081e1fdee281c4f5b28e04722f518f6fff0fac789b462c7a7be2877c03236922063d7bb86d16676c853bf1506cac26
+  metadata.gz: 4b1dea1ed7701b7e6e26e8226991bc718fab06302071f60b7c8b8b0b4b9a2b45d1d0809be6fb7a76a30af0c291181a5a77cb8a51cdf2a4c5e9ddc2596c0afc1e
+  data.tar.gz: 778c28f5ed7cfa1ed5bd54f175f8efec4dc74e0d93e1d47dc3ded5671a2e61c6baf4e2f9b6bf9e87e896324f3b2a3e22c3a3692195b62fcb23a2bc37815f26aa
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+## 1.2.2
+ - Add type conversion feature to encoder
+
 ## 1.2.1
 - Keep original data in case of parsing errors
 
data/README.md CHANGED
@@ -3,34 +3,46 @@
 This is a codec plugin for [Logstash](https://github.com/elastic/logstash) to parse protobuf messages.
 
 # Prerequisites and Installation
-
+
 * prepare your Ruby versions of the Protobuf definitions:
-  * For protobuf 2 use the [ruby-protoc compiler](https://github.com/codekitchen/ruby-protocol-buffers).
+  * For protobuf 2 use the [ruby-protoc compiler](https://github.com/codekitchen/ruby-protocol-buffers).
   * For protobuf 3 use the [official google protobuf compiler](https://developers.google.com/protocol-buffers/docs/reference/ruby-generated).
 * install the codec: `bin/logstash-plugin install logstash-codec-protobuf`
 * use the codec in your Logstash config file. See details below.
 
 ## Configuration
 
-`include_path` (required): an array of strings with filenames where logstash can find your protobuf definitions. Requires absolute paths. Please note that protobuf v2 files have the ending `.pb.rb` whereas files compiled for protobuf v3 end in `_pb.rb`.
+There are two ways to specify the locations of the ruby protobuf definitions:
+* specify each class and its loading order using the `include_path` configuration. This option will soon be deprecated in favour of the autoloader.
+* specify the path to the main protobuf class, and a folder from which to load its dependencies, using `class_file` and `protobuf_root_directory`. The codec will detect the dependencies of each file and load them automatically.
+
+`include_path` (optional): an array of strings with filenames where logstash can find your protobuf definitions. Requires absolute paths. Please note that protobuf v2 files have the ending `.pb.rb` whereas files compiled for protobuf v3 end in `_pb.rb`. Cannot be used together with `protobuf_root_directory` or `class_file`.
+
+`protobuf_root_directory` (optional): Only to be used in combination with `class_file`. Absolute path to the directory that contains all compiled protobuf files. Cannot be used together with `include_path`.
+
+`class_file` (optional): Relative path to the ruby file that contains `class_name`. Only to be used in combination with `protobuf_root_directory`. Cannot be used together with `include_path`.
 
-`class_name` (required): the name of the protobuf class that is to be decoded or encoded. For protobuf 2 separate the modules with ::. For protobuf 3 use single dots.
+`class_name` (required): the name of the protobuf class that is to be decoded or encoded. For protobuf 2 separate the modules with `::`. For protobuf 3 use single dots.
 
 `protobuf_version` (optional): set this to 3 if you want to use protobuf 3 definitions. Defaults to 2.
 
+`stop_on_error` (optional): Decoder only: stops the entire pipeline upon discovery of a non-decodable message. Deactivated by default.
+
+`pb3_encoder_autoconvert_types` (optional): Encoder only: tries to fix type mismatches between the protobuf definition and the actual data. Available for protobuf 3 only. Activated by default.
+
 ## Usage example: decoder
 
 Use this as a codec in any logstash input. Just provide the name of the class that your incoming objects will be encoded in, and specify the path to the compiled definition.
 Here's an example for a kafka input with protobuf 2:
 
 ```ruby
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
 
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals::Mammals::Unicorn"
     include_path => ['/path/to/pb_definitions/Animal.pb.rb', '/path/to/pb_definitions/Unicorn.pb.rb']
@@ -38,15 +50,15 @@ kafka
   }
 }
 ```
 
-Example for protobuf 3:
+Example for protobuf 3, manual class loading specification (deprecated):
 
 ```ruby
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals.Mammals.Unicorn"
     include_path => ['/path/to/pb_definitions/Animal_pb.rb', '/path/to/pb_definitions/Unicorn_pb.rb']
@@ -55,6 +67,25 @@ kafka
 }
 ```
 
+Example for protobuf 3, automatic class loading specification:
+
+```ruby
+kafka
+{
+  topic_id => "..."
+  key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+}
+```
+In this example, all protobuf files must live in a subfolder of `/path/to/pb_definitions/`.
+
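To illustrate the autoloader layout (the folder and file names below are hypothetical), the definitions could be organised like this:

```
/path/to/pb_definitions/
└── some_folder/
    ├── Animal_pb.rb
    └── Unicorn_pb.rb
```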
 For version 3 class names check the bottom of the generated protobuf ruby file. It contains lines like this:
 
 ```ruby
@@ -67,7 +98,7 @@ If you're using a kafka input please also set the deserializer classes as shown
 
 ### Class loading order
 
-Imagine you have the following protobuf version 2 relationship: class Unicorn lives in namespace Animal::Mammal and uses another class Wings.
+Imagine you have the following protobuf version 2 relationship: class Unicorn lives in namespace Animal::Mammal and uses another class Wings.
 
 ```ruby
 module Animal
@@ -91,7 +122,7 @@ Set the class name to the parent class:
 class_name => "Animal::Mammal::Unicorn"
 ```
 
-for protobuf 2. For protobuf 3 use
+for protobuf 2. For protobuf 3 use
 
 ```ruby
 class_name => "Animal.Mammal.Unicorn"
@@ -99,14 +130,35 @@ class_name => "Animal.Mammal.Unicorn"
 
 ## Usage example: encoder
 
-The configuration of the codec for encoding logstash events for a protobuf output is pretty much the same as for the decoder input usage as demonstrated above. There are some constraints though that you need to be aware of:
+The configuration of the codec for encoding logstash events for a protobuf output is pretty much the same as for the decoder input usage as demonstrated above, with the following exception: when writing to the Kafka output,
+* do not set the `value_deserializer_class` or the `key_deserializer_class`.
+* do set the serializer class like so: `value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"`.
+
+Please be aware of the following:
 * the protobuf definition needs to contain all the fields that logstash typically adds to an event, in the correct data type. Examples for this are `@timestamp` (string), `@version` (string), `host`, `path`, all of which depend on your input sources and filters as well. If you do not want to add those fields to your protobuf definition then please use a [mutate filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html) to [remove](https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-remove_field) the undesired fields.
 * object members starting with `@` are somewhat problematic in protobuf definitions. Therefore those fields will automatically be renamed to remove the at character. This also affects the important `@timestamp` field. Please name it just "timestamp" in your definition.
+* fields with a nil value will automatically be removed from the event. Empty fields will not be removed.
+* it is recommended to set the config option `pb3_encoder_autoconvert_types` to true. Otherwise any type mismatch between your data and the protobuf definition will cause an event to be lost. The auto type conversion does not alter your data. It just tries to convert obviously identical data into the expected data type, such as converting integers to floats where floats are expected, or "true" / "false" strings into booleans where booleans are expected.
 
+```ruby
+kafka
+{
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+  ...
+  value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"
+}
+```
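As a concrete illustration of the mutate filter advice above (the field names are examples only), unwanted event fields can be removed before encoding like this:

```ruby
filter {
  mutate {
    # drop fields that are not declared in the protobuf definition
    remove_field => [ "path", "host" ]
  }
}
```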
 
 ## Troubleshooting
 
-### Protobuf 2
+### Decoder: Protobuf 2
 #### "uninitialized constant SOME_CLASS_NAME"
 
 If you include more than one definition class, consider the order of inclusion. This is especially relevant if you include whole directories. A definition might refer to another definition that is not loaded yet. In this case, please specify the files in the `include_path` variable in reverse order of reference. See 'Example with referenced definitions' above.
@@ -115,12 +167,16 @@ If you include more than one definition class, consider the order of inclusion.
 
 Maybe your protobuf definition does not fulfill the requirements and needs additional fields. Run logstash with the `--debug` flag and search for error messages.
 
-### Protobuf 3
+### Decoder: Protobuf 3
+
+#### NullPointerException
+
+Check for missing imports. There's a high probability that one of the imported classes has dependencies of its own and those are not being fully satisfied. To avoid this, consider using the autoloader feature by setting the configurations for `protobuf_root_directory` and `class_file`.
 
-Tba.
+### Encoder: Protobuf 3
 
-## Limitations and roadmap
+#### NullPointerException
 
-* maybe add support for setting undefined fields from default values in the decoder
+Check for missing imports. There's a high probability that one of the imported classes has dependencies of its own and those are not being fully satisfied. To avoid this, consider using the autoloader feature by setting the configurations for `protobuf_root_directory` and `class_file`.
 
 
data/docs/index.asciidoc CHANGED
@@ -28,29 +28,31 @@ For protobuf 3 use the https://developers.google.com/protocol-buffers/docs/refer
 
 The following shows a usage example (protobuf v2) for decoding events from a kafka stream:
 [source,ruby]
-kafka
+kafka
 {
-  topic_id => "..."
-  key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  codec => protobuf
-  {
-    class_name => "Animals::Mammals::Unicorn"
-    include_path => ['/path/to/protobuf/definitions/UnicornProtobuf.pb.rb']
-  }
+  topic_id => "..."
+  key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
+  codec => protobuf
+  {
+    class_name => "Animals::Mammals::Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn.pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+  }
 }
 
-Usage example for protobuf v3:
+Decoder usage example for protobuf v3:
 [source,ruby]
-kafka
+kafka
 {
   topic_id => "..."
   key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
-  codec => protobuf
+  codec => protobuf
   {
     class_name => "Animals.Mammals.Unicorn"
-    include_path => ['/path/to/pb_definitions/Animal_pb.rb', '/path/to/pb_definitions/Unicorn_pb.rb']
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
     protobuf_version => 3
   }
 }
@@ -60,10 +62,29 @@ The codec can be used in input and output plugins. +
 When using the codec in the kafka input plugin please set the deserializer classes as shown above. +
 When using the codec in an output plugin:
 
-* make sure to include all the desired fields in the protobuf definition, including timestamp.
-Remove fields that are not part of the protobuf definition from the event by using the mutate filter.
+* make sure to include all the desired fields in the protobuf definition, including timestamp.
+Remove fields that are not part of the protobuf definition from the event by using the mutate filter. Encoding will fail if the event has fields which are not in the protobuf definition.
 * the `@` symbol is currently not supported in field names when loading the protobuf definitions for encoding. Make sure to call the timestamp field `timestamp`
 instead of `@timestamp` in the protobuf file. Logstash event fields will be stripped of the leading `@` before conversion.
+* fields with a nil value will automatically be removed from the event. Empty fields will not be removed.
+* it is recommended to set the config option `pb3_encoder_autoconvert_types` to true. Otherwise any type mismatch between your data and the protobuf definition will cause an event to be lost. The auto type conversion does not alter your data. It just tries to convert obviously identical data into the expected data type, such as converting integers to floats where floats are expected, or "true" / "false" strings into booleans where booleans are expected.
+* When writing to Kafka: set the serializer class: `value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"`
+
+Encoder usage example (protobuf v3):
+
+[source,ruby]
+kafka
+{
+  codec => protobuf
+  {
+    class_name => "Animals.Mammals.Unicorn"
+    class_file => '/path/to/pb_definitions/some_folder/Unicorn_pb.rb'
+    protobuf_root_directory => "/path/to/pb_definitions/"
+    protobuf_version => 3
+  }
+  value_serializer => "org.apache.kafka.common.serialization.ByteArraySerializer"
+}
 
 
 [id="plugins-{type}s-{plugin}-options"]
@@ -73,14 +94,18 @@ When using the codec in an output plugin:
 |=======================================================================
 |Setting |Input type|Required
 | <<plugins-{type}s-{plugin}-class_name>> |<<string,string>>|Yes
-| <<plugins-{type}s-{plugin}-include_path>> |<<array,array>>|Yes
+| <<plugins-{type}s-{plugin}-class_file>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-protobuf_root_directory>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-include_path>> |<<array,array>>|No
 | <<plugins-{type}s-{plugin}-protobuf_version>> |<<number,number>>|Yes
+| <<plugins-{type}s-{plugin}-stop_on_error>> |<<boolean,boolean>>|No
+| <<plugins-{type}s-{plugin}-pb3_encoder_autoconvert_types>> |<<boolean,boolean>>|No
 |=======================================================================
 
 &nbsp;
 
 [id="plugins-{type}s-{plugin}-class_name"]
-===== `class_name`
+===== `class_name`
 
 * This is a required setting.
 * Value type is <<string,string>>
@@ -99,18 +124,45 @@ For protobuf v3, you can copy the class name from the Descriptorpool registratio
 [source,ruby]
 Animals.Mammals.Unicorn = Google::Protobuf::DescriptorPool.generated_pool.lookup("Animals.Mammals.Unicorn").msgclass
 
-
 If your class references other definitions: you only have to add the name of the main class here.
 
+[id="plugins-{type}s-{plugin}-class_file"]
+===== `class_file`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+Relative path to the ruby file that contains the definition of `class_name`. Only to be used in combination with `protobuf_root_directory`. Cannot be used together with `include_path`.
+
+[id="plugins-{type}s-{plugin}-protobuf_root_directory"]
+===== `protobuf_root_directory`
+
+* Value type is <<string,string>>
+* There is no default value for this setting.
+
+Absolute path to the root directory that contains all referenced/used dependencies of the main class (`class_name`) or any of its dependencies. Must be used in combination with the `class_file` setting, and can not be used in combination with the legacy loading mechanism `include_path`.
+
+Example:
+
+[source]
+pb3
+├── header
+│   └── header_pb.rb
+├── messageA_pb.rb
+
+In this case `messageA_pb.rb` has an embedded message from `header/header_pb.rb`.
+If `class_file` is set to `messageA_pb.rb`, and `class_name` to `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3`, which includes both definitions.
+
+
 [id="plugins-{type}s-{plugin}-include_path"]
-===== `include_path`
+===== `include_path`
 
-* This is a required setting.
 * Value type is <<array,array>>
 * There is no default value for this setting.
 
-List of absolute paths to files with protobuf definitions.
-When using more than one file, make sure to arrange the files in reverse order of dependency so that each class is loaded before it is
+Legacy protobuf definition loading mechanism for backwards compatibility:
+List of absolute paths to files with protobuf definitions.
+When using more than one file, make sure to arrange the files in reverse order of dependency so that each class is loaded before it is
 referred to by another.
 
 Example: a class _Unicorn_ referencing another protobuf class _Wings_
@@ -129,13 +181,34 @@ include_path => ['/path/to/pb_definitions/wings.pb.rb','/path/to/pb_definitions/
 
 Please note that protobuf v2 files have the ending `.pb.rb` whereas files compiled for protobuf v3 end in `_pb.rb`.
 
+Cannot be used together with `protobuf_root_directory` or `class_file`.
+
 [id="plugins-{type}s-{plugin}-protobuf_version"]
-===== `protobuf_version`
+===== `protobuf_version`
 
 * Value type is <<number,number>>
 * Default value is 2
 
 Protocol buffers version. Valid settings are 2, 3.
 
+[id="plugins-{type}s-{plugin}-stop_on_error"]
+===== `stop_on_error`
+
+* Value type is <<boolean,boolean>>
+* Default value is false
+
+Stop the entire pipeline when encountering a non-decodable message.
+
+[id="plugins-{type}s-{plugin}-pb3_encoder_autoconvert_types"]
+===== `pb3_encoder_autoconvert_types`
+
+* Value type is <<boolean,boolean>>
+* Default value is true
 
+Convert data types to match the protobuf definition (if possible).
+The protobuf encoder library is very strict with regard to data types. Example: an event has an integer field but the protobuf definition expects a float. This would lead to an exception and the event would be lost.
+This feature tries to convert the data types to the expectations of the protobuf definitions, without modifying the data whatsoever. Examples of conversions it might attempt:
+"true" string => true boolean
+17 int => 17.0 float
+12345 number => "12345" string
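For intuition, these conversions behave roughly like the following Ruby sketch (illustrative only, not the plugin's actual implementation):

[source,ruby]
----
# Rough sketch of the conversion rules listed above.
def autoconvert(value, expected)
  case expected
  when :float
    value.is_a?(Integer) ? value.to_f : value                  # 17 => 17.0
  when :string
    value.is_a?(Numeric) ? value.to_s : value                  # 12345 => "12345"
  when :boolean
    %w[true false].include?(value) ? value == "true" : value   # "true" => true
  else
    value
  end
end

autoconvert(17, :float)       # => 17.0
autoconvert("true", :boolean) # => true
----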
 
data/lib/logstash/codecs/protobuf.rb CHANGED
@@ -92,8 +92,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # `class_file` and `include_path` cannot be used at the same time.
   config :class_file, :validate => :string, :default => '', :required => false
 
-  # Absolute path to the directory that contains all compiled protobuf files.
-  #
   # Absolute path to the root directory that contains all referenced/used dependencies
   # of the main class (`class_name`) or any of its dependencies.
   #
@@ -101,12 +99,12 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   #
   # pb3
   #   ├── header
-  #   │   └── header_pb.rb
+  #   │ └── header_pb.rb
   #   ├── messageA_pb.rb
   #
   # In this case `messageA_pb.rb` has an embedded message from `header/header_pb.rb`.
   # If `class_file` is set to `messageA_pb.rb`, and `class_name` to
-  # `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3`. Which includes
+  # `MessageA`, `protobuf_root_directory` must be set to `/path/to/pb3`, which includes
   # both definitions.
   config :protobuf_root_directory, :validate => :string, :required => false
 
@@ -128,12 +126,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # [source,ruby]
   # include_path => ['/path/to/protobuf/definitions/Wings.pb.rb','/path/to/protobuf/definitions/Unicorn.pb.rb']
   #
-  # When using the codec in an output plugin:
-  # * make sure to include all the desired fields in the protobuf definition, including timestamp.
-  #   Remove fields that are not part of the protobuf definition from the event by using the mutate filter.
-  # * the @ symbol is currently not supported in field names when loading the protobuf definitions for encoding. Make sure to call the timestamp field "timestamp"
-  #   instead of "@timestamp" in the protobuf file. Logstash event fields will be stripped of the leading @ before conversion.
-  #
   # `class_file` and `include_path` cannot be used at the same time.
   config :include_path, :validate => :array, :default => [], :required => false
 
@@ -145,6 +137,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
   # To tolerate faulty messages that cannot be decoded, set this to false. Otherwise the pipeline will stop upon encountering a non decipherable message.
   config :stop_on_error, :validate => :boolean, :default => false, :required => false
 
+  # Instruct the encoder to attempt converting data types to match the protobuf definitions. Available only for protobuf version 3.
+  config :pb3_encoder_autoconvert_types, :validate => :boolean, :default => true, :required => false
+
+
+
   attr_reader :execution_context
 
   # id of the pipeline whose events you want to read from.
@@ -156,6 +153,8 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     @metainfo_messageclasses = {}
     @metainfo_enumclasses = {}
     @metainfo_pb2_enumlist = []
+    @pb3_typeconversion_tag = "_protobuf_type_converted"
+
 
     if @include_path.length > 0 and not class_file.strip.empty?
       raise LogStash::ConfigurationError, "Cannot use `include_path` and `class_file` at the same time"
@@ -174,7 +173,6 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     end
 
     @class_file = "#{@protobuf_root_directory}/#{@class_file}" unless (Pathname.new @class_file).absolute? or @class_file.empty?
-
     # exclusive access while loading protobuf definitions
     Google::Protobuf::DescriptorPool.with_lock.synchronize do
       # load from `class_file`
@@ -184,6 +182,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
     if @protobuf_version == 3
       @pb_builder = Google::Protobuf::DescriptorPool.generated_pool.lookup(class_name).msgclass
+
     else
       @pb_builder = pb2_create_instance(class_name)
     end
@@ -222,11 +221,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
   def encode(event)
     if @protobuf_version == 3
-      protobytes = pb3_encode_wrapper(event)
+      protobytes = pb3_encode(event)
     else
-      protobytes = pb2_encode_wrapper(event)
+      protobytes = pb2_encode(event)
     end
-    @on_event.call(event, protobytes)
+    @on_event.call(event, protobytes)
   end # def encode
 
 
@@ -256,101 +255,278 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
     result
   end
 
-  def pb3_encode_wrapper(event)
-    data = pb3_encode(event.to_hash, @class_name)
-    pb_obj = @pb_builder.new(data)
+  def pb3_encode(event)
+
+    datahash = event.to_hash
+
+    is_recursive_call = !event.get('tags').nil? && event.get('tags').include?(@pb3_typeconversion_tag)
+    if is_recursive_call
+      datahash = pb3_remove_typeconversion_tag(datahash)
+    end
+    datahash = pb3_prepare_for_encoding(datahash)
+    if datahash.nil?
+      @logger.warn("Protobuf encoding error 4: empty data for event #{event.to_hash}")
+    end
+    if @pb_builder.nil?
+      @logger.warn("Protobuf encoding error 5: empty protobuf builder for class #{@class_name}")
+    end
+    pb_obj = @pb_builder.new(datahash)
     @pb_builder.encode(pb_obj)
+
   rescue ArgumentError => e
     k = event.to_hash.keys.join(", ")
-    @logger.debug("Encoding error 2. Probably mismatching protobuf definition. Required fields in the protobuf definition are: #{k} and the timestamp field name must not include an @.")
-    raise e
+    @logger.warn("Protobuf encoding error 1: Argument error (#{e.inspect}). Reason: probably mismatching protobuf definition. \
+      Required fields in the protobuf definition are: #{k} and fields must not begin with @ sign. The event has been discarded.")
+  rescue TypeError => e
+    pb3_handle_type_errors(event, e, is_recursive_call, datahash)
   rescue => e
-    @logger.debug("Couldn't generate protobuf: #{e.inspect}")
-    raise e
+    @logger.warn("Protobuf encoding error 3: #{e.inspect}. Event discarded. Input data: #{datahash}. The event has been discarded. Backtrace: #{e.backtrace}")
   end
 
 
-  def pb3_encode(datahash, class_name)
-    if datahash.is_a?(::Hash)
 
 
+  def pb3_handle_type_errors(event, e, is_recursive_call, datahash)
+    begin
+      if is_recursive_call
+        @logger.warn("Protobuf encoding error 2.1: Type error (#{e.inspect}). Some types could not be converted. The event has been discarded. Input data: #{datahash}.")
+      else
+        if @pb3_encoder_autoconvert_types
 
-      # Preparation: the data cannot be encoded until certain criteria are met:
-      # 1) remove @ signs from keys.
-      # 2) convert timestamps and other objects to strings
-      datahash = datahash.inject({}){|x,(k,v)| x[k.gsub(/@/,'').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x}
+          msg = "Protobuf encoding error 2.2: Type error (#{e.inspect}). Will try to convert the data types. Original data: #{datahash}"
+          @logger.warn(msg)
+          mismatches = pb3_get_type_mismatches(datahash, "", @class_name)
 
-      # Check if any of the fields in this hash are protobuf classes and if so, create a builder for them.
-      meta = @metainfo_messageclasses[class_name]
-      if meta
-        meta.map do | (field_name,class_name) |
-          key = field_name.to_sym
-          if datahash.include?(key)
-            original_value = datahash[key]
-            datahash[key] =
-              if original_value.is_a?(::Array)
-                # make this field an array/list of protobuf objects
-                # value is a list of hashed complex objects, each of which needs to be protobuffed and
-                # put back into the list.
-                original_value.map { |x| pb3_encode(x, class_name) }
-                original_value
-              else
-                r = pb3_encode(original_value, class_name)
-                builder = Google::Protobuf::DescriptorPool.generated_pool.lookup(class_name).msgclass
-                builder.new(r)
-              end # if is array
-          end # if datahash_include
+          msg = "Protobuf encoding info 2.2: Type mismatches found: #{mismatches}." # TODO remove
+          @logger.warn(msg)
+
+          event = pb3_convert_mismatched_types(event, mismatches)
+          # Add a (temporary) tag to handle the recursion stop
+          pb3_add_tag(event, @pb3_typeconversion_tag )
+          pb3_encode(event)
+        else
+          @logger.warn("Protobuf encoding error 2.3: Type error (#{e.inspect}). The event has been discarded. Try setting pb3_encoder_autoconvert_types => true for automatic type conversion.")
+        end
+      end
+    rescue TypeError => e
+      if @pb3_encoder_autoconvert_types
+        @logger.warn("Protobuf encoding error 2.4.1: (#{e.inspect}). Failed to convert data types. The event has been discarded. original data: #{datahash}")
+      else
+        @logger.warn("Protobuf encoding error 2.4.2: (#{e.inspect}). The event has been discarded.")
+      end
+    rescue => ex
+      @logger.warn("Protobuf encoding error 2.5: (#{e.inspect}). The event has been discarded. Auto-typecasting was on: #{@pb3_encoder_autoconvert_types}")
+    end
+  end # pb3_handle_type_errors
+
+  def pb3_get_type_mismatches(data, key_prefix, pb_class)
+    mismatches = []
+    data.to_hash.each do |key, value|
+      expected_type = pb3_get_expected_type(key, pb_class)
+      r = pb3_compare_datatypes(value, key, key_prefix, pb_class, expected_type)
+      mismatches.concat(r)
+    end # data.each
+    mismatches
+  end
+
+
+  def pb3_get_expected_type(key, pb_class)
+    pb_descriptor = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class)
+
+    if !pb_descriptor.nil?
+      pb_builder = pb_descriptor.msgclass
+      pb_obj = pb_builder.new({})
+      v = pb_obj.send(key)
+
+      if !v.nil?
+        v.class
+      else
+        nil
+      end
+    end
+  end
+
+  def pb3_compare_datatypes(value, key, key_prefix, pb_class, expected_type)
+    mismatches = []
+
+    if value.nil?
+      is_mismatch = false
+    else
+      case value
+      when ::Hash, Google::Protobuf::MessageExts
+
+        is_mismatch = false
+        descriptor = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class).lookup(key)
+        if descriptor.subtype != nil
+          class_of_nested_object = pb3_get_descriptorpool_name(descriptor.subtype.msgclass)
+          new_prefix = "#{key}."
+          recursive_mismatches = pb3_get_type_mismatches(value, new_prefix, class_of_nested_object)
+          mismatches.concat(recursive_mismatches)
+        end
+      when ::Array
+
+        expected_type = pb3_get_expected_type(key, pb_class)
+        is_mismatch = (expected_type != Google::Protobuf::RepeatedField)
+        child_type = Google::Protobuf::DescriptorPool.generated_pool.lookup(pb_class).lookup(key).type
+        value.each_with_index do | v, i |
+          new_prefix = "#{key}."
+          recursive_mismatches = pb3_compare_datatypes(v, i.to_s, new_prefix, pb_class, child_type)
+          mismatches.concat(recursive_mismatches)
+          is_mismatch |= recursive_mismatches.any?
         end # do
-      end # if meta
+      else # is scalar data type
 
-      # Check if any of the fields in this hash are enum classes and if so, create a builder for them.
-      meta = @metainfo_enumclasses[class_name]
-      if meta
-        meta.map do | (field_name,class_name) |
-          key = field_name.to_sym
-          if datahash.include?(key)
-            original_value = datahash[key]
-            datahash[key] = case original_value
-              when ::Array
-                original_value.map { |x| pb3_encode(x, class_name) }
-                original_value
-              when Fixnum
-                original_value # integers will be automatically converted into enum
-              # else
-              # feature request: support for providing integers as strings or symbols.
-              # not fully tested yet:
-              # begin
-              #   enum_lookup_name = "#{class_name}::#{original_value}"
-              #   enum_lookup_name.split('::').inject(Object) do |mod, class_name|
-              #     mod.const_get(class_name)
-              #   end # do
-              # rescue => e
-              #   @logger.debug("Encoding error 3: could not translate #{original_value} into enum. #{e}")
-              #   raise e
-              # end
+        is_mismatch = ! pb3_is_scalar_datatype_match(expected_type, value.class)
+      end # if
+    end # if value.nil?
+
+    if is_mismatch
+      mismatches << {"key" => "#{key_prefix}#{key}", "actual_type" => value.class, "expected_type" => expected_type, "value" => value}
+    end
+    mismatches
+  end
+
+  def pb3_remove_typeconversion_tag(data)
+    # remove the tag that we added to the event because
+    # the protobuf definition might not have a field for tags
+    data['tags'].delete(@pb3_typeconversion_tag)
+    if data['tags'].length == 0
+      data.delete('tags')
+    end
+    data
+  end
+
+  def pb3_get_descriptorpool_name(child_class)
+    # make instance
+    inst = child_class.new
+    # get the lookup name for the Descriptorpool
+    inst.class.descriptor.name
+  end
+
+  def pb3_is_scalar_datatype_match(expected_type, actual_type)
+    if expected_type == actual_type
+      true
+    else
+      e = expected_type.to_s.downcase.to_sym
+      a = actual_type.to_s.downcase.to_sym
+      case e
+      # when :string, :integer
+      when :string
+        a == e
+      when :integer
+        a == e
+      when :float
+        a == :float || a == :integer
+      end
+    end
+  end
+
+
+  def pb3_convert_mismatched_types_getter(struct, key)
+    if struct.is_a? ::Hash
+      struct[key]
+    else
+      struct.get(key)
+    end
+  end
+
+  def pb3_convert_mismatched_types_setter(struct, key, value)
+    if struct.is_a? ::Hash
+      struct[key] = value
+    else
+      struct.set(key, value)
+    end
+    struct
+  end
+
+  def pb3_add_tag(event, tag )
+    if event.get('tags').nil?
+      event.set('tags', [tag])
+    else
+      existing_tags = event.get('tags')
+      event.set("tags", existing_tags << tag)
+    end
+  end
+
+  # Due to recursion on nested fields in the event object this method might be given an event (1st call) or a hash (2nd .. nth call)
+  # First call will be the event object, child objects will be hashes.
+  def pb3_convert_mismatched_types(struct, mismatches)
+    mismatches.each do | m |
+      key = m['key']
+      expected_type = m['expected_type']
+      actual_type = m['actual_type']
+      if key.include? "." # the mismatch is in a child object
+        levels = key.split(/\./) # key is something like http_user_agent.minor_version and needs to be split.
+        key = levels[0]
+        sub_levels = levels.drop(1).join(".")
+        new_mismatches = [{"key"=>sub_levels, "actual_type"=>m["actual_type"], "expected_type"=>m["expected_type"]}]
+        value = pb3_convert_mismatched_types_getter(struct, key)
+        new_value = pb3_convert_mismatched_types(value, new_mismatches)
+        struct = pb3_convert_mismatched_types_setter(struct, key, new_value )
+      else
+        value = pb3_convert_mismatched_types_getter(struct, key)
+        begin
+          case expected_type.to_s
+          when "Integer"
+            case actual_type.to_s
+            when "String"
+              new_value = value.to_i
+            when "Float"
+              if value.floor == value # convert values like 2.0 to 2, but not 2.1
+                new_value = value.to_i
+              end
+            end
+          when "String"
+            new_value = value.to_s
+          when "Float"
+            new_value = value.to_f
+          when "Boolean","TrueClass", "FalseClass"
+            new_value = value.to_s.downcase == "true"
+          end
+          if !new_value.nil?
+            struct = pb3_convert_mismatched_types_setter(struct, key, new_value )
+          end
+        rescue Exception => ex
+          @logger.debug("Protobuf encoding error 5: Could not convert types for protobuf encoding: #{ex}")
         end
-      end # if datahash_include
-    end # do
-  end # if meta
+      end # if key contains .
+    end # mismatches.each
+    struct
+  end
+
+  def pb3_prepare_for_encoding(datahash)
+    # 0) Remove empty fields.
+    datahash = datahash.select { |key, value| !value.nil? }
+
+    # Preparation: the data cannot be encoded until certain criteria are met:
+    # 1) remove @ signs from keys.
+    # 2) convert timestamps and other objects to strings
+    datahash = datahash.inject({}){|x,(k,v)| x[k.gsub(/@/,'').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x}
+
+    datahash.each do |key, value|
+      datahash[key] = pb3_prepare_for_encoding(value) if value.is_a?(Hash)
     end
+
     datahash
   end
 
-  def pb2_encode_wrapper(event)
-    data = pb2_encode(event.to_hash, @class_name)
+
+
+  def pb2_encode(event)
+    data = pb2_prepare_for_encoding(event.to_hash, @class_name)
     msg = @pb_builder.new(data)
     msg.serialize_to_string
   rescue NoMethodError => e
-    @logger.debug("Encoding error 2. Probably mismatching protobuf definition. Required fields in the protobuf definition are: " + event.to_hash.keys.join(", ") + " and the timestamp field name must not include a @. ")
+    @logger.warn("Encoding error 2. Probably mismatching protobuf definition. Required fields in the protobuf definition are: " + event.to_hash.keys.join(", ") + " and the timestamp field name must not include a @. ")
     raise e
   rescue => e
-    @logger.debug("Encoding error 1: #{e.inspect}")
+    @logger.warn("Encoding error 1: #{e.inspect}")
     raise e
   end
 
 
 
-  def pb2_encode(datahash, class_name)
+  def pb2_prepare_for_encoding(datahash, class_name)
     if datahash.is_a?(::Hash)
       # Preparation: the data cannot be encoded until certain criteria are met:
       # 1) remove @ signs from keys.
@@ -368,11 +544,11 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
                 # make this field an array/list of protobuf objects
                 # value is a list of hashed complex objects, each of which needs to be protobuffed and
                 # put back into the list.
-                original_value.map { |x| pb2_encode(x, c) }
+                original_value.map { |x| pb2_prepare_for_encoding(x, c) }
                 original_value
               else
                 proto_obj = pb2_create_instance(c)
-                proto_obj.new(pb2_encode(original_value, c)) # this line is reached in the colourtest for an enum. Enums should not be instantiated. Should enums even be in the messageclasses? I don't think so! TODO bug
+                proto_obj.new(pb2_prepare_for_encoding(original_value, c)) # this line is reached in the colourtest for an enum. Enums should not be instantiated. Should enums even be in the messageclasses? I don't think so! TODO bug
               end # if is array
           end # if datahash_include
         end # do
@@ -383,7 +559,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
 
   def should_convert_to_string?(v)
-    !(v.is_a?(Fixnum) || v.is_a?(::Hash) || v.is_a?(::Array) || [true, false].include?(v))
+    !(v.is_a?(Integer) || v.is_a?(Float) || v.is_a?(::Hash) || v.is_a?(::Array) || [true, false].include?(v))
   end
 
 
@@ -394,6 +570,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
 
 
   def pb3_metadata_analyis(filename)
+
     regex_class_name = /\s*add_message "(?<name>.+?)" do\s+/ # TODO optimize both regexes for speed (negative lookahead)
     regex_pbdefs = /\s*(optional|repeated)(\s*):(?<name>.+),(\s*):(?<type>\w+),(\s*)(?<position>\d+)(, \"(?<enum_class>.*?)\")?/
     class_name = ""
@@ -493,6 +670,7 @@ class LogStash::Codecs::Protobuf < LogStash::Codecs::Base
       require filename
     rescue Exception => e
       @logger.error("Unable to load file: #{filename}. Reason: #{e.inspect}")
+      raise e
     end
   end
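For reference, each type-mismatch record built by `pb3_compare_datatypes` above is a plain hash; a hypothetical entry (field name and values made up) looks like:

```ruby
# Hypothetical example of a single mismatch record, matching the hash
# structure appended to `mismatches` in pb3_compare_datatypes:
{ "key" => "unicorn.age", "actual_type" => String, "expected_type" => Integer, "value" => "7" }
```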