fluent-plugin-kubernetes_metadata_filter-rh 2.6.1

Files changed (40)
  1. checksums.yaml +7 -0
  2. data/.circleci/config.yml +57 -0
  3. data/.gitignore +19 -0
  4. data/.rubocop.yml +57 -0
  5. data/Gemfile +9 -0
  6. data/Gemfile.lock +156 -0
  7. data/LICENSE.txt +201 -0
  8. data/README.md +253 -0
  9. data/Rakefile +41 -0
  10. data/fluent-plugin-kubernetes_metadata_filter.gemspec +34 -0
  11. data/lib/fluent/plugin/filter_kubernetes_metadata.rb +378 -0
  12. data/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb +102 -0
  13. data/lib/fluent/plugin/kubernetes_metadata_common.rb +120 -0
  14. data/lib/fluent/plugin/kubernetes_metadata_stats.rb +46 -0
  15. data/lib/fluent/plugin/kubernetes_metadata_util.rb +40 -0
  16. data/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb +154 -0
  17. data/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb +172 -0
  18. data/test/cassettes/invalid_api_server_config.yml +53 -0
  19. data/test/cassettes/kubernetes_docker_metadata_annotations.yml +205 -0
  20. data/test/cassettes/kubernetes_docker_metadata_dotted_labels.yml +197 -0
  21. data/test/cassettes/kubernetes_get_api_v1.yml +193 -0
  22. data/test/cassettes/kubernetes_get_api_v1_using_token.yml +195 -0
  23. data/test/cassettes/kubernetes_get_namespace_default.yml +69 -0
  24. data/test/cassettes/kubernetes_get_namespace_default_using_token.yml +71 -0
  25. data/test/cassettes/kubernetes_get_pod.yml +146 -0
  26. data/test/cassettes/kubernetes_get_pod_using_token.yml +148 -0
  27. data/test/cassettes/metadata_from_tag_and_journald_fields.yml +153 -0
  28. data/test/cassettes/metadata_from_tag_journald_and_kubernetes_fields.yml +285 -0
  29. data/test/cassettes/valid_kubernetes_api_server.yml +55 -0
  30. data/test/cassettes/valid_kubernetes_api_server_using_token.yml +57 -0
  31. data/test/helper.rb +82 -0
  32. data/test/plugin/test.token +1 -0
  33. data/test/plugin/test_cache_stats.rb +33 -0
  34. data/test/plugin/test_cache_strategy.rb +194 -0
  35. data/test/plugin/test_filter_kubernetes_metadata.rb +1012 -0
  36. data/test/plugin/test_utils.rb +56 -0
  37. data/test/plugin/test_watch_namespaces.rb +245 -0
  38. data/test/plugin/test_watch_pods.rb +344 -0
  39. data/test/plugin/watch_test.rb +74 -0
  40. metadata +269 -0
data/README.md ADDED
@@ -0,0 +1,253 @@
1
+ # fluent-plugin-kubernetes_metadata_filter, a plugin for [Fluentd](http://fluentd.org)
2
+ [![Circle CI](https://circleci.com/gh/fabric8io/fluent-plugin-kubernetes_metadata_filter.svg?style=svg)](https://circleci.com/gh/fabric8io/fluent-plugin-kubernetes_metadata_filter)
3
+ [![Code Climate](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter/badges/gpa.svg)](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter)
4
+ [![Test Coverage](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter/badges/coverage.svg)](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter)
5
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop-hq/rubocop)
6
+ [![Ruby Style Guide](https://img.shields.io/badge/code_style-community-brightgreen.svg)](https://rubystyle.guide)
7
+
8
+ The Kubernetes metadata plugin filter enriches container log records with pod and namespace metadata.
9
+
10
+ This plugin derives basic metadata about the container that emitted a given log record using the source of the log record. Records from journald provide metadata about the
11
+ container environment as named fields. Records from JSON files encode metadata about the container in the file name. The initial metadata derived from the source is used
12
+ to look up additional metadata about the container's associated pod and namespace (e.g. UUIDs, labels, annotations) when `kubernetes_url` is configured. If the plugin cannot
13
+ authoritatively determine the namespace of the container emitting a log record, it will use an 'orphan' namespace ID in the metadata. This behavior supports multi-tenant systems
14
+ that rely on the authenticity of the namespace for proper log isolation.
15
+
16
+ ## Requirements
17
+
18
+ | fluent-plugin-kubernetes_metadata_filter | fluentd | ruby |
19
+ |-------------------|---------|------|
20
+ | >= 2.5.0 | >= v1.10.0 | >= 2.5 |
21
+ | >= 2.0.0 | >= v0.14.20 | >= 2.1 |
22
+ | < 2.0.0 | >= v0.12.0 | >= 1.9 |
23
+
24
+ NOTE: For fluentd v0.12, use the 1.x.y version of this plugin. Please send patches to the v0.12 branch if you encounter a bug in a 1.x version.
25
+
26
+ NOTE: This documentation is for fluent-plugin-kubernetes_metadata_filter 2.x or later. For 1.x documentation, please see [v0.12 branch](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/tree/v0.12).
27
+
28
+ ## Installation
29
+
30
+ gem install fluent-plugin-kubernetes_metadata_filter
31
+
32
+ ## Configuration
33
+
34
+ Configuration options for fluent.conf are:
35
+
36
+ * `kubernetes_url` - URL to the API server. Set this to retrieve further kubernetes metadata for logs from kubernetes API server. If not specified, environment variables `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` will be used if both are present which is typically true when running fluentd in a pod.
37
+ * `apiVersion` - API version to use (default: `v1`)
38
+ * `ca_file` - path to CA file for Kubernetes server certificate validation
39
+ * `verify_ssl` - validate SSL certificates (default: `true`)
40
+ * `client_cert` - path to a client cert file to authenticate to the API server
41
+ * `client_key` - path to a client key file to authenticate to the API server
42
+ * `bearer_token_file` - path to a file containing the bearer token to use for authentication
43
+ * `tag_to_kubernetes_name_regexp` - the regular expression used to extract kubernetes metadata (pod name, container name, namespace) from the current fluentd tag.
44
+ This must use named capture groups for `container_name`, `pod_name` & `namespace`. Default: see [code](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L52)
45
+ * `cache_size` - size of the cache of Kubernetes metadata to reduce requests to the API server (default: `1000`)
46
+ * `cache_ttl` - TTL in seconds of each cached element. Set to a negative value to disable TTL eviction (default: `3600` - 1 hour)
47
+ * `watch` - set up a watch on pods on the API server for updates to metadata (default: `true`)
48
+ * `de_dot` - replace dots in labels and annotations with configured `de_dot_separator`, required for ElasticSearch 2.x compatibility (default: `true`)
49
+ * `de_dot_separator` - separator to use if `de_dot` is enabled (default: `_`)
50
+ * *DEPRECATED* `use_journal` - If false, messages are expected to be formatted and tagged as if read by the fluentd in\_tail plugin with wildcard filename. If true, messages are expected to be formatted as if read from the systemd journal. The `MESSAGE` field has the full message. The `CONTAINER_NAME` field has the encoded k8s metadata (see below). The `CONTAINER_ID_FULL` field has the full container uuid. This requires docker to use the `--log-driver=journald` log driver. If unset (the default), the plugin will use the `CONTAINER_NAME` and `CONTAINER_ID_FULL` fields
51
+ if available, otherwise, will use the tag in the `tag_to_kubernetes_name_regexp` format.
52
+ * `container_name_to_kubernetes_regexp` - The regular expression used to extract the k8s metadata encoded in the journal `CONTAINER_NAME` field default: See [code](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L68)
53
+ * This corresponds to the definition [in the source](https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/kubelet/dockertools/docker.go#L317)
54
+ * `annotation_match` - Array of regular expressions matching annotation field names. Matched annotations are added to a log record.
55
+ * `allow_orphans` - Modify the namespace and namespace id to the values of `orphaned_namespace_name` and `orphaned_namespace_id`
56
+ when true (default: `true`)
57
+ * `orphaned_namespace_name` - The namespace to associate with records where the namespace can not be determined (default: `.orphaned`)
58
+ * `orphaned_namespace_id` - The namespace id to associate with records where the namespace can not be determined (default: `orphaned`)
59
+ * `lookup_from_k8s_field` - If the field `kubernetes` is present, look up the metadata from the given subfields such as `kubernetes.namespace_name`, `kubernetes.pod_name`, etc. This allows you to avoid having to pass in the metadata to look up in an explicitly formatted tag name or in an explicitly formatted `CONTAINER_NAME` value. For example, set `kubernetes.namespace_name`, `kubernetes.pod_name`, `kubernetes.container_name`, and `docker.id` in the record, and the filter will fill in the rest. (default: `true`)
60
+ * `ssl_partial_chain` - if `ca_file` is for an intermediate CA, or otherwise we do not have the root CA and want
61
+ to trust the intermediate CA certs we do have, set this to `true` - this corresponds to
62
+ the `openssl s_client -partial_chain` flag and `X509_V_FLAG_PARTIAL_CHAIN` (default: `false`)
63
+ * `skip_labels` - Skip all label fields from the metadata.
64
+ * `skip_container_metadata` - Skip some of the container data of the metadata. The metadata will not contain the container_image and container_image_id fields.
65
+ * `skip_master_url` - Skip the master_url field from the metadata.
66
+ * `skip_namespace_metadata` - Skip the namespace_id field from the metadata. The fetch_namespace_metadata function will be skipped. The plugin will be faster and cpu consumption will be less.
67
+ * `watch_retry_interval` - The time interval in seconds for retry backoffs when watch connections fail. (default: `1`)
68
+
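To illustrate what `de_dot` does, here is a minimal Ruby sketch. The helper name `de_dot_keys` is hypothetical and is not the plugin's own implementation; it only shows dots in label or annotation keys being replaced with the configured `de_dot_separator`.

```ruby
# Hypothetical helper for illustration only; not the plugin's internal code.
# Returns a new hash in which every '.' in a key is replaced by the separator.
def de_dot_keys(hash, separator = '_')
  hash.each_with_object({}) do |(key, value), result|
    result[key.to_s.gsub('.', separator)] = value
  end
end

# Example label set (values are made up for the demonstration).
labels = { 'app.kubernetes.io/name' => 'console', 'component' => 'fabric8Console' }
de_dotted = de_dot_keys(labels)
puts de_dotted.inspect
```

Keys without dots pass through unchanged, so enabling the option should be harmless for labels that are already safe.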
69
+ **NOTE:** As of the release 2.1.x of this plugin, it no longer supports parsing the source message into JSON and attaching it to the
70
+ payload. The following configuration options are removed:
71
+
72
+ * `merge_json_log`
73
+ * `preserve_json_log`
74
+
75
+ One way of preserving JSON logs can be through the [parser plugin](https://docs.fluentd.org/filter/parser)
76
+
77
+ **NOTE** As of this release, the use of `use_journal` is **DEPRECATED**. If this setting is not present, the plugin will
78
+ attempt to figure out the source of the metadata fields from the following:
79
+ - If `lookup_from_k8s_field true` (the default) and the following fields are present in the record:
80
+ `docker.container_id`, `kubernetes.namespace_name`, `kubernetes.pod_name`, `kubernetes.container_name`,
81
+ then the plugin will use those values as the source to look up the metadata
82
+ - If `use_journal true`, or `use_journal` is unset, and the fields `CONTAINER_NAME` and `CONTAINER_ID_FULL` are present in the record,
83
+ then the plugin will parse those values using `container_name_to_kubernetes_regexp` and use those as the source to look up the metadata
84
+ - Otherwise, if the tag matches `tag_to_kubernetes_name_regexp`, the plugin will parse the tag and use those values to
85
+ look up the metadata
86
+
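The precedence described in the note above can be sketched in Ruby as follows. This is an illustration of the documented order only, not the plugin's actual code path; the function name `metadata_source` and its return symbols are hypothetical.

```ruby
# Illustrative only: mirrors the precedence described in the README note,
# using hypothetical names. It is not the plugin's implementation.
K8S_LOOKUP_FIELDS = %w[namespace_name pod_name container_name].freeze

def metadata_source(record, tag_matches:, use_journal: nil, lookup_from_k8s_field: true)
  kubernetes = record['kubernetes'] || {}
  docker = record['docker'] || {}

  has_k8s_fields = docker.key?('container_id') &&
                   K8S_LOOKUP_FIELDS.all? { |field| kubernetes.key?(field) }
  has_journal_fields = record.key?('CONTAINER_NAME') && record.key?('CONTAINER_ID_FULL')

  if lookup_from_k8s_field && has_k8s_fields
    :record_fields                       # values already present in the record
  elsif use_journal != false && has_journal_fields
    :journal_fields                      # use_journal is true or unset
  elsif tag_matches
    :tag                                 # fall back to parsing the fluentd tag
  else
    :none
  end
end

journal_record = {
  'CONTAINER_NAME' => 'k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035',
  'CONTAINER_ID_FULL' => 'b6cbb6e73c0ad63ab820e4baa97cdc77cec729930e38a714826764ac0491341a'
}
puts metadata_source(journal_record, tag_matches: false)
```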
87
+ To read JSON-formatted log files with `in_tail` and wildcard filenames while also supporting the CRI-O log format in the same configuration, install the `fluent-plugin-multi-format-parser` plugin:
88
+
89
+ ```
90
+ fluent-gem install fluent-plugin-multi-format-parser
91
+ ```
92
+
93
+ The config block could look like this:
94
+ ```
95
+ <source>
96
+ @type tail
97
+ path /var/log/containers/*.log
98
+ pos_file fluentd-docker.pos
99
+ read_from_head true
100
+ tag kubernetes.*
101
+ <parse>
102
+ @type multi_format
103
+ <pattern>
104
+ format json
105
+ time_key time
106
+ time_type string
107
+ time_format "%Y-%m-%dT%H:%M:%S.%NZ"
108
+ keep_time_key false
109
+ </pattern>
110
+ <pattern>
111
+ format regexp
112
+ expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
113
+ time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
114
+ keep_time_key false
115
+ </pattern>
116
+ </parse>
117
+ </source>
118
+
119
+ <filter kubernetes.var.log.containers.**.log>
120
+ @type kubernetes_metadata
121
+ </filter>
122
+
123
+ <match **>
124
+ @type stdout
125
+ </match>
126
+ ```
127
+
128
+ Reading from the systemd journal (requires the fluentd `fluent-plugin-systemd` and `systemd-journal` plugins, and requires docker to use the `--log-driver=journald` log driver):
129
+ ```
130
+ <source>
131
+ @type systemd
132
+ path /run/log/journal
133
+ pos_file journal.pos
134
+ tag journal
135
+ read_from_head true
136
+ </source>
137
+
138
+ # probably want to use something like fluent-plugin-rewrite-tag-filter to
139
+ # retag entries from k8s
140
+ <match journal>
141
+ @type rewrite_tag_filter
142
+ rewriterule1 CONTAINER_NAME ^k8s_ kubernetes.journal.container
143
+ ...
144
+ </match>
145
+
146
+ <filter kubernetes.**>
147
+ @type kubernetes_metadata
148
+ use_journal true
149
+ </filter>
150
+
151
+ <match **>
152
+ @type stdout
153
+ </match>
154
+ ```
155
+ ## Log content as JSON
156
+ In former versions this plugin parsed the value of the key `log` as JSON. This feature was removed in the current version to avoid duplicating features in the fluentd plugin ecosystem. The value can be parsed with the parser plugin like this:
157
+ ```
158
+ <filter kubernetes.**>
159
+ @type parser
160
+ key_name log
161
+ <parse>
162
+ @type json
163
+ json_parser json
164
+ </parse>
165
+ replace_invalid_sequence true
166
+ reserve_data true # this preserves unparsable log lines
167
+ emit_invalid_record_to_error false # In case of unparsable log lines keep the error log clean
168
+ reserve_time # the time was already parsed in the source, we don't want to overwrite it with current time.
169
+ </filter>
170
+ ```
171
+
172
+ ## Environment variables for Kubernetes
173
+
174
+ If the name of the Kubernetes node the plugin is running on is set as
175
+ an environment variable with the name `K8S_NODE_NAME`, it will reduce cache
176
+ misses and needless calls to the Kubernetes API.
177
+
178
+ In the Kubernetes container definition, this is easily accomplished by:
179
+
180
+ ```yaml
181
+ env:
182
+ - name: K8S_NODE_NAME
183
+ valueFrom:
184
+ fieldRef:
185
+ fieldPath: spec.nodeName
186
+ ```
187
+
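Besides `K8S_NODE_NAME`, the plugin falls back to `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` when `kubernetes_url` is not set. The following standalone Ruby sketch approximates that fallback, including the brackets needed around IPv6 hosts. The function name `kubernetes_url_from_env` is hypothetical, and the blank-value check is a simplified stand-in for the plugin's own helper.

```ruby
require 'resolv'

# Hypothetical standalone approximation of the environment fallback.
# `env` is anything that responds to [] with string keys, such as ENV or a Hash.
def kubernetes_url_from_env(env)
  host = env['KUBERNETES_SERVICE_HOST']
  port = env['KUBERNETES_SERVICE_PORT']
  # Both variables must be present and non-blank.
  return nil if [host, port].any? { |value| value.nil? || value.strip.empty? }

  # Brackets are needed around IPv6 addresses in a URL.
  host = "[#{host}]" if host =~ Resolv::IPv6::Regex
  "https://#{host}:#{port}/api"
end

puts kubernetes_url_from_env(ENV).inspect
```

Passing the environment in as an argument keeps the sketch easy to try with a plain hash instead of the real process environment.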
188
+ ## Example input/output
189
+
190
+ Kubernetes creates symlinks to Docker log files in `/var/log/containers/*.log`. Docker logs in JSON format.
191
+
192
+ Assuming the following input comes from a log file named `/var/log/containers/fabric8-console-controller-98rqc_default_fabric8-console-container-df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115.log`:
193
+
194
+ ```
195
+ {
196
+ "log": "2015/05/05 19:54:41 \n",
197
+ "stream": "stderr",
198
+ "time": "2015-05-05T19:54:41.240447294Z"
199
+ }
200
+ ```
201
+
202
+ Then the output becomes:
203
+ ```
204
+ {
205
+ "log": "2015/05/05 19:54:41 \n",
206
+ "stream": "stderr",
207
+ "docker": {
208
+ "id": "df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115"
209
+ },
210
+ "kubernetes": {
211
+ "host": "jimmi-redhat.localnet",
212
+ "pod_name":"fabric8-console-controller-98rqc",
213
+ "pod_id": "c76927af-f563-11e4-b32d-54ee7527188d",
214
+ "pod_ip": "172.17.0.8",
215
+ "container_name": "fabric8-console-container",
216
+ "namespace_name": "default",
217
+ "namespace_id": "23437884-8e08-4d95-850b-e94378c9b2fd",
218
+ "namespace_annotations": {
219
+ "fabric8.io/git-commit": "5e1116f63df0bac2a80bdae2ebdc563577bbdf3c"
220
+ },
221
+ "namespace_labels": {
222
+ "product_version": "v1.0.0"
223
+ },
224
+ "labels": {
225
+ "component": "fabric8Console"
226
+ }
227
+ }
228
+ }
229
+ ```
230
+
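As a sketch of how the file name in this example yields the initial metadata, the snippet below applies the default `tag_to_kubernetes_name_regexp` (copied from `filter_kubernetes_metadata.rb`) to a tag. The tag value assumes the `tag kubernetes.*` setting from the `in_tail` example; the constant name `TAG_REGEXP` is only for illustration.

```ruby
# Default value of tag_to_kubernetes_name_regexp, compiled the same way
# the plugin does (Regexp.compile on the configured string).
TAG_REGEXP = Regexp.compile(
  'var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$'
)

# Assumed tag: in_tail with `tag kubernetes.*` turns the path separators into dots.
tag = 'kubernetes.var.log.containers.fabric8-console-controller-98rqc_default_' \
      'fabric8-console-container-' \
      'df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115.log'

match = tag.match(TAG_REGEXP)
puts match['pod_name']        # fabric8-console-controller-98rqc
puts match['namespace']       # default
puts match['container_name']  # fabric8-console-container
puts match['docker_id']
```

These four values are what the filter passes on when it looks up the remaining pod and namespace metadata from the API server.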
231
+ If using journal input, from docker configured with `--log-driver=journald`, the input looks like the `journalctl -o export` format:
232
+ ```
233
+ # The stream identification is encoded into the PRIORITY field as an
234
+ # integer: 6, or github.com/coreos/go-systemd/journal.Info, marks stdout,
235
+ # while 3, or github.com/coreos/go-systemd/journal.Err, marks stderr.
236
+ PRIORITY=6
237
+ CONTAINER_ID=b6cbb6e73c0a
238
+ CONTAINER_ID_FULL=b6cbb6e73c0ad63ab820e4baa97cdc77cec729930e38a714826764ac0491341a
239
+ CONTAINER_NAME=k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035
240
+ MESSAGE=172.17.0.1 - - [21/May/2016:16:52:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
241
+ ```
242
+
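For journal input, the metadata is encoded in `CONTAINER_NAME`. The sketch below applies the default `container_name_to_kubernetes_regexp` (copied from `filter_kubernetes_metadata.rb`) to the value shown above; the constant name `CONTAINER_NAME_REGEXP` is only for illustration.

```ruby
# Default value of container_name_to_kubernetes_regexp.
CONTAINER_NAME_REGEXP = Regexp.compile(
  '^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$'
)

# CONTAINER_NAME value taken from the journal example above.
container_name = 'k8s_registry.a49f5318_docker-registry-1-hhoj0_default_' \
                 'ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035'

match = container_name.match(CONTAINER_NAME_REGEXP)
puts match['container_name']  # registry
puts match['pod_name']        # docker-registry-1-hhoj0
puts match['namespace']       # default
```

The last two underscore-separated fields (the pod UUID and the random hex suffix) are matched but not captured by name.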
243
+ ## Contributing
244
+
245
+ 1. Fork it
246
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
247
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
248
+ 4. Test it (`GEM_HOME=vendor bundle install; GEM_HOME=vendor bundle exec rake test`)
249
+ 5. Push to the branch (`git push origin my-new-feature`)
250
+ 6. Create new Pull Request
251
+
252
+ ## Copyright
253
+ Copyright (c) 2015 jimmidyson
data/Rakefile ADDED
@@ -0,0 +1,41 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/setup'
4
+ require 'bundler/gem_tasks'
5
+ require 'rake/testtask'
6
+ require 'bump/tasks'
7
+ require 'rubocop/rake_task'
8
+
9
+ task test: [:base_test]
10
+ task default: [:test, :build, :rubocop]
11
+
12
+ RuboCop::RakeTask.new
13
+
14
+ desc 'Run test_unit based test'
15
+ Rake::TestTask.new(:base_test) do |t|
16
+ # To run test for only one file (or file path pattern)
17
+ # $ bundle exec rake base_test TEST=test/test_specified_path.rb
18
+ # $ bundle exec rake base_test TEST=test/test_*.rb
19
+ t.libs << 'test'
20
+ t.test_files = Dir['test/**/test_*.rb'].sort
21
+ t.warning = false
22
+ end
23
+
24
+ desc 'Add copyright headers'
25
+ task :headers do
26
+ require 'rubygems'
27
+ require 'copyright_header'
28
+
29
+ args = {
30
+ license: 'Apache-2.0',
31
+ copyright_software: 'Fluentd Kubernetes Metadata Filter Plugin',
32
+ copyright_software_description: 'Enrich Fluentd events with Kubernetes metadata',
33
+ copyright_holders: ['Red Hat, Inc.'],
34
+ copyright_years: ['2015-2021'],
35
+ add_path: 'lib:test',
36
+ output_dir: '.'
37
+ }
38
+
39
+ command_line = CopyrightHeader::CommandLine.new(args)
40
+ command_line.execute
41
+ end
data/fluent-plugin-kubernetes_metadata_filter.gemspec ADDED
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path('lib', __dir__)
4
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
5
+
6
+ Gem::Specification.new do |gem|
7
+ gem.name = 'fluent-plugin-kubernetes_metadata_filter-rh'
8
+ gem.version = '2.6.1'
9
+ gem.authors = ['Stan Kwong']
10
+ gem.email = ['jpdstan@gmail.com']
11
+ gem.description = 'Filter plugin to add Kubernetes metadata'
12
+ gem.summary = 'Fluentd filter plugin to add Kubernetes metadata'
13
+ gem.homepage = 'https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter'
14
+ gem.license = 'Apache-2.0'
15
+
16
+ gem.files = `git ls-files`.split($/)
17
+
18
+ gem.required_ruby_version = '>= 2.5.0'
19
+
20
+ gem.add_runtime_dependency 'fluentd', ['>= 0.14.0', '< 1.13']
21
+ gem.add_runtime_dependency 'kubeclient', '< 5'
22
+ gem.add_runtime_dependency 'lru_redux'
23
+
24
+ gem.add_development_dependency 'bump'
25
+ gem.add_development_dependency 'bundler', '~> 2.0'
26
+ gem.add_development_dependency 'copyright-header'
27
+ gem.add_development_dependency 'minitest', '~> 4.0'
28
+ gem.add_development_dependency 'rake'
29
+ gem.add_development_dependency 'test-unit', '~> 3.0.2'
30
+ gem.add_development_dependency 'test-unit-rr', '~> 1.0.3'
31
+ gem.add_development_dependency 'vcr'
32
+ gem.add_development_dependency 'webmock'
33
+ gem.add_development_dependency 'yajl-ruby'
34
+ end
data/lib/fluent/plugin/filter_kubernetes_metadata.rb ADDED
@@ -0,0 +1,378 @@
1
+ # frozen_string_literal: true
2
+
3
+ #
4
+ # Fluentd Kubernetes Metadata Filter Plugin - Enrich Fluentd events with
5
+ # Kubernetes metadata
6
+ #
7
+ # Copyright 2017 Red Hat, Inc.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ #
21
+
22
+ require_relative 'kubernetes_metadata_cache_strategy'
23
+ require_relative 'kubernetes_metadata_common'
24
+ require_relative 'kubernetes_metadata_stats'
25
+ require_relative 'kubernetes_metadata_util'
26
+ require_relative 'kubernetes_metadata_watch_namespaces'
27
+ require_relative 'kubernetes_metadata_watch_pods'
28
+
29
+ require 'fluent/plugin/filter'
30
+ require 'resolv'
31
+
32
+ module Fluent::Plugin
33
+ class KubernetesMetadataFilter < Fluent::Plugin::Filter
34
+ K8_POD_CA_CERT = 'ca.crt'
35
+ K8_POD_TOKEN = 'token'
36
+
37
+ include KubernetesMetadata::CacheStrategy
38
+ include KubernetesMetadata::Common
39
+ include KubernetesMetadata::Util
40
+ include KubernetesMetadata::WatchNamespaces
41
+ include KubernetesMetadata::WatchPods
42
+
43
+ Fluent::Plugin.register_filter('kubernetes_metadata', self)
44
+
45
+ config_param :kubernetes_url, :string, default: nil
46
+ config_param :cache_size, :integer, default: 1000
47
+ config_param :cache_ttl, :integer, default: 60 * 60
48
+ config_param :watch, :bool, default: true
49
+ config_param :apiVersion, :string, default: 'v1'
50
+ config_param :client_cert, :string, default: nil
51
+ config_param :client_key, :string, default: nil
52
+ config_param :ca_file, :string, default: nil
53
+ config_param :verify_ssl, :bool, default: true
54
+ config_param :tag_to_kubernetes_name_regexp,
55
+ :string,
56
+ default: 'var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$'
57
+ config_param :bearer_token_file, :string, default: nil
58
+ config_param :secret_dir, :string, default: '/var/run/secrets/kubernetes.io/serviceaccount'
59
+ config_param :de_dot, :bool, default: true
60
+ config_param :de_dot_separator, :string, default: '_'
61
+ # if reading from the journal, the record will contain the following fields in the following
62
+ # format:
63
+ # CONTAINER_NAME=k8s_$containername.$containerhash_$podname_$namespacename_$poduuid_$rand32bitashex
64
+ # CONTAINER_ID_FULL=dockeridassha256hexvalue
65
+ config_param :use_journal, :bool, default: nil
66
+ # Field 2 is the container_hash, field 5 is the pod_id, and field 6 is the pod_randhex
67
+ # I would have included them as named groups, but you can't have named groups that are
68
+ # non-capturing :P
69
+ # parse format is defined here: https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/kubelet/dockertools/docker.go#L317
70
+ config_param :container_name_to_kubernetes_regexp,
71
+ :string,
72
+ default: '^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$'
73
+
74
+ config_param :annotation_match, :array, default: []
75
+ config_param :stats_interval, :integer, default: 30
76
+ config_param :allow_orphans, :bool, default: true
77
+ config_param :orphaned_namespace_name, :string, default: '.orphaned'
78
+ config_param :orphaned_namespace_id, :string, default: 'orphaned'
79
+ config_param :lookup_from_k8s_field, :bool, default: true
80
+ # if `ca_file` is for an intermediate CA, or otherwise we do not have the root CA and want
81
+ # to trust the intermediate CA certs we do have, set this to `true` - this corresponds to
82
+ # the openssl s_client -partial_chain flag and X509_V_FLAG_PARTIAL_CHAIN
83
+ config_param :ssl_partial_chain, :bool, default: false
84
+ config_param :skip_labels, :bool, default: false
85
+ config_param :skip_container_metadata, :bool, default: false
86
+ config_param :skip_master_url, :bool, default: false
87
+ config_param :skip_namespace_metadata, :bool, default: false
88
+ # The time interval in seconds for retry backoffs when watch connections fail.
89
+ config_param :watch_retry_interval, :integer, default: 1
90
+ # The base number of exponential backoff for retries.
91
+ config_param :watch_retry_exponential_backoff_base, :integer, default: 2
92
+ # The maximum number of times to retry pod and namespace watches.
93
+ config_param :watch_retry_max_times, :integer, default: 10
94
+
95
+ def fetch_pod_metadata(namespace_name, pod_name)
96
+ log.trace("fetching pod metadata: #{namespace_name}/#{pod_name}") if log.trace?
97
+ options = {
98
+ resource_version: '0' # Fetch from API server cache instead of etcd quorum read
99
+ }
100
+ pod_object = @client.get_pod(pod_name, namespace_name, options)
101
+ log.trace("raw metadata for #{namespace_name}/#{pod_name}: #{pod_object}") if log.trace?
102
+ metadata = parse_pod_metadata(pod_object)
103
+ @stats.bump(:pod_cache_api_updates)
104
+ log.trace("parsed metadata for #{namespace_name}/#{pod_name}: #{metadata}") if log.trace?
105
+ @cache[metadata['pod_id']] = metadata
106
+ rescue StandardError => e
107
+ @stats.bump(:pod_cache_api_nil_error)
108
+ log.debug "Exception '#{e}' encountered fetching pod metadata from Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}"
109
+ {}
110
+ end
111
+
112
+ def dump_stats
113
+ @curr_time = Time.now
114
+ return if @curr_time.to_i - @prev_time.to_i < @stats_interval
115
+
116
+ @prev_time = @curr_time
117
+ @stats.set(:pod_cache_size, @cache.count)
118
+ @stats.set(:namespace_cache_size, @namespace_cache.count) if @namespace_cache
119
+ log.info(@stats)
120
+ if log.level == Fluent::Log::LEVEL_TRACE
121
+ log.trace(" id cache: #{@id_cache.to_a}")
122
+ log.trace(" pod cache: #{@cache.to_a}")
123
+ log.trace("namespace cache: #{@namespace_cache.to_a}")
124
+ end
125
+ end
126
+
127
+ def fetch_namespace_metadata(namespace_name)
128
+ log.trace("fetching namespace metadata: #{namespace_name}") if log.trace?
129
+ options = {
130
+ resource_version: '0' # Fetch from API server cache instead of etcd quorum read
131
+ }
132
+ namespace_object = @client.get_namespace(namespace_name, nil, options)
133
+ log.trace("raw metadata for #{namespace_name}: #{namespace_object}") if log.trace?
134
+ metadata = parse_namespace_metadata(namespace_object)
135
+ @stats.bump(:namespace_cache_api_updates)
136
+ log.trace("parsed metadata for #{namespace_name}: #{metadata}") if log.trace?
137
+ @namespace_cache[metadata['namespace_id']] = metadata
138
+ rescue StandardError => e
139
+ @stats.bump(:namespace_cache_api_nil_error)
140
+ log.debug "Exception '#{e}' encountered fetching namespace metadata from Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}"
141
+ {}
142
+ end
143
+
144
+ def initialize
145
+ super
146
+ @prev_time = Time.now
147
+ end
148
+
149
+ def configure(conf)
150
+ super
151
+
152
+ def log.trace?
153
+ level == Fluent::Log::LEVEL_TRACE
154
+ end
155
+
156
+ require 'kubeclient'
157
+ require 'lru_redux'
158
+ @stats = KubernetesMetadata::Stats.new
159
+
160
+ if @de_dot && @de_dot_separator.include?('.')
161
+ raise Fluent::ConfigError, "Invalid de_dot_separator: cannot be or contain '.'"
162
+ end
163
+
164
+ if @cache_ttl < 0
165
+ log.info 'Setting the cache TTL to :none because it was < 0'
166
+ @cache_ttl = :none
167
+ end
168
+
169
+ # Caches pod/namespace UID tuples for a given container UID.
170
+ @id_cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
171
+
172
+ # Use the container UID as the key to fetch a hash containing pod metadata
173
+ @cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
174
+
175
+ # Use the namespace UID as the key to fetch a hash containing namespace metadata
176
+ @namespace_cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
177
+
178
+ @tag_to_kubernetes_name_regexp_compiled = Regexp.compile(@tag_to_kubernetes_name_regexp)
179
+ @container_name_to_kubernetes_regexp_compiled = Regexp.compile(@container_name_to_kubernetes_regexp)
180
+
181
+ # Use Kubernetes default service account if we're in a pod.
182
+ if @kubernetes_url.nil?
183
+ log.debug 'Kubernetes URL is not set - inspecting environ'
184
+
185
+ env_host = ENV['KUBERNETES_SERVICE_HOST']
186
+ env_port = ENV['KUBERNETES_SERVICE_PORT']
187
+ if present?(env_host) && present?(env_port)
188
+ if env_host =~ Resolv::IPv6::Regex
189
+ # Brackets are needed around IPv6 addresses
190
+ env_host = "[#{env_host}]"
191
+ end
192
+ @kubernetes_url = "https://#{env_host}:#{env_port}/api"
193
+ log.debug "Kubernetes URL is now '#{@kubernetes_url}'"
194
+ else
195
+ log.debug 'No Kubernetes URL could be found in config or environ'
196
+ end
197
+ end
198
+
199
+ # Use SSL certificate and bearer token from Kubernetes service account.
200
+ if Dir.exist?(@secret_dir)
201
+ log.debug "Found directory with secrets: #{@secret_dir}"
202
+ ca_cert = File.join(@secret_dir, K8_POD_CA_CERT)
203
+ pod_token = File.join(@secret_dir, K8_POD_TOKEN)
204
+
205
+ if !present?(@ca_file) && File.exist?(ca_cert)
206
+ log.debug "Found CA certificate: #{ca_cert}"
207
+ @ca_file = ca_cert
208
+ end
209
+
210
+ if !present?(@bearer_token_file) && File.exist?(pod_token)
211
+ log.debug "Found pod token: #{pod_token}"
212
+ @bearer_token_file = pod_token
213
+ end
214
+ end
215
+
216
+ if present?(@kubernetes_url)
217
+ ssl_options = {
218
+ client_cert: present?(@client_cert) ? OpenSSL::X509::Certificate.new(File.read(@client_cert)) : nil,
219
+ client_key: present?(@client_key) ? OpenSSL::PKey::RSA.new(File.read(@client_key)) : nil,
220
+ ca_file: @ca_file,
221
+ verify_ssl: @verify_ssl ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
222
+ }
223
+
224
+ if @ssl_partial_chain
225
+ # taken from the ssl.rb OpenSSL::SSL::SSLContext code for DEFAULT_CERT_STORE
226
+ require 'openssl'
227
+ ssl_store = OpenSSL::X509::Store.new
228
+ ssl_store.set_default_paths
229
+ flagval = if defined? OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
230
+ OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
231
+ else
232
+ # this version of ruby does not define OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
233
+ 0x80000
234
+ end
235
+ ssl_store.flags = OpenSSL::X509::V_FLAG_CRL_CHECK_ALL | flagval
236
+ ssl_options[:cert_store] = ssl_store
237
+ end
238
+
239
+ auth_options = {}
240
+
241
+ if present?(@bearer_token_file)
242
+ bearer_token = File.read(@bearer_token_file)
243
+ auth_options[:bearer_token] = bearer_token
244
+ end
245
+
246
+ log.debug 'Creating K8S client'
247
+ @client = Kubeclient::Client.new(
248
+ @kubernetes_url,
249
+ @apiVersion,
250
+ ssl_options: ssl_options,
251
+ auth_options: auth_options,
252
+ as: :parsed_symbolized
253
+ )
254
+
255
+ begin
256
+ @client.api_valid?
257
+ rescue KubeException => e
258
+ raise Fluent::ConfigError, "Invalid Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}: #{e.message}"
259
+ end
+
+        if @watch
+          if ENV['K8S_NODE_NAME'].nil? || ENV['K8S_NODE_NAME'].strip.empty?
+            log.warn("!! The environment variable 'K8S_NODE_NAME' is not set to the node name which can affect the API server and watch efficiency !!")
+          end
+
+          pod_thread = Thread.new(self, &:set_up_pod_thread)
+          pod_thread.abort_on_exception = true
+
+          namespace_thread = Thread.new(self, &:set_up_namespace_thread)
+          namespace_thread.abort_on_exception = true
+        end
+      end
+      @time_fields = []
+      @time_fields.push('_SOURCE_REALTIME_TIMESTAMP', '__REALTIME_TIMESTAMP') if @use_journal || @use_journal.nil?
+      @time_fields.push('time') unless @use_journal
+      @time_fields.push('@timestamp') if @lookup_from_k8s_field
+
+      @annotations_regexps = []
+      @annotation_match.each do |regexp|
+        @annotations_regexps << Regexp.compile(regexp)
+      rescue RegexpError => e
+        log.error "Error: invalid regular expression in annotation_match: #{e}"
+      end
+    end
+
+    def get_metadata_for_record(namespace_name, pod_name, container_name, container_id, create_time, batch_miss_cache)
+      metadata = {
+        'docker' => { 'container_id' => container_id },
+        'kubernetes' => {
+          'container_name' => container_name,
+          'namespace_name' => namespace_name,
+          'pod_name' => pod_name
+        }
+      }
+      if present?(@kubernetes_url)
+        pod_metadata = get_pod_metadata(container_id, namespace_name, pod_name, create_time, batch_miss_cache)
+
+        if (pod_metadata.include? 'containers') && (pod_metadata['containers'].include? container_id) && !@skip_container_metadata
+          metadata['kubernetes']['container_image'] = pod_metadata['containers'][container_id]['image']
+          metadata['kubernetes']['container_image_id'] = pod_metadata['containers'][container_id]['image_id']
+        end
+
+        metadata['kubernetes'].merge!(pod_metadata) if pod_metadata
+        metadata['kubernetes'].delete('containers')
+      end
+      metadata
+    end
+
+    def filter_stream(tag, es)
+      return es if (es.respond_to?(:empty?) && es.empty?) || !es.is_a?(Fluent::EventStream)
+
+      new_es = Fluent::MultiEventStream.new
+      tag_match_data = tag.match(@tag_to_kubernetes_name_regexp_compiled) unless @use_journal
+      tag_metadata = nil
+      batch_miss_cache = {}
+      es.each do |time, record|
+        if tag_match_data && tag_metadata.nil?
+          tag_metadata = get_metadata_for_record(tag_match_data['namespace'], tag_match_data['pod_name'], tag_match_data['container_name'],
+                                                 tag_match_data['docker_id'], create_time_from_record(record, time), batch_miss_cache)
+        end
+        metadata = Marshal.load(Marshal.dump(tag_metadata)) if tag_metadata
+        if (@use_journal || @use_journal.nil?) &&
+           (j_metadata = get_metadata_for_journal_record(record, time, batch_miss_cache))
+          metadata = j_metadata
+        end
+        if @lookup_from_k8s_field && record.key?('kubernetes') && record.key?('docker') &&
+           record['kubernetes'].respond_to?(:has_key?) && record['docker'].respond_to?(:has_key?) &&
+           record['kubernetes'].key?('namespace_name') &&
+           record['kubernetes'].key?('pod_name') &&
+           record['kubernetes'].key?('container_name') &&
+           record['docker'].key?('container_id') &&
+           (k_metadata = get_metadata_for_record(record['kubernetes']['namespace_name'], record['kubernetes']['pod_name'],
+                                                 record['kubernetes']['container_name'], record['docker']['container_id'],
+                                                 create_time_from_record(record, time), batch_miss_cache))
+          metadata = k_metadata
+        end
+
+        record = record.merge(metadata) if metadata
+        new_es.add(time, record)
+      end
+      dump_stats
+      new_es
+    end
+
+    def get_metadata_for_journal_record(record, time, batch_miss_cache)
+      metadata = nil
+      if record.key?('CONTAINER_NAME') && record.key?('CONTAINER_ID_FULL')
+        metadata = record['CONTAINER_NAME'].match(@container_name_to_kubernetes_regexp_compiled) do |match_data|
+          get_metadata_for_record(match_data['namespace'], match_data['pod_name'], match_data['container_name'],
+                                  record['CONTAINER_ID_FULL'], create_time_from_record(record, time), batch_miss_cache)
+        end
+        unless metadata
+          log.debug "Error: could not match CONTAINER_NAME from record #{record}"
+          @stats.bump(:container_name_match_failed)
+        end
+      elsif record.key?('CONTAINER_NAME') && record['CONTAINER_NAME'].start_with?('k8s_')
+        log.debug "Error: no container name and id in record #{record}"
+        @stats.bump(:container_name_id_missing)
+      end
+      metadata
+    end
+
+    def de_dot!(h)
+      h.keys.each do |ref|
+        next unless h[ref] && ref =~ /\./
+
+        v = h.delete(ref)
+        newref = ref.to_s.gsub('.', @de_dot_separator)
+        h[newref] = v
+      end
+    end
+
+    # copied from activesupport
+    def present?(object)
+      object.respond_to?(:empty?) ? !object.empty? : !!object
+    end
+  end
+end
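
The `de_dot!` and `present?` helpers that close this file can be tried outside Fluentd. The following is a minimal standalone sketch, not part of the gem: the class name `DeDotSketch`, its constructor argument, and the sample labels are hypothetical, and `de_dot!` here returns the hash for convenience, whereas the plugin's version returns the result of `each`.

```ruby
# Standalone sketch of the two helpers at the end of the filter.
# DeDotSketch and the separator argument are hypothetical; in the plugin,
# @de_dot_separator is an instance variable of the filter itself.
class DeDotSketch
  def initialize(separator = '_')
    @de_dot_separator = separator
  end

  # Replace '.' in top-level keys with the separator, in place.
  # Keys whose value is nil or false are skipped, as in the plugin code.
  def de_dot!(h)
    # h.keys returns a new array, so mutating h inside the loop is safe.
    h.keys.each do |ref|
      next unless h[ref] && ref =~ /\./

      v = h.delete(ref)
      h[ref.to_s.gsub('.', @de_dot_separator)] = v
    end
    h # returned for convenience in this sketch only
  end

  # True for non-empty collections and strings, and for any other truthy object.
  def present?(object)
    object.respond_to?(:empty?) ? !object.empty? : !!object
  end
end

sketch = DeDotSketch.new('_')
labels = { 'app.kubernetes.io/name' => 'web', 'tier' => 'frontend', 'empty.key' => nil }
sketch.de_dot!(labels)
# labels now has 'app_kubernetes_io/name' => 'web'; 'empty.key' keeps its dot
# because its value is nil.

url_set   = sketch.present?('https://kubernetes.default.svc') # true
url_blank = sketch.present?('')                               # false
url_nil   = sketch.present?(nil)                              # false
```

Only top-level keys are rewritten, and a key with a nil value keeps its dots. `present?` treats an empty string the same as nil, which is why an empty `kubernetes_url` skips client creation in `configure`.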