fluent-plugin-kubernetes_metadata_filter-rh 2.6.1
- checksums.yaml +7 -0
- data/.circleci/config.yml +57 -0
- data/.gitignore +19 -0
- data/.rubocop.yml +57 -0
- data/Gemfile +9 -0
- data/Gemfile.lock +156 -0
- data/LICENSE.txt +201 -0
- data/README.md +253 -0
- data/Rakefile +41 -0
- data/fluent-plugin-kubernetes_metadata_filter.gemspec +34 -0
- data/lib/fluent/plugin/filter_kubernetes_metadata.rb +378 -0
- data/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb +102 -0
- data/lib/fluent/plugin/kubernetes_metadata_common.rb +120 -0
- data/lib/fluent/plugin/kubernetes_metadata_stats.rb +46 -0
- data/lib/fluent/plugin/kubernetes_metadata_util.rb +40 -0
- data/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb +154 -0
- data/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb +172 -0
- data/test/cassettes/invalid_api_server_config.yml +53 -0
- data/test/cassettes/kubernetes_docker_metadata_annotations.yml +205 -0
- data/test/cassettes/kubernetes_docker_metadata_dotted_labels.yml +197 -0
- data/test/cassettes/kubernetes_get_api_v1.yml +193 -0
- data/test/cassettes/kubernetes_get_api_v1_using_token.yml +195 -0
- data/test/cassettes/kubernetes_get_namespace_default.yml +69 -0
- data/test/cassettes/kubernetes_get_namespace_default_using_token.yml +71 -0
- data/test/cassettes/kubernetes_get_pod.yml +146 -0
- data/test/cassettes/kubernetes_get_pod_using_token.yml +148 -0
- data/test/cassettes/metadata_from_tag_and_journald_fields.yml +153 -0
- data/test/cassettes/metadata_from_tag_journald_and_kubernetes_fields.yml +285 -0
- data/test/cassettes/valid_kubernetes_api_server.yml +55 -0
- data/test/cassettes/valid_kubernetes_api_server_using_token.yml +57 -0
- data/test/helper.rb +82 -0
- data/test/plugin/test.token +1 -0
- data/test/plugin/test_cache_stats.rb +33 -0
- data/test/plugin/test_cache_strategy.rb +194 -0
- data/test/plugin/test_filter_kubernetes_metadata.rb +1012 -0
- data/test/plugin/test_utils.rb +56 -0
- data/test/plugin/test_watch_namespaces.rb +245 -0
- data/test/plugin/test_watch_pods.rb +344 -0
- data/test/plugin/watch_test.rb +74 -0
- metadata +269 -0
data/README.md
ADDED
@@ -0,0 +1,253 @@
|
|
1
|
+
# fluent-plugin-kubernetes_metadata_filter, a plugin for [Fluentd](http://fluentd.org)
|
2
|
+
[![Circle CI](https://circleci.com/gh/fabric8io/fluent-plugin-kubernetes_metadata_filter.svg?style=svg)](https://circleci.com/gh/fabric8io/fluent-plugin-kubernetes_metadata_filter)
|
3
|
+
[![Code Climate](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter/badges/gpa.svg)](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter)
|
4
|
+
[![Test Coverage](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter/badges/coverage.svg)](https://codeclimate.com/github/fabric8io/fluent-plugin-kubernetes_metadata_filter)
|
5
|
+
[![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop-hq/rubocop)
|
6
|
+
[![Ruby Style Guide](https://img.shields.io/badge/code_style-community-brightgreen.svg)](https://rubystyle.guide)
|
7
|
+
|
8
|
+
The Kubernetes metadata plugin filter enriches container log records with pod and namespace metadata.
|
9
|
+
|
10
|
+
This plugin derives basic metadata about the container that emitted a given log record using the source of the log record. Records from journald provide metadata about the
|
11
|
+
container environment as named fields. Records from JSON files encode metadata about the container in the file name. The initial metadata derived from the source is used
|
12
|
+
to look up additional metadata about the container's associated pod and namespace (e.g. UUIDs, labels, annotations) when the `kubernetes_url` is configured. If the plugin cannot
|
13
|
+
authoritatively determine the namespace of the container emitting a log record, it will use an 'orphan' namespace ID in the metadata. This behavior supports multi-tenant systems
|
14
|
+
that rely on the authenticity of the namespace for proper log isolation.
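
As a minimal sketch of how a file name carries the initial metadata, the snippet below applies the default `tag_to_kubernetes_name_regexp` pattern (copied from the plugin's parameter default) to a tag built from an example log path. The helper name `metadata_from_tag`, the constants and the `kubernetes.` tag prefix are assumptions for illustration; the plugin's own lookup involves more steps than this.

```ruby
# Default value of tag_to_kubernetes_name_regexp, copied from the plugin source.
TAG_REGEXP = Regexp.compile(
  'var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)' \
  '_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$'
)

# Example tag, as in_tail would derive it from the log path with `tag kubernetes.*`.
EXAMPLE_DOCKER_ID = 'df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115'
EXAMPLE_TAG = 'kubernetes.var.log.containers.' \
              "fabric8-console-controller-98rqc_default_fabric8-console-container-#{EXAMPLE_DOCKER_ID}.log"

# Hypothetical helper (not part of the plugin): returns the named captures
# as a hash with string keys, or nil when the tag does not match.
def metadata_from_tag(tag)
  match = TAG_REGEXP.match(tag)
  match && match.named_captures
end

p metadata_from_tag(EXAMPLE_TAG)
```

The captured `pod_name` and `namespace` are what a lookup against the API server would start from; a tag that does not match simply yields `nil`.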
|
15
|
+
|
16
|
+
## Requirements
|
17
|
+
|
18
|
+
| fluent-plugin-kubernetes_metadata_filter | fluentd | ruby |
|
19
|
+
|-------------------|---------|------|
|
20
|
+
| >= 2.5.0 | >= v1.10.0 | >= 2.5 |
|
21
|
+
| >= 2.0.0 | >= v0.14.20 | >= 2.1 |
|
22
|
+
| < 2.0.0 | >= v0.12.0 | >= 1.9 |
|
23
|
+
|
24
|
+
NOTE: For fluentd v0.12, use version 1.x.y of this plugin. If you encounter a bug in a 1.x version, please send a patch to the v0.12 branch.
|
25
|
+
|
26
|
+
NOTE: This documentation is for fluent-plugin-kubernetes_metadata_filter 2.x or later. For 1.x documentation, please see [v0.12 branch](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/tree/v0.12).
|
27
|
+
|
28
|
+
## Installation
|
29
|
+
|
30
|
+
gem install fluent-plugin-kubernetes_metadata_filter
|
31
|
+
|
32
|
+
## Configuration
|
33
|
+
|
34
|
+
Configuration options for fluent.conf are:
|
35
|
+
|
36
|
+
* `kubernetes_url` - URL to the API server. Set this to retrieve further kubernetes metadata for logs from kubernetes API server. If not specified, environment variables `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` will be used if both are present which is typically true when running fluentd in a pod.
|
37
|
+
* `apiVersion` - API version to use (default: `v1`)
|
38
|
+
* `ca_file` - path to CA file for Kubernetes server certificate validation
|
39
|
+
* `verify_ssl` - validate SSL certificates (default: `true`)
|
40
|
+
* `client_cert` - path to a client cert file to authenticate to the API server
|
41
|
+
* `client_key` - path to a client key file to authenticate to the API server
|
42
|
+
* `bearer_token_file` - path to a file containing the bearer token to use for authentication
|
43
|
+
* `tag_to_kubernetes_name_regexp` - the regular expression used to extract kubernetes metadata (pod name, container name, namespace) from the current fluentd tag.
|
44
|
+
This must use named capture groups for `container_name`, `pod_name` & `namespace` (default: see [code](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L52))
|
45
|
+
* `cache_size` - size of the cache of Kubernetes metadata to reduce requests to the API server (default: `1000`)
|
46
|
+
* `cache_ttl` - TTL in seconds of each cached element. Set to negative value to disable TTL eviction (default: `3600` - 1 hour)
|
47
|
+
* `watch` - set up a watch on pods on the API server for updates to metadata (default: `true`)
|
48
|
+
* `de_dot` - replace dots in labels and annotations with configured `de_dot_separator`, required for ElasticSearch 2.x compatibility (default: `true`)
|
49
|
+
* `de_dot_separator` - separator to use if `de_dot` is enabled (default: `_`)
|
50
|
+
* *DEPRECATED* `use_journal` - If false, messages are expected to be formatted and tagged as if read by the fluentd in\_tail plugin with wildcard filename. If true, messages are expected to be formatted as if read from the systemd journal. The `MESSAGE` field has the full message. The `CONTAINER_NAME` field has the encoded k8s metadata (see below). The `CONTAINER_ID_FULL` field has the full container uuid. This requires docker to use the `--log-driver=journald` log driver. If unset (the default), the plugin will use the `CONTAINER_NAME` and `CONTAINER_ID_FULL` fields
|
51
|
+
if available, otherwise, will use the tag in the `tag_to_kubernetes_name_regexp` format.
|
52
|
+
* `container_name_to_kubernetes_regexp` - The regular expression used to extract the k8s metadata encoded in the journal `CONTAINER_NAME` field default: See [code](https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L68)
|
53
|
+
* This corresponds to the definition [in the source](https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/kubelet/dockertools/docker.go#L317)
|
54
|
+
* `annotation_match` - Array of regular expressions matching annotation field names. Matched annotations are added to a log record.
|
55
|
+
* `allow_orphans` - Modify the namespace and namespace id to the values of `orphaned_namespace_name` and `orphaned_namespace_id`
|
56
|
+
when true (default: `true`)
|
57
|
+
* `orphaned_namespace_name` - The namespace to associate with records where the namespace can not be determined (default: `.orphaned`)
|
58
|
+
* `orphaned_namespace_id` - The namespace id to associate with records where the namespace can not be determined (default: `orphaned`)
|
59
|
+
* `lookup_from_k8s_field` - If the field `kubernetes` is present, look up the metadata from the given subfields such as `kubernetes.namespace_name`, `kubernetes.pod_name`, etc. This allows you to avoid having to pass in metadata to look up in an explicitly formatted tag name or in an explicitly formatted `CONTAINER_NAME` value. For example, set `kubernetes.namespace_name`, `kubernetes.pod_name`, `kubernetes.container_name`, and `docker.id` in the record, and the filter will fill in the rest. (default: `true`)
|
60
|
+
* `ssl_partial_chain` - if `ca_file` is for an intermediate CA, or otherwise we do not have the root CA and want
|
61
|
+
to trust the intermediate CA certs we do have, set this to `true` - this corresponds to
|
62
|
+
the `openssl s_client -partial_chain` flag and `X509_V_FLAG_PARTIAL_CHAIN` (default: `false`)
|
63
|
+
* `skip_labels` - Skip all label fields from the metadata.
|
64
|
+
* `skip_container_metadata` - Skip some of the container data of the metadata. The metadata will not contain the container_image and container_image_id fields.
|
65
|
+
* `skip_master_url` - Skip the master_url field from the metadata.
|
66
|
+
* `skip_namespace_metadata` - Skip the namespace_id field from the metadata. The fetch_namespace_metadata function will be skipped. The plugin will be faster and cpu consumption will be less.
|
67
|
+
* `watch_retry_interval` - The time interval in seconds for retry backoffs when watch connections fail. (default: `1`)
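
The `kubernetes_url` fallback in the list above can be sketched roughly as follows. This is a simplified illustration, not the plugin's code: the helper name `kubernetes_url_from_env` and the injectable `env` argument are assumptions made so the example can run on its own.

```ruby
require 'resolv'

# Hypothetical helper: build the API server URL from the service host and port
# variables, or return nil when either one is missing or blank.
def kubernetes_url_from_env(env = ENV)
  host = env['KUBERNETES_SERVICE_HOST']
  port = env['KUBERNETES_SERVICE_PORT']
  return nil if host.nil? || host.strip.empty? || port.nil? || port.strip.empty?

  # IPv6 literals need brackets inside a URL.
  host = "[#{host}]" if host =~ Resolv::IPv6::Regex
  "https://#{host}:#{port}/api"
end

puts kubernetes_url_from_env({ 'KUBERNETES_SERVICE_HOST' => '10.0.0.1',
                               'KUBERNETES_SERVICE_PORT' => '443' })
```

Passing a plain hash instead of `ENV` keeps the sketch testable; inside a pod both variables are normally injected by Kubernetes.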
|
68
|
+
|
69
|
+
**NOTE:** As of the release 2.1.x of this plugin, it no longer supports parsing the source message into JSON and attaching it to the
|
70
|
+
payload. The following configuration options are removed:
|
71
|
+
|
72
|
+
* `merge_json_log`
|
73
|
+
* `preserve_json_log`
|
74
|
+
|
75
|
+
One way of preserving JSON logs can be through the [parser plugin](https://docs.fluentd.org/filter/parser)
|
76
|
+
|
77
|
+
**NOTE** As of this release, the use of `use_journal` is **DEPRECATED**. If this setting is not present, the plugin will
|
78
|
+
attempt to figure out the source of the metadata fields from the following:
|
79
|
+
- If `lookup_from_k8s_field true` (the default) and the following fields are present in the record:
|
80
|
+
`docker.container_id`, `kubernetes.namespace_name`, `kubernetes.pod_name`, `kubernetes.container_name`,
|
81
|
+
then the plugin will use those values as the source to look up the metadata
|
82
|
+
- If `use_journal true`, or `use_journal` is unset, and the fields `CONTAINER_NAME` and `CONTAINER_ID_FULL` are present in the record,
|
83
|
+
then the plugin will parse those values using `container_name_to_kubernetes_regexp` and use those as the source to look up the metadata
|
84
|
+
- Otherwise, if the tag matches `tag_to_kubernetes_name_regexp`, the plugin will parse the tag and use those values to
|
85
|
+
look up the metadata
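
The order of checks above can be sketched as a small decision function. This is a hedged illustration only: `metadata_source`, its keyword arguments, the return symbols and the simplified tag pattern are hypothetical, and treating `docker.*` and `kubernetes.*` as nested hashes is an assumption.

```ruby
# Simplified stand-in for tag_to_kubernetes_name_regexp (assumption).
SIMPLE_TAG_REGEXP = /var\.log\.containers\..+\.log$/

# Hypothetical helper (not the plugin's code): reports which source the
# metadata lookup would start from, following the order described above.
def metadata_source(record, tag, tag_regexp, use_journal: nil, lookup_from_k8s_field: true)
  k8s_fields_present =
    record.dig('docker', 'container_id') &&
    %w[namespace_name pod_name container_name].all? { |key| record.dig('kubernetes', key) }
  return :record_fields if lookup_from_k8s_field && k8s_fields_present

  # use_journal unset (nil) behaves like true when the journal fields exist.
  journal_fields_present = record.key?('CONTAINER_NAME') && record.key?('CONTAINER_ID_FULL')
  return :journal if use_journal != false && journal_fields_present

  return :tag if tag_regexp.match?(tag)

  nil
end

journal_record = { 'CONTAINER_NAME' => 'k8s_example', 'CONTAINER_ID_FULL' => 'abc123' }
p metadata_source(journal_record, 'journal', SIMPLE_TAG_REGEXP)
```

Returning `nil` at the end models the case where none of the three sources applies.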
|
86
|
+
|
87
|
+
To read JSON-formatted log files with `in_tail` and wildcard filenames while also supporting the CRI-O log format in the same config, you need the fluent-plugin "multi-format-parser":
|
88
|
+
|
89
|
+
```
|
90
|
+
fluent-gem install fluent-plugin-multi-format-parser
|
91
|
+
```
|
92
|
+
|
93
|
+
The config block could look like this:
|
94
|
+
```
|
95
|
+
<source>
|
96
|
+
@type tail
|
97
|
+
path /var/log/containers/*.log
|
98
|
+
pos_file fluentd-docker.pos
|
99
|
+
read_from_head true
|
100
|
+
tag kubernetes.*
|
101
|
+
<parse>
|
102
|
+
@type multi_format
|
103
|
+
<pattern>
|
104
|
+
format json
|
105
|
+
time_key time
|
106
|
+
time_type string
|
107
|
+
time_format "%Y-%m-%dT%H:%M:%S.%NZ"
|
108
|
+
keep_time_key false
|
109
|
+
</pattern>
|
110
|
+
<pattern>
|
111
|
+
format regexp
|
112
|
+
expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
|
113
|
+
time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
|
114
|
+
keep_time_key false
|
115
|
+
</pattern>
|
116
|
+
</parse>
|
117
|
+
</source>
|
118
|
+
|
119
|
+
<filter kubernetes.var.log.containers.**.log>
|
120
|
+
@type kubernetes_metadata
|
121
|
+
</filter>
|
122
|
+
|
123
|
+
<match **>
|
124
|
+
@type stdout
|
125
|
+
</match>
|
126
|
+
```
|
127
|
+
|
128
|
+
Reading from the systemd journal (requires the fluentd `fluent-plugin-systemd` and `systemd-journal` plugins, and requires docker to use the `--log-driver=journald` log driver):
|
129
|
+
```
|
130
|
+
<source>
|
131
|
+
@type systemd
|
132
|
+
path /run/log/journal
|
133
|
+
pos_file journal.pos
|
134
|
+
tag journal
|
135
|
+
read_from_head true
|
136
|
+
</source>
|
137
|
+
|
138
|
+
# probably want to use something like fluent-plugin-rewrite-tag-filter to
|
139
|
+
# retag entries from k8s
|
140
|
+
<match journal>
|
141
|
+
@type rewrite_tag_filter
|
142
|
+
rewriterule1 CONTAINER_NAME ^k8s_ kubernetes.journal.container
|
143
|
+
...
|
144
|
+
</match>
|
145
|
+
|
146
|
+
<filter kubernetes.**>
|
147
|
+
@type kubernetes_metadata
|
148
|
+
use_journal true
|
149
|
+
</filter>
|
150
|
+
|
151
|
+
<match **>
|
152
|
+
@type stdout
|
153
|
+
</match>
|
154
|
+
```
|
155
|
+
## Log content as JSON
|
156
|
+
In former versions this plugin parsed the value of the key `log` as JSON. This feature was removed in the current version to avoid duplicating features in the fluentd plugin ecosystem. The value can be parsed with the parser plugin like this:
|
157
|
+
```
|
158
|
+
<filter kubernetes.**>
|
159
|
+
@type parser
|
160
|
+
key_name log
|
161
|
+
<parse>
|
162
|
+
@type json
|
163
|
+
json_parser json
|
164
|
+
</parse>
|
165
|
+
replace_invalid_sequence true
|
166
|
+
reserve_data true # this preserves unparsable log lines
|
167
|
+
emit_invalid_record_to_error false # In case of unparsable log lines keep the error log clean
|
168
|
+
reserve_time true # the time was already parsed in the source, we don't want to overwrite it with current time.
|
169
|
+
</filter>
|
170
|
+
```
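
To make the intended effect of that configuration concrete, here is a rough Ruby sketch of what happens to a single record: the `log` value is parsed as JSON and merged into the record, and unparsable lines pass through unchanged. The helper `expand_json_log` is hypothetical and only approximates the parser filter's behaviour; it is not fluentd's implementation.

```ruby
require 'json'

# Hypothetical helper: parse record['log'] as JSON and merge the result into
# the record, keeping the original fields. Lines that are not a JSON object
# (or a missing 'log' key) leave the record unchanged.
def expand_json_log(record)
  parsed = JSON.parse(record['log'])
  parsed.is_a?(Hash) ? record.merge(parsed) : record
rescue JSON::ParserError, TypeError
  record
end

p expand_json_log({ 'log' => '{"level":"info","msg":"started"}', 'stream' => 'stdout' })
p expand_json_log({ 'log' => "plain text line\n", 'stream' => 'stderr' })
```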
|
171
|
+
|
172
|
+
## Environment variables for Kubernetes
|
173
|
+
|
174
|
+
If the name of the Kubernetes node the plugin is running on is set as
|
175
|
+
an environment variable with the name `K8S_NODE_NAME`, it will reduce cache
|
176
|
+
misses and needless calls to the Kubernetes API.
|
177
|
+
|
178
|
+
In the Kubernetes container definition, this is easily accomplished by:
|
179
|
+
|
180
|
+
```yaml
|
181
|
+
env:
|
182
|
+
- name: K8S_NODE_NAME
|
183
|
+
valueFrom:
|
184
|
+
fieldRef:
|
185
|
+
fieldPath: spec.nodeName
|
186
|
+
```
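
The plugin source logs a warning when watches are enabled and `K8S_NODE_NAME` is unset or blank. A minimal sketch of that kind of check follows; the helper name `node_name_configured?` and the injectable `env` argument are assumptions for illustration.

```ruby
# Hypothetical helper: true when K8S_NODE_NAME is set to a non-blank value.
def node_name_configured?(env = ENV)
  name = env['K8S_NODE_NAME']
  !(name.nil? || name.strip.empty?)
end

warn 'K8S_NODE_NAME is not set; watches may be less efficient' unless node_name_configured?
```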
|
187
|
+
|
188
|
+
## Example input/output
|
189
|
+
|
190
|
+
Kubernetes creates symlinks to Docker log files in `/var/log/containers/*.log`. Docker logs in JSON format.
|
191
|
+
|
192
|
+
Assume the following input comes from a log file named `/var/log/containers/fabric8-console-controller-98rqc_default_fabric8-console-container-df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115.log`:
|
193
|
+
|
194
|
+
```
|
195
|
+
{
|
196
|
+
"log": "2015/05/05 19:54:41 \n",
|
197
|
+
"stream": "stderr",
|
198
|
+
"time": "2015-05-05T19:54:41.240447294Z"
|
199
|
+
}
|
200
|
+
```
|
201
|
+
|
202
|
+
The output then becomes:
|
203
|
+
```
|
204
|
+
{
|
205
|
+
"log": "2015/05/05 19:54:41 \n",
|
206
|
+
"stream": "stderr",
|
207
|
+
"docker": {
|
208
|
+
"id": "df14e0d5ae4c07284fa636d739c8fc2e6b52bc344658de7d3f08c36a2e804115"
|
209
|
+
},
|
210
|
+
"kubernetes": {
|
211
|
+
"host": "jimmi-redhat.localnet",
|
212
|
+
"pod_name":"fabric8-console-controller-98rqc",
|
213
|
+
"pod_id": "c76927af-f563-11e4-b32d-54ee7527188d",
|
214
|
+
"pod_ip": "172.17.0.8",
|
215
|
+
"container_name": "fabric8-console-container",
|
216
|
+
"namespace_name": "default",
|
217
|
+
"namespace_id": "23437884-8e08-4d95-850b-e94378c9b2fd",
|
218
|
+
"namespace_annotations": {
|
219
|
+
"fabric8.io/git-commit": "5e1116f63df0bac2a80bdae2ebdc563577bbdf3c"
|
220
|
+
},
|
221
|
+
"namespace_labels": {
|
222
|
+
"product_version": "v1.0.0"
|
223
|
+
},
|
224
|
+
"labels": {
|
225
|
+
"component": "fabric8Console"
|
226
|
+
}
|
227
|
+
}
|
228
|
+
}
|
229
|
+
```
|
230
|
+
|
231
|
+
If using journal input, from docker configured with `--log-driver=journald`, the input looks like the `journalctl -o export` format:
|
232
|
+
```
|
233
|
+
# The stream identification is encoded into the PRIORITY field as an
|
234
|
+
# integer: 6, or github.com/coreos/go-systemd/journal.Info, marks stdout,
|
235
|
+
# while 3, or github.com/coreos/go-systemd/journal.Err, marks stderr.
|
236
|
+
PRIORITY=6
|
237
|
+
CONTAINER_ID=b6cbb6e73c0a
|
238
|
+
CONTAINER_ID_FULL=b6cbb6e73c0ad63ab820e4baa97cdc77cec729930e38a714826764ac0491341a
|
239
|
+
CONTAINER_NAME=k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035
|
240
|
+
MESSAGE=172.17.0.1 - - [21/May/2016:16:52:05 +0000] "GET /healthz HTTP/1.1" 200 0 "" "Go-http-client/1.1"
|
241
|
+
```
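
As a hedged sketch, the `CONTAINER_NAME` value from the journal record above can be decoded with the default `container_name_to_kubernetes_regexp` pattern (copied from the plugin's parameter default). The constant names are assumptions for illustration; the plugin's own handling does more than this.

```ruby
# Default value of container_name_to_kubernetes_regexp, copied from the plugin source.
CONTAINER_NAME_REGEXP = Regexp.compile(
  '^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?' \
  '_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$'
)

# CONTAINER_NAME value taken from the journal example above.
JOURNAL_CONTAINER_NAME =
  'k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035'

match = CONTAINER_NAME_REGEXP.match(JOURNAL_CONTAINER_NAME)
# Print the named captures (container name, pod name, namespace, ...) if the value matched.
p match.named_captures if match
```

The last two underscore-separated fields (the pod UID and the random suffix) are matched but not captured by name.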
|
242
|
+
|
243
|
+
## Contributing
|
244
|
+
|
245
|
+
1. Fork it
|
246
|
+
2. Create your feature branch (`git checkout -b my-new-feature`)
|
247
|
+
3. Commit your changes (`git commit -am 'Add some feature'`)
|
248
|
+
4. Test it (`GEM_HOME=vendor bundle install; GEM_HOME=vendor bundle exec rake test`)
|
249
|
+
5. Push to the branch (`git push origin my-new-feature`)
|
250
|
+
6. Create new Pull Request
|
251
|
+
|
252
|
+
## Copyright
|
253
|
+
Copyright (c) 2015 jimmidyson
|
data/Rakefile
ADDED
@@ -0,0 +1,41 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'bundler/setup'
|
4
|
+
require 'bundler/gem_tasks'
|
5
|
+
require 'rake/testtask'
|
6
|
+
require 'bump/tasks'
|
7
|
+
require 'rubocop/rake_task'
|
8
|
+
|
9
|
+
task test: [:base_test]
|
10
|
+
task default: [:test, :build, :rubocop]
|
11
|
+
|
12
|
+
RuboCop::RakeTask.new
|
13
|
+
|
14
|
+
desc 'Run test_unit based test'
|
15
|
+
Rake::TestTask.new(:base_test) do |t|
|
16
|
+
# To run test for only one file (or file path pattern)
|
17
|
+
# $ bundle exec rake base_test TEST=test/test_specified_path.rb
|
18
|
+
# $ bundle exec rake base_test TEST=test/test_*.rb
|
19
|
+
t.libs << 'test'
|
20
|
+
t.test_files = Dir['test/**/test_*.rb'].sort
|
21
|
+
t.warning = false
|
22
|
+
end
|
23
|
+
|
24
|
+
desc 'Add copyright headers'
|
25
|
+
task :headers do
|
26
|
+
require 'rubygems'
|
27
|
+
require 'copyright_header'
|
28
|
+
|
29
|
+
args = {
|
30
|
+
license: 'Apache-2.0',
|
31
|
+
copyright_software: 'Fluentd Kubernetes Metadata Filter Plugin',
|
32
|
+
copyright_software_description: 'Enrich Fluentd events with Kubernetes metadata',
|
33
|
+
copyright_holders: ['Red Hat, Inc.'],
|
34
|
+
copyright_years: ['2015-2021'],
|
35
|
+
add_path: 'lib:test',
|
36
|
+
output_dir: '.'
|
37
|
+
}
|
38
|
+
|
39
|
+
command_line = CopyrightHeader::CommandLine.new(args)
|
40
|
+
command_line.execute
|
41
|
+
end
|
data/fluent-plugin-kubernetes_metadata_filter.gemspec
ADDED
@@ -0,0 +1,34 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
lib = File.expand_path('lib', __dir__)
|
4
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
5
|
+
|
6
|
+
Gem::Specification.new do |gem|
|
7
|
+
gem.name = 'fluent-plugin-kubernetes_metadata_filter-rh'
|
8
|
+
gem.version = '2.6.1'
|
9
|
+
gem.authors = ['Stan Kwong']
|
10
|
+
gem.email = ['jpdstan@gmail.com']
|
11
|
+
gem.description = 'Filter plugin to add Kubernetes metadata'
|
12
|
+
gem.summary = 'Fluentd filter plugin to add Kubernetes metadata'
|
13
|
+
gem.homepage = 'https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter'
|
14
|
+
gem.license = 'Apache-2.0'
|
15
|
+
|
16
|
+
gem.files = `git ls-files`.split($/)
|
17
|
+
|
18
|
+
gem.required_ruby_version = '>= 2.5.0'
|
19
|
+
|
20
|
+
gem.add_runtime_dependency 'fluentd', ['>= 0.14.0', '< 1.13']
|
21
|
+
gem.add_runtime_dependency 'kubeclient', '< 5'
|
22
|
+
gem.add_runtime_dependency 'lru_redux'
|
23
|
+
|
24
|
+
gem.add_development_dependency 'bump'
|
25
|
+
gem.add_development_dependency 'bundler', '~> 2.0'
|
26
|
+
gem.add_development_dependency 'copyright-header'
|
27
|
+
gem.add_development_dependency 'minitest', '~> 4.0'
|
28
|
+
gem.add_development_dependency 'rake'
|
29
|
+
gem.add_development_dependency 'test-unit', '~> 3.0.2'
|
30
|
+
gem.add_development_dependency 'test-unit-rr', '~> 1.0.3'
|
31
|
+
gem.add_development_dependency 'vcr'
|
32
|
+
gem.add_development_dependency 'webmock'
|
33
|
+
gem.add_development_dependency 'yajl-ruby'
|
34
|
+
end
|
data/lib/fluent/plugin/filter_kubernetes_metadata.rb
ADDED
@@ -0,0 +1,378 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
#
|
4
|
+
# Fluentd Kubernetes Metadata Filter Plugin - Enrich Fluentd events with
|
5
|
+
# Kubernetes metadata
|
6
|
+
#
|
7
|
+
# Copyright 2017 Red Hat, Inc.
|
8
|
+
#
|
9
|
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
10
|
+
# you may not use this file except in compliance with the License.
|
11
|
+
# You may obtain a copy of the License at
|
12
|
+
#
|
13
|
+
# http://www.apache.org/licenses/LICENSE-2.0
|
14
|
+
#
|
15
|
+
# Unless required by applicable law or agreed to in writing, software
|
16
|
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
17
|
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
18
|
+
# See the License for the specific language governing permissions and
|
19
|
+
# limitations under the License.
|
20
|
+
#
|
21
|
+
|
22
|
+
require_relative 'kubernetes_metadata_cache_strategy'
|
23
|
+
require_relative 'kubernetes_metadata_common'
|
24
|
+
require_relative 'kubernetes_metadata_stats'
|
25
|
+
require_relative 'kubernetes_metadata_util'
|
26
|
+
require_relative 'kubernetes_metadata_watch_namespaces'
|
27
|
+
require_relative 'kubernetes_metadata_watch_pods'
|
28
|
+
|
29
|
+
require 'fluent/plugin/filter'
|
30
|
+
require 'resolv'
|
31
|
+
|
32
|
+
module Fluent::Plugin
|
33
|
+
class KubernetesMetadataFilter < Fluent::Plugin::Filter
|
34
|
+
K8_POD_CA_CERT = 'ca.crt'
|
35
|
+
K8_POD_TOKEN = 'token'
|
36
|
+
|
37
|
+
include KubernetesMetadata::CacheStrategy
|
38
|
+
include KubernetesMetadata::Common
|
39
|
+
include KubernetesMetadata::Util
|
40
|
+
include KubernetesMetadata::WatchNamespaces
|
41
|
+
include KubernetesMetadata::WatchPods
|
42
|
+
|
43
|
+
Fluent::Plugin.register_filter('kubernetes_metadata', self)
|
44
|
+
|
45
|
+
config_param :kubernetes_url, :string, default: nil
|
46
|
+
config_param :cache_size, :integer, default: 1000
|
47
|
+
config_param :cache_ttl, :integer, default: 60 * 60
|
48
|
+
config_param :watch, :bool, default: true
|
49
|
+
config_param :apiVersion, :string, default: 'v1'
|
50
|
+
config_param :client_cert, :string, default: nil
|
51
|
+
config_param :client_key, :string, default: nil
|
52
|
+
config_param :ca_file, :string, default: nil
|
53
|
+
config_param :verify_ssl, :bool, default: true
|
54
|
+
config_param :tag_to_kubernetes_name_regexp,
|
55
|
+
:string,
|
56
|
+
default: 'var\.log\.containers\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$'
|
57
|
+
config_param :bearer_token_file, :string, default: nil
|
58
|
+
config_param :secret_dir, :string, default: '/var/run/secrets/kubernetes.io/serviceaccount'
|
59
|
+
config_param :de_dot, :bool, default: true
|
60
|
+
config_param :de_dot_separator, :string, default: '_'
|
61
|
+
# if reading from the journal, the record will contain the following fields in the following
|
62
|
+
# format:
|
63
|
+
# CONTAINER_NAME=k8s_$containername.$containerhash_$podname_$namespacename_$poduuid_$rand32bitashex
|
64
|
+
# CONTAINER_ID_FULL=dockeridassha256hexvalue
|
65
|
+
config_param :use_journal, :bool, default: nil
|
66
|
+
# Field 2 is the container_hash, field 5 is the pod_id, and field 6 is the pod_randhex
|
67
|
+
# I would have included them as named groups, but you can't have named groups that are
|
68
|
+
# non-capturing :P
|
69
|
+
# parse format is defined here: https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/kubelet/dockertools/docker.go#L317
|
70
|
+
config_param :container_name_to_kubernetes_regexp,
|
71
|
+
:string,
|
72
|
+
default: '^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$'
|
73
|
+
|
74
|
+
config_param :annotation_match, :array, default: []
|
75
|
+
config_param :stats_interval, :integer, default: 30
|
76
|
+
config_param :allow_orphans, :bool, default: true
|
77
|
+
config_param :orphaned_namespace_name, :string, default: '.orphaned'
|
78
|
+
config_param :orphaned_namespace_id, :string, default: 'orphaned'
|
79
|
+
config_param :lookup_from_k8s_field, :bool, default: true
|
80
|
+
# if `ca_file` is for an intermediate CA, or otherwise we do not have the root CA and want
|
81
|
+
# to trust the intermediate CA certs we do have, set this to `true` - this corresponds to
|
82
|
+
# the openssl s_client -partial_chain flag and X509_V_FLAG_PARTIAL_CHAIN
|
83
|
+
config_param :ssl_partial_chain, :bool, default: false
|
84
|
+
config_param :skip_labels, :bool, default: false
|
85
|
+
config_param :skip_container_metadata, :bool, default: false
|
86
|
+
config_param :skip_master_url, :bool, default: false
|
87
|
+
config_param :skip_namespace_metadata, :bool, default: false
|
88
|
+
# The time interval in seconds for retry backoffs when watch connections fail.
|
89
|
+
config_param :watch_retry_interval, :integer, default: 1
|
90
|
+
# The base number of exponential backoff for retries.
|
91
|
+
config_param :watch_retry_exponential_backoff_base, :integer, default: 2
|
92
|
+
# The maximum number of times to retry pod and namespace watches.
|
93
|
+
config_param :watch_retry_max_times, :integer, default: 10
|
94
|
+
|
95
|
+
def fetch_pod_metadata(namespace_name, pod_name)
|
96
|
+
log.trace("fetching pod metadata: #{namespace_name}/#{pod_name}") if log.trace?
|
97
|
+
options = {
|
98
|
+
resource_version: '0' # Fetch from API server cache instead of etcd quorum read
|
99
|
+
}
|
100
|
+
pod_object = @client.get_pod(pod_name, namespace_name, options)
|
101
|
+
log.trace("raw metadata for #{namespace_name}/#{pod_name}: #{pod_object}") if log.trace?
|
102
|
+
metadata = parse_pod_metadata(pod_object)
|
103
|
+
@stats.bump(:pod_cache_api_updates)
|
104
|
+
log.trace("parsed metadata for #{namespace_name}/#{pod_name}: #{metadata}") if log.trace?
|
105
|
+
@cache[metadata['pod_id']] = metadata
|
106
|
+
rescue StandardError => e
|
107
|
+
@stats.bump(:pod_cache_api_nil_error)
|
108
|
+
log.debug "Exception '#{e}' encountered fetching pod metadata from Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}"
|
109
|
+
{}
|
110
|
+
end
|
111
|
+
|
112
|
+
def dump_stats
|
113
|
+
@curr_time = Time.now
|
114
|
+
return if @curr_time.to_i - @prev_time.to_i < @stats_interval
|
115
|
+
|
116
|
+
@prev_time = @curr_time
|
117
|
+
@stats.set(:pod_cache_size, @cache.count)
|
118
|
+
@stats.set(:namespace_cache_size, @namespace_cache.count) if @namespace_cache
|
119
|
+
log.info(@stats)
|
120
|
+
if log.level == Fluent::Log::LEVEL_TRACE
|
121
|
+
log.trace(" id cache: #{@id_cache.to_a}")
|
122
|
+
log.trace(" pod cache: #{@cache.to_a}")
|
123
|
+
log.trace("namespace cache: #{@namespace_cache.to_a}")
|
124
|
+
end
|
125
|
+
end
|
126
|
+
|
127
|
+
def fetch_namespace_metadata(namespace_name)
|
128
|
+
log.trace("fetching namespace metadata: #{namespace_name}") if log.trace?
|
129
|
+
options = {
|
130
|
+
resource_version: '0' # Fetch from API server cache instead of etcd quorum read
|
131
|
+
}
|
132
|
+
namespace_object = @client.get_namespace(namespace_name, nil, options)
|
133
|
+
log.trace("raw metadata for #{namespace_name}: #{namespace_object}") if log.trace?
|
134
|
+
metadata = parse_namespace_metadata(namespace_object)
|
135
|
+
@stats.bump(:namespace_cache_api_updates)
|
136
|
+
log.trace("parsed metadata for #{namespace_name}: #{metadata}") if log.trace?
|
137
|
+
@namespace_cache[metadata['namespace_id']] = metadata
|
138
|
+
rescue StandardError => e
|
139
|
+
@stats.bump(:namespace_cache_api_nil_error)
|
140
|
+
log.debug "Exception '#{e}' encountered fetching namespace metadata from Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}"
|
141
|
+
{}
|
142
|
+
end
|
143
|
+
|
144
|
+
def initialize
|
145
|
+
super
|
146
|
+
@prev_time = Time.now
|
147
|
+
end
|
148
|
+
|
149
|
+
def configure(conf)
|
150
|
+
super
|
151
|
+
|
152
|
+
def log.trace?
|
153
|
+
level == Fluent::Log::LEVEL_TRACE
|
154
|
+
end
|
155
|
+
|
156
|
+
require 'kubeclient'
|
157
|
+
require 'lru_redux'
|
158
|
+
@stats = KubernetesMetadata::Stats.new
|
159
|
+
|
160
|
+
if @de_dot && @de_dot_separator.include?('.')
|
161
|
+
raise Fluent::ConfigError, "Invalid de_dot_separator: cannot be or contain '.'"
|
162
|
+
end
|
163
|
+
|
164
|
+
if @cache_ttl < 0
|
165
|
+
log.info 'Setting the cache TTL to :none because it was < 0'
|
166
|
+
@cache_ttl = :none
|
167
|
+
end
|
168
|
+
|
169
|
+
# Caches pod/namespace UID tuples for a given container UID.
|
170
|
+
@id_cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
|
171
|
+
|
172
|
+
# Use the pod UID as the key to fetch a hash containing pod metadata
|
173
|
+
@cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
|
174
|
+
|
175
|
+
# Use the namespace UID as the key to fetch a hash containing namespace metadata
|
176
|
+
@namespace_cache = LruRedux::TTL::ThreadSafeCache.new(@cache_size, @cache_ttl)
|
177
|
+
|
178
|
+
@tag_to_kubernetes_name_regexp_compiled = Regexp.compile(@tag_to_kubernetes_name_regexp)
+      @container_name_to_kubernetes_regexp_compiled = Regexp.compile(@container_name_to_kubernetes_regexp)
+
+      # Use Kubernetes default service account if we're in a pod.
+      if @kubernetes_url.nil?
+        log.debug 'Kubernetes URL is not set - inspecting environ'
+
+        env_host = ENV['KUBERNETES_SERVICE_HOST']
+        env_port = ENV['KUBERNETES_SERVICE_PORT']
+        if present?(env_host) && present?(env_port)
+          if env_host =~ Resolv::IPv6::Regex
+            # Brackets are needed around IPv6 addresses
+            env_host = "[#{env_host}]"
+          end
+          @kubernetes_url = "https://#{env_host}:#{env_port}/api"
+          log.debug "Kubernetes URL is now '#{@kubernetes_url}'"
+        else
+          log.debug 'No Kubernetes URL could be found in config or environ'
+        end
+      end
+
+      # Use SSL certificate and bearer token from Kubernetes service account.
+      if Dir.exist?(@secret_dir)
+        log.debug "Found directory with secrets: #{@secret_dir}"
+        ca_cert = File.join(@secret_dir, K8_POD_CA_CERT)
+        pod_token = File.join(@secret_dir, K8_POD_TOKEN)
+
+        if !present?(@ca_file) && File.exist?(ca_cert)
+          log.debug "Found CA certificate: #{ca_cert}"
+          @ca_file = ca_cert
+        end
+
+        if !present?(@bearer_token_file) && File.exist?(pod_token)
+          log.debug "Found pod token: #{pod_token}"
+          @bearer_token_file = pod_token
+        end
+      end
+
+      if present?(@kubernetes_url)
+        ssl_options = {
+          client_cert: present?(@client_cert) ? OpenSSL::X509::Certificate.new(File.read(@client_cert)) : nil,
+          client_key: present?(@client_key) ? OpenSSL::PKey::RSA.new(File.read(@client_key)) : nil,
+          ca_file: @ca_file,
+          verify_ssl: @verify_ssl ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
+        }
+
+        if @ssl_partial_chain
+          # taken from the ssl.rb OpenSSL::SSL::SSLContext code for DEFAULT_CERT_STORE
+          require 'openssl'
+          ssl_store = OpenSSL::X509::Store.new
+          ssl_store.set_default_paths
+          flagval = if defined? OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
+                      OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
+                    else
+                      # this version of ruby does not define OpenSSL::X509::V_FLAG_PARTIAL_CHAIN
+                      0x80000
+                    end
+          ssl_store.flags = OpenSSL::X509::V_FLAG_CRL_CHECK_ALL | flagval
+          ssl_options[:cert_store] = ssl_store
+        end
+
+        auth_options = {}
+
+        if present?(@bearer_token_file)
+          bearer_token = File.read(@bearer_token_file)
+          auth_options[:bearer_token] = bearer_token
+        end
+
+        log.debug 'Creating K8S client'
+        @client = Kubeclient::Client.new(
+          @kubernetes_url,
+          @apiVersion,
+          ssl_options: ssl_options,
+          auth_options: auth_options,
+          as: :parsed_symbolized
+        )
+
+        begin
+          @client.api_valid?
+        rescue KubeException => e
+          raise Fluent::ConfigError, "Invalid Kubernetes API #{@apiVersion} endpoint #{@kubernetes_url}: #{e.message}"
+        end
+
+        if @watch
+          if ENV['K8S_NODE_NAME'].nil? || ENV['K8S_NODE_NAME'].strip.empty?
+            log.warn("!! The environment variable 'K8S_NODE_NAME' is not set to the node name which can affect the API server and watch efficiency !!")
+          end
+
+          pod_thread = Thread.new(self, &:set_up_pod_thread)
+          pod_thread.abort_on_exception = true
+
+          namespace_thread = Thread.new(self, &:set_up_namespace_thread)
+          namespace_thread.abort_on_exception = true
+        end
+      end
+      @time_fields = []
+      @time_fields.push('_SOURCE_REALTIME_TIMESTAMP', '__REALTIME_TIMESTAMP') if @use_journal || @use_journal.nil?
+      @time_fields.push('time') unless @use_journal
+      @time_fields.push('@timestamp') if @lookup_from_k8s_field
+
+      @annotations_regexps = []
+      @annotation_match.each do |regexp|
+        @annotations_regexps << Regexp.compile(regexp)
+      rescue RegexpError => e
+        log.error "Error: invalid regular expression in annotation_match: #{e}"
+      end
+    end
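The environment fallback in the configuration code above can be tried in isolation. The following is a minimal sketch, not the plugin's own code: `kubernetes_url_from_env` is a hypothetical helper that takes an environment-like hash and reproduces the IPv6 bracketing step using `Resolv::IPv6::Regex` from the Ruby standard library.

```ruby
require 'resolv'

# Hypothetical helper (not part of the plugin): builds the API URL from an
# environment-like hash, wrapping IPv6 hosts in brackets so that the port
# separator stays unambiguous.
def kubernetes_url_from_env(env)
  host = env['KUBERNETES_SERVICE_HOST']
  port = env['KUBERNETES_SERVICE_PORT']
  # Mirror the plugin's present? check: both values must be non-empty.
  return nil if host.nil? || host.empty? || port.nil? || port.empty?

  host = "[#{host}]" if host.match?(Resolv::IPv6::Regex)
  "https://#{host}:#{port}/api"
end
```

Passing a plain hash instead of reading `ENV` directly keeps the sketch easy to exercise with both IPv4 and IPv6 hosts.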
+
+    def get_metadata_for_record(namespace_name, pod_name, container_name, container_id, create_time, batch_miss_cache)
+      metadata = {
+        'docker' => { 'container_id' => container_id },
+        'kubernetes' => {
+          'container_name' => container_name,
+          'namespace_name' => namespace_name,
+          'pod_name' => pod_name
+        }
+      }
+      if present?(@kubernetes_url)
+        pod_metadata = get_pod_metadata(container_id, namespace_name, pod_name, create_time, batch_miss_cache)
+
+        if (pod_metadata.include? 'containers') && (pod_metadata['containers'].include? container_id) && !@skip_container_metadata
+          metadata['kubernetes']['container_image'] = pod_metadata['containers'][container_id]['image']
+          metadata['kubernetes']['container_image_id'] = pod_metadata['containers'][container_id]['image_id']
+        end
+
+        metadata['kubernetes'].merge!(pod_metadata) if pod_metadata
+        metadata['kubernetes'].delete('containers')
+      end
+      metadata
+    end
+
+    def filter_stream(tag, es)
+      return es if (es.respond_to?(:empty?) && es.empty?) || !es.is_a?(Fluent::EventStream)
+
+      new_es = Fluent::MultiEventStream.new
+      tag_match_data = tag.match(@tag_to_kubernetes_name_regexp_compiled) unless @use_journal
+      tag_metadata = nil
+      batch_miss_cache = {}
+      es.each do |time, record|
+        if tag_match_data && tag_metadata.nil?
+          tag_metadata = get_metadata_for_record(tag_match_data['namespace'], tag_match_data['pod_name'], tag_match_data['container_name'],
+                                                 tag_match_data['docker_id'], create_time_from_record(record, time), batch_miss_cache)
+        end
+        metadata = Marshal.load(Marshal.dump(tag_metadata)) if tag_metadata
+        if (@use_journal || @use_journal.nil?) &&
+           (j_metadata = get_metadata_for_journal_record(record, time, batch_miss_cache))
+          metadata = j_metadata
+        end
+        if @lookup_from_k8s_field && record.key?('kubernetes') && record.key?('docker') &&
+           record['kubernetes'].respond_to?(:has_key?) && record['docker'].respond_to?(:has_key?) &&
+           record['kubernetes'].key?('namespace_name') &&
+           record['kubernetes'].key?('pod_name') &&
+           record['kubernetes'].key?('container_name') &&
+           record['docker'].key?('container_id') &&
+           (k_metadata = get_metadata_for_record(record['kubernetes']['namespace_name'], record['kubernetes']['pod_name'],
+                                                 record['kubernetes']['container_name'], record['docker']['container_id'],
+                                                 create_time_from_record(record, time), batch_miss_cache))
+          metadata = k_metadata
+        end
+
+        record = record.merge(metadata) if metadata
+        new_es.add(time, record)
+      end
+      dump_stats
+      new_es
+    end
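`filter_stream` copies the tag-derived metadata with `Marshal.load(Marshal.dump(...))` for every record, presumably so that the nested `'kubernetes'` hash is not shared between records. The sketch below illustrates the difference between that deep copy and a shallow `dup`; `deep_copy` and the sample hashes are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper: the same Marshal round trip used in filter_stream.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

# Sample data for illustration only.
tag_metadata = { 'kubernetes' => { 'pod_name' => 'web-1' } }

# A deep copy owns its nested hashes, so edits do not leak back.
deep = deep_copy(tag_metadata)
deep['kubernetes']['pod_name'] = 'changed-in-deep-copy'

# A shallow copy shares the nested hash with the original.
shallow = tag_metadata.dup
shallow['kubernetes']['pod_name'] = 'changed-in-shallow-copy'
```

After running this, `tag_metadata['kubernetes']['pod_name']` reflects only the edit made through the shallow copy.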
+
+    def get_metadata_for_journal_record(record, time, batch_miss_cache)
+      metadata = nil
+      if record.key?('CONTAINER_NAME') && record.key?('CONTAINER_ID_FULL')
+        metadata = record['CONTAINER_NAME'].match(@container_name_to_kubernetes_regexp_compiled) do |match_data|
+          get_metadata_for_record(match_data['namespace'], match_data['pod_name'], match_data['container_name'],
+                                  record['CONTAINER_ID_FULL'], create_time_from_record(record, time), batch_miss_cache)
+        end
+        unless metadata
+          log.debug "Error: could not match CONTAINER_NAME from record #{record}"
+          @stats.bump(:container_name_match_failed)
+        end
+      elsif record.key?('CONTAINER_NAME') && record['CONTAINER_NAME'].start_with?('k8s_')
+        log.debug "Error: no container name and id in record #{record}"
+        @stats.bump(:container_name_id_missing)
+      end
+      metadata
+    end
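The journal lookup relies on `String#match` with a block: the block's value is returned on a match, and `nil` is returned otherwise, which is why `metadata` can be tested with `unless` afterwards. A minimal sketch of that idiom follows. `CONTAINER_NAME_PATTERN` is a simplified, hypothetical pattern chosen for this example; the plugin's real pattern comes from its `container_name_to_kubernetes_regexp` setting and is not shown in this part of the diff.

```ruby
# Hypothetical, simplified pattern for illustration only.
CONTAINER_NAME_PATTERN =
  /\Ak8s_(?<container_name>[^_]+)_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_/

# Returns a hash of the named captures, or nil when the name does not match.
def parse_container_name(name)
  name.match(CONTAINER_NAME_PATTERN) do |match_data|
    {
      'container_name' => match_data['container_name'],
      'pod_name' => match_data['pod_name'],
      'namespace' => match_data['namespace']
    }
  end
end
```

A caller can then treat a `nil` result as a match failure, in the same way the method above bumps `:container_name_match_failed`.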
+
+    def de_dot!(h)
+      h.keys.each do |ref|
+        next unless h[ref] && ref =~ /\./
+
+        v = h.delete(ref)
+        newref = ref.to_s.gsub('.', @de_dot_separator)
+        h[newref] = v
+      end
+    end
+
+    # copied from activesupport
+    def present?(object)
+      object.respond_to?(:empty?) ? !object.empty? : !!object
+    end
+  end
+end
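The two helpers at the end of the file are small enough to exercise on their own. The sketch below restates them as standalone methods under stated assumptions: `de_dot` is a hypothetical variant that takes the separator as an argument (with `'_'` as an assumed default) instead of reading `@de_dot_separator`, and it returns the hash for convenience.

```ruby
# Same logic as the plugin's present?: empty collections and strings,
# nil, and false count as absent; everything else counts as present.
def present?(object)
  object.respond_to?(:empty?) ? !object.empty? : !!object
end

# Hypothetical standalone variant of de_dot!: replaces dots in top-level
# keys with the separator. Keys whose value is nil or false are skipped,
# matching the `next unless h[ref]` guard in the original.
def de_dot(hash, separator = '_')
  # hash.keys is a snapshot array, so the hash can be modified while iterating.
  hash.keys.each do |key|
    next unless hash[key] && key.to_s.include?('.')

    hash[key.to_s.gsub('.', separator)] = hash.delete(key)
  end
  hash
end
```

Note that only top-level keys are rewritten; nested hashes would need a separate call.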