fluent-plugin-postgres-flex 0.1.0.rc1
- checksums.yaml +7 -0
- data/Changelog.md +6 -0
- data/Readme.md +109 -0
- data/lib/fluent/plugin/out_postgres-flex.rb +212 -0
- metadata +47 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 45da1fcf4cb02a37294161935326a66098c6bfe4
  data.tar.gz: e05ff3eb81b8015f544962e64f5de4f2339c19b0
SHA512:
  metadata.gz: 0f05aded3e2ca6d3b460996ac2c35f35e6f3c7ec2f4d385be31f8de047b169a5b30b8972c995b069dbc24e79f35c2fe7abf38520a52eab7188b4edae5261445e
  data.tar.gz: c5513e6150d666d811b3db98feb9767b2e81d306bef9d8d7871866490415cf7eac768d6e254a2cae03aad103fac1ab6c00aabac383c088fa9f962dc48484a362
data/Changelog.md
ADDED
data/Readme.md
ADDED
@@ -0,0 +1,109 @@
# Flexible Postgres output for fluentd

An output plugin for [fluentd](https://www.fluentd.org/) for use with [Postgres](https://www.postgresql.org/) and [TimescaleDB](https://www.timescale.com/) that provides a great amount of flexibility in designing your log table structure.

This plugin automatically reads your log table's schema and maps log record properties to dedicated columns where possible. All other properties are stored in an _extra_ column of type `json` or `jsonb`. The plugin also handles Postgres `enum` types: it will try to map string properties to enum columns.

Consider the following log table:

```sql
CREATE TYPE Severity AS ENUM (
  'debug',
  'info',
  'notice',
  'warning',
  'error',
  'alert',
  'emergency'
);

CREATE TABLE public.logs (
  time      TIMESTAMPTZ NOT NULL,
  severity  Severity NOT NULL DEFAULT 'info',
  message   TEXT NULL,
  extra     JSONB NULL
);
```

And a log event of the form:

```
time    2019-10-10 10:01:20.1234
tag     'backend'
record  {"severity":"notice","message":"Starting up...","hostname":"node0123","meta":{"env":"production"}}
```

You will end up with this row inserted:

| time | severity | message | extra |
|------|----------|---------|-------|
| `2019-10-10 10:01:20.1234` | notice | Starting up... | `{"hostname":"node0123","meta":{"env":"production"}}` |

The properties `severity` and `message` were mapped to their dedicated columns; all other properties landed in the `extra` column.

__Note:__ The event's tag is not used in any way. I consider the tag an implementation detail of fluentd's routing system that should not be used elsewhere. If the tag contains valuable data in your setup, you can include it as a property with the `record_transformer` plugin.

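If the tag does carry data you want to keep, a minimal `record_transformer` filter could copy it into the record before it reaches this plugin (a sketch; the match pattern and record key are placeholders for your own setup):

```
<filter backend.**>
  @type record_transformer
  <record>
    tag ${tag}
  </record>
</filter>
```
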
## Requirements

- The `pg` gem and its native library.
- A time column of type `timestamp with time zone` or `timestamp without time zone` in your log table.
- An _extra_ column of type `json` or `jsonb` to store all values without a dedicated column.

## Configuration

- __`host`__ (string, default: `localhost`)<br>
  The database server's hostname.

- __`port`__ (integer, default: `5432`)<br>
  The database server's port.

- __`database`__ (string)<br>
  The database name.

- __`username`__ (string)<br>
  The database user name.

- __`password`__ (string)<br>
  The database user's password.

- __`table`__ (string)<br>
  The name of the log table.

- __`time_column`__ (string, default: `time`)<br>
  The column name used to store the timestamp of log events. Must be of type `timestamp with time zone` or `timestamp without time zone`.

- __`extra_column`__ (string, default: `extra`)<br>
  The column name used to store excess properties without a dedicated column. Must be of type `json` or `jsonb`.

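A minimal match section using the parameters above might look like this (all values are placeholders for your own setup):

```
<match backend.**>
  @type postgres-flex
  host localhost
  port 5432
  database logdb
  username fluentd
  password secret
  table logs
  time_column time
  extra_column extra
</match>
```
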
## Value coercion

This plugin tries to coerce all values in a meaningful way:

| column type  | value type | coercion rule |
|--------------|------------|---------------|
| timestamp    | `string`   | Parse as RFC3339 string |
|              | `number`   | Interpret as seconds since Unix epoch (with fractions) |
|              | _others_   | _undefined, placed in extra column_ |
| text         | _all_      | Convert to JSON string |
| boolean      | `string`   | Interpret `"t"`, `"true"` (any case) as `true`, `false` otherwise |
|              | `number`   | Interpret `0` as `false`, other values as `true` |
|              | _others_   | _undefined, placed in extra column_ |
| real numbers | `boolean`  | Interpret `true` as `1.0`, `false` as `0.0` |
|              | `string`   | Parse as decimal (with fractions) |
|              | _others_   | _undefined, placed in extra column_ |
| integers     | `boolean`  | Interpret `true` as `1`, `false` as `0` |
|              | `string`   | Parse as decimal (without fractions) |
| json         | _all_      | Convert to JSON string |

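The timestamp and boolean rules above can be exercised as standalone helpers. A minimal sketch with hypothetical function names, ignoring the SQL-literal escaping the plugin performs on top:

```ruby
require 'date'

FORMAT = '%Y-%m-%d %H:%M:%S.%N %z' # mirrors the plugin's timestamp format

# Timestamp rule: RFC3339 strings, or seconds since the Unix epoch.
def coerce_timestamp(v)
  case v
  when String  then DateTime.rfc3339(v).to_time.utc.strftime(FORMAT)
  when Numeric then Time.at(v).utc.strftime(FORMAT)
  else nil # undefined: the value would stay in the extra column
  end
end

# Boolean rule: 't'/'true' in any case (and '1') count as true, 0 as false.
def coerce_boolean(v)
  case v
  when TrueClass, FalseClass then v
  when String  then ['t', 'true'].include?(v.downcase) || v == '1'
  when Numeric then v != 0
  else nil
  end
end
```

Values that hit the `nil` branch are the "_undefined_" rows of the table: the plugin leaves them in the record so they end up in the extra column.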
## Log table design considerations

- Since you want to avoid losing log events, your log table should be designed so that it is (almost) impossible for an insert to fail. This means all columns should be either _nullable_ or provide a default value. The only exception is the time column, which is guaranteed to be filled with the event's timestamp.

- You may or may not need a primary key. For general use, a primary key is not really necessary, since you `select` and `delete` events only in bulk, not individually.

- Keep in mind that the `timestamp` value type provides microsecond precision. This is good enough for many use cases but might not be enough for yours.

data/lib/fluent/plugin/out_postgres-flex.rb
ADDED
@@ -0,0 +1,212 @@
require 'fluent/plugin/output'
require 'fluent/time'
require 'pg'
require 'oj'
require 'date'

module Fluent::Plugin
  class PostgresFlexOutput < Output
    Fluent::Plugin.register_output('postgres-flex', self)

    config_param :host, :string, default: 'localhost'
    config_param :port, :integer, default: 5432
    config_param :database, :string
    config_param :username, :string
    config_param :password, :string
    config_param :table, :string
    config_param :time_column, :string, default: 'time'
    config_param :extra_column, :string, default: 'extra'

    config_section :buffer do
      config_set_default :flush_mode, :immediate
    end

    TimestampFormat = '%Y-%m-%d %H:%M:%S.%N %z'
    TimestampFormatter = Fluent::TimeFormatter.new(TimestampFormat)
    OjOptions = { mode: :strict }

    def start
      super
      reconnect
    end

    def stop
      super
      @db.finish
    end

    def write(chunk)
      values = []
      chunk.each { |time, record| values << record_to_values(time, record) }

      begin
        @db.async_exec("INSERT INTO #{ @db.quote_ident(@table) } #{ @value_names } VALUES #{ values.join(',') }")
      rescue PG::UnableToSend
        reconnect
        raise
      end
    end

    private

    def reconnect
      @db = PG::Connection.new(
        host: @host,
        port: @port,
        dbname: @database,
        user: @username,
        password: @password
      )
      @schema, @value_names = parse_schema(@db)
    end

    # Convert a single record to a Postgres value string.
    #
    # All values that have a dedicated column are coerced and stored there. If coercion
    # fails, the column's DEFAULT is used and the value is retained in the _extra_ column.
    def record_to_values(event_time, record)
      direct_fields = []

      @schema.each_pair { |key, type|
        value = coerce_value(record[key], type)

        if value.nil?
          log.warn "Could not coerce value #{record[key].inspect} to required type #{type.inspect}"
          direct_fields << 'DEFAULT' # keep the column count intact, retain raw value in extra
        else
          direct_fields << value
          record.delete(key)
        end
      }

      time = @db.escape_literal(TimestampFormatter.format_with_subsec(event_time))
      extras = @db.escape_literal(Oj.dump(record, OjOptions))

      "(#{ time },#{ direct_fields.join(',') },#{ extras })"
    end

    # Coerce a single value to the type required by the database column.
    #
    # @return The coerced value, or nil if the value could not be coerced
    def coerce_value(v, type)
      if v.nil?
        'DEFAULT'
      else
        case type
        when :timestamp
          case v
          when String
            # Parse as RFC3339
            @db.escape_literal(DateTime.rfc3339(v).to_time.utc.strftime(TimestampFormat))
          when Numeric
            # Interpret as Unix time: seconds (with fractions) since epoch
            @db.escape_literal(Time.at(v).utc.strftime(TimestampFormat))
          else nil
          end
        when :string
          @db.escape_literal(Oj.dump(v, OjOptions))
        when :boolean
          case v
          when TrueClass; 'true'
          when FalseClass; 'false'
          when String
            # Accept 't', 'T', 'true', 'TRUE', 'True', ..., '1' as true, false otherwise
            (v.downcase == 't' || v.downcase == 'true' || v == '1').to_s
          when Numeric
            (v != 0).to_s # Interpret 0 as false, other values as true
          else nil
          end
        when :integer
          case v
          when TrueClass; '1'          # Accept true as 1
          when FalseClass; '0'         # Accept false as 0
          when String; v.to_i(10).to_s # Parse string as decimal
          when Integer; v.to_s
          else nil
          end
        when :float
          case v
          when TrueClass; '1.0'     # Accept true as 1.0
          when FalseClass; '0.0'    # Accept false as 0.0
          when Numeric; v.to_f.to_s
          when String; v.to_f.to_s  # Parse string as float
          else nil
          end
        when :json
          begin
            @db.escape_literal(Oj.dump(v, OjOptions))
          rescue Oj::Error => e
            log.warn "Could not serialize value to JSON: #{e.message}"
            nil
          end
        when Array # enums
          @db.escape_literal(v) if type.include?(v)
        else nil
        end
      end
    end

    # Parse the Postgres database schema and build a hash of column_name => type.
    def parse_schema(db)
      # Map enum_name: String => enum_values: String[]
      enums = db.async_exec(
        'SELECT DISTINCT (n.nspname||\'.\'||t.typname) AS "name", e.enumlabel AS "value"' +
        ' FROM pg_type t' +
        ' JOIN pg_enum e ON t.oid = e.enumtypid' +
        ' JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace'
      ).reduce({}) { |map, row|
        name = row['name']
        map[name] = [] unless map[name]
        map[name] << row['value']

        map
      }

      # Map column_name: String => type: Symbol|String[]
      schema = db.async_exec(
        'SELECT column_name,' +
        ' (CASE WHEN data_type != \'USER-DEFINED\' THEN data_type ELSE (udt_schema||\'.\'||udt_name) END) AS "type"' +
        " FROM information_schema.columns WHERE table_name = #{ db.escape_literal(@table) }"
      ).reduce({}) { |map, row|
        name = row['column_name']
        type = case row['type']
               when 'timestamp with time zone';    :timestamp
               when 'timestamp without time zone'; :timestamp
               when 'text';              :string
               when 'character varying'; :string
               when 'character';         :string
               when 'boolean';           :boolean
               when 'smallint';          :integer
               when 'integer';           :integer
               when 'bigint';            :integer
               when 'decimal';           :float
               when 'numeric';           :float
               when 'real';              :float
               when 'double precision';  :float
               when 'json';              :json
               when 'jsonb';             :json
               else enums[row['type']] # Is it an enum?
               end

        if type.nil?
          log.warn "Unhandled column type '#{row['type']}'"
        else
          if name == @time_column
            if type != :timestamp
              raise Fluent::ConfigError.new('time column must be of type "timestamp with/without time zone"')
            end
          elsif name == @extra_column
            if type != :json
              raise Fluent::ConfigError.new('extra column must be of type "json/jsonb"')
            end
          else
            map[name] = type
          end
        end

        map
      }

      value_names = "(#{ @time_column },#{ schema.keys.join(',') },#{ @extra_column })"

      return schema.freeze, value_names.freeze
    end
  end
end
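The enum-map `reduce` in `parse_schema` can be tried standalone, fed with hypothetical result rows instead of a live `pg_enum` query:

```ruby
# Rows as the pg gem would return them: an array of string-keyed hashes.
rows = [
  { 'name' => 'public.severity', 'value' => 'debug' },
  { 'name' => 'public.severity', 'value' => 'info'  },
]

# Group enum labels by their schema-qualified type name.
enums = rows.reduce({}) { |map, row|
  name = row['name']
  map[name] = [] unless map[name]
  map[name] << row['value']

  map
}
```

The resulting hash maps each qualified enum name to its list of labels, which is what the column-type lookup later indexes into.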
metadata
ADDED
@@ -0,0 +1,47 @@
--- !ruby/object:Gem::Specification
name: fluent-plugin-postgres-flex
version: !ruby/object:Gem::Version
  version: 0.1.0.rc1
platform: ruby
authors:
- André Wachter
autorequire:
bindir: bin
cert_chain: []
date: 2019-11-19 00:00:00.000000000 Z
dependencies: []
description: Store fluentd structured log data in Postgres and TimescaleDB.
email: rubygems@anfe.ma
executables: []
extensions: []
extra_rdoc_files: []
files:
- Changelog.md
- Readme.md
- lib/fluent/plugin/out_postgres-flex.rb
homepage: https://github.com/anfema/fluent-plugin-postgres-flex
licenses:
- Apache-2.0
metadata:
  source_code_uri: https://github.com/anfema/fluent-plugin-postgres-flex
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">"
    - !ruby/object:Gem::Version
      version: 1.3.1
requirements: []
rubyforge_project:
rubygems_version: 2.5.2.3
signing_key:
specification_version: 4
summary: A fluentd plugin for storing logs in Postgres and TimescaleDB
test_files: []