fluent-plugin-postgres-flex 0.1.0.rc1

@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 45da1fcf4cb02a37294161935326a66098c6bfe4
  data.tar.gz: e05ff3eb81b8015f544962e64f5de4f2339c19b0
SHA512:
  metadata.gz: 0f05aded3e2ca6d3b460996ac2c35f35e6f3c7ec2f4d385be31f8de047b169a5b30b8972c995b069dbc24e79f35c2fe7abf38520a52eab7188b4edae5261445e
  data.tar.gz: c5513e6150d666d811b3db98feb9767b2e81d306bef9d8d7871866490415cf7eac768d6e254a2cae03aad103fac1ab6c00aabac383c088fa9f962dc48484a362
@@ -0,0 +1,6 @@
# Changelog

### 0.1.1
2019-11-19

Initial release.
@@ -0,0 +1,109 @@
# Flexible Postgres output for fluentd

An output plugin for [fluentd](https://www.fluentd.org/) for use with [Postgres](https://www.postgresql.org/) and [TimescaleDB](https://www.timescale.com/) that provides a great amount of flexibility in designing your log table structure.

This plugin automatically reads your log table's schema and maps log record properties to dedicated columns where possible. All other properties are stored in an _extra_ column of type `json` or `jsonb`. The plugin also handles Postgres `enum` types: it will try to map string properties to enum columns.

Consider the following log table:

```sql
CREATE TYPE Severity AS ENUM (
  'debug',
  'info',
  'notice',
  'warning',
  'error',
  'alert',
  'emergency'
);

CREATE TABLE public.logs (
  time     TIMESTAMPTZ NOT NULL,
  severity Severity NOT NULL DEFAULT 'info',
  message  TEXT NULL,
  extra    JSONB NULL
);
```

And a log event of the form:

```
time   2019-10-10 10:01:20.1234
tag    'backend'
record {"severity":"notice","message":"Starting up...","hostname":"node0123","meta":{"env":"production"}}
```

You will end up with this row inserted:

| time | severity | message | extra |
|------|----------|---------|-------|
| `2019-10-10 10:01:20.1234` | notice | Starting up... | `{"hostname":"node0123", "meta":{"env":"production"}}` |

The properties `severity` and `message` were mapped to their dedicated columns; all other properties landed in the `extra` column.

__Note:__ The event's tag is not used in any way. I consider the tag an implementation detail of fluentd's routing system that should not be used elsewhere. If the tag contains valuable data in your setup, you can include it as a property with the `record_transformer` plugin.

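As a sketch, a `record_transformer` filter along these lines copies the tag into the record before it reaches this plugin (the match pattern `backend.**` and the field name `tag` are illustrative):

```
<filter backend.**>
  @type record_transformer
  <record>
    tag ${tag}
  </record>
</filter>
```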
## Requirements

- The `pg` gem and its native library.
- A time column of type `timestamp with time zone` or `timestamp without time zone` in your log table.
- An _extra_ column of type `json` or `jsonb` to store all values without a dedicated column.


## Configuration

- __`host`__ (string, default: `localhost`)<br>
  The database server's hostname.

- __`port`__ (integer, default: `5432`)<br>
  The database server's port.

- __`database`__ (string)<br>
  The database name.

- __`username`__ (string)<br>
  The database user name.

- __`password`__ (string)<br>
  The database user's password.

- __`table`__ (string)<br>
  The name of the log table.

- __`time_column`__ (string, default: `time`)<br>
  The column name to store the timestamp of log events. Must be of type `timestamp with time zone` or `timestamp without time zone`.

- __`extra_column`__ (string, default: `extra`)<br>
  The column name to store excess properties without a dedicated column. Must be of type `json` or `jsonb`.


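Putting the parameters together, a minimal `<match>` section might look like this (the match pattern, database name, and credentials are placeholders for your own setup):

```
<match backend.**>
  @type postgres-flex
  host localhost
  port 5432
  database logs_db
  username fluentd
  password secret
  table logs
  time_column time
  extra_column extra
</match>
```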
## Value coercion

This plugin tries to coerce all values in a meaningful way:

| column type | value type | coercion rule |
|-------------|------------|---------------|
| timestamp | `string` | Parse as RFC3339 string |
| | `number` | Interpret as seconds since Unix epoch (with fractions) |
| | _others_ | _undefined, place in extra column_ |
| text | _all_ | Convert to JSON string |
| boolean | `string` | Interpret `"t"`, `"true"` (any case) as `true`, `false` otherwise |
| | `number` | Interpret `0` as `false`, other values as `true` |
| | _others_ | _undefined, place in extra column_ |
| real numbers | `boolean` | Interpret `true` as `1.0`, `false` as `0.0` |
| | `string` | Parse as decimal (with fractions) |
| | _others_ | _undefined, place in extra column_ |
| integers | `boolean` | Interpret `true` as `1`, `false` as `0` |
| | `string` | Parse as decimal (without fractions) |
| | _others_ | _undefined, place in extra column_ |
| json | _all_ | Convert to JSON string |
+
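A few of these rules can be sketched as a small standalone Ruby method. This is a hypothetical helper, not the plugin's actual API; it returns the coerced Ruby value rather than an escaped SQL literal:

```ruby
require 'time'

# Hypothetical helper mirroring part of the coercion table above. It returns
# the coerced Ruby value, or nil when the rule is undefined -- the plugin then
# places such values in the extra column instead.
def coerce(value, type)
  case type
  when :timestamp
    case value
    when String  then Time.parse(value).utc   # parse as RFC3339
    when Numeric then Time.at(value).utc      # seconds since Unix epoch
    end
  when :boolean
    case value
    when true, false then value
    when String  then %w[t true 1].include?(value.downcase)
    when Numeric then value != 0
    end
  when :integer
    case value
    when true    then 1
    when false   then 0
    when String  then value.to_i(10)          # parse as decimal
    when Integer then value
    end
  end
end

coerce('t', :boolean)      # => true
coerce(0, :boolean)        # => false
coerce('42', :integer)     # => 42
coerce([1, 2], :boolean)   # => nil (undefined, goes to extra)
```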

## Log table design considerations

- Since you want to avoid losing log events, your log table should be designed so that it is (almost) impossible for an insert to fail. This means that all columns should be either _nullable_ or provide a default value. The only exception is the time column, which is guaranteed to be filled with the event's timestamp.

- You may or may not need a primary key. For general use, a primary key is not really necessary, since you `select` and `delete` events only in bulk, not individually.

- Keep in mind that the `timestamp` value type provides microsecond precision. This is good enough for many use cases but might not be enough for yours.
@@ -0,0 +1,212 @@
require 'fluent/plugin/output'
require 'fluent/time'
require 'pg'
require 'oj'
require 'date'

module Fluent::Plugin
  class PostgresFlexOutput < Output
    Fluent::Plugin.register_output('postgres-flex', self)

    config_param :host, :string, default: 'localhost'
    config_param :port, :integer, default: 5432
    config_param :database, :string
    config_param :username, :string
    config_param :password, :string
    config_param :table, :string
    config_param :time_column, :string, default: 'time'
    config_param :extra_column, :string, default: 'extra'

    config_section :buffer do
      config_set_default :flush_mode, :immediate
    end

    TimestampFormat = '%Y-%m-%d %H:%M:%S.%N %z'
    TimestampFormatter = Fluent::TimeFormatter.new(TimestampFormat)
    OjOptions = { mode: :strict }

    def start
      super
      reconnect()
    end

    def stop
      super
      @db.finish
    end

    def write(chunk)
      values = []
      chunk.each { |time, record| values << record_to_values(time, record) }

      begin
        @db.async_exec("INSERT INTO #{ @db.quote_ident(@table) } #{ @value_names } VALUES #{ values.join(',') }")
      rescue PG::UnableToSend => err
        reconnect()
        raise err
      end
    end

    private

    def reconnect
      @db = PG::Connection.new(
        :host => @host,
        :port => @port,
        :dbname => @database,
        :user => @username,
        :password => @password
      )
      @schema, @value_names = parse_schema(@db)
    end

    # Convert a single record to a postgres value string
    #
    # All values that have a dedicated column will be coerced and stored there. If coercion fails,
    # the value will be retained in the _extra_ column and the default value will be used.
    def record_to_values(eventTime, record)
      direct_fields = []

      @schema.each_pair { |key, type|
        value = coerce_value(record[key], type)

        if value.nil?
          log.warn "Could not coerce value #{record[key].inspect} to required type #{type.inspect}"
          # Fall back to the column default; the value stays in the record and
          # thus ends up in the extra column.
          direct_fields << 'DEFAULT'
        else
          direct_fields << value
          record.delete(key)
        end
      }

      time = @db.escape_literal(TimestampFormatter.format_with_subsec(eventTime))
      extras = @db.escape_literal(Oj.dump(record, OjOptions))

      return "(#{ time },#{ direct_fields.join(',') },#{ extras })"
    end

    # Coerce a single value to the type required by the database column
    #
    # @return The coerced value or nil, if the value could not be coerced
    def coerce_value(v, type)
      if v.nil?
        'DEFAULT'
      else
        case type
        when :timestamp
          case v
          when String
            # Parse as RFC3339
            @db.escape_literal(DateTime.rfc3339(v).to_time.utc.strftime(TimestampFormat))
          when Numeric
            # Interpret as Unix time: seconds (with fractions) since epoch
            @db.escape_literal(Time.at(v).utc.strftime(TimestampFormat))
          else nil
          end
        when :string
          @db.escape_literal(Oj.dump(v, OjOptions))
        when :boolean
          case v
          when TrueClass; 'true'
          when FalseClass; 'false'
          when String
            # Accept 't', 'T', 'true', 'TRUE', 'True'..., '1' as true, false otherwise
            (v.downcase == 't' || v.downcase == 'true' || v == '1').to_s
          when Numeric
            (v != 0).to_s
          else nil
          end
        when :integer
          case v
          when TrueClass; '1' # Accept true as 1
          when FalseClass; '0' # Accept false as 0
          when String; v.to_i(10).to_s # Parse string as decimal
          when Integer; v.to_s
          else nil
          end
        when :float
          case v
          when TrueClass; '1.0' # Accept true as 1.0
          when FalseClass; '0.0' # Accept false as 0.0
          when Float; v.to_s
          when String; v.to_f.to_s # Parse string as float
          else nil
          end
        when :json
          begin
            @db.escape_literal(Oj.dump(v, OjOptions))
          rescue Oj::Error => e
            log.warn "Could not serialize value #{v.inspect} to JSON: #{e}"
            nil
          end
        when Array # enums
          return @db.escape_literal(v) if type.include?(v)
        else nil
        end
      end
    end

    # Parse postgres database schema and build a hash of column_name => type
    def parse_schema(db)
      # Map enum_name: String => enum_values: String[]
      enums = db.async_exec(
        'SELECT DISTINCT (n.nspname||\'.\'||t.typname) AS "name", e.enumlabel AS "value"' +
        ' FROM pg_type t' +
        ' JOIN pg_enum e ON t.oid = e.enumtypid' +
        ' JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace'
      ).reduce({}) { |map, row|
        name = row['name']
        map[name] = [] unless map[name]
        map[name] << row['value']

        map
      }

      # Map column_name: String => type: Symbol|String[]
      schema = db.async_exec(
        'SELECT column_name, ' +
        ' (CASE WHEN data_type != \'USER-DEFINED\' THEN data_type ELSE (udt_schema||\'.\'||udt_name) END) AS "type"' +
        " FROM information_schema.columns WHERE table_name = #{ @db.escape_literal(@table) }"
      ).reduce({}) { |map, row|
        name = row['column_name']
        type = case row['type']
               when 'timestamp with time zone'; :timestamp
               when 'timestamp without time zone'; :timestamp
               when 'text'; :string
               when 'character varying'; :string
               when 'character'; :string
               when 'boolean'; :boolean
               when 'smallint'; :integer
               when 'integer'; :integer
               when 'bigint'; :integer
               when 'decimal'; :float
               when 'numeric'; :float
               when 'real'; :float
               when 'double precision'; :float
               when 'json'; :json
               when 'jsonb'; :json
               else enums[row['type']] # Is enum?
               end

        if type.nil?
          log.warn "Unhandled column type '#{row['type']}' for column '#{name}'"
        else
          if name == @time_column
            if type != :timestamp
              raise Fluent::ConfigError.new('time column must be of type "timestamp with/without time zone"')
            end
          elsif name == @extra_column
            if type != :json
              raise Fluent::ConfigError.new('extra column must be of type "json/jsonb"')
            end
          else
            map[name] = type
          end
        end

        map
      }

      value_names = "(#{ @time_column },#{ schema.keys.join(',') },#{ @extra_column })"

      return schema.freeze, value_names.freeze
    end
  end
end
metadata ADDED
@@ -0,0 +1,47 @@
--- !ruby/object:Gem::Specification
name: fluent-plugin-postgres-flex
version: !ruby/object:Gem::Version
  version: 0.1.0.rc1
platform: ruby
authors:
- André Wachter
autorequire:
bindir: bin
cert_chain: []
date: 2019-11-19 00:00:00.000000000 Z
dependencies: []
description: Store fluentd structured log data in Postgres and TimescaleDB.
email: rubygems@anfe.ma
executables: []
extensions: []
extra_rdoc_files: []
files:
- Changelog.md
- Readme.md
- lib/fluent/plugin/out_postgres-flex.rb
homepage: https://github.com/anfema/fluent-plugin-postgres-flex
licenses:
- Apache-2.0
metadata:
  source_code_uri: https://github.com/anfema/fluent-plugin-postgres-flex
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">"
    - !ruby/object:Gem::Version
      version: 1.3.1
requirements: []
rubyforge_project:
rubygems_version: 2.5.2.3
signing_key:
specification_version: 4
summary: A fluentd plugin for storing logs in Postgres and TimescaleDB
test_files: []