activerecord-trino-adapter 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +59 -0
- data/lib/active_record/connection_adapters/trino/database_statements.rb +8 -1
- data/lib/active_record/connection_adapters/trino/schema_statements.rb +56 -11
- data/lib/active_record/connection_adapters/trino_adapter.rb +33 -0
- data/lib/active_record/trino/config.rb +13 -0
- data/lib/active_record/trino/version.rb +1 -1
- data/lib/active_record/trino.rb +13 -0
- metadata +15 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1371d47ebd2cce2d35ff2390bcc4fb825a851c222cb6a445ece9dcfcd5eb47d7
+  data.tar.gz: 705759f9bc0613d910025bd238b3c8618c05aa6a51525085aeeb9301605a438a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 89048af018a7bc867bef4f34ec3eb585d021fb85019e4c1b125746c9b751949dbbbcdb975b9077888b69cb423e561a43e68a65a858d363b99f4398d528dda46b
+  data.tar.gz: d5db090fb26309668cf9859ff46224f91fef8a05318fef993bf1f20866b9e56b5faac98382bf99e77cffe28919b140f9c176013360f0718b622e20dfe092d689
data/README.md
CHANGED
@@ -87,6 +87,65 @@ All keys are read from the `database.yml` entry:
 | `query_timeout` | `150` | Hard ceiling on query duration, in seconds. Cap lower for user-facing paths and higher for backfills |
 | `plan_timeout` | `30` | Ceiling on Trino query-planning phase, in seconds |
 | `slow_query_threshold_seconds` | `20` | Threshold above which an `active_record_trino.slow_query` notification is emitted |
+| `persistent` | `false` | Reuse one keep-alive HTTP connection per adapter instance instead of opening a fresh TCP+TLS connection for every request. See [Persistent HTTP connections](#persistent-http-connections) |
+| `gzip` | _nil_ | When `true`, requests gzip-compressed HTTP response bodies from Trino |
+| `bulk_column_reflection` | `false` | Reflect every table's columns in a single `information_schema.columns` query instead of one per table. See [Bulk column reflection](#bulk-column-reflection) |
+| `static_schema` | `false` | Serve table existence from columns declared via `ActiveRecord::Trino.define_columns` instead of running `SHOW TABLES`. See [Static schema declarations](#static-schema-declarations) |
+
+### Persistent HTTP connections
+
+A single Trino query is 4-6 HTTP requests (`POST /v1/statement`, then repeated
+`nextUri` polls), and by default each one pays a full TCP + TLS handshake. With
+`persistent: true` the adapter keeps one keep-alive connection per adapter
+instance (Rails checks out one adapter instance per thread, so no locking is
+involved) and reuses it across requests and queries. Against a TLS-fronted
+cluster this typically saves several hundred milliseconds per query.
+
+Idle keep-alive sockets are recycled after 100 seconds, below typical
+load-balancer idle timeouts, and Trino protocol GET polls are retried
+transparently if the server closes a kept-alive socket. `disconnect!` shuts the
+pool down; `reconnect!` rebuilds it.
+
+### Bulk column reflection
+
+By default the adapter reflects a model's columns with a single-table
+`information_schema.columns` query the first time ActiveRecord loads its schema.
+With `bulk_column_reflection: true`, the first reflection instead issues one
+`information_schema.columns` query for the whole catalog/schema and groups the
+result by table, memoized per connection. Since each Trino query carries a few
+hundred milliseconds of fixed overhead, reflecting N tables this way costs one
+round trip instead of N — useful when a process touches many warehouse tables.
+The cache is cleared on `reconnect!` and by `ActiveRecord::Trino.reset_schema_cache!`.
+
+### Static schema declarations
+
+If your team owns the warehouse schema, you can declare a table's columns in
+code and skip reflection entirely. Register columns with
+`ActiveRecord::Trino.define_columns`:
+
+```ruby
+ActiveRecord::Trino.define_columns("sales_by_day", [
+  { name: :territory_id, sql_type: "integer", null: false },
+  { name: :month, sql_type: "date" },
+  { name: :amount, sql_type: "decimal(18, 2)" }, # null defaults to true
+])
+```
+
+When a table is registered, `#columns` serves the declared definitions and never
+queries `information_schema`. Use the Trino SQL type strings (`bigint`,
+`integer`, `date`, `decimal(18, 2)`, `varchar`, `timestamp(3)`, `boolean`, …) —
+the same values `information_schema` would return — so type casting is identical.
+Declaration is per table: undeclared tables still reflect.
+
+Setting `static_schema: true` additionally serves table existence
+(`#data_sources`) from the registered tables, so `SHOW TABLES` is never run. With
+this enabled, every table the app queries must be declared, or ActiveRecord will
+treat it as nonexistent.
+
+Declarations become the source of truth: a column added, dropped, or retyped in
+the warehouse is not picked up until you update the declaration and redeploy
+(unlike reflection, which self-corrects on reconnect). Best suited to schemas
+your team controls and changes deliberately.
 
 ## Instrumentation
 
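The static-schema rules in the README hunk above can be pictured with a small stand-alone sketch. `FakeRegistry` and `FakeAdapter` are hypothetical stand-ins, not the gem's classes, and the table list returned by the fake `SHOW TABLES` is made up. The sketch only mirrors the documented behaviour: declared tables bypass reflection, undeclared tables still cost a query, and `static_schema: true` answers table existence from the registry.

```ruby
# Hypothetical stand-in for the ActiveRecord::Trino registry; not the gem's code.
module FakeRegistry
  def self.static_columns
    @static_columns ||= {}
  end

  # Table names are normalized to strings so symbols and strings hit the same key.
  def self.define_columns(table_name, definitions)
    static_columns[table_name.to_s] = Array(definitions)
  end
end

# Hypothetical adapter that counts the queries it would have sent to Trino.
class FakeAdapter
  attr_reader :queries_sent

  def initialize(static_schema:)
    @static_schema = static_schema
    @queries_sent = 0
  end

  def data_sources
    return FakeRegistry.static_columns.keys if @static_schema

    @queries_sent += 1 # stands in for SHOW TABLES
    %w[sales_by_day refunds]
  end

  def columns(table_name)
    declared = FakeRegistry.static_columns[table_name.to_s]
    return declared.map { |d| d.fetch(:name).to_s } if declared

    @queries_sent += 1 # stands in for an information_schema.columns query
    []
  end
end

FakeRegistry.define_columns(:sales_by_day, [
  { name: :territory_id, sql_type: "integer", null: false },
  { name: :amount, sql_type: "decimal(18, 2)" },
])

adapter = FakeAdapter.new(static_schema: true)
adapter.data_sources            # answered from the registry
adapter.columns("sales_by_day") # answered from the registry
adapter.columns("refunds")      # undeclared, so it falls back to a query
```

In this sketch only the undeclared table increments the query counter, which is the trade-off the README describes: fewer round trips in exchange for keeping declarations in step with the warehouse.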
data/lib/active_record/connection_adapters/trino/database_statements.rb
CHANGED
@@ -53,7 +53,7 @@ module ActiveRecord
         # rubocop:disable Metrics/AbcSize
         def run_trino_query(sql)
           start = monotonic_now
-          query =
+          query = start_query(sql)
           internal = consume_query(query)
           capture_query_metadata(query)
           notify_slow_query(sql, monotonic_now - start)
@@ -69,6 +69,13 @@ module ActiveRecord
         end
         # rubocop:enable Metrics/AbcSize
 
+        def start_query(sql)
+          return client.query(sql) unless persistent?
+
+          statement = ::Trino::Client::StatementClient.new(persistent_faraday, sql, @client_options)
+          ::Trino::Client::Query.new(statement)
+        end
+
         def consume_query(query)
           columns = query.columns || []
           rows = []
data/lib/active_record/connection_adapters/trino/schema_statements.rb
CHANGED
@@ -5,18 +5,20 @@ module ActiveRecord
     module Trino
       module SchemaStatements
         def columns(table_name)
-
-
-
-
-
-
-
-            )
+          table = table_name.to_s
+          declared = ActiveRecord::Trino.static_columns[table]
+          return build_static_columns(declared) if declared
+
+          if bulk_column_reflection?
+            column_definitions.fetch(table, [])
+          else
+            build_columns(run_trino_query(table_columns_query(table)).rows)
           end
         end
 
         def data_sources
+          return ActiveRecord::Trino.static_columns.keys if static_schema?
+
           run_trino_query("SHOW TABLES").rows.map(&:first)
         end
         alias tables data_sources
@@ -46,13 +48,46 @@ module ActiveRecord
           false
         end
 
-        def
-          @
+        def clear_column_cache!
+          @column_definitions = nil
         end
 
         private
 
-        def
+        def build_columns(rows)
+          rows.map do |name, data_type, is_nullable|
+            Trino::Column.new(
+              name: name,
+              sql_type: data_type,
+              type: type_map.lookup(data_type),
+              null: nullable?(is_nullable)
+            )
+          end
+        end
+
+        def build_static_columns(definitions)
+          definitions.map do |definition|
+            sql_type = definition.fetch(:sql_type)
+            Trino::Column.new(
+              name: definition.fetch(:name).to_s,
+              sql_type: sql_type,
+              type: type_map.lookup(sql_type),
+              null: definition.fetch(:null, true)
+            )
+          end
+        end
+
+        def column_definitions
+          @column_definitions ||= load_column_definitions
+        end
+
+        def load_column_definitions
+          run_trino_query(all_columns_query).rows.group_by(&:first).transform_values do |rows|
+            build_columns(rows.map { |row| row.drop(1) })
+          end
+        end
+
+        def table_columns_query(table_name)
           <<~SQL.strip
             SELECT column_name, data_type, is_nullable
             FROM information_schema.columns
@@ -63,6 +98,16 @@ module ActiveRecord
           SQL
         end
 
+        def all_columns_query
+          <<~SQL.strip
+            SELECT table_name, column_name, data_type, is_nullable
+            FROM information_schema.columns
+            WHERE table_catalog = #{quote(trino_catalog)}
+              AND table_schema = #{quote(trino_schema)}
+            ORDER BY table_name, ordinal_position
+          SQL
+        end
+
         def trino_catalog
           @client_options[:catalog]
         end
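The grouping step in `load_column_definitions` may be easier to follow on plain arrays. The sketch below is illustrative only: the table and column names are invented sample data, and it stops at the row triples rather than building `Trino::Column` objects.

```ruby
# Sample rows shaped like the result of the catalog-wide query:
# [table_name, column_name, data_type, is_nullable]. The values are made up.
rows = [
  ["orders", "id", "bigint", "NO"],
  ["orders", "placed_at", "timestamp(3)", "YES"],
  ["refunds", "order_id", "bigint", "NO"],
]

# Group by the first element (table name), then drop it from each row so the
# remaining triples match what a single-table query would have returned.
by_table = rows.group_by(&:first).transform_values do |table_rows|
  table_rows.map { |row| row.drop(1) }
end

by_table.fetch("orders")      # the two [column_name, data_type, is_nullable] triples
by_table.fetch("missing", []) # an unknown table yields an empty list, not an error
```

Because the query orders by `table_name, ordinal_position` and `group_by` preserves encounter order, each table's columns should stay in their declared order after grouping.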
data/lib/active_record/connection_adapters/trino_adapter.rb
CHANGED
@@ -2,6 +2,7 @@
 
 require "active_record"
 require "active_record/connection_adapters/abstract_adapter"
+require "faraday/net_http_persistent"
 require "trino-client"
 
 require "active_record/trino"
@@ -19,6 +20,8 @@ module ActiveRecord
     class TrinoAdapter < AbstractAdapter
       ADAPTER_NAME = "Trino"
 
+      PERSISTENT_IDLE_TIMEOUT = 100
+
       include Trino::Quoting
       include Trino::DatabaseStatements
       include Trino::SchemaStatements
@@ -28,6 +31,9 @@ module ActiveRecord
         super
         @client_options = ActiveRecord::Trino::Config.client_options(@config)
         @slow_query_threshold = ActiveRecord::Trino::Config.slow_query_threshold(@config)
+        @persistent = ActiveRecord::Trino::Config.persistent?(@config)
+        @bulk_column_reflection = ActiveRecord::Trino::Config.bulk_column_reflection?(@config)
+        @static_schema = ActiveRecord::Trino::Config.static_schema?(@config)
         @client = build_client
         install_safety_belts!
       end
@@ -46,7 +52,10 @@ module ActiveRecord
       end
 
       def disconnect!
+        @persistent_faraday&.close
+        @persistent_faraday = nil
         @client = nil
+        clear_column_cache!
       end
 
       def supports_transactions?
@@ -104,12 +113,36 @@ module ActiveRecord
 
       attr_reader :client, :last_query_id, :last_query_info_uri, :last_query_stats
 
+      def persistent?
+        @persistent
+      end
+
+      def bulk_column_reflection?
+        @bulk_column_reflection
+      end
+
+      def static_schema?
+        @static_schema
+      end
+
       private
 
       def build_client
         ::Trino::Client.new(@client_options)
       end
 
+      def persistent_faraday
+        @persistent_faraday ||= build_persistent_faraday
+      end
+
+      def build_persistent_faraday
+        faraday = ::Trino::Client.faraday_client(@client_options)
+        faraday.builder.adapter(:net_http_persistent) do |http|
+          http.idle_timeout = PERSISTENT_IDLE_TIMEOUT
+        end
+        faraday
+      end
+
       def type_map
         @type_map ||= Trino::TypeMap.build
       end
data/lib/active_record/trino/config.rb
CHANGED
@@ -27,11 +27,24 @@ module ActiveRecord
           ssl: ssl,
           http_proxy: symbolized[:http_proxy],
           time_zone: symbolized[:time_zone],
+          gzip: symbolized[:gzip],
           query_timeout: symbolized.fetch(:query_timeout, DEFAULT_QUERY_TIMEOUT),
           plan_timeout: symbolized.fetch(:plan_timeout, DEFAULT_PLAN_TIMEOUT),
         }.compact
       end
 
+      def persistent?(config)
+        !!symbolize(config).fetch(:persistent, false)
+      end
+
+      def bulk_column_reflection?(config)
+        !!symbolize(config).fetch(:bulk_column_reflection, false)
+      end
+
+      def static_schema?(config)
+        !!symbolize(config).fetch(:static_schema, false)
+      end
+
       def default_port(ssl)
         ssl ? DEFAULT_HTTPS_PORT : DEFAULT_HTTP_PORT
       end
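The three new predicates share one idiom, `!!hash.fetch(key, false)`. A minimal sketch of how that coercion behaves follows, assuming a `symbolize` helper that simply converts keys to symbols. The gem's own `symbolize` is not shown in this diff, so the version here is a hypothetical stand-in, as is the `flag?` wrapper.

```ruby
# Hypothetical helper: the gem's own `symbolize` is not part of this diff.
def symbolize(config)
  config.to_h.transform_keys(&:to_sym)
end

# Hypothetical wrapper around the idiom used by the three predicates.
def flag?(config, key)
  # fetch supplies false when the key is absent; !! collapses any truthy
  # value to true and nil/false to false.
  !!symbolize(config).fetch(key, false)
end

flag?({ "persistent" => true }, :persistent)    # enabled through a string key
flag?({}, :persistent)                          # absent, so disabled
flag?({ "persistent" => nil }, :persistent)     # an explicit nil is also disabled
flag?({ "persistent" => "false" }, :persistent) # caution: a non-empty string is truthy
```

The last call is the case worth watching: if a value reaches the config as the quoted string `"false"` rather than a YAML boolean, this idiom would treat the flag as enabled.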
data/lib/active_record/trino.rb
CHANGED
@@ -31,8 +31,21 @@ end
 
 module ActiveRecord
   module Trino
+    def self.static_columns
+      @static_columns ||= {}
+    end
+
+    def self.define_columns(table_name, definitions)
+      static_columns[table_name.to_s] = Array(definitions)
+    end
+
     def self.reset_schema_cache!(model_class)
       model_class.reset_column_information
+
+      model_class.connection_pool.connections.each do |conn|
+        conn.clear_column_cache! if conn.respond_to?(:clear_column_cache!)
+      end
+
       return unless model_class.connection.respond_to?(:schema_cache)
 
       model_class.connection.schema_cache.clear!
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: activerecord-trino-adapter
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Garett Arrowood
@@ -49,6 +49,20 @@ dependencies:
     - - "<"
       - !ruby/object:Gem::Version
         version: '8.1'
+- !ruby/object:Gem::Dependency
+  name: faraday-net_http_persistent
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '2.0'
 - !ruby/object:Gem::Dependency
   name: trino-client
   requirement: !ruby/object:Gem::Requirement