activerecord-trino-adapter 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b38f0bee0dc9764f269632d9b138de7aff8c4fb3fb696c8862bd8de23a34999e
- data.tar.gz: 8a24ba5209ee11e82ee43d67182798599b96b1a9f1a621b4f60edf0a2ae3c0ed
+ metadata.gz: 1371d47ebd2cce2d35ff2390bcc4fb825a851c222cb6a445ece9dcfcd5eb47d7
+ data.tar.gz: 705759f9bc0613d910025bd238b3c8618c05aa6a51525085aeeb9301605a438a
  SHA512:
- metadata.gz: 83cba7a9d4c291d021d94729bd2e1fbb37e101ec19749163b4eebcee99969360cd46e3831e0d952eb63770d6db53e46abce9627fa82208f3103b38361c15148b
- data.tar.gz: fd39bfe9b1025178a0a95e35b03bbd9617a3859931e7eea02af7a8ce96a8cf02dd7090469fe08ceca19208a1c9f53a20d83345ca78229e2adb426408102fa624
+ metadata.gz: 89048af018a7bc867bef4f34ec3eb585d021fb85019e4c1b125746c9b751949dbbbcdb975b9077888b69cb423e561a43e68a65a858d363b99f4398d528dda46b
+ data.tar.gz: d5db090fb26309668cf9859ff46224f91fef8a05318fef993bf1f20866b9e56b5faac98382bf99e77cffe28919b140f9c176013360f0718b622e20dfe092d689
data/README.md CHANGED
@@ -87,6 +87,65 @@ All keys are read from the `database.yml` entry:
  | `query_timeout` | `150` | Hard ceiling on query duration, in seconds. Cap lower for user-facing paths and higher for backfills |
  | `plan_timeout` | `30` | Ceiling on Trino query-planning phase, in seconds |
  | `slow_query_threshold_seconds` | `20` | Threshold above which an `active_record_trino.slow_query` notification is emitted |
+ | `persistent` | `false` | Reuse one keep-alive HTTP connection per adapter instance instead of opening a fresh TCP+TLS connection for every request. See [Persistent HTTP connections](#persistent-http-connections) |
+ | `gzip` | _nil_ | When `true`, requests gzip-compressed HTTP response bodies from Trino |
+ | `bulk_column_reflection` | `false` | Reflect every table's columns in a single `information_schema.columns` query instead of one per table. See [Bulk column reflection](#bulk-column-reflection) |
+ | `static_schema` | `false` | Serve table existence from columns declared via `ActiveRecord::Trino.define_columns` instead of running `SHOW TABLES`. See [Static schema declarations](#static-schema-declarations) |
+
+ ### Persistent HTTP connections
+
+ A single Trino query is 4-6 HTTP requests (`POST /v1/statement`, then repeated
+ `nextUri` polls), and by default each one pays a full TCP + TLS handshake. With
+ `persistent: true` the adapter keeps one keep-alive connection per adapter
+ instance (Rails checks out one adapter instance per thread, so no locking is
+ involved) and reuses it across requests and queries. Against a TLS-fronted
+ cluster this typically saves several hundred milliseconds per query.
+
+ Idle keep-alive sockets are recycled after 100 seconds, below typical
+ load-balancer idle timeouts, and Trino protocol GET polls are retried
+ transparently if the server closes a kept-alive socket. `disconnect!` shuts the
+ pool down; `reconnect!` rebuilds it.
+
+ ### Bulk column reflection
+
+ By default the adapter reflects a model's columns with a single-table
+ `information_schema.columns` query the first time ActiveRecord loads its schema.
+ With `bulk_column_reflection: true`, the first reflection instead issues one
+ `information_schema.columns` query for the whole catalog/schema and groups the
+ result by table, memoized per connection. Since each Trino query carries a few
+ hundred milliseconds of fixed overhead, reflecting N tables this way costs one
+ round trip instead of N — useful when a process touches many warehouse tables.
+ The cache is cleared on `reconnect!` and by `ActiveRecord::Trino.reset_schema_cache!`.
+
+ ### Static schema declarations
+
+ If your team owns the warehouse schema, you can declare a table's columns in
+ code and skip reflection entirely. Register columns with
+ `ActiveRecord::Trino.define_columns`:
+
+ ```ruby
+ ActiveRecord::Trino.define_columns("sales_by_day", [
+ { name: :territory_id, sql_type: "integer", null: false },
+ { name: :month, sql_type: "date" },
+ { name: :amount, sql_type: "decimal(18, 2)" }, # null defaults to true
+ ])
+ ```
+
+ When a table is registered, `#columns` serves the declared definitions and never
+ queries `information_schema`. Use the Trino SQL type strings (`bigint`,
+ `integer`, `date`, `decimal(18, 2)`, `varchar`, `timestamp(3)`, `boolean`, …) —
+ the same values `information_schema` would return — so type casting is identical.
+ Declaration is per table: undeclared tables still reflect.
+
+ Setting `static_schema: true` additionally serves table existence
+ (`#data_sources`) from the registered tables, so `SHOW TABLES` is never run. With
+ this enabled, every table the app queries must be declared, or ActiveRecord will
+ treat it as nonexistent.
+
+ Declarations become the source of truth: a column added, dropped, or retyped in
+ the warehouse is not picked up until you update the declaration and redeploy
+ (unlike reflection, which self-corrects on reconnect). Best suited to schemas
+ your team controls and changes deliberately.
 
  ## Instrumentation
 
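The bulk column reflection section trades N single-table `information_schema.columns` queries for one schema-wide query. A minimal plain-Ruby sketch of that trade-off follows. `CountingWarehouse` and `Reflector` are hypothetical names, not part of the gem: the warehouse answers from an in-memory catalog and counts round trips instead of contacting Trino. The grouping step mirrors the approach in `load_column_definitions`, where rows are grouped by their first field (the table name) and the remaining fields are kept per column.

```ruby
# Hypothetical stand-in for a Trino connection. It answers from an in-memory
# catalog and counts round trips; nothing here talks to a real cluster.
class CountingWarehouse
  attr_reader :round_trips

  def initialize(catalog)
    @catalog = catalog # { table => [[column_name, data_type, is_nullable], ...] }
    @round_trips = 0
  end

  # Per-table reflection: one information_schema query per table.
  def table_columns(table)
    @round_trips += 1
    @catalog.fetch(table, [])
  end

  # Bulk reflection: one query returning [table_name, column fields...] rows.
  def all_columns
    @round_trips += 1
    @catalog.flat_map { |table, cols| cols.map { |col| [table, *col] } }
  end
end

# Hypothetical reflector choosing between the two strategies.
class Reflector
  def initialize(warehouse, bulk:)
    @warehouse = warehouse
    @bulk = bulk
  end

  def columns(table)
    return @warehouse.table_columns(table) unless @bulk

    # Memoized on the instance, like the adapter's per-connection cache.
    @by_table ||= group_by_table(@warehouse.all_columns)
    @by_table.fetch(table, [])
  end

  private

  # Group rows by table name and drop that leading field from each row.
  def group_by_table(rows)
    rows.group_by(&:first).transform_values { |group| group.map { |row| row.drop(1) } }
  end
end

catalog = {
  "orders" => [["id", "bigint", "NO"]],
  "sales_by_day" => [["territory_id", "integer", "NO"], ["amount", "decimal(18, 2)", "YES"]],
  "territories" => [["id", "integer", "NO"]],
}

per_table = CountingWarehouse.new(catalog)
reflector = Reflector.new(per_table, bulk: false)
catalog.each_key { |table| reflector.columns(table) }
puts per_table.round_trips # prints 3

bulk = CountingWarehouse.new(catalog)
reflector = Reflector.new(bulk, bulk: true)
catalog.each_key { |table| reflector.columns(table) }
puts bulk.round_trips # prints 1
```

With three tables the per-table path makes three round trips and the bulk path makes one; in the adapter, each avoided round trip is the fixed per-query overhead the README describes.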
@@ -53,7 +53,7 @@ module ActiveRecord
  # rubocop:disable Metrics/AbcSize
  def run_trino_query(sql)
  start = monotonic_now
- query = client.query(sql)
+ query = start_query(sql)
  internal = consume_query(query)
  capture_query_metadata(query)
  notify_slow_query(sql, monotonic_now - start)
@@ -69,6 +69,13 @@ module ActiveRecord
  end
  # rubocop:enable Metrics/AbcSize
 
+ def start_query(sql)
+ return client.query(sql) unless persistent?
+
+ statement = ::Trino::Client::StatementClient.new(persistent_faraday, sql, @client_options)
+ ::Trino::Client::Query.new(statement)
+ end
+
  def consume_query(query)
  columns = query.columns || []
  rows = []
@@ -5,18 +5,20 @@ module ActiveRecord
  module Trino
  module SchemaStatements
  def columns(table_name)
- rows = run_trino_query(columns_query(table_name.to_s)).rows
- rows.map do |name, data_type, is_nullable|
- Trino::Column.new(
- name: name,
- sql_type: data_type,
- type: type_map.lookup(data_type),
- null: nullable?(is_nullable)
- )
+ table = table_name.to_s
+ declared = ActiveRecord::Trino.static_columns[table]
+ return build_static_columns(declared) if declared
+
+ if bulk_column_reflection?
+ column_definitions.fetch(table, [])
+ else
+ build_columns(run_trino_query(table_columns_query(table)).rows)
  end
  end
 
  def data_sources
+ return ActiveRecord::Trino.static_columns.keys if static_schema?
+
  run_trino_query("SHOW TABLES").rows.map(&:first)
  end
  alias tables data_sources
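The new `#columns` and `#data_sources` consult their sources in a fixed order: a registered declaration wins, then the bulk cache when `bulk_column_reflection` is on, then a single-table `information_schema` query; `#data_sources` reads the registry only when `static_schema` is set. The sketch below models only that order. `ColumnLookup` is a hypothetical struct, not an adapter class, and it returns a label for the path that would run instead of issuing any query.

```ruby
# Hypothetical model of the decision order in #columns and #data_sources.
ColumnLookup = Struct.new(:static_columns, :bulk_column_reflection, :static_schema, keyword_init: true) do
  def columns_path(table_name)
    table = table_name.to_s
    return :declared if static_columns[table] # a define_columns entry wins outright

    bulk_column_reflection ? :bulk_cache : :single_table_query
  end

  def data_sources_path
    static_schema ? static_columns.keys : :show_tables
  end
end

lookup = ColumnLookup.new(
  static_columns: { "sales_by_day" => [{ name: :month, sql_type: "date" }] },
  bulk_column_reflection: true,
  static_schema: false
)
p lookup.columns_path(:sales_by_day) # prints :declared
p lookup.columns_path("orders")      # prints :bulk_cache
p lookup.data_sources_path           # prints :show_tables
```

Because `define_columns` always stores an Array, even an empty declaration is truthy and short-circuits reflection for that table.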
@@ -46,13 +48,46 @@ module ActiveRecord
  false
  end
 
- def schema_cache
- @schema_cache ||= ActiveRecord::ConnectionAdapters::SchemaCache.new(self)
+ def clear_column_cache!
+ @column_definitions = nil
  end
 
  private
 
- def columns_query(table_name)
+ def build_columns(rows)
+ rows.map do |name, data_type, is_nullable|
+ Trino::Column.new(
+ name: name,
+ sql_type: data_type,
+ type: type_map.lookup(data_type),
+ null: nullable?(is_nullable)
+ )
+ end
+ end
+
+ def build_static_columns(definitions)
+ definitions.map do |definition|
+ sql_type = definition.fetch(:sql_type)
+ Trino::Column.new(
+ name: definition.fetch(:name).to_s,
+ sql_type: sql_type,
+ type: type_map.lookup(sql_type),
+ null: definition.fetch(:null, true)
+ )
+ end
+ end
+
+ def column_definitions
+ @column_definitions ||= load_column_definitions
+ end
+
+ def load_column_definitions
+ run_trino_query(all_columns_query).rows.group_by(&:first).transform_values do |rows|
+ build_columns(rows.map { |row| row.drop(1) })
+ end
+ end
+
+ def table_columns_query(table_name)
  <<~SQL.strip
  SELECT column_name, data_type, is_nullable
  FROM information_schema.columns
@@ -63,6 +98,16 @@ module ActiveRecord
  SQL
  end
 
+ def all_columns_query
+ <<~SQL.strip
+ SELECT table_name, column_name, data_type, is_nullable
+ FROM information_schema.columns
+ WHERE table_catalog = #{quote(trino_catalog)}
+ AND table_schema = #{quote(trino_schema)}
+ ORDER BY table_name, ordinal_position
+ SQL
+ end
+
  def trino_catalog
  @client_options[:catalog]
  end
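`build_static_columns` turns each declaration Hash into a column with three normalizations: the name is converted to a String, `sql_type` is required, and `null` falls back to `true` only when the key is absent (an explicit `null: nil` stays `nil`, because `Hash#fetch` applies its default only for missing keys). The sketch below reproduces those rules with a hypothetical `StaticColumn` struct standing in for `Trino::Column`; the `type_map.lookup` call is left out because the type map is internal to the adapter.

```ruby
# Hypothetical stand-in for Trino::Column (type lookup omitted).
StaticColumn = Struct.new(:name, :sql_type, :null, keyword_init: true)

# Same normalization rules as build_static_columns in the diff.
def build_static_columns(definitions)
  definitions.map do |definition|
    StaticColumn.new(
      name: definition.fetch(:name).to_s,     # Symbol or String becomes String
      sql_type: definition.fetch(:sql_type),  # KeyError when omitted
      null: definition.fetch(:null, true)     # default only when the key is absent
    )
  end
end

columns = build_static_columns([
  { name: :territory_id, sql_type: "integer", null: false },
  { name: :amount, sql_type: "decimal(18, 2)" },
])
p columns.map(&:name) # prints ["territory_id", "amount"]
p columns.map(&:null) # prints [false, true]

begin
  build_static_columns([{ name: :month }])
rescue KeyError => e
  p e.key # prints :sql_type
end
```

A declaration missing `sql_type` therefore fails loudly the first time its columns are built, rather than silently reflecting.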
@@ -2,6 +2,7 @@
 
  require "active_record"
  require "active_record/connection_adapters/abstract_adapter"
+ require "faraday/net_http_persistent"
  require "trino-client"
 
  require "active_record/trino"
@@ -19,6 +20,8 @@ module ActiveRecord
  class TrinoAdapter < AbstractAdapter
  ADAPTER_NAME = "Trino"
 
+ PERSISTENT_IDLE_TIMEOUT = 100
+
  include Trino::Quoting
  include Trino::DatabaseStatements
  include Trino::SchemaStatements
@@ -28,6 +31,9 @@ module ActiveRecord
  super
  @client_options = ActiveRecord::Trino::Config.client_options(@config)
  @slow_query_threshold = ActiveRecord::Trino::Config.slow_query_threshold(@config)
+ @persistent = ActiveRecord::Trino::Config.persistent?(@config)
+ @bulk_column_reflection = ActiveRecord::Trino::Config.bulk_column_reflection?(@config)
+ @static_schema = ActiveRecord::Trino::Config.static_schema?(@config)
  @client = build_client
  install_safety_belts!
  end
@@ -46,7 +52,10 @@ module ActiveRecord
  end
 
  def disconnect!
+ @persistent_faraday&.close
+ @persistent_faraday = nil
  @client = nil
+ clear_column_cache!
  end
 
  def supports_transactions?
@@ -104,12 +113,36 @@ module ActiveRecord
 
  attr_reader :client, :last_query_id, :last_query_info_uri, :last_query_stats
 
+ def persistent?
+ @persistent
+ end
+
+ def bulk_column_reflection?
+ @bulk_column_reflection
+ end
+
+ def static_schema?
+ @static_schema
+ end
+
  private
 
  def build_client
  ::Trino::Client.new(@client_options)
  end
 
+ def persistent_faraday
+ @persistent_faraday ||= build_persistent_faraday
+ end
+
+ def build_persistent_faraday
+ faraday = ::Trino::Client.faraday_client(@client_options)
+ faraday.builder.adapter(:net_http_persistent) do |http|
+ http.idle_timeout = PERSISTENT_IDLE_TIMEOUT
+ end
+ faraday
+ end
+
  def type_map
  @type_map ||= Trino::TypeMap.build
  end
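The persistent connection is memoized per adapter instance through `persistent_faraday` and torn down in `disconnect!`. The sketch below reproduces only that lifecycle in plain Ruby. `FakeConnection` and `PersistentHolder` are hypothetical stand-ins for the Faraday connection and the adapter, and nothing here opens a socket: repeated calls reuse one object, `disconnect!` closes it and drops the reference, and the next call builds a fresh one through `||=`.

```ruby
# Hypothetical stand-in for the Faraday connection; it records closes and
# counts how many instances were built.
class FakeConnection
  @built = 0

  class << self
    attr_accessor :built
  end

  def initialize
    self.class.built += 1
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Hypothetical stand-in for the adapter's lazy, per-instance connection.
class PersistentHolder
  # Built on first use, then reused by every later request on this instance.
  def connection
    @connection ||= FakeConnection.new
  end

  # Close the pooled connection and forget it so the next call rebuilds it.
  def disconnect!
    @connection&.close
    @connection = nil
  end
end

holder = PersistentHolder.new
first = holder.connection
p holder.connection.equal?(first) # prints true
holder.disconnect!
p first.closed?                   # prints true
p holder.connection.equal?(first) # prints false
p FakeConnection.built            # prints 2
```

The safe-navigation call (`&.close`) makes a second `disconnect!` with no live connection a no-op, which matches the `@persistent_faraday&.close` line in the diff.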
@@ -27,11 +27,24 @@ module ActiveRecord
  ssl: ssl,
  http_proxy: symbolized[:http_proxy],
  time_zone: symbolized[:time_zone],
+ gzip: symbolized[:gzip],
  query_timeout: symbolized.fetch(:query_timeout, DEFAULT_QUERY_TIMEOUT),
  plan_timeout: symbolized.fetch(:plan_timeout, DEFAULT_PLAN_TIMEOUT),
  }.compact
  end
 
+ def persistent?(config)
+ !!symbolize(config).fetch(:persistent, false)
+ end
+
+ def bulk_column_reflection?(config)
+ !!symbolize(config).fetch(:bulk_column_reflection, false)
+ end
+
+ def static_schema?(config)
+ !!symbolize(config).fetch(:static_schema, false)
+ end
+
  def default_port(ssl)
  ssl ? DEFAULT_HTTPS_PORT : DEFAULT_HTTP_PORT
  end
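The three new `Config` predicates share one pattern: read the key, default to `false` when it is missing, and coerce with `!!`. The sketch below assumes the gem's `symbolize` helper, which the diff does not show, behaves like `transform_keys(&:to_sym)`; `flag?` is a hypothetical name. Since `database.yml` is parsed as YAML, an unquoted `false` arrives as a boolean, but a quoted `"false"` arrives as a String, and `!!` treats every String as true.

```ruby
require "yaml"

# Hypothetical re-creation of the Config predicate pattern; transform_keys
# stands in for the gem's symbolize helper (an assumption).
def flag?(config, key)
  !!config.transform_keys(&:to_sym).fetch(key, false)
end

unquoted = YAML.safe_load("persistent: false")  # {"persistent"=>false}
quoted = YAML.safe_load('persistent: "false"')  # {"persistent"=>"false"}

p flag?(unquoted, :persistent)            # prints false
p flag?(quoted, :persistent)              # prints true
p flag?({}, :persistent)                  # prints false
p flag?({ persistent: nil }, :persistent) # prints false
```

In practice this means the three flags should be written as bare YAML booleans, not quoted strings.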
@@ -2,6 +2,6 @@
 
  module ActiveRecord
  module Trino
- VERSION = "0.1.0"
+ VERSION = "0.2.0"
  end
  end
@@ -31,8 +31,21 @@ end
 
  module ActiveRecord
  module Trino
+ def self.static_columns
+ @static_columns ||= {}
+ end
+
+ def self.define_columns(table_name, definitions)
+ static_columns[table_name.to_s] = Array(definitions)
+ end
+
  def self.reset_schema_cache!(model_class)
  model_class.reset_column_information
+
+ model_class.connection_pool.connections.each do |conn|
+ conn.clear_column_cache! if conn.respond_to?(:clear_column_cache!)
+ end
+
  return unless model_class.connection.respond_to?(:schema_cache)
 
  model_class.connection.schema_cache.clear!
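`define_columns` writes into a module-level Hash keyed by the table name as a String, so Symbol and String names address the same entry and a later call for the same table replaces the earlier declaration. The hypothetical `StaticRegistry` module below copies those two methods to show this, plus one pitfall that follows from `Array(definitions)`: a single column passed as a bare Hash is converted into key/value pairs by `Kernel#Array` rather than being wrapped.

```ruby
# Hypothetical module mirroring the registry methods added to ActiveRecord::Trino.
module StaticRegistry
  def self.static_columns
    @static_columns ||= {}
  end

  def self.define_columns(table_name, definitions)
    static_columns[table_name.to_s] = Array(definitions)
  end
end

StaticRegistry.define_columns(:sales_by_day, [{ name: :month, sql_type: "date" }])
p StaticRegistry.static_columns.keys # prints ["sales_by_day"]

# Pitfall: Kernel#Array turns a bare Hash into key/value pairs, so even a
# single column should be wrapped in an Array.
StaticRegistry.define_columns(:oops, { name: :id, sql_type: "bigint" })
p StaticRegistry.static_columns["oops"] # prints [[:name, :id], [:sql_type, "bigint"]]
```

The README example always passes an Array of Hashes and is unaffected; an unwrapped Hash would only fail later, when `build_static_columns` calls `fetch(:sql_type)` on one of those pairs.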
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: activerecord-trino-adapter
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - Garett Arrowood
@@ -49,6 +49,20 @@ dependencies:
  - - "<"
  - !ruby/object:Gem::Version
  version: '8.1'
+ - !ruby/object:Gem::Dependency
+ name: faraday-net_http_persistent
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '2.0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '2.0'
  - !ruby/object:Gem::Dependency
  name: trino-client
  requirement: !ruby/object:Gem::Requirement