dwh 0.3.0 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e70f914cc994c4be7a9d76b0d72d170ae6bf4d895427ec90adcdf9e3099774fe
4
- data.tar.gz: 3ef66bc3d9a326bbae4b51d2bb3ec6af45425971617aee3a3131c7d496cf9127
3
+ metadata.gz: 9a67d957a140e258a8bef00efbae509bb7169e9de98dae911945316e7f24afb1
4
+ data.tar.gz: 0ae9acadd0a1a8e0e508f8a264c1534118d323202706c0488f3ae17ebdd98e9e
5
5
  SHA512:
6
- metadata.gz: d05640e86dc5a6df2135173dd7a513df0805b4035fa5c4c5fa182659b1286fb6fd78f67ff2e259899f3d4ecab19b44bf5e7bfb427acfa5daa6065d71fb2fa18d
7
- data.tar.gz: 28e5c623c8401dea1d222a1b318325543d4c209e083944d2fd43367ae6721c2ebe1c2b7e69190462a19887770d722edb7f2d1d851f90e5cc53b4c51cf2b65953
6
+ metadata.gz: 1aff93e7071cd35b748b43e174c3beeea2fb4c748a97bbf81d407e95e978c411e160b53ebe5cdef331ed685233407143485a0be7775b6b62db73538aebf05718
7
+ data.tar.gz: d9ad974f9e7a4edf0211b40bb0be33f730f8cd6abace42df05aa4339dcfc53babeea6c495c9d6d4d4b91519fa30f03852e99f6a0a0e14912914daf91bd62fb00
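The digests above cover the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. A minimal sketch of checking bytes against one of the listed SHA256 values, assuming the archive has already been extracted locally; the helper names are illustrative and belong to neither RubyGems nor dwh:

```ruby
require 'digest'

# Compare the SHA256 hex digest of a byte string with an expected hex value.
def sha256_hex_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Convenience wrapper for a file on disk (read in binary mode).
def file_sha256_matches?(path, expected_hex)
  sha256_hex_matches?(File.binread(path), expected_hex)
end
```

For example, `file_sha256_matches?('data.tar.gz', '0ae9acad...')` would be called with the full 0.4.2 digest listed above.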
data/CHANGELOG.md CHANGED
@@ -1,5 +1,35 @@
1
1
  ## [Unreleased]
2
2
 
3
+ ## [0.4.2] - 2026-05-22
4
+
5
+ ### Fixed
6
+
7
+ - **DuckDB adapter**: Pass connection options with `config:` when opening a database so initialization uses the correct DuckDB API parameter.
8
+
9
+ ## [0.4.1] - 2026-04-29
10
+
11
+ ### Added
12
+
13
+ - Databricks `execute_stream` support for `EXTERNAL_LINKS` result delivery using CSV downloads
14
+
15
+ ### Changed
16
+
17
+ - Databricks now uses method-specific result delivery defaults: `execute` uses `INLINE` + `JSON_ARRAY`, and `execute_stream` uses `EXTERNAL_LINKS` + `CSV`.
18
+
19
+ ## [0.4.0] - 2026-04-28
20
+
21
+ ### Added
22
+
23
+ - Token persistence interface via `DWH::TokenStore` for adapters that support OAuth token lifecycle management.
24
+ - `TokenManageable` adapter concern for standardized token read/write behavior across adapters.
25
+ - PKCE-based U2M OAuth support for the Databricks adapter.
26
+ - Expanded tests for OAuth and Databricks token flows.
27
+
28
+ ### Changed
29
+
30
+ - Databricks adapter now requires explicit `auth_mode` to reduce ambiguous auth configuration.
31
+ - Updated documentation for adapter auth/token usage and adapter authoring.
32
+
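The method-specific result delivery defaults introduced in 0.4.1 can be pictured as a lookup from calling method to `disposition` and `format`. The sketch below is an illustration under assumptions, not the adapter's code: `RESULT_DELIVERY` and `statement_body` are hypothetical names, and only request fields visible in this diff are included.

```ruby
require 'json'

# Hypothetical lookup mirroring the documented defaults per calling method.
RESULT_DELIVERY = {
  execute: { disposition: 'INLINE', format: 'JSON_ARRAY' },
  execute_stream: { disposition: 'EXTERNAL_LINKS', format: 'CSV' }
}.freeze

# Build a JSON request body; nil values are dropped, as with Hash#compact in the adapter.
def statement_body(sql, via:, catalog: nil, schema: nil)
  delivery = RESULT_DELIVERY.fetch(via)
  {
    statement: sql,
    catalog: catalog,
    schema: schema,
    wait_timeout: '30s',
    on_wait_timeout: 'CONTINUE',
    **delivery
  }.compact.to_json
end
```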
3
33
  ## [0.3.0] - 2026-04-22
4
34
 
5
35
  ### Changed
@@ -179,7 +179,7 @@ adapter = DWH.create(:snowflake, {
179
179
  account_identifier: 'myorg-myaccount.us-east-1',
180
180
  oauth_client_id: '<YOUR_CLIENT_ID>',
181
181
  oauth_client_secret: '<YOUR_CLIENT_SECRET>',
182
- oauth_redirect_url: 'https://localhost:3030/some/path',
182
+ oauth_redirect_uri: 'https://localhost:3030/some/path',
183
183
  database: 'ANALYTICS',
184
184
  client_name: 'myapp' # sent as user agent header value
185
185
  })
@@ -189,10 +189,119 @@ To successfully use OAuth you have to pass the adapter valid access and refresh
189
189
 
190
190
  The typical flow is like so:
191
191
 
192
- 1. Generate an authorization code by visiting the url generated by `adapter.authorization_url.` This will redirect to the configured `oauth_redirect_url.` You must be able to retrieve the `code` from there.
192
+ 1. Generate an authorization code by visiting the URL generated by `adapter.authorization_url`. This will redirect to the configured `oauth_redirect_uri`. You must be able to retrieve the `code` from there.
193
193
  2. Take the code from above and generate new access tokens: `adapter.generate_oauth_tokens(code)`. This returns a Hash with `access_token` and `refresh_token`. You can cache and reuse it until the refresh token expires. This method also applies the tokens to the current adapter instance.
194
194
  3. You can apply an existing set of tokens like so: `adapter.apply_oauth_tokens(access_token: token, refresh_token: token, expires_at: Time.now)`
195
195
 
196
+ ### Host integration contract (CLI / server)
197
+
198
+ The host application is responsible for orchestration and persistence. DWH is responsible for OAuth protocol calls and token lifecycle methods.
199
+
200
+ Public OAuth methods you can call on adapters that include `OpenAuthorizable`:
201
+
202
+ - `authorization_url(state:, scope:)` - build provider authorize URL (authorization-code flow only)
203
+ - `generate_oauth_tokens(code)` - exchange auth code for tokens and apply/store them
204
+ - `apply_oauth_tokens(access_token:, refresh_token:, expires_at:)` - inject tokens from host storage
205
+ - `oauth_access_token` - get a usable access token (load/refresh/mint as needed)
206
+ - `refresh_access_token` - explicitly refresh using current refresh token
207
+ - `mint_access_token` - explicitly mint via client credentials (M2M only)
208
+ - `oauth_token_info` - inspect current token state
209
+
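The "load/refresh/mint as needed" order behind `oauth_access_token` can be sketched in isolation. This is a simplified stand-in under assumptions, not DWH's implementation: `TokenResolver`, its callables, and `leeway_seconds` are hypothetical names chosen for illustration.

```ruby
# Hypothetical resolver: try the stored token, then refresh, then mint.
class TokenResolver
  def initialize(store: nil, refresh: nil, mint: nil, leeway_seconds: 0)
    @store = store                   # responds to #load, may be nil
    @refresh = refresh               # callable: refresh_token -> token hash
    @mint = mint                     # callable: -> token hash (client credentials)
    @leeway_seconds = leeway_seconds # treat near-expiry tokens as unusable
    @token = nil
  end

  def access_token
    @token ||= @store&.load
    return @token[:access_token] if usable?

    if @refresh && @token && @token[:refresh_token]
      @token = @refresh.call(@token[:refresh_token])
      return @token[:access_token] if usable?
    end

    if @mint
      @token = @mint.call
      return @token[:access_token] if usable?
    end

    raise 'No usable access token: run the auth flow or supply tokens'
  end

  private

  # A token without an expiry is treated as unusable, so it is re-fetched.
  def usable?
    return false unless @token && @token[:access_token] && @token[:expires_at]

    Time.now + @leeway_seconds < @token[:expires_at]
  end
end
```

Passing only `mint:` models the M2M case, while passing `store:` and `refresh:` models a U2M session restored from host storage.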
210
+ Host requirements:
211
+
212
+ 1. Build and pass adapter config to `DWH.create(...)`, including correct `auth_mode` and OAuth fields.
213
+ 2. For U2M, implement browser redirect + callback capture, then call `generate_oauth_tokens(code)`.
214
+ 3. Validate OAuth `state` in the host callback handler (DWH does not enforce callback state verification).
215
+ 4. Persist tokens either by:
216
+ - passing a `token_store` object (`load`, `store`, `delete`), or
217
+ - storing tokens externally and rehydrating with `apply_oauth_tokens`.
218
+ 5. Handle auth exceptions and trigger reconnect UX when needed (for example, expired/invalid refresh token).
219
+
220
+ Call order by mode:
221
+
222
+ - **M2M (`oauth_m2m`)**
223
+ 1. `adapter = DWH.create(...)`
224
+ 2. Run query/test methods (`execute`, `test_connection`, etc.)
225
+ 3. DWH internally calls `oauth_access_token`, which mints or refreshes as required
226
+
227
+ - **U2M (`oauth_u2m`)**
228
+ 1. `adapter = DWH.create(...)`
229
+ 2. `url = adapter.authorization_url(...)`
230
+ 3. Host sends user to URL and receives callback with `code`
231
+ 4. `adapter.generate_oauth_tokens(code)`
232
+ 5. Run query/test methods; DWH reuses and refreshes tokens as needed
233
+
234
+ Note: for PKCE-enabled U2M providers, call `authorization_url` and `generate_oauth_tokens` on the same adapter instance.
235
+
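Because DWH does not verify the callback `state`, the host has to do it. One way to approach this is a small one-time registry keyed by session. This is a sketch under assumptions: `OAuthStateRegistry` and the in-memory Hash are hypothetical, and a real host would likely keep pending states in its session store.

```ruby
require 'securerandom'

# Hypothetical host-side helper: issue a state per session, verify it once.
class OAuthStateRegistry
  def initialize
    @pending = {}
  end

  # Generate and remember a random state for this session.
  def issue(session_id)
    @pending[session_id] = SecureRandom.hex(16)
  end

  # Consume the pending state; a second check for the same session fails.
  def valid?(session_id, returned_state)
    expected = @pending.delete(session_id)
    return false if expected.nil? || returned_state.nil?

    secure_equal?(expected, returned_state.to_s)
  end

  private

  # Compare without an early exit on the first differing byte.
  def secure_equal?(left, right)
    return false unless left.bytesize == right.bytesize

    diff = 0
    left.bytes.zip(right.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end
```

The issued value would be passed as `adapter.authorization_url(state: state)`, and the callback handler would call `valid?` before `adapter.generate_oauth_tokens(code)`.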
236
+ ## Databricks
237
+
238
+ The Databricks adapter uses the SQL Statements REST API and supports OAuth with
239
+ both machine-to-machine (M2M) and user-to-machine (U2M) authorization-code flow.
240
+ Set `auth_mode` explicitly to select the flow.
241
+
242
+ ### Basic configuration
243
+
244
+ ```ruby
245
+ adapter = DWH.create(:databricks, {
246
+ host: 'workspace.cloud.databricks.com',
247
+ auth_mode: 'oauth_m2m',
248
+ warehouse: 'warehouse_id',
249
+ oauth_client_id: '<CLIENT_ID>',
250
+ oauth_client_secret: '<CLIENT_SECRET>',
251
+ catalog: 'main',
252
+ schema: 'default'
253
+ })
254
+ ```
255
+
256
+ ### M2M (service principal) flow
257
+
258
+ Set `auth_mode: 'oauth_m2m'`. The adapter mints access tokens using
259
+ `grant_type=client_credentials`.
260
+
261
+ ### U2M (authorization code) flow
262
+
263
+ Set `auth_mode: 'oauth_u2m'` and provide `oauth_redirect_uri` in config, then run:
264
+
265
+ 1. Generate authorize URL from `adapter.authorization_url`
266
+ 2. Capture `code` from redirect callback
267
+ 3. Exchange with `adapter.generate_oauth_tokens(code)`
268
+
269
+ When U2M is active, PKCE is applied automatically by the adapter.
270
+
271
+ ### Large result sets (>25MiB)
272
+
273
+ Databricks `INLINE` result disposition is limited for large payloads. The adapter uses
274
+ method-driven defaults:
275
+
276
+ - `execute` uses `INLINE` + `JSON_ARRAY` (in-memory/object style results)
277
+ - `execute_stream` uses `EXTERNAL_LINKS` + `CSV` (large export path)
278
+
279
+ ```ruby
280
+ adapter = DWH.create(:databricks, {
281
+ host: 'workspace.cloud.databricks.com',
282
+ auth_mode: 'oauth_m2m',
283
+ warehouse: 'warehouse_id',
284
+ oauth_client_id: '<CLIENT_ID>',
285
+ oauth_client_secret: '<CLIENT_SECRET>'
286
+ })
287
+
288
+ File.open('export.csv', 'w') do |io|
289
+ adapter.execute_stream('SELECT * FROM big_table', io)
290
+ end
291
+ ```
292
+
293
+ For low-memory exports, prefer `File`/`Tempfile`/pipes as the IO target. Avoid `StringIO`
294
+ for very large result sets, because it keeps output bytes in memory.
295
+ When using `execute_stream` with EXTERNAL_LINKS CSV, streaming stats/row counts are tracked
296
+ consistently with other adapters (data rows only, header excluded).
297
+
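The "data rows only, header excluded" rule is easiest to see when CSV arrives in arbitrary chunks. The sketch below is not the adapter's code: `CsvDataRowCounter` is a hypothetical helper, and it assumes no quoted field contains an embedded newline, so that complete lines can be counted directly.

```ruby
# Hypothetical counter: skips one header line, counts the remaining non-blank lines.
class CsvDataRowCounter
  attr_reader :rows

  def initialize
    @buffer = +''
    @header_seen = false
    @rows = 0
  end

  # Feed a chunk; a trailing partial line is held back until it is completed.
  def <<(chunk)
    @buffer << chunk
    complete, newline, rest = @buffer.rpartition("\n")
    return self if newline.empty?

    @buffer = +rest
    count_lines(complete)
    self
  end

  # Count whatever is left in the buffer and return the total.
  def finish
    count_lines(@buffer) unless @buffer.empty?
    @buffer = +''
    @rows
  end

  private

  def count_lines(text)
    text.each_line do |line|
      next if line.strip.empty?

      if @header_seen
        @rows += 1
      else
        @header_seen = true
      end
    end
  end
end
```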
298
+ ### Migration note
299
+
300
+ Databricks now requires explicit `auth_mode`.
301
+
302
+ - Existing service-principal setups should set `auth_mode: 'oauth_m2m'`
303
+ - U2M setups should set `auth_mode: 'oauth_u2m'` and provide `oauth_redirect_uri`
304
+
196
305
  ## MySQL Adapter
197
306
 
198
307
  The MySQL adapter uses the `mysql2` gem. Note that MySQL's concept of "database" maps to "schema" in DWH.
@@ -363,6 +363,49 @@ end
363
363
 
364
364
  ## Advanced Features
365
365
 
366
+ ### Identity-Bound Token Store Integration
367
+
368
+ Adapters that use OAuth/M2M token exchange can support host-managed persistence by
369
+ accepting `token_store` in adapter config and using base helpers from `Adapter`.
370
+
371
+ `token_store` should be identity-bound by the host app (for example datasource-bound
372
+ for service accounts, user+datasource-bound for per-user OAuth). The adapter should
373
+ not parse or infer identity keys.
374
+
375
+ ```ruby
376
+ class MyOAuthAdapter < Adapter
377
+ config :token_store, Object, required: false, default: nil
378
+
379
+ def access_token
380
+ payload = load_tokens_from_store
381
+ apply_token_payload(payload) if payload
382
+ return @access_token if @access_token && !token_expired?
383
+
384
+ token = request_new_token!
385
+ store_tokens_in_store(
386
+ access_token: token[:access_token],
387
+ refresh_token: token[:refresh_token],
388
+ expires_at: token[:expires_at]
389
+ )
390
+ @access_token
391
+ end
392
+ end
393
+ ```
394
+
395
+ Expected token-store methods:
396
+
397
+ - `load` -> returns `nil` or hash with `access_token`, optional `refresh_token`, and `expires_at`
398
+ - `store(token_hash)` -> persists latest token payload
399
+ - `delete` -> optional revoke/cleanup hook for terminal auth failures
400
+
401
+ For OAuth adapters, keep token persistence centralized and only override provider hooks:
402
+
403
+ - `oauth_supports_authorization_code_flow?` -> `true` for U2M flows
404
+ - `oauth_supports_client_credentials_flow?` -> `true` for M2M flows
405
+ - `oauth_client_credentials_params` -> provider-specific client-credentials form body
406
+ - `oauth_tokenization_url` -> provider token endpoint
407
+ - `oauth_token_expiry_leeway_seconds` -> eager refresh buffer (for near-expiry tokens)
408
+
366
409
  ### Error Handling
367
410
 
368
411
  ```ruby
data/docs/guides/usage.md CHANGED
@@ -341,6 +341,71 @@ readonly_analytics = DWH.create(:sqlite, {
341
341
  })
342
342
  ```
343
343
 
344
+ ## Identity-Bound Token Stores
345
+
346
+ For OAuth and M2M adapters, DWH can optionally reuse tokens through a host-provided
347
+ `token_store` object. The store should be identity-bound before it is passed into
348
+ adapter config so DWH remains agnostic of app-level user/datasource models.
349
+
350
+ ### Token Store Contract
351
+
352
+ `DWH::TokenStore` is the reference contract, but duck-typed objects are supported.
353
+
354
+ ```ruby
355
+ class MyTokenStore < DWH::TokenStore
356
+ def load
357
+ # return nil or hash with:
358
+ # access_token (optional), refresh_token (optional), expires_at (optional)
359
+ #
360
+ # Notes:
361
+ # - providing expires_at enables proactive refresh/mint behavior
362
+ # - providing refresh_token enables refresh when access_token expires
363
+ end
364
+
365
+ def store(token)
366
+ # token includes at least access_token and expires_at
367
+ end
368
+
369
+ def delete
370
+ # optional cleanup/revoke path for terminal auth failures
371
+ end
372
+ end
373
+ ```
374
+
375
+ ### Service Account (M2M) Example
376
+
377
+ ```ruby
378
+ store = DatasourceTokenStore.new(datasource_id: datasource.id)
379
+
380
+ client = DWH.create(:databricks, {
381
+ host: datasource.host,
382
+ auth_mode: 'oauth_m2m',
383
+ warehouse: datasource.warehouse,
384
+ oauth_client_id: datasource.oauth_client_id,
385
+ oauth_client_secret: datasource.oauth_client_secret,
386
+ token_store: store
387
+ })
388
+ ```
389
+
390
+ ### Per-User OAuth Example (host-owned)
391
+
392
+ ```ruby
393
+ store = UserDatasourceTokenStore.new(
394
+ user_id: current_user.id,
395
+ datasource_id: datasource.id
396
+ )
397
+
398
+ client = DWH.create(:snowflake, {
399
+ auth_mode: 'oauth',
400
+ account_identifier: datasource.account_identifier,
401
+ database: datasource.database,
402
+ oauth_client_id: datasource.oauth_client_id,
403
+ oauth_client_secret: datasource.oauth_client_secret,
404
+ oauth_redirect_uri: datasource.oauth_redirect_uri,
405
+ token_store: store
406
+ })
407
+ ```
408
+
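Since duck-typed stores are supported, a store only needs `load`, `store`, and `delete`. A minimal in-memory sketch, useful mainly for tests or single-process scripts; `InMemoryTokenStore` is a hypothetical name, and a host would create one instance per identity (for example per datasource, or per user and datasource):

```ruby
# Hypothetical duck-typed token store holding one identity's tokens in memory.
class InMemoryTokenStore
  KEPT_KEYS = %i[access_token refresh_token expires_at].freeze

  def initialize
    @mutex = Mutex.new
    @token = nil
  end

  # Returns nil or a copy of the stored token hash.
  def load
    @mutex.synchronize { @token&.dup }
  end

  # Keeps only the fields named in the contract; extra keys are dropped.
  def store(token)
    @mutex.synchronize { @token = token.to_h.slice(*KEPT_KEYS) }
    nil
  end

  # Cleanup hook for terminal auth failures.
  def delete
    @mutex.synchronize { @token = nil }
    nil
  end
end
```

An instance would be passed as `token_store:` in the adapter config, in place of `DatasourceTokenStore` or `UserDatasourceTokenStore` in the examples above.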
344
409
  ## Error Handling and Debugging
345
410
 
346
411
  ### Comprehensive Error Handling
@@ -1,11 +1,14 @@
1
1
  require 'csv'
2
- require 'base64'
2
+ require_relative 'open_authorizable'
3
3
 
4
4
  module DWH
5
5
  module Adapters
6
6
  # Databricks adapter for executing SQL queries against Databricks SQL warehouses.
7
7
  #
8
- # Supports OAuth M2M (service principal) authentication only.
8
+ # Supports OAuth M2M (service principal) and U2M (authorization code) flows.
9
+ # The host application must set auth_mode explicitly:
10
+ # - oauth_m2m: client_credentials flow
11
+ # - oauth_u2m: authorization_code + PKCE flow
9
12
  #
10
13
  # @example Connection with OAuth (service principal)
11
14
  # DWH.create(:databricks, {
@@ -17,7 +20,15 @@ module DWH
17
20
  # schema: 'default'
18
21
  # })
19
22
  class Databricks < Adapter
23
+ include OpenAuthorizable
24
+
25
+ oauth_with authorize: ->(adapter) { "https://#{adapter.host}/oidc/v1/authorize" },
26
+ tokenize: ->(adapter) { "https://#{adapter.host}/oidc/v1/token" },
27
+ default_scope: 'all-apis'
28
+
20
29
  config :host, String, required: true, message: 'Databricks workspace host (e.g., adb-xxx.databricks.cloud.com)'
30
+ config :auth_mode, String, required: true, allowed: %w[oauth_m2m oauth_u2m],
31
+ message: 'Authentication mode: oauth_m2m or oauth_u2m'
21
32
  config :oauth_client_id, String, required: true, message: 'OAuth client ID (service principal application ID)'
22
33
  config :oauth_client_secret, String, required: true, message: 'OAuth client secret'
23
34
  config :client_name, String, required: false, default: 'Ruby DWH Gem', message: 'Client name sent to Databricks'
@@ -33,7 +44,7 @@ module DWH
33
44
 
34
45
  def initialize(config)
35
46
  super
36
- validate_auth_config
47
+ validate_oauth_config
37
48
  end
38
49
 
39
50
  def connection
@@ -44,7 +55,7 @@ module DWH
44
55
  url: "https://#{workspace_host}",
45
56
  headers: {
46
57
  'Content-Type' => 'application/json',
47
- 'Authorization' => "Bearer #{auth_token}",
58
+ 'Authorization' => "Bearer #{oauth_access_token}",
48
59
  'User-Agent' => config[:client_name]
49
60
  },
50
61
  request: {
@@ -67,7 +78,7 @@ module DWH
67
78
  def execute(sql, format: :array, retries: 0)
68
79
  result = with_retry(retries + 1) do
69
80
  with_debug(sql) do
70
- response = submit_query(sql)
81
+ response = submit_query_for_execute(sql)
71
82
  fetch_data(handle_query_response(response))
72
83
  end
73
84
  end
@@ -78,7 +89,7 @@ module DWH
78
89
  def execute_stream(sql, io, stats: nil, retries: 0)
79
90
  with_retry(retries) do
80
91
  with_debug(sql) do
81
- response = submit_query(sql)
92
+ response = submit_query_for_execute_stream(sql)
82
93
  fetch_data(handle_query_response(response), io: io, stats: stats)
83
94
  end
84
95
  end
@@ -92,7 +103,7 @@ module DWH
92
103
  # @yield [chunk] yields each chunk of data as it's processed
93
104
  def stream(sql, &block)
94
105
  with_debug(sql) do
95
- response = submit_query(sql)
106
+ response = submit_query_for_execute(sql)
96
107
  fetch_data(handle_query_response(response), proc: block)
97
108
  end
98
109
  end
@@ -159,42 +170,14 @@ module DWH
159
170
 
160
171
  private
161
172
 
162
- def validate_auth_config
163
- raise ConfigError, 'oauth_client_id is required' unless config[:oauth_client_id]
164
- raise ConfigError, 'oauth_client_secret is required' unless config[:oauth_client_secret]
165
- end
166
-
167
- def auth_token
168
- return @oauth_access_token if @oauth_access_token && !token_expired?
169
-
170
- request_oauth_access_token!
171
- @oauth_access_token
172
- end
173
-
174
- def request_oauth_access_token!
175
- credentials = Base64.strict_encode64("#{config[:oauth_client_id]}:#{config[:oauth_client_secret]}")
176
- response = Faraday.post(
177
- "https://#{workspace_host}/oidc/v1/token",
178
- 'grant_type=client_credentials&scope=all-apis',
179
- 'Authorization' => "Basic #{credentials}",
180
- 'Content-Type' => 'application/x-www-form-urlencoded'
181
- )
182
-
183
- raise AuthenticationError, "OAuth M2M token request failed (#{response.status}): #{response.body}" unless response.status == 200
184
-
185
- data = JSON.parse(response.body)
186
- @oauth_access_token = data['access_token']
187
- expires_in = data['expires_in'] || 3600
188
- @token_expires_at = Time.now + [expires_in - 60, 60].max
189
- end
190
-
191
173
  def reset_connection
192
174
  @oauth_access_token = nil
175
+ @oauth_refresh_token = nil
193
176
  @token_expires_at = nil
194
177
  close
195
178
  end
196
179
 
197
- def submit_query(sql)
180
+ def submit_query(sql, disposition: 'INLINE', format: 'JSON_ARRAY')
198
181
  connection.post(STATEMENTS_API) do |req|
199
182
  req.body = {
200
183
  statement: sql,
@@ -203,12 +186,20 @@ module DWH
203
186
  schema: config[:schema],
204
187
  wait_timeout: '30s',
205
188
  on_wait_timeout: 'CONTINUE',
206
- format: 'JSON_ARRAY',
207
- disposition: 'INLINE'
189
+ format:,
190
+ disposition:
208
191
  }.compact.merge(extra_query_params).to_json
209
192
  end
210
193
  end
211
194
 
195
+ def submit_query_for_execute(sql)
196
+ submit_query(sql, disposition: 'INLINE', format: 'JSON_ARRAY')
197
+ end
198
+
199
+ def submit_query_for_execute_stream(sql)
200
+ submit_query(sql, disposition: 'EXTERNAL_LINKS', format: 'CSV')
201
+ end
202
+
212
203
  def handle_query_response(response)
213
204
  body = JSON.parse(response.body)
214
205
 
@@ -250,6 +241,8 @@ module DWH
250
241
  end
251
242
 
252
243
  def fetch_data(result, io: nil, stats: nil, proc: nil)
244
+ return fetch_external_links_data(result, io:, stats:, proc:) if result.dig('result', 'external_links')
245
+
253
246
  columns = result.dig('manifest', 'schema', 'columns')&.map { |col| col['name'] } || []
254
247
  chunks = result.dig('manifest', 'chunks') || []
255
248
  collector = {
@@ -279,6 +272,60 @@ module DWH
279
272
  collector
280
273
  end
281
274
 
275
+ def fetch_external_links_data(result, io:, proc:, stats: nil)
276
+ if io.nil?
277
+ raise UnsupportedCapability,
278
+ "Databricks EXTERNAL_LINKS is supported only for execute_stream. Use result_format: 'CSV' with execute_stream."
279
+ end
280
+ raise UnsupportedCapability, 'Databricks EXTERNAL_LINKS does not support stream/yield. Use execute_stream.' if proc
281
+
282
+ csv_buffer = +''
283
+ header_skipped = false
284
+ current = result
285
+ loop do
286
+ links = current.dig('result', 'external_links') || []
287
+ links.each do |link|
288
+ url = link['external_link']
289
+ raise ExecutionError, 'Databricks external link missing external_link URL' if url.to_s.strip.empty?
290
+
291
+ response = external_link_http_client.get(url)
292
+ raise ExecutionError, "Failed to download Databricks external link: #{response.status}" unless response.status == 200
293
+
294
+ body = response.body.to_s
295
+ io << body
296
+ header_skipped = append_csv_stats(stats, body, csv_buffer, header_skipped:) if stats
297
+ end
298
+
299
+ next_chunk_internal_link = links.first&.dig('next_chunk_internal_link')
300
+ break if next_chunk_internal_link.to_s.strip.empty?
301
+
302
+ current = JSON.parse(connection.get(next_chunk_internal_link).body)
303
+ end
304
+
305
+ io.rewind
306
+ { columns: [], data: [], io: io, stats: stats, wrote_header: true }
307
+ end
308
+
309
+ def append_csv_stats(stats, chunk, csv_buffer, header_skipped:)
310
+ return if stats.nil?
311
+
312
+ csv_buffer << chunk
313
+ rows = CSV.parse(csv_buffer, skip_blanks: true)
314
+ rows.each_with_index do |row, index|
315
+ if !header_skipped && index.zero?
316
+ header_skipped = true
317
+ next
318
+ end
319
+
320
+ stats << row
321
+ end
322
+ csv_buffer.clear
323
+ header_skipped
324
+ rescue CSV::MalformedCSVError
325
+ logger.debug("Unparseable:\n #{chunk}")
326
+ header_skipped
327
+ end
328
+
282
329
  def write_data(data, collector, io = nil, stats = nil, proc = nil)
283
330
  if io
284
331
  unless collector[:wrote_header]
@@ -321,7 +368,38 @@ module DWH
321
368
  end
322
369
 
323
370
  def workspace_host
324
- config[:host].to_s.gsub(%r{\Ahttps?://}, '').gsub(%r{/+\z}, '')
371
+ config[:host].to_s
372
+ end
373
+
374
+ def external_link_http_client
375
+ @external_link_http_client ||= Faraday.new
376
+ end
377
+
378
+ def oauth_supports_authorization_code_flow?
379
+ auth_mode == 'oauth_u2m'
380
+ end
381
+
382
+ def oauth_supports_client_credentials_flow?
383
+ auth_mode == 'oauth_m2m'
384
+ end
385
+
386
+ def oauth_redirect_uri_required?
387
+ oauth_supports_authorization_code_flow?
388
+ end
389
+
390
+ def oauth_client_credentials_params
391
+ {
392
+ grant_type: 'client_credentials',
393
+ scope: 'all-apis'
394
+ }
395
+ end
396
+
397
+ def oauth_token_expiry_leeway_seconds
398
+ 30
399
+ end
400
+
401
+ def oauth_uses_pkce?
402
+ oauth_supports_authorization_code_flow?
325
403
  end
326
404
  end
327
405
  end
@@ -32,7 +32,7 @@ module DWH
32
32
  ducked_config[key.to_s] = val
33
33
  end
34
34
  end
35
- @db = DuckDB::Database.open(config[:file], ducked_config)
35
+ @db = DuckDB::Database.open(config[:file], config: ducked_config)
36
36
  self.class.databases[config[:file]] = @db
37
37
  end
38
38
 
@@ -1,5 +1,7 @@
1
1
  require 'base64'
2
2
  require 'securerandom'
3
+ require 'digest'
4
+ require_relative 'token_manageable'
3
5
 
4
6
  module DWH
5
7
  module Adapters
@@ -26,7 +28,7 @@ module DWH
26
28
  module OpenAuthorizable
27
29
  # rubcop:disable Style/DocumentationModule
28
30
  module ClassMethods
29
- def oauth_with(authorize:, tokenize:, default_scope: 'refresh_token')
31
+ def oauth_with(authorize: nil, tokenize: nil, default_scope: 'refresh_token')
30
32
  @oauth_settings = { authorize: authorize, tokenize: tokenize, default_scope: default_scope }
31
33
  end
32
34
 
@@ -39,14 +41,18 @@ module DWH
39
41
 
40
42
  def self.included(base)
41
43
  base.extend(ClassMethods)
44
+ base.include(TokenManageable)
42
45
  base.config :oauth_client_id, String, required: false, message: 'OAuth client_id'
43
46
  base.config :oauth_client_secret, String, required: false, message: 'OAuth client_secret'
44
47
  base.config :oauth_redirect_uri, String, required: false, message: 'OAuth redirect_uri'
45
- base.config :oauth_scope, String, required: false, message: 'OAuth redirect_url'
48
+ base.config :oauth_scope, String, required: false, message: 'OAuth scope'
46
49
  end
47
50
 
48
51
  # Generate authorization URL for user to visit
49
52
  def authorization_url(state: SecureRandom.hex(16), scope: nil)
53
+ raise UnsupportedCapability, "#{adapter_name} does not support authorization-code OAuth flow" unless oauth_supports_authorization_code_flow?
54
+
55
+ code_verifier = oauth_pkce_code_verifier_for_session
50
56
  params = {
51
57
  'response_type' => 'code',
52
58
  'client_id' => oauth_client_id,
@@ -54,6 +60,7 @@ module DWH
54
60
  'state' => state,
55
61
  'scope' => scope || oauth_scope || oauth_settings[:default_scope]
56
62
  }.compact
63
+ params.merge!(oauth_pkce_authorization_params(code_verifier))
57
64
 
58
65
  uri = URI(oauth_settings[:authorize])
59
66
  uri.query = URI.encode_www_form(params)
@@ -66,7 +73,7 @@ module DWH
66
73
  #
67
74
  # @param access_token [String] the access token
68
75
  # @param refresh_token [String] optional refresh token
69
- def apply_oauth_tokens(access_token:, refresh_token: nil, expires_at: nil)
76
+ def apply_oauth_tokens(access_token: nil, refresh_token: nil, expires_at: nil)
70
77
  @oauth_access_token = access_token
71
78
  @oauth_refresh_token = refresh_token
72
79
  @token_expires_at = expires_at
@@ -77,11 +84,15 @@ module DWH
77
84
  # @param authorization_code [String] this code should come from
78
85
  # the redirect that is captured from the #authorization_url
79
86
  def generate_oauth_tokens(authorization_code)
87
+ raise UnsupportedCapability, "#{adapter_name} does not support authorization-code OAuth flow" unless oauth_supports_authorization_code_flow?
88
+
89
+ code_verifier = oauth_pkce_code_verifier_for_session
80
90
  params = {
81
91
  grant_type: 'authorization_code',
82
92
  code: authorization_code,
83
93
  redirect_uri: oauth_redirect_uri
84
94
  }
95
+ params.merge!(oauth_pkce_token_params(code_verifier))
85
96
 
86
97
  response = oauth_http_client.post(oauth_tokenization_url) do |req|
87
98
  req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
@@ -95,14 +106,23 @@ module DWH
95
106
  def refresh_access_token
96
107
  raise AuthenticationError, 'No refresh token available' unless @oauth_refresh_token
97
108
 
98
- params = {
99
- grant_type: 'refresh_token',
100
- refresh_token: @oauth_refresh_token
101
- }
109
+ params = oauth_refresh_token_params
110
+ response = oauth_http_client.post(oauth_tokenization_url) do |req|
111
+ req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
112
+ req.headers['Authorization'] = oauth_token_request_auth_header
113
+ req.body = URI.encode_www_form(params)
114
+ end
102
115
 
116
+ oauth_token_response(response)
117
+ end
118
+
119
+ def mint_access_token
120
+ raise UnsupportedCapability, "#{adapter_name} does not support client-credentials OAuth flow" unless oauth_supports_client_credentials_flow?
121
+
122
+ params = oauth_client_credentials_params
103
123
  response = oauth_http_client.post(oauth_tokenization_url) do |req|
104
124
  req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
105
- req.headers['Authorization'] = basic_auth_header
125
+ req.headers['Authorization'] = oauth_token_request_auth_header
106
126
  req.body = URI.encode_www_form(params)
107
127
  end
108
128
 
@@ -116,22 +136,22 @@ module DWH
116
136
  # @return [String] access token
117
137
  # @raise [AuthenticationError]
118
138
  def oauth_access_token
119
- if token_expired? && @oauth_refresh_token
120
- refresh_access_token
139
+ load_oauth_tokens_from_store! unless @oauth_access_token || @oauth_refresh_token
140
+ return @oauth_access_token if oauth_token_usable?
121
141
 
122
- # return token unless exception was raised
123
- @oauth_access_token
124
- elsif @oauth_access_token
125
- @oauth_access_token
126
- else
127
- raise AuthenticationError,
128
- 'Access token was never set. Either run the auth flow or set the tokens via apply_oauth_tokens method.'
129
- end
142
+ refresh_access_token if oauth_refresh_token_usable?
143
+ return @oauth_access_token if oauth_token_usable?
144
+
145
+ mint_access_token if oauth_supports_client_credentials_flow?
146
+ return @oauth_access_token if oauth_token_usable?
147
+
148
+ raise AuthenticationError,
149
+ 'Access token was never set. Either run the auth flow, mint via client credentials, or set tokens via apply_oauth_tokens.'
130
150
  end
131
151
 
132
152
  # Check if we have a valid access token
133
153
  def oauth_authenticated?
134
- @oauth_access_token && !token_expired?
154
+ @oauth_access_token && oauth_token_usable?
135
155
  end
136
156
 
137
157
  # Get current state of tokens
@@ -140,7 +160,7 @@ module DWH
140
160
  access_token: @oauth_access_token,
141
161
  refresh_token: @oauth_refresh_token,
142
162
  expires_at: @token_expires_at,
143
- expired: token_expired?,
163
+ expired: !oauth_token_usable?,
144
164
  authenticated: oauth_authenticated?
145
165
  }
146
166
  end
@@ -148,9 +168,11 @@ module DWH
148
168
  def validate_oauth_config
149
169
  raise ConfigError, 'Missing config: oauth_client_id. Required for OAuth.' unless config[:oauth_client_id]
150
170
  raise ConfigError, 'Missing config: oauth_client_secret. Required for OAuth.' unless config[:oauth_client_secret]
151
- raise ConfigError, 'Missing config: oauth_redirect_url. Required for OAuth.' unless config[:oauth_redirect_uri]
152
171
 
153
- oauth_settings
172
+ raise ConfigError, 'Missing config: oauth_redirect_uri. Required for OAuth.' if oauth_redirect_uri_required? && !config[:oauth_redirect_uri]
173
+
174
+ oauth_settings if oauth_supports_authorization_code_flow?
175
+ true
154
176
  end
155
177
 
156
178
  def oauth_settings
@@ -170,6 +192,75 @@ module DWH
170
192
  "Basic #{credentials}"
171
193
  end
172
194
 
195
+ def oauth_token_request_auth_header
196
+ basic_auth_header
197
+ end
198
+
199
+ def oauth_refresh_token_params
200
+ {
201
+ grant_type: 'refresh_token',
202
+ refresh_token: @oauth_refresh_token
203
+ }
204
+ end
205
+
206
+ def oauth_client_credentials_params
207
+ {
208
+ grant_type: 'client_credentials',
209
+ scope: oauth_scope || oauth_settings[:default_scope]
210
+ }.compact
211
+ end
212
+
213
+ def oauth_supports_authorization_code_flow?
214
+ true
215
+ end
216
+
217
+ def oauth_supports_client_credentials_flow?
218
+ false
219
+ end
220
+
221
+ def oauth_redirect_uri_required?
222
+ oauth_supports_authorization_code_flow?
223
+ end
224
+
225
+ def oauth_token_expiry_leeway_seconds
226
+ 0
227
+ end
228
+
229
+ # PKCE is optional and disabled by default
230
+ def oauth_uses_pkce?
231
+ false
232
+ end
233
+
234
+ def oauth_pkce_code_challenge_method
235
+ 'S256'
236
+ end
237
+
238
+ def oauth_pkce_code_verifier_for_session
239
+ return nil unless oauth_uses_pkce?
240
+
241
+ @oauth_pkce_code_verifier_for_session ||= oauth_pkce_code_verifier
242
+ end
243
+
244
+ def oauth_pkce_code_verifier
245
+ SecureRandom.urlsafe_base64(64).delete('=')
246
+ end
247
+
248
+ def oauth_token_usable?
249
+ return false unless @oauth_access_token
250
+
251
+ !token_expiring_soon?
252
+ end
253
+
254
+ def oauth_refresh_token_usable?
255
+ @oauth_refresh_token && token_expired?
256
+ end
257
+
258
+ def token_expiring_soon?(seconds = oauth_token_expiry_leeway_seconds)
259
+ return true if @token_expires_at.nil?
260
+
261
+ (Time.now + seconds) >= @token_expires_at
262
+ end
263
+
173
264
  def oauth_http_client
174
265
  @oauth_http_client ||= Faraday.new(
175
266
  headers: {
@@ -191,11 +282,20 @@ module DWH
191
282
  # Calculate expiration time
192
283
  expires_in = data['expires_in'] || 3600
193
284
  @token_expires_at = Time.now + expires_in
285
+ store_tokens_in_store(
286
+ access_token: @oauth_access_token,
287
+ refresh_token: @oauth_refresh_token,
288
+ expires_at: @token_expires_at,
289
+ token_type: data['token_type'],
290
+ scope: data['scope'],
291
+ raw: data
292
+ )
194
293
 
195
294
  { success: true, data: data }
196
295
  else
197
296
  error_data = parse_error_response(response)
198
297
  if error_data['error'] == 'invalid_grant' && @oauth_refresh_token
298
+ delete_tokens_from_store
199
299
  raise TokenExpiredError, "Potentially expired refresh token. #{error_data['message']}"
200
300
  end
201
301
 
@@ -205,11 +305,48 @@ module DWH
205
305
 
206
306
  private
207
307
 
308
+ def load_oauth_tokens_from_store!
309
+ payload = load_tokens_from_store
310
+ return unless payload
311
+
312
+ apply_oauth_tokens(
313
+ access_token: payload[:access_token],
314
+ refresh_token: payload[:refresh_token],
315
+ expires_at: payload[:expires_at]
316
+ )
317
+ end
318
+
208
319
  def parse_error_response(response)
209
320
  JSON.parse(response.body)
210
321
  rescue JSON::ParserError
211
322
  { 'error' => 'unknown', 'message' => response.body }
212
323
  end
324
+
325
+ def oauth_pkce_authorization_params(code_verifier)
326
+ return {} unless oauth_uses_pkce?
327
+
328
+ {
329
+ 'code_challenge' => oauth_pkce_code_challenge(code_verifier),
330
+ 'code_challenge_method' => oauth_pkce_code_challenge_method
331
+ }
332
+ end
333
+
334
+ def oauth_pkce_token_params(code_verifier)
335
+ return {} unless oauth_uses_pkce?
336
+
337
+ { code_verifier: code_verifier }
338
+ end
339
+
340
+ def oauth_pkce_code_challenge(code_verifier)
341
+ case oauth_pkce_code_challenge_method
342
+ when 'S256'
343
+ Base64.urlsafe_encode64(Digest::SHA256.digest(code_verifier), padding: false)
344
+ when 'plain'
345
+ code_verifier
346
+ else
347
+ raise ConfigError, "Unsupported PKCE code challenge method: #{oauth_pkce_code_challenge_method}"
348
+ end
349
+ end
213
350
  end
214
351
  end
215
352
  end
@@ -39,7 +39,7 @@ module DWH
39
39
  # account_identifier: 'myorg-myaccount.us-east-1',
40
40
  # oauth_client_id: '<YOUR_CLIENT_ID>',
41
41
  # oauth_client_secret: '<YOUR_CLIENT_SECRET>',
42
- # oauth_redirect_url: 'https://localhost:3030/some/path',
42
+ # oauth_redirect_uri: 'https://localhost:3030/some/path',
43
43
  # database: 'ANALYTICS'
44
44
  # })
45
45
  #
@@ -0,0 +1,81 @@
1
+ require 'time'
2
+
3
+ module DWH
4
+ module Adapters
5
+ # TokenManageable holds the logic to load, store and delete tokens from the token store.
6
+ module TokenManageable
7
+ def token_store
8
+ config[:token_store]
9
+ end
10
+
11
+ def load_tokens_from_store
12
+ return nil unless token_store.respond_to?(:load)
13
+
14
+ payload = token_store.load
15
+ normalize_token_payload(payload)
16
+ rescue StandardError => e
17
+ logger.warn("Failed loading token from token_store: #{e.message}")
18
+ nil
19
+ end
20
+
21
+ def store_tokens_in_store(token_payload)
22
+ return unless token_store.respond_to?(:store)
23
+
24
+ token_store.store(normalize_token_payload_for_store(token_payload))
25
+ rescue StandardError => e
26
+ logger.warn("Failed storing token in token_store: #{e.message}")
27
+ end
28
+
29
+ def delete_tokens_from_store
30
+ return unless token_store.respond_to?(:delete)
31
+
32
+ token_store.delete
33
+ rescue StandardError => e
34
+ logger.warn("Failed deleting token from token_store: #{e.message}")
35
+ end
36
+
37
+ private
38
+
39
+ def normalize_token_payload(payload)
40
+ return nil unless payload.is_a?(Hash)
41
+
42
+ data = payload.transform_keys(&:to_sym)
43
+ access_token = data[:access_token]
44
+ access_token = nil if access_token.to_s.strip == ''
45
+
46
+ refresh_token = data[:refresh_token]
47
+ refresh_token = nil if refresh_token.respond_to?(:empty?) && refresh_token.empty?
48
+ return nil if access_token.nil? && refresh_token.nil?
49
+
50
+ {
51
+ access_token: access_token&.to_s,
52
+ refresh_token: refresh_token,
53
+ expires_at: parse_token_expiry(data[:expires_at])
54
+ }
55
+ end
56
+
57
+ def normalize_token_payload_for_store(payload)
58
+ data = payload.transform_keys(&:to_sym)
59
+ cleaned = {
60
+ access_token: data[:access_token]&.to_s,
61
+ refresh_token: data[:refresh_token],
62
+ expires_at: parse_token_expiry(data[:expires_at]),
63
+ token_type: data[:token_type],
64
+ scope: data[:scope],
65
+ issued_at: parse_token_expiry(data[:issued_at]),
66
+ raw: data[:raw]
67
+ }
68
+ cleaned.reject { |_k, v| v.nil? }
69
+ end
70
+
71
+ def parse_token_expiry(value)
72
+ return value if value.is_a?(Time)
73
+ return nil if value.nil? || value.to_s.strip == ''
74
+
75
+ Time.parse(value.to_s)
76
+ rescue StandardError
77
+ nil
78
+ end
79
+ end
80
+ end
81
+ end
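
The expiry handling above combines lenient parsing of `expires_at` with a leeway window before the token is treated as unusable. A rough standalone sketch of that behaviour, assuming hypothetical helper names (`parse_expiry`, `expiring_soon?`) and an injectable `now` that the gem itself does not expose:

```ruby
require 'time'

# Hypothetical standalone version of parse_token_expiry: accepts Time, String, or nil.
def parse_expiry(value)
  return value if value.is_a?(Time)
  return nil if value.nil? || value.to_s.strip.empty?

  Time.parse(value.to_s)
rescue ArgumentError
  nil
end

# Hypothetical leeway check: a token with unknown expiry is treated as expiring.
def expiring_soon?(expires_at, leeway_seconds, now: Time.now)
  return true if expires_at.nil?

  (now + leeway_seconds) >= expires_at
end

now = Time.utc(2026, 1, 1, 12, 0, 0)
expires_at = parse_expiry('2026-01-01T12:05:00Z')
puts expiring_soon?(expires_at, 60, now: now)   # compares 12:01 against 12:05
puts expiring_soon?(expires_at, 600, now: now)  # compares 12:10 against 12:05
```

Treating a missing expiry as "expiring" errs on the side of refreshing, which is the same choice `token_expiring_soon?` makes in the diff.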
data/lib/dwh/adapters.rb CHANGED
@@ -79,6 +79,13 @@ module DWH
79
79
  # @return [Hash] the actual instance configuration
80
80
  attr_reader :config
81
81
 
82
+ # Optional host-implemented token store for OAuth token reuse.
83
+ # It should implement the following methods:
84
+ # - load -> Hash|nil
85
+ # - store(token_hash)
86
+ # - delete
87
+ config :token_store, Object, required: false, default: nil, message: 'Token store instance implementing load/store/delete'
88
+
82
89
  def initialize(config)
83
90
  @config = config.transform_keys(&:to_sym)
84
91
  # Per instance customization of general settings
@@ -0,0 +1,24 @@
1
+ module DWH
2
+ # Optional contract for host applications that want token persistence.
3
+ #
4
+ # The store instance should be identity-bound before it is passed into
5
+ # adapter config so the adapter remains unaware of user/datasource identity.
6
+ #
7
+ # This class is intentionally minimal and can be subclassed or duck-typed.
8
+ class TokenStore
9
+ # @return [Hash,nil] token payload
10
+ def load
11
+ raise NotImplementedError, "#{self.class} must implement ##{__method__}"
12
+ end
13
+
14
+ # @param token [Hash] normalized payload with at least access_token and expires_at
15
+ def store(_token)
16
+ raise NotImplementedError, "#{self.class} must implement ##{__method__}"
17
+ end
18
+
19
+ # Remove/revoke persisted token state.
20
+ def delete
21
+ raise NotImplementedError, "#{self.class} must implement ##{__method__}"
22
+ end
23
+ end
24
+ end
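
Since the contract above can be duck-typed, a host application only needs an object that responds to `load`, `store`, and `delete`. A minimal in-memory sketch, assuming a hypothetical `InMemoryTokenStore` class that is not shipped with the gem:

```ruby
# Hypothetical duck-typed store; a real host app would persist per user/datasource.
class InMemoryTokenStore
  def initialize
    @payload = nil
  end

  # Returns the last stored payload Hash, or nil when nothing is stored.
  def load
    @payload
  end

  # Keeps a shallow copy so later mutation by the caller does not leak in.
  def store(token)
    @payload = token.dup
  end

  # Drops any persisted token state.
  def delete
    @payload = nil
  end
end

store = InMemoryTokenStore.new
store.store(access_token: 'abc', refresh_token: 'def', expires_at: Time.now + 3600)
puts store.load[:access_token]
store.delete
puts store.load.inspect
```

Based on the `config :token_store` declaration in the diff, such an instance would presumably be passed as `token_store:` in the adapter configuration; the exact wiring per adapter is not shown in this section.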
data/lib/dwh/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module DWH
4
- VERSION = '0.3.0'
4
+ VERSION = '0.4.2'
5
5
  end
data/lib/dwh.rb CHANGED
@@ -5,6 +5,7 @@ require_relative 'dwh/errors'
5
5
  require_relative 'dwh/logger'
6
6
  require_relative 'dwh/streaming_stats'
7
7
  require_relative 'dwh/factory'
8
+ require_relative 'dwh/token_store'
8
9
  require_relative 'dwh/adapters'
9
10
  require_relative 'dwh/table'
10
11
  require_relative 'dwh/table_stats'
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dwh
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ajo Abraham
@@ -164,6 +164,7 @@ files:
164
164
  - lib/dwh/adapters/snowflake.rb
165
165
  - lib/dwh/adapters/sql_server.rb
166
166
  - lib/dwh/adapters/sqlite.rb
167
+ - lib/dwh/adapters/token_manageable.rb
167
168
  - lib/dwh/adapters/trino.rb
168
169
  - lib/dwh/behaviors.rb
169
170
  - lib/dwh/capabilities.rb
@@ -192,6 +193,7 @@ files:
192
193
  - lib/dwh/streaming_stats.rb
193
194
  - lib/dwh/table.rb
194
195
  - lib/dwh/table_stats.rb
196
+ - lib/dwh/token_store.rb
195
197
  - lib/dwh/version.rb
196
198
  - sig/dwh.rbs
197
199
  homepage: https://www.strata.site