dwh 0.2.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +26 -0
- data/docs/guides/adapters.md +84 -2
- data/docs/guides/creating-adapters.md +43 -0
- data/docs/guides/usage.md +65 -0
- data/lib/dwh/adapters/athena.rb +8 -1
- data/lib/dwh/adapters/databricks.rb +338 -0
- data/lib/dwh/adapters/duck_db.rb +7 -1
- data/lib/dwh/adapters/my_sql.rb +7 -1
- data/lib/dwh/adapters/open_authorizable.rb +159 -22
- data/lib/dwh/adapters/postgres.rb +7 -1
- data/lib/dwh/adapters/snowflake.rb +1 -1
- data/lib/dwh/adapters/sql_server.rb +7 -1
- data/lib/dwh/adapters/token_manageable.rb +81 -0
- data/lib/dwh/adapters/trino.rb +7 -1
- data/lib/dwh/adapters.rb +7 -0
- data/lib/dwh/settings/databricks.yml +1 -2
- data/lib/dwh/token_store.rb +24 -0
- data/lib/dwh/version.rb +1 -1
- data/lib/dwh.rb +3 -0
- metadata +4 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: e782fe9e0167d10f1672d0ce2b1601445cd0484c2e9b68b46df11be0cd0f20ca
|
|
4
|
+
data.tar.gz: 795f5cb0173413e2a475b216f824aed8b0de975a4abb39e30fa27f7cccb9f779
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f42b78511e879191933ff87b4441041b63541413c37f3f308a4854564f278ef4e8ff91e7547a573f03104e37e4fb068e187ee5d3cbab9fe834dbf129b27f68ac
|
|
7
|
+
data.tar.gz: ef6f624c3e0f7f2dfd9deff755af97d28798aed6c8c8ddea474899fbb02f079bfc4a80772ddb2048e9d0777da4f5a508a51ede9669159082758f8019e523bb08
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,31 @@
|
|
|
1
1
|
## [Unreleased]
|
|
2
2
|
|
|
3
|
+
## [0.4.0] - 2026-04-28
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
|
|
7
|
+
- Token persistence interface via `DWH::TokenStore` for adapters that support OAuth token lifecycle management.
|
|
8
|
+
- `TokenManageable` adapter concern for standardized token read/write behavior across adapters.
|
|
9
|
+
- PKCE-based U2M OAuth support for the Databricks adapter.
|
|
10
|
+
- Expanded tests for OAuth and Databricks token flows.
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
|
|
14
|
+
- Databricks adapter now requires explicit `auth_mode` to reduce ambiguous auth configuration.
|
|
15
|
+
- Updated documentation for adapter auth/token usage and adapter authoring.
|
|
16
|
+
|
|
17
|
+
## [0.3.0] - 2026-04-22
|
|
18
|
+
|
|
19
|
+
### Changed
|
|
20
|
+
|
|
21
|
+
- Added Databricks Adapter
|
|
22
|
+
|
|
23
|
+
## [0.2.1] - 2025-01-27
|
|
24
|
+
|
|
25
|
+
### Changed
|
|
26
|
+
|
|
27
|
+
- **Adapter missing-gem error messages** (Athena, DuckDB, MySQL, PostgreSQL, SQL Server, Trino): replace platform-specific system library install instructions with links to official documentation. Messages now include `gem install` and a single link for system libraries.
|
|
28
|
+
|
|
3
29
|
## [0.2.0] - 2025-10-12
|
|
4
30
|
|
|
5
31
|
### Added
|
data/docs/guides/adapters.md
CHANGED
|
@@ -179,7 +179,7 @@ adapter = DWH.create(:snowflake, {
|
|
|
179
179
|
account_identifier: 'myorg-myaccount.us-east-1',
|
|
180
180
|
oauth_client_id: '<YOUR_CLIENT_ID>',
|
|
181
181
|
oauth_client_secret: '<YOUR_CLIENT_SECRET>',
|
|
182
|
-
|
|
182
|
+
oauth_redirect_uri: 'https://localhost:3030/some/path',
|
|
183
183
|
database: 'ANALYTICS',
|
|
184
184
|
client_name: 'myapp' # sent as user agent header value
|
|
185
185
|
})
|
|
@@ -189,10 +189,92 @@ To successfully use OAuth you have to pass the adapter valid access and refresh
|
|
|
189
189
|
|
|
190
190
|
The typical flow is like so:
|
|
191
191
|
|
|
192
|
-
1. Generate an authorization code by visiting the url generated by `adapter.authorization_url.` This will redirect to the configured `
|
|
192
|
+
1. Generate an authorization code by visiting the URL generated by `adapter.authorization_url`. This will redirect to the configured `oauth_redirect_uri`. You must be able to retrieve the `code` from there.
|
|
193
193
|
2. Take the code from above and generate new access tokens: `adapter.generate_oauth_tokens(code)`. This returns a Hash with `access_token` and `refresh_token`. You can cache and reuse these until the refresh token expires. This method also applies the tokens to the current adapter instance.
|
|
194
194
|
3. You can apply an existing set of tokens like so: `adapter.apply_oauth_tokens(access_token: token, refresh_token: token, expires_at: Time.now)`
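
The first step of the flow above depends on an authorize URL that carries the client ID, redirect URI, `state`, and scope as query parameters. The following is a minimal sketch of how such a URL can be assembled with the Ruby standard library. The endpoint, the `build_authorization_url` helper, and the presence of a `redirect_uri` parameter are assumptions for illustration; the real URL comes from `adapter.authorization_url`.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical endpoint for illustration only; each adapter defines its own.
AUTHORIZE_ENDPOINT = 'https://example.invalid/oauth/authorize'.freeze

# Hypothetical helper: builds an authorization-code URL from its parts.
def build_authorization_url(client_id:, redirect_uri:, scope:, state: SecureRandom.hex(16))
  params = {
    'response_type' => 'code',
    'client_id' => client_id,
    'redirect_uri' => redirect_uri, # assumption: standard OAuth parameter
    'state' => state,
    'scope' => scope
  }.compact # drop parameters that were not supplied

  uri = URI(AUTHORIZE_ENDPOINT)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts build_authorization_url(
  client_id: 'my-client',
  redirect_uri: 'https://localhost:3030/some/path',
  scope: 'refresh_token'
)
```

The host keeps the `state` value it passed in so that it can compare it with the value returned on the redirect.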
|
|
195
195
|
|
|
196
|
+
### Host integration contract (CLI / server)
|
|
197
|
+
|
|
198
|
+
The host application is responsible for orchestration and persistence. DWH is responsible for OAuth protocol calls and token lifecycle methods.
|
|
199
|
+
|
|
200
|
+
Public OAuth methods you can call on adapters that include `OpenAuthorizable`:
|
|
201
|
+
|
|
202
|
+
- `authorization_url(state:, scope:)` - build provider authorize URL (authorization-code flow only)
|
|
203
|
+
- `generate_oauth_tokens(code)` - exchange auth code for tokens and apply/store them
|
|
204
|
+
- `apply_oauth_tokens(access_token:, refresh_token:, expires_at:)` - inject tokens from host storage
|
|
205
|
+
- `oauth_access_token` - get a usable access token (load/refresh/mint as needed)
|
|
206
|
+
- `refresh_access_token` - explicitly refresh using current refresh token
|
|
207
|
+
- `mint_access_token` - explicitly mint via client credentials (M2M only)
|
|
208
|
+
- `oauth_token_info` - inspect current token state
|
|
209
|
+
|
|
210
|
+
Host requirements:
|
|
211
|
+
|
|
212
|
+
1. Build and pass adapter config to `DWH.create(...)`, including correct `auth_mode` and OAuth fields.
|
|
213
|
+
2. For U2M, implement browser redirect + callback capture, then call `generate_oauth_tokens(code)`.
|
|
214
|
+
3. Validate OAuth `state` in the host callback handler (DWH does not enforce callback state verification).
|
|
215
|
+
4. Persist tokens either by:
|
|
216
|
+
- passing a `token_store` object (`load`, `store`, `delete`), or
|
|
217
|
+
- storing tokens externally and rehydrating with `apply_oauth_tokens`.
|
|
218
|
+
5. Handle auth exceptions and trigger reconnect UX when needed (for example, expired/invalid refresh token).
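
Requirement 3 above leaves `state` verification to the host. A minimal sketch of one way to do it follows, assuming a Hash-like session object and string-keyed callback params. `secure_equal?` and `valid_callback?` are hypothetical helper names and are not part of DWH.

```ruby
require 'securerandom'

# Hypothetical in-memory session; a real host would use its own session store.
session = {}

# Before redirecting the user, generate a random state and remember it.
session[:oauth_state] = SecureRandom.hex(16)

# Compare two strings without stopping at the first differing byte.
def secure_equal?(expected, actual)
  return false if expected.nil? || actual.nil?
  return false unless expected.bytesize == actual.bytesize

  diff = 0
  expected.bytes.zip(actual.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end

# In the callback handler, accept the request only if the returned state
# matches the stored one. The stored value is removed so it is single use.
def valid_callback?(session, params)
  expected = session.delete(:oauth_state)
  secure_equal?(expected, params['state'])
end
```

Only after `valid_callback?` returns true would the host pass `params['code']` to `adapter.generate_oauth_tokens`.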
|
|
219
|
+
|
|
220
|
+
Call order by mode:
|
|
221
|
+
|
|
222
|
+
- **M2M (`oauth_m2m`)**
|
|
223
|
+
1. `adapter = DWH.create(...)`
|
|
224
|
+
2. Run query/test methods (`execute`, `test_connection`, etc.)
|
|
225
|
+
3. DWH internally calls `oauth_access_token`, which mints or refreshes as required
|
|
226
|
+
|
|
227
|
+
- **U2M (`oauth_u2m`)**
|
|
228
|
+
1. `adapter = DWH.create(...)`
|
|
229
|
+
2. `url = adapter.authorization_url(...)`
|
|
230
|
+
3. Host sends user to URL and receives callback with `code`
|
|
231
|
+
4. `adapter.generate_oauth_tokens(code)`
|
|
232
|
+
5. Run query/test methods; DWH reuses and refreshes tokens as needed
|
|
233
|
+
|
|
234
|
+
Note: for PKCE-enabled U2M providers, call `authorization_url` and `generate_oauth_tokens` on the same adapter instance.
|
|
235
|
+
|
|
236
|
+
## Databricks
|
|
237
|
+
|
|
238
|
+
The Databricks adapter uses the SQL Statements REST API and supports OAuth with
|
|
239
|
+
both machine-to-machine (M2M) and user-to-machine (U2M) authorization-code flow.
|
|
240
|
+
Set `auth_mode` explicitly to select the flow.
|
|
241
|
+
|
|
242
|
+
### Basic configuration
|
|
243
|
+
|
|
244
|
+
```ruby
|
|
245
|
+
adapter = DWH.create(:databricks, {
|
|
246
|
+
host: 'workspace.cloud.databricks.com',
|
|
247
|
+
auth_mode: 'oauth_m2m',
|
|
248
|
+
warehouse: 'warehouse_id',
|
|
249
|
+
oauth_client_id: '<CLIENT_ID>',
|
|
250
|
+
oauth_client_secret: '<CLIENT_SECRET>',
|
|
251
|
+
catalog: 'main',
|
|
252
|
+
schema: 'default'
|
|
253
|
+
})
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
### M2M (service principal) flow
|
|
257
|
+
|
|
258
|
+
Set `auth_mode: 'oauth_m2m'`. The adapter mints access tokens using
|
|
259
|
+
`grant_type=client_credentials`.
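
As a rough illustration of what a `client_credentials` token request can look like, the sketch below builds the form body and headers with the standard library. The `client_credentials_request` helper is hypothetical, and the use of HTTP Basic client authentication is an assumption; the adapter's actual header construction is not shown in this diff.

```ruby
require 'base64'
require 'uri'

# Hypothetical helper: returns the headers and body for a token request.
def client_credentials_request(client_id:, client_secret:, scope: 'all-apis')
  basic = Base64.strict_encode64("#{client_id}:#{client_secret}")
  {
    headers: {
      'Content-Type' => 'application/x-www-form-urlencoded',
      # Assumption: the provider accepts HTTP Basic client authentication.
      'Authorization' => "Basic #{basic}"
    },
    body: URI.encode_www_form(grant_type: 'client_credentials', scope: scope)
  }
end

request = client_credentials_request(client_id: 'client', client_secret: 'secret')
puts request[:body]
```

The body is posted to the provider token endpoint, which for Databricks is configured in the adapter as `https://<host>/oidc/v1/token`.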
|
|
260
|
+
|
|
261
|
+
### U2M (authorization code) flow
|
|
262
|
+
|
|
263
|
+
Set `auth_mode: 'oauth_u2m'` and provide `oauth_redirect_uri` in config, then run:
|
|
264
|
+
|
|
265
|
+
1. Generate authorize URL from `adapter.authorization_url`
|
|
266
|
+
2. Capture `code` from redirect callback
|
|
267
|
+
3. Exchange with `adapter.generate_oauth_tokens(code)`
|
|
268
|
+
|
|
269
|
+
When U2M is active, PKCE is applied automatically by the adapter.
|
|
270
|
+
|
|
271
|
+
### Migration note
|
|
272
|
+
|
|
273
|
+
Databricks now requires explicit `auth_mode`.
|
|
274
|
+
|
|
275
|
+
- Existing service-principal setups should set `auth_mode: 'oauth_m2m'`
|
|
276
|
+
- U2M setups should set `auth_mode: 'oauth_u2m'` and provide `oauth_redirect_uri`
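
When migrating, a host may want to check its stored configuration before handing it to `DWH.create`. The following is a hypothetical host-side pre-check that mirrors the two rules above; it is not the adapter's own validation, and `validate_databricks_auth!` is an illustrative name.

```ruby
ALLOWED_AUTH_MODES = %w[oauth_m2m oauth_u2m].freeze

# Hypothetical host-side check run before creating the adapter.
def validate_databricks_auth!(config)
  mode = config[:auth_mode]

  unless ALLOWED_AUTH_MODES.include?(mode)
    raise ArgumentError, "auth_mode must be one of: #{ALLOWED_AUTH_MODES.join(', ')}"
  end

  if mode == 'oauth_u2m' && config[:oauth_redirect_uri].to_s.empty?
    raise ArgumentError, 'oauth_redirect_uri is required when auth_mode is oauth_u2m'
  end

  config
end

# A pre-0.4.0 service-principal config only needs auth_mode added.
legacy = { host: 'workspace.cloud.databricks.com', warehouse: 'warehouse_id' }
validate_databricks_auth!(legacy.merge(auth_mode: 'oauth_m2m'))
```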
|
|
277
|
+
|
|
196
278
|
## MySQL Adapter
|
|
197
279
|
|
|
198
280
|
The MySQL adapter uses the `mysql2` gem. Note that MySQL's concept of "database" maps to "schema" in DWH.
|
|
@@ -363,6 +363,49 @@ end
|
|
|
363
363
|
|
|
364
364
|
## Advanced Features
|
|
365
365
|
|
|
366
|
+
### Identity-Bound Token Store Integration
|
|
367
|
+
|
|
368
|
+
Adapters that use OAuth/M2M token exchange can support host-managed persistence by
|
|
369
|
+
accepting `token_store` in adapter config and using base helpers from `Adapter`.
|
|
370
|
+
|
|
371
|
+
`token_store` should be identity-bound by the host app (for example datasource-bound
|
|
372
|
+
for service accounts, user+datasource-bound for per-user OAuth). The adapter should
|
|
373
|
+
not parse or infer identity keys.
|
|
374
|
+
|
|
375
|
+
```ruby
|
|
376
|
+
class MyOAuthAdapter < Adapter
|
|
377
|
+
config :token_store, Object, required: false, default: nil
|
|
378
|
+
|
|
379
|
+
def access_token
|
|
380
|
+
payload = load_tokens_from_store
|
|
381
|
+
apply_token_payload(payload) if payload
|
|
382
|
+
return @access_token if @access_token && !token_expired?
|
|
383
|
+
|
|
384
|
+
token = request_new_token!
|
|
385
|
+
store_tokens_in_store(
|
|
386
|
+
access_token: token[:access_token],
|
|
387
|
+
refresh_token: token[:refresh_token],
|
|
388
|
+
expires_at: token[:expires_at]
|
|
389
|
+
)
|
|
390
|
+
@access_token
|
|
391
|
+
end
|
|
392
|
+
end
|
|
393
|
+
```
|
|
394
|
+
|
|
395
|
+
Expected token-store methods:
|
|
396
|
+
|
|
397
|
+
- `load` -> returns `nil` or hash with `access_token`, optional `refresh_token`, and `expires_at`
|
|
398
|
+
- `store(token_hash)` -> persists latest token payload
|
|
399
|
+
- `delete` -> optional revoke/cleanup hook for terminal auth failures
|
|
400
|
+
|
|
401
|
+
For OAuth adapters, keep token persistence centralized and only override provider hooks:
|
|
402
|
+
|
|
403
|
+
- `oauth_supports_authorization_code_flow?` -> `true` for U2M flows
|
|
404
|
+
- `oauth_supports_client_credentials_flow?` -> `true` for M2M flows
|
|
405
|
+
- `oauth_client_credentials_params` -> provider-specific client-credentials form body
|
|
406
|
+
- `oauth_tokenization_url` -> provider token endpoint
|
|
407
|
+
- `oauth_token_expiry_leeway_seconds` -> eager refresh buffer (for near-expiry tokens)
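
The leeway hook above describes an eager refresh buffer. A minimal sketch of how such a check might behave follows. `token_usable?` is a hypothetical helper, and treating a missing `expires_at` as usable is an assumption rather than documented DWH behavior.

```ruby
# Hypothetical helper: a token counts as usable only if it will still be
# valid after the leeway window has passed.
def token_usable?(access_token, expires_at, leeway_seconds: 30, now: Time.now)
  return false if access_token.nil? || access_token.empty?
  return true if expires_at.nil? # assumption: unknown expiry is not refreshed eagerly

  now + leeway_seconds < expires_at
end

now = Time.now
puts token_usable?('abc', now + 300, now: now) # well inside its lifetime
puts token_usable?('abc', now + 10, now: now)  # expires within the leeway window
```

With a 30-second leeway, a token that expires in 10 seconds is treated as stale, so a refresh or mint can happen before a query is sent with it.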
|
|
408
|
+
|
|
366
409
|
### Error Handling
|
|
367
410
|
|
|
368
411
|
```ruby
|
data/docs/guides/usage.md
CHANGED
|
@@ -341,6 +341,71 @@ readonly_analytics = DWH.create(:sqlite, {
|
|
|
341
341
|
})
|
|
342
342
|
```
|
|
343
343
|
|
|
344
|
+
## Identity-Bound Token Stores
|
|
345
|
+
|
|
346
|
+
For OAuth and M2M adapters, DWH can optionally reuse tokens through a host-provided
|
|
347
|
+
`token_store` object. The store should be identity-bound before it is passed into
|
|
348
|
+
adapter config so DWH remains agnostic of app-level user/datasource models.
|
|
349
|
+
|
|
350
|
+
### Token Store Contract
|
|
351
|
+
|
|
352
|
+
`DWH::TokenStore` is the reference contract, but duck-typed objects are supported.
|
|
353
|
+
|
|
354
|
+
```ruby
|
|
355
|
+
class MyTokenStore < DWH::TokenStore
|
|
356
|
+
def load
|
|
357
|
+
# return nil or hash with:
|
|
358
|
+
# access_token (optional), refresh_token (optional), expires_at (optional)
|
|
359
|
+
#
|
|
360
|
+
# Notes:
|
|
361
|
+
# - providing expires_at enables proactive refresh/mint behavior
|
|
362
|
+
# - providing refresh_token enables refresh when access_token expires
|
|
363
|
+
end
|
|
364
|
+
|
|
365
|
+
def store(token)
|
|
366
|
+
# token includes at least access_token and expires_at
|
|
367
|
+
end
|
|
368
|
+
|
|
369
|
+
def delete
|
|
370
|
+
# optional cleanup/revoke path for terminal auth failures
|
|
371
|
+
end
|
|
372
|
+
end
|
|
373
|
+
```
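
Because duck-typed objects are supported, a store does not have to inherit from `DWH::TokenStore`. The sketch below is a hypothetical in-memory store suitable for tests or single-process scripts. It assumes the token hash uses symbol keys, which this diff does not confirm.

```ruby
# Hypothetical duck-typed store: keeps one token payload in memory.
# One instance should be created per identity (for example per datasource).
class InMemoryTokenStore
  def initialize
    @mutex = Mutex.new
    @token = nil
  end

  # Returns nil when nothing is stored, otherwise a copy of the payload.
  def load
    @mutex.synchronize { @token&.dup }
  end

  # Keeps only the fields named in the contract (assumes symbol keys).
  def store(token)
    @mutex.synchronize do
      @token = token.to_h.slice(:access_token, :refresh_token, :expires_at)
    end
  end

  # Cleanup hook for terminal auth failures.
  def delete
    @mutex.synchronize { @token = nil }
  end
end

store = InMemoryTokenStore.new
store.store(access_token: 'abc', expires_at: Time.now + 3600)
puts store.load[:access_token]
```

An instance like this could be passed as `token_store:` in adapter config. A production store would persist to a database or secrets manager instead of memory.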
|
|
374
|
+
|
|
375
|
+
### Service Account (M2M) Example
|
|
376
|
+
|
|
377
|
+
```ruby
|
|
378
|
+
store = DatasourceTokenStore.new(datasource_id: datasource.id)
|
|
379
|
+
|
|
380
|
+
client = DWH.create(:databricks, {
|
|
381
|
+
host: datasource.host,
|
|
382
|
+
auth_mode: 'oauth_m2m',
|
|
383
|
+
warehouse: datasource.warehouse,
|
|
384
|
+
oauth_client_id: datasource.oauth_client_id,
|
|
385
|
+
oauth_client_secret: datasource.oauth_client_secret,
|
|
386
|
+
token_store: store
|
|
387
|
+
})
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
### Per-User OAuth Example (host-owned)
|
|
391
|
+
|
|
392
|
+
```ruby
|
|
393
|
+
store = UserDatasourceTokenStore.new(
|
|
394
|
+
user_id: current_user.id,
|
|
395
|
+
datasource_id: datasource.id
|
|
396
|
+
)
|
|
397
|
+
|
|
398
|
+
client = DWH.create(:snowflake, {
|
|
399
|
+
auth_mode: 'oauth',
|
|
400
|
+
account_identifier: datasource.account_identifier,
|
|
401
|
+
database: datasource.database,
|
|
402
|
+
oauth_client_id: datasource.oauth_client_id,
|
|
403
|
+
oauth_client_secret: datasource.oauth_client_secret,
|
|
404
|
+
oauth_redirect_uri: datasource.oauth_redirect_uri,
|
|
405
|
+
token_store: store
|
|
406
|
+
})
|
|
407
|
+
```
|
|
408
|
+
|
|
344
409
|
## Error Handling and Debugging
|
|
345
410
|
|
|
346
411
|
### Comprehensive Error Handling
|
data/lib/dwh/adapters/athena.rb
CHANGED
|
@@ -202,8 +202,15 @@ module DWH
|
|
|
202
202
|
def valid_config?
|
|
203
203
|
super
|
|
204
204
|
require 'aws-sdk-athena'
|
|
205
|
+
require 'aws-sdk-s3'
|
|
205
206
|
rescue LoadError
|
|
206
|
-
raise ConfigError,
|
|
207
|
+
raise ConfigError, <<~MSG
|
|
208
|
+
Athena adapter requires the 'aws-sdk-athena' and 'aws-sdk-s3' gems.
|
|
209
|
+
|
|
210
|
+
Install with: gem install aws-sdk-athena aws-sdk-s3
|
|
211
|
+
|
|
212
|
+
No system libraries required (pure Ruby).
|
|
213
|
+
MSG
|
|
207
214
|
end
|
|
208
215
|
|
|
209
216
|
private
|
data/lib/dwh/adapters/databricks.rb
ADDED
|
@@ -0,0 +1,338 @@
|
|
|
1
|
+
require 'csv'
|
|
2
|
+
require_relative 'open_authorizable'
|
|
3
|
+
|
|
4
|
+
module DWH
|
|
5
|
+
module Adapters
|
|
6
|
+
# Databricks adapter for executing SQL queries against Databricks SQL warehouses.
|
|
7
|
+
#
|
|
8
|
+
# Supports OAuth M2M (service principal) and U2M (authorization code) flows.
|
|
9
|
+
# The host application must set auth_mode explicitly:
|
|
10
|
+
# - oauth_m2m: client_credentials flow
|
|
11
|
+
# - oauth_u2m: authorization_code + PKCE flow
|
|
12
|
+
#
|
|
13
|
+
# @example Connection with OAuth (service principal)
|
|
14
|
+
# DWH.create(:databricks, {
|
|
15
|
+
# host: 'adb-1234567890123456.7.azuredatabricks.net',
|
|
16
|
+
# warehouse: 'abc123def456',
|
|
17
|
+
# oauth_client_id: 'service-principal-app-id',
|
|
18
|
+
# oauth_client_secret: 'your-oauth-secret-here',
|
|
19
|
+
# catalog: 'main',
|
|
20
|
+
# schema: 'default'
|
|
21
|
+
# })
|
|
22
|
+
class Databricks < Adapter
|
|
23
|
+
include OpenAuthorizable
|
|
24
|
+
|
|
25
|
+
oauth_with authorize: ->(adapter) { "https://#{adapter.host}/oidc/v1/authorize" },
|
|
26
|
+
tokenize: ->(adapter) { "https://#{adapter.host}/oidc/v1/token" },
|
|
27
|
+
default_scope: 'all-apis'
|
|
28
|
+
|
|
29
|
+
config :host, String, required: true, message: 'Databricks workspace host (e.g., adb-xxx.databricks.cloud.com)'
|
|
30
|
+
config :auth_mode, String, required: true, allowed: %w[oauth_m2m oauth_u2m],
|
|
31
|
+
message: 'Authentication mode: oauth_m2m or oauth_u2m'
|
|
32
|
+
config :oauth_client_id, String, required: true, message: 'OAuth client ID (service principal application ID)'
|
|
33
|
+
config :oauth_client_secret, String, required: true, message: 'OAuth client secret'
|
|
34
|
+
config :client_name, String, required: false, default: 'Ruby DWH Gem', message: 'Client name sent to Databricks'
|
|
35
|
+
config :query_timeout, Integer, required: false, default: 3600, message: 'Query execution timeout in seconds'
|
|
36
|
+
config :warehouse, String, required: true, message: 'Databricks SQL warehouse ID to use for query execution'
|
|
37
|
+
config :catalog, String, required: false, message: 'Default catalog (Unity Catalog)'
|
|
38
|
+
config :schema, String, required: false, message: 'Default schema'
|
|
39
|
+
|
|
40
|
+
DEFAULT_POLL_INTERVAL = 0.25
|
|
41
|
+
MAX_POLL_INTERVAL = 30
|
|
42
|
+
|
|
43
|
+
STATEMENTS_API = '/api/2.0/sql/statements'.freeze
|
|
44
|
+
|
|
45
|
+
def initialize(config)
|
|
46
|
+
super
|
|
47
|
+
validate_oauth_config
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
def connection
|
|
51
|
+
return @connection if @connection && !token_expired?
|
|
52
|
+
|
|
53
|
+
reset_connection if token_expired?
|
|
54
|
+
@connection = Faraday.new(
|
|
55
|
+
url: "https://#{workspace_host}",
|
|
56
|
+
headers: {
|
|
57
|
+
'Content-Type' => 'application/json',
|
|
58
|
+
'Authorization' => "Bearer #{oauth_access_token}",
|
|
59
|
+
'User-Agent' => config[:client_name]
|
|
60
|
+
},
|
|
61
|
+
request: {
|
|
62
|
+
timeout: config[:query_timeout]
|
|
63
|
+
}.merge(extra_connection_params)
|
|
64
|
+
)
|
|
65
|
+
end
|
|
66
|
+
|
|
67
|
+
def test_connection(raise_exception: false)
|
|
68
|
+
execute('SELECT 1')
|
|
69
|
+
true
|
|
70
|
+
rescue StandardError => e
|
|
71
|
+
raise ConnectionError, "Failed to connect to Databricks: #{e.message}" if raise_exception
|
|
72
|
+
|
|
73
|
+
logger.error "Connection test failed: #{e.message}"
|
|
74
|
+
false
|
|
75
|
+
end
|
|
76
|
+
|
|
77
|
+
# (see Adapter#execute)
|
|
78
|
+
def execute(sql, format: :array, retries: 0)
|
|
79
|
+
result = with_retry(retries + 1) do
|
|
80
|
+
with_debug(sql) do
|
|
81
|
+
response = submit_query(sql)
|
|
82
|
+
fetch_data(handle_query_response(response))
|
|
83
|
+
end
|
|
84
|
+
end
|
|
85
|
+
|
|
86
|
+
format_result(result, format)
|
|
87
|
+
end
|
|
88
|
+
|
|
89
|
+
def execute_stream(sql, io, stats: nil, retries: 0)
|
|
90
|
+
with_retry(retries) do
|
|
91
|
+
with_debug(sql) do
|
|
92
|
+
response = submit_query(sql)
|
|
93
|
+
fetch_data(handle_query_response(response), io: io, stats: stats)
|
|
94
|
+
end
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
io.rewind
|
|
98
|
+
io
|
|
99
|
+
end
|
|
100
|
+
|
|
101
|
+
# Execute SQL query and yield streamed results
|
|
102
|
+
# @param sql [String] SQL query to execute
|
|
103
|
+
# @yield [chunk] yields each chunk of data as it's processed
|
|
104
|
+
def stream(sql, &block)
|
|
105
|
+
with_debug(sql) do
|
|
106
|
+
response = submit_query(sql)
|
|
107
|
+
fetch_data(handle_query_response(response), proc: block)
|
|
108
|
+
end
|
|
109
|
+
end
|
|
110
|
+
|
|
111
|
+
def tables(**qualifiers)
|
|
112
|
+
catalog = qualifiers[:catalog] || config[:catalog]
|
|
113
|
+
schema = qualifiers[:schema] || config[:schema]
|
|
114
|
+
|
|
115
|
+
raise ConfigError, 'catalog is required for Databricks tables query' unless catalog
|
|
116
|
+
|
|
117
|
+
sql = "SELECT table_name FROM #{catalog}.information_schema.tables"
|
|
118
|
+
sql += " WHERE table_schema = '#{schema}'" if schema
|
|
119
|
+
|
|
120
|
+
result = execute(sql)
|
|
121
|
+
result.flatten
|
|
122
|
+
end
|
|
123
|
+
|
|
124
|
+
def metadata(table, **qualifiers)
|
|
125
|
+
catalog = qualifiers[:catalog] || config[:catalog]
|
|
126
|
+
schema = qualifiers[:schema] || config[:schema]
|
|
127
|
+
|
|
128
|
+
raise ConfigError, 'catalog is required for Databricks metadata query' unless catalog
|
|
129
|
+
|
|
130
|
+
db_table = Table.new(table, schema: schema, catalog: catalog)
|
|
131
|
+
|
|
132
|
+
sql = <<~SQL
|
|
133
|
+
SELECT column_name, data_type, numeric_precision, numeric_scale, character_maximum_length
|
|
134
|
+
FROM #{catalog}.information_schema.columns
|
|
135
|
+
WHERE table_name = '#{db_table.physical_name}'
|
|
136
|
+
SQL
|
|
137
|
+
sql += " AND table_schema = '#{db_table.schema}'" if db_table.schema
|
|
138
|
+
|
|
139
|
+
columns = execute(sql)
|
|
140
|
+
|
|
141
|
+
columns.each do |col|
|
|
142
|
+
db_table << Column.new(
|
|
143
|
+
name: col[0]&.downcase,
|
|
144
|
+
data_type: col[1]&.downcase,
|
|
145
|
+
precision: col[2],
|
|
146
|
+
scale: col[3],
|
|
147
|
+
max_char_length: col[4]
|
|
148
|
+
)
|
|
149
|
+
end
|
|
150
|
+
|
|
151
|
+
db_table
|
|
152
|
+
end
|
|
153
|
+
|
|
154
|
+
def stats(table, date_column: nil)
|
|
155
|
+
date_fields = if date_column
|
|
156
|
+
", MIN(#{date_column}) AS date_start, MAX(#{date_column}) AS date_end"
|
|
157
|
+
else
|
|
158
|
+
', NULL AS date_start, NULL AS date_end'
|
|
159
|
+
end
|
|
160
|
+
|
|
161
|
+
data = execute("SELECT COUNT(*) AS row_count#{date_fields} FROM #{table}")
|
|
162
|
+
cols = data.first
|
|
163
|
+
|
|
164
|
+
TableStats.new(
|
|
165
|
+
row_count: cols[0],
|
|
166
|
+
date_start: cols[1],
|
|
167
|
+
date_end: cols[2]
|
|
168
|
+
)
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
private
|
|
172
|
+
|
|
173
|
+
def reset_connection
|
|
174
|
+
@oauth_access_token = nil
|
|
175
|
+
@oauth_refresh_token = nil
|
|
176
|
+
@token_expires_at = nil
|
|
177
|
+
close
|
|
178
|
+
end
|
|
179
|
+
|
|
180
|
+
def submit_query(sql)
|
|
181
|
+
connection.post(STATEMENTS_API) do |req|
|
|
182
|
+
req.body = {
|
|
183
|
+
statement: sql,
|
|
184
|
+
warehouse_id: config[:warehouse],
|
|
185
|
+
catalog: config[:catalog],
|
|
186
|
+
schema: config[:schema],
|
|
187
|
+
wait_timeout: '30s',
|
|
188
|
+
on_wait_timeout: 'CONTINUE',
|
|
189
|
+
format: 'JSON_ARRAY',
|
|
190
|
+
disposition: 'INLINE'
|
|
191
|
+
}.compact.merge(extra_query_params).to_json
|
|
192
|
+
end
|
|
193
|
+
end
|
|
194
|
+
|
|
195
|
+
def handle_query_response(response)
|
|
196
|
+
body = JSON.parse(response.body)
|
|
197
|
+
|
|
198
|
+
case response.status
|
|
199
|
+
when 200
|
|
200
|
+
state = body.dig('status', 'state')
|
|
201
|
+
state == 'SUCCEEDED' ? body : poll(body['statement_id'])
|
|
202
|
+
when 202
|
|
203
|
+
poll(body['statement_id'])
|
|
204
|
+
else
|
|
205
|
+
error_message = body['message'] || body['error_code'] || response.body
|
|
206
|
+
raise ExecutionError, "Databricks query failed (#{response.status}): #{error_message}"
|
|
207
|
+
end
|
|
208
|
+
end
|
|
209
|
+
|
|
210
|
+
def poll(statement_id)
|
|
211
|
+
sleep_interval = DEFAULT_POLL_INTERVAL
|
|
212
|
+
|
|
213
|
+
logger.debug "Polling for query completion: #{statement_id}"
|
|
214
|
+
|
|
215
|
+
loop do
|
|
216
|
+
response = connection.get("#{STATEMENTS_API}/#{statement_id}")
|
|
217
|
+
body = JSON.parse(response.body)
|
|
218
|
+
state = body.dig('status', 'state')
|
|
219
|
+
|
|
220
|
+
case state
|
|
221
|
+
when 'SUCCEEDED'
|
|
222
|
+
return body
|
|
223
|
+
when 'FAILED', 'CANCELED', 'CLOSED'
|
|
224
|
+
error_msg = body.dig('status', 'error', 'message') || state
|
|
225
|
+
raise ExecutionError, "Databricks query #{state}: #{error_msg}"
|
|
226
|
+
else
|
|
227
|
+
logger.debug "Query still running (state: #{state}). Sleeping #{sleep_interval}s..."
|
|
228
|
+
sleep(sleep_interval)
|
|
229
|
+
sleep_interval = sleep_interval == MAX_POLL_INTERVAL ? DEFAULT_POLL_INTERVAL : sleep_interval
|
|
230
|
+
sleep_interval = [sleep_interval * 2, MAX_POLL_INTERVAL].min
|
|
231
|
+
end
|
|
232
|
+
end
|
|
233
|
+
end
|
|
234
|
+
|
|
235
|
+
def fetch_data(result, io: nil, stats: nil, proc: nil)
|
|
236
|
+
columns = result.dig('manifest', 'schema', 'columns')&.map { |col| col['name'] } || []
|
|
237
|
+
chunks = result.dig('manifest', 'chunks') || []
|
|
238
|
+
collector = {
|
|
239
|
+
columns: columns,
|
|
240
|
+
data: [],
|
|
241
|
+
io: io,
|
|
242
|
+
stats: stats,
|
|
243
|
+
wrote_header: false
|
|
244
|
+
}
|
|
245
|
+
|
|
246
|
+
write_data(result.dig('result', 'data_array') || [], collector, io, stats, proc)
|
|
247
|
+
|
|
248
|
+
return collector unless chunks.size > 1
|
|
249
|
+
|
|
250
|
+
statement_id = result['statement_id']
|
|
251
|
+
chunks[1..].each do |chunk|
|
|
252
|
+
chunk_index = chunk['chunk_index']
|
|
253
|
+
logger.debug "Fetching chunk #{chunk_index} of #{chunks.size} for statement: #{statement_id}"
|
|
254
|
+
|
|
255
|
+
resp = connection.get("#{STATEMENTS_API}/#{statement_id}/result/chunks/#{chunk_index}")
|
|
256
|
+
raise ExecutionError, "Failed to fetch chunk #{chunk_index}: #{resp.body}" unless resp.status == 200
|
|
257
|
+
|
|
258
|
+
chunk_data = JSON.parse(resp.body)
|
|
259
|
+
write_data(chunk_data['data_array'] || [], collector, io, stats, proc)
|
|
260
|
+
end
|
|
261
|
+
|
|
262
|
+
collector
|
|
263
|
+
end
|
|
264
|
+
|
|
265
|
+
def write_data(data, collector, io = nil, stats = nil, proc = nil)
|
|
266
|
+
if io
|
|
267
|
+
unless collector[:wrote_header]
|
|
268
|
+
io << CSV.generate_line(collector[:columns])
|
|
269
|
+
collector[:wrote_header] = true
|
|
270
|
+
end
|
|
271
|
+
|
|
272
|
+
data.each do |row|
|
|
273
|
+
stats << row if stats
|
|
274
|
+
io << CSV.generate_line(row)
|
|
275
|
+
end
|
|
276
|
+
elsif proc
|
|
277
|
+
data.each { proc.call(it) }
|
|
278
|
+
else
|
|
279
|
+
data.each { collector[:data] << it }
|
|
280
|
+
end
|
|
281
|
+
|
|
282
|
+
collector
|
|
283
|
+
end
|
|
284
|
+
|
|
285
|
+
def format_result(result, format)
|
|
286
|
+
data = result[:data]
|
|
287
|
+
columns = result[:columns]
|
|
288
|
+
|
|
289
|
+
case format
|
|
290
|
+
when :array
|
|
291
|
+
data
|
|
292
|
+
when :object
|
|
293
|
+
data.map { |row| columns.zip(row).to_h }
|
|
294
|
+
when :csv
|
|
295
|
+
CSV.generate do |csv|
|
|
296
|
+
csv << columns
|
|
297
|
+
data.each { |row| csv << row }
|
|
298
|
+
end
|
|
299
|
+
when :native
|
|
300
|
+
result
|
|
301
|
+
else
|
|
302
|
+
raise UnsupportedCapability, "Unknown result format: #{format}"
|
|
303
|
+
end
|
|
304
|
+
end
|
|
305
|
+
|
|
306
|
+
def workspace_host
|
|
307
|
+
config[:host].to_s
|
|
308
|
+
end
|
|
309
|
+
|
|
310
|
+
def oauth_supports_authorization_code_flow?
|
|
311
|
+
auth_mode == 'oauth_u2m'
|
|
312
|
+
end
|
|
313
|
+
|
|
314
|
+
def oauth_supports_client_credentials_flow?
|
|
315
|
+
auth_mode == 'oauth_m2m'
|
|
316
|
+
end
|
|
317
|
+
|
|
318
|
+
def oauth_redirect_uri_required?
|
|
319
|
+
oauth_supports_authorization_code_flow?
|
|
320
|
+
end
|
|
321
|
+
|
|
322
|
+
def oauth_client_credentials_params
|
|
323
|
+
{
|
|
324
|
+
grant_type: 'client_credentials',
|
|
325
|
+
scope: 'all-apis'
|
|
326
|
+
}
|
|
327
|
+
end
|
|
328
|
+
|
|
329
|
+
def oauth_token_expiry_leeway_seconds
|
|
330
|
+
30
|
|
331
|
+
end
|
|
332
|
+
|
|
333
|
+
def oauth_uses_pkce?
|
|
334
|
+
oauth_supports_authorization_code_flow?
|
|
335
|
+
end
|
|
336
|
+
end
|
|
337
|
+
end
|
|
338
|
+
end
|
data/lib/dwh/adapters/duck_db.rb
CHANGED
|
@@ -209,7 +209,13 @@ module DWH
|
|
|
209
209
|
super
|
|
210
210
|
require 'duckdb'
|
|
211
211
|
rescue LoadError
|
|
212
|
-
raise ConfigError,
|
|
212
|
+
raise ConfigError, <<~MSG
|
|
213
|
+
DuckDB adapter requires the 'duckdb' gem.
|
|
214
|
+
|
|
215
|
+
Install with: gem install duckdb
|
|
216
|
+
|
|
217
|
+
See https://github.com/suketa/ruby-duckdb for installation details.
|
|
218
|
+
MSG
|
|
213
219
|
end
|
|
214
220
|
|
|
215
221
|
private
|
data/lib/dwh/adapters/my_sql.rb
CHANGED
|
@@ -219,7 +219,13 @@ module DWH
|
|
|
219
219
|
super
|
|
220
220
|
require 'mysql2'
|
|
221
221
|
rescue LoadError
|
|
222
|
-
raise ConfigError,
|
|
222
|
+
raise ConfigError, <<~MSG
|
|
223
|
+
MySQL adapter requires the 'mysql2' gem.
|
|
224
|
+
|
|
225
|
+
Install with: gem install mysql2
|
|
226
|
+
|
|
227
|
+
System libraries: https://dev.mysql.com/downloads/
|
|
228
|
+
MSG
|
|
223
229
|
end
|
|
224
230
|
|
|
225
231
|
def result_to_csv(result)
|
data/lib/dwh/adapters/open_authorizable.rb
CHANGED
|
@@ -1,5 +1,7 @@
|
|
|
1
1
|
require 'base64'
|
|
2
2
|
require 'securerandom'
|
|
3
|
+
require 'digest'
|
|
4
|
+
require_relative 'token_manageable'
|
|
3
5
|
|
|
4
6
|
module DWH
|
|
5
7
|
module Adapters
|
|
@@ -26,7 +28,7 @@ module DWH
|
|
|
26
28
|
module OpenAuthorizable
|
|
27
29
|
# rubocop:disable Style/DocumentationModule
|
|
28
30
|
module ClassMethods
|
|
29
|
-
def oauth_with(authorize
|
|
31
|
+
def oauth_with(authorize: nil, tokenize: nil, default_scope: 'refresh_token')
|
|
30
32
|
@oauth_settings = { authorize: authorize, tokenize: tokenize, default_scope: default_scope }
|
|
31
33
|
end
|
|
32
34
|
|
|
@@ -39,14 +41,18 @@ module DWH
|
|
|
39
41
|
|
|
40
42
|
def self.included(base)
|
|
41
43
|
base.extend(ClassMethods)
|
|
44
|
+
base.include(TokenManageable)
|
|
42
45
|
base.config :oauth_client_id, String, required: false, message: 'OAuth client_id'
|
|
43
46
|
base.config :oauth_client_secret, String, required: false, message: 'OAuth client_secret'
|
|
44
47
|
base.config :oauth_redirect_uri, String, required: false, message: 'OAuth redirect_uri'
|
|
45
|
-
base.config :oauth_scope, String, required: false, message: 'OAuth
|
|
48
|
+
base.config :oauth_scope, String, required: false, message: 'OAuth scope'
|
|
46
49
|
end
|
|
47
50
|
|
|
48
51
|
# Generate authorization URL for user to visit
|
|
49
52
|
def authorization_url(state: SecureRandom.hex(16), scope: nil)
|
|
53
|
+
raise UnsupportedCapability, "#{adapter_name} does not support authorization-code OAuth flow" unless oauth_supports_authorization_code_flow?
|
|
54
|
+
|
|
55
|
+
code_verifier = oauth_pkce_code_verifier_for_session
|
|
50
56
|
params = {
|
|
51
57
|
'response_type' => 'code',
|
|
52
58
|
'client_id' => oauth_client_id,
|
|
@@ -54,6 +60,7 @@ module DWH
|
|
|
54
60
|
'state' => state,
|
|
55
61
|
'scope' => scope || oauth_scope || oauth_settings[:default_scope]
|
|
56
62
|
}.compact
|
|
63
|
+
params.merge!(oauth_pkce_authorization_params(code_verifier))
|
|
57
64
|
|
|
58
65
|
uri = URI(oauth_settings[:authorize])
|
|
59
66
|
uri.query = URI.encode_www_form(params)
|
|
@@ -66,7 +73,7 @@ module DWH
|
|
|
66
73
|
#
|
|
67
74
|
# @param access_token [String] the access token
|
|
68
75
|
# @param refresh_token [String] optional refresh token
|
|
69
|
-
def apply_oauth_tokens(access_token
|
|
76
|
+
def apply_oauth_tokens(access_token: nil, refresh_token: nil, expires_at: nil)
|
|
70
77
|
@oauth_access_token = access_token
|
|
71
78
|
@oauth_refresh_token = refresh_token
|
|
72
79
|
@token_expires_at = expires_at
|
|
@@ -77,11 +84,15 @@ module DWH
|
|
|
77
84
|
# @param authorization_code [String] this code should come from
|
|
78
85
|
# the redirect that is captured from the #authorization_url
|
|
79
86
|
def generate_oauth_tokens(authorization_code)
|
|
87
|
+
raise UnsupportedCapability, "#{adapter_name} does not support authorization-code OAuth flow" unless oauth_supports_authorization_code_flow?
|
|
88
|
+
|
|
89
|
+
code_verifier = oauth_pkce_code_verifier_for_session
|
|
80
90
|
params = {
|
|
81
91
|
grant_type: 'authorization_code',
|
|
82
92
|
code: authorization_code,
|
|
83
93
|
redirect_uri: oauth_redirect_uri
|
|
84
94
|
}
|
|
95
|
+
params.merge!(oauth_pkce_token_params(code_verifier))
|
|
85
96
|
|
|
86
97
|
response = oauth_http_client.post(oauth_tokenization_url) do |req|
|
|
87
98
|
req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
|
|
@@ -95,14 +106,23 @@ module DWH
|
|
|
95
106
|
def refresh_access_token
|
|
96
107
|
raise AuthenticationError, 'No refresh token available' unless @oauth_refresh_token
|
|
97
108
|
|
|
98
|
-
params =
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
109
|
+
params = oauth_refresh_token_params
|
|
110
|
+
response = oauth_http_client.post(oauth_tokenization_url) do |req|
|
|
111
|
+
req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
|
|
112
|
+
req.headers['Authorization'] = oauth_token_request_auth_header
|
|
113
|
+
req.body = URI.encode_www_form(params)
|
|
114
|
+
end
|
|
102
115
|
|
|
116
|
+
oauth_token_response(response)
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
def mint_access_token
|
|
120
|
+
raise UnsupportedCapability, "#{adapter_name} does not support client-credentials OAuth flow" unless oauth_supports_client_credentials_flow?
|
|
121
|
+
|
|
122
|
+
params = oauth_client_credentials_params
|
|
103
123
|
response = oauth_http_client.post(oauth_tokenization_url) do |req|
|
|
104
124
|
req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
|
|
105
|
-
req.headers['Authorization'] =
|
|
125
|
+
req.headers['Authorization'] = oauth_token_request_auth_header
|
|
106
126
|
req.body = URI.encode_www_form(params)
|
|
107
127
|
end
|
|
108
128
|
|
|
@@ -116,22 +136,22 @@ module DWH
|
|
|
116
136
|
# @return [String] access token
|
|
117
137
|
# @raise [AuthenticationError]
|
|
118
138
|
def oauth_access_token
|
|
119
|
-
|
|
120
|
-
|
|
139
|
+
load_oauth_tokens_from_store! unless @oauth_access_token || @oauth_refresh_token
|
|
140
|
+
return @oauth_access_token if oauth_token_usable?
|
|
121
141
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
142
|
+
refresh_access_token if oauth_refresh_token_usable?
|
|
143
|
+
return @oauth_access_token if oauth_token_usable?
|
|
144
|
+
|
|
145
|
+
mint_access_token if oauth_supports_client_credentials_flow?
|
|
146
|
+
return @oauth_access_token if oauth_token_usable?
|
|
147
|
+
|
|
148
|
+
raise AuthenticationError,
|
|
149
|
+
'Access token was never set. Either run the auth flow, mint via client credentials, or set tokens via apply_oauth_tokens.'
|
|
130
150
|
end
|
|
131
151
|
|
|
132
152
|
# Check if we have a valid access token
|
|
133
153
|
def oauth_authenticated?
|
|
134
|
-
@oauth_access_token &&
|
|
154
|
+
@oauth_access_token && oauth_token_usable?
|
|
135
155
|
end
|
|
136
156
|
|
|
137
157
|
# Get current state of tokens
|
|
@@ -140,7 +160,7 @@ module DWH
|
|
|
140
160
|
access_token: @oauth_access_token,
|
|
141
161
|
refresh_token: @oauth_refresh_token,
|
|
142
162
|
expires_at: @token_expires_at,
|
|
143
|
-
expired:
|
|
163
|
+
expired: !oauth_token_usable?,
|
|
144
164
|
authenticated: oauth_authenticated?
|
|
145
165
|
}
|
|
146
166
|
end
|
|
@@ -148,9 +168,11 @@ module DWH
|
|
|
148
168
|
def validate_oauth_config
|
|
149
169
|
raise ConfigError, 'Missing config: oauth_client_id. Required for OAuth.' unless config[:oauth_client_id]
|
|
150
170
|
raise ConfigError, 'Missing config: oauth_client_secret. Required for OAuth.' unless config[:oauth_client_secret]
|
|
151
|
-
raise ConfigError, 'Missing config: oauth_redirect_url. Required for OAuth.' unless config[:oauth_redirect_uri]
|
|
152
171
|
|
|
153
|
-
|
|
172
|
+
raise ConfigError, 'Missing config: oauth_redirect_uri. Required for OAuth.' if oauth_redirect_uri_required? && !config[:oauth_redirect_uri]
|
|
173
|
+
|
|
174
|
+
oauth_settings if oauth_supports_authorization_code_flow?
|
|
175
|
+
true
|
|
154
176
|
end
|
|
155
177
|
|
|
156
178
|
def oauth_settings
|
|
@@ -170,6 +192,75 @@ module DWH
|
|
|
170
192
|
"Basic #{credentials}"
|
|
171
193
|
end
|
|
172
194
|
|
|
195
|
+
def oauth_token_request_auth_header
|
|
196
|
+
basic_auth_header
|
|
197
|
+
end
|
|
198
|
+
|
|
199
|
+
def oauth_refresh_token_params
|
|
200
|
+
{
|
|
201
|
+
grant_type: 'refresh_token',
|
|
202
|
+
refresh_token: @oauth_refresh_token
|
|
203
|
+
}
|
|
204
|
+
end
|
|
205
|
+
|
|
206
|
+
def oauth_client_credentials_params
|
|
207
|
+
{
|
|
208
|
+
grant_type: 'client_credentials',
|
|
209
|
+
scope: oauth_scope || oauth_settings[:default_scope]
|
|
210
|
+
}.compact
|
|
211
|
+
end
|
|
212
|
+
|
|
213
|
+
def oauth_supports_authorization_code_flow?
|
|
214
|
+
true
|
|
215
|
+
end
|
|
216
|
+
|
|
217
|
+
def oauth_supports_client_credentials_flow?
|
|
218
|
+
false
|
|
219
|
+
end
|
|
220
|
+
|
|
221
|
+
def oauth_redirect_uri_required?
|
|
222
|
+
oauth_supports_authorization_code_flow?
|
|
223
|
+
end
|
|
224
|
+
|
|
225
|
+
def oauth_token_expiry_leeway_seconds
|
|
226
|
+
0
|
|
227
|
+
end
|
|
228
|
+
|
|
229
|
+
# PKCE is optional and disabled by default
|
|
230
|
+
def oauth_uses_pkce?
|
|
231
|
+
false
|
|
232
|
+
end
|
|
233
|
+
|
|
234
|
+
def oauth_pkce_code_challenge_method
|
|
235
|
+
'S256'
|
|
236
|
+
end
|
|
237
|
+
|
|
238
|
+
def oauth_pkce_code_verifier_for_session
|
|
239
|
+
return nil unless oauth_uses_pkce?
|
|
240
|
+
|
|
241
|
+
@oauth_pkce_code_verifier_for_session ||= oauth_pkce_code_verifier
|
|
242
|
+
end
|
|
243
|
+
|
|
244
|
+
def oauth_pkce_code_verifier
|
|
245
|
+
SecureRandom.urlsafe_base64(64).delete('=')
|
|
246
|
+
end
|
|
247
|
+
|
|
248
|
+
def oauth_token_usable?
|
|
249
|
+
return false unless @oauth_access_token
|
|
250
|
+
|
|
251
|
+
!token_expiring_soon?
|
|
252
|
+
end
|
|
253
|
+
|
|
254
|
+
def oauth_refresh_token_usable?
|
|
255
|
+
@oauth_refresh_token && token_expired?
|
|
256
|
+
end
|
|
257
|
+
|
|
258
|
+
def token_expiring_soon?(seconds = oauth_token_expiry_leeway_seconds)
|
|
259
|
+
return true if @token_expires_at.nil?
|
|
260
|
+
|
|
261
|
+
(Time.now + seconds) >= @token_expires_at
|
|
262
|
+
end
|
|
263
|
+
|
|
173
264
|
def oauth_http_client
|
|
174
265
|
@oauth_http_client ||= Faraday.new(
|
|
175
266
|
headers: {
|
|
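The expiry helpers added in this hunk can be exercised in isolation. The following is a minimal sketch, assuming standalone functions rather than the gem's instance methods: the names `token_expiring_soon?` and `token_usable?`, the explicit `now:` keyword, and the sample values are illustrative and not part of the gem's API. It shows how a non-zero leeway makes a token count as unusable shortly before its real expiry, and that a token without a recorded expiry is treated as expiring.

```ruby
# Standalone sketch of the leeway check; in the gem this logic lives in
# instance methods that read @oauth_access_token and @token_expires_at.
def token_expiring_soon?(expires_at, leeway_seconds = 0, now: Time.now)
  # As in the diff, a missing expiry is treated as "expiring".
  return true if expires_at.nil?

  (now + leeway_seconds) >= expires_at
end

def token_usable?(access_token, expires_at, leeway_seconds = 0, now: Time.now)
  return false unless access_token

  !token_expiring_soon?(expires_at, leeway_seconds, now: now)
end

now = Time.now
expires_at = now + 30 # hypothetical token that expires in 30 seconds

puts token_usable?('abc', expires_at, 0, now: now)  # default leeway of 0 seconds
puts token_usable?('abc', expires_at, 60, now: now) # hypothetical 60 second leeway
```

An adapter that wants to refresh early would presumably override `oauth_token_expiry_leeway_seconds` to return a positive number; the default in this diff is 0.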
@@ -191,11 +282,20 @@ module DWH
|
|
|
191
282
|
# Calculate expiration time
|
|
192
283
|
expires_in = data['expires_in'] || 3600
|
|
193
284
|
@token_expires_at = Time.now + expires_in
|
|
285
|
+
store_tokens_in_store(
|
|
286
|
+
access_token: @oauth_access_token,
|
|
287
|
+
refresh_token: @oauth_refresh_token,
|
|
288
|
+
expires_at: @token_expires_at,
|
|
289
|
+
token_type: data['token_type'],
|
|
290
|
+
scope: data['scope'],
|
|
291
|
+
raw: data
|
|
292
|
+
)
|
|
194
293
|
|
|
195
294
|
{ success: true, data: data }
|
|
196
295
|
else
|
|
197
296
|
error_data = parse_error_response(response)
|
|
198
297
|
if error_data['error'] == 'invalid_grant' && @oauth_refresh_token
|
|
298
|
+
delete_tokens_from_store
|
|
199
299
|
raise TokenExpiredError, "Potentially expired refresh token. #{error_data['message']}"
|
|
200
300
|
end
|
|
201
301
|
|
|
@@ -205,11 +305,48 @@ module DWH
|
|
|
205
305
|
|
|
206
306
|
private
|
|
207
307
|
|
|
308
|
+
def load_oauth_tokens_from_store!
|
|
309
|
+
payload = load_tokens_from_store
|
|
310
|
+
return unless payload
|
|
311
|
+
|
|
312
|
+
apply_oauth_tokens(
|
|
313
|
+
access_token: payload[:access_token],
|
|
314
|
+
refresh_token: payload[:refresh_token],
|
|
315
|
+
expires_at: payload[:expires_at]
|
|
316
|
+
)
|
|
317
|
+
end
|
|
318
|
+
|
|
208
319
|
def parse_error_response(response)
|
|
209
320
|
JSON.parse(response.body)
|
|
210
321
|
rescue JSON::ParserError
|
|
211
322
|
{ 'error' => 'unknown', 'message' => response.body }
|
|
212
323
|
end
|
|
324
|
+
|
|
325
|
+
def oauth_pkce_authorization_params(code_verifier)
|
|
326
|
+
return {} unless oauth_uses_pkce?
|
|
327
|
+
|
|
328
|
+
{
|
|
329
|
+
'code_challenge' => oauth_pkce_code_challenge(code_verifier),
|
|
330
|
+
'code_challenge_method' => oauth_pkce_code_challenge_method
|
|
331
|
+
}
|
|
332
|
+
end
|
|
333
|
+
|
|
334
|
+
def oauth_pkce_token_params(code_verifier)
|
|
335
|
+
return {} unless oauth_uses_pkce?
|
|
336
|
+
|
|
337
|
+
{ code_verifier: code_verifier }
|
|
338
|
+
end
|
|
339
|
+
|
|
340
|
+
def oauth_pkce_code_challenge(code_verifier)
|
|
341
|
+
case oauth_pkce_code_challenge_method
|
|
342
|
+
when 'S256'
|
|
343
|
+
Base64.urlsafe_encode64(Digest::SHA256.digest(code_verifier), padding: false)
|
|
344
|
+
when 'plain'
|
|
345
|
+
code_verifier
|
|
346
|
+
else
|
|
347
|
+
raise ConfigError, "Unsupported PKCE code challenge method: #{oauth_pkce_code_challenge_method}"
|
|
348
|
+
end
|
|
349
|
+
end
|
|
213
350
|
end
|
|
214
351
|
end
|
|
215
352
|
end
|
|
@@ -221,7 +221,13 @@ module DWH
|
|
|
221
221
|
super
|
|
222
222
|
require 'pg'
|
|
223
223
|
rescue LoadError
|
|
224
|
-
raise ConfigError,
|
|
224
|
+
raise ConfigError, <<~MSG
|
|
225
|
+
PostgreSQL adapter requires the 'pg' gem.
|
|
226
|
+
|
|
227
|
+
Install with: gem install pg
|
|
228
|
+
|
|
229
|
+
System libraries: https://www.postgresql.org/download/
|
|
230
|
+
MSG
|
|
225
231
|
end
|
|
226
232
|
|
|
227
233
|
private
|
|
@@ -39,7 +39,7 @@ module DWH
|
|
|
39
39
|
# account_identifier: 'myorg-myaccount.us-east-1',
|
|
40
40
|
# oauth_client_id: '<YOUR_CLIENT_ID>',
|
|
41
41
|
# oauth_client_secret: '<YOUR_CLIENT_SECRET>',
|
|
42
|
-
#
|
|
42
|
+
# oauth_redirect_uri: 'https://localhost:3030/some/path',
|
|
43
43
|
# database: 'ANALYTICS'
|
|
44
44
|
# })
|
|
45
45
|
#
|
|
@@ -234,7 +234,13 @@ module DWH
|
|
|
234
234
|
super
|
|
235
235
|
require 'tiny_tds'
|
|
236
236
|
rescue LoadError
|
|
237
|
-
raise ConfigError,
|
|
237
|
+
raise ConfigError, <<~MSG
|
|
238
|
+
SQL Server adapter requires the 'tiny_tds' gem.
|
|
239
|
+
|
|
240
|
+
Install with: gem install tiny_tds
|
|
241
|
+
|
|
242
|
+
System libraries (FreeTDS): https://www.freetds.org/
|
|
243
|
+
MSG
|
|
238
244
|
end
|
|
239
245
|
|
|
240
246
|
private
|
|
@@ -0,0 +1,81 @@
|
|
|
1
|
+
require 'time'
|
|
2
|
+
|
|
3
|
+
module DWH
|
|
4
|
+
module Adapters
|
|
5
|
+
# TokenManageable holds the logic to load, store and delete tokens from the token store.
|
|
6
|
+
module TokenManageable
|
|
7
|
+
def token_store
|
|
8
|
+
config[:token_store]
|
|
9
|
+
end
|
|
10
|
+
|
|
11
|
+
def load_tokens_from_store
|
|
12
|
+
return nil unless token_store.respond_to?(:load)
|
|
13
|
+
|
|
14
|
+
payload = token_store.load
|
|
15
|
+
normalize_token_payload(payload)
|
|
16
|
+
rescue StandardError => e
|
|
17
|
+
logger.warn("Failed loading token from token_store: #{e.message}")
|
|
18
|
+
nil
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def store_tokens_in_store(token_payload)
|
|
22
|
+
return unless token_store.respond_to?(:store)
|
|
23
|
+
|
|
24
|
+
token_store.store(normalize_token_payload_for_store(token_payload))
|
|
25
|
+
rescue StandardError => e
|
|
26
|
+
logger.warn("Failed storing token in token_store: #{e.message}")
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
def delete_tokens_from_store
|
|
30
|
+
return unless token_store.respond_to?(:delete)
|
|
31
|
+
|
|
32
|
+
token_store.delete
|
|
33
|
+
rescue StandardError => e
|
|
34
|
+
logger.warn("Failed deleting token from token_store: #{e.message}")
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
private
|
|
38
|
+
|
|
39
|
+
def normalize_token_payload(payload)
|
|
40
|
+
return nil unless payload.is_a?(Hash)
|
|
41
|
+
|
|
42
|
+
data = payload.transform_keys(&:to_sym)
|
|
43
|
+
access_token = data[:access_token]
|
|
44
|
+
access_token = nil if access_token.to_s.strip == ''
|
|
45
|
+
|
|
46
|
+
refresh_token = data[:refresh_token]
|
|
47
|
+
refresh_token = nil if refresh_token.respond_to?(:empty?) && refresh_token.empty?
|
|
48
|
+
return nil if access_token.nil? && refresh_token.nil?
|
|
49
|
+
|
|
50
|
+
{
|
|
51
|
+
access_token: access_token&.to_s,
|
|
52
|
+
refresh_token: refresh_token,
|
|
53
|
+
expires_at: parse_token_expiry(data[:expires_at])
|
|
54
|
+
}
|
|
55
|
+
end
|
|
56
|
+
|
|
57
|
+
def normalize_token_payload_for_store(payload)
|
|
58
|
+
data = payload.transform_keys(&:to_sym)
|
|
59
|
+
cleaned = {
|
|
60
|
+
access_token: data[:access_token]&.to_s,
|
|
61
|
+
refresh_token: data[:refresh_token],
|
|
62
|
+
expires_at: parse_token_expiry(data[:expires_at]),
|
|
63
|
+
token_type: data[:token_type],
|
|
64
|
+
scope: data[:scope],
|
|
65
|
+
issued_at: parse_token_expiry(data[:issued_at]),
|
|
66
|
+
raw: data[:raw]
|
|
67
|
+
}
|
|
68
|
+
cleaned.reject { |_k, v| v.nil? }
|
|
69
|
+
end
|
|
70
|
+
|
|
71
|
+
def parse_token_expiry(value)
|
|
72
|
+
return value if value.is_a?(Time)
|
|
73
|
+
return nil if value.nil? || value.to_s.strip == ''
|
|
74
|
+
|
|
75
|
+
Time.parse(value.to_s)
|
|
76
|
+
rescue StandardError
|
|
77
|
+
nil
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
end
|
|
81
|
+
end
|
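The normalization in `TokenManageable` accepts string keys and string timestamps, which matters when a host application persists tokens as JSON. The following is a minimal sketch of such a round trip, assuming standalone copies of the two helpers; the names `parse_expiry` and `normalize_payload` and the sample payload are illustrative, not the gem's public API.

```ruby
require 'json'
require 'time'

# Accept a Time, a parseable string, or nothing at all.
def parse_expiry(value)
  return value if value.is_a?(Time)
  return nil if value.nil? || value.to_s.strip == ''

  Time.parse(value.to_s)
rescue StandardError
  nil
end

# Reduce a loaded payload to the three fields the adapter applies.
def normalize_payload(payload)
  return nil unless payload.is_a?(Hash)

  data = payload.transform_keys(&:to_sym)
  access_token = data[:access_token]
  access_token = nil if access_token.to_s.strip == ''
  refresh_token = data[:refresh_token]
  refresh_token = nil if refresh_token.respond_to?(:empty?) && refresh_token.empty?
  return nil if access_token.nil? && refresh_token.nil?

  {
    access_token: access_token&.to_s,
    refresh_token: refresh_token,
    expires_at: parse_expiry(data[:expires_at])
  }
end

# Hypothetical payload as a host application might serialize it.
json = JSON.generate(
  'access_token' => 'abc',
  'refresh_token' => '',
  'expires_at' => '2030-01-01T00:00:00Z'
)

tokens = normalize_payload(JSON.parse(json))
puts tokens[:expires_at].class
puts tokens[:refresh_token].inspect
```

A payload with neither an access token nor a refresh token normalizes to `nil`, so the adapter falls through to its other ways of obtaining a token.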
data/lib/dwh/adapters/trino.rb
CHANGED
|
@@ -194,7 +194,13 @@ module DWH
|
|
|
194
194
|
super
|
|
195
195
|
require 'trino-client'
|
|
196
196
|
rescue LoadError
|
|
197
|
-
raise ConfigError,
|
|
197
|
+
raise ConfigError, <<~MSG
|
|
198
|
+
Trino adapter requires the 'trino-client' gem.
|
|
199
|
+
|
|
200
|
+
Install with: gem install trino-client
|
|
201
|
+
|
|
202
|
+
No system libraries required (pure Ruby).
|
|
203
|
+
MSG
|
|
198
204
|
end
|
|
199
205
|
|
|
200
206
|
private
|
data/lib/dwh/adapters.rb
CHANGED
|
@@ -79,6 +79,13 @@ module DWH
|
|
|
79
79
|
# @return [Hash] the actual instance configuration
|
|
80
80
|
attr_reader :config
|
|
81
81
|
|
|
82
|
+
# Optional host-implemented token store for OAuth token reuse.
|
|
83
|
+
# It should implement the following methods:
|
|
84
|
+
# - load -> Hash|nil
|
|
85
|
+
# - store(token_hash)
|
|
86
|
+
# - delete
|
|
87
|
+
config :token_store, Object, required: false, default: nil, message: 'Token store instance implementing load/store/delete'
|
|
88
|
+
|
|
82
89
|
def initialize(config)
|
|
83
90
|
@config = config.transform_keys(&:to_sym)
|
|
84
91
|
# Per instance customization of general settings
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
module DWH
|
|
2
|
+
# Optional contract for host applications that want token persistence.
|
|
3
|
+
#
|
|
4
|
+
# The store instance should be identity-bound before it is passed into
|
|
5
|
+
# adapter config so the adapter remains unaware of user/datasource identity.
|
|
6
|
+
#
|
|
7
|
+
# This class is intentionally minimal and can be subclassed or duck-typed.
|
|
8
|
+
class TokenStore
|
|
9
|
+
# @return [Hash,nil] token payload
|
|
10
|
+
def load
|
|
11
|
+
raise NotImplementedError, "#{self.class} must implement ##{__method__}"
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
# @param token [Hash] normalized payload with at least access_token and expires_at
|
|
15
|
+
def store(_token)
|
|
16
|
+
raise NotImplementedError, "#{self.class} must implement ##{__method__}"
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
# Remove/revoke persisted token state.
|
|
20
|
+
def delete
|
|
21
|
+
raise NotImplementedError, "#{self.class} must implement ##{__method__}"
|
|
22
|
+
end
|
|
23
|
+
end
|
|
24
|
+
end
|
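Because the contract above can be duck-typed, a host application only needs an object that responds to `load`, `store` and `delete`. The following is a minimal sketch of an in-memory store, assuming a single process and no durability requirement; the class name `InMemoryTokenStore` and the sample payload are hypothetical and do not ship with the gem.

```ruby
# Duck-typed token store kept in process memory. A real host application
# would more likely persist to a database or secrets manager, scoped to
# one user/datasource identity as the TokenStore comments recommend.
class InMemoryTokenStore
  def initialize
    @payload = nil
    @mutex = Mutex.new
  end

  # @return [Hash, nil] the last stored payload, or nil if none
  def load
    @mutex.synchronize { @payload&.dup }
  end

  # @param token [Hash] payload handed over by the adapter
  def store(token)
    @mutex.synchronize { @payload = token.dup }
  end

  # Forget any persisted token state.
  def delete
    @mutex.synchronize { @payload = nil }
  end
end

store = InMemoryTokenStore.new
store.store({ access_token: 'abc', expires_at: Time.now + 3600 })
puts store.load[:access_token]

store.delete
puts store.load.inspect
```

An instance like this would be passed to an adapter under the `token_store` config key; the adapter logs a warning and continues if any of the three calls raises.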
data/lib/dwh/version.rb
CHANGED
data/lib/dwh.rb
CHANGED
|
@@ -5,6 +5,7 @@ require_relative 'dwh/errors'
|
|
|
5
5
|
require_relative 'dwh/logger'
|
|
6
6
|
require_relative 'dwh/streaming_stats'
|
|
7
7
|
require_relative 'dwh/factory'
|
|
8
|
+
require_relative 'dwh/token_store'
|
|
8
9
|
require_relative 'dwh/adapters'
|
|
9
10
|
require_relative 'dwh/table'
|
|
10
11
|
require_relative 'dwh/table_stats'
|
|
@@ -18,6 +19,7 @@ require_relative 'dwh/adapters/duck_db'
|
|
|
18
19
|
require_relative 'dwh/adapters/sqlite'
|
|
19
20
|
require_relative 'dwh/adapters/athena'
|
|
20
21
|
require_relative 'dwh/adapters/redshift'
|
|
22
|
+
require_relative 'dwh/adapters/databricks'
|
|
21
23
|
|
|
22
24
|
# DWH encapsulates the full functionality of this gem.
|
|
23
25
|
#
|
|
@@ -49,6 +51,7 @@ module DWH
|
|
|
49
51
|
register(:sqlite, Adapters::Sqlite)
|
|
50
52
|
register(:athena, Adapters::Athena)
|
|
51
53
|
register(:redshift, Adapters::Redshift)
|
|
54
|
+
register(:databricks, Adapters::Databricks)
|
|
52
55
|
|
|
53
56
|
# start_reaper
|
|
54
57
|
end
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: dwh
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.4.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Ajo Abraham
|
|
@@ -154,6 +154,7 @@ files:
|
|
|
154
154
|
- lib/dwh.rb
|
|
155
155
|
- lib/dwh/adapters.rb
|
|
156
156
|
- lib/dwh/adapters/athena.rb
|
|
157
|
+
- lib/dwh/adapters/databricks.rb
|
|
157
158
|
- lib/dwh/adapters/druid.rb
|
|
158
159
|
- lib/dwh/adapters/duck_db.rb
|
|
159
160
|
- lib/dwh/adapters/my_sql.rb
|
|
@@ -163,6 +164,7 @@ files:
|
|
|
163
164
|
- lib/dwh/adapters/snowflake.rb
|
|
164
165
|
- lib/dwh/adapters/sql_server.rb
|
|
165
166
|
- lib/dwh/adapters/sqlite.rb
|
|
167
|
+
- lib/dwh/adapters/token_manageable.rb
|
|
166
168
|
- lib/dwh/adapters/trino.rb
|
|
167
169
|
- lib/dwh/behaviors.rb
|
|
168
170
|
- lib/dwh/capabilities.rb
|
|
@@ -191,6 +193,7 @@ files:
|
|
|
191
193
|
- lib/dwh/streaming_stats.rb
|
|
192
194
|
- lib/dwh/table.rb
|
|
193
195
|
- lib/dwh/table_stats.rb
|
|
196
|
+
- lib/dwh/token_store.rb
|
|
194
197
|
- lib/dwh/version.rb
|
|
195
198
|
- sig/dwh.rbs
|
|
196
199
|
homepage: https://www.strata.site
|