ducklake 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/README.md +330 -0
- data/lib/ducklake/client.rb +430 -0
- data/lib/ducklake/result.rb +22 -0
- data/lib/ducklake/version.rb +3 -0
- data/lib/ducklake.rb +19 -0
- metadata +58 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 9e31c96f2bcf9728685f6eb72f3cdfad44174eb28347dbe549391703f2795414
  data.tar.gz: 63befb2ceaad4c0587b6ef10672e687aa2c51cdc78c95451c85599595d9e83d7
SHA512:
  metadata.gz: 1a62d39d7962cbdd8b60a49ba0f7a20d47bbf2a3a18061ae59330cab07b67d91d7dab533487da99cdfbfc3d2b1d7cceaf407bfdd471ffe2bee6d84d2f5567413
  data.tar.gz: 7032e761c77d93463beb4fb650e17fba8a8e3d54554ca5debc9f14a1625a18806c9167f5719cd7bab7079b068adae151b53a404600ac09b02eb1974cfe90c06e
data/CHANGELOG.md
ADDED
data/README.md
ADDED
@@ -0,0 +1,330 @@
# DuckLake Ruby

:fire: [DuckLake](https://ducklake.select/) for Ruby

Run your own data lake with a SQL database and file/object storage

```ruby
DuckLake::Client.new(
  catalog_url: "postgres://user:pass@host:5432/db",
  storage_url: "s3://my-bucket/"
)
```

[Learn more](https://duckdb.org/2025/05/27/ducklake.html)

Note: DuckLake is [not considered production-ready](https://ducklake.select/faq#is-ducklake-production-ready) at the moment

## Installation

First, install libduckdb. For Homebrew, use:

```sh
brew install duckdb
```

Then add this line to your application’s Gemfile:

```ruby
gem "ducklake"
```

## Getting Started

Create a client - this one stores everything locally

```ruby
ducklake =
  DuckLake::Client.new(
    catalog_url: "sqlite:///ducklake.sqlite",
    storage_url: "data_files/",
    create_if_not_exists: true
  )
```

Create a table

```ruby
ducklake.sql("CREATE TABLE events (id bigint, name text)")
```

Load data from a file

```ruby
ducklake.sql("COPY events FROM 'data.csv'")
```
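
The `COPY` example expects a `data.csv` in the working directory whose columns line up with `events` (`id`, `name`). Below is a minimal sketch that generates such a file with Ruby's standard `csv` library. The rows are made-up sample data. The file is written without a header row, so every line is plain data and header detection never comes into play.

```ruby
require "csv"

# Hypothetical sample rows matching events (id bigint, name text)
rows = [
  [1, "signup"],
  [2, "login"],
  [3, "logout"]
]

# No header row: every line is a data row for COPY
CSV.open("data.csv", "w") do |csv|
  rows.each { |row| csv << row }
end
```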

Confirm a new Parquet file was added to the data lake

```ruby
ducklake.list_files("events")
```

Query the data

```ruby
ducklake.sql("SELECT COUNT(*) FROM events").to_a
```

## Catalog Database

Catalog information can be stored in:

- Postgres: `postgres://user:pass@host:5432/dbname`
- SQLite: `sqlite:///path/to/dbname.sqlite`
- DuckDB: `duckdb:///path/to/dbname.duckdb`

Note: MySQL and MariaDB are not currently supported due to [duckdb/ducklake#70](https://github.com/duckdb/ducklake/issues/70) and [duckdb/ducklake#210](https://github.com/duckdb/ducklake/issues/210)
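
Internally, `Client.new` turns the catalog URL into a DuckLake attach string by switching on the URL scheme, as the `case` statement in `Client#initialize` in `lib/ducklake/client.rb` shows. Below is a simplified, standalone sketch of that mapping that uses only Ruby's `uri` library. The method name `ducklake_attach_target` is hypothetical, and the sketch leaves out MySQL. The real client also installs extensions and rejects file-catalog URLs that contain a host, user, port, query, or fragment.

```ruby
require "uri"

# Hypothetical helper: maps a catalog URL to the target of ATTACH 'ducklake:...',
# mirroring the case statement in Client#initialize (simplified)
def ducklake_attach_target(catalog_url)
  uri = URI.parse(catalog_url)
  case uri.scheme
  when "postgres", "postgresql"
    # network catalogs keep the full URL
    "ducklake:postgres:#{uri}"
  when "sqlite", "duckdb"
    # file catalogs use the path without its leading slash
    path = uri.path.to_s
    raise ArgumentError, "Invalid catalog_url" if path.length < 2
    "ducklake:#{uri.scheme}:#{path[1..]}"
  else
    raise ArgumentError, "Unsupported catalog type: #{uri.scheme}"
  end
end

puts ducklake_attach_target("sqlite:///ducklake.sqlite")
```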

There are two ways to set up the schema:

1. Run [this script](https://ducklake.select/docs/stable/specification/tables/overview#full-schema-creation-script)
2. Configure the client to do it

```ruby
DuckLake::Client.new(create_if_not_exists: true, ...)
```

## Data Storage

Data can be stored in:

- Local files: `data_files/`
- Amazon S3: `s3://my-bucket/path/`
- [Other providers](https://ducklake.select/docs/stable/duckdb/usage/choosing_storage): todo

### Amazon S3

Credentials are detected in the standard AWS SDK locations, or you can pass them manually

```ruby
DuckLake::Client.new(
  storage_options: {
    aws_access_key_id: "...",
    aws_secret_access_key: "...",
    region: "us-east-1"
  },
  ...
)
```
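
When `storage_url` uses the `s3` scheme, the client removes the three keys above from `storage_options` and turns them into DuckDB `CREATE SECRET` options with the `credential_chain` provider. It raises an error for any keys left over. Below is a standalone sketch of that translation. The names `s3_secret_options` and `create_secret_sql` are hypothetical; they only mirror `Client#initialize` and `Client#create_secret`.

```ruby
# Hypothetical helper mirroring how Client#initialize builds the S3 secret
def s3_secret_options(storage_options)
  opts = storage_options.dup
  secret = {type: "s3", provider: "credential_chain"}

  # explicit credentials are optional; only the ones given are added
  key_id = opts.delete(:aws_access_key_id)
  secret_key = opts.delete(:aws_secret_access_key)
  region = opts.delete(:region)
  secret[:key_id] = key_id if key_id
  secret[:secret] = secret_key if secret_key
  secret[:region] = region if region

  # any remaining key is unsupported, as in the client
  if opts.any?
    raise ArgumentError, "Unsupported s3 storage options: #{opts.keys.map(&:inspect).join(", ")}"
  end

  secret
end

# Render the options as a CREATE SECRET statement with single-quoted values
def create_secret_sql(secret)
  args = secret.map { |k, v| "#{k.to_s.upcase} '#{v.to_s.gsub("'", "''")}'" }
  "CREATE SECRET (#{args.join(", ")})"
end
```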

IAM permissions

- Read: `s3:ListBucket`, `s3:GetObject`
- Write: `s3:ListBucket`, `s3:PutObject`
- Maintenance: `s3:ListBucket`, `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`
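
These actions can be granted through an IAM policy. Below is a minimal sketch of a maintenance-level policy, built as a Ruby hash and serialized with the standard `json` library. The bucket name `my-bucket` and prefix `path/` are placeholders. In IAM, `s3:ListBucket` applies to the bucket ARN, while the object actions apply to object ARNs.

```ruby
require "json"

# Placeholder bucket and prefix; replace with the ones in your storage_url
bucket = "my-bucket"
prefix = "path/"

policy = {
  "Version" => "2012-10-17",
  "Statement" => [
    {
      # ListBucket is a bucket-level action
      "Effect" => "Allow",
      "Action" => ["s3:ListBucket"],
      "Resource" => ["arn:aws:s3:::#{bucket}"]
    },
    {
      # object-level actions; trim to GetObject (read) or PutObject (write)
      "Effect" => "Allow",
      "Action" => ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource" => ["arn:aws:s3:::#{bucket}/#{prefix}*"]
    }
  ]
}

puts JSON.pretty_generate(policy)
```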

## Operations

Create an empty table

```ruby
ducklake.sql("CREATE TABLE events (id bigint, name text)")
```

Or a table from a file

```ruby
ducklake.sql("CREATE TABLE events AS FROM 'data.csv'")
```

Load data from a file

```ruby
ducklake.sql("COPY events FROM 'data.csv'")
```

You can also load data directly from other [data sources](https://duckdb.org/docs/stable/data/data_sources)

```ruby
ducklake.attach("blog", "postgres://localhost:5432/blog")
ducklake.sql("INSERT INTO events SELECT * FROM blog.ahoy_events")
```

Or [register existing data files](https://ducklake.select/docs/stable/duckdb/metadata/adding_files)

```ruby
ducklake.add_data_files("events", "data.parquet")
```

Note: This transfers ownership to DuckLake, so the file can be deleted after running `cleanup_old_files`

Update data

```ruby
ducklake.sql("UPDATE events SET name = ? WHERE id = ?", ["Test", 1])
```

Delete data

```ruby
ducklake.sql("DELETE FROM events WHERE id = ?", [1])
```

Update the schema

```ruby
ducklake.sql("ALTER TABLE events ADD COLUMN active BOOLEAN")
```

## Snapshots

Get snapshots

```ruby
ducklake.snapshots
```

Query the data at a specific snapshot version or time

```ruby
ducklake.sql("SELECT * FROM events AT (VERSION => ?)", [3])
# or
ducklake.sql("SELECT * FROM events AT (TIMESTAMP => ?)", [Date.today - 7])
```
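
The queries above bind the version or date as a parameter. A `snapshot_time:` or `snapshot_version:` passed to `Client.new` works differently: it becomes an `ATTACH` option, which the client inlines into the SQL text through `quote`. That method turns a `Time` into a UTC ISO 8601 string with nanoseconds and a `Date` into `YYYY-MM-DD`. Below is a standalone sketch of that literal formatting. The name `sql_time_literal` is hypothetical; it mirrors the time branches of `Client#quote`.

```ruby
require "date"
require "time"

# Hypothetical helper mirroring the Time/DateTime/Date branches of Client#quote
def sql_time_literal(value)
  text =
    if value.is_a?(Time)
      value.utc.iso8601(9) # normalize to UTC, nanosecond precision
    elsif value.is_a?(DateTime)
      value.iso8601(9) # checked before Date, since DateTime is a Date subclass
    elsif value.is_a?(Date)
      value.strftime("%Y-%m-%d")
    else
      value.to_s
    end
  # single-quoted SQL string literal with embedded quotes doubled
  "'#{text.gsub("'", "''")}'"
end
```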

You can also specify a snapshot when creating the client

```ruby
DuckLake::Client.new(snapshot_version: 3, ...)
# or
DuckLake::Client.new(snapshot_time: Date.today - 7, ...)
```

## Maintenance

Merge files

```ruby
ducklake.merge_adjacent_files
```

Expire snapshots

```ruby
ducklake.expire_snapshots(older_than: Date.today - 7)
```

Clean up old files

```ruby
ducklake.cleanup_old_files(older_than: Date.today - 7)
```
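
These three calls are often run together on a schedule. Below is a minimal sketch of one such routine, assuming a single retention window for both the expiry and cleanup steps. The `run_maintenance` method, its `retention_days:` parameter, and the `RecordingLake` stand-in are all hypothetical. The call order follows this section: expiring snapshots is what makes their files eligible for `cleanup_old_files`. Any object with the same three methods works, so the routine can be checked without a live catalog.

```ruby
require "date"

# Hypothetical maintenance routine; `lake` is anything that responds to the
# three DuckLake::Client maintenance methods used below
def run_maintenance(lake, retention_days: 7, today: Date.today)
  cutoff = today - retention_days

  # 1. compact small adjacent files
  lake.merge_adjacent_files
  # 2. expire old snapshots, scheduling their files for deletion
  lake.expire_snapshots(older_than: cutoff)
  # 3. delete scheduled files older than the cutoff
  lake.cleanup_old_files(older_than: cutoff)

  cutoff
end

# Stand-in that records calls instead of touching a catalog
class RecordingLake
  attr_reader :calls

  def initialize
    @calls = []
  end

  def merge_adjacent_files
    @calls << [:merge_adjacent_files]
  end

  def expire_snapshots(older_than:)
    @calls << [:expire_snapshots, older_than]
  end

  def cleanup_old_files(older_than:)
    @calls << [:cleanup_old_files, older_than]
  end
end
```

With a real client, `run_maintenance(ducklake)` would issue the same three calls in this order.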

## Configuration

Get [options](https://ducklake.select/docs/stable/duckdb/usage/configuration)

```ruby
ducklake.options
```

Set an option globally

```ruby
ducklake.set_option("parquet_compression", "zstd")
```

Or for a specific table

```ruby
ducklake.set_option("parquet_compression", "zstd", table_name: "events")
```

## Security

See [best practices](https://duckdb.org/docs/stable/operations_manual/securing_duckdb/overview.html) for DuckDB security.

Grant minimal permissions for the catalog database and data storage.

### External Access

[Restrict external access](https://duckdb.org/docs/stable/operations_manual/securing_duckdb/overview.html#restricting-file-access) to the DuckDB engine

```ruby
ducklake.disable_external_access
```

Allow specific directories and paths

```ruby
ducklake.disable_external_access(
  allowed_directories: ["/path/to/directory"],
  allowed_paths: ["/path/to/file.txt"]
)
```

The storage URL is automatically included in `allowed_directories`
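
In practice, `disable_external_access` appends the storage URL to `allowed_directories`, renders both lists as DuckDB list literals, and issues three `SET` statements, the last of which turns external access off. Below is a standalone sketch that builds the same statements as strings. The names `external_access_statements`, `sql_string`, and `sql_list` are hypothetical. Unlike the client, the sketch handles only string entries.

```ruby
# Single-quoted SQL string literal with embedded quotes doubled
def sql_string(value)
  "'#{value.gsub("'", "''")}'"
end

# DuckDB list literal of string values, e.g. ['a', 'b']
def sql_list(values)
  "[#{values.map { |v| sql_string(v) }.join(", ")}]"
end

# Hypothetical helper producing the statements Client#disable_external_access runs
def external_access_statements(storage_url, allowed_directories: [], allowed_paths: [])
  directories = allowed_directories + [storage_url]
  [
    "SET allowed_directories = #{sql_list(directories)}",
    "SET allowed_paths = #{sql_list(allowed_paths)}",
    # issued last, matching the order in the client
    "SET enable_external_access = false"
  ]
end
```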

## Reference

Get table info

```ruby
ducklake.table_info
```

Get column info

```ruby
ducklake.column_info("events")
```

Drop a table

```ruby
ducklake.drop_table("events")
# or
ducklake.drop_table("events", if_exists: true)
```
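
`drop_table` interpolates the table name into the SQL text instead of binding it. The name therefore goes through `quote_identifier`, which wraps it in double quotes and doubles any embedded double quote (DuckDB's rule for quoted identifiers). Below is a standalone sketch of the same statement builder. The name `drop_table_sql` is hypothetical. The client additionally rejects strings that are not UTF-8 or US-ASCII.

```ruby
# Quote an identifier the way Client#quote_identifier does
def quote_identifier(name)
  "\"#{name.to_s.gsub('"', '""')}\""
end

# Hypothetical builder mirroring Client#drop_table
def drop_table_sql(table, if_exists: false)
  "DROP TABLE#{" IF EXISTS" if if_exists} #{quote_identifier(table)}"
end
```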

List files

```ruby
ducklake.list_files("events")
```

List files at a specific snapshot version or time

```ruby
ducklake.list_files("events", snapshot_version: 3)
# or
ducklake.list_files("events", snapshot_time: Date.today - 7)
```

## History

View the [changelog](https://github.com/ankane/ducklake-ruby/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/ducklake-ruby/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/ducklake-ruby/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/ducklake-ruby.git
cd ducklake-ruby
bundle install

# Postgres
createdb ducklake_ruby_test
bundle exec rake test:postgres

# MySQL and MariaDB
mysqladmin create ducklake_ruby_test
bundle exec rake test:mysql

# SQLite
bundle exec rake test:sqlite

# DuckDB
bundle exec rake test:duckdb
```
data/lib/ducklake/client.rb
ADDED
@@ -0,0 +1,430 @@
module DuckLake
  class Client
    def initialize(
      catalog_url:,
      storage_url:,
      storage_options: {},
      snapshot_version: nil,
      snapshot_time: nil,
      data_inlining_row_limit: 0,
      create_if_not_exists: false,
      _read_only: false # experimental
    )
      catalog_uri = URI.parse(catalog_url)
      storage_uri = URI.parse(storage_url)

      extension = nil
      case catalog_uri.scheme
      when "postgres", "postgresql"
        extension = "postgres"
        attach = "postgres:#{catalog_uri}"
      when "mysql", "mariadb"
        extension = "mysql"
        attach = "mysql:#{catalog_uri}"
      when "sqlite"
        extension = "sqlite"
        attach = "sqlite:#{catalog_path(catalog_uri)}"
      when "duckdb"
        attach = "duckdb:#{catalog_path(catalog_uri)}"
      else
        raise ArgumentError, "Unsupported catalog type: #{catalog_uri.scheme}"
      end

      secret_options = nil
      storage_options = storage_options.dup

      case storage_uri.scheme
      when "s3"
        # https://duckdb.org/docs/stable/core_extensions/httpfs/s3api.html
        key_id = storage_options.delete(:aws_access_key_id)
        secret = storage_options.delete(:aws_secret_access_key)
        region = storage_options.delete(:region)

        secret_options = {
          type: "s3",
          provider: "credential_chain"
        }
        secret_options[:key_id] = key_id if key_id
        secret_options[:secret] = secret if secret
        secret_options[:region] = region if region
      end

      if storage_options.any?
        raise ArgumentError, "Unsupported #{storage_uri.scheme || "file"} storage options: #{storage_options.keys.map(&:inspect).join(", ")}"
      end

      attach_options = {data_path: storage_url}
      attach_options[:read_only] = true if _read_only
      attach_options[:snapshot_version] = snapshot_version if !snapshot_version.nil?
      attach_options[:snapshot_time] = snapshot_time if !snapshot_time.nil?
      attach_options[:data_inlining_row_limit] = data_inlining_row_limit if data_inlining_row_limit > 0
      attach_options[:create_if_not_exists] = false unless create_if_not_exists

      @catalog = "ducklake"
      @storage_url = storage_url

      if _read_only
        config = DuckDB::Config.new
        config["access_mode"] = "READ_ONLY"

        # make the entire database read-only, not just DuckLake
        # read-only mode can only be set when the database is opened
        # and cannot be used on in-memory database, so create a temporary one
        @tmpdir = Dir.mktmpdir
        ObjectSpace.define_finalizer(@tmpdir, self.class.finalize(@tmpdir.dup))
        dbpath = File.join(@tmpdir, "memory.duckdb")
        DuckDB::Database.open(dbpath) { }

        @db = DuckDB::Database.open(dbpath, config)
      else
        @db = DuckDB::Database.open
      end

      @conn = @db.connect

      install_extension("ducklake")
      install_extension(extension) if extension
      create_secret(secret_options) if secret_options
      attach_with_options(@catalog, "ducklake:#{attach}", attach_options)
      execute("USE #{quote_identifier(@catalog)}")
      detach("memory")
    end

    # https://duckdb.org/docs/stable/operations_manual/securing_duckdb/overview.html#restricting-file-access
    def disable_external_access(allowed_directories: [], allowed_paths: [])
      allowed_directories += [@storage_url]
      execute("SET allowed_directories = #{quote_array(allowed_directories)}")
      execute("SET allowed_paths = #{quote_array(allowed_paths)}")
      execute("SET enable_external_access = false")
      nil
    end

    def sql(sql, params = [])
      execute(sql, params)
    end

    def attach(alias_, url)
      type = nil
      extension = nil

      uri = URI.parse(url)
      case uri.scheme
      when "postgres", "postgresql"
        type = "postgres"
        extension = "postgres"
      else
        raise ArgumentError, "Unsupported data source type: #{uri.scheme}"
      end

      install_extension(extension) if extension

      options = {
        type: type,
        read_only: true
      }
      attach_with_options(alias_, url, options)
    end

    def detach(alias_)
      execute("DETACH #{quote_identifier(alias_)}")
      nil
    end

    def table_info
      symbolize_keys execute("SELECT * FROM ducklake_table_info(?)", [@catalog])
    end

    def column_info(table)
      sql = <<~SQL
        SELECT column_name AS name, LOWER(data_type) AS type
        FROM information_schema.columns
        WHERE table_catalog = ? AND table_schema = ? AND table_name = ?
        ORDER BY ordinal_position
      SQL
      result = execute(sql, [@catalog, "main", table])
      if result.empty?
        raise CatalogError, "Table does not exist!"
      end
      symbolize_keys result
    end

    # TODO more DDL methods?
    def drop_table(table, if_exists: nil)
      execute("DROP TABLE#{" IF EXISTS" if if_exists} #{quote_identifier(table)}")
      nil
    end

    # https://ducklake.select/docs/stable/duckdb/usage/snapshots
    def snapshots
      symbolize_keys execute("SELECT * FROM ducklake_snapshots(?)", [@catalog])
    end

    # https://ducklake.select/docs/stable/duckdb/usage/configuration
    def options
      symbolize_keys execute("SELECT * FROM ducklake_options(?)", [@catalog])
    end

    # https://ducklake.select/docs/stable/duckdb/usage/configuration
    def set_option(name, value, table_name: nil)
      args = ["?", "?", "?"]
      params = [@catalog, name, value]

      if !table_name.nil?
        args << "table_name => ?"
        params << table_name
      end

      execute("CALL ducklake_set_option(#{args.join(", ")})", params)
      nil
    end

    def format_version
      execute("SELECT value FROM ducklake_options(?) WHERE option_name = ?", [@catalog, "version"]).first["value"]
    end

    # https://ducklake.select/docs/stable/duckdb/maintenance/merge_adjacent_files
    def merge_adjacent_files
      execute("CALL merge_adjacent_files()")
      nil
    end

    # https://ducklake.select/docs/stable/duckdb/maintenance/expire_snapshots
    def expire_snapshots(versions: nil, older_than: nil, dry_run: false)
      args = ["?"]
      params = [@catalog]

      if !versions.nil?
        # inline since duckdb gem does not support array params
        args << "versions => #{quote_array(versions)}"
      end

      if !older_than.nil?
        args << "older_than => ?"
        params << older_than
      end

      if dry_run
        args << "dry_run => ?"
        params << dry_run
      end

      symbolize_keys execute("CALL ducklake_expire_snapshots(#{args.join(", ")})", params)
    end

    # https://ducklake.select/docs/stable/duckdb/maintenance/cleanup_old_files
    def cleanup_old_files(cleanup_all: false, older_than: nil, dry_run: false)
      args = ["?"]
      params = [@catalog]

      if cleanup_all
        args << "cleanup_all => ?"
        params << cleanup_all
      end

      if !older_than.nil?
        args << "older_than => ?"
        params << older_than
      end

      if dry_run
        args << "dry_run => ?"
        params << dry_run
      end

      symbolize_keys execute("CALL ducklake_cleanup_old_files(#{args.join(", ")})", params)
    end

    # https://ducklake.select/docs/stable/duckdb/advanced_features/data_inlining
    def flush_inlined_data(table_name: nil)
      args = ["?"]
      params = [@catalog]

      if !table_name.nil?
        args << "table_name => ?"
        params << table_name
      end

      symbolize_keys execute("CALL ducklake_flush_inlined_data(#{args.join(", ")})", params)
    end

    # https://ducklake.select/docs/stable/duckdb/metadata/list_files
    def list_files(table, snapshot_version: nil, snapshot_time: nil)
      args = ["?", "?"]
      params = [@catalog, table]

      if !snapshot_version.nil?
        args << "snapshot_version => ?"
        params << snapshot_version
      end

      if !snapshot_time.nil?
        snapshot_time = snapshot_time.utc if snapshot_time.is_a?(Time)
        args << "snapshot_time => ?"
        params << snapshot_time
      end

      symbolize_keys execute("SELECT * FROM ducklake_list_files(#{args.join(", ")})", params)
    end

    # https://ducklake.select/docs/stable/duckdb/metadata/adding_files
    def add_data_files(table, data, allow_missing: nil, ignore_extra_columns: nil)
      params = [@catalog, table, data]
      args = ["?", "?", "?"]

      if !allow_missing.nil?
        args << "allow_missing => ?"
        params << allow_missing
      end

      if !ignore_extra_columns.nil?
        args << "ignore_extra_columns => ?"
        params << ignore_extra_columns
      end

      execute("CALL ducklake_add_data_files(#{args.join(", ")})", params)
      nil
    end

    # libduckdb does not provide function
    # https://duckdb.org/docs/stable/sql/dialect/keywords_and_identifiers.html
    def quote_identifier(value)
      "\"#{encoded(value).gsub('"', '""')}\""
    end

    # libduckdb does not provide function
    # TODO support more types
    def quote(value)
      if value.nil?
        "NULL"
      elsif value == true
        "true"
      elsif value == false
        "false"
      elsif defined?(BigDecimal) && value.is_a?(BigDecimal)
        value.to_s("F")
      elsif value.is_a?(Numeric)
        value.to_s
      else
        if value.is_a?(Time)
          value = value.utc.iso8601(9)
        elsif value.is_a?(DateTime)
          value = value.iso8601(9)
        elsif value.is_a?(Date)
          value = value.strftime("%Y-%m-%d")
        end
        "'#{encoded(value).gsub("'", "''")}'"
      end
    end

    def disconnect
      @conn.disconnect
      @db.close
      nil
    end

    # hide internal state
    def inspect
      to_s
    end

    def self.finalize(dir)
      proc { FileUtils.remove_entry(dir) }
    end

    private

    def execute(sql, params = [])
      # use prepare instead of query to prevent multiple statements at once
      result =
        @conn.prepare(sql) do |stmt|
          params.each_with_index do |v, i|
            stmt.bind(i + 1, v)
          end
          stmt.execute
        end

      # TODO add column types
      Result.new(result.columns.map(&:name), result.to_a)
    rescue DuckDB::Error => e
      raise map_error(e), cause: nil
    end

    def error_mapping
      @error_mapping ||= {
        "Catalog Error: " => CatalogError,
        "Conversion Error: " => ConversionError,
        "Invalid Input Error: " => InvalidInputError,
        "IO Error: " => IOError,
        "Permission Error: " => PermissionError
      }
    end

    # not ideal to base on prefix, but do not see a better way at the moment
    def map_error(e)
      error_mapping.each do |prefix, cls|
        if e.message&.start_with?(prefix)
          return cls.new(e.message.delete_prefix(prefix))
        end
      end
      Error.new(e.message)
    end

    def install_extension(extension)
      execute("INSTALL #{quote_identifier(extension)}")
    end

    def create_secret(options)
      execute("CREATE SECRET (#{options_args(options)})")
    end

    def attach_with_options(alias_, url, options)
      execute("ATTACH #{quote(url)} AS #{quote_identifier(alias_)} (#{options_args(options)})")
    end

    def options_args(options)
      options.map { |k, v| "#{option_name(k)} #{quote(v)}" }.join(", ")
    end

    def option_name(k)
      name = k.to_s.upcase
      # should never contain user input, but just to be safe
      unless name.match?(/\A[A-Z_]+\z/)
        raise "Invalid option name"
      end
      name
    end

    def symbolize_keys(result)
      result.map { |v| v.transform_keys(&:to_sym) }
    end

    def catalog_path(uri)
      # custom message for sqlite://db.sqlite
      # TODO improve message
      if !uri.host.empty?
        raise ArgumentError, "Unexpected host in catalog_url"
      end

      if uri.path.length < 2 || uri.user || uri.password || uri.port || uri.query || uri.fragment
        raise ArgumentError, "Invalid catalog_url"
      end

      uri.path[1..]
    end

    def quote_array(value)
      "[#{value.map { |v| quote(v) }.join(", ")}]"
    end

    def encoded(value)
      value = value.to_s if value.is_a?(Symbol)
      if !value.respond_to?(:to_str)
        raise TypeError, "no implicit conversion of #{value.class.name} into String"
      end
      if ![Encoding::UTF_8, Encoding::US_ASCII].include?(value.encoding) || !value.valid_encoding?
        raise ArgumentError, "Unsupported encoding"
      end
      value
    end
  end
end
data/lib/ducklake/result.rb
ADDED
@@ -0,0 +1,22 @@
module DuckLake
  class Result
    include Enumerable

    attr_reader :columns, :rows

    def initialize(columns, rows)
      @columns = columns
      @rows = rows
    end

    def each
      @rows.each do |row|
        yield @columns.zip(row).to_h
      end
    end

    def empty?
      rows.empty?
    end
  end
end
data/lib/ducklake.rb
ADDED
@@ -0,0 +1,19 @@
# dependencies
require "duckdb"

# stdlib
require "uri"

# modules
require_relative "ducklake/client"
require_relative "ducklake/result"
require_relative "ducklake/version"

module DuckLake
  class Error < StandardError; end
  class CatalogError < Error; end
  class ConversionError < Error; end
  class InvalidInputError < Error; end
  class IOError < Error; end
  class PermissionError < Error; end
end
metadata
ADDED
@@ -0,0 +1,58 @@
--- !ruby/object:Gem::Specification
name: ducklake
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- Andrew Kane
bindir: bin
cert_chain: []
date: 1980-01-02 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: duckdb
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
email: andrew@ankane.org
executables: []
extensions: []
extra_rdoc_files: []
files:
- CHANGELOG.md
- README.md
- lib/ducklake.rb
- lib/ducklake/client.rb
- lib/ducklake/result.rb
- lib/ducklake/version.rb
homepage: https://github.com/ankane/ducklake-ruby
licenses:
- MIT
metadata: {}
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '3.2'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.6.9
specification_version: 4
summary: DuckLake for Ruby
test_files: []