activerecord-adbc-adapter 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/LICENSE.txt +20 -0
- data/README.md +223 -0
- data/lib/activerecord-adbc-adapter.rb +10 -0
- data/lib/activerecord_adbc_adapter/adapter.rb +209 -0
- data/lib/activerecord_adbc_adapter/column.rb +4 -0
- data/lib/activerecord_adbc_adapter/database_statements.rb +68 -0
- data/lib/activerecord_adbc_adapter/ingest.rb +13 -0
- data/lib/activerecord_adbc_adapter/quoting.rb +19 -0
- data/lib/activerecord_adbc_adapter/relation_arrowable.rb +18 -0
- data/lib/activerecord_adbc_adapter/result.rb +151 -0
- data/lib/activerecord_adbc_adapter/schema_creation.rb +44 -0
- data/lib/activerecord_adbc_adapter/schema_definitions.rb +8 -0
- data/lib/activerecord_adbc_adapter/schema_statements.rb +178 -0
- data/lib/activerecord_adbc_adapter/version.rb +3 -0
- metadata +84 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: c7d09e5234b5077f35cb5ea1b42067bb214e6cd41070e31160aaeab53fca4368
+  data.tar.gz: 21eb666112941a0998a6b5482d58a93b5e23dc83d07b5c40703ffba971b37cf3
+SHA512:
+  metadata.gz: b3cbdef06e5151fc373008042154a1f7134bbb346dde5322e8124796449e33a7c02b28c05723fa3c39771cf2019cab295d3c2def17a70b0cb1539bde5e702c39
+  data.tar.gz: f749b486c3da75fffd5f60060b9bcc5558eb03fab97c7fbdd9b94178fec31af2c398e0f03102c83ba9d7c5d8ca782363099ae88b8d2592bd049f4367ca18907d
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
+Copyright 2023 Sutou Kouhei <kou@clear-code.com>
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,223 @@
+# Active Record ADBC adapter
+
+## Description
+
+Active Record ADBC adapter provides an
+[ADBC](https://arrow.apache.org/adbc/) adapter for Active Record.
+
+This adapter is optimized for extracting and loading large data
+from/to DBs. The optimization is powered by Apache Arrow.
+
+## Install
+
+### `Gemfile`
+
+Add the
+[rubygems-requirements-system](https://rubygems.org/gems/rubygems-requirements-system)
+plugin and the activerecord-adbc-adapter gem to your `Gemfile`:
+
+```ruby
+plugin "rubygems-requirements-system"
+
+gem "activerecord-adbc-adapter"
+```
+
+This adapter requires ADBC libraries implemented in C++. The
+rubygems-requirements-system plugin installs them into your system
+automatically.
+
+### GitHub Actions
+
+On GitHub Actions, you should not use the [`bundler-cache:
+true`](https://github.com/ruby/setup-ruby?tab=readme-ov-file#caching-bundle-install-automatically)
+option, because it only caches the activerecord-adbc-adapter gem; it
+doesn't cache the ADBC libraries installed into your system. You need
+to run `bundle install` after `ruby/setup-ruby`, something like the
+following:
+
+```yaml
+- uses: ruby/setup-ruby@v1
+  with:
+    ruby-version: ruby
+    # bundler-cache: true # No "bundler-cache: true"
+- run: bundle install
+```
+
+### ADBC driver
+
+You also need to install ADBC drivers for the systems you want to
+connect to. For example, you need the ADBC driver for PostgreSQL when
+you want to connect to PostgreSQL.
+
+See https://arrow.apache.org/adbc/current/driver/installation.html for
+the latest information.
+
+Here is information for some drivers:
+
+PostgreSQL:
+
+```bash
+# Debian/Ubuntu
+sudo apt install libadbc-driver-postgresql-dev
+```
+
+```bash
+# Red Hat Enterprise Linux variants
+sudo dnf install adbc-driver-postgresql-devel
+```
+
+SQLite:
+
+```bash
+# Debian/Ubuntu
+sudo apt install libadbc-driver-sqlite-dev
+```
+
+```bash
+# Red Hat Enterprise Linux variants
+sudo dnf install adbc-driver-sqlite-devel
+```
+
+DuckDB: You don't need to install additional packages for ADBC
+support. `libduckdb.so` includes ADBC support.
+
+## Usage
+
+This adapter is optimized for extracting and loading large data
+from/to DBs. You should use the built-in adapters for normal CRUD use
+cases. You should use this adapter only for large data.
+
+You can use Active Record's [multiple databases
+support](https://guides.rubyonrails.org/active_record_multiple_databases.html)
+feature.
+
+### `config/database.yml`
+
+Here is a sample `config/database.yml`. The `primary` configuration is
+for the built-in PostgreSQL adapter. The `adbc` configuration is for
+this adapter.
+
+```yaml
+default: &default
+  primary:
+    adapter: postgresql
+    encoding: unicode
+    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
+    url: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
+  adbc:
+    adapter: adbc
+    driver: adbc_driver_postgresql
+    # "uri" not "url"!!!
+    uri: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
+    # You need this to avoid migration.
+    database_tasks: false
+development:
+  <<: *default
+test:
+  <<: *default
+production:
+  <<: *default
+```
+
+> [!NOTE]
+>
+> Configuration parameters are different for each ADBC driver. For
+> example, [the PostgreSQL
+> driver](https://arrow.apache.org/adbc/current/driver/postgresql.html)
+> and [the SQLite
+> driver](https://arrow.apache.org/adbc/current/driver/sqlite.html) use
+> `uri` for connection information but [the DuckDB
+> driver](https://duckdb.org/docs/stable/clients/adbc.html) uses
+> `entrypoint` and `path`:
+>
+> ```yaml
+> adbc:
+>   adapter: adbc
+>   driver: duckdb
+>   entrypoint: duckdb_adbc_init # You should not change this.
+>   path: <%= ENV.fetch("DATABASE_PATH") { Rails.root.join("db", "#{Rails.env}.duckdb") } %>
+>   # You need this to avoid migration.
+>   database_tasks: false
+> ```
+>
+> See the driver's documentation for available parameters.
+
+### Model
+
+You need to create an abstract class for the ADBC connection:
+
+```ruby
+# app/models/adbc_application_record.rb
+class AdbcApplicationRecord < ActiveRecord::Base
+  include ActiveRecordADBCAdapter::Ingest
+
+  self.abstract_class = true
+  connects_to database: { writing: :adbc, reading: :adbc }
+end
+```
+
+You can then create a model for the ADBC connection:
+
+```ruby
+# app/models/adbc_event.rb
+class AdbcEvent < AdbcApplicationRecord
+  self.table_name = :events
+end
+```
+
+You can use the `Event` model (derived from `ApplicationRecord`, not
+`AdbcApplicationRecord`) for normal use cases and `AdbcEvent` for
+large data use cases. For the large data use case, you should keep the
+data in Apache Arrow format as much as possible for performance. You
+can get Apache Arrow data with `#to_arrow`:
+
+```ruby
+AdbcEvent.all.to_arrow
+```
+
+You can use `.ingest` to load Apache Arrow data:
+
+```ruby
+AdbcEvent.ingest(arrow_data)
+```
+
+You can copy large data efficiently with these methods. Here is a
+sample that copies events older than one month from PostgreSQL to
+DuckDB:
+
+```ruby
+DuckDBAdbcEvent.ingest(PgAdbcEvent.where(created_at: ..(1.month.ago)).to_arrow)
+```
+
+## Performance
+
+Here is a benchmark result for loading and dumping 100 integer columns
+and 10,000 rows with PostgreSQL. This adapter is faster than Active
+Record's built-in PostgreSQL adapter and raw SQL (`INSERT INTO ...` and
+`SELECT * FROM ...`).
+
+Load:
+
+```text
+                    user     system      total        real
+SQL             0.003738   0.003141   0.006879 (  0.350767)
+Active Record   2.423670   0.047003   2.470673 (  2.844029)
+ADBC            0.010229   0.010960   0.021189 (  0.075787)
+```
+
+Dump:
+
+```text
+                    user     system      total        real
+SQL             0.110934   0.006219   0.117153 (  0.123201)
+Active Record   0.271154   0.008948   0.280102 (  0.283621)
+ADBC            0.026966   0.006945   0.033911 (  0.067801)
+```
+
+See [`benchmark/`](benchmark/) for details.
+
+## License
+
+The MIT license. See `LICENSE.txt` for details.
data/lib/activerecord-adbc-adapter.rb
ADDED
@@ -0,0 +1,10 @@
+require "active_record"
+
+require_relative "activerecord_adbc_adapter/ingest"
+require_relative "activerecord_adbc_adapter/version"
+
+module ActiveRecord::ConnectionAdapters
+  register("adbc",
+           "ActiveRecordADBCAdapter::Adapter",
+           "activerecord_adbc_adapter/adapter")
+end
data/lib/activerecord_adbc_adapter/adapter.rb
ADDED
@@ -0,0 +1,209 @@
+require "adbc"
+
+require_relative "column"
+require_relative "database_statements"
+require_relative "quoting"
+require_relative "relation_arrowable"
+require_relative "result"
+require_relative "schema_creation"
+require_relative "schema_definitions"
+require_relative "schema_statements"
+
+module ActiveRecordADBCAdapter
+  # = Active Record ADBC Adapter
+  #
+  # ...
+  #
+  # Options:
+  #
+  # * ...
+  class Adapter < ActiveRecord::ConnectionAdapters::AbstractAdapter
+    ADAPTER_NAME = "ADBC"
+
+    class Connection
+      def initialize(**params)
+        params.delete(:database_tasks)
+        @database = ADBC::Database.open(**params)
+        @connection = @database.connect
+      end
+
+      def close
+        if @connection
+          @connection.release
+          @connection = nil
+        end
+        @database.release
+        @database = nil
+      end
+
+      def reconnect
+        if @connection
+          @connection.release
+          @connection = @database.connect
+        end
+      end
+
+      def open_statement(&block)
+        @connection.open_statement(&block)
+      end
+
+      if ADBC::Connection.method_defined?(:get_objects_raw)
+        # red-adbc provides convenient wrapper
+        def get_objects(...)
+          @connection.get_objects(...)
+        end
+      else
+        def get_objects(depth: :all,
+                        catalog: nil,
+                        db_schema: nil,
+                        table_name: nil,
+                        table_types: nil,
+                        column_name: nil)
+          reader = @connection.get_objects(depth,
+                                           catalog,
+                                           db_schema,
+                                           table_name,
+                                           table_types,
+                                           column_name)
+          begin
+            reader.read_all
+          ensure
+            reader.unref
+          end
+        end
+      end
+    end
+
+    class << self
+      def new_client(params)
+        Connection.new(**params)
+      end
+    end
+
+    include DatabaseStatements
+    include Quoting
+    include SchemaStatements
+
+    FEATURES = [
+      :supports_insert_on_duplicate_skip,
+    ]
+
+    def initialize(...)
+      super
+
+      @connection_parameters = @config.compact
+      @connection_parameters.delete(:adapter)
+
+      @raw_connection = nil
+
+      @features = {}
+    end
+
+    FEATURES.each do |feature|
+      define_method("#{feature}?") do
+        @features[feature]
+      end
+    end
+
+    def connect
+      @raw_connection = self.class.new_client(@connection_parameters)
+      detect_features
+      @raw_connection
+    end
+
+    def reconnect
+      @lock.synchronize do
+        @raw_connection&.reconnect
+      end
+
+      connect unless @raw_connection
+    end
+
+    def active?
+      @lock.synchronize do
+        return false unless @raw_connection
+      end
+      true
+    end
+
+    def disconnect!
+      @lock.synchronize do
+        super
+        @raw_connection&.close rescue nil
+        @raw_connection = nil
+      end
+    end
+
+    # Borrowed from
+    # ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#build_insert_sql.
+    #
+    # Copyright (c) David Heinemeier Hansson
+    #
+    # The MIT license.
+    def build_insert_sql(insert)
+      sql = +"INSERT #{insert.into} #{insert.values_list}"
+
+      if insert.skip_duplicates?
+        sql << " ON CONFLICT #{insert.conflict_target} DO NOTHING"
+      elsif insert.update_duplicates?
+        sql << " ON CONFLICT #{insert.conflict_target} DO UPDATE SET "
+        if insert.raw_update_sql?
+          sql << insert.raw_update_sql
+        else
+          sql << insert.touch_model_timestamps_unless { |column| "#{insert.model.quoted_table_name}.#{column} IS NOT DISTINCT FROM excluded.#{column}" }
+          sql << insert.updatable_columns.map { |column| "#{column}=excluded.#{column}" }.join(",")
+        end
+      end
+
+      sql << " RETURNING #{insert.returning}" if insert.returning
+      sql
+    end
+
+    def ingest(table_name, attributes, name: nil)
+      log(table_name, name) do |notification_payload|
+        with_raw_connection do |raw_connection|
+          raw_connection.open_statement do |statement|
+            statement.ingest(table_name, attributes, mode: :append)
+          end
+        end
+      end
+    end
+
+    def backend
+      # We can't use @connection_parameters here because #arel_visitor
+      # is called before Adapter#initialize. (#arel_visitor is called
+      # in AbstractAdapter#initialize.)
+      @config[:driver].gsub(/\Aadbc_driver_/, "")
+    end
+
+    def arel_visitor
+      case backend
+      when "postgresql"
+        Arel::Visitors::PostgreSQL.new(self)
+      else
+        super
+      end
+    end
+
+    private
+    def detect_features
+      detect_features_method = "detect_features_#{backend}"
+      if respond_to?(detect_features_method, true)
+        __send__(detect_features_method)
+      end
+    end
+
+    def detect_features_duckdb
+      @features[:supports_insert_on_duplicate_skip] = true
+    end
+
+    def detect_features_postgresql
+      @features[:supports_insert_on_duplicate_skip] = true
+    end
+
+    def detect_features_sqlite
+      @features[:supports_insert_on_duplicate_skip] = true
+    end
+  end
+  ActiveSupport.run_load_hooks(:active_record_adbcadapter, Adapter)
+end
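The `FEATURES` flag and the borrowed `build_insert_sql` above are what hook this adapter into Active Record's bulk-insert API. A hedged sketch of the intended use, reusing the `AdbcEvent` model from the README (column names are illustrative, and whether the generated `ON CONFLICT` clause is accepted depends on the driver):

```ruby
# detect_features_* sets :supports_insert_on_duplicate_skip for every
# backend, so insert_all is permitted; its SQL goes through
# build_insert_sql and ends with "ON CONFLICT ... DO NOTHING".
AdbcEvent.insert_all([
  {id: 1, name: "signup"},
  {id: 2, name: "login"},
])
```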
data/lib/activerecord_adbc_adapter/database_statements.rb
ADDED
@@ -0,0 +1,68 @@
+module ActiveRecordADBCAdapter
+  module DatabaseStatements
+    def perform_query(raw_connection,
+                      sql,
+                      binds,
+                      type_casted_binds,
+                      prepare:,
+                      notification_payload:,
+                      batch:)
+      raw_connection.open_statement do |statement|
+        statement.sql_query = sql
+        if binds.empty?
+          statement.execute[0]
+        else
+          statement.prepare
+          raw_records = {}
+          binds.zip(type_casted_binds) do |bind, type_casted_bind|
+            case type_casted_bind
+            when String
+              if type_casted_bind.encoding == Encoding::ASCII_8BIT
+                array = Arrow::BinaryArray.new([type_casted_bind])
+              else
+                array = Arrow::StringArray.new([type_casted_bind])
+              end
+            when DateTime
+              array = Arrow::TimestampArray.new(:micro,
+                                                [type_casted_bind.dup.localtime])
+            when Date
+              array = Arrow::Date32Array.new([type_casted_bind])
+            when ActiveRecord::Type::Time::Value
+              local_time = type_casted_bind.dup.localtime
+              time_value = (local_time.seconds_since_midnight * 1_000_000).to_i
+              array = Arrow::Time64Array.new(:micro, [time_value])
+            else
+              array = [type_casted_bind]
+            end
+            raw_records[bind.name] = array
+          end
+          record_batch = Arrow::RecordBatch.new(raw_records)
+          statement.bind(record_batch) do
+            statement.execute[0]
+          end
+        end
+      end
+    end
+
+    def cast_result(arrow_table)
+      Result.new(backend, arrow_table)
+    end
+
+    # Borrowed from
+    # ActiveRecord::ConnectionAdapters::PostgreSQL::DatabaseStatements.
+    #
+    # Copyright (c) David Heinemeier Hansson
+    #
+    # The MIT license.
+    READ_QUERY =
+      ActiveRecord::ConnectionAdapters::AbstractAdapter.build_read_query_regexp(
+        :close, :declare, :fetch, :move, :set, :show
+      ) #:nodoc:
+    private_constant :READ_QUERY
+    def write_query?(sql) # :nodoc:
+      !READ_QUERY.match?(sql)
+    rescue ArgumentError # Invalid encoding
+      !READ_QUERY.match?(sql.b)
+    end
+  end
+end
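`perform_query` above turns each type-casted bind into a one-element Arrow array keyed by the bind name, then binds them as a single-row record batch to the prepared statement. A rough stand-alone illustration of that conversion with red-arrow (the values and the `raw_records` hash are made up; the real code also handles `Date`, `DateTime`, and time-of-day values):

```ruby
require "arrow"

# A binary string becomes Arrow::BinaryArray, any other string becomes
# Arrow::StringArray, and other values are left for red-arrow to infer.
type_casted_binds = {"id" => 42, "name" => "Alice"}
raw_records = type_casted_binds.transform_values do |value|
  case value
  when String
    if value.encoding == Encoding::ASCII_8BIT
      Arrow::BinaryArray.new([value])
    else
      Arrow::StringArray.new([value])
    end
  else
    [value]
  end
end
# The adapter passes such a hash to Arrow::RecordBatch.new and binds it.
record_batch = Arrow::RecordBatch.new(raw_records)
```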
data/lib/activerecord_adbc_adapter/ingest.rb
ADDED
@@ -0,0 +1,13 @@
+module ActiveRecordADBCAdapter
+  module Ingest
+    extend ActiveSupport::Concern
+
+    module ClassMethods
+      def ingest(attributes)
+        self.with_connection do |connection|
+          connection.ingest(table_name, attributes, name: "#{self} Ingest")
+        end
+      end
+    end
+  end
+end
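This is the concern that `AdbcApplicationRecord` includes in the README. As a hedged example of feeding it, Arrow data can be built in memory with red-arrow before ingesting (table and column names are illustrative and must match the target table):

```ruby
# Build an Arrow table and append it to the events table through the
# Ingest concern (AdbcEvent is the model defined in the README).
events = Arrow::Table.new(
  "name"       => Arrow::StringArray.new(["signup", "login"]),
  "created_at" => Arrow::TimestampArray.new(:micro, [Time.now, Time.now])
)
AdbcEvent.ingest(events)
```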
data/lib/activerecord_adbc_adapter/quoting.rb
ADDED
@@ -0,0 +1,19 @@
+module ActiveRecordADBCAdapter
+  module Quoting
+    extend ActiveSupport::Concern
+
+    module ClassMethods
+      def quote_column_name(column_name)
+        "\"#{column_name.gsub("\"", "\"\"")}\""
+      end
+    end
+
+    def quoted_date(value)
+      value
+    end
+
+    def quoted_time(value)
+      value
+    end
+  end
+end
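A quick check of the class-level quoting helper above: embedded double quotes are doubled and the whole identifier is wrapped in double quotes (the call is illustrative; the adapter normally invokes it internally):

```ruby
ActiveRecordADBCAdapter::Adapter.quote_column_name('say "hi"')
# => "\"say \"\"hi\"\"\""
```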
data/lib/activerecord_adbc_adapter/relation_arrowable.rb
ADDED
@@ -0,0 +1,18 @@
+module ActiveRecordADBCAdapter
+  module RelationArrowable
+    def to_arrow
+      result = exec_main_query
+      result.attach_model(model)
+      result.to_arrow
+    end
+
+    def each_record_batch(&block)
+      result = exec_main_query
+      result.attach_model(model)
+      result.each_record_batch(&block)
+    end
+  end
+
+  ActiveRecord::Relation.include(RelationArrowable)
+  ActiveRecord::Querying.delegate(:to_arrow, :each_record_batch, to: :all)
+end
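With the relation extension and the `Querying` delegation above, relations on an ADBC connection gain `#to_arrow` and `#each_record_batch`. A hedged sketch using the `AdbcEvent` model from the README:

```ruby
# Stream a large result set batch by batch instead of instantiating an
# Active Record object per row.
AdbcEvent.where(created_at: 1.week.ago..).each_record_batch do |record_batch|
  puts record_batch.n_rows # record_batch is an Arrow::RecordBatch
end

# The delegation also makes the model class itself respond to #to_arrow:
AdbcEvent.to_arrow # same as AdbcEvent.all.to_arrow
```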
data/lib/activerecord_adbc_adapter/result.rb
ADDED
@@ -0,0 +1,151 @@
+module ActiveRecordADBCAdapter
+  class Result
+    include Enumerable
+
+    def initialize(backend, table)
+      @backend = backend
+      @table = table
+      @schema = @table.schema
+    end
+
+    # This must be called before calling other methods.
+    def attach_model(model)
+      return unless @backend == "sqlite"
+
+      model_columns_hash = model.columns_hash
+      casted = false
+      new_chunked_arrays = []
+      new_fields = []
+      @table.columns.zip(@schema.fields) do |column, field|
+        chunked_array = nil
+        model_column = model_columns_hash[field.name]
+        if model_column
+          casted_type = nil
+          case model_column.sql_type_metadata.type
+          when :boolean
+            case field.data_type
+            when Arrow::IntegerDataType
+              casted_type = Arrow::BooleanDataType.new
+            end
+          when :date
+            case field.data_type
+            when Arrow::StringDataType
+              casted_type = Arrow::Date32DataType.new
+            end
+          when :datetime
+            case field.data_type
+            when Arrow::StringDataType
+              casted_type = Arrow::TimestampDataType.new(:nano)
+            end
+          end
+          if casted_type
+            chunked_array = column.cast(casted_type)
+            field = Arrow::Field.new(field.name, casted_type)
+            casted = true
+          end
+        end
+        new_chunked_arrays << (chunked_array || column.data)
+        new_fields << field
+      end
+      return unless casted
+
+      @schema = Arrow::Schema.new(new_fields)
+      @table = Arrow::Table.new(@schema, new_chunked_arrays)
+    end
+
+    def columns
+      @columns ||= fields.collect(&:name)
+    end
+
+    def column_types
+      @column_types ||= fields.inject({}) do |types, field|
+        types[field.name] = resolve_type(field.data_type)
+        types
+      end
+    end
+
+    def includes_column?(name)
+      columns.include?(name)
+    end
+
+    def rows
+      @rows ||= to_arrow.raw_records
+    end
+
+    def length
+      to_arrow.length
+    end
+
+    def empty?
+      length.zero?
+    end
+
+    def each(&block)
+      return to_enum(__method__) unless block_given?
+
+      rows.each do |record|
+        yield(Hash[@columns.zip(record)])
+      end
+    end
+
+    def indexed_rows
+      @indexed_rows ||= to_a
+    end
+
+    def cast_values(type_overrides = {})
+      # TODO: type_overrides support
+      if fields.size == 1
+        rows.map(&:first)
+      else
+        rows
+      end
+    end
+
+    def to_arrow
+      @table
+    end
+
+    def each_record_batch
+      return to_enum(__method__) unless block_given?
+
+      reader = Arrow::TableBatchReader.new(@table)
+      loop do
+        record_batch = reader.read_next
+        break if record_batch.nil?
+        yield(record_batch)
+      end
+    end
+
+    private
+    def fields
+      @fields ||= @schema.fields
+    end
+
+    def resolve_type(data_type)
+      case data_type
+      when Arrow::BooleanDataType
+        ActiveRecord::Type::Boolean.new
+      when Arrow::Int32DataType
+        ActiveRecord::Type::Integer.new(limit: 4)
+      when Arrow::Int64DataType
+        ActiveRecord::Type::Integer.new(limit: 8)
+      when Arrow::FloatDataType
+        ActiveRecord::Type::Float.new(limit: 24)
+      when Arrow::DoubleDataType
+        ActiveRecord::Type::Float.new
+      when Arrow::BinaryDataType
+        ActiveRecord::Type::Binary.new
+      when Arrow::StringDataType
+        ActiveRecord::Type::String.new
+      when Arrow::Date32DataType
+        ActiveRecord::Type::Date.new
+      when Arrow::TimestampDataType
+        ActiveRecord::Type::DateTime.new
+      when Arrow::Time64DataType
+        ActiveRecord::Type::Time.new
+      else
+        raise "Unknown: #{data_type.inspect}"
+      end
+    end
+  end
+end
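For illustration, `Result` can be exercised on its own with an in-memory Arrow table (a hypothetical snippet; in normal operation `cast_result` wraps the table returned by the driver):

```ruby
table = Arrow::Table.new("count" => Arrow::Int64Array.new([1, 2, 3]))
result = ActiveRecordADBCAdapter::Result.new("postgresql", table)
result.columns               # => ["count"]
result.column_types["count"] # an ActiveRecord::Type::Integer with limit: 8
result.rows                  # => [[1], [2], [3]]
```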
data/lib/activerecord_adbc_adapter/schema_creation.rb
ADDED
@@ -0,0 +1,44 @@
+module ActiveRecordADBCAdapter
+  class SchemaCreation < ActiveRecord::ConnectionAdapters::SchemaCreation
+    private def quote_string(s)
+      @conn.quote_string(s)
+    end
+
+    private def sequence_name(column)
+      "sequence_#{column.table.name}_#{column.name}"
+    end
+
+    private def quoted_sequence_name(column)
+      quote_table_name(sequence_name(column))
+    end
+
+    def visit_ColumnDefinition(o)
+      sql = super
+      if o.type == :primary_key and @conn.backend == "duckdb"
+        sql << " DEFAULT NEXTVAL('#{quote_string(sequence_name(o))}')"
+      end
+      sql
+    end
+
+    def visit_TableDefinition(o)
+      o.columns.each do |column|
+        column.singleton_class.define_method(:table) do
+          o
+        end
+      end
+      sql = super
+      if @conn.backend == "duckdb"
+        o.columns.each do |column|
+          if column.type == :primary_key
+            s = +"CREATE SEQUENCE"
+            s << " IF NOT EXISTS" if o.if_not_exists
+            s << " #{quoted_sequence_name(column)}"
+            s << "; #{sql}"
+            sql = s
+          end
+        end
+      end
+      sql
+    end
+  end
+end
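For DuckDB, which has no serial column type, `visit_TableDefinition` above prepends a `CREATE SEQUENCE` statement and `visit_ColumnDefinition` wires it up as the primary key's default. Roughly, and only as an illustration (the sequence name follows the `sequence_<table>_<column>` pattern above; the rest of the column list is elided):

```ruby
# For a table created through the ADBC/DuckDB connection:
create_table :events, if_not_exists: true do |t|
  t.string :name
end
# the generated statement looks along these lines:
#   CREATE SEQUENCE IF NOT EXISTS "sequence_events_id";
#   CREATE TABLE IF NOT EXISTS "events" (
#     "id" bigint PRIMARY KEY DEFAULT NEXTVAL('sequence_events_id'), ...)
```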
data/lib/activerecord_adbc_adapter/schema_statements.rb
ADDED
@@ -0,0 +1,178 @@
+module ActiveRecordADBCAdapter
+  module SchemaStatements
+    NATIVE_DATABASE_TYPES = {
+      "duckdb" => {
+        primary_key: "bigint PRIMARY KEY",
+      },
+      "postgresql" => {
+        primary_key: "bigserial PRIMARY KEY",
+        string: {name: "character varying"},
+        binary: {name: "bytea"},
+        datetime: {name: "timestamp without time zone"},
+      },
+      "sqlite" => {
+        primary_key: "integer PRIMARY KEY AUTOINCREMENT NOT NULL",
+        # INTEGER storage class can store 8 bytes value:
+        # https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes
+        integer: {name: "integer", limit: 8},
+        bigint: {name: "bigint", limit: 8},
+      },
+    }
+
+    def native_database_types
+      NATIVE_DATABASE_TYPES[backend] || super
+    end
+
+    def adbc_catalog
+      case backend
+      when "duckdb"
+        path = @connection_parameters[:path]
+        if path
+          File.basename(path, ".*")
+        else
+          "memory"
+        end
+      else
+        nil
+      end
+    end
+
+    def adbc_db_schema
+      case backend
+      when "duckdb"
+        "main"
+      else
+        nil
+      end
+    end
+
+    def adbc_table_type
+      case backend
+      when "duckdb"
+        "BASE TABLE"
+      else
+        "table"
+      end
+    end
+
+    def adbc_view_type
+      case backend
+      when "duckdb"
+        "VIEW"
+      else
+        "view"
+      end
+    end
+
+    def tables
+      type = adbc_table_type
+      with_raw_connection do |conn|
+        objects = conn.get_objects(depth: :tables,
+                                   catalog: adbc_catalog,
+                                   db_schema: adbc_db_schema,
+                                   table_types: [type])
+        tables = []
+        objects.raw_records.each do |_catalog_name, db_schemas|
+          db_schemas.each do |db_schema|
+            db_schema_tables = db_schema["db_schema_tables"]
+            next if db_schema_tables.nil?
+            db_schema_tables.each do |t|
+              # Some drivers may ignore table_types
+              next unless t["table_type"] == type
+              tables << t["table_name"]
+            end
+          end
+        end
+        tables
+      end
+    end
+
+    def views
+      type = adbc_view_type
+      with_raw_connection do |conn|
+        objects = conn.get_objects(depth: :tables,
+                                   catalog: adbc_catalog,
+                                   db_schema: adbc_db_schema,
+                                   table_types: [type])
+        views = []
+        objects.raw_records.each do |_catalog_name, db_schemas|
+          db_schemas.each do |db_schema|
+            db_schema_tables = db_schema["db_schema_tables"]
+            next if db_schema_tables.nil?
+            db_schema_tables.each do |t|
+              # Some drivers may ignore table_types
+              next unless t["table_type"] == type
+              views << t["table_name"]
+            end
+          end
+        end
+        views
+      end
+    end
+
+    def column_definitions(table_name)
+      with_raw_connection do |conn|
+        objects = conn.get_objects(catalog: adbc_catalog,
+                                   db_schema: adbc_db_schema,
+                                   table_name: table_name)
+        objects.raw_records.each do |_catalog_name, db_schemas|
+          db_schemas.each do |db_schema|
+            db_schema_tables = db_schema["db_schema_tables"]
+            next if db_schema_tables.nil?
+            db_schema_tables.each do |table|
+              return table["table_columns"]
+            end
+          end
+        end
+        [] # raise?
+      end
+    end
+
+    def primary_keys(table_name)
+      with_raw_connection do |conn|
+        objects = conn.get_objects(catalog: adbc_catalog,
+                                   db_schema: adbc_db_schema,
+                                   table_name: table_name)
+        objects.raw_records.each do |_catalog_name, db_schemas|
+          db_schemas.each do |db_schema|
+            db_schema_tables = db_schema["db_schema_tables"]
+            next if db_schema_tables.nil?
+            db_schema_tables.each do |table|
+              constraint = table["table_constraints"].find do |constraint|
+                constraint["constraint_type"] == "PRIMARY KEY"
+              end
+              return [] if constraint.nil?
+              return constraint["constraint_column_names"] || []
+            end
+          end
+        end
+        []
+      end
+    end
+
+    private
+    def create_table_definition(name, **options)
+      TableDefinition.new(self, name, **options)
+    end
+
+    def schema_creation
+      SchemaCreation.new(self)
+    end
+
+    def new_column_from_field(table_name, field, definitions)
+      xdbc_type_name = field["xdbc_type_name"]
+      if xdbc_type_name
+        type_metadata = fetch_type_metadata(xdbc_type_name)
+      else
+        type_metadata = nil
+      end
+      Column.new(field["column_name"],
+                 field["xdbc_column_def"],
+                 type_metadata,
+                 field["xdbc_nullable"] == 1,
+                 nil,
+                 collation: nil,
+                 comment: nil)
+    end
+  end
+end
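These schema methods all walk the nested structure returned by `get_objects` (catalog → DB schema → tables → columns/constraints). A hedged example of what they surface through the ordinary connection API, assuming the `events` table and `AdbcEvent` model from the README:

```ruby
AdbcEvent.with_connection do |connection|
  connection.tables                 # e.g. ["events"]
  connection.primary_keys("events") # e.g. ["id"]
end
```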
metadata
ADDED
@@ -0,0 +1,84 @@
+--- !ruby/object:Gem::Specification
+name: activerecord-adbc-adapter
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Sutou Kouhei
+bindir: bin
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activerecord
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 8.0.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 8.0.0
+- !ruby/object:Gem::Dependency
+  name: red-adbc
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: This gem provides an ADBC adapter for Active Record. This adapter is
+  optimized for extracting and loading large data from/to DBs. The optimization is
+  powered by Apache Arrow.
+email:
+- kou@clear-code.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE.txt
+- README.md
+- lib/activerecord-adbc-adapter.rb
+- lib/activerecord_adbc_adapter/adapter.rb
+- lib/activerecord_adbc_adapter/column.rb
+- lib/activerecord_adbc_adapter/database_statements.rb
+- lib/activerecord_adbc_adapter/ingest.rb
+- lib/activerecord_adbc_adapter/quoting.rb
+- lib/activerecord_adbc_adapter/relation_arrowable.rb
+- lib/activerecord_adbc_adapter/result.rb
+- lib/activerecord_adbc_adapter/schema_creation.rb
+- lib/activerecord_adbc_adapter/schema_definitions.rb
+- lib/activerecord_adbc_adapter/schema_statements.rb
+- lib/activerecord_adbc_adapter/version.rb
+homepage: https://github.com/red-data-tools/activerecord-adbc-adapter
+licenses:
+- MIT
+metadata: {}
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.6.9
+specification_version: 4
+summary: Active Record's ADBC adapter
+test_files: []