activerecord-adbc-adapter 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: c7d09e5234b5077f35cb5ea1b42067bb214e6cd41070e31160aaeab53fca4368
4
+ data.tar.gz: 21eb666112941a0998a6b5482d58a93b5e23dc83d07b5c40703ffba971b37cf3
5
+ SHA512:
6
+ metadata.gz: b3cbdef06e5151fc373008042154a1f7134bbb346dde5322e8124796449e33a7c02b28c05723fa3c39771cf2019cab295d3c2def17a70b0cb1539bde5e702c39
7
+ data.tar.gz: f749b486c3da75fffd5f60060b9bcc5558eb03fab97c7fbdd9b94178fec31af2c398e0f03102c83ba9d7c5d8ca782363099ae88b8d2592bd049f4367ca18907d
data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright 2023 Sutou Kouhei <kou@clear-code.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,223 @@
1
+ # Active Record ADBC adapter
2
+
3
+ ## Description
4
+
5
+ Active Record ADBC adapter provides an
6
+ [ADBC](https://arrow.apache.org/adbc/) adapter for Active Record.
7
+
8
+ This adapter is optimized for extracting and loading large data
9
+ from/to DBs. The optimization is powered by Apache Arrow.
10
+
11
+ ## Install
12
+
13
+ ### `Gemfile`
14
+
15
+ Add
16
+ [rubygems-requirements-system](https://rubygems.org/gems/rubygems-requirements-system)
17
+ plugin and activerecord-adbc-adapter gem to your `Gemfile`:
18
+
19
+ ```ruby
20
+ plugin "rubygems-requirements-system"
21
+
22
+ gem "activerecord-adbc-adapter"
23
+ ```
24
+
25
+ This adapter requires ADBC libraries implemented in C++. The
26
+ rubygems-requirements-system plugin installs them into your system
27
+ automatically.
28
+
29
+ ### GitHub Actions
30
+
31
+ On GitHub Actions, you should not use the [`bundler-cache:
32
+ true`](https://github.com/ruby/setup-ruby?tab=readme-ov-file#caching-bundle-install-automatically)
33
+ option, because it only caches the activerecord-adbc-adapter gem. It
34
+ doesn't cache ADBC libraries installed into your system. You need to
35
+ run `bundle install` after `ruby/setup-ruby`, something like the
36
+ following:
37
+
38
+ ```yaml
39
+ - uses: ruby/setup-ruby@v1
40
+ with:
41
+ ruby-version: ruby
42
+ # bundler-cache: true # No "bundler-cache: true"
43
+ - run: bundle install
44
+ ```
45
+
46
+ ### ADBC driver
47
+
48
+ You also need to install ADBC drivers for systems you want to
49
+ connect to. For example, you need the ADBC driver for PostgreSQL when you
50
+ want to connect to PostgreSQL.
51
+
52
+ See https://arrow.apache.org/adbc/current/driver/installation.html for
53
+ the latest information.
54
+
55
+ Here is information for some drivers:
56
+
57
+ PostgreSQL:
58
+
59
+ ```bash
60
+ # Debian/Ubuntu
61
+ sudo apt install libadbc-driver-postgresql-dev
62
+ ```
63
+
64
+ ```bash
65
+ # Red Hat Enterprise Linux variants
66
+ sudo dnf install adbc-driver-postgresql-devel
67
+ ```
68
+
69
+ SQLite:
70
+
71
+ ```bash
72
+ # Debian/Ubuntu
73
+ sudo apt install libadbc-driver-sqlite-dev
74
+ ```
75
+
76
+ ```bash
77
+ # Red Hat Enterprise Linux variants
78
+ sudo dnf install adbc-driver-sqlite-devel
79
+ ```
80
+
81
+ DuckDB: You don't need to install additional packages for ADBC
82
+ support. `libduckdb.so` includes ADBC support.
83
+
84
+ ## Usage
85
+
86
+ This adapter is optimized for extracting and loading large data
87
+ from/to DBs. You should use the built-in adapters for normal CRUD use
88
+ cases. You should use this adapter only for large data.
89
+
90
+ You can use Active Record's [multiple databases
91
+ support](https://guides.rubyonrails.org/active_record_multiple_databases.html)
92
+ feature to use this adapter alongside a built-in adapter.
93
+
94
+ ### `config/database.yml`
95
+
96
+ Here is a sample `config/database.yml`. The `primary` configuration is
97
+ for the built-in PostgreSQL adapter. The `adbc` configuration is for
98
+ this adapter.
99
+
100
+ ```yaml
101
+ default: &default
102
+ primary:
103
+ adapter: postgresql
104
+ encoding: unicode
105
+ pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
106
+ url: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
107
+ adbc:
108
+ adapter: adbc
109
+ driver: adbc_driver_postgresql
110
+ # "uri" not "url"!!!
111
+ uri: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
112
+ # You need this to avoid migration.
113
+ database_tasks: false
114
+ development:
115
+ <<: *default
116
+ test:
117
+ <<: *default
118
+ production:
119
+ <<: *default
120
+ ```
121
+
122
+ > [!NOTE]
123
+ >
124
+ > Configuration parameters are different for each ADBC driver. For
125
+ > example, [the PostgreSQL
126
+ > driver](https://arrow.apache.org/adbc/current/driver/postgresql.html)
127
+ > and [the SQLite
128
+ > driver](https://arrow.apache.org/adbc/current/driver/sqlite.html) use
129
+ > `uri` for connection information but [the DuckDB
130
+ > driver](https://duckdb.org/docs/stable/clients/adbc.html) uses
131
+ > `entrypoint` and `path`:
132
+ >
133
+ > ```yaml
134
+ > adbc:
135
+ > adapter: adbc
136
+ > driver: duckdb
137
+ > entrypoint: duckdb_adbc_init # You should not change this.
138
+ > path: <%= ENV.fetch("DATABASE_PATH") { Rails.root.join("db", "#{Rails.env}.duckdb") } %>
139
+ > # You need this to avoid migration.
140
+ > database_tasks: false
141
+ > ```
142
+ >
143
+ > See the driver's documentation for available parameters.
144
+
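The configuration keys above are not all forwarded to the driver. As a minimal sketch of how the hash appears to be reduced before `ADBC::Database.open` is called (based on `Adapter#initialize`, `Connection#initialize` and `Adapter#backend` in `adapter.rb` in this diff), the following plain Ruby mirrors that filtering. The helper names `adbc_open_params` and `backend_name` and the two sample hashes are hypothetical and exist only for illustration.

```ruby
# Hypothetical helpers (not part of the gem) that mirror the hash handling
# seen in adapter.rb.

# Returns the options that would be left over for ADBC::Database.open.
def adbc_open_params(config)
  params = config.compact         # drop keys whose value is nil
  params.delete(:adapter)         # "adbc" only selects the Active Record adapter
  params.delete(:database_tasks)  # Rails-only switch, not a driver option
  params
end

# Derives the backend name used to pick per-database behavior.
def backend_name(config)
  config[:driver].gsub(/\Aadbc_driver_/, "")
end

# Sample values only; real values come from config/database.yml.
POSTGRESQL_CONFIG = {
  adapter: "adbc",
  driver: "adbc_driver_postgresql",
  uri: "postgresql:///my_app_development",
  database_tasks: false,
}

DUCKDB_CONFIG = {
  adapter: "adbc",
  driver: "duckdb",
  entrypoint: "duckdb_adbc_init",
  path: "db/development.duckdb",
  database_tasks: false,
}

p adbc_open_params(POSTGRESQL_CONFIG)
p backend_name(POSTGRESQL_CONFIG)
p backend_name(DUCKDB_CONFIG)
```

Because every remaining key is passed through, a misspelled key such as `url` instead of `uri` would reach the driver unchanged, which is presumably why the sample configuration calls it out.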
145
+ ### Model
146
+
147
+ You need to create an abstract class for the ADBC connection:
148
+
149
+ ```ruby
150
+ # app/models/adbc_application_record.rb
151
+ class AdbcApplicationRecord < ActiveRecord::Base
152
+ include ActiveRecordADBCAdapter::Ingest
153
+
154
+ self.abstract_class = true
155
+ connects_to database: { writing: :adbc, reading: :adbc }
156
+ end
157
+ ```
158
+
159
+ You can create a model for the ADBC connection:
160
+
161
+ ```ruby
162
+ # app/models/adbc_event.rb
163
+ class AdbcEvent < AdbcApplicationRecord
164
+ self.table_name = :events
165
+ end
166
+ ```
167
+
168
+ You can use the `Event` model (derived from `ApplicationRecord`, not
169
+ `AdbcApplicationRecord`) for normal use cases and `AdbcEvent` for
170
+ large data use cases. For the large data use case, you should use
171
+ Apache Arrow data as-is as much as possible for performance. You can
172
+ get Apache Arrow data by `#to_arrow`:
173
+
174
+ ```ruby
175
+ AdbcEvent.all.to_arrow
176
+ ```
177
+
178
+ You can use `.ingest` to load Apache Arrow data:
179
+
180
+ ```ruby
181
+ AdbcEvent.ingest(arrow_data)
182
+ ```
183
+
184
+ You can copy large data efficiently with these methods. Here is sample
185
+ code to copy events in the latest 30 days to DuckDB from PostgreSQL:
186
+
187
+ ```ruby
188
+ DuckDBAdbcEvent.ingest(PgAdbcEvent.where(created_at: 30.days.ago..).to_arrow)
189
+ ```
190
+
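When filtering by time, the direction of the range decides which side of the cutoff is selected. The plain Ruby sketch below (no Active Record involved) shows the difference between an endless range that starts at the cutoff and a beginless range that ends at it. The helper names and the fixed timestamps are hypothetical. In Active Record, a range passed to `where` is expected to translate to the matching `>=` or `<=` comparison.

```ruby
DAY = 24 * 60 * 60 # seconds

# True when created_at falls in the window that starts at the cutoff.
def in_latest_window?(created_at, cutoff)
  (cutoff..).cover?(created_at)   # endless range: cutoff or later
end

# True when created_at is at or before the cutoff.
def older_than_window?(created_at, cutoff)
  (..cutoff).cover?(created_at)   # beginless range: cutoff or earlier
end

NOW = Time.utc(2025, 1, 31)
CUTOFF = NOW - 30 * DAY

RECENT_EVENT = NOW - 1 * DAY
OLD_EVENT = NOW - 60 * DAY

p in_latest_window?(RECENT_EVENT, CUTOFF)
p in_latest_window?(OLD_EVENT, CUTOFF)
p older_than_window?(OLD_EVENT, CUTOFF)
```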
191
+ ## Performance
192
+
193
+ Here is a benchmark result with 100 integer columns and 10000 rows
194
+ load/dump with PostgreSQL. This adapter is faster than Active Record's
195
+ built-in PostgreSQL adapter and raw SQL (`INSERT INTO ...` and
196
+ `SELECT * FROM ...`).
197
+
198
+ Load:
199
+
200
+ ```text
201
+ user system total real
202
+ SQL 0.003738 0.003141 0.006879 ( 0.350767)
203
+ Active Record 2.423670 0.047003 2.470673 ( 2.844029)
204
+ ADBC 0.010229 0.010960 0.021189 ( 0.075787)
205
+ ```
206
+
207
+ Dump:
208
+
209
+ ```text
210
+ user system total real
211
+ SQL 0.110934 0.006219 0.117153 ( 0.123201)
212
+ Active Record 0.271154 0.008948 0.280102 ( 0.283621)
213
+ ADBC 0.026966 0.006945 0.033911 ( 0.067801)
214
+ ```
215
+
216
+ ![Load and dump](benchmark/load-dump.svg)
217
+
218
+
219
+ See [`benchmark/`](benchmark/) for details.
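To read the two tables as relative speedups, the wall-clock ("real") column can be divided by the ADBC row. The sketch below does only that arithmetic on the numbers printed above. The `speedups` helper is hypothetical, and the ratios describe this one benchmark run, not a general guarantee.

```ruby
# Wall-clock seconds copied from the "real" column of the tables above.
LOAD_REAL = {
  "SQL" => 0.350767,
  "Active Record" => 2.844029,
  "ADBC" => 0.075787,
}

DUMP_REAL = {
  "SQL" => 0.123201,
  "Active Record" => 0.283621,
  "ADBC" => 0.067801,
}

# Hypothetical helper: how many times slower each row is than the baseline.
def speedups(real_times, baseline: "ADBC")
  base = real_times.fetch(baseline)
  real_times.transform_values { |seconds| (seconds / base).round(1) }
end

p speedups(LOAD_REAL)
p speedups(DUMP_REAL)
```

In this run, loading through ADBC comes out roughly 37 times faster than the built-in adapter, and dumping roughly 4 times faster.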
220
+
221
+ ## License
222
+
223
+ The MIT license. See `LICENSE.txt` for details.
data/lib/activerecord-adbc-adapter.rb ADDED
@@ -0,0 +1,10 @@
1
+ require "active_record"
2
+
3
+ require_relative "activerecord_adbc_adapter/ingest"
4
+ require_relative "activerecord_adbc_adapter/version"
5
+
6
+ module ActiveRecord::ConnectionAdapters
7
+ register("adbc",
8
+ "ActiveRecordADBCAdapter::Adapter",
9
+ "activerecord_adbc_adapter/adapter")
10
+ end
data/lib/activerecord_adbc_adapter/adapter.rb ADDED
@@ -0,0 +1,209 @@
1
+ require "adbc"
2
+
3
+ require_relative "column"
4
+ require_relative "database_statements"
5
+ require_relative "quoting"
6
+ require_relative "relation_arrowable"
7
+ require_relative "result"
8
+ require_relative "schema_creation"
9
+ require_relative "schema_definitions"
10
+ require_relative "schema_statements"
11
+
12
+ module ActiveRecordADBCAdapter
13
+ # = Active Record ADBC Adapter
14
+ #
15
+ # ...
16
+ #
17
+ # Options:
18
+ #
19
+ # * ...
20
+ class Adapter < ActiveRecord::ConnectionAdapters::AbstractAdapter
21
+ ADAPTER_NAME = "ADBC"
22
+
23
+ class Connection
24
+ def initialize(**params)
25
+ params.delete(:database_tasks)
26
+ @database = ADBC::Database.open(**params)
27
+ @connection = @database.connect
28
+ end
29
+
30
+ def close
31
+ if @connection
32
+ @connection.release
33
+ @connection = nil
34
+ end
35
+ @database.release
36
+ @database = nil
37
+ end
38
+
39
+ def reconnect
40
+ if @connection
41
+ @connection.release
42
+ @connection = @database.connect
43
+ end
44
+ end
45
+
46
+ def open_statement(&block)
47
+ @connection.open_statement(&block)
48
+ end
49
+
50
+ if ADBC::Connection.method_defined?(:get_objects_raw)
51
+ # red-adbc provides convenient wrapper
52
+ def get_objects(...)
53
+ @connection.get_objects(...)
54
+ end
55
+ else
56
+ def get_objects(depth: :all,
57
+ catalog: nil,
58
+ db_schema: nil,
59
+ table_name: nil,
60
+ table_types: nil,
61
+ column_name: nil)
62
+ reader = @connection.get_objects(depth,
63
+ catalog,
64
+ db_schema,
65
+ table_name,
66
+ table_types,
67
+ column_name)
68
+ begin
69
+ reader.read_all
70
+ ensure
71
+ reader.unref
72
+ end
73
+ end
74
+ end
75
+ end
76
+
77
+ class << self
78
+ def new_client(params)
79
+ Connection.new(**params)
80
+ end
81
+ end
82
+
83
+ include DatabaseStatements
84
+ include Quoting
85
+ include SchemaStatements
86
+
87
+ FEATURES = [
88
+ :supports_insert_on_duplicate_skip,
89
+ ]
90
+
91
+ def initialize(...)
92
+ super
93
+
94
+ @connection_parameters = @config.compact
95
+ @connection_parameters.delete(:adapter)
96
+
97
+ @raw_connection = nil
98
+
99
+ @features = {}
100
+ end
101
+
102
+ FEATURES.each do |feature|
103
+ define_method("#{feature}?") do
104
+ @features[feature]
105
+ end
106
+ end
107
+
108
+ def connect
109
+ @raw_connection = self.class.new_client(@connection_parameters)
110
+ detect_features
111
+ @raw_connection
112
+ end
113
+
114
+ def reconnect
115
+ @lock.synchronize do
116
+ @raw_connection&.reconnect
117
+ end
118
+
119
+ connect unless @raw_connection
120
+ end
121
+
122
+ def active?
123
+ @lock.synchronize do
124
+ return false unless @raw_connection
125
+ end
126
+ true
127
+ end
128
+
129
+ def disconnect!
130
+ @lock.synchronize do
131
+ super
132
+ @raw_connection&.close rescue nil
133
+ @raw_connection = nil
134
+ end
135
+ end
136
+
137
+ # Borrowed from
138
+ # ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#build_insert_sql.
139
+ #
140
+ # Copyright (c) David Heinemeier Hansson
141
+ #
142
+ # The MIT license.
143
+ def build_insert_sql(insert)
144
+ sql = +"INSERT #{insert.into} #{insert.values_list}"
145
+
146
+ if insert.skip_duplicates?
147
+ sql << " ON CONFLICT #{insert.conflict_target} DO NOTHING"
148
+ elsif insert.update_duplicates?
149
+ sql << " ON CONFLICT #{insert.conflict_target} DO UPDATE SET "
150
+ if insert.raw_update_sql?
151
+ sql << insert.raw_update_sql
152
+ else
153
+ sql << insert.touch_model_timestamps_unless { |column| "#{insert.model.quoted_table_name}.#{column} IS NOT DISTINCT FROM excluded.#{column}" }
154
+ sql << insert.updatable_columns.map { |column| "#{column}=excluded.#{column}" }.join(",")
155
+ end
156
+ end
157
+
158
+ sql << " RETURNING #{insert.returning}" if insert.returning
159
+ sql
160
+ end
161
+
162
+ def ingest(table_name, attributes, name: nil)
163
+ log(table_name, name) do |notification_payload|
164
+ with_raw_connection do |raw_connection|
165
+ raw_connection.open_statement do |statement|
166
+ statement.ingest(table_name, attributes, mode: :append)
167
+ end
168
+ end
169
+ end
170
+ end
171
+
172
+ def backend
173
+ # We can't use @connection_parameters here because #arel_visitor
174
+ # is called before Adapter#initialize. (#arel_visitor is called
175
+ # in AbstractAdapter#initialize.)
176
+ @config[:driver].gsub(/\Aadbc_driver_/, "")
177
+ end
178
+
179
+ def arel_visitor
180
+ case backend
181
+ when "postgresql"
182
+ Arel::Visitors::PostgreSQL.new(self)
183
+ else
184
+ super
185
+ end
186
+ end
187
+
188
+ private
189
+ def detect_features
190
+ detect_features_method = "detect_features_#{backend}"
191
+ if respond_to?(detect_features_method, true)
192
+ __send__(detect_features_method)
193
+ end
194
+ end
195
+
196
+ def detect_features_duckdb
197
+ @features[:supports_insert_on_duplicate_skip] = true
198
+ end
199
+
200
+ def detect_features_postgresql
201
+ @features[:supports_insert_on_duplicate_skip] = true
202
+ end
203
+
204
+ def detect_features_sqlite
205
+ @features[:supports_insert_on_duplicate_skip] = true
206
+ end
207
+ end
208
+ ActiveSupport.run_load_hooks(:active_record_adbcadapter, Adapter)
209
+ end
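Review note: the feature detection in `adapter.rb` combines two small metaprogramming steps: predicate methods generated from `FEATURES`, and a backend-specific hook looked up by name with `respond_to?` and `__send__`. The following standalone sketch reproduces that pattern outside Active Record. The class name `FeatureProbe` is hypothetical and it only models the DuckDB hook.

```ruby
# Hypothetical stand-in for the adapter, reduced to the feature logic.
class FeatureProbe
  FEATURES = [
    :supports_insert_on_duplicate_skip,
  ]

  def initialize(backend)
    @backend = backend
    @features = {}
    detect_features
  end

  # Generates one predicate method per feature, e.g.
  # #supports_insert_on_duplicate_skip?
  FEATURES.each do |feature|
    define_method("#{feature}?") do
      @features[feature]
    end
  end

  private

  # Calls detect_features_<backend> if such a private method exists.
  def detect_features
    detect_features_method = "detect_features_#{@backend}"
    if respond_to?(detect_features_method, true)
      __send__(detect_features_method)
    end
  end

  def detect_features_duckdb
    @features[:supports_insert_on_duplicate_skip] = true
  end
end

p FeatureProbe.new("duckdb").supports_insert_on_duplicate_skip?
p FeatureProbe.new("unknown").supports_insert_on_duplicate_skip?
```

A backend without a matching hook leaves the feature unset, so the predicate returns `nil`, which Ruby treats as false.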
data/lib/activerecord_adbc_adapter/column.rb ADDED
@@ -0,0 +1,4 @@
1
+ module ActiveRecordADBCAdapter
2
+ class Column < ActiveRecord::ConnectionAdapters::Column # :nodoc:
3
+ end
4
+ end
data/lib/activerecord_adbc_adapter/database_statements.rb ADDED
@@ -0,0 +1,68 @@
1
+ module ActiveRecordADBCAdapter
2
+ module DatabaseStatements
3
+ def perform_query(raw_connection,
4
+ sql,
5
+ binds,
6
+ type_casted_binds,
7
+ prepare:,
8
+ notification_payload:,
9
+ batch:)
10
+ raw_connection.open_statement do |statement|
11
+ statement.sql_query = sql
12
+ if binds.empty?
13
+ statement.execute[0]
14
+ else
15
+ statement.prepare
16
+ raw_records = {}
17
+ binds.zip(type_casted_binds) do |bind, type_casted_bind|
18
+ case type_casted_bind
19
+ when String
20
+ if type_casted_bind.encoding == Encoding::ASCII_8BIT
21
+ array = Arrow::BinaryArray.new([type_casted_bind])
22
+ else
23
+ array = Arrow::StringArray.new([type_casted_bind])
24
+ end
25
+ when DateTime
26
+ array = Arrow::TimestampArray.new(:micro,
27
+ [type_casted_bind.dup.localtime])
28
+ when Date
29
+ array = Arrow::Date32Array.new([type_casted_bind])
30
+ when ActiveRecord::Type::Time::Value
31
+ local_time = type_casted_bind.dup.localtime
32
+ time_value = (local_time.seconds_since_midnight * 1_000_000).to_i
33
+ array = Arrow::Time64Array.new(:micro, [time_value])
34
+ else
35
+ array = [type_casted_bind]
36
+ end
37
+ raw_records[bind.name] = array
38
+ end
39
+ record_batch = Arrow::RecordBatch.new(raw_records)
40
+ statement.bind(record_batch) do
41
+ statement.execute[0]
42
+ end
43
+ end
44
+ end
45
+ end
46
+
47
+ def cast_result(arrow_table)
48
+ Result.new(backend, arrow_table)
49
+ end
50
+
51
+ # Borrowed from
52
+ # ActiveRecord::ConnectionAdapters::PostgreSQL::DatabaseStatements.
53
+ #
54
+ # Copyright (c) David Heinemeier Hansson
55
+ #
56
+ # The MIT license.
57
+ READ_QUERY =
58
+ ActiveRecord::ConnectionAdapters::AbstractAdapter.build_read_query_regexp(
59
+ :close, :declare, :fetch, :move, :set, :show
60
+ ) #:nodoc:
61
+ private_constant :READ_QUERY
62
+ def write_query?(sql) # :nodoc:
63
+ !READ_QUERY.match?(sql)
64
+ rescue ArgumentError # Invalid encoding
65
+ !READ_QUERY.match?(sql.b)
66
+ end
67
+ end
68
+ end
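Review note: `perform_query` picks an Arrow array type per bind value with a `case` expression, and the branch order matters because `DateTime` is a subclass of `Date`. The sketch below shows the same dispatch without requiring red-arrow. It returns symbols instead of building Arrow arrays, and the method name `arrow_array_kind` and the symbols are hypothetical labels.

```ruby
require "date"

# Hypothetical classifier: returns a label for the Arrow array the real
# code would build for this bind value.
def arrow_array_kind(value)
  case value
  when String
    # Binary strings (ASCII-8BIT) are kept as bytes, others as text.
    value.encoding == Encoding::ASCII_8BIT ? :binary : :string
  when DateTime
    # Must come before Date: DateTime is a subclass of Date.
    :timestamp_micro
  when Date
    :date32
  else
    # The real code wraps the value in a plain Ruby array here and leaves
    # the type to Arrow.
    :inferred
  end
end

p arrow_array_kind("text")
p arrow_array_kind("text".b)
p arrow_array_kind(DateTime.new(2025, 1, 2, 3, 4, 5))
p arrow_array_kind(Date.new(2025, 1, 2))
p arrow_array_kind(42)
```

If the `Date` branch came first, a `DateTime` bind would lose its time of day.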
data/lib/activerecord_adbc_adapter/ingest.rb ADDED
@@ -0,0 +1,13 @@
1
+ module ActiveRecordADBCAdapter
2
+ module Ingest
3
+ extend ActiveSupport::Concern
4
+
5
+ module ClassMethods
6
+ def ingest(attributes)
7
+ self.with_connection do |connection|
8
+ connection.ingest(table_name, attributes, name: "#{self} Ingest")
9
+ end
10
+ end
11
+ end
12
+ end
13
+ end
data/lib/activerecord_adbc_adapter/quoting.rb ADDED
@@ -0,0 +1,19 @@
1
+ module ActiveRecordADBCAdapter
2
+ module Quoting
3
+ extend ActiveSupport::Concern
4
+
5
+ module ClassMethods
6
+ def quote_column_name(column_name)
7
+ "\"#{column_name.gsub("\"", "\"\"")}\""
8
+ end
9
+ end
10
+
11
+ def quoted_date(value)
12
+ value
13
+ end
14
+
15
+ def quoted_time(value)
16
+ value
17
+ end
18
+ end
19
+ end
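Review note: `quote_column_name` uses standard SQL identifier quoting: wrap the name in double quotes and double any embedded double quote. A standalone version makes the escaping easy to check. The `to_s` call is an addition in this sketch (an assumption, so that symbols also work) and is not in the gem's code.

```ruby
# Standalone copy of the quoting rule, with to_s added for symbols.
def quote_column_name(column_name)
  "\"#{column_name.to_s.gsub("\"", "\"\"")}\""
end

puts quote_column_name("name")
puts quote_column_name(:created_at)
puts quote_column_name('odd"name')
```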
data/lib/activerecord_adbc_adapter/relation_arrowable.rb ADDED
@@ -0,0 +1,18 @@
1
+ module ActiveRecordADBCAdapter
2
+ module RelationArrowable
3
+ def to_arrow
4
+ result = exec_main_query
5
+ result.attach_model(model)
6
+ result.to_arrow
7
+ end
8
+
9
+ def each_record_batch(&block)
10
+ result = exec_main_query
11
+ result.attach_model(model)
12
+ result.each_record_batch(&block)
13
+ end
14
+ end
15
+
16
+ ActiveRecord::Relation.include(RelationArrowable)
17
+ ActiveRecord::Querying.delegate(:to_arrow, :each_record_batch, to: :all)
18
+ end
data/lib/activerecord_adbc_adapter/result.rb ADDED
@@ -0,0 +1,151 @@
1
+ module ActiveRecordADBCAdapter
2
+ class Result
3
+ include Enumerable
4
+
5
+ def initialize(backend, table)
6
+ @backend = backend
7
+ @table = table
8
+ @schema = @table.schema
9
+ end
10
+
11
+ # This must be called before calling other methods.
12
+ def attach_model(model)
13
+ return unless @backend == "sqlite"
14
+
15
+ model_columns_hash = model.columns_hash
16
+ casted = false
17
+ new_chunked_arrays = []
18
+ new_fields = []
19
+ @table.columns.zip(@schema.fields) do |column, field|
20
+ chunked_array = nil
21
+ model_column = model_columns_hash[field.name]
22
+ if model_column
23
+ casted_type = nil
24
+ case model_column.sql_type_metadata.type
25
+ when :boolean
26
+ case field.data_type
27
+ when Arrow::IntegerDataType
28
+ casted_type = Arrow::BooleanDataType.new
29
+ end
30
+ when :date
31
+ case field.data_type
32
+ when Arrow::StringDataType
33
+ casted_type = Arrow::Date32DataType.new
34
+ end
35
+ when :datetime
36
+ case field.data_type
37
+ when Arrow::StringDataType
38
+ casted_type = Arrow::TimestampDataType.new(:nano)
39
+ end
40
+ end
41
+ if casted_type
42
+ chunked_array = column.cast(casted_type)
43
+ field = Arrow::Field.new(field.name, casted_type)
44
+ casted = true
45
+ end
46
+ end
47
+ new_chunked_arrays << (chunked_array || column.data)
48
+ new_fields << field
49
+ end
50
+ return unless casted
51
+
52
+ @schema = Arrow::Schema.new(new_fields)
53
+ @table = Arrow::Table.new(@schema, new_chunked_arrays)
54
+ end
55
+
56
+ def columns
57
+ @columns ||= fields.collect(&:name)
58
+ end
59
+
60
+ def column_types
61
+ @column_types ||= fields.inject({}) do |types, field|
62
+ types[field.name] = resolve_type(field.data_type)
63
+ types
64
+ end
65
+ end
66
+
67
+ def includes_column?(name)
68
+ columns.include?(name)
69
+ end
70
+
71
+ def rows
72
+ @rows ||= to_arrow.raw_records
73
+ end
74
+
75
+ def length
76
+ to_arrow.length
77
+ end
78
+
79
+ def empty?
80
+ length.zero?
81
+ end
82
+
83
+ def each(&block)
84
+ return to_enum(__method__) unless block_given?
85
+
86
+ rows.each do |record|
87
+ yield(Hash[columns.zip(record)])
88
+ end
89
+ end
90
+
91
+ def indexed_rows
92
+ @indexed_rows ||= to_a
93
+ end
94
+
95
+ def cast_values(type_overrides = {})
96
+ # TODO: type_overrides support
97
+ if fields.size == 1
98
+ rows.map(&:first)
99
+ else
100
+ rows
101
+ end
102
+ end
103
+
104
+ def to_arrow
105
+ @table
106
+ end
107
+
108
+ def each_record_batch
109
+ return to_enum(__method__) unless block_given?
110
+
111
+ reader = Arrow::TableBatchReader.new(@table)
112
+ loop do
113
+ record_batch = reader.read_next
114
+ break if record_batch.nil?
115
+ yield(record_batch)
116
+ end
117
+ end
118
+
119
+ private
120
+ def fields
121
+ @fields ||= @schema.fields
122
+ end
123
+
124
+ def resolve_type(data_type)
125
+ case data_type
126
+ when Arrow::BooleanDataType
127
+ ActiveRecord::Type::Boolean.new
128
+ when Arrow::Int32DataType
129
+ ActiveRecord::Type::Integer.new(limit: 4)
130
+ when Arrow::Int64DataType
131
+ ActiveRecord::Type::Integer.new(limit: 8)
132
+ when Arrow::FloatDataType
133
+ ActiveRecord::Type::Float.new(limit: 24)
134
+ when Arrow::DoubleDataType
135
+ ActiveRecord::Type::Float.new
136
+ when Arrow::BinaryDataType
137
+ ActiveRecord::Type::Binary.new
138
+ when Arrow::StringDataType
139
+ ActiveRecord::Type::String.new
140
+ when Arrow::Date32DataType
141
+ ActiveRecord::Type::Date.new
142
+ when Arrow::TimestampDataType
143
+ ActiveRecord::Type::DateTime.new
144
+ when Arrow::Time64DataType
145
+ ActiveRecord::Type::Time.new
146
+ else
147
+ raise "Unknown: #{data_type.inspect}"
148
+ end
149
+ end
150
+ end
151
+ end
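Review note: `Result` exposes rows both as arrays (`rows`, `cast_values`) and as hashes keyed by column name (`each`). The sketch below models that with plain arrays instead of an Arrow table. `TinyResult` is a hypothetical class, and it reads column names through the `columns` accessor so that iteration does not depend on an instance variable having been memoized earlier.

```ruby
# Hypothetical, Arrow-free model of the row handling in Result.
class TinyResult
  include Enumerable

  attr_reader :columns, :rows

  def initialize(columns, rows)
    @columns = columns
    @rows = rows
  end

  # Yields one Hash per row, keyed by column name.
  def each
    return to_enum(__method__) unless block_given?

    rows.each do |record|
      yield(columns.zip(record).to_h)
    end
  end

  # Single-column results are flattened to a list of values.
  def cast_values
    columns.size == 1 ? rows.map(&:first) : rows
  end
end

events = TinyResult.new(["id", "name"], [[1, "a"], [2, "b"]])
p events.to_a
p TinyResult.new(["count"], [[3]]).cast_values
```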
data/lib/activerecord_adbc_adapter/schema_creation.rb ADDED
@@ -0,0 +1,44 @@
1
+ module ActiveRecordADBCAdapter
2
+ class SchemaCreation < ActiveRecord::ConnectionAdapters::SchemaCreation
3
+ private def quote_string(s)
4
+ @conn.quote_string(s)
5
+ end
6
+
7
+ private def sequence_name(column)
8
+ "sequence_#{column.table.name}_#{column.name}"
9
+ end
10
+
11
+ private def quoted_sequence_name(column)
12
+ quote_table_name(sequence_name(column))
13
+ end
14
+
15
+ def visit_ColumnDefinition(o)
16
+ sql = super
17
+ if o.type == :primary_key and @conn.backend == "duckdb"
18
+ sql << " DEFAULT NEXTVAL('#{quote_string(sequence_name(o))}')"
19
+ end
20
+ sql
21
+ end
22
+
23
+ def visit_TableDefinition(o)
24
+ o.columns.each do |column|
25
+ column.singleton_class.define_method(:table) do
26
+ o
27
+ end
28
+ end
29
+ sql = super
30
+ if @conn.backend == "duckdb"
31
+ o.columns.each do |column|
32
+ if column.type == :primary_key
33
+ s = +"CREATE SEQUENCE"
34
+ s << " IF NOT EXISTS" if o.if_not_exists
35
+ s << " #{quoted_sequence_name(column)}"
36
+ s << "; #{sql}"
37
+ sql = s
38
+ end
39
+ end
40
+ end
41
+ sql
42
+ end
43
+ end
44
+ end
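Review note: for DuckDB, `SchemaCreation` emulates an auto-incrementing primary key by prepending a `CREATE SEQUENCE` statement to the `CREATE TABLE` SQL and giving the key column a `NEXTVAL` default. The string assembly can be sketched in isolation as follows. The helper names are hypothetical, and the double-quoted sequence name assumes that `quote_table_name` falls back to the double-quote rule from `Quoting`.

```ruby
# Same naming scheme as SchemaCreation#sequence_name.
def sequence_name(table, column)
  "sequence_#{table}_#{column}"
end

# Hypothetical helper: the default clause appended to the key column.
def primary_key_default(table, column)
  " DEFAULT NEXTVAL('#{sequence_name(table, column)}')"
end

# Hypothetical helper: puts CREATE SEQUENCE in front of the table SQL.
def prepend_create_sequence(create_table_sql, table:, column:, if_not_exists: false)
  sql = +"CREATE SEQUENCE"
  sql << " IF NOT EXISTS" if if_not_exists
  sql << " \"#{sequence_name(table, column)}\""
  sql << "; #{create_table_sql}"
  sql
end

column_sql = "\"id\" bigint PRIMARY KEY" + primary_key_default("events", "id")
table_sql = "CREATE TABLE \"events\" (#{column_sql})"
puts prepend_create_sequence(table_sql, table: "events", column: "id")
```

The result is two statements separated by `;`, so it relies on the driver accepting multiple statements in one query string.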
data/lib/activerecord_adbc_adapter/schema_definitions.rb ADDED
@@ -0,0 +1,8 @@
1
+ module ActiveRecordADBCAdapter
2
+ class TableDefinition < ActiveRecord::ConnectionAdapters::TableDefinition
3
+ private
4
+ def aliased_types(name, fallback)
5
+ fallback
6
+ end
7
+ end
8
+ end
data/lib/activerecord_adbc_adapter/schema_statements.rb ADDED
@@ -0,0 +1,178 @@
1
+ module ActiveRecordADBCAdapter
2
+ module SchemaStatements
3
+ NATIVE_DATABASE_TYPES = {
4
+ "duckdb" => {
5
+ primary_key: "bigint PRIMARY KEY",
6
+ },
7
+ "postgresql" => {
8
+ primary_key: "bigserial PRIMARY KEY",
9
+ string: {name: "character varying"},
10
+ binary: {name: "bytea"},
11
+ datetime: {name: "timestamp without time zone"},
12
+ },
13
+ "sqlite" => {
14
+ primary_key: "integer PRIMARY KEY AUTOINCREMENT NOT NULL",
15
+ # INTEGER storage class can store 8 bytes value:
16
+ # https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes
17
+ integer: {name: "integer", limit: 8},
18
+ bigint: {name: "bigint", limit: 8},
19
+ },
20
+ }
21
+
22
+ def native_database_types
23
+ NATIVE_DATABASE_TYPES[backend] || super
24
+ end
25
+
26
+ def adbc_catalog
27
+ case backend
28
+ when "duckdb"
29
+ path = @connection_parameters[:path]
30
+ if path
31
+ File.basename(path, ".*")
32
+ else
33
+ "memory"
34
+ end
35
+ else
36
+ nil
37
+ end
38
+ end
39
+
40
+ def adbc_db_schema
41
+ case backend
42
+ when "duckdb"
43
+ "main"
44
+ else
45
+ nil
46
+ end
47
+ end
48
+
49
+ def adbc_table_type
50
+ case backend
51
+ when "duckdb"
52
+ "BASE TABLE"
53
+ else
54
+ "table"
55
+ end
56
+ end
57
+
58
+ def adbc_view_type
59
+ case backend
60
+ when "duckdb"
61
+ "VIEW"
62
+ else
63
+ "view"
64
+ end
65
+ end
66
+
67
+ def tables
68
+ type = adbc_table_type
69
+ with_raw_connection do |conn|
70
+ objects = conn.get_objects(depth: :tables,
71
+ catalog: adbc_catalog,
72
+ db_schema: adbc_db_schema,
73
+ table_types: [type])
74
+ tables = []
75
+ objects.raw_records.each do |_catalog_name, db_schemas|
76
+ db_schemas.each do |db_schema|
77
+ db_schema_tables = db_schema["db_schema_tables"]
78
+ next if db_schema_tables.nil?
79
+ db_schema_tables.each do |t|
80
+ # Some drivers may ignore table_types
81
+ next unless t["table_type"] == type
82
+ tables << t["table_name"]
83
+ end
84
+ end
85
+ end
86
+ tables
87
+ end
88
+ end
89
+
90
+ def views
91
+ type = adbc_view_type
92
+ with_raw_connection do |conn|
93
+ objects = conn.get_objects(depth: :tables,
94
+ catalog: adbc_catalog,
95
+ db_schema: adbc_db_schema,
96
+ table_types: [type])
97
+ views = []
98
+ objects.raw_records.each do |_catalog_name, db_schemas|
99
+ db_schemas.each do |db_schema|
100
+ db_schema_tables = db_schema["db_schema_tables"]
101
+ next if db_schema_tables.nil?
102
+ db_schema_tables.each do |t|
103
+ # Some drivers may ignore table_types
104
+ next unless t["table_type"] == type
105
+ views << t["table_name"]
106
+ end
107
+ end
108
+ end
109
+ views
110
+ end
111
+ end
112
+
113
+ def column_definitions(table_name)
114
+ with_raw_connection do |conn|
115
+ objects = conn.get_objects(catalog: adbc_catalog,
116
+ db_schema: adbc_db_schema,
117
+ table_name: table_name)
118
+ objects.raw_records.each do |_catalog_name, db_schemas|
119
+ db_schemas.each do |db_schema|
120
+ db_schema_tables = db_schema["db_schema_tables"]
121
+ next if db_schema_tables.nil?
122
+ db_schema_tables.each do |table|
123
+ return table["table_columns"]
124
+ end
125
+ end
126
+ end
127
+ [] # raise?
128
+ end
129
+ end
130
+
131
+ def primary_keys(table_name)
132
+ with_raw_connection do |conn|
133
+ objects = conn.get_objects(catalog: adbc_catalog,
134
+ db_schema: adbc_db_schema,
135
+ table_name: table_name)
136
+ objects.raw_records.each do |_catalog_name, db_schemas|
137
+ db_schemas.each do |db_schema|
138
+ db_schema_tables = db_schema["db_schema_tables"]
139
+ next if db_schema_tables.nil?
140
+ db_schema_tables.each do |table|
141
+ constraint = table["table_constraints"].find do |constraint|
142
+ constraint["constraint_type"] == "PRIMARY KEY"
143
+ end
144
+ return [] if constraint.nil?
145
+ return constraint["constraint_column_names"] || []
146
+ end
147
+ end
148
+ end
149
+ []
150
+ end
151
+ end
152
+
153
+ private
154
+ def create_table_definition(name, **options)
155
+ TableDefinition.new(self, name, **options)
156
+ end
157
+
158
+ def schema_creation
159
+ SchemaCreation.new(self)
160
+ end
161
+
162
+ def new_column_from_field(table_name, field, definitions)
163
+ xdbc_type_name = field["xdbc_type_name"]
164
+ if xdbc_type_name
165
+ type_metadata = fetch_type_metadata(xdbc_type_name)
166
+ else
167
+ type_metadata = nil
168
+ end
169
+ Column.new(field["column_name"],
170
+ field["xdbc_column_def"],
171
+ type_metadata,
172
+ field["xdbc_nullable"] == 1,
173
+ nil,
174
+ collation: nil,
175
+ comment: nil)
176
+ end
177
+ end
178
+ end
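Review note: `tables` and `views` walk the same nested structure returned by `get_objects` (catalog, then schemas, then tables) and differ only in the table type they keep. A sketch of that traversal over plain Ruby data shows the shape being assumed. `SAMPLE_RECORDS` is made-up data modeled on the keys used in the code, and `table_names` is a hypothetical shared helper.

```ruby
# Made-up records in the shape the code reads: [catalog_name, db_schemas].
SAMPLE_RECORDS = [
  ["my_catalog", [
    {
      "db_schema_name" => "main",
      "db_schema_tables" => [
        {"table_name" => "events", "table_type" => "table"},
        {"table_name" => "recent_events", "table_type" => "view"},
      ],
    },
    # A schema without tables reports nil, which the code skips.
    {"db_schema_name" => "empty", "db_schema_tables" => nil},
  ]],
]

# Hypothetical helper: names of all objects with the given table type.
def table_names(raw_records, type)
  names = []
  raw_records.each do |_catalog_name, db_schemas|
    db_schemas.each do |db_schema|
      db_schema_tables = db_schema["db_schema_tables"]
      next if db_schema_tables.nil?
      db_schema_tables.each do |t|
        # Filter again here because some drivers may ignore table_types.
        next unless t["table_type"] == type
        names << t["table_name"]
      end
    end
  end
  names
end

p table_names(SAMPLE_RECORDS, "table")
p table_names(SAMPLE_RECORDS, "view")
```

Sharing one helper like this between `tables` and `views` would remove the duplicated loop, at the cost of one extra method.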
data/lib/activerecord_adbc_adapter/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module ActiveRecordADBCAdapter
2
+ VERSION = "0.0.1"
3
+ end
metadata ADDED
@@ -0,0 +1,84 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: activerecord-adbc-adapter
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Sutou Kouhei
8
+ bindir: bin
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies:
12
+ - !ruby/object:Gem::Dependency
13
+ name: activerecord
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - ">="
17
+ - !ruby/object:Gem::Version
18
+ version: 8.0.0
19
+ type: :runtime
20
+ prerelease: false
21
+ version_requirements: !ruby/object:Gem::Requirement
22
+ requirements:
23
+ - - ">="
24
+ - !ruby/object:Gem::Version
25
+ version: 8.0.0
26
+ - !ruby/object:Gem::Dependency
27
+ name: red-adbc
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: '0'
33
+ type: :runtime
34
+ prerelease: false
35
+ version_requirements: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - ">="
38
+ - !ruby/object:Gem::Version
39
+ version: '0'
40
+ description: This gem provides an ADBC adapter for Active Record. This adapter is
41
+ optimized for extracting and loading large data from/to DBs. The optimization is
42
+ powered by Apache Arrow.
43
+ email:
44
+ - kou@clear-code.com
45
+ executables: []
46
+ extensions: []
47
+ extra_rdoc_files: []
48
+ files:
49
+ - LICENSE.txt
50
+ - README.md
51
+ - lib/activerecord-adbc-adapter.rb
52
+ - lib/activerecord_adbc_adapter/adapter.rb
53
+ - lib/activerecord_adbc_adapter/column.rb
54
+ - lib/activerecord_adbc_adapter/database_statements.rb
55
+ - lib/activerecord_adbc_adapter/ingest.rb
56
+ - lib/activerecord_adbc_adapter/quoting.rb
57
+ - lib/activerecord_adbc_adapter/relation_arrowable.rb
58
+ - lib/activerecord_adbc_adapter/result.rb
59
+ - lib/activerecord_adbc_adapter/schema_creation.rb
60
+ - lib/activerecord_adbc_adapter/schema_definitions.rb
61
+ - lib/activerecord_adbc_adapter/schema_statements.rb
62
+ - lib/activerecord_adbc_adapter/version.rb
63
+ homepage: https://github.com/red-data-tools/activerecord-adbc-adapter
64
+ licenses:
65
+ - MIT
66
+ metadata: {}
67
+ rdoc_options: []
68
+ require_paths:
69
+ - lib
70
+ required_ruby_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ required_rubygems_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ version: '0'
80
+ requirements: []
81
+ rubygems_version: 3.6.9
82
+ specification_version: 4
83
+ summary: Active Record's ADBC adapter
84
+ test_files: []