exwiw 0.5.0 → 0.5.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +15 -0
- data/README.md +83 -1
- data/lib/exwiw/adapter/mongodb_adapter.rb +48 -10
- data/lib/exwiw/adapter/postgresql_adapter.rb +18 -1
- data/lib/exwiw/after_insert_hook.rb +1 -0
- data/lib/exwiw/cli.rb +41 -2
- data/lib/exwiw/determine_table_processing_order.rb +142 -25
- data/lib/exwiw/explain_runner.rb +4 -1
- data/lib/exwiw/query_ast_builder.rb +303 -5
- data/lib/exwiw/runner.rb +6 -1
- data/lib/exwiw/table_config.rb +15 -0
- data/lib/exwiw/version.rb +1 -1
- data/lib/exwiw.rb +7 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 567683d65df5d9f147ab9415a67baf48a80e21ad32e1ef7635c624dfc3d28c47
|
|
4
|
+
data.tar.gz: 1513b577f6f2368df60edc45a54c96495ece4f1ee9b453e92adb8991f182fcdf
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: a9680642eb34f99ed3f0c2924154a5171edf541286dfed14befe15b8c029271419a13d3295694847b1143898b0200bf6eb1d6448e6665dbad7bef026e3c3fbbb
|
|
7
|
+
data.tar.gz: 30e6ef9f988965b85f899fdb0646b6e4e2befd68f95a247a9edfeddb8d3a6088f611f07d5376c7d3b9f4f58f147072e03294a140312dec1622994fb6da175720
|
data/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,21 @@
|
|
|
2
2
|
|
|
3
3
|
## [Unreleased]
|
|
4
4
|
|
|
5
|
+
## [0.5.2] - 2026-06-18
|
|
6
|
+
|
|
7
|
+
### Fixed
|
|
8
|
+
|
|
9
|
+
- **PostgreSQL: an extension the restore target cannot create no longer aborts the restore, and `pglogical` is never emitted.** `dump_schema` prepends `CREATE EXTENSION IF NOT EXISTS` for every extension installed on the source, wrapped in a `DO` block that previously swallowed only `feature_not_supported` (the extension's binaries are unavailable on the target). A source on managed Postgres/AlloyDB carrying `pglogical` (logical replication) emitted `CREATE EXTENSION ... SCHEMA "pglogical"`, which on a target lacking that schema fails with `invalid_schema_name` — an error the handler did not catch, so the whole restore aborted. The handler now also catches `invalid_schema_name`, and instead of silently discarding the skip it re-raises it as a `WARNING` (carrying SQLSTATE and the original message) so the skip is visible in the restore logs rather than vanishing. `insufficient_privilege` is intentionally **not** caught: a restore role lacking `CREATE` privilege is a misconfiguration that must fail loudly. Separately, `pglogical` is now excluded from the prepended extensions entirely (alongside `plpgsql` and the `google_*`/`rds_*`/`aiven_*` managed-platform extensions) — it is a replication mechanism of the source, not part of the copied data.
|
|
10
|
+
- **Table processing order no longer aborts on `belongs_to` cycles.** Two distinct problems made `export`/`explain` fail with "Circular belongs_to dependency detected" on schemas that have no resolvable order: (1) a `belongs_to` whose target table is **not part of the run** — most commonly an embedded MongoDB collection, which is masked through its parent and never dumped on its own — was treated as a dependency that could never be satisfied, so every table that (transitively) referenced one froze and was misreported as a cycle; such out-of-run targets are now ignored when ordering. (2) A **genuine** cycle (e.g. `a belongs_to b` and `b belongs_to a`) now **breaks deterministically with a warning** instead of raising: exwiw emits the cycle member (a table in a strongly-connected component of the unresolved-dependency graph) with the fewest unresolved dependencies, preferring one that still has an already-ordered parent so its extraction stays constrained, and logs which `belongs_to` edge was dropped. The dropped edge is not enforced while ordering, so for the mongodb adapter that table may match a superset of rows (the not-yet-processed parent contributes no `$in` filter); mark one of the `belongs_to` entries forming the cycle with `ignore: true` to break it explicitly instead. Acyclic tables that merely wait on a cycle are never reordered ahead of their parents.
|
|
11
|
+
- **MongoDB: `dump_schema` tolerates collections declared in the schema but absent from the source database.** Listing indexes for a non-existent collection makes the driver raise `NamespaceNotFound` (code 26), which aborted the whole export when the schema covered more collections than the connected database actually had (schema/DB drift, or a sparse development database). The existing collections are now resolved once up front and indexes are emitted only for those; `createCollection` is still emitted for every configured collection, so the target schema is created in full.
|
|
12
|
+
- **MongoDB: a related collection that cannot be scoped no longer falls back to dumping the whole collection.** When a non-target collection's `belongs_to` parents all yield no ids to filter by — because every parent matched nothing or is not dumped on its own (e.g. an embedded collection) — the assembled filter is empty. Previously that empty filter was sent as `find({})`, scanning and dumping the **entire collection across every scope** (a cross-scope data-exposure risk). Such a collection is now constrained to match no rows and a warning is logged instead. A collection with no `belongs_to` at all is still treated as reference/master data and dumped in full.
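
The ordering rules described in the `belongs_to` entry above can be sketched in a few lines of Ruby. This is a simplified illustration under stated assumptions, not exwiw's implementation: `Table` and `processing_order` are hypothetical names, and on a stall the sketch simply emits the table with the fewest unresolved dependencies (ties broken by name). It omits the strongly-connected-component check and the preference for an already-ordered parent that the entry describes.

```ruby
require "set"

# Hypothetical stand-in for a table config: a name plus belongs_to target names.
Table = Struct.new(:name, :belongs_to)

def processing_order(tables)
  present = tables.map(&:name).to_set
  ordered = []
  remaining = tables.dup

  # Dependencies that still block a table: targets that are part of this run,
  # not yet ordered, and not the table itself.
  blocking = lambda do |table|
    table.belongs_to.uniq.select do |dep|
      present.include?(dep) && !ordered.include?(dep) && dep != table.name
    end
  end

  until remaining.empty?
    ready = remaining.select { |t| blocking.call(t).empty? }
    # Stalled: break the cycle with the least-blocked table (ties by name).
    ready = [remaining.min_by { |t| [blocking.call(t).size, t.name] }] if ready.empty?

    ready.each do |t|
      ordered << t.name
      remaining.delete(t)
    end
  end

  ordered
end

# "profiles" is not part of the run (e.g. an embedded collection), so it never blocks.
p processing_order([
  Table.new("comments", ["posts", "profiles"]),
  Table.new("posts", ["users"]),
  Table.new("users", []),
]) # => ["users", "posts", "comments"]

# A genuine two-table cycle is broken instead of raising.
p processing_order([Table.new("b", ["a"]), Table.new("a", ["b"])]) # => ["a", "b"]
```

In the first call `comments` is ordered last because only `posts` constrains it; the out-of-run `profiles` target is ignored rather than reported as a cycle.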
|
|
13
|
+
|
|
14
|
+
## [0.5.1] - 2026-06-18
|
|
15
|
+
|
|
16
|
+
### Added
|
|
17
|
+
|
|
18
|
+
- **Scope-column extraction mode** (`--scope-column`, SQL adapters only). For schemas where many independent top-level tables share the same scope/tenant column instead of converging on a single `belongs_to` root, exwiw can now filter **every** table by that shared column (`--scope-column=COLUMN` with `--ids` as its values) rather than anchoring on one `--target-table`. A table that carries the column is filtered directly; a table that lacks it but `belongs_to` a table that has it is joined up to the nearest such table and filtered there. A table that `belongs_to` a parent which is itself scoped but carries no scope column of its own (e.g. a *hub* table scoped only because an extractable child references it) is constrained to the parent's in-scope ids via a subquery (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`), so the hub's other children ride along to just the in-scope rows — limited to a single forward hop and a single unambiguous scopable parent. A table that cannot be scoped at all (no column and no path to one) makes the run **abort with a list of the offending tables**, so an unscoped table is never silently dumped in full. Two user-owned table-config keys support this and are preserved across `schema:generate` regeneration: **`scope_exempt: true`** exports a genuine reference/master table in full (rails-managed tables are treated as exempt automatically), and **`scope_column`** overrides the filtered column name for a table that stores the same scope value under a different name. `--scope-column` is mutually exclusive with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`, can be set in `exwiw.yml`, and works with `exwiw explain`.
|
|
19
|
+
|
|
5
20
|
## [0.5.0] - 2026-06-16
|
|
6
21
|
|
|
7
22
|
### Added
|
data/README.md
CHANGED
|
@@ -129,6 +129,88 @@ exwiw explain \
|
|
|
129
129
|
|
|
130
130
|
The `--output-dir`, `--output-format`, `--insert-only`, and `--after-insert-hook` options are dump-specific and rejected when used with `explain`.
|
|
131
131
|
|
|
132
|
+
### Scope-column mode (`--scope-column`)
|
|
133
|
+
|
|
134
|
+
The default `--target-table` extraction assumes the schema converges on a single
|
|
135
|
+
root: every table is reached by walking `belongs_to` toward that one table. Some
|
|
136
|
+
schemas are not shaped that way — many independent top-level tables each carry the
|
|
137
|
+
*same* scope/tenant column (e.g. `tenant_id`, `account_uuid`) and there is no
|
|
138
|
+
single root. Choosing one of them as `--target-table` would leave the others
|
|
139
|
+
unrelated to it, and an unrelated table is dumped in full — a problem if it holds
|
|
140
|
+
personal data.
|
|
141
|
+
|
|
142
|
+
`--scope-column` handles this shape: instead of one anchor table, **every table is
|
|
143
|
+
filtered by a shared column** whose values are `--ids`.
|
|
144
|
+
|
|
145
|
+
```bash
|
|
146
|
+
exwiw \
|
|
147
|
+
--adapter=postgresql \
|
|
148
|
+
--host=localhost --port=5432 --user=reader \
|
|
149
|
+
--database=app_production \
|
|
150
|
+
--schema-dir=exwiw/schema \
|
|
151
|
+
--scope-column=tenant_id \
|
|
152
|
+
--ids=42,43 \
|
|
153
|
+
--output-dir=dump
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
Each table is resolved as follows:
|
|
157
|
+
|
|
158
|
+
- **Carries the scope column** → `WHERE scope_column IN (ids)`.
|
|
159
|
+
- **Lacks it but `belongs_to` reaches a table that has it** → exwiw joins up to the
|
|
160
|
+
nearest such table and applies the scope filter there (the same join machinery
|
|
161
|
+
the single-target mode uses).
|
|
162
|
+
- **`belongs_to` a parent that is itself scoped but carries no scope column of its
|
|
163
|
+
own** → exwiw constrains this table to the parent's in-scope ids via a subquery
|
|
164
|
+
(`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). This covers a *hub*
|
|
165
|
+
table that has no scope column and is scoped only because an extractable child
|
|
166
|
+
references it (see referenced-by below): the hub's other `belongs_to` children
|
|
167
|
+
ride along to just the in-scope rows instead of being dumped in full. Limited to
|
|
168
|
+
a single forward hop and a single unambiguous scopable parent.
|
|
169
|
+
- **Cannot be scoped at all** (no scope column and no path to one) → exwiw
|
|
170
|
+
**aborts** and lists the offending tables, so an unscoped table is never silently
|
|
171
|
+
dumped in full. For each, either add a `belongs_to` path, set `ignore: true` to
|
|
172
|
+
skip it, or mark it `scope_exempt: true` (below) to export it in full.
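
The resolution rules above can be pictured with a small Ruby sketch. It is an illustration only, not the gem's code: `EXAMPLE_TABLES`, `scope_strategy`, and `unscoped_tables` are hypothetical names, the table configs are reduced to the keys the sketch needs, and the hub-table subquery case and the per-table column override are left out.

```ruby
# Hypothetical, simplified table configs: only the keys this sketch needs.
EXAMPLE_TABLES = {
  "orders"      => { columns: %w[id tenant_id], belongs_to: [] },
  "order_items" => { columns: %w[id order_id],  belongs_to: ["orders"] },
  "countries"   => { columns: %w[id code],      belongs_to: [], scope_exempt: true },
  "audit_logs"  => { columns: %w[id message],   belongs_to: [] },
}.freeze

# Returns :exempt, :direct, :join or :unscoped for one table.
def scope_strategy(name, tables, scope_column, visiting = [])
  table = tables.fetch(name)
  return :exempt if table[:scope_exempt]
  return :direct if table[:columns].include?(scope_column)

  # Walk belongs_to upward; `path` guards against revisiting a table.
  path = visiting + [name]
  reachable = table[:belongs_to].any? do |parent|
    next false if path.include?(parent) || !tables.key?(parent)

    %i[direct join].include?(scope_strategy(parent, tables, scope_column, path))
  end
  reachable ? :join : :unscoped
end

# Tables that would make a scope-column run abort.
def unscoped_tables(tables, scope_column)
  tables.keys.select { |name| scope_strategy(name, tables, scope_column) == :unscoped }
end

EXAMPLE_TABLES.each_key do |name|
  strategy = scope_strategy(name, EXAMPLE_TABLES, "tenant_id")
  puts "#{name}: #{strategy}"
end

offenders = unscoped_tables(EXAMPLE_TABLES, "tenant_id")
warn "cannot scope: #{offenders.join(', ')}" unless offenders.empty?
```

Here `audit_logs` is the only offender: it has no `tenant_id` column, no `belongs_to` path to one, and is not marked `scope_exempt`, so a real run would abort and list it.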
|
|
173
|
+
|
|
174
|
+
`--scope-column` is SQL-only (mysql / postgresql / sqlite) and mutually exclusive
|
|
175
|
+
with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`.
|
|
176
|
+
It works with `exwiw explain` too, which is the recommended way to preview the
|
|
177
|
+
queries before exporting.
|
|
178
|
+
|
|
179
|
+
#### `scope_exempt` (intentional full dump)
|
|
180
|
+
|
|
181
|
+
A genuine reference/master table (no personal data) that has no scope linkage can
|
|
182
|
+
opt out of the strict check and be exported in full:
|
|
183
|
+
|
|
184
|
+
```json
|
|
185
|
+
{
|
|
186
|
+
"name": "countries",
|
|
187
|
+
"primary_key": "id",
|
|
188
|
+
"scope_exempt": true,
|
|
189
|
+
"columns": [{ "name": "id" }, { "name": "code" }]
|
|
190
|
+
}
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
Rails-managed tables (`schema_migrations`, `ar_internal_metadata`) are treated as
|
|
194
|
+
exempt automatically.
|
|
195
|
+
|
|
196
|
+
#### Per-table `scope_column` override
|
|
197
|
+
|
|
198
|
+
scope-column mode assumes a single shared **value** space — the same `--ids` apply
|
|
199
|
+
to every scoped table. If a table stores that same value under a differently named
|
|
200
|
+
column, override the column name for that table:
|
|
201
|
+
|
|
202
|
+
```json
|
|
203
|
+
{
|
|
204
|
+
"name": "legacy_orders",
|
|
205
|
+
"primary_key": "id",
|
|
206
|
+
"scope_column": "legacy_tenant_id",
|
|
207
|
+
"columns": [{ "name": "id" }, { "name": "legacy_tenant_id" }]
|
|
208
|
+
}
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
Both `scope_exempt` and `scope_column` are user-maintained and preserved across
|
|
212
|
+
`schema:generate` regeneration (the generators never emit them).
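
As a rough illustration of the override, the sketch below resolves which column a table would be filtered by. `effective_scope_column` is a hypothetical helper, not part of exwiw; it only assumes that a table config's `scope_column` key, when present, takes precedence over the run-wide `--scope-column` value.

```ruby
require "json"

# The table config shown above, parsed into a Hash.
LEGACY_ORDERS = JSON.parse(<<~CONFIG)
  {
    "name": "legacy_orders",
    "primary_key": "id",
    "scope_column": "legacy_tenant_id",
    "columns": [{ "name": "id" }, { "name": "legacy_tenant_id" }]
  }
CONFIG

# Hypothetical helper: the per-table override wins over the run-wide column.
def effective_scope_column(table_config, run_scope_column)
  table_config["scope_column"] || run_scope_column
end

puts effective_scope_column(LEGACY_ORDERS, "tenant_id")          # legacy_tenant_id
puts effective_scope_column({ "name" => "orders" }, "tenant_id") # tenant_id
```

The `--ids` values stay the same in both cases; only the column name they are matched against changes.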
|
|
213
|
+
|
|
132
214
|
### Config file (`exwiw.yml`)
|
|
133
215
|
|
|
134
216
|
Options you would otherwise repeat on every run can be kept in a YAML config file. Pass it with `--config=PATH`; when `--config` is omitted, exwiw automatically loads `exwiw.yml` (or `exwiw.yaml`) from the current directory if present.
|
|
@@ -144,7 +226,7 @@ output_format: insert # insert | copy
|
|
|
144
226
|
insert_only: false
|
|
145
227
|
after_insert_hook: hooks/seed.rb
|
|
146
228
|
log_level: info # debug | info
|
|
147
|
-
# target_table / ids / ids_field / ids_column may also be set here
|
|
229
|
+
# target_table / ids / ids_field / ids_column / scope_column may also be set here
|
|
148
230
|
```
|
|
149
231
|
|
|
150
232
|
With the file above, only the connection details need to be supplied on the CLI:
|
data/lib/exwiw/adapter/mongodb_adapter.rb
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
3
|
require 'json'
|
|
4
|
+
require 'set'
|
|
4
5
|
|
|
5
6
|
# NOTE: This adapter consumes MongodbCollectionConfig (`fields` instead of
|
|
6
7
|
# `columns`, plus `embedded_in`). Top-level collections are dumped as one
|
|
@@ -71,16 +72,7 @@ module Exwiw
|
|
|
71
72
|
{ config.primary_key => { "$in" => coerce_ids(dump_target.ids) } }
|
|
72
73
|
end
|
|
73
74
|
else
|
|
74
|
-
config
|
|
75
|
-
# Constrain by the parent field this FK actually references
|
|
76
|
-
# (`relation.references`, default the parent primary_key). The
|
|
77
|
-
# values were captured from that field's documents in #execute, so
|
|
78
|
-
# their BSON type already matches the stored FK — no coercion.
|
|
79
|
-
values = parent_state_for(relation, config_by_name)
|
|
80
|
-
next if values.nil? || values.empty?
|
|
81
|
-
|
|
82
|
-
acc[relation.foreign_key] = { "$in" => values }
|
|
83
|
-
end
|
|
75
|
+
related_collection_filter(config, config_by_name)
|
|
84
76
|
end
|
|
85
77
|
|
|
86
78
|
Exwiw::MongoQuery::Find.new(
|
|
@@ -160,6 +152,14 @@ module Exwiw
|
|
|
160
152
|
|
|
161
153
|
collections = ordered_tables.reject(&:embedded?)
|
|
162
154
|
|
|
155
|
+
# Index listing targets a specific collection, and MongoDB raises
|
|
156
|
+
# NamespaceNotFound (code 26) for one that does not exist. The schema may
|
|
157
|
+
# declare collections absent from this database (schema/DB drift, or a
|
|
158
|
+
# sparse dev DB), so resolve the set that actually exists up front and emit
|
|
159
|
+
# indexes only for those. `createCollection` is still emitted for every
|
|
160
|
+
# config below, so the target schema is created in full regardless.
|
|
161
|
+
existing_collections = db.database.collection_names.to_set
|
|
162
|
+
|
|
163
163
|
File.open(output_path, 'w') do |file|
|
|
164
164
|
file.puts("// Auto-generated by exwiw. Apply with: mongosh \"$MONGODB_URI\" #{File.basename(output_path)}")
|
|
165
165
|
file.puts
|
|
@@ -172,6 +172,11 @@ module Exwiw
|
|
|
172
172
|
|
|
173
173
|
collections.each do |config|
|
|
174
174
|
name = config.name
|
|
175
|
+
unless existing_collections.include?(name)
|
|
176
|
+
@logger.debug(" Collection '#{name}' is not present in the source database; emitting no indexes.")
|
|
177
|
+
next
|
|
178
|
+
end
|
|
179
|
+
|
|
175
180
|
indexes = db[name].indexes.to_a.reject { |idx| idx['name'] == '_id_' }
|
|
176
181
|
indexes.each do |idx|
|
|
177
182
|
key = idx['key']
|
|
@@ -274,6 +279,39 @@ module Exwiw
|
|
|
274
279
|
([config.primary_key] + referenced).uniq
|
|
275
280
|
end
|
|
276
281
|
|
|
282
|
+
# Build the scoping filter for a non-target collection from its belongs_to
|
|
283
|
+
# parents' captured ids. Each belongs_to is constrained by the parent field
|
|
284
|
+
# the FK references (`relation.references`, default the parent primary_key);
|
|
285
|
+
# the values were captured from that field in #execute, so their BSON type
|
|
286
|
+
# already matches the stored FK — no coercion.
|
|
287
|
+
#
|
|
288
|
+
# A belongs_to whose parent produced no ids contributes no constraint:
|
|
289
|
+
# either the parent matched nothing, or it is not dumped here (e.g. an
|
|
290
|
+
# embedded collection, or one excluded from the run). If that leaves the
|
|
291
|
+
# filter empty even though the collection HAS belongs_to, the collection
|
|
292
|
+
# cannot be scoped from the dump target — and falling back to an empty `{}`
|
|
293
|
+
# filter would scan and dump the ENTIRE collection across every scope. That
|
|
294
|
+
# is never what a scoped extraction wants, so constrain it to match nothing
|
|
295
|
+
# and warn instead. (A collection with no belongs_to at all is genuine
|
|
296
|
+
# reference/master data and is still dumped in full via `{}`.)
|
|
297
|
+
private def related_collection_filter(config, config_by_name)
|
|
298
|
+
filter = config.belongs_tos.each_with_object({}) do |relation, acc|
|
|
299
|
+
values = parent_state_for(relation, config_by_name)
|
|
300
|
+
next if values.nil? || values.empty?
|
|
301
|
+
|
|
302
|
+
acc[relation.foreign_key] = { "$in" => values }
|
|
303
|
+
end
|
|
304
|
+
|
|
305
|
+
return filter unless filter.empty? && config.belongs_tos.any?
|
|
306
|
+
|
|
307
|
+
@logger.warn(
|
|
308
|
+
" Collection '#{config.name}' has belongs_to but no parent produced ids to scope by " \
|
|
309
|
+
"(parents matched nothing, or are not dumped on their own such as embedded collections). " \
|
|
310
|
+
"Constraining it to match no rows to avoid an unscoped full-collection dump."
|
|
311
|
+
)
|
|
312
|
+
{ config.primary_key => { "$in" => [] } }
|
|
313
|
+
end
|
|
314
|
+
|
|
277
315
|
# The captured parent-collection values a child belongs_to should be
|
|
278
316
|
# constrained by: the values of the parent field the FK references
|
|
279
317
|
# (`relation.references`, default the parent primary_key). nil when the
|
data/lib/exwiw/adapter/postgresql_adapter.rb
CHANGED
|
@@ -65,7 +65,18 @@ module Exwiw
|
|
|
65
65
|
ext_ddl = extensions.map do |extname, schema|
|
|
66
66
|
stmt = "CREATE EXTENSION IF NOT EXISTS #{connection.quote_ident(extname)}"
|
|
67
67
|
stmt += " SCHEMA #{connection.quote_ident(schema)}" unless schema == "public"
|
|
68
|
-
|
|
68
|
+
# Best-effort prepend: a restore target that genuinely cannot create the
|
|
69
|
+
# extension should not abort the whole restore. Two such cases are caught:
|
|
70
|
+
# feature_not_supported (0A000) -- the extension's binaries are unavailable
|
|
71
|
+
# invalid_schema_name (3F000) -- the extension's required schema is absent
|
|
72
|
+
# insufficient_privilege (42501) is deliberately NOT caught: a restore role
|
|
73
|
+
# lacking CREATE privilege is a misconfiguration to fix, not to skip silently.
|
|
74
|
+
# The skip is re-raised as a WARNING so it surfaces in the restore logs
|
|
75
|
+
# instead of vanishing.
|
|
76
|
+
warning = connection.escape_literal("exwiw: skipped CREATE EXTENSION #{extname} (SQLSTATE %): %")
|
|
77
|
+
"DO $$ BEGIN #{stmt}; " \
|
|
78
|
+
"EXCEPTION WHEN feature_not_supported OR invalid_schema_name THEN " \
|
|
79
|
+
"RAISE WARNING #{warning}, SQLSTATE, SQLERRM; END $$;"
|
|
69
80
|
end.join("\n") + "\n\n"
|
|
70
81
|
@logger.debug(" Found #{extensions.size} extension(s) to prepend.")
|
|
71
82
|
stdout = ext_ddl + stdout
|
|
@@ -382,11 +393,17 @@ module Exwiw
|
|
|
382
393
|
end
|
|
383
394
|
|
|
384
395
|
private def query_extensions
|
|
396
|
+
# Skip plpgsql (always present) and managed-platform bookkeeping extensions
|
|
397
|
+
# (google_*/rds_*/aiven_*). pglogical is also skipped: it is a logical-
|
|
398
|
+
# replication mechanism of the source, not part of the data being copied,
|
|
399
|
+
# and its dedicated `pglogical` schema is typically absent on the restore
|
|
400
|
+
# target — so prepending CREATE EXTENSION for it only breaks the restore.
|
|
385
401
|
sql = <<~SQL
|
|
386
402
|
SELECT e.extname, n.nspname
|
|
387
403
|
FROM pg_extension e
|
|
388
404
|
JOIN pg_namespace n ON n.oid = e.extnamespace
|
|
389
405
|
WHERE e.extname != 'plpgsql'
|
|
406
|
+
AND e.extname != 'pglogical'
|
|
390
407
|
AND e.extname NOT LIKE 'google\\_%' ESCAPE '\\'
|
|
391
408
|
AND e.extname NOT LIKE 'rds\\_%' ESCAPE '\\'
|
|
392
409
|
AND e.extname NOT LIKE 'aiven\\_%' ESCAPE '\\'
|
data/lib/exwiw/after_insert_hook.rb
CHANGED
|
@@ -38,6 +38,7 @@ module Exwiw
|
|
|
38
38
|
'EXWIW_DATABASE_USER' => cli_options[:database_user].to_s,
|
|
39
39
|
'EXWIW_DATABASE_NAME' => cli_options[:database_name].to_s,
|
|
40
40
|
'EXWIW_TARGET_TABLE' => cli_options[:target_table].to_s,
|
|
41
|
+
'EXWIW_SCOPE_COLUMN' => cli_options[:scope_column].to_s,
|
|
41
42
|
'EXWIW_IDS' => Array(cli_options[:ids]).join(','),
|
|
42
43
|
'EXWIW_OUTPUT_FORMAT' => cli_options[:output_format].to_s,
|
|
43
44
|
}
|
data/lib/exwiw/cli.rb
CHANGED
|
@@ -35,6 +35,7 @@ module Exwiw
|
|
|
35
35
|
ids
|
|
36
36
|
ids_field
|
|
37
37
|
ids_column
|
|
38
|
+
scope_column
|
|
38
39
|
].freeze
|
|
39
40
|
|
|
40
41
|
# Database connection settings are environment-specific (and sometimes
|
|
@@ -77,6 +78,7 @@ module Exwiw
|
|
|
77
78
|
@ids = []
|
|
78
79
|
@ids_field = nil
|
|
79
80
|
@ids_column = nil
|
|
81
|
+
@scope_column = nil
|
|
80
82
|
@output_format = nil
|
|
81
83
|
@insert_only = nil
|
|
82
84
|
@after_insert_hook_path = nil
|
|
@@ -109,6 +111,7 @@ module Exwiw
|
|
|
109
111
|
table_name: @target_table_name,
|
|
110
112
|
ids: @ids,
|
|
111
113
|
ids_field: @ids_field,
|
|
114
|
+
scope_column: @scope_column,
|
|
112
115
|
)
|
|
113
116
|
|
|
114
117
|
logger = build_logger
|
|
@@ -161,6 +164,7 @@ module Exwiw
|
|
|
161
164
|
end
|
|
162
165
|
|
|
163
166
|
resolve_target_collection_alias!
|
|
167
|
+
resolve_scope_column!
|
|
164
168
|
resolve_ids_column_alias!
|
|
165
169
|
resolve_uri_option!
|
|
166
170
|
|
|
@@ -228,8 +232,13 @@ module Exwiw
|
|
|
228
232
|
exit 1
|
|
229
233
|
end
|
|
230
234
|
|
|
231
|
-
if
|
|
232
|
-
$stderr.puts "--
|
|
235
|
+
if @scope_column && @ids.empty?
|
|
236
|
+
$stderr.puts "--ids is required when --scope-column is specified"
|
|
237
|
+
exit 1
|
|
238
|
+
end
|
|
239
|
+
|
|
240
|
+
if !@target_table_name && !@scope_column && @ids.any?
|
|
241
|
+
$stderr.puts "--target-table or --scope-column is required when --ids is specified"
|
|
233
242
|
exit 1
|
|
234
243
|
end
|
|
235
244
|
|
|
@@ -309,6 +318,7 @@ module Exwiw
|
|
|
309
318
|
end
|
|
310
319
|
@ids_field ||= config["ids_field"]
|
|
311
320
|
@ids_column ||= config["ids_column"]
|
|
321
|
+
@scope_column ||= config["scope_column"]
|
|
312
322
|
end
|
|
313
323
|
|
|
314
324
|
# Strip a trailing slash (like the CLI's dir options) and expand relative to
|
|
@@ -376,6 +386,33 @@ module Exwiw
|
|
|
376
386
|
end
|
|
377
387
|
end
|
|
378
388
|
|
|
389
|
+
# `--scope-column` switches to scope-column mode: every table is filtered by a
|
|
390
|
+
# shared column (`--ids` are its values) instead of anchoring on one
|
|
391
|
+
# `--target-table`. It is SQL-only and mutually exclusive with the single-target
|
|
392
|
+
# flags. Runs after resolve_target_collection_alias! (so --target-collection is
|
|
393
|
+
# already folded into @target_table_name) and before resolve_ids_column_alias!
|
|
394
|
+
# so the clearer "cannot combine" message wins over the generic ids-column one.
|
|
395
|
+
private def resolve_scope_column!
|
|
396
|
+
return if @scope_column.nil?
|
|
397
|
+
|
|
398
|
+
sql_adapters = ["mysql", "postgresql", "sqlite"]
|
|
399
|
+
unless sql_adapters.include?(@database_adapter)
|
|
400
|
+
$stderr.puts "--scope-column is only supported by the sql adapters"
|
|
401
|
+
exit 1
|
|
402
|
+
end
|
|
403
|
+
|
|
404
|
+
if @target_table_name
|
|
405
|
+
$stderr.puts "--scope-column cannot be combined with --target-table/--target-collection"
|
|
406
|
+
exit 1
|
|
407
|
+
end
|
|
408
|
+
|
|
409
|
+
if @ids_field || @ids_column
|
|
410
|
+
flag = @ids_column ? "--ids-column" : "--ids-field"
|
|
411
|
+
$stderr.puts "--scope-column cannot be combined with #{flag}"
|
|
412
|
+
exit 1
|
|
413
|
+
end
|
|
414
|
+
end
|
|
415
|
+
|
|
379
416
|
# `--uri` supplies a full connection string (e.g. `mongodb+srv://...`) and is
|
|
380
417
|
# mongodb-only — the SQL adapters shell out to their own client binaries with
|
|
381
418
|
# discrete host/port/user flags and have no equivalent. Runs after the
|
|
@@ -442,6 +479,7 @@ module Exwiw
|
|
|
442
479
|
target_table: @target_table_name,
|
|
443
480
|
ids: @ids.dup.freeze,
|
|
444
481
|
ids_field: @ids_field,
|
|
482
|
+
scope_column: @scope_column,
|
|
445
483
|
output_format: @output_format,
|
|
446
484
|
insert_only: @insert_only,
|
|
447
485
|
log_level: @log_level,
|
|
@@ -500,6 +538,7 @@ module Exwiw
|
|
|
500
538
|
opts.on("--ids=[IDS]", "Comma-separated list of identifiers. Required when --target-table is given.") { |v| @ids = v.split(',') }
|
|
501
539
|
opts.on("--ids-field=[FIELD]", "Field on the target collection that --ids is matched against. Defaults to the primary key. (mongodb adapter only)") { |v| @ids_field = v }
|
|
502
540
|
opts.on("--ids-column=[COLUMN]", "Column on the target table that --ids is matched against. Defaults to the primary key. (sql adapters only)") { |v| @ids_column = v }
|
|
541
|
+
opts.on("--scope-column=[COLUMN]", "Filter every table by this shared column (--ids are its values) instead of a single --target-table. Tables lacking it are reached via belongs_to. SQL adapters only; mutually exclusive with --target-table.") { |v| @scope_column = v }
|
|
503
542
|
opts.on("--output-format=[FORMAT]", "Output format: insert (default) or copy (PostgreSQL only, export subcommand only)") { |v| @output_format = v }
|
|
504
543
|
opts.on("--insert-only", "Do not generate DELETE SQL files (export subcommand only)") { @insert_only = true }
|
|
505
544
|
opts.on("--after-insert-hook=PATH", "Path to a .rb or .sh post-processing hook executed after all insert/delete files are written (export subcommand only)") do |v|
|
data/lib/exwiw/determine_table_processing_order.rb
CHANGED
|
@@ -1,35 +1,58 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
+
require "set"
|
|
4
|
+
|
|
3
5
|
module Exwiw
|
|
4
6
|
module DetermineTableProcessingOrder
|
|
5
7
|
module_function
|
|
6
8
|
|
|
7
9
|
# @param tables [Array<Exwiw::TableConfig>] tables
|
|
10
|
+
# @param logger [Logger, nil] receives a warning when a cycle has to be broken
|
|
8
11
|
# @return [Array<String>] sorted table names
|
|
9
|
-
def run(tables)
|
|
12
|
+
def run(tables, logger: nil)
|
|
10
13
|
return tables.map(&:name) if tables.size < 2
|
|
11
14
|
|
|
12
15
|
ordered_table_names = []
|
|
16
|
+
ordered = Set.new
|
|
13
17
|
|
|
14
18
|
table_by_name = tables.each_with_object({}) do |table, acc|
|
|
15
19
|
acc[table.name] = table
|
|
16
20
|
end
|
|
17
21
|
|
|
22
|
+
# Only belongs_to relations whose target is also in this run constrain the
|
|
23
|
+
# order. A belongs_to pointing at a table that is not being processed here
|
|
24
|
+
# — e.g. an embedded MongoDB collection (masked through its parent, never
|
|
25
|
+
# dumped on its own) or any table excluded from the run — is not something
|
|
26
|
+
# we can or need to order against, so it must never block resolution.
|
|
27
|
+
# Without this, such a dependency would stay unresolved forever and
|
|
28
|
+
# masquerade as a circular dependency, freezing every table that
|
|
29
|
+
# (transitively) references it.
|
|
30
|
+
present_names = table_by_name.keys.to_set
|
|
31
|
+
|
|
18
32
|
loop do
|
|
19
33
|
break if table_by_name.empty?
|
|
20
34
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
not_resolved_names.empty?
|
|
35
|
+
resolvable = table_by_name.values.select do |table|
|
|
36
|
+
unresolved_dependencies(table, present_names, ordered).empty?
|
|
25
37
|
end
|
|
26
38
|
|
|
27
|
-
if
|
|
28
|
-
|
|
39
|
+
if resolvable.empty?
|
|
40
|
+
# No table has all its (in-run) dependencies satisfied, yet tables
|
|
41
|
+
# remain: the belongs_to graph has a genuine cycle and no strict
|
|
42
|
+
# topological order exists. Rather than aborting the whole export, break
|
|
43
|
+
# the cycle by emitting one cycle member; see pick_cycle_victim for how
|
|
44
|
+
# the member is chosen. Warn so the dropped constraint is visible.
|
|
45
|
+
victim = pick_cycle_victim(table_by_name.values, present_names, ordered)
|
|
46
|
+
warn_cycle_break(logger, victim, unresolved_dependencies(victim, present_names, ordered))
|
|
47
|
+
resolvable = [victim]
|
|
29
48
|
end
|
|
30
49
|
|
|
31
|
-
|
|
50
|
+
# In the normal (acyclic) path, emit every currently-resolvable table in
|
|
51
|
+
# insertion order — preserving the historical ordering the snapshot specs
|
|
52
|
+
# depend on. The cycle-break path emits exactly its single chosen victim.
|
|
53
|
+
resolvable.each do |table|
|
|
32
54
|
ordered_table_names << table.name
|
|
55
|
+
ordered << table.name
|
|
33
56
|
table_by_name.delete(table.name)
|
|
34
57
|
end
|
|
35
58
|
end
|
|
@@ -37,30 +60,124 @@ module Exwiw
|
|
|
37
60
|
ordered_table_names
|
|
38
61
|
end
|
|
39
62
|
|
|
63
|
+
# The belongs_to target table names of `table`. A polymorphic belongs_to is
|
|
64
|
+
# expanded into one entry per concrete target by schema generation, so each
|
|
65
|
+
# entry is a plain table name here.
|
|
40
66
|
def compute_table_dependencies(table)
|
|
41
|
-
table.belongs_tos.
|
|
42
|
-
|
|
67
|
+
table.belongs_tos.map(&:table_name)
|
|
68
|
+
end
|
|
69
|
+
|
|
70
|
+
# The dependencies still blocking `table`: belongs_to targets that are part
|
|
71
|
+
# of this run, not yet ordered, and not the table itself (a self-referential
|
|
72
|
+
# belongs_to never blocks).
|
|
73
|
+
private_class_method def unresolved_dependencies(table, present_names, ordered)
|
|
74
|
+
compute_table_dependencies(table).uniq.select do |dep|
|
|
75
|
+
present_names.include?(dep) && !ordered.include?(dep) && dep != table.name
|
|
43
76
|
end
|
|
44
77
|
end
|
|
45
78
|
|
|
46
|
-
#
|
|
47
|
-
#
|
|
48
|
-
#
|
|
49
|
-
#
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
79
|
+
# Choose the next table to emit when the order is stuck in a cycle. Only
|
|
80
|
+
# genuine cycle members are eligible — a table in a non-trivial
|
|
81
|
+
# strongly-connected component of the unresolved-dependency subgraph — so an
|
|
82
|
+
# acyclic table that merely waits on a cycle is never reordered ahead of its
|
|
83
|
+
# parent. Among the members, prefer one that still has at least one
|
|
84
|
+
# already-ordered parent, so its extraction stays constrained instead of
|
|
85
|
+
# collapsing to "match every row" (a cross-scope over-extraction risk for the
|
|
86
|
+
# mongodb adapter); break remaining ties by fewest unresolved dependencies,
|
|
87
|
+
# then by name, for determinism.
|
|
88
|
+
private_class_method def pick_cycle_victim(remaining, present_names, ordered)
|
|
89
|
+
adjacency = remaining.each_with_object({}) do |table, acc|
|
|
90
|
+
acc[table.name] = unresolved_dependencies(table, present_names, ordered)
|
|
91
|
+
end
|
|
92
|
+
cyclic_names = strongly_connected_members(adjacency)
|
|
93
|
+
|
|
94
|
+
candidates = remaining.select { |table| cyclic_names.include?(table.name) }
|
|
95
|
+
candidates = remaining if candidates.empty? # defensive; a stall implies a cycle
|
|
96
|
+
|
|
97
|
+
anchored = candidates.select { |table| ordered_parent?(table, present_names, ordered) }
|
|
98
|
+
pool = anchored.empty? ? candidates : anchored
|
|
99
|
+
|
|
100
|
+
pool.min_by { |table| [unresolved_dependencies(table, present_names, ordered).size, table.name] }
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
# True when `table` has a belongs_to whose target was already ordered, so its
|
|
104
|
+
# extraction filter will be constrained rather than an unscoped full scan.
|
|
105
|
+
private_class_method def ordered_parent?(table, present_names, ordered)
|
|
106
|
+
compute_table_dependencies(table).any? do |dep|
|
|
107
|
+
dep != table.name && present_names.include?(dep) && ordered.include?(dep)
|
|
54
108
|
end
|
|
55
109
|
end
|
|
56
110
|
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
111
|
+
# Names belonging to a non-trivial strongly-connected component (size > 1) of
|
|
112
|
+
# `adjacency` (table name -> unresolved dependency names), i.e. the genuine
|
|
113
|
+
# cycle participants. Iterative Tarjan; nodes and edges are visited in name
|
|
114
|
+
# order so the result is deterministic. Self-edges are already excluded from
|
|
115
|
+
# the adjacency, so a size-1 component is never a cycle.
|
|
116
|
+
private_class_method def strongly_connected_members(adjacency)
|
|
117
|
+
index = {}
|
|
118
|
+
low = {}
|
|
119
|
+
on_stack = {}
|
|
120
|
+
stack = []
|
|
121
|
+
counter = 0
|
|
122
|
+
members = Set.new
|
|
123
|
+
neighbors = adjacency.each_with_object({}) { |(name, deps), acc| acc[name] = deps.sort }
|
|
124
|
+
|
|
125
|
+
adjacency.keys.sort.each do |start|
|
|
126
|
+
next if index.key?(start)
|
|
127
|
+
|
|
128
|
+
work = [[start, 0]]
|
|
129
|
+
until work.empty?
|
|
130
|
+
node, edge_i = work.last
|
|
131
|
+
if edge_i.zero?
|
|
132
|
+
index[node] = counter
|
|
133
|
+
low[node] = counter
|
|
134
|
+
counter += 1
|
|
135
|
+
stack.push(node)
|
|
136
|
+
on_stack[node] = true
|
|
137
|
+
end
|
|
138
|
+
|
|
139
|
+
adj = neighbors[node] || []
|
|
140
|
+
if edge_i < adj.size
|
|
141
|
+
work.last[1] += 1
|
|
142
|
+
w = adj[edge_i]
|
|
143
|
+
next unless adjacency.key?(w) # ignore edges leaving the remaining set
|
|
144
|
+
|
|
145
|
+
if index.key?(w)
|
|
146
|
+
low[node] = [low[node], index[w]].min if on_stack[w]
|
|
147
|
+
else
|
|
148
|
+
work.push([w, 0])
|
|
149
|
+
end
|
|
150
|
+
else
|
|
151
|
+
if low[node] == index[node]
|
|
152
|
+
component = []
|
|
153
|
+
loop do
|
|
154
|
+
w = stack.pop
|
|
155
|
+
on_stack[w] = false
|
|
156
|
+
component << w
|
|
157
|
+
break if w == node
|
|
158
|
+
end
|
|
159
|
+
members.merge(component) if component.size > 1
|
|
160
|
+
end
|
|
161
|
+
work.pop
|
|
162
|
+
low[work.last[0]] = [low[work.last[0]], low[node]].min unless work.empty?
|
|
163
|
+
end
|
|
164
|
+
end
|
|
165
|
+
end
|
|
166
|
+
|
|
167
|
+
members
|
|
168
|
+
end
|
|
169
|
+
|
|
170
|
+
private_class_method def warn_cycle_break(logger, victim, dropped)
|
|
171
|
+
return if logger.nil?
|
|
172
|
+
|
|
173
|
+
logger.warn(
|
|
174
|
+
"Circular belongs_to dependency detected. Breaking it by ordering " \
|
|
175
|
+
"'#{victim.name}' before its parent table(s): #{dropped.join(', ')}. The dropped " \
|
|
176
|
+
"relationship is not enforced while ordering, so '#{victim.name}' is extracted " \
|
|
177
|
+
"without that parent constraint (the mongodb adapter may then match a superset of " \
|
|
178
|
+
"rows; SQL output may not load in foreign-key order). To break the cycle explicitly " \
|
|
179
|
+
"instead, mark one of the belongs_to entries forming it with `ignore: true`."
|
|
180
|
+
)
|
|
64
181
|
end
|
|
65
182
|
end
|
|
66
183
|
end
|
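The cycle detection added above can be sketched outside the gem. This is a minimal illustration only, not the gem's code: it uses a recursive Tarjan walk rather than the iterative one in the diff, and the name `cycle_members` and the sample table names are hypothetical. Like `strongly_connected_members`, it assumes the adjacency maps a table name to its unresolved dependency names and reports only components of size greater than 1.

```ruby
require "set"

# Hypothetical standalone sketch (not the gem's implementation): recursive
# Tarjan over a name => [dependency names] hash. Returns the names that sit in
# a strongly-connected component of size > 1, i.e. genuine cycle participants.
def cycle_members(adjacency)
  index = {}
  low = {}
  on_stack = {}
  stack = []
  counter = 0
  members = Set.new

  visit = lambda do |node|
    index[node] = low[node] = counter
    counter += 1
    stack.push(node)
    on_stack[node] = true

    # Visit edges in name order so the result is deterministic.
    adjacency.fetch(node, []).sort.each do |dep|
      next unless adjacency.key?(dep) # ignore edges leaving the graph

      if !index.key?(dep)
        visit.call(dep)
        low[node] = [low[node], low[dep]].min
      elsif on_stack[dep]
        low[node] = [low[node], index[dep]].min
      end
    end

    # `node` is the root of a component: pop the component off the stack.
    if low[node] == index[node]
      component = []
      loop do
        w = stack.pop
        on_stack[w] = false
        component << w
        break if w == node
      end
      members.merge(component) if component.size > 1
    end
  end

  adjacency.keys.sort.each { |name| visit.call(name) unless index.key?(name) }
  members
end

# Assumed example graph: orders and users reference each other; line_items
# merely waits on the cycle and is not part of it.
graph = {
  "orders" => ["users"],
  "users" => ["orders"],
  "line_items" => ["orders"]
}
p cycle_members(graph).to_a.sort
```

Here `line_items` is left out of the result, which matches the intent stated in the comments above: an acyclic table that only waits on a cycle should never be picked as the victim.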
data/lib/exwiw/explain_runner.rb
CHANGED
|
@@ -26,8 +26,11 @@ module Exwiw
|
|
|
26
26
|
target = table_by_name[@dump_target.table_name]
|
|
27
27
|
adapter.validate_as_dump_target!(target) if target
|
|
28
28
|
|
|
29
|
+
dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
|
|
30
|
+
QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
|
|
31
|
+
|
|
29
32
|
@logger.debug("Determining table processing order...")
|
|
30
|
-
ordered_table_names = DetermineTableProcessingOrder.run(
|
|
33
|
+
ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs, logger: @logger)
|
|
31
34
|
|
|
32
35
|
total_size = ordered_table_names.size
|
|
33
36
|
ordered_table_names.each_with_index do |table_name, idx|
|
|
data/lib/exwiw/query_ast_builder.rb
CHANGED
|
@@ -2,23 +2,58 @@
|
|
|
2
2
|
|
|
3
3
|
module Exwiw
|
|
4
4
|
class QueryAstBuilder
|
|
5
|
-
def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true)
|
|
6
|
-
new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse).run
|
|
5
|
+
def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
|
|
6
|
+
new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, allow_forward: allow_forward).run
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
# Scope-column mode classification for a single table. One of
|
|
10
|
+
# :exempt / :direct / :via_path / :referenced_by / :via_scoped_parent / :unscopable.
|
|
11
|
+
def self.scope_category(table_name, table_by_name, dump_target, logger)
|
|
12
|
+
new(table_name, table_by_name, dump_target, logger).scope_category
|
|
13
|
+
end
|
|
14
|
+
|
|
15
|
+
# Strict pre-flight for scope-column mode: abort if any extractable table
|
|
16
|
+
# cannot be scoped, so an unscoped (potentially sensitive) table is never
|
|
17
|
+
# silently dumped in full. No-op outside scope mode. `tables` is the set of
|
|
18
|
+
# dumpable configs (ignore:true tables are skipped — they are not extracted).
|
|
19
|
+
def self.validate_scope!(tables, table_by_name, dump_target, logger)
|
|
20
|
+
return if dump_target.scope_column.nil?
|
|
21
|
+
|
|
22
|
+
unscopable =
|
|
23
|
+
tables.reject(&:ignore).select do |table|
|
|
24
|
+
scope_category(table.name, table_by_name, dump_target, logger) == :unscopable
|
|
25
|
+
end
|
|
26
|
+
return if unscopable.empty?
|
|
27
|
+
|
|
28
|
+
names = unscopable.map(&:name).sort.join(", ")
|
|
29
|
+
raise ArgumentError,
|
|
30
|
+
"scope-column mode: #{unscopable.size} table(s) cannot be scoped by " \
|
|
31
|
+
"'#{dump_target.scope_column}': #{names}. For each, add `scope_exempt: true` " \
|
|
32
|
+
"to export it in full, set `ignore: true` to skip it, or add a belongs_to path " \
|
|
33
|
+
"to a table that carries the scope column (use a per-table `scope_column` if the " \
|
|
34
|
+
"column name differs on that table)."
|
|
7
35
|
end
|
|
8
36
|
|
|
9
37
|
attr_reader :table_name, :table_by_name, :dump_target
|
|
10
38
|
|
|
11
|
-
def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true)
|
|
39
|
+
def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
|
|
12
40
|
@table_name = table_name
|
|
13
41
|
@table_by_name = table_by_name
|
|
14
42
|
@dump_target = dump_target
|
|
15
43
|
@logger = logger
|
|
16
44
|
@allow_reverse = allow_reverse
|
|
45
|
+
# @allow_forward gates the "scope via an indirectly-scoped belongs_to
|
|
46
|
+
# parent" rescue (build_belongs_to_scoped_clause). Disabled while building a
|
|
47
|
+
# parent/child subquery so a single forward hop never recurses into another
|
|
48
|
+
# (which could loop on a belongs_to cycle).
|
|
49
|
+
@allow_forward = allow_forward
|
|
17
50
|
end
|
|
18
51
|
|
|
19
52
|
def run
|
|
20
53
|
table = table_by_name.fetch(table_name)
|
|
21
54
|
|
|
55
|
+
return build_scoped(table) if scope_mode?
|
|
56
|
+
|
|
22
57
|
where_clauses = build_where_clauses(table, dump_target)
|
|
23
58
|
join_clauses = build_join_clauses(table, table_by_name, dump_target)
|
|
24
59
|
|
|
@@ -130,8 +165,10 @@ module Exwiw
|
|
|
130
165
|
next if relation.nil? || relation.polymorphic?
|
|
131
166
|
|
|
132
167
|
# Build the child's own extraction query. allow_reverse:false stops a
|
|
133
|
-
# chain of FK-less tables from recursing back into each other
|
|
134
|
-
|
|
168
|
+
# chain of FK-less tables from recursing back into each other;
|
|
169
|
+
# allow_forward:false stops the child from forward-scoping back through
|
|
170
|
+
# this very table (which would loop).
|
|
171
|
+
child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
|
|
135
172
|
|
|
136
173
|
# Only an *already constrained* child narrows anything; an unconstrained
|
|
137
174
|
# child would select every fk value (i.e. dump all) and not help.
|
|
@@ -169,6 +206,64 @@ module Exwiw
|
|
|
169
206
|
)
|
|
170
207
|
end
|
|
171
208
|
|
|
209
|
+
# Scope-column mode. Builds a `fk IN (SELECT parent.pk FROM <parent
|
|
210
|
+
# extraction query>)` clause for a table whose belongs_to parent is itself
|
|
211
|
+
# scopable but carries no scope column of its own — so find_path_to_scoped
|
|
212
|
+
# cannot terminate on it (via_path fails) and nothing references this table
|
|
213
|
+
# (referenced_by fails). The classic shape is a hub scoped only via
|
|
214
|
+
# referenced_by (e.g. CDP `customer_accounts`, scoped by the `customers` that
|
|
215
|
+
# reference it) with sibling detail tables (`customer_account_details`, ...)
|
|
216
|
+
# hanging off it. Constraining those siblings to the hub's in-scope ids keeps
|
|
217
|
+
# them out of a full dump. Returns nil when there is no single, unambiguous
|
|
218
|
+
# scopable parent, leaving the caller on the unscopable path.
|
|
219
|
+
private def build_belongs_to_scoped_clause(table)
|
|
220
|
+
candidates = table.belongs_tos.filter_map do |relation|
|
|
221
|
+
# A polymorphic belongs_to points at several parent tables through one
|
|
222
|
+
# column, so it cannot project to a single parent id set; skip it.
|
|
223
|
+
next if relation.polymorphic?
|
|
224
|
+
|
|
225
|
+
parent = table_by_name[relation.table_name]
|
|
226
|
+
next if parent.nil?
|
|
227
|
+
|
|
228
|
+
# Build the parent's own scoped query. allow_reverse stays true so the
|
|
229
|
+
# parent may be scoped via referenced_by; allow_forward:false bounds this
|
|
230
|
+
# to a single forward hop so a belongs_to cycle cannot loop.
|
|
231
|
+
parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, allow_forward: false)
|
|
232
|
+
|
|
233
|
+
# Only a constrained parent narrows anything; an unconstrained parent
|
|
234
|
+
# would select every pk (i.e. dump all) and not help.
|
|
235
|
+
next unless parent_query.where_clauses.any? || parent_query.join_clauses.any?
|
|
236
|
+
|
|
237
|
+
[relation, parent, parent_query]
|
|
238
|
+
end
|
|
239
|
+
|
|
240
|
+
# Only the unambiguous single-parent case. Multiple scopable parents would
|
|
241
|
+
# need their subqueries combined (not supported); fall back to unscopable.
|
|
242
|
+
if candidates.size != 1
|
|
243
|
+
if candidates.size > 1
|
|
244
|
+
@logger.debug(" #{table.name} has multiple scopable parents; skipping forward scope (unscopable).")
|
|
245
|
+
end
|
|
246
|
+
return nil
|
|
247
|
+
end
|
|
248
|
+
|
|
249
|
+
relation, parent, parent_query = candidates.first
|
|
250
|
+
|
|
251
|
+
# Project the parent's extraction query down to just its primary key — the
|
|
252
|
+
# column this table's foreign key points at.
|
|
253
|
+
pk_column = TableColumn.from_symbol_keys(name: parent.primary_key)
|
|
254
|
+
projected = QueryAst::Select.new
|
|
255
|
+
projected.from(parent_query.from_table_name)
|
|
256
|
+
projected.select([pk_column])
|
|
257
|
+
parent_query.join_clauses.each { |j| projected.join(j) }
|
|
258
|
+
parent_query.where_clauses.each { |w| projected.where(w) }
|
|
259
|
+
|
|
260
|
+
QueryAst::WhereClause.new(
|
|
261
|
+
column_name: relation.foreign_key,
|
|
262
|
+
operator: :in_subquery,
|
|
263
|
+
value: QueryAst::SelectSubquery.new(query: projected)
|
|
264
|
+
)
|
|
265
|
+
end
|
|
266
|
+
|
|
172
267
|
private def build_where_clauses(table, dump_target)
|
|
173
268
|
clauses = []
|
|
174
269
|
|
|
@@ -264,5 +359,208 @@ module Exwiw
|
|
|
264
359
|
|
|
265
360
|
queue
|
|
266
361
|
end
|
|
362
|
+
|
|
363
|
+
# ------------------------------------------------------------------
|
|
364
|
+
# Scope-column mode (Exwiw::DumpTarget#scope_column).
|
|
365
|
+
#
|
|
366
|
+
# The single-target machinery above anchors everything on one named table.
|
|
367
|
+
# Scope mode instead filters every table by a shared column. The relationship
|
|
368
|
+
# walk is the same idea — the *terminus* is just "any table carrying the
|
|
369
|
+
# scope column" rather than "the one named target".
|
|
370
|
+
# ------------------------------------------------------------------
|
|
371
|
+
|
|
372
|
+
private def scope_mode?
|
|
373
|
+
!dump_target.scope_column.nil?
|
|
374
|
+
end
|
|
375
|
+
|
|
376
|
+
# Classifier used by validate_scope! and mirrored by build_scoped below.
|
|
377
|
+
def scope_category
|
|
378
|
+
table = table_by_name.fetch(table_name)
|
|
379
|
+
return :exempt if scope_exempt?(table)
|
|
380
|
+
return :direct if directly_scoped?(table)
|
|
381
|
+
return :via_path if build_join_clauses_scoped(table).any?
|
|
382
|
+
return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
|
|
383
|
+
return :via_scoped_parent if @allow_forward && build_belongs_to_scoped_clause(table)
|
|
384
|
+
|
|
385
|
+
:unscopable
|
|
386
|
+
end
|
|
387
|
+
|
|
388
|
+
private def build_scoped(table)
|
|
389
|
+
ast = QueryAst::Select.new
|
|
390
|
+
ast.from(table.name)
|
|
391
|
+
if table.rails_managed?
|
|
392
|
+
ast.select_all!
|
|
393
|
+
else
|
|
394
|
+
ast.select(table.columns)
|
|
395
|
+
end
|
|
396
|
+
|
|
397
|
+
# Reference/master (or rails-managed) table: export every row.
|
|
398
|
+
return ast if scope_exempt?(table)
|
|
399
|
+
|
|
400
|
+
# Carries the scope column itself: filter on it directly.
|
|
401
|
+
if directly_scoped?(table)
|
|
402
|
+
ast.where(scope_where_clause(table))
|
|
403
|
+
ast.where(table.filter) if table.filter
|
|
404
|
+
return ast
|
|
405
|
+
end
|
|
406
|
+
|
|
407
|
+
# Reachable via belongs_to: join up to the scoped ancestor (the scope
|
|
408
|
+
# filter is applied at the terminal join inside build_join_clauses_scoped).
|
|
409
|
+
join_clauses = build_join_clauses_scoped(table)
|
|
410
|
+
unless join_clauses.empty?
|
|
411
|
+
join_clauses.each { |join_clause| ast.join(join_clause) }
|
|
412
|
+
ast.where(table.filter) if table.filter
|
|
413
|
+
return ast
|
|
414
|
+
end
|
|
415
|
+
|
|
416
|
+
if @allow_reverse
|
|
417
|
+
# Referenced by an extractable (scoped) child: constrain via subquery.
|
|
418
|
+
reverse_clause = build_referenced_by_clause(table)
|
|
419
|
+
if reverse_clause
|
|
420
|
+
ast.where(reverse_clause)
|
|
421
|
+
return ast
|
|
422
|
+
end
|
|
423
|
+
end
|
|
424
|
+
|
|
425
|
+
if @allow_forward
|
|
426
|
+
# Belongs_to a parent that is itself scoped but carries no scope column of
|
|
427
|
+
# its own (so via_path cannot terminate on it) — e.g. a hub table scoped
|
|
428
|
+
# only via referenced_by. Constrain this table to that parent's in-scope
|
|
429
|
+
# ids so its rows ride along instead of being dumped in full.
|
|
430
|
+
parent_clause = build_belongs_to_scoped_clause(table)
|
|
431
|
+
if parent_clause
|
|
432
|
+
ast.where(parent_clause)
|
|
433
|
+
return ast
|
|
434
|
+
end
|
|
435
|
+
end
|
|
436
|
+
|
|
437
|
+
# Only the genuine top-level build (no rescue disabled) is allowed to fail
|
|
438
|
+
# hard. The Runner/ExplainRunner pre-flight (validate_scope!) rejects
|
|
439
|
+
# unscopable tables before extraction, so a top-level build never
|
|
440
|
+
# legitimately lands here; if it does, raise rather than emit an unfiltered
|
|
441
|
+
# (potential full PII) dump.
|
|
442
|
+
if @allow_reverse && @allow_forward
|
|
443
|
+
raise ArgumentError, scope_unscopable_message(table)
|
|
444
|
+
end
|
|
445
|
+
|
|
446
|
+
# Unscopable during a reverse/forward subquery build (a rescue is disabled):
|
|
447
|
+
# return the unconstrained AST so the caller's "constrained only" check
|
|
448
|
+
# filters this candidate out (it never becomes a real dump query).
|
|
449
|
+
ast
|
|
450
|
+
end
|
|
451
|
+
|
|
452
|
+
# The shared column this table is filtered on: a per-table `scope_column`
|
|
453
|
+
# override when present, otherwise the global `--scope-column`.
|
|
454
|
+
private def resolved_scope_column(table)
|
|
455
|
+
table.scope_column || dump_target.scope_column
|
|
456
|
+
end
|
|
457
|
+
|
|
458
|
+
private def scope_exempt?(table)
|
|
459
|
+
table.scope_exempt || table.rails_managed?
|
|
460
|
+
end
|
|
461
|
+
|
|
462
|
+
private def directly_scoped?(table)
|
|
463
|
+
column = resolved_scope_column(table)
|
|
464
|
+
table.columns.any? { |c| c.name == column }
|
|
465
|
+
end
|
|
466
|
+
|
|
467
|
+
private def scope_where_clause(table)
|
|
468
|
+
Exwiw::QueryAst::WhereClause.new(
|
|
469
|
+
column_name: resolved_scope_column(table),
|
|
470
|
+
operator: :eq,
|
|
471
|
+
value: dump_target.ids
|
|
472
|
+
)
|
|
473
|
+
end
|
|
474
|
+
|
|
475
|
+
# BFS over belongs_tos to the nearest *directly scoped* ancestor. Unlike the
|
|
476
|
+
# target-mode walk, the returned path INCLUDES that ancestor: the scope column
|
|
477
|
+
# lives on the ancestor itself (not on a foreign key of the child), so the
|
|
478
|
+
# ancestor must be joined and then filtered.
|
|
479
|
+
private def find_path_to_scoped(table)
|
|
480
|
+
visited = {}
|
|
481
|
+
queue = [[table.name, [table.name]]]
|
|
482
|
+
|
|
483
|
+
until queue.empty?
|
|
484
|
+
current_table_name, path = queue.shift
|
|
485
|
+
next if visited[current_table_name]
|
|
486
|
+
visited[current_table_name] = true
|
|
487
|
+
|
|
488
|
+
current_table = table_by_name[current_table_name]
|
|
489
|
+
next if current_table.nil?
|
|
490
|
+
|
|
491
|
+
current_table.belongs_tos.each do |relation|
|
|
492
|
+
next_table_name = relation.table_name
|
|
493
|
+
next_table = table_by_name[next_table_name]
|
|
494
|
+
next if next_table.nil?
|
|
495
|
+
|
|
496
|
+
next_path = path + [next_table_name]
|
|
497
|
+
return next_path if directly_scoped?(next_table)
|
|
498
|
+
|
|
499
|
+
queue.push([next_table_name, next_path])
|
|
500
|
+
end
|
|
501
|
+
end
|
|
502
|
+
|
|
503
|
+
[]
|
|
504
|
+
end
|
|
505
|
+
|
|
506
|
+
private def build_join_clauses_scoped(table)
|
|
507
|
+
path_tables = find_path_to_scoped(table)
|
|
508
|
+
@logger.debug(" Join path from #{table.name} to a scoped table: #{path_tables}")
|
|
509
|
+
|
|
510
|
+
return [] if path_tables.size < 2
|
|
511
|
+
|
|
512
|
+
path_tables.each_cons(2).map do |from_table_name, to_table_name|
|
|
513
|
+
from_table = table_by_name[from_table_name]
|
|
514
|
+
to_table = table_by_name[to_table_name]
|
|
515
|
+
|
|
516
|
+
join_clause = build_scoped_join_clause(from_table, to_table)
|
|
517
|
+
|
|
518
|
+
# Only the final hop's to_table is directly scoped (the BFS stops there),
|
|
519
|
+
# so the scope filter rides on that join's where_clauses, compiled against
|
|
520
|
+
# join_table_name = the scoped ancestor.
|
|
521
|
+
if directly_scoped?(to_table)
|
|
522
|
+
join_clause.where_clauses.push scope_where_clause(to_table)
|
|
523
|
+
end
|
|
524
|
+
|
|
525
|
+
if to_table.filter
|
|
526
|
+
join_clause.where_clauses.push to_table.filter
|
|
527
|
+
end
|
|
528
|
+
|
|
529
|
+
join_clause
|
|
530
|
+
end
|
|
531
|
+
end
|
|
532
|
+
|
|
533
|
+
# One belongs_to hop as a JoinClause, with the polymorphic type condition
|
|
534
|
+
# placed on the source table (base_where_clauses) when the hop is polymorphic
|
|
535
|
+
# — mirroring the target-mode loop in build_join_clauses.
|
|
536
|
+
private def build_scoped_join_clause(from_table, to_table)
|
|
537
|
+
relation = from_table.belongs_to(to_table.name)
|
|
538
|
+
|
|
539
|
+
join_clause = QueryAst::JoinClause.new(
|
|
540
|
+
base_table_name: from_table.name,
|
|
541
|
+
foreign_key: relation.foreign_key,
|
|
542
|
+
join_table_name: to_table.name,
|
|
543
|
+
primary_key: to_table.primary_key,
|
|
544
|
+
where_clauses: [],
|
|
545
|
+
base_where_clauses: []
|
|
546
|
+
)
|
|
547
|
+
|
|
548
|
+
if relation.polymorphic?
|
|
549
|
+
join_clause.base_where_clauses.push QueryAst::WhereClause.new(
|
|
550
|
+
column_name: relation.foreign_type,
|
|
551
|
+
operator: :eq,
|
|
552
|
+
value: [relation.type_value]
|
|
553
|
+
)
|
|
554
|
+
end
|
|
555
|
+
|
|
556
|
+
join_clause
|
|
557
|
+
end
|
|
558
|
+
|
|
559
|
+
private def scope_unscopable_message(table)
|
|
560
|
+
"Table '#{table.name}' cannot be scoped in scope-column mode: it has no " \
|
|
561
|
+
"'#{dump_target.scope_column}' column (nor a per-table scope_column override) and no " \
|
|
562
|
+
"belongs_to path to a table that does. Add `scope_exempt: true` to export it in full, " \
|
|
563
|
+
"set `ignore: true` to skip it, or add the missing belongs_to."
|
|
564
|
+
end
|
|
267
565
|
end
|
|
268
566
|
end
|
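The breadth-first walk in `find_path_to_scoped` can be illustrated with plain hashes. This is a hedged sketch, not the gem's code: the method name `path_to_scoped`, the hash layout (`columns:` and `belongs_to:` keys) and the sample tables are assumptions made for the example. As in the diff, the returned path includes the scoped ancestor, and an empty array means no belongs_to path reaches a table carrying the scope column.

```ruby
# Hypothetical plain-hash model (not the gem's TableConfig):
#   table name => { columns: [column names], belongs_to: [parent table names] }
def path_to_scoped(start, tables, scope_column)
  visited = {}
  queue = [[start, [start]]]

  until queue.empty?
    current, path = queue.shift
    next if visited[current]

    visited[current] = true
    table = tables[current]
    next if table.nil?

    table[:belongs_to].each do |parent|
      parent_table = tables[parent]
      next if parent_table.nil?

      next_path = path + [parent]
      # Stop at the nearest ancestor that carries the scope column itself.
      return next_path if parent_table[:columns].include?(scope_column)

      queue.push([parent, next_path])
    end
  end

  [] # no ancestor carries the scope column
end

# Assumed example schema: only blogs carries tenant_id.
tables = {
  "comments" => { columns: %w[id post_id], belongs_to: ["posts"] },
  "posts" => { columns: %w[id blog_id], belongs_to: ["blogs"] },
  "blogs" => { columns: %w[id tenant_id], belongs_to: [] }
}
p path_to_scoped("comments", tables, "tenant_id")
```

In this sketch a table that carries the column itself (`blogs`) yields an empty path, which corresponds to the diff's ordering of checks: `directly_scoped?` is tested before the join path is consulted.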
data/lib/exwiw/runner.rb
CHANGED
|
@@ -38,8 +38,13 @@ module Exwiw
|
|
|
38
38
|
target = table_by_name[@dump_target.table_name]
|
|
39
39
|
adapter.validate_as_dump_target!(target) if target
|
|
40
40
|
|
|
41
|
+
dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
|
|
42
|
+
# Scope-column mode: abort if any extractable table cannot be scoped (no-op
|
|
43
|
+
# otherwise). Done before extraction so nothing is dumped if it would leak.
|
|
44
|
+
QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
|
|
45
|
+
|
|
41
46
|
@logger.info("Determining table processing order...")
|
|
42
|
-
ordered_table_names = DetermineTableProcessingOrder.run(
|
|
47
|
+
ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs, logger: @logger)
|
|
43
48
|
|
|
44
49
|
clean_output_dir!
|
|
45
50
|
|
data/lib/exwiw/table_config.rb
CHANGED
|
@@ -26,6 +26,18 @@ module Exwiw
|
|
|
26
26
|
attribute :columns, array(TableColumn), default: []
|
|
27
27
|
attribute :bulk_insert_chunk_size, optional(Integer), skip_serializing_if_nil: true
|
|
28
28
|
attribute :ignore, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
|
|
29
|
+
# Scope-column mode only (see Exwiw::DumpTarget#scope_column). Both are
|
|
30
|
+
# user-configured and never emitted by the schema generators.
|
|
31
|
+
#
|
|
32
|
+
# `scope_exempt: true` exports the whole table without scope filtering — the
|
|
33
|
+
# explicit, auditable escape hatch for genuine reference/master tables under
|
|
34
|
+
# the strict "every table must be scopable" rule.
|
|
35
|
+
#
|
|
36
|
+
# `scope_column` overrides the physical column this table is filtered on when
|
|
37
|
+
# it differs from the global `--scope-column` name (same scope value, just a
|
|
38
|
+
# different column name on this table).
|
|
39
|
+
attribute :scope_exempt, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
|
|
40
|
+
attribute :scope_column, optional(String), skip_serializing_if_nil: true
|
|
29
41
|
|
|
30
42
|
def self.from(hash)
|
|
31
43
|
config = super
|
|
@@ -137,6 +149,9 @@ module Exwiw
|
|
|
137
149
|
merged_table.filter = filter
|
|
138
150
|
merged_table.bulk_insert_chunk_size = passed_table.bulk_insert_chunk_size
|
|
139
151
|
merged_table.ignore = ignore
|
|
152
|
+
# User-owned, never regenerated: carry over from the existing config.
|
|
153
|
+
merged_table.scope_exempt = scope_exempt
|
|
154
|
+
merged_table.scope_column = scope_column
|
|
140
155
|
|
|
141
156
|
# Structural facts of each belongs_to come from the freshly generated
|
|
142
157
|
# config, but the user-owned `comment`/`ignore`/`references` carry over
|
data/lib/exwiw/version.rb
CHANGED
data/lib/exwiw.rb
CHANGED
|
@@ -39,7 +39,13 @@ module Exwiw
|
|
|
39
39
|
# `ids_field` optionally overrides which field `--ids` is matched against on
|
|
40
40
|
# the target table. When nil the table's primary key is used (the historical
|
|
41
41
|
# behavior). Currently only honored by the mongodb adapter.
|
|
42
|
-
|
|
42
|
+
#
|
|
43
|
+
# `scope_column` switches the extraction to scope-column mode: instead of a
|
|
44
|
+
# single `table_name` anchor, every table is filtered by a shared column
|
|
45
|
+
# (`scope_column IN ids`) and tables lacking it are reached by walking
|
|
46
|
+
# belongs_to up to the nearest table that has it. When set, `table_name` is
|
|
47
|
+
# nil. SQL adapters only.
|
|
48
|
+
DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)
|
|
43
49
|
# `uri` is an optional full connection string (currently only honored by the
|
|
44
50
|
# mongodb adapter, e.g. `mongodb+srv://...`). When present it is the source of
|
|
45
51
|
# truth for the connection — host/port/user/password are ignored — so TLS,
|