exwiw 0.5.0 → 0.5.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 224bdc1d3b0f94e08463ad9e42a6e67d0592d902388b5873f5840226dbdbd3fe
4
- data.tar.gz: de9ddd4a625565e0bcd28ff3f74df8da06092c443ad1f170d41c5858a24c4802
3
+ metadata.gz: 567683d65df5d9f147ab9415a67baf48a80e21ad32e1ef7635c624dfc3d28c47
4
+ data.tar.gz: 1513b577f6f2368df60edc45a54c96495ece4f1ee9b453e92adb8991f182fcdf
5
5
  SHA512:
6
- metadata.gz: '08f564c07c09561a4b9b825bb7f6ca43a076df5b8262f165addad471639084fa5b5074330215edd36951ce0c427f510533f39d03e0632c1306ba4e9054391b33'
7
- data.tar.gz: 2c161f236a676a15774fb097a7a2c4d66f95f38be8c465dfc29788b4c45165aa3b13dee2b1da771e317672e63f089d2496e10cf4fe2ebf16f97885e0e1c49c76
6
+ metadata.gz: a9680642eb34f99ed3f0c2924154a5171edf541286dfed14befe15b8c029271419a13d3295694847b1143898b0200bf6eb1d6448e6665dbad7bef026e3c3fbbb
7
+ data.tar.gz: 30e6ef9f988965b85f899fdb0646b6e4e2befd68f95a247a9edfeddb8d3a6088f611f07d5376c7d3b9f4f58f147072e03294a140312dec1622994fb6da175720
data/CHANGELOG.md CHANGED
@@ -2,6 +2,21 @@
2
2
 
3
3
  ## [Unreleased]
4
4
 
5
+ ## [0.5.2] - 2026-06-18
6
+
7
+ ### Fixed
8
+
9
+ - **PostgreSQL: an extension the restore target cannot create no longer aborts the restore, and `pglogical` is never emitted.** `dump_schema` prepends `CREATE EXTENSION IF NOT EXISTS` for every extension installed on the source, wrapped in a `DO` block that previously swallowed only `feature_not_supported` (the extension's binaries are unavailable on the target). A source on managed Postgres/AlloyDB carrying `pglogical` (logical replication) emitted `CREATE EXTENSION ... SCHEMA "pglogical"`, which on a target lacking that schema fails with `invalid_schema_name` — an error the handler did not catch, so the whole restore aborted. The handler now also catches `invalid_schema_name`, and instead of silently discarding the skip it re-raises it as a `WARNING` (carrying SQLSTATE and the original message) so the skip is visible in the restore logs rather than vanishing. `insufficient_privilege` is intentionally **not** caught: a restore role lacking `CREATE` privilege is a misconfiguration that must fail loudly. Separately, `pglogical` is now excluded from the prepended extensions entirely (alongside `plpgsql` and the `google_*`/`rds_*`/`aiven_*` managed-platform extensions) — it is a replication mechanism of the source, not part of the copied data.
10
+ - **Table processing order no longer aborts on `belongs_to` cycles.** Two distinct problems made `export`/`explain` fail with "Circular belongs_to dependency detected" on schemas that have no resolvable order: (1) a `belongs_to` whose target table is **not part of the run** — most commonly an embedded MongoDB collection, which is masked through its parent and never dumped on its own — was treated as a dependency that could never be satisfied, so every table that (transitively) referenced one froze and was misreported as a cycle; such out-of-run targets are now ignored when ordering. (2) A **genuine** cycle (e.g. `a belongs_to b` and `b belongs_to a`) now **breaks deterministically with a warning** instead of raising: exwiw emits the cycle member (a table in a strongly-connected component of the unresolved-dependency graph) with the fewest unresolved dependencies, preferring one that still has an already-ordered parent so its extraction stays constrained, and logs which `belongs_to` edge was dropped. The dropped edge is not enforced while ordering, so for the mongodb adapter that table may match a superset of rows (the not-yet-processed parent contributes no `$in` filter); mark one of the `belongs_to` entries forming the cycle with `ignore: true` to break it explicitly instead. Acyclic tables that merely wait on a cycle are never reordered ahead of their parents.
11
+ - **MongoDB: `dump_schema` tolerates collections declared in the schema but absent from the source database.** Listing indexes for a non-existent collection makes the driver raise `NamespaceNotFound` (code 26), which aborted the whole export when the schema covered more collections than the connected database actually had (schema/DB drift, or a sparse development database). The existing collections are now resolved once up front and indexes are emitted only for those; `createCollection` is still emitted for every configured collection, so the target schema is created in full.
12
+ - **MongoDB: a related collection that cannot be scoped no longer falls back to dumping the whole collection.** When a non-target collection's `belongs_to` parents all yield no ids to filter by — because every parent matched nothing or is not dumped on its own (e.g. an embedded collection) — the assembled filter is empty. Previously that empty filter was sent as `find({})`, scanning and dumping the **entire collection across every scope** (a cross-scope data-exposure risk). Such a collection is now constrained to match no rows and a warning is logged instead. A collection with no `belongs_to` at all is still treated as reference/master data and dumped in full.
13
+
14
+ ## [0.5.1] - 2026-06-18
15
+
16
+ ### Added
17
+
18
+ - **Scope-column extraction mode** (`--scope-column`, SQL adapters only). For schemas where many independent top-level tables share the same scope/tenant column instead of converging on a single `belongs_to` root, exwiw can now filter **every** table by that shared column (`--scope-column=COLUMN` with `--ids` as its values) rather than anchoring on one `--target-table`. A table that carries the column is filtered directly; a table that lacks it but `belongs_to` a table that has it is joined up to the nearest such table and filtered there. A table that `belongs_to` a parent which is itself scoped but carries no scope column of its own (e.g. a *hub* table scoped only because an extractable child references it) is constrained to the parent's in-scope ids via a subquery (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`), so the hub's other children ride along to just the in-scope rows — limited to a single forward hop and a single unambiguous scopable parent. A table that cannot be scoped at all (no column and no path to one) makes the run **abort with a list of the offending tables**, so an unscoped table is never silently dumped in full. Two user-owned table-config keys support this and are preserved across `schema:generate` regeneration: **`scope_exempt: true`** exports a genuine reference/master table in full (rails-managed tables are treated as exempt automatically), and **`scope_column`** overrides the filtered column name for a table that stores the same scope value under a different name. `--scope-column` is mutually exclusive with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`, can be set in `exwiw.yml`, and works with `exwiw explain`.
19
+
5
20
  ## [0.5.0] - 2026-06-16
6
21
 
7
22
  ### Added
data/README.md CHANGED
@@ -129,6 +129,88 @@ exwiw explain \
129
129
 
130
130
  The `--output-dir`, `--output-format`, `--insert-only`, and `--after-insert-hook` options are dump-specific and rejected when used with `explain`.
131
131
 
132
+ ### Scope-column mode (`--scope-column`)
133
+
134
+ The default `--target-table` extraction assumes the schema converges on a single
135
+ root: every table is reached by walking `belongs_to` toward that one table. Some
136
+ schemas are not shaped that way — many independent top-level tables each carry the
137
+ *same* scope/tenant column (e.g. `tenant_id`, `account_uuid`) and there is no
138
+ single root. Choosing one of them as `--target-table` would leave the others
139
+ unrelated to it, and an unrelated table is dumped in full — a problem if it holds
140
+ personal data.
141
+
142
+ `--scope-column` handles this shape: instead of one anchor table, **every table is
143
+ filtered by a shared column** whose values are `--ids`.
144
+
145
+ ```bash
146
+ exwiw \
147
+ --adapter=postgresql \
148
+ --host=localhost --port=5432 --user=reader \
149
+ --database=app_production \
150
+ --schema-dir=exwiw/schema \
151
+ --scope-column=tenant_id \
152
+ --ids=42,43 \
153
+ --output-dir=dump
154
+ ```
155
+
156
+ Each table is resolved as follows:
157
+
158
+ - **Carries the scope column** → `WHERE scope_column IN (ids)`.
159
+ - **Lacks it but `belongs_to` reaches a table that has it** → exwiw joins up to the
160
+ nearest such table and applies the scope filter there (the same join machinery
161
+ the single-target mode uses).
162
+ - **`belongs_to` a parent that is itself scoped but carries no scope column of its
163
+ own** → exwiw constrains this table to the parent's in-scope ids via a subquery
164
+ (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). This covers a *hub*
165
+ table that has no scope column and is scoped only because an extractable child
166
+ references it (see referenced-by below): the hub's other `belongs_to` children
167
+ ride along to just the in-scope rows instead of being dumped in full. Limited to
168
+ a single forward hop and a single unambiguous scopable parent.
169
+ - **Cannot be scoped at all** (no scope column and no path to one) → exwiw
170
+ **aborts** and lists the offending tables, so an unscoped table is never silently
171
+ dumped in full. For each, either add a `belongs_to` path, set `ignore: true` to
172
+ skip it, or mark it `scope_exempt: true` (below) to export it in full.
173
+
174
+ `--scope-column` is SQL-only (mysql / postgresql / sqlite) and mutually exclusive
175
+ with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`.
176
+ It works with `exwiw explain` too, which is the recommended way to preview the
177
+ queries before exporting.
178
+
179
+ #### `scope_exempt` (intentional full dump)
180
+
181
+ A genuine reference/master table (no personal data) that has no scope linkage can
182
+ opt out of the strict check and be exported in full:
183
+
184
+ ```json
185
+ {
186
+ "name": "countries",
187
+ "primary_key": "id",
188
+ "scope_exempt": true,
189
+ "columns": [{ "name": "id" }, { "name": "code" }]
190
+ }
191
+ ```
192
+
193
+ Rails-managed tables (`schema_migrations`, `ar_internal_metadata`) are treated as
194
+ exempt automatically.
195
+
196
+ #### Per-table `scope_column` override
197
+
198
+ Scope-column mode assumes a single shared **value** space: the same `--ids` apply
199
+ to every scoped table. If a table stores that same value under a differently named
200
+ column, override the column name for that table:
201
+
202
+ ```json
203
+ {
204
+ "name": "legacy_orders",
205
+ "primary_key": "id",
206
+ "scope_column": "legacy_tenant_id",
207
+ "columns": [{ "name": "id" }, { "name": "legacy_tenant_id" }]
208
+ }
209
+ ```
210
+
211
+ Both `scope_exempt` and `scope_column` are user-maintained and preserved across
212
+ `schema:generate` regeneration (the generators never emit them).
213
+
132
214
  ### Config file (`exwiw.yml`)
133
215
 
134
216
  Options you would otherwise repeat on every run can be kept in a YAML config file. Pass it with `--config=PATH`; when `--config` is omitted, exwiw automatically loads `exwiw.yml` (or `exwiw.yaml`) from the current directory if present.
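The per-table resolution rules that the README hunk above adds for `--scope-column` can be sketched roughly as follows. This is a minimal illustration, not exwiw's implementation: `Table`, `classify`, and the result symbols are hypothetical names, and the single-hop hub subquery case is deliberately left out.

```ruby
# Hypothetical, simplified table model (not exwiw's real TableConfig).
Table = Struct.new(:name, :columns, :belongs_to, :scope_exempt, keyword_init: true)

# Classify one table for scope-column mode:
#   :direct   - the table carries the scope column itself
#   :joined   - a belongs_to chain reaches a table that carries it
#   :exempt   - marked scope_exempt, exported in full
#   :unscoped - no column and no path (the README says the run aborts on these)
def classify(table, tables_by_name, scope_column, seen = [])
  return :exempt if table.scope_exempt
  return :direct if table.columns.include?(scope_column)

  reachable = table.belongs_to.any? do |parent_name|
    parent = tables_by_name[parent_name]
    # Skip unknown parents and parents already on the current path (cycles).
    next false if parent.nil? || seen.include?(parent_name)

    outcome = classify(parent, tables_by_name, scope_column, seen + [table.name])
    %i[direct joined].include?(outcome)
  end
  reachable ? :joined : :unscoped
end

tables = [
  Table.new(name: "orders", columns: %w[id tenant_id], belongs_to: [], scope_exempt: false),
  Table.new(name: "order_items", columns: %w[id order_id], belongs_to: ["orders"], scope_exempt: false),
  Table.new(name: "countries", columns: %w[id code], belongs_to: [], scope_exempt: true),
  Table.new(name: "audit_blobs", columns: %w[id], belongs_to: [], scope_exempt: false),
]
by_name = tables.to_h { |t| [t.name, t] }
result = tables.to_h { |t| [t.name, classify(t, by_name, "tenant_id")] }
```

In this sketch `audit_blobs` comes out as `:unscoped`, which corresponds to the case where the README says the run aborts and lists the offending tables.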
@@ -144,7 +226,7 @@ output_format: insert # insert | copy
144
226
  insert_only: false
145
227
  after_insert_hook: hooks/seed.rb
146
228
  log_level: info # debug | info
147
- # target_table / ids / ids_field / ids_column may also be set here
229
+ # target_table / ids / ids_field / ids_column / scope_column may also be set here
148
230
  ```
149
231
 
150
232
  With the file above, only the connection details need to be supplied on the CLI:
@@ -1,6 +1,7 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'json'
4
+ require 'set'
4
5
 
5
6
  # NOTE: This adapter consumes MongodbCollectionConfig (`fields` instead of
6
7
  # `columns`, plus `embedded_in`). Top-level collections are dumped as one
@@ -71,16 +72,7 @@ module Exwiw
71
72
  { config.primary_key => { "$in" => coerce_ids(dump_target.ids) } }
72
73
  end
73
74
  else
74
- config.belongs_tos.each_with_object({}) do |relation, acc|
75
- # Constrain by the parent field this FK actually references
76
- # (`relation.references`, default the parent primary_key). The
77
- # values were captured from that field's documents in #execute, so
78
- # their BSON type already matches the stored FK — no coercion.
79
- values = parent_state_for(relation, config_by_name)
80
- next if values.nil? || values.empty?
81
-
82
- acc[relation.foreign_key] = { "$in" => values }
83
- end
75
+ related_collection_filter(config, config_by_name)
84
76
  end
85
77
 
86
78
  Exwiw::MongoQuery::Find.new(
@@ -160,6 +152,14 @@ module Exwiw
160
152
 
161
153
  collections = ordered_tables.reject(&:embedded?)
162
154
 
155
+ # Index listing targets a specific collection, and MongoDB raises
156
+ # NamespaceNotFound (code 26) for one that does not exist. The schema may
157
+ # declare collections absent from this database (schema/DB drift, or a
158
+ # sparse dev DB), so resolve the set that actually exists up front and emit
159
+ # indexes only for those. `createCollection` is still emitted for every
160
+ # config below, so the target schema is created in full regardless.
161
+ existing_collections = db.database.collection_names.to_set
162
+
163
163
  File.open(output_path, 'w') do |file|
164
164
  file.puts("// Auto-generated by exwiw. Apply with: mongosh \"$MONGODB_URI\" #{File.basename(output_path)}")
165
165
  file.puts
@@ -172,6 +172,11 @@ module Exwiw
172
172
 
173
173
  collections.each do |config|
174
174
  name = config.name
175
+ unless existing_collections.include?(name)
176
+ @logger.debug(" Collection '#{name}' is not present in the source database; emitting no indexes.")
177
+ next
178
+ end
179
+
175
180
  indexes = db[name].indexes.to_a.reject { |idx| idx['name'] == '_id_' }
176
181
  indexes.each do |idx|
177
182
  key = idx['key']
@@ -274,6 +279,39 @@ module Exwiw
274
279
  ([config.primary_key] + referenced).uniq
275
280
  end
276
281
 
282
+ # Build the scoping filter for a non-target collection from its belongs_to
283
+ # parents' captured ids. Each belongs_to is constrained by the parent field
284
+ # the FK references (`relation.references`, default the parent primary_key);
285
+ # the values were captured from that field in #execute, so their BSON type
286
+ # already matches the stored FK — no coercion.
287
+ #
288
+ # A belongs_to whose parent produced no ids contributes no constraint:
289
+ # either the parent matched nothing, or it is not dumped here (e.g. an
290
+ # embedded collection, or one excluded from the run). If that leaves the
291
+ # filter empty even though the collection HAS belongs_to, the collection
292
+ # cannot be scoped from the dump target — and falling back to an empty `{}`
293
+ # filter would scan and dump the ENTIRE collection across every scope. That
294
+ # is never what a scoped extraction wants, so constrain it to match nothing
295
+ # and warn instead. (A collection with no belongs_to at all is genuine
296
+ # reference/master data and is still dumped in full via `{}`.)
297
+ private def related_collection_filter(config, config_by_name)
298
+ filter = config.belongs_tos.each_with_object({}) do |relation, acc|
299
+ values = parent_state_for(relation, config_by_name)
300
+ next if values.nil? || values.empty?
301
+
302
+ acc[relation.foreign_key] = { "$in" => values }
303
+ end
304
+
305
+ return filter unless filter.empty? && config.belongs_tos.any?
306
+
307
+ @logger.warn(
308
+ " Collection '#{config.name}' has belongs_to but no parent produced ids to scope by " \
309
+ "(parents matched nothing, or are not dumped on their own such as embedded collections). " \
310
+ "Constraining it to match no rows to avoid an unscoped full-collection dump."
311
+ )
312
+ { config.primary_key => { "$in" => [] } }
313
+ end
314
+
277
315
  # The captured parent-collection values a child belongs_to should be
278
316
  # constrained by: the values of the parent field the FK references
279
317
  # (`relation.references`, default the parent primary_key). nil when the
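The three outcomes of `related_collection_filter` in the hunk above can be reproduced in isolation. The sketch below is a standalone approximation: `Relation`, `related_filter`, and the `parent_ids` hash are hypothetical stand-ins for the adapter's config objects and captured parent state, and the warning log is omitted.

```ruby
# Hypothetical stand-in for a belongs_to relation (not the adapter's class).
Relation = Struct.new(:foreign_key, :parent)

# parent_ids maps a parent collection name to the ids captured for it;
# nil models a parent that is not dumped on its own (e.g. embedded).
def related_filter(primary_key, relations, parent_ids)
  filter = relations.each_with_object({}) do |relation, acc|
    values = parent_ids[relation.parent]
    next if values.nil? || values.empty?

    acc[relation.foreign_key] = { "$in" => values }
  end

  # Scoped normally, or no belongs_to at all (reference data): use as-is.
  return filter unless filter.empty? && relations.any?

  # Has belongs_to but nothing to scope by: match no documents.
  { primary_key => { "$in" => [] } }
end

scoped    = related_filter("_id", [Relation.new("user_id", "users")], { "users" => [1, 2] })
unscoped  = related_filter("_id", [Relation.new("profile_id", "profiles")], { "profiles" => nil })
reference = related_filter("_id", [], {})
```

Here `scoped` filters on the foreign key, `unscoped` is constrained to an empty `$in` on the primary key, and `reference` stays an empty filter, so only a collection with no `belongs_to` at all is still read in full.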
@@ -65,7 +65,18 @@ module Exwiw
65
65
  ext_ddl = extensions.map do |extname, schema|
66
66
  stmt = "CREATE EXTENSION IF NOT EXISTS #{connection.quote_ident(extname)}"
67
67
  stmt += " SCHEMA #{connection.quote_ident(schema)}" unless schema == "public"
68
- "DO $$ BEGIN #{stmt}; EXCEPTION WHEN feature_not_supported THEN NULL; END $$;"
68
+ # Best-effort prepend: a restore target that genuinely cannot create the
69
+ # extension should not abort the whole restore. Two such cases are caught:
70
+ # feature_not_supported (0A000) -- the extension's binaries are unavailable
71
+ # invalid_schema_name (3F000) -- the extension's required schema is absent
72
+ # insufficient_privilege (42501) is deliberately NOT caught: a restore role
73
+ # lacking CREATE privilege is a misconfiguration to fix, not to skip silently.
74
+ # The skip is re-raised as a WARNING so it surfaces in the restore logs
75
+ # instead of vanishing.
76
+ warning = connection.escape_literal("exwiw: skipped CREATE EXTENSION #{extname} (SQLSTATE %): %")
77
+ "DO $$ BEGIN #{stmt}; " \
78
+ "EXCEPTION WHEN feature_not_supported OR invalid_schema_name THEN " \
79
+ "RAISE WARNING #{warning}, SQLSTATE, SQLERRM; END $$;"
69
80
  end.join("\n") + "\n\n"
70
81
  @logger.debug(" Found #{extensions.size} extension(s) to prepend.")
71
82
  stdout = ext_ddl + stdout
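To see what the new `DO` block looks like for one extension, the string building from the hunk above can be run without a database. This is a sketch under assumptions: `quote_ident` and `escape_literal` below are simplified local stand-ins for the connection methods the diff calls, and `extension_ddl` is a hypothetical wrapper name.

```ruby
# Simplified stand-ins for the connection's quoting helpers, assumed here so
# the sketch runs without a live PostgreSQL connection.
def quote_ident(name)
  '"' + name.gsub('"', '""') + '"'
end

def escape_literal(text)
  "'" + text.gsub("'", "''") + "'"
end

# Build the best-effort CREATE EXTENSION statement for one extension.
def extension_ddl(extname, schema)
  stmt = "CREATE EXTENSION IF NOT EXISTS #{quote_ident(extname)}"
  stmt += " SCHEMA #{quote_ident(schema)}" unless schema == "public"
  warning = escape_literal("exwiw: skipped CREATE EXTENSION #{extname} (SQLSTATE %): %")
  "DO $$ BEGIN #{stmt}; " \
    "EXCEPTION WHEN feature_not_supported OR invalid_schema_name THEN " \
    "RAISE WARNING #{warning}, SQLSTATE, SQLERRM; END $$;"
end

ddl = extension_ddl("postgis", "extensions")
public_ddl = extension_ddl("pg_trgm", "public")
puts ddl
```

Only the two listed conditions are handled; any other error, including `insufficient_privilege`, still propagates and fails the restore.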
@@ -382,11 +393,17 @@ module Exwiw
382
393
  end
383
394
 
384
395
  private def query_extensions
396
+ # Skip plpgsql (always present) and managed-platform bookkeeping extensions
397
+ # (google_*/rds_*/aiven_*). pglogical is also skipped: it is a logical-
398
+ # replication mechanism of the source, not part of the data being copied,
399
+ # and its dedicated `pglogical` schema is typically absent on the restore
400
+ # target — so prepending CREATE EXTENSION for it only breaks the restore.
385
401
  sql = <<~SQL
386
402
  SELECT e.extname, n.nspname
387
403
  FROM pg_extension e
388
404
  JOIN pg_namespace n ON n.oid = e.extnamespace
389
405
  WHERE e.extname != 'plpgsql'
406
+ AND e.extname != 'pglogical'
390
407
  AND e.extname NOT LIKE 'google\\_%' ESCAPE '\\'
391
408
  AND e.extname NOT LIKE 'rds\\_%' ESCAPE '\\'
392
409
  AND e.extname NOT LIKE 'aiven\\_%' ESCAPE '\\'
@@ -38,6 +38,7 @@ module Exwiw
38
38
  'EXWIW_DATABASE_USER' => cli_options[:database_user].to_s,
39
39
  'EXWIW_DATABASE_NAME' => cli_options[:database_name].to_s,
40
40
  'EXWIW_TARGET_TABLE' => cli_options[:target_table].to_s,
41
+ 'EXWIW_SCOPE_COLUMN' => cli_options[:scope_column].to_s,
41
42
  'EXWIW_IDS' => Array(cli_options[:ids]).join(','),
42
43
  'EXWIW_OUTPUT_FORMAT' => cli_options[:output_format].to_s,
43
44
  }
data/lib/exwiw/cli.rb CHANGED
@@ -35,6 +35,7 @@ module Exwiw
35
35
  ids
36
36
  ids_field
37
37
  ids_column
38
+ scope_column
38
39
  ].freeze
39
40
 
40
41
  # Database connection settings are environment-specific (and sometimes
@@ -77,6 +78,7 @@ module Exwiw
77
78
  @ids = []
78
79
  @ids_field = nil
79
80
  @ids_column = nil
81
+ @scope_column = nil
80
82
  @output_format = nil
81
83
  @insert_only = nil
82
84
  @after_insert_hook_path = nil
@@ -109,6 +111,7 @@ module Exwiw
109
111
  table_name: @target_table_name,
110
112
  ids: @ids,
111
113
  ids_field: @ids_field,
114
+ scope_column: @scope_column,
112
115
  )
113
116
 
114
117
  logger = build_logger
@@ -161,6 +164,7 @@ module Exwiw
161
164
  end
162
165
 
163
166
  resolve_target_collection_alias!
167
+ resolve_scope_column!
164
168
  resolve_ids_column_alias!
165
169
  resolve_uri_option!
166
170
 
@@ -228,8 +232,13 @@ module Exwiw
228
232
  exit 1
229
233
  end
230
234
 
231
- if !@target_table_name && @ids.any?
232
- $stderr.puts "--target-table is required when --ids is specified"
235
+ if @scope_column && @ids.empty?
236
+ $stderr.puts "--ids is required when --scope-column is specified"
237
+ exit 1
238
+ end
239
+
240
+ if !@target_table_name && !@scope_column && @ids.any?
241
+ $stderr.puts "--target-table or --scope-column is required when --ids is specified"
233
242
  exit 1
234
243
  end
235
244
 
@@ -309,6 +318,7 @@ module Exwiw
309
318
  end
310
319
  @ids_field ||= config["ids_field"]
311
320
  @ids_column ||= config["ids_column"]
321
+ @scope_column ||= config["scope_column"]
312
322
  end
313
323
 
314
324
  # Strip a trailing slash (like the CLI's dir options) and expand relative to
@@ -376,6 +386,33 @@ module Exwiw
376
386
  end
377
387
  end
378
388
 
389
+ # `--scope-column` switches to scope-column mode: every table is filtered by a
390
+ # shared column (`--ids` are its values) instead of anchoring on one
391
+ # `--target-table`. It is SQL-only and mutually exclusive with the single-target
392
+ # flags. Runs after resolve_target_collection_alias! (so --target-collection is
393
+ # already folded into @target_table_name) and before resolve_ids_column_alias!
394
+ # so the clearer "cannot combine" message wins over the generic ids-column one.
395
+ private def resolve_scope_column!
396
+ return if @scope_column.nil?
397
+
398
+ sql_adapters = ["mysql", "postgresql", "sqlite"]
399
+ unless sql_adapters.include?(@database_adapter)
400
+ $stderr.puts "--scope-column is only supported by the sql adapters"
401
+ exit 1
402
+ end
403
+
404
+ if @target_table_name
405
+ $stderr.puts "--scope-column cannot be combined with --target-table/--target-collection"
406
+ exit 1
407
+ end
408
+
409
+ if @ids_field || @ids_column
410
+ flag = @ids_column ? "--ids-column" : "--ids-field"
411
+ $stderr.puts "--scope-column cannot be combined with #{flag}"
412
+ exit 1
413
+ end
414
+ end
415
+
379
416
  # `--uri` supplies a full connection string (e.g. `mongodb+srv://...`) and is
380
417
  # mongodb-only — the SQL adapters shell out to their own client binaries with
381
418
  # discrete host/port/user flags and have no equivalent. Runs after the
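The `--scope-column` checks added to the CLI in this diff can be expressed as a pure function, which makes the precedence of the error messages easier to follow. This is an illustrative sketch only: `scope_column_error` and the option hash keys are hypothetical, and the real CLI prints to stderr and exits instead of returning a message.

```ruby
SQL_ADAPTERS = %w[mysql postgresql sqlite].freeze

# Returns the first applicable error message, or nil when the options are
# valid. The check order follows the order in which the diff's checks run.
def scope_column_error(opts)
  return nil if opts[:scope_column].nil?

  unless SQL_ADAPTERS.include?(opts[:adapter])
    return "--scope-column is only supported by the sql adapters"
  end
  if opts[:target_table]
    return "--scope-column cannot be combined with --target-table/--target-collection"
  end
  if opts[:ids_field] || opts[:ids_column]
    flag = opts[:ids_column] ? "--ids-column" : "--ids-field"
    return "--scope-column cannot be combined with #{flag}"
  end
  return "--ids is required when --scope-column is specified" if Array(opts[:ids]).empty?

  nil
end

ok       = scope_column_error(adapter: "postgresql", scope_column: "tenant_id", ids: %w[42 43])
mongo    = scope_column_error(adapter: "mongodb", scope_column: "tenant_id", ids: %w[42])
combined = scope_column_error(adapter: "mysql", scope_column: "tenant_id", target_table: "users", ids: %w[42])
no_ids   = scope_column_error(adapter: "sqlite", scope_column: "tenant_id", ids: [])
```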
@@ -442,6 +479,7 @@ module Exwiw
442
479
  target_table: @target_table_name,
443
480
  ids: @ids.dup.freeze,
444
481
  ids_field: @ids_field,
482
+ scope_column: @scope_column,
445
483
  output_format: @output_format,
446
484
  insert_only: @insert_only,
447
485
  log_level: @log_level,
@@ -500,6 +538,7 @@ module Exwiw
500
538
  opts.on("--ids=[IDS]", "Comma-separated list of identifiers. Required when --target-table is given.") { |v| @ids = v.split(',') }
501
539
  opts.on("--ids-field=[FIELD]", "Field on the target collection that --ids is matched against. Defaults to the primary key. (mongodb adapter only)") { |v| @ids_field = v }
502
540
  opts.on("--ids-column=[COLUMN]", "Column on the target table that --ids is matched against. Defaults to the primary key. (sql adapters only)") { |v| @ids_column = v }
541
+ opts.on("--scope-column=[COLUMN]", "Filter every table by this shared column (--ids are its values) instead of a single --target-table. Tables lacking it are reached via belongs_to. SQL adapters only; mutually exclusive with --target-table.") { |v| @scope_column = v }
503
542
  opts.on("--output-format=[FORMAT]", "Output format: insert (default) or copy (PostgreSQL only, export subcommand only)") { |v| @output_format = v }
504
543
  opts.on("--insert-only", "Do not generate DELETE SQL files (export subcommand only)") { @insert_only = true }
505
544
  opts.on("--after-insert-hook=PATH", "Path to a .rb or .sh post-processing hook executed after all insert/delete files are written (export subcommand only)") do |v|
@@ -1,35 +1,58 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require "set"
4
+
3
5
  module Exwiw
4
6
  module DetermineTableProcessingOrder
5
7
  module_function
6
8
 
7
9
  # @param tables [Array<Exwiw::TableConfig>] tables
10
+ # @param logger [Logger, nil] receives a warning when a cycle has to be broken
8
11
  # @return [Array<String>] sorted table names
9
- def run(tables)
12
+ def run(tables, logger: nil)
10
13
  return tables.map(&:name) if tables.size < 2
11
14
 
12
15
  ordered_table_names = []
16
+ ordered = Set.new
13
17
 
14
18
  table_by_name = tables.each_with_object({}) do |table, acc|
15
19
  acc[table.name] = table
16
20
  end
17
21
 
22
+ # Only belongs_to relations whose target is also in this run constrain the
23
+ # order. A belongs_to pointing at a table that is not being processed here
24
+ # — e.g. an embedded MongoDB collection (masked through its parent, never
25
+ # dumped on its own) or any table excluded from the run — is not something
26
+ # we can or need to order against, so it must never block resolution.
27
+ # Without this, such a dependency would stay unresolved forever and
28
+ # masquerade as a circular dependency, freezing every table that
29
+ # (transitively) references it.
30
+ present_names = table_by_name.keys.to_set
31
+
18
32
  loop do
19
33
  break if table_by_name.empty?
20
34
 
21
- tables_with_no_dependencies = table_by_name.values.select do |table|
22
- not_resolved_names = compute_table_dependencies(table) - ordered_table_names - [table.name]
23
-
24
- not_resolved_names.empty?
35
+ resolvable = table_by_name.values.select do |table|
36
+ unresolved_dependencies(table, present_names, ordered).empty?
25
37
  end
26
38
 
27
- if tables_with_no_dependencies.empty?
28
- raise ArgumentError, build_cycle_error_message(table_by_name, ordered_table_names)
39
+ if resolvable.empty?
40
+ # No table has all its (in-run) dependencies satisfied, yet tables
41
+ # remain: the belongs_to graph has a genuine cycle and no strict
42
+ # topological order exists. Rather than aborting the whole export, break
43
+ # the cycle by emitting one cycle member; see pick_cycle_victim for how
44
+ # the member is chosen. Warn so the dropped constraint is visible.
45
+ victim = pick_cycle_victim(table_by_name.values, present_names, ordered)
46
+ warn_cycle_break(logger, victim, unresolved_dependencies(victim, present_names, ordered))
47
+ resolvable = [victim]
29
48
  end
30
49
 
31
- tables_with_no_dependencies.each do |table|
50
+ # In the normal (acyclic) path, emit every currently-resolvable table in
51
+ # insertion order — preserving the historical ordering the snapshot specs
52
+ # depend on. The cycle-break path emits exactly its single chosen victim.
53
+ resolvable.each do |table|
32
54
  ordered_table_names << table.name
55
+ ordered << table.name
33
56
  table_by_name.delete(table.name)
34
57
  end
35
58
  end
@@ -37,30 +60,124 @@ module Exwiw
37
60
  ordered_table_names
38
61
  end
39
62
 
63
+ # The belongs_to target table names of `table`. A polymorphic belongs_to is
64
+ # expanded into one entry per concrete target by schema generation, so each
65
+ # entry is a plain table name here.
40
66
  def compute_table_dependencies(table)
41
- table.belongs_tos.each_with_object([]) do |relation, acc|
42
- acc << relation.table_name
67
+ table.belongs_tos.map(&:table_name)
68
+ end
69
+
70
+ # The dependencies still blocking `table`: belongs_to targets that are part
71
+ # of this run, not yet ordered, and not the table itself (a self-referential
72
+ # belongs_to never blocks).
73
+ private_class_method def unresolved_dependencies(table, present_names, ordered)
74
+ compute_table_dependencies(table).uniq.select do |dep|
75
+ present_names.include?(dep) && !ordered.include?(dep) && dep != table.name
43
76
  end
44
77
  end
45
78
 
46
- # When no table can be resolved but some remain, the belongs_to graph
47
- # contains a cycle (e.g. A belongs_to B and B belongs_to A). A topological
48
- # order cannot exist, so report the offending tables instead of looping
49
- # forever.
50
- private_class_method def cycle_diagnostics(table_by_name, ordered_table_names)
51
- table_by_name.values.map do |table|
52
- unresolved = (compute_table_dependencies(table) - ordered_table_names - [table.name]).uniq
53
- " #{table.name} -> #{unresolved.join(', ')}"
79
+ # Choose the next table to emit when the order is stuck in a cycle. Only
80
+ # genuine cycle members are eligible a table in a non-trivial
81
+ # strongly-connected component of the unresolved-dependency subgraph so an
82
+ # acyclic table that merely waits on a cycle is never reordered ahead of its
83
+ # parent. Among the members, prefer one that still has at least one
84
+ # already-ordered parent, so its extraction stays constrained instead of
85
+ # collapsing to "match every row" (a cross-scope over-extraction risk for the
86
+ # mongodb adapter); break remaining ties by fewest unresolved dependencies,
87
+ # then by name, for determinism.
88
+ private_class_method def pick_cycle_victim(remaining, present_names, ordered)
89
+ adjacency = remaining.each_with_object({}) do |table, acc|
90
+ acc[table.name] = unresolved_dependencies(table, present_names, ordered)
91
+ end
92
+ cyclic_names = strongly_connected_members(adjacency)
93
+
94
+ candidates = remaining.select { |table| cyclic_names.include?(table.name) }
95
+ candidates = remaining if candidates.empty? # defensive; a stall implies a cycle
96
+
97
+ anchored = candidates.select { |table| ordered_parent?(table, present_names, ordered) }
98
+ pool = anchored.empty? ? candidates : anchored
99
+
100
+ pool.min_by { |table| [unresolved_dependencies(table, present_names, ordered).size, table.name] }
101
+ end
102
+
103
+ # True when `table` has a belongs_to whose target was already ordered, so its
104
+ # extraction filter will be constrained rather than an unscoped full scan.
105
+ private_class_method def ordered_parent?(table, present_names, ordered)
106
+ compute_table_dependencies(table).any? do |dep|
107
+ dep != table.name && present_names.include?(dep) && ordered.include?(dep)
54
108
  end
55
109
  end
56
110
 
57
- private_class_method def build_cycle_error_message(table_by_name, ordered_table_names)
58
- "Circular belongs_to dependency detected among tables: " \
59
- "#{table_by_name.keys.sort.join(', ')}. " \
60
- "A processing order cannot be determined. " \
61
- "Remove one of the belongs_to entries forming the cycle.\n" \
62
- "Unresolved dependencies:\n" \
63
- "#{cycle_diagnostics(table_by_name, ordered_table_names).join("\n")}"
111
+ # Names belonging to a non-trivial strongly-connected component (size > 1) of
112
+ # `adjacency` (table name -> unresolved dependency names), i.e. the genuine
113
+ # cycle participants. Iterative Tarjan; nodes and edges are visited in name
114
+ # order so the result is deterministic. Self-edges are already excluded from
115
+ # the adjacency, so a size-1 component is never a cycle.
116
+ private_class_method def strongly_connected_members(adjacency)
117
+ index = {}
118
+ low = {}
119
+ on_stack = {}
120
+ stack = []
121
+ counter = 0
122
+ members = Set.new
123
+ neighbors = adjacency.each_with_object({}) { |(name, deps), acc| acc[name] = deps.sort }
124
+
125
+ adjacency.keys.sort.each do |start|
126
+ next if index.key?(start)
127
+
128
+ work = [[start, 0]]
129
+ until work.empty?
130
+ node, edge_i = work.last
131
+ if edge_i.zero?
132
+ index[node] = counter
133
+ low[node] = counter
134
+ counter += 1
135
+ stack.push(node)
136
+ on_stack[node] = true
137
+ end
138
+
139
+ adj = neighbors[node] || []
140
+ if edge_i < adj.size
141
+ work.last[1] += 1
142
+ w = adj[edge_i]
143
+ next unless adjacency.key?(w) # ignore edges leaving the remaining set
144
+
145
+ if index.key?(w)
146
+ low[node] = [low[node], index[w]].min if on_stack[w]
147
+ else
148
+ work.push([w, 0])
149
+ end
150
+ else
151
+ if low[node] == index[node]
152
+ component = []
153
+ loop do
154
+ w = stack.pop
155
+ on_stack[w] = false
156
+ component << w
157
+ break if w == node
158
+ end
159
+ members.merge(component) if component.size > 1
160
+ end
161
+ work.pop
162
+ low[work.last[0]] = [low[work.last[0]], low[node]].min unless work.empty?
163
+ end
164
+ end
165
+ end
166
+
167
+ members
168
+ end
169
+
170
+ private_class_method def warn_cycle_break(logger, victim, dropped)
171
+ return if logger.nil?
172
+
173
+ logger.warn(
174
+ "Circular belongs_to dependency detected. Breaking it by ordering " \
175
+ "'#{victim.name}' before its parent table(s): #{dropped.join(', ')}. The dropped " \
176
+ "relationship is not enforced while ordering, so '#{victim.name}' is extracted " \
177
+ "without that parent constraint (the mongodb adapter may then match a superset of " \
178
+ "rows; SQL output may not load in foreign-key order). To break the cycle explicitly " \
179
+ "instead, mark one of the belongs_to entries forming it with `ignore: true`."
180
+ )
64
181
  end
65
182
  end
66
183
  end
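Note on the hunk above: `strongly_connected_members` walks the graph with an explicit work stack. The same idea may be easier to follow in recursive form. The sketch below is not the gem's code; `cycle_members` and the sample table names are hypothetical, and it assumes the same input shape described in the comment (table name mapped to its unresolved dependency names, with self-edges already removed).

```ruby
require "set"

# Hypothetical helper, for illustration only: recursive Tarjan over a
# { name => [dependency names] } hash. Returns the names that belong to a
# strongly connected component of size > 1, i.e. genuine cycle participants.
def cycle_members(adjacency)
  index = {}
  low = {}
  on_stack = {}
  stack = []
  counter = 0
  members = Set.new

  visit = lambda do |node|
    index[node] = low[node] = counter
    counter += 1
    stack.push(node)
    on_stack[node] = true

    # Sorted so the traversal order is deterministic.
    adjacency.fetch(node, []).sort.each do |dep|
      next unless adjacency.key?(dep) # ignore edges leaving the remaining set

      if !index.key?(dep)
        visit.call(dep)
        low[node] = [low[node], low[dep]].min
      elsif on_stack[dep]
        low[node] = [low[node], index[dep]].min
      end
    end

    # `node` is the root of a component: pop the component off the stack.
    if low[node] == index[node]
      component = []
      loop do
        member = stack.pop
        on_stack[member] = false
        component << member
        break if member == node
      end
      members.merge(component) if component.size > 1
    end
  end

  adjacency.keys.sort.each { |name| visit.call(name) unless index.key?(name) }
  members
end

# Assumed sample data: users and teams reference each other; orders merely
# depends on the cycle; logs points at a table outside the remaining set.
adjacency = {
  "orders" => ["users"],
  "users" => ["teams"],
  "teams" => ["users"],
  "logs" => ["missing"]
}
result = cycle_members(adjacency)
```

Only `users` and `teams` end up in `result`: `orders` depends on the cycle without being part of it, which is the distinction the size > 1 check draws. The recursive form can exhaust the call stack on very deep dependency chains, which is one plausible reason to prefer the iterative version shown in the diff.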
@@ -26,8 +26,11 @@ module Exwiw
26
26
  target = table_by_name[@dump_target.table_name]
27
27
  adapter.validate_as_dump_target!(target) if target
28
28
 
29
+ dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
30
+ QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
31
+
29
32
  @logger.debug("Determining table processing order...")
30
- ordered_table_names = DetermineTableProcessingOrder.run(configs.select { |c| adapter.dumpable?(c) })
33
+ ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs, logger: @logger)
31
34
 
32
35
  total_size = ordered_table_names.size
33
36
  ordered_table_names.each_with_index do |table_name, idx|
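Note on the hunk above: the new `validate_scope!` call runs before the processing order is determined, so an unscopable table stops the run before anything is extracted. A minimal sketch of that fail-fast shape follows. It is not the gem's implementation: `Table`, `preflight!`, the classifier lambda, and the table names are all hypothetical stand-ins.

```ruby
# Hypothetical stand-in for a table config; only the fields the check needs.
Table = Struct.new(:name, :ignore, keyword_init: true)

# Sketch of a strict pre-flight: a no-op outside scope mode, otherwise raise
# if any non-ignored table is classified as :unscopable.
def preflight!(tables, scope_column, classify)
  return if scope_column.nil?

  unscopable = tables.reject(&:ignore).select { |t| classify.call(t.name) == :unscopable }
  return if unscopable.empty?

  names = unscopable.map(&:name).sort.join(", ")
  raise ArgumentError, "#{unscopable.size} table(s) cannot be scoped by '#{scope_column}': #{names}"
end

tables = [
  Table.new(name: "orders"),
  Table.new(name: "audit_logs"),
  Table.new(name: "legacy_blobs", ignore: true)
]

# Assumed classification for the example: only orders can be scoped.
classify = ->(name) { name == "orders" ? :direct : :unscopable }

message =
  begin
    preflight!(tables, "tenant_id", classify)
    nil
  rescue ArgumentError => e
    e.message
  end
```

In this example `legacy_blobs` is skipped because it is ignored, so only `audit_logs` is reported. With `scope_column` set to `nil` the check returns immediately, matching the "no-op outside scope mode" behaviour described in the diff.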
@@ -2,23 +2,58 @@
2
2
 
3
3
  module Exwiw
4
4
  class QueryAstBuilder
5
- def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true)
6
- new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse).run
5
+ def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
6
+ new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, allow_forward: allow_forward).run
7
+ end
8
+
9
+ # Scope-column mode classification for a single table. One of
10
+ # :exempt / :direct / :via_path / :referenced_by / :via_scoped_parent / :unscopable.
11
+ def self.scope_category(table_name, table_by_name, dump_target, logger)
12
+ new(table_name, table_by_name, dump_target, logger).scope_category
13
+ end
14
+
15
+ # Strict pre-flight for scope-column mode: abort if any extractable table
16
+ # cannot be scoped, so an unscoped (potentially sensitive) table is never
17
+ # silently dumped in full. No-op outside scope mode. `tables` is the set of
18
+ # dumpable configs (ignore:true tables are skipped — they are not extracted).
19
+ def self.validate_scope!(tables, table_by_name, dump_target, logger)
20
+ return if dump_target.scope_column.nil?
21
+
22
+ unscopable =
23
+ tables.reject(&:ignore).select do |table|
24
+ scope_category(table.name, table_by_name, dump_target, logger) == :unscopable
25
+ end
26
+ return if unscopable.empty?
27
+
28
+ names = unscopable.map(&:name).sort.join(", ")
29
+ raise ArgumentError,
30
+ "scope-column mode: #{unscopable.size} table(s) cannot be scoped by " \
31
+ "'#{dump_target.scope_column}': #{names}. For each, add `scope_exempt: true` " \
32
+ "to export it in full, set `ignore: true` to skip it, or add a belongs_to path " \
33
+ "to a table that carries the scope column (use a per-table `scope_column` if the " \
34
+ "column name differs on that table)."
7
35
  end
8
36
 
9
37
  attr_reader :table_name, :table_by_name, :dump_target
10
38
 
11
- def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true)
39
+ def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
12
40
  @table_name = table_name
13
41
  @table_by_name = table_by_name
14
42
  @dump_target = dump_target
15
43
  @logger = logger
16
44
  @allow_reverse = allow_reverse
45
+ # @allow_forward gates the "scope via an indirectly-scoped belongs_to
46
+ # parent" rescue (build_belongs_to_scoped_clause). Disabled while building a
47
+ # parent/child subquery so a single forward hop never recurses into another
48
+ # (which could loop on a belongs_to cycle).
49
+ @allow_forward = allow_forward
17
50
  end
18
51
 
19
52
  def run
20
53
  table = table_by_name.fetch(table_name)
21
54
 
55
+ return build_scoped(table) if scope_mode?
56
+
22
57
  where_clauses = build_where_clauses(table, dump_target)
23
58
  join_clauses = build_join_clauses(table, table_by_name, dump_target)
24
59
 
@@ -130,8 +165,10 @@ module Exwiw
130
165
  next if relation.nil? || relation.polymorphic?
131
166
 
132
167
  # Build the child's own extraction query. allow_reverse:false stops a
133
- # chain of FK-less tables from recursing back into each other.
134
- child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false)
168
+ # chain of FK-less tables from recursing back into each other;
169
+ # allow_forward:false stops the child from forward-scoping back through
170
+ # this very table (which would loop).
171
+ child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
135
172
 
136
173
  # Only an *already constrained* child narrows anything; an unconstrained
137
174
  # child would select every fk value (i.e. dump all) and not help.
@@ -169,6 +206,64 @@ module Exwiw
169
206
  )
170
207
  end
171
208
 
209
+ # Scope-column mode. Builds a `fk IN (SELECT parent.pk FROM <parent
210
+ # extraction query>)` clause for a table whose belongs_to parent is itself
211
+ # scopable but carries no scope column of its own — so find_path_to_scoped
212
+ # cannot terminate on it (via_path fails) and nothing references this table
213
+ # (referenced_by fails). The classic shape is a hub scoped only via
214
+ # referenced_by (e.g. CDP `customer_accounts`, scoped by the `customers` that
215
+ # reference it) with sibling detail tables (`customer_account_details`, ...)
216
+ # hanging off it. Constraining those siblings to the hub's in-scope ids keeps
217
+ # them out of a full dump. Returns nil when there is no single, unambiguous
218
+ # scopable parent, leaving the caller on the unscopable path.
219
+ private def build_belongs_to_scoped_clause(table)
220
+ candidates = table.belongs_tos.filter_map do |relation|
221
+ # A polymorphic belongs_to points at several parent tables through one
222
+ # column, so it cannot project to a single parent id set; skip it.
223
+ next if relation.polymorphic?
224
+
225
+ parent = table_by_name[relation.table_name]
226
+ next if parent.nil?
227
+
228
+ # Build the parent's own scoped query. allow_reverse stays true so the
229
+ # parent may be scoped via referenced_by; allow_forward:false bounds this
230
+ # to a single forward hop so a belongs_to cycle cannot loop.
231
+ parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, allow_forward: false)
232
+
233
+ # Only a constrained parent narrows anything; an unconstrained parent
234
+ # would select every pk (i.e. dump all) and not help.
235
+ next unless parent_query.where_clauses.any? || parent_query.join_clauses.any?
236
+
237
+ [relation, parent, parent_query]
238
+ end
239
+
240
+ # Only the unambiguous single-parent case. Multiple scopable parents would
241
+ # need their subqueries combined (not supported); fall back to unscopable.
242
+ if candidates.size != 1
243
+ if candidates.size > 1
244
+ @logger.debug(" #{table.name} has multiple scopable parents; skipping forward scope (unscopable).")
245
+ end
246
+ return nil
247
+ end
248
+
249
+ relation, parent, parent_query = candidates.first
250
+
251
+ # Project the parent's extraction query down to just its primary key — the
252
+ # column this table's foreign key points at.
253
+ pk_column = TableColumn.from_symbol_keys(name: parent.primary_key)
254
+ projected = QueryAst::Select.new
255
+ projected.from(parent_query.from_table_name)
256
+ projected.select([pk_column])
257
+ parent_query.join_clauses.each { |j| projected.join(j) }
258
+ parent_query.where_clauses.each { |w| projected.where(w) }
259
+
260
+ QueryAst::WhereClause.new(
261
+ column_name: relation.foreign_key,
262
+ operator: :in_subquery,
263
+ value: QueryAst::SelectSubquery.new(query: projected)
264
+ )
265
+ end
266
+
172
267
  private def build_where_clauses(table, dump_target)
173
268
  clauses = []
174
269
 
@@ -264,5 +359,208 @@ module Exwiw
264
359
 
265
360
  queue
266
361
  end
362
+
363
+ # ------------------------------------------------------------------
364
+ # Scope-column mode (Exwiw::DumpTarget#scope_column).
365
+ #
366
+ # The single-target machinery above anchors everything on one named table.
367
+ # Scope mode instead filters every table by a shared column. The relationship
368
+ # walk is the same idea — the *terminus* is just "any table carrying the
369
+ # scope column" rather than "the one named target".
370
+ # ------------------------------------------------------------------
371
+
372
+ private def scope_mode?
373
+ !dump_target.scope_column.nil?
374
+ end
375
+
376
+ # Classifier used by validate_scope! and mirrored by build_scoped below.
377
+ def scope_category
378
+ table = table_by_name.fetch(table_name)
379
+ return :exempt if scope_exempt?(table)
380
+ return :direct if directly_scoped?(table)
381
+ return :via_path if build_join_clauses_scoped(table).any?
382
+ return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
383
+ return :via_scoped_parent if @allow_forward && build_belongs_to_scoped_clause(table)
384
+
385
+ :unscopable
386
+ end
387
+
388
+ private def build_scoped(table)
389
+ ast = QueryAst::Select.new
390
+ ast.from(table.name)
391
+ if table.rails_managed?
392
+ ast.select_all!
393
+ else
394
+ ast.select(table.columns)
395
+ end
396
+
397
+ # Reference/master (or rails-managed) table: export every row.
398
+ return ast if scope_exempt?(table)
399
+
400
+ # Carries the scope column itself: filter on it directly.
401
+ if directly_scoped?(table)
402
+ ast.where(scope_where_clause(table))
403
+ ast.where(table.filter) if table.filter
404
+ return ast
405
+ end
406
+
407
+ # Reachable via belongs_to: join up to the scoped ancestor (the scope
408
+ # filter is applied at the terminal join inside build_join_clauses_scoped).
409
+ join_clauses = build_join_clauses_scoped(table)
410
+ unless join_clauses.empty?
411
+ join_clauses.each { |join_clause| ast.join(join_clause) }
412
+ ast.where(table.filter) if table.filter
413
+ return ast
414
+ end
415
+
416
+ if @allow_reverse
417
+ # Referenced by an extractable (scoped) child: constrain via subquery.
418
+ reverse_clause = build_referenced_by_clause(table)
419
+ if reverse_clause
420
+ ast.where(reverse_clause)
421
+ return ast
422
+ end
423
+ end
424
+
425
+ if @allow_forward
426
+ # Belongs_to a parent that is itself scoped but carries no scope column of
427
+ # its own (so via_path cannot terminate on it) — e.g. a hub table scoped
428
+ # only via referenced_by. Constrain this table to that parent's in-scope
429
+ # ids so its rows ride along instead of being dumped in full.
430
+ parent_clause = build_belongs_to_scoped_clause(table)
431
+ if parent_clause
432
+ ast.where(parent_clause)
433
+ return ast
434
+ end
435
+ end
436
+
437
+ # Only the genuine top-level build (no rescue disabled) is allowed to fail
438
+ # hard. The Runner/ExplainRunner pre-flight (validate_scope!) rejects
439
+ # unscopable tables before extraction, so a top-level build never
440
+ # legitimately lands here; if it does, raise rather than emit an unfiltered
441
+ # (potential full PII) dump.
442
+ if @allow_reverse && @allow_forward
443
+ raise ArgumentError, scope_unscopable_message(table)
444
+ end
445
+
446
+ # Unscopable during a reverse/forward subquery build (a rescue is disabled):
447
+ # return the unconstrained AST so the caller's "constrained only" check
448
+ # filters this candidate out (it never becomes a real dump query).
449
+ ast
450
+ end
451
+
452
+ # The shared column this table is filtered on: a per-table `scope_column`
453
+ # override when present, otherwise the global `--scope-column`.
454
+ private def resolved_scope_column(table)
455
+ table.scope_column || dump_target.scope_column
456
+ end
457
+
458
+ private def scope_exempt?(table)
459
+ table.scope_exempt || table.rails_managed?
460
+ end
461
+
462
+ private def directly_scoped?(table)
463
+ column = resolved_scope_column(table)
464
+ table.columns.any? { |c| c.name == column }
465
+ end
466
+
467
+ private def scope_where_clause(table)
468
+ Exwiw::QueryAst::WhereClause.new(
469
+ column_name: resolved_scope_column(table),
470
+ operator: :eq,
471
+ value: dump_target.ids
472
+ )
473
+ end
474
+
475
+ # BFS over belongs_tos to the nearest *directly scoped* ancestor. Unlike the
476
+ # target-mode walk, the returned path INCLUDES that ancestor: the scope column
477
+ # lives on the ancestor itself (not on a foreign key of the child), so the
478
+ # ancestor must be joined and then filtered.
479
+ private def find_path_to_scoped(table)
480
+ visited = {}
481
+ queue = [[table.name, [table.name]]]
482
+
483
+ until queue.empty?
484
+ current_table_name, path = queue.shift
485
+ next if visited[current_table_name]
486
+ visited[current_table_name] = true
487
+
488
+ current_table = table_by_name[current_table_name]
489
+ next if current_table.nil?
490
+
491
+ current_table.belongs_tos.each do |relation|
492
+ next_table_name = relation.table_name
493
+ next_table = table_by_name[next_table_name]
494
+ next if next_table.nil?
495
+
496
+ next_path = path + [next_table_name]
497
+ return next_path if directly_scoped?(next_table)
498
+
499
+ queue.push([next_table_name, next_path])
500
+ end
501
+ end
502
+
503
+ []
504
+ end
505
+
506
+ private def build_join_clauses_scoped(table)
507
+ path_tables = find_path_to_scoped(table)
508
+ @logger.debug(" Join path from #{table.name} to a scoped table: #{path_tables}")
509
+
510
+ return [] if path_tables.size < 2
511
+
512
+ path_tables.each_cons(2).map do |from_table_name, to_table_name|
513
+ from_table = table_by_name[from_table_name]
514
+ to_table = table_by_name[to_table_name]
515
+
516
+ join_clause = build_scoped_join_clause(from_table, to_table)
517
+
518
+ # Only the final hop's to_table is directly scoped (the BFS stops there),
519
+ # so the scope filter rides on that join's where_clauses, compiled against
520
+ # join_table_name = the scoped ancestor.
521
+ if directly_scoped?(to_table)
522
+ join_clause.where_clauses.push scope_where_clause(to_table)
523
+ end
524
+
525
+ if to_table.filter
526
+ join_clause.where_clauses.push to_table.filter
527
+ end
528
+
529
+ join_clause
530
+ end
531
+ end
532
+
533
+ # One belongs_to hop as a JoinClause, with the polymorphic type condition
534
+ # placed on the source table (base_where_clauses) when the hop is polymorphic
535
+ # — mirroring the target-mode loop in build_join_clauses.
536
+ private def build_scoped_join_clause(from_table, to_table)
537
+ relation = from_table.belongs_to(to_table.name)
538
+
539
+ join_clause = QueryAst::JoinClause.new(
540
+ base_table_name: from_table.name,
541
+ foreign_key: relation.foreign_key,
542
+ join_table_name: to_table.name,
543
+ primary_key: to_table.primary_key,
544
+ where_clauses: [],
545
+ base_where_clauses: []
546
+ )
547
+
548
+ if relation.polymorphic?
549
+ join_clause.base_where_clauses.push QueryAst::WhereClause.new(
550
+ column_name: relation.foreign_type,
551
+ operator: :eq,
552
+ value: [relation.type_value]
553
+ )
554
+ end
555
+
556
+ join_clause
557
+ end
558
+
559
+ private def scope_unscopable_message(table)
560
+ "Table '#{table.name}' cannot be scoped in scope-column mode: it has no " \
561
+ "'#{dump_target.scope_column}' column (nor a per-table scope_column override) and no " \
562
+ "belongs_to path to a table that does. Add `scope_exempt: true` to export it in full, " \
563
+ "set `ignore: true` to skip it, or add the missing belongs_to."
564
+ end
267
565
  end
268
566
  end
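Note on the hunk above: `find_path_to_scoped` is a breadth-first search over `belongs_to` edges that stops at the nearest table carrying the scope column and returns a path that includes that table. The sketch below reproduces the walk on plain hashes. It is an illustration under assumptions, not the gem's API: the `TABLES` shape, `path_to_scoped`, and every table and column name are hypothetical.

```ruby
# Hypothetical table shape for illustration: column names plus the names of
# the tables reached through belongs_to.
TABLES = {
  "line_items" => { columns: %w[id order_id], parents: %w[orders] },
  "orders" => { columns: %w[id customer_id], parents: %w[customers] },
  "customers" => { columns: %w[id tenant_id], parents: [] },
  "countries" => { columns: %w[id code], parents: [] }
}.freeze

# Breadth-first walk up the belongs_to edges. Returns the path from `start`
# to the nearest ancestor that has `scope_column` (ancestor included), or []
# when no such ancestor is reachable.
def path_to_scoped(start, tables, scope_column)
  visited = {}
  queue = [[start, [start]]]

  until queue.empty?
    current, path = queue.shift
    next if visited[current]

    visited[current] = true
    table = tables[current]
    next if table.nil?

    table[:parents].each do |parent|
      parent_table = tables[parent]
      next if parent_table.nil?

      next_path = path + [parent]
      return next_path if parent_table[:columns].include?(scope_column)

      queue.push([parent, next_path])
    end
  end

  []
end

path = path_to_scoped("line_items", TABLES, "tenant_id")
none = path_to_scoped("countries", TABLES, "tenant_id")
```

Each consecutive pair in `path` corresponds to one join hop, and the scope filter belongs on the last table of the path. An empty result, as for `countries`, is the case where the diff falls through to the `referenced_by` and scoped-parent rescues, or to `:unscopable`. The `visited` hash is what keeps a `belongs_to` cycle from looping the search.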
data/lib/exwiw/runner.rb CHANGED
@@ -38,8 +38,13 @@ module Exwiw
38
38
  target = table_by_name[@dump_target.table_name]
39
39
  adapter.validate_as_dump_target!(target) if target
40
40
 
41
+ dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
42
+ # Scope-column mode: abort if any extractable table cannot be scoped (no-op
43
+ # otherwise). Done before extraction so nothing is dumped if it would leak.
44
+ QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
45
+
41
46
  @logger.info("Determining table processing order...")
42
- ordered_table_names = DetermineTableProcessingOrder.run(configs.select { |c| adapter.dumpable?(c) })
47
+ ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs, logger: @logger)
43
48
 
44
49
  clean_output_dir!
45
50
 
@@ -26,6 +26,18 @@ module Exwiw
26
26
  attribute :columns, array(TableColumn), default: []
27
27
  attribute :bulk_insert_chunk_size, optional(Integer), skip_serializing_if_nil: true
28
28
  attribute :ignore, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
29
+ # Scope-column mode only (see Exwiw::DumpTarget#scope_column). Both are
30
+ # user-configured and never emitted by the schema generators.
31
+ #
32
+ # `scope_exempt: true` exports the whole table without scope filtering — the
33
+ # explicit, auditable escape hatch for genuine reference/master tables under
34
+ # the strict "every table must be scopable" rule.
35
+ #
36
+ # `scope_column` overrides the physical column this table is filtered on when
37
+ # it differs from the global `--scope-column` name (same scope value, just a
38
+ # different column name on this table).
39
+ attribute :scope_exempt, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
40
+ attribute :scope_column, optional(String), skip_serializing_if_nil: true
29
41
 
30
42
  def self.from(hash)
31
43
  config = super
@@ -137,6 +149,9 @@ module Exwiw
137
149
  merged_table.filter = filter
138
150
  merged_table.bulk_insert_chunk_size = passed_table.bulk_insert_chunk_size
139
151
  merged_table.ignore = ignore
152
+ # User-owned, never regenerated: carry over from the existing config.
153
+ merged_table.scope_exempt = scope_exempt
154
+ merged_table.scope_column = scope_column
140
155
 
141
156
  # Structural facts of each belongs_to come from the freshly generated
142
157
  # config, but the user-owned `comment`/`ignore`/`references` carry over
data/lib/exwiw/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Exwiw
4
- VERSION = "0.5.0"
4
+ VERSION = "0.5.2"
5
5
  end
data/lib/exwiw.rb CHANGED
@@ -39,7 +39,13 @@ module Exwiw
39
39
  # `ids_field` optionally overrides which field `--ids` is matched against on
40
40
  # the target table. When nil the table's primary key is used (the historical
41
41
  # behavior). Currently only honored by the mongodb adapter.
42
- DumpTarget = Struct.new(:table_name, :ids, :ids_field, keyword_init: true)
42
+ #
43
+ # `scope_column` switches the extraction to scope-column mode: instead of a
44
+ # single `table_name` anchor, every table is filtered by a shared column
45
+ # (`scope_column IN ids`) and tables lacking it are reached by walking
46
+ # belongs_to up to the nearest table that has it. When set, `table_name` is
47
+ # nil. SQL adapters only.
48
+ DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)
43
49
  # `uri` is an optional full connection string (currently only honored by the
44
50
  # mongodb adapter, e.g. `mongodb+srv://...`). When present it is the source of
45
51
  # truth for the connection — host/port/user/password are ignored — so TLS,
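Note on the hunk above: because the struct uses `keyword_init: true`, any member that is not passed defaults to `nil`, so adding `:scope_column` leaves existing callers unaffected. The sketch below redefines the struct locally to show this. `scope_mode?` here is a free-standing stand-in for the private method of the same name in the diff, and the table and column names are made up.

```ruby
# Local copy of the struct definition shown in the diff, for illustration.
DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)

# Stand-in for the check the query builder performs: scope mode is on
# whenever scope_column is set.
def scope_mode?(dump_target)
  !dump_target.scope_column.nil?
end

# Single-target mode: anchored on one named table (hypothetical name).
single = DumpTarget.new(table_name: "shops", ids: [1])

# Scope-column mode: no table_name, every table filtered by a shared column
# (hypothetical column name).
scoped = DumpTarget.new(ids: [1, 2], scope_column: "tenant_id")
```

Here `single.scope_column` and `scoped.table_name` are both `nil`, which is how one struct can describe either mode.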
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: exwiw
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.5.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shia