exwiw 0.4.11 → 0.5.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 23b64ea4c8b3562427b3e605fc17dec1251cf6916f2be02aeed0b3a29e03a62d
- data.tar.gz: ce8a51a31bb84c22abd164e3ec928065dff99f1dbd6eb83c72a9520e8c7e0737
+ metadata.gz: 2af5a1cc29946424a2b6498f19d7ad77714f108194575ddd6014c5fc2d829416
+ data.tar.gz: 3113e80b88ab11a95344140f819a9247eadc3706c45a9b2a665f946a88a945b7
  SHA512:
- metadata.gz: 779684b310965b1f59a7d9bb20d20d61526435845c1c3c5d7b6a6e1c7febcb96ce82d12ebf22bcb894546d8dd9f876984d302a87a2e321a1fb3aa8de8aeb4447
- data.tar.gz: e19d3931e974e09487e6a661e482cdeac9cf665e34d80dc82ed9fff6f099ffa7cf0a2a8fec2b7f682b301899f9121f3c6fd49680fe0a70fb300bb04d7b9c9f78
+ metadata.gz: 539f4ab428d75f97714475d607d91ae2870a2bf860ac8ab2f732d5a5709d5e0cf2fd085c0bc3bbdf9b095f969542eb866c1f3c3766cc5e3a8be0294cf3cfcf0f
+ data.tar.gz: 7b41f191e5e6d5ff60a1c927111d06f85b6c461a1dd63d5665b254338eafcde037233bb7f67027306258515e71aa1af3cf38e6aadf20b02ec9f6a91f05c7f2c4
data/CHANGELOG.md CHANGED
@@ -2,6 +2,23 @@

  ## [Unreleased]

+ ## [0.5.1] - 2026-06-18
+
+ ### Added
+
+ - **Scope-column extraction mode** (`--scope-column`, SQL adapters only). For schemas where many independent top-level tables share the same scope/tenant column instead of converging on a single `belongs_to` root, exwiw can now filter **every** table by that shared column (`--scope-column=COLUMN` with `--ids` as its values) rather than anchoring on one `--target-table`. A table that carries the column is filtered directly; a table that lacks it but `belongs_to` a table that has it is joined up to the nearest such table and filtered there. A table that `belongs_to` a parent which is itself scoped but carries no scope column of its own (e.g. a *hub* table scoped only because an extractable child references it) is constrained to the parent's in-scope ids via a subquery (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`), so the hub's other children ride along to just the in-scope rows — limited to a single forward hop and a single unambiguous scopable parent. A table that cannot be scoped at all (no column and no path to one) makes the run **abort with a list of the offending tables**, so an unscoped table is never silently dumped in full. Two user-owned table-config keys support this and are preserved across `schema:generate` regeneration: **`scope_exempt: true`** exports a genuine reference/master table in full (rails-managed tables are treated as exempt automatically), and **`scope_column`** overrides the filtered column name for a table that stores the same scope value under a different name. `--scope-column` is mutually exclusive with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`, can be set in `exwiw.yml`, and works with `exwiw explain`.
+
+ ## [0.5.0] - 2026-06-16
+
+ ### Added
+
+ - A YAML **config file** (`exwiw.yml`) can now hold any option except the database connection settings, so they no longer have to be repeated on every invocation. Pass it with `--config=PATH`; when `--config` is omitted, `exwiw.yml` (or `exwiw.yaml`) is loaded automatically from the current directory if present. **Options passed on the CLI take precedence** over the file (the file only fills in options not given on the CLI). Connection settings — `host`, `port`, `user`, `database`, `uri`, `password` — are **rejected** in the file (they must come from the CLI/environment); `adapter` is the one connection-related key allowed. Relative paths in the file (`schema_dir`, `output_dir`, `after_insert_hook`) are resolved relative to the config file's own directory (so a root-level `exwiw.yml` with `schema_dir: exwiw/schema` reads naturally and an absolute `--config` works from any directory). Unknown keys are rejected to catch typos, and export-only keys (`output_dir`, `output_format`, `insert_only`, `after_insert_hook`) are ignored under `explain` so one file can be shared by both subcommands.
+
+ ### Changed
+
+ - **BREAKING**: the `export`/`explain` CLI option `--config-dir` has been renamed to `--schema-dir` to distinguish the directory of schema JSON files from the new `--config` config file. Its short form `-c` is now `--config` (the config file); `--schema-dir` has no short form. The hook contract is renamed to match: the shell-hook environment variable `EXWIW_CONFIG_DIR` is now `EXWIW_SCHEMA_DIR`, and the Ruby-hook `cli_options[:config_dir]` is now `cli_options[:schema_dir]`. Update invocations, scripts, and hooks accordingly (`--config-dir` no longer exists). `--schema-dir` is still required and has no default unless `schema_dir` is set in the config file.
+ - **BREAKING**: the env var that overrides where `schema:generate`, `schema:tidy`, and `schema:generate_mongoid` write their config has been renamed from `OUTPUT_DIR_PATH` to `EXWIW_SCHEMA_DIR_PATH`, and the default output directory is now `exwiw/schema` (previously `exwiw`). The new name disambiguates it from the dump-side `--output-dir`, and the dedicated `schema/` subdirectory leaves `exwiw/` free for other artifacts (hooks, dumps). `OUTPUT_DIR_PATH` is no longer read. Existing repositories should set `EXWIW_SCHEMA_DIR_PATH` (e.g. `EXWIW_SCHEMA_DIR_PATH=exwiw` to preserve the old flat layout) and/or move their config under `exwiw/schema/`; otherwise a `generate` run will write a fresh copy into `exwiw/schema/` and leave the old files stale. The `export`/`explain` CLI is unaffected, but examples now point at `exwiw/schema`.
+

  ## [0.4.11] - 2026-06-15

  ### Fixed
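The 0.5.0 config-file rules amount to a fill-only merge: the CLI value always wins, connection keys are refused, and relative paths resolve against the file's own directory. The Ruby below is a minimal sketch of those rules, not exwiw's code. `merge_config` and its hash-in, hash-out shape are hypothetical. The sketch also simplifies the real CLI: it does no trailing-slash stripping and no `ids` coercion, and it raises instead of calling `exit 1`.

```ruby
# Hypothetical sketch of the exwiw.yml merge rules (not exwiw's actual API).
REJECTED_KEYS = %w[host port user database uri password].freeze
ALLOWED_KEYS = %w[adapter schema_dir output_dir output_format insert_only
                  after_insert_hook log_level target_table target_collection
                  ids ids_field ids_column scope_column].freeze
EXPORT_ONLY_KEYS = %w[output_dir output_format insert_only after_insert_hook].freeze
PATH_KEYS = %w[schema_dir output_dir after_insert_hook].freeze

# cli:    options actually passed on the command line (string keys)
# config: the parsed YAML mapping
def merge_config(cli, config, config_path:, subcommand:)
  config.each_key do |key|
    raise ArgumentError, "'#{key}' must come from the CLI/environment" if REJECTED_KEYS.include?(key)
    raise ArgumentError, "unknown config key '#{key}'" unless ALLOWED_KEYS.include?(key)
  end

  # Export-only keys are dropped for `explain` so one file serves both subcommands.
  config = config.reject { |key, _| EXPORT_ONLY_KEYS.include?(key) } if subcommand == "explain"

  # Relative paths in the file resolve against the file's own directory, not cwd.
  base = File.dirname(File.expand_path(config_path))

  merged = cli.dup
  config.each do |key, value|
    # Skip empty values, and never overwrite a value passed on the CLI.
    next if value.nil? || !merged[key].nil?
    merged[key] = PATH_KEYS.include?(key) ? File.expand_path(value, base) : value
  end
  merged
end

merged = merge_config(
  { "target_table" => "shops" },
  { "schema_dir" => "exwiw/schema", "target_table" => "users" },
  config_path: "/srv/app/exwiw.yml",
  subcommand: "export"
)
p merged
```

The sketch checks for rejected and unknown keys before it drops the export-only keys, matching the order in `apply_config_file!`. As a result, a stray `host:` entry fails under both subcommands.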
data/README.md CHANGED
@@ -72,7 +72,7 @@ exwiw \
  --port=3306 \
  --user=reader \
  --database=app_production \
- --config-dir=exwiw \
+ --schema-dir=exwiw/schema \
  --target-table=shops \
  --ids=1 \ # comma separated ids
  --output-dir=dump \
@@ -81,7 +81,7 @@ exwiw \

  By default `--ids` are matched against the target table's primary key. `--ids-column=COLUMN` matches them against a different column instead (e.g. `--target-table=users --ids=alice@example.com --ids-column=email`). Related tables are still extracted correctly: their foreign keys are resolved through the target via a subquery (`WHERE fk IN (SELECT pk FROM target WHERE COLUMN IN (...))`), so only the target table's filter column changes. This is the SQL-adapter counterpart of the mongodb `--ids-field`; the two are mutually exclusive and each is rejected by the other adapter family. Note: if `COLUMN` is itself masked, re-running `delete-*` against an already-imported (masked) dump won't match, so prefer a stable natural key.

- When `--target-table` and `--ids` are omitted, exwiw dumps all tables defined in `--config-dir`:
+ When `--target-table` and `--ids` are omitted, exwiw dumps all tables defined in `--schema-dir`:

  ```bash
  # dump all tables
@@ -91,7 +91,7 @@ exwiw \
  --port=5432 \
  --user=reader \
  --database=app_production \
- --config-dir=exwiw \
+ --schema-dir=exwiw/schema \
  --output-dir=dump
  ```

@@ -123,25 +123,140 @@ exwiw explain \
  --adapter=postgresql \
  --host=localhost --port=5432 --user=reader \
  --database=app_production \
- --config-dir=exwiw \
+ --schema-dir=exwiw/schema \
  --target-table=shops --ids=1
  ```

  The `--output-dir`, `--output-format`, `--insert-only`, and `--after-insert-hook` options are dump-specific and rejected when used with `explain`.

+ ### Scope-column mode (`--scope-column`)
+
+ The default `--target-table` extraction assumes the schema converges on a single
+ root: every table is reached by walking `belongs_to` toward that one table. Some
+ schemas are not shaped that way — many independent top-level tables each carry the
+ *same* scope/tenant column (e.g. `tenant_id`, `account_uuid`) and there is no
+ single root. Choosing one of them as `--target-table` would leave the others
+ unrelated to it, and an unrelated table is dumped in full — a problem if it holds
+ personal data.
+
+ `--scope-column` handles this shape: instead of one anchor table, **every table is
+ filtered by a shared column** whose values are `--ids`.
+
+ ```bash
+ exwiw \
+ --adapter=postgresql \
+ --host=localhost --port=5432 --user=reader \
+ --database=app_production \
+ --schema-dir=exwiw/schema \
+ --scope-column=tenant_id \
+ --ids=42,43 \
+ --output-dir=dump
+ ```
+
+ Each table is resolved as follows:
+
+ - **Carries the scope column** → `WHERE scope_column IN (ids)`.
+ - **Lacks it but `belongs_to` reaches a table that has it** → exwiw joins up to the
+ nearest such table and applies the scope filter there (the same join machinery
+ the single-target mode uses).
+ - **`belongs_to` a parent that is itself scoped but carries no scope column of its
+ own** → exwiw constrains this table to the parent's in-scope ids via a subquery
+ (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). This covers a *hub*
+ table that has no scope column and is scoped only because an extractable child
+ references it (see referenced-by below): the hub's other `belongs_to` children
+ ride along to just the in-scope rows instead of being dumped in full. Limited to
+ a single forward hop and a single unambiguous scopable parent.
+ - **Cannot be scoped at all** (no scope column and no path to one) → exwiw
+ **aborts** and lists the offending tables, so an unscoped table is never silently
+ dumped in full. For each, either add a `belongs_to` path, set `ignore: true` to
+ skip it, or mark it `scope_exempt: true` (below) to export it in full.
+
+ `--scope-column` is SQL-only (mysql / postgresql / sqlite) and mutually exclusive
+ with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`.
+ It works with `exwiw explain` too, which is the recommended way to preview the
+ queries before exporting.
+
+ #### `scope_exempt` (intentional full dump)
+
+ A genuine reference/master table (no personal data) that has no scope linkage can
+ opt out of the strict check and be exported in full:
+
+ ```json
+ {
+ "name": "countries",
+ "primary_key": "id",
+ "scope_exempt": true,
+ "columns": [{ "name": "id" }, { "name": "code" }]
+ }
+ ```
+
+ Rails-managed tables (`schema_migrations`, `ar_internal_metadata`) are treated as
+ exempt automatically.
+
+ #### Per-table `scope_column` override
+
+ scope-column mode assumes a single shared **value** space — the same `--ids` apply
+ to every scoped table. If a table stores that same value under a differently named
+ column, override the column name for that table:
+
+ ```json
+ {
+ "name": "legacy_orders",
+ "primary_key": "id",
+ "scope_column": "legacy_tenant_id",
+ "columns": [{ "name": "id" }, { "name": "legacy_tenant_id" }]
+ }
+ ```
+
+ Both `scope_exempt` and `scope_column` are user-maintained and preserved across
+ `schema:generate` regeneration (the generators never emit them).
+
+ ### Config file (`exwiw.yml`)
+
+ Options you would otherwise repeat on every run can be kept in a YAML config file. Pass it with `--config=PATH`; when `--config` is omitted, exwiw automatically loads `exwiw.yml` (or `exwiw.yaml`) from the current directory if present.
+
+ **Options passed on the CLI always take precedence over the config file** — the config only fills in options you did not pass. This lets you commit the stable settings (which schema to read, output format, ...) while still varying the environment-specific connection details per invocation.
+
+ ```yaml
+ # exwiw.yml — keep at the project root, alongside exwiw/schema/
+ adapter: postgresql
+ schema_dir: exwiw/schema
+ output_dir: dump
+ output_format: insert # insert | copy
+ insert_only: false
+ after_insert_hook: hooks/seed.rb
+ log_level: info # debug | info
+ # target_table / ids / ids_field / ids_column / scope_column may also be set here
+ ```
+
+ With the file above, only the connection details need to be supplied on the CLI:
+
+ ```bash
+ DATABASE_PASSWORD=... exwiw \
+ --host=localhost --port=5432 --user=reader --database=app_production \
+ --target-table=shops --ids=1
+ ```
+
+ Notes:
+
+ - **Database connection settings stay on the CLI/environment.** `host`, `port`, `user`, `database`, `uri`, and `password` are **rejected** in the config file (exwiw exits with an error). `adapter` is the one connection-related key that *is* allowed in the file.
+ - **Relative paths in the config (`schema_dir`, `output_dir`, `after_insert_hook`) are resolved relative to the config file's own directory**, not the current working directory. So with the config at the project root, `schema_dir: exwiw/schema` reads naturally, and an absolute `--config=/path/to/exwiw.yml` works no matter where you run from. (CLI path flags remain relative to the current directory — each source resolves relative to where it is written.) Absolute paths are used as-is.
+ - Unknown keys are rejected so a typo surfaces immediately.
+ - Export-only keys (`output_dir`, `output_format`, `insert_only`, `after_insert_hook`) are ignored when running `explain`, so a single config file can be shared by both subcommands.
+
  ### Generator

  The config generator is provided as a Rake task.

  ```bash
- # generate table schema under exwiw/
+ # generate table schema under exwiw/schema/
  bundle exec rake exwiw:schema:generate
  ```

- By default, the schema files will be saved in the `exwiw` directory. You can specify a different output directory by setting the `OUTPUT_DIR_PATH` environment variable:
+ By default, the schema files will be saved in the `exwiw/schema` directory. You can specify a different output directory by setting the `EXWIW_SCHEMA_DIR_PATH` environment variable:

  ```sh
- OUTPUT_DIR_PATH=custom_directory bundle exec rake exwiw:schema:generate
+ EXWIW_SCHEMA_DIR_PATH=custom_directory bundle exec rake exwiw:schema:generate
  ```

  #### Tidying stale config (`schema:tidy`)
@@ -159,14 +274,14 @@ bundle exec rake exwiw:schema:tidy

  Because it reads the database directly, a table that still exists in the database but has lost (or never had) an ActiveRecord model is **kept** — only a table that is genuinely gone is removed. (This is the deliberate counterpart to `generate`, which is model-driven and only ever adds what the models know about.)

- It respects `OUTPUT_DIR_PATH` and the per-database subdirectory layout in the same way as `schema:generate`. Unlike `generate`, `tidy` never adds or regenerates entries — every surviving table/column (including hand-edited `comment` / `ignore` / `replace_with`) is left untouched, so it is safe to run on a customized config. The task prints which tables and columns it removed (or that the config was already tidy). Stale `belongs_tos` are not pruned by `tidy`; rerun `schema:generate` to refresh those.
+ It respects `EXWIW_SCHEMA_DIR_PATH` and the per-database subdirectory layout in the same way as `schema:generate`. Unlike `generate`, `tidy` never adds or regenerates entries — every surviving table/column (including hand-edited `comment` / `ignore` / `replace_with`) is left untouched, so it is safe to run on a customized config. The task prints which tables and columns it removed (or that the config was already tidy). Stale `belongs_tos` are not pruned by `tidy`; rerun `schema:generate` to refresh those.

  #### Multiple databases

  If the application uses Rails' multiple-database support (`connects_to`), `schema:generate` buckets models by the database they connect to and writes each database's config files into its own subdirectory of the output directory, named after the database config name (`primary`, `analytics`, ...):

  ```
- exwiw/
+ exwiw/schema/
  primary/
  shops.json
  users.json
@@ -267,7 +382,7 @@ This is an example of the one table schema:
  }
  ```

- `--config-dir` will use all json files in the specified directory.
+ `--schema-dir` will use all json files in the specified directory.

  ### Output format

@@ -307,7 +422,7 @@ SQL

  **Shell hook**: anything other than `.rb` is exec'd as a child process. It is a pure side-effect hook — exwiw does not capture its stdout. The hook receives these env vars and inherits `DATABASE_PASSWORD` from the parent:

- - `EXWIW_OUTPUT_DIR`, `EXWIW_CONFIG_DIR`
+ - `EXWIW_OUTPUT_DIR`, `EXWIW_SCHEMA_DIR`
  - `EXWIW_DATABASE_ADAPTER`, `EXWIW_DATABASE_HOST`, `EXWIW_DATABASE_PORT`, `EXWIW_DATABASE_USER`, `EXWIW_DATABASE_NAME`
  - `EXWIW_TARGET_TABLE`, `EXWIW_IDS` (comma-separated), `EXWIW_OUTPUT_FORMAT`
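The README's scope-column resolution rules come down to one decision per table. The Ruby below sketches that decision under stated assumptions; it is not exwiw's implementation. The `Table` struct, `plan_scope`, and the breadth-first `belongs_to` walk are all hypothetical. The sketch covers four cases: the direct filter, the join to the nearest scoped ancestor, `scope_exempt` and Rails-managed tables, and the abort. It deliberately omits the single-hop hub subquery and `ignore: true`.

```ruby
# Hypothetical sketch of the per-table decision in scope-column mode.
# Not exwiw's implementation: the hub/subquery case and `ignore: true` are omitted.
Table = Struct.new(:name, :columns, :belongs_to, :scope_exempt, :scope_column, keyword_init: true)

RAILS_MANAGED = %w[schema_migrations ar_internal_metadata].freeze

# A per-table `scope_column` override replaces the --scope-column name.
def column_for(table, default_column)
  table.scope_column || default_column
end

def carries_scope?(table, default_column)
  table.columns.include?(column_for(table, default_column))
end

# Breadth-first walk up belongs_to; the first hit is the nearest scoped ancestor.
def nearest_scoped_ancestor(table, by_name, default_column)
  queue = table.belongs_to.dup
  seen = {}
  until queue.empty?
    parent = by_name[queue.shift]
    next if parent.nil? || seen[parent.name]
    seen[parent.name] = true
    return parent if carries_scope?(parent, default_column)
    queue.concat(parent.belongs_to)
  end
  nil
end

def plan_scope(tables, default_column)
  by_name = tables.to_h { |t| [t.name, t] }
  plan = tables.to_h do |t|
    decision =
      if carries_scope?(t, default_column)
        [:filter, column_for(t, default_column)] # WHERE col IN (ids)
      elsif (ancestor = nearest_scoped_ancestor(t, by_name, default_column))
        [:join, ancestor.name]                   # join up, filter there
      elsif t.scope_exempt || RAILS_MANAGED.include?(t.name)
        [:full]                                  # intentional full dump
      else
        [:unscopable]
      end
    [t.name, decision]
  end
  offenders = plan.select { |_, decision| decision == [:unscopable] }.keys
  raise ArgumentError, "cannot scope: #{offenders.join(', ')}" unless offenders.empty?
  plan
end

tables = [
  Table.new(name: "shops", columns: %w[id tenant_id], belongs_to: []),
  Table.new(name: "orders", columns: %w[id shop_id], belongs_to: ["shops"]),
  Table.new(name: "line_items", columns: %w[id order_id], belongs_to: ["orders"]),
  Table.new(name: "legacy_orders", columns: %w[id legacy_tenant_id], belongs_to: [],
            scope_column: "legacy_tenant_id"),
  Table.new(name: "countries", columns: %w[id code], belongs_to: [], scope_exempt: true),
  Table.new(name: "schema_migrations", columns: %w[version], belongs_to: []),
]
plan = plan_scope(tables, "tenant_id")
plan.each { |name, decision| puts "#{name}: #{decision.inspect}" }
```

`plan_scope` collects every offender before it raises. That matches the README's description of aborting with a list of the offending tables, rather than stopping at the first one.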
 
@@ -31,13 +31,14 @@ module Exwiw
  def self.run_shell(path:, cli_options:, output_dir:, logger:)
  env = {
  'EXWIW_OUTPUT_DIR' => output_dir,
- 'EXWIW_CONFIG_DIR' => cli_options[:config_dir].to_s,
+ 'EXWIW_SCHEMA_DIR' => cli_options[:schema_dir].to_s,
  'EXWIW_DATABASE_ADAPTER' => cli_options[:database_adapter].to_s,
  'EXWIW_DATABASE_HOST' => cli_options[:database_host].to_s,
  'EXWIW_DATABASE_PORT' => cli_options[:database_port].to_s,
  'EXWIW_DATABASE_USER' => cli_options[:database_user].to_s,
  'EXWIW_DATABASE_NAME' => cli_options[:database_name].to_s,
  'EXWIW_TARGET_TABLE' => cli_options[:target_table].to_s,
+ 'EXWIW_SCOPE_COLUMN' => cli_options[:scope_column].to_s,
  'EXWIW_IDS' => Array(cli_options[:ids]).join(','),
  'EXWIW_OUTPUT_FORMAT' => cli_options[:output_format].to_s,
  }
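The shell-hook `env` hash above also gains `EXWIW_SCOPE_COLUMN`, which the README's env-var list does not include. A shell hook does not have to be a shell script. Per the README, any file without the `.rb` extension is exec'd, so an executable script with a Ruby shebang works as well. The sketch below is hypothetical: the `hooks/summarize` name and the `hook_context` helper are made up for illustration. It reads only variables that appear in the `env` hash. Because unset options arrive as empty strings (the hash calls `.to_s`), the script checks for emptiness rather than `nil`.

```ruby
#!/usr/bin/env ruby
# Hypothetical hook, e.g. saved as hooks/summarize (no .rb extension, chmod +x),
# so exwiw exec's it as a shell hook instead of loading it as a Ruby hook.

# Collects the EXWIW_* variables from the run_shell env hash into a Hash.
def hook_context(env)
  {
    output_dir: env.fetch("EXWIW_OUTPUT_DIR", ""),
    schema_dir: env.fetch("EXWIW_SCHEMA_DIR", ""), # EXWIW_CONFIG_DIR before 0.5.0
    adapter: env.fetch("EXWIW_DATABASE_ADAPTER", ""),
    target_table: env.fetch("EXWIW_TARGET_TABLE", ""),
    scope_column: env.fetch("EXWIW_SCOPE_COLUMN", ""),
    ids: env.fetch("EXWIW_IDS", "").split(","),
  }
end

if $PROGRAM_NAME == __FILE__
  ctx = hook_context(ENV)
  mode = ctx[:scope_column].empty? ? "target table #{ctx[:target_table]}" : "scope column #{ctx[:scope_column]}"
  # exwiw does not capture this output; it only reaches the terminal.
  puts "exported #{ctx[:ids].size} id(s) by #{mode} into #{ctx[:output_dir]}"
end
```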
data/lib/exwiw/cli.rb CHANGED
@@ -5,6 +5,7 @@ require 'optparse'
  require 'pathname'

  require 'json'
+ require 'yaml'

  require 'exwiw'

@@ -12,6 +13,40 @@ module Exwiw
  class CLI
  KNOWN_SUBCOMMANDS = %w[export explain].freeze

+ # Config file loaded automatically when --config is omitted, if one exists in
+ # the current directory. Kept at the project root (rather than under exwiw/)
+ # so that config-relative paths like `schema_dir: exwiw/schema` read naturally.
+ # Both extensions are accepted; .yml wins when both are present.
+ DEFAULT_CONFIG_PATHS = %w[exwiw.yml exwiw.yaml].freeze
+
+ # Keys accepted in the config file. Anything outside this set is rejected so
+ # a typo surfaces immediately instead of being silently ignored. These mirror
+ # the non-connection CLI options (plus `adapter`).
+ ALLOWED_CONFIG_KEYS = %w[
+ adapter
+ schema_dir
+ output_dir
+ output_format
+ insert_only
+ after_insert_hook
+ log_level
+ target_table
+ target_collection
+ ids
+ ids_field
+ ids_column
+ scope_column
+ ].freeze
+
+ # Database connection settings are environment-specific (and sometimes
+ # secret-adjacent), so they must be passed via CLI/env, never the committed
+ # config file. `adapter` is the one connection-ish key allowed in config.
+ REJECTED_CONNECTION_KEYS = %w[host port user database uri password].freeze
+
+ # Keys that only make sense for `export`. They are skipped when merging config
+ # for `explain` so a shared config file does not trip validate_explain_only!.
+ EXPORT_ONLY_CONFIG_KEYS = %w[output_dir output_format insert_only after_insert_hook].freeze
+
  def self.start(argv)
  new(argv).run
  end
@@ -34,7 +69,8 @@ module Exwiw
  @database_password = ENV["DATABASE_PASSWORD"]
  @connection_uri = nil
  @output_dir = nil
- @config_dir = nil
+ @schema_dir = nil
+ @config_file_path = nil
  @database_adapter = nil
  @database_name = nil
  @target_table_name = nil
@@ -42,10 +78,13 @@ module Exwiw
  @ids = []
  @ids_field = nil
  @ids_column = nil
+ @scope_column = nil
  @output_format = nil
  @insert_only = nil
  @after_insert_hook_path = nil
- @log_level = :info
+ # nil (not :info) so we can tell "user passed --log-level" from the default,
+ # letting a config-file value fill in; the :info default is applied later.
+ @log_level = nil

  parser.parse!(@argv)
  end
@@ -72,6 +111,7 @@ module Exwiw
  table_name: @target_table_name,
  ids: @ids,
  ids_field: @ids_field,
+ scope_column: @scope_column,
  )

  logger = build_logger
@@ -82,7 +122,7 @@ module Exwiw
  Runner.new(
  connection_config: connection_config,
  output_dir: @output_dir,
- config_dir: @config_dir,
+ schema_dir: @schema_dir,
  dump_target: dump_target,
  output_format: @output_format,
  insert_only: @insert_only,
@@ -93,7 +133,7 @@ module Exwiw
  when "explain"
  ExplainRunner.new(
  connection_config: connection_config,
- config_dir: @config_dir,
+ schema_dir: @schema_dir,
  dump_target: dump_target,
  logger: logger,
  io: $stdout,
@@ -102,6 +142,14 @@ module Exwiw
  end

  private def validate_options!
+ # Fill in any options not given on the CLI from the config file. Done first
+ # so a config-provided `adapter` is in place before normalization below.
+ # CLI values always win (the merge only fills nil/empty ivars).
+ apply_config_file!
+
+ # Default log level once CLI and config have both had their say.
+ @log_level ||= :info
+
  # Fold driver/Rails adapter spellings (mysql2, sqlite3) into exwiw's
  # canonical names up front, so every check below — and the
  # EXWIW_DATABASE_ADAPTER passed to hooks — sees the canonical name.
@@ -116,6 +164,7 @@ module Exwiw
  end

  resolve_target_collection_alias!
+ resolve_scope_column!
  resolve_ids_column_alias!
  resolve_uri_option!

@@ -163,18 +212,18 @@ module Exwiw
  end
  end

- if @config_dir.nil?
- $stderr.puts "Config dir is required"
+ if @schema_dir.nil?
+ $stderr.puts "Schema dir is required (pass --schema-dir or set schema_dir in the config file)"
  exit 1
  end

- unless Dir.exist?(@config_dir)
- $stderr.puts "Config dir does not exist: #{@config_dir}"
+ unless Dir.exist?(@schema_dir)
+ $stderr.puts "Schema dir does not exist: #{@schema_dir}"
  exit 1
  end

- if Dir.glob(File.join(@config_dir, "*.json")).empty?
- $stderr.puts "Config dir contains no .json files: #{@config_dir}"
+ if Dir.glob(File.join(@schema_dir, "*.json")).empty?
+ $stderr.puts "Schema dir contains no .json files: #{@schema_dir}"
  exit 1
  end

@@ -183,8 +232,13 @@ module Exwiw
  exit 1
  end

- if !@target_table_name && @ids.any?
- $stderr.puts "--target-table is required when --ids is specified"
+ if @scope_column && @ids.empty?
+ $stderr.puts "--ids is required when --scope-column is specified"
+ exit 1
+ end
+
+ if !@target_table_name && !@scope_column && @ids.any?
+ $stderr.puts "--target-table or --scope-column is required when --ids is specified"
  exit 1
  end

@@ -202,6 +256,79 @@ module Exwiw
  end
  end

+ # Merge settings from the config file (YAML) into any options the user did
+ # not pass on the CLI. The CLI always wins: every assignment below only fills
+ # an ivar that is still nil/empty after parsing ARGV. Connection settings
+ # (except `adapter`) are rejected here — they belong on the CLI/env.
+ private def apply_config_file!
+ path =
+ if @config_file_path
+ unless File.file?(@config_file_path)
+ $stderr.puts "Config file not found: #{@config_file_path}"
+ exit 1
+ end
+ @config_file_path
+ else
+ DEFAULT_CONFIG_PATHS.map { |p| File.expand_path(p) }.find { |p| File.file?(p) }
+ end
+ return if path.nil?
+
+ # Paths inside the config file are resolved relative to the file's own
+ # directory (not cwd), so `schema_dir: exwiw/schema` reads naturally with the
+ # config kept at the project root, and an absolute --config works from any
+ # cwd. (CLI path flags stay cwd-relative — each source resolves relative to
+ # where it is written.) `path` is always absolute here.
+ base = File.dirname(path)
+
+ config = YAML.safe_load(File.read(path)) || {}
+ unless config.is_a?(Hash)
+ $stderr.puts "Config file must be a YAML mapping (key: value): #{path}"
+ exit 1
+ end
+
+ config.each_key do |key|
+ if REJECTED_CONNECTION_KEYS.include?(key)
+ $stderr.puts "'#{key}' is a database connection setting and must be passed via the CLI/environment, not the config file (#{path})"
+ exit 1
+ end
+ unless ALLOWED_CONFIG_KEYS.include?(key)
+ $stderr.puts "Unknown config key '#{key}' in #{path}. Allowed keys: #{ALLOWED_CONFIG_KEYS.join(', ')}"
+ exit 1
+ end
+ end
+
+ # For `explain`, drop export-only keys so a config shared with `export`
+ # does not make validate_explain_only! reject the run.
+ config = config.reject { |k, _| EXPORT_ONLY_CONFIG_KEYS.include?(k) } if @subcommand == "explain"
+
+ @database_adapter ||= config["adapter"]
+ @schema_dir ||= expand_dir(config["schema_dir"], base)
+ @output_dir ||= expand_dir(config["output_dir"], base)
+ @after_insert_hook_path ||= (File.expand_path(config["after_insert_hook"], base) if config["after_insert_hook"])
+ @output_format ||= config["output_format"]
+ @insert_only = config["insert_only"] if @insert_only.nil? && config.key?("insert_only")
+ @log_level ||= config["log_level"]&.to_sym
+ @target_table_name ||= config["target_table"]
+ @target_collection_name ||= config["target_collection"]
+ if @ids.empty? && config.key?("ids")
+ raw = config["ids"]
+ # Accept either a YAML list or a "1,2" string; coerce to strings to match
+ # the CLI's `--ids=1,2` -> ["1", "2"] shape.
+ @ids = (raw.is_a?(String) ? raw.split(",") : Array(raw)).map(&:to_s)
+ end
+ @ids_field ||= config["ids_field"]
+ @ids_column ||= config["ids_column"]
+ @scope_column ||= config["scope_column"]
+ end
+
+ # Strip a trailing slash (like the CLI's dir options) and expand relative to
+ # `base` (the config file's directory). Returns nil for a nil value.
+ private def expand_dir(value, base)
+ return nil if value.nil?
+ value = value.end_with?("/") ? value[0..-2] : value
+ File.expand_path(value, base)
+ end
+
  # `--target-collection` is a mongodb-only alias of `--target-table`. Fold it
  # into @target_table_name (the single field the rest of the CLI/runner uses)
  # after rejecting the misuses: combining it with --target-table, or using it
@@ -259,6 +386,33 @@ module Exwiw
  end
  end

+ # `--scope-column` switches to scope-column mode: every table is filtered by a
+ # shared column (`--ids` are its values) instead of anchoring on one
+ # `--target-table`. It is SQL-only and mutually exclusive with the single-target
+ # flags. Runs after resolve_target_collection_alias! (so --target-collection is
+ # already folded into @target_table_name) and before resolve_ids_column_alias!
+ # so the clearer "cannot combine" message wins over the generic ids-column one.
+ private def resolve_scope_column!
+ return if @scope_column.nil?
+
+ sql_adapters = ["mysql", "postgresql", "sqlite"]
+ unless sql_adapters.include?(@database_adapter)
+ $stderr.puts "--scope-column is only supported by the sql adapters"
+ exit 1
+ end
+
+ if @target_table_name
+ $stderr.puts "--scope-column cannot be combined with --target-table/--target-collection"
+ exit 1
+ end
+
+ if @ids_field || @ids_column
+ flag = @ids_column ? "--ids-column" : "--ids-field"
+ $stderr.puts "--scope-column cannot be combined with #{flag}"
+ exit 1
+ end
+ end
+
  # `--uri` supplies a full connection string (e.g. `mongodb+srv://...`) and is
  # mongodb-only — the SQL adapters shell out to their own client binaries with
  # discrete host/port/user flags and have no equivalent. Runs after the
@@ -319,12 +473,13 @@ module Exwiw
  database_user: @database_user,
  database_password: @database_password,
  output_dir: @output_dir,
- config_dir: @config_dir,
+ schema_dir: @schema_dir,
  database_adapter: @database_adapter,
  database_name: @database_name,
  target_table: @target_table_name,
  ids: @ids.dup.freeze,
  ids_field: @ids_field,
+ scope_column: @scope_column,
  output_format: @output_format,
  insert_only: @insert_only,
  log_level: @log_level,
@@ -368,9 +523,12 @@ module Exwiw
  v = v.end_with?("/") ? v[0..-2] : v
  @output_dir = File.expand_path(v)
  end
- opts.on("-c", "--config-dir=CONFIG_DIR_PATH", "Config dir path.") do |v|
+ opts.on("--schema-dir=SCHEMA_DIR_PATH", "Directory of schema JSON files. (or set schema_dir in the config file)") do |v|
  v = v.end_with?("/") ? v[0..-2] : v
- @config_dir = File.expand_path(v)
+ @schema_dir = File.expand_path(v)
+ end
+ opts.on("-c", "--config=CONFIG_FILE_PATH", "Path to the exwiw config YAML. Defaults to ./#{DEFAULT_CONFIG_PATHS.first} (or .#{File.extname(DEFAULT_CONFIG_PATHS.last)}) when present. CLI options take precedence; paths inside the file are resolved relative to the file.") do |v|
+ @config_file_path = File.expand_path(v)
  end
  opts.on("-a", "--adapter=ADAPTER", "Database adapter: mysql, sqlite, postgresql, mongodb (aliases: mysql2, sqlite3)") { |v| @database_adapter = v }
  opts.on("--uri=URI", "Full MongoDB connection URI (mongodb:// or mongodb+srv://). mongodb adapter only; takes precedence over --host/--port/--user. TLS, replicaSet, authSource and credentials are read from the URI.") { |v| @connection_uri = v }
@@ -380,6 +538,7 @@ module Exwiw
380
538
  opts.on("--ids=[IDS]", "Comma-separated list of identifiers. Required when --target-table is given.") { |v| @ids = v.split(',') }
381
539
  opts.on("--ids-field=[FIELD]", "Field on the target collection that --ids is matched against. Defaults to the primary key. (mongodb adapter only)") { |v| @ids_field = v }
382
540
  opts.on("--ids-column=[COLUMN]", "Column on the target table that --ids is matched against. Defaults to the primary key. (sql adapters only)") { |v| @ids_column = v }
541
+ opts.on("--scope-column=[COLUMN]", "Filter every table by this shared column (--ids are its values) instead of a single --target-table. Tables lacking it are reached via belongs_to. SQL adapters only; mutually exclusive with --target-table.") { |v| @scope_column = v }
383
542
  opts.on("--output-format=[FORMAT]", "Output format: insert (default) or copy (PostgreSQL only, export subcommand only)") { |v| @output_format = v }
384
543
  opts.on("--insert-only", "Do not generate DELETE SQL files (export subcommand only)") { @insert_only = true }
385
544
  opts.on("--after-insert-hook=PATH", "Path to a .rb or .sh post-processing hook executed after all insert/delete files are written (export subcommand only)") do |v|
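The new `--scope-column=[COLUMN]` switch uses OptionParser's optional-argument form (`=[...]`), as `--ids=[IDS]` does. The standalone sketch below shows how the values arrive. The `parse_scope_options` helper and the `tenant_id` column are hypothetical; this is not exwiw's parser. With this form, a bare `--scope-column` or `--ids` hands the block `nil`, so the sketch guards the `split`.

```ruby
require "optparse"

# Hypothetical helper mirroring the two switches above; not exwiw's parser.
def parse_scope_options(argv)
  options = { scope_column: nil, ids: nil }
  parser = OptionParser.new do |opts|
    # "=[COLUMN]" is an optional argument: only "--scope-column=VALUE"
    # supplies a value, and a bare "--scope-column" passes nil to the block.
    opts.on("--scope-column=[COLUMN]") { |v| options[:scope_column] = v }
    # Same optional form for --ids, so guard against nil before splitting.
    opts.on("--ids=[IDS]") { |v| options[:ids] = v&.split(",") }
  end
  parser.parse(argv) # returns the non-option arguments, unused here
  options
end

options = parse_scope_options(["--scope-column=tenant_id", "--ids=1,2"])
puts options[:scope_column] # prints tenant_id
puts options[:ids].inspect  # prints ["1", "2"]
```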
@@ -4,13 +4,13 @@ module Exwiw
4
4
  class ExplainRunner
5
5
  def initialize(
6
6
  connection_config:,
7
- config_dir:,
7
+ schema_dir:,
8
8
  dump_target:,
9
9
  logger:,
10
10
  io: $stdout
11
11
  )
12
12
  @connection_config = connection_config
13
- @config_dir = config_dir
13
+ @schema_dir = schema_dir
14
14
  @dump_target = dump_target
15
15
  @logger = logger
16
16
  @io = io
@@ -26,8 +26,11 @@ module Exwiw
26
26
  target = table_by_name[@dump_target.table_name]
27
27
  adapter.validate_as_dump_target!(target) if target
28
28
 
29
+ dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
30
+ QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
31
+
29
32
  @logger.debug("Determining table processing order...")
30
- ordered_table_names = DetermineTableProcessingOrder.run(configs.select { |c| adapter.dumpable?(c) })
33
+ ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs)
31
34
 
32
35
  total_size = ordered_table_names.size
33
36
  ordered_table_names.each_with_index do |table_name, idx|
@@ -53,7 +56,7 @@ module Exwiw
53
56
  end
54
57
 
55
58
  private def load_table_config(klass)
56
- Dir[File.join(@config_dir, "*.json")].map do |file|
59
+ Dir[File.join(@schema_dir, "*.json")].map do |file|
57
60
  json = JSON.parse(File.read(file))
58
61
  klass.from(json).reject_ignored_members!
59
62
  end
@@ -2,23 +2,58 @@
2
2
 
3
3
  module Exwiw
4
4
  class QueryAstBuilder
5
- def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true)
6
- new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse).run
5
+ def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
6
+ new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, allow_forward: allow_forward).run
7
+ end
8
+
9
+ # Scope-column mode classification for a single table. One of
10
+ # :exempt / :direct / :via_path / :referenced_by / :via_scoped_parent / :unscopable.
11
+ def self.scope_category(table_name, table_by_name, dump_target, logger)
12
+ new(table_name, table_by_name, dump_target, logger).scope_category
13
+ end
14
+
15
+ # Strict pre-flight for scope-column mode: abort if any extractable table
16
+ # cannot be scoped, so an unscoped (potentially sensitive) table is never
17
+ # silently dumped in full. No-op outside scope mode. `tables` is the set of
18
+ # dumpable configs (ignore:true tables are skipped — they are not extracted).
19
+ def self.validate_scope!(tables, table_by_name, dump_target, logger)
20
+ return if dump_target.scope_column.nil?
21
+
22
+ unscopable =
23
+ tables.reject(&:ignore).select do |table|
24
+ scope_category(table.name, table_by_name, dump_target, logger) == :unscopable
25
+ end
26
+ return if unscopable.empty?
27
+
28
+ names = unscopable.map(&:name).sort.join(", ")
29
+ raise ArgumentError,
30
+ "scope-column mode: #{unscopable.size} table(s) cannot be scoped by " \
31
+ "'#{dump_target.scope_column}': #{names}. For each, add `scope_exempt: true` " \
32
+ "to export it in full, set `ignore: true` to skip it, or add a belongs_to path " \
33
+ "to a table that carries the scope column (use a per-table `scope_column` if the " \
34
+ "column name differs on that table)."
7
35
  end
8
36
 
9
37
  attr_reader :table_name, :table_by_name, :dump_target
10
38
 
11
- def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true)
39
+ def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
12
40
  @table_name = table_name
13
41
  @table_by_name = table_by_name
14
42
  @dump_target = dump_target
15
43
  @logger = logger
16
44
  @allow_reverse = allow_reverse
45
+ # @allow_forward gates the "scope via an indirectly-scoped belongs_to
46
+ # parent" rescue (build_belongs_to_scoped_clause). Disabled while building a
47
+ # parent/child subquery so a single forward hop never recurses into another
48
+ # (which could loop on a belongs_to cycle).
49
+ @allow_forward = allow_forward
17
50
  end
18
51
 
19
52
  def run
20
53
  table = table_by_name.fetch(table_name)
21
54
 
55
+ return build_scoped(table) if scope_mode?
56
+
22
57
  where_clauses = build_where_clauses(table, dump_target)
23
58
  join_clauses = build_join_clauses(table, table_by_name, dump_target)
24
59
 
@@ -130,8 +165,10 @@ module Exwiw
130
165
  next if relation.nil? || relation.polymorphic?
131
166
 
132
167
  # Build the child's own extraction query. allow_reverse:false stops a
133
- # chain of FK-less tables from recursing back into each other.
134
- child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false)
168
+ # chain of FK-less tables from recursing back into each other;
169
+ # allow_forward:false stops the child from forward-scoping back through
170
+ # this very table (which would loop).
171
+ child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
135
172
 
136
173
  # Only an *already constrained* child narrows anything; an unconstrained
137
174
  # child would select every fk value (i.e. dump all) and not help.
@@ -169,6 +206,64 @@ module Exwiw
169
206
  )
170
207
  end
171
208
 
209
+ # Scope-column mode. Builds a `fk IN (SELECT parent.pk FROM <parent
210
+ # extraction query>)` clause for a table whose belongs_to parent is itself
211
+ # scopable but carries no scope column of its own — so find_path_to_scoped
212
+ # cannot terminate on it (via_path fails) and nothing references this table
213
+ # (referenced_by fails). The classic shape is a hub scoped only via
214
+ # referenced_by (e.g. CDP `customer_accounts`, scoped by the `customers` that
215
+ # reference it) with sibling detail tables (`customer_account_details`, ...)
216
+ # hanging off it. Constraining those siblings to the hub's in-scope ids keeps
217
+ # them out of a full dump. Returns nil when there is no single, unambiguous
218
+ # scopable parent, leaving the caller on the unscopable path.
219
+ private def build_belongs_to_scoped_clause(table)
220
+ candidates = table.belongs_tos.filter_map do |relation|
221
+ # A polymorphic belongs_to points at several parent tables through one
222
+ # column, so it cannot project to a single parent id set; skip it.
223
+ next if relation.polymorphic?
224
+
225
+ parent = table_by_name[relation.table_name]
226
+ next if parent.nil?
227
+
228
+ # Build the parent's own scoped query. allow_reverse stays true so the
229
+ # parent may be scoped via referenced_by; allow_forward:false bounds this
230
+ # to a single forward hop so a belongs_to cycle cannot loop.
231
+ parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, allow_forward: false)
232
+
233
+ # Only a constrained parent narrows anything; an unconstrained parent
234
+ # would select every pk (i.e. dump all) and not help.
235
+ next unless parent_query.where_clauses.any? || parent_query.join_clauses.any?
236
+
237
+ [relation, parent, parent_query]
238
+ end
239
+
240
+ # Only the unambiguous single-parent case. Multiple scopable parents would
241
+ # need their subqueries combined (not supported); fall back to unscopable.
242
+ if candidates.size != 1
243
+ if candidates.size > 1
244
+ @logger.debug(" #{table.name} has multiple scopable parents; skipping forward scope (unscopable).")
245
+ end
246
+ return nil
247
+ end
248
+
249
+ relation, parent, parent_query = candidates.first
250
+
251
+ # Project the parent's extraction query down to just its primary key — the
252
+ # column this table's foreign key points at.
253
+ pk_column = TableColumn.from_symbol_keys(name: parent.primary_key)
254
+ projected = QueryAst::Select.new
255
+ projected.from(parent_query.from_table_name)
256
+ projected.select([pk_column])
257
+ parent_query.join_clauses.each { |j| projected.join(j) }
258
+ parent_query.where_clauses.each { |w| projected.where(w) }
259
+
260
+ QueryAst::WhereClause.new(
261
+ column_name: relation.foreign_key,
262
+ operator: :in_subquery,
263
+ value: QueryAst::SelectSubquery.new(query: projected)
264
+ )
265
+ end
266
+
172
267
  private def build_where_clauses(table, dump_target)
173
268
  clauses = []
174
269
 
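In scope-column mode, `build_belongs_to_scoped_clause` projects the parent's own scoped query down to its primary key. It then wraps that query in an `IN` subquery on the child's foreign key. The minimal sketch below renders the resulting SQL shape as a plain string, with these assumptions:

- `render_in_subquery` is a hypothetical helper. exwiw builds a `QueryAst::WhereClause` with `operator: :in_subquery` and compiles it per adapter.
- Joins are omitted.
- `tenant_id` stands in for the `--scope-column` value.
- The hub's own filter is an assumed rendering of the `referenced_by` subquery.

```ruby
# Hypothetical string renderer for the "fk IN (SELECT parent.pk FROM ...)"
# shape. exwiw itself builds QueryAst objects; this only shows the SQL form.
def render_in_subquery(child_table:, foreign_key:, parent_table:, parent_pk:, parent_wheres:)
  # An unconstrained parent would select every primary key (a full dump),
  # so refuse it, like the "only a constrained parent" check.
  raise ArgumentError, "parent query is unconstrained" if parent_wheres.empty?

  subquery = "SELECT #{parent_table}.#{parent_pk} FROM #{parent_table} " \
             "WHERE #{parent_wheres.join(' AND ')}"
  "#{child_table}.#{foreign_key} IN (#{subquery})"
end

# Hub table scoped only because customers reference it (assumed shape;
# tenant_id is hypothetical and ids are inlined purely for readability).
hub_filter = "customer_accounts.id IN (SELECT customers.customer_account_id " \
             "FROM customers WHERE customers.tenant_id IN (1, 2))"

puts render_in_subquery(
  child_table: "customer_account_details",
  foreign_key: "customer_account_id",
  parent_table: "customer_accounts",
  parent_pk: "id",
  parent_wheres: [hub_filter]
)
```

Because the parent query is built with `allow_forward: false`, it cannot take a second forward hop of its own. A `belongs_to` cycle therefore stops after one step.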
@@ -264,5 +359,208 @@ module Exwiw
264
359
 
265
360
  queue
266
361
  end
362
+
363
+ # ------------------------------------------------------------------
364
+ # Scope-column mode (Exwiw::DumpTarget#scope_column).
365
+ #
366
+ # The single-target machinery above anchors everything on one named table.
367
+ # Scope mode instead filters every table by a shared column. The relationship
368
+ # walk is the same idea — the *terminus* is just "any table carrying the
369
+ # scope column" rather than "the one named target".
370
+ # ------------------------------------------------------------------
371
+
372
+ private def scope_mode?
373
+ !dump_target.scope_column.nil?
374
+ end
375
+
376
+ # Classifier used by validate_scope! and mirrored by build_scoped below.
377
+ def scope_category
378
+ table = table_by_name.fetch(table_name)
379
+ return :exempt if scope_exempt?(table)
380
+ return :direct if directly_scoped?(table)
381
+ return :via_path if build_join_clauses_scoped(table).any?
382
+ return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
383
+ return :via_scoped_parent if @allow_forward && build_belongs_to_scoped_clause(table)
384
+
385
+ :unscopable
386
+ end
387
+
388
+ private def build_scoped(table)
389
+ ast = QueryAst::Select.new
390
+ ast.from(table.name)
391
+ if table.rails_managed?
392
+ ast.select_all!
393
+ else
394
+ ast.select(table.columns)
395
+ end
396
+
397
+ # Reference/master (or rails-managed) table: export every row.
398
+ return ast if scope_exempt?(table)
399
+
400
+ # Carries the scope column itself: filter on it directly.
401
+ if directly_scoped?(table)
402
+ ast.where(scope_where_clause(table))
403
+ ast.where(table.filter) if table.filter
404
+ return ast
405
+ end
406
+
407
+ # Reachable via belongs_to: join up to the scoped ancestor (the scope
408
+ # filter is applied at the terminal join inside build_join_clauses_scoped).
409
+ join_clauses = build_join_clauses_scoped(table)
410
+ unless join_clauses.empty?
411
+ join_clauses.each { |join_clause| ast.join(join_clause) }
412
+ ast.where(table.filter) if table.filter
413
+ return ast
414
+ end
415
+
416
+ if @allow_reverse
417
+ # Referenced by an extractable (scoped) child: constrain via subquery.
418
+ reverse_clause = build_referenced_by_clause(table)
419
+ if reverse_clause
420
+ ast.where(reverse_clause)
421
+ return ast
422
+ end
423
+ end
424
+
425
+ if @allow_forward
426
+ # Belongs_to a parent that is itself scoped but carries no scope column of
427
+ # its own (so via_path cannot terminate on it) — e.g. a hub table scoped
428
+ # only via referenced_by. Constrain this table to that parent's in-scope
429
+ # ids so its rows ride along instead of being dumped in full.
430
+ parent_clause = build_belongs_to_scoped_clause(table)
431
+ if parent_clause
432
+ ast.where(parent_clause)
433
+ return ast
434
+ end
435
+ end
436
+
437
+ # Only the genuine top-level build (no rescue disabled) is allowed to fail
438
+ # hard. The Runner/ExplainRunner pre-flight (validate_scope!) rejects
439
+ # unscopable tables before extraction, so a top-level build never
440
+ # legitimately lands here; if it does, raise rather than emit an unfiltered
441
+ # (potential full PII) dump.
442
+ if @allow_reverse && @allow_forward
443
+ raise ArgumentError, scope_unscopable_message(table)
444
+ end
445
+
446
+ # Unscopable during a reverse/forward subquery build (a rescue is disabled):
447
+ # return the unconstrained AST so the caller's "constrained only" check
448
+ # filters this candidate out (it never becomes a real dump query).
449
+ ast
450
+ end
451
+
452
+ # The shared column this table is filtered on: a per-table `scope_column`
453
+ # override when present, otherwise the global `--scope-column`.
454
+ private def resolved_scope_column(table)
455
+ table.scope_column || dump_target.scope_column
456
+ end
457
+
458
+ private def scope_exempt?(table)
459
+ table.scope_exempt || table.rails_managed?
460
+ end
461
+
462
+ private def directly_scoped?(table)
463
+ column = resolved_scope_column(table)
464
+ table.columns.any? { |c| c.name == column }
465
+ end
466
+
467
+ private def scope_where_clause(table)
468
+ Exwiw::QueryAst::WhereClause.new(
469
+ column_name: resolved_scope_column(table),
470
+ operator: :eq,
471
+ value: dump_target.ids
472
+ )
473
+ end
474
+
475
+ # BFS over belongs_tos to the nearest *directly scoped* ancestor. Unlike the
476
+ # target-mode walk, the returned path INCLUDES that ancestor: the scope column
477
+ # lives on the ancestor itself (not on a foreign key of the child), so the
478
+ # ancestor must be joined and then filtered.
479
+ private def find_path_to_scoped(table)
480
+ visited = {}
481
+ queue = [[table.name, [table.name]]]
482
+
483
+ until queue.empty?
484
+ current_table_name, path = queue.shift
485
+ next if visited[current_table_name]
486
+ visited[current_table_name] = true
487
+
488
+ current_table = table_by_name[current_table_name]
489
+ next if current_table.nil?
490
+
491
+ current_table.belongs_tos.each do |relation|
492
+ next_table_name = relation.table_name
493
+ next_table = table_by_name[next_table_name]
494
+ next if next_table.nil?
495
+
496
+ next_path = path + [next_table_name]
497
+ return next_path if directly_scoped?(next_table)
498
+
499
+ queue.push([next_table_name, next_path])
500
+ end
501
+ end
502
+
503
+ []
504
+ end
505
+
506
+ private def build_join_clauses_scoped(table)
507
+ path_tables = find_path_to_scoped(table)
508
+ @logger.debug(" Join path from #{table.name} to a scoped table: #{path_tables}")
509
+
510
+ return [] if path_tables.size < 2
511
+
512
+ path_tables.each_cons(2).map do |from_table_name, to_table_name|
513
+ from_table = table_by_name[from_table_name]
514
+ to_table = table_by_name[to_table_name]
515
+
516
+ join_clause = build_scoped_join_clause(from_table, to_table)
517
+
518
+ # Only the final hop's to_table is directly scoped (the BFS stops there),
519
+ # so the scope filter rides on that join's where_clauses, compiled against
520
+ # join_table_name = the scoped ancestor.
521
+ if directly_scoped?(to_table)
522
+ join_clause.where_clauses.push scope_where_clause(to_table)
523
+ end
524
+
525
+ if to_table.filter
526
+ join_clause.where_clauses.push to_table.filter
527
+ end
528
+
529
+ join_clause
530
+ end
531
+ end
532
+
533
+ # One belongs_to hop as a JoinClause, with the polymorphic type condition
534
+ # placed on the source table (base_where_clauses) when the hop is polymorphic
535
+ # — mirroring the target-mode loop in build_join_clauses.
536
+ private def build_scoped_join_clause(from_table, to_table)
537
+ relation = from_table.belongs_to(to_table.name)
538
+
539
+ join_clause = QueryAst::JoinClause.new(
540
+ base_table_name: from_table.name,
541
+ foreign_key: relation.foreign_key,
542
+ join_table_name: to_table.name,
543
+ primary_key: to_table.primary_key,
544
+ where_clauses: [],
545
+ base_where_clauses: []
546
+ )
547
+
548
+ if relation.polymorphic?
549
+ join_clause.base_where_clauses.push QueryAst::WhereClause.new(
550
+ column_name: relation.foreign_type,
551
+ operator: :eq,
552
+ value: [relation.type_value]
553
+ )
554
+ end
555
+
556
+ join_clause
557
+ end
558
+
559
+ private def scope_unscopable_message(table)
560
+ "Table '#{table.name}' cannot be scoped in scope-column mode: it has no " \
561
+ "'#{dump_target.scope_column}' column (nor a per-table scope_column override) and no " \
562
+ "belongs_to path to a table that does. Add `scope_exempt: true` to export it in full, " \
563
+ "set `ignore: true` to skip it, or add the missing belongs_to."
564
+ end
267
565
  end
268
566
  end
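`scope_category` classifies a table in this order: exempt, direct, via a `belongs_to` path, then the `referenced_by` and forward-hop rescues. That order, and the breadth-first walk in `find_path_to_scoped`, can be sketched on a toy schema. The `Table` struct, the table names, and `tenant_id` are hypothetical. The sketch covers only the first three categories and does not model rails-managed tables.

```ruby
# Hypothetical toy schema; exwiw reads table configs from schema JSON files.
Table = Struct.new(:name, :columns, :belongs_to, :exempt, keyword_init: true)

SCOPE_COLUMN = "tenant_id" # stands in for --scope-column

def directly_scoped?(table)
  table.columns.include?(SCOPE_COLUMN)
end

# Breadth-first search over belongs_to edges. The returned path INCLUDES the
# nearest directly scoped ancestor (it must be joined and then filtered);
# [] means no such ancestor is reachable.
def path_to_scoped(start, by_name)
  visited = {}
  queue = [[start.name, [start.name]]]
  until queue.empty?
    name, path = queue.shift
    next if visited[name]
    visited[name] = true

    table = by_name[name]
    next if table.nil?

    table.belongs_to.each do |parent_name|
      parent = by_name[parent_name]
      next if parent.nil?

      next_path = path + [parent_name]
      return next_path if directly_scoped?(parent)

      queue.push([parent_name, next_path])
    end
  end
  []
end

# First three steps of the classification; the real builder then tries the
# referenced_by and single forward-hop rescues before giving up.
def category(table, by_name)
  return :exempt if table.exempt
  return :direct if directly_scoped?(table)
  return :via_path if path_to_scoped(table, by_name).size >= 2

  :unscopable
end

tables = [
  Table.new(name: "shops", columns: %w[id tenant_id], belongs_to: [], exempt: false),
  Table.new(name: "orders", columns: %w[id shop_id], belongs_to: %w[shops], exempt: false),
  Table.new(name: "line_items", columns: %w[id order_id], belongs_to: %w[orders], exempt: false),
  Table.new(name: "countries", columns: %w[id name], belongs_to: [], exempt: true),
  Table.new(name: "audit_logs", columns: %w[id message], belongs_to: [], exempt: false)
]
by_name = tables.to_h { |t| [t.name, t] }

tables.each { |t| puts "#{t.name}: #{category(t, by_name)}" }
# shops: direct
# orders: via_path
# line_items: via_path
# countries: exempt
# audit_logs: unscopable
```

The `visited` set is what keeps a `belongs_to` cycle from looping. The first directly scoped ancestor found is the nearest one, because breadth-first order explores shorter paths first.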
data/lib/exwiw/runner.rb CHANGED
@@ -7,7 +7,7 @@ module Exwiw
7
7
  def initialize(
8
8
  connection_config:,
9
9
  output_dir:,
10
- config_dir:,
10
+ schema_dir:,
11
11
  dump_target:,
12
12
  logger:,
13
13
  output_format: 'insert',
@@ -17,7 +17,7 @@ module Exwiw
17
17
  )
18
18
  @connection_config = connection_config
19
19
  @output_dir = output_dir
20
- @config_dir = config_dir
20
+ @schema_dir = schema_dir
21
21
  @dump_target = dump_target
22
22
  @output_format = output_format
23
23
  @insert_only = insert_only
@@ -38,8 +38,13 @@ module Exwiw
38
38
  target = table_by_name[@dump_target.table_name]
39
39
  adapter.validate_as_dump_target!(target) if target
40
40
 
41
+ dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
42
+ # Scope-column mode: abort if any extractable table cannot be scoped (no-op
43
+ # otherwise). Done before extraction so nothing is dumped if it would leak.
44
+ QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
45
+
41
46
  @logger.info("Determining table processing order...")
42
- ordered_table_names = DetermineTableProcessingOrder.run(configs.select { |c| adapter.dumpable?(c) })
47
+ ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs)
43
48
 
44
49
  clean_output_dir!
45
50
 
@@ -159,7 +164,7 @@ module Exwiw
159
164
  end
160
165
 
161
166
  private def load_table_config(klass)
162
- Dir[File.join(@config_dir, "*.json")].map do |file|
167
+ Dir[File.join(@schema_dir, "*.json")].map do |file|
163
168
  json = JSON.parse(File.read(file))
164
169
  # Drop belongs_tos/columns(fields) flagged ignore:true so they are not
165
170
  # considered during extraction. Done here (after loading from file)
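Both `Runner` and `ExplainRunner` now run a pre-flight check. It fails before any file is written if some extractable table would otherwise be dumped unfiltered. The sketch below is a reduced version of that check and assumes each table's category has already been computed. `TableEntry` and `preflight!` are hypothetical stand-ins for `validate_scope!`, and the error message is shortened.

```ruby
# Hypothetical stand-in for a table whose scope category is already known.
TableEntry = Struct.new(:name, :ignore, :category, keyword_init: true)

# Reduced version of the validate_scope! idea: a no-op outside scope mode,
# otherwise abort if any non-ignored table is unscopable.
def preflight!(entries, scope_column)
  return if scope_column.nil?

  unscopable = entries.reject(&:ignore).select { |e| e.category == :unscopable }
  return if unscopable.empty?

  names = unscopable.map(&:name).sort.join(", ")
  raise ArgumentError,
        "scope-column mode: #{unscopable.size} table(s) cannot be scoped by '#{scope_column}': #{names}"
end

entries = [
  TableEntry.new(name: "orders", ignore: nil, category: :via_path),
  TableEntry.new(name: "sessions", ignore: nil, category: :unscopable),
  TableEntry.new(name: "audit_logs", ignore: nil, category: :unscopable),
  TableEntry.new(name: "tmp_imports", ignore: true, category: :unscopable) # skipped
]

begin
  preflight!(entries, "tenant_id")
rescue ArgumentError => e
  puts e.message
  # prints: scope-column mode: 2 table(s) cannot be scoped by 'tenant_id': audit_logs, sessions
end
```

The runner calls the check before `DetermineTableProcessingOrder` and `clean_output_dir!`. A misconfigured schema therefore fails fast instead of leaving a partial dump behind.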
@@ -26,6 +26,18 @@ module Exwiw
26
26
  attribute :columns, array(TableColumn), default: []
27
27
  attribute :bulk_insert_chunk_size, optional(Integer), skip_serializing_if_nil: true
28
28
  attribute :ignore, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
29
+ # Scope-column mode only (see Exwiw::DumpTarget#scope_column). Both are
30
+ # user-configured and never emitted by the schema generators.
31
+ #
32
+ # `scope_exempt: true` exports the whole table without scope filtering — the
33
+ # explicit, auditable escape hatch for genuine reference/master tables under
34
+ # the strict "every table must be scopable" rule.
35
+ #
36
+ # `scope_column` overrides the physical column this table is filtered on when
37
+ # it differs from the global `--scope-column` name (same scope value, just a
38
+ # different column name on this table).
39
+ attribute :scope_exempt, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
40
+ attribute :scope_column, optional(String), skip_serializing_if_nil: true
29
41
 
30
42
  def self.from(hash)
31
43
  config = super
@@ -137,6 +149,9 @@ module Exwiw
137
149
  merged_table.filter = filter
138
150
  merged_table.bulk_insert_chunk_size = passed_table.bulk_insert_chunk_size
139
151
  merged_table.ignore = ignore
152
+ # User-owned, never regenerated: carry over from the existing config.
153
+ merged_table.scope_exempt = scope_exempt
154
+ merged_table.scope_column = scope_column
140
155
 
141
156
  # Structural facts of each belongs_to come from the freshly generated
142
157
  # config, but the user-owned `comment`/`ignore`/`references` carry over
data/lib/exwiw/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Exwiw
4
- VERSION = "0.4.11"
4
+ VERSION = "0.5.1"
5
5
  end
data/lib/exwiw.rb CHANGED
@@ -39,7 +39,13 @@ module Exwiw
39
39
  # `ids_field` optionally overrides which field `--ids` is matched against on
40
40
  # the target table. When nil the table's primary key is used (the historical
41
41
  # behavior). Currently only honored by the mongodb adapter.
42
- DumpTarget = Struct.new(:table_name, :ids, :ids_field, keyword_init: true)
42
+ #
43
+ # `scope_column` switches the extraction to scope-column mode: instead of a
44
+ # single `table_name` anchor, every table is filtered by a shared column
45
+ # (`scope_column IN ids`) and tables lacking it are reached by walking
46
+ # belongs_to up to the nearest table that has it. When set, `table_name` is
47
+ # nil. SQL adapters only.
48
+ DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)
43
49
  # `uri` is an optional full connection string (currently only honored by the
44
50
  # mongodb adapter, e.g. `mongodb+srv://...`). When present it is the source of
45
51
  # truth for the connection — host/port/user/password are ignored — so TLS,
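With the new `scope_column` member on `DumpTarget`, scope mode means a non-nil `scope_column` and a nil `table_name`. The changelog states that `--scope-column` is mutually exclusive with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`. The sketch below enforces that rule in a hypothetical `build_dump_target` helper. exwiw's own validation is not part of this diff, and this diff does not show how `--ids-column` maps onto the struct.

```ruby
# Same members as the DumpTarget struct added above.
DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)

# Options the changelog lists as mutually exclusive with --scope-column.
EXCLUSIVE_WITH_SCOPE = %i[target_table target_collection ids_column ids_field].freeze

# Hypothetical builder; exwiw's real option validation is not in this diff.
def build_dump_target(options)
  if options[:scope_column]
    conflicting = EXCLUSIVE_WITH_SCOPE.select { |key| options[key] }
    unless conflicting.empty?
      flags = conflicting.map { |key| "--#{key.to_s.tr('_', '-')}" }.join(", ")
      raise ArgumentError, "--scope-column cannot be combined with #{flags}"
    end
    # Scope mode: no anchor table; --ids are values of the shared column.
    DumpTarget.new(table_name: nil, ids: options[:ids], scope_column: options[:scope_column])
  else
    # Single-target mode (--target-collection and --ids-column handling omitted).
    DumpTarget.new(table_name: options[:target_table], ids: options[:ids], ids_field: options[:ids_field])
  end
end

target = build_dump_target(scope_column: "tenant_id", ids: %w[1 2])
puts target.table_name.inspect # prints nil
puts target.scope_column       # prints tenant_id
```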
data/lib/tasks/exwiw.rake CHANGED
@@ -7,7 +7,7 @@ namespace :exwiw do
7
7
  require "exwiw"
8
8
 
9
9
  Exwiw::SchemaGenerator.from_rails_application(
10
- output_dir: ENV["OUTPUT_DIR_PATH"] || "exwiw",
10
+ output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
11
11
  ).generate!
12
12
  end
13
13
 
@@ -16,7 +16,7 @@ namespace :exwiw do
16
16
  require "exwiw"
17
17
 
18
18
  result = Exwiw::SchemaGenerator.from_rails_application(
19
- output_dir: ENV["OUTPUT_DIR_PATH"] || "exwiw",
19
+ output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
20
20
  ).tidy!
21
21
 
22
22
  if result.empty?
@@ -47,7 +47,7 @@ namespace :exwiw do
47
47
  require "exwiw"
48
48
 
49
49
  Exwiw::MongoidSchemaGenerator.from_rails_application(
50
- output_dir: ENV["OUTPUT_DIR_PATH"] || "exwiw",
50
+ output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
51
51
  skip_unsupported: ENV["EXWIW_SKIP_UNSUPPORTED"] == "1",
52
52
  ).generate!
53
53
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: exwiw
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.11
4
+ version: 0.5.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shia