exwiw 0.4.11 → 0.5.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +17 -0
- data/README.md +126 -11
- data/lib/exwiw/after_insert_hook.rb +2 -1
- data/lib/exwiw/cli.rb +174 -15
- data/lib/exwiw/explain_runner.rb +7 -4
- data/lib/exwiw/query_ast_builder.rb +303 -5
- data/lib/exwiw/runner.rb +9 -4
- data/lib/exwiw/table_config.rb +15 -0
- data/lib/exwiw/version.rb +1 -1
- data/lib/exwiw.rb +7 -1
- data/lib/tasks/exwiw.rake +3 -3
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 2af5a1cc29946424a2b6498f19d7ad77714f108194575ddd6014c5fc2d829416
|
|
4
|
+
data.tar.gz: 3113e80b88ab11a95344140f819a9247eadc3706c45a9b2a665f946a88a945b7
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 539f4ab428d75f97714475d607d91ae2870a2bf860ac8ab2f732d5a5709d5e0cf2fd085c0bc3bbdf9b095f969542eb866c1f3c3766cc5e3a8be0294cf3cfcf0f
|
|
7
|
+
data.tar.gz: 7b41f191e5e6d5ff60a1c927111d06f85b6c461a1dd63d5665b254338eafcde037233bb7f67027306258515e71aa1af3cf38e6aadf20b02ec9f6a91f05c7f2c4
|
data/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,23 @@
|
|
|
2
2
|
|
|
3
3
|
## [Unreleased]
|
|
4
4
|
|
|
5
|
+
## [0.5.1] - 2026-06-18
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
|
|
9
|
+
- **Scope-column extraction mode** (`--scope-column`, SQL adapters only). For schemas where many independent top-level tables share the same scope/tenant column instead of converging on a single `belongs_to` root, exwiw can now filter **every** table by that shared column (`--scope-column=COLUMN` with `--ids` as its values) rather than anchoring on one `--target-table`. A table that carries the column is filtered directly; a table that lacks it but `belongs_to` a table that has it is joined up to the nearest such table and filtered there. A table that `belongs_to` a parent which is itself scoped but carries no scope column of its own (e.g. a *hub* table scoped only because an extractable child references it) is constrained to the parent's in-scope ids via a subquery (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`), so the hub's other children ride along to just the in-scope rows — limited to a single forward hop and a single unambiguous scopable parent. A table that cannot be scoped at all (no column and no path to one) makes the run **abort with a list of the offending tables**, so an unscoped table is never silently dumped in full. Two user-owned table-config keys support this and are preserved across `schema:generate` regeneration: **`scope_exempt: true`** exports a genuine reference/master table in full (rails-managed tables are treated as exempt automatically), and **`scope_column`** overrides the filtered column name for a table that stores the same scope value under a different name. `--scope-column` is mutually exclusive with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`, can be set in `exwiw.yml`, and works with `exwiw explain`.
|
|
10
|
+
|
|
11
|
+
## [0.5.0] - 2026-06-16
|
|
12
|
+
|
|
13
|
+
### Added
|
|
14
|
+
|
|
15
|
+
- A YAML **config file** (`exwiw.yml`) can now hold any option except the database connection settings, so they no longer have to be repeated on every invocation. Pass it with `--config=PATH`; when `--config` is omitted, `exwiw.yml` (or `exwiw.yaml`) is loaded automatically from the current directory if present. **Options passed on the CLI take precedence** over the file (the file only fills in options not given on the CLI). Connection settings — `host`, `port`, `user`, `database`, `uri`, `password` — are **rejected** in the file (they must come from the CLI/environment); `adapter` is the one connection-related key allowed. Relative paths in the file (`schema_dir`, `output_dir`, `after_insert_hook`) are resolved relative to the config file's own directory (so a root-level `exwiw.yml` with `schema_dir: exwiw/schema` reads naturally and an absolute `--config` works from any directory). Unknown keys are rejected to catch typos, and export-only keys (`output_dir`, `output_format`, `insert_only`, `after_insert_hook`) are ignored under `explain` so one file can be shared by both subcommands.
|
|
16
|
+
|
|
17
|
+
### Changed
|
|
18
|
+
|
|
19
|
+
- **BREAKING**: the `export`/`explain` CLI option `--config-dir` has been renamed to `--schema-dir` to distinguish the directory of schema JSON files from the new `--config` config file. Its short form `-c` is now `--config` (the config file); `--schema-dir` has no short form. The hook contract is renamed to match: the shell-hook environment variable `EXWIW_CONFIG_DIR` is now `EXWIW_SCHEMA_DIR`, and the Ruby-hook `cli_options[:config_dir]` is now `cli_options[:schema_dir]`. Update invocations, scripts, and hooks accordingly (`--config-dir` no longer exists). `--schema-dir` is still required and has no default unless `schema_dir` is set in the config file.
|
|
20
|
+
- **BREAKING**: the env var that overrides where `schema:generate`, `schema:tidy`, and `schema:generate_mongoid` write their config has been renamed from `OUTPUT_DIR_PATH` to `EXWIW_SCHEMA_DIR_PATH`, and the default output directory is now `exwiw/schema` (previously `exwiw`). The new name disambiguates it from the dump-side `--output-dir`, and the dedicated `schema/` subdirectory leaves `exwiw/` free for other artifacts (hooks, dumps). `OUTPUT_DIR_PATH` is no longer read. Existing repositories should set `EXWIW_SCHEMA_DIR_PATH` (e.g. `EXWIW_SCHEMA_DIR_PATH=exwiw` to preserve the old flat layout) and/or move their config under `exwiw/schema/`; otherwise a `generate` run will write a fresh copy into `exwiw/schema/` and leave the old files stale. The `export`/`explain` CLI is unaffected, but examples now point at `exwiw/schema`.
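For hooks that have to run under both 0.4.x and 0.5.x during a migration, the rename described above can be bridged with a small fallback. This is only a sketch: the two hash keys and the two environment variable names come from the changelog entry, while the helper names and the error message are hypothetical and not part of exwiw.

```ruby
# Hypothetical helpers for a hook that must work before and after the
# 0.5.0 rename. Only the key and variable names come from the changelog.

# Ruby hook side: 0.5.0+ passes :schema_dir, older releases passed :config_dir.
def schema_dir_from(cli_options)
  dir = cli_options[:schema_dir] || cli_options[:config_dir]
  raise ArgumentError, "neither :schema_dir nor :config_dir is set" if dir.nil?

  dir.to_s
end

# Shell-hook side: prefer the new variable, fall back to the old one.
# Returns nil when neither is present.
def schema_dir_from_env(env = ENV)
  env["EXWIW_SCHEMA_DIR"] || env["EXWIW_CONFIG_DIR"]
end
```

Once every environment runs 0.5.0 or later, the fallback branch can be dropped.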
|
|
21
|
+
|
|
5
22
|
## [0.4.11] - 2026-06-15
|
|
6
23
|
|
|
7
24
|
### Fixed
|
data/README.md
CHANGED
|
@@ -72,7 +72,7 @@ exwiw \
|
|
|
72
72
|
--port=3306 \
|
|
73
73
|
--user=reader \
|
|
74
74
|
--database=app_production \
|
|
75
|
-
--
|
|
75
|
+
--schema-dir=exwiw/schema \
|
|
76
76
|
--target-table=shops \
|
|
77
77
|
--ids=1 \ # comma separated ids
|
|
78
78
|
--output-dir=dump \
|
|
@@ -81,7 +81,7 @@ exwiw \
|
|
|
81
81
|
|
|
82
82
|
By default `--ids` are matched against the target table's primary key. `--ids-column=COLUMN` matches them against a different column instead (e.g. `--target-table=users --ids=alice@example.com --ids-column=email`). Related tables are still extracted correctly: their foreign keys are resolved through the target via a subquery (`WHERE fk IN (SELECT pk FROM target WHERE COLUMN IN (...))`), so only the target table's filter column changes. This is the SQL-adapter counterpart of the mongodb `--ids-field`; the two are mutually exclusive and each is rejected by the other adapter family. Note: if `COLUMN` is itself masked, re-running `delete-*` against an already-imported (masked) dump won't match, so prefer a stable natural key.
|
|
83
83
|
|
|
84
|
-
When `--target-table` and `--ids` are omitted, exwiw dumps all tables defined in `--
|
|
84
|
+
When `--target-table` and `--ids` are omitted, exwiw dumps all tables defined in `--schema-dir`:
|
|
85
85
|
|
|
86
86
|
```bash
|
|
87
87
|
# dump all tables
|
|
@@ -91,7 +91,7 @@ exwiw \
|
|
|
91
91
|
--port=5432 \
|
|
92
92
|
--user=reader \
|
|
93
93
|
--database=app_production \
|
|
94
|
-
--
|
|
94
|
+
--schema-dir=exwiw/schema \
|
|
95
95
|
--output-dir=dump
|
|
96
96
|
```
|
|
97
97
|
|
|
@@ -123,25 +123,140 @@ exwiw explain \
|
|
|
123
123
|
--adapter=postgresql \
|
|
124
124
|
--host=localhost --port=5432 --user=reader \
|
|
125
125
|
--database=app_production \
|
|
126
|
-
--
|
|
126
|
+
--schema-dir=exwiw/schema \
|
|
127
127
|
--target-table=shops --ids=1
|
|
128
128
|
```
|
|
129
129
|
|
|
130
130
|
The `--output-dir`, `--output-format`, `--insert-only`, and `--after-insert-hook` options are dump-specific and rejected when used with `explain`.
|
|
131
131
|
|
|
132
|
+
### Scope-column mode (`--scope-column`)
|
|
133
|
+
|
|
134
|
+
The default `--target-table` extraction assumes the schema converges on a single
|
|
135
|
+
root: every table is reached by walking `belongs_to` toward that one table. Some
|
|
136
|
+
schemas are not shaped that way — many independent top-level tables each carry the
|
|
137
|
+
*same* scope/tenant column (e.g. `tenant_id`, `account_uuid`) and there is no
|
|
138
|
+
single root. Choosing one of them as `--target-table` would leave the others
|
|
139
|
+
unrelated to it, and an unrelated table is dumped in full — a problem if it holds
|
|
140
|
+
personal data.
|
|
141
|
+
|
|
142
|
+
`--scope-column` handles this shape: instead of one anchor table, **every table is
|
|
143
|
+
filtered by a shared column** whose values are `--ids`.
|
|
144
|
+
|
|
145
|
+
```bash
|
|
146
|
+
exwiw \
|
|
147
|
+
--adapter=postgresql \
|
|
148
|
+
--host=localhost --port=5432 --user=reader \
|
|
149
|
+
--database=app_production \
|
|
150
|
+
--schema-dir=exwiw/schema \
|
|
151
|
+
--scope-column=tenant_id \
|
|
152
|
+
--ids=42,43 \
|
|
153
|
+
--output-dir=dump
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
Each table is resolved as follows:
|
|
157
|
+
|
|
158
|
+
- **Carries the scope column** → `WHERE scope_column IN (ids)`.
|
|
159
|
+
- **Lacks it but `belongs_to` reaches a table that has it** → exwiw joins up to the
|
|
160
|
+
nearest such table and applies the scope filter there (the same join machinery
|
|
161
|
+
the single-target mode uses).
|
|
162
|
+
- **`belongs_to` a parent that is itself scoped but carries no scope column of its
|
|
163
|
+
own** → exwiw constrains this table to the parent's in-scope ids via a subquery
|
|
164
|
+
(`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). This covers a *hub*
|
|
165
|
+
table that has no scope column and is scoped only because an extractable child
|
|
166
|
+
references it (see referenced-by below): the hub's other `belongs_to` children
|
|
167
|
+
ride along to just the in-scope rows instead of being dumped in full. Limited to
|
|
168
|
+
a single forward hop and a single unambiguous scopable parent.
|
|
169
|
+
- **Cannot be scoped at all** (no scope column and no path to one) → exwiw
|
|
170
|
+
**aborts** and lists the offending tables, so an unscoped table is never silently
|
|
171
|
+
dumped in full. For each, either add a `belongs_to` path, set `ignore: true` to
|
|
172
|
+
skip it, or mark it `scope_exempt: true` (below) to export it in full.
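The resolution rules above can be sketched as a small classifier. This is a simplified illustration, not exwiw's implementation: the `Table` struct, the method name, and the returned plan shape are all assumptions; the hub/subquery case is left out; and checking `scope_exempt` before the column lookup is a guess at the precedence.

```ruby
# Illustrative only: these names are not exwiw internals.
Table = Struct.new(:name, :columns, :belongs_to, :scope_exempt, :scope_column,
                   keyword_init: true)

# Returns { table_name => [:direct, column] | [:join, path] | [:exempt] }
# and raises when any table cannot be scoped at all.
def classify_tables(tables, scope_column)
  by_name = tables.to_h { |t| [t.name, t] }

  # The column a table is filtered on directly, or nil if it carries none.
  direct_column = lambda do |table|
    column = table.scope_column || scope_column
    column if Array(table.columns).include?(column)
  end

  # Breadth-first walk up belongs_to to the nearest table carrying the column.
  nearest_scoped_path = lambda do |table|
    queue = Array(table.belongs_to).map { |parent| [parent] }
    seen = [table.name]
    until queue.empty?
      path = queue.shift
      parent = by_name[path.last]
      next if parent.nil? || seen.include?(parent.name)

      seen << parent.name
      return path if direct_column.call(parent)

      Array(parent.belongs_to).each { |grand| queue << (path + [grand]) }
    end
    nil
  end

  plan = {}
  unscopable = []
  tables.each do |table|
    if table.scope_exempt
      plan[table.name] = [:exempt]
    elsif (column = direct_column.call(table))
      plan[table.name] = [:direct, column]
    elsif (path = nearest_scoped_path.call(table))
      plan[table.name] = [:join, path]
    else
      unscopable << table.name
    end
  end
  raise ArgumentError, "cannot scope tables: #{unscopable.join(', ')}" unless unscopable.empty?

  plan
end

tables = [
  Table.new(name: "shops", columns: %w[id tenant_id]),
  Table.new(name: "orders", columns: %w[id shop_id], belongs_to: ["shops"]),
  Table.new(name: "countries", columns: %w[id code], scope_exempt: true)
]
plan = classify_tables(tables, "tenant_id")
# plan["shops"]     => [:direct, "tenant_id"]
# plan["orders"]    => [:join, ["shops"]]
# plan["countries"] => [:exempt]
```

Collecting every unscopable table before raising mirrors the documented behavior of listing all offending tables in one abort rather than stopping at the first.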
|
|
173
|
+
|
|
174
|
+
`--scope-column` is SQL-only (mysql / postgresql / sqlite) and mutually exclusive
|
|
175
|
+
with `--target-table`, `--target-collection`, `--ids-column`, and `--ids-field`.
|
|
176
|
+
It works with `exwiw explain` too, which is the recommended way to preview the
|
|
177
|
+
queries before exporting.
|
|
178
|
+
|
|
179
|
+
#### `scope_exempt` (intentional full dump)
|
|
180
|
+
|
|
181
|
+
A genuine reference/master table (no personal data) that has no scope linkage can
|
|
182
|
+
opt out of the strict check and be exported in full:
|
|
183
|
+
|
|
184
|
+
```json
|
|
185
|
+
{
|
|
186
|
+
"name": "countries",
|
|
187
|
+
"primary_key": "id",
|
|
188
|
+
"scope_exempt": true,
|
|
189
|
+
"columns": [{ "name": "id" }, { "name": "code" }]
|
|
190
|
+
}
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
Rails-managed tables (`schema_migrations`, `ar_internal_metadata`) are treated as
|
|
194
|
+
exempt automatically.
|
|
195
|
+
|
|
196
|
+
#### Per-table `scope_column` override
|
|
197
|
+
|
|
198
|
+
scope-column mode assumes a single shared **value** space — the same `--ids` apply
|
|
199
|
+
to every scoped table. If a table stores that same value under a differently named
|
|
200
|
+
column, override the column name for that table:
|
|
201
|
+
|
|
202
|
+
```json
|
|
203
|
+
{
|
|
204
|
+
"name": "legacy_orders",
|
|
205
|
+
"primary_key": "id",
|
|
206
|
+
"scope_column": "legacy_tenant_id",
|
|
207
|
+
"columns": [{ "name": "id" }, { "name": "legacy_tenant_id" }]
|
|
208
|
+
}
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
Both `scope_exempt` and `scope_column` are user-maintained and preserved across
|
|
212
|
+
`schema:generate` regeneration (the generators never emit them).
|
|
213
|
+
|
|
214
|
+
### Config file (`exwiw.yml`)
|
|
215
|
+
|
|
216
|
+
Options you would otherwise repeat on every run can be kept in a YAML config file. Pass it with `--config=PATH`; when `--config` is omitted, exwiw automatically loads `exwiw.yml` (or `exwiw.yaml`) from the current directory if present.
|
|
217
|
+
|
|
218
|
+
**Options passed on the CLI always take precedence over the config file** — the config only fills in options you did not pass. This lets you commit the stable settings (which schema to read, output format, ...) while still varying the environment-specific connection details per invocation.
|
|
219
|
+
|
|
220
|
+
```yaml
|
|
221
|
+
# exwiw.yml — keep at the project root, alongside exwiw/schema/
|
|
222
|
+
adapter: postgresql
|
|
223
|
+
schema_dir: exwiw/schema
|
|
224
|
+
output_dir: dump
|
|
225
|
+
output_format: insert # insert | copy
|
|
226
|
+
insert_only: false
|
|
227
|
+
after_insert_hook: hooks/seed.rb
|
|
228
|
+
log_level: info # debug | info
|
|
229
|
+
# target_table / ids / ids_field / ids_column / scope_column may also be set here
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
With the file above, only the connection details need to be supplied on the CLI:
|
|
233
|
+
|
|
234
|
+
```bash
|
|
235
|
+
DATABASE_PASSWORD=... exwiw \
|
|
236
|
+
--host=localhost --port=5432 --user=reader --database=app_production \
|
|
237
|
+
--target-table=shops --ids=1
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
Notes:
|
|
241
|
+
|
|
242
|
+
- **Database connection settings stay on the CLI/environment.** `host`, `port`, `user`, `database`, `uri`, and `password` are **rejected** in the config file (exwiw exits with an error). `adapter` is the one connection-related key that *is* allowed in the file.
|
|
243
|
+
- **Relative paths in the config (`schema_dir`, `output_dir`, `after_insert_hook`) are resolved relative to the config file's own directory**, not the current working directory. So with the config at the project root, `schema_dir: exwiw/schema` reads naturally, and an absolute `--config=/path/to/exwiw.yml` works no matter where you run from. (CLI path flags remain relative to the current directory — each source resolves relative to where it is written.) Absolute paths are used as-is.
|
|
244
|
+
- Unknown keys are rejected so a typo surfaces immediately.
|
|
245
|
+
- Export-only keys (`output_dir`, `output_format`, `insert_only`, `after_insert_hook`) are ignored when running `explain`, so a single config file can be shared by both subcommands.
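The precedence and path rules in these notes can be illustrated with a minimal merge. The sketch below assumes POSIX-style paths; the option names follow the README, while `merge_options`, `PATH_KEYS`, and the hash shapes are hypothetical and not exwiw's own API.

```ruby
# Illustrative helper: only the option names come from the README.
PATH_KEYS = %w[schema_dir output_dir after_insert_hook].freeze

# cli:         options parsed from the command line (nil means "not given")
# config:      the parsed config-file mapping
# config_path: where the config file lives
def merge_options(cli, config, config_path)
  base = File.dirname(File.expand_path(config_path))

  # Relative paths in the file are anchored at the file's own directory.
  resolved = config.to_h do |key, value|
    value = File.expand_path(value, base) if PATH_KEYS.include?(key) && value
    [key, value]
  end

  # Options actually given on the CLI override the file.
  resolved.merge(cli.compact)
end

merged = merge_options(
  { "output_dir" => "/tmp/dump", "schema_dir" => nil },
  { "schema_dir" => "exwiw/schema", "output_dir" => "dump", "output_format" => "insert" },
  "/srv/app/exwiw.yml"
)
# merged["schema_dir"]    => "/srv/app/exwiw/schema" (from the file, relative to the file)
# merged["output_dir"]    => "/tmp/dump"             (CLI wins)
# merged["output_format"] => "insert"                (only the file set it)
```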
|
|
246
|
+
|
|
132
247
|
### Generator
|
|
133
248
|
|
|
134
249
|
The config generator is provided as a Rake task.
|
|
135
250
|
|
|
136
251
|
```bash
|
|
137
|
-
# generate table schema under exwiw/
|
|
252
|
+
# generate table schema under exwiw/schema/
|
|
138
253
|
bundle exec rake exwiw:schema:generate
|
|
139
254
|
```
|
|
140
255
|
|
|
141
|
-
By default, the schema files will be saved in the `exwiw` directory. You can specify a different output directory by setting the `
|
|
256
|
+
By default, the schema files will be saved in the `exwiw/schema` directory. You can specify a different output directory by setting the `EXWIW_SCHEMA_DIR_PATH` environment variable:
|
|
142
257
|
|
|
143
258
|
```sh
|
|
144
|
-
|
|
259
|
+
EXWIW_SCHEMA_DIR_PATH=custom_directory bundle exec rake exwiw:schema:generate
|
|
145
260
|
```
|
|
146
261
|
|
|
147
262
|
#### Tidying stale config (`schema:tidy`)
|
|
@@ -159,14 +274,14 @@ bundle exec rake exwiw:schema:tidy
|
|
|
159
274
|
|
|
160
275
|
Because it reads the database directly, a table that still exists in the database but has lost (or never had) an ActiveRecord model is **kept** — only a table that is genuinely gone is removed. (This is the deliberate counterpart to `generate`, which is model-driven and only ever adds what the models know about.)
|
|
161
276
|
|
|
162
|
-
It respects `
|
|
277
|
+
It respects `EXWIW_SCHEMA_DIR_PATH` and the per-database subdirectory layout in the same way as `schema:generate`. Unlike `generate`, `tidy` never adds or regenerates entries — every surviving table/column (including hand-edited `comment` / `ignore` / `replace_with`) is left untouched, so it is safe to run on a customized config. The task prints which tables and columns it removed (or that the config was already tidy). Stale `belongs_tos` are not pruned by `tidy`; rerun `schema:generate` to refresh those.
|
|
163
278
|
|
|
164
279
|
#### Multiple databases
|
|
165
280
|
|
|
166
281
|
If the application uses Rails' multiple-database support (`connects_to`), `schema:generate` buckets models by the database they connect to and writes each database's config files into its own subdirectory of the output directory, named after the database config name (`primary`, `analytics`, ...):
|
|
167
282
|
|
|
168
283
|
```
|
|
169
|
-
exwiw/
|
|
284
|
+
exwiw/schema/
|
|
170
285
|
primary/
|
|
171
286
|
shops.json
|
|
172
287
|
users.json
|
|
@@ -267,7 +382,7 @@ This is an example of the one table schema:
|
|
|
267
382
|
}
|
|
268
383
|
```
|
|
269
384
|
|
|
270
|
-
`--
|
|
385
|
+
`--schema-dir` will use all json files in the specified directory.
|
|
271
386
|
|
|
272
387
|
### Output format
|
|
273
388
|
|
|
@@ -307,7 +422,7 @@ SQL
|
|
|
307
422
|
|
|
308
423
|
**Shell hook**: anything other than `.rb` is exec'd as a child process. It is a pure side-effect hook — exwiw does not capture its stdout. The hook receives these env vars and inherits `DATABASE_PASSWORD` from the parent:
|
|
309
424
|
|
|
310
|
-
- `EXWIW_OUTPUT_DIR`, `
|
|
425
|
+
- `EXWIW_OUTPUT_DIR`, `EXWIW_SCHEMA_DIR`
|
|
311
426
|
- `EXWIW_DATABASE_ADAPTER`, `EXWIW_DATABASE_HOST`, `EXWIW_DATABASE_PORT`, `EXWIW_DATABASE_USER`, `EXWIW_DATABASE_NAME`
|
|
312
427
|
- `EXWIW_TARGET_TABLE`, `EXWIW_IDS` (comma-separated), `EXWIW_OUTPUT_FORMAT`
|
|
313
428
|
|
|
data/lib/exwiw/after_insert_hook.rb
CHANGED
|
@@ -31,13 +31,14 @@ module Exwiw
|
|
|
31
31
|
def self.run_shell(path:, cli_options:, output_dir:, logger:)
|
|
32
32
|
env = {
|
|
33
33
|
'EXWIW_OUTPUT_DIR' => output_dir,
|
|
34
|
-
'
|
|
34
|
+
'EXWIW_SCHEMA_DIR' => cli_options[:schema_dir].to_s,
|
|
35
35
|
'EXWIW_DATABASE_ADAPTER' => cli_options[:database_adapter].to_s,
|
|
36
36
|
'EXWIW_DATABASE_HOST' => cli_options[:database_host].to_s,
|
|
37
37
|
'EXWIW_DATABASE_PORT' => cli_options[:database_port].to_s,
|
|
38
38
|
'EXWIW_DATABASE_USER' => cli_options[:database_user].to_s,
|
|
39
39
|
'EXWIW_DATABASE_NAME' => cli_options[:database_name].to_s,
|
|
40
40
|
'EXWIW_TARGET_TABLE' => cli_options[:target_table].to_s,
|
|
41
|
+
'EXWIW_SCOPE_COLUMN' => cli_options[:scope_column].to_s,
|
|
41
42
|
'EXWIW_IDS' => Array(cli_options[:ids]).join(','),
|
|
42
43
|
'EXWIW_OUTPUT_FORMAT' => cli_options[:output_format].to_s,
|
|
43
44
|
}
|
data/lib/exwiw/cli.rb
CHANGED
|
@@ -5,6 +5,7 @@ require 'optparse'
|
|
|
5
5
|
require 'pathname'
|
|
6
6
|
|
|
7
7
|
require 'json'
|
|
8
|
+
require 'yaml'
|
|
8
9
|
|
|
9
10
|
require 'exwiw'
|
|
10
11
|
|
|
@@ -12,6 +13,40 @@ module Exwiw
|
|
|
12
13
|
class CLI
|
|
13
14
|
KNOWN_SUBCOMMANDS = %w[export explain].freeze
|
|
14
15
|
|
|
16
|
+
# Config file loaded automatically when --config is omitted, if one exists in
|
|
17
|
+
# the current directory. Kept at the project root (rather than under exwiw/)
|
|
18
|
+
# so that config-relative paths like `schema_dir: exwiw/schema` read naturally.
|
|
19
|
+
# Both extensions are accepted; .yml wins when both are present.
|
|
20
|
+
DEFAULT_CONFIG_PATHS = %w[exwiw.yml exwiw.yaml].freeze
|
|
21
|
+
|
|
22
|
+
# Keys accepted in the config file. Anything outside this set is rejected so
|
|
23
|
+
# a typo surfaces immediately instead of being silently ignored. These mirror
|
|
24
|
+
# the non-connection CLI options (plus `adapter`).
|
|
25
|
+
ALLOWED_CONFIG_KEYS = %w[
|
|
26
|
+
adapter
|
|
27
|
+
schema_dir
|
|
28
|
+
output_dir
|
|
29
|
+
output_format
|
|
30
|
+
insert_only
|
|
31
|
+
after_insert_hook
|
|
32
|
+
log_level
|
|
33
|
+
target_table
|
|
34
|
+
target_collection
|
|
35
|
+
ids
|
|
36
|
+
ids_field
|
|
37
|
+
ids_column
|
|
38
|
+
scope_column
|
|
39
|
+
].freeze
|
|
40
|
+
|
|
41
|
+
# Database connection settings are environment-specific (and sometimes
|
|
42
|
+
# secret-adjacent), so they must be passed via CLI/env, never the committed
|
|
43
|
+
# config file. `adapter` is the one connection-ish key allowed in config.
|
|
44
|
+
REJECTED_CONNECTION_KEYS = %w[host port user database uri password].freeze
|
|
45
|
+
|
|
46
|
+
# Keys that only make sense for `export`. They are skipped when merging config
|
|
47
|
+
# for `explain` so a shared config file does not trip validate_explain_only!.
|
|
48
|
+
EXPORT_ONLY_CONFIG_KEYS = %w[output_dir output_format insert_only after_insert_hook].freeze
|
|
49
|
+
|
|
15
50
|
def self.start(argv)
|
|
16
51
|
new(argv).run
|
|
17
52
|
end
|
|
@@ -34,7 +69,8 @@ module Exwiw
|
|
|
34
69
|
@database_password = ENV["DATABASE_PASSWORD"]
|
|
35
70
|
@connection_uri = nil
|
|
36
71
|
@output_dir = nil
|
|
37
|
-
@
|
|
72
|
+
@schema_dir = nil
|
|
73
|
+
@config_file_path = nil
|
|
38
74
|
@database_adapter = nil
|
|
39
75
|
@database_name = nil
|
|
40
76
|
@target_table_name = nil
|
|
@@ -42,10 +78,13 @@ module Exwiw
|
|
|
42
78
|
@ids = []
|
|
43
79
|
@ids_field = nil
|
|
44
80
|
@ids_column = nil
|
|
81
|
+
@scope_column = nil
|
|
45
82
|
@output_format = nil
|
|
46
83
|
@insert_only = nil
|
|
47
84
|
@after_insert_hook_path = nil
|
|
48
|
-
|
|
85
|
+
# nil (not :info) so we can tell "user passed --log-level" from the default,
|
|
86
|
+
# letting a config-file value fill in; the :info default is applied later.
|
|
87
|
+
@log_level = nil
|
|
49
88
|
|
|
50
89
|
parser.parse!(@argv)
|
|
51
90
|
end
|
|
@@ -72,6 +111,7 @@ module Exwiw
|
|
|
72
111
|
table_name: @target_table_name,
|
|
73
112
|
ids: @ids,
|
|
74
113
|
ids_field: @ids_field,
|
|
114
|
+
scope_column: @scope_column,
|
|
75
115
|
)
|
|
76
116
|
|
|
77
117
|
logger = build_logger
|
|
@@ -82,7 +122,7 @@ module Exwiw
|
|
|
82
122
|
Runner.new(
|
|
83
123
|
connection_config: connection_config,
|
|
84
124
|
output_dir: @output_dir,
|
|
85
|
-
|
|
125
|
+
schema_dir: @schema_dir,
|
|
86
126
|
dump_target: dump_target,
|
|
87
127
|
output_format: @output_format,
|
|
88
128
|
insert_only: @insert_only,
|
|
@@ -93,7 +133,7 @@ module Exwiw
|
|
|
93
133
|
when "explain"
|
|
94
134
|
ExplainRunner.new(
|
|
95
135
|
connection_config: connection_config,
|
|
96
|
-
|
|
136
|
+
schema_dir: @schema_dir,
|
|
97
137
|
dump_target: dump_target,
|
|
98
138
|
logger: logger,
|
|
99
139
|
io: $stdout,
|
|
@@ -102,6 +142,14 @@ module Exwiw
|
|
|
102
142
|
end
|
|
103
143
|
|
|
104
144
|
private def validate_options!
|
|
145
|
+
# Fill in any options not given on the CLI from the config file. Done first
|
|
146
|
+
# so a config-provided `adapter` is in place before normalization below.
|
|
147
|
+
# CLI values always win (the merge only fills nil/empty ivars).
|
|
148
|
+
apply_config_file!
|
|
149
|
+
|
|
150
|
+
# Default log level once CLI and config have both had their say.
|
|
151
|
+
@log_level ||= :info
|
|
152
|
+
|
|
105
153
|
# Fold driver/Rails adapter spellings (mysql2, sqlite3) into exwiw's
|
|
106
154
|
# canonical names up front, so every check below — and the
|
|
107
155
|
# EXWIW_DATABASE_ADAPTER passed to hooks — sees the canonical name.
|
|
@@ -116,6 +164,7 @@ module Exwiw
|
|
|
116
164
|
end
|
|
117
165
|
|
|
118
166
|
resolve_target_collection_alias!
|
|
167
|
+
resolve_scope_column!
|
|
119
168
|
resolve_ids_column_alias!
|
|
120
169
|
resolve_uri_option!
|
|
121
170
|
|
|
@@ -163,18 +212,18 @@ module Exwiw
|
|
|
163
212
|
end
|
|
164
213
|
end
|
|
165
214
|
|
|
166
|
-
if @
|
|
167
|
-
$stderr.puts "
|
|
215
|
+
if @schema_dir.nil?
|
|
216
|
+
$stderr.puts "Schema dir is required (pass --schema-dir or set schema_dir in the config file)"
|
|
168
217
|
exit 1
|
|
169
218
|
end
|
|
170
219
|
|
|
171
|
-
unless Dir.exist?(@
|
|
172
|
-
$stderr.puts "
|
|
220
|
+
unless Dir.exist?(@schema_dir)
|
|
221
|
+
$stderr.puts "Schema dir does not exist: #{@schema_dir}"
|
|
173
222
|
exit 1
|
|
174
223
|
end
|
|
175
224
|
|
|
176
|
-
if Dir.glob(File.join(@
|
|
177
|
-
$stderr.puts "
|
|
225
|
+
if Dir.glob(File.join(@schema_dir, "*.json")).empty?
|
|
226
|
+
$stderr.puts "Schema dir contains no .json files: #{@schema_dir}"
|
|
178
227
|
exit 1
|
|
179
228
|
end
|
|
180
229
|
|
|
@@ -183,8 +232,13 @@ module Exwiw
|
|
|
183
232
|
exit 1
|
|
184
233
|
end
|
|
185
234
|
|
|
186
|
-
if
|
|
187
|
-
$stderr.puts "--
|
|
235
|
+
if @scope_column && @ids.empty?
|
|
236
|
+
$stderr.puts "--ids is required when --scope-column is specified"
|
|
237
|
+
exit 1
|
|
238
|
+
end
|
|
239
|
+
|
|
240
|
+
if !@target_table_name && !@scope_column && @ids.any?
|
|
241
|
+
$stderr.puts "--target-table or --scope-column is required when --ids is specified"
|
|
188
242
|
exit 1
|
|
189
243
|
end
|
|
190
244
|
|
|
@@ -202,6 +256,79 @@ module Exwiw
|
|
|
202
256
|
end
|
|
203
257
|
end
|
|
204
258
|
|
|
259
|
+
# Merge settings from the config file (YAML) into any options the user did
|
|
260
|
+
# not pass on the CLI. The CLI always wins: every assignment below only fills
|
|
261
|
+
# an ivar that is still nil/empty after parsing ARGV. Connection settings
|
|
262
|
+
# (except `adapter`) are rejected here — they belong on the CLI/env.
|
|
263
|
+
private def apply_config_file!
|
|
264
|
+
path =
|
|
265
|
+
if @config_file_path
|
|
266
|
+
unless File.file?(@config_file_path)
|
|
267
|
+
$stderr.puts "Config file not found: #{@config_file_path}"
|
|
268
|
+
exit 1
|
|
269
|
+
end
|
|
270
|
+
@config_file_path
|
|
271
|
+
else
|
|
272
|
+
DEFAULT_CONFIG_PATHS.map { |p| File.expand_path(p) }.find { |p| File.file?(p) }
|
|
273
|
+
end
|
|
274
|
+
return if path.nil?
|
|
275
|
+
|
|
276
|
+
# Paths inside the config file are resolved relative to the file's own
|
|
277
|
+
# directory (not cwd), so `schema_dir: exwiw/schema` reads naturally with the
|
|
278
|
+
# config kept at the project root, and an absolute --config works from any
|
|
279
|
+
# cwd. (CLI path flags stay cwd-relative — each source resolves relative to
|
|
280
|
+
# where it is written.) `path` is always absolute here.
|
|
281
|
+
base = File.dirname(path)
|
|
282
|
+
|
|
283
|
+
config = YAML.safe_load(File.read(path)) || {}
|
|
284
|
+
unless config.is_a?(Hash)
|
|
285
|
+
$stderr.puts "Config file must be a YAML mapping (key: value): #{path}"
|
|
286
|
+
exit 1
|
|
287
|
+
end
|
|
288
|
+
|
|
289
|
+
config.each_key do |key|
|
|
290
|
+
if REJECTED_CONNECTION_KEYS.include?(key)
|
|
291
|
+
$stderr.puts "'#{key}' is a database connection setting and must be passed via the CLI/environment, not the config file (#{path})"
|
|
292
|
+
exit 1
|
|
293
|
+
end
|
|
294
|
+
unless ALLOWED_CONFIG_KEYS.include?(key)
|
|
295
|
+
$stderr.puts "Unknown config key '#{key}' in #{path}. Allowed keys: #{ALLOWED_CONFIG_KEYS.join(', ')}"
|
|
296
|
+
exit 1
|
|
297
|
+
end
|
|
298
|
+
end
|
|
299
|
+
|
|
300
|
+
# For `explain`, drop export-only keys so a config shared with `export`
|
|
301
|
+
# does not make validate_explain_only! reject the run.
|
|
302
|
+
config = config.reject { |k, _| EXPORT_ONLY_CONFIG_KEYS.include?(k) } if @subcommand == "explain"
|
|
303
|
+
|
|
304
|
+
@database_adapter ||= config["adapter"]
|
|
305
|
+
@schema_dir ||= expand_dir(config["schema_dir"], base)
|
|
306
|
+
@output_dir ||= expand_dir(config["output_dir"], base)
|
|
307
|
+
@after_insert_hook_path ||= (File.expand_path(config["after_insert_hook"], base) if config["after_insert_hook"])
|
|
308
|
+
@output_format ||= config["output_format"]
|
|
309
|
+
@insert_only = config["insert_only"] if @insert_only.nil? && config.key?("insert_only")
|
|
310
|
+
@log_level ||= config["log_level"]&.to_sym
|
|
311
|
+
@target_table_name ||= config["target_table"]
|
|
312
|
+
@target_collection_name ||= config["target_collection"]
|
|
313
|
+
if @ids.empty? && config.key?("ids")
|
|
314
|
+
raw = config["ids"]
|
|
315
|
+
# Accept either a YAML list or a "1,2" string; coerce to strings to match
|
|
316
|
+
# the CLI's `--ids=1,2` -> ["1", "2"] shape.
|
|
317
|
+
@ids = (raw.is_a?(String) ? raw.split(",") : Array(raw)).map(&:to_s)
|
|
318
|
+
end
|
|
319
|
+
@ids_field ||= config["ids_field"]
|
|
320
|
+
@ids_column ||= config["ids_column"]
|
|
321
|
+
@scope_column ||= config["scope_column"]
|
|
322
|
+
end
|
|
323
|
+
|
|
324
|
+
# Strip a trailing slash (like the CLI's dir options) and expand relative to
|
|
325
|
+
# `base` (the config file's directory). Returns nil for a nil value.
|
|
326
|
+
private def expand_dir(value, base)
|
|
327
|
+
return nil if value.nil?
|
|
328
|
+
value = value.end_with?("/") ? value[0..-2] : value
|
|
329
|
+
File.expand_path(value, base)
|
|
330
|
+
end
|
|
331
|
+
|
|
205
332
|
# `--target-collection` is a mongodb-only alias of `--target-table`. Fold it
|
|
206
333
|
# into @target_table_name (the single field the rest of the CLI/runner uses)
|
|
207
334
|
# after rejecting the misuses: combining it with --target-table, or using it
|
|
@@ -259,6 +386,33 @@ module Exwiw
|
|
|
259
386
|
end
|
|
260
387
|
end
|
|
261
388
|
|
|
389
|
+
# `--scope-column` switches to scope-column mode: every table is filtered by a
|
|
390
|
+
# shared column (`--ids` are its values) instead of anchoring on one
|
|
391
|
+
# `--target-table`. It is SQL-only and mutually exclusive with the single-target
|
|
392
|
+
# flags. Runs after resolve_target_collection_alias! (so --target-collection is
|
|
393
|
+
# already folded into @target_table_name) and before resolve_ids_column_alias!
|
|
394
|
+
# so the clearer "cannot combine" message wins over the generic ids-column one.
|
|
395
|
+
private def resolve_scope_column!
|
|
396
|
+
return if @scope_column.nil?
|
|
397
|
+
|
|
398
|
+
sql_adapters = ["mysql", "postgresql", "sqlite"]
|
|
399
|
+
unless sql_adapters.include?(@database_adapter)
|
|
400
|
+
$stderr.puts "--scope-column is only supported by the sql adapters"
|
|
401
|
+
exit 1
|
|
402
|
+
end
|
|
403
|
+
|
|
404
|
+
if @target_table_name
|
|
405
|
+
$stderr.puts "--scope-column cannot be combined with --target-table/--target-collection"
|
|
406
|
+
exit 1
|
|
407
|
+
end
|
|
408
|
+
|
|
409
|
+
if @ids_field || @ids_column
|
|
410
|
+
flag = @ids_column ? "--ids-column" : "--ids-field"
|
|
411
|
+
$stderr.puts "--scope-column cannot be combined with #{flag}"
|
|
412
|
+
exit 1
|
|
413
|
+
end
|
|
414
|
+
end
|
|
415
|
+
|
|
262
416
|
# `--uri` supplies a full connection string (e.g. `mongodb+srv://...`) and is
|
|
263
417
|
# mongodb-only — the SQL adapters shell out to their own client binaries with
|
|
264
418
|
# discrete host/port/user flags and have no equivalent. Runs after the
|
|
@@ -319,12 +473,13 @@ module Exwiw
|
|
|
319
473
|
database_user: @database_user,
|
|
320
474
|
database_password: @database_password,
|
|
321
475
|
output_dir: @output_dir,
|
|
322
|
-
|
|
476
|
+
schema_dir: @schema_dir,
|
|
323
477
|
database_adapter: @database_adapter,
|
|
324
478
|
database_name: @database_name,
|
|
325
479
|
target_table: @target_table_name,
|
|
326
480
|
ids: @ids.dup.freeze,
|
|
327
481
|
ids_field: @ids_field,
|
|
482
|
+
scope_column: @scope_column,
|
|
328
483
|
output_format: @output_format,
|
|
329
484
|
insert_only: @insert_only,
|
|
330
485
|
log_level: @log_level,
|
|
@@ -368,9 +523,12 @@ module Exwiw
|
|
|
368
523
|
v = v.end_with?("/") ? v[0..-2] : v
|
|
369
524
|
@output_dir = File.expand_path(v)
|
|
370
525
|
end
|
|
371
|
-
opts.on("
|
|
526
|
+
opts.on("--schema-dir=SCHEMA_DIR_PATH", "Directory of schema JSON files. (or set schema_dir in the config file)") do |v|
|
|
372
527
|
v = v.end_with?("/") ? v[0..-2] : v
|
|
373
|
-
@
|
|
528
|
+
@schema_dir = File.expand_path(v)
|
|
529
|
+
end
|
|
530
|
+
opts.on("-c", "--config=CONFIG_FILE_PATH", "Path to the exwiw config YAML. Defaults to ./#{DEFAULT_CONFIG_PATHS.first} (or .#{File.extname(DEFAULT_CONFIG_PATHS.last)}) when present. CLI options take precedence; paths inside the file are resolved relative to the file.") do |v|
|
|
531
|
+
@config_file_path = File.expand_path(v)
|
|
374
532
|
end
|
|
375
533
|
opts.on("-a", "--adapter=ADAPTER", "Database adapter: mysql, sqlite, postgresql, mongodb (aliases: mysql2, sqlite3)") { |v| @database_adapter = v }
|
|
376
534
|
opts.on("--uri=URI", "Full MongoDB connection URI (mongodb:// or mongodb+srv://). mongodb adapter only; takes precedence over --host/--port/--user. TLS, replicaSet, authSource and credentials are read from the URI.") { |v| @connection_uri = v }
|
|
@@ -380,6 +538,7 @@ module Exwiw
|
|
|
380
538
|
opts.on("--ids=[IDS]", "Comma-separated list of identifiers. Required when --target-table is given.") { |v| @ids = v.split(',') }
|
|
381
539
|
opts.on("--ids-field=[FIELD]", "Field on the target collection that --ids is matched against. Defaults to the primary key. (mongodb adapter only)") { |v| @ids_field = v }
|
|
382
540
|
opts.on("--ids-column=[COLUMN]", "Column on the target table that --ids is matched against. Defaults to the primary key. (sql adapters only)") { |v| @ids_column = v }
|
|
541
|
+
opts.on("--scope-column=[COLUMN]", "Filter every table by this shared column (--ids are its values) instead of a single --target-table. Tables lacking it are reached via belongs_to. SQL adapters only; mutually exclusive with --target-table.") { |v| @scope_column = v }
|
|
383
542
|
opts.on("--output-format=[FORMAT]", "Output format: insert (default) or copy (PostgreSQL only, export subcommand only)") { |v| @output_format = v }
|
|
384
543
|
opts.on("--insert-only", "Do not generate DELETE SQL files (export subcommand only)") { @insert_only = true }
|
|
385
544
|
opts.on("--after-insert-hook=PATH", "Path to a .rb or .sh post-processing hook executed after all insert/delete files are written (export subcommand only)") do |v|
|
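Note on `resolve_scope_column!` above: the checks run in a fixed order (adapter first, then target table, then `--ids-column`/`--ids-field`), so the first failing check decides which message is shown. A minimal sketch of that ordering, assuming a hypothetical pure helper `scope_column_error` that returns the message instead of printing it and exiting:

```ruby
# Hypothetical helper (not part of exwiw): returns the message the CLI would
# print, or nil when the flag combination is acceptable.
def scope_column_error(adapter:, scope_column:, target_table: nil, ids_field: nil, ids_column: nil)
  return nil if scope_column.nil?

  sql_adapters = ["mysql", "postgresql", "sqlite"]
  unless sql_adapters.include?(adapter)
    return "--scope-column is only supported by the sql adapters"
  end

  if target_table
    return "--scope-column cannot be combined with --target-table/--target-collection"
  end

  if ids_field || ids_column
    # --ids-column takes priority in the message when both are present.
    flag = ids_column ? "--ids-column" : "--ids-field"
    return "--scope-column cannot be combined with #{flag}"
  end

  nil
end
```

Returning a string rather than calling `exit 1` is only a convenience for the sketch; the shipped method writes to `$stderr` and exits.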
data/lib/exwiw/explain_runner.rb
CHANGED
|
@@ -4,13 +4,13 @@ module Exwiw
|
|
|
4
4
|
class ExplainRunner
|
|
5
5
|
def initialize(
|
|
6
6
|
connection_config:,
|
|
7
|
-
|
|
7
|
+
schema_dir:,
|
|
8
8
|
dump_target:,
|
|
9
9
|
logger:,
|
|
10
10
|
io: $stdout
|
|
11
11
|
)
|
|
12
12
|
@connection_config = connection_config
|
|
13
|
-
@
|
|
13
|
+
@schema_dir = schema_dir
|
|
14
14
|
@dump_target = dump_target
|
|
15
15
|
@logger = logger
|
|
16
16
|
@io = io
|
|
@@ -26,8 +26,11 @@ module Exwiw
|
|
|
26
26
|
target = table_by_name[@dump_target.table_name]
|
|
27
27
|
adapter.validate_as_dump_target!(target) if target
|
|
28
28
|
|
|
29
|
+
dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
|
|
30
|
+
QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
|
|
31
|
+
|
|
29
32
|
@logger.debug("Determining table processing order...")
|
|
30
|
-
ordered_table_names = DetermineTableProcessingOrder.run(
|
|
33
|
+
ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs)
|
|
31
34
|
|
|
32
35
|
total_size = ordered_table_names.size
|
|
33
36
|
ordered_table_names.each_with_index do |table_name, idx|
|
|
@@ -53,7 +56,7 @@ module Exwiw
|
|
|
53
56
|
end
|
|
54
57
|
|
|
55
58
|
private def load_table_config(klass)
|
|
56
|
-
Dir[File.join(@
|
|
59
|
+
Dir[File.join(@schema_dir, "*.json")].map do |file|
|
|
57
60
|
json = JSON.parse(File.read(file))
|
|
58
61
|
klass.from(json).reject_ignored_members!
|
|
59
62
|
end
|
data/lib/exwiw/query_ast_builder.rb
CHANGED
|
@@ -2,23 +2,58 @@
|
|
|
2
2
|
|
|
3
3
|
module Exwiw
|
|
4
4
|
class QueryAstBuilder
|
|
5
|
-
def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true)
|
|
6
|
-
new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse).run
|
|
5
|
+
def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
|
|
6
|
+
new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, allow_forward: allow_forward).run
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
# Scope-column mode classification for a single table. One of
|
|
10
|
+
# :exempt / :direct / :via_path / :referenced_by / :via_scoped_parent / :unscopable.
|
|
11
|
+
def self.scope_category(table_name, table_by_name, dump_target, logger)
|
|
12
|
+
new(table_name, table_by_name, dump_target, logger).scope_category
|
|
13
|
+
end
|
|
14
|
+
|
|
15
|
+
# Strict pre-flight for scope-column mode: abort if any extractable table
|
|
16
|
+
# cannot be scoped, so an unscoped (potentially sensitive) table is never
|
|
17
|
+
# silently dumped in full. No-op outside scope mode. `tables` is the set of
|
|
18
|
+
# dumpable configs (ignore:true tables are skipped — they are not extracted).
|
|
19
|
+
def self.validate_scope!(tables, table_by_name, dump_target, logger)
|
|
20
|
+
return if dump_target.scope_column.nil?
|
|
21
|
+
|
|
22
|
+
unscopable =
|
|
23
|
+
tables.reject(&:ignore).select do |table|
|
|
24
|
+
scope_category(table.name, table_by_name, dump_target, logger) == :unscopable
|
|
25
|
+
end
|
|
26
|
+
return if unscopable.empty?
|
|
27
|
+
|
|
28
|
+
names = unscopable.map(&:name).sort.join(", ")
|
|
29
|
+
raise ArgumentError,
|
|
30
|
+
"scope-column mode: #{unscopable.size} table(s) cannot be scoped by " \
|
|
31
|
+
"'#{dump_target.scope_column}': #{names}. For each, add `scope_exempt: true` " \
|
|
32
|
+
"to export it in full, set `ignore: true` to skip it, or add a belongs_to path " \
|
|
33
|
+
"to a table that carries the scope column (use a per-table `scope_column` if the " \
|
|
34
|
+
"column name differs on that table)."
|
|
7
35
|
end
|
|
8
36
|
|
|
9
37
|
attr_reader :table_name, :table_by_name, :dump_target
|
|
10
38
|
|
|
11
|
-
def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true)
|
|
39
|
+
def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
|
|
12
40
|
@table_name = table_name
|
|
13
41
|
@table_by_name = table_by_name
|
|
14
42
|
@dump_target = dump_target
|
|
15
43
|
@logger = logger
|
|
16
44
|
@allow_reverse = allow_reverse
|
|
45
|
+
# @allow_forward gates the "scope via an indirectly-scoped belongs_to
|
|
46
|
+
# parent" rescue (build_belongs_to_scoped_clause). Disabled while building a
|
|
47
|
+
# parent/child subquery so a single forward hop never recurses into another
|
|
48
|
+
# (which could loop on a belongs_to cycle).
|
|
49
|
+
@allow_forward = allow_forward
|
|
17
50
|
end
|
|
18
51
|
|
|
19
52
|
def run
|
|
20
53
|
table = table_by_name.fetch(table_name)
|
|
21
54
|
|
|
55
|
+
return build_scoped(table) if scope_mode?
|
|
56
|
+
|
|
22
57
|
where_clauses = build_where_clauses(table, dump_target)
|
|
23
58
|
join_clauses = build_join_clauses(table, table_by_name, dump_target)
|
|
24
59
|
|
|
@@ -130,8 +165,10 @@ module Exwiw
|
|
|
130
165
|
next if relation.nil? || relation.polymorphic?
|
|
131
166
|
|
|
132
167
|
# Build the child's own extraction query. allow_reverse:false stops a
|
|
133
|
-
# chain of FK-less tables from recursing back into each other
|
|
134
|
-
|
|
168
|
+
# chain of FK-less tables from recursing back into each other;
|
|
169
|
+
# allow_forward:false stops the child from forward-scoping back through
|
|
170
|
+
# this very table (which would loop).
|
|
171
|
+
child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
|
|
135
172
|
|
|
136
173
|
# Only an *already constrained* child narrows anything; an unconstrained
|
|
137
174
|
# child would select every fk value (i.e. dump all) and not help.
|
|
@@ -169,6 +206,64 @@ module Exwiw
|
|
|
169
206
|
)
|
|
170
207
|
end
|
|
171
208
|
|
|
209
|
+
# Scope-column mode. Builds a `fk IN (SELECT parent.pk FROM <parent
|
|
210
|
+
# extraction query>)` clause for a table whose belongs_to parent is itself
|
|
211
|
+
# scopable but carries no scope column of its own — so find_path_to_scoped
|
|
212
|
+
# cannot terminate on it (via_path fails) and nothing references this table
|
|
213
|
+
# (referenced_by fails). The classic shape is a hub scoped only via
|
|
214
|
+
# referenced_by (e.g. CDP `customer_accounts`, scoped by the `customers` that
|
|
215
|
+
# reference it) with sibling detail tables (`customer_account_details`, ...)
|
|
216
|
+
# hanging off it. Constraining those siblings to the hub's in-scope ids keeps
|
|
217
|
+
# them out of a full dump. Returns nil when there is no single, unambiguous
|
|
218
|
+
# scopable parent, leaving the caller on the unscopable path.
|
|
219
|
+
private def build_belongs_to_scoped_clause(table)
|
|
220
|
+
candidates = table.belongs_tos.filter_map do |relation|
|
|
221
|
+
# A polymorphic belongs_to points at several parent tables through one
|
|
222
|
+
# column, so it cannot project to a single parent id set; skip it.
|
|
223
|
+
next if relation.polymorphic?
|
|
224
|
+
|
|
225
|
+
parent = table_by_name[relation.table_name]
|
|
226
|
+
next if parent.nil?
|
|
227
|
+
|
|
228
|
+
# Build the parent's own scoped query. allow_reverse stays true so the
|
|
229
|
+
# parent may be scoped via referenced_by; allow_forward:false bounds this
|
|
230
|
+
# to a single forward hop so a belongs_to cycle cannot loop.
|
|
231
|
+
parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, allow_forward: false)
|
|
232
|
+
|
|
233
|
+
# Only a constrained parent narrows anything; an unconstrained parent
|
|
234
|
+
# would select every pk (i.e. dump all) and not help.
|
|
235
|
+
next unless parent_query.where_clauses.any? || parent_query.join_clauses.any?
|
|
236
|
+
|
|
237
|
+
[relation, parent, parent_query]
|
|
238
|
+
end
|
|
239
|
+
|
|
240
|
+
# Only the unambiguous single-parent case. Multiple scopable parents would
|
|
241
|
+
# need their subqueries combined (not supported); fall back to unscopable.
|
|
242
|
+
if candidates.size != 1
|
|
243
|
+
if candidates.size > 1
|
|
244
|
+
@logger.debug(" #{table.name} has multiple scopable parents; skipping forward scope (unscopable).")
|
|
245
|
+
end
|
|
246
|
+
return nil
|
|
247
|
+
end
|
|
248
|
+
|
|
249
|
+
relation, parent, parent_query = candidates.first
|
|
250
|
+
|
|
251
|
+
# Project the parent's extraction query down to just its primary key — the
|
|
252
|
+
# column this table's foreign key points at.
|
|
253
|
+
pk_column = TableColumn.from_symbol_keys(name: parent.primary_key)
|
|
254
|
+
projected = QueryAst::Select.new
|
|
255
|
+
projected.from(parent_query.from_table_name)
|
|
256
|
+
projected.select([pk_column])
|
|
257
|
+
parent_query.join_clauses.each { |j| projected.join(j) }
|
|
258
|
+
parent_query.where_clauses.each { |w| projected.where(w) }
|
|
259
|
+
|
|
260
|
+
QueryAst::WhereClause.new(
|
|
261
|
+
column_name: relation.foreign_key,
|
|
262
|
+
operator: :in_subquery,
|
|
263
|
+
value: QueryAst::SelectSubquery.new(query: projected)
|
|
264
|
+
)
|
|
265
|
+
end
|
|
266
|
+
|
|
172
267
|
private def build_where_clauses(table, dump_target)
|
|
173
268
|
clauses = []
|
|
174
269
|
|
|
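Note on `build_belongs_to_scoped_clause` above: it yields a clause only when exactly one non-polymorphic parent has a constrained query, and then filters the child with `fk IN (SELECT parent.pk FROM ...)`. A minimal sketch of that selection rule, assuming plain Hashes in place of exwiw's relation and AST objects; `forward_scope_clause`, the Hash keys, and the column names are hypothetical:

```ruby
# Each candidate stands in for one belongs_to relation plus the parent's
# already-built scoped query. :parent_filter is nil when the parent query is
# unconstrained. This renders a SQL-like string for illustration only.
def forward_scope_clause(candidates)
  usable = candidates.reject { |c| c[:polymorphic] }
                     .select { |c| c[:parent_filter] }

  # Zero or several scopable parents: give up, leaving the table unscopable.
  return nil unless usable.size == 1

  c = usable.first
  "#{c[:foreign_key]} IN (SELECT #{c[:parent_table]}.#{c[:parent_pk]} " \
    "FROM #{c[:parent_table]} WHERE #{c[:parent_filter]})"
end
```

Returning `nil` for the ambiguous case mirrors the diff's choice to fall back to the unscopable path rather than combine several subqueries.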
@@ -264,5 +359,208 @@ module Exwiw
|
|
|
264
359
|
|
|
265
360
|
queue
|
|
266
361
|
end
|
|
362
|
+
|
|
363
|
+
# ------------------------------------------------------------------
|
|
364
|
+
# Scope-column mode (Exwiw::DumpTarget#scope_column).
|
|
365
|
+
#
|
|
366
|
+
# The single-target machinery above anchors everything on one named table.
|
|
367
|
+
# Scope mode instead filters every table by a shared column. The relationship
|
|
368
|
+
# walk is the same idea — the *terminus* is just "any table carrying the
|
|
369
|
+
# scope column" rather than "the one named target".
|
|
370
|
+
# ------------------------------------------------------------------
|
|
371
|
+
|
|
372
|
+
private def scope_mode?
|
|
373
|
+
!dump_target.scope_column.nil?
|
|
374
|
+
end
|
|
375
|
+
|
|
376
|
+
# Classifier used by validate_scope! and mirrored by build_scoped below.
|
|
377
|
+
def scope_category
|
|
378
|
+
table = table_by_name.fetch(table_name)
|
|
379
|
+
return :exempt if scope_exempt?(table)
|
|
380
|
+
return :direct if directly_scoped?(table)
|
|
381
|
+
return :via_path if build_join_clauses_scoped(table).any?
|
|
382
|
+
return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
|
|
383
|
+
return :via_scoped_parent if @allow_forward && build_belongs_to_scoped_clause(table)
|
|
384
|
+
|
|
385
|
+
:unscopable
|
|
386
|
+
end
|
|
387
|
+
|
|
388
|
+
private def build_scoped(table)
|
|
389
|
+
ast = QueryAst::Select.new
|
|
390
|
+
ast.from(table.name)
|
|
391
|
+
if table.rails_managed?
|
|
392
|
+
ast.select_all!
|
|
393
|
+
else
|
|
394
|
+
ast.select(table.columns)
|
|
395
|
+
end
|
|
396
|
+
|
|
397
|
+
# Reference/master (or rails-managed) table: export every row.
|
|
398
|
+
return ast if scope_exempt?(table)
|
|
399
|
+
|
|
400
|
+
# Carries the scope column itself: filter on it directly.
|
|
401
|
+
if directly_scoped?(table)
|
|
402
|
+
ast.where(scope_where_clause(table))
|
|
403
|
+
ast.where(table.filter) if table.filter
|
|
404
|
+
return ast
|
|
405
|
+
end
|
|
406
|
+
|
|
407
|
+
# Reachable via belongs_to: join up to the scoped ancestor (the scope
|
|
408
|
+
# filter is applied at the terminal join inside build_join_clauses_scoped).
|
|
409
|
+
join_clauses = build_join_clauses_scoped(table)
|
|
410
|
+
unless join_clauses.empty?
|
|
411
|
+
join_clauses.each { |join_clause| ast.join(join_clause) }
|
|
412
|
+
ast.where(table.filter) if table.filter
|
|
413
|
+
return ast
|
|
414
|
+
end
|
|
415
|
+
|
|
416
|
+
if @allow_reverse
|
|
417
|
+
# Referenced by an extractable (scoped) child: constrain via subquery.
|
|
418
|
+
reverse_clause = build_referenced_by_clause(table)
|
|
419
|
+
if reverse_clause
|
|
420
|
+
ast.where(reverse_clause)
|
|
421
|
+
return ast
|
|
422
|
+
end
|
|
423
|
+
end
|
|
424
|
+
|
|
425
|
+
if @allow_forward
|
|
426
|
+
# Belongs_to a parent that is itself scoped but carries no scope column of
|
|
427
|
+
# its own (so via_path cannot terminate on it) — e.g. a hub table scoped
|
|
428
|
+
# only via referenced_by. Constrain this table to that parent's in-scope
|
|
429
|
+
# ids so its rows ride along instead of being dumped in full.
|
|
430
|
+
parent_clause = build_belongs_to_scoped_clause(table)
|
|
431
|
+
if parent_clause
|
|
432
|
+
ast.where(parent_clause)
|
|
433
|
+
return ast
|
|
434
|
+
end
|
|
435
|
+
end
|
|
436
|
+
|
|
437
|
+
# Only the genuine top-level build (no rescue disabled) is allowed to fail
|
|
438
|
+
# hard. The Runner/ExplainRunner pre-flight (validate_scope!) rejects
|
|
439
|
+
# unscopable tables before extraction, so a top-level build never
|
|
440
|
+
# legitimately lands here; if it does, raise rather than emit an unfiltered
|
|
441
|
+
# (potential full PII) dump.
|
|
442
|
+
if @allow_reverse && @allow_forward
|
|
443
|
+
raise ArgumentError, scope_unscopable_message(table)
|
|
444
|
+
end
|
|
445
|
+
|
|
446
|
+
# Unscopable during a reverse/forward subquery build (a rescue is disabled):
|
|
447
|
+
# return the unconstrained AST so the caller's "constrained only" check
|
|
448
|
+
# filters this candidate out (it never becomes a real dump query).
|
|
449
|
+
ast
|
|
450
|
+
end
|
|
451
|
+
|
|
452
|
+
# The shared column this table is filtered on: a per-table `scope_column`
|
|
453
|
+
# override when present, otherwise the global `--scope-column`.
|
|
454
|
+
private def resolved_scope_column(table)
|
|
455
|
+
table.scope_column || dump_target.scope_column
|
|
456
|
+
end
|
|
457
|
+
|
|
458
|
+
private def scope_exempt?(table)
|
|
459
|
+
table.scope_exempt || table.rails_managed?
|
|
460
|
+
end
|
|
461
|
+
|
|
462
|
+
private def directly_scoped?(table)
|
|
463
|
+
column = resolved_scope_column(table)
|
|
464
|
+
table.columns.any? { |c| c.name == column }
|
|
465
|
+
end
|
|
466
|
+
|
|
467
|
+
private def scope_where_clause(table)
|
|
468
|
+
Exwiw::QueryAst::WhereClause.new(
|
|
469
|
+
column_name: resolved_scope_column(table),
|
|
470
|
+
operator: :eq,
|
|
471
|
+
value: dump_target.ids
|
|
472
|
+
)
|
|
473
|
+
end
|
|
474
|
+
|
|
475
|
+
# BFS over belongs_tos to the nearest *directly scoped* ancestor. Unlike the
|
|
476
|
+
# target-mode walk, the returned path INCLUDES that ancestor: the scope column
|
|
477
|
+
# lives on the ancestor itself (not on a foreign key of the child), so the
|
|
478
|
+
# ancestor must be joined and then filtered.
|
|
479
|
+
private def find_path_to_scoped(table)
|
|
480
|
+
visited = {}
|
|
481
|
+
queue = [[table.name, [table.name]]]
|
|
482
|
+
|
|
483
|
+
until queue.empty?
|
|
484
|
+
current_table_name, path = queue.shift
|
|
485
|
+
next if visited[current_table_name]
|
|
486
|
+
visited[current_table_name] = true
|
|
487
|
+
|
|
488
|
+
current_table = table_by_name[current_table_name]
|
|
489
|
+
next if current_table.nil?
|
|
490
|
+
|
|
491
|
+
current_table.belongs_tos.each do |relation|
|
|
492
|
+
next_table_name = relation.table_name
|
|
493
|
+
next_table = table_by_name[next_table_name]
|
|
494
|
+
next if next_table.nil?
|
|
495
|
+
|
|
496
|
+
next_path = path + [next_table_name]
|
|
497
|
+
return next_path if directly_scoped?(next_table)
|
|
498
|
+
|
|
499
|
+
queue.push([next_table_name, next_path])
|
|
500
|
+
end
|
|
501
|
+
end
|
|
502
|
+
|
|
503
|
+
[]
|
|
504
|
+
end
|
|
505
|
+
|
|
506
|
+
private def build_join_clauses_scoped(table)
|
|
507
|
+
path_tables = find_path_to_scoped(table)
|
|
508
|
+
@logger.debug(" Join path from #{table.name} to a scoped table: #{path_tables}")
|
|
509
|
+
|
|
510
|
+
return [] if path_tables.size < 2
|
|
511
|
+
|
|
512
|
+
path_tables.each_cons(2).map do |from_table_name, to_table_name|
|
|
513
|
+
from_table = table_by_name[from_table_name]
|
|
514
|
+
to_table = table_by_name[to_table_name]
|
|
515
|
+
|
|
516
|
+
join_clause = build_scoped_join_clause(from_table, to_table)
|
|
517
|
+
|
|
518
|
+
# Only the final hop's to_table is directly scoped (the BFS stops there),
|
|
519
|
+
# so the scope filter rides on that join's where_clauses, compiled against
|
|
520
|
+
# join_table_name = the scoped ancestor.
|
|
521
|
+
if directly_scoped?(to_table)
|
|
522
|
+
join_clause.where_clauses.push scope_where_clause(to_table)
|
|
523
|
+
end
|
|
524
|
+
|
|
525
|
+
if to_table.filter
|
|
526
|
+
join_clause.where_clauses.push to_table.filter
|
|
527
|
+
end
|
|
528
|
+
|
|
529
|
+
join_clause
|
|
530
|
+
end
|
|
531
|
+
end
|
|
532
|
+
|
|
533
|
+
# One belongs_to hop as a JoinClause, with the polymorphic type condition
|
|
534
|
+
# placed on the source table (base_where_clauses) when the hop is polymorphic
|
|
535
|
+
# — mirroring the target-mode loop in build_join_clauses.
|
|
536
|
+
private def build_scoped_join_clause(from_table, to_table)
|
|
537
|
+
relation = from_table.belongs_to(to_table.name)
|
|
538
|
+
|
|
539
|
+
join_clause = QueryAst::JoinClause.new(
|
|
540
|
+
base_table_name: from_table.name,
|
|
541
|
+
foreign_key: relation.foreign_key,
|
|
542
|
+
join_table_name: to_table.name,
|
|
543
|
+
primary_key: to_table.primary_key,
|
|
544
|
+
where_clauses: [],
|
|
545
|
+
base_where_clauses: []
|
|
546
|
+
)
|
|
547
|
+
|
|
548
|
+
if relation.polymorphic?
|
|
549
|
+
join_clause.base_where_clauses.push QueryAst::WhereClause.new(
|
|
550
|
+
column_name: relation.foreign_type,
|
|
551
|
+
operator: :eq,
|
|
552
|
+
value: [relation.type_value]
|
|
553
|
+
)
|
|
554
|
+
end
|
|
555
|
+
|
|
556
|
+
join_clause
|
|
557
|
+
end
|
|
558
|
+
|
|
559
|
+
private def scope_unscopable_message(table)
|
|
560
|
+
"Table '#{table.name}' cannot be scoped in scope-column mode: it has no " \
|
|
561
|
+
"'#{dump_target.scope_column}' column (nor a per-table scope_column override) and no " \
|
|
562
|
+
"belongs_to path to a table that does. Add `scope_exempt: true` to export it in full, " \
|
|
563
|
+
"set `ignore: true` to skip it, or add the missing belongs_to."
|
|
564
|
+
end
|
|
267
565
|
end
|
|
268
566
|
end
|
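Note on `find_path_to_scoped` above: it is a breadth-first walk over `belongs_to` edges that stops at the first parent carrying the scope column and returns the path including that parent. A minimal sketch of the same walk, assuming a hypothetical plain-Hash schema (`tables`, `:columns`, `:belongs_to`) instead of exwiw's table configs:

```ruby
# tables maps a table name to { columns: [...], belongs_to: [...] }.
# Returns the path from `start` up to and including the nearest ancestor that
# has `scope_column`, or [] when no such ancestor is reachable.
def path_to_scoped(start, tables, scope_column)
  visited = {}
  queue = [[start, [start]]]

  until queue.empty?
    current, path = queue.shift
    next if visited[current] # guards against belongs_to cycles
    visited[current] = true

    table = tables[current]
    next if table.nil?

    table[:belongs_to].each do |parent_name|
      parent = tables[parent_name]
      next if parent.nil?

      next_path = path + [parent_name]
      return next_path if parent[:columns].include?(scope_column)

      queue.push([parent_name, next_path])
    end
  end

  []
end
```

As in the diff, the start table itself is never tested here; the caller is expected to have handled the directly scoped case first.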
data/lib/exwiw/runner.rb
CHANGED
|
@@ -7,7 +7,7 @@ module Exwiw
|
|
|
7
7
|
def initialize(
|
|
8
8
|
connection_config:,
|
|
9
9
|
output_dir:,
|
|
10
|
-
|
|
10
|
+
schema_dir:,
|
|
11
11
|
dump_target:,
|
|
12
12
|
logger:,
|
|
13
13
|
output_format: 'insert',
|
|
@@ -17,7 +17,7 @@ module Exwiw
|
|
|
17
17
|
)
|
|
18
18
|
@connection_config = connection_config
|
|
19
19
|
@output_dir = output_dir
|
|
20
|
-
@
|
|
20
|
+
@schema_dir = schema_dir
|
|
21
21
|
@dump_target = dump_target
|
|
22
22
|
@output_format = output_format
|
|
23
23
|
@insert_only = insert_only
|
|
@@ -38,8 +38,13 @@ module Exwiw
|
|
|
38
38
|
target = table_by_name[@dump_target.table_name]
|
|
39
39
|
adapter.validate_as_dump_target!(target) if target
|
|
40
40
|
|
|
41
|
+
dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
|
|
42
|
+
# Scope-column mode: abort if any extractable table cannot be scoped (no-op
|
|
43
|
+
# otherwise). Done before extraction so nothing is dumped if it would leak.
|
|
44
|
+
QueryAstBuilder.validate_scope!(dumpable_configs, table_by_name, @dump_target, @logger)
|
|
45
|
+
|
|
41
46
|
@logger.info("Determining table processing order...")
|
|
42
|
-
ordered_table_names = DetermineTableProcessingOrder.run(
|
|
47
|
+
ordered_table_names = DetermineTableProcessingOrder.run(dumpable_configs)
|
|
43
48
|
|
|
44
49
|
clean_output_dir!
|
|
45
50
|
|
|
@@ -159,7 +164,7 @@ module Exwiw
|
|
|
159
164
|
end
|
|
160
165
|
|
|
161
166
|
private def load_table_config(klass)
|
|
162
|
-
Dir[File.join(@
|
|
167
|
+
Dir[File.join(@schema_dir, "*.json")].map do |file|
|
|
163
168
|
json = JSON.parse(File.read(file))
|
|
164
169
|
# Drop belongs_tos/columns(fields) flagged ignore:true so they are not
|
|
165
170
|
# considered during extraction. Done here (after loading from file)
|
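Note on the `validate_scope!` pre-flight called above: it is a no-op outside scope-column mode, skips ignored tables, and otherwise raises one error naming every unscopable table in sorted order. A minimal sketch of that behavior, assuming tables are plain Hashes and the classifier is passed as a block (both hypothetical; the message is shortened from the one in the diff):

```ruby
# Raises ArgumentError listing every table the classifier marks :unscopable.
# Returns nil when scope mode is off or nothing is unscopable.
def validate_scope!(tables, scope_column, &classify)
  return if scope_column.nil?

  unscopable = tables.reject { |t| t[:ignore] }
                     .select { |t| classify.call(t) == :unscopable }
  return if unscopable.empty?

  names = unscopable.map { |t| t[:name] }.sort.join(", ")
  raise ArgumentError,
        "scope-column mode: #{unscopable.size} table(s) cannot be scoped by " \
        "'#{scope_column}': #{names}"
end
```

Collecting all offenders before raising lets a user fix every table config in one pass instead of rerunning after each failure.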
data/lib/exwiw/table_config.rb
CHANGED
|
@@ -26,6 +26,18 @@ module Exwiw
|
|
|
26
26
|
attribute :columns, array(TableColumn), default: []
|
|
27
27
|
attribute :bulk_insert_chunk_size, optional(Integer), skip_serializing_if_nil: true
|
|
28
28
|
attribute :ignore, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
|
|
29
|
+
# Scope-column mode only (see Exwiw::DumpTarget#scope_column). Both are
|
|
30
|
+
# user-configured and never emitted by the schema generators.
|
|
31
|
+
#
|
|
32
|
+
# `scope_exempt: true` exports the whole table without scope filtering — the
|
|
33
|
+
# explicit, auditable escape hatch for genuine reference/master tables under
|
|
34
|
+
# the strict "every table must be scopable" rule.
|
|
35
|
+
#
|
|
36
|
+
# `scope_column` overrides the physical column this table is filtered on when
|
|
37
|
+
# it differs from the global `--scope-column` name (same scope value, just a
|
|
38
|
+
# different column name on this table).
|
|
39
|
+
attribute :scope_exempt, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
|
|
40
|
+
attribute :scope_column, optional(String), skip_serializing_if_nil: true
|
|
29
41
|
|
|
30
42
|
def self.from(hash)
|
|
31
43
|
config = super
|
|
@@ -137,6 +149,9 @@ module Exwiw
|
|
|
137
149
|
merged_table.filter = filter
|
|
138
150
|
merged_table.bulk_insert_chunk_size = passed_table.bulk_insert_chunk_size
|
|
139
151
|
merged_table.ignore = ignore
|
|
152
|
+
# User-owned, never regenerated: carry over from the existing config.
|
|
153
|
+
merged_table.scope_exempt = scope_exempt
|
|
154
|
+
merged_table.scope_column = scope_column
|
|
140
155
|
|
|
141
156
|
# Structural facts of each belongs_to come from the freshly generated
|
|
142
157
|
# config, but the user-owned `comment`/`ignore`/`references` carry over
|
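Note on the merge above: `scope_exempt` and `scope_column` are copied from the existing config, so regenerating the schema does not drop them. A minimal sketch of the idea, assuming plain Hashes rather than `TableConfig` objects; `merge_regenerated` is a hypothetical name:

```ruby
# Structural facts come from the freshly generated config; the user-owned
# scope keys are carried over from the existing one when present.
def merge_regenerated(existing, generated)
  user_owned_keys = [:scope_exempt, :scope_column]
  generated.merge(existing.slice(*user_owned_keys))
end
```

For example, an existing `{ scope_exempt: true }` survives a regeneration that only changes the column list.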
data/lib/exwiw/version.rb
CHANGED
data/lib/exwiw.rb
CHANGED
|
@@ -39,7 +39,13 @@ module Exwiw
|
|
|
39
39
|
# `ids_field` optionally overrides which field `--ids` is matched against on
|
|
40
40
|
# the target table. When nil the table's primary key is used (the historical
|
|
41
41
|
# behavior). Currently only honored by the mongodb adapter.
|
|
42
|
-
|
|
42
|
+
#
|
|
43
|
+
# `scope_column` switches the extraction to scope-column mode: instead of a
|
|
44
|
+
# single `table_name` anchor, every table is filtered by a shared column
|
|
45
|
+
# (`scope_column IN ids`) and tables lacking it are reached by walking
|
|
46
|
+
# belongs_to up to the nearest table that has it. When set, `table_name` is
|
|
47
|
+
# nil. SQL adapters only.
|
|
48
|
+
DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)
|
|
43
49
|
# `uri` is an optional full connection string (currently only honored by the
|
|
44
50
|
# mongodb adapter, e.g. `mongodb+srv://...`). When present it is the source of
|
|
45
51
|
# truth for the connection — host/port/user/password are ignored — so TLS,
|
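Note on the `DumpTarget` change above: with `keyword_init: true`, any member left out stays `nil`, so a scope-mode target has a `nil` `table_name` and a target-mode one has a `nil` `scope_column`. A minimal standalone sketch; the struct line is copied from the diff outside the `Exwiw` module, and `scope_mode?` plus the sample values are assumptions for illustration:

```ruby
# Standalone copy of the struct from the diff (outside the Exwiw module).
DumpTarget = Struct.new(:table_name, :ids, :ids_field, :scope_column, keyword_init: true)

# Mirrors the builder's private predicate: scope mode is on whenever a scope
# column is present.
def scope_mode?(dump_target)
  !dump_target.scope_column.nil?
end

scoped = DumpTarget.new(ids: ["1", "2"].freeze, scope_column: "tenant_id")
targeted = DumpTarget.new(table_name: "shops", ids: ["1"].freeze)

puts scope_mode?(scoped)
puts scope_mode?(targeted)
```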
data/lib/tasks/exwiw.rake
CHANGED
|
@@ -7,7 +7,7 @@ namespace :exwiw do
|
|
|
7
7
|
require "exwiw"
|
|
8
8
|
|
|
9
9
|
Exwiw::SchemaGenerator.from_rails_application(
|
|
10
|
-
output_dir: ENV["
|
|
10
|
+
output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
|
|
11
11
|
).generate!
|
|
12
12
|
end
|
|
13
13
|
|
|
@@ -16,7 +16,7 @@ namespace :exwiw do
|
|
|
16
16
|
require "exwiw"
|
|
17
17
|
|
|
18
18
|
result = Exwiw::SchemaGenerator.from_rails_application(
|
|
19
|
-
output_dir: ENV["
|
|
19
|
+
output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
|
|
20
20
|
).tidy!
|
|
21
21
|
|
|
22
22
|
if result.empty?
|
|
@@ -47,7 +47,7 @@ namespace :exwiw do
|
|
|
47
47
|
require "exwiw"
|
|
48
48
|
|
|
49
49
|
Exwiw::MongoidSchemaGenerator.from_rails_application(
|
|
50
|
-
output_dir: ENV["
|
|
50
|
+
output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
|
|
51
51
|
skip_unsupported: ENV["EXWIW_SKIP_UNSUPPORTED"] == "1",
|
|
52
52
|
).generate!
|
|
53
53
|
end
|
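Note on the rake tasks above: `ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema"` falls back to the default only when the variable is unset. An empty string is truthy in Ruby, so an empty value is used as-is. A minimal sketch, assuming a hypothetical helper `schema_dir` that takes the environment as a Hash so the behavior can be checked without touching the real `ENV`:

```ruby
# Same expression as in the rake tasks, with the environment passed in.
def schema_dir(env)
  env["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema"
end

puts schema_dir({})
puts schema_dir({ "EXWIW_SCHEMA_DIR_PATH" => "db/exwiw" })
```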