exwiw 0.6.0 → 0.6.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +7 -1
- data/docs/mongodb-scoping-fullscan-notes.md +145 -0
- data/lib/exwiw/adapter/mongodb_adapter.rb +89 -13
- data/lib/exwiw/config_file.rb +46 -0
- data/lib/exwiw/explain_runner.rb +16 -0
- data/lib/exwiw/runner.rb +19 -0
- data/lib/exwiw/version.rb +1 -1
- data/lib/exwiw.rb +1 -0
- data/lib/tasks/exwiw.rake +14 -3
- metadata +3 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2dc1d7b1052722433d6328889f3fba23233d09edfdf1db1d3c27d89598951cb1
+  data.tar.gz: 7e0cdbe59e89ea5fb7ced4dce6e170a0e88d09715789ca1faea4f36bc2558a66
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5ab4c1f3c40d11c8d717d20639b3338c18df168880059c6c1e74d33e6f6bf9d9ce9f6921b9e8c613dc4aa24f8dfaf058017382763af45a888edc3c665e38373a
+  data.tar.gz: ebddd4d938ef4369e41a87e4b57cde92c0cf534920459eafc6df7e5d0eeec2fc37810effe0ba4656a14890d75c73e2cb3c0b9f9f9f38e5bf9807df2ec156585e
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -253,12 +253,18 @@ The config generator is provided as a Rake task.
 bundle exec rake exwiw:schema:generate
 ```
 
-
+The output directory is resolved in this order:
+
+1. the `EXWIW_SCHEMA_DIR_PATH` environment variable, if set;
+2. otherwise `schema_dir` from the config file (`exwiw.yml` / `exwiw.yaml` in the current directory), so the generator and the `exwiw` CLI share one location without repeating the path;
+3. otherwise the `exwiw/schema` default.
 
 ```sh
 EXWIW_SCHEMA_DIR_PATH=custom_directory bundle exec rake exwiw:schema:generate
 ```
 
+As with the CLI, a relative `schema_dir` in the config file is resolved relative to the config file's own directory.
+
 #### Tidying stale config (`schema:tidy`)
 
 `schema:generate` adds and updates config files for the tables it finds, but it never deletes the config file of a table that has been dropped from the application. To reconcile the existing config against the current schema, run:
data/docs/mongodb-scoping-fullscan-notes.md
ADDED
@@ -0,0 +1,145 @@
+# MongoDB scoping full-scan diagnosis (nullable-FK belongs_to)
+
+Why a related collection's dump can be "especially slow" — dumping far more
+records than the scope implies and scanning the whole collection — when running
+an exwiw config against a MongoDB backup. The headline finding: the slowness is a
+**symptom of a scoping bug**, not of serialization/decode cost.
+
+## Reproduction setup
+
+- Backup: serve a raw WiredTiger dbpath with a standalone `mongod` (the same
+  `mongo:7` image the repo's `compose.yml` uses) on a spare port:
+
+  ```
+  docker run -d --name exwiw-restore-mongo --user 0:0 --entrypoint mongod \
+    -p 27018:27017 -v "<backup-dbpath>:/data/db" \
+    mongo:7 --dbpath /data/db --bind_ip_all
+  ```
+
+  Notes: run as root (`--user 0:0`) so mongod can read a backup's `0600` files;
+  bypass the image entrypoint (`--entrypoint mongod`) so it does not `gosu`-drop
+  back to the `mongodb` user. A backup carrying a `WiredTiger.backup` marker runs
+  recovery on the first start and **writes into the backup dir** (expected for a
+  restore). Starting a standalone from a replica-set backup yields a harmless
+  `system.replset` warning. Local DB connections may be blocked by a dev sandbox —
+  run measurement commands with the sandbox disabled.
+
+- Run: `bundle exec exwiw export --config <app>/exwiw/exwiw.yml
+  --adapter=mongodb --host=localhost --port=27018 --database=<database>
+  --ids=<target-id> --output-dir=… --log-level=debug`.
+  (The optional-argument CLI flags — `--ids`, `--output-dir` — must use `=`; the
+  space form passes `nil` and crashes in the option callback.)
+
+## Baseline measurement (worked example, warm cache)
+
+Full run ≈ **69s** over a few hundred non-empty collections. Per-collection wall
+time (gap between consecutive `Processing table` log markers) was dominated by a
+single collection:
+
+| collection | time | records |
+|---|---|---|
+| **items** | **41.0s (59% of the whole run)** | 18,739 |
+| (other collections) | ~5s and below each | — |
+
+A "full run ≈ 10 min" figure is the **cold-cache** version (first read pages the
+`items` data off the backup over the bind mount). The *relative* shape (items
+dominates) is the same warm or cold.
+
+## Root cause: nullable belongs_to FK used as a hard `$in` AND constraint
+
+The scope is tiny: one parent entity (`<target-id>`) → **1 store** (linked by
+`business_entity_id` = the entity's **`uuid`**, correctly configured with
+`references: uuid`) → **127 items** (`{store_id: <that store>}`, indexed, ~2ms).
+
+But the run dumped **18,739** items, not 127, and scanned the whole collection.
+Why:
+
+1. `MongodbAdapter#related_collection_filter` ANDs **every** belongs_to whose
+   parent produced ids. The `stores` filter became:
+
+   ```
+   { user_id: {$in: [580 ids]},
+     deleted_user_id: {$in: [96 ids]}, # nullable FK
+     business_entity_id: {$in: ["<target-id>"]} }
+   ```
+
+   The one matching store has **`deleted_user_id` absent (null)**, so it can never
+   satisfy `deleted_user_id ∈ {96 ids}`. The AND yields **0 stores** → `stores`
+   logs "No records matched. skip this table."
+
+2. With `stores` empty, `@state["stores"]` carries no ids, so when `items` is
+   built its `store_id` belongs_to contributes nothing. The remaining belongs_tos
+   are to **reference/master data that is dumped in full** —
+   `large_categories` and `medium_categories`. So the items filter degenerated to:
+
+   ```
+   { large_category_id: {$in: [98 ids]},
+     medium_category_id: {$in: [846 ids]} }
+   ```
+
+3. `items` has **no index on `large_category_id` / `medium_category_id`** (it does
+   have `store_id_1`). So this filter forces a **full COLLSCAN of all 2.43M items**
+   — and the Runner scans it **twice**: once for `StreamingResult#size`
+   (`count_documents`) and again for the fetch.
+
+Isolated phase breakdown of the degenerate items query (warm): `count_documents`
+3.58s + fetch/decode 1.47s + serialize 0.26s. Serialization (the native
+`Exwiw::ExtJson` C ext) is **not** the bottleneck — the COLLSCAN is, and it is far
+worse cold.
+
+The same nullable-FK problem applies one level down: the store's 127 items
+themselves have `*_category_id` = null, so even `{store_id, large_category,
+medium_category}` ANDed returns **0**. The only filter that yields the correct 127
+is `{store_id: {$in: [store]}}` **alone**.
+
+## Implemented fix: genuine-anchor scoping (MongodbAdapter#related_collection_filter)
+
+Scope flows from the dump target along belongs_to edges. The fix classifies each
+belongs_to parent of a non-target collection by whether it is **genuinely scoped**
+— reachable back to the dump target through belongs_to chains
+(`#genuine_scope_set`, a fixpoint over the configs) — and applies the constraint
+accordingly:
+
+- **Anchor (strict).** Among the genuine parents, the most selective one (fewest
+  captured ids) is applied strictly (`{fk: {$in: [...]}}`). It carries the real
+  narrowing and, being strict, bounds the result to a small set — which keeps both
+  this query and the `$in` sets it feeds downstream from ballooning.
+- **Other genuine parents (null-aware).** `{fk: {$in: [nil, ...]}}` (Mongo's
+  `$in: [nil]` matches both explicit nulls and missing fields), so a row whose
+  nullable refinement FK is null is not excluded by it.
+- **Reference parents (dropped).** A parent NOT reachable to the dump target is
+  reference/master data dumped in full; its id set is "all/most of a table" and is
+  not a real scope, so when a genuine anchor exists it is dropped entirely.
+- **No genuine parent:** fall back to the historical strict-AND of whatever
+  constraints exist (preserves prior behaviour for unreachable collections).
+
+For this extraction: `stores` → `{business_entity_id ∈ {<target-id>}}` (anchor;
+`user_id`/`deleted_user_id` → reference leaks, dropped) → **1 store**; `items` →
+`{store_id ∈ {store}}` strict anchor with the nullable refinement FKs null-aware
+and the `*_category` references dropped → **127** via the `store_id_1` index.
+
+### Measured result (warm, same cache)
+
+Full run **58.8s → 11.0s ≈ 5.4×**; `items` 41s double COLLSCAN → ~11ms indexed
+(≈3700×). Correctness also fixed: `stores` 0→1, `items` 18,739 (leaked COLLSCAN)
+→127. Byte-identical existing snapshots (the seed graph is fully genuine and has no
+null FKs, so anchor-strict + null-aware ≡ the prior strict-AND).
+
+### Approaches considered and rejected
+
+- **Unconditional null-aware on every belongs_to** (the original iter-1 direction):
+  catastrophic. A collection that belongs_to only reference data dumped in full
+  becomes ~the whole table once null-aware; the resulting child `$in` then exceeds
+  Mongo's **48 MB max message size** and the run crashes. Null-awareness must NOT
+  be applied to a collection's only/anchor scope.
+- **Null-aware on all genuine parents (no anchor distinction):** makes the genuine
+  *anchor* itself null-aware too — `stores` then matched every store with a null
+  `business_entity_id` (a not-fully-backfilled column) → hundreds of thousands of
+  stores → a ~39 MB child filter on a downstream collection (**MaxBSONSize**).
+  Hence the anchor stays strict.
+- **Scope by the single most-selective genuine parent alone (drop other genuine):**
+  fast and correct here, but drops legitimate AND-narrowing for multi-parent
+  collections (e.g. `order_items` ∈ orders AND products) and moves seed snapshots.
+- Pure-performance tweaks that keep the (incorrect) 18,739-row output —
+  `--cursor-parallel` (changes row order, treats the symptom) or skipping the
+  redundant `count_documents` scan (~½ only) — were rejected as the primary fix.
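The anchor rule described in the notes above can be sketched with plain hashes and arrays. This is an illustration under stated assumptions, not the adapter's code: `build_scoped_filter`, `matches?`, and the `[foreign_key, ids, genuine?]` clause triples are hypothetical names, and `matches?` only imitates the way `$in` with a `nil` entry also matches a missing field.

```ruby
# Illustrative only: plain-Ruby stand-ins, not exwiw's internals.
# A clause is [foreign_key, captured_parent_ids, genuine?].
def build_scoped_filter(clauses)
  genuine, reference = clauses.partition { |_fk, _ids, is_genuine| is_genuine }

  # No genuine parent: keep the historical strict AND of every constraint.
  return reference.to_h { |fk, ids, _| [fk, { "$in" => ids }] } if genuine.empty?

  # The anchor is the genuine parent with the fewest captured ids.
  anchor_fk = genuine.min_by { |_fk, ids, _| ids.size }.first

  # Anchor stays strict; other genuine parents also accept null/missing.
  # Reference clauses are dropped because a genuine anchor exists.
  genuine.to_h do |fk, ids, _|
    [fk, fk == anchor_fk ? { "$in" => ids } : { "$in" => [nil] + ids }]
  end
end

# Toy matcher: a missing key reads as nil, so a nil entry in the `$in` list
# matches both explicit nulls and absent fields.
def matches?(doc, filter)
  filter.all? { |field, condition| condition["$in"].include?(doc[field]) }
end

# A store with no deleted_user_id, as in the worked example (ids shortened).
store = { "_id" => "s1", "business_entity_id" => "target-uuid", "user_id" => 1 }

strict_and = {
  "user_id" => { "$in" => [1, 2, 3] },
  "deleted_user_id" => { "$in" => [7, 8] },
  "business_entity_id" => { "$in" => ["target-uuid"] },
}
scoped = build_scoped_filter([
  ["user_id", [1, 2, 3], false],
  ["deleted_user_id", [7, 8], false],
  ["business_entity_id", ["target-uuid"], true],
])

matches?(store, strict_and) # => false (the absent nullable FK zeroes the AND)
matches?(store, scoped)     # => true  (only the strict anchor remains)
```

The sketch identifies the anchor by foreign-key name for brevity; the adapter in this diff tracks it by index instead.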
data/lib/exwiw/adapter/mongodb_adapter.rb
CHANGED
@@ -123,7 +123,7 @@ module Exwiw
               { config.primary_key => { "$in" => coerce_ids(dump_target.ids) } }
             end
           else
-            related_collection_filter(config, config_by_name)
+            related_collection_filter(config, config_by_name, dump_target)
           end
 
         Exwiw::MongoQuery::Find.new(
@@ -352,23 +352,74 @@ module Exwiw
       # the values were captured from that field in #execute, so their BSON type
       # already matches the stored FK — no coercion.
       #
-      #
-      #
-      #
-      #
-      #
-      #
-      #
-      #
-      #
-
-
+      # Scope flows from the dump target along belongs_to edges. A belongs_to is
+      # classified by whether its parent is *genuinely scoped* — reachable back to
+      # the dump target through belongs_to chains (see #genuine_scope_set) — which
+      # determines how its constraint is applied:
+      #
+      # - Among the genuine parents, the most selective one (fewest captured ids)
+      # is the ANCHOR and is applied strictly. It carries the real narrowing and,
+      # being strict, bounds the result to a small set — which keeps both this
+      # query and the `$in` sets it feeds downstream from ballooning.
+      #
+      # - The OTHER genuine parents are applied null-aware: a row whose (nullable)
+      # FK is null/absent has no reference through that relation and must not be
+      # excluded by it. `nil` is added to the `$in` set (Mongo's `$in: [nil]`
+      # matches both explicit nulls and missing fields). Without this, a nullable
+      # genuine FK that is null on otherwise in-scope rows ANDs the result to
+      # empty — dropping legitimate rows, and (when it zeroes a parent) making
+      # children lose that parent's selective+indexed scope and degenerate to a
+      # full COLLSCAN. See docs/mongodb-scoping-fullscan-notes.md. Null-aware is
+      # applied to non-anchor parents only: making the sole/anchor scope itself
+      # null-aware would match every row whose FK is null (e.g. a not-yet-
+      # backfilled column), ballooning the result instead of scoping it.
+      #
+      # - Reference parents (NOT reachable to the dump target — master/reference
+      # data dumped in full, or only reachable via such data) produce a non-
+      # scoping id set: "all/most of a reference table", which neither narrows
+      # meaningfully nor, made null-aware, stays bounded. So when the collection
+      # has a genuine parent to anchor on, reference-parent constraints are
+      # dropped entirely.
+      #
+      # When NO genuine parent produced ids, the collection is not reachable from
+      # the dump target; fall back to the historical strict-AND of whatever
+      # constraints exist (bounded, preserves prior behavior).
+      #
+      # A belongs_to whose parent produced no ids contributes no constraint: either
+      # the parent matched nothing, or it is not dumped here (e.g. an embedded
+      # collection, or one excluded from the run). If that leaves the filter empty
+      # even though the collection HAS belongs_to, the collection cannot be scoped
+      # from the dump target — and an empty `{}` filter would scan and dump the
+      # ENTIRE collection across every scope. That is never what a scoped
+      # extraction wants, so constrain it to match nothing and warn instead. (A
+      # collection with no belongs_to at all is genuine reference/master data and
+      # is still dumped in full via `{}`.)
+      private def related_collection_filter(config, config_by_name, dump_target)
+        genuine = genuine_scope_set(config_by_name, dump_target.table_name)
+
+        genuine_clauses = []
+        reference_clauses = []
+        config.belongs_tos.each do |relation|
           values = parent_state_for(relation, config_by_name)
           next if values.nil? || values.empty?
 
-
+          target = genuine.include?(relation.table_name) ? genuine_clauses : reference_clauses
+          target << [relation.foreign_key, values]
         end
 
+        filter =
+          if genuine_clauses.any?
+            anchor_index = (0...genuine_clauses.size).min_by { |i| genuine_clauses[i][1].size }
+            genuine_clauses.each_with_index.each_with_object({}) do |((foreign_key, values), index), acc|
+              acc[foreign_key] =
+                index == anchor_index ? { "$in" => values } : { "$in" => [nil] + values }
+            end
+          else
+            reference_clauses.each_with_object({}) do |(foreign_key, values), acc|
+              acc[foreign_key] = { "$in" => values }
+            end
+          end
+
         return filter unless filter.empty? && config.belongs_tos.any?
 
         @logger.warn(
@@ -379,6 +430,31 @@ module Exwiw
         { config.primary_key => { "$in" => [] } }
       end
 
+      # The set of collection names *genuinely scoped* by the dump target: the
+      # target itself, plus every collection that can reach it by following
+      # belongs_to edges (child -> parent) transitively. Computed by fixpoint over
+      # the configs. Everything outside this set is reference/master data (or only
+      # reachable through it) whose belongs_to id sets do not represent a real
+      # scope. Memoized per target name; the configs do not mutate mid-run.
+      private def genuine_scope_set(config_by_name, target_name)
+        (@genuine_scope_set_cache ||= {})[target_name] ||=
+          begin
+            reachable = Set.new([target_name])
+            loop do
+              added = false
+              config_by_name.each_value do |cfg|
+                next if cfg.embedded? || reachable.include?(cfg.name)
+                next unless cfg.belongs_tos.any? { |relation| reachable.include?(relation.table_name) }
+
+                reachable << cfg.name
+                added = true
+              end
+              break unless added
+            end
+            reachable
+          end
+      end
+
       # The captured parent-collection values a child belongs_to should be
       # constrained by: the values of the parent field the FK references
       # (`relation.references`, default the parent primary_key). nil when the
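The fixpoint behind `#genuine_scope_set` can be tried on a toy graph. The sketch below is a simplified stand-in, assuming a hypothetical `belongs_to` hash that maps each collection name to the names of its parents. It leaves out the `embedded?` skip and the memoization, and it adds newly reachable names one round at a time instead of immediately, which reaches the same final set.

```ruby
require "set"

# Hypothetical input shape: { "child" => ["parent", ...] }.
# Returns the target plus every collection that can reach it by following
# belongs_to edges (child -> parent) transitively.
def genuinely_scoped(belongs_to, target)
  reachable = Set.new([target])
  loop do
    added = belongs_to.keys.select do |name|
      !reachable.include?(name) &&
        belongs_to[name].any? { |parent| reachable.include?(parent) }
    end
    break if added.empty? # fixpoint: a full pass added nothing new

    reachable.merge(added)
  end
  reachable
end

# Collection names borrowed from the notes; the graph itself is made up.
graph = {
  "items" => %w[stores large_categories medium_categories],
  "stores" => %w[business_entities users],
  "medium_categories" => %w[large_categories],
  "large_categories" => [],
  "users" => [],
  "business_entities" => [],
}

genuinely_scoped(graph, "business_entities").to_a.sort
# => ["business_entities", "items", "stores"]
```

Here `users` and the category collections never reach the target, so they fall outside the set and would be treated as reference parents.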
data/lib/exwiw/config_file.rb
ADDED
@@ -0,0 +1,46 @@
+# frozen_string_literal: true
+
+require "yaml"
+
+module Exwiw
+  # Minimal reader for the exwiw config YAML (exwiw.yml / exwiw.yaml).
+  #
+  # The CLI has its own, richer config handling (CLI#apply_config_file!); this
+  # module exists for contexts that have no CLI in scope — chiefly the
+  # `exwiw:schema:*` rake tasks, which otherwise only knew the
+  # EXWIW_SCHEMA_DIR_PATH env var and a hard-coded default. It deliberately
+  # reads only what those tasks need (schema_dir) and never aborts the process:
+  # an absent or unreadable config simply yields nil so the caller can fall back.
+  module ConfigFile
+    # Mirrors CLI::DEFAULT_CONFIG_PATHS; .yml wins when both are present.
+    DEFAULT_PATHS = %w[exwiw.yml exwiw.yaml].freeze
+
+    module_function
+
+    # The `schema_dir` from the config file, expanded to an absolute path
+    # relative to the config file's own directory (matching the CLI). Returns
+    # nil when no config file is found, it cannot be parsed, or it does not set
+    # `schema_dir`. Pass an explicit `path` to read a specific file; otherwise
+    # the default paths are looked up in the current directory.
+    def schema_dir(path = nil)
+      path ||= DEFAULT_PATHS.map { |p| File.expand_path(p) }.find { |p| File.file?(p) }
+      return nil if path.nil? || !File.file?(path)
+
+      config =
+        begin
+          YAML.safe_load(File.read(path))
+        rescue Psych::SyntaxError
+          nil
+        end
+      return nil unless config.is_a?(Hash)
+
+      value = config["schema_dir"]
+      return nil if value.nil?
+
+      # Strip a trailing slash and resolve relative to the config file's
+      # directory, exactly as CLI#expand_dir does.
+      value = value.end_with?("/") ? value[0..-2] : value
+      File.expand_path(value, File.dirname(File.expand_path(path)))
+    end
+  end
+end
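To see what "relative to the config file's own directory" means in practice, the lookup can be reproduced without loading the gem. `schema_dir_from` below is a hypothetical stand-in written for this note. It follows the same steps as the module in the diff, except that it uses `String#chomp("/")` for the trailing slash and takes a mandatory path.

```ruby
require "tmpdir"
require "yaml"

# Hypothetical stand-in, not the gem's method: read `schema_dir` from a YAML
# file and resolve it against the directory that contains that file.
def schema_dir_from(config_path)
  return nil unless File.file?(config_path)

  config =
    begin
      YAML.safe_load(File.read(config_path))
    rescue Psych::SyntaxError
      nil # a malformed file is treated like a missing one
    end
  return nil unless config.is_a?(Hash)

  value = config["schema_dir"]
  return nil if value.nil?

  File.expand_path(value.chomp("/"), File.dirname(File.expand_path(config_path)))
end

Dir.mktmpdir do |dir|
  app_dir = File.join(dir, "app")
  Dir.mkdir(app_dir)
  config_path = File.join(app_dir, "exwiw.yml")
  File.write(config_path, "schema_dir: ./schema/\n")

  # Resolved against app_dir, whatever the current working directory is.
  puts schema_dir_from(config_path) == File.join(app_dir, "schema")
end
```

Returning `nil` on every failure path is what lets a caller chain the result with `||` and fall through to a default.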
data/lib/exwiw/explain_runner.rb
CHANGED
@@ -24,6 +24,7 @@ module Exwiw
       table_by_name = configs.each_with_object({}) { |config, hash| hash[config.name] = config }
 
       target = table_by_name[@dump_target.table_name]
+      validate_target_exists!(target)
       adapter.validate_as_dump_target!(target) if target
 
       dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
@@ -62,6 +63,21 @@ module Exwiw
       end
     end
 
+    # Reject a `--target-table` (or `--target-collection`) absent from the loaded
+    # schema; mirrors Runner#validate_target_exists! so explain and export fail the
+    # same way on a typo. `target` is the looked-up config (nil when not found).
+    #
+    # TODO: same caveat as Runner#validate_target_exists! — this checks the schema
+    # (schema_dir JSON), not the live DB connection; verifying against the
+    # connection would need a table-exists capability on each adapter. revisit.
+    private def validate_target_exists!(target)
+      return if @dump_target.table_name.nil?
+      return unless target.nil?
+
+      raise ArgumentError,
+        "--target-table '#{@dump_target.table_name}' does not exist in the schema (#{@schema_dir})."
+    end
+
     private def validate_ignored(configs)
       ignored_names = configs.select { |c| c.ignore }.map(&:name).to_set
       return if ignored_names.empty?
data/lib/exwiw/runner.rb
CHANGED
@@ -36,6 +36,7 @@ module Exwiw
       table_by_name = configs.each_with_object({}) { |config, hash| hash[config.name] = config }
 
       target = table_by_name[@dump_target.table_name]
+      validate_target_exists!(target)
       adapter.validate_as_dump_target!(target) if target
 
       dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
@@ -211,6 +212,24 @@ module Exwiw
       ignored_names.each { |n| @logger.info("Table '#{n}' is marked ignore:true (schema will be included, data extraction skipped)") }
     end
 
+    # Reject a `--target-table` (or `--target-collection`) that does not match any
+    # table/collection in the loaded schema. Without this a typo'd target silently
+    # matched nothing and produced an empty dump with no indication of the mistake.
+    # `target` is the looked-up config (nil when not found); a nil dump target
+    # (dump-all / scope-column mode) is allowed through.
+    #
+    # TODO: this checks the loaded schema (schema_dir JSON), not the live DB
+    # connection — a table that exists in the database but has no schema config
+    # is still rejected here. We may instead want to verify existence against the
+    # connection (would need a table-exists capability on each adapter). revisit.
+    private def validate_target_exists!(target)
+      return if @dump_target.table_name.nil?
+      return unless target.nil?
+
+      raise ArgumentError,
+        "--target-table '#{@dump_target.table_name}' does not exist in the schema (#{@schema_dir})."
+    end
+
     private def validate_rails_managed_target!(configs)
       return if @dump_target.table_name.nil?
 
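The effect of the new guard is easiest to see in isolation. The function below is a hypothetical, free-standing version that takes its inputs as arguments instead of reading `@dump_target` and `@schema_dir`; the name and parameters are assumptions for illustration only.

```ruby
# Hypothetical free-standing version of the guard. `configs_by_name` plays the
# role of the loaded schema, keyed by table/collection name.
def validate_target_exists!(configs_by_name, target_name, schema_dir)
  return if target_name.nil?                      # dump-all mode: nothing to check
  return if configs_by_name.key?(target_name)     # known table: fine

  raise ArgumentError,
        "--target-table '#{target_name}' does not exist in the schema (#{schema_dir})."
end

schema = { "users" => {}, "stores" => {} }

validate_target_exists!(schema, "users", "exwiw/schema") # passes silently
validate_target_exists!(schema, nil, "exwiw/schema")     # passes silently

begin
  validate_target_exists!(schema, "usres", "exwiw/schema") # typo
rescue ArgumentError => e
  puts e.message
end
```

As the TODO in the diff notes, this only consults the schema files, so a table present in the database but missing from the schema directory is rejected as well.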
data/lib/exwiw/version.rb
CHANGED
data/lib/exwiw.rb
CHANGED
data/lib/tasks/exwiw.rake
CHANGED
@@ -2,12 +2,23 @@
 
 namespace :exwiw do
   namespace :schema do
+    # Output directory for the generated schema config. Precedence:
+    # 1. EXWIW_SCHEMA_DIR_PATH env var (explicit per-run override)
+    # 2. schema_dir in the exwiw config file (exwiw.yml/.yaml), so generating
+    # the schema and running the `exwiw` CLI agree on one location without
+    # repeating the path
+    # 3. the historical "exwiw/schema" default
+    # Resolved at task-run time (after `require "exwiw"` has loaded ConfigFile).
+    resolve_schema_dir = lambda do
+      ENV["EXWIW_SCHEMA_DIR_PATH"] || Exwiw::ConfigFile.schema_dir || "exwiw/schema"
+    end
+
     desc "Generate schema from application"
     task generate: :environment do
       require "exwiw"
 
       Exwiw::SchemaGenerator.from_rails_application(
-        output_dir:
+        output_dir: resolve_schema_dir.call,
       ).generate!
     end
 
@@ -16,7 +27,7 @@ namespace :exwiw do
       require "exwiw"
 
       result = Exwiw::SchemaGenerator.from_rails_application(
-        output_dir:
+        output_dir: resolve_schema_dir.call,
       ).tidy!
 
       if result.empty?
@@ -47,7 +58,7 @@ namespace :exwiw do
       require "exwiw"
 
       Exwiw::MongoidSchemaGenerator.from_rails_application(
-        output_dir:
+        output_dir: resolve_schema_dir.call,
         skip_unsupported: ENV["EXWIW_SKIP_UNSUPPORTED"] == "1",
       ).generate!
     end
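The three-step precedence used by the rake tasks reduces to a single `||` chain. The helper below is a hypothetical restatement that takes the environment and the config-file result as arguments so each case can be exercised without touching `ENV` or the file system.

```ruby
# Hypothetical helper mirroring the `||` chain in the rake task. `env` is any
# Hash-like object (ENV in real use); `config_schema_dir` stands in for the
# value the config-file lookup returned (nil when absent or unreadable).
def resolve_schema_dir(env, config_schema_dir)
  env["EXWIW_SCHEMA_DIR_PATH"] || config_schema_dir || "exwiw/schema"
end

resolve_schema_dir({ "EXWIW_SCHEMA_DIR_PATH" => "custom_directory" }, "/app/exwiw/schema")
# => "custom_directory"
resolve_schema_dir({}, "/app/exwiw/schema")
# => "/app/exwiw/schema"
resolve_schema_dir({}, nil)
# => "exwiw/schema"
```

One consequence of using `||`: an empty string is truthy in Ruby, so an `EXWIW_SCHEMA_DIR_PATH` that is set but empty would still take precedence over the config file and the default.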
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: exwiw
 version: !ruby/object:Gem::Version
-  version: 0.6.
+  version: 0.6.2
 platform: ruby
 authors:
 - Shia
@@ -36,6 +36,7 @@ files:
 - CHANGELOG.md
 - LICENSE.txt
 - README.md
+- docs/mongodb-scoping-fullscan-notes.md
 - docs/optimization-notes.md
 - docs/optimize-mongodb-export-with-native-ext.md
 - docs/plans/2026-05-15-insert-000-schema-file.md
@@ -60,6 +61,7 @@ files:
 - lib/exwiw/after_insert_hook.rb
 - lib/exwiw/belongs_to.rb
 - lib/exwiw/cli.rb
+- lib/exwiw/config_file.rb
 - lib/exwiw/ddl_postprocessor.rb
 - lib/exwiw/determine_table_processing_order.rb
 - lib/exwiw/embedded_in.rb