exwiw 0.6.0 → 0.6.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 39362410df244fffa463a86c845062f0e9bacac723e15a6697a50c631db0d5cd
4
- data.tar.gz: 9670677e7c822886ed6268e476008c81a1caba4e2eacb286bf0e747fef5d3f3c
3
+ metadata.gz: 2dc1d7b1052722433d6328889f3fba23233d09edfdf1db1d3c27d89598951cb1
4
+ data.tar.gz: 7e0cdbe59e89ea5fb7ced4dce6e170a0e88d09715789ca1faea4f36bc2558a66
5
5
  SHA512:
6
- metadata.gz: 4af4db84210fbd9da8b6b32e136c1f12af370bc719c5aeee2a1f7183046f1ffae5e8dcdc0bb6d856e30fe800a5febe12aef4d4ff2d2e8c27b162fcc4a056c7f4
7
- data.tar.gz: 5064dfee83c653ae0c73ae07edf33299752c748c6f9efc5e413424e7b3a92b769f6a99901688a2d1d3415ad3a0939ee807095b1cf24b2aab46fcf89402113a90
6
+ metadata.gz: 5ab4c1f3c40d11c8d717d20639b3338c18df168880059c6c1e74d33e6f6bf9d9ce9f6921b9e8c613dc4aa24f8dfaf058017382763af45a888edc3c665e38373a
7
+ data.tar.gz: ebddd4d938ef4369e41a87e4b57cde92c0cf534920459eafc6df7e5d0eeec2fc37810effe0ba4656a14890d75c73e2cb3c0b9f9f9f38e5bf9807df2ec156585e
data/CHANGELOG.md CHANGED
@@ -2,6 +2,10 @@
2
2
 
3
3
  ## [Unreleased]
4
4
 
5
+ ## [0.6.2] - 2026-06-21
6
+
7
+ ## [0.6.1] - 2026-06-20
8
+
5
9
  ## [0.6.0] - 2026-06-20
6
10
 
7
11
  ### Added
data/README.md CHANGED
@@ -253,12 +253,18 @@ The config generator is provided as a Rake task.
253
253
  bundle exec rake exwiw:schema:generate
254
254
  ```
255
255
 
256
- By default, the schema files will be saved in the `exwiw/schema` directory. You can specify a different output directory by setting the `EXWIW_SCHEMA_DIR_PATH` environment variable:
256
+ The output directory is resolved in this order:
257
+
258
+ 1. the `EXWIW_SCHEMA_DIR_PATH` environment variable, if set;
259
+ 2. otherwise `schema_dir` from the config file (`exwiw.yml` / `exwiw.yaml` in the current directory), so the generator and the `exwiw` CLI share one location without repeating the path;
260
+ 3. otherwise the `exwiw/schema` default.
257
261
 
258
262
  ```sh
259
263
  EXWIW_SCHEMA_DIR_PATH=custom_directory bundle exec rake exwiw:schema:generate
260
264
  ```
261
265
 
266
+ As with the CLI, a relative `schema_dir` in the config file is resolved relative to the config file's own directory.
267
+
262
268
  #### Tidying stale config (`schema:tidy`)
263
269
 
264
270
  `schema:generate` adds and updates config files for the tables it finds, but it never deletes the config file of a table that has been dropped from the application. To reconcile the existing config against the current schema, run:
data/docs/mongodb-scoping-fullscan-notes.md ADDED
@@ -0,0 +1,145 @@
1
+ # MongoDB scoping full-scan diagnosis (nullable-FK belongs_to)
2
+
3
+ Why a related collection's dump can be "especially slow" — dumping far more
4
+ records than the scope implies and scanning the whole collection — when running
5
+ an exwiw config against a MongoDB backup. The headline finding: the slowness is a
6
+ **symptom of a scoping bug**, not of serialization/decode cost.
7
+
8
+ ## Reproduction setup
9
+
10
+ - Backup: serve a raw WiredTiger dbpath with a standalone `mongod` (the same
11
+ `mongo:7` image the repo's `compose.yml` uses) on a spare port:
12
+
13
+ ```
14
+ docker run -d --name exwiw-restore-mongo --user 0:0 --entrypoint mongod \
15
+ -p 27018:27017 -v "<backup-dbpath>:/data/db" \
16
+ mongo:7 --dbpath /data/db --bind_ip_all
17
+ ```
18
+
19
+ Notes: run as root (`--user 0:0`) so mongod can read a backup's `0600` files;
20
+ bypass the image entrypoint (`--entrypoint mongod`) so it does not `gosu`-drop
21
+ back to the `mongodb` user. A backup carrying a `WiredTiger.backup` marker runs
22
+ recovery on the first start and **writes into the backup dir** (expected for a
23
+ restore). Starting a standalone from a replica-set backup yields a harmless
24
+ `system.replset` warning. Local DB connections may be blocked by a dev sandbox —
25
+ run measurement commands with the sandbox disabled.
26
+
27
+ - Run: `bundle exec exwiw export --config <app>/exwiw/exwiw.yml
28
+ --adapter=mongodb --host=localhost --port=27018 --database=<database>
29
+ --ids=<target-id> --output-dir=… --log-level=debug`.
30
+ (The optional-argument CLI flags — `--ids`, `--output-dir` — must use `=`; the
31
+ space form passes `nil` and crashes in the option callback.)
32
+
33
+ ## Baseline measurement (worked example, warm cache)
34
+
35
+ Full run ≈ **69s** over a few hundred non-empty collections. Per-collection wall
36
+ time (gap between consecutive `Processing table` log markers) was dominated by a
37
+ single collection:
38
+
39
+ | collection | time | records |
40
+ |---|---|---|
41
+ | **items** | **41.0s (59% of the whole run)** | 18,739 |
42
+ | (other collections) | ~5s and below each | — |
43
+
44
+ A "full run ≈ 10 min" figure is the **cold-cache** version (first read pages the
45
+ `items` data off the backup over the bind mount). The *relative* shape (items
46
+ dominates) is the same warm or cold.
47
+
48
+ ## Root cause: nullable belongs_to FK used as a hard `$in` AND constraint
49
+
50
+ The scope is tiny: one parent entity (`<target-id>`) → **1 store** (linked by
51
+ `business_entity_id` = the entity's **`uuid`**, correctly configured with
52
+ `references: uuid`) → **127 items** (`{store_id: <that store>}`, indexed, ~2ms).
53
+
54
+ But the run dumped **18,739** items, not 127, and scanned the whole collection.
55
+ Why:
56
+
57
+ 1. `MongodbAdapter#related_collection_filter` ANDs **every** belongs_to whose
58
+ parent produced ids. The `stores` filter became:
59
+
60
+ ```
61
+ { user_id: {$in: [580 ids]},
62
+ deleted_user_id: {$in: [96 ids]}, # nullable FK
63
+ business_entity_id: {$in: ["<target-id>"]} }
64
+ ```
65
+
66
+ The one matching store has **`deleted_user_id` absent (null)**, so it can never
67
+ satisfy `deleted_user_id ∈ {96 ids}`. The AND yields **0 stores** → `stores`
68
+ logs "No records matched. skip this table."
69
+
70
+ 2. With `stores` empty, `@state["stores"]` carries no ids, so when `items` is
71
+ built its `store_id` belongs_to contributes nothing. The remaining belongs_tos
72
+ are to **reference/master data that is dumped in full** —
73
+ `large_categories` and `medium_categories`. So the items filter degenerated to:
74
+
75
+ ```
76
+ { large_category_id: {$in: [98 ids]},
77
+ medium_category_id: {$in: [846 ids]} }
78
+ ```
79
+
80
+ 3. `items` has **no index on `large_category_id` / `medium_category_id`** (it does
81
+ have `store_id_1`). So this filter forces a **full COLLSCAN of all 2.43M items**
82
+ — and the Runner scans it **twice**: once for `StreamingResult#size`
83
+ (`count_documents`) and again for the fetch.
84
+
85
+ Isolated phase breakdown of the degenerate items query (warm): `count_documents`
86
+ 3.58s + fetch/decode 1.47s + serialize 0.26s. Serialization (the native
87
+ `Exwiw::ExtJson` C ext) is **not** the bottleneck — the COLLSCAN is, and it is far
88
+ worse cold.
89
+
90
+ The same nullable-FK problem applies one level down: the store's 127 items
91
+ themselves have `*_category_id` = null, so even `{store_id, large_category,
92
+ medium_category}` ANDed returns **0**. The only filter that yields the correct 127
93
+ is `{store_id: {$in: [store]}}` **alone**.
94
+
95
+ ## Implemented fix: genuine-anchor scoping (MongodbAdapter#related_collection_filter)
96
+
97
+ Scope flows from the dump target along belongs_to edges. The fix classifies each
98
+ belongs_to parent of a non-target collection by whether it is **genuinely scoped**
99
+ — reachable back to the dump target through belongs_to chains
100
+ (`#genuine_scope_set`, a fixpoint over the configs) — and applies the constraint
101
+ accordingly:
102
+
103
+ - **Anchor (strict).** Among the genuine parents, the most selective one (fewest
104
+ captured ids) is applied strictly (`{fk: {$in: [...]}}`). It carries the real
105
+ narrowing and, being strict, bounds the result to a small set — which keeps both
106
+ this query and the `$in` sets it feeds downstream from ballooning.
107
+ - **Other genuine parents (null-aware).** `{fk: {$in: [nil, ...]}}` (Mongo's
108
+ `$in: [nil]` matches both explicit nulls and missing fields), so a row whose
109
+ nullable refinement FK is null is not excluded by it.
110
+ - **Reference parents (dropped).** A parent NOT reachable to the dump target is
111
+ reference/master data dumped in full; its id set is "all/most of a table" and is
112
+ not a real scope, so when a genuine anchor exists it is dropped entirely.
113
+ - **No genuine parent:** fall back to the historical strict-AND of whatever
114
+ constraints exist (preserves prior behaviour for unreachable collections).
115
+
116
+ For this extraction: `stores` → `{business_entity_id ∈ {<target-id>}}` (anchor;
117
+ `user_id`/`deleted_user_id` → reference leaks, dropped) → **1 store**; `items` →
118
+ `{store_id ∈ {store}}` strict anchor with the nullable refinement FKs null-aware
119
+ and the `*_category` references dropped → **127** via the `store_id_1` index.
120
+
121
+ ### Measured result (warm, same cache)
122
+
123
+ Full run **58.8s → 11.0s ≈ 5.4×**; `items` 41s double COLLSCAN → ~11ms indexed
124
+ (≈3700×). Correctness also fixed: `stores` 0→1, `items` 18,739 (leaked COLLSCAN)
125
+ →127. Byte-identical existing snapshots (the seed graph is fully genuine and has no
126
+ null FKs, so anchor-strict + null-aware ≡ the prior strict-AND).
127
+
128
+ ### Approaches considered and rejected
129
+
130
+ - **Unconditional null-aware on every belongs_to** (the original iter-1 direction):
131
+ catastrophic. A collection that belongs_to only reference data dumped in full
132
+ becomes ~the whole table once null-aware; the resulting child `$in` then exceeds
133
+ Mongo's **48 MB max message size** and the run crashes. Null-awareness must NOT
134
+ be applied to a collection's only/anchor scope.
135
+ - **Null-aware on all genuine parents (no anchor distinction):** makes the genuine
136
+ *anchor* itself null-aware too — `stores` then matched every store with a null
137
+ `business_entity_id` (a not-fully-backfilled column) → hundreds of thousands of
138
+ stores → a ~39 MB child filter on a downstream collection (**MaxBSONSize**).
139
+ Hence the anchor stays strict.
140
+ - **Scope by the single most-selective genuine parent alone (drop other genuine):**
141
+ fast and correct here, but drops legitimate AND-narrowing for multi-parent
142
+ collections (e.g. `order_items` ∈ orders AND products) and moves seed snapshots.
143
+ - Pure-performance tweaks that keep the (incorrect) 18,739-row output —
144
+ `--cursor-parallel` (changes row order, treats the symptom) or skipping the
145
+ redundant `count_documents` scan (~½ only) — were rejected as the primary fix.
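To illustrate why the strict AND in the notes above matches nothing for the in-scope store, here is a minimal Ruby sketch that emulates `$in` matching in memory. The `matches?` helper and all sample ids are hypothetical; none of this is exwiw or MongoDB driver code. A missing hash key reads as `nil`, which stands in for the `$in: [nil]` behaviour the notes describe.

```ruby
# Hypothetical in-memory stand-in for a Mongo filter made only of `$in` clauses.
# Hash#[] returns nil for a missing key, so an absent field behaves like null.
def matches?(doc, filter)
  filter.all? { |field, cond| cond.fetch("$in").include?(doc[field]) }
end

# The one in-scope store: `deleted_user_id` is absent (sample values are made up).
STORE = { "_id" => "s1", "business_entity_id" => "e1", "user_id" => 7 }.freeze

# Old behaviour: every belongs_to with captured ids is ANDed strictly.
STRICT_AND = {
  "user_id" => { "$in" => [7, 8] },
  "deleted_user_id" => { "$in" => [3, 4] },
  "business_entity_id" => { "$in" => ["e1"] },
}.freeze

# Anchor only, which is the shape the notes report for `stores` after the fix.
ANCHOR_ONLY = { "business_entity_id" => { "$in" => ["e1"] } }.freeze

# A null-aware clause next to a strict anchor (the shape used for
# non-anchor genuine parents).
NULL_AWARE = ANCHOR_ONLY.merge("deleted_user_id" => { "$in" => [nil, 3, 4] }).freeze
```

With these definitions, `matches?(STORE, STRICT_AND)` is false because the absent `deleted_user_id` cannot be in a set of concrete ids, while both `ANCHOR_ONLY` and `NULL_AWARE` accept the store.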
@@ -123,7 +123,7 @@ module Exwiw
123
123
  { config.primary_key => { "$in" => coerce_ids(dump_target.ids) } }
124
124
  end
125
125
  else
126
- related_collection_filter(config, config_by_name)
126
+ related_collection_filter(config, config_by_name, dump_target)
127
127
  end
128
128
 
129
129
  Exwiw::MongoQuery::Find.new(
@@ -352,23 +352,74 @@ module Exwiw
352
352
  # the values were captured from that field in #execute, so their BSON type
353
353
  # already matches the stored FK — no coercion.
354
354
  #
355
- # A belongs_to whose parent produced no ids contributes no constraint:
356
- # either the parent matched nothing, or it is not dumped here (e.g. an
357
- # embedded collection, or one excluded from the run). If that leaves the
358
- # filter empty even though the collection HAS belongs_to, the collection
359
- # cannot be scoped from the dump target — and falling back to an empty `{}`
360
- # filter would scan and dump the ENTIRE collection across every scope. That
361
- # is never what a scoped extraction wants, so constrain it to match nothing
362
- # and warn instead. (A collection with no belongs_to at all is genuine
363
- # reference/master data and is still dumped in full via `{}`.)
364
- private def related_collection_filter(config, config_by_name)
365
- filter = config.belongs_tos.each_with_object({}) do |relation, acc|
355
+ # Scope flows from the dump target along belongs_to edges. A belongs_to is
356
+ # classified by whether its parent is *genuinely scoped* (reachable back to
357
+ # the dump target through belongs_to chains; see #genuine_scope_set), which
358
+ # determines how its constraint is applied:
359
+ #
360
+ # - Among the genuine parents, the most selective one (fewest captured ids)
361
+ # is the ANCHOR and is applied strictly. It carries the real narrowing and,
362
+ # being strict, bounds the result to a small set, which keeps both this
363
+ # query and the `$in` sets it feeds downstream from ballooning.
364
+ #
365
+ # - The OTHER genuine parents are applied null-aware: a row whose (nullable)
366
+ # FK is null/absent has no reference through that relation and must not be
367
+ # excluded by it. `nil` is added to the `$in` set (Mongo's `$in: [nil]`
368
+ # matches both explicit nulls and missing fields). Without this, a nullable
369
+ # genuine FK that is null on otherwise in-scope rows ANDs the result to
370
+ # empty — dropping legitimate rows, and (when it zeroes a parent) making
371
+ # children lose that parent's selective+indexed scope and degenerate to a
372
+ # full COLLSCAN. See docs/mongodb-scoping-fullscan-notes.md. Null-aware is
373
+ # applied to non-anchor parents only: making the sole/anchor scope itself
374
+ # null-aware would match every row whose FK is null (e.g. a not-yet-
375
+ # backfilled column), ballooning the result instead of scoping it.
376
+ #
377
+ # - Reference parents (NOT reachable to the dump target — master/reference
378
+ # data dumped in full, or only reachable via such data) produce a non-
379
+ # scoping id set: "all/most of a reference table", which neither narrows
380
+ # meaningfully nor, made null-aware, stays bounded. So when the collection
381
+ # has a genuine parent to anchor on, reference-parent constraints are
382
+ # dropped entirely.
383
+ #
384
+ # When NO genuine parent produced ids, the collection is not reachable from
385
+ # the dump target; fall back to the historical strict-AND of whatever
386
+ # constraints exist (bounded, preserves prior behavior).
387
+ #
388
+ # A belongs_to whose parent produced no ids contributes no constraint: either
389
+ # the parent matched nothing, or it is not dumped here (e.g. an embedded
390
+ # collection, or one excluded from the run). If that leaves the filter empty
391
+ # even though the collection HAS belongs_to, the collection cannot be scoped
392
+ # from the dump target — and an empty `{}` filter would scan and dump the
393
+ # ENTIRE collection across every scope. That is never what a scoped
394
+ # extraction wants, so constrain it to match nothing and warn instead. (A
395
+ # collection with no belongs_to at all is genuine reference/master data and
396
+ # is still dumped in full via `{}`.)
397
+ private def related_collection_filter(config, config_by_name, dump_target)
398
+ genuine = genuine_scope_set(config_by_name, dump_target.table_name)
399
+
400
+ genuine_clauses = []
401
+ reference_clauses = []
402
+ config.belongs_tos.each do |relation|
366
403
  values = parent_state_for(relation, config_by_name)
367
404
  next if values.nil? || values.empty?
368
405
 
369
- acc[relation.foreign_key] = { "$in" => values }
406
+ target = genuine.include?(relation.table_name) ? genuine_clauses : reference_clauses
407
+ target << [relation.foreign_key, values]
370
408
  end
371
409
 
410
+ filter =
411
+ if genuine_clauses.any?
412
+ anchor_index = (0...genuine_clauses.size).min_by { |i| genuine_clauses[i][1].size }
413
+ genuine_clauses.each_with_index.each_with_object({}) do |((foreign_key, values), index), acc|
414
+ acc[foreign_key] =
415
+ index == anchor_index ? { "$in" => values } : { "$in" => [nil] + values }
416
+ end
417
+ else
418
+ reference_clauses.each_with_object({}) do |(foreign_key, values), acc|
419
+ acc[foreign_key] = { "$in" => values }
420
+ end
421
+ end
422
+
372
423
  return filter unless filter.empty? && config.belongs_tos.any?
373
424
 
374
425
  @logger.warn(
@@ -379,6 +430,31 @@ module Exwiw
379
430
  { config.primary_key => { "$in" => [] } }
380
431
  end
381
432
 
433
+ # The set of collection names *genuinely scoped* by the dump target: the
434
+ # target itself, plus every collection that can reach it by following
435
+ # belongs_to edges (child -> parent) transitively. Computed by fixpoint over
436
+ # the configs. Everything outside this set is reference/master data (or only
437
+ # reachable through it) whose belongs_to id sets do not represent a real
438
+ # scope. Memoized per target name; the configs do not mutate mid-run.
439
+ private def genuine_scope_set(config_by_name, target_name)
440
+ (@genuine_scope_set_cache ||= {})[target_name] ||=
441
+ begin
442
+ reachable = Set.new([target_name])
443
+ loop do
444
+ added = false
445
+ config_by_name.each_value do |cfg|
446
+ next if cfg.embedded? || reachable.include?(cfg.name)
447
+ next unless cfg.belongs_tos.any? { |relation| reachable.include?(relation.table_name) }
448
+
449
+ reachable << cfg.name
450
+ added = true
451
+ end
452
+ break unless added
453
+ end
454
+ reachable
455
+ end
456
+ end
457
+
382
458
  # The captured parent-collection values a child belongs_to should be
383
459
  # constrained by: the values of the parent field the FK references
384
460
  # (`relation.references`, default the parent primary_key). nil when the
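The classification in the hunks above can be sketched outside the adapter as two small functions. This is a simplified illustration under assumptions, not the adapter's code: `Relation`, `Config`, `genuine_scope`, and `scoped_filter` are hypothetical names, captured parent ids are passed in as a plain hash instead of coming from adapter state, and the embedded-collection skip and the final match-nothing guard are omitted.

```ruby
require "set"

# Hypothetical stand-ins for the real config objects.
Relation = Struct.new(:table_name, :foreign_key)
Config = Struct.new(:name, :belongs_tos)

# Fixpoint: start from the target and keep adding collections that belong_to
# something already reachable, until a pass adds nothing.
def genuine_scope(configs, target_name)
  reachable = Set.new([target_name])
  loop do
    added = configs.reject { |c| reachable.include?(c.name) }
                   .select { |c| c.belongs_tos.any? { |r| reachable.include?(r.table_name) } }
    break if added.empty?

    added.each { |c| reachable << c.name }
  end
  reachable
end

# Strict anchor (fewest ids), null-aware for the other genuine parents,
# reference parents dropped; plain strict AND when no parent is genuine.
def scoped_filter(config, genuine, ids_by_parent)
  clauses = config.belongs_tos.filter_map do |r|
    ids = ids_by_parent[r.table_name]
    [r, ids] unless ids.nil? || ids.empty?
  end
  scoped, reference = clauses.partition { |r, _| genuine.include?(r.table_name) }
  return reference.to_h { |r, ids| [r.foreign_key, { "$in" => ids }] } if scoped.empty?

  anchor = scoped.min_by { |_, ids| ids.size }
  scoped.to_h do |pair|
    r, ids = pair
    [r.foreign_key, { "$in" => pair.equal?(anchor) ? ids : [nil] + ids }]
  end
end

# Sample graph loosely modelled on the notes (ids are made up).
CONFIGS = [
  Config.new("business_entities", []),
  Config.new("users", []),
  Config.new("large_categories", []),
  Config.new("stores", [Relation.new("business_entities", "business_entity_id"),
                        Relation.new("users", "user_id")]),
  Config.new("items", [Relation.new("stores", "store_id"),
                       Relation.new("large_categories", "large_category_id")]),
].freeze

GENUINE = genuine_scope(CONFIGS, "business_entities")
ITEMS_FILTER = scoped_filter(CONFIGS.last, GENUINE,
                             { "stores" => ["s1"], "large_categories" => [1, 2, 3] })
```

In this sample graph `users` and `large_categories` never reach the target, so their id sets are treated as reference data and dropped once `items` has `stores` to anchor on.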
data/lib/exwiw/config_file.rb ADDED
@@ -0,0 +1,46 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "yaml"
4
+
5
+ module Exwiw
6
+ # Minimal reader for the exwiw config YAML (exwiw.yml / exwiw.yaml).
7
+ #
8
+ # The CLI has its own, richer config handling (CLI#apply_config_file!); this
9
+ # module exists for contexts that have no CLI in scope — chiefly the
10
+ # `exwiw:schema:*` rake tasks, which otherwise only knew the
11
+ # EXWIW_SCHEMA_DIR_PATH env var and a hard-coded default. It deliberately
12
+ # reads only what those tasks need (schema_dir) and never aborts the process:
13
+ # an absent or unreadable config simply yields nil so the caller can fall back.
14
+ module ConfigFile
15
+ # Mirrors CLI::DEFAULT_CONFIG_PATHS; .yml wins when both are present.
16
+ DEFAULT_PATHS = %w[exwiw.yml exwiw.yaml].freeze
17
+
18
+ module_function
19
+
20
+ # The `schema_dir` from the config file, expanded to an absolute path
21
+ # relative to the config file's own directory (matching the CLI). Returns
22
+ # nil when no config file is found, it cannot be parsed, or it does not set
23
+ # `schema_dir`. Pass an explicit `path` to read a specific file; otherwise
24
+ # the default paths are looked up in the current directory.
25
+ def schema_dir(path = nil)
26
+ path ||= DEFAULT_PATHS.map { |p| File.expand_path(p) }.find { |p| File.file?(p) }
27
+ return nil if path.nil? || !File.file?(path)
28
+
29
+ config =
30
+ begin
31
+ YAML.safe_load(File.read(path))
32
+ rescue Psych::SyntaxError
33
+ nil
34
+ end
35
+ return nil unless config.is_a?(Hash)
36
+
37
+ value = config["schema_dir"]
38
+ return nil if value.nil?
39
+
40
+ # Strip a trailing slash and resolve relative to the config file's
41
+ # directory, exactly as CLI#expand_dir does.
42
+ value = value.end_with?("/") ? value[0..-2] : value
43
+ File.expand_path(value, File.dirname(File.expand_path(path)))
44
+ end
45
+ end
46
+ end
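As a quick illustration of the path rule in `ConfigFile.schema_dir`, the sketch below writes a throwaway config into a temporary directory and resolves a relative `schema_dir` with a trailing slash. `resolve_schema_dir` and `demo_resolution` are hypothetical helpers written for this example; they reproduce only the expansion step and skip the error handling of the real module.

```ruby
require "tmpdir"
require "yaml"

# Hypothetical helper: strip one trailing slash, then expand relative to the
# directory that holds the config file (not the current working directory).
def resolve_schema_dir(config_path)
  config = YAML.safe_load(File.read(config_path))
  value = config.is_a?(Hash) ? config["schema_dir"] : nil
  return nil if value.nil?

  File.expand_path(value.delete_suffix("/"), File.dirname(File.expand_path(config_path)))
end

# Returns [resolved, expected] for a config that sets a relative schema_dir.
def demo_resolution
  Dir.mktmpdir do |dir|
    path = File.join(dir, "exwiw.yml")
    File.write(path, "schema_dir: db/exwiw_schema/\n")
    [resolve_schema_dir(path), File.expand_path("db/exwiw_schema", dir)]
  end
end
```

Because the base is the config file's own directory, the result is the same whether the task runs from the project root or from a subdirectory, provided the config file is found.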
@@ -24,6 +24,7 @@ module Exwiw
24
24
  table_by_name = configs.each_with_object({}) { |config, hash| hash[config.name] = config }
25
25
 
26
26
  target = table_by_name[@dump_target.table_name]
27
+ validate_target_exists!(target)
27
28
  adapter.validate_as_dump_target!(target) if target
28
29
 
29
30
  dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
@@ -62,6 +63,21 @@ module Exwiw
62
63
  end
63
64
  end
64
65
 
66
+ # Reject a `--target-table` (or `--target-collection`) absent from the loaded
67
+ # schema; mirrors Runner#validate_target_exists! so explain and export fail the
68
+ # same way on a typo. `target` is the looked-up config (nil when not found).
69
+ #
70
+ # TODO: same caveat as Runner#validate_target_exists! — this checks the schema
71
+ # (schema_dir JSON), not the live DB connection; verifying against the
72
+ # connection would need a table-exists capability on each adapter. revisit.
73
+ private def validate_target_exists!(target)
74
+ return if @dump_target.table_name.nil?
75
+ return unless target.nil?
76
+
77
+ raise ArgumentError,
78
+ "--target-table '#{@dump_target.table_name}' does not exist in the schema (#{@schema_dir})."
79
+ end
80
+
65
81
  private def validate_ignored(configs)
66
82
  ignored_names = configs.select { |c| c.ignore }.map(&:name).to_set
67
83
  return if ignored_names.empty?
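The target check added in this hunk (and mirrored in the Runner, per its comment) can be exercised in isolation with a small sketch. The names below are hypothetical stand-ins: the real method reads `@dump_target` and `@schema_dir` from instance state and receives the already looked-up config, whereas this version takes everything as arguments.

```ruby
# Hypothetical stand-in for the dump target object.
DumpTarget = Struct.new(:table_name)

def validate_target_exists!(dump_target, table_by_name, schema_dir)
  return if dump_target.table_name.nil? # dump-all / scope-column mode passes through
  return if table_by_name.key?(dump_target.table_name)

  raise ArgumentError,
        "--target-table '#{dump_target.table_name}' does not exist in the schema (#{schema_dir})."
end

# Returns the error message produced by a typo'd target, or nil if none is raised.
def typo_error_message
  validate_target_exists!(DumpTarget.new("shpos"), { "shops" => :config }, "exwiw/schema")
  nil
rescue ArgumentError => e
  e.message
end
```

Note that the check only consults the loaded schema, so a table that exists in the database but has no schema config is rejected as well, as the TODO in the diff points out.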
data/lib/exwiw/runner.rb CHANGED
@@ -36,6 +36,7 @@ module Exwiw
36
36
  table_by_name = configs.each_with_object({}) { |config, hash| hash[config.name] = config }
37
37
 
38
38
  target = table_by_name[@dump_target.table_name]
39
+ validate_target_exists!(target)
39
40
  adapter.validate_as_dump_target!(target) if target
40
41
 
41
42
  dumpable_configs = configs.select { |c| adapter.dumpable?(c) }
@@ -211,6 +212,24 @@ module Exwiw
211
212
  ignored_names.each { |n| @logger.info("Table '#{n}' is marked ignore:true (schema will be included, data extraction skipped)") }
212
213
  end
213
214
 
215
+ # Reject a `--target-table` (or `--target-collection`) that does not match any
216
+ # table/collection in the loaded schema. Without this a typo'd target silently
217
+ # matched nothing and produced an empty dump with no indication of the mistake.
218
+ # `target` is the looked-up config (nil when not found); a nil dump target
219
+ # (dump-all / scope-column mode) is allowed through.
220
+ #
221
+ # TODO: this checks the loaded schema (schema_dir JSON), not the live DB
222
+ # connection — a table that exists in the database but has no schema config
223
+ # is still rejected here. We may instead want to verify existence against the
224
+ # connection (would need a table-exists capability on each adapter). revisit.
225
+ private def validate_target_exists!(target)
226
+ return if @dump_target.table_name.nil?
227
+ return unless target.nil?
228
+
229
+ raise ArgumentError,
230
+ "--target-table '#{@dump_target.table_name}' does not exist in the schema (#{@schema_dir})."
231
+ end
232
+
214
233
  private def validate_rails_managed_target!(configs)
215
234
  return if @dump_target.table_name.nil?
216
235
 
data/lib/exwiw/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Exwiw
4
- VERSION = "0.6.0"
4
+ VERSION = "0.6.2"
5
5
  end
data/lib/exwiw.rb CHANGED
@@ -6,6 +6,7 @@ require "json"
6
6
  require "serdes"
7
7
 
8
8
  require_relative "exwiw/ext_json"
9
+ require_relative "exwiw/config_file"
9
10
  require_relative "exwiw/belongs_to"
10
11
  require_relative "exwiw/table_column"
11
12
  require_relative "exwiw/table_config"
data/lib/tasks/exwiw.rake CHANGED
@@ -2,12 +2,23 @@
2
2
 
3
3
  namespace :exwiw do
4
4
  namespace :schema do
5
+ # Output directory for the generated schema config. Precedence:
6
+ # 1. EXWIW_SCHEMA_DIR_PATH env var (explicit per-run override)
7
+ # 2. schema_dir in the exwiw config file (exwiw.yml/.yaml), so generating
8
+ # the schema and running the `exwiw` CLI agree on one location without
9
+ # repeating the path
10
+ # 3. the historical "exwiw/schema" default
11
+ # Resolved at task-run time (after `require "exwiw"` has loaded ConfigFile).
12
+ resolve_schema_dir = lambda do
13
+ ENV["EXWIW_SCHEMA_DIR_PATH"] || Exwiw::ConfigFile.schema_dir || "exwiw/schema"
14
+ end
15
+
5
16
  desc "Generate schema from application"
6
17
  task generate: :environment do
7
18
  require "exwiw"
8
19
 
9
20
  Exwiw::SchemaGenerator.from_rails_application(
10
- output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
21
+ output_dir: resolve_schema_dir.call,
11
22
  ).generate!
12
23
  end
13
24
 
@@ -16,7 +27,7 @@ namespace :exwiw do
16
27
  require "exwiw"
17
28
 
18
29
  result = Exwiw::SchemaGenerator.from_rails_application(
19
- output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
30
+ output_dir: resolve_schema_dir.call,
20
31
  ).tidy!
21
32
 
22
33
  if result.empty?
@@ -47,7 +58,7 @@ namespace :exwiw do
47
58
  require "exwiw"
48
59
 
49
60
  Exwiw::MongoidSchemaGenerator.from_rails_application(
50
- output_dir: ENV["EXWIW_SCHEMA_DIR_PATH"] || "exwiw/schema",
61
+ output_dir: resolve_schema_dir.call,
51
62
  skip_unsupported: ENV["EXWIW_SKIP_UNSUPPORTED"] == "1",
52
63
  ).generate!
53
64
  end
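The precedence implemented by `resolve_schema_dir` is a plain `||` chain, which the following sketch reproduces without Rake. `resolve_output_dir` is a hypothetical function for illustration; `env` and `config_dir` are injected in place of `ENV` and `Exwiw::ConfigFile.schema_dir`, and the sample paths are made up.

```ruby
# Hypothetical stand-alone version of the lambda's precedence chain.
def resolve_output_dir(env, config_dir)
  env["EXWIW_SCHEMA_DIR_PATH"] || config_dir || "exwiw/schema"
end

FROM_ENV     = resolve_output_dir({ "EXWIW_SCHEMA_DIR_PATH" => "custom" }, "/app/exwiw/schema")
FROM_CONFIG  = resolve_output_dir({}, "/app/db/schema")
FROM_DEFAULT = resolve_output_dir({}, nil)
# An empty string is truthy in Ruby, so a set-but-empty variable still wins.
FROM_EMPTY   = resolve_output_dir({ "EXWIW_SCHEMA_DIR_PATH" => "" }, "/app/db/schema")
```

One consequence of using `||` is that only an unset variable falls through to the config file; a variable exported as an empty string is passed along as the output directory unchanged.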
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: exwiw
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.0
4
+ version: 0.6.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shia
@@ -36,6 +36,7 @@ files:
36
36
  - CHANGELOG.md
37
37
  - LICENSE.txt
38
38
  - README.md
39
+ - docs/mongodb-scoping-fullscan-notes.md
39
40
  - docs/optimization-notes.md
40
41
  - docs/optimize-mongodb-export-with-native-ext.md
41
42
  - docs/plans/2026-05-15-insert-000-schema-file.md
@@ -60,6 +61,7 @@ files:
60
61
  - lib/exwiw/after_insert_hook.rb
61
62
  - lib/exwiw/belongs_to.rb
62
63
  - lib/exwiw/cli.rb
64
+ - lib/exwiw/config_file.rb
63
65
  - lib/exwiw/ddl_postprocessor.rb
64
66
  - lib/exwiw/determine_table_processing_order.rb
65
67
  - lib/exwiw/embedded_in.rb