exwiw 0.8.2 → 0.8.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb6879b8efb94ef9a9a0bd37df32962257106df9eec4c3fc9de844a12c6360cd
4
- data.tar.gz: b43b0bfde090111d7ce23083db7471f15974292ddb5886da682de9bc443febfa
3
+ metadata.gz: ee150e9fd830a7829be8c0eebbeca19f0e2d1b1dbcce50fe78e63d62e878997c
4
+ data.tar.gz: 31377340d8737ece209e1c0755eeac49ac40930562503e2c4d04f6362938e9d9
5
5
  SHA512:
6
- metadata.gz: e8cca59b376a369b9253d1482f3ee64ae8c542b00c1c3eddb192d273efe0e3c969e3aec1770f087977efe6b75fa52f375538e286d31a87d7f87a197739a4eeaf
7
- data.tar.gz: 78a3ece09cd6ec9ecb6fd7b0a2f166bc41d51720ea36935ca4cd974bf766008f11688f0e0dc7bbd8a63ce29f4719806e1a4705b0390850a857a192fcfaa56976
6
+ metadata.gz: 26151e0bf763b6f48e993d2025b4bac804a373fe2e3cfd71bb70d91b86b453826838ec683ebfd3f4f4336bb83b8a12d567d7dbda7f8a4ef48fc7f50bbe98145a
7
+ data.tar.gz: b29a2da42750dde93cd53eb56a11a592478134b690ccba31b59cd312f3685dad2713e63897d09cb77246a8be37dfe55890f1a8505c1aa8b5fd020b7bc7760a03
data/CHANGELOG.md CHANGED
@@ -2,6 +2,18 @@
2
2
 
3
3
  ## [Unreleased]
4
4
 
5
+ ## [0.8.4] - 2026-06-24
6
+
7
+ ### Fixed
8
+
9
+ - **Scope id-sets are materialized and probed by JOIN instead of `<col> IN (subquery)`, removing a correlated full-scan on large tables.** The three id-set scope shapes — the multi-referencer `reverse_scope` `UNION`, the single-referencer reverse/`referenced_by` extraction, and the multi-hop forward (`via_scoped_parent`) cascade — were emitted as `<col> IN (<subquery>)`. On a large global-identity table (e.g. `users`) MySQL cannot turn a `UNION` subquery into a materialized semi-join and falls back to its IN-to-`EXISTS` rewrite: a correlated `DEPENDENT SUBQUERY`/`DEPENDENT UNION` re-evaluated **per outer row**, so the driving table is full-scanned and the union re-run for every row (the plan that ran for minutes and timed out on a production-scale identity table, even for an empty tenant). These clauses are now lifted into a `JOIN` against a materialized derived table — `JOIN (SELECT DISTINCT src.<id> AS exwiw_scope_id FROM (<id-set subquery>) AS src) AS ids ON <table>.<col> = ids.exwiw_scope_id` — so the engine evaluates the id set **once** (the `DISTINCT` makes the derived table non-mergeable, hence materialized) and probes the outer table by primary key. The `DISTINCT` also dedups, so the result set is identical to the old `IN` form; the cascade nests the same way (each level materialized once); NULL exclusion, the forward-path cycle guard, single-parent/polymorphic skips, and PostgreSQL's `uuid`/`varchar` `::text` reconciliation are all preserved. All three SQL adapters (mysql / postgresql / sqlite). See the [README](README.md#why-a-join-not-in-subquery).
10
+
11
+ ## [0.8.3] - 2026-06-24
12
+
13
+ ### Fixed
14
+
15
+ - **Forward scope (`via_scoped_parent`) now cascades across multiple `belongs_to` hops instead of dying after one.** A table with no scope column of its own is scoped by constraining it to its `belongs_to` parent's in-scope ids (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). Previously the parent was rebuilt with forward scoping turned off, so if the parent was *itself* scoped only through *its* parent (e.g. an identity-family table two or more hops below a `reverse_scope`/`referenced_by` table — `users ← end_users ← end_user_profiles`), the rebuilt parent came back unconstrained and the child was classified `:unscopable` — forcing a `scope_exempt` full dump and re-introducing the bloat the prune removes. The boolean single-hop bound is replaced by a forward-path guard: the rescue keeps forward scoping enabled while rebuilding the parent (appending the current table to the path), so the cascade recurses N levels and produces a correspondingly nested `IN (subquery)`; it terminates only on a genuine `belongs_to` cycle (a table already on the path is not revisited, falling through to `:unscopable`). The single-unambiguous-parent rule and the polymorphic-skip are unchanged, and the reverse arms still cannot loop back through the table being reverse-scoped. SQL adapters only.
16
+
5
17
  ## [0.8.2] - 2026-06-24
6
18
 
7
19
  ### Added
data/README.md CHANGED
@@ -179,12 +179,18 @@ Each table is resolved as follows:
179
179
  nearest such table and applies the scope filter there (the same join machinery
180
180
  the single-target mode uses).
181
181
  - **`belongs_to` a parent that is itself scoped but carries no scope column of its
182
- own** → exwiw constrains this table to the parent's in-scope ids via a subquery
183
- (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). This covers a *hub*
184
- table that has no scope column and is scoped only because an extractable child
185
- references it (see referenced-by below): the hub's other `belongs_to` children
186
- ride along to just the in-scope rows instead of being dumped in full. Limited to
187
- a single forward hop and a single unambiguous scopable parent.
182
+ own** → exwiw constrains this table to the parent's in-scope ids by joining it to
183
+ the parent's scoped query, materialized as a derived table
184
+ (`JOIN (SELECT DISTINCT parent.pk FROM <parent's scoped query>) ON fk = …`).
185
+ This covers a *hub* table that has no scope column and is scoped only because an
186
+ extractable child references it (see referenced-by below): the hub's other
187
+ `belongs_to` children ride along to just the in-scope rows instead of being dumped
188
+ in full. The parent itself may be scoped the same way, so this **cascades across
189
+ multiple hops** (each a single unambiguous scopable parent) and the derived-table
190
+ JOINs nest correspondingly; the recursion terminates on a genuine `belongs_to`
191
+ cycle (a table already on the path is left `:unscopable` rather than looped on).
192
+ (See [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery) for the
193
+ materialization rationale.)
188
194
  - **Cannot be scoped at all** (no scope column and no path to one) → exwiw
189
195
  **aborts** and lists the offending tables, so an unscoped table is never silently
190
196
  dumped in full. For each, either declare a `scope_column`, add a `belongs_to`
@@ -565,15 +571,19 @@ The same type filter is applied on the join path — and in the matching `delete
565
571
  ActiveStorage is handled automatically — no ActiveStorage-specific configuration is required. The `has_one_attached` / `has_many_attached` macros don't add a column to the owning model; they generate ordinary associations that exwiw already understands:
566
572
 
567
573
  - **`active_storage_attachments`** is the polymorphic join row (`belongs_to :record, polymorphic: true` + `belongs_to :blob`). `exwiw:schema:generate` expands the polymorphic `record` into one `belongs_to` per model that declared `has_*_attached` (found via the generated `has_* ..., as: :record` reflections), exactly like any other [polymorphic `belongs_to`](#polymorphic-belongs_to). So only the attachments whose owner is among the dumped rows are extracted.
568
- - **`active_storage_blobs`** has no `belongs_to` of its own (attachments point *at* it), so it has no path to the dump target. exwiw narrows it via **reverse / "referenced_by" extraction**: a parent table referenced by exactly one constrained, non-polymorphic child is constrained to just the referenced ids instead of dumping every row:
574
+ - **`active_storage_blobs`** has no `belongs_to` of its own (attachments point *at* it), so it has no path to the dump target. exwiw narrows it via **reverse / "referenced_by" extraction**: a parent table referenced by exactly one constrained, non-polymorphic child is constrained to just the referenced ids instead of dumping every row. The id set is materialized once and joined back (see [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery)):
569
575
 
570
576
  ```sql
571
577
  SELECT active_storage_blobs.* FROM active_storage_blobs
572
- WHERE active_storage_blobs.id IN (
573
- SELECT active_storage_attachments.blob_id FROM active_storage_attachments
574
- WHERE active_storage_attachments.record_id IN (/* owner subquery */)
575
- AND active_storage_attachments.record_type = '...'
576
- )
578
+ JOIN (
579
+ SELECT DISTINCT exwiw_scope_src_0.blob_id AS exwiw_scope_id
580
+ FROM (
581
+ SELECT active_storage_attachments.blob_id FROM active_storage_attachments
582
+ WHERE active_storage_attachments.record_id IN (/* owner subquery */)
583
+ AND active_storage_attachments.record_type = '...'
584
+ ) AS exwiw_scope_src_0
585
+ ) AS exwiw_scope_ids_0
586
+ ON active_storage_blobs.id = exwiw_scope_ids_0.exwiw_scope_id
577
587
  ```
578
588
 
579
589
  `active_storage_variant_records` also references blobs, but since it has no path of its own to the dump target it doesn't constrain anything and is ignored as a referencer — blobs stays narrowed to the attachment-referenced ids. (A parent referenced by *multiple* constrained children currently falls back to dumping all of its rows.)
@@ -600,18 +610,22 @@ The automatic reverse extraction above narrows a table referenced by **exactly o
600
610
  }
601
611
  ```
602
612
 
603
- produces (each arm reuses that referencer's own scope, so a per-tenant run keeps only that tenant's ids):
613
+ produces (each arm reuses that referencer's own scope, so a per-tenant run keeps only that tenant's ids; the `UNION` id set is materialized once and joined back — see [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery)):
604
614
 
605
615
  ```sql
606
616
  SELECT users.* FROM users
607
- WHERE users.id IN (
608
- SELECT customers.user_id FROM customers WHERE <customers' scope> AND customers.user_id IS NOT NULL
609
- UNION
610
- SELECT staff.user_id FROM staff WHERE <staff' scope> AND staff.user_id IS NOT NULL
611
- UNION
612
- SELECT business_entity_customers.kantan_yoyaku_user_id FROM business_entity_customers
613
- WHERE <…' scope> AND business_entity_customers.kantan_yoyaku_user_id IS NOT NULL
614
- )
617
+ JOIN (
618
+ SELECT DISTINCT exwiw_scope_src_0.user_id AS exwiw_scope_id
619
+ FROM (
620
+ SELECT customers.user_id FROM customers WHERE <customers' scope> AND customers.user_id IS NOT NULL
621
+ UNION
622
+ SELECT staff.user_id FROM staff WHERE <staff' scope> AND staff.user_id IS NOT NULL
623
+ UNION
624
+ SELECT business_entity_customers.kantan_yoyaku_user_id FROM business_entity_customers
625
+ WHERE <…' scope> AND business_entity_customers.kantan_yoyaku_user_id IS NOT NULL
626
+ ) AS exwiw_scope_src_0
627
+ ) AS exwiw_scope_ids_0
628
+ ON users.id = exwiw_scope_ids_0.exwiw_scope_id
615
629
  ```
616
630
 
617
631
  Notes:
@@ -619,9 +633,22 @@ Notes:
619
633
  - **`column` is explicit**, so a *non-default* foreign key (e.g. `kantan_yoyaku_user_id`, or `organization_admins.id` which itself references `users.id`) is honored, and even a column with no declared `belongs_to` edge can be enumerated.
620
634
  - **Only scoped referencers belong in `via`.** Each arm's query must come out constrained; an unconstrained referencer (e.g. a `scope_exempt` table, or one with no path to a scope) would project *every* id and union the whole table back — so such an arm is **skipped with a warning** rather than silently widening the dump. An unknown table is likewise skipped with a warning. If no arm survives, the table stays unscopable and (in [scope-column mode](#scope-column-mode)) the run aborts via `validate_scope!`.
621
635
  - **NULLs are excluded** per arm (`IS NOT NULL`).
622
- - **Satellites need no config.** A table that `belongs_to` the reverse-scoped table (e.g. `end_users.id → users.id`, or `identities.user_id → users.id`) tightens to the kept ids automatically through the normal cascade — only the reverse-scoped table itself declares `reverse_scope`.
636
+ - **Satellites need no config.** A table that `belongs_to` the reverse-scoped table (e.g. `end_users.id → users.id`, or `identities.user_id → users.id`) tightens to the kept ids automatically through the normal cascade — only the reverse-scoped table itself declares `reverse_scope`. The cascade is **multi-hop**, so a table several `belongs_to` hops below the reverse-scoped table (e.g. `end_user_profiles → end_users → users`) also tightens automatically, with no config of its own.
623
637
  - Works in both single-target and scope-column mode. Polymorphic foreign keys are not eligible as anchors (the named `column` is always a concrete column).
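The "only scoped referencers belong in `via`" rule in the notes above can be sketched roughly as follows. This is an illustrative model, not exwiw's implementation: the `Arm` struct, `union_id_set`, and the warning text are hypothetical names, and a `nil` `scope_sql` stands in for "this referencer's query came out unconstrained".

```ruby
# Hypothetical arm record: the referencer table, the column it projects, and
# the SQL of its own scope (nil when the referencer is unconstrained).
Arm = Struct.new(:table, :column, :scope_sql)

# Build the UNION id set, skipping unconstrained arms with a warning instead
# of letting them project every id. Returns nil when no arm survives.
def union_id_set(arms, warnings)
  kept = arms.reject do |arm|
    next false if arm.scope_sql

    warnings << "skipping #{arm.table}: query is unconstrained"
    true
  end
  return nil if kept.empty?

  kept.map do |arm|
    "SELECT #{arm.table}.#{arm.column} FROM #{arm.table} " \
      "WHERE #{arm.scope_sql} AND #{arm.table}.#{arm.column} IS NOT NULL"
  end.join(" UNION ")
end

warnings = []
arms = [
  Arm.new("customers", "user_id", "customers.tenant_id = 1"),
  Arm.new("audit_logs", "user_id", nil) # assumed unscoped, so it is skipped
]
puts union_id_set(arms, warnings)
puts warnings
```

Returning `nil` when every arm is skipped models the "table stays unscopable" outcome, so the caller can abort rather than emit an unfiltered query.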
624
638
 
639
+ ### Why a JOIN, not `IN (subquery)`
640
+
641
+ Every scope id-set above — the multi-referencer `reverse_scope` `UNION`, the single-referencer reverse extraction, and the multi-hop forward cascade — is emitted as a `JOIN` to a `SELECT DISTINCT` derived table rather than `<col> IN (<subquery>)`:
642
+
643
+ ```sql
644
+ … JOIN (SELECT DISTINCT src.<id> AS exwiw_scope_id FROM (<id-set subquery>) AS src) AS ids
645
+ ON <table>.<col> = ids.exwiw_scope_id
646
+ ```
647
+
648
+ Both forms select the **same rows** — the `DISTINCT` dedups, so the join never fans out — but the query plans differ sharply on a large table. As `<col> IN (… UNION …)`, MySQL cannot turn a `UNION` subquery into a materialized semi-join and falls back to its IN-to-`EXISTS` rewrite: a **correlated `DEPENDENT SUBQUERY`** re-evaluated for every outer row, i.e. a full scan of the (potentially huge) outer table multiplied by the cost of the union. The derived-table form forces the engine to evaluate the id set **once** (the `DISTINCT` makes the derived table non-mergeable, hence materialized) and then probe the outer table by its primary key. On a global-identity table such as `users` this is the difference between a full table scan and an index lookup; the cascade nests the same way, so each level is materialized once instead of being re-evaluated by the level above.
649
+
650
+ All three SQL adapters (mysql / postgresql / sqlite) emit this shape. PostgreSQL additionally reconciles a `uuid`/`varchar` type mismatch by casting the join key and the projected id to `text`, exactly as the old `IN` form did.
651
+
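The row-set equivalence claimed above can be checked with a small in-memory model. This is only a sketch with made-up data: arrays stand in for `users.id` and for the ids projected by the id-set subquery, and nothing here touches a database.

```ruby
# Made-up data: outer-table ids and the ids projected by the id-set subquery
# (with duplicates, as a UNION ALL-like or multi-row referencer would produce).
user_ids = [1, 2, 3, 4]
referenced_ids = [2, 2, 4, 4, 4]

# `users.id IN (subquery)`: a semi-join, each outer row appears at most once.
in_form = user_ids.select { |id| referenced_ids.include?(id) }

# A plain JOIN against the raw id set: duplicates fan the outer row out.
naive_join = user_ids.flat_map do |id|
  referenced_ids.select { |ref| ref == id }.map { id }
end

# A JOIN against the SELECT DISTINCT id set: one match per id.
distinct_join = user_ids.flat_map do |id|
  referenced_ids.uniq.select { |ref| ref == id }.map { id }
end

p in_form        # [2, 4]
p naive_join     # [2, 2, 4, 4, 4]
p distinct_join  # [2, 4]
```

The `uniq` step plays the role of `DISTINCT`: without it the join returns one row per matching reference, which is why the deduplication matters for correctness as well as for materialization.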
625
652
  ### Rails-managed tables (special `type` values)
626
653
 
627
654
  Some tables are owned by Rails itself rather than the application — they have no ActiveRecord model and Rails reserves the right to evolve their column shape between versions (e.g. `schema_migrations`, `ar_internal_metadata`). exwiw treats them as a distinct category via the `type` field on a table config:
@@ -229,11 +229,18 @@ module Exwiw
229
229
  def compile_ast(query_ast, count_only: false)
230
230
  raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
231
231
 
232
+ # Lift scope id-set clauses (reverse_scope UNION / forward cascade /
233
+ # single referenced_by) out of `WHERE <col> IN (subquery)` and into a
234
+ # JOIN against a materialized derived table. See #compile_scope_join.
235
+ scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
236
+
232
237
  sql = "SELECT "
233
238
  sql += if count_only
234
239
  "COUNT(*)"
235
240
  elsif query_ast.select_all
236
- "*"
241
+ # A lifted scope JOIN brings a derived table into FROM, so a bare
242
+ # `*` would also project its column. Qualify to this table's own.
243
+ scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
237
244
  else
238
245
  query_ast.columns.map { |col| compile_column_name(query_ast, col) }.join(', ')
239
246
  end
@@ -256,14 +263,43 @@ module Exwiw
256
263
  end
257
264
  end
258
265
 
259
- if query_ast.where_clauses.any?
266
+ scope_clauses.each_with_index do |where_clause, idx|
267
+ sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
268
+ end
269
+
270
+ if plain_where_clauses.any?
260
271
  sql += " WHERE "
261
- sql += query_ast.where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
272
+ sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
262
273
  end
263
274
 
264
275
  sql
265
276
  end
266
277
 
278
+ # Render a scope id-set clause as a JOIN to a materialized derived table:
279
+ #
280
+ # JOIN (SELECT DISTINCT src.<proj> AS exwiw_scope_id
281
+ # FROM (<subquery>) AS src) AS ids
282
+ # ON <table>.<col> = ids.exwiw_scope_id
283
+ #
284
+ # The DISTINCT makes the derived table non-mergeable, so MySQL materializes
285
+ # the id-set once and probes this table by it (PK/index lookup) — instead
286
+ # of full-scanning this table and re-evaluating a correlated
287
+ # `IN (… UNION …)` per row (the DEPENDENT SUBQUERY / IN-to-EXISTS fallback,
288
+ # used because a UNION subquery cannot be turned into a materialized semi-join).
289
+ # DISTINCT also dedups, so the join never fans out: the row set is identical
290
+ # to `<col> IN (subquery)`.
291
+ private def compile_scope_join(from_table_name, where_clause, idx)
292
+ subquery = where_clause.value
293
+ projection = subquery_projection_name(subquery)
294
+ src_alias = "exwiw_scope_src_#{idx}"
295
+ ids_alias = "exwiw_scope_ids_#{idx}"
296
+ outer_key = "#{from_table_name}.#{where_clause.column_name}"
297
+
298
+ "JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
299
+ "FROM (#{compile_subquery(subquery)}) AS #{src_alias}) AS #{ids_alias} " \
300
+ "ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
301
+ end
302
+
267
303
  private def compile_where_condition(where_clause, table_name)
268
304
  # Use as it is if it's a raw query
269
305
  return where_clause if where_clause.is_a?(String)
@@ -301,9 +301,16 @@ module Exwiw
301
301
  def compile_ast(query_ast, select_cast_to: nil)
302
302
  raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
303
303
 
304
+ # Lift scope id-set clauses (reverse_scope UNION / forward cascade /
305
+ # single referenced_by) out of `WHERE <col> IN (subquery)` and into a
306
+ # JOIN against a materialized derived table. See #compile_scope_join.
307
+ scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
308
+
304
309
  sql = "SELECT "
305
310
  sql += if query_ast.select_all
306
- "*"
311
+ # A lifted scope JOIN brings a derived table into FROM, so a bare
312
+ # `*` would also project its column. Qualify to this table's own.
313
+ scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
307
314
  else
308
315
  cols = query_ast.columns.map { |col| compile_column_name(query_ast, col) }
309
316
  cols = cols.map { |c| "#{c}::#{select_cast_to}" } if select_cast_to
@@ -337,14 +344,45 @@ module Exwiw
337
344
  end
338
345
  end
339
346
 
340
- if query_ast.where_clauses.any?
347
+ scope_clauses.each_with_index do |where_clause, idx|
348
+ sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
349
+ end
350
+
351
+ if plain_where_clauses.any?
341
352
  sql += " WHERE "
342
- sql += query_ast.where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
353
+ sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
343
354
  end
344
355
 
345
356
  sql
346
357
  end
347
358
 
359
+ # Render a scope id-set clause as a JOIN to a materialized derived table
360
+ # (see MysqlAdapter#compile_scope_join for the full rationale). The DISTINCT
361
+ # forces the engine to materialize the id-set once and probe this table by
362
+ # it, instead of full-scanning and re-evaluating a correlated subquery per
363
+ # row; it also dedups, so the join is row-for-row identical to
364
+ # `<col> IN (subquery)`.
365
+ #
366
+ # Type reconciliation mirrors the old IN form: when the outer column and
367
+ # the projected id clash (e.g. uuid vs varchar), #compile_subquery already
368
+ # casts every arm to text, so the derived `exwiw_scope_id` is text and the
369
+ # outer key is cast to match.
370
+ private def compile_scope_join(from_table_name, where_clause, idx)
371
+ subquery = where_clause.value
372
+ projection = subquery_projection_name(subquery)
373
+ src_alias = "exwiw_scope_src_#{idx}"
374
+ ids_alias = "exwiw_scope_ids_#{idx}"
375
+
376
+ inner_sql = compile_subquery(subquery, outer_table: from_table_name, outer_column: where_clause.column_name)
377
+ cast_to = subquery_cast_to(subquery, from_table_name, where_clause.column_name)
378
+ outer_key = "#{from_table_name}.#{where_clause.column_name}"
379
+ outer_key = "#{outer_key}::#{cast_to}" if cast_to
380
+
381
+ "JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
382
+ "FROM (#{inner_sql}) AS #{src_alias}) AS #{ids_alias} " \
383
+ "ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
384
+ end
385
+
348
386
  private def compile_where_condition(where_clause, table_name)
349
387
  # Use as it is if it's a raw query
350
388
  return where_clause if where_clause.is_a?(String)
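The shape of the rendered JOIN, including the optional PostgreSQL-style cast of the outer key, can be sketched as a standalone function. `render_scope_join` and its keyword arguments are hypothetical names for illustration; the real adapters derive these values from the query AST, and the inner SQL below is an assumed example.

```ruby
# Hypothetical standalone helper mirroring the shape of the JOIN described
# above. When cast_to is given, the outer key is cast to match an id set whose
# projected column is assumed to be cast to the same type already.
def render_scope_join(table:, column:, projection:, inner_sql:, idx: 0, cast_to: nil)
  src_alias = "exwiw_scope_src_#{idx}"
  ids_alias = "exwiw_scope_ids_#{idx}"
  outer_key = "#{table}.#{column}"
  outer_key = "#{outer_key}::#{cast_to}" if cast_to

  "JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
    "FROM (#{inner_sql}) AS #{src_alias}) AS #{ids_alias} " \
    "ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
end

puts render_scope_join(
  table: "users",
  column: "id",
  projection: "user_id",
  inner_sql: "SELECT customers.user_id::text AS user_id FROM customers",
  cast_to: "text"
)
```

Indexing the aliases by `idx` keeps several lifted clauses on the same query from colliding on alias names.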
@@ -198,11 +198,18 @@ module Exwiw
198
198
  def compile_ast(query_ast, count_only: false)
199
199
  raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
200
200
 
201
+ # Lift scope id-set clauses (reverse_scope UNION / forward cascade /
202
+ # single referenced_by) out of `WHERE <col> IN (subquery)` and into a
203
+ # JOIN against a materialized derived table. See #compile_scope_join.
204
+ scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
205
+
201
206
  sql = "SELECT "
202
207
  sql += if count_only
203
208
  "COUNT(*)"
204
209
  elsif query_ast.select_all
205
- "*"
210
+ # A lifted scope JOIN brings a derived table into FROM, so a bare
211
+ # `*` would also project its column. Qualify to this table's own.
212
+ scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
206
213
  else
207
214
  query_ast.columns.map { |col| compile_column_name(query_ast, col) }.join(', ')
208
215
  end
@@ -225,14 +232,36 @@ module Exwiw
225
232
  end
226
233
  end
227
234
 
228
- if query_ast.where_clauses.any?
235
+ scope_clauses.each_with_index do |where_clause, idx|
236
+ sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
237
+ end
238
+
239
+ if plain_where_clauses.any?
229
240
  sql += " WHERE "
230
- sql += query_ast.where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
241
+ sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
231
242
  end
232
243
 
233
244
  sql
234
245
  end
235
246
 
247
+ # Render a scope id-set clause as a JOIN to a materialized derived table
248
+ # (see MysqlAdapter#compile_scope_join for the full rationale). The DISTINCT
249
+ # forces the engine to materialize the id-set once and probe this table by
250
+ # it, instead of full-scanning and re-evaluating a correlated subquery per
251
+ # row; it also dedups, so the join is row-for-row identical to
252
+ # `<col> IN (subquery)`.
253
+ private def compile_scope_join(from_table_name, where_clause, idx)
254
+ subquery = where_clause.value
255
+ projection = subquery_projection_name(subquery)
256
+ src_alias = "exwiw_scope_src_#{idx}"
257
+ ids_alias = "exwiw_scope_ids_#{idx}"
258
+ outer_key = "#{from_table_name}.#{where_clause.column_name}"
259
+
260
+ "JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
261
+ "FROM (#{compile_subquery(subquery)}) AS #{src_alias}) AS #{ids_alias} " \
262
+ "ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
263
+ end
264
+
236
265
  private def compile_where_condition(where_clause, table_name)
237
266
  # Use as it is if it's a raw query
238
267
  return where_clause if where_clause.is_a?(String)
data/lib/exwiw/adapter.rb CHANGED
@@ -242,6 +242,43 @@ module Exwiw
242
242
  private def null_preserving(ast, column, masked_expr)
243
243
  "CASE WHEN #{ast.from_table_name}.#{column.name} IS NOT NULL THEN #{masked_expr} ELSE NULL END"
244
244
  end
245
+
246
+ # Split an outer query's WHERE clauses into the scope id-set clauses to
247
+ # lift into a materialized derived-table JOIN (see each adapter's
248
+ # #compile_scope_join) and the remaining plain clauses (kept in WHERE).
249
+ # Returns [scope_clauses, plain_clauses]; #partition keeps each clause in
250
+ # its original order *within* its own group. The two groups are emitted in
251
+ # different SQL positions (a JOIN vs the WHERE), so their interleaving is
252
+ # irrelevant — only the order within each group matters, and that is kept.
253
+ private def partition_scope_clauses(where_clauses)
254
+ where_clauses.partition { |where_clause| scope_subquery_clause?(where_clause) }
255
+ end
256
+
257
+ # Whether a WHERE clause is a scope id-set probe that should be lifted into
258
+ # a JOIN against a materialized derived table. Only the SelectSubquery /
259
+ # UnionSubquery shapes (reverse_scope UNION, forward cascade, single
260
+ # referenced_by) qualify: they project over potentially huge tables and, as
261
+ # `<col> IN (subquery)`, can degrade into a correlated DEPENDENT SUBQUERY
262
+ # re-evaluated per outer row. The flat ids_field `Subquery` is deliberately
263
+ # left as a plain IN — it is a small, bounded, uncorrelated probe.
264
+ private def scope_subquery_clause?(where_clause)
265
+ where_clause.is_a?(Exwiw::QueryAst::WhereClause) &&
266
+ where_clause.operator == :in_subquery &&
267
+ (where_clause.value.is_a?(Exwiw::QueryAst::SelectSubquery) ||
268
+ where_clause.value.is_a?(Exwiw::QueryAst::UnionSubquery))
269
+ end
270
+
271
+ # The bare name of the single column a scope subquery projects, used to
272
+ # reference it inside the materialized derived table. For a UNION the
273
+ # output column name comes from the first arm.
274
+ private def subquery_projection_name(subquery)
275
+ case subquery
276
+ when Exwiw::QueryAst::SelectSubquery
277
+ subquery.query.columns.first.name
278
+ when Exwiw::QueryAst::UnionSubquery
279
+ subquery.queries.first.columns.first.name
280
+ end
281
+ end
245
282
  end
246
283
 
247
284
  # @params [Exwiw::QueryAst] query_ast
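The partition step described in the comments above can be illustrated with plain Ruby. This is a sketch under stated assumptions: `Clause`, `SelectSubquery`, and `FlatIds` are hypothetical stand-ins for the AST node classes, and `scope_clause?` is a simplified version of the predicate that checks only one subquery shape.

```ruby
# Hypothetical stand-ins for the AST nodes; only the fields the predicate reads.
Clause = Struct.new(:column_name, :operator, :value)
SelectSubquery = Struct.new(:sql)
FlatIds = Struct.new(:ids)

# Simplified predicate: only an :in_subquery clause whose value is a
# SelectSubquery is lifted; raw strings and flat id lists stay in WHERE.
def scope_clause?(clause)
  clause.is_a?(Clause) &&
    clause.operator == :in_subquery &&
    clause.value.is_a?(SelectSubquery)
end

clauses = [
  Clause.new("id", :in_subquery, SelectSubquery.new("SELECT a.user_id FROM a")),
  "users.deleted_at IS NULL", # raw string clause
  Clause.new("tenant_id", :eq, 1),
  Clause.new("org_id", :in_subquery, SelectSubquery.new("SELECT b.org_id FROM b")),
  Clause.new("id", :in_subquery, FlatIds.new([1, 2]))
]

# Array#partition returns [matching, non_matching], each in original order.
scope_clauses, plain_clauses = clauses.partition { |c| scope_clause?(c) }

p scope_clauses.map(&:column_name) # ["id", "org_id"]
p plain_clauses.size               # 3
```

Because `Array#partition` is stable within each group, the JOINs and the remaining `WHERE` conditions are each emitted in the order the clauses were originally added.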
@@ -2,8 +2,8 @@
2
2
 
3
3
  module Exwiw
4
4
  class QueryAstBuilder
5
- def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
6
- new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, allow_forward: allow_forward).run
5
+ def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, forward_path: [])
6
+ new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, forward_path: forward_path).run
7
7
  end
8
8
 
9
9
  # Scope-column mode classification for a single table. One of
@@ -49,17 +49,20 @@ module Exwiw
49
49
 
50
50
  attr_reader :table_name, :table_by_name, :dump_target
51
51
 
52
- def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, allow_forward: true)
52
+ def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, forward_path: [])
53
53
  @table_name = table_name
54
54
  @table_by_name = table_by_name
55
55
  @dump_target = dump_target
56
56
  @logger = logger
57
57
  @allow_reverse = allow_reverse
58
- # @allow_forward gates the "scope via an indirectly-scoped belongs_to
59
- # parent" rescue (build_belongs_to_scoped_clause). Disabled while building a
60
- # parent/child subquery so a single forward hop never recurses into another
61
- # (which could loop on a belongs_to cycle).
62
- @allow_forward = allow_forward
58
+ # @forward_path is the chain of tables currently being forward-resolved by
59
+ # the "scope via an indirectly-scoped belongs_to parent" rescue
60
+ # (build_belongs_to_scoped_clause). Each forward hop appends the table it is
61
+ # descending from, so the rescue recurses N levels (users -> end_users ->
62
+ # end_user_profiles -> ...) and stops only on a real belongs_to cycle: a
63
+ # table already on the path is not re-resolved, falling through to
64
+ # :unscopable instead of looping forever.
65
+ @forward_path = forward_path
63
66
  end
64
67
 
65
68
  def run
@@ -187,10 +190,11 @@ module Exwiw
187
190
  next if relation.nil? || relation.polymorphic?
188
191
 
189
192
  # Build the child's own extraction query. allow_reverse:false stops a
190
- # chain of FK-less tables from recursing back into each other;
191
- # allow_forward:false stops the child from forward-scoping back through
192
- # this very table (which would loop).
193
- child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
193
+ # chain of FK-less tables from recursing back into each other; adding this
194
+ # table to forward_path stops the child from forward-scoping back through
195
+ # it (which would loop) while still letting the child forward-scope
196
+ # through other tables.
197
+ child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, forward_path: @forward_path + [table.name])
194
198
 
195
199
  # Only an *already constrained* child narrows anything; an unconstrained
196
200
  # child would select every fk value (i.e. dump all) and not help.
@@ -248,12 +252,12 @@ module Exwiw
248
252
  next
249
253
  end
250
254
 
251
- # Build the referencer's own scoped extraction query. allow_reverse /
252
- # allow_forward are disabled to bound recursion exactly as the
253
- # single-referencer path does (a referencer scopable only via its own
254
- # reverse/forward rescue would loop or, worse, recurse back into this
255
- # table).
256
- ref_query = self.class.run(referencer.name, table_by_name, dump_target, @logger, allow_reverse: false, allow_forward: false)
255
+ # Build the referencer's own scoped extraction query. allow_reverse is
256
+ # disabled and this table is added to forward_path to bound recursion
257
+ # exactly as the single-referencer path does (a referencer that could only
258
+ # be scoped by recursing back into this table would loop); the referencer
259
+ # may still forward-scope through other tables.
260
+ ref_query = self.class.run(referencer.name, table_by_name, dump_target, @logger, allow_reverse: false, forward_path: @forward_path + [table.name])
257
261
 
258
262
  unless ref_query.where_clauses.any? || ref_query.join_clauses.any?
259
263
  @logger.warn(
@@ -297,6 +301,13 @@ module Exwiw
297
301
  # them out of a full dump. Returns nil when there is no single, unambiguous
298
302
  # scopable parent, leaving the caller on the unscopable path.
299
303
  private def build_belongs_to_scoped_clause(table)
304
+ # This table plus every ancestor currently being forward-resolved. A
305
+ # candidate parent already on this path would close a belongs_to cycle, so
306
+ # it is skipped; threading the grown path into the parent build lets the
307
+ # cascade recurse N hops (users -> end_users -> end_user_profiles -> ...)
308
+ # and terminate only when a table reappears.
309
+ forward_path = @forward_path + [table.name]
310
+
300
311
  candidates = table.belongs_tos.filter_map do |relation|
301
312
  # A polymorphic belongs_to points at several parent tables through one
302
313
  # column, so it cannot project to a single parent id set; skip it.
@@ -305,10 +316,15 @@ module Exwiw
305
316
  parent = table_by_name[relation.table_name]
306
317
  next if parent.nil?
307
318
 
319
+ # Cycle guard: descending into a parent already on the forward path would
320
+ # loop (a -> b -> a). Stop, leaving this table on the :unscopable path.
321
+ next if forward_path.include?(parent.name)
322
+
308
323
  # Build the parent's own scoped query. allow_reverse stays true so the
309
- # parent may be scoped via referenced_by; allow_forward:false bounds this
310
- # to a single forward hop so a belongs_to cycle cannot loop.
311
- parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, allow_forward: false)
324
+ # parent may be scoped via referenced_by, and forward scoping stays
325
+ # enabled so a parent that is itself scoped via *its* parent resolves
326
+ # too; this is what makes the cascade multi-hop.
327
+ parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, forward_path: forward_path)
312
328
 
313
329
  # Only a constrained parent narrows anything; an unconstrained parent
314
330
  # would select every pk (i.e. dump all) and not help.
@@ -460,11 +476,18 @@ module Exwiw
460
476
  return :direct if directly_scoped?(table)
461
477
  return :via_path if build_join_clauses_scoped(table).any?
462
478
  return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
463
- return :via_scoped_parent if @allow_forward && build_belongs_to_scoped_clause(table)
479
+ return :via_scoped_parent if forward_scope_allowed?(table) && build_belongs_to_scoped_clause(table)
464
480
 
465
481
  :unscopable
466
482
  end
467
483
 
484
+ # True when this table may still attempt the forward "scope via a scoped
485
+ # belongs_to parent" rescue: it is not already on the forward-resolution
486
+ # path, so descending into its parent cannot revisit it (a belongs_to cycle).
487
+ private def forward_scope_allowed?(table)
488
+ !@forward_path.include?(table.name)
489
+ end
490
+
468
491
  private def build_scoped(table)
469
492
  ast = QueryAst::Select.new
470
493
  ast.from(table.name)
@@ -502,11 +525,13 @@ module Exwiw
502
525
  end
503
526
  end
504
527
 
505
- if @allow_forward
528
+ if forward_scope_allowed?(table)
506
529
  # Belongs_to a parent that is itself scoped but carries no scope column of
507
530
  # its own (so via_path cannot terminate on it) — e.g. a hub table scoped
508
- # only via referenced_by. Constrain this table to that parent's in-scope
509
- # ids so its rows ride along instead of being dumped in full.
531
+ # only via referenced_by, or a parent that is itself scoped through *its*
532
+ # parent. Constrain this table to that parent's in-scope ids so its rows
533
+ # ride along instead of being dumped in full; the parent build recurses
534
+ # the cascade further up.
510
535
  parent_clause = build_belongs_to_scoped_clause(table)
511
536
  if parent_clause
512
537
  ast.where(parent_clause)
@@ -514,12 +539,13 @@ module Exwiw
514
539
  end
515
540
  end
516
541
 
517
- # Only the genuine top-level build (no rescue disabled) is allowed to fail
518
- # hard. The Runner/ExplainRunner pre-flight (validate_scope!) rejects
519
- # unscopable tables before extraction, so a top-level build never
520
- # legitimately lands here; if it does, raise rather than emit an unfiltered
521
- # (potential full PII) dump.
522
- if @allow_reverse && @allow_forward
542
+ # Only the genuine top-level build (allow_reverse on, forward_path empty,
543
+ # i.e. no rescue subquery in progress) is allowed to fail hard. The
544
+ # Runner/ExplainRunner pre-flight (validate_scope!) rejects unscopable
545
+ # tables before extraction, so a top-level build never legitimately lands
546
+ # here; if it does, raise rather than emit an unfiltered (potential full
547
+ # PII) dump.
548
+ if @allow_reverse && @forward_path.empty?
523
549
  raise ArgumentError, scope_unscopable_message(table)
524
550
  end
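The forward-path guard that replaces the boolean `allow_forward` flag can be modeled in a few lines. This is a toy sketch, not the builder itself: `PARENT_OF`, `SCOPED`, and `resolve` are hypothetical, each table has at most one parent here, and the returned chain simply lists the tables the cascade walked through.

```ruby
# Hypothetical schema model: table name => its single belongs_to parent.
PARENT_OF = {
  "end_user_profiles" => "end_users",
  "end_users" => "users",
  "users" => nil,
  "a" => "b", # a and b form a belongs_to cycle
  "b" => "a"
}.freeze

# Tables assumed to be scoped by some other means (e.g. reverse_scope).
SCOPED = ["users"].freeze

# Walk up the belongs_to chain until a scoped table is reached. The path of
# tables already being resolved is threaded through the recursion; a parent
# already on the path would close a cycle, so resolution stops there.
def resolve(table, forward_path = [])
  return [table] if SCOPED.include?(table)

  parent = PARENT_OF[table]
  path = forward_path + [table]
  return :unscopable if parent.nil? || path.include?(parent)

  chain = resolve(parent, path)
  chain == :unscopable ? :unscopable : [table] + chain
end

p resolve("end_user_profiles") # ["end_user_profiles", "end_users", "users"]
p resolve("a")                 # :unscopable
```

Threading a path rather than a flag is what lets the recursion go N levels deep while still terminating: depth is bounded by the number of distinct tables, since no table can appear on the path twice.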
525
551
 
data/lib/exwiw/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Exwiw
4
- VERSION = "0.8.2"
4
+ VERSION = "0.8.4"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: exwiw
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.2
4
+ version: 0.8.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shia