exwiw 0.8.2 → 0.8.4
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/README.md +49 -22
- data/lib/exwiw/adapter/mysql_adapter.rb +39 -3
- data/lib/exwiw/adapter/postgresql_adapter.rb +41 -3
- data/lib/exwiw/adapter/sqlite_adapter.rb +32 -3
- data/lib/exwiw/adapter.rb +37 -0
- data/lib/exwiw/query_ast_builder.rb +57 -31
- data/lib/exwiw/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ee150e9fd830a7829be8c0eebbeca19f0e2d1b1dbcce50fe78e63d62e878997c
+  data.tar.gz: 31377340d8737ece209e1c0755eeac49ac40930562503e2c4d04f6362938e9d9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 26151e0bf763b6f48e993d2025b4bac804a373fe2e3cfd71bb70d91b86b453826838ec683ebfd3f4f4336bb83b8a12d567d7dbda7f8a4ef48fc7f50bbe98145a
+  data.tar.gz: b29a2da42750dde93cd53eb56a11a592478134b690ccba31b59cd312f3685dad2713e63897d09cb77246a8be37dfe55890f1a8505c1aa8b5fd020b7bc7760a03
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,18 @@
 
 ## [Unreleased]
 
+## [0.8.4] - 2026-06-24
+
+### Fixed
+
+- **Scope id-sets are materialized and probed by JOIN instead of `<col> IN (subquery)`, removing a correlated full-scan on large tables.** The three id-set scope shapes — the multi-referencer `reverse_scope` `UNION`, the single-referencer reverse/`referenced_by` extraction, and the multi-hop forward (`via_scoped_parent`) cascade — were emitted as `<col> IN (<subquery>)`. On a large global-identity table (e.g. `users`) MySQL cannot turn a `UNION` subquery into a materialized semi-join and falls back to its IN-to-`EXISTS` rewrite: a correlated `DEPENDENT SUBQUERY`/`DEPENDENT UNION` re-evaluated **per outer row**, so the driving table is full-scanned and the union re-run for every row (the plan that ran for minutes and timed out on a production-scale identity table, even for an empty tenant). These clauses are now lifted into a `JOIN` against a materialized derived table — `JOIN (SELECT DISTINCT src.<id> AS exwiw_scope_id FROM (<id-set subquery>) AS src) AS ids ON <table>.<col> = ids.exwiw_scope_id` — so the engine evaluates the id set **once** (the `DISTINCT` makes the derived table non-mergeable, hence materialized) and probes the outer table by primary key. The `DISTINCT` also dedups, so the result set is identical to the old `IN` form; the cascade nests the same way (each level materialized once); NULL exclusion, the forward-path cycle guard, single-parent/polymorphic skips, and PostgreSQL's `uuid`/`varchar` `::text` reconciliation are all preserved. All three SQL adapters (mysql / postgresql / sqlite). See the [README](README.md#why-a-join-not-in-subquery).
+
+## [0.8.3] - 2026-06-24
+
+### Fixed
+
+- **Forward scope (`via_scoped_parent`) now cascades across multiple `belongs_to` hops instead of dying after one.** A table with no scope column of its own is scoped by constraining it to its `belongs_to` parent's in-scope ids (`fk IN (SELECT parent.pk FROM <parent's scoped query>)`). Previously the parent was rebuilt with forward scoping turned off, so if the parent was *itself* scoped only through *its* parent (e.g. an identity-family table two or more hops below a `reverse_scope`/`referenced_by` table — `users ← end_users ← end_user_profiles`), the rebuilt parent came back unconstrained and the child was classified `:unscopable` — forcing a `scope_exempt` full dump and re-introducing the bloat the prune removes. The boolean single-hop bound is replaced by a forward-path guard: the rescue keeps forward scoping enabled while rebuilding the parent (appending the current table to the path), so the cascade recurses N levels and produces a correspondingly nested `IN (subquery)`; it terminates only on a genuine `belongs_to` cycle (a table already on the path is not revisited, falling through to `:unscopable`). The single-unambiguous-parent rule and the polymorphic-skip are unchanged, and the reverse arms still cannot loop back through the table being reverse-scoped. SQL adapters only.
+
 ## [0.8.2] - 2026-06-24
 
 ### Added
data/README.md
CHANGED
@@ -179,12 +179,18 @@ Each table is resolved as follows:
 nearest such table and applies the scope filter there (the same join machinery
 the single-target mode uses).
 - **`belongs_to` a parent that is itself scoped but carries no scope column of its
-own** → exwiw constrains this table to the parent's in-scope ids
-
-
-
-
-
+own** → exwiw constrains this table to the parent's in-scope ids by joining it to
+the parent's scoped query, materialized as a derived table
+(`JOIN (SELECT DISTINCT parent.pk … FROM <parent's scoped query>) … ON fk = …`).
+This covers a *hub* table that has no scope column and is scoped only because an
+extractable child references it (see referenced-by below): the hub's other
+`belongs_to` children ride along to just the in-scope rows instead of being dumped
+in full. The parent itself may be scoped the same way, so this **cascades across
+multiple hops** (each a single unambiguous scopable parent) and the derived-table
+JOINs nest correspondingly; the recursion terminates on a genuine `belongs_to`
+cycle (a table already on the path is left `:unscopable` rather than looped on).
+(See [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery) for the
+materialization rationale.)
 - **Cannot be scoped at all** (no scope column and no path to one) → exwiw
 **aborts** and lists the offending tables, so an unscoped table is never silently
 dumped in full. For each, either declare a `scope_column`, add a `belongs_to`
@@ -565,15 +571,19 @@ The same type filter is applied on the join path — and in the matching `delete
 ActiveStorage is handled automatically — no ActiveStorage-specific configuration is required. The `has_one_attached` / `has_many_attached` macros don't add a column to the owning model; they generate ordinary associations that exwiw already understands:
 
 - **`active_storage_attachments`** is the polymorphic join row (`belongs_to :record, polymorphic: true` + `belongs_to :blob`). `exwiw:schema:generate` expands the polymorphic `record` into one `belongs_to` per model that declared `has_*_attached` (found via the generated `has_* ..., as: :record` reflections), exactly like any other [polymorphic `belongs_to`](#polymorphic-belongs_to). So only the attachments whose owner is among the dumped rows are extracted.
-- **`active_storage_blobs`** has no `belongs_to` of its own (attachments point *at* it), so it has no path to the dump target. exwiw narrows it via **reverse / "referenced_by" extraction**: a parent table referenced by exactly one constrained, non-polymorphic child is constrained to just the referenced ids instead of dumping every row:
+- **`active_storage_blobs`** has no `belongs_to` of its own (attachments point *at* it), so it has no path to the dump target. exwiw narrows it via **reverse / "referenced_by" extraction**: a parent table referenced by exactly one constrained, non-polymorphic child is constrained to just the referenced ids instead of dumping every row. The id set is materialized once and joined back (see [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery)):
 
 ```sql
 SELECT active_storage_blobs.* FROM active_storage_blobs
-
-SELECT
-
-
-
+JOIN (
+SELECT DISTINCT exwiw_scope_src_0.blob_id AS exwiw_scope_id
+FROM (
+SELECT active_storage_attachments.blob_id FROM active_storage_attachments
+WHERE active_storage_attachments.record_id IN (/* owner subquery */)
+AND active_storage_attachments.record_type = '...'
+) AS exwiw_scope_src_0
+) AS exwiw_scope_ids_0
+ON active_storage_blobs.id = exwiw_scope_ids_0.exwiw_scope_id
 ```
 
 `active_storage_variant_records` also references blobs, but since it has no path of its own to the dump target it doesn't constrain anything and is ignored as a referencer — blobs stays narrowed to the attachment-referenced ids. (A parent referenced by *multiple* constrained children currently falls back to dumping all of its rows.)
@@ -600,18 +610,22 @@ The automatic reverse extraction above narrows a table referenced by **exactly o
 }
 ```
 
-produces (each arm reuses that referencer's own scope, so a per-tenant run keeps only that tenant's ids):
+produces (each arm reuses that referencer's own scope, so a per-tenant run keeps only that tenant's ids; the `UNION` id set is materialized once and joined back — see [Why a JOIN, not `IN (subquery)`](#why-a-join-not-in-subquery)):
 
 ```sql
 SELECT users.* FROM users
-
-SELECT
-
-
-
-
-
-
+JOIN (
+SELECT DISTINCT exwiw_scope_src_0.user_id AS exwiw_scope_id
+FROM (
+SELECT customers.user_id FROM customers WHERE <customers' scope> AND customers.user_id IS NOT NULL
+UNION
+SELECT staff.user_id FROM staff WHERE <staff' scope> AND staff.user_id IS NOT NULL
+UNION
+SELECT business_entity_customers.kantan_yoyaku_user_id FROM business_entity_customers
+WHERE <…' scope> AND business_entity_customers.kantan_yoyaku_user_id IS NOT NULL
+) AS exwiw_scope_src_0
+) AS exwiw_scope_ids_0
+ON users.id = exwiw_scope_ids_0.exwiw_scope_id
 ```
 
 Notes:
@@ -619,9 +633,22 @@ Notes:
 - **`column` is explicit**, so a *non-default* foreign key (e.g. `kantan_yoyaku_user_id`, or `organization_admins.id` which itself references `users.id`) is honored, and even a column with no declared `belongs_to` edge can be enumerated.
 - **Only scoped referencers belong in `via`.** Each arm's query must come out constrained; an unconstrained referencer (e.g. a `scope_exempt` table, or one with no path to a scope) would project *every* id and union the whole table back — so such an arm is **skipped with a warning** rather than silently widening the dump. An unknown table is likewise skipped with a warning. If no arm survives, the table stays unscopable and (in [scope-column mode](#scope-column-mode)) the run aborts via `validate_scope!`.
 - **NULLs are excluded** per arm (`IS NOT NULL`).
-- **Satellites need no config.** A table that `belongs_to` the reverse-scoped table (e.g. `end_users.id → users.id`, or `identities.user_id → users.id`) tightens to the kept ids automatically through the normal cascade — only the reverse-scoped table itself declares `reverse_scope`.
+- **Satellites need no config.** A table that `belongs_to` the reverse-scoped table (e.g. `end_users.id → users.id`, or `identities.user_id → users.id`) tightens to the kept ids automatically through the normal cascade — only the reverse-scoped table itself declares `reverse_scope`. The cascade is **multi-hop**, so a table several `belongs_to` hops below the reverse-scoped table (e.g. `end_user_profiles → end_users → users`) also tightens automatically, with no config of its own.
 - Works in both single-target and scope-column mode. Polymorphic foreign keys are not eligible as anchors (the named `column` is always a concrete column).
 
+### Why a JOIN, not `IN (subquery)`
+
+Every scope id-set above — the multi-referencer `reverse_scope` `UNION`, the single-referencer reverse extraction, and the multi-hop forward cascade — is emitted as a `JOIN` to a `SELECT DISTINCT` derived table rather than `<col> IN (<subquery>)`:
+
+```sql
+… JOIN (SELECT DISTINCT src.<id> AS exwiw_scope_id FROM (<id-set subquery>) AS src) AS ids
+ON <table>.<col> = ids.exwiw_scope_id
+```
+
+Both forms select the **same rows** — the `DISTINCT` dedups, so the join never fans out — but the query plans differ sharply on a large table. As `<col> IN (… UNION …)`, MySQL cannot turn a `UNION` subquery into a materialized semi-join and falls back to its IN-to-`EXISTS` rewrite: a **correlated `DEPENDENT SUBQUERY`** re-evaluated for every outer row, i.e. a full scan of the (potentially huge) outer table multiplied by the cost of the union. The derived-table form forces the engine to evaluate the id set **once** (the `DISTINCT` makes the derived table non-mergeable, hence materialized) and then probe the outer table by its primary key. On a global-identity table such as `users` this is the difference between a full table scan and an index lookup; the cascade nests the same way, so each level is materialized once instead of being re-evaluated by the level above.
+
+All three SQL adapters (mysql / postgresql / sqlite) emit this shape. PostgreSQL additionally reconciles a `uuid`/`varchar` type mismatch by casting the join key and the projected id to `text`, exactly as the old `IN` form did.
+
 ### Rails-managed tables (special `type` values)
 
 Some tables are owned by Rails itself rather than the application — they have no ActiveRecord model and Rails reserves the right to evolve their column shape between versions (e.g. `schema_migrations`, `ar_internal_metadata`). exwiw treats them as a distinct category via the `type` field on a table config:
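The README section above claims that the `DISTINCT` keeps the derived-table JOIN row-for-row identical to `<col> IN (<subquery>)`. The sketch below models both forms over in-memory arrays to show why the dedup matters. The data and the helper names `in_form` and `join_form` are made up for illustration; none of this is exwiw code, and it says nothing about query plans.

```ruby
# Hypothetical in-memory stand-ins for tables; not part of exwiw.
USERS = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }].freeze
# The id set a scope subquery might project: duplicates and a NULL included.
ID_SET = [2, 3, 3, 2, nil].freeze

# `<col> IN (<subquery>)`: a semi-join, so each outer row appears at most once.
def in_form(rows, ids)
  rows.select { |row| ids.compact.include?(row[:id]) }
end

# `JOIN (SELECT DISTINCT ...) ON ...`: an inner join against a derived table.
# With distinct: false the join fans out, one output row per matching id.
def join_form(rows, ids, distinct: true)
  derived = distinct ? ids.uniq : ids
  # A NULL never satisfies the ON equality, so it is dropped here.
  derived.compact.flat_map { |id| rows.select { |row| row[:id] == id } }
end

in_ids      = in_form(USERS, ID_SET).map { |row| row[:id] }.sort
join_ids    = join_form(USERS, ID_SET).map { |row| row[:id] }.sort
fan_out_ids = join_form(USERS, ID_SET, distinct: false).map { |row| row[:id] }.sort
```

Here `in_ids` and `join_ids` hold the same ids, while `fan_out_ids` repeats each match, which is the duplication the `DISTINCT` prevents.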
data/lib/exwiw/adapter/mysql_adapter.rb
CHANGED
@@ -229,11 +229,18 @@ module Exwiw
 def compile_ast(query_ast, count_only: false)
 raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
 
+# Lift scope id-set clauses (reverse_scope UNION / forward cascade /
+# single referenced_by) out of `WHERE <col> IN (subquery)` and into a
+# JOIN against a materialized derived table. See #compile_scope_join.
+scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
+
 sql = "SELECT "
 sql += if count_only
 "COUNT(*)"
 elsif query_ast.select_all
-
+# A lifted scope JOIN brings a derived table into FROM, so a bare
+# `*` would also project its column. Qualify to this table's own.
+scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
 else
 query_ast.columns.map { |col| compile_column_name(query_ast, col) }.join(', ')
 end
@@ -256,14 +263,43 @@ module Exwiw
 end
 end
 
-
+scope_clauses.each_with_index do |where_clause, idx|
+sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
+end
+
+if plain_where_clauses.any?
 sql += " WHERE "
-sql +=
+sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
 end
 
 sql
 end
 
+# Render a scope id-set clause as a JOIN to a materialized derived table:
+#
+# JOIN (SELECT DISTINCT src.<proj> AS exwiw_scope_id
+# FROM (<subquery>) AS src) AS ids
+# ON <table>.<col> = ids.exwiw_scope_id
+#
+# The DISTINCT makes the derived table non-mergeable, so MySQL materializes
+# the id-set once and probes this table by it (PK/index lookup) — instead
+# of full-scanning this table and re-evaluating a correlated
+# `IN (… UNION …)` per row (the DEPENDENT SUBQUERY / IN-to-EXISTS fallback,
+# which a UNION subquery cannot be turned into a materialized semi-join).
+# DISTINCT also dedups, so the join never fans out: the row set is identical
+# to `<col> IN (subquery)`.
+private def compile_scope_join(from_table_name, where_clause, idx)
+subquery = where_clause.value
+projection = subquery_projection_name(subquery)
+src_alias = "exwiw_scope_src_#{idx}"
+ids_alias = "exwiw_scope_ids_#{idx}"
+outer_key = "#{from_table_name}.#{where_clause.column_name}"
+
+"JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
+"FROM (#{compile_subquery(subquery)}) AS #{src_alias}) AS #{ids_alias} " \
+"ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
+end
+
 private def compile_where_condition(where_clause, table_name)
 # Use as it is if it's a raw query
 return where_clause if where_clause.is_a?(String)
data/lib/exwiw/adapter/postgresql_adapter.rb
CHANGED
@@ -301,9 +301,16 @@ module Exwiw
 def compile_ast(query_ast, select_cast_to: nil)
 raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
 
+# Lift scope id-set clauses (reverse_scope UNION / forward cascade /
+# single referenced_by) out of `WHERE <col> IN (subquery)` and into a
+# JOIN against a materialized derived table. See #compile_scope_join.
+scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
+
 sql = "SELECT "
 sql += if query_ast.select_all
-
+# A lifted scope JOIN brings a derived table into FROM, so a bare
+# `*` would also project its column. Qualify to this table's own.
+scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
 else
 cols = query_ast.columns.map { |col| compile_column_name(query_ast, col) }
 cols = cols.map { |c| "#{c}::#{select_cast_to}" } if select_cast_to
@@ -337,14 +344,45 @@ module Exwiw
 end
 end
 
-
+scope_clauses.each_with_index do |where_clause, idx|
+sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
+end
+
+if plain_where_clauses.any?
 sql += " WHERE "
-sql +=
+sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
 end
 
 sql
 end
 
+# Render a scope id-set clause as a JOIN to a materialized derived table
+# (see MysqlAdapter#compile_scope_join for the full rationale). The DISTINCT
+# forces the engine to materialize the id-set once and probe this table by
+# it, instead of full-scanning and re-evaluating a correlated subquery per
+# row; it also dedups, so the join is row-for-row identical to
+# `<col> IN (subquery)`.
+#
+# Type reconciliation mirrors the old IN form: when the outer column and
+# the projected id clash (e.g. uuid vs varchar), #compile_subquery already
+# casts every arm to text, so the derived `exwiw_scope_id` is text and the
+# outer key is cast to match.
+private def compile_scope_join(from_table_name, where_clause, idx)
+subquery = where_clause.value
+projection = subquery_projection_name(subquery)
+src_alias = "exwiw_scope_src_#{idx}"
+ids_alias = "exwiw_scope_ids_#{idx}"
+
+inner_sql = compile_subquery(subquery, outer_table: from_table_name, outer_column: where_clause.column_name)
+cast_to = subquery_cast_to(subquery, from_table_name, where_clause.column_name)
+outer_key = "#{from_table_name}.#{where_clause.column_name}"
+outer_key = "#{outer_key}::#{cast_to}" if cast_to
+
+"JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
+"FROM (#{inner_sql}) AS #{src_alias}) AS #{ids_alias} " \
+"ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
+end
+
 private def compile_where_condition(where_clause, table_name)
 # Use as it is if it's a raw query
 return where_clause if where_clause.is_a?(String)
data/lib/exwiw/adapter/sqlite_adapter.rb
CHANGED
@@ -198,11 +198,18 @@ module Exwiw
 def compile_ast(query_ast, count_only: false)
 raise NotImplementedError unless query_ast.is_a?(Exwiw::QueryAst::Select)
 
+# Lift scope id-set clauses (reverse_scope UNION / forward cascade /
+# single referenced_by) out of `WHERE <col> IN (subquery)` and into a
+# JOIN against a materialized derived table. See #compile_scope_join.
+scope_clauses, plain_where_clauses = partition_scope_clauses(query_ast.where_clauses)
+
 sql = "SELECT "
 sql += if count_only
 "COUNT(*)"
 elsif query_ast.select_all
-
+# A lifted scope JOIN brings a derived table into FROM, so a bare
+# `*` would also project its column. Qualify to this table's own.
+scope_clauses.any? ? "#{query_ast.from_table_name}.*" : "*"
 else
 query_ast.columns.map { |col| compile_column_name(query_ast, col) }.join(', ')
 end
@@ -225,14 +232,36 @@ module Exwiw
 end
 end
 
-
+scope_clauses.each_with_index do |where_clause, idx|
+sql += " #{compile_scope_join(query_ast.from_table_name, where_clause, idx)}"
+end
+
+if plain_where_clauses.any?
 sql += " WHERE "
-sql +=
+sql += plain_where_clauses.map { |where| compile_where_condition(where, query_ast.from_table_name) }.join(' AND ')
 end
 
 sql
 end
 
+# Render a scope id-set clause as a JOIN to a materialized derived table
+# (see MysqlAdapter#compile_scope_join for the full rationale). The DISTINCT
+# forces the engine to materialize the id-set once and probe this table by
+# it, instead of full-scanning and re-evaluating a correlated subquery per
+# row; it also dedups, so the join is row-for-row identical to
+# `<col> IN (subquery)`.
+private def compile_scope_join(from_table_name, where_clause, idx)
+subquery = where_clause.value
+projection = subquery_projection_name(subquery)
+src_alias = "exwiw_scope_src_#{idx}"
+ids_alias = "exwiw_scope_ids_#{idx}"
+outer_key = "#{from_table_name}.#{where_clause.column_name}"
+
+"JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
+"FROM (#{compile_subquery(subquery)}) AS #{src_alias}) AS #{ids_alias} " \
+"ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
+end
+
 private def compile_where_condition(where_clause, table_name)
 # Use as it is if it's a raw query
 return where_clause if where_clause.is_a?(String)
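The `compile_scope_join` methods added to the three adapters above all assemble the same SQL shape. As a rough, standalone illustration, the sketch below renders that shape from plain strings. `render_scope_join` and its keyword arguments are hypothetical: the real methods take exwiw `QueryAst` nodes and call `compile_subquery`, which this sketch replaces with an already-compiled SQL string. The optional `cast_to` only mimics the PostgreSQL adapter's cast of the outer key; the matching cast inside the subquery is assumed to have happened elsewhere.

```ruby
# Hypothetical helper: takes already-compiled SQL strings instead of exwiw's
# QueryAst nodes, so it can run on its own.
def render_scope_join(table:, column:, projection:, subquery_sql:, idx: 0, cast_to: nil)
  src_alias = "exwiw_scope_src_#{idx}"
  ids_alias = "exwiw_scope_ids_#{idx}"
  outer_key = "#{table}.#{column}"
  # PostgreSQL-style cast of the outer key when the two sides differ in type.
  outer_key = "#{outer_key}::#{cast_to}" if cast_to

  "JOIN (SELECT DISTINCT #{src_alias}.#{projection} AS exwiw_scope_id " \
    "FROM (#{subquery_sql}) AS #{src_alias}) AS #{ids_alias} " \
    "ON #{outer_key} = #{ids_alias}.exwiw_scope_id"
end

blob_join = render_scope_join(
  table: "active_storage_blobs", column: "id", projection: "blob_id",
  subquery_sql: "SELECT active_storage_attachments.blob_id FROM active_storage_attachments"
)

# A second lifted clause gets its own alias pair through idx.
cast_join = render_scope_join(
  table: "users", column: "id", projection: "user_id",
  subquery_sql: "SELECT customers.user_id FROM customers", idx: 1, cast_to: "text"
)
```

Numbering the aliases by clause index is what lets several lifted clauses coexist in one `FROM` without alias collisions.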
data/lib/exwiw/adapter.rb
CHANGED
@@ -242,6 +242,43 @@ module Exwiw
 private def null_preserving(ast, column, masked_expr)
 "CASE WHEN #{ast.from_table_name}.#{column.name} IS NOT NULL THEN #{masked_expr} ELSE NULL END"
 end
+
+# Split an outer query's WHERE clauses into the scope id-set clauses to
+# lift into a materialized derived-table JOIN (see each adapter's
+# #compile_scope_join) and the remaining plain clauses (kept in WHERE).
+# Returns [scope_clauses, plain_clauses]; #partition keeps each clause in
+# its original order *within* its own group. The two groups are emitted in
+# different SQL positions (a JOIN vs the WHERE), so their interleaving is
+# irrelevant — only the order within each group matters, and that is kept.
+private def partition_scope_clauses(where_clauses)
+where_clauses.partition { |where_clause| scope_subquery_clause?(where_clause) }
+end
+
+# Whether a WHERE clause is a scope id-set probe that should be lifted into
+# a JOIN against a materialized derived table. Only the SelectSubquery /
+# UnionSubquery shapes (reverse_scope UNION, forward cascade, single
+# referenced_by) qualify: they project over potentially huge tables and, as
+# `<col> IN (subquery)`, can degrade into a correlated DEPENDENT SUBQUERY
+# re-evaluated per outer row. The flat ids_field `Subquery` is deliberately
+# left as a plain IN — it is a small, bounded, uncorrelated probe.
+private def scope_subquery_clause?(where_clause)
+where_clause.is_a?(Exwiw::QueryAst::WhereClause) &&
+where_clause.operator == :in_subquery &&
+(where_clause.value.is_a?(Exwiw::QueryAst::SelectSubquery) ||
+where_clause.value.is_a?(Exwiw::QueryAst::UnionSubquery))
+end
+
+# The bare name of the single column a scope subquery projects, used to
+# reference it inside the materialized derived table. For a UNION the
+# output column name comes from the first arm.
+private def subquery_projection_name(subquery)
+case subquery
+when Exwiw::QueryAst::SelectSubquery
+subquery.query.columns.first.name
+when Exwiw::QueryAst::UnionSubquery
+subquery.queries.first.columns.first.name
+end
+end
 end
 
 # @params [Exwiw::QueryAst] query_ast
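The `partition_scope_clauses` / `scope_subquery_clause?` pair above decides which WHERE clauses are lifted into a JOIN. The sketch below replays that split with stand-in classes, as a minimal illustration only. `WhereClause`, `SelectSubquery`, `UnionSubquery` and `FlatSubquery` are hypothetical `Struct`s standing in for exwiw's `QueryAst` nodes, the `:eq` operator is invented, and the `"SELECT ..."` strings are placeholders.

```ruby
# Hypothetical stand-ins for exwiw's QueryAst node classes.
WhereClause    = Struct.new(:column_name, :operator, :value)
SelectSubquery = Struct.new(:sql)
UnionSubquery  = Struct.new(:sqls)
FlatSubquery   = Struct.new(:sql) # plays the role of the flat ids_field Subquery

# Same predicate shape as the diff: only IN-subquery clauses whose value is a
# select or union subquery are lifted; everything else stays in WHERE.
def scope_subquery_clause?(clause)
  clause.is_a?(WhereClause) &&
    clause.operator == :in_subquery &&
    [SelectSubquery, UnionSubquery].any? { |klass| clause.value.is_a?(klass) }
end

clauses = [
  "tenants.deleted_at IS NULL", # raw SQL strings stay in WHERE
  WhereClause.new("id", :in_subquery, UnionSubquery.new(["SELECT ...", "SELECT ..."])),
  WhereClause.new("status", :eq, "active"),
  WhereClause.new("id", :in_subquery, FlatSubquery.new("SELECT ...")),
  WhereClause.new("owner_id", :in_subquery, SelectSubquery.new("SELECT ..."))
]

# Enumerable#partition preserves the original order inside each group.
scope_clauses, plain_clauses = clauses.partition { |clause| scope_subquery_clause?(clause) }
```

With these inputs the union and select clauses land in `scope_clauses`, while the raw string, the `:eq` clause and the flat subquery stay in `plain_clauses`.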
data/lib/exwiw/query_ast_builder.rb
CHANGED
@@ -2,8 +2,8 @@
 
 module Exwiw
 class QueryAstBuilder
-def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true,
-new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse,
+def self.run(table_name, table_by_name, dump_target, logger, allow_reverse: true, forward_path: [])
+new(table_name, table_by_name, dump_target, logger, allow_reverse: allow_reverse, forward_path: forward_path).run
 end
 
 # Scope-column mode classification for a single table. One of
@@ -49,17 +49,20 @@ module Exwiw
 
 attr_reader :table_name, :table_by_name, :dump_target
 
-def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true,
+def initialize(table_name, table_by_name, dump_target, logger, allow_reverse: true, forward_path: [])
 @table_name = table_name
 @table_by_name = table_by_name
 @dump_target = dump_target
 @logger = logger
 @allow_reverse = allow_reverse
-# @
-#
-#
-#
-
+# @forward_path is the chain of tables currently being forward-resolved by
+# the "scope via an indirectly-scoped belongs_to parent" rescue
+# (build_belongs_to_scoped_clause). Each forward hop appends the table it is
+# descending from, so the rescue recurses N levels (users -> end_users ->
+# end_user_profiles -> ...) and stops only on a real belongs_to cycle: a
+# table already on the path is not re-resolved, falling through to
+# :unscopable instead of looping forever.
+@forward_path = forward_path
 end
 
 def run
@@ -187,10 +190,11 @@ module Exwiw
 next if relation.nil? || relation.polymorphic?
 
 # Build the child's own extraction query. allow_reverse:false stops a
-# chain of FK-less tables from recursing back into each other;
-#
-#
-
+# chain of FK-less tables from recursing back into each other; adding this
+# table to forward_path stops the child from forward-scoping back through
+# it (which would loop) while still letting the child forward-scope
+# through other tables.
+child_query = self.class.run(other.name, table_by_name, dump_target, @logger, allow_reverse: false, forward_path: @forward_path + [table.name])
 
 # Only an *already constrained* child narrows anything; an unconstrained
 # child would select every fk value (i.e. dump all) and not help.
@@ -248,12 +252,12 @@ module Exwiw
 next
 end
 
-# Build the referencer's own scoped extraction query. allow_reverse
-#
-# single-referencer path does (a referencer
-#
-#
-ref_query = self.class.run(referencer.name, table_by_name, dump_target, @logger, allow_reverse: false,
+# Build the referencer's own scoped extraction query. allow_reverse is
+# disabled and this table is added to forward_path to bound recursion
+# exactly as the single-referencer path does (a referencer that could only
+# be scoped by recursing back into this table would loop); the referencer
+# may still forward-scope through other tables.
+ref_query = self.class.run(referencer.name, table_by_name, dump_target, @logger, allow_reverse: false, forward_path: @forward_path + [table.name])
 
 unless ref_query.where_clauses.any? || ref_query.join_clauses.any?
 @logger.warn(
@@ -297,6 +301,13 @@ module Exwiw
 # them out of a full dump. Returns nil when there is no single, unambiguous
 # scopable parent, leaving the caller on the unscopable path.
 private def build_belongs_to_scoped_clause(table)
+# This table plus every ancestor currently being forward-resolved. A
+# candidate parent already on this path would close a belongs_to cycle, so
+# it is skipped; threading the grown path into the parent build lets the
+# cascade recurse N hops (users -> end_users -> end_user_profiles -> ...)
+# and terminate only when a table reappears.
+forward_path = @forward_path + [table.name]
+
 candidates = table.belongs_tos.filter_map do |relation|
 # A polymorphic belongs_to points at several parent tables through one
 # column, so it cannot project to a single parent id set; skip it.
@@ -305,10 +316,15 @@ module Exwiw
 parent = table_by_name[relation.table_name]
 next if parent.nil?
 
+# Cycle guard: descending into a parent already on the forward path would
+# loop (a -> b -> a). Stop, leaving this table on the :unscopable path.
+next if forward_path.include?(parent.name)
+
 # Build the parent's own scoped query. allow_reverse stays true so the
-# parent may be scoped via referenced_by
-#
-
+# parent may be scoped via referenced_by, and forward scoping stays
+# enabled so a parent that is itself scoped via *its* parent resolves
+# too — this is what makes the cascade multi-hop.
+parent_query = self.class.run(parent.name, table_by_name, dump_target, @logger, allow_reverse: true, forward_path: forward_path)
 
 # Only a constrained parent narrows anything; an unconstrained parent
 # would select every pk (i.e. dump all) and not help.
@@ -460,11 +476,18 @@ module Exwiw
 return :direct if directly_scoped?(table)
 return :via_path if build_join_clauses_scoped(table).any?
 return :referenced_by if @allow_reverse && build_referenced_by_clause(table)
-return :via_scoped_parent if
+return :via_scoped_parent if forward_scope_allowed?(table) && build_belongs_to_scoped_clause(table)
 
 :unscopable
 end
 
+# True when this table may still attempt the forward "scope via a scoped
+# belongs_to parent" rescue: it is not already on the forward-resolution
+# path, so descending into its parent cannot revisit it (a belongs_to cycle).
+private def forward_scope_allowed?(table)
+!@forward_path.include?(table.name)
+end
+
 private def build_scoped(table)
 ast = QueryAst::Select.new
 ast.from(table.name)
@@ -502,11 +525,13 @@ module Exwiw
 end
 end
 
-if
+if forward_scope_allowed?(table)
 # Belongs_to a parent that is itself scoped but carries no scope column of
 # its own (so via_path cannot terminate on it) — e.g. a hub table scoped
-# only via referenced_by
-#
+# only via referenced_by, or a parent that is itself scoped through *its*
+# parent. Constrain this table to that parent's in-scope ids so its rows
+# ride along instead of being dumped in full; the parent build recurses
+# the cascade further up.
 parent_clause = build_belongs_to_scoped_clause(table)
 if parent_clause
 ast.where(parent_clause)
@@ -514,12 +539,13 @@ module Exwiw
 end
 end
 
-# Only the genuine top-level build (
-#
-#
-#
-# (potential full
-
+# Only the genuine top-level build (allow_reverse on, forward_path empty —
+# i.e. no rescue subquery in progress) is allowed to fail hard. The
+# Runner/ExplainRunner pre-flight (validate_scope!) rejects unscopable
+# tables before extraction, so a top-level build never legitimately lands
+# here; if it does, raise rather than emit an unfiltered (potential full
+# PII) dump.
+if @allow_reverse && @forward_path.empty?
 raise ArgumentError, scope_unscopable_message(table)
 end
 
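The `forward_path` changes above replace a one-hop limit with a path-based cycle guard. The sketch below shows that recursion pattern in isolation, under simplifying assumptions: each table has at most one `belongs_to` parent, the schema is a plain hash, and `users` is treated as directly scopable (in the diff it would be scoped through `reverse_scope` or `referenced_by`). The `scopable?` helper and the toy schema are hypothetical and are not the builder's real API.

```ruby
# Hypothetical recursion mirroring the forward-path guard. parents maps a
# table name to its single belongs_to parent (or nil); directly_scoped lists
# tables that need no forward rescue at all.
def scopable?(table, parents, directly_scoped, forward_path = [])
  return true if directly_scoped.include?(table)
  # Cycle guard: a table already being resolved is not revisited.
  return false if forward_path.include?(table)

  parent = parents[table]
  return false if parent.nil?

  # Thread the grown path into the parent build so the cascade can recurse.
  scopable?(parent, parents, directly_scoped, forward_path + [table])
end

parents = {
  "end_user_profiles" => "end_users",
  "end_users"         => "users",
  "users"             => nil,
  "a"                 => "b",
  "b"                 => "a"
}
directly_scoped = ["users"]

multi_hop = scopable?("end_user_profiles", parents, directly_scoped) # two hops up to users
cyclic    = scopable?("a", parents, directly_scoped)                 # a -> b -> a
```

Passing a fresh array (`forward_path + [table]`) rather than mutating a shared one keeps each branch of the recursion independent, which matches how the diff builds `@forward_path + [table.name]` for every nested call.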
data/lib/exwiw/version.rb
CHANGED