activerecord-materialized 0.1.0

Files changed (59)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +37 -0
  3. data/LICENSE +21 -0
  4. data/README.md +526 -0
  5. data/lib/activerecord/materialized/aggregate_analysis.rb +132 -0
  6. data/lib/activerecord/materialized/async_refresher.rb +105 -0
  7. data/lib/activerecord/materialized/cache_table_schema.rb +67 -0
  8. data/lib/activerecord/materialized/cold_read.rb +60 -0
  9. data/lib/activerecord/materialized/configuration.rb +80 -0
  10. data/lib/activerecord/materialized/delta_maintainer.rb +74 -0
  11. data/lib/activerecord/materialized/dependency_registry.rb +107 -0
  12. data/lib/activerecord/materialized/dependency_trackable.rb +48 -0
  13. data/lib/activerecord/materialized/incremental_maintainer.rb +58 -0
  14. data/lib/activerecord/materialized/maintenance_delta.rb +82 -0
  15. data/lib/activerecord/materialized/maintenance_delta_builder.rb +62 -0
  16. data/lib/activerecord/materialized/maintenance_store.rb +82 -0
  17. data/lib/activerecord/materialized/metadata/maintenance_payload.rb +33 -0
  18. data/lib/activerecord/materialized/metadata/schema.rb +84 -0
  19. data/lib/activerecord/materialized/metadata/timestamps.rb +31 -0
  20. data/lib/activerecord/materialized/metadata.rb +138 -0
  21. data/lib/activerecord/materialized/metadata_record.rb +28 -0
  22. data/lib/activerecord/materialized/migration_builder.rb +38 -0
  23. data/lib/activerecord/materialized/module_api.rb +82 -0
  24. data/lib/activerecord/materialized/partition_record.rb +27 -0
  25. data/lib/activerecord/materialized/partition_state.rb +127 -0
  26. data/lib/activerecord/materialized/query_expressions.rb +83 -0
  27. data/lib/activerecord/materialized/railtie.rb +16 -0
  28. data/lib/activerecord/materialized/refresh_callbacks.rb +62 -0
  29. data/lib/activerecord/materialized/refresh_job.rb +22 -0
  30. data/lib/activerecord/materialized/refresh_result.rb +40 -0
  31. data/lib/activerecord/materialized/refresh_scheduler.rb +54 -0
  32. data/lib/activerecord/materialized/refresher.rb +139 -0
  33. data/lib/activerecord/materialized/registry.rb +74 -0
  34. data/lib/activerecord/materialized/relation_cache_writer.rb +137 -0
  35. data/lib/activerecord/materialized/schema_verifier.rb +64 -0
  36. data/lib/activerecord/materialized/summary_delta.rb +76 -0
  37. data/lib/activerecord/materialized/summary_delta_builder.rb +58 -0
  38. data/lib/activerecord/materialized/table_model_registry.rb +43 -0
  39. data/lib/activerecord/materialized/tasks.rb +79 -0
  40. data/lib/activerecord/materialized/type_reexports.rb +14 -0
  41. data/lib/activerecord/materialized/version.rb +9 -0
  42. data/lib/activerecord/materialized/view.rb +79 -0
  43. data/lib/activerecord/materialized/view_class.rb +8 -0
  44. data/lib/activerecord/materialized/view_configuration_class_methods.rb +103 -0
  45. data/lib/activerecord/materialized/view_definition.rb +133 -0
  46. data/lib/activerecord/materialized/view_incremental_class_methods.rb +142 -0
  47. data/lib/activerecord/materialized/view_query_access_class_methods.rb +160 -0
  48. data/lib/activerecord/materialized/view_refresh_policy_class_methods.rb +109 -0
  49. data/lib/activerecord/materialized/write_change.rb +69 -0
  50. data/lib/activerecord/materialized.rb +55 -0
  51. data/lib/activerecord_materialized_types.rb +18 -0
  52. data/lib/generators/activerecord_materialized/install/templates/README +55 -0
  53. data/lib/generators/activerecord_materialized/install/templates/create_ar_materialized_view_metadata.rb.erb +30 -0
  54. data/lib/generators/activerecord_materialized/install_generator.rb +32 -0
  55. data/lib/generators/activerecord_materialized/migration_generator.rb +51 -0
  56. data/lib/generators/activerecord_materialized/templates/materialized_view.rb.erb +17 -0
  57. data/lib/generators/activerecord_materialized/templates/materialized_view_migration.rb.erb +11 -0
  58. data/lib/generators/activerecord_materialized/view_generator.rb +18 -0
  59. metadata +162 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 5d506e972959c083a0c0e735e98ce7d1ceca240b4590b09ef2f22e97cc4a8ab5
4
+ data.tar.gz: 1e0bfc6cc09ec287b7a6f4db7148ca426a724d653189856d0c63e53916a10ada
5
+ SHA512:
6
+ metadata.gz: 78f74eb576c9cb238ab33df4ea4e95f708db1d5e0f24d5f1b1123d1861deb5e6fdf150fe488fc81111d9f369ffba6324f1d13f38748fdc1f71fa11afdd0691c0
7
+ data.tar.gz: 594206c271a6cf42600b2c628ff9a126565d33dd3dfb54f8f84103161cd17b2e5b3dd8e22e7de4439785722f2395252c62f2ac2df99097ca6a2fc8b44a3233ca
data/CHANGELOG.md ADDED
@@ -0,0 +1,37 @@
1
+ # Changelog
2
+
3
+ ## 0.1.0 (2026-06-18)
4
+
5
+ Initial release.
6
+
7
+ ### Features
8
+
9
+ - Application-level materialized views for ActiveRecord (Rails 8+, Ruby 3.4+)
10
+ - Refresh-on-write: dependency changes schedule incremental background maintenance; reads never block on a rebuild
11
+ - Never an implicit full rebuild — a full materialization happens only via the explicit `rebuild!(confirm: true)` / `materialized:rebuild`, so launching against a large database is safe
12
+ - Read-through cold reads (`cold_read :read_through` default): reads on a not-yet-built view serve correct results from the source query; `:serve_stale` and `:raise` are also available
13
+ - Per-partition freshness: a cold view materializes individual `GROUP BY` partitions on demand (on keyed reads and dependency writes) and serves those partitions from the cache while the rest read through, giving partial materialization without ever requiring a full rebuild
14
+ - Transparent ActiveRecord query interface (`where`, `find`, `count`, scopes)
15
+ - Declarative `materialized_from` sources defined as an `ActiveRecord::Relation` (via a block)
16
+ - `depends_on` dependency tracking via ActiveRecord `after_*_commit` callbacks
17
+ - Refresh strategies: `:async` (default), `:immediate`, `:manual`
18
+ - Debounced async refresh with in-process `AsyncRefresher` or ActiveJob dispatcher
19
+ - `rebuild!` materializes entirely in the database with `INSERT … SELECT` over the source query (atomic table swap), so the result set never crosses into Ruby memory — safe to run against a large dataset
20
+ - Warm-up: a `warm_up { [...] }` DSL plus `warm_up!` / `materialized:warm_up` materialize a cold view's hot partitions ahead of traffic, leaving the rest to read through on demand
21
+ - Incremental view maintenance (IVM) for `GROUP BY` views — never a routine table rebuild:
22
+ - **Summary-delta** maintenance for distributive views (`SUM` / `COUNT` / `COUNT(*)`): writes apply signed per-partition deltas to the cache table without re-reading base rows, with NULL-safe sums and empty-partition deletion
23
+ - **Scoped recompute** (partition-local delete + re-aggregate) for everything else — `AVG`, `MIN`, `MAX`, `COUNT(DISTINCT)`, joins, `HAVING` — always correct
24
+ - Metadata tracking (`dirty`, `last_refreshed_at`, `row_count`, `refresh_duration_ms`, errors)
25
+ - Optional `max_staleness` time-based safety net
26
+ - `before_refresh` / `after_refresh` callbacks
27
+ - Migration-provisioned cache tables: `activerecord_materialized:migration <View>` generates a `create_table` migration with columns/types inferred from the source relation, so the table exists at deploy time
28
+ - Boot/CI schema drift verification (`materialized:verify` / `ActiveRecord::Materialized.verify_schema!`) raises a helpful error when a view's table no longer matches its relation — never auto-alters
29
+ - Rails generators: `activerecord_materialized:install`, `activerecord_materialized:view`, `activerecord_materialized:migration`
30
+ - Rake tasks: `materialized:refresh_all`, `materialized:refresh_stale`, `materialized:rebuild`, `materialized:verify`, `materialized:warm_up`
31
+ - JOB-schema benchmark suite with multi-second analytical queries on SQLite
32
+
33
+ ### Benchmark highlights (xlarge dataset, ~2M cast_info rows)
34
+
35
+ - Raw queries: 7–20 seconds
36
+ - Materialized view reads: ~0.3–0.7ms
37
+ - Speedup: 20,000–49,000×
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Michael Avrukin
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,526 @@
1
+ <p align="center">
2
+ <picture>
3
+ <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/mavrukin/activerecord-materialized/main/assets/png/lockup-horizontal-dark.png">
4
+ <img alt="activerecord-materialized" src="https://raw.githubusercontent.com/mavrukin/activerecord-materialized/main/assets/png/lockup-horizontal.png" width="430">
5
+ </picture>
6
+ </p>
7
+
8
+ # activerecord-materialized
9
+
10
+ **Materialized views for Rails apps on databases that don't have them** — precompute an expensive query into a cache table, refresh it in the background when the underlying data changes, and read it through a transparent ActiveRecord API.
11
+
12
+ [![Gem Version](https://img.shields.io/gem/v/activerecord-materialized.svg)](https://rubygems.org/gems/activerecord-materialized)
13
+ [![CI](https://github.com/mavrukin/activerecord-materialized/actions/workflows/ci.yml/badge.svg)](https://github.com/mavrukin/activerecord-materialized/actions/workflows/ci.yml)
14
+ [![Docs](https://img.shields.io/badge/docs-rubydoc.info-blue.svg)](https://rubydoc.info/gems/activerecord-materialized)
15
+ [![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.4-red)](activerecord-materialized.gemspec)
16
+ [![Rails](https://img.shields.io/badge/rails-%3E%3D%208.0-red)](activerecord-materialized.gemspec)
17
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
18
+
19
+ > **Use case:** Your reporting page runs a 12-second join across six tables. Users visit once a day. MySQL has no native materialized views. This gem gives you PostgreSQL-style semantics in application code — writes trigger refresh, reads never pay for it.
20
+
21
+ ### Why use this?
22
+
23
+ - **Reads stay fast** — queries hit a small precomputed table, not a multi-second join.
24
+ - **Freshness is automatic** — a write to a `depends_on` model schedules background maintenance; you don't refresh by hand.
25
+ - **Nothing blocks on a rebuild** — refresh is incremental and on-write, never on-read, and a full rebuild only ever happens when you explicitly ask for it.
26
+ - **It's just ActiveRecord** — `where`, `find`, `count`, and scopes work unchanged; an unbuilt view still returns correct results by reading through to the source.
27
+ - **It's portable** — works on MySQL, MariaDB, and SQLite, which have no native materialized views.
28
+
29
+ > 🚀 **New here? Start with the [Getting started tutorial](docs/getting-started.md)** — a hands-on, fully tested walkthrough from install to refresh-on-write.
30
+ >
31
+ > 🧪 **Want to feel it?** A runnable Rails demo lives in [`demo/`](demo/) — compare raw vs. materialized timings side by side, mutate the data, and watch the view go stale and catch up.
32
+
33
+ **Author:** [Michael Avrukin](https://github.com/mavrukin) · **License:** [MIT](LICENSE)
34
+
35
+ ---
36
+
37
+ ## Table of contents
38
+
39
+ - [Why this exists](#why-this-exists)
40
+ - [How it works](#how-it-works)
41
+ - [Research background](#research-background)
42
+ - [Features](#features)
43
+ - [Gotchas and trade-offs](#gotchas-and-trade-offs)
44
+ - [Installation](#installation)
45
+ - [Getting started tutorial](#getting-started-tutorial)
46
+ - [Quick start](#quick-start)
47
+ - [Configuration](#configuration)
48
+ - [API reference](#api-reference)
49
+ - [Benchmark results](#benchmark-results)
50
+ - [When to use (and when not to)](#when-to-use-and-when-not-to)
51
+ - [Comparison with native materialized views](#comparison-with-native-materialized-views)
52
+ - [Versioning](#versioning)
53
+ - [Development](#development)
54
+ - [Contributing](#contributing)
55
+
56
+ ---
57
+
58
+ ## Why this exists
59
+
60
+ Many Rails applications on **MySQL**, **MariaDB**, or **SQLite** hit the same wall:
61
+
62
+ | Symptom | Example |
63
+ |---------|---------|
64
+ | Complex joins + aggregations | `GROUP BY`, `DISTINCT`, correlated subqueries |
65
+ | Seconds per query even with indexes | Dashboards, admin reports, analytics APIs |
66
+ | Read-heavy, write-light | Thousands of reads/day, dozens of writes/day |
67
+ | No native MV support | Unlike PostgreSQL's `CREATE MATERIALIZED VIEW` |
68
+
69
+ **Materialized views** solve this by storing query results as a physical table and refreshing that snapshot when source data changes. High-end databases (PostgreSQL, Oracle, SQL Server) provide this natively. When your database cannot, **activerecord-materialized** implements the same read/refresh split in Ruby — without changing how developers query data.
70
+
71
+ ### The problem with refresh-on-read
72
+
73
+ A naive approach refreshes the view on the first read after data changes. That punishes the unlucky user whose visit triggers a 10-second rebuild — and on a large database an implicit full rebuild can be catastrophic. This gem **never rebuilds implicitly**: a full materialization happens only via an explicit `rebuild!(confirm: true)`. Routine freshness is **incremental, on write** (dependency changes schedule partition-local maintenance after commit), and an unbuilt view stays correct via **read-through** to the source query until you build it.
74
+
75
+ ---
76
+
77
+ ## How it works
78
+
79
+ ### Architecture
80
+
81
+ ```mermaid
82
+ flowchart TB
83
+ subgraph writes ["Write path — routine maintenance"]
84
+ W["INSERT / UPDATE / DELETE on depends_on model"]
85
+ DT["DependencyTrackable after_*_commit callbacks"]
86
+ DR["DependencyRegistry.publish_write_change!"]
87
+ MS["MaintenanceDeltaBuilder / SummaryDeltaBuilder + MaintenanceStore"]
88
+ RS[RefreshScheduler]
89
+ AR["AsyncRefresher or RefreshJob"]
90
+ RFR["Refresher dispatch"]
91
+ DM["DeltaMaintainer — signed summary deltas (distributive views)"]
92
+ IM["IncrementalMaintainer — scoped delete + re-aggregate (fallback)"]
93
+ W --> DT --> DR --> MS --> RS --> AR --> RFR
94
+ RFR --> DM
95
+ RFR --> IM
96
+ end
97
+
98
+ subgraph bootstrap ["Bootstrap once (explicit rebuild!)"]
99
+ RF["Refresher — RelationCacheWriter INSERT … SELECT + atomic swap"]
100
+ end
101
+
102
+ subgraph reads ["Read path — always fast"]
103
+ Q["SalesSummary queries"]
104
+ CT[("mv_sales_summary cache table")]
105
+ Q --> CT
106
+ end
107
+
108
+ subgraph meta [Metadata]
109
+ MD[("ar_materialized_view_metadata")]
110
+ DM --> CT
111
+ IM --> CT
112
+ DM --> MD
113
+ IM --> MD
114
+ MS --> MD
115
+ RF --> CT
116
+ RF --> MD
117
+ end
118
+
119
+ DM -.->|"applies partition deltas in place"| CT
120
+ IM -.->|"re-aggregates affected partitions"| CT
121
+ RF -.->|"initial snapshot"| CT
122
+ ```
123
+
124
+ ### Refresh lifecycle
125
+
126
+ 1. **Define** a view class with a `materialized_from` block (returning an `ActiveRecord::Relation`) and `depends_on` models.
127
+ 2. **Build** — an explicit `rebuild!(confirm: true)` materializes the source relation into the cache table via `RelationCacheWriter` + atomic swap. This is the only full-scan path and never fires implicitly; until it runs, reads fall through to the source (`cold_read :read_through`).
128
+ 3. **Write** — any create/update/destroy on a `depends_on` model fires an `after_*_commit` callback (installed by `DependencyTrackable`) that calls `DependencyRegistry.publish_write_change!`.
129
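+ 4. **Accumulate** the pending work: for each affected view, `MaintenanceDeltaBuilder` records the affected `GROUP BY` partition keys (or, for distributive views, `SummaryDeltaBuilder` records signed per-partition deltas) in `MaintenanceStore`. The scope widens to all partitions when it cannot be determined from the write.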
130
+ 5. **Defer** scheduling until commit: `after_*_commit` fires only once the writing transaction commits, so every write in a transaction is recorded at the same point and a rolled-back transaction schedules nothing.
131
+ 6. **Debounce** — rapid writes coalesce into one maintenance pass (configurable window).
132
+ 7. **Maintain** — distributive views (`SUM`/`COUNT`/`COUNT(*)`) apply signed **summary deltas** straight to the affected cache rows without re-reading base rows (`DeltaMaintainer`); everything else (`AVG`, `MIN`, `MAX`, `COUNT(DISTINCT)`, joins, `HAVING`) **re-aggregates only the affected partitions** (`IncrementalMaintainer`). Neither path does DDL or an atomic swap on the hot path.
133
+ 8. **Read** — once built, `where`, `find`, `count`, scopes query the cache table directly; reads before maintenance completes return the previous snapshot, reads after see updated partitions. Before the view is built, reads transparently fall through to the source query.
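Steps 3 to 6 can be illustrated in plain Ruby. This is a minimal sketch, not the gem's implementation: `PendingMaintenance` and `FakeTransaction` are hypothetical stand-ins for `MaintenanceStore` and an ActiveRecord transaction with `after_*_commit` callbacks. Writes are buffered per transaction, published as partition keys only on commit, and dropped on rollback.

```ruby
require "set"

# Hypothetical stand-in for MaintenanceStore: pending partition keys per view.
class PendingMaintenance
  def initialize
    @pending = Hash.new { |hash, view| hash[view] = Set.new }
  end

  # Step 4: accumulate affected partition keys for a view.
  def record(view, keys)
    @pending[view].merge(keys)
  end

  # Hand over and clear the pending keys (what one maintenance pass consumes).
  def drain(view)
    @pending.delete(view) || Set.new
  end

  def pending?(view)
    @pending.key?(view) && !@pending[view].empty?
  end
end

# Hypothetical transaction: buffers writes the way after_*_commit defers them.
class FakeTransaction
  def initialize(store)
    @store = store
    @buffered = []
  end

  # A write inside the transaction publishes nothing yet.
  def write(view, keys)
    @buffered << [view, keys]
  end

  # Step 5: on commit, publish every buffered write.
  def commit!
    @buffered.each { |view, keys| @store.record(view, keys) }
    @buffered.clear
  end

  # On rollback, drop the buffer so nothing is scheduled.
  def rollback!
    @buffered.clear
  end
end

store = PendingMaintenance.new

committed = FakeTransaction.new(store)
committed.write(:sales_summary, ["books"])
committed.write(:sales_summary, ["books", "games"])
committed.commit!

rolled_back = FakeTransaction.new(store)
rolled_back.write(:sales_summary, ["toys"])
rolled_back.rollback!

p store.drain(:sales_summary).to_a.sort # => ["books", "games"]
```

Keys from the committed transaction are deduplicated into one set, and the rolled-back `"toys"` write never reaches the store.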
134
+
135
+ ### Core components
136
+
137
+ | Component | Role |
138
+ |-----------|------|
139
+ | `ActiveRecord::Materialized::View` | Base model; DSL and query interface |
140
+ | `DependencyTrackable` | Installs `after_*_commit` callbacks on `depends_on` models |
141
+ | `DependencyRegistry` | Maps tables → view classes; publishes commit writes to affected views |
142
+ | `RefreshScheduler` | Dispatches `:async`, `:immediate`, or `:manual` strategies |
143
+ | `AsyncRefresher` | Debounced in-process background maintenance (tests: `flush!`) |
144
+ | `RefreshJob` | Optional ActiveJob wrapper for production workers |
145
+ | `ViewDefinition` | Inspects source relations for `GROUP BY` maintenance keys |
146
+ | `AggregateAnalysis` | Classifies a view's aggregates; decides if it is summary-delta maintainable |
147
+ | `MaintenanceDeltaBuilder` | Maps ActiveRecord change payloads to affected partition keys (scoped recompute) |
148
+ | `SummaryDeltaBuilder` / `SummaryDelta` | Compute and accumulate signed per-partition aggregate deltas (distributive views) |
149
+ | `MaintenanceStore` | Persists pending maintenance (delta or scope) in metadata |
150
+ | `DeltaMaintainer` | Hot path for distributive views: applies summary deltas in place, no base re-read |
151
+ | `IncrementalMaintainer` | Fallback hot path: partition delete + re-aggregate in the existing cache table |
152
+ | `Refresher` | Orchestrates explicit bootstrap/full refresh and dispatches incremental maintenance |
153
+ | `RelationCacheWriter` | Materializes the relation via `INSERT … SELECT`; atomic table swap on full refresh |
154
+ | `QueryExpressions` | Portable Arel helpers (`sum_as`, `count_distinct_as`, …) for view definitions |
155
+ | `Metadata` | Tracks `dirty`, `maintenance_payload`, `last_refreshed_at`, `row_count`, errors |
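To make the `AggregateAnalysis` row concrete, here is a simplified sketch of the decision it makes, following the rules listed in the changelog: `SUM` / `COUNT` / `COUNT(*)` views get summary deltas, while `AVG`, `MIN`, `MAX`, `COUNT(DISTINCT)`, joins, or `HAVING` fall back to scoped recompute. The `ViewShape` struct and the symbol names are hypothetical; the real class inspects the `ActiveRecord::Relation` itself.

```ruby
# Hypothetical, simplified view descriptor; the real AggregateAnalysis inspects
# the ActiveRecord::Relation returned by materialized_from.
ViewShape = Struct.new(:group_by, :aggregates, :joins, :having, keyword_init: true)

DISTRIBUTIVE_AGGREGATES = %i[sum count count_star].freeze

def maintenance_strategy(shape)
  # No GROUP BY keys: nothing to maintain per partition.
  return :full_refresh if shape.group_by.empty?

  distributive = !shape.aggregates.empty? &&
                 shape.aggregates.all? { |agg| DISTRIBUTIVE_AGGREGATES.include?(agg) }

  # Signed deltas only cover SUM/COUNT/COUNT(*) with no joins or HAVING;
  # everything else re-aggregates the affected partitions.
  distributive && !shape.joins && !shape.having ? :summary_delta : :scoped_recompute
end

# The quick-start SalesSummary view joins tables and uses COUNT(DISTINCT).
sales_summary = ViewShape.new(group_by: [:category], aggregates: %i[sum count_distinct], joins: true, having: false)
p maintenance_strategy(sales_summary) # => :scoped_recompute
```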
156
+
157
+ ---
158
+
159
+ ## Research background
160
+
161
+ This gem applies decades of materialized-view and incremental-maintenance research to the application layer.
162
+
163
+ ### Foundational surveys
164
+
165
+ | Topic | Reference |
166
+ |-------|-----------|
167
+ | **Materialized views monograph** | Chirkova & Yang, [*Materialized Views*](https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf) (Foundations and Trends in Databases, 2012) — definitions, refresh strategies, view selection, query rewriting |
168
+ | **View maintenance taxonomy** | Gupta & Mumick, [*Maintenance of Materialized Views: Problems, Techniques, and Applications*](https://homepages.inf.ed.ac.uk/wenfei/qsx/reading/gupta95maintenance.pdf) (IEEE Data Engineering Bulletin, 1995) — when full vs incremental refresh is appropriate |
169
+
170
+ ### Incremental view maintenance
171
+
172
+ | Topic | Reference |
173
+ |-------|-----------|
174
+ | **Warehousing & decoupled sources** | Zhuge et al., [*View Maintenance in a Warehousing Environment*](https://sigmodrecord.org/publications/sigmodRecord/9506/pdfs/568271.223848.pdf) (SIGMOD 1995) — maintaining views when base data lives outside the warehouse |
175
+ | **Higher-order deltas** | Ahmad et al., [*DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views*](https://arxiv.org/pdf/1207.0137) (VLDB 2012) — recursive finite-differencing for low-latency view refresh |
176
+ | **Factorized IVM (F-IVM)** | Nikolic & Olteanu, [*Incremental View Maintenance with Triple Lock Factorization Benefits*](https://www.cs.ox.ac.uk/dan.olteanu/papers/no-sigmod18.pdf) (SIGMOD 2018) — factorized higher-order maintenance for conjunctive queries and aggregates |
177
+ | **IVM survey (recent)** | Olteanu, [*Recent Increments in Incremental View Maintenance*](https://arxiv.org/pdf/2404.17679) (PODS 2024 Gems) — fine-grained complexity and modern IVM engines |
178
+
179
+ ### Systems & dataflow approaches
180
+
181
+ | Topic | Reference |
182
+ |-------|-----------|
183
+ | **Differential dataflow** | McSherry et al., [*Differential Dataflow*](https://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper111.pdf) (CIDR 2013) — incremental computation over changing data with multi-version state |
184
+ | **Application-layer precomputation** | Gjengset et al., [*Noria: dynamic, partially-stateful data-flow for high-performance web applications*](https://www.usenix.org/system/files/osdi18-gjengset.pdf) (OSDI 2018) — partially-stateful dataflow that incrementally maintains query results for web backends |
185
+
186
+ ### Practical references
187
+
188
+ | Topic | Reference |
189
+ |-------|-----------|
190
+ | **Production reference** | [PostgreSQL: REFRESH MATERIALIZED VIEW](https://www.postgresql.org/docs/current/sql-refreshmaterializedview.html) — `CONCURRENTLY` refresh, separate read/refresh paths |
191
+ | **Benchmark schema** | Leis et al., [*How Good Are Query Optimizers, Really?*](https://dl.acm.org/doi/10.1145/3035918.3064035) (VLDB 2015) — [Join Order Benchmark](https://github.com/gregrahn/join-order-benchmark) used in this repo's benchmark suite |
192
+
193
+ **Design choice:** After a one-time bootstrap, routine refresh uses **incremental view maintenance (IVM)** by default. Following Gupta & Mumick, aggregate views with `GROUP BY` are maintained only in their **affected partitions** (group keys), in place in the existing cache table: no table rebuild and no atomic swap on the hot path. Writes on `depends_on` models accumulate partition keys from ActiveRecord change payloads. Distributive views (`SUM` / `COUNT` / `COUNT(*)`) then apply signed summary deltas to those partitions; all other views delete the stale partition rows and insert freshly aggregated replacements. Use `refresh_mode :full` when a view cannot be maintained incrementally.
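The scoped-recompute path can be sketched with in-memory rows. This is an illustration under assumptions, not the gem's code: `affected_partitions` and `recompute_partitions!` are hypothetical helpers, and the cache is a plain hash instead of a table. An update can move a row between partitions, so both its old and new keys are affected. The example uses `MAX`, which cannot be maintained with deltas because deleting the current maximum requires re-reading the remaining rows.

```ruby
require "set"

# Which GROUP BY partitions one committed write touches. `before`/`after` are
# hypothetical attribute hashes for the written row.
def affected_partitions(operation, before:, after:, key:)
  case operation
  when :create  then Set[after.fetch(key)]
  when :destroy then Set[before.fetch(key)]
  when :update  then Set[before.fetch(key), after.fetch(key)] # row may change partition
  else raise ArgumentError, "unknown operation: #{operation.inspect}"
  end
end

# Scoped recompute: delete each affected partition's cache entry, then
# re-aggregate that partition from the current base rows.
def recompute_partitions!(cache, base_rows, partitions, key:, &aggregate)
  partitions.each do |partition|
    cache.delete(partition)
    rows = base_rows.select { |row| row[key] == partition }
    cache[partition] = aggregate.call(rows) unless rows.empty? # empty partitions stay deleted
  end
  cache
end

# MAX(amount) GROUP BY category
base_rows = [
  { id: 1, category: "books", amount: 10 },
  { id: 2, category: "books", amount: 30 },
  { id: 3, category: "games", amount: 5 }
]
cache = { "books" => 30, "games" => 5 }

deleted = base_rows.delete_at(1) # destroy the row holding the current max
partitions = affected_partitions(:destroy, before: deleted, after: nil, key: :category)
recompute_partitions!(cache, base_rows, partitions, key: :category) do |rows|
  rows.map { |row| row[:amount] }.max
end
p cache["books"] # => 10
```

Only the `"books"` partition is touched; `"games"` keeps its cached value without being re-read.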
194
+
195
+ ---
196
+
197
+ ## Features
198
+
199
+ - **Refresh on write** — dependency changes schedule background refresh; reads never block on rebuild
200
+ - **Transparent ActiveRecord API** — `where`, `find`, `count`, scopes, associations on cache tables
201
+ - **Relation-based sources** — `materialized_from` blocks return `ActiveRecord::Relation` (no raw SQL strings)
202
+ - **Portable aggregations** — `QueryExpressions` helpers build Arel for `SUM`, `COUNT`, `AVG`, etc.
203
+ - **Incremental maintenance by default** — summary-delta IVM for distributive `GROUP BY` views (signed deltas, no base re-scan) with partition-local re-aggregation as the always-correct fallback; no cache-table rebuild on routine refresh
204
+ - **Atomic table swap on bootstrap only** — initial full materialization + rename when the cache is first built or on `refresh_mode :full`
205
+ - **Debounced async refresh** — coalesce rapid writes (PostgreSQL NOTIFY + worker pattern)
206
+ - **ActiveJob integration** — offload refresh to Sidekiq, GoodJob, Solid Queue, etc.
207
+ - **Dependency tracking** — `depends_on` models; ActiveRecord commit callbacks detect writes
208
+ - **Metadata table** — `last_refreshed_at`, `dirty`, `row_count`, `refresh_duration_ms`, errors
209
+ - **Staleness safety net** — optional `max_staleness` + rake tasks for cron-driven refresh
210
+ - **Rails generators** — `activerecord_materialized:install`, `:view`, and `:migration` (cache-table migration inferred from the source relation)
211
+ - **Rake tasks** — `materialized:refresh_all`, `:refresh_stale`, `:rebuild`, `:verify`, `:warm_up`
212
+ - **Benchmark suite** — JOB-schema SQLite database with multi-second analytical queries
213
+
214
+ ---
215
+
216
+ ## Gotchas and trade-offs
217
+
218
+ | Gotcha | Detail |
219
+ |--------|--------|
220
+ | **Eventual consistency** | Between a write and background refresh completing, reads return the previous snapshot. Same trade-off as `REFRESH MATERIALIZED VIEW CONCURRENTLY` in PostgreSQL. |
221
+ | **`depends_on` is required** | The gem cannot infer dependencies from a relation. Declare every model (or table) whose writes should trigger refresh. Prefer model classes (`depends_on LineItem`) so commit callbacks are wired automatically. |
222
+ | **Maintenance scope** | Partition keys are taken from ActiveRecord change payloads when possible (`create`/`update`/`destroy` with equality on `GROUP BY` columns). Unbounded writes widen to all partitions (in-place, still no DDL). |
223
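+ | **Non-aggregate views** | Views without `GROUP BY` have no partition keys to maintain incrementally, so they fall back to a full refresh via atomic swap (`refresh_mode :full` or an explicit `rebuild!(confirm: true)`). Incremental maintenance of non-aggregate join views (in the style of Larson & Zhou) is not automatic yet. |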
224
+ | **Full refresh escape hatch** | `rebuild!(confirm: true)` (or `refresh_mode :full`) rebuilds via atomic swap — use for recovery or non-maintainable views. `refresh!` is always incremental and never rebuilds. |
225
+ | **Table-name-only `depends_on`** | Symbol/string table names work, but refresh-on-write requires a resolvable ActiveRecord model for that table. Raw SQL writes bypass callbacks and will not trigger refresh. |
226
+ | **SQLite vs MySQL in dev** | The benchmark uses SQLite. Production behavior is adapter-agnostic, but test the atomic swap on your target database. |
227
+ | **In-process async default** | Default `refresh_dispatcher: :async` uses a background thread. **Use ActiveJob in production** so refresh work runs on job workers, not Puma threads. |
228
+ | **No automatic indexes** | Cache tables are shaped from the query results, and no indexes are added for you. Add indexes (for example, in the generated cache-table migration) on the cache columns you filter or sort on. |
229
+ | **Storage** | Cache tables duplicate data. Plan disk usage accordingly. |
230
+ | **Nested transactions** | Refresh is scheduled on the transaction where the write occurred; rollback clears pending refreshes for that transaction. |
231
+ | **Bulk writes** | Every committed row written to a `depends_on` model runs the maintenance bookkeeping once. For bulk loads, use `:async` (with a non-zero debounce, the default) or `:manual`, not `refresh_debounce 0` or `:immediate`. Pending scope that spans more than `max_tracked_partitions` distinct partitions collapses to one full recompute. `insert_all`/`upsert_all` (like `update_all`/`delete_all`) **bypass** `after_commit`, so the view is not notified; call `refresh!` (or `mark_dependencies_changed!`) yourself after a callback-skipping bulk load. |
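The `max_tracked_partitions` collapse mentioned under **Bulk writes** can be pictured as a capped set. This is a minimal sketch with a hypothetical `PendingScope` class, not the gem's `MaintenanceStore`: once the pending keys exceed the cap, the scope turns into `:all` (recompute every partition) and stops tracking individual keys, which keeps the bookkeeping bounded during a large import.

```ruby
require "set"

# Hypothetical pending-maintenance scope capped at max_tracked_partitions.
class PendingScope
  attr_reader :max_tracked_partitions

  def initialize(max_tracked_partitions:)
    @max_tracked_partitions = max_tracked_partitions
    @keys = Set.new
    @all = false
  end

  # Merge affected keys; collapse to :all once the cap is exceeded.
  def add(keys)
    return self if @all

    @keys.merge(keys)
    if @keys.size > max_tracked_partitions
      @all = true
      @keys.clear # individual keys no longer matter
    end
    self
  end

  # Either :all or a copy of the tracked partition keys.
  def to_scope
    @all ? :all : @keys.dup
  end
end

scope = PendingScope.new(max_tracked_partitions: 3)
scope.add(%w[books games])
p scope.to_scope.size # => 2
scope.add(%w[toys music]) # 4 distinct keys > 3
p scope.to_scope      # => :all
```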
232
+
233
+ ---
234
+
235
+ ## Installation
236
+
237
+ Add to your Gemfile:
238
+
239
+ ```ruby
240
+ gem "activerecord-materialized"
241
+ ```
242
+
243
+ Install the metadata migration:
244
+
245
+ ```bash
246
+ bin/rails generate activerecord_materialized:install
247
+ bin/rails db:migrate
248
+ ```
249
+
250
+ ---
251
+
252
+ ## Getting started tutorial
253
+
254
+ The **[Getting started tutorial](docs/getting-started.md)** is the recommended first read: a hands-on walkthrough that goes from `bundle install` to a view that refreshes itself on write. It covers defining a view, reading through before it's built, building it, querying it, and watching background maintenance update it. Every example in it is executed by the test suite (`spec/docs/getting_started_tutorial_spec.rb`), so its code and the numbers it shows are checked whenever the suite runs.
255
+
256
+ The condensed reference version follows below.
257
+
258
+ ---
259
+
260
+ ## Quick start
261
+
262
+ Generate a view model:
263
+
264
+ ```bash
265
+ bin/rails generate activerecord_materialized:view SalesSummary
266
+ ```
267
+
268
+ Define the view:
269
+
270
+ ```ruby
271
+ class SalesSummary < ActiveRecord::Materialized::View
272
+ extend ActiveRecord::Materialized::QueryExpressions
273
+
274
+ self.table_name = "mv_sales_summary"
275
+
276
+ materialized_from do
277
+ line_items = LineItem.arel_table
278
+ orders = Order.arel_table
279
+ products = Product.arel_table
280
+
281
+ LineItem
282
+ .joins(:order, :product)
283
+ .group(products[:category])
284
+ .select(
285
+ products[:category],
286
+ sum_as(line_items[:amount], as: :revenue),
287
+ count_distinct_as(orders[:id], as: :order_count)
288
+ )
289
+ end
290
+
291
+ depends_on LineItem, Order, Product
292
+ refresh_on_change :async
293
+ refresh_debounce 30.seconds
294
+ max_staleness 12.hours
295
+
296
+ before_refresh { Rails.logger.info("Refreshing #{name}") }
297
+ end
298
+ ```
299
+
300
+ Sources must be `ActiveRecord::Relation` objects built with standard query APIs and Arel — not raw SQL strings. Extract complex relations to a module or class method when a view definition grows large (see `spec/support/view_sources.rb` and `benchmark/support/source_relations.rb` in this repo).
301
+
302
+ Provision the (empty) cache table with a migration generated from the relation, so it exists at deploy time:
303
+
304
+ ```bash
305
+ bin/rails generate activerecord_materialized:migration SalesSummary
306
+ bin/rails db:migrate
307
+ ```
308
+
309
+ Build the view once (e.g. in a deploy task) — the only full-scan path, never implicit:
310
+
311
+ ```ruby
312
+ SalesSummary.rebuild!(confirm: true)
313
+ ```
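The README describes `rebuild!` as an in-database `INSERT … SELECT` followed by an atomic table swap. The sketch below only builds the SQL strings such a rebuild might issue; the `_new`/`_old` shadow-table naming, the statement order, and the `rebuild_statements` helper are assumptions, not the gem's internals. On MySQL a single multi-table `RENAME TABLE` swaps both names atomically; on SQLite, whose DDL is transactional, the two renames are wrapped in one transaction.

```ruby
# Hypothetical: returns (does not execute) the statements for a shadow-table
# rebuild plus swap. `select_sql` stands in for the source relation's SQL.
def rebuild_statements(table, select_sql, adapter:)
  shadow  = "#{table}_new"
  retired = "#{table}_old"

  case adapter
  when :mysql
    [
      "DROP TABLE IF EXISTS #{shadow}",
      "CREATE TABLE #{shadow} LIKE #{table}",   # copies columns and indexes
      "INSERT INTO #{shadow} #{select_sql}",     # rows are copied inside the database
      "RENAME TABLE #{table} TO #{retired}, #{shadow} TO #{table}", # one atomic statement
      "DROP TABLE #{retired}"
    ]
  when :sqlite
    [
      "DROP TABLE IF EXISTS #{shadow}",
      "CREATE TABLE #{shadow} AS SELECT * FROM #{table} WHERE 0", # columns only, no indexes
      "INSERT INTO #{shadow} #{select_sql}",
      "BEGIN",                                   # both renames commit together
      "ALTER TABLE #{table} RENAME TO #{retired}",
      "ALTER TABLE #{shadow} RENAME TO #{table}",
      "COMMIT",
      "DROP TABLE #{retired}"
    ]
  else
    raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
  end
end

rebuild_statements(
  "mv_sales_summary",
  "SELECT category, SUM(amount) AS revenue FROM line_items GROUP BY category",
  adapter: :mysql
).each { |sql| puts sql }
```

Because the result set is produced by `INSERT … SELECT`, no rows pass through Ruby; readers keep seeing the old table until the rename completes.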
314
+
315
+ Then query like any ActiveRecord model:
316
+
317
+ ```ruby
318
+ # Served from the mv_sales_summary cache table — never triggers a rebuild.
319
+ # (Before the view is built, this reads through to the source query instead.)
320
+ SalesSummary.where("revenue > ?", 10_000).order(revenue: :desc)
321
+ ```
322
+
323
+ Refresh strategies:
324
+
325
+ | Strategy | Behavior |
326
+ |----------|----------|
327
+ | `:async` (default) | After commit, debounced, via background thread or ActiveJob |
328
+ | `:immediate` | Synchronous refresh on each write (blocks writers) |
329
+ | `:manual` | Mark dirty only; call `refresh!` or rake tasks explicitly |
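The debounce behind `:async` (for example `refresh_debounce 30.seconds` in the view above) can be sketched as a trailing-edge timer. This assumes a trailing-edge debounce, which the README does not spell out, and `Debouncer` is a hypothetical class; time is passed in explicitly so the behavior is deterministic.

```ruby
# Hypothetical trailing-edge debouncer: each write pushes the run time back by
# `window`; one pass runs only after a full window with no writes.
class Debouncer
  attr_reader :runs

  def initialize(window)
    @window = window
    @due_at = nil
    @runs = 0
  end

  def write!(now)
    @due_at = now + @window
  end

  # Called periodically (e.g. by a worker loop); runs at most one pass.
  def tick!(now)
    return false if @due_at.nil? || now < @due_at

    @due_at = nil
    @runs += 1
    true
  end
end

debouncer = Debouncer.new(30)
[0, 5, 12, 20].each { |t| debouncer.write!(t) } # four writes within 20 seconds
debouncer.tick!(40) # 40 < 20 + 30: still waiting
debouncer.tick!(50) # quiet for a full window: one pass
p debouncer.runs    # => 1
```

One trade-off of a pure trailing-edge window: writes that keep arriving closer together than the window postpone the pass indefinitely, so a maximum-wait cap is a common addition.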
330
+
331
+ ### Incremental maintenance (default)
332
+
333
+ For `GROUP BY` aggregate views, no extra configuration is required. The gem:
334
+
335
+ 1. Inspects the `materialized_from` relation to derive maintenance partition keys (`GROUP BY` columns).
336
+ 2. Accumulates affected partition keys from dependency writes (via ActiveRecord commit callbacks).
337
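+ 3. On refresh, updates only those partitions in the existing cache table: distributive views (`SUM` / `COUNT` / `COUNT(*)`) apply signed summary deltas, and all other views delete and re-aggregate the affected partitions.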
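For the distributive case, the summary-delta idea from the changelog (signed per-partition deltas, NULL-safe sums, empty-partition deletion) can be sketched in memory. The cache layout with hidden `rows` and `non_null` counters, and all helper names, are assumptions for illustration, not the gem's cache-table schema.

```ruby
# Hypothetical summary-delta maintenance for a view shaped like
#   SELECT category, COUNT(*), SUM(amount), COUNT(amount) FROM line_items GROUP BY category
Delta = Struct.new(:rows, :sum, :non_null)

# An insert contributes sign = +1, a delete sign = -1; an update would be a
# delete of the old row plus an insert of the new one.
def delta_for(row, sign)
  amount = row[:amount]
  Delta.new(sign, amount.nil? ? 0 : sign * amount, amount.nil? ? 0 : sign)
end

# Combine all signed deltas per partition before touching the cache.
def accumulate(changes, key:)
  changes.each_with_object({}) do |(row, sign), acc|
    d = delta_for(row, sign)
    prev = acc[row[key]]
    acc[row[key]] = prev ? Delta.new(prev.rows + d.rows, prev.sum + d.sum, prev.non_null + d.non_null) : d
  end
end

# Apply the deltas in place, without re-reading any base rows.
def apply_deltas!(cache, deltas)
  deltas.each do |partition, d|
    cur = cache.fetch(partition, { rows: 0, sum: 0, non_null: 0 })
    rows = cur[:rows] + d.rows
    if rows.zero?
      cache.delete(partition) # the partition lost its last row
    else
      cache[partition] = { rows: rows, sum: cur[:sum] + d.sum, non_null: cur[:non_null] + d.non_null }
    end
  end
  cache
end

# NULL-safe read: SQL's SUM over only NULLs is NULL, not 0.
def revenue(entry) = entry[:non_null].zero? ? nil : entry[:sum]

cache = {
  "books" => { rows: 2, sum: 40, non_null: 2 },
  "games" => { rows: 1, sum: 0, non_null: 0 } # one row whose amount is NULL
}
changes = [
  [{ category: "books", amount: 5 }, +1],   # insert
  [{ category: "games", amount: nil }, -1], # delete the only games row
  [{ category: "toys", amount: nil }, +1]   # insert into a new partition
]
apply_deltas!(cache, accumulate(changes, key: :category))
p revenue(cache["books"]) # => 45
p cache.key?("games")     # => false
p revenue(cache["toys"])  # => nil
```

Tracking a row count per partition is what makes empty-partition deletion possible, and the non-NULL count is what lets the sum read back as `nil` rather than `0`.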
338
+
Optional overrides when you need explicit control:

```ruby
class SalesSummary < ActiveRecord::Materialized::View
  incremental_keys :category # override inferred GROUP BY keys
  refresh_mode :full         # opt out of incremental maintenance
  # incremental_from { ... } # optional: override auto-scoped maintenance relation
end
```

---

## Configuration

```ruby
# config/initializers/activerecord_materialized.rb
ActiveRecord::Materialized.configure do |config|
  config.default_refresh_strategy = :async
  config.default_refresh_debounce = 30.seconds
  config.refresh_dispatcher = :active_job           # or :async for an in-process thread
  config.refresh_queue_name = :materialized_views
  config.default_max_staleness = 12.hours
  config.default_cold_read_strategy = :read_through # or :serve_stale / :raise
  config.atomic_swap_refresh = true
  config.max_tracked_partitions = 1_000             # collapse to a full recompute past this
  config.metadata_table_name = "ar_materialized_view_metadata"
end
```

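`max_tracked_partitions` caps how many dirty partition keys are accumulated before maintenance collapses to a full recompute. Below is a minimal sketch of that bookkeeping. It assumes a set of keys plus an overflow flag. `PartitionTracker` and its `plan` return values are hypothetical, not the gem's internals.

```ruby
require "set"

# Hypothetical dirty-partition tracker with a cap, mirroring the documented
# behavior of `max_tracked_partitions`: past the cap, stop tracking
# individual keys and plan a full recompute instead.
class PartitionTracker
  def initialize(max_tracked)
    @max_tracked = max_tracked
    @keys = Set.new
    @full = false
  end

  def mark(key)
    return if @full

    @keys << key
    return unless @keys.size > @max_tracked

    @full = true # too many partitions: collapse to a full recompute
    @keys.clear  # individual keys are no longer needed
  end

  # What the next maintenance pass should do.
  def plan
    return [:full] if @full
    return [:noop] if @keys.empty?

    [:partitions, @keys.to_a]
  end
end

tracker = PartitionTracker.new(3)
%w[a b a c].each { |k| tracker.mark(k) }
p tracker.plan # => [:partitions, ["a", "b", "c"]]
tracker.mark("d")
p tracker.plan # => [:full]
```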
---

## API reference

### Class methods

| Method | Description |
|--------|-------------|
| `rebuild!(confirm: true)` | **Explicit** full materialization via in-database `INSERT … SELECT` (the only full-scan path; never fires implicitly, never buffers rows in Ruby) |
| `warm_up!` | Materialize the configured `warm_up` partitions ahead of traffic |
| `refresh!` | Incremental maintenance only (no-op on an unbuilt view); never rebuilds |
| `refresh_if_stale!` | Incremental maintenance, run only when the view is materialized and stale |
| `materialized?` | Whether the view has been built (warm), so reads are served from the cache |
| `dirty?` | Whether a dependency change is pending maintenance |
| `stale?` | Whether the view is dirty or older than `max_staleness` |
| `last_refreshed_at` | Timestamp of the last successful refresh |
| `refreshing?` | Whether a refresh is in progress |
| `resolved_source` | The current `ActiveRecord::Relation` used for refresh |

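The documented meaning of `stale?` (dirty, or older than `max_staleness`) boils down to a small predicate. This plain-Ruby sketch passes the clock in explicitly so that it is deterministic. `stale_view?` and its keyword arguments are hypothetical names, not the gem's signature. Treating a never-refreshed view as stale is also an assumption of the sketch.

```ruby
# Hypothetical predicate mirroring the documented meaning of `stale?`:
# stale if a dependency change is pending, or if the last refresh is
# older than max_staleness (seconds here; nil means no time bound).
def stale_view?(dirty:, last_refreshed_at:, max_staleness:, now:)
  return true if dirty
  return false if max_staleness.nil?    # no time-based bound configured
  return true if last_refreshed_at.nil? # assumption: never refreshed counts as stale

  now - last_refreshed_at > max_staleness # Time - Time gives seconds as a Float
end

twelve_hours = 12 * 60 * 60
now = Time.utc(2024, 1, 1, 12, 0, 0)

p stale_view?(dirty: false, last_refreshed_at: now - 3_600, max_staleness: twelve_hours, now: now)      # => false
p stale_view?(dirty: false, last_refreshed_at: now - 13 * 3_600, max_staleness: twelve_hours, now: now) # => true
p stale_view?(dirty: true, last_refreshed_at: now, max_staleness: twelve_hours, now: now)               # => true
```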
### DSL

| Macro | Description |
|-------|-------------|
| `materialized_from { relation }` | Block returning the source `ActiveRecord::Relation` |
| `depends_on(*models_or_tables)` | Register dependencies; writes to them trigger a refresh |
| `refresh_on_change(strategy)` | `:async`, `:immediate`, or `:manual` |
| `refresh_debounce(duration)` | Coalesce rapid writes before refreshing |
| `refresh_mode(mode)` | `:incremental` (default) or `:full` |
| `cold_read(strategy)` | Read behavior before the view is built: `:read_through` (default), `:serve_stale`, or `:raise` |
| `warm_up { [relations] }` | Representative queries whose partitions `warm_up!` materializes ahead of traffic |
| `incremental_from { relation }` | Optional override for the scoped maintenance relation |
| `incremental_keys(*columns)` | Optional override for the inferred `GROUP BY` keys |
| `max_staleness(duration)` | Optional time-based safety refresh, triggered via rake/cron |
| `before_refresh` / `after_refresh` | Refresh lifecycle callbacks |

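The three `cold_read` strategies can be pictured as a dispatch that only matters until `rebuild!` has run. The sketch below is hypothetical. `read`, `ViewNotMaterializedError`, and the lambdas standing in for the source relation and the cache table are illustrative, not the gem's API. It also assumes that `:serve_stale` returns whatever the provisioned cache table currently holds, which is nothing before the first build.

```ruby
# Hypothetical error raised by the :raise strategy in this sketch.
class ViewNotMaterializedError < StandardError; end

# Hypothetical dispatch over the three documented cold-read strategies.
# `source` and `cache` are callables standing in for the source relation
# and the cache table.
def read(materialized:, strategy:, source:, cache:)
  return cache.call if materialized # warm view: always read the cache

  case strategy
  when :read_through then source.call # run the (slow) source query
  when :serve_stale  then cache.call  # whatever the cache table holds now
  when :raise        then raise ViewNotMaterializedError, "view has not been built"
  else raise ArgumentError, "unknown cold_read strategy: #{strategy.inspect}"
  end
end

source = -> { [{ category: "books", revenue: 15 }] }
cache  = -> { [] } # provisioned by the migration but not yet built

p read(materialized: false, strategy: :read_through, source: source, cache: cache)
p read(materialized: false, strategy: :serve_stale, source: source, cache: cache) # => []
```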
### QueryExpressions

Include or extend `ActiveRecord::Materialized::QueryExpressions` when defining aggregations:

| Helper | SQL equivalent |
|--------|----------------|
| `sum_as(attr, as: :name)` | `SUM(...)` |
| `avg_as(attr, as: :name)` | `AVG(...)` |
| `count_as(attr, as: :name)` | `COUNT(...)` |
| `count_distinct_as(attr, as: :name)` | `COUNT(DISTINCT ...)` |
| `count_all_as(as: :name)` | `COUNT(*)` |
| `min_as` / `max_as` | `MIN` / `MAX` |

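Each helper maps to a standard SQL aggregate, so the usual NULL rules apply. `COUNT(attr)`, `COUNT(DISTINCT attr)`, `SUM`, `AVG`, `MIN`, and `MAX` skip NULLs, and `SUM`/`AVG`/`MIN`/`MAX` return NULL when no non-NULL values remain. `COUNT(*)` counts every row. The plain-Ruby sketch below reproduces those results for one `GROUP BY` partition, with `nil` standing in for NULL. It illustrates SQL semantics only and does not call the helpers themselves.

```ruby
# One partition's values for an attribute; nil plays the role of SQL NULL.
values = [10, 20, nil, 20]
non_null = values.compact # every aggregate except COUNT(*) skips NULLs

aggregates = {
  sum: non_null.empty? ? nil : non_null.sum,                     # SUM(attr)
  avg: non_null.empty? ? nil : non_null.sum.fdiv(non_null.size), # AVG(attr)
  count: non_null.size,                                          # COUNT(attr)
  count_distinct: non_null.uniq.size,                            # COUNT(DISTINCT attr)
  count_all: values.size,                                        # COUNT(*)
  min: non_null.min,                                             # MIN(attr), nil if no values
  max: non_null.max                                              # MAX(attr), nil if no values
}

p aggregates
```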
### Rake tasks

```bash
bin/rails materialized:refresh_all    # incremental maintenance pass
bin/rails materialized:refresh_stale  # incremental maintenance for stale views only
bin/rails materialized:rebuild        # intentional full materialization (in-DB INSERT … SELECT)
bin/rails materialized:verify         # raise on cache-table schema drift
bin/rails materialized:warm_up        # materialize configured warm_up partitions
```

---

## Benchmark results

The included benchmark uses a [Join Order Benchmark](https://github.com/gregrahn/join-order-benchmark)-style schema on SQLite. On the **xlarge** dataset (~2M `cast_info` rows):

| Query | Source relation | MV read | Speedup |
|-------|-----------------|---------|---------|
| `gender_pairing_stats` | ~7.4s | ~0.3ms | ~21,000× |
| `company_movie_cross` | ~7.4s | ~0.4ms | ~20,000× |
| `person_movie_network` | ~13.3s | ~0.7ms | ~20,000× |
| `cast_coappearance` | ~19.7s | ~0.4ms | ~49,000× |

Run locally:

```bash
bundle install
JOB_SCALE=xlarge bundle exec rake benchmark:setup  # takes a few minutes
bundle exec rake benchmark:slow
bundle exec rake benchmark:verify_updates          # refresh-on-write proof
```

See [benchmark/DATA.md](benchmark/DATA.md) for dataset scales and setup details.

---

## When to use (and when not to)

**Good fit:**

- Expensive, read-mostly reporting queries on MySQL/MariaDB/SQLite
- Dashboards and admin pages where sub-second reads matter
- Infrequent or batched writes to the underlying tables
- Acceptable eventual consistency between a write and the background refresh

**Poor fit:**

- Real-time, strongly consistent reads (use live queries or replicas)
- Very frequent writes, where full refresh cost exceeds query cost
- Tiny queries where materialization overhead isn't worth it
- Views whose underlying tables you cannot fully enumerate in `depends_on`

---

## Comparison with native materialized views

| Capability | PostgreSQL native | activerecord-materialized |
|------------|-------------------|---------------------------|
| Precomputed snapshot | ✅ | ✅ |
| Transparent reads | ✅ (queried directly by name) | ✅ (ActiveRecord model) |
| Refresh on dependency change | Manual / trigger / pg_cron | ✅ automatic via `depends_on` |
| Background refresh | `REFRESH ... CONCURRENTLY` (requires a unique index) | ✅ async / ActiveJob |
| Incremental refresh | Limited (IVM extensions) | ✅ partition-local IVM by default for `GROUP BY` views |
| Atomic swap during refresh | ✅ CONCURRENTLY | ✅ table rename |
| Database portability | PostgreSQL only | ✅ any ActiveRecord adapter |

---

## Versioning

This gem follows [Semantic Versioning](https://semver.org/). Given `MAJOR.MINOR.PATCH`:

- **MAJOR**: incompatible public-API changes (DSL macros, configuration keys, the `View` query surface).
- **MINOR**: backward-compatible features.
- **PATCH**: backward-compatible bug fixes.

Until `1.0.0`, the API may still change between minor releases; pin a version if you depend on it. Every change is recorded in [CHANGELOG.md](CHANGELOG.md).

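For example, a Gemfile entry such as `gem "activerecord-materialized", "~> 0.1.0"` uses RubyGems' pessimistic operator. It accepts 0.1.x patch releases and excludes 0.2. A looser `"~> 0.1"` would also accept 0.2 and any later 0.x minor release. The snippet below checks both constraints with `Gem::Requirement` from RubyGems, which ships with Ruby. The version numbers are illustrative, not actual releases of this gem.

```ruby
require "rubygems" # already loaded by default; explicit for clarity

pinned = Gem::Requirement.new("~> 0.1.0") # >= 0.1.0, < 0.2
loose  = Gem::Requirement.new("~> 0.1")   # >= 0.1, < 1.0

%w[0.1.0 0.1.7 0.2.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: pinned=#{pinned.satisfied_by?(version)} loose=#{loose.satisfied_by?(version)}"
end
```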
---

## Development

```bash
git clone https://github.com/mavrukin/activerecord-materialized.git
cd activerecord-materialized
bin/setup # bundle install + git hooks + Sorbet RBIs
bin/ci    # RuboCop, Sorbet, and the full test suite
bundle exec rake benchmark:setup
bundle exec rake benchmark
```

**API documentation** is published at [rubydoc.info/gems/activerecord-materialized](https://rubydoc.info/gems/activerecord-materialized) (generated from YARD doc comments, with types pulled from the Sorbet signatures via `yard-sorbet`). Build it locally with:

```bash
bundle exec yard doc    # generates HTML into doc/
bundle exec yard server # browse at http://localhost:8808
```

Maintainers: see [RELEASING.md](RELEASING.md) for the gem publishing process.

---

## Contributing

Bug reports and pull requests are welcome at [github.com/mavrukin/activerecord-materialized](https://github.com/mavrukin/activerecord-materialized).

---

## License

MIT © [Michael Avrukin](https://github.com/mavrukin)