pipeloader 0.0.1 → 0.0.2

data/README.md CHANGED
@@ -1,40 +1,119 @@
1
1
  # pipeloader
2
2
 
3
- Transparent libpq pipelining for **graphql-ruby on ActiveRecord**. During GraphQL
4
- response building, every ActiveRecord `SELECT` is routed through a libpq pipeline,
5
- so a nested query resolves in roughly **one round trip per tree level** — with
6
- **plain resolvers and plain models**. No Futures, no `dataloader.load`, no field
7
- changes.
3
+ Cut ActiveRecord N+1 on both axes, round trips and query count, with plain models and no
4
+ `dataloader.load` keys. The pieces compose:
5
+
6
+ - `use Pipeloader` routes every `SELECT` in a [graphql-ruby](#adopting-it) response through
7
+ a libpq pipeline, so a nested query resolves in roughly one round trip per tree level.
8
+ Plain resolvers, no Futures, no field changes.
9
+ - `auto_fuse` collapses each level's per-record lookups into one `WHERE key = ANY($1)`,
10
+ dropping query count to DataLoader's.
11
+ - [`Pipeloader::Batch`](#batch-loaders-for-plain-activerecord) brings that batching to plain
12
+ ActiveRecord (`batch_has_many`, `batch_belongs_to`) for jobs, serializers, and other
13
+ non-GraphQL paths.
14
+
15
+ Adopt them together: a fused or batch-loaded query is itself pipelined, so you get
16
+ DataLoader-class query counts and one round trip per level at once.
8
17
 
9
18
  ## Adopting it
10
19
 
11
- One line:
20
+ These compose; adopt the ones your app needs. A query gathered by fusion or a batch loader
21
+ is itself pipelined, so you get few queries and few round trips together.
22
+
23
+ ### Pipelining
24
+
25
+ One line. Types and resolvers stay ordinary ActiveRecord:
12
26
 
13
27
  ```ruby
14
28
  class AppSchema < GraphQL::Schema
15
29
  use Pipeloader
16
30
  end
31
+
32
+ class Types::Post < GraphQL::Schema::Object
33
+ field :title, String, null: false
34
+ field :author, Types::Author, null: false # resolves via post.author
35
+ field :comments, [Types::Comment], null: false # resolves via post.comments
36
+ end
17
37
  ```
18
38
 
19
- That's the whole adoption surface. Your types and resolvers stay exactly as they
20
- are ordinary ActiveRecord:
39
+ Any AR SELECT issued while building the response (a `belongs_to`, a `has_many`, a
40
+ `.where(...)` in a hand-written resolver) is intercepted and pipelined. It hooks AR's
41
+ query path rather than the GraphQL field, so nothing leaks back to synchronous N+1, even
42
+ from custom resolver code. By default the pipeline fetches whole rows.
43
+
44
+ ### Field-exact projection (opt-in)
45
+
46
+ Set `field_exact` and each SELECT narrows to the columns the query selected, using
47
+ graphql-ruby's `lookahead`:
21
48
 
22
49
  ```ruby
50
+ Pipeloader.field_exact = true # globally, before your types load, or
51
+
23
52
  class Types::Post < GraphQL::Schema::Object
53
+ pipeloader_field_exact! # per type
24
54
  field :title, String, null: false
25
- field :author, Types::Author, null: false # resolves via post.author
26
- field :comments, [Types::Comment], null: false # resolves via post.comments
55
+ field :author, Types::Author, null: false
56
+ end
57
+ ```
58
+
59
+ For `{ posts { title author { name } } }` the posts SELECT becomes
60
+ `SELECT id, title, author_id FROM ...` (primary key, selected column, and the FK needed
61
+ for `author`), and the authors SELECT becomes `SELECT id, name FROM ...`.
62
+
63
+ Projection narrows only when it can prove every selected field reads a known column or
64
+ association. If a selection is opaque (a computed field, a custom resolver, anything it
65
+ can't map to a column) it falls back to a whole-row fetch for that record, so a projected
66
+ field never raises `MissingAttributeError`. A computed field can declare the columns it
67
+ reads with `selects:`, so projection keeps them:
68
+
69
+ ```ruby
70
+ field :excerpt, String, null: false, selects: %i[body]
71
+ def excerpt = object.body[0, 200]
72
+ ```
73
+
74
+ With no opt-in, `selects:` is accepted and ignored and every SELECT is whole-row.
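The narrowing rule above can be sketched in plain Ruby, with no ActiveRecord or graphql-ruby involved. This is only an illustration under stated assumptions: `POST_FIELDS`, `project_columns`, and the metadata keys are hypothetical names, not pipeloader's internals. Each selected field maps to a column, to an association that needs its foreign key, or to declared `selects:` columns; anything else is opaque and the sketch returns `nil`, meaning "fetch the whole row".

```ruby
# Hypothetical field metadata for a Post type (not pipeloader's real data model).
POST_FIELDS = {
  title:   { column: :title },
  author:  { belongs_to_fk: :author_id }, # association: the FK must be kept
  excerpt: { selects: %i[body] },         # computed, but declares its columns
  score:   {}                             # computed and opaque
}.freeze

# Returns the columns to SELECT, or nil when the selection cannot be proven
# safe and the caller should fall back to a whole-row fetch.
def project_columns(selected_fields, field_map, primary_key: :id)
  columns = [primary_key]
  selected_fields.each do |name|
    meta = field_map[name]
    return nil if meta.nil? || meta.empty? # opaque field: whole row

    columns << meta[:column] if meta[:column]
    columns << meta[:belongs_to_fk] if meta[:belongs_to_fk]
    columns.concat(meta[:selects]) if meta[:selects]
  end
  columns.uniq
end

p project_columns(%i[title author], POST_FIELDS) # [:id, :title, :author_id]
p project_columns(%i[title score], POST_FIELDS)  # nil
```

The first call mirrors the `{ posts { title author { name } } }` example: primary key, selected column, and the FK for `author`.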
75
+
76
+ ### Auto-fuse (opt-in)
77
+
78
+ Field-exact also fuses: the per-record `belongs_to` / `has_one` / `has_many` lookups on a
79
+ level collapse into one `WHERE key = ANY($1)` (DataLoader-class server cost, still
80
+ pipelined, so round trips stay at the tree depth). To get that fusion whole-row, with no
81
+ projection and no resolver code, set `auto_fuse`:
82
+
83
+ ```ruby
84
+ Pipeloader.auto_fuse = true # before your types load
85
+ ```
86
+
87
+ A plain `object.author` / `object.comments` now fuses automatically. It fuses only when
88
+ the demux is provably unambiguous (a unique primary key for `belongs_to`, a unique FK
89
+ index for `has_one`, a bare unscoped `has_many`). Anything else (scopes, chained
90
+ `order`/`limit`, polymorphic, custom resolvers, SQLite) falls back to the plain pipelined
91
+ load. Results are byte-identical to the un-fused path.
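To see why fusion depends on an unambiguous demux, here is a small plain-Ruby sketch. The row data and the helper names `demux_many` and `demux_one` are made up for illustration and are not pipeloader's code. One keyed batch replaces the per-record lookups on a level, and the returned rows are handed back to each record by key.

```ruby
# Rows as one fused `WHERE post_id = ANY($1)` query might return them
# (made-up data).
comment_rows = [
  { id: 10, post_id: 1, body: "first" },
  { id: 11, post_id: 2, body: "second" },
  { id: 12, post_id: 1, body: "third" }
]
author_rows = [
  { id: 7, name: "Ann" },
  { id: 8, name: "Bob" }
]

# has_many demux: group rows by the FK, keeping row order inside each group.
# Keys with no rows get an empty list, like an empty association.
def demux_many(rows, key_column, wanted_keys)
  grouped = rows.group_by { |row| row[key_column] }
  wanted_keys.to_h { |key| [key, grouped.fetch(key, [])] }
end

# belongs_to demux: the key is unique, so each key maps to at most one row.
def demux_one(rows, key_column, wanted_keys)
  by_key = rows.to_h { |row| [row[key_column], row] }
  wanted_keys.to_h { |key| [key, by_key[key]] }
end

comments = demux_many(comment_rows, :post_id, [1, 2, 3])
authors  = demux_one(author_rows, :id, [7, 8])

p comments[1].map { |row| row[:id] } # [10, 12]
p comments[3]                         # []
p authors[8][:name]                   # "Bob"
```

A scoped or limited association has no such simple key-to-rows mapping, which is consistent with those shapes falling back to the plain pipelined load.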
92
+
93
+ ### Batch loaders
94
+
95
+ The same gathering for plain ActiveRecord, for the jobs, serializers, and non-GraphQL
96
+ endpoints the resolvers don't cover. Include the concern and swap `has_many` for
97
+ `batch_has_many` (or `belongs_to` for `batch_belongs_to`):
98
+
99
+ ```ruby
100
+ class Author < ApplicationRecord
101
+ include Pipeloader::Batch::Model
102
+ batch_has_many :books
27
103
  end
104
+
105
+ Author.all.to_a.each { |a| a.books.to_a } # one query for everyone's books
28
106
  ```
29
107
 
30
- `post.author`, `post.comments`, a `has_many`, a `.where(...)` in a hand-written
31
- resolver: any AR SELECT issued while building the response is intercepted and
32
- pipelined. Because it hooks AR's query path (not the GraphQL field), nothing
33
- leaks back to synchronous N+1, even from custom resolver code.
108
+ `a.books` loads for every author loaded alongside it on first access, as one `IN` query via
109
+ AR's `Preloader`, with no setup: the sibling group is stamped onto the records as they load.
110
+ Inside a `use Pipeloader` response those batched queries are pipelined too. Full surface (the
111
+ chainable proxy, counts and aggregates, the general `batch` macro) in
112
+ [Batch loaders for plain ActiveRecord](#batch-loaders-for-plain-activerecord) below.
34
113
 
35
114
  ## What it does
36
115
 
37
- `example/run.rb`, plain resolvers, against a seeded database:
116
+ `example/run.rb`, plain resolvers against a seeded database:
38
117
 
39
118
  ```
40
119
  { posts(limit: 50) { title author { name } comments { body commenter { name } } } }
@@ -45,149 +124,140 @@ queries pipelined: 403
45
124
  naive N+1 would be: ~594 round trips
46
125
  ```
47
126
 
48
- Three round trips: `posts` → (`authors` + `comments`) → `commenters`. The to-one
49
- `author` and the to-many `comments` are *different shapes at the same level*, yet
50
- collapse into a single round trip.
127
+ Three round trips: `posts`, then `authors` and `comments`, then `commenters`. The to-one
128
+ `author` and the to-many `comments` are different shapes at the same level but collapse
129
+ into one round trip.
130
+
131
+ ## How it works
132
+
133
+ 1. `use GraphQL::Dataloader` runs resolution in fibers, so a synchronous-looking
134
+ `post.author` can yield instead of blocking and sibling queries gather before anything
135
+ hits the wire.
136
+ 2. A monkey-patch on `select_all` hands each SELECT to a Dataloader source instead of
137
+ running it. The active dataloader is stashed on the connection for the multiplex (and
138
+ cleared after), so the patch finds it as `self`.
139
+ 3. When the fibers park, the source prepares each distinct query shape (once per request,
140
+ reused across bursts), Bind/Executes every gathered query in one libpq burst
141
+ (`enter_pipeline_mode` to `pipeline_sync`), and returns an `ActiveRecord::Result` per
142
+ query so AR builds models normally.
143
+
144
+ Prepared statements are scoped to the request. The next request's first burst
145
+ `DEALLOCATE`s the previous one's, piggybacked into the same pipeline so cleanup costs no
146
+ extra round trip. With plans this short-lived, none goes stale across a reconnect or migration. If a query
147
+ errors, the burst is drained to its sync point, the connection is restored, and the error
148
+ is raised rather than swallowed.
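The gathering in steps 1 to 3 can be illustrated without a database, using only Ruby's core `Fiber`. This is a sketch of the idea, not pipeloader's implementation: `Query`, `gathered_select`, `resolve_in_bursts`, and the `FAKE_DB` lookup strings are hypothetical stand-ins, and a "burst" here is one call to a lambda rather than a libpq pipeline.

```ruby
# A parked query: resolver code yields one of these instead of running SQL.
Query = Struct.new(:sql)

# Looks synchronous to the caller, but parks the current fiber until the
# burst carrying this query has come back.
def gathered_select(sql)
  Fiber.yield(Query.new(sql))
end

# Runs each task in its own fiber. Every loop iteration is one burst: all
# queries gathered from parked fibers go to execute_burst in a single call.
def resolve_in_bursts(tasks, execute_burst)
  fibers = tasks.map { |task| Fiber.new(&task) }
  states = fibers.map(&:resume) # run each fiber up to its first query
  bursts = 0

  while states.any?(Query)
    bursts += 1
    parked  = states.each_index.select { |i| states[i].is_a?(Query) }
    answers = execute_burst.call(parked.map { |i| states[i].sql })
    parked.zip(answers).each { |i, answer| states[i] = fibers[i].resume(answer) }
  end
  [states, bursts]
end

# Stand-in for the database: placeholder strings instead of real SQL.
FAKE_DB = {
  "post 1" => { title: "A", author_id: 7 },
  "post 2" => { title: "B", author_id: 8 },
  "author 7" => { name: "Ann" },
  "author 8" => { name: "Bob" }
}.freeze
execute_burst = ->(sqls) { sqls.map { |sql| FAKE_DB.fetch(sql) } }

# Two "resolvers", each two levels deep, written as ordinary sequential code.
tasks = [1, 2].map do |id|
  lambda do
    post   = gathered_select("post #{id}")
    author = gathered_select("author #{post[:author_id]}")
    "#{post[:title]} by #{author[:name]}"
  end
end

results, bursts = resolve_in_bursts(tasks, execute_burst)
p results # ["A by Ann", "B by Bob"]
p bursts  # 2
```

Four queries resolve in two bursts, one per level, while each task still reads like blocking code.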
51
149
 
52
150
  ## Benchmark
53
151
 
54
- The same 3-level query (`posts → author + comments → commenter`, 25 posts),
55
- resolved four ways, with **real** network latency added by a local TCP proxy in
56
- front of Postgres (`example/latency_proxy.rb` delays the request direction, so a
57
- synchronous query pays RTT once and a pipelined burst pays it once). Min of 3
58
- iterations; your numbers will vary.
59
-
60
- | approach | RTT 0 | RTT 1 ms | RTT 5 ms | round-trips |
61
- |---|--:|--:|--:|--:|
62
- | naive (N+1) | 94 ms | 505 ms | **1972 ms** | 290 |
63
- | AR `includes` (hand-written) | 17 ms | 22 ms | 42 ms | 4 |
64
- | `GraphQL::Dataloader` | 16 ms | 21 ms | 42 ms | 4 |
65
- | **pipeloader** | 41 ms | 45 ms | **73 ms** | **3** |
66
-
67
- Reading it honestly:
68
-
69
- - **vs the N+1 you actually have** — the headline. pipeloader turns 290 round
70
- trips into 3 with zero resolver code, so at a 5 ms hop it's **~24× faster** than
71
- naive. Most "there's an N+1 in here somewhere" code is the naive row.
72
- - **vs batching (`includes` / `GraphQL::Dataloader`)** — at low/moderate RTT,
73
- batching still wins: its 4 `IN` queries do less work than pipeloader's ~400
74
- prepared point queries. pipeloader prepares + caches statements per connection
75
- (so parse cost is amortized to ~one parse per query shape), but it still runs
76
- 400 bind+executes and builds 400 results. **Pipelining cuts round trips;
77
- batching cuts server work.** pipeloader does fewer round trips (3 vs 4 — it
78
- collapses the to-one `author` and to-many `comments` into one burst, where
79
- Dataloader runs them as two sequential sources), so it closes the gap as RTT
80
- rises and passes the batchers around ~25 ms RTT (cross-region). Same
81
- point-vs-batch tradeoff the Go experiments in this repo show.
82
- - **What pipeloader actually buys you: zero code, for any query shape.**
83
- `GraphQL::Dataloader` needs a source plus a `.load` call per association;
84
- `includes` must be hand-written per query and kept in sync with the selection.
85
- pipeloader is `use Pipeloader` and ordinary resolvers.
86
-
87
- Run it: `ruby example/bench.rb` (needs the seeded `graphql_experiment` DB).
88
-
89
- ### Scaling with tree shape
90
-
91
- That benchmark is a *narrow* tree (3 deep, 2 relations at its widest), which is
92
- close to the worst case for pipeloader. The gap widens with **width**, because:
93
-
94
- - **pipeloader round trips = tree depth** — one burst per level, any width.
95
- - **batching round trips = Σ (distinct target tables per level)** — each is its
96
- own `IN` query (a Dataloader source, or an `includes` preload).
97
-
98
- A *wide* query — issues fanning out to assignee, creator, project, parent, and
99
- comments, those nesting to team, lead, and authors (`example/bench_wide.rb`):
100
-
101
- | approach | RTT 0 | RTT 1 ms | RTT 5 ms | round-trips |
102
- |---|--:|--:|--:|--:|
103
- | naive (N+1) | 63 ms | 278 ms | 1115 ms | 164 |
104
- | AR `includes` | 13 ms | 29 ms | 91 ms | 11 |
105
- | `GraphQL::Dataloader` | 9 ms | 20 ms | 57 ms | 7 |
106
- | **pipeloader** | 28 ms | 34 ms | **51 ms** | **3** |
107
-
108
- pipeloader's round trips stay at **3** (the depth) while batching climbs to 7–11,
109
- so at a 5 ms hop **pipeloader is the fastest of all** — the point-vs-batch
110
- crossover dropped from ~25 ms (narrow) to under 5 ms (wide). The wider and deeper
111
- the tree, the lower the RTT at which pipelining wins, because pipelining is the
112
- only one of the three whose round trips don't grow with the query.
152
+ A wide GraphQL query (10 issues, each fanning out to assignee, creator, project, parent,
153
+ and comments, those nesting to team, lead, and authors), resolved against Postgres at a
154
+ realistic 5 ms network RTT (app and primary DB in different AZs through a pooler) via a
155
+ local TCP proxy. Min of 3 iterations; your numbers will vary.
113
156
 
114
- ## How it works
157
+ | approach | time | round-trips |
158
+ |---|--:|--:|
159
+ | naive (N+1) | 1160 ms | 164 |
160
+ | AR `includes` (hand-written) | 83 ms | 11 |
161
+ | `GraphQL::Dataloader` | 56 ms | 7 |
162
+ | pipeloader | 62 ms | 3 |
163
+ | **pipeloader (`auto_fuse`)** | **46 ms** | **3** |
115
164
 
116
- 1. **`use GraphQL::Dataloader`** runs resolution in fibers. This is what lets a
117
- synchronous-looking `post.author` *yield* instead of blocking, so sibling
118
- queries can gather before anything hits the wire.
119
- 2. **A monkey-patch on `select_all`** — while a response is being built, AR's
120
- SELECT path hands the query to a Dataloader source instead of executing it.
121
- The active dataloader is **stashed on the connection** for the duration of the
122
- multiplex (and cleared at the end), so the patch finds it as `self`.
123
- 3. **The source pipelines** — when the fibers all park, it prepares each distinct
124
- SQL once (cached per connection), then sends every gathered query as one libpq
125
- burst (`enter_pipeline_mode` … `pipeline_sync`), reads the results, and returns
126
- an `ActiveRecord::Result` per query so AR builds models normally.
165
+ Against the N+1 you have, pipeloader turns 164 round trips into 3 with no resolver code,
166
+ about 19x faster than naive at this latency (about 25x with `auto_fuse`).
127
167
 
128
- ## Field-exact projection (opt-in)
168
+ Against batching, latency multiplies round trips, and pipeloader does the fewest: 3, the
169
+ tree depth, where `includes` and `GraphQL::Dataloader` run a separate `IN` query per
170
+ association (7 to 11). The transparent path does more server work (N point queries) for
171
+ those few round trips and lands close to Dataloader. `auto_fuse` fuses each level into one
172
+ `WHERE key = ANY($1)`, getting Dataloader's server work and pipeloader's round trips, and
173
+ comes out fastest.
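As a rough check on "latency multiplies round trips", the figures from the table can be split into wire time and everything else. This is back-of-the-envelope arithmetic under a simplifying assumption that is mine, not the benchmark's: each round trip costs exactly one 5 ms RTT and nothing overlaps. `split_time` is a hypothetical helper.

```ruby
RTT_MS = 5

# [approach, measured ms, round trips], copied from the benchmark table.
BENCHMARK = [
  ["naive (N+1)",            1160, 164],
  ["AR includes",              83,  11],
  ["GraphQL::Dataloader",      56,   7],
  ["pipeloader",               62,   3],
  ["pipeloader (auto_fuse)",   46,   3]
].freeze

# Splits a measured time into time spent waiting on the wire and the rest
# (server work, parsing, building models), assuming one RTT per round trip.
def split_time(measured_ms, round_trips, rtt_ms)
  latency_ms = round_trips * rtt_ms
  { latency_ms: latency_ms, other_ms: measured_ms - latency_ms }
end

BENCHMARK.each do |name, measured_ms, round_trips|
  parts = split_time(measured_ms, round_trips, RTT_MS)
  puts format("%-24s wire %4d ms, everything else %4d ms",
              name, parts[:latency_ms], parts[:other_ms])
end
```

Under that assumption the naive row spends 820 of its 1160 ms waiting on the wire. Plain pipeloader waits only 15 ms but spends 47 ms on everything else, against 35 ms and 21 ms for `GraphQL::Dataloader`; `auto_fuse` keeps the 15 ms and brings the remainder down to 31 ms.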
129
174
 
130
- By default AR picks the columns (`SELECT *`), which keeps adoption zero-effort. If
131
- you want the pipeline to fetch **only the columns the query selected**, opt in and
132
- pipeloader narrows each SELECT using graphql-ruby's `lookahead`:
175
+ `GraphQL::Dataloader` needs a source and a `.load` per association; `includes` is
176
+ hand-written per query. pipeloader is `use Pipeloader`, plus one flag for `auto_fuse`.
133
177
 
134
- ```ruby
135
- Pipeloader.field_exact = true # globally, before your types load, or…
178
+ Run it: `ruby example/bench_wide.rb` (needs the seeded `graphql_experiment` DB).
136
179
 
137
- class Types::Post < GraphQL::Schema::Object
138
- pipeloader_field_exact! # …per type
139
- field :title, String, null: false
140
- field :author, Types::Author, null: false
180
+ ## Batch loaders for plain ActiveRecord
181
+
182
+ The full surface behind the [quick-start above](#batch-loaders).
183
+
184
+ ```ruby
185
+ class Author < ApplicationRecord
186
+ include Pipeloader::Batch::Model
187
+ batch_has_many :books # chainable, batched
188
+ batch_has_one :profile
189
+ batch_belongs_to :publisher
141
190
  end
142
191
  ```
143
192
 
144
- For `{ posts { title author { name } } }` the posts SELECT becomes
145
- `SELECT id, title, author_id FROM …` (PK + selected column + the FK needed to
146
- resolve `author`), and the authors SELECT becomes `SELECT id, name FROM …` — for
147
- both the root relation and the `belongs_to`.
193
+ `batch_has_many` / `batch_has_one` / `batch_belongs_to` declare a real AR association and
194
+ accept everything the matching macro does (a scope, `class_name:`, `foreign_key:`, and so
195
+ on). `batch_belongs_to` and `batch_has_one` return native records, batched the first time
196
+ any sibling's target is read. `batch_has_many` returns a lazy, chainable proxy whose
197
+ `where` / `order` / `limit` apply inside the one batched query (limit and offset are
198
+ per-owner, top-N per group):
199
+
200
+ ```ruby
201
+ authors.each { |a| a.books.where(published: true).to_a } # filter pushed down, one query
202
+ authors.each { |a| a.books.order(pages: :desc).limit(3) } # each author's 3 longest, one query
203
+ ```
148
204
 
149
- **It never breaks a field.** A classifier narrows only when it can *prove* every
150
- selected field reads a known column or association. The instant a selection is
151
- opaque (a computed field, a custom resolver, anything it can't map to a column),
152
- it **bails to a whole-row fetch** for that record, so a projected field can never
153
- raise `MissingAttributeError`.
205
+ The proxy covers the common read surface (`where`, `order`, `limit`, `select`, `pluck`,
206
+ `find_by`, `exists?`, and `Enumerable`). Only writes (`<<`, `create`, `build`, ...)
207
+ delegate to the real association; any read it doesn't implement raises `NoMethodError`
208
+ rather than silently issuing a per-record query.
154
209
 
155
- **The `selects:` escape hatch.** A computed field can declare the columns it
156
- reads, so projection keeps them instead of bailing to a whole row:
210
+ Counts and aggregates batch into a single `GROUP BY`:
157
211
 
158
212
  ```ruby
159
- field :excerpt, String, null: false, selects: %i[body]
160
- def excerpt = object.body[0, 200]
213
+ batch_count :books_count # Integer, default 0
214
+ batch_aggregate :total_pages, of: :books, function: :sum, column: :pages
215
+ batch_aggregate :longest, of: :books, function: :maximum, column: :pages
216
+ ```
217
+
218
+ For anything that isn't a plain association (an existence or viewer-scoped flag, a lookup
219
+ by a non-PK column, a derived value) the general `batch` macro takes a loader returning a
220
+ `{ key => value }` Hash, run once across all siblings:
221
+
222
+ ```ruby
223
+ batch :viewer_has_starred, default: false do |book_ids|
224
+ Star.where(user_id: Current.user.id, book_id: book_ids).pluck(:book_id).index_with(true)
225
+ end
161
226
  ```
162
227
 
163
- Selecting `excerpt` now adds `body` to the projection. With no opt-in (the
164
- default), `selects:` is accepted and ignored, and every SELECT is whole-row.
165
-
166
- ## Status & caveats — this is a proof of concept
167
-
168
- - **Whole rows by default; field-exact is [opt-in](#field-exact-projection-opt-in).**
169
- Off, AR picks the columns (maximum transparency); on, the pipeline projects to
170
- the selected columns and bails to whole rows on anything opaque.
171
- - **PostgreSQL pipelines; SQLite narrows only; anything else raises.** Pipelining
172
- is libpq-specific, so on PostgreSQL queries are pipelined, on SQLite they run
173
- un-pipelined (the opt-in column projection still applies, useful for tests/dev),
174
- and any other adapter raises a `RuntimeError` at query time rather than silently
175
- misbehaving. Running SQLite un-pipelined is safe because SQLite is *embedded* —
176
- its queries are in-process calls with no network round trip, so there's nothing
177
- for a dataloader or a pipeline to collapse. N+1 there is just a series of cheap
178
- local calls, not the latency amplification that makes N+1 catastrophic against a
179
- networked database. So pipelining buys nothing on SQLite, and skipping it costs
180
- nothing.
181
- - **Reads only.** It intercepts `select_all` (SELECTs); writes and non-SELECTs
182
- fall straight through, and queries inside an open transaction are skipped.
183
- - **Assumes thread-isolated connections** (the ActiveRecord default): a request's
184
- resolver fibers all share one connection. Under `:fiber` isolation you'd stash
185
- per leased connection.
186
- - **Stats are process-global**: single-threaded demo instrumentation.
187
- - Prepares and caches statements per connection, but doesn't re-prepare after a
188
- reconnect / `DEALLOCATE` the way AR does. Also not hardened for multiple
189
- databases, `count`/`exists?` (which route through other methods), or error
190
- recovery mid-pipeline.
228
+ [DATALOADERS.md](DATALOADERS.md) puts the common `GraphQL::Dataloader` sources
229
+ (record-by-id, has-many, count, by-column, existence, derived) side by side with their
230
+ batch-loader equivalents.
231
+
232
+ Siblings are the records loaded by the same query, and the group rides on the records, so
233
+ batching needs no setup and is correct across threads, fibers, `GraphQL::Dataloader`, and
234
+ fiber-per-request servers alike. Records loaded on their own (a `find`, a separate query)
235
+ form their own group and don't cross-batch. `has_many` / `has_one` / `belongs_to` (including
236
+ `:through` and polymorphic), counts, and aggregates are covered; the `has_many` proxy is
237
+ read-only.
238
+
239
+ ## Status and caveats
240
+
241
+ A proof of concept.
242
+
243
+ - Whole rows by default; [field-exact](#field-exact-projection-opt-in) is opt-in. Off, AR
244
+ picks the columns; on, the pipeline projects to the selected columns and falls back to
245
+ whole rows on anything opaque.
246
+ - PostgreSQL pipelines, SQLite narrows only, anything else raises. Pipelining is
247
+ libpq-specific. On SQLite, queries run un-pipelined (the opt-in projection still
248
+ applies), which is safe because SQLite is embedded: its in-process queries have no round
249
+ trip to collapse, and N+1 there is just cheap local calls. Any other adapter raises a
250
+ `RuntimeError` at query time rather than misbehaving silently.
251
+ - Reads only. It intercepts `select_all` (SELECTs); writes and non-SELECTs pass through,
252
+ and queries inside an open transaction are skipped.
253
+ - Assumes thread-isolated connections (the ActiveRecord default): a request's resolver
254
+ fibers share one connection. Under `:fiber` isolation you'd stash per leased connection.
255
+ - Stats are process-global, single-threaded demo instrumentation.
256
+ - Statements are prepared once per request and `DEALLOCATE`d by the next one (piggybacked
257
+ onto its first burst, so cleanup adds no round trip), so no cache goes stale across a
258
+ reconnect or migration. A query error is drained and raised, leaving the connection
259
+ usable. Not yet hardened for multiple databases or `count`/`exists?` (which route through
260
+ other methods).
191
261
 
192
262
  ## Running the example
193
263
 
@@ -195,30 +265,42 @@ default), `selects:` is accepted and ignored, and every SELECT is whole-row.
195
265
  # Needs a Postgres DB with posts/authors/comments/users tables. In this repo:
196
266
  # go run ./cmd/gqlbench -reset # seeds the graphql_experiment DB
197
267
  ruby example/run.rb # shows the round-trip collapse
198
- ruby example/bench.rb # the latency benchmark (narrow tree)
199
- ruby example/bench_wide.rb # the wide-tree benchmark
268
+ ruby example/bench_wide.rb # the latency benchmark
200
269
  ```
201
270
 
202
271
  Requires `activerecord`, `graphql`, and `pg` (libpq ≥ 14 for pipelining).
203
272
 
204
273
  ## Tests
205
274
 
206
- `rake test`. Three suites, all **parity-first**: the pipelined result must be
207
- byte-identical to plain ActiveRecord:
208
-
209
- **`test/pipeloader_test.rb`**: every query runs through both a plain schema and
210
- a `use Pipeloader` schema, asserting identical results across each relationship
211
- kind, nullable foreign keys, empty has-many, deduplication, ordering, type
212
- casting, aliases, variables, and multiplex. It also checks round-trip counts
213
- (= tree depth) and that the patch leaves writes, transactions, and non-GraphQL
214
- ActiveRecord untouched.
215
- **`test/field_exact_test.rb`** (the opt-in projection): projected results match
216
- the whole-row schema, the emitted SQL is actually narrowed (and keeps the FK),
217
- the `selects:` escape hatch includes its columns, and opaque fields bail to a
218
- whole-row `SELECT *` instead of raising.
219
- **`test/adapter_test.rb`** (adapter handling): PostgreSQL pipelines, an
220
- unsupported adapter raises, and a real in-memory **SQLite** run (in a subprocess)
221
- proves projection works there with pipelining disabled.
222
-
223
- Needs a reachable Postgres (the suites create `pl_*` fixture tables in
224
- `graphql_experiment`).
275
+ `rake test`. The pipelining suites are parity-first: the pipelined result must be
276
+ byte-identical to plain ActiveRecord. The batch suites assert one query per level.
277
+
278
+ - `test/pipeloader_test.rb`: every query runs through a plain schema and a `use Pipeloader`
279
+ schema, asserting identical results across each relationship kind, nullable foreign keys,
280
+ empty has-many, deduplication, ordering, type casting, aliases, variables, and multiplex.
281
+ It also checks round-trip counts (= tree depth), that the patch leaves writes,
282
+ transactions, and non-GraphQL ActiveRecord untouched, that a database error inside a
283
+ burst surfaces and leaves the connection usable, that prepared statements don't linger,
284
+ and that existing `GraphQL::Dataloader` sources keep working once pipeloader is installed.
285
+ - `test/field_exact_test.rb`: projected results match the whole-row schema, the emitted SQL
286
+ is narrowed (and keeps the FK), the `selects:` hatch includes its columns, and opaque
287
+ fields fall back to a whole-row `SELECT *`.
288
+ - `test/auto_fuse_test.rb`: a fused result is byte-identical to the un-fused path, fusion
289
+ collapses each level into one `ANY($1)` (round trips = depth, even on wide levels), and
290
+ every non-fusable shape falls back cleanly.
291
+ - `test/adapter_test.rb`: PostgreSQL pipelines, an unsupported adapter raises, and a real
292
+ in-memory SQLite run (in a subprocess) proves projection works there with pipelining off.
293
+ - `test/batch_*_test.rb`: the [batch loaders](#batch-loaders-for-plain-activerecord),
294
+ exhaustively. `batch_proxy_test` covers every `has_many`-proxy variant: `where` (hash,
295
+ string, range, `not`, chained, rewhere), `order` (asc, desc, multi-column, reorder, SQL
296
+ string), per-group `limit`/`offset`, `select`/`distinct`/`pluck`/`joins`, the
297
+ materializers, scope caching, and write-through. With it: `batch_singular_test`
298
+ (`belongs_to` including optional and a non-PK key, `has_one`), `batch_aggregate_test`
299
+ (count/sum/avg/min/max and defaults), `batch_through_test` (`:through` and polymorphic),
300
+ `batch_custom_test` (the `batch` macro), `batch_test` for the basics, and
301
+ `batch_context_test` for the sibling-group model (grouping by load, fiber- and
302
+ thread-safety by construction).
303
+
304
+ Coverage: `rake coverage` (or `COVERAGE=1 rake test`) writes a SimpleCov report to
305
+ `coverage/`. Needs a reachable Postgres (the suites create `pl_*` and `bl_*` fixture tables
306
+ in `graphql_experiment`).
@@ -0,0 +1,63 @@
+ # frozen_string_literal: true
+
+ module Pipeloader
+   module Batch
+     # Runs ONE query for a BatchProxy's accumulated scope across every live
+     # sibling of the owner, partitions rows by foreign key, applies per-group
+     # limit/offset, and caches each sibling's slice (keyed by the scope) so the
+     # other siblings' identical access is free.
+     module BatchLoader
+       module_function
+
+       def load(proxy)
+         owner = proxy.owner
+         cache_key = [proxy.name, proxy.cache_signature]
+         cache = owner._pipeloader_batch_scope_cache
+         return cache[cache_key] if cache.key?(cache_key)
+
+         siblings = relevant_siblings(proxy)
+         grouped = run(proxy, siblings)
+
+         siblings.each do |sibling|
+           sibling._pipeloader_batch_scope_cache[cache_key] = grouped[sibling.send(proxy.owner_key)]
+         end
+         grouped[owner.send(proxy.owner_key)]
+       end
+
+       def relevant_siblings(proxy)
+         owner = proxy.owner
+         siblings = owner._pipeloader_batch_context.all(owner.class).select(&:persisted?)
+         siblings << owner if owner.persisted? && siblings.none? { |sibling| sibling.equal?(owner) }
+         siblings
+       end
+
+       def run(proxy, siblings)
+         foreign_key = proxy.foreign_key
+         ids = siblings.map { |sibling| sibling.send(proxy.owner_key) }.compact.uniq
+
+         grouped = Hash.new { |hash, key| hash[key] = [] }
+         return grouped if ids.empty?
+
+         scope = proxy.relation.where(foreign_key => ids)
+         # A custom .select must still include the FK so we can partition the rows.
+         scope = scope.select(foreign_key) if scope.select_values.any?
+         scope.to_a.each { |record| grouped[record[foreign_key]] << record }
+
+         apply_window!(grouped, proxy) if proxy.limit_value || proxy.offset_value
+         grouped
+       end
+
+       # limit/offset are per group: the batched query is ordered but unlimited, and
+       # each owner's slice is taken in Ruby (so "5 per author", not 5 overall).
+       def apply_window!(grouped, proxy)
+         offset = proxy.offset_value || 0
+         limit = proxy.limit_value
+         grouped.each_key do |key|
+           windowed = grouped[key].drop(offset)
+           windowed = windowed.first(limit) if limit
+           grouped[key] = windowed
+         end
+       end
+     end
+   end
+ end
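The partition-then-window step in `run` and `apply_window!` can be tried in isolation with plain hashes. The sketch below is a simplified stand-in, not the module itself: `partition_and_window` and the row data are hypothetical, and the rows are assumed to arrive already ordered, as the single batched query would return them.

```ruby
# Made-up rows standing in for Book records, ordered by pages descending.
rows = [
  { author_id: 1, title: "A", pages: 900 },
  { author_id: 2, title: "B", pages: 800 },
  { author_id: 1, title: "C", pages: 700 },
  { author_id: 1, title: "D", pages: 600 },
  { author_id: 2, title: "E", pages: 500 }
]

# Partitions rows by the foreign key, then takes each owner's slice in Ruby,
# so limit and offset apply per group rather than to the whole result.
def partition_and_window(rows, key, limit: nil, offset: 0)
  grouped = Hash.new { |hash, k| hash[k] = [] }
  rows.each { |row| grouped[row[key]] << row }
  grouped.transform_values do |group|
    windowed = group.drop(offset)
    limit ? windowed.first(limit) : windowed
  end
end

top_two = partition_and_window(rows, :author_id, limit: 2)
p top_two[1].map { |row| row[:title] } # ["A", "C"]
p top_two[2].map { |row| row[:title] } # ["B", "E"]
```

Slicing in Ruby keeps the query count at one, at the cost of fetching every matching row before the per-owner window is applied.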