rumongo 0.1.0
- package/BENCHMARKS.md +448 -0
- package/LICENSE +21 -0
- package/MIGRATION.md +69 -0
- package/README.md +255 -0
- package/dist/index.d.ts +42 -0
- package/dist/index.js +112 -0
- package/dist/model.d.ts +27 -0
- package/dist/model.js +50 -0
- package/dist/shadow.d.ts +14 -0
- package/dist/shadow.js +53 -0
- package/index.d.ts +51 -0
- package/index.js +317 -0
- package/package.json +50 -0
- package/rumongo.linux-x64-gnu.node +0 -0
- package/worker/pool-worker.js +47 -0
- package/worker/pool.d.ts +42 -0
- package/worker/pool.js +100 -0
package/BENCHMARKS.md
ADDED
@@ -0,0 +1,448 @@
# rumongo — Benchmarks: Rust driver vs Node drivers

Progressive log of comparison results across build phases. Append new runs; do
not rewrite history. Each entry: date, what changed, environment, numbers,
interpretation.

## Environment

- Host: Linux 6.2.0, 12 cores, 15 GiB RAM
- MongoDB server: **8.0.16**, local (`mongodb://localhost:27017`)
- Node.js: **v20.4.0**
- Rust: **1.96.0**; crates: `mongodb` 3.7.0, `bson` 2.15.0, `napi` 2.16
- Compare targets:
  - **official** = official `mongodb` Node.js driver (npm `mongodb` ^6.8)
  - **rust** = rumongo (this project)
  - (Mongoose comparison: planned, Phase 5)

> ⚠️ All numbers below are **localhost** (≈0 network latency). Pipeline / fetch
> overlap wins show up under real network latency, not here. Treat localhost as
> a lower bound for those features and an upper bound for CPU-bound wins.

Bench scripts: [bench/compare.js](bench/compare.js) (single-query find),
[bench/pipeline.js](bench/pipeline.js) (sequential vs pipelined),
[bench/concurrent.js](bench/concurrent.js) (parallel queries + event-loop jitter).

---

## 2026-06-15 — Phase 1: scaffold + basic find()

Implementation: standard cursor, full BSON deserialize, returned to JS as JSON
strings (`JSON.parse` on the JS side). No optimization.

### Single-query find, 10k docs, 20 iters (`bench/compare.js`)

| metric | official | rust | result |
|---|---|---|---|
| parity (result set) | 10000 | 10000 | **identical ✓** |
| mean (ms) | 216.5 | 171.4 | **rust 1.26× faster** |
| p50 (ms) | 221.4 | 160.0 | |

Interpretation: even unoptimized, native Rust BSON→JSON beats the Node driver
building JS objects field-by-field on the main thread; V8 `JSON.parse` is cheap.

---

## 2026-06-15 — Phase 2: pipelined fetch (batched mpsc channel)

Implementation: spawned tokio task drives the cursor and pushes **batches** of
docs through a bounded mpsc channel; consumer serializes while fetcher prefetches.
Bounded channel = backpressure. (`pipeline` option toggles vs sequential.)
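
The backpressure idea can be sketched in JavaScript. The real implementation is a tokio mpsc channel in Rust, so `BoundedChannel`, `send`, and `recv` below are hypothetical names used only for illustration. A sender that finds the buffer full is parked until the consumer takes a batch, which is what keeps the fetcher from running arbitrarily far ahead.

```js
// Minimal bounded channel (capacity must be >= 1): send() resolves right away
// while there is room, otherwise the sender is parked until recv() frees a slot.
class BoundedChannel {
  constructor(capacity) {
    this.capacity = capacity
    this.buffer = []          // batches waiting to be consumed
    this.parkedSenders = []   // { batch, resume } for senders that hit the cap
    this.parkedReceivers = [] // resolve callbacks for receivers waiting on data
  }

  send(batch) {
    // Hand the batch straight to a waiting receiver if there is one.
    const receiver = this.parkedReceivers.shift()
    if (receiver) {
      receiver(batch)
      return Promise.resolve()
    }
    if (this.buffer.length < this.capacity) {
      this.buffer.push(batch)
      return Promise.resolve()
    }
    // Buffer full: the producer has to wait (backpressure).
    return new Promise((resume) => this.parkedSenders.push({ batch, resume }))
  }

  recv() {
    if (this.buffer.length > 0) {
      const batch = this.buffer.shift()
      // A slot opened up: admit one parked sender.
      const parked = this.parkedSenders.shift()
      if (parked) {
        this.buffer.push(parked.batch)
        parked.resume()
      }
      return Promise.resolve(batch)
    }
    return new Promise((resolve) => this.parkedReceivers.push(resolve))
  }
}

const channel = new BoundedChannel(1)
channel.send(['doc1', 'doc2'])            // fits in the buffer
channel.send(['doc3'])                    // buffer full: this sender is parked
console.log(channel.parkedSenders.length) // 1
channel.recv()                            // consumer takes a batch, parked sender is admitted
console.log(channel.parkedSenders.length) // 0
```

A capacity of 1 corresponds to the tightest setting (compare the `maxInflight=1` backpressure test listed under Correctness); larger capacities trade memory for more prefetch overlap.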

### Single-query: sequential vs pipelined, 50k docs, 15 iters (`bench/pipeline.js`)

| mode | mean (ms) | p50 (ms) |
|---|---|---|
| sequential | 515.3 | 507.4 |
| pipelined | 629.8 | 629.5 |

Result: pipelined **22% SLOWER** on localhost. Expected — `getMore` latency ≈ 0,
so prefetch-overlap saves nothing while task+channel scheduling costs ~20%.
(First attempt with a *per-document* channel was ~100% slower; batching the
channel cut the overhead.) The "30–40% faster" claim needs real network latency.

> Note: an earlier per-document channel was 2× slower; the table above is the
> batched version. Pipeline is kept as the foundation Phase 3 plugs into, and
> for latency/concurrency wins — not for single-query localhost speed.

### Concurrency: 20 parallel queries × 20k docs, 8 iters (`bench/concurrent.js`)

| metric | official | rust | result |
|---|---|---|---|
| wall time, 20 concurrent (ms) | 6255.9 | 2342.0 | **rust 2.67× faster** ✓ |
| event-loop max jitter (ms) | 520.2 | 1565.3 | **rust 3× worse** ✗ |

Interpretation:
- **Throughput win is the real Phase 2 result:** concurrent queries' BSON work
  spreads across tokio worker threads instead of serializing on Node's single
  event loop → 2.67×.
- **Jitter regression is diagnostic, not a dead end:** we return JSON *strings*,
  so the JS side runs 20× `JSON.parse(20k)` in a synchronous burst that blocks
  the loop. The official driver spreads deserialization across arriving batches.
  → The JSON-string boundary is now the bottleneck. **Phase 3 (off-thread parse,
  no string round-trip) and Phase 4 (lazy, skip parse) target exactly this.**

### Correctness (`__tests__/integration/`)

- basic.test.js: **5/5 pass**
- pipeline.test.js: **4/4 pass** (pipelined==sequential, backpressure with
  `maxInflight=1`, abandoned cursor cleanup)

### Operational findings

- `MongoClient.close()` added — without it the napi tokio runtime never drains
  and Node hangs at exit.
- Streaming server monitoring makes `close()` take **~10001ms** (awaitable
  `hello` blocks shutdown); `?serverMonitoringMode=poll` → **~1ms**. Tests/benches
  use poll. Revisit graceful shutdown in Phase 7.

---

## 2026-06-15 — Phase 3: off-thread parse (RawBatchCursor + rayon)

Implementation: default path switched to `RawBatchCursor` (`.find(..).batch()`) —
raw server batches stream through the bounded channel, each parsed to JSON
strings on the **rayon** pool via `spawn_blocking` (parallel across cores, off the
async workers). Filters now parsed as **Extended JSON** (`{$oid}`→ObjectId,
`{$date}`→DateTime). Interface still JSON strings (native JS / lazy = Phase 4).
`pipeline:false` = Phase 1 standard-cursor baseline.

### Parity suite — 23/23 PASS (`__tests__/parity/parity.test.js`)

Same query run against official Node driver and rust, results compared after
canonicalizing rich types. Covers: filters, projection in/out, sort asc/desc,
limit/skip/both, nested filter, `$gt/$lt/$gte/$lte`, `$in`, `$and`, `$or`, empty
result, ObjectId filter (EJSON), Date, nested doc, array, null, bool, int, float,
10k set, abandoned cursor. **All pass.**

(Also: integration basic 5/5, pipeline 4/4 still pass.)

### Bench: rayon vs baseline vs official (`bench/phase3.js`)

`rust-base` = `pipeline:false` (Phase 1 path); `rust-rayon` = Phase 3.

**Single query, 100k docs, 6 iters:**

| target | wall mean (ms) | max jitter (ms) |
|---|---|---|
| official | 2441.9 | 82.6 |
| rust-base | 1299.4 | 5.9 |
| rust-rayon | **641.5** | 9.2 |

→ rayon **2.03× vs base**, **3.81× vs official**.

**20 concurrent queries, 100k docs each, 6 iters:**

| target | wall mean (ms) | max jitter (ms) |
|---|---|---|
| official | 37167.3 | 1221.3 |
| rust-base | 15553.6 | 9072.1 |
| rust-rayon | **12346.5** | 8140.9 |

→ rayon **1.26× vs base**, **3.01× vs official**.

Interpretation:
- **Throughput is the Phase 3 win:** parallel parse across cores → 3.8× over
  official on a single large query, 3× concurrent, 2× over the Phase 1 baseline
  (meets the plan's "2–3× over Phase 1" gate). Concurrent gain over base is
  smaller (1.26×) because 20 concurrent queries already saturate the 12 cores.
- **Jitter still high on concurrent (8141ms vs official 1221ms):** the rayon
  parse is off-loop, but each query's result is still returned as JSON strings →
  20 synchronous `JSON.parse` bursts block the event loop. Single-query jitter is
  fine (9ms). **The JSON-string boundary is the remaining bottleneck → Phase 4
  (native JS values + lazy field access) targets exactly this.**

---

## 2026-06-15 — Phase 4: lazy zero-copy (RawDoc + Proxy)

Implementation: new `find_lazy()` returns `RawDoc` handles holding raw BSON bytes
— **no value parsing on return**. A field is parsed only when JS reads it
(`get_field`), via a native BSON→JS converter (String/number/bool/null, Date,
ObjectId→hex, nested doc/array, Buffer). A JS `Proxy` (ts/index.ts) makes
`doc.field` call `get_field`, while spread / `JSON.stringify` still see all fields
(ownKeys + descriptors). Eager `find()` is unchanged.
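
The Proxy mechanics described above can be sketched without the native addon. Here `fakeRawDoc` stands in for a `RawDoc` handle; its `keys()` / `getField()` shape and the `lazyDoc` wrapper are assumptions made for this sketch, not rumongo's actual `ts/index.ts`. The point is that `get` decodes one field on demand, while `ownKeys` plus `getOwnPropertyDescriptor` let spread and `JSON.stringify` enumerate every field.

```js
// Stand-in for a native RawDoc handle: counts how often a field is decoded.
function fakeRawDoc(fields) {
  const handle = {
    decodes: 0,
    keys: () => Object.keys(fields),
    getField(name) {
      handle.decodes += 1
      return fields[name]
    },
  }
  return handle
}

// Wrap a handle so `doc.field` decodes on access only.
function lazyDoc(handle) {
  return new Proxy({}, {
    // Property read: decode just this field (unknown names yield undefined).
    get(_target, prop) {
      return handle.keys().includes(prop) ? handle.getField(prop) : undefined
    },
    has: (_target, prop) => handle.keys().includes(prop),
    // Enumeration: report the field names without decoding any value.
    ownKeys: () => handle.keys(),
    // Descriptors stay lazy too: an accessor that decodes only if invoked.
    getOwnPropertyDescriptor(_target, prop) {
      if (!handle.keys().includes(prop)) return undefined
      return { get: () => handle.getField(prop), enumerable: true, configurable: true }
    },
  })
}

const handle = fakeRawDoc({ name: 'Ada', age: 36, city: 'London' })
const doc = lazyDoc(handle)

console.log(doc.name)            // Ada
console.log(handle.decodes)      // 1 (only the field that was read got decoded)
console.log(JSON.stringify(doc)) // {"name":"Ada","age":36,"city":"London"}
```

The descriptors must be `configurable: true` because the Proxy target is an empty object; reporting a non-configurable property that the target does not have would violate a Proxy invariant and throw.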

### Lazy tests — 6/6 PASS (`__tests__/lazy/lazy.test.js`)

getField primitives; Date/ObjectId/nested/array/Buffer; `keys()`; `to_object`
parity vs official (normalized); Proxy dot-access + spread + JSON.stringify;
partial access of a 40-field doc. (Eager 23 parity + 9 integration still pass.)

### Bench: lazy vs eager vs official (`bench/lazy.js`)

10 concurrent queries, 20k docs × 33 fields, **reading only 2 fields/doc**, 6 iters:

| target | wall mean (ms) | max jitter (ms) |
|---|---|---|
| official | 8362.2 | 645.2 |
| rust-eager | 2537.3 | 1322.0 |
| rust-lazy | **1144.1** | 991.1 |

→ lazy **2.22× vs eager**, **7.31× vs official**.

Interpretation:
- **Throughput is the Phase 4 win:** skipping the 31 unread fields makes lazy
  **7.3× faster than the official driver** and 2.2× faster than our own eager
  path. This is the lever for the headline Mongoose win (Phase 5 Model layer
  pushes projections so even fewer bytes are fetched).
- **Jitter (991ms) beats eager (1322ms) but not official (645ms):** `find_lazy`
  still materializes one `RawDoc` per doc and each `getField` crosses the JS↔Rust
  boundary — both touch the event loop. Near-zero jitter would need a streaming
  iterator instead of an up-front handle array (future work).
- **Memory tradeoff:** one handle object + its byte buffer per doc. Holding ~1M
  simultaneously OOMs Node's default 2GB heap (observed at 20×50k). Lazy is for
  "wide docs, few fields read," not for buffering millions of docs at once.

---

## 2026-06-15 — Phase 4b: jitter investigation + streaming cursor

Goal: drive down the concurrent-query jitter from Phase 4 (lazy 991ms).

### Diagnosis (`single query, 50k docs`)

| step | max jitter |
|---|---|
| A) findLazy return only (no access) | **0.9ms** |
| B) access 2 fields (sync loop) | **0.0ms** |
| C) eager find + JSON.parse all | **5.7ms** |

→ Single/low-concurrency jitter is already near-zero. The Phase 4 concurrent
991ms was **not** from marshaling.

### Streaming cursor (`FindCursor.next_batch()`)

Added a cursor that hands back one batch at a time (process + drop before the
next), so peak live objects ≈ one batch. Bench, 10 concurrent, 20k×33 fields,
read 2 fields/doc:

| target | wall (ms) | max jitter (ms) |
|---|---|---|
| official | 7537 | 636.8 |
| lazy-array | 830 | 601.4 |
| lazy-cursor | 821 | 672.0 |

**Finding:** at 10× concurrency the jitter floor (~600ms) is the **same for the
official driver too**. It is not our parsing — it's the single JS thread being
saturated by 10 simultaneous CPU-bound query loops, so the 10ms timer can't fire
regardless of driver. `await`-yields don't help when 10 queries keep the thread
busy. The honest levers:
- **Do less main-thread work** → lazy already does (reads 2 of 33 fields): same
  peak jitter as official but the busy window is **9× shorter** (821ms vs 7537ms),
  so the loop is responsive again ~9× sooner.
- **Bound memory** → the cursor's real, measured win.

### Memory: cursor survives what `findLazy` OOM'd

`findLazy` on 20×50k (=1M docs) → **OOM at 2080MB** (heap full of handles).
`FindCursor` on the same 1M docs → **peak RSS 1140MB, no OOM, 4447ms**.

Takeaways:
- Lazy/cursor jitter is near-zero at realistic concurrency; at extreme
  concurrency it's main-thread-bound and equal to official, but lazy finishes
  far sooner (less total work).
- Use `find` (eager) for small results, `findLazy` for wide-doc/few-field reads,
  `findCursor` for large/streaming results (bounded memory).

---

## 2026-06-15 — Phase 4c: worker-thread offload = near-zero main-loop jitter

Physics: napi builds JS values on the calling isolate, and JS is single-threaded
per isolate. We already decode BSON off-thread (rayon), but the final Rust→JS
materialization runs on whichever isolate owns the result. On the main isolate
under concurrency that saturates → jitter. The only escape is a *different
isolate* = a Worker thread.

Bench (`bench/worker.js`): 10 concurrent `findLazy` queries, 20k×33 fields,
2 fields read. Main thread runs a 10ms heartbeat throughout.

| where the queries run | query wall (ms) | MAIN-loop max jitter (ms) |
|---|---|---|
| main thread | 1101 | 329.9 |
| **worker thread** | 919 | **0.7** |

→ Offloading the addon to a worker drops main-loop jitter **330ms → 0.7ms**
(~470×) and even runs faster (no contention with the heartbeat).
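
The heartbeat metric can be approximated with a plain timer: schedule a tick every 10 ms and record how late each tick fires. `maxJitter` and `startHeartbeat` are hypothetical helper names; `bench/worker.js` itself is not shown in this log, so treat this as one plausible way to take the measurement rather than the script's actual code.

```js
// Max lateness of a periodic timer, given the timestamps (ms) at which its
// ticks fired. A tick that fires exactly `intervalMs` after the previous one
// contributes zero jitter.
function maxJitter(tickTimes, intervalMs) {
  let worst = 0
  for (let i = 1; i < tickTimes.length; i++) {
    const lateness = tickTimes[i] - tickTimes[i - 1] - intervalMs
    if (lateness > worst) worst = lateness
  }
  return worst
}

// Start a heartbeat on the current thread. Calling the returned function stops
// it and reports the max jitter observed while it ran.
function startHeartbeat(intervalMs = 10) {
  const tickTimes = [performance.now()]
  const timer = setInterval(() => tickTimes.push(performance.now()), intervalMs)
  return function stop() {
    clearInterval(timer)
    return maxJitter(tickTimes, intervalMs)
  }
}

// Ticks at 0, 10 and 20 ms are on time; the next one fires 340 ms after the
// previous tick, i.e. 330 ms late, because something blocked the loop.
console.log(maxJitter([0, 10, 20, 360, 370], 10)) // 330
```

In a benchmark, `startHeartbeat()` would be called on the main thread before launching the queries and the returned `stop()` after they settle; whether the queries run on the main thread or in a worker then shows up directly in the reported number.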

Why this is a Rust-driver advantage: the worker's isolate does fetch + BSON→JS
in native code; the main isolate does nothing. The official Node driver's BSON
decode is JS, so a worker still pays full JS deserialize and worker→main transfer
is heavier.

Caveat: returning large result *data* to main still costs a structured clone.
Mitigate by (a) processing in the worker and returning summaries, or (b)
transferring the raw BSON Buffer (transferable, zero-copy) and lazy-parsing only
accessed fields on main. Recommended production shape: a small worker pool running
rumongo, main thread dispatches queries — main loop stays responsive under load.

**Conclusion on jitter:** near-zero is achievable. Single/low concurrency →
already near-zero (Phase 4b). High concurrency on one isolate → main-thread-bound
(equal to official, but lazy finishes ~9× sooner). High concurrency with a worker
pool → main-loop jitter ~0.7ms. Lever summary: lazy (less work) + worker threads
(other isolate) + cursor (bounded memory).

---

## 2026-06-15 — Phase 4d: worker pool load sweep → opt-in (not default)

Built an opt-in worker pool (`worker/pool.js` + `worker/pool-worker.js`): N Node
worker threads, each with its own addon + MongoClient, round-robin dispatch.
Swept it vs direct main-thread use across loads (`bench/poolbench.js`, pool=6).

| load | mode | wall (ms) | main jitter (ms) |
|---|---|---|---|
| tiny (1 doc, 50 conc) | direct | 8.6 | 0.6 |
| | pool-data | 18.0 | 3.0 |
| | pool-reduced | 8.4 | 0.0 |
| small (100, 20 conc) | direct | 24.9 | 11.7 |
| | pool-data | 25.8 | 11.5 |
| | pool-reduced | 14.1 | 1.3 |
| med (1000, 12 conc) | direct | 121.9 | 93.2 |
| | pool-data | 125.1 | 108.9 |
| | pool-reduced | 69.4 | 0.9 |
| heavy (10000, 6 conc) | direct | 780.7 | 517.9 |
| | pool-data | 860.6 | 551.0 |
| | pool-reduced | **328.4** | **7.5** |

- **pool-data** (worker queries, ships rows to main): ties or LOSES at every load.
  Main still parses the result and now also pays the cross-thread transfer. A
  transparent "route find() through workers" buys nothing.
- **pool-reduced** (worker queries AND reduces, returns a summary): wins at every
  load — jitter near-zero (7.5 vs 518ms heavy) and wall up to 2.4× faster. But it
  requires pushing the data-processing INTO the worker; it is not a drop-in find().

**Decision: worker pool stays OPT-IN**, positioned for the "do the work in the
worker, return a small result" pattern (aggregations, transforms, counts, exports,
streaming to a socket from the worker). Not made default, because the only
universally-winning mode isn't a transparent `find()` replacement. The default
path remains the direct addon (already 3–7× faster than the official driver).
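
The round-robin dispatch mentioned at the top of this entry can be sketched as follows. `RoundRobinPool` and the fake workers are hypothetical and exist only for this illustration; the real `worker/pool.js` manages Node worker threads, which are left out here so the example stays self-contained.

```js
// Round-robin dispatcher over a fixed set of workers. `workers` can be any
// objects with a `run(task)` method.
class RoundRobinPool {
  constructor(workers) {
    if (workers.length === 0) throw new Error('pool needs at least one worker')
    this.workers = workers
    this.next = 0 // index of the worker that receives the next task
  }

  dispatch(task) {
    const worker = this.workers[this.next]
    this.next = (this.next + 1) % this.workers.length
    return worker.run(task)
  }
}

// Fake workers that just report who handled the task.
const fakeWorkers = [0, 1, 2].map((id) => ({ run: (task) => `worker ${id}: ${task}` }))
const rrPool = new RoundRobinPool(fakeWorkers)
console.log(['q1', 'q2', 'q3', 'q4'].map((q) => rrPool.dispatch(q)))
// [ 'worker 0: q1', 'worker 1: q2', 'worker 2: q3', 'worker 0: q4' ]
```

Round-robin ignores how busy each worker is, which is a reasonable fit when tasks are of similar size; a mix of tiny and huge queries might favor a least-loaded policy instead.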

---

## 2026-06-16 — Phase 4e: generic worker reduce + larger-load sweep

Generalized the worker reduce: `pool.reduce(db, coll, filter, opts, reducerFn,
init)` ships the reducer as source and runs `(acc, doc) => acc` in the worker;
only the accumulator returns to main. (find() stays direct; reduce is the
worker-backed path.)
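
Shipping a function "as source" generally means serializing it with `toString()` and rebuilding it on the receiving side. The sketch below shows that round trip in a single thread; `serializeReducer`, `reviveReducer`, and `runReduce` are hypothetical names, and the real pool may do this differently. One consequence worth knowing: a revived function keeps its code but not its closure, so a reducer should depend only on its two arguments.

```js
// Main-thread side: a function cannot be structured-cloned to a worker,
// but its source text can.
function serializeReducer(reducerFn) {
  return reducerFn.toString()
}

// Worker side: rebuild a callable from the source text. This is equivalent to
// eval, so it should only ever see code written by the application itself.
function reviveReducer(source) {
  return new Function(`return (${source})`)()
}

// Worker side: fold the documents into one accumulator. Only `acc` would be
// posted back to the main thread.
function runReduce(docs, source, init) {
  const reducer = reviveReducer(source)
  let acc = init
  for (const doc of docs) acc = reducer(acc, doc)
  return { acc }
}

const source = serializeReducer((a, d) => a + d.age)
const docs = [{ age: 30 }, { age: 41 }, { age: 25 }]
console.log(runReduce(docs, source, 0)) // { acc: 96 }
```

This form handles arrow functions and `function` expressions. A reducer that reads an outer variable, for example `(a, d) => a + d.age * factor`, would fail inside the worker with a `ReferenceError` because `factor` does not travel with the source.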

Sweep, 100k-doc collection, pool=6 (`bench/poolbench.js`):

| load (result size, conc) | direct wall/jit (ms) | pool-reduced wall/jit (ms) |
|---|---|---|
| small (100, 20) | 33 / 18 | 19 / 0.5 |
| med (1000, 12) | 154 / 127 | 65 / 0.8 |
| heavy (10000, 6) | 808 / 544 | 337 / 4.6 |
| huge (50000, 4) | 2879 / 1722 | 1420 / 14 |
| max (100000, 2) | 3118 / 163 | 2274 / 18 |

(pool-data omitted — ties/loses on wall at every load, same as Phase 4d.)

- pool-reduced wins wall at every load, from ~1.4× (max) to ~2.4× (med, heavy),
  and keeps main-loop jitter ≤18ms while direct jitter climbs to 1722ms on big
  result sets. The responsiveness win grows with result size up to the huge load;
  at max, with only 2 concurrent queries, direct jitter is itself lower (163ms).
- Worker count is **fixed** at pool creation (default `cpus-2`). BSON work is
  CPU-bound so >cores doesn't help; dynamic autoscaling would add cold-start
  latency on spikes — deferred unless bursty traffic needs it.

Final stance: worker pool = opt-in; `reduce` runs in the worker by default within
the pool. Direct addon remains the default for `find` (returns docs).

---

## 2026-06-16 — Phase 5: Mongoose-style Model + projection pushdown

`ts/model.ts`: `Model.define(collection, schemaFields)` builds a cached
projection from the schema field list and pushes it down on every query, so
MongoDB only sends schema fields. Methods: `find`, `findOne`, `findById`
(hex-string id → ObjectId via Extended JSON), `getProjection`.
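
A minimal sketch of the pushdown idea: derive an inclusion projection from the schema field names once, then attach it to the options of every query. `buildProjection` and `withProjection` are hypothetical helpers, and the `projection` option key is an assumption borrowed from the official Node driver's naming; `ts/model.ts` may structure this differently.

```js
// Build an inclusion projection once from the schema field list.
// MongoDB returns `_id` by default with an inclusion projection, so it does
// not need to be listed explicitly.
function buildProjection(schemaFields) {
  const projection = {}
  for (const field of Object.keys(schemaFields)) projection[field] = 1
  return Object.freeze(projection) // cached and shared, so keep it immutable
}

// Merge the cached projection into per-query options (sort, limit, ...).
function withProjection(projection, opts = {}) {
  return { ...opts, projection }
}

const projection = buildProjection({ name: 1, age: 1 })
console.log(withProjection(projection, { limit: 10 }))
// { limit: 10, projection: { name: 1, age: 1 } }
```

Because the projection is built at definition time rather than per call, the per-query cost is a single object spread.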

### Model parity — 9/9 PASS vs Mongoose (`__tests__/model/model.test.js`)

find / find+filter / findOne / findById / sort / limit / **projection pushdown
(non-schema field excluded)** / empty→[] / no-match→null. Compared schema-field
values + counts on identical data (mongoose `versionKey:false`, `.lean()`).

### Perf: Model vs Mongoose, 50k docs (6 fields), 10 iters (`bench/model.js`)

| target | mean (ms) |
|---|---|
| mongoose (hydrated) | 1228.1 |
| mongoose (.lean) | 607.2 |
| **rumongo Model** | **244.6** |

→ **5.0× vs hydrated Mongoose**, 2.5× vs `.lean()`. (Eager, all 6 fields read.
With `findLazy` + few-field reads the multiple is higher — see Phase 4: 7.3× vs
the raw official driver when reading 2 of 33 fields.)

MIGRATION.md written (Mongoose → rumongo mapping + behavior differences).

---

## 2026-06-16 — Phase 6: consolidated preset benchmark suite

`bench/suite.js`: projection presets (few=4, small=9, medium=15, large=35,
full=45 fields) over a 45-field doc. Deterministic (data = f(index)), warmup +
6 iters, mean ± sd. N=30k. Run against local MongoDB (network ~0 isolates
client-side cost — the mock wire-server from the plan was skipped: making both
the official Node driver and the Rust `mongodb` crate accept a hand-rolled
handshake is large and brittle, and localhost already removes network variance).
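
The summary statistics used in this entry are simple to reproduce. The sketch below assumes the sample standard deviation (n − 1 denominator), since the suite's exact choice is not stated here; `summarize` and `speedup` are hypothetical helpers, not functions from `bench/suite.js`.

```js
// Mean and sample standard deviation of per-iteration timings (ms).
function summarize(samples) {
  const n = samples.length
  const mean = samples.reduce((sum, x) => sum + x, 0) / n
  const variance = n > 1
    ? samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1)
    : 0
  return { mean, sd: Math.sqrt(variance) }
}

// Speedup is the ratio of mean times: baseline / candidate.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs
}

const { mean, sd } = summarize([2, 4, 4, 4, 5, 5, 7, 9])
console.log(mean, sd.toFixed(2))          // 5 2.14
console.log(speedup(649, 178).toFixed(2)) // 3.65
```

The last line uses the means of the `few` preset from table A, 649 ms for the official driver and 178 ms for rumongo, and reproduces the 3.65× shown there.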

### A) Driver find — official Node driver vs rumongo (eager)

| preset | fields | official (ms) | rumongo (ms) | speedup |
|---|---|---|---|---|
| few | 4 | 649±95 | 178±17 | **3.65×** |
| small | 9 | 792±62 | 304±16 | 2.61× |
| medium | 15 | 687±49 | 418±54 | 1.64× |
| large | 35 | 1532±132 | 841±61 | 1.82× |
| full | 45 | 2032±135 | 1031±59 | 1.97× |

### B) ODM — mongoose `.lean()` vs rumongo Model

| preset | fields | mongoose (ms) | Model (ms) | speedup |
|---|---|---|---|---|
| few | 4 | 477±53 | 177±18 | 2.69× |
| small | 9 | 559±35 | 284±31 | 1.97× |
| medium | 15 | 680±68 | 405±35 | 1.68× |
| large | 35 | 1455±24 | 850±65 | 1.71× |
| full | 45 | 2041±167 | 1031±98 | 1.98× |

### C) Event-loop jitter (full preset, single query)

official `maxJitter=149.2ms` · rumongo `maxJitter=13.8ms` (~10× lower).

### Verdict vs plan targets (honest)

- **≥2× over official find:** met for few/small (2.61–3.65×) and all but met for
  full (1.97×); medium/large 1.64–1.82× (just under 2× — projection cost
  dominates at mid widths).
- **≥15× over Mongoose:** NOT met by eager find. vs `.lean()` it's 1.7–2.7×; vs
  hydrated Mongoose ~5× (Phase 5). No run logged here reaches 15×; the closest
  measurement is lazy + narrow field reads (Phase 4: 7.3× vs the raw official
  driver reading 2 of 33 fields), so any larger multiple is a property of the
  access pattern, not of eager full-doc reads.
- **Near-zero jitter:** eager full read is 13.8ms (10× better than official, not
  zero — JSON.parse remains). Near-zero needs lazy/worker paths (Phase 4b/4c).

Bottom line: 1.6–3.7× faster reads than the official driver, ~2× vs Mongoose
`.lean()` / ~5× vs hydrated, 10× lower jitter — consistently, across projection
sizes. The headline 15–20× is not demonstrated by these runs; if it is reachable,
it will be under lazy/narrow-read or worker-offload patterns, not eager
full-document reads.

---

## Template for future entries

```
## YYYY-MM-DD — Phase N: <title>

Implementation: <what changed>

### <scenario>, <dataset>, <iters> (`bench/<script>.js`)
| metric | official | rust | result |
|---|---|---|---|
| ... | | | |

Interpretation: <why the numbers look like this; what's next>
```
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Piyush Bhangale

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/MIGRATION.md
ADDED
@@ -0,0 +1,69 @@
# Migrating from Mongoose to rumongo

rumongo is a **read-path** replacement for Mongoose: faster reads via
Rust-native BSON parsing, off-thread, with optional lazy field access. It does
**not** cover writes, hooks, virtuals, populate, or validation — keep Mongoose
(or the official driver) for those.

## Model definition

**Mongoose**
```js
const User = mongoose.model('User', new mongoose.Schema({ name: String, age: Number }))
```

**rumongo**
```js
import { MongoClient, Model } from 'rumongo'
const client = await MongoClient.connect(uri)
const coll = client.collection('mydb', 'users')
const User = Model.define(coll, { name: 1, age: 1 }) // field list = projection
```

The schema field list becomes a cached projection — MongoDB only sends those
fields (projection pushdown), so less data on the wire and less to parse.

## Read methods

| Mongoose | rumongo | Notes |
|---|---|---|
| `User.find(filter)` | `User.find(filter)` | returns plain objects |
| `User.find(f).sort(s).limit(n)` | `User.find(f, { sort: s, limit: n })` | options object, not chained |
| `User.findOne(filter)` | `User.findOne(filter)` | `null` if no match |
| `User.findById(id)` | `User.findById(idHexString)` | pass the 24-char hex string |
| `.lean()` | (always) | rumongo always returns plain objects |
| `.select('name age')` | (automatic) | the schema fields are the projection |

## Behavior differences

- **No Mongoose Documents.** Results are plain objects (like `.lean()`). No
  `.save()`, no getters/setters, no virtuals.
- **`_id` is a hex string**, not an `ObjectId` instance. Compare with
  `id === doc._id` (string), or convert.
- **Dates** come back as JS `Date` (via `findLazy`/`Model`) — same as Mongoose
  `.lean()`.
- **No `__v`** version key (not in your schema → not projected).
- **Filters with BSON types** use Extended JSON: an ObjectId filter is
  `{ _id: { $oid: '...' } }`. `findById` does this for you.
- **Writes / hooks / populate / validation: not supported.** Use Mongoose or the
  official driver for the write path; use rumongo for hot read paths.
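
For hand-written filters, the Extended JSON form is easy to build. `oidFilter` and `sameId` are hypothetical helpers, not part of rumongo; the first validates the 24-character hex string and wraps it in `$oid`, the second compares a returned hex-string `_id` against an id you already hold.

```js
// Wrap a 24-char hex id in Extended JSON so it is sent as an ObjectId
// rather than as a plain string.
function oidFilter(idHex) {
  if (typeof idHex !== 'string' || !/^[0-9a-fA-F]{24}$/.test(idHex)) {
    throw new TypeError(`expected a 24-char hex string, got ${JSON.stringify(idHex)}`)
  }
  return { _id: { $oid: idHex } }
}

// Results carry `_id` as a hex string, so equality is a string comparison.
// Lowercasing both sides makes the check tolerant of uppercase input.
function sameId(doc, idHex) {
  return doc._id.toLowerCase() === idHex.toLowerCase()
}

const filter = oidFilter('507f1f77bcf86cd799439011')
console.log(JSON.stringify(filter)) // {"_id":{"$oid":"507f1f77bcf86cd799439011"}}
console.log(sameId({ _id: '507f1f77bcf86cd799439011' }, '507F1F77BCF86CD799439011')) // true
```

Validating up front turns a malformed id into an immediate `TypeError` on the JS side instead of a filter that silently matches nothing.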

## Advanced: keep the event loop free under load

For heavy concurrent read+aggregate work, run the reduction inside a worker so the
main loop stays responsive (see README / `worker/pool.js`):

```js
import { WorkerPool } from 'rumongo'
const pool = await WorkerPool.create({ uri, size: 6 })
const { acc } = await pool.reduce('mydb', 'users', { active: true }, {}, (a, d) => a + d.age, 0)
```

## Three read APIs (pick by shape)

- `collection.find(filter, opts)` — eager, plain objects. Small/medium results.
- `collection.findLazy(filter, opts)` — Proxy docs; fields parse on access. Wide
  docs where you read few fields.
- `collection.findCursor(filter, opts)` → `nextBatch()` — streaming, bounded
  memory. Large result sets.

On top of these three, `Model.find/findOne/findById` gives the Mongoose-style
interface with automatic projection.