exwiw 0.5.2 → 0.6.0

@@ -0,0 +1,70 @@
1
+ # Plan: Back out MongoDB fork/cursor parallelism → optional Extended-JSON C extension
2
+
3
+ ## Context (why)
4
+
5
+ This branch (`gnhf/rt-rails-mongodb-dum-ed518c`) shipped a large MongoDB-dump perf effort across 18 iterations. The memory wins (iter 2–5: streaming result set + chunked output + precompiled `MaskPlan`) are clean, byte-identical, no-flag defaults and stay. But the CPU/throughput half (iter 6–18) grew into heavy multi-process machinery: two CLI flags (`--parallel-workers`, `--cursor-parallel`), four components (`ParallelSerializer`, `MongoIdPartitioner`, `PropagationCapture`, `ForkedPartWriter`), adapter `write_bulk_insert`/`write_inserts` seams, CLI→Runner→Adapter threading, fork orchestration, Marshal IPC, `_id`-range partitioning, distributed `@state` merge, and a sorted-output caveat, all to buy ~1.1–1.4× (serialize-fork) / ~3.4–5.5× (cursor-parallel).
6
+
7
+ The maintainer's decision: **this is over-engineered.** Preserve the findings as `docs/optimization-notes.md`, **remove the parallelism**, and address the CPU hotspot with a much simpler lever the earlier iterations kept pointing at — a **C extension** for the dominant `as_extended_json(mode: :relaxed) + JSON.generate` cost.
8
+
9
+ **Honest scope of the win:** a C extension speeds only the per-document Extended-JSON *serialization* (~82% of per-doc serialize cost). It cannot touch the Mongo cursor's BSON→Ruby *decode* (~40% of total wall time, inside the driver) — that was cursor-parallel's job and is being removed. So end-to-end is bounded ~2.5× (the same Amdahl ceiling `--parallel-workers` hit), not cursor-parallel's 3.4–5.5×. The trade is deliberate: far simpler code + a no-flag, no-fork, single-process CPU win, for a smaller peak speedup. Memory behavior is unchanged (streaming stays the default).
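The ~2.5× bound above follows from Amdahl's law. A minimal sketch of the arithmetic, assuming the ~40% decode share quoted above is the only part that stays serial; `amdahl_speedup` and the 4× figure are illustrative names and values, not measurements from this branch:

```ruby
# Amdahl's law: if `serial_fraction` of wall time cannot be sped up and the
# remainder gets `speedup_of_rest` times faster, this is the end-to-end speedup.
def amdahl_speedup(serial_fraction, speedup_of_rest)
  rest_fraction = 1.0 - serial_fraction
  1.0 / (serial_fraction + rest_fraction / speedup_of_rest)
end

decode_share = 0.40 # BSON→Ruby decode share of wall time, as quoted above

# Ceiling: everything outside the decode becomes free.
puts format("ceiling: %.2fx", amdahl_speedup(decode_share, Float::INFINITY))
# Hypothetical case: everything outside the decode gets 4x faster.
puts format("rest 4x faster: %.2fx", amdahl_speedup(decode_share, 4.0))
```

This prints `ceiling: 2.50x` and `rest 4x faster: 1.82x`. A C extension only accelerates part of the non-decode work, so the realistic figure sits below the ceiling.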
10
+
11
+ ## Part 1 — Remove the fork/cursor parallelism (keep streaming + MaskPlan)
12
+
13
+ Verified: `git diff b389204..HEAD` on the core lib files is **entirely** parallelism (`b389204` = iter-5 "MaskPlan" commit). So the iter-5 versions are exactly the "keep streaming, drop parallel" baseline.
14
+
15
+ **Restore to `b389204` (clean streaming baseline):**
16
+ - `lib/exwiw/runner.rb` — back to the inline `each_slice` chunk loop (`file.print(adapter.to_bulk_insert(chunk))`), no `write_inserts`/`parallel_workers`/`cursor_parallel`.
17
+ - `lib/exwiw/adapter.rb` — `Base#initialize(connection_config, logger)` (drop the two kwargs + ivars), `self.build(connection_config, logger)`; drop the `write_bulk_insert`/`write_inserts` seams (they exist only for parallelism). Keep `default_bulk_insert_chunk_size` (streaming).
18
+ - `lib/exwiw/adapter/mongodb_adapter.rb` — iter-5 form: `StreamingResult` without `query`/`keys` readers; `db` with inline client construction (drop `build_client`); no `write_bulk_insert`/`write_inserts`/`cursor_parallel`/`parallel_workers`/partitioner/capture. Keep `StreamingResult`, `MaskPlan`, chunking. (Then patch `serialize_document` in Part 3.)
19
+ - `lib/exwiw/cli.rb` — drop `--parallel-workers`/`--cursor-parallel` flags, ivars, `validate_parallel_workers!`/`validate_cursor_parallel!`, and the Runner kwargs.
20
+ - `lib/exwiw.rb` — drop the four parallel `require_relative` lines (Part 3 adds the `ext_json` require).
21
+
22
+ **Delete (components + specs + probes):**
23
+ - `lib/exwiw/{parallel_serializer,forked_part_writer,mongo_id_partitioner,propagation_capture}.rb`
24
+ - `spec/{parallel_serializer,forked_part_writer,mongo_id_partitioner,propagation_capture}_spec.rb`
25
+ - `script/bench_mongodb_parallel_probe.rb`, `script/bench_mongodb_cursor_parallel_probe.rb`
26
+
27
+ **Restore to `b389204`:** `spec/adapter_spec.rb`, `spec/cli_spec.rb`, `spec/adapter/mongodb_adapter_spec.rb`, `script/bench_mongodb_dump.rb` (all post-iter-5 changes there are parallel-only).
28
+
29
+ **Edit, don't restore:** `README.md`, `CHANGELOG.md` — remove the `--parallel-workers` / `--cursor-parallel` / `EXWIW_MONGODB_*` sections; the `[Unreleased]` entry becomes "streaming/chunked MongoDB dump by default + optional Extended-JSON C extension", pointing to `docs/optimization-notes.md`.
30
+
31
+ ## Part 2 — `docs/optimization-notes.md`
32
+
33
+ Distill the 18-iteration log (`.gnhf/runs/.../notes.md`) into a durable doc: the two hotspots (result-set memory; `as_extended_json` CPU), what shipped by default (streaming + chunking + `MaskPlan`), and the **explored-and-removed** parallelism — fork serialize (~1.1–1.4×), cursor-parallel (~3.4–5.5× but heavy: IPC, `_id` partitioning, distributed `@state`, sorted output) — with the Amdahl reasoning (serial cursor decode floor) and *why* it was removed in favor of the C extension. This is the knowledge-preservation the maintainer asked for.
34
+
35
+ ## Part 3 — Optional Extended-JSON C extension — DOCUMENT ONLY (not implemented now)
36
+
37
+ **Revised scope (per maintainer):** Part 3 is NOT implemented in this pass. Instead, write the design below into `docs/optimize-mongodb-export-with-native-ext.md` as a future-work / design doc. Only Part 1 and Part 2 are executed now.
38
+
39
+ Design to capture: replaces `JSON.generate(doc.as_extended_json(mode: :relaxed))` with one native tree-walk that emits the Relaxed Extended JSON line directly — no intermediate transformed-Hash tree, no second JSON pass.
40
+
41
+ **New files**
42
+ - `ext/exwiw/ext_json/extconf.rb` — `mkmf` (stdlib), `create_makefile("exwiw/ext_json_native")`.
43
+ - `ext/exwiw/ext_json/ext_json.c` — defines `Exwiw::ExtJson.encode_native(doc) -> String` (one JSONL line, no trailing `\n`). Recursive emitter into a growth buffer.
44
+ - `lib/exwiw/ext_json.rb` — shim: `require "exwiw/ext_json_native"` (distinct name avoids self-collision); on `LoadError`, define a pure-Ruby `encode`. Exposes one stable `Exwiw::ExtJson.encode(doc)`. Fallback is **byte-for-byte today's code**:
45
+ ```ruby
46
+ JSON.generate(doc.respond_to?(:as_extended_json) ? doc.as_extended_json(mode: :relaxed) : doc)
47
+ ```
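A minimal sketch of the load-or-fall-back shim described above. The module name `ExtJsonShimSketch` and the constant `NATIVE_AVAILABLE` are hypothetical; `Exwiw::ExtJson.encode_native` is the entry point named in this plan and is only called when the compiled extension actually loads:

```ruby
require "json"

module ExtJsonShimSketch
  begin
    require "exwiw/ext_json_native" # compiled extension, present only if it was built
    NATIVE_AVAILABLE = true
  rescue LoadError
    NATIVE_AVAILABLE = false
  end

  # Pure-Ruby path: the fallback line quoted above.
  def self.encode_pure(doc)
    JSON.generate(doc.respond_to?(:as_extended_json) ? doc.as_extended_json(mode: :relaxed) : doc)
  end

  # One stable entry point regardless of which path is available.
  def self.encode(doc)
    NATIVE_AVAILABLE ? Exwiw::ExtJson.encode_native(doc) : encode_pure(doc)
  end
end

puts ExtJsonShimSketch.encode({ "name" => "a\tb", "n" => 1, "tags" => ["x", nil] })
```

Without `bson` loaded a plain Hash does not respond to `as_extended_json`, so the sketch goes straight to `JSON.generate`. Callers never need to know which path ran.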
48
+
49
+ **Native fast-path vs delegate (byte-identity strategy, from the Plan agent's findings)**
50
+ - Native in C: `Hash` (insertion order via `rb_hash_foreach`; String keys), `Array`, `String` (escape only `\b\t\n\f\r\"\\`; lowercase `\u00xx` for other <0x20; leave `/`, DEL, U+2028/9, non-ASCII raw), `Integer` within int64, `true`/`false`/`nil`, and `BSON::ObjectId` → `{"$oid":"<24 hex>"}` (hex via `to_s`).
51
+ - **Delegate to Ruby** (call back to a fallback helper returning the JSON fragment for that value): `Float` (`Float#to_s` ≠ `JSON.generate` for sci-notation, e.g. `1e20`), `Time` (variable fractional digits + the years-[1970,9999] `$numberLong` boundary — highest risk), out-of-int64 `Integer` (must preserve the existing `RangeError`), and any unrecognized class (Decimal128, Binary, Symbol, Regexp, Date, …). This is provably byte-identical because `Hash/Array#as_extended_json` are non-transforming structural recursion, so `JSON.generate(v.as_extended_json(mode: :relaxed))` on any sub-value matches the whole-tree output. Time/Float are candidates for later native promotion if the benchmark justifies the added risk.
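The splice argument can be illustrated in plain Ruby. This is a sketch under stated assumptions, not the extension's code: `encode_walk` stands in for the native tree-walk, and `encode_fragment` here is a simplified stand-in for the Ruby-side fallback helper (it skips the `as_extended_json` step because no `bson` types are involved):

```ruby
require "json"

# Simplified stand-in for the Ruby fallback: returns the JSON fragment for one value.
def encode_fragment(value)
  JSON.generate(value)
end

# Walks Hash/Array structure itself and splices delegated fragments for every
# leaf it does not format directly.
def encode_walk(value, out = +"")
  case value
  when Hash
    out << "{"
    value.each_with_index do |(key, child), i|
      out << "," if i > 0
      out << JSON.generate(key.to_s) << ":"
      encode_walk(child, out)
    end
    out << "}"
  when Array
    out << "["
    value.each_with_index do |child, i|
      out << "," if i > 0
      encode_walk(child, out)
    end
    out << "]"
  when nil   then out << "null"
  when true  then out << "true"
  when false then out << "false"
  else out << encode_fragment(value) # Float, String, Integer, ... delegated
  end
  out
end

doc = { "id" => 7, "score" => 1e20, "tags" => ["a/b", nil, true], "meta" => { "note" => "line\nbreak" } }
puts encode_walk(doc) == JSON.generate(doc) # prints true
```

Because the containers add only brackets, commas, and keys, any leaf can be handed to the fallback without changing the surrounding bytes.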
52
+
53
+ **Packaging / wiring**
54
+ - `exwiw.gemspec` — `spec.extensions = ["ext/exwiw/ext_json/extconf.rb"]` (auto-compiles on `gem install`; fallback covers platforms that can't).
55
+ - `Rakefile` — `Rake::ExtensionTask.new("ext_json_native")`; make `spec` depend on `compile`.
56
+ - Add `rake-compiler` as a dev dep (Gemfile/gemspec). `mkmf` is stdlib.
57
+ - `.gitignore` — ignore built artifacts (`lib/exwiw/*.bundle`, `lib/exwiw/*.so`, `ext/**/*.o`, `ext/**/Makefile`); commit only `ext/` sources (gemspec ships via `git ls-files`).
58
+ - `lib/exwiw.rb` — `require_relative "exwiw/ext_json"`.
59
+ - `lib/exwiw/adapter/mongodb_adapter.rb` — `serialize_document`/`to_bulk_insert` inner call becomes `ExtJson.encode(doc)` (after `apply_mask_plan!`); remove the now-unused private `#extended_json`.
60
+
61
+ ## Verification (Part 1 + Part 2 scope)
62
+
63
+ - **`bundle exec rspec`** — full suite green after the revert. The parallel specs are deleted; the restored specs match iter-5. (`spec/insert_output_snapshot_spec.rb` and other mongodb-touching specs need live mongo on 27017 → sandbox disabled; the rest run normally.)
64
+ - **`spec/insert_output_snapshot_spec.rb`** (mongodb fixtures, live mongo) — the byte-exact guard; output bytes must be unchanged by the revert (streaming default is byte-identical to iter-5).
65
+ - `git grep -nE 'parallel_workers|cursor_parallel|ParallelSerializer|MongoIdPartitioner|PropagationCapture|ForkedPartWriter'` returns nothing in `lib/`, `spec/`, `exe/`, `README.md` after the revert (confirms full removal).
66
+ - `docs/optimize-mongodb-export-with-native-ext.md` and `docs/optimization-notes.md` exist and read coherently.
67
+
68
+ ## Notes
69
+ - No `git rebase`/`push -f` (history may stay messy; backing out via forward edits + deletions, not history rewrite).
70
+ - After implementation, offer to commit the plan via the remember-plan skill.
@@ -0,0 +1,278 @@
1
+ # SQL dump performance: investigation notes
2
+
3
+ Companion to [`optimization-notes.md`](./optimization-notes.md) (which covers the
4
+ MongoDB adapter). This records the speed/memory bottlenecks of the **SQL**
5
+ adapters' dump path (mysql / postgresql / sqlite), measured against a baseline,
6
+ so a future iteration can address them. It began as the measurement and
7
+ bottleneck-analysis step; the fixes made since are recorded in the Resolution sections below.
8
+
9
+ The reproducible harness is `script/bench_sql_dump.rb`. It seeds a synthetic
10
+ table and measures the two Runner phases per table; it also measures the
11
+ serialization step with no DB at all. The correctness anchor for any future fix
12
+ is `spec/insert_output_snapshot_spec.rb` — the **byte-exact** snapshot of dump
13
+ output.
14
+
15
+ > **Status:** **both hotspots are fixed for all three SQL adapters.** Hotspot #2
16
+ > (the whole-table INSERT string) — see
17
+ > [Resolution #2](#resolution-hotspot-2-streamed-single-insert). Hotspot #1
18
+ > (full result-set materialization in `execute`) — postgresql, mysql, and sqlite
19
+ > all stream the fetch now; see
20
+ > [Resolution #1](#resolution-hotspot-1-streaming-fetch-postgresql--mysql).
21
+
22
+ ## The two hotspots (same shape as MongoDB had pre-optimization)
23
+
24
+ The Runner drives, per table:
25
+
26
+ 1. **execute** — the adapter materializes the **entire** result set into a Ruby
27
+ array-of-arrays before anything is written:
28
+ - postgresql: `connection.exec(sql).values`
29
+ - mysql: `res.to_a.map { |row| row.map { stringify } }` (also re-allocates
30
+ every value as a normalized String)
31
+ - sqlite: `connection.execute(sql)`
32
+
33
+ Memory here is proportional to the **table size**, independent of any chunking
34
+ downstream.
35
+
36
+ 2. **to_bulk_insert** — SQL adapters set **no** `default_bulk_insert_chunk_size`
37
+ (it is `nil`), so the Runner treats the whole table as one chunk and
38
+ `to_bulk_insert` builds the **entire** `INSERT INTO ... VALUES (...),(...);`
39
+ as one giant String — first an `Array` of N per-row tuple strings, then the
40
+ joined result — held simultaneously with the result set from step 1.
41
+
42
+ ## Baseline (200,000 rows, 8 columns, ~41.2 MB output)
43
+
44
+ Measured via `bench_sql_dump.rb` (sandbox disabled — it needs `ps` for RSS and
45
+ localhost for the live DB). RSS is sticky across in-process phases, so read the
46
+ peaks as upper bounds and the *deltas* as the signal.
47
+
48
+ | adapter | execute peak (Δ) | + whole-string write peak | result-set objs |
49
+ |---------|------------------|---------------------------|-----------------|
50
+ | postgresql | 471.7 MB (+65) | 494.0 MB | 1.8M |
51
+ | mysql | 413.4 MB (+52) | 554.9 MB | 2.4M |
52
+ | sqlite | 434.5 MB | 526.6 MB | 1.4M |
53
+
54
+ For a **41 MB** dump the process peaks near **0.5 GB** — ~12× the output size —
55
+ because the full result set *and* the whole-table INSERT string are resident at
56
+ once. Both costs scale linearly with the table, so a large table OOMs the same
57
+ way the embed-heavy MongoDB collection did.
58
+
59
+ Per-value serialization is cheap and not the bottleneck: `escape_value` is
60
+ ~0.4–1.3 µs/op and a one-row `to_bulk_insert` ~5.5 µs/op. The cost is **memory**
61
+ (holding everything at once), not CPU.
62
+
63
+ ## The byte-identity catch (why this differs from MongoDB)
64
+
65
+ MongoDB's fix was a `default_bulk_insert_chunk_size`, which is byte-identical
66
+ because JSONL chunks join with the same `"\n"` `to_bulk_insert` already inserts
67
+ between docs. **That does not transfer to SQL.** Each `to_bulk_insert` call wraps
68
+ its rows in its own `INSERT INTO ... VALUES ...;` statement, so a chunk size > 0
69
+ turns one INSERT into *many* INSERT statements — semantically equivalent on
70
+ re-import, but a **different byte stream** that breaks the snapshot guard.
71
+
72
+ The byte-identical lever for SQL is instead to **stream the tuples into a single
73
+ INSERT statement**: emit the adapter's exact `INSERT ... VALUES\n` header once,
74
+ then write each row's `(...)` tuple (reusing the adapter's own `escape_value`)
75
+ separated by `",\n"`, then `;`. The bench implements this as `write_streamed`
76
+ and asserts byte-for-byte identity with the whole-string path — confirmed
77
+ **true** for all three adapters (the header must reuse the adapter's quoting,
78
+ e.g. MySQL's backticks, or it diverges).
79
+
80
+ `write_streamed` cuts the to_bulk_insert peak by ~110–120 MB, **but** naive
81
+ per-row `IO#print` makes it ~2–2.4× slower than building one string and writing
82
+ it once. So the production fix wants **chunk-buffered streaming**: build an
83
+ N-row substring in memory, write it, repeat — managing the `",\n"` separator
84
+ across chunk boundaries — to bound memory *without* the per-row IO penalty,
85
+ while still emitting a single INSERT statement (byte-identical).
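A minimal sketch of chunk-buffered streaming and why it stays byte-identical while per-chunk statements do not. The table name, the `escape_value`/`tuple` helpers, and the `", "` separator inside a tuple are hypothetical stand-ins, not the adapters' real quoting:

```ruby
require "stringio"

HEADER = "INSERT INTO users (id, name) VALUES\n"

# Hypothetical quoting, for illustration only.
def escape_value(value)
  value.nil? ? "NULL" : "'#{value.to_s.gsub("'", "''")}'"
end

def tuple(row)
  "(#{row.map { |v| escape_value(v) }.join(', ')})"
end

# Whole-table build: header, all tuples joined, terminator, as one String.
def whole_string(rows)
  HEADER + rows.map { |row| tuple(row) }.join(",\n") + ";"
end

# Chunk-buffered streaming: header once, then N-row slices, carrying the
# ",\n" separator across slice boundaries, then the terminator.
def write_streamed(io, rows, flush_rows)
  io.print(HEADER)
  first = true
  rows.each_slice(flush_rows) do |slice|
    io.print(",\n") unless first
    first = false
    io.print(slice.map { |row| tuple(row) }.join(",\n"))
  end
  io.print(";")
end

rows = (1..7).map { |i| [i, i.even? ? nil : "o'neil #{i}"] }
io = StringIO.new
write_streamed(io, rows, 3)
puts io.string == whole_string(rows) # prints true

# Calling the whole-string builder once per chunk yields several statements instead.
per_chunk = rows.each_slice(3).map { |slice| whole_string(slice) }.join("\n")
puts per_chunk == whole_string(rows) # prints false
```

Only the slice currently being joined is held in memory, so the write buffer is bounded by `flush_rows` rather than by the table.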
86
+
87
+ ## Resolution: hotspot #2 (streamed single INSERT)
88
+
89
+ Implemented. The Runner no longer builds the per-table INSERT as one giant
90
+ String; it delegates writing to a new adapter seam `Adapter#write_inserts(io,
91
+ results, table, chunk_size)`:
92
+
93
+ - `Adapter::Base#write_inserts` keeps the old behavior (write `to_bulk_insert`
94
+ per chunk, joined by `"\n"`), so MongoDB and any future adapter are unchanged.
95
+ - The SQL adapters mix in `Adapter::SqlBulkInsert`, which **streams** the single
96
+ `INSERT INTO ... VALUES <tuples>;` statement to the file `STREAM_FLUSH_ROWS`
97
+ (2,000) tuples at a time. Each flush is one fast `map`+`join` (the same path
98
+ `to_bulk_insert` uses), and the `",\n"` printed between slices reproduces the
99
+ exact separator between tuples — so the bytes are **identical** to the
100
+ whole-table build. The three duplicate `to_bulk_insert` methods collapsed into
101
+ the shared module; each adapter now only supplies `insert_header` (its
102
+ identifier quoting) and `escape_value`.
103
+
104
+ Verified byte-for-byte by `spec/insert_output_snapshot_spec.rb` (live DB, all
105
+ three adapters) and a flush-boundary sanity check; full suite green.
106
+
107
+ Measured (`bench_sql_dump.rb`, 200k rows / ~41.2 MB output):
108
+
109
+ | adapter | whole-string peak | streamed peak | Δ peak |
110
+ |---------|-------------------|---------------|--------|
111
+ | postgresql | 367 MB | 226 MB | −141 MB |
112
+ | mysql | 325 MB | 214 MB | −111 MB |
113
+ | sqlite | 353 MB | 207 MB | −146 MB |
114
+
115
+ So hotspot #2's contribution (~110–145 MB on a 41 MB dump — the whole INSERT
116
+ string *plus* the transient 200k-tuple `Array` and its join) is gone; the write
117
+ buffer is now bounded to ~2,000 tuples regardless of table size.
118
+
119
+ **Speed:** streaming is *not* slower than the whole-string build. Measured in
120
+ isolation (post-`GC.start`, no sampler thread) the streamed write at
121
+ `flush_rows=2000` was ~0.67 s vs ~0.83 s for the one giant `map`+`join` — small
122
+ chunks stay in cache and avoid repeatedly growing/copying a 41 MB String. The
123
+ ~1.3× "slowdown" the in-process bench shows is an artifact of its background RSS
124
+ sampler thread (`ps` every 10 ms) plus run-ordering (streamed runs first, cold),
125
+ not the algorithm. The earlier worry about a per-row `IO#print` penalty only
126
+ applied to the naive row-at-a-time prototype, which `flush_rows` slicing avoids.
127
+
128
+ ## Resolution: hotspot #1 (streaming fetch, postgresql + mysql)
129
+
130
+ Implemented for **postgresql**. `PostgresqlAdapter#execute` no longer returns
131
+ `connection.exec(sql).values` (the whole result set as a Ruby array-of-arrays);
132
+ it returns a lazy `PostgresqlAdapter::StreamingResult` that pulls rows off the
133
+ wire one at a time via libpq **single-row mode** (`send_query` +
134
+ `set_single_row_mode` + a `get_result` loop, each yielding one row's
135
+ text-format `Array<String|nil>`). The Runner drives it exactly like the old
136
+ array — `#size` then a single `each_slice` pass — so nothing else changed and
137
+ the output is byte-identical (verified by `insert_output_snapshot_spec`, both
138
+ the `insert` and `copy` pg scenarios).
139
+
140
+ It mirrors `MongodbAdapter::StreamingResult`, with two SQL-specific points:
141
+
142
+ - **`#size`** can't be answered cheaply from the cursor, so it runs a separate
143
+ `SELECT COUNT(*) FROM (<query>) AS exwiw_count_src` (comment-prefixed, like
144
+ the data query). Postgres prunes the wrapped subquery's unused projection, so
145
+ the COUNT transfers no row data — but it does re-run the query plan. This is
146
+ the deliberate cost of keeping the Runner contract (`#size` before iteration,
147
+ used to skip empty tables and log the count) unchanged, so MongoDB and the
148
+ other SQL adapters are untouched. (MongoDB's `count_documents` is an
149
+ index-only walk and cheaper; the SQL COUNT is the analogue.)
150
+ - the streaming pass ties up the connection until fully drained. The Runner
151
+ always drains it (`write_inserts`) before issuing `post_insert_sql` / DELETE
152
+ on the same connection, so the ordering holds. `StreamingResult#each` also
153
+ drains any queued results if iteration is abandoned mid-stream (a SQL error
154
+ surfaced by `#check`, or the consumer raising), so the connection stays
155
+ usable.
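The contract in the two points above (a `#size` answered by a separate count, one forward pass, and a drain when iteration is abandoned) can be sketched without a database. `FakeCursor`, `StreamingResultSketch`, and the two lambdas are hypothetical stand-ins for the driver cursor, the adapter's result class, the COUNT query, and the single-row fetch:

```ruby
# Hypothetical driver cursor: hands out one row per call, nil when exhausted.
class FakeCursor
  attr_reader :fetched

  def initialize(rows)
    @rows = rows.dup
    @fetched = 0
  end

  def next_row
    row = @rows.shift
    @fetched += 1 unless row.nil?
    row
  end
end

class StreamingResultSketch
  include Enumerable

  def initialize(count_query, open_cursor)
    @count_query = count_query # callable standing in for SELECT COUNT(*)
    @open_cursor = open_cursor # callable returning a row-at-a-time cursor
  end

  # Answered once, before iteration, without touching the data cursor.
  def size
    @size ||= @count_query.call
  end

  def each
    return enum_for(:each) unless block_given?

    cursor = @open_cursor.call
    begin
      while (row = cursor.next_row)
        yield row
      end
    ensure
      # Drain whatever is left so the connection is usable for the next query.
      loop { break if cursor.next_row.nil? }
    end
  end
end

rows = [[1, "a"], [2, nil], [3, "c"]]
cursor = FakeCursor.new(rows)
result = StreamingResultSketch.new(-> { rows.size }, -> { cursor })

puts result.size
result.each_slice(2) { |slice| puts slice.inspect }
```

Including `Enumerable` gives the Runner its `each_slice` pass for free, and the `ensure` block is what keeps a raised error from leaving rows queued on the connection.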
156
+
157
+ Measured in **isolated fresh processes** (one per path, so the peak is not
158
+ polluted by other phases — RSS is sticky), 200k rows / ~41.2 MB output:
159
+
160
+ | pg fetch path | peak RSS | Δ over baseline |
161
+ |---------------|----------|-----------------|
162
+ | materialize (`exec(sql).values`) + streamed write (OLD) | ~360 MB | ~320 MB |
163
+ | **single-row stream + streamed write (NEW)** | **~48 MB** | **~12 MB** |
164
+
165
+ So the full result set (~320 MB of Ruby strings/arrays for 200k×8) is no longer
166
+ resident: peak drops ~**310 MB (~87%)** to ~48 MB, and the delta over baseline
167
+ (~12 MB) is now *below* the 41 MB output size, because rows are pulled one at a time and the write buffer is bounded to
168
+ ~2,000 tuples (hotspot #2's fix). Speed is unchanged (~1.8 s both paths on the
169
+ in-process bench; the COUNT is cheap on an indexed seed). The reproducible A/B
170
+ is in `bench_sql_dump.rb` Part B (`execute(stream)` vs `execute(materialize)`,
171
+ with a byte-identity assertion).
172
+
173
+ Implemented for **mysql** too. `MysqlAdapter#execute` now returns a
174
+ `MysqlAdapter::StreamingResult` (an Enumerable mirroring the pg one) instead of
175
+ `connection.query(sql).rows`. The new `MysqlClient#stream_rows` pulls rows off
176
+ the wire one at a time via mysql2's server-side stream (`stream: true` +
177
+ `cache_rows: false`), yielding the same `Array<String|nil>` rows `#query`
178
+ buffered — so the generated INSERT is byte-identical (verified by
179
+ `insert_output_snapshot_spec`).
180
+
181
+ Two MySQL specifics differ from the pg path:
182
+
183
+ - **`#size`** is a separate `SELECT COUNT(*)` of the same query, but **not** a
184
+ subquery wrap. MySQL rejects a derived table with duplicate column names,
185
+ which a rails-managed `SELECT *` joined to another table produces
186
+ (`Duplicate column name 'id'`); Postgres tolerates it, MySQL does not. So
187
+ mysql replaces the projection with `COUNT(*)` instead
188
+ (`compile_ast(count_only: true)`) — exact because exwiw's extraction queries
189
+ have no DISTINCT/GROUP BY/LIMIT, so the count is independent of the projected
190
+ columns (confirmed against live data: `COUNT(*)` over the bare FROM/JOIN/WHERE
191
+ equals the streamed row count for both plain and `SELECT *`+join queries).
192
+ - **abandoned streams.** mysql2 requires a streamed result to be fully consumed
193
+ before the next query on the connection, or it raises "Commands out of sync".
194
+ `stream_rows` drains the remainder (re-entering `res.each`, which continues
195
+ from where it stopped) if the consumer block raises, so the connection stays
196
+ usable for the next table. `trilogy` has no streaming cursor
197
+ (no `QUERY_FLAGS_STREAMING`), so it buffers and yields — parity, no memory
198
+ win; trilogy is a test-only driver, production uses mysql2.
199
+
200
+ Measured in **isolated fresh processes** (one per path), 200k rows / ~40.7 MB
201
+ output:
202
+
203
+ | mysql fetch path | peak RSS | Δ over baseline |
204
+ |------------------|----------|-----------------|
205
+ | materialize (`query(sql).rows`) + streamed write (OLD) | ~340 MB | ~300 MB |
206
+ | **single-row stream + streamed write (NEW)** | **~50 MB** | **~10 MB** |
207
+
208
+ So peak drops ~**290 MB (~85%)**, now just above the 40.7 MB output — the same
209
+ shape as the pg result. Speed is unchanged-to-faster (the materialize path also
210
+ builds the whole array first). `bench_sql_dump.rb` Part B now shows the delta
211
+ for mysql too (it was equivalent before, when mysql still materialized).
212
+
213
+ Implemented for **sqlite** too, closing hotspot #1 for all three SQL adapters.
214
+ `SqliteAdapter#execute` no longer returns `connection.execute(sql)` (which
215
+ buffers the whole result into a Ruby array); it returns a
216
+ `SqliteAdapter::StreamingResult` (Enumerable, mirroring the pg/mysql ones) whose
217
+ `#each` walks the result one row at a time through SQLite's **statement cursor**
218
+ — `connection.prepare(data_sql)` then `Statement#each` (which maps to
219
+ `sqlite3_step`), closing the statement in an `ensure` so an abandoned mid-stream
220
+ iteration still releases the cursor. The rows are the same `Array` of
221
+ native-typed values `Database#execute` produced, so the generated INSERT is
222
+ byte-identical (verified by `insert_output_snapshot_spec` and a direct cursor
223
+ vs. `#execute` comparison).
224
+
225
+ SQLite specifics vs. the pg/mysql paths:
226
+
227
+ - **`#size`** runs a separate `SELECT COUNT(*)` of the same query with the
228
+ projection replaced by `COUNT(*)` (`compile_ast(count_only: true)`, the same
229
+ trick mysql uses) — exact because exwiw's extraction queries have no
230
+ DISTINCT/GROUP BY/LIMIT. SQLite would also tolerate a duplicate-column
231
+ subquery wrap (unlike mysql), but the `count_only` form is shared and avoids
232
+ the extra subquery.
233
+ - **no connection contention.** SQLite is an embedded, single-connection engine
234
+ that allows multiple active prepared statements at once, so the `#size` COUNT
235
+ and the data cursor don't fight over the connection the way the pg/mysql
236
+ single-row streams tie up the wire. No drain dance is needed; just close the
237
+ statement.
238
+
239
+ Measured in **isolated fresh processes** (one per path), 200k rows / ~40.5 MB
240
+ output:
241
+
242
+ | sqlite fetch path | peak RSS | Δ over baseline |
243
+ |-------------------|----------|-----------------|
244
+ | materialize (`Database#execute`) + streamed write (OLD) | ~298 MB | ~257 MB |
245
+ | **statement-cursor stream + streamed write (NEW)** | **~59 MB** | **~18 MB** |
246
+
247
+ So peak drops ~**240 MB (~80%)**, the same shape as pg/mysql, and it is
248
+ **faster** (~0.84 s vs ~1.68 s) — the materialize path pays to build the whole
249
+ Ruby array up front, the cursor does not. `bench_sql_dump.rb` Part B now shows a
250
+ real delta for sqlite too (it was equivalent before, when sqlite still
251
+ materialized).
252
+
253
+ ## Status: both hotspots closed for all three SQL adapters
254
+
255
+ 1. **Bounded-memory write** (hotspot #2) — done for mysql / postgresql / sqlite;
256
+ see [Resolution #2](#resolution-hotspot-2-streamed-single-insert).
257
+ 2. **Streaming result fetch** (hotspot #1) — done for postgresql (libpq
258
+ single-row mode), mysql (mysql2 `stream: true`), and sqlite (statement
259
+ cursor); see Resolution #1 above.
260
+
261
+ There is no remaining materialization hotspot in the SQL dump path: peak RSS is
262
+ now bounded (its delta over baseline is well below the output size) and independent of table size for every
263
+ SQL adapter, the same property the MongoDB streaming work achieved. The
264
+ `trilogy` driver still buffers (it has no streaming cursor flag), but it is a
265
+ test-only driver — production mysql uses mysql2.
266
+
267
+ ## Methodology notes
268
+
269
+ - The serialization hotspot reproduces **with no database** (Part A): synthesize
270
+ the array-of-String-arrays the drivers return and measure `to_bulk_insert`.
271
+ The live-DB part (Part B) measures `execute` and needs a reachable DB; the dev
272
+ sandbox blocks localhost (and `ps`), so disable the sandbox for bench runs.
273
+ - Run order matters: the bench measures the STREAMED path **before** the WHOLE
274
+ path so the transient giant String doesn't pollute the streamed peak (RSS is
275
+ reclaimed lazily). For defensible absolute numbers, isolate phases in fresh
276
+ processes.
277
+ - Ruby 4.0 moved `benchmark` out of the default gems (it is now a bundled gem); the harness uses
278
+ `Process.clock_gettime(Process::CLOCK_MONOTONIC)`.
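A minimal timing helper in the style the last note describes, using only `Process.clock_gettime`. The name `measure` is hypothetical and not taken from `bench_sql_dump.rb`:

```ruby
# Returns [block result, elapsed seconds] using the monotonic clock, which is
# unaffected by wall-clock adjustments during the run.
def measure
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

joined, elapsed = measure { (1..50_000).map(&:to_s).join(",") }
puts format("built %d bytes in %.4f s", joined.bytesize, elapsed)
```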
@@ -0,0 +1,274 @@
1
+ // Native emitter for MongoDB Relaxed Extended JSON.
2
+ //
3
+ // Replaces the pure-Ruby `JSON.generate(doc.as_extended_json(mode: :relaxed))`
4
+ // (which rebuilds the whole document into an intermediate transformed Hash tree
5
+ // and then walks it a second time in JSON.generate) with a single native
6
+ // tree-walk that emits the JSONL line directly.
7
+ //
8
+ // Byte-identity strategy (see docs/optimize-mongodb-export-with-native-ext.md):
9
+ // only the structural bulk + the cheapest, most-stable leaves are formatted in
10
+ // C — Hash, Array, String, fixnum Integer, true/false/nil, BSON::ObjectId, and
11
+ // in-range Time (years 1970..9999; see encode_time_native). Everything else
12
+ // (Float, out-of-int64 Integer, out-of-range Time, Symbol, Decimal128, Binary,
13
+ // ...) is handed back to Ruby's `encode_fragment`, which is the exact pure-Ruby
14
+ // path. This is provably byte-identical because Hash#as_extended_json
15
+ // and Array#as_extended_json are non-transforming structural recursion: the
16
+ // bytes `JSON.generate(v.as_extended_json(mode: :relaxed))` produces for any
17
+ // sub-value `v` are exactly the bytes the whole-document generate would produce
18
+ // in that position, so a value the native walk does not format can be spliced
19
+ // in verbatim with no divergence.
20
+
21
+ #include <ruby.h>
22
+ #include <ruby/encoding.h>
23
+ #include <stdio.h>
24
+ #include <time.h>
25
+
26
+ static VALUE rb_mExtJson;
27
+ // Cached BSON::ObjectId class, or Qnil until bson is loaded and it resolves.
28
+ // Resolution is lazy (bson is required only when the Mongo adapter touches the
29
+ // DB, which always precedes serialization in a real run); see resolve below.
30
+ static VALUE rb_cObjectId;
31
+
32
+ static ID id_encode_fragment;
33
+ static ID id_to_s;
34
+ static ID id_const_BSON;
35
+ static ID id_const_ObjectId;
36
+
37
+ static const char hexdigits[] = "0123456789abcdef";
38
+
39
+ static void encode_value(VALUE buf, VALUE val);
40
+
41
+ // Append `str` as a JSON string literal (surrounding quotes included), escaping
42
+ // exactly as JSON.generate does: \b \t \n \f \r \" \\ get their short escapes,
43
+ // any other byte < 0x20 becomes a lowercase \u00xx, and every other byte —
44
+ // including '/', DEL (0x7f), U+2028/U+2029, and UTF-8 multi-byte sequences — is
45
+ // passed through raw. Unescaped runs are appended in bulk to avoid a per-byte
46
+ // rb_str_cat call.
47
+ static void encode_string(VALUE buf, const char *p, long len)
48
+ {
49
+ rb_str_cat(buf, "\"", 1);
50
+
51
+ long start = 0;
52
+ for (long i = 0; i < len; i++) {
53
+ unsigned char c = (unsigned char)p[i];
54
+ const char *esc = NULL;
55
+ long esclen = 0;
56
+ char ubuf[6];
57
+
58
+ switch (c) {
59
+ case '"': esc = "\\\""; esclen = 2; break;
60
+ case '\\': esc = "\\\\"; esclen = 2; break;
61
+ case '\b': esc = "\\b"; esclen = 2; break;
62
+ case '\t': esc = "\\t"; esclen = 2; break;
63
+ case '\n': esc = "\\n"; esclen = 2; break;
64
+ case '\f': esc = "\\f"; esclen = 2; break;
65
+ case '\r': esc = "\\r"; esclen = 2; break;
66
+ default:
67
+ if (c < 0x20) {
68
+ ubuf[0] = '\\'; ubuf[1] = 'u'; ubuf[2] = '0'; ubuf[3] = '0';
69
+ ubuf[4] = hexdigits[(c >> 4) & 0xf];
70
+ ubuf[5] = hexdigits[c & 0xf];
71
+ esc = ubuf; esclen = 6;
72
+ }
73
+ }
74
+
75
+ if (esc) {
76
+ if (i > start) rb_str_cat(buf, p + start, i - start);
77
+ rb_str_cat(buf, esc, esclen);
78
+ start = i + 1;
79
+ }
80
+ }
81
+ if (len > start) rb_str_cat(buf, p + start, len - start);
82
+
83
+ rb_str_cat(buf, "\"", 1);
84
+ }
85
+
86
+ // Hash keys mirror JSON.generate: a String key is emitted as-is, anything else
87
+ // is stringified (Symbol via its name, otherwise #to_s) before escaping.
88
+ static void encode_key(VALUE buf, VALUE key)
89
+ {
90
+ VALUE kstr;
91
+ if (RB_TYPE_P(key, T_STRING)) {
92
+ kstr = key;
93
+ } else if (RB_TYPE_P(key, T_SYMBOL)) {
94
+ kstr = rb_sym2str(key);
95
+ } else {
96
+ kstr = rb_funcall(key, id_to_s, 0);
97
+ }
98
+ encode_string(buf, RSTRING_PTR(kstr), RSTRING_LEN(kstr));
99
+ }
100
+
101
+ typedef struct {
102
+ VALUE buf;
103
+ int first;
104
+ } hash_ctx;
105
+
106
+ static int hash_iter(VALUE key, VALUE value, VALUE arg)
107
+ {
108
+ hash_ctx *ctx = (hash_ctx *)arg;
109
+ if (!ctx->first) rb_str_cat(ctx->buf, ",", 1);
110
+ ctx->first = 0;
111
+ encode_key(ctx->buf, key);
112
+ rb_str_cat(ctx->buf, ":", 1);
113
+ encode_value(ctx->buf, value);
114
+ return ST_CONTINUE;
115
+ }
116
+
117
+ // Splice the pure-Ruby fragment for a value the native path does not format.
118
+ static void delegate(VALUE buf, VALUE val)
119
+ {
120
+ VALUE frag = rb_funcall(rb_mExtJson, id_encode_fragment, 1, val);
121
+ rb_str_cat(buf, RSTRING_PTR(frag), RSTRING_LEN(frag));
122
+ }
123
+
124
+ // Epoch second for 10000-01-01T00:00:00Z. `bson`'s relaxed Time encoding uses
125
+ // the ISO-8601 string form only for years 1970..9999 (inclusive) and the
126
+ // {"$numberLong":"<ms>"} form otherwise; that year window is exactly the
127
+ // half-open epoch-second range [0, MAX_ISO_EPOCH).
128
+ #define MAX_ISO_EPOCH 253402300800LL
129
+
130
+ // Format a Time as Relaxed Extended JSON in C, matching bson 5.2.0 byte for
131
+ // byte for the common in-range case (see bson/time.rb and the empirical probe):
132
+ // - whole second (usec == 0, i.e. nsec < 1000): {"$date":"...:SSZ"} (no fraction)
133
+ // - sub-second (nsec >= 1000): {"$date":"...:SS.mmmZ"}, where the
134
+ // millisecond is floor(nsec / 1e6) — bson floors the Time to milliseconds.
135
+ // Returns 1 when handled. Returns 0 (leaving buf untouched) for years outside
136
+ // 1970..9999, whose {"$numberLong"} form involves negative-epoch arithmetic too
137
+ // fiddly to risk in C — the caller then delegates that rare case to Ruby.
138
+ static int encode_time_native(VALUE buf, VALUE val)
139
+ {
140
+ struct timespec ts = rb_time_timespec(val);
141
+ if (ts.tv_sec < 0 || ts.tv_sec >= MAX_ISO_EPOCH) return 0;
142
+
143
+ time_t secs = (time_t)ts.tv_sec;
144
+ struct tm tm;
145
+ if (gmtime_r(&secs, &tm) == NULL) return 0;
146
+
147
+ char tmp[40];
148
+ int n;
149
+ if (ts.tv_nsec >= 1000) {
150
+ int ms = (int)(ts.tv_nsec / 1000000L);
151
+ n = snprintf(tmp, sizeof(tmp),
152
+ "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02d.%03dZ\"}",
153
+ tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
154
+ tm.tm_hour, tm.tm_min, tm.tm_sec, ms);
155
+ } else {
156
+ n = snprintf(tmp, sizeof(tmp),
157
+ "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02dZ\"}",
158
+ tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
159
+ tm.tm_hour, tm.tm_min, tm.tm_sec);
160
+ }
161
+ rb_str_cat(buf, tmp, n);
162
+ return 1;
163
+ }
164
+
165
+ static void encode_value(VALUE buf, VALUE val)
166
+ {
167
+ switch (TYPE(val)) {
168
+ case T_NIL:
169
+ rb_str_cat(buf, "null", 4);
170
+ return;
171
+ case T_TRUE:
172
+ rb_str_cat(buf, "true", 4);
173
+ return;
174
+ case T_FALSE:
175
+ rb_str_cat(buf, "false", 5);
176
+ return;
177
+ case T_FIXNUM: {
178
+ // A Fixnum always fits in a C long (and thus int64) on the platforms
179
+ // exwiw targets, so it can never be the out-of-int64 case that must
180
+ // raise; emit it directly. Bignums fall through to delegate, where
181
+ // encode_fragment emits in-range ones and raises RangeError for the
182
+ // rest — matching today's behavior exactly.
183
+ char tmp[24];
184
+ int n = snprintf(tmp, sizeof(tmp), "%ld", (long)FIX2LONG(val));
185
+ rb_str_cat(buf, tmp, n);
186
+ return;
187
+ }
188
+ case T_STRING:
189
+ encode_string(buf, RSTRING_PTR(val), RSTRING_LEN(val));
190
+ return;
191
+ case T_ARRAY: {
192
+ long len = RARRAY_LEN(val);
193
+ rb_str_cat(buf, "[", 1);
194
+ for (long i = 0; i < len; i++) {
195
+ if (i > 0) rb_str_cat(buf, ",", 1);
196
+ encode_value(buf, rb_ary_entry(val, i));
197
+ }
198
+ rb_str_cat(buf, "]", 1);
199
+ return;
200
+ }
201
+ case T_HASH: {
202
+ // rb_hash_foreach iterates in insertion order, the same key order JSON.generate emits.
203
+ hash_ctx ctx = { buf, 1 };
204
+ rb_str_cat(buf, "{", 1);
205
+ rb_hash_foreach(val, hash_iter, (VALUE)&ctx);
206
+ rb_str_cat(buf, "}", 1);
207
+ return;
208
+ }
209
+ default:
210
+ // BSON::ObjectId is the single most common leaf (`_id`) and its
211
+ // Relaxed form is the stable {"$oid":"<24 hex>"}, so format it here.
212
+ // The hex comes from #to_s (the same source as as_extended_json) and
213
+ // is always [0-9a-f]{24}, so it needs no escaping.
214
+ if (!NIL_P(rb_cObjectId) && RTEST(rb_obj_is_kind_of(val, rb_cObjectId))) {
215
+ VALUE hex = rb_funcall(val, id_to_s, 0);
216
+ rb_str_cat(buf, "{\"$oid\":\"", 9);
217
+ rb_str_cat(buf, RSTRING_PTR(hex), RSTRING_LEN(hex));
218
+ rb_str_cat(buf, "\"}", 2);
219
+ return;
220
+ }
221
+ // Time is the other common leaf in dumped documents (Mongoid's
222
+ // created_at/updated_at); format the in-range case natively. The
223
+ // out-of-range $numberLong form returns 0 and falls through to Ruby.
224
+ if (RTEST(rb_obj_is_kind_of(val, rb_cTime)) && encode_time_native(buf, val)) {
225
+ return;
226
+ }
227
+ // Float, Bignum, Symbol, Decimal128, Binary, out-of-range Time, ... -> Ruby.
228
+ delegate(buf, val);
229
+ return;
230
+ }
231
+ }
232
+
233
+ // Resolve and cache BSON::ObjectId the first time a document is encoded with
234
+ // bson loaded. The constant lookups are cheap and run only while the cache is
235
+ // still Qnil. Until it resolves, ObjectId takes the (correct) delegate path.
236
+ static void resolve_objectid_class(void)
237
+ {
238
+ if (!NIL_P(rb_cObjectId)) return;
239
+ if (!rb_const_defined(rb_cObject, id_const_BSON)) return;
240
+
241
+ VALUE bson = rb_const_get(rb_cObject, id_const_BSON);
242
+ if (rb_const_defined(bson, id_const_ObjectId)) {
243
+ rb_cObjectId = rb_const_get(bson, id_const_ObjectId);
244
+ }
245
+ }
246
+
247
+ // Exwiw::ExtJson.encode_native(doc) -> String
248
+ // Returns one JSONL line (no trailing newline); the caller owns separators.
249
+ static VALUE rb_encode_native(VALUE self, VALUE doc)
250
+ {
251
+ resolve_objectid_class();
252
+
253
+ VALUE buf = rb_str_buf_new(256);
254
+ rb_enc_associate(buf, rb_utf8_encoding());
255
+ encode_value(buf, doc);
256
+ return buf;
257
+ }
258
+
259
+ void Init_ext_json_native(void)
260
+ {
261
+ id_encode_fragment = rb_intern("encode_fragment");
262
+ id_to_s = rb_intern("to_s");
263
+ id_const_BSON = rb_intern("BSON");
264
+ id_const_ObjectId = rb_intern("ObjectId");
265
+
266
+ VALUE mExwiw = rb_define_module("Exwiw");
267
+ rb_mExtJson = rb_define_module_under(mExwiw, "ExtJson");
268
+ rb_global_variable(&rb_mExtJson);
269
+
270
+ rb_cObjectId = Qnil;
271
+ rb_global_variable(&rb_cObjectId);
272
+
273
+ rb_define_singleton_method(rb_mExtJson, "encode_native", rb_encode_native, 1);
274
+ }
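
The Time rule described in the comments above `encode_time_native` can be exercised without the Ruby C API. The following is a minimal standalone sketch, not code from the package: `format_relaxed_date` and `relaxed_date_equals` are hypothetical names, the seconds and nanoseconds are passed in directly instead of coming from `rb_time_timespec`, and a 64-bit `time_t` is assumed so that the year-9999 boundary is representable.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Epoch second for 10000-01-01T00:00:00Z, as in the diff above. */
#define MAX_ISO_EPOCH 253402300800LL

/* Hypothetical helper (not part of exwiw): formats an epoch time as a Relaxed
 * Extended JSON $date string. Returns the number of bytes written, or 0 when
 * the time falls outside [0, MAX_ISO_EPOCH) or the buffer is too small, in
 * which case a caller would fall back to another encoder. */
static int format_relaxed_date(long long sec, long nsec, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    if (sec < 0 || sec >= MAX_ISO_EPOCH) return 0;

    time_t secs = (time_t)sec; /* assumes a 64-bit time_t */
    struct tm tm;
    if (gmtime_r(&secs, &tm) == NULL) return 0;

    int n;
    if (nsec >= 1000) {
        /* At least one microsecond: emit milliseconds, floored. */
        int ms = (int)(nsec / 1000000L);
        n = snprintf(out, cap,
                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02d.%03dZ\"}",
                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
                     tm.tm_hour, tm.tm_min, tm.tm_sec, ms);
    } else {
        /* Whole second at microsecond precision: no fraction. */
        n = snprintf(out, cap,
                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02dZ\"}",
                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
                     tm.tm_hour, tm.tm_min, tm.tm_sec);
    }
    if (n < 0 || (size_t)n >= cap) return 0;
    return n;
}

/* Hypothetical check helper: formats and compares against an expected string. */
static int relaxed_date_equals(long long sec, long nsec, const char *expected)
{
    char tmp[40];
    int n = format_relaxed_date(sec, nsec, tmp, sizeof(tmp));
    return n > 0 && strcmp(tmp, expected) == 0;
}
```

The longest output, the millisecond form, is 36 bytes plus the terminating NUL, so a 40-byte buffer suffices. Note the edge the `nsec >= 1000` test creates: a Time with 1000 to 999999 nanoseconds is treated as sub-second and prints `.000`, which is the behavior the comments in the diff attribute to `bson`.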
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "mkmf"
4
+
5
+ # Compiled to lib/exwiw/ext_json_native.{so,bundle}. The name is distinct from
6
+ # the `ext_json.rb` shim so `require "exwiw/ext_json_native"` does not collide
7
+ # with `require_relative "exwiw/ext_json"`.
8
+ create_makefile("exwiw/ext_json_native")