RubyGems - exwiw - Versions diffs - 0.5.2 → 0.6.0 - Mend

exwiw 0.5.2 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

checksums.yaml +4 -4
data/CHANGELOG.md +13 -0
data/README.md +3 -2
data/docs/optimization-notes.md +126 -0
data/docs/optimize-mongodb-export-with-native-ext.md +249 -0
data/docs/plans/2026-05-15-insert-000-schema-file.md +4 -4
data/docs/plans/2026-05-16-mongodb-from-clean-scenario.md +8 -8
data/docs/plans/2026-05-22-postgres-copy-mode-scenario-test.md +7 -7
data/docs/plans/2026-05-31-ids-column-for-sql-adapters.md +1 -1
data/docs/plans/2026-06-19-mongodb-export-remove-parallelism-native-ext.md +70 -0
data/docs/sql-dump-optimization-notes.md +278 -0
data/ext/exwiw/ext_json/ext_json.c +274 -0
data/ext/exwiw/ext_json/extconf.rb +8 -0
data/lib/exwiw/adapter/mongodb_adapter.rb +159 -40
data/lib/exwiw/adapter/mysql_adapter.rb +70 -18
data/lib/exwiw/adapter/mysql_client.rb +43 -0
data/lib/exwiw/adapter/postgresql_adapter.rb +85 -15
data/lib/exwiw/adapter/sql_bulk_insert.rb +71 -0
data/lib/exwiw/adapter/sqlite_adapter.rb +75 -18
data/lib/exwiw/adapter.rb +38 -0
data/lib/exwiw/ext_json.rb +33 -0
data/lib/exwiw/runner.rb +18 -6
data/lib/exwiw/version.rb +1 -1
data/lib/exwiw.rb +2 -0
data/mise.toml +2 -2
metadata +11 -2

data/docs/plans/2026-06-19-mongodb-export-remove-parallelism-native-ext.md ADDED

@@ -0,0 +1,70 @@
+# Plan: Back out MongoDB fork/cursor parallelism → optional Extended-JSON C extension
+## Context (why)
+This branch (`gnhf/rt-rails-mongodb-dum-ed518c`) shipped a large MongoDB-dump perf effort across 18 iterations. The memory wins (iter 2–5: streaming result set + chunked output + precompiled `MaskPlan`) are clean, byte-identical, no-flag defaults and stay. But the CPU/throughput half (iter 6–18) grew into heavy multi-process machinery: two CLI flags (`--parallel-workers`, `--cursor-parallel`), four components (`ParallelSerializer`, `MongoIdPartitioner`, `PropagationCapture`, `ForkedPartWriter`), adapter `write_bulk_insert`/`write_inserts` seams, CLI→Runner→Adapter threading, fork orchestration, Marshal IPC, `_id`-range partitioning, distributed `@state` merge, and a sorted-output caveat — to buy ~1.1–1.4× (serialize-fork) / ~2.5–5.5× (cursor-parallel).
+The maintainer's decision: **this is over-engineered.** Preserve the findings as `docs/optimization-notes.md`, **remove the parallelism**, and address the CPU hotspot with a much simpler lever the earlier iterations kept pointing at — a **C extension** for the dominant `as_extended_json(mode: :relaxed) + JSON.generate` cost.
+**Honest scope of the win:** a C extension speeds only the per-document Extended-JSON *serialization* (~82% of per-doc serialize cost). It cannot touch the Mongo cursor's BSON→Ruby *decode* (~40% of total wall time, inside the driver) — that was cursor-parallel's job and is being removed. So end-to-end is bounded ~2.5× (the same Amdahl ceiling `--parallel-workers` hit), not cursor-parallel's 3.4–5.5×. The trade is deliberate: far simpler code + a no-flag, no-fork, single-process CPU win, for a smaller peak speedup. Memory behavior is unchanged (streaming stays the default).
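The ~2.5× figure above is consistent with Amdahl's law if the cursor's BSON→Ruby decode (~40% of wall time) is treated as the part no serialization speedup can touch. A worked check under that reading, where $p$ (the accelerable share) and $s$ (the speedup applied to it) are symbols introduced here for illustration:

$$
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad 1 - p \approx 0.40 \;\Rightarrow\; \lim_{s \to \infty} S(s) = \frac{1}{0.40} = 2.5
$$

Since the extension accelerates only part of the remaining ~60%, the realized end-to-end gain would presumably land below this ceiling.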
+## Part 1 — Remove the fork/cursor parallelism (keep streaming + MaskPlan)
+Verified: `git diff b389204..HEAD` on the core lib files is **entirely** parallelism (`b389204` = iter-5 "MaskPlan" commit). So the iter-5 versions are exactly the "keep streaming, drop parallel" baseline.
+**Restore to `b389204` (clean streaming baseline):**
+- `lib/exwiw/runner.rb` — back to the inline `each_slice` chunk loop (`file.print(adapter.to_bulk_insert(chunk))`), no `write_inserts`/`parallel_workers`/`cursor_parallel`.
+- `lib/exwiw/adapter.rb` — `Base#initialize(connection_config, logger)` (drop the two kwargs + ivars), `self.build(connection_config, logger)`; drop the `write_bulk_insert`/`write_inserts` seams (they exist only for parallelism). Keep `default_bulk_insert_chunk_size` (streaming).
+- `lib/exwiw/adapter/mongodb_adapter.rb` — iter-5 form: `StreamingResult` without `query`/`keys` readers; `db` with inline client construction (drop `build_client`); no `write_bulk_insert`/`write_inserts`/`cursor_parallel`/`parallel_workers`/partitioner/capture. Keep `StreamingResult`, `MaskPlan`, chunking. (Then patch `serialize_document` in Part 3.)
+- `lib/exwiw/cli.rb` — drop `--parallel-workers`/`--cursor-parallel` flags, ivars, `validate_parallel_workers!`/`validate_cursor_parallel!`, and the Runner kwargs.
+- `lib/exwiw.rb` — drop the four parallel `require_relative` lines (Part 3 adds the `ext_json` require).
+**Delete (components + specs + probes):**
+- `lib/exwiw/{parallel_serializer,forked_part_writer,mongo_id_partitioner,propagation_capture}.rb`
+- `spec/{parallel_serializer,forked_part_writer,mongo_id_partitioner,propagation_capture}_spec.rb`
+- `script/bench_mongodb_parallel_probe.rb`, `script/bench_mongodb_cursor_parallel_probe.rb`
+**Restore to `b389204`:** `spec/adapter_spec.rb`, `spec/cli_spec.rb`, `spec/adapter/mongodb_adapter_spec.rb`, `script/bench_mongodb_dump.rb` (all post-iter-5 changes there are parallel-only).
+**Edit, don't restore:** `README.md`, `CHANGELOG.md` — remove the `--parallel-workers` / `--cursor-parallel` / `EXWIW_MONGODB_*` sections; the `[Unreleased]` entry becomes "streaming/chunked MongoDB dump by default + optional Extended-JSON C extension", pointing to `docs/optimization-notes.md`.
+## Part 2 — `docs/optimization-notes.md`
+Distill the 18-iteration log (`.gnhf/runs/.../notes.md`) into a durable doc: the two hotspots (result-set memory; `as_extended_json` CPU), what shipped by default (streaming + chunking + `MaskPlan`), and the **explored-and-removed** parallelism — fork serialize (~1.1–1.4×), cursor-parallel (~3.4–5.5× but heavy: IPC, `_id` partitioning, distributed `@state`, sorted output) — with the Amdahl reasoning (serial cursor decode floor) and *why* it was removed in favor of the C extension. This is the knowledge-preservation the maintainer asked for.
+## Part 3 — Optional Extended-JSON C extension — DOCUMENT ONLY (not implemented now)
+**Revised scope (per maintainer):** Part 3 is NOT implemented in this pass. Instead, write the design below into `docs/optimize-mongodb-export-with-native-ext.md` as a future-work / design doc. Only Part 1 and Part 2 are executed now.
+Design to capture: replaces `JSON.generate(doc.as_extended_json(mode: :relaxed))` with one native tree-walk that emits the Relaxed Extended JSON line directly — no intermediate transformed-Hash tree, no second JSON pass.
+**New files**
+- `ext/exwiw/ext_json/extconf.rb` — `mkmf` (stdlib), `create_makefile("exwiw/ext_json_native")`.
+- `ext/exwiw/ext_json/ext_json.c` — defines `Exwiw::ExtJson.encode_native(doc) -> String` (one JSONL line, no trailing `\n`). Recursive emitter into a growth buffer.
+- `lib/exwiw/ext_json.rb` — shim: `require "exwiw/ext_json_native"` (distinct name avoids self-collision); on `LoadError`, define a pure-Ruby `encode`. Exposes one stable `Exwiw::ExtJson.encode(doc)`. Fallback is **byte-for-byte today's code**:
+  ```ruby
+  JSON.generate(doc.respond_to?(:as_extended_json) ? doc.as_extended_json(mode: :relaxed) : doc)
+  ```
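The load-or-fall-back shape of the shim can be sketched roughly as follows. This is a minimal illustration, not the gem's code: the module name `ExtJsonShimSketch` and the `NATIVE_AVAILABLE` constant are hypothetical, and the require path and `encode_native` name are taken from the plan text on the assumption that the built extension would expose them that way.

```ruby
require "json"

# Hypothetical stand-in for the planned shim; the module name is illustrative.
module ExtJsonShimSketch
  # Try the compiled extension first. The require path is the one the plan
  # proposes; without the built extension this raises LoadError.
  NATIVE_AVAILABLE =
    begin
      require "exwiw/ext_json_native"
      defined?(::Exwiw::ExtJson) && ::Exwiw::ExtJson.respond_to?(:encode_native) ? true : false
    rescue LoadError
      false
    end

  # Pure-Ruby path: the same expression the plan quotes as the fallback.
  def self.encode_pure(doc)
    JSON.generate(doc.respond_to?(:as_extended_json) ? doc.as_extended_json(mode: :relaxed) : doc)
  end

  # One stable entry point regardless of which path is active.
  def self.encode(doc)
    NATIVE_AVAILABLE ? ::Exwiw::ExtJson.encode_native(doc) : encode_pure(doc)
  end
end

puts ExtJsonShimSketch.encode({ "name" => "a", "n" => 1 })
```

Callers only ever see `encode`, so a platform that cannot compile the extension degrades to the pure-Ruby path without any flag or configuration.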
+**Native fast-path vs delegate (byte-identity strategy, from the Plan agent's findings)**
+- Native in C: `Hash` (insertion order via `rb_hash_foreach`; String keys), `Array`, `String` (escape only `\b\t\n\f\r\"\\`; lowercase `\u00xx` for other <0x20; leave `/`, DEL, U+2028/9, non-ASCII raw), `Integer` within int64, `true`/`false`/`nil`, and `BSON::ObjectId` → `{"$oid":"<24 hex>"}` (hex via `to_s`).
+- **Delegate to Ruby** (call back to a fallback helper returning the JSON fragment for that value): `Float` (`Float#to_s` ≠ `JSON.generate` for sci-notation, e.g. `1e20`), `Time` (variable fractional digits + the years-[1970,9999] `$numberLong` boundary — highest risk), out-of-int64 `Integer` (must preserve the existing `RangeError`), and any unrecognized class (Decimal128, Binary, Symbol, Regexp, Date, …). This is provably byte-identical because `Hash/Array#as_extended_json` are non-transforming structural recursion, so `JSON.generate(v.as_extended_json(mode: :relaxed))` on any sub-value matches the whole-tree output. Time/Float are candidates for later native promotion if the benchmark justifies the added risk.
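The fast-path/delegate split above can be modeled in pure Ruby to see why splicing is safe. This is only an illustrative model of the walk, assuming plain JSON-compatible values; `SpliceSketch` and its helpers are hypothetical names, and the real fast path is meant to live in C.

```ruby
require "json"

# Illustrative pure-Ruby model of the native walk; not the C implementation.
module SpliceSketch
  INT64 = (-(2**63)..(2**63 - 1))
  SHORT = { '"' => '\\"', "\\" => "\\\\", "\b" => "\\b", "\t" => "\\t",
            "\n" => "\\n", "\f" => "\\f", "\r" => "\\r" }.freeze

  # Short escapes for the seven listed characters, lowercase \u00xx for any
  # other character below 0x20, everything else passed through raw.
  def self.escape(str)
    body = str.gsub(/["\\\x00-\x1f]/) { |c| SHORT[c] || format("\\u%04x", c.ord) }
    "\"#{body}\""
  end

  def self.encode(val)
    case val
    when Hash    then "{" + val.map { |k, v| "#{escape(k.to_s)}:#{encode(v)}" }.join(",") + "}"
    when Array   then "[" + val.map { |v| encode(v) }.join(",") + "]"
    when String  then escape(val)
    when true    then "true"
    when false   then "false"
    when nil     then "null"
    when Integer then INT64.cover?(val) ? val.to_s : delegate(val)
    else delegate(val)
    end
  end

  # Delegate path: ask the pure-Ruby generator for just this fragment and
  # splice its bytes in unchanged.
  def self.delegate(val)
    JSON.generate(val.respond_to?(:as_extended_json) ? val.as_extended_json(mode: :relaxed) : val)
  end
end
```

Comparing `SpliceSketch.encode(doc)` with `JSON.generate(doc)` on a document of plain JSON types is one quick way to check the splice argument before trusting it for BSON types.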
+**Packaging / wiring**
+- `exwiw.gemspec` — `spec.extensions = ["ext/exwiw/ext_json/extconf.rb"]` (auto-compiles on `gem install`; fallback covers platforms that can't).
+- `Rakefile` — `Rake::ExtensionTask.new("ext_json_native")`; make `spec` depend on `compile`.
+- Add `rake-compiler` as a dev dep (Gemfile/gemspec). `mkmf` is stdlib.
+- `.gitignore` — ignore built artifacts (`lib/exwiw/*.bundle`, `lib/exwiw/*.so`, `ext/**/*.o`, `ext/**/Makefile`); commit only `ext/` sources (gemspec ships via `git ls-files`).
+- `lib/exwiw.rb` — `require_relative "exwiw/ext_json"`.
+- `lib/exwiw/adapter/mongodb_adapter.rb` — `serialize_document`/`to_bulk_insert` inner call becomes `ExtJson.encode(doc)` (after `apply_mask_plan!`); remove the now-unused private `#extended_json`.
+## Verification (Part 1 + Part 2 scope)
+- **`bundle exec rspec`** — full suite green after the revert. The parallel specs are deleted; the restored specs match iter-5. (`spec/insert_output_snapshot_spec.rb` and other mongodb-touching specs need live mongo on 27017 → sandbox disabled; the rest run normally.)
+- **`spec/insert_output_snapshot_spec.rb`** (mongodb fixtures, live mongo) — the byte-exact guard; output bytes must be unchanged by the revert (streaming default is byte-identical to iter-5).
+- `git grep -nE 'parallel_workers|cursor_parallel|ParallelSerializer|MongoIdPartitioner|PropagationCapture|ForkedPartWriter'` returns nothing in `lib/`, `spec/`, `exe/`, `README.md` after the revert (confirms full removal).
+- `docs/optimize-mongodb-export-with-native-ext.md` and `docs/optimization-notes.md` exist and read coherently.
+## Notes
+- No `git rebase`/`push -f` (history may stay messy; backing out via forward edits + deletions, not history rewrite).
+- After implementation, offer to commit the plan via the remember-plan skill.

data/docs/sql-dump-optimization-notes.md ADDED

@@ -0,0 +1,278 @@
+# SQL dump performance: investigation notes
+Companion to [`optimization-notes.md`](./optimization-notes.md) (which covers the
+MongoDB adapter). This records the speed/memory bottlenecks of the **SQL**
+adapters' dump path (mysql / postgresql / sqlite), measured against a baseline,
+and the fixes that followed. The first half is the original measurement +
+bottleneck-analysis step; the Resolution sections record what was fixed afterwards.
+The reproducible harness is `script/bench_sql_dump.rb`. It seeds a synthetic
+table and measures the two Runner phases per table; it also measures the
+serialization step with no DB at all. The correctness anchor for any future fix
+is `spec/insert_output_snapshot_spec.rb` — the **byte-exact** snapshot of dump
+output.
+> **Status:** **both hotspots are fixed for all three SQL adapters.** Hotspot #2
+> (the whole-table INSERT string) — see
+> [Resolution #2](#resolution-hotspot-2-streamed-single-insert). Hotspot #1
+> (full result-set materialization in `execute`) — postgresql, mysql, and sqlite
+> all stream the fetch now; see
+> [Resolution #1](#resolution-hotspot-1-streaming-fetch-postgresql--mysql).
+## The two hotspots (same shape as MongoDB had pre-optimization)
+The Runner drives, per table:
+1. **execute** — the adapter materializes the **entire** result set into a Ruby
+   array-of-arrays before anything is written:
+   - postgresql: `connection.exec(sql).values`
+   - mysql: `res.to_a.map { |row| row.map { stringify } }` (also re-allocates
+     every value as a normalized String)
+   - sqlite: `connection.execute(sql)`
+   Memory here is proportional to the **table size**, independent of any chunking
+   downstream.
+2. **to_bulk_insert** — SQL adapters set **no** `default_bulk_insert_chunk_size`
+   (it is `nil`), so the Runner treats the whole table as one chunk and
+   `to_bulk_insert` builds the **entire** `INSERT INTO ... VALUES (...),(...);`
+   as one giant String — first an `Array` of N per-row tuple strings, then the
+   joined result — held simultaneously with the result set from step 1.
+## Baseline (200,000 rows, 8 columns, ~41.2 MB output)
+Measured via `bench_sql_dump.rb` (sandbox disabled — it needs `ps` for RSS and
+localhost for the live DB). RSS is sticky across in-process phases, so read the
+peaks as upper bounds and the *deltas* as the signal.
+| adapter | execute peak (Δ) | + whole-string write peak | result-set objs |
+|---------|------------------|---------------------------|-----------------|
+| postgresql | 471.7 MB (+65)  | 494.0 MB | 1.8M |
+| mysql      | 413.4 MB (+52)  | 554.9 MB | 2.4M |
+| sqlite     | 434.5 MB        | 526.6 MB | 1.4M |
+For a **41 MB** dump the process peaks near **0.5 GB** — ~12× the output size —
+because the full result set *and* the whole-table INSERT string are resident at
+once. Both costs scale linearly with the table, so a large table OOMs the same
+way the embed-heavy MongoDB collection did.
+Per-value serialization is cheap and not the bottleneck: `escape_value` is
+~0.4–1.3 µs/op and a one-row `to_bulk_insert` ~5.5 µs/op. The cost is **memory**
+(holding everything at once), not CPU.
+## The byte-identity catch (why this differs from MongoDB)
+MongoDB's fix was a `default_bulk_insert_chunk_size`, which is byte-identical
+because JSONL chunks join with the same `"\n"` `to_bulk_insert` already inserts
+between docs. **That does not transfer to SQL.** Each `to_bulk_insert` call wraps
+its rows in its own `INSERT INTO ... VALUES ...;` statement, so a chunk size > 0
+turns one INSERT into *many* INSERT statements — semantically equivalent on
+re-import, but a **different byte stream** that breaks the snapshot guard.
+The byte-identical lever for SQL is instead to **stream the tuples into a single
+INSERT statement**: emit the adapter's exact `INSERT ... VALUES\n` header once,
+then write each row's `(...)` tuple (reusing the adapter's own `escape_value`)
+separated by `",\n"`, then `;`. The bench implements this as `write_streamed`
+and asserts byte-for-byte identity with the whole-string path — confirmed
+**true** for all three adapters (the header must reuse the adapter's quoting,
+e.g. MySQL's backticks, or it diverges).
+`write_streamed` cuts the to_bulk_insert peak by ~110–120 MB, **but** naive
+per-row `IO#print` makes it ~2–2.4× slower than building one string and writing
+it once. So the production fix wants **chunk-buffered streaming**: build an
+N-row substring in memory, write it, repeat — managing the `",\n"` separator
+across chunk boundaries — to bound memory *without* the per-row IO penalty,
+while still emitting a single INSERT statement (byte-identical).
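The chunk-buffered streaming described above can be sketched as follows. The header format, quoting, and `sketch_tuple` escaping are simplified assumptions made to keep the example self-contained; the real adapters supply their own header and `escape_value`.

```ruby
require "stringio"

# Hypothetical stand-ins: the real adapters supply their own header quoting
# and escape_value; these exist only to make the sketch self-contained.
def sketch_header(table, columns)
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES\n"
end

def sketch_tuple(row)
  "(" + row.map { |v| v.nil? ? "NULL" : "'#{v.to_s.gsub("'", "''")}'" }.join(", ") + ")"
end

# Whole-string build: every tuple is resident at once.
def whole_insert(table, columns, rows)
  sketch_header(table, columns) + rows.map { |r| sketch_tuple(r) }.join(",\n") + ";"
end

# Chunk-buffered streaming: one INSERT statement, written flush_rows tuples at
# a time, with the ",\n" separator carried across chunk boundaries.
def stream_insert(io, table, columns, rows, flush_rows)
  io.print(sketch_header(table, columns))
  first = true
  rows.each_slice(flush_rows) do |slice|
    io.print(",\n") unless first
    first = false
    io.print(slice.map { |r| sketch_tuple(r) }.join(",\n"))
  end
  io.print(";")
end
```

Writing to a `StringIO` with several `flush_rows` values and comparing against `whole_insert` exercises the boundary cases (a slice size that divides the row count exactly, and one larger than the table).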
+## Resolution: hotspot #2 (streamed single INSERT)
+Implemented. The Runner no longer builds the per-table INSERT as one giant
+String; it delegates writing to a new adapter seam `Adapter#write_inserts(io,
+results, table, chunk_size)`:
+- `Adapter::Base#write_inserts` keeps the old behavior (write `to_bulk_insert`
+  per chunk, joined by `"\n"`), so MongoDB and any future adapter are unchanged.
+- The SQL adapters mix in `Adapter::SqlBulkInsert`, which **streams** the single
+  `INSERT INTO ... VALUES <tuples>;` statement to the file `STREAM_FLUSH_ROWS`
+  (2,000) tuples at a time. Each flush is one fast `map`+`join` (the same path
+  `to_bulk_insert` uses), and the `",\n"` printed between slices reproduces the
+  exact separator between tuples — so the bytes are **identical** to the
+  whole-table build. The three duplicate `to_bulk_insert` methods collapsed into
+  the shared module; each adapter now only supplies `insert_header` (its
+  identifier quoting) and `escape_value`.
+Verified byte-for-byte by `spec/insert_output_snapshot_spec.rb` (live DB, all
+three adapters) and a flush-boundary sanity check; full suite green.
+Measured (`bench_sql_dump.rb`, 200k rows / ~41.2 MB output):
+| adapter | whole-string peak | streamed peak | Δ peak |
+|---------|-------------------|---------------|--------|
+| postgresql | 367 MB | 226 MB | −141 MB |
+| mysql      | 325 MB | 214 MB | −111 MB |
+| sqlite     | 353 MB | 207 MB | −146 MB |
+So hotspot #2's contribution (~110–145 MB on a 41 MB dump — the whole INSERT
+string *plus* the transient 200k-tuple `Array` and its join) is gone; the write
+buffer is now bounded to ~2,000 tuples regardless of table size.
+**Speed:** streaming is *not* slower than the whole-string build. Measured in
+isolation (post-`GC.start`, no sampler thread) the streamed write at
+`flush_rows=2000` was ~0.67 s vs ~0.83 s for the one giant `map`+`join` — small
+chunks stay in cache and avoid repeatedly growing/copying a 41 MB String. The
+~1.3× "slowdown" the in-process bench shows is an artifact of its background RSS
+sampler thread (`ps` every 10 ms) plus run-ordering (streamed runs first, cold),
+not the algorithm. The earlier worry about a per-row `IO#print` penalty only
+applied to the naive row-at-a-time prototype, which `flush_rows` slicing avoids.
+## Resolution: hotspot #1 (streaming fetch, postgresql + mysql)
+Implemented for **postgresql**. `PostgresqlAdapter#execute` no longer returns
+`connection.exec(sql).values` (the whole result set as a Ruby array-of-arrays);
+it returns a lazy `PostgresqlAdapter::StreamingResult` that pulls rows off the
+wire one at a time via libpq **single-row mode** (`send_query` +
+`set_single_row_mode` + a `get_result` loop, each yielding one row's
+text-format `Array<String|nil>`). The Runner drives it exactly like the old
+array — `#size` then a single `each_slice` pass — so nothing else changed and
+the output is byte-identical (verified by `insert_output_snapshot_spec`, both
+the `insert` and `copy` pg scenarios).
+It mirrors `MongodbAdapter::StreamingResult`, with two SQL-specific points:
+- **`#size`** can't be answered cheaply from the cursor, so it runs a separate
+  `SELECT COUNT(*) FROM (<query>) AS exwiw_count_src` (comment-prefixed, like
+  the data query). Postgres prunes the wrapped subquery's unused projection, so
+  the COUNT transfers no row data — but it does re-run the query plan. This is
+  the deliberate cost of keeping the Runner contract (`#size` before iteration,
+  used to skip empty tables and log the count) unchanged, so MongoDB and the
+  other SQL adapters are untouched. (MongoDB's `count_documents` is an
+  index-only walk and cheaper; the SQL COUNT is the analogue.)
+- the streaming pass ties up the connection until fully drained. The Runner
+  always drains it (`write_inserts`) before issuing `post_insert_sql` / DELETE
+  on the same connection, so the ordering holds. `StreamingResult#each` also
+  drains any queued results if iteration is abandoned mid-stream (a SQL error
+  surfaced by `#check`, or the consumer raising), so the connection stays
+  usable.
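The contract described above (a `#size` answered by a separate query, a single lazy pass, and a drain when iteration is abandoned) can be sketched without any database driver. `StreamingResultSketch`, `count_rows`, and `open_cursor` are hypothetical names; the callables stand in for the COUNT query and the driver's row-at-a-time fetch, and an `Enumerator` plays the role of the wire.

```ruby
# Generic sketch: count_rows and open_cursor are hypothetical callables
# standing in for the COUNT query and the driver's row-at-a-time fetch.
class StreamingResultSketch
  include Enumerable

  def initialize(count_rows:, open_cursor:)
    @count_rows = count_rows
    @open_cursor = open_cursor
  end

  # Answered by a separate query, memoized, and usable before iteration.
  def size
    @size ||= @count_rows.call
  end

  def each
    return enum_for(:each) unless block_given?

    cursor = @open_cursor.call # an Enumerator; each #next pulls one row
    begin
      loop { yield cursor.next }
    ensure
      # Drain whatever the consumer left behind so the connection is free for
      # the next statement. Kernel#loop stops on StopIteration.
      loop { cursor.next }
    end
  end
end
```

Because `each_slice` comes from `Enumerable`, a caller can use this object the way it would use an array (`#size`, then one `each_slice` pass) while only one row is pulled at a time.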
+Measured in **isolated fresh processes** (one per path, so the peak is not
+polluted by other phases — RSS is sticky), 200k rows / ~41.2 MB output:
+| pg fetch path | peak RSS | Δ over baseline |
+|---------------|----------|-----------------|
+| materialize (`exec(sql).values`) + streamed write (OLD) | ~360 MB | ~320 MB |
+| **single-row stream + streamed write (NEW)** | **~48 MB** | **~12 MB** |
+So the full result set (~320 MB of Ruby strings/arrays for 200k×8) is no longer
+resident: peak drops ~**310 MB (~87%)**, and the growth over baseline (~12 MB)
+is now *below* the 41 MB output size, because rows are pulled one at a time and
+the write buffer is bounded to
+~2,000 tuples (hotspot #2's fix). Speed is unchanged (~1.8 s both paths on the
+in-process bench; the COUNT is cheap on an indexed seed). The reproducible A/B
+is in `bench_sql_dump.rb` Part B (`execute(stream)` vs `execute(materialize)`,
+with a byte-identity assertion).
+Implemented for **mysql** too. `MysqlAdapter#execute` now returns a
+`MysqlAdapter::StreamingResult` (an Enumerable mirroring the pg one) instead of
+`connection.query(sql).rows`. The new `MysqlClient#stream_rows` pulls rows off
+the wire one at a time via mysql2's server-side stream (`stream: true` +
+`cache_rows: false`), yielding the same `Array<String|nil>` rows `#query`
+buffered — so the generated INSERT is byte-identical (verified by
+`insert_output_snapshot_spec`).
+Two MySQL specifics differ from the pg path:
+- **`#size`** is a separate `SELECT COUNT(*)` of the same query, but **not** a
+  subquery wrap. MySQL rejects a derived table with duplicate column names,
+  which a rails-managed `SELECT *` joined to another table produces
+  (`Duplicate column name 'id'`); Postgres tolerates it, MySQL does not. So
+  mysql replaces the projection with `COUNT(*)` instead
+  (`compile_ast(count_only: true)`) — exact because exwiw's extraction queries
+  have no DISTINCT/GROUP BY/LIMIT, so the count is independent of the projected
+  columns (confirmed against live data: `COUNT(*)` over the bare FROM/JOIN/WHERE
+  equals the streamed row count for both plain and `SELECT *`+join queries).
+- **abandoned streams.** mysql2 requires a streamed result to be fully consumed
+  before the next query on the connection, or it raises "Commands out of sync".
+  `stream_rows` drains the remainder (re-entering `res.each`, which continues
+  from where it stopped) if the consumer block raises, so the connection stays
+  usable for the next table. `trilogy` has no streaming cursor
+  (no `QUERY_FLAGS_STREAMING`), so it buffers and yields — parity, no memory
+  win; trilogy is a test-only driver, production uses mysql2.
+Measured in **isolated fresh processes** (one per path), 200k rows / ~40.7 MB
+output:
+| mysql fetch path | peak RSS | Δ over baseline |
+|------------------|----------|-----------------|
+| materialize (`query(sql).rows`) + streamed write (OLD) | ~340 MB | ~300 MB |
+| **single-row stream + streamed write (NEW)** | **~50 MB** | **~10 MB** |
+So peak drops ~**290 MB (~85%)**, now just above the 40.7 MB output — the same
+shape as the pg result. Speed is unchanged-to-faster (the materialize path also
+builds the whole array first). `bench_sql_dump.rb` Part B now shows the delta
+for mysql too (it was equivalent before, when mysql still materialized).
+Implemented for **sqlite** too, closing hotspot #1 for all three SQL adapters.
+`SqliteAdapter#execute` no longer returns `connection.execute(sql)` (which
+buffers the whole result into a Ruby array); it returns a
+`SqliteAdapter::StreamingResult` (Enumerable, mirroring the pg/mysql ones) whose
+`#each` walks the result one row at a time through SQLite's **statement cursor**
+— `connection.prepare(data_sql)` then `Statement#each` (which maps to
+`sqlite3_step`), closing the statement in an `ensure` so an abandoned mid-stream
+iteration still releases the cursor. The rows are the same `Array` of
+native-typed values `Database#execute` produced, so the generated INSERT is
+byte-identical (verified by `insert_output_snapshot_spec` and a direct cursor
+vs. `#execute` comparison).
+SQLite specifics vs. the pg/mysql paths:
+- **`#size`** runs a separate `SELECT COUNT(*)` of the same query with the
+  projection replaced by `COUNT(*)` (`compile_ast(count_only: true)`, the same
+  trick mysql uses) — exact because exwiw's extraction queries have no
+  DISTINCT/GROUP BY/LIMIT. SQLite would also tolerate a duplicate-column
+  subquery wrap (unlike mysql), but the `count_only` form is shared and avoids
+  the extra subquery.
+- **no connection contention.** SQLite is an embedded, single-connection engine
+  that allows multiple active prepared statements at once, so the `#size` COUNT
+  and the data cursor don't fight over the connection the way the pg/mysql
+  single-row streams tie up the wire. No drain dance is needed; just close the
+  statement.
+Measured in **isolated fresh processes** (one per path), 200k rows / ~40.5 MB
+output:
+| sqlite fetch path | peak RSS | Δ over baseline |
+|-------------------|----------|-----------------|
+| materialize (`Database#execute`) + streamed write (OLD) | ~298 MB | ~257 MB |
+| **statement-cursor stream + streamed write (NEW)** | **~59 MB** | **~18 MB** |
+So peak drops ~**240 MB (~80%)**, the same shape as pg/mysql, and it is
+**faster** (~0.84 s vs ~1.68 s) — the materialize path pays to build the whole
+Ruby array up front, the cursor does not. `bench_sql_dump.rb` Part B now shows a
+real delta for sqlite too (it was equivalent before, when sqlite still
+materialized).
+## Status: both hotspots closed for all three SQL adapters
+1. **Bounded-memory write** (hotspot #2) — done for mysql / postgresql / sqlite;
+   see [Resolution #2](#resolution-hotspot-2-streamed-single-insert).
+2. **Streaming result fetch** (hotspot #1) — done for postgresql (libpq
+   single-row mode), mysql (mysql2 `stream: true`), and sqlite (statement
+   cursor); see Resolution #1 above.
+There is no remaining materialization hotspot in the SQL dump path: peak RSS
+growth over baseline is now bounded (well below the output size) and independent
+of table size for every SQL adapter, as the MongoDB streaming work achieved. The
+`trilogy` driver still buffers (it has no streaming cursor flag), but it is a
+test-only driver — production mysql uses mysql2.
+## Methodology notes
+- The serialization hotspot reproduces **with no database** (Part A): synthesize
+  the array-of-String-arrays the drivers return and measure `to_bulk_insert`.
+  The live-DB part (Part B) measures `execute` and needs a reachable DB; the dev
+  sandbox blocks localhost (and `ps`), so disable the sandbox for bench runs.
+- Run order matters: the bench measures the STREAMED path **before** the WHOLE
+  path so the transient giant String doesn't pollute the streamed peak (RSS is
+  reclaimed lazily). For defensible absolute numbers, isolate phases in fresh
+  processes.
+- Ruby 4.0 removed the `benchmark` stdlib; the harness uses
+  `Process.clock_gettime(Process::CLOCK_MONOTONIC)`.
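A timing helper along the lines the last note describes might look like this. It is a minimal sketch, not the harness itself: `elapsed_seconds` is a hypothetical name, and the `GC.start` beforehand is an assumption borrowed from the isolated-measurement remark earlier in these notes.

```ruby
# Monotonic wall-clock timing without the benchmark library.
# Hypothetical helper name; runs GC first so earlier garbage is not billed
# to the measured block.
def elapsed_seconds
  GC.start
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

seconds = elapsed_seconds { 10_000.times.map(&:to_s).join(",") }
puts format("join took %.4f s", seconds)
```

The monotonic clock is not affected by wall-clock adjustments, which makes it the safer choice for short A/B timings.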

data/ext/exwiw/ext_json/ext_json.c ADDED

@@ -0,0 +1,274 @@
+// Native emitter for MongoDB Relaxed Extended JSON.
+//
+// Replaces the pure-Ruby `JSON.generate(doc.as_extended_json(mode: :relaxed))`
+// (which rebuilds the whole document into an intermediate transformed Hash tree
+// and then walks it a second time in JSON.generate) with a single native
+// tree-walk that emits the JSONL line directly.
+//
+// Byte-identity strategy (see docs/optimize-mongodb-export-with-native-ext.md):
+// only the structural bulk + the cheapest, most-stable leaves are formatted in
+// C — Hash, Array, String, fixnum Integer, true/false/nil, BSON::ObjectId, and
+// in-range Time (years 1970..9999; see encode_time_native). Everything else
+// (Float, out-of-int64 Integer, out-of-range Time, Symbol, Decimal128, Binary,
+// ...) is handed back to Ruby's `encode_fragment`, which is the exact pure-Ruby
+// path. This is provably byte-identical because Hash#as_extended_json
+// and Array#as_extended_json are non-transforming structural recursion: the
+// bytes `JSON.generate(v.as_extended_json(mode: :relaxed))` produces for any
+// sub-value `v` are exactly the bytes the whole-document generate would produce
+// in that position, so a value the native walk does not format can be spliced
+// in verbatim with no divergence.
+#include <ruby.h>
+#include <ruby/encoding.h>
+#include <stdio.h>
+#include <time.h>
+static VALUE rb_mExtJson;
+// Cached BSON::ObjectId class, or Qnil until bson is loaded and it resolves.
+// Resolution is lazy (bson is required only when the Mongo adapter touches the
+// DB, which always precedes serialization in a real run); see resolve below.
+static VALUE rb_cObjectId;
+static ID id_encode_fragment;
+static ID id_to_s;
+static ID id_const_BSON;
+static ID id_const_ObjectId;
+static const char hexdigits[] = "0123456789abcdef";
+static void encode_value(VALUE buf, VALUE val);
+// Append `str` as a JSON string literal (surrounding quotes included), escaping
+// exactly as JSON.generate does: \b \t \n \f \r \" \\ get their short escapes,
+// any other byte < 0x20 becomes a lowercase \u00xx, and every other byte —
+// including '/', DEL (0x7f), U+2028/U+2029, and UTF-8 multi-byte sequences — is
+// passed through raw. Unescaped runs are appended in bulk to avoid a per-byte
+// rb_str_cat call.
+static void encode_string(VALUE buf, const char *p, long len)
+{
+    rb_str_cat(buf, "\"", 1);
+    long start = 0;
+    for (long i = 0; i < len; i++) {
+        unsigned char c = (unsigned char)p[i];
+        const char *esc = NULL;
+        long esclen = 0;
+        char ubuf[6];
+        switch (c) {
+            case '"':  esc = "\\\""; esclen = 2; break;
+            case '\\': esc = "\\\\"; esclen = 2; break;
+            case '\b': esc = "\\b";  esclen = 2; break;
+            case '\t': esc = "\\t";  esclen = 2; break;
+            case '\n': esc = "\\n";  esclen = 2; break;
+            case '\f': esc = "\\f";  esclen = 2; break;
+            case '\r': esc = "\\r";  esclen = 2; break;
+            default:
+                if (c < 0x20) {
+                    ubuf[0] = '\\'; ubuf[1] = 'u'; ubuf[2] = '0'; ubuf[3] = '0';
+                    ubuf[4] = hexdigits[(c >> 4) & 0xf];
+                    ubuf[5] = hexdigits[c & 0xf];
+                    esc = ubuf; esclen = 6;
+                }
+        }
+        if (esc) {
+            if (i > start) rb_str_cat(buf, p + start, i - start);
+            rb_str_cat(buf, esc, esclen);
+            start = i + 1;
+        }
+    }
+    if (len > start) rb_str_cat(buf, p + start, len - start);
+    rb_str_cat(buf, "\"", 1);
+}
+// Hash keys mirror JSON.generate: a String key is emitted as-is, anything else
+// is stringified (Symbol via its name, otherwise #to_s) before escaping.
+static void encode_key(VALUE buf, VALUE key)
+{
+    VALUE kstr;
+    if (RB_TYPE_P(key, T_STRING)) {
+        kstr = key;
+    } else if (RB_TYPE_P(key, T_SYMBOL)) {
+        kstr = rb_sym2str(key);
+    } else {
+        kstr = rb_funcall(key, id_to_s, 0);
+    }
+    encode_string(buf, RSTRING_PTR(kstr), RSTRING_LEN(kstr));
+}
+typedef struct {
+    VALUE buf;
+    int first;
+} hash_ctx;
+static int hash_iter(VALUE key, VALUE value, VALUE arg)
+{
+    hash_ctx *ctx = (hash_ctx *)arg;
+    if (!ctx->first) rb_str_cat(ctx->buf, ",", 1);
+    ctx->first = 0;
+    encode_key(ctx->buf, key);
+    rb_str_cat(ctx->buf, ":", 1);
+    encode_value(ctx->buf, value);
+    return ST_CONTINUE;
+}
+// Splice the pure-Ruby fragment for a value the native path does not format.
+static void delegate(VALUE buf, VALUE val)
+{
+    VALUE frag = rb_funcall(rb_mExtJson, id_encode_fragment, 1, val);
+    rb_str_cat(buf, RSTRING_PTR(frag), RSTRING_LEN(frag));
+}
+// Epoch second for 10000-01-01T00:00:00Z. `bson`'s relaxed Time encoding uses
+// the ISO-8601 string form only for years 1970..9999 (inclusive) and the
+// {"$numberLong":"<ms>"} form otherwise; that year window is exactly the
+// half-open epoch-second range [0, MAX_ISO_EPOCH).
+#define MAX_ISO_EPOCH 253402300800LL
+// Format a Time as Relaxed Extended JSON in C, matching bson 5.2.0 byte for
+// byte for the common in-range case (see bson/time.rb and the empirical probe):
+//   - whole second (usec == 0, i.e. nsec < 1000): {"$date":"...:SSZ"} (no fraction)
+//   - sub-second   (nsec >= 1000):                {"$date":"...:SS.mmmZ"}, where the
+//     millisecond is floor(nsec / 1e6) — bson floors the Time to milliseconds.
+// Returns 1 when handled. Returns 0 (leaving buf untouched) for years outside
+// 1970..9999, whose {"$numberLong"} form involves negative-epoch arithmetic too
+// fiddly to risk in C — the caller then delegates that rare case to Ruby.
+static int encode_time_native(VALUE buf, VALUE val)
+{
+    struct timespec ts = rb_time_timespec(val);
+    if (ts.tv_sec < 0 || ts.tv_sec >= MAX_ISO_EPOCH) return 0;
+    time_t secs = (time_t)ts.tv_sec;
+    struct tm tm;
+    if (gmtime_r(&secs, &tm) == NULL) return 0;
+    char tmp[40];
+    int n;
+    if (ts.tv_nsec >= 1000) {
+        int ms = (int)(ts.tv_nsec / 1000000L);
+        n = snprintf(tmp, sizeof(tmp),
+                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02d.%03dZ\"}",
+                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+                     tm.tm_hour, tm.tm_min, tm.tm_sec, ms);
+    } else {
+        n = snprintf(tmp, sizeof(tmp),
+                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02dZ\"}",
+                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
+                     tm.tm_hour, tm.tm_min, tm.tm_sec);
+    }
+    rb_str_cat(buf, tmp, n);
+    return 1;
+}
+static void encode_value(VALUE buf, VALUE val)
+{
+    switch (TYPE(val)) {
+        case T_NIL:
+            rb_str_cat(buf, "null", 4);
+            return;
+        case T_TRUE:
+            rb_str_cat(buf, "true", 4);
+            return;
+        case T_FALSE:
+            rb_str_cat(buf, "false", 5);
+            return;
+        case T_FIXNUM: {
+            // A Fixnum always fits in a C long (and thus int64) on the platforms
+            // exwiw targets, so it can never be the out-of-int64 case that must
+            // raise; emit it directly. Bignums fall through to delegate, where
+            // encode_fragment emits in-range ones and raises RangeError for the
+            // rest — matching today's behavior exactly.
+            char tmp[24];
+            int n = snprintf(tmp, sizeof(tmp), "%ld", (long)FIX2LONG(val));
+            rb_str_cat(buf, tmp, n);
+            return;
+        }
+        case T_STRING:
+            encode_string(buf, RSTRING_PTR(val), RSTRING_LEN(val));
+            return;
+        case T_ARRAY: {
+            long len = RARRAY_LEN(val);
+            rb_str_cat(buf, "[", 1);
+            for (long i = 0; i < len; i++) {
+                if (i > 0) rb_str_cat(buf, ",", 1);
+                encode_value(buf, rb_ary_entry(val, i));
+            }
+            rb_str_cat(buf, "]", 1);
+            return;
+        }
+        case T_HASH: {
+            // rb_hash_foreach preserves insertion order, matching JSON output.
+            hash_ctx ctx = { buf, 1 };
+            rb_str_cat(buf, "{", 1);
+            rb_hash_foreach(val, hash_iter, (VALUE)&ctx);
+            rb_str_cat(buf, "}", 1);
+            return;
+        }
+        default:
+            // BSON::ObjectId is the single most common leaf (`_id`) and its
+            // Relaxed form is the stable {"$oid":"<24 hex>"}, so format it here.
+            // The hex comes from #to_s (the same source as as_extended_json) and
+            // is always [0-9a-f]{24}, so it needs no escaping.
+            if (!NIL_P(rb_cObjectId) && RTEST(rb_obj_is_kind_of(val, rb_cObjectId))) {
+                VALUE hex = rb_funcall(val, id_to_s, 0);
+                rb_str_cat(buf, "{\"$oid\":\"", 9);
+                rb_str_cat(buf, RSTRING_PTR(hex), RSTRING_LEN(hex));
+                RB_GC_GUARD(hex); // keep hex alive while its buffer is copied
+                rb_str_cat(buf, "\"}", 2);
+                return;
+            }
+            // Time is the other common leaf in dumped documents (Mongoid's
+            // created_at/updated_at); format the in-range case natively. For an
+            // out-of-range Time, encode_time_native returns 0 and the value
+            // falls through to the Ruby delegate below.
+            if (RTEST(rb_obj_is_kind_of(val, rb_cTime)) && encode_time_native(buf, val)) {
+                return;
+            }
+            // Float, Bignum, Symbol, Decimal128, Binary, out-of-range Time, ... -> Ruby.
+            delegate(buf, val);
+            return;
+    }
+}
+// Resolve and cache BSON::ObjectId the first time a document is encoded with
+// bson loaded. The constant lookups are cheap and run only while the cache is
+// still Qnil; once the class is resolved they are skipped. Until then, ObjectId
+// simply takes the (correct) delegate path.
+static void resolve_objectid_class(void)
+{
+    if (!NIL_P(rb_cObjectId)) return;
+    if (!rb_const_defined(rb_cObject, id_const_BSON)) return;
+    VALUE bson = rb_const_get(rb_cObject, id_const_BSON);
+    if (rb_const_defined(bson, id_const_ObjectId)) {
+        rb_cObjectId = rb_const_get(bson, id_const_ObjectId);
+    }
+}
+// Exwiw::ExtJson.encode_native(doc) -> String
+// Returns one JSONL line (no trailing newline); the caller owns separators.
+static VALUE rb_encode_native(VALUE self, VALUE doc)
+{
+    resolve_objectid_class();
+    VALUE buf = rb_str_buf_new(256);
+    rb_enc_associate(buf, rb_utf8_encoding());
+    encode_value(buf, doc);
+    return buf;
+}
+void Init_ext_json_native(void)
+{
+    id_encode_fragment = rb_intern("encode_fragment");
+    id_to_s = rb_intern("to_s");
+    id_const_BSON = rb_intern("BSON");
+    id_const_ObjectId = rb_intern("ObjectId");
+    VALUE mExwiw = rb_define_module("Exwiw");
+    rb_mExtJson = rb_define_module_under(mExwiw, "ExtJson");
+    rb_global_variable(&rb_mExtJson);
+    rb_cObjectId = Qnil;
+    rb_global_variable(&rb_cObjectId);
+    rb_define_singleton_method(rb_mExtJson, "encode_native", rb_encode_native, 1);
+}
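
The Time rule in `encode_time_native` can be exercised outside Ruby. The following is a minimal sketch, not part of exwiw: `format_relaxed_date` is a hypothetical standalone helper that takes a plain `struct timespec` instead of a Ruby `Time`, and it assumes a platform with a 64-bit `time_t` and POSIX `gmtime_r`. It applies the same three decisions as the diff above: reject instants outside `[0, MAX_ISO_EPOCH)`, omit the fraction when the microsecond part is zero, and otherwise floor to milliseconds.

```c
#define _POSIX_C_SOURCE 200809L /* exposes gmtime_r under strict -std modes */
#include <stdio.h>
#include <time.h>

/* Epoch second for 10000-01-01T00:00:00Z, the same constant as in the diff. */
#define MAX_ISO_EPOCH 253402300800LL

/* Hypothetical helper for illustration only (not an exwiw or bson API).
 * Writes the Relaxed Extended JSON form of ts into out and returns the number
 * of bytes written, or 0 when the instant is outside [0, MAX_ISO_EPOCH) and
 * the caller would have to fall back to another encoder. */
static int format_relaxed_date(struct timespec ts, char *out, size_t cap)
{
    if ((long long)ts.tv_sec < 0 || (long long)ts.tv_sec >= MAX_ISO_EPOCH) return 0;

    struct tm tm;
    if (gmtime_r(&ts.tv_sec, &tm) == NULL) return 0;

    int n;
    if (ts.tv_nsec >= 1000) {
        /* Non-zero microseconds: emit a fraction, floored to milliseconds. */
        int ms = (int)(ts.tv_nsec / 1000000L);
        n = snprintf(out, cap,
                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02d.%03dZ\"}",
                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
                     tm.tm_hour, tm.tm_min, tm.tm_sec, ms);
    } else {
        /* Zero microseconds: whole-second form, no fraction. */
        n = snprintf(out, cap,
                     "{\"$date\":\"%04d-%02d-%02dT%02d:%02d:%02dZ\"}",
                     tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
                     tm.tm_hour, tm.tm_min, tm.tm_sec);
    }
    /* Treat an encoding error or truncation as "not handled". */
    if (n < 0 || (size_t)n >= cap) return 0;
    return n;
}
```

One edge case worth noting: an input of 1 second and 1500 nanoseconds has a non-zero microsecond part, so it takes the fractional branch, yet floors to 0 ms and is emitted with `.000Z` rather than in the whole-second form.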

data/ext/exwiw/ext_json/extconf.rb ADDED

@@ -0,0 +1,8 @@
+# frozen_string_literal: true
+require "mkmf"
+# Compiled to lib/exwiw/ext_json_native.{so,bundle}. The name is distinct from
+# the `ext_json.rb` shim so `require "exwiw/ext_json_native"` does not collide
+# with `require_relative "exwiw/ext_json"`.
+create_makefile("exwiw/ext_json_native")