smarter_json 0.9.2 → 0.9.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2256f81fe3b29e83a42dcf948db896a03cdac568bbc799cb3b63b9516a76592d
4
- data.tar.gz: c13d572f3cb417fdffc16423a38e180018f121adbc31cbbc2490bc39576bf7b5
3
+ metadata.gz: fbc93b4afea26fc4d30c241ceb7823451cb7b777454441a9871f066b719f8c07
4
+ data.tar.gz: a9a5801cab6a604e4d166d6d86aa48f41f5e1bb74fb58f39f283a0477ac78db4
5
5
  SHA512:
6
- metadata.gz: 8ccaf09a845726e751740a870f62e008fac272bf314f6de88bba069663fa1fb9ba890d469bb1010ee55def74eb291f2953251372a2adcebae8d641a0609ff541
7
- data.tar.gz: ce622275f2c90fc5044a0c0a9c2c8efcd601326c54f1869cb62241e3f78a784d9d237c9ff9f4c5b44e734562b22bcc744c0a14577641336ec5adeb24e1926c29
6
+ metadata.gz: c8749bd358973f0d284966a6c5fb42a4e71954c8884c337c5859f5f3e566b302d1a4c1dbc9961ed5ae46aaf5a9e019935fa46916bb858877a2ef34ecc99575c9
7
+ data.tar.gz: 2f2efe9c89ae08bcf807e061ca734d13056c6fafc590d072d7870a705cf25e3d4932fff567c1b09320f5cbd6b42e02255180fded05c84b08f1032abb0594b8bf
data/.gitignore CHANGED
@@ -44,3 +44,4 @@ overage/
44
44
 
45
45
  .claude/
46
46
  CLAUDE.md
47
+ INTERNAL_DEV_LOG.md
data/CHANGELOG.md CHANGED
@@ -3,109 +3,132 @@
3
3
 
4
4
  > 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
5
5
 
6
+ > ⚠️ **Interface change (since 0.9.7):**
7
+ >
8
+ > `SmarterJSON.process` / `SmarterJSON.process_file` now **always return an `Array`** of documents:
9
+ > - `[]` for no doc
10
+ > - `[doc]` for one doc
11
+ > - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs).
12
+
13
+ Going forward this will be the supported interface.
14
+
15
+ > ⚠️ We discourage the use of `process(input).first` / `[0]` because it silently drops any additional documents.
16
+ > Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads.
17
+
18
+ ## 0.9.9 (2026-06-07)
19
+ - Much faster pure-Ruby parsing (the path used without the C extension) — roughly 3× on string-heavy data, ~2× on number-heavy, ~1.7× on object-heavy (on a YJIT-enabled Ruby). Parsed values are unchanged.
20
+
21
+ ## 0.9.8 (2026-06-06 unreleased)
22
+ - Faster parsing of string-heavy arrays. Parsed values are unchanged.
23
+
24
+ ## 0.9.7 (2026-06-05 unreleased)
25
+ - **Breaking: `process` / `process_file` now always return an `Array` of documents** — `[]` for none, `[doc]` for one, `[d1, d2, …]` for several. (Previously polymorphic: `nil` / the value / an `Array`.) The document count is now unambiguous, and any result can be iterated uniformly.
26
+ - **New `SmarterJSON.process_one(input)`** — the single-document accessor for the common case: returns the one document's value (or `nil`), and *warns* (never raises) if the input held more than one. Takes a String or an IO; for an IO it is bounded-memory (parses just the first document). Reaching for `.first` / `[0]` on a `process` result silently drops extra documents — use `process_one` instead.
27
+ - The **block form now returns the document count** (was `nil`): `n = SmarterJSON.process(io) { |doc| ... }`.
28
+ - **The top level is stricter, which keeps the LLM-wrapper recovery working:** a top-level value must be a recognized JSON value (number / `true` / `false` / `null` / quoted string / object / array) or an implicit-root object (`host: localhost`). A bare top-level run — `localhost`, `1 2 3`, the typo `flase` — now raises `ParseError` instead of becoming a quoteless string. A space is never a document separator (`1 2 3` raises rather than splitting into three). In-container quoteless strings (`[red green blue]`, `host: localhost`) are unchanged.
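The 0.9.7 notes above say that `process` always returns an `Array` and that `process_one` warns rather than raises when extra documents are present. For callers that would rather fail loudly, a minimal caller-side sketch follows. The helper name `exactly_one!` is hypothetical (it is not part of the gem), and the literal arrays stand in for `SmarterJSON.process` results so that the snippet runs without the gem installed.

```ruby
# Hypothetical caller-side helper, NOT part of smarter_json.
# `docs` is the Array that `SmarterJSON.process` is documented to return
# since 0.9.7: [] for no document, [doc] for one, [d1, d2, ...] for several.
def exactly_one!(docs)
  unless docs.length == 1
    raise ArgumentError, "expected exactly 1 JSON document, got #{docs.length}"
  end
  docs.first
end

# Stand-ins for `process` results (assumed shapes, taken from the changelog):
none    = []
one     = [{ "a" => 1 }]
several = [{ "id" => 1 }, { "id" => 2 }]

# The single-document case yields the document's value.
payload = exactly_one!(one)

# Any result can be iterated uniformly, whatever the document count.
ids = several.map { |doc| doc["id"] }

# Zero or several documents are rejected instead of silently truncated.
rejected = [none, several].map do |docs|
  begin
    exactly_one!(docs)
    false
  rescue ArgumentError
    true
  end
end
```

Whether to raise, warn, or accept extra documents is a per-call-site decision; this sketch only shows that the uniform `Array` return makes the check a one-liner.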
29
+
30
+ ## 0.9.6 (2026-06-04 unreleased)
31
+ - Faster `decimal_precision: :float` parsing of full-precision decimal numbers (around 17–18 significant digits — e.g. coordinate data and scientific output). Parsed values are unchanged: still correctly rounded, bit-for-bit identical to `JSON.parse`.
32
+
33
+ ## 0.9.5 (2026-06-04 unreleased)
34
+ - Faster `decimal_precision: :float` parsing of very high-precision decimal numbers (more than ~17 significant digits). Parsed values are unchanged.
35
+ - Faster parsing of object-heavy and compact documents — less per-element overhead in the C parser. No behavior change.
36
+
37
+ ## 0.9.4 (2026-06-04 unreleased)
38
+ - Internal performance experiments. No user-facing changes.
39
+
40
+ ## 0.9.3 (2026-06-03)
41
+ - Renamed the `bigdecimal_load:` option to `decimal_precision:` (same values: `:auto`, `:float`, `:bigdecimal`).
42
+ - Invalid option *values* now raise `ArgumentError` with a clear message instead of being silently ignored. Unknown option keys are still ignored.
43
+ - Faster parsing of pretty-printed (indented) input.
44
+ - Removed the `duplicate_key: :raise` option — it conflicted with SmarterJSON's lenient design. `duplicate_key:` now accepts `:last_wins` (default) and `:first_wins`; repeated keys are still reported through `on_warning`.
45
+
6
46
  ## 0.9.2 (2026-06-03)
7
- - **Fix a residual performance regression affecting every large document.** The "leading label" check (for `JSON: {…}`, which parses successfully but wrongly as an implicit-root object) now uses `String#start_with?(/…/)` instead of `match?(/\A…/)`. A `\A`-anchored `match?` is **not** anchor-optimized — it retries at every byte position and so scanned the entire input (~0.3 s on a 200 MB document) on every parse, which had quietly taxed every large file since the wrapper was introduced (deeply_nested.json and big_decimals.json sat well below their 0.6.0 throughput even after 0.9.1). `start_with?` inspects only the beginning, restoring — and slightly exceeding — 0.6.0 throughput across the board.
47
+ - Fixed a performance regression that slowed parsing of large documents.
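The removed 0.9.2 entry describes swapping a `\A`-anchored `match?` for `String#start_with?` with a Regexp argument. A minimal sketch of why the swap preserves behavior: both checks answer the same question, namely whether the input begins with the label. The pattern and inputs below are illustrative assumptions, not the gem's actual regex, and the snippet makes no timing claims.

```ruby
# Illustrative leading-label patterns; the gem's real regex is not shown
# in the changelog, so these are assumptions for demonstration only.
LABEL_ANCHORED = /\AJSON:\s*/
LABEL_PREFIX   = /JSON:\s*/

labelled   = 'JSON: {"a": 1}'
unlabelled = '{"note": "the text JSON: appears later"}'

# `match?` with an explicit \A anchor.
anchored_results = [labelled, unlabelled].map { |s| s.match?(LABEL_ANCHORED) }

# `start_with?` with a Regexp only ever matches at the start of the string,
# so no anchor is needed in the pattern.
prefix_results = [labelled, unlabelled].map { |s| s.start_with?(LABEL_PREFIX) }
```

`String#start_with?` accepts a Regexp on Ruby 2.5 and later, which fits the gem's stated minimum of Ruby 2.6.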
8
48
 
9
49
  ## 0.9.1 (2026-06-03 unreleased)
10
- - **Fix a major performance regression on real-world data** (introduced with the 0.8.0 wrapper recovery). Wrapper recovery is now **reactive**: input is parsed first, and the markdown-fence / `<json>` / prose extraction runs only when that parse actually fails. Before, any input that merely *contained* ` ``` ` or `<json>` anywhere — including inside ordinary JSON string values, as GitHub-event payloads and other markdown-bearing data routinely do — was dragged through a full pure-Ruby recovery scan plus a double parse on every call (~30–45× slower on those files). A bare leading label like `JSON: {…}`, which parses successfully but wrongly, is still caught up front before parsing.
11
- - **Streaming framer**: a multi-byte marker (`//`, `/*`, `'''`, `*/`) whose bytes straddle a read-chunk boundary is no longer mis-scanned; the framer waits for the rest of the marker before deciding, so a brace inside such a comment/string can no longer end a document early.
12
- - Wrapper warnings (`code_fence_stripped` / `wrapper_tag_stripped`) now fire only when the marker is actually in the stripped text, not when it sits inside a recovered payload's own string value.
13
- - Shared `SmarterJSON::Bytes` constants for the parser and the framer / recovery scanners (no raw hex byte literals).
50
+ - Fixed a major performance regression on real-world data that contained markdown fences or `<json>` markers inside ordinary string values.
51
+ - Streaming: a document is no longer cut off early when a comment / quote marker falls across a read-chunk boundary.
14
52
 
15
53
  ## 0.9.0 (2026-06-03 unreleased)
16
- - performance improvements
17
- - code cleanup
54
+ - Performance improvements and code cleanup.
18
55
 
19
56
  ## 0.8.0 (2026-06-03)
20
57
  - **Robustness** against LLM-generated / wrapped JSON:
21
58
  - strips markdown code fences (```json / ```)
22
- - ignores obvious prefix / suffix prose around a payload
59
+ - ignores leading / trailing prose around a JSON payload
23
60
  - unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`
24
- - preserves multiple recovered payloads as an `Array`
25
- - supports pretty-printed multi-line document framing on IO / block input
26
- - **Warnings** now cover wrapper recovery too (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
27
- - **No truncation recovery**: truncated / unterminated input still raises `SmarterJSON::ParseError`
61
+ - returns multiple recovered payloads as an `Array`
62
+ - parses pretty-printed multi-line documents from IO / block input
63
+ - reports each recovery through `on_warning` (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
64
+ - Truncated / unterminated input still raises `SmarterJSON::ParseError` — SmarterJSON does not guess at missing data.
28
65
 
29
66
  ## 0.7.0 (2026-06-03)
30
- - **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` tuple return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as the parser applies a lenient fix; `process` / `process_file` now always return the bare value (nil / value / Array) on every path. Unlike the tuple, this also fires on the streaming block form. The default (no handler) records nothing and costs nothing.
67
+ - **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as a lenient fix is applied; `process` / `process_file` now always return just the value, including on the streaming block form. The default (no handler) records nothing and costs nothing.
31
68
 
32
69
  ## 0.6.0 (2026-06-02)
33
- - Lenient comma handling: empty slots around / between commas are collapsed (`[1,,2]` → `[1,2]`, `[,1,]` → `[1]`, `{a:1,,b:2}` → `{a:1,b:2}`), on both the C and Ruby paths. No null is inserted for an empty slot.
34
- - A key with a colon but no value reads as null: `{a:}` → `{"a"=>nil}` (both paths).
35
- - New opt-in `warnings:` option. With `warnings: true`, `process` / `process_file` return `[result, warnings]`, where `warnings` is an Array of `SmarterJSON::Warning` (`type`, `message`, `line`, `col`) recording the lenient fixes applied — `:empty_slot`, `:empty_value`, `:duplicate_key`. Default off; works on both paths.
36
- - Fixed a pure-Ruby bug where a mantissa-less exponent token (e.g. `-e695881`) was read as `0.0`; it is now a quoteless string, matching the C path.
37
- - Fixed a pure-Ruby bug where a `\u` escape whose next bytes split a multibyte character leaked `ArgumentError`; it now raises `SmarterJSON::ParseError`.
38
- - Added a property/fuzz test suite that checks C/Ruby parity and round-tripping on generated, mutated, and random input.
70
+ - Lenient comma handling: empty slots around / between commas are collapsed (`[1,,2]` → `[1,2]`, `[,1,]` → `[1]`, `{a:1,,b:2}` → `{a:1,b:2}`). No null is inserted for an empty slot.
71
+ - A key with a colon but no value reads as null: `{a:}` → `{"a"=>nil}`.
72
+ - New opt-in `warnings:` option recording the lenient fixes applied — `:empty_slot`, `:empty_value`, `:duplicate_key`. (Superseded by `on_warning:` in 0.7.0.)
39
73
 
40
74
  ## 0.5.2 (2026-06-01) yanked
41
- - `generate` now supports pretty-printing via the `indent:` option (spaces per nesting level; default `0` = compact). Empty objects/arrays stay inline; `indent:` combined with `format: :ndjson` raises `ArgumentError`.
42
- - `generate` adds `sort_keys:` (emit object keys in sorted order), `ascii_only:` (escape non-ASCII as `\uXXXX`, astral chars as surrogate pairs), and `script_safe:` (escape `</` and U+2028/U+2029 for safe embedding in an HTML `<script>` tag).
43
- - `generate` adds opt-in `coerce:` — when `true`, a value that isn't natively supported (e.g. `Time`, `Date`, app objects) is converted via its own `as_json` (result re-emitted) or `to_json` (spliced); strict-by-default still raises `GenerateError`.
75
+ - `generate` supports pretty-printing via the `indent:` option (spaces per nesting level; default compact). Combining `indent:` with `format: :ndjson` raises `ArgumentError`.
76
+ - `generate` adds `sort_keys:` (emit object keys in sorted order), `ascii_only:` (escape non-ASCII), and `script_safe:` (escape `</` and U+2028/U+2029 for safe embedding in an HTML `<script>` tag).
77
+ - `generate` adds opt-in `coerce:` — convert an otherwise-unsupported value (e.g. `Time`, `Date`, app objects) via its own `as_json` / `to_json`; strict-by-default still raises `GenerateError`.
44
78
 
45
79
  ## 0.5.1 (2026-06-01) yanked
46
- - Unified the error classes under a single `SmarterJSON::Error` base: `ParseError` and `EncodingError` now inherit from it, and `generate` raises a new `GenerateError`. `rescue SmarterJSON::Error` now catches everything the gem raises.
47
- - Added a CI test matrix (Ruby 2.6–4.0 + head, on Ubuntu and macOS).
48
- - Fixed the C extension build on Ruby 2.6 (declare `rb_hash_bulk_insert`, which 2.6 exports but does not declare in its headers); set the minimum Ruby to 2.6.
80
+ - Unified the error classes under a single `SmarterJSON::Error` base: `ParseError`, `EncodingError`, and the new `GenerateError` all inherit from it, so `rescue SmarterJSON::Error` catches everything the gem raises.
81
+ - Added a CI test matrix (Ruby 2.6–4.0 + head, on Ubuntu and macOS); minimum Ruby is now 2.6.
49
82
 
50
83
  ## 0.5.0 (2026-05-31 unreleased)
51
- - add JSON generation, incl. NDJSON generation
52
- - add test coverage
84
+ - Added JSON generation, including NDJSON.
85
+ - Added test coverage.
53
86
 
54
87
  ## 0.4.0 (2026-05-31 unreleased)
55
- - rename `flex_json` -> `smarter_json`
88
+ - Renamed the gem from `flex_json` to `smarter_json`.
56
89
 
57
90
  ## 0.3.10 (2026-05-31 unreleased)
58
- - change interface to use `.process` and `.process_file`
59
-
91
+ - Changed the interface to `.process` and `.process_file`.
60
92
 
61
93
  ## 0.3.9 (2026-05-31 unreleased)
62
- - `parse` (no block) now handles any input automatically: 0 documents (empty / whitespace / comment-only) → `nil`, 1 document → the value itself, 2+ documents (NDJSON / JSONL / concatenated / whitespace-separated) → an Array of the values. It no longer raises on trailing content.
63
- - Detection is free (the same trailing-content check that used to raise) and the single-document path allocates no Array, so single-value parsing is unchanged in speed.
64
- - The block form (`parse(input) { |doc| … }`) is kept as the bounded-memory streaming path. `parse_file(path) { |doc| … }` now forwards the block too, so files stream the same way (previously the block was silently ignored). Bracketless comma lists (`1, 2, 3`) still raise — commas don't separate top-level documents (implicit-root array remains unsupported).
65
- - The block form allows individual processing of each line in NDJSON files.
66
- - Supersedes the earlier "raise on trailing content, match Oj" behavior.
94
+ - `process` with no block now handles any input automatically: 0 documents (empty / whitespace / comment-only) → `nil`, 1 document → the value itself, 2+ documents (NDJSON / JSONL / concatenated) → an `Array`. It no longer raises on trailing content.
95
+ - The block form (`process(input) { |doc| }`) streams documents with bounded memory; `process_file` forwards the block too, so each line of an NDJSON file can be processed individually.
67
96
 
68
97
  ## 0.3.8 (2026-05-30 unreleased)
69
- - Reordered single-character checks so the more common byte is tested first (`-` before `+`).
70
- - Quoteless-token boundary scan now uses a 256-byte class table: ordinary bytes are classified in one table lookup, and the lookahead byte is read only at a `#`/`/` instead of on every byte. Speeds up quoteless / config-style input (the lenient case the JSON benchmarks don't exercise).
98
+ - Performance improvements (quoteless / config-style input).
71
99
 
72
100
  ## 0.3.7 (2026-05-30 unreleased)
73
- - Escaped-string literal runs are bulk-copied with the NEON scanner instead of one byte at a time.
74
- - Added branch hints (`__builtin_expect`) and prefetch to the hot string-scan loop. Sped up string-heavy files (string_array, github_events, twitter all 12–16% faster).
101
+ - Performance improvements (string-heavy input).
75
102
 
76
103
  ## 0.3.6 (2026-05-30 unreleased)
77
- - Fast path for plain numbers inside objects/arrays (`fj_try_member_number`): one scan straight from the cursor, committing when the number meets a delimiter and falling back to the quoteless scanner otherwise. Skips the quoteless boundary scan + classify dispatch for the common case. Broad gains on number-in-container files (weather, canada, usgs, big_decimals).
104
+ - Performance improvements (numbers inside objects / arrays).
78
105
 
79
106
  ## 0.3.5 (2026-05-30 unreleased)
80
- - Rewrote `fj_parse_number` (top-level numbers) as a single pass: finds the token end and accumulates the mantissa/exponent at once, using the string's NUL terminator as a scan sentinel (no per-byte bounds check) and a digit loop that skips the underscore check until an underscore actually appears.
81
- - Added `fj_try_decimal` for the quoteless path: validates and extracts the number in one scan, replacing the old three scans (validate + significant-digit count + mantissa extraction); skips the significant-digit scan when the number has ≤16 digits.
82
- - Both number paths now build values through the shared `fj_int_from_parts` / `fj_float_from_parts` helpers so they can't drift; removed the now-dead `fj_validate_decimal` / `fj_int_value` / `fj_decimal_value`.
107
+ - Performance improvements (number parsing).
83
108
 
84
109
  ## 0.3.4 (2026-05-30 unreleased)
85
- - Dropped a per-member Ruby method call (`key?`) that fired for every object member under the default duplicate-key mode — pure waste on object-heavy files (twitter, github_events, citm).
86
- - Build objects and arrays from a C value stack with a pre-sized hash + bulk insert (and size-based duplicate detection), instead of inserting one member/element at a time.
87
- - Added a per-parse key cache so repeated object keys are interned once instead of every occurrence.
110
+ - Performance improvements (object-heavy input).
88
111
 
89
112
  ## 0.3.3 (2026-05-30 unreleased)
90
- - Vendored Ryū (Ulf Adams, Apache-2.0) for correctly-rounded string→double conversion: the mantissa is accumulated in one pass and converted with no `strtod`. Large win on float-heavy files (canada, big_decimals).
113
+ - Faster, correctly-rounded float parsing.
91
114
 
92
115
  ## 0.3.3 (2026-05-29 unreleased)
93
- - performance fixes
116
+ - Performance fixes.
94
117
 
95
118
  ## 0.3.2 (2026-05-29 unreleased)
96
- - performance fixes
119
+ - Performance fixes.
97
120
 
98
121
  ## 0.3.1 (2026-05-29 unreleased)
99
- - performance fixes
122
+ - Performance fixes.
100
123
 
101
124
  ## 0.3.0 (2026-05-29 unreleased)
102
- - iterative parser
125
+ - Iterative parser.
103
126
 
104
127
  ## 0.2.0 (2026-05-29 unreleased)
105
- - recursive parser
128
+ - Recursive parser.
106
129
 
107
130
  ## 0.1.1 (2026-05-29 unreleased)
108
- - MVP complete
131
+ - MVP complete.
109
132
 
110
133
  ## 0.1.0 (2026-05-28 unreleased)
111
- - Initial Ruby version
134
+ - Initial Ruby version.
data/README.md CHANGED
@@ -2,27 +2,62 @@
2
2
 
3
3
  ![Gem Version](https://img.shields.io/gem/v/smarter_json) [![codecov](https://codecov.io/gh/tilo/smarter_json/branch/main/graph/badge.svg)](https://codecov.io/gh/tilo/smarter_json) <!-- [![Downloads](https://img.shields.io/gem/dt/smarter_json)](https://rubygems.org/gems/smarter_json) --> [![RubyGems](https://img.shields.io/badge/RubyGems-smarter__json-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/smarter_json) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-smarter__json-brightgreen)](https://www.ruby-toolbox.com/projects/smarter_json)
4
4
 
5
- A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write — and in benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to be successful. Other parsers are strict - they stop at the first deviation - SmarterJSON keeps going - it optimizes for getting your data out, not for policing the JSON spec.
5
+ A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write — and in benchmarks it matches or beats Oj on every file. SmarterJSON is opinionated: we want your JSON processing to be successful. Traditional JSON parsers are strict - they stop at the first deviation - SmarterJSON keeps going - it optimizes for getting your data out, not for policing the JSON spec.
6
6
 
7
- > **SmarterJSON: one parser, no modes — want strict? Please use the stdlib `json` gem.**
7
+ > **SmarterJSON: one tool, no modes — want strict? Please use the stdlib `json` gem.**
8
8
 
9
9
  ## Why SmarterJSON?
10
10
 
11
- Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed** and **you shouldn't lose the whole document because of formatting errors.** Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas — it just parses it. When it is lenient, `smarter_json` isn't dropping data that exists — it's just not raising an eyebrow at a suspicious gap (like an extra comma). A strict parser would refuse the whole document and recover nothing; `smarter_json` returns everything except the formatting error.
11
+ **Are you tired of seeing errors like these?**
12
+
13
+ ```
14
+ ERROR running JSON.parse (stdlib) on deeply_nested.json: JSON::NestingError: nesting of 101 is too deep
15
+
16
+ ERROR running Oj.load (default) on config.json5: Oj::ParseError: unexpected character (after [0]) at line 5, column 6 [parse.c:931]
17
+
18
+ ERROR running Oj.load (strict, float) on config.json5: Oj::ParseError: unexpected character (after [0]) at line 5, column 6 [parse.c:931]
19
+
20
+ ERROR running Oj.load (compat) on config.json5: EncodingError: unexpected character (after [0]) at line 5, column 6 [parse.c:931] in '// JSON5 config sample — leni…
21
+
22
+ ERROR running JSON.parse (stdlib) on config.json5: JSON::ParserError: expected object key, got 'id:' at line 4 column 5
23
+
24
+ ERROR running Yajl::Parser (yajl-ruby) on config.json5: Yajl::ParseError: lexical error: invalid char in json text. this. */ [ // record 0 { id: 0, name: 'alpha-0', mask: 0 (…
25
+
26
+ ERROR running Oj.load (default) on github_events_100k.ndjson: Oj::ParseError: unexpected characters after the JSON document (after ) at line 2, column 1 [parse.c:870]
27
+
28
+ ERROR running Oj.load (strict, float) on github_events_100k.ndjson: Oj::ParseError: unexpected characters after the JSON document (after ) at line 2, column 1 [parse.c:870]
29
+
30
+ ERROR running Oj.load (compat) on github_events_100k.ndjson: EncodingError: unexpected characters after the JSON document (after ) at line 2, column 1 [parse.c:870] in '{"id":"…
31
+
32
+ ERROR running JSON.parse (stdlib) on github_events_100k.ndjson: JSON::ParserError: unexpected token at end of stream '{"id":"34816047161","type":"Dele' at line 1 column 1
33
+
34
+ ERROR running Yajl::Parser (yajl-ruby) on github_events_100k.ndjson: Yajl::ParseError: Found multiple JSON objects in the stream but no block or the on_parse_complete callback was
35
+ assigne…
36
+ ```
37
+
38
+ **Do you have no control over the input quality?**
39
+
40
+ Traditional JSON parsers reject anything that isn't perfectly strict JSON. That means your code breaks on malformed data.
41
+
42
+ SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed** and **you shouldn't lose the whole document because of formatting errors.**
43
+ Give it strict JSON, NDJSON, JSON5, an HJSON-style config file, LLM-generated JSON, or a copy-pasted blob with comments and trailing commas — it just extracts the data from it.
44
+ When it is lenient, `smarter_json` isn't dropping data that exists — it's just not raising an eyebrow at a suspicious gap (like an extra comma).
45
+
46
+ A strict parser would refuse the whole document and recover nothing; `smarter_json` returns everything except the formatting error.
12
47
 
13
48
  > For an ingestion tool, "reject the whole document because of one stray comma" is the worst outcome: you throw away the 99% that's fine to avoid maybe-mishandling a gap that carries no data anyway.
14
49
 
15
50
  Three things set it apart:
16
51
 
17
- 1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
52
+ 1. **One tool, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure it to match your input; it adapts to whatever you give it.
18
53
 
19
- 2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time.
54
+ 2. **It extracts every document from multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: it always returns an `Array` of the documents found (`[]` / `[doc]` / `[d1, d2, …]`). For the common single-document case, `SmarterJSON.process_one` returns the one value directly (and warns, never raises, if there was more than one). The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time.
20
55
 
21
- 3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser — the fastest general-purpose Ruby JSON parser.
56
+ 3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) matches or beats Oj on every file we benchmark, and is competitive with the stdlib `json` C parser — among the fastest general-purpose JSON processors in Ruby.
22
57
 
23
58
  ## What it accepts, beyond strict JSON
24
59
 
25
- - `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
60
+ - `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` is read as a string, not a truncated value)
26
61
  - Markdown-wrapped / chatty blobs around the payload: strips ```` ```json ```` / ```` ``` ```` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
27
62
  - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
28
63
  - Implicit root object — a config file that starts with `key: value`, no outer `{}`
@@ -31,7 +66,19 @@ Three things set it apart:
31
66
  - Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
32
67
  - Duplicate keys (last value wins by default; configurable)
33
68
 
34
- It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
69
+ It raises only on genuinely unreadable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
70
+
71
+ ### Format references
72
+
73
+ The lenient grammar is a superset of these human-JSON specs — listed once, here:
74
+
75
+ * [JSON5](https://json5.org/)
76
+ * [HJSON](https://hjson.github.io/)
77
+ * [JWCC / HuJSON](https://github.com/tailscale/hujson)
78
+ * [Nigel Tao](https://nigeltao.github.io/blog/2021/json-with-commas-comments.html)
79
+ * [JSONH](https://github.com/jsonh-org/Jsonh)
80
+ * [JSONC (VS Code)](https://jsonc.org/)
81
+ * [NDJSON / JSON Text Sequences (RFC 7464)](https://datatracker.ietf.org/doc/html/rfc7464).
35
82
 
36
83
  ## Installation
37
84
 
@@ -44,13 +91,48 @@ gem "smarter_json"
44
91
  gem install smarter_json
45
92
  ```
46
93
 
47
- The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.
94
+ The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby implementation runs instead and produces identical results.
95
+
96
+ ## Usage
97
+
98
+ Pass a String of JSON content or an IO; you get back the extracted data. The same call handles strict JSON, JSON5, and HJSON-style config — there are no modes or flags.
99
+
100
+ ```ruby
101
+ require "smarter_json"
102
+
103
+ SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => [{"a"=>1, "b"=>[2, 3]}] (always an Array of documents)
104
+ SmarterJSON.process_one('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]} (the one document's value)
105
+ SmarterJSON.process_file("config.json5") # read a file, then process
106
+ ```
107
+
108
+ **Prefer `process`.** It always returns an `Array`, so the document count is explicit and you never silently drop one. Reach for `process_one` when you want just the single document's value — it *warns* (never raises) if the input turns out to hold more than one, so an unexpected extra document is surfaced, not dropped.
109
+
110
+ ## Usage in APIs
111
+
112
+ At an API boundary the JSON comes from someone you don't control — a client POSTing a request body to *your* service, or an upstream service answering a call *you* made — and it isn't always clean: a stray trailing comma, a `NaN`, a payload wrapped in prose, or a quiet change to the format. A strict parser turns any of those into an exception (a request you reject, or a failed call chain). SmarterJSON extracts the data that's there instead, so one formatting quirk doesn't sink the whole request:
113
+
114
+ ```ruby
115
+ # Inbound — JSON a caller sent to your endpoint:
116
+ data = SmarterJSON.process(request.body)
117
+
118
+ # Outbound — JSON from a service you called:
119
+ data = SmarterJSON.process(response.body)
120
+ ```
121
+
122
+ What that buys you:
123
+
124
+ * fewer "random production crashes" from messy JSON on either side of the wire
125
+ * resilience when a caller or a provider changes its output
126
+ * the option to log and recover, instead of rejecting the request outright
127
+ * consistent handling of edge-case payloads
128
+
129
+ See [Examples](#examples) below for multi-document input, streaming, and recovering JSON from LLM / markdown noise.
48
130
 
49
- ## API stability and thread safety
131
+ ## Stable interface & thread safety
50
132
 
51
- The public API is now considered stable: `SmarterJSON.process`, `SmarterJSON.process_file`, `SmarterJSON.generate`, and the documented options in this README/docs are the supported surface.
133
+ The public interface is now considered stable: `SmarterJSON.process`, `SmarterJSON.process_one`, `SmarterJSON.process_file`, `SmarterJSON.generate`, and the documented options in this README/docs are the supported surface.
52
134
 
53
- Concurrent calls are safe. The parser/generator keep per-call state local, and the C extension only caches Ruby IDs / constants at load time; it does not share mutable parse state across calls.
135
+ Concurrent calls are safe. The processor and generator keep per-call state local, and the C extension only caches Ruby IDs / constants at load time; it does not share mutable state across calls.
54
136
 
55
137
  ## Documentation
56
138
 
@@ -60,106 +142,167 @@ Concurrent calls are safe. The parser/generator keep per-call state local, and t
60
142
  * [Configuration Options](docs/options.md)
61
143
  * [Examples](docs/examples.md)
62
144
 
63
- ## Usage
145
+ ### Warnings (`on_warning`)
146
+
147
+ When SmarterJSON quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
64
148
 
65
149
  ```ruby
66
- require "smarter_json"
150
+ # Collect them all:
151
+ warns = []
152
+ data = SmarterJSON.process(input, on_warning: ->(w) { warns << w })
67
153
 
68
- SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
69
- SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
70
- SmarterJSON.process_file("config.json5") # read a file, then parse
154
+ # Or route them elsewhere (log, count, raise):
155
+ SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn(w) })
156
+ ```
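A handler does not have to log each warning; it can also aggregate them. The following is a minimal sketch, not part of the gem: `WarningStub` is a hypothetical stand-in for `SmarterJSON::Warning` (it only mirrors the four documented fields) so the example runs without the gem installed, and `WarningTally` is an illustrative name.

```ruby
# Stand-in for SmarterJSON::Warning so this sketch runs without the gem.
# The four fields mirror the documented ones; the Struct itself is an assumption.
WarningStub = Struct.new(:type, :message, :line, :col)

# Counts warnings per type and remembers where each type was first seen.
class WarningTally
  attr_reader :counts, :first_seen

  def initialize
    @counts = Hash.new(0)
    @first_seen = {}
  end

  # Returns a callable suitable for passing as `on_warning:`.
  def to_proc
    ->(w) do
      @counts[w.type] += 1
      @first_seen[w.type] ||= [w.line, w.col]
    end
  end
end

tally = WarningTally.new
handler = tally.to_proc

# With the gem loaded this would be: SmarterJSON.process(input, on_warning: handler)
handler.call(WarningStub.new(:empty_slot, "empty comma slot", 1, 8))
handler.call(WarningStub.new(:duplicate_key, "duplicate key", 2, 3))
handler.call(WarningStub.new(:empty_slot, "empty comma slot", 4, 1))

p tally.counts
p tally.first_seen
```

Keeping the tally in its own object means the same handler can be shared across several `process` calls and inspected once at the end.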
71
157
 
72
- # Multiple documents (NDJSON / JSONL / concatenated) — no block, no special method:
73
- SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
74
- SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
75
- SmarterJSON.process("") # => nil (zero documents)
76
158
 
77
- # For input larger than memory, stream one document at a time with a block
78
- # (process and process_file both forward the block):
79
- SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
159
+ ## Performance
80
160
 
81
- # Wrapper noise is stripped automatically:
82
- SmarterJSON.process(<<~TEXT)
83
- Here is the JSON:
161
+ SmarterJSON is a C extension (with a pure-Ruby fallback that runs everywhere). Before the speed table, the part that isn't a "× faster" — **things the other parsers can't do at all:**
84
162
 
85
- ```json
86
- {
87
- "a": 1
88
- }
89
- ```
90
- TEXT
91
- # => {"a"=>1}
163
+ - **stdlib `json` can't parse deeply nested data.** It caps nesting at 100 levels and raises; SmarterJSON has no depth limit (iterative parser, bounded only by memory).
164
+ - **None of the others read NDJSON / JSONL / concatenated input in a single call.** Oj, `json`, and Yajl each raise on the second document. Only SmarterJSON's `process` returns every document as an `Array`.
165
+ - **None of the others parse JSON5, HJSON-style config, or LLM-wrapped output.** Comments, trailing commas, unquoted keys, quoteless values, `'single quotes'`, markdown code fences, prose wrappers — all raise in Oj / `json` / Yajl; SmarterJSON parses them.
166
+ - **`json` and Yajl produce `Float` only — lossy on high-precision numbers.** On coordinate / scientific data (>16 significant digits) they silently round to `Float`, so they aren't a like-for-like comparison there. SmarterJSON (and Oj) keep full precision as `BigDecimal` by default.
92
167
 
93
- SmarterJSON.process(<<~TEXT)
94
- Here is the result:
168
+ Where a like-for-like comparison exists, here is SmarterJSON's C path against each parser. **Apple M4, Ruby 3.4.7, p10 of 40 runs.** Each cell is **SmarterJSON vs that parser** — "faster" means SmarterJSON wins. Ratios shift with hardware; run `rake report` in `json_benchmarks/` to reproduce.
95
169
 
96
- {
97
- "a": 1
98
- }
170
+ | File | vs Oj/strict | vs `json` | vs Yajl |
171
+ | ----------------------------- | --------------- | ---------------------------- | --------------- |
172
+ | big_decimals <sup>≠</sup> | **1.8× faster** | **1.1× faster** | **1.3× faster** |
173
+ | canada <sup>≠</sup> | **8× faster** | 1.1× slower | **2.2× faster** |
174
+ | citm_catalog | **1.6× faster** | 1.2× slower | **4.8× faster** |
175
+ | citylots <sup>≠</sup> | **3.6× faster** | **2.0× faster** | **2.3× faster** |
176
+ | config.jsonc | **1.1× faster** | 1.5× slower | **3.7× faster** |
177
+ | deeply_nested | **1.4× faster** | **can't parse** <sup>‡</sup> | **5.1× faster** |
178
+ | github_events | **1.2× faster** | ≈ tied | **3.1× faster** |
179
+ | string_array | ≈ tied | ≈ tied | **1.6× faster** |
180
+ | twitter | **1.4× faster** | 1.3× slower | **3.5× faster** |
181
+ | usgs_earthquakes <sup>≠</sup> | **1.3× faster** | 1.5× slower | **3.6× faster** |
182
+ | weather_berlin | **1.9× faster** | 1.1× slower | **3.5× faster** |
99
183
 
100
- Hope this helps.
101
- TEXT
102
- # => {"a"=>1}
184
+ <sup>≠</sup> High-precision file. The row uses `decimal_precision: :float` (Float, like-for-like) for `canada` / `citylots` / `big_decimals` / `usgs`. SmarterJSON's **default** `:auto` keeps these decimals as `BigDecimal` (no precision loss, like Oj's default) — intrinsically slower than `Float`, so default-vs-`Float` would be apples-to-oranges. Against Oj's matching `BigDecimal` default, SmarterJSON is faster there too.
185
+ <sup>‡</sup> Not a measurement gap — `json` raises by default: it errors on multi-document / NDJSON input without a block, and caps nesting at 100 levels. SmarterJSON has neither limit.
103
186
 
104
- SmarterJSON.process("<json>{\"a\":1}</json>")
105
- # => {"a"=>1}
187
+ In short: **matches or beats Oj/strict on every file** — `string_array` is the one wash (within ~10%, and hardware-dependent: SmarterJSON edges ahead on an M1, Oj edges ahead on an M4) — **far faster than Yajl everywhere, and level-to-ahead of stdlib `json` on a like-for-like basis**, while parsing input `json` and Oj reject outright. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and **bit-for-bit identical to `JSON.parse`** — fast *and* exact, even at full double precision.
106
188
 
107
- SmarterJSON.process(<<~TEXT)
108
- first attempt:
109
- {"a":1}
189
+ **Two notes on fair comparison:**
190
+
191
+ - **NDJSON / multi-document:** only SmarterJSON reads it via plain `process` — Oj, `json`, and Yajl raise without a block. `process` collects every document into an `Array`; the block form streams one document at a time in bounded memory (use it for input larger than RAM).
192
+ - **High-precision decimals (the <sup>≠</sup> files):** by default these load as `BigDecimal` (full precision, like Oj's default), which is intrinsically slower than `Float`. If you don't need `BigDecimal` precision, pass `decimal_precision: :float`: it runs at 3–6× the speed of the `:auto` default on coordinate/scientific data and makes the comparison like-for-like. In that mode the table above shows SmarterJSON ahead of stdlib `json` on `big_decimals` and `citylots` (~2×), and behind on `canada` (1.1×) and `usgs_earthquakes` (1.5×).
110
193
 
111
- corrected payload:
112
- {"b":2}
113
- TEXT
114
- # => [{"a"=>1}, {"b"=>2}]
115
- ```
116
194
 
117
195
  ### Options
118
196
 
119
197
  | option | default | meaning |
120
198
  |-------------------|--------------|-------------------------------------------------------------------------|
121
199
  | `symbolize_keys` | `false` | return object keys as Symbols instead of Strings |
122
- | `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` / `:raise` for repeated keys in one object |
123
- | `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
200
+ | `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` for a key repeated in one object (every repeat is also reported via `on_warning`) |
201
+ | `decimal_precision` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
124
202
  | `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
125
203
  | `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
126
204
  | `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
127
205
 
128
- ### Warnings (`on_warning`)
206
+ ## Examples
207
+
208
+ ### Lenient, config-style input
129
209
 
130
- When the parser quietly fixes something lenient (collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags), it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
210
+ No outer braces are needed; a file or string that starts with `key: value` is read as an implicit root object (HJSON-style):
131
211
 
132
212
  ```ruby
133
- # Collect them all:
134
- warns = []
135
- data = SmarterJSON.process(input, on_warning: ->(w) { warns << w })
213
+ SmarterJSON.process_one("host: localhost\nport: 5432")
214
+ # => {"host"=>"localhost", "port"=>5432}
215
+ ```
136
216
 
137
- # Or route them elsewhere (log, count, raise):
138
- SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn(w) })
217
+ ### Multiple documents (NDJSON / JSONL / concatenated)
218
+
219
+ `process` always returns an **`Array` of the documents** it found — `[]` for none, `[doc]` for one, `[d1, d2, …]` for several — with **no block and no special method**. The document count is unambiguous, and any result iterates uniformly:
220
+
221
+ ```ruby
222
+ SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
223
+ SmarterJSON.process('{"id":1}') # => [{"id"=>1}] (one document, still an Array)
224
+ SmarterJSON.process("") # => [] (zero documents)
139
225
  ```
140
226
 
141
- ## Performance
227
+ For the common single-document case, **`process_one`** returns the one value directly — and *warns* (never raises) if the input held more than one, so you never silently drop a document:
228
+
229
+ ```ruby
230
+ SmarterJSON.process_one('{"id":1}') # => {"id"=>1}
231
+ SmarterJSON.process_one("") # => nil
232
+ ```
142
233
 
143
- Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are **SmarterJSON/C** vs **Oj/strict** vs **stdlib `json`**, all producing `Float` (run `rake report` in `json_benchmarks/` for the full table — numbers vary run to run).
234
+ > **Type-checking the result?** Use `result.is_a?(Array)`, not `result.class == Array`; it's the idiomatic Ruby test, and it stays correct if a future release returns a specialized `Array` subclass.
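The difference between the two checks is easy to see with any `Array` subclass. In this sketch `DocumentList` is a hypothetical class invented for the demonstration; nothing here implies SmarterJSON returns such a type today.

```ruby
# Hypothetical Array subclass, standing in for a specialized result type.
class DocumentList < Array; end

result = DocumentList.new([{ "id" => 1 }])

puts result.is_a?(Array)      # true: subclasses still count as Arrays
puts result.class == Array    # false: the exact-class check misses the subclass
puts result.map { |doc| doc["id"] }.inspect
```

Code that only iterates the result (`each`, `map`, `first`) needs no type check at all, since every return value of `process` is array-like.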
144
235
 
145
- - **vs Oj/strict** (the `JSON.parse`-equivalent mode, both producing `Float`): SmarterJSON/C is faster on nearly every file, typically **1.1–1.6×** (e.g. big_decimals ~1.6×, deeply-nested ~1.4×, citm / twitter / usgs ~1.3×, github / citylots / weather ~1.1–1.2×). The one exception is **string_array**, where Oj/strict's SIMD string scan is ~1.7× faster; that's the current frontier.
146
- - **vs stdlib `json` (C):** competitive with the fastest Ruby JSON parser — it ties `json` on big_decimals and string_array, and trails by ~1.1–1.7× on the rest. (`canada.json` is the outlier, far behind — that's the `BigDecimal` default, see below.)
147
- - **Numbers:** floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
236
+ A **top-level** value must be recognized JSON — a number, `true` / `false` / `null`, a quoted string, an object, an array or an implicit-root object (`host: localhost`). A bare top-level run such as `localhost` or `1 2 3` raises `ParseError`. Quoteless string values *inside* objects and arrays (`{host: localhost}`, `[red green blue]`) are unchanged.
148
237
 
149
- **Two notes on fair comparison:**
238
+ ### Streaming large input with a block
239
+
240
+ For input larger than memory, pass a block: each document is yielded as it is read and the method returns the **document count** instead of building an `Array`. Both `process` and `process_file` forward the block:
241
+
242
+ ```ruby
243
+ SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
244
+ ```
245
+
246
+ ### Recovering JSON from LLM / markdown noise
247
+
248
+ When the payload is wrapped in markdown fences, surrounding prose, or tags, `process` (or `process_one` for a single payload) strips the wrapper and reads what's inside. (Clean JSON never pays for this — recovery only runs when a straight read fails.)
249
+
250
+ A fenced code block, as an LLM often returns:
251
+
252
+ ````ruby
253
+ SmarterJSON.process_one(<<~TEXT)
254
+ Here is the JSON:
150
255
 
151
- - **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once.
152
- - **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
256
+ ```json
257
+ { "a": 1 }
258
+ ```
259
+ TEXT
260
+ # => {"a"=>1}
261
+ ````
262
+
263
+ Explanatory prose before and/or after the payload is ignored:
264
+
265
+ ```ruby
266
+ SmarterJSON.process_one(<<~TEXT)
267
+ Here is the result:
268
+
269
+ { "a": 1 }
270
+
271
+ Hope this helps.
272
+ TEXT
273
+ # => {"a"=>1}
274
+ ```
275
+
276
+ `<json>...</json>` / `BEGIN_JSON ... END_JSON` wrapper tags are unwrapped:
277
+
278
+ ```ruby
279
+ SmarterJSON.process_one('<json>{"a":1}</json>')
280
+ # => {"a"=>1}
281
+ ```
282
+
283
+ When one blob contains several recovered payloads, they come back as an `Array` (the same rule as multi-document input):
284
+
285
+ ```ruby
286
+ SmarterJSON.process(<<~TEXT)
287
+ first attempt:
288
+ {"a":1}
289
+
290
+ corrected payload:
291
+ {"b":2}
292
+ TEXT
293
+ # => [{"a"=>1}, {"b"=>2}]
294
+ ```
153
295
 
154
296
  ## Encoding
155
297
 
156
- `encoding:` (default `"UTF-8"`) labels what the input is — it does **not** trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way `smarter_csv` handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
298
+ `encoding:` (default `"UTF-8"`) labels what the input is — it does **not** trigger a transcoding pass. SmarterJSON works on the bytes in their native encoding and emits string values with the same encoding tag, the same way `smarter_csv` handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
157
299
 
158
300
  ## Nesting & untrusted input
159
301
 
160
- Both the C extension and the pure-Ruby parser are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) parses without issue.
302
+ Both the C extension and the pure-Ruby engine are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue.
303
+
304
+ The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you process untrusted input and want a hard cap, that's a planned opt-in guard — for now, size-limit upstream.
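One way to size-limit upstream is a small guard that rejects oversized payloads before they reach the processor. This is a sketch under assumptions: `MAX_JSON_BYTES`, `PayloadTooLarge`, and `guard_size!` are hypothetical names, the 1 MiB cap is arbitrary, and the guard handles String input only (an IO would need a bounded read instead).

```ruby
# Hypothetical cap; choose a value that fits your service.
MAX_JSON_BYTES = 1_048_576 # 1 MiB

class PayloadTooLarge < StandardError; end

# Returns the payload unchanged if it is within the cap, otherwise raises.
def guard_size!(payload, limit: MAX_JSON_BYTES)
  size = payload.bytesize
  raise PayloadTooLarge, "payload is #{size} bytes, limit is #{limit}" if size > limit
  payload
end

small = '{"a": 1}'
checked = guard_size!(small) # passes through; hand `checked` to SmarterJSON.process

begin
  guard_size!("x" * 2_000_000)
rescue PayloadTooLarge => e
  puts e.message
end
```

A byte cap bounds memory use indirectly; it does not limit nesting depth on its own, although a payload can never nest deeper than it is long.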
161
305
 
162
- The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you parse untrusted input and want a hard cap, that's a planned opt-in guard — for now, size-limit upstream of the parser.
163
306
 
164
307
  ## Development
165
308