smarter_json 0.6.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 65b2a8f15607b861034e600fd42233328803ebbcb652e8cb8b162811e24779e2
4
- data.tar.gz: 191467cd1d29030d054c1cdaabb444523af54fb71486721182ddcdf9a4781d92
3
+ metadata.gz: '06668bbb40626009794f8e5387fb13ae1a31346a07200d2825fde4872904bd68'
4
+ data.tar.gz: 0db0d42bfd2e85a1af4b897990a290e25e4fbe0afd9b4e25e99d9520667de2c3
5
5
  SHA512:
6
- metadata.gz: c26d4ffb0789fbe7303b80d80877f02164f79f7fe2ae9eff0e17de90df4ba8da19568c7a1fa5d10d5d324c8f4a841cd8ad0917f5ab6a5f381368404f56a1ec5c
7
- data.tar.gz: 6c3fbbf21ec9f9c6300516268ad4e7741aeaab8a6732ab138497f6af5b02afbfd944fdce6d4c7095f699469b77fd2fa2cf5e65ea58712ad55da0e5c745fd6551
6
+ metadata.gz: 766cd9c5865d7218f79db57ec538bb1d34355c93012e38820544a41bbedbbca94a4a8fce05982f5f2a0e48c5670a6d3f4336eb1a22361fc29b34855d913fc564
7
+ data.tar.gz: f57396ef1a2ac4e48e06dad94122b8159a3b17c80f6d679e1bb59f85874578ea444592a828c3eb324c8d84ff57c255df0bc99af2aeb5603261aa19d26abc88c2
data/CHANGELOG.md CHANGED
@@ -1,6 +1,21 @@
1
1
 
2
2
  # SmarterJSON Change Log
3
3
 
4
+ > 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
5
+
6
+ ## 0.8.0 (2026-06-03)
7
+ - **Robustness** against LLM-generated / wrapped JSON:
8
+ - strips markdown code fences (```json / ```)
9
+ - ignores obvious prefix / suffix prose around a payload
10
+ - unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`
11
+ - preserves multiple recovered payloads as an `Array`
12
+ - supports pretty-printed multi-line document framing on IO / block input
13
+ - **Warnings** now cover wrapper recovery too (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
14
+ - **No truncation recovery**: truncated / unterminated input still raises `SmarterJSON::ParseError`
15
+
16
+ ## 0.7.0 (2026-06-03)
17
+ - **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` tuple return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as the parser applies a lenient fix; `process` / `process_file` now always return the bare value (nil / value / Array) on every path. Unlike the tuple, this also fires on the streaming block form. The default (no handler) records nothing and costs nothing.
18
+
4
19
  ## 0.6.0 (2026-06-02)
5
20
  - Lenient comma handling: empty slots around / between commas are collapsed (`[1,,2]` → `[1,2]`, `[,1,]` → `[1]`, `{a:1,,b:2}` → `{a:1,b:2}`), on both the C and Ruby paths. No null is inserted for an empty slot.
6
21
  - A key with a colon but no value reads as null: `{a:}` → `{"a"=>nil}` (both paths).
@@ -9,12 +24,12 @@
9
24
  - Fixed a pure-Ruby bug where a `\u` escape whose next bytes split a multibyte character leaked `ArgumentError`; it now raises `SmarterJSON::ParseError`.
10
25
  - Added a property/fuzz test suite that checks C/Ruby parity and round-tripping on generated, mutated, and random input.
11
26
 
12
- ## 0.5.2 (2026-06-01)
27
+ ## 0.5.2 (2026-06-01) yanked
13
28
  - `generate` now supports pretty-printing via the `indent:` option (spaces per nesting level; default `0` = compact). Empty objects/arrays stay inline; `indent:` combined with `format: :ndjson` raises `ArgumentError`.
14
29
  - `generate` adds `sort_keys:` (emit object keys in sorted order), `ascii_only:` (escape non-ASCII as `\uXXXX`, astral chars as surrogate pairs), and `script_safe:` (escape `</` and U+2028/U+2029 for safe embedding in an HTML `<script>` tag).
15
30
  - `generate` adds opt-in `coerce:` — when `true`, a value that isn't natively supported (e.g. `Time`, `Date`, app objects) is converted via its own `as_json` (result re-emitted) or `to_json` (spliced); strict-by-default still raises `GenerateError`.
16
31
 
17
- ## 0.5.1 (2026-06-01)
32
+ ## 0.5.1 (2026-06-01) yanked
18
33
  - Unified the error classes under a single `SmarterJSON::Error` base: `ParseError` and `EncodingError` now inherit from it, and `generate` raises a new `GenerateError`. `rescue SmarterJSON::Error` now catches everything the gem raises.
19
34
  - Added a CI test matrix (Ruby 2.6–4.0 + head, on Ubuntu and macOS).
20
35
  - Fixed the C extension build on Ruby 2.6 (declare `rb_hash_bulk_insert`, which 2.6 exports but does not declare in its headers); set the minimum Ruby to 2.6.
data/README.md CHANGED
@@ -16,13 +16,14 @@ Three things set it apart:
16
16
 
17
17
  1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
18
18
 
19
- 2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time.
19
+ 2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time.
20
20
 
21
21
  3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser — the fastest general-purpose Ruby JSON parser.
22
22
 
23
23
  ## What it accepts, beyond strict JSON
24
24
 
25
25
  - `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
26
+ - Markdown-wrapped / chatty blobs around the payload: strips ```` ```json ```` / ```` ``` ```` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
26
27
  - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
27
28
  - Implicit root object — a config file that starts with `key: value`, no outer `{}`
28
29
  - `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
@@ -67,9 +68,14 @@ SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2
67
68
  SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
68
69
  SmarterJSON.process("") # => nil (zero documents)
69
70
 
70
- # For input larger than memory, stream one document at a time with a block
71
+ # Iterate one recovered document at a time with a block
71
72
  # (process and process_file both forward the block):
72
73
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
74
+
75
+ # Wrapper noise is stripped automatically:
76
+ SmarterJSON.process("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n") # => {"a"=>1}
77
+ SmarterJSON.process("<json>{\"a\":1}</json>") # => {"a"=>1}
78
+ SmarterJSON.process("first:\n{\"a\":1}\nsecond:\n{\"b\":2}") # => [{"a"=>1}, {"b"=>2}]
73
79
  ```
74
80
 
75
81
  ### Options
@@ -81,7 +87,20 @@ SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event
81
87
  | `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
82
88
  | `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
83
89
  | `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
84
- | `warnings` | `false` | when `true`, return `[result, warnings]` — `warnings` lists the lenient fixes applied (`:empty_slot`, `:empty_value`, `:duplicate_key`) |
90
+ | `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
91
+
92
+ ### Warnings (`on_warning`)
93
+
94
+ When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
95
+
96
+ ```ruby
97
+ # Collect them all:
98
+ warns = []
99
+ data = SmarterJSON.process(input, on_warning: ->(w) { warns << w })
100
+
101
+ # Or route them — log, count, raise:
102
+ SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn(w) })
103
+ ```
85
104
 
86
105
  ## Performance
87
106
 
@@ -93,7 +112,7 @@ Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpu
93
112
 
94
113
  **Two notes on fair comparison:**
95
114
 
96
- - **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once — use it for input larger than RAM.
115
+ - **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the block form yields each recovered document instead of returning the collected Array.
97
116
  - **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
98
117
 
99
118
  ## Encoding
@@ -11,7 +11,7 @@
11
11
 
12
12
  # SmarterJSON Introduction
13
13
 
14
- `smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
14
+ `smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), markdown-wrapped / chatty blobs around a JSON payload, and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
15
15
 
16
16
  ## Why another JSON library?
17
17
 
@@ -21,7 +21,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
21
21
 
22
22
  * **One reader, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the reader to match your input; it adapts to whatever you give it.
23
23
 
24
- * **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time. See [The Basic Read API](./basic_read_api.md).
24
+ * **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time. See [The Basic Read API](./basic_read_api.md).
25
25
 
26
26
  * **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser. Floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
27
27
 
@@ -22,7 +22,7 @@ SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
22
22
  SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
23
23
  ```
24
24
 
25
- `process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths.
25
+ `process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths. When the input wraps the payload in obvious markdown / prose / tags, `process` strips that wrapper first and then parses the recovered payload(s).
26
26
 
27
27
  ```ruby
28
28
  SmarterJSON.process(io) # an open IO (File, StringIO, socket, …) — reads it and parses
@@ -39,7 +39,7 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1} (one
39
39
  SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}] (two or more → an Array)
40
40
  ```
41
41
 
42
- Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
42
+ Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). If wrapper noise is stripped and several payloads are recovered, they are returned by the same rule: one payload → its value, several → an `Array`. Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
43
43
 
44
44
  ## `SmarterJSON.process_file` — read a file by path
45
45
 
@@ -49,36 +49,34 @@ SmarterJSON.process_file("config.json5") # read the file, then parse — sam
49
49
 
50
50
  `process_file` opens the file, reads it with the labeled [`encoding:`](./options.md) (default `"UTF-8"`, no transcoding pass), and parses it.
51
51
 
52
- ## Streaming with a block (bounded memory)
52
+ ## Streaming with a block
53
53
 
54
- For input larger than memory, pass a block. Each top-level document is yielded as it is read, and the method returns `nil` (it never collects the documents into an Array). Both `process` and `process_file` forward the block.
54
+ Pass a block to have each recovered top-level document yielded one at a time; the method returns `nil` instead of collecting the documents into an Array. Both `process` and `process_file` forward the block.
55
55
 
56
56
  ```ruby
57
- # Stream straight from disk, one document at a time — the whole file is never loaded:
58
57
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
59
-
60
- # Same for an IO:
61
58
  SmarterJSON.process(io) { |doc| handle(doc) }
62
59
  ```
63
60
 
64
- The streaming path reads the input as newline-delimited documents (NDJSON / JSONL), one document per line. A single document that spans multiple lines is not supported by the streaming path; read it without a block instead.
61
+ The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
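As an illustration of what framing whole top-level documents can mean, here is a minimal sketch that tracks bracket depth and string state instead of splitting on newlines. It is not the gem's implementation: `each_framed_document` is a hypothetical helper, it uses the stdlib `json` parser, and it only handles strict JSON objects and arrays (no comments, no bare scalars).

```ruby
require "json"
require "stringio"

# Hypothetical helper, not part of smarter_json: yields each top-level
# JSON object or array read from +io+, even when it spans several lines.
def each_framed_document(io)
  buffer = +""
  depth = 0
  in_string = false
  escaped = false

  io.each_char do |ch|
    if depth.zero?
      # Between documents: skip everything until a document opens.
      next unless ch == "{" || ch == "["
      depth = 1
      buffer << ch
      next
    end

    buffer << ch
    if in_string
      if escaped
        escaped = false          # the escaped character is consumed as-is
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == "{" || ch == "["
      depth += 1
    elsif ch == "}" || ch == "]"
      depth -= 1
      if depth.zero?             # the top-level document just closed
        yield JSON.parse(buffer)
        buffer = +""
      end
    end
  end
end

io = StringIO.new("{\n  \"a\": [1, 2]\n}\r\n{\"b\": \"}\"}\n[3]")
each_framed_document(io) { |doc| p doc }
```

Because brackets inside strings are ignored and line endings between documents are skipped, a pretty-printed object and a one-line object are framed the same way.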
65
62
 
66
63
  ## The C extension and the pure-Ruby fallback
67
64
 
68
65
  By default (`acceleration: true`) the C extension is used when it is compiled and loadable (`SmarterJSON::HAS_ACCELERATION` is then `true`); otherwise the pure-Ruby parser runs and produces identical results. Pass `acceleration: false` to force the pure-Ruby path. See [Configuration Options](./options.md).
69
66
 
70
- ## Seeing what was fixed: `warnings:`
67
+ ## Seeing what was fixed: `on_warning:`
71
68
 
72
- `process` and `process_file` are lenient — they salvage your data rather than reject a whole document over a stray comma. Pass `warnings: true` to also get back a record of what was adjusted, so the leniency is transparent instead of silent. The call then returns `[result, warnings]`:
69
+ `process` and `process_file` are lenient — they salvage your data rather than reject a whole document over a stray comma. Pass an `on_warning:` callable to also get a record of what was adjusted, so the leniency is transparent instead of silent. It is invoked once per fix and never changes the return value:
73
70
 
74
71
  ```ruby
75
- result, warnings = SmarterJSON.process("[1,,2]", warnings: true)
76
- result # => [1, 2]
77
- warnings.map(&:type) # => [:empty_slot]
78
- warnings.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
72
+ warns = []
73
+ result = SmarterJSON.process("[1,,2]", on_warning: ->(w) { warns << w })
74
+ result # => [1, 2]
75
+ warns.map(&:type) # => [:empty_slot]
76
+ warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
79
77
  ```
80
78
 
81
- Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), and `:duplicate_key` (a repeated key that was dropped). Clean input gives an empty `warnings` array. It works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
79
+ Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
82
80
 
83
81
  ---------------
84
82
 
data/docs/examples.md CHANGED
@@ -24,9 +24,10 @@
24
24
  7. [Duplicate Keys](#example-7-duplicate-keys)
25
25
  8. [High-Precision Numbers: BigDecimal vs Float](#example-8-high-precision-numbers-bigdecimal-vs-float)
26
26
  9. [Lenient Input: Comments, Trailing Commas, Unquoted Keys](#example-9-lenient-input-comments-trailing-commas-unquoted-keys)
27
- 10. [Write JSON](#example-10-write-json)
28
- 11. [Write NDJSON](#example-11-write-ndjson)
29
- 12. [Round-Trip Read and Write](#example-12-round-trip-read-and-write)
27
+ 10. [Wrapper Noise Around a Payload](#example-10-wrapper-noise-around-a-payload)
28
+ 11. [Write JSON](#example-11-write-json)
29
+ 12. [Write NDJSON](#example-12-write-ndjson)
30
+ 13. [Round-Trip Read and Write](#example-13-round-trip-read-and-write)
30
31
 
31
32
  ---
32
33
 
@@ -64,9 +65,9 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1}
64
65
  SmarterJSON.process("") # => nil
65
66
  ```
66
67
 
67
- ### Example 5: Streaming a Large File with a Block
68
+ ### Example 5: Iterate Documents with a Block
68
69
 
69
- For input larger than memory, pass a block. Each document is yielded as it is read; the whole file is never loaded:
70
+ Pass a block to receive each recovered document one at a time:
70
71
 
71
72
  ```ruby
72
73
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
@@ -113,14 +114,41 @@ JSON
113
114
 
114
115
  A `#`/`//` only starts a comment when preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
115
116
 
116
- ### Example 10: Write JSON
117
+ ### Example 10: Wrapper Noise Around a Payload
118
+
119
+ ```ruby
120
+ SmarterJSON.process(<<~TEXT)
121
+ Here is the JSON:
122
+
123
+ ```json
124
+ {
125
+ "a": 1
126
+ }
127
+ ```
128
+ TEXT
129
+ # => {"a"=>1}
130
+
131
+ SmarterJSON.process("<json>{\"a\":1}</json>")
132
+ # => {"a"=>1}
133
+
134
+ SmarterJSON.process(<<~TEXT)
135
+ first:
136
+ {"a":1}
137
+
138
+ second:
139
+ {"b":2}
140
+ TEXT
141
+ # => [{"a"=>1}, {"b"=>2}]
142
+ ```
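The recovery in Example 10 can be approximated with a few regular expressions. The sketch below is only an illustration under stated assumptions, not how the gem does it: `recover_payloads`, `unwrap`, and `WRAPPERS` are hypothetical names, parsing is delegated to the stdlib `json` library, only fences and `<json>` tags are handled, and the handler receives a bare Symbol rather than a `SmarterJSON::Warning`.

```ruby
require "json"

# Hypothetical patterns, keyed by the warning type named in the changelog.
WRAPPERS = {
  code_fence_stripped: /```(?:json)?[ \t]*\r?\n(.*?)```/m,
  wrapper_tag_stripped: %r{<json>(.*?)</json>}m
}.freeze

# One payload returns its value, several return an Array.
def unwrap(docs)
  docs.length == 1 ? docs.first : docs
end

# Hypothetical helper: recovers JSON payloads from a wrapped blob and
# reports the kind of wrapper that was stripped through on_warning.
def recover_payloads(text, on_warning: nil)
  WRAPPERS.each do |type, pattern|
    bodies = text.scan(pattern).flatten
    next if bodies.empty?

    on_warning&.call(type)
    return unwrap(bodies.map { |body| JSON.parse(body) })
  end
  unwrap([JSON.parse(text)]) # no wrapper found: parse the input as-is
end

seen = []
p recover_payloads("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n", on_warning: ->(t) { seen << t })
p seen
```

A truncated payload still raises (`JSON::ParserError` in this sketch), which mirrors the changelog's note that wrapper recovery does not extend to truncation recovery.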
143
+
144
+ ### Example 11: Write JSON
117
145
 
118
146
  ```ruby
119
147
  SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
120
148
  SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
121
149
  ```
122
150
 
123
- ### Example 11: Write NDJSON
151
+ ### Example 12: Write NDJSON
124
152
 
125
153
  An Array writes one element per line:
126
154
 
@@ -128,7 +156,7 @@ An Array writes one element per line:
128
156
  SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
129
157
  ```
130
158
 
131
- ### Example 12: Round-Trip Read and Write
159
+ ### Example 13: Round-Trip Read and Write
132
160
 
133
161
  ```ruby
134
162
  obj = { "a" => 1, "b" => [2, "three", nil, true] }
data/docs/options.md CHANGED
@@ -22,27 +22,28 @@ These options are passed to [`SmarterJSON.process`](./basic_read_api.md) and `Sm
22
22
  | `:bigdecimal_load`| `:auto` | `:auto` keeps high-precision decimals as `BigDecimal` (matches Oj); `:float` forces every number to `Float`; `:bigdecimal` forces every decimal to `BigDecimal`. |
23
23
  | `:acceleration` | `true` | Use the C extension when it is compiled and loadable; `false` forces the pure-Ruby parser. Both produce identical results. |
24
24
  | `:encoding` | `nil` | Labels the input's encoding (e.g. `"UTF-8"`). It does **not** trigger a transcoding pass — see below. |
25
- | `:warnings` | `false` | When `true`, return `[result, warnings]` instead of just `result`; `warnings` lists the lenient fixes that were applied. See below. |
25
+ | `:on_warning` | `nil` | A callable invoked once per lenient fix applied, passed a `SmarterJSON::Warning`. Never changes the return value. See below. |
26
26
 
27
27
  ```ruby
28
28
  SmarterJSON.process('{"a": 1}', symbolize_keys: true) # => {:a=>1}
29
29
  SmarterJSON.process('{"a":1,"a":2}', duplicate_key: :raise) # raises SmarterJSON::ParseError
30
30
  SmarterJSON.process(big_decimal_json, bigdecimal_load: :float) # every number as Float (fastest)
31
- SmarterJSON.process("[1,,2]", warnings: true) # => [[1, 2], [#<SmarterJSON::Warning ...>]]
31
+ SmarterJSON.process("[1,,2]", on_warning: ->(w) { puts w }) # => [1, 2], and prints the warning
32
32
  ```
33
33
 
34
- ### A note on `:warnings`
34
+ ### A note on `:on_warning`
35
35
 
36
- `smarter_json` is lenient by design — it salvages your data instead of rejecting the whole document over a stray comma. `warnings: true` keeps that, but also hands back a record of what it had to fix, so leniency is transparent rather than silent. The call then returns a two-element `[result, warnings]`; `warnings` is an Array of `SmarterJSON::Warning`, each with `type` (a Symbol), `message`, `line`, and `col`:
36
+ `smarter_json` is lenient by design — it salvages your data instead of rejecting the whole document over a stray comma. `on_warning:` keeps that, but also hands you a record of what it had to fix, so leniency is transparent rather than silent. It takes a callable that the parser invokes once per fix, passing a `SmarterJSON::Warning` (with `type` (a Symbol), `message`, `line`, and `col`). It never changes the return value — `process` still hands back the bare value — and it fires on every path, including the streaming block form. With no handler (the default), nothing is recorded and there is zero overhead.
37
37
 
38
38
  ```ruby
39
- result, warnings = SmarterJSON.process("[1,,2]", warnings: true)
39
+ warns = []
40
+ result = SmarterJSON.process("[1,,2]", on_warning: ->(w) { warns << w })
40
41
  result # => [1, 2]
41
- warnings.map(&:type) # => [:empty_slot]
42
- warnings.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
42
+ warns.map(&:type) # => [:empty_slot]
43
+ warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
43
44
  ```
44
45
 
45
- The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped). Clean input returns an empty `warnings` array. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
46
+ The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
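Callers that still want the pre-0.7.0 `[result, warnings]` pair can rebuild it on top of a callback. The following is a sketch of one way to do that, not something the gem ships: `with_collected_warnings` is a hypothetical helper, and `fake_process` is a stand-in for a parser call so the example runs without the gem installed.

```ruby
# Hypothetical adapter: rebuilds a [result, warnings] pair on top of an
# on_warning:-style callback by collecting whatever the handler receives.
def with_collected_warnings
  warnings = []
  result = yield(->(warning) { warnings << warning })
  [result, warnings]
end

# Stand-in for a parser call (assumption, for illustration only): it
# reports one lenient fix to the handler and returns the bare value.
fake_process = lambda do |on_warning:|
  on_warning.call(:empty_slot)
  [1, 2]
end

result, warnings = with_collected_warnings do |handler|
  fake_process.call(on_warning: handler)
end
p result
p warnings
```

With the gem loaded, the block body would instead be a call such as `SmarterJSON.process("[1,,2]", on_warning: handler)`, the form documented above.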
46
47
 
47
48
  ### A note on `:encoding`
48
49
 
@@ -34,6 +34,7 @@ static VALUE cParseError;
34
34
  static VALUE cEncodingError;
35
35
  static VALUE cWarning;
36
36
  static ID fj_new_id;
37
+ static ID fj_call_id; /* cached :call (invoking the on_warning handler) */
37
38
  static VALUE fj_sym_empty_slot;
38
39
  static VALUE fj_sym_empty_value;
39
40
  static VALUE fj_sym_duplicate_key;
@@ -60,8 +61,7 @@ typedef struct {
60
61
  int dup_raise;
61
62
  int bigdecimal_load; /* 0 = float, 1 = auto, 2 = bigdecimal */
62
63
  fj_kc_slot *kcache; /* per-parse key cache (NULL when interning unavailable) */
63
- int collect_warnings; /* warnings: option; record non-fatal lenient fixes */
64
- VALUE warnings; /* rb_ary of SmarterJSON::Warning when collecting, else Qnil */
64
+ VALUE on_warning; /* on_warning: callable invoked per non-fatal lenient fix, else Qnil */
65
65
  } fj_state;
66
66
 
67
67
  /* Line/column at the current byte position, computed lazily (only when raising
@@ -81,14 +81,16 @@ static void fj_line_col(fj_state *st, long *line, long *col) {
81
81
  *col = c;
82
82
  }
83
83
 
84
- /* Record a non-fatal lenient fix only when the parse was given warnings: true. */
84
+ /* Report a non-fatal lenient fix to the on_warning callable; a no-op (and builds no
85
+ * Warning) when no handler was given. The internal Qnil guard is the safety net; the
86
+ * call sites also guard so the line/col scan is skipped on the fast path. */
85
87
  static void fj_warn(fj_state *st, VALUE type_sym, const char *msg) {
86
88
  long line, col;
87
- if (!st->collect_warnings) return;
89
+ if (st->on_warning == Qnil) return;
88
90
  fj_line_col(st, &line, &col);
89
- rb_ary_push(st->warnings,
90
- rb_funcall(cWarning, fj_new_id, 4, type_sym,
91
- rb_utf8_str_new_cstr(msg), LONG2NUM(line), LONG2NUM(col)));
91
+ rb_funcall(st->on_warning, fj_call_id, 1,
92
+ rb_funcall(cWarning, fj_new_id, 4, type_sym,
93
+ rb_utf8_str_new_cstr(msg), LONG2NUM(line), LONG2NUM(col)));
92
94
  }
93
95
 
94
96
  /* 1-based column of the current byte position (bytes since the last line start).
@@ -1161,9 +1163,9 @@ static VALUE fj_build_object(fj_state *st, const VALUE *pairs, long count) {
1161
1163
  long entries = count / 2, i;
1162
1164
  VALUE hash = rb_hash_new_capa(entries);
1163
1165
 
1164
- /* Fast path: bulk insert. Skipped when collecting warnings, which needs the
1165
- * per-member loop below to report each dropped duplicate key. */
1166
- if (!st->symbolize_keys && !st->dup_first_wins && !st->collect_warnings) {
1166
+ /* Fast path: bulk insert. Skipped when an on_warning handler is present, which needs
1167
+ * the per-member loop below to report each dropped duplicate key. */
1168
+ if (!st->symbolize_keys && !st->dup_first_wins && st->on_warning == Qnil) {
1167
1169
  rb_hash_bulk_insert(count, pairs, hash);
1168
1170
  if (st->dup_raise && fj_hash_len(hash) < entries) {
1169
1171
  VALUE seen = rb_hash_new_capa(entries);
@@ -1178,7 +1180,7 @@ static VALUE fj_build_object(fj_state *st, const VALUE *pairs, long count) {
1178
1180
 
1179
1181
  for (i = 0; i + 1 < count; i += 2) {
1180
1182
  VALUE k = st->symbolize_keys ? rb_funcall(pairs[i], fj_to_sym_id, 0) : pairs[i];
1181
- if (st->dup_first_wins || st->dup_raise || st->collect_warnings) {
1183
+ if (st->dup_first_wins || st->dup_raise || st->on_warning != Qnil) {
1182
1184
  if (RTEST(rb_funcall(hash, fj_key_p_id, 1, k))) {
1183
1185
  if (st->dup_raise) fj_error(st, "duplicate key");
1184
1186
  fj_warn(st, fj_sym_duplicate_key, "duplicate key");
@@ -1271,7 +1273,7 @@ static VALUE fj_parse_iter(fj_state *st, int implicit_root) {
1271
1273
  fj_skip_ws_comments(st);
1272
1274
  b = fj_byte(st);
1273
1275
  if (b == ',') { /* collapsing separator: skip empty member */
1274
- if (st->collect_warnings && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
1276
+ if (st->on_warning != Qnil && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
1275
1277
  vss = 0;
1276
1278
  fj_advance(st, 1);
1277
1279
  continue;
@@ -1323,7 +1325,7 @@ static VALUE fj_parse_iter(fj_state *st, int implicit_root) {
1323
1325
  fj_skip_ws_comments(st);
1324
1326
  b = fj_byte(st);
1325
1327
  if (b == ',') { /* collapsing separator: skip empty slot */
1326
- if (st->collect_warnings && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
1328
+ if (st->on_warning != Qnil && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
1327
1329
  vss = 0;
1328
1330
  fj_advance(st, 1);
1329
1331
  continue;
@@ -1412,8 +1414,7 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
1412
1414
  else st.bigdecimal_load = 1; /* :auto (default), including nil */
1413
1415
  }
1414
1416
 
1415
- st.collect_warnings = RTEST(rb_hash_aref(opts, ID2SYM(rb_intern("warnings"))));
1416
- st.warnings = st.collect_warnings ? rb_ary_new() : Qnil;
1417
+ st.on_warning = rb_hash_aref(opts, ID2SYM(rb_intern("on_warning"))); /* Qnil when absent */
1417
1418
 
1418
1419
  if (st.len >= 3 && (unsigned char)st.buf[0] == 0xEF &&
1419
1420
  (unsigned char)st.buf[1] == 0xBB && (unsigned char)st.buf[2] == 0xBF) {
@@ -1439,10 +1440,10 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
1439
1440
  * whitespace / newline / concatenation do), so a bracketless comma list still
1440
1441
  * raises in fj_parse_iter — the unsupported implicit-root array. */
1441
1442
  fj_skip_ws_comments(&st);
1442
- if (fj_eof(&st)) return st.collect_warnings ? rb_assoc_new(Qnil, st.warnings) : Qnil;
1443
+ if (fj_eof(&st)) return Qnil;
1443
1444
  value = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
1444
1445
  fj_skip_ws_comments(&st);
1445
- if (fj_eof(&st)) return st.collect_warnings ? rb_assoc_new(value, st.warnings) : value;
1446
+ if (fj_eof(&st)) return value;
1446
1447
  {
1447
1448
  VALUE arr = rb_ary_new();
1448
1449
  rb_ary_push(arr, value);
@@ -1450,7 +1451,7 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
1450
1451
  rb_ary_push(arr, fj_parse_iter(&st, fj_implicit_root_ahead(&st)));
1451
1452
  fj_skip_ws_comments(&st);
1452
1453
  } while (!fj_eof(&st));
1453
- return st.collect_warnings ? rb_assoc_new(arr, st.warnings) : arr;
1454
+ return arr;
1454
1455
  }
1455
1456
  }
1456
1457
 
@@ -1463,6 +1464,7 @@ void Init_smarter_json(void) {
1463
1464
  fj_to_sym_id = rb_intern("to_sym");
1464
1465
  fj_key_p_id = rb_intern("key?");
1465
1466
  fj_new_id = rb_intern("new");
1467
+ fj_call_id = rb_intern("call");
1466
1468
  fj_sym_empty_slot = ID2SYM(rb_intern("empty_slot"));
1467
1469
  fj_sym_empty_value = ID2SYM(rb_intern("empty_value"));
1468
1470
  fj_sym_duplicate_key = ID2SYM(rb_intern("duplicate_key"));
@@ -22,9 +22,9 @@ module SmarterJSON
22
22
  # stream as newline-delimited documents (NDJSON / JSONL), one per line.
23
23
  def process(input, options = {}, &block)
24
24
  if input.is_a?(String)
25
- process_content(input, options, &block)
25
+ Recovery.process_string(input, options, &block)
26
26
  elsif input.respond_to?(:read)
27
- block ? stream_io(input, options, &block) : process_content(input.read, options)
27
+ block ? stream_io(input, options, &block) : process(input.read, options)
28
28
  else
29
29
  raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
30
30
  end
@@ -43,7 +43,7 @@ module SmarterJSON
43
43
  if block
44
44
  File.open(path, "r:#{encoding}") { |io| stream_io(io, options, &block) }
45
45
  else
46
- process_content(File.read(path, encoding: encoding), options)
46
+ process(File.read(path, encoding: encoding), options)
47
47
  end
48
48
  end
49
49
 
@@ -57,10 +57,9 @@ module SmarterJSON
57
57
  Parser.new(input, options).each_value(&block)
58
58
  end
59
59
  elsif options.fetch(:acceleration, true) && HAS_ACCELERATION
60
- parse_c(input, options) # returns [result, warnings] when options[:warnings]
60
+ parse_c(input, options)
61
61
  else
62
- parser = Parser.new(input, options)
63
- options.fetch(:warnings, false) ? [parser.parse, parser.warnings] : parser.parse
62
+ Parser.new(input, options).parse
64
63
  end
65
64
  end
66
65
 
@@ -68,12 +67,230 @@ module SmarterJSON
68
67
  # each — bounded memory. Newline-delimited (NDJSON / JSONL); a single document
69
68
  # spanning multiple lines is not supported by the streaming path.
70
69
  def stream_io(io, options, &block)
71
- io.each_line("\n") { |line| process_content(line, options, &block) }
72
- nil
70
+ Recovery.process_string(io.read, options, &block)
73
71
  end
74
72
 
75
73
  private_class_method :process_content, :stream_io
76
74
 
75
+ module Recovery
76
+ module_function
77
+
78
+ def process_string(input, options, &block)
79
+ return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
80
+
81
+ if wrapper_hint?(input)
82
+ payloads = extract_payloads(input, options)
83
+ return replay_payloads(payloads, options, &block) unless payloads.empty?
84
+ end
85
+
86
+ SmarterJSON.send(:process_content, input, options, &block)
87
+ rescue ParseError => e
88
+ raise if e.is_a?(EncodingError)
89
+
90
+ payloads = extract_payloads(input, options)
91
+ return replay_payloads(payloads, options, &block) unless payloads.empty?
92
+
93
+ raise
94
+ end
95
+
96
+ def wrapper_hint?(input)
97
+ return false unless input.valid_encoding?
98
+
99
+ input.match?(/```|<json\b|BEGIN_JSON\b/i) || input.match?(/\A[[:space:]]*(?:JSON|Final answer)[[:space:]]*:/i)
100
+ end
101
+
102
+ def replay_payloads(payloads, options, &block)
103
+ handler = options[:on_warning]
104
+ emit_wrapper_warnings(payloads, handler)
105
+
106
+ results = payloads.map do |payload|
107
+ SmarterJSON.send(:process_content, payload[:slice], options)
108
+ end
109
+
110
+ return results.each(&block).then { nil } if block_given?
111
+ return nil if results.empty?
112
+ return results.first if results.length == 1
113
+
114
+ results
115
+ end
116
+
117
+ def emit_wrapper_warnings(payloads, handler)
118
+ return unless handler
119
+
120
+ meta = payloads.first[:meta]
121
+ warn(handler, :prefix_text_ignored, "ignored non-JSON text before the payload", *meta[:first_pos]) if meta[:prefix]
122
+ warn(handler, :code_fence_stripped, "stripped markdown code fences around the payload", *meta[:first_pos]) if meta[:fence]
123
+ warn(handler, :wrapper_tag_stripped, "stripped wrapper tags around the payload", *meta[:first_pos]) if meta[:wrapper]
124
+ warn(handler, :suffix_text_ignored, "ignored non-JSON text after the payload", *meta[:last_pos]) if meta[:suffix]
125
+ end
126
+
127
+ def extract_payloads(input, options)
128
+ payloads = candidate_ranges(input).filter_map do |range|
129
+ slice = input.byteslice(range.begin, range.end - range.begin)
130
+ begin
131
+ SmarterJSON.send(:process_content, slice, options.merge(on_warning: nil))
132
+ { slice: slice, range: range }
133
+ rescue ParseError
134
+ nil
135
+ end
136
+ end
137
+ meta = wrapper_meta(input, payloads.map { |p| p[:range] })
138
+ payloads.each { |payload| payload[:meta] = meta }
139
+ payloads
140
+ end
141
+
142
+ def wrapper_meta(input, ranges)
143
+ return { prefix: false, suffix: false, fence: false, wrapper: false } if ranges.empty?
144
+
145
+ first = ranges.first
146
+ last = ranges.last
147
+ prefix = input.byteslice(0, first.begin)
148
+ suffix = input.byteslice(last.end, input.bytesize - last.end)
149
+ {
150
+ prefix: substantive_text?(prefix),
151
+ suffix: substantive_text?(suffix),
152
+ fence: input.match?(/```/),
153
+ wrapper: input.match?(/<json\b|BEGIN_JSON\b/i),
154
+ first_pos: line_col_for(input, first.begin),
155
+ last_pos: line_col_for(input, last.begin)
156
+ }
157
+ end
158
+
159
+ def line_col_for(input, offset)
160
+ line = 1
161
+ col = 1
162
+ i = 0
163
+ while i < offset
164
+ b = input.getbyte(i)
165
+ break if b.nil?
166
+
167
+ if b == 0x0A
168
+ line += 1
169
+ col = 1
170
+ i += 1
171
+ elsif b == 0x0D
172
+ line += 1
173
+ col = 1
174
+ i += 1
175
+ i += 1 if i < offset && input.getbyte(i) == 0x0A
176
+ else
177
+ col += 1
178
+ i += 1
179
+ end
180
+ end
181
+ [line, col]
182
+ end
183
+
184
+ def substantive_text?(text)
185
+ return false if text.nil? || text.empty?
186
+
187
+ stripped = text.dup
188
+ stripped.gsub!(%r{/\*.*?\*/}m, "")
189
+ stripped.gsub!(/^\s*(?:#|\/\/).*$/, "")
190
+ !stripped.strip.empty? && !stripped.strip.match?(/\A(?:```[a-zA-Z0-9_-]*)?\z/) && !stripped.strip.match?(/\A(?:<\/?json>|BEGIN_JSON|END_JSON)\z/i)
191
+ end
192
+
193
+ def warn(handler, type, message, line, col)
194
+ handler.call(Warning.new(type, message, line, col))
195
+ end
196
+
197
+ def candidate_ranges(input)
198
+ ranges = []
199
+ stack = []
200
+ start_pos = nil
201
+ i = 0
202
+ mode = nil
203
+ while i < input.bytesize
204
+ b = input.getbyte(i)
205
+ if mode == :double
206
+ if b == 0x5C
207
+ i += 2
208
+ next
209
+ elsif b == 0x22
210
+ mode = nil
211
+ end
212
+ i += 1
213
+ next
214
+ elsif mode == :single
215
+ if b == 0x5C
216
+ i += 2
217
+ next
218
+ elsif b == 0x27
219
+ mode = nil
220
+ end
221
+ i += 1
222
+ next
223
+ elsif mode == :triple
224
+ if input.byteslice(i, 3) == "'''"
225
+ mode = nil
226
+ i += 3
227
+ else
228
+ i += 1
229
+ end
230
+ next
231
+ elsif mode == :line_comment
232
+ if [0x0A, 0x0D].include?(b)
233
+ mode = nil
234
+ else
235
+ i += 1
236
+ next
237
+ end
238
+ elsif mode == :block_comment
239
+ if input.byteslice(i, 2) == "*/"
240
+ mode = nil
241
+ i += 2
242
+ else
243
+ i += 1
244
+ end
245
+ next
246
+ else
247
+ if input.byteslice(i, 2) == "//"
248
+ mode = :line_comment
249
+ i += 2
250
+ next
251
+ elsif input.byteslice(i, 2) == "/*"
252
+ mode = :block_comment
253
+ i += 2
254
+ next
255
+ elsif b == 0x23
256
+ mode = :line_comment
257
+ i += 1
258
+ next
259
+ elsif b == 0x22
260
+ mode = :double
261
+ i += 1
262
+ next
263
+ elsif input.byteslice(i, 3) == "'''"
264
+ mode = :triple
265
+ i += 3
266
+ next
267
+ elsif b == 0x27
268
+ mode = :single
269
+ i += 1
270
+ next
271
+ elsif [0x7B, 0x5B].include?(b)
272
+ start_pos = i if stack.empty?
273
+ stack << b
274
+ elsif b == 0x7D
275
+ stack.pop if stack.last == 0x7B
276
+ if stack.empty? && start_pos
277
+ ranges << (start_pos...(i + 1))
278
+ start_pos = nil
279
+ end
280
+ elsif b == 0x5D
281
+ stack.pop if stack.last == 0x5B
282
+ if stack.empty? && start_pos
283
+ ranges << (start_pos...(i + 1))
284
+ start_pos = nil
285
+ end
286
+ end
287
+ end
288
+ i += 1
289
+ end
290
+ ranges
291
+ end
292
+ end
293
+
77
294
  # Hand-rolled FSM single-pass parser.
78
295
  # Layer 1: strict JSON (RFC 8259).
79
296
  # Layer 2: JSON5 additions — line/block comments, trailing comma,
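Reviewer note: the `Recovery` hunk above scans for balanced top-level `{...}` / `[...]` spans, test-parses each slice, and then collapses the results to `nil`, a single value, or an `Array`. A minimal sketch of that flow follows, under stated assumptions: `sketch_candidate_ranges` and `sketch_recover` are hypothetical names, only double-quoted strings are skipped (the gem's scanner also tracks single-quoted and triple-quoted strings and comments), and the standard library `JSON.parse` stands in for the gem's internal `process_content`.

```ruby
require "json"

# Simplified stand-in for Recovery.candidate_ranges: byte ranges of balanced
# top-level {...} / [...] spans. Assumption: only double-quoted strings are skipped.
def sketch_candidate_ranges(input)
  ranges = []
  stack = []
  start_pos = nil
  in_string = false
  i = 0
  while i < input.bytesize
    b = input.getbyte(i)
    if in_string
      if b == 0x5C # backslash: skip the escaped byte as well
        i += 2
        next
      end
      in_string = false if b == 0x22
    elsif b == 0x22
      in_string = true
    elsif b == 0x7B || b == 0x5B
      start_pos = i if stack.empty?
      stack << b
    elsif b == 0x7D || b == 0x5D
      opener = b == 0x7D ? 0x7B : 0x5B
      stack.pop if stack.last == opener
      if stack.empty? && start_pos
        ranges << (start_pos...(i + 1))
        start_pos = nil
      end
    end
    i += 1
  end
  ranges
end

# Mirrors the return shape of replay_payloads: nil, one value, or an Array.
def sketch_recover(input)
  results = sketch_candidate_ranges(input).filter_map do |range|
    slice = input.byteslice(range.begin, range.end - range.begin)
    begin
      JSON.parse(slice)
    rescue JSON::ParserError
      nil # slices that do not parse are dropped, as in extract_payloads
    end
  end
  return nil if results.empty?
  return results.first if results.length == 1

  results
end
```

With this sketch, an input holding one wrapped object yields that object directly, while an input holding two wrapped payloads yields both in an `Array`, which matches the changelog entry about preserving multiple recovered payloads.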
@@ -143,14 +360,9 @@ module SmarterJSON
143
360
  symbolize_keys: false, # Symbol keys instead of String
144
361
  duplicate_key: :last_wins, # :last_wins | :first_wins | :raise
145
362
  bigdecimal_load: :auto, # :auto | :float | :bigdecimal (Oj-compatible)
146
- warnings: false, # collect non-fatal lenient fixes; process returns [result, warnings]
363
+ on_warning: nil, # a callable invoked once per non-fatal lenient fix (a SmarterJSON::Warning)
147
364
  }.freeze
148
365
 
149
- # Warnings collected during the parse (empty slots, empty values, dropped duplicate
150
- # keys). Empty unless the parser was built with warnings: true. Public so the module
151
- # functions can read it after parse / each_value.
152
- attr_reader :warnings
153
-
154
366
  def initialize(input, options = {})
155
367
  raise ArgumentError, "input must be a String" unless input.is_a?(String)
156
368
 
@@ -158,8 +370,7 @@ module SmarterJSON
158
370
  @symbolize_keys = opts[:symbolize_keys]
159
371
  @duplicate_key = opts[:duplicate_key]
160
372
  @bigdecimal_load = opts[:bigdecimal_load]
161
- @collect_warnings = opts[:warnings]
162
- @warnings = []
373
+ @on_warning = opts[:on_warning]
163
374
 
164
375
  encoding = opts[:encoding]
165
376
  @input = encoding ? input.dup.force_encoding(encoding) : input
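Reviewer note: with `warnings: true` removed, callers that relied on the old `[result, warnings]` return value need to collect warnings themselves through `on_warning:`. The sketch below shows one possible way to rebuild that pair. `with_collected_warnings`, `SketchWarning`, and `stub_parse` are hypothetical names used for illustration only; with the gem loaded, one would presumably pass a lambda that wraps `SmarterJSON.process` instead of the stub.

```ruby
# Hypothetical helper: rebuilds the old [result, warnings] pair on top of a
# callback-style API. `parse` is any callable that accepts (input, options).
def with_collected_warnings(parse, input, options = {})
  warnings = []
  collector = ->(warning) { warnings << warning }
  result = parse.call(input, options.merge(on_warning: collector))
  [result, warnings]
end

# Stand-in for the gem's Warning (type, message, line, col); an assumption.
SketchWarning = Struct.new(:type, :message, :line, :col)

# Stand-in parser for this sketch only: reports one warning, returns a value.
stub_parse = lambda do |input, options|
  handler = options[:on_warning]
  handler&.call(SketchWarning.new(:empty_slot, "extra comma", 1, 4))
  input.length
end

result, warnings = with_collected_warnings(stub_parse, "[1,,2]")
```

Because the handler is invoked once per fix, the collector array fills in the order the fixes are encountered, and the caller keeps full control over whether to store, log, or raise.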
@@ -263,7 +474,7 @@ module SmarterJSON
263
474
  # Commas are collapsing separators inside a container: an empty slot (leading,
264
475
  # interior, or trailing comma) adds nothing. Skip it; the next iteration reads
265
476
  # the following value/key or the closing bracket.
266
- warn(:empty_slot, "extra comma — collapsed an empty slot") unless vss
477
+ warn(:empty_slot, "extra comma — collapsed an empty slot") if @on_warning && !vss
267
478
  vss = false
268
479
  advance(1)
269
480
  elsif cur_obj
@@ -300,7 +511,7 @@ module SmarterJSON
300
511
  elsif [RBRACE, COMMA].include?(b)
301
512
  # key with a colon but no value -> null (don't consume } or ,; the loop does)
302
513
  store_member(cur, key, nil)
303
- warn(:empty_value, "key #{key.inspect} had no value — used null")
514
+ warn(:empty_value, "key #{key.inspect} had no value — used null") if @on_warning
304
515
  vss = true
305
516
  elsif b.nil?
306
517
  raise error("unexpected end of input")
@@ -573,7 +784,7 @@ module SmarterJSON
573
784
  if hash.key?(k)
574
785
  raise error("duplicate key #{k.inspect}") if @duplicate_key == :raise
575
786
 
576
- warn(:duplicate_key, "duplicate key #{k.inspect} — #{@duplicate_key}")
787
+ warn(:duplicate_key, "duplicate key #{k.inspect} — #{@duplicate_key}") if @on_warning
577
788
  return if @duplicate_key == :first_wins
578
789
  end
579
790
  hash[k] = value
@@ -933,11 +1144,14 @@ module SmarterJSON
933
1144
  value
934
1145
  end
935
1146
 
936
- # Record a non-fatal lenient fix (only when built with warnings: true).
1147
+ # Report a non-fatal lenient fix to the on_warning callable. The call-site guards
1148
+ # (`if @on_warning`) keep the message string from being built on the fast path; this
1149
+ # internal guard is the safety net so a forgotten call-site guard can't crash a
1150
+ # handler-less caller.
937
1151
  def warn(type, message)
938
- return unless @collect_warnings
1152
+ return unless @on_warning
939
1153
 
940
- @warnings << Warning.new(type, message, @line, @col)
1154
+ @on_warning.call(Warning.new(type, message, @line, @col))
941
1155
  end
942
1156
 
943
1157
  def error(message)
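Reviewer note: the comment added in this hunk describes two guards, one at each call site (so the interpolated message is never built without a handler) and one inside `warn` (so a forgotten call-site guard cannot crash). A small sketch of that pattern, assuming a hypothetical `GuardSketch` class and `Notice` struct that are not part of the gem:

```ruby
# Hypothetical miniature of the double-guard pattern; not the gem's Parser.
class GuardSketch
  Notice = Struct.new(:type, :message, :line, :col)

  attr_reader :messages_built

  def initialize(on_warning = nil)
    @on_warning = on_warning
    @messages_built = 0
    @line = 1
    @col = 1
  end

  def store_duplicate(key)
    # Call-site guard: the message is only built when a handler exists.
    warn(:duplicate_key, build_message(key)) if @on_warning
  end

  private

  def build_message(key)
    @messages_built += 1
    "duplicate key #{key.inspect}"
  end

  # Internal guard: safety net in case a call site omits its own check.
  def warn(type, message)
    return unless @on_warning

    @on_warning.call(Notice.new(type, message, @line, @col))
  end
end

silent = GuardSketch.new
silent.store_duplicate("a")

seen = []
loud = GuardSketch.new(->(w) { seen << w })
loud.store_duplicate("a")
```

In the handler-less case `messages_built` stays at zero, which is the fast-path saving the comment refers to.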
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterJSON
4
- VERSION = "0.6.0"
4
+ VERSION = "0.8.0"
5
5
  end
@@ -3,8 +3,8 @@
3
3
  module SmarterJSON
4
4
  # A non-fatal thing the parser worked around while staying lenient — e.g. an empty
5
5
  # comma slot it collapsed, a key with no value it read as null, or a duplicate key
6
- # it dropped. Surfaced only when process / process_file is called with warnings: true
7
- # (which then returns [result, warnings]); otherwise the parser stays silent.
6
+ # it dropped. Passed to the on_warning: callable (when process / process_file is given
7
+ # one) once per fix; otherwise the parser stays silent and builds no Warning at all.
8
8
  #
9
9
  # type — a Symbol you can branch on (:empty_slot, :empty_value, :duplicate_key)
10
10
  # message — human-readable description
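Reviewer note: the `type` list in the comment above names only the three parser-level symbols, while the 0.8.0 changelog also lists wrapper-recovery types (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`). A handler that branches on `type` may want to cover both groups. The sketch below assumes a hypothetical `TypeNotice` struct in place of `SmarterJSON::Warning` and a hypothetical `classify_warning` helper:

```ruby
# Stand-in for SmarterJSON::Warning (type, message, line, col); an assumption.
TypeNotice = Struct.new(:type, :message, :line, :col)

# Warning types named in the 0.8.0 changelog for wrapper recovery.
WRAPPER_WARNING_TYPES = %i[
  code_fence_stripped
  prefix_text_ignored
  suffix_text_ignored
  wrapper_tag_stripped
].freeze

# Hypothetical handler logic: group a warning by where it originated.
def classify_warning(warning)
  case warning.type
  when :empty_slot, :empty_value, :duplicate_key then :parser_fix
  when *WRAPPER_WARNING_TYPES then :wrapper_recovery
  else :unknown
  end
end
```

Keeping an `else` branch means a type introduced in a later release is surfaced as `:unknown` rather than silently misfiled.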
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_json
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.0
4
+ version: 0.8.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  bindir: exe
9
9
  cert_chain: []
10
- date: 2026-06-02 00:00:00.000000000 Z
10
+ date: 2026-06-03 00:00:00.000000000 Z
11
11
  dependencies:
12
12
  - !ruby/object:Gem::Dependency
13
13
  name: bigdecimal