smarter_json 0.7.0 → 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -1
- data/README.md +10 -4
- data/docs/_introduction.md +2 -2
- data/docs/basic_read_api.md +6 -9
- data/docs/examples.md +36 -8
- data/docs/options.md +1 -1
- data/lib/smarter_json/parser.rb +223 -5
- data/lib/smarter_json/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
````diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: '06668bbb40626009794f8e5387fb13ae1a31346a07200d2825fde4872904bd68'
+data.tar.gz: 0db0d42bfd2e85a1af4b897990a290e25e4fbe0afd9b4e25e99d9520667de2c3
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 766cd9c5865d7218f79db57ec538bb1d34355c93012e38820544a41bbedbbca94a4a8fce05982f5f2a0e48c5670a6d3f4336eb1a22361fc29b34855d913fc564
+data.tar.gz: f57396ef1a2ac4e48e06dad94122b8159a3b17c80f6d679e1bb59f85874578ea444592a828c3eb324c8d84ff57c255df0bc99af2aeb5603261aa19d26abc88c2
````
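checksums.yaml records a SHA256 and a SHA512 digest for each of the two archives inside the packaged gem, metadata.gz and data.tar.gz. A minimal sketch of checking one of those digests with Ruby's standard `digest` library follows. It assumes you have already extracted the archive member to a local file; `checksum_matches?` and the temp-file demo are hypothetical and not part of smarter_json or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a file's digest with the hex string recorded
# in checksums.yaml. Digest::Class.file reads the file and returns a digest object.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class = algorithm == :sha512 ? Digest::SHA512 : Digest::SHA256
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage: the bytes "abc" have the well-known SHA-256 test vector below.
Tempfile.create("member") do |f|
  f.write("abc")
  f.flush
  puts checksum_matches?(f.path, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
end
```

For a real check you would pass the extracted metadata.gz or data.tar.gz path and the value from the SHA256 (or SHA512, with `algorithm: :sha512`) section.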
data/CHANGELOG.md
CHANGED
````diff
@@ -3,7 +3,17 @@
 
 > 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
 
-## 0.
+## 0.8.0 (2026-06-03)
+- **Robustness** against LLM-generated / wrapped JSON:
+- strips markdown code fences (```json / ```)
+- ignores obvious prefix / suffix prose around a payload
+- unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`
+- preserves multiple recovered payloads as an `Array`
+- supports pretty-printed multi-line document framing on IO / block input
+- **Warnings** now cover wrapper recovery too (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
+- **No truncation recovery**: truncated / unterminated input still raises `SmarterJSON::ParseError`
+
+## 0.7.0 (2026-06-03)
 - **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` tuple return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as the parser applies a lenient fix; `process` / `process_file` now always return the bare value (nil / value / Array) on every path. Unlike the tuple, this also fires on the streaming block form. The default (no handler) records nothing and costs nothing.
 
 ## 0.6.0 (2026-06-02)
````
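The 0.8.0 entry says several recovered payloads are preserved as an `Array`; in the parser.rb hunk of this diff, `replay_payloads` applies the same count rule that plain multi-document input uses. A standalone sketch of that rule, operating on values that have already been parsed so it runs without the gem; `collapse_results` is a hypothetical name.

```ruby
# Hypothetical helper mirroring the return rule of replay_payloads:
#   block given -> yield each value in order, then return nil
#   0 values    -> nil
#   1 value     -> that value itself
#   2+ values   -> the whole Array
def collapse_results(results, &block)
  if block
    results.each(&block)
    return nil
  end
  return nil if results.empty?
  return results.first if results.length == 1

  results
end

collapse_results([])     # => nil
collapse_results([42])   # => 42
collapse_results([1, 2]) # => [1, 2]
collapse_results([1, 2]) { |v| puts v } # prints 1 and 2, returns nil
```

One consequence of this rule: a caller cannot tell "one payload that is an Array" from "several payloads" by the return value alone, which is why the block form is the unambiguous way to consume recovered documents.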
data/README.md
CHANGED
````diff
@@ -16,13 +16,14 @@ Three things set it apart:
 
 1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
 
-2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.**
+2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time.
 
 3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser — the fastest general-purpose Ruby JSON parser.
 
 ## What it accepts, beyond strict JSON
 
 - `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
+- Markdown-wrapped / chatty blobs around the payload: strips ```` ```json ```` / ```` ``` ```` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
 - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
 - Implicit root object — a config file that starts with `key: value`, no outer `{}`
 - `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
@@ -67,9 +68,14 @@ SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2
 SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
 SmarterJSON.process("") # => nil (zero documents)
 
-#
+# Iterate one recovered document at a time with a block
 # (process and process_file both forward the block):
 SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
+
+# Wrapper noise is stripped automatically:
+SmarterJSON.process("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n") # => {"a"=>1}
+SmarterJSON.process("<json>{\"a\":1}</json>") # => {"a"=>1}
+SmarterJSON.process("first:\n{\"a\":1}\nsecond:\n{\"b\":2}") # => [{"a"=>1}, {"b"=>2}]
 ```
 
 ### Options
@@ -85,7 +91,7 @@ SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event
 
 ### Warnings (`on_warning`)
 
-When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
+When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
 
 ```ruby
 # Collect them all:
@@ -106,7 +112,7 @@ Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpu
 
 **Two notes on fair comparison:**
 
-- **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the
+- **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the block form yields each recovered document instead of returning the collected Array.
 - **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
 
 ## Encoding
````
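The README describes `on_warning:` as a callable invoked once per lenient fix with a `SmarterJSON::Warning` carrying `type`, `message`, `line`, and `col`, and 0.8.0 adds four wrapper-recovery types. A sketch of a handler that keeps wrapper recovery apart from syntax leniency follows. `StubWarning` is a hypothetical stand-in so the example runs without the gem; with the gem installed you would pass the lambda as `on_warning:` rather than calling it by hand.

```ruby
# Hypothetical stand-in for SmarterJSON::Warning, using the four fields the
# docs describe (type, message, line, col), so this runs without the gem.
StubWarning = Struct.new(:type, :message, :line, :col)

# The wrapper-recovery warning types listed in the 0.8.0 docs.
WRAPPER_TYPES = %i[
  code_fence_stripped prefix_text_ignored suffix_text_ignored wrapper_tag_stripped
].freeze

# Returns [handler, buckets]: the handler is a one-argument callable (the shape
# on_warning: expects) that files each warning under :wrapper or :syntax.
def build_warning_sorter
  buckets = { wrapper: [], syntax: [] }
  handler = lambda do |w|
    key = WRAPPER_TYPES.include?(w.type) ? :wrapper : :syntax
    buckets[key] << w
  end
  [handler, buckets]
end

handler, buckets = build_warning_sorter
# With the gem this would be SmarterJSON.process(text, on_warning: handler);
# here the handler is called directly with stub warnings.
handler.call(StubWarning.new(:code_fence_stripped, "stripped markdown code fences around the payload", 3, 1))
handler.call(StubWarning.new(:empty_slot, "extra comma, collapsed an empty slot", 1, 4))
buckets[:wrapper].map(&:type) # => [:code_fence_stripped]
buckets[:syntax].map(&:type)  # => [:empty_slot]
```

A split like this can be useful when wrapper recovery is expected (LLM output) but syntax leniency should be flagged for review.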
data/docs/_introduction.md
CHANGED
````diff
@@ -11,7 +11,7 @@
 
 # SmarterJSON Introduction
 
-`smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
+`smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), markdown-wrapped / chatty blobs around a JSON payload, and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
 
 ## Why another JSON library?
 
@@ -21,7 +21,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
 
 * **One reader, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the reader to match your input; it adapts to whatever you give it.
 
-* **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.**
+* **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time. See [The Basic Read API](./basic_read_api.md).
 
 * **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser. Floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
 
````
data/docs/basic_read_api.md
CHANGED
````diff
@@ -22,7 +22,7 @@ SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
 SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
 ```
 
-`process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths.
+`process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths. When the input wraps the payload in obvious markdown / prose / tags, `process` strips that wrapper first and then parses the recovered payload(s).
 
 ```ruby
 SmarterJSON.process(io) # an open IO (File, StringIO, socket, …) — reads it and parses
@@ -39,7 +39,7 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1} (one
 SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}] (two or more → an Array)
 ```
 
-Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
+Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). If wrapper noise is stripped and several payloads are recovered, they are returned by the same rule: one payload → its value, several → an `Array`. Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
 
 ## `SmarterJSON.process_file` — read a file by path
 
@@ -49,19 +49,16 @@ SmarterJSON.process_file("config.json5") # read the file, then parse — sam
 
 `process_file` opens the file, reads it with the labeled [`encoding:`](./options.md) (default `"UTF-8"`, no transcoding pass), and parses it.
 
-## Streaming with a block
+## Streaming with a block
 
-
+Pass a block to have each recovered top-level document yielded one at a time; the method returns `nil` instead of collecting the documents into an Array. Both `process` and `process_file` forward the block.
 
 ```ruby
-# Stream straight from disk, one document at a time — the whole file is never loaded:
 SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
-
-# Same for an IO:
 SmarterJSON.process(io) { |doc| handle(doc) }
 ```
 
-The streaming path
+The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
 
 ## The C extension and the pure-Ruby fallback
 
@@ -79,7 +76,7 @@ warns.map(&:type) # => [:empty_slot]
 warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
 ```
 
-Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`),
+Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
 
 ---------------
 
````
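The updated streaming text says mixed `\n` / `\r\n` / `\r` line endings are accepted, and every warning reports a `line` and `col`. The sketch below shows one way to turn a byte offset into a 1-based line and column while counting `\r\n` as a single break. It is modelled on `line_col_for` in the parser.rb hunk of this diff, but `line_col` is a hypothetical standalone helper, not part of the gem's API; columns count bytes, not characters.

```ruby
# Convert a byte offset into a 1-based [line, col] pair. "\n", "\r\n" and a
# lone "\r" each count as exactly one line break. Offsets past the end are
# clamped to the end of the input.
def line_col(input, offset)
  line = 1
  col = 1
  i = 0
  limit = [offset, input.bytesize].min
  while i < limit
    case input.getbyte(i)
    when 0x0A # "\n"
      line += 1
      col = 1
      i += 1
    when 0x0D # "\r", possibly the first half of "\r\n"
      line += 1
      col = 1
      i += 1
      i += 1 if i < limit && input.getbyte(i) == 0x0A
    else
      col += 1
      i += 1
    end
  end
  [line, col]
end

# ASCII-only input, so the character index from String#index equals the byte offset.
text = "first:\r\n{\"a\":1}\rsecond:\n{\"b\":2}"
line_col(text, text.index("{\"b\"")) # => [4, 1]
```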
data/docs/examples.md
CHANGED
````diff
@@ -24,9 +24,10 @@
 7. [Duplicate Keys](#example-7-duplicate-keys)
 8. [High-Precision Numbers: BigDecimal vs Float](#example-8-high-precision-numbers-bigdecimal-vs-float)
 9. [Lenient Input: Comments, Trailing Commas, Unquoted Keys](#example-9-lenient-input-comments-trailing-commas-unquoted-keys)
-10. [
-11. [Write
-12. [
+10. [Wrapper Noise Around a Payload](#example-10-wrapper-noise-around-a-payload)
+11. [Write JSON](#example-11-write-json)
+12. [Write NDJSON](#example-12-write-ndjson)
+13. [Round-Trip Read and Write](#example-13-round-trip-read-and-write)
 
 ---
 
@@ -64,9 +65,9 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1}
 SmarterJSON.process("") # => nil
 ```
 
-### Example 5:
+### Example 5: Iterate Documents with a Block
 
-
+Pass a block to receive each recovered document one at a time:
 
 ```ruby
 SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
@@ -113,14 +114,41 @@ JSON
 
 A `#`/`//` only starts a comment when preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
 
-### Example 10:
+### Example 10: Wrapper Noise Around a Payload
+
+```ruby
+SmarterJSON.process(<<~TEXT)
+Here is the JSON:
+
+```json
+{
+"a": 1
+}
+```
+TEXT
+# => {"a"=>1}
+
+SmarterJSON.process("<json>{\"a\":1}</json>")
+# => {"a"=>1}
+
+SmarterJSON.process(<<~TEXT)
+first:
+{"a":1}
+
+second:
+{"b":2}
+TEXT
+# => [{"a"=>1}, {"b"=>2}]
+```
+
+### Example 11: Write JSON
 
 ```ruby
 SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
 SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
 ```
 
-### Example
+### Example 12: Write NDJSON
 
 An Array writes one element per line:
 
@@ -128,7 +156,7 @@ An Array writes one element per line:
 SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
 ```
 
-### Example
+### Example 13: Round-Trip Read and Write
 
 ```ruby
 obj = { "a" => 1, "b" => [2, "three", nil, true] }
````
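When the recovery path runs, `wrapper_meta` in the parser.rb hunk asks `substantive_text?` whether the text before the first payload and after the last one is real prose (reported as `:prefix_text_ignored` / `:suffix_text_ignored`) or only a bare fence line, a wrapper tag, or comments. In Example 10, `Here is the JSON:` in front of the fence is the kind of text that counts as prose. A simplified sketch of that classification, modelled on `substantive_text?`; `noise_is_substantive?` and `FENCE` are hypothetical names, and the comment-stripping regexes are deliberately simple.

```ruby
# Three backticks, built at runtime so no literal fence appears in this example.
FENCE = "`" * 3

# True when leftover text around a payload looks like real prose, false when it
# is empty, comments only, a bare fence line (optionally with a language tag),
# or a lone wrapper marker such as <json> or END_JSON.
def noise_is_substantive?(text)
  return false if text.nil? || text.empty?

  stripped = text.gsub(%r{/\*.*?\*/}m, "")          # drop /* ... */ block comments
  stripped = stripped.gsub(%r{^\s*(?:#|//).*$}, "") # drop # and // comment lines
  stripped = stripped.strip
  return false if stripped.empty?
  return false if stripped.match?(/\A#{FENCE}[A-Za-z0-9_-]*\z/)
  return false if stripped.match?(%r{\A(?:</?json>|BEGIN_JSON|END_JSON)\z}i)

  true
end

noise_is_substantive?("Here is the JSON:\n\n#{FENCE}json\n") # => true
noise_is_substantive?("\n#{FENCE}json\n")                    # => false
noise_is_substantive?("<json>")                              # => false
```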
data/docs/options.md
CHANGED
````diff
@@ -43,7 +43,7 @@ warns.map(&:type) # => [:empty_slot]
 warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
 ```
 
-The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped)
+The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
 
 ### A note on `:encoding`
 
````
data/lib/smarter_json/parser.rb
CHANGED
````diff
@@ -22,9 +22,9 @@ module SmarterJSON
 # stream as newline-delimited documents (NDJSON / JSONL), one per line.
 def process(input, options = {}, &block)
 if input.is_a?(String)
-
+Recovery.process_string(input, options, &block)
 elsif input.respond_to?(:read)
-block ? stream_io(input, options, &block) :
+block ? stream_io(input, options, &block) : process(input.read, options)
 else
 raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
 end
@@ -43,7 +43,7 @@ module SmarterJSON
 if block
 File.open(path, "r:#{encoding}") { |io| stream_io(io, options, &block) }
 else
-
+process(File.read(path, encoding: encoding), options)
 end
 end
 
@@ -67,12 +67,230 @@ module SmarterJSON
 # each — bounded memory. Newline-delimited (NDJSON / JSONL); a single document
 # spanning multiple lines is not supported by the streaming path.
 def stream_io(io, options, &block)
-io.
-nil
+Recovery.process_string(io.read, options, &block)
 end
 
 private_class_method :process_content, :stream_io
 
+module Recovery
+module_function
+
+def process_string(input, options, &block)
+return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
+
+if wrapper_hint?(input)
+payloads = extract_payloads(input, options)
+return replay_payloads(payloads, options, &block) unless payloads.empty?
+end
+
+SmarterJSON.send(:process_content, input, options, &block)
+rescue ParseError => e
+raise if e.is_a?(EncodingError)
+
+payloads = extract_payloads(input, options)
+return replay_payloads(payloads, options, &block) unless payloads.empty?
+
+raise
+end
+
+def wrapper_hint?(input)
+return false unless input.valid_encoding?
+
+input.match?(/```|<json\b|BEGIN_JSON\b/i) || input.match?(/\A[[:space:]]*(?:JSON|Final answer)[[:space:]]*:/i)
+end
+
+def replay_payloads(payloads, options, &block)
+handler = options[:on_warning]
+emit_wrapper_warnings(payloads, handler)
+
+results = payloads.map do |payload|
+SmarterJSON.send(:process_content, payload[:slice], options)
+end
+
+return results.each(&block).then { nil } if block_given?
+return nil if results.empty?
+return results.first if results.length == 1
+
+results
+end
+
+def emit_wrapper_warnings(payloads, handler)
+return unless handler
+
+meta = payloads.first[:meta]
+warn(handler, :prefix_text_ignored, "ignored non-JSON text before the payload", *meta[:first_pos]) if meta[:prefix]
+warn(handler, :code_fence_stripped, "stripped markdown code fences around the payload", *meta[:first_pos]) if meta[:fence]
+warn(handler, :wrapper_tag_stripped, "stripped wrapper tags around the payload", *meta[:first_pos]) if meta[:wrapper]
+warn(handler, :suffix_text_ignored, "ignored non-JSON text after the payload", *meta[:last_pos]) if meta[:suffix]
+end
+
+def extract_payloads(input, options)
+payloads = candidate_ranges(input).filter_map do |range|
+slice = input.byteslice(range.begin, range.end - range.begin)
+begin
+SmarterJSON.send(:process_content, slice, options.merge(on_warning: nil))
+{ slice: slice, range: range }
+rescue ParseError
+nil
+end
+end
+meta = wrapper_meta(input, payloads.map { |p| p[:range] })
+payloads.each { |payload| payload[:meta] = meta }
+payloads
+end
+
+def wrapper_meta(input, ranges)
+return { prefix: false, suffix: false, fence: false, wrapper: false } if ranges.empty?
+
+first = ranges.first
+last = ranges.last
+prefix = input.byteslice(0, first.begin)
+suffix = input.byteslice(last.end, input.bytesize - last.end)
+{
+prefix: substantive_text?(prefix),
+suffix: substantive_text?(suffix),
+fence: input.match?(/```/),
+wrapper: input.match?(/<json\b|BEGIN_JSON\b/i),
+first_pos: line_col_for(input, first.begin),
+last_pos: line_col_for(input, last.begin)
+}
+end
+
+def line_col_for(input, offset)
+line = 1
+col = 1
+i = 0
+while i < offset
+b = input.getbyte(i)
+break if b.nil?
+
+if b == 0x0A
+line += 1
+col = 1
+i += 1
+elsif b == 0x0D
+line += 1
+col = 1
+i += 1
+i += 1 if i < offset && input.getbyte(i) == 0x0A
+else
+col += 1
+i += 1
+end
+end
+[line, col]
+end
+
+def substantive_text?(text)
+return false if text.nil? || text.empty?
+
+stripped = text.dup
+stripped.gsub!(%r{/\*.*?\*/}m, "")
+stripped.gsub!(/^\s*(?:#|\/\/).*$/, "")
+!stripped.strip.empty? && !stripped.strip.match?(/\A(?:```[a-zA-Z0-9_-]*)?\z/) && !stripped.strip.match?(/\A(?:<\/?json>|BEGIN_JSON|END_JSON)\z/i)
+end
+
+def warn(handler, type, message, line, col)
+handler.call(Warning.new(type, message, line, col))
+end
+
+def candidate_ranges(input)
+ranges = []
+stack = []
+start_pos = nil
+i = 0
+mode = nil
+while i < input.bytesize
+b = input.getbyte(i)
+if mode == :double
+if b == 0x5C
+i += 2
+next
+elsif b == 0x22
+mode = nil
+end
+i += 1
+next
+elsif mode == :single
+if b == 0x5C
+i += 2
+next
+elsif b == 0x27
+mode = nil
+end
+i += 1
+next
+elsif mode == :triple
+if input.byteslice(i, 3) == "'''"
+mode = nil
+i += 3
+else
+i += 1
+end
+next
+elsif mode == :line_comment
+if [0x0A, 0x0D].include?(b)
+mode = nil
+else
+i += 1
+next
+end
+elsif mode == :block_comment
+if input.byteslice(i, 2) == "*/"
+mode = nil
+i += 2
+else
+i += 1
+end
+next
+else
+if input.byteslice(i, 2) == "//"
+mode = :line_comment
+i += 2
+next
+elsif input.byteslice(i, 2) == "/*"
+mode = :block_comment
+i += 2
+next
+elsif b == 0x23
+mode = :line_comment
+i += 1
+next
+elsif b == 0x22
+mode = :double
+i += 1
+next
+elsif input.byteslice(i, 3) == "'''"
+mode = :triple
+i += 3
+next
+elsif b == 0x27
+mode = :single
+i += 1
+next
+elsif [0x7B, 0x5B].include?(b)
+start_pos = i if stack.empty?
+stack << b
+elsif b == 0x7D
+stack.pop if stack.last == 0x7B
+if stack.empty? && start_pos
+ranges << (start_pos...(i + 1))
+start_pos = nil
+end
+elsif b == 0x5D
+stack.pop if stack.last == 0x5B
+if stack.empty? && start_pos
+ranges << (start_pos...(i + 1))
+start_pos = nil
+end
+end
+end
+i += 1
+end
+ranges
+end
+end
+
 # Hand-rolled FSM single-pass parser.
 # Layer 1: strict JSON (RFC 8259).
 # Layer 2: JSON5 additions — line/block comments, trailing comma,
````
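`candidate_ranges` and `extract_payloads` in the hunk above find payloads in two steps: scan the bytes for balanced top-level `{...}` / `[...]` spans, then keep only the spans that actually parse. A simplified sketch of those two steps follows, under stated assumptions: `balanced_spans` only skips double-quoted strings (the gem's scanner also skips single- and triple-quoted strings and `//`, `/* */`, `#` comments, and only pops a matching bracket type), and the stdlib `JSON.parse` stands in for the gem's lenient `process_content`. Both helper names are hypothetical.

```ruby
require "json"

# Record each top-level {...} or [...] span as a byte Range by tracking
# bracket depth. Brackets inside double-quoted strings are ignored.
def balanced_spans(input)
  spans = []
  depth = 0
  start = nil
  in_string = false
  i = 0
  while i < input.bytesize
    b = input.getbyte(i)
    if in_string
      if b == 0x5C    # backslash: skip the escaped byte
        i += 1
      elsif b == 0x22 # closing double quote
        in_string = false
      end
    elsif b == 0x22
      in_string = true
    elsif b == 0x7B || b == 0x5B # "{" or "["
      start = i if depth.zero?
      depth += 1
    elsif (b == 0x7D || b == 0x5D) && depth.positive? # "}" or "]"
      depth -= 1
      if depth.zero?
        spans << (start...(i + 1))
        start = nil
      end
    end
    i += 1
  end
  spans
end

# Keep only the spans that parse; stdlib JSON stands in for the lenient parser.
def recover_payloads(input)
  balanced_spans(input).filter_map do |r|
    JSON.parse(input.byteslice(r.begin, r.end - r.begin))
  rescue JSON::ParserError
    nil
  end
end

text = "first:\n{\"a\":1}\nsecond:\n{\"b\":[2, \"]\"]}"
recover_payloads(text) # two Hashes, one per recovered payload
```

Because every span is parsed independently, a bracketed fragment of prose such as `{oops}` is simply dropped rather than failing the whole input.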
data/lib/smarter_json/version.rb
CHANGED