smarter_json 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 00a4ada7970ab645761573bd3b9704effda072064102ead760124716f0896289
-   data.tar.gz: a7232f3d8d06d3b7d770645b28a247774c83b8ac50603e4b7611006e852480d4
+   metadata.gz: '06668bbb40626009794f8e5387fb13ae1a31346a07200d2825fde4872904bd68'
+   data.tar.gz: 0db0d42bfd2e85a1af4b897990a290e25e4fbe0afd9b4e25e99d9520667de2c3
  SHA512:
-   metadata.gz: a4b4b8737f10ebf09408014419a9d5cf2203b706766ea5acc358ad595c5cc34104b3f658f2ca6cf6b130124c514d89d035dea2a0ea16e619ffaec7da16da4a23
-   data.tar.gz: a317f74e3399d6b95327fb10a7b65d2ca0f07062d9097303ab32aefb6721a46a78784addd0a937c0c8f433c4acb4e7758af3986fa741bd0d5b816141ee101fd4
+   metadata.gz: 766cd9c5865d7218f79db57ec538bb1d34355c93012e38820544a41bbedbbca94a4a8fce05982f5f2a0e48c5670a6d3f4336eb1a22361fc29b34855d913fc564
+   data.tar.gz: f57396ef1a2ac4e48e06dad94122b8159a3b17c80f6d679e1bb59f85874578ea444592a828c3eb324c8d84ff57c255df0bc99af2aeb5603261aa19d26abc88c2
data/CHANGELOG.md CHANGED
@@ -3,7 +3,17 @@
 
  > 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
 
- ## 0.7.0 (2026-06-02)
+ ## 0.8.0 (2026-06-03)
+ - **Robustness** against LLM-generated / wrapped JSON:
+ - strips markdown code fences (```json / ```)
+ - ignores obvious prefix / suffix prose around a payload
+ - unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`
+ - preserves multiple recovered payloads as an `Array`
+ - supports pretty-printed multi-line document framing on IO / block input
+ - **Warnings** now cover wrapper recovery too (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
+ - **No truncation recovery**: truncated / unterminated input still raises `SmarterJSON::ParseError`
+
+ ## 0.7.0 (2026-06-03)
  - **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` tuple return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as the parser applies a lenient fix; `process` / `process_file` now always return the bare value (nil / value / Array) on every path. Unlike the tuple, this also fires on the streaming block form. The default (no handler) records nothing and costs nothing.
 
  ## 0.6.0 (2026-06-02)
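As a rough illustration of what the 0.8.0 wrapper-recovery entries describe, the sketch below peels a markdown fence, a `<json>` tag pair, or `BEGIN_JSON ... END_JSON` markers off a blob with plain regular expressions and hands the remainder to the stdlib `json` parser. It is not the gem's implementation (the `Recovery` module in this diff scans for balanced payloads instead); `unwrap_payload` and the three patterns are hypothetical names chosen for the example.

```ruby
require "json"

# Hypothetical patterns, not part of smarter_json.
FENCE   = /```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```/m
TAG     = %r{<json>(.*?)</json>}im
MARKERS = /BEGIN_JSON\b(.*?)END_JSON\b/m

# Hypothetical helper: strip one obvious wrapper and return the inner text.
# Falls back to the trimmed input when no wrapper is recognised.
def unwrap_payload(text)
  [FENCE, TAG, MARKERS].each do |pattern|
    match = text.match(pattern)
    return match[1].strip if match
  end
  text.strip
end

blob = "Here is the JSON:\n\n```json\n{\"a\":1}\n```\nHope that helps!"
JSON.parse(unwrap_payload(blob))                            # => {"a"=>1}
JSON.parse(unwrap_payload("<json>{\"a\":1}</json>"))        # => {"a"=>1}
JSON.parse(unwrap_payload("BEGIN_JSON {\"a\":1} END_JSON")) # => {"a"=>1}
```

A regex-only approach like this recovers a single payload per blob; returning several payloads as an `Array`, as the changelog describes, needs a scan over the whole input.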
data/README.md CHANGED
@@ -16,13 +16,14 @@ Three things set it apart:
 
  1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
 
- 2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time.
+ 2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time.
 
  3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser — the fastest general-purpose Ruby JSON parser.
 
  ## What it accepts, beyond strict JSON
 
  - `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
+ - Markdown-wrapped / chatty blobs around the payload: strips ```` ```json ```` / ```` ``` ```` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
  - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
  - Implicit root object — a config file that starts with `key: value`, no outer `{}`
  - `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
@@ -67,9 +68,14 @@ SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2
  SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
  SmarterJSON.process("") # => nil (zero documents)
 
- # For input larger than memory, stream one document at a time with a block
+ # Iterate one recovered document at a time with a block
  # (process and process_file both forward the block):
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
+
+ # Wrapper noise is stripped automatically:
+ SmarterJSON.process("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n") # => {"a"=>1}
+ SmarterJSON.process("<json>{\"a\":1}</json>") # => {"a"=>1}
+ SmarterJSON.process("first:\n{\"a\":1}\nsecond:\n{\"b\":2}") # => [{"a"=>1}, {"b"=>2}]
  ```
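The return rule shown in this hunk (zero documents gives `nil`, one gives the value itself, several give an `Array`, and a block receives each document while the call returns `nil`) can be sketched in isolation. `collapse` is a hypothetical helper written for this note and is not part of the gem.

```ruby
# Hypothetical helper illustrating the zero / one / many return rule.
def collapse(documents)
  if block_given?
    documents.each { |doc| yield doc } # block form: hand over each document
    return nil                         # and never return the collection
  end

  case documents.length
  when 0 then nil
  when 1 then documents.first
  else documents
  end
end

collapse([])                             # => nil
collapse([{ "id" => 1 }])                # => {"id"=>1}
collapse([{ "id" => 1 }, { "id" => 2 }]) # => [{"id"=>1}, {"id"=>2}]

seen = []
collapse([{ "id" => 1 }, { "id" => 2 }]) { |doc| seen << doc["id"] } # => nil
seen                                     # => [1, 2]
```

One consequence of this rule is that a caller who always wants an Array can wrap the result with `Array(result)` only when the documents are not themselves Hashes; for Hash documents, checking `result.is_a?(Array)` is the safer test.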
 
  ### Options
@@ -85,7 +91,7 @@ SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event
 
  ### Warnings (`on_warning`)
 
- When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
+ When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
 
  ```ruby
  # Collect them all:
@@ -106,7 +112,7 @@ Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpu
 
  **Two notes on fair comparison:**
 
- - **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once — use it for input larger than RAM.
+ - **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the block form yields each recovered document instead of returning the collected Array.
  - **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
 
  ## Encoding
@@ -11,7 +11,7 @@
 
  # SmarterJSON Introduction
 
- `smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
+ `smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), markdown-wrapped / chatty blobs around a JSON payload, and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
 
  ## Why another JSON library?
 
@@ -21,7 +21,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
 
  * **One reader, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the reader to match your input; it adapts to whatever you give it.
 
- * **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time. See [The Basic Read API](./basic_read_api.md).
+ * **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time. See [The Basic Read API](./basic_read_api.md).
 
  * **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser. Floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
 
@@ -22,7 +22,7 @@ SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
  SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
  ```
 
- `process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths.
+ `process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths. When the input wraps the payload in obvious markdown / prose / tags, `process` strips that wrapper first and then parses the recovered payload(s).
 
  ```ruby
  SmarterJSON.process(io) # an open IO (File, StringIO, socket, …) — reads it and parses
@@ -39,7 +39,7 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1} (one
  SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}] (two or more → an Array)
  ```
 
- Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
+ Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). If wrapper noise is stripped and several payloads are recovered, they are returned by the same rule: one payload → its value, several → an `Array`. Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
 
  ## `SmarterJSON.process_file` — read a file by path
 
@@ -49,19 +49,16 @@ SmarterJSON.process_file("config.json5") # read the file, then parse — sam
 
  `process_file` opens the file, reads it with the labeled [`encoding:`](./options.md) (default `"UTF-8"`, no transcoding pass), and parses it.
 
- ## Streaming with a block (bounded memory)
+ ## Streaming with a block
 
- For input larger than memory, pass a block. Each top-level document is yielded as it is read, and the method returns `nil` (it never collects the documents into an Array). Both `process` and `process_file` forward the block.
+ Pass a block to have each recovered top-level document yielded one at a time; the method returns `nil` instead of collecting the documents into an Array. Both `process` and `process_file` forward the block.
 
  ```ruby
- # Stream straight from disk, one document at a time — the whole file is never loaded:
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
-
- # Same for an IO:
  SmarterJSON.process(io) { |doc| handle(doc) }
  ```
 
- The streaming path reads the input as newline-delimited documents (NDJSON / JSONL), one document per line. A single document that spans multiple lines is not supported by the streaming path — read it without a block instead.
+ The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
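To make "framing whole top-level documents" concrete, here is a minimal sketch that splits a blob into `{...}` / `[...]` documents by tracking nesting depth rather than line breaks. It is a simplification written for this note, not the gem's scanner: `frame_documents` is a hypothetical name, and the sketch only understands double-quoted strings with backslash escapes (no comments, single quotes, or bare scalars).

```ruby
require "json"

# Hypothetical sketch: return each balanced top-level object or array in text.
def frame_documents(text)
  docs = []
  depth = 0
  start = nil
  in_string = false
  escaped = false

  text.each_char.with_index do |ch, i|
    if in_string
      if escaped
        escaped = false          # the escaped character is consumed as-is
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
      next
    end

    case ch
    when '"'
      in_string = true
    when "{", "["
      start = i if depth.zero?   # a new top-level document begins here
      depth += 1
    when "}", "]"
      next if depth.zero?        # stray closer outside any document
      depth -= 1
      docs << text[start..i] if depth.zero?
    end
  end

  docs
end

input = "{\n  \"id\": 1,\n  \"tags\": [\"a}\", \"b\"]\n}\r\n[1,\r 2]\n{\"id\":3}"
frame_documents(input).map { |doc| JSON.parse(doc) }
# => [{"id"=>1, "tags"=>["a}", "b"]}, [1, 2], {"id"=>3}]
```

Because the split depends on bracket depth, the line-ending style between documents does not matter, and a closing brace inside a string (`"a}"`) does not end a document early.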
 
  ## The C extension and the pure-Ruby fallback
 
@@ -79,7 +76,7 @@ warns.map(&:type) # => [:empty_slot]
  warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
  ```
 
- Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), and `:duplicate_key` (a repeated key that was dropped). Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
+ Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
 
  ---------------
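Since the handler is just a callable that receives one warning at a time, one possible use is to route the new wrapper-recovery types separately from the older lenient-fix types. The sketch below runs without the gem: `FakeWarning` is a stand-in Struct with the four documented fields, and `WRAPPER_TYPES` is a name invented for this example.

```ruby
# Stand-in for SmarterJSON::Warning so the sketch runs without the gem installed.
# The four fields mirror the documented readers (type, message, line, col).
FakeWarning = Struct.new(:type, :message, :line, :col)

# Hypothetical grouping of the wrapper-recovery types listed above.
WRAPPER_TYPES = %i[
  code_fence_stripped prefix_text_ignored suffix_text_ignored wrapper_tag_stripped
].freeze

wrapper_warnings = []
lenient_fixes    = []

# The handler: any callable taking a single warning.
on_warning = lambda do |w|
  (WRAPPER_TYPES.include?(w.type) ? wrapper_warnings : lenient_fixes) << w
end

# Simulated warnings; with the gem, the parser would invoke the handler itself.
[
  FakeWarning.new(:prefix_text_ignored, "ignored non-JSON text before the payload", 3, 1),
  FakeWarning.new(:code_fence_stripped, "stripped markdown code fences around the payload", 3, 1),
  FakeWarning.new(:empty_slot, "extra comma, collapsed an empty slot", 1, 4)
].each { |w| on_warning.call(w) }

wrapper_warnings.map(&:type) # => [:prefix_text_ignored, :code_fence_stripped]
lenient_fixes.map(&:type)    # => [:empty_slot]
```

With the gem loaded, the same lambda would be passed as `on_warning:` to `process` or `process_file`, per the description above.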
 
data/docs/examples.md CHANGED
@@ -24,9 +24,10 @@
  7. [Duplicate Keys](#example-7-duplicate-keys)
  8. [High-Precision Numbers: BigDecimal vs Float](#example-8-high-precision-numbers-bigdecimal-vs-float)
  9. [Lenient Input: Comments, Trailing Commas, Unquoted Keys](#example-9-lenient-input-comments-trailing-commas-unquoted-keys)
- 10. [Write JSON](#example-10-write-json)
- 11. [Write NDJSON](#example-11-write-ndjson)
- 12. [Round-Trip Read and Write](#example-12-round-trip-read-and-write)
+ 10. [Wrapper Noise Around a Payload](#example-10-wrapper-noise-around-a-payload)
+ 11. [Write JSON](#example-11-write-json)
+ 12. [Write NDJSON](#example-12-write-ndjson)
+ 13. [Round-Trip Read and Write](#example-13-round-trip-read-and-write)
 
  ---
 
@@ -64,9 +65,9 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1}
  SmarterJSON.process("") # => nil
  ```
 
- ### Example 5: Streaming a Large File with a Block
+ ### Example 5: Iterate Documents with a Block
 
- For input larger than memory, pass a block. Each document is yielded as it is read; the whole file is never loaded:
+ Pass a block to receive each recovered document one at a time:
 
  ```ruby
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
@@ -113,14 +114,41 @@ JSON
 
  A `#`/`//` only starts a comment when preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
 
- ### Example 10: Write JSON
+ ### Example 10: Wrapper Noise Around a Payload
+
+ ```ruby
+ SmarterJSON.process(<<~TEXT)
+ Here is the JSON:
+
+ ```json
+ {
+ "a": 1
+ }
+ ```
+ TEXT
+ # => {"a"=>1}
+
+ SmarterJSON.process("<json>{\"a\":1}</json>")
+ # => {"a"=>1}
+
+ SmarterJSON.process(<<~TEXT)
+ first:
+ {"a":1}
+
+ second:
+ {"b":2}
+ TEXT
+ # => [{"a"=>1}, {"b"=>2}]
+ ```
+
+ ### Example 11: Write JSON
 
  ```ruby
  SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
  SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
  ```
 
- ### Example 11: Write NDJSON
+ ### Example 12: Write NDJSON
 
  An Array writes one element per line:
 
@@ -128,7 +156,7 @@ An Array writes one element per line:
  SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
  ```
 
- ### Example 12: Round-Trip Read and Write
+ ### Example 13: Round-Trip Read and Write
 
  ```ruby
  obj = { "a" => 1, "b" => [2, "three", nil, true] }
data/docs/options.md CHANGED
@@ -43,7 +43,7 @@ warns.map(&:type) # => [:empty_slot]
  warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
  ```
 
- The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped). Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
+ The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
 
  ### A note on `:encoding`
 
@@ -22,9 +22,9 @@ module SmarterJSON
    # stream as newline-delimited documents (NDJSON / JSONL), one per line.
    def process(input, options = {}, &block)
      if input.is_a?(String)
-       process_content(input, options, &block)
+       Recovery.process_string(input, options, &block)
      elsif input.respond_to?(:read)
-       block ? stream_io(input, options, &block) : process_content(input.read, options)
+       block ? stream_io(input, options, &block) : process(input.read, options)
      else
        raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
      end
@@ -43,7 +43,7 @@ module SmarterJSON
      if block
        File.open(path, "r:#{encoding}") { |io| stream_io(io, options, &block) }
      else
-       process_content(File.read(path, encoding: encoding), options)
+       process(File.read(path, encoding: encoding), options)
      end
    end
 
@@ -67,12 +67,230 @@ module SmarterJSON
    # each — bounded memory. Newline-delimited (NDJSON / JSONL); a single document
    # spanning multiple lines is not supported by the streaming path.
    def stream_io(io, options, &block)
-     io.each_line("\n") { |line| process_content(line, options, &block) }
-     nil
+     Recovery.process_string(io.read, options, &block)
    end
 
    private_class_method :process_content, :stream_io
 
+   module Recovery
+     module_function
+
+     def process_string(input, options, &block)
+       return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
+
+       if wrapper_hint?(input)
+         payloads = extract_payloads(input, options)
+         return replay_payloads(payloads, options, &block) unless payloads.empty?
+       end
+
+       SmarterJSON.send(:process_content, input, options, &block)
+     rescue ParseError => e
+       raise if e.is_a?(EncodingError)
+
+       payloads = extract_payloads(input, options)
+       return replay_payloads(payloads, options, &block) unless payloads.empty?
+
+       raise
+     end
+
+     def wrapper_hint?(input)
+       return false unless input.valid_encoding?
+
+       input.match?(/```|<json\b|BEGIN_JSON\b/i) || input.match?(/\A[[:space:]]*(?:JSON|Final answer)[[:space:]]*:/i)
+     end
+
+     def replay_payloads(payloads, options, &block)
+       handler = options[:on_warning]
+       emit_wrapper_warnings(payloads, handler)
+
+       results = payloads.map do |payload|
+         SmarterJSON.send(:process_content, payload[:slice], options)
+       end
+
+       return results.each(&block).then { nil } if block_given?
+       return nil if results.empty?
+       return results.first if results.length == 1
+
+       results
+     end
+
+     def emit_wrapper_warnings(payloads, handler)
+       return unless handler
+
+       meta = payloads.first[:meta]
+       warn(handler, :prefix_text_ignored, "ignored non-JSON text before the payload", *meta[:first_pos]) if meta[:prefix]
+       warn(handler, :code_fence_stripped, "stripped markdown code fences around the payload", *meta[:first_pos]) if meta[:fence]
+       warn(handler, :wrapper_tag_stripped, "stripped wrapper tags around the payload", *meta[:first_pos]) if meta[:wrapper]
+       warn(handler, :suffix_text_ignored, "ignored non-JSON text after the payload", *meta[:last_pos]) if meta[:suffix]
+     end
+
+     def extract_payloads(input, options)
+       payloads = candidate_ranges(input).filter_map do |range|
+         slice = input.byteslice(range.begin, range.end - range.begin)
+         begin
+           SmarterJSON.send(:process_content, slice, options.merge(on_warning: nil))
+           { slice: slice, range: range }
+         rescue ParseError
+           nil
+         end
+       end
+       meta = wrapper_meta(input, payloads.map { |p| p[:range] })
+       payloads.each { |payload| payload[:meta] = meta }
+       payloads
+     end
+
+     def wrapper_meta(input, ranges)
+       return { prefix: false, suffix: false, fence: false, wrapper: false } if ranges.empty?
+
+       first = ranges.first
+       last = ranges.last
+       prefix = input.byteslice(0, first.begin)
+       suffix = input.byteslice(last.end, input.bytesize - last.end)
+       {
+         prefix: substantive_text?(prefix),
+         suffix: substantive_text?(suffix),
+         fence: input.match?(/```/),
+         wrapper: input.match?(/<json\b|BEGIN_JSON\b/i),
+         first_pos: line_col_for(input, first.begin),
+         last_pos: line_col_for(input, last.begin)
+       }
+     end
+
+     def line_col_for(input, offset)
+       line = 1
+       col = 1
+       i = 0
+       while i < offset
+         b = input.getbyte(i)
+         break if b.nil?
+
+         if b == 0x0A
+           line += 1
+           col = 1
+           i += 1
+         elsif b == 0x0D
+           line += 1
+           col = 1
+           i += 1
+           i += 1 if i < offset && input.getbyte(i) == 0x0A
+         else
+           col += 1
+           i += 1
+         end
+       end
+       [line, col]
+     end
+
+     def substantive_text?(text)
+       return false if text.nil? || text.empty?
+
+       stripped = text.dup
+       stripped.gsub!(%r{/\*.*?\*/}m, "")
+       stripped.gsub!(/^\s*(?:#|\/\/).*$/, "")
+       !stripped.strip.empty? && !stripped.strip.match?(/\A(?:```[a-zA-Z0-9_-]*)?\z/) && !stripped.strip.match?(/\A(?:<\/?json>|BEGIN_JSON|END_JSON)\z/i)
+     end
+
+     def warn(handler, type, message, line, col)
+       handler.call(Warning.new(type, message, line, col))
+     end
+
+     def candidate_ranges(input)
+       ranges = []
+       stack = []
+       start_pos = nil
+       i = 0
+       mode = nil
+       while i < input.bytesize
+         b = input.getbyte(i)
+         if mode == :double
+           if b == 0x5C
+             i += 2
+             next
+           elsif b == 0x22
+             mode = nil
+           end
+           i += 1
+           next
+         elsif mode == :single
+           if b == 0x5C
+             i += 2
+             next
+           elsif b == 0x27
+             mode = nil
+           end
+           i += 1
+           next
+         elsif mode == :triple
+           if input.byteslice(i, 3) == "'''"
+             mode = nil
+             i += 3
+           else
+             i += 1
+           end
+           next
+         elsif mode == :line_comment
+           if [0x0A, 0x0D].include?(b)
+             mode = nil
+           else
+             i += 1
+             next
+           end
+         elsif mode == :block_comment
+           if input.byteslice(i, 2) == "*/"
+             mode = nil
+             i += 2
+           else
+             i += 1
+           end
+           next
+         else
+           if input.byteslice(i, 2) == "//"
+             mode = :line_comment
+             i += 2
+             next
+           elsif input.byteslice(i, 2) == "/*"
+             mode = :block_comment
+             i += 2
+             next
+           elsif b == 0x23
+             mode = :line_comment
+             i += 1
+             next
+           elsif b == 0x22
+             mode = :double
+             i += 1
+             next
+           elsif input.byteslice(i, 3) == "'''"
+             mode = :triple
+             i += 3
+             next
+           elsif b == 0x27
+             mode = :single
+             i += 1
+             next
+           elsif [0x7B, 0x5B].include?(b)
+             start_pos = i if stack.empty?
+             stack << b
+           elsif b == 0x7D
+             stack.pop if stack.last == 0x7B
+             if stack.empty? && start_pos
+               ranges << (start_pos...(i + 1))
+               start_pos = nil
+             end
+           elsif b == 0x5D
+             stack.pop if stack.last == 0x5B
+             if stack.empty? && start_pos
+               ranges << (start_pos...(i + 1))
+               start_pos = nil
+             end
+           end
+         end
+         i += 1
+       end
+       ranges
+     end
+   end
+
    # Hand-rolled FSM single-pass parser.
    # Layer 1: strict JSON (RFC 8259).
    # Layer 2: JSON5 additions — line/block comments, trailing comma,
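The wrapper warnings report a `line` / `col` computed from a byte offset by `line_col_for` in the hunk above. As a hedged restatement of that rule (not a replacement for the gem's code), the sketch below counts line breaks in the bytes before the offset, treating `\n`, `\r\n`, and a lone `\r` each as one break, and reports a 1-based column measured in bytes. `line_col` is a hypothetical name, and the example input is ASCII so that character and byte offsets coincide.

```ruby
# Hypothetical restatement: 1-based line and byte column of the byte at offset.
def line_col(text, offset)
  head = text.byteslice(0, offset).to_s.b     # bytes before the offset
  line = 1 + head.scan(/\r\n|\r|\n/).length   # one per line break seen
  last_break = head.rindex(/[\r\n]/)          # byte index of the last break byte
  col = last_break ? head.bytesize - last_break : head.bytesize + 1
  [line, col]
end

blob = "Here is the JSON:\r\n\r```json\n{\"a\":1}\n```"
line_col(blob, blob.index("{"))    # => [4, 1]
line_col(blob, blob.index("JSON")) # => [1, 13]
```

Because the column counts bytes rather than characters, a multi-byte character earlier on the same line shifts the reported column by more than one.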
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SmarterJSON
-   VERSION = "0.7.0"
+   VERSION = "0.8.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: smarter_json
  version: !ruby/object:Gem::Version
-   version: 0.7.0
+   version: 0.8.0
  platform: ruby
  authors:
  - Tilo Sloboda