smarter_json 0.6.0 → 0.8.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +17 -2
- data/README.md +23 -4
- data/docs/_introduction.md +2 -2
- data/docs/basic_read_api.md +13 -15
- data/docs/examples.md +36 -8
- data/docs/options.md +9 -8
- data/ext/smarter_json/smarter_json.c +20 -18
- data/lib/smarter_json/parser.rb +236 -22
- data/lib/smarter_json/version.rb +1 -1
- data/lib/smarter_json/warning.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: '06668bbb40626009794f8e5387fb13ae1a31346a07200d2825fde4872904bd68'
|
|
4
|
+
data.tar.gz: 0db0d42bfd2e85a1af4b897990a290e25e4fbe0afd9b4e25e99d9520667de2c3
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 766cd9c5865d7218f79db57ec538bb1d34355c93012e38820544a41bbedbbca94a4a8fce05982f5f2a0e48c5670a6d3f4336eb1a22361fc29b34855d913fc564
|
|
7
|
+
data.tar.gz: f57396ef1a2ac4e48e06dad94122b8159a3b17c80f6d679e1bb59f85874578ea444592a828c3eb324c8d84ff57c255df0bc99af2aeb5603261aa19d26abc88c2
|
data/CHANGELOG.md
CHANGED
|
@@ -1,6 +1,21 @@
|
|
|
1
1
|
|
|
2
2
|
# SmarterJSON Change Log
|
|
3
3
|
|
|
4
|
+
> 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
|
|
5
|
+
|
|
6
|
+
## 0.8.0 (2026-06-03)
|
|
7
|
+
- **Robustness** against LLM-generated / wrapped JSON:
|
|
8
|
+
- strips markdown code fences (```json / ```)
|
|
9
|
+
- ignores obvious prefix / suffix prose around a payload
|
|
10
|
+
- unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`
|
|
11
|
+
- preserves multiple recovered payloads as an `Array`
|
|
12
|
+
- supports pretty-printed multi-line document framing on IO / block input
|
|
13
|
+
- **Warnings** now cover wrapper recovery too (`:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, `:wrapper_tag_stripped`)
|
|
14
|
+
- **No truncation recovery**: truncated / unterminated input still raises `SmarterJSON::ParseError`
|
|
15
|
+
|
|
16
|
+
## 0.7.0 (2026-06-03)
|
|
17
|
+
- **Breaking:** replaced the `warnings:` option (and its `[result, warnings]` tuple return) with an `on_warning:` callable. Pass `on_warning: ->(w) { ... }` to be handed each `SmarterJSON::Warning` as the parser applies a lenient fix; `process` / `process_file` now always return the bare value (nil / value / Array) on every path. Unlike the tuple, this also fires on the streaming block form. The default (no handler) records nothing and costs nothing.
|
|
18
|
+
|
|
4
19
|
## 0.6.0 (2026-06-02)
|
|
5
20
|
- Lenient comma handling: empty slots around / between commas are collapsed (`[1,,2]` → `[1,2]`, `[,1,]` → `[1]`, `{a:1,,b:2}` → `{a:1,b:2}`), on both the C and Ruby paths. No null is inserted for an empty slot.
|
|
6
21
|
- A key with a colon but no value reads as null: `{a:}` → `{"a"=>nil}` (both paths).
|
|
@@ -9,12 +24,12 @@
|
|
|
9
24
|
- Fixed a pure-Ruby bug where a `\u` escape whose next bytes split a multibyte character leaked `ArgumentError`; it now raises `SmarterJSON::ParseError`.
|
|
10
25
|
- Added a property/fuzz test suite that checks C/Ruby parity and round-tripping on generated, mutated, and random input.
|
|
11
26
|
|
|
12
|
-
## 0.5.2 (2026-06-01)
|
|
27
|
+
## 0.5.2 (2026-06-01) yanked
|
|
13
28
|
- `generate` now supports pretty-printing via the `indent:` option (spaces per nesting level; default `0` = compact). Empty objects/arrays stay inline; `indent:` combined with `format: :ndjson` raises `ArgumentError`.
|
|
14
29
|
- `generate` adds `sort_keys:` (emit object keys in sorted order), `ascii_only:` (escape non-ASCII as `\uXXXX`, astral chars as surrogate pairs), and `script_safe:` (escape `</` and U+2028/U+2029 for safe embedding in an HTML `<script>` tag).
|
|
15
30
|
- `generate` adds opt-in `coerce:` — when `true`, a value that isn't natively supported (e.g. `Time`, `Date`, app objects) is converted via its own `as_json` (result re-emitted) or `to_json` (spliced); strict-by-default still raises `GenerateError`.
|
|
16
31
|
|
|
17
|
-
## 0.5.1 (2026-06-01)
|
|
32
|
+
## 0.5.1 (2026-06-01) yanked
|
|
18
33
|
- Unified the error classes under a single `SmarterJSON::Error` base: `ParseError` and `EncodingError` now inherit from it, and `generate` raises a new `GenerateError`. `rescue SmarterJSON::Error` now catches everything the gem raises.
|
|
19
34
|
- Added a CI test matrix (Ruby 2.6–4.0 + head, on Ubuntu and macOS).
|
|
20
35
|
- Fixed the C extension build on Ruby 2.6 (declare `rb_hash_bulk_insert`, which 2.6 exports but does not declare in its headers); set the minimum Ruby to 2.6.
|
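Note on the 0.8.0 entry: the changelog describes wrapper recovery only in prose. The sketch below illustrates the general idea and is not the gem's implementation. It relies on the stdlib `json` library and a hypothetical helper, `recover_payloads`; the helper name, its regular expressions, and the choice to pass a bare Symbol to the handler are all assumptions made for illustration (the Symbols reuse the warning names listed in the changelog). It strips code fences or `<json>` tags, then applies the "one payload returns its value, several return an Array" rule.

```ruby
require "json"

# Hypothetical helper, not part of smarter_json: recovers JSON payloads from
# fenced or tagged text and parses them with the stdlib parser.
def recover_payloads(text, on_warning: nil)
  return nil if text.strip.empty? # zero documents

  fence_re = /```(?:json)?[ \t]*\r?\n(.*?)\r?\n[ \t]*```/m # ```json ... ``` or ``` ... ```
  tag_re   = %r{<json>(.*?)</json>}m                        # <json> ... </json>

  bodies = text.scan(fence_re).flatten
  kind = :code_fence_stripped
  if bodies.empty?
    bodies = text.scan(tag_re).flatten
    kind = :wrapper_tag_stripped
  end

  if bodies.empty?
    bodies = [text] # no wrapper found: treat the whole input as the payload
  elsif on_warning
    bodies.each { on_warning.call(kind) } # one notification per stripped wrapper
  end

  docs = bodies.map { |body| JSON.parse(body) }
  docs.length == 1 ? docs.first : docs
end
```

Unlike the gem, this sketch accepts only strict JSON inside the wrapper and does not try to skip free-form prose around an unwrapped payload.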
data/README.md
CHANGED
|
@@ -16,13 +16,14 @@ Three things set it apart:
|
|
|
16
16
|
|
|
17
17
|
1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
|
|
18
18
|
|
|
19
|
-
2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.**
|
|
19
|
+
2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time.
|
|
20
20
|
|
|
21
21
|
3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser — the fastest general-purpose Ruby JSON parser.
|
|
22
22
|
|
|
23
23
|
## What it accepts, beyond strict JSON
|
|
24
24
|
|
|
25
25
|
- `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
|
|
26
|
+
- Markdown-wrapped / chatty blobs around the payload: strips ```` ```json ```` / ```` ``` ```` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
|
|
26
27
|
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
|
|
27
28
|
- Implicit root object — a config file that starts with `key: value`, no outer `{}`
|
|
28
29
|
- `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
|
|
@@ -67,9 +68,14 @@ SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2
|
|
|
67
68
|
SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
|
|
68
69
|
SmarterJSON.process("") # => nil (zero documents)
|
|
69
70
|
|
|
70
|
-
#
|
|
71
|
+
# Iterate one recovered document at a time with a block
|
|
71
72
|
# (process and process_file both forward the block):
|
|
72
73
|
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
74
|
+
|
|
75
|
+
# Wrapper noise is stripped automatically:
|
|
76
|
+
SmarterJSON.process("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n") # => {"a"=>1}
|
|
77
|
+
SmarterJSON.process("<json>{\"a\":1}</json>") # => {"a"=>1}
|
|
78
|
+
SmarterJSON.process("first:\n{\"a\":1}\nsecond:\n{\"b\":2}") # => [{"a"=>1}, {"b"=>2}]
|
|
73
79
|
```
|
|
74
80
|
|
|
75
81
|
### Options
|
|
@@ -81,7 +87,20 @@ SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event
|
|
|
81
87
|
| `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
|
|
82
88
|
| `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
|
|
83
89
|
| `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
|
|
84
|
-
| `
|
|
90
|
+
| `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
|
|
91
|
+
|
|
92
|
+
### Warnings (`on_warning`)
|
|
93
|
+
|
|
94
|
+
When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
|
|
95
|
+
|
|
96
|
+
```ruby
|
|
97
|
+
# Collect them all:
|
|
98
|
+
warns = []
|
|
99
|
+
data = SmarterJSON.process(input, on_warning: ->(w) { warns << w })
|
|
100
|
+
|
|
101
|
+
# Or route them — log, count, raise:
|
|
102
|
+
SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn(w) })
|
|
103
|
+
```
|
|
85
104
|
|
|
86
105
|
## Performance
|
|
87
106
|
|
|
@@ -93,7 +112,7 @@ Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpu
|
|
|
93
112
|
|
|
94
113
|
**Two notes on fair comparison:**
|
|
95
114
|
|
|
96
|
-
- **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the
|
|
115
|
+
- **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the block form yields each recovered document instead of returning the collected Array.
|
|
97
116
|
- **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
|
|
98
117
|
|
|
99
118
|
## Encoding
|
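Migration note for the `on_warning:` change shown in the README and CHANGELOG hunks above: code written against the removed `warnings:` option expected a `[result, warnings]` tuple. A small adapter can rebuild that shape on top of the callable. The helper name `with_warnings` is hypothetical and not part of the gem; it only assumes that the handler is invoked once per warning, as the README states.

```ruby
# Hypothetical adapter: rebuilds the pre-0.7.0 [result, warnings] tuple
# on top of the on_warning: callable.
def with_warnings
  warns = []
  result = yield ->(warning) { warns << warning } # the block receives the handler
  [result, warns]
end

# With the gem loaded, usage would look like:
#   result, warns = with_warnings do |handler|
#     SmarterJSON.process("[1,,2]", on_warning: handler)
#   end
```

Because the adapter only collects what the handler receives, it behaves the same on the C and pure-Ruby paths.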
data/docs/_introduction.md
CHANGED
|
@@ -11,7 +11,7 @@
|
|
|
11
11
|
|
|
12
12
|
# SmarterJSON Introduction
|
|
13
13
|
|
|
14
|
-
`smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
|
|
14
|
+
`smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), markdown-wrapped / chatty blobs around a JSON payload, and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
|
|
15
15
|
|
|
16
16
|
## Why another JSON library?
|
|
17
17
|
|
|
@@ -21,7 +21,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
|
|
|
21
21
|
|
|
22
22
|
* **One reader, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the reader to match your input; it adapts to whatever you give it.
|
|
23
23
|
|
|
24
|
-
* **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.**
|
|
24
|
+
* **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** Pass a block to iterate the recovered documents one at a time. See [The Basic Read API](./basic_read_api.md).
|
|
25
25
|
|
|
26
26
|
* **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser. Floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
|
|
27
27
|
|
data/docs/basic_read_api.md
CHANGED
|
@@ -22,7 +22,7 @@ SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
|
|
|
22
22
|
SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
|
|
23
23
|
```
|
|
24
24
|
|
|
25
|
-
`process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths.
|
|
25
|
+
`process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths. When the input wraps the payload in obvious markdown / prose / tags, `process` strips that wrapper first and then parses the recovered payload(s).
|
|
26
26
|
|
|
27
27
|
```ruby
|
|
28
28
|
SmarterJSON.process(io) # an open IO (File, StringIO, socket, …) — reads it and parses
|
|
@@ -39,7 +39,7 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1} (one
|
|
|
39
39
|
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}] (two or more → an Array)
|
|
40
40
|
```
|
|
41
41
|
|
|
42
|
-
Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
|
|
42
|
+
Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). If wrapper noise is stripped and several payloads are recovered, they are returned by the same rule: one payload → its value, several → an `Array`. Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
|
|
43
43
|
|
|
44
44
|
## `SmarterJSON.process_file` — read a file by path
|
|
45
45
|
|
|
@@ -49,36 +49,34 @@ SmarterJSON.process_file("config.json5") # read the file, then parse — sam
|
|
|
49
49
|
|
|
50
50
|
`process_file` opens the file, reads it with the labeled [`encoding:`](./options.md) (default `"UTF-8"`, no transcoding pass), and parses it.
|
|
51
51
|
|
|
52
|
-
## Streaming with a block
|
|
52
|
+
## Streaming with a block
|
|
53
53
|
|
|
54
|
-
|
|
54
|
+
Pass a block to have each recovered top-level document yielded one at a time; the method returns `nil` instead of collecting the documents into an Array. Both `process` and `process_file` forward the block.
|
|
55
55
|
|
|
56
56
|
```ruby
|
|
57
|
-
# Stream straight from disk, one document at a time — the whole file is never loaded:
|
|
58
57
|
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
59
|
-
|
|
60
|
-
# Same for an IO:
|
|
61
58
|
SmarterJSON.process(io) { |doc| handle(doc) }
|
|
62
59
|
```
|
|
63
60
|
|
|
64
|
-
The streaming path
|
|
61
|
+
The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
|
|
65
62
|
|
|
66
63
|
## The C extension and the pure-Ruby fallback
|
|
67
64
|
|
|
68
65
|
By default (`acceleration: true`) the C extension is used when it is compiled and loadable (`SmarterJSON::HAS_ACCELERATION` is then `true`); otherwise the pure-Ruby parser runs and produces identical results. Pass `acceleration: false` to force the pure-Ruby path. See [Configuration Options](./options.md).
|
|
69
66
|
|
|
70
|
-
## Seeing what was fixed: `
|
|
67
|
+
## Seeing what was fixed: `on_warning:`
|
|
71
68
|
|
|
72
|
-
`process` and `process_file` are lenient — they salvage your data rather than reject a whole document over a stray comma. Pass
|
|
69
|
+
`process` and `process_file` are lenient — they salvage your data rather than reject a whole document over a stray comma. Pass an `on_warning:` callable to also get a record of what was adjusted, so the leniency is transparent instead of silent. It is invoked once per fix and never changes the return value:
|
|
73
70
|
|
|
74
71
|
```ruby
|
|
75
|
-
|
|
76
|
-
result
|
|
77
|
-
|
|
78
|
-
|
|
72
|
+
warns = []
|
|
73
|
+
result = SmarterJSON.process("[1,,2]", on_warning: ->(w) { warns << w })
|
|
74
|
+
result # => [1, 2]
|
|
75
|
+
warns.map(&:type) # => [:empty_slot]
|
|
76
|
+
warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
|
|
79
77
|
```
|
|
80
78
|
|
|
81
|
-
Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`),
|
|
79
|
+
Each warning is a `SmarterJSON::Warning` with `type`, `message`, `line`, and `col`. The types are `:empty_slot` (a collapsed empty comma slot), `:empty_value` (a key with no value, read as `null`), `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See [Configuration Options](./options.md).
|
|
82
80
|
|
|
83
81
|
---------------
|
|
84
82
|
|
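Note on "frames whole top-level documents": the hunk above says the streaming path no longer assumes one document per line. One way to picture document framing is a depth counter that tracks brackets outside of strings. The sketch below is an assumption about the general technique, not the gem's code; `each_framed_document` is a hypothetical name, it handles only top-level objects and arrays with double-quoted strings, and it parses with the stdlib `json` library.

```ruby
require "json"
require "stringio"

# Hypothetical framer: yields the raw text of each top-level {...} or [...]
# document read from an IO, regardless of how many lines it spans.
def each_framed_document(io)
  buffer = String.new
  depth = 0
  in_string = false
  escaped = false

  io.each_char do |ch|
    # Whitespace between documents (spaces, \n, \r) is skipped.
    next if depth.zero? && buffer.empty? && ch.match?(/\s/)

    buffer << ch
    if in_string
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == "{" || ch == "["
      depth += 1
    elsif ch == "}" || ch == "]"
      depth -= 1
      if depth.zero? # the outermost bracket closed: one whole document
        yield buffer
        buffer = String.new
      end
    end
  end
end

io = StringIO.new("{\n  \"a\": 1\n}\r\n[1,\n 2]\r{\"s\": \"}\"}")
each_framed_document(io) { |text| p JSON.parse(text) }
```

Only one document is buffered at a time, which is what keeps memory bounded on large inputs. Comments, single-quoted strings, and top-level scalars are left out of this sketch.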
data/docs/examples.md
CHANGED
|
@@ -24,9 +24,10 @@
|
|
|
24
24
|
7. [Duplicate Keys](#example-7-duplicate-keys)
|
|
25
25
|
8. [High-Precision Numbers: BigDecimal vs Float](#example-8-high-precision-numbers-bigdecimal-vs-float)
|
|
26
26
|
9. [Lenient Input: Comments, Trailing Commas, Unquoted Keys](#example-9-lenient-input-comments-trailing-commas-unquoted-keys)
|
|
27
|
-
10. [
|
|
28
|
-
11. [Write
|
|
29
|
-
12. [
|
|
27
|
+
10. [Wrapper Noise Around a Payload](#example-10-wrapper-noise-around-a-payload)
|
|
28
|
+
11. [Write JSON](#example-11-write-json)
|
|
29
|
+
12. [Write NDJSON](#example-12-write-ndjson)
|
|
30
|
+
13. [Round-Trip Read and Write](#example-13-round-trip-read-and-write)
|
|
30
31
|
|
|
31
32
|
---
|
|
32
33
|
|
|
@@ -64,9 +65,9 @@ SmarterJSON.process('{"id":1}') # => {"id"=>1}
|
|
|
64
65
|
SmarterJSON.process("") # => nil
|
|
65
66
|
```
|
|
66
67
|
|
|
67
|
-
### Example 5:
|
|
68
|
+
### Example 5: Iterate Documents with a Block
|
|
68
69
|
|
|
69
|
-
|
|
70
|
+
Pass a block to receive each recovered document one at a time:
|
|
70
71
|
|
|
71
72
|
```ruby
|
|
72
73
|
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
@@ -113,14 +114,41 @@ JSON
|
|
|
113
114
|
|
|
114
115
|
A `#`/`//` only starts a comment when preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
|
|
115
116
|
|
|
116
|
-
### Example 10:
|
|
117
|
+
### Example 10: Wrapper Noise Around a Payload
|
|
118
|
+
|
|
119
|
+
```ruby
|
|
120
|
+
SmarterJSON.process(<<~TEXT)
|
|
121
|
+
Here is the JSON:
|
|
122
|
+
|
|
123
|
+
```json
|
|
124
|
+
{
|
|
125
|
+
"a": 1
|
|
126
|
+
}
|
|
127
|
+
```
|
|
128
|
+
TEXT
|
|
129
|
+
# => {"a"=>1}
|
|
130
|
+
|
|
131
|
+
SmarterJSON.process("<json>{\"a\":1}</json>")
|
|
132
|
+
# => {"a"=>1}
|
|
133
|
+
|
|
134
|
+
SmarterJSON.process(<<~TEXT)
|
|
135
|
+
first:
|
|
136
|
+
{"a":1}
|
|
137
|
+
|
|
138
|
+
second:
|
|
139
|
+
{"b":2}
|
|
140
|
+
TEXT
|
|
141
|
+
# => [{"a"=>1}, {"b"=>2}]
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
### Example 11: Write JSON
|
|
117
145
|
|
|
118
146
|
```ruby
|
|
119
147
|
SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
|
|
120
148
|
SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
|
|
121
149
|
```
|
|
122
150
|
|
|
123
|
-
### Example
|
|
151
|
+
### Example 12: Write NDJSON
|
|
124
152
|
|
|
125
153
|
An Array writes one element per line:
|
|
126
154
|
|
|
@@ -128,7 +156,7 @@ An Array writes one element per line:
|
|
|
128
156
|
SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
|
|
129
157
|
```
|
|
130
158
|
|
|
131
|
-
### Example
|
|
159
|
+
### Example 13: Round-Trip Read and Write
|
|
132
160
|
|
|
133
161
|
```ruby
|
|
134
162
|
obj = { "a" => 1, "b" => [2, "three", nil, true] }
|
data/docs/options.md
CHANGED
|
@@ -22,27 +22,28 @@ These options are passed to [`SmarterJSON.process`](./basic_read_api.md) and `Sm
|
|
|
22
22
|
| `:bigdecimal_load`| `:auto` | `:auto` keeps high-precision decimals as `BigDecimal` (matches Oj); `:float` forces every number to `Float`; `:bigdecimal` forces every decimal to `BigDecimal`. |
|
|
23
23
|
| `:acceleration` | `true` | Use the C extension when it is compiled and loadable; `false` forces the pure-Ruby parser. Both produce identical results. |
|
|
24
24
|
| `:encoding` | `nil` | Labels the input's encoding (e.g. `"UTF-8"`). It does **not** trigger a transcoding pass — see below. |
|
|
25
|
-
| `:
|
|
25
|
+
| `:on_warning` | `nil` | A callable invoked once per lenient fix applied, passed a `SmarterJSON::Warning`. Never changes the return value. See below. |
|
|
26
26
|
|
|
27
27
|
```ruby
|
|
28
28
|
SmarterJSON.process('{"a": 1}', symbolize_keys: true) # => {:a=>1}
|
|
29
29
|
SmarterJSON.process('{"a":1,"a":2}', duplicate_key: :raise) # raises SmarterJSON::ParseError
|
|
30
30
|
SmarterJSON.process(big_decimal_json, bigdecimal_load: :float) # every number as Float (fastest)
|
|
31
|
-
SmarterJSON.process("[1,,2]",
|
|
31
|
+
SmarterJSON.process("[1,,2]", on_warning: ->(w) { puts w }) # => [1, 2], and prints the warning
|
|
32
32
|
```
|
|
33
33
|
|
|
34
|
-
### A note on `:
|
|
34
|
+
### A note on `:on_warning`
|
|
35
35
|
|
|
36
|
-
`smarter_json` is lenient by design — it salvages your data instead of rejecting the whole document over a stray comma. `
|
|
36
|
+
`smarter_json` is lenient by design — it salvages your data instead of rejecting the whole document over a stray comma. `on_warning:` keeps that, but also hands you a record of what it had to fix, so leniency is transparent rather than silent. It takes a callable that the parser invokes once per fix, passing a `SmarterJSON::Warning` (with `type` (a Symbol), `message`, `line`, and `col`). It never changes the return value — `process` still hands back the bare value — and it fires on every path, including the streaming block form. With no handler (the default), nothing is recorded and there is zero overhead.
|
|
37
37
|
|
|
38
38
|
```ruby
|
|
39
|
-
|
|
39
|
+
warns = []
|
|
40
|
+
result = SmarterJSON.process("[1,,2]", on_warning: ->(w) { warns << w })
|
|
40
41
|
result # => [1, 2]
|
|
41
|
-
|
|
42
|
-
|
|
42
|
+
warns.map(&:type) # => [:empty_slot]
|
|
43
|
+
warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
|
|
43
44
|
```
|
|
44
45
|
|
|
45
|
-
The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped)
|
|
46
|
+
The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
|
|
46
47
|
|
|
47
48
|
### A note on `:encoding`
|
|
48
49
|
|
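Note on routing warnings: the README hunk mentions logging, counting, or raising from the handler. A sketch of one such policy follows. The helper name `warning_policy` and its `fatal:` parameter are hypothetical; the only assumption about the gem is that each warning responds to `type`, as documented in the options hunk above.

```ruby
# Hypothetical helper: builds an on_warning handler that tallies warnings by
# type and raises for any type listed in `fatal`.
def warning_policy(tally, fatal: [])
  lambda do |warning|
    tally[warning.type] += 1
    if fatal.include?(warning.type)
      raise ArgumentError, "disallowed lenient fix: #{warning.type}"
    end
  end
end

# With the gem loaded, usage would look like:
#   tally = Hash.new(0)
#   SmarterJSON.process(input, on_warning: warning_policy(tally, fatal: [:duplicate_key]))
```

The tally must be a Hash with a default of `0` (for example `Hash.new(0)`), since the handler increments counts without initializing them.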
data/ext/smarter_json/smarter_json.c
CHANGED
|
@@ -34,6 +34,7 @@ static VALUE cParseError;
|
|
|
34
34
|
static VALUE cEncodingError;
|
|
35
35
|
static VALUE cWarning;
|
|
36
36
|
static ID fj_new_id;
|
|
37
|
+
static ID fj_call_id; /* cached :call (invoking the on_warning handler) */
|
|
37
38
|
static VALUE fj_sym_empty_slot;
|
|
38
39
|
static VALUE fj_sym_empty_value;
|
|
39
40
|
static VALUE fj_sym_duplicate_key;
|
|
@@ -60,8 +61,7 @@ typedef struct {
|
|
|
60
61
|
int dup_raise;
|
|
61
62
|
int bigdecimal_load; /* 0 = float, 1 = auto, 2 = bigdecimal */
|
|
62
63
|
fj_kc_slot *kcache; /* per-parse key cache (NULL when interning unavailable) */
|
|
63
|
-
|
|
64
|
-
VALUE warnings; /* rb_ary of SmarterJSON::Warning when collecting, else Qnil */
|
|
64
|
+
VALUE on_warning; /* on_warning: callable invoked per non-fatal lenient fix, else Qnil */
|
|
65
65
|
} fj_state;
|
|
66
66
|
|
|
67
67
|
/* Line/column at the current byte position, computed lazily (only when raising
|
|
@@ -81,14 +81,16 @@ static void fj_line_col(fj_state *st, long *line, long *col) {
|
|
|
81
81
|
*col = c;
|
|
82
82
|
}
|
|
83
83
|
|
|
84
|
-
/*
|
|
84
|
+
/* Report a non-fatal lenient fix to the on_warning callable — a no-op (and builds no
|
|
85
|
+
* Warning) when no handler was given. The internal Qnil guard is the safety net; the
|
|
86
|
+
* call sites also guard so the line/col scan is skipped on the fast path. */
|
|
85
87
|
static void fj_warn(fj_state *st, VALUE type_sym, const char *msg) {
|
|
86
88
|
long line, col;
|
|
87
|
-
if (
|
|
89
|
+
if (st->on_warning == Qnil) return;
|
|
88
90
|
fj_line_col(st, &line, &col);
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
91
|
+
rb_funcall(st->on_warning, fj_call_id, 1,
|
|
92
|
+
rb_funcall(cWarning, fj_new_id, 4, type_sym,
|
|
93
|
+
rb_utf8_str_new_cstr(msg), LONG2NUM(line), LONG2NUM(col)));
|
|
92
94
|
}
|
|
93
95
|
|
|
94
96
|
/* 1-based column of the current byte position (bytes since the last line start).
|
|
@@ -1161,9 +1163,9 @@ static VALUE fj_build_object(fj_state *st, const VALUE *pairs, long count) {
|
|
|
1161
1163
|
long entries = count / 2, i;
|
|
1162
1164
|
VALUE hash = rb_hash_new_capa(entries);
|
|
1163
1165
|
|
|
1164
|
-
/* Fast path: bulk insert. Skipped when
|
|
1165
|
-
* per-member loop below to report each dropped duplicate key. */
|
|
1166
|
-
if (!st->symbolize_keys && !st->dup_first_wins &&
|
|
1166
|
+
/* Fast path: bulk insert. Skipped when an on_warning handler is present, which needs
|
|
1167
|
+
* the per-member loop below to report each dropped duplicate key. */
|
|
1168
|
+
if (!st->symbolize_keys && !st->dup_first_wins && st->on_warning == Qnil) {
|
|
1167
1169
|
rb_hash_bulk_insert(count, pairs, hash);
|
|
1168
1170
|
if (st->dup_raise && fj_hash_len(hash) < entries) {
|
|
1169
1171
|
VALUE seen = rb_hash_new_capa(entries);
|
|
@@ -1178,7 +1180,7 @@ static VALUE fj_build_object(fj_state *st, const VALUE *pairs, long count) {
|
|
|
1178
1180
|
|
|
1179
1181
|
for (i = 0; i + 1 < count; i += 2) {
|
|
1180
1182
|
VALUE k = st->symbolize_keys ? rb_funcall(pairs[i], fj_to_sym_id, 0) : pairs[i];
|
|
1181
|
-
if (st->dup_first_wins || st->dup_raise || st->
|
|
1183
|
+
if (st->dup_first_wins || st->dup_raise || st->on_warning != Qnil) {
|
|
1182
1184
|
if (RTEST(rb_funcall(hash, fj_key_p_id, 1, k))) {
|
|
1183
1185
|
if (st->dup_raise) fj_error(st, "duplicate key");
|
|
1184
1186
|
fj_warn(st, fj_sym_duplicate_key, "duplicate key");
|
|
@@ -1271,7 +1273,7 @@ static VALUE fj_parse_iter(fj_state *st, int implicit_root) {
|
|
|
1271
1273
|
fj_skip_ws_comments(st);
|
|
1272
1274
|
b = fj_byte(st);
|
|
1273
1275
|
if (b == ',') { /* collapsing separator: skip empty member */
|
|
1274
|
-
if (st->
|
|
1276
|
+
if (st->on_warning != Qnil && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
|
|
1275
1277
|
vss = 0;
|
|
1276
1278
|
fj_advance(st, 1);
|
|
1277
1279
|
continue;
|
|
@@ -1323,7 +1325,7 @@ static VALUE fj_parse_iter(fj_state *st, int implicit_root) {
|
|
|
1323
1325
|
fj_skip_ws_comments(st);
|
|
1324
1326
|
b = fj_byte(st);
|
|
1325
1327
|
if (b == ',') { /* collapsing separator: skip empty slot */
|
|
1326
|
-
if (st->
|
|
1328
|
+
if (st->on_warning != Qnil && !vss) fj_warn(st, fj_sym_empty_slot, "extra comma, collapsed an empty slot");
|
|
1327
1329
|
vss = 0;
|
|
1328
1330
|
fj_advance(st, 1);
|
|
1329
1331
|
continue;
|
|
@@ -1412,8 +1414,7 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
|
|
|
1412
1414
|
else st.bigdecimal_load = 1; /* :auto (default), including nil */
|
|
1413
1415
|
}
|
|
1414
1416
|
|
|
1415
|
-
st.
|
|
1416
|
-
st.warnings = st.collect_warnings ? rb_ary_new() : Qnil;
|
|
1417
|
+
st.on_warning = rb_hash_aref(opts, ID2SYM(rb_intern("on_warning"))); /* Qnil when absent */
|
|
1417
1418
|
|
|
1418
1419
|
if (st.len >= 3 && (unsigned char)st.buf[0] == 0xEF &&
|
|
1419
1420
|
(unsigned char)st.buf[1] == 0xBB && (unsigned char)st.buf[2] == 0xBF) {
|
|
@@ -1439,10 +1440,10 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
|
|
|
1439
1440
|
* whitespace / newline / concatenation do), so a bracketless comma list still
|
|
1440
1441
|
* raises in fj_parse_iter — the unsupported implicit-root array. */
|
|
1441
1442
|
fj_skip_ws_comments(&st);
|
|
1442
|
-
if (fj_eof(&st)) return
|
|
1443
|
+
if (fj_eof(&st)) return Qnil;
|
|
1443
1444
|
value = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
|
|
1444
1445
|
fj_skip_ws_comments(&st);
|
|
1445
|
-
if (fj_eof(&st)) return
|
|
1446
|
+
if (fj_eof(&st)) return value;
|
|
1446
1447
|
{
|
|
1447
1448
|
VALUE arr = rb_ary_new();
|
|
1448
1449
|
rb_ary_push(arr, value);
|
|
@@ -1450,7 +1451,7 @@ static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
|
|
|
1450
1451
|
rb_ary_push(arr, fj_parse_iter(&st, fj_implicit_root_ahead(&st)));
|
|
1451
1452
|
fj_skip_ws_comments(&st);
|
|
1452
1453
|
} while (!fj_eof(&st));
|
|
1453
|
-
return
|
|
1454
|
+
return arr;
|
|
1454
1455
|
}
|
|
1455
1456
|
}
|
|
1456
1457
|
|
|
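Note on the two fj_parse_c hunks above: the added lines return Qnil for input with no document, the bare value when exactly one document is present, and an Array when several documents follow each other. A minimal Ruby sketch of that return-shape rule follows. The helper name `shape_result` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper, not part of smarter_json: mirrors the return shape
# visible in the hunks above (nil / single value / Array of values).
def shape_result(values)
  return nil if values.empty?                 # no document in the input
  return values.first if values.length == 1   # exactly one document: unwrap it

  values                                      # several documents: keep them all
end

shape_result([])               # no documents
shape_result([{ "a" => 1 }])   # one document: the Hash itself
shape_result([1, [2, 3]])      # two documents: returned as an Array
```

One consequence of this rule is that a caller cannot tell, from the shape alone, a single document that is itself an Array apart from several concatenated documents.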
@@ -1463,6 +1464,7 @@ void Init_smarter_json(void) {
|
|
|
1463
1464
|
fj_to_sym_id = rb_intern("to_sym");
|
|
1464
1465
|
fj_key_p_id = rb_intern("key?");
|
|
1465
1466
|
fj_new_id = rb_intern("new");
|
|
1467
|
+
fj_call_id = rb_intern("call");
|
|
1466
1468
|
fj_sym_empty_slot = ID2SYM(rb_intern("empty_slot"));
|
|
1467
1469
|
fj_sym_empty_value = ID2SYM(rb_intern("empty_value"));
|
|
1468
1470
|
fj_sym_duplicate_key = ID2SYM(rb_intern("duplicate_key"));
|
data/lib/smarter_json/parser.rb
CHANGED
|
@@ -22,9 +22,9 @@ module SmarterJSON
|
|
|
22
22
|
# stream as newline-delimited documents (NDJSON / JSONL), one per line.
|
|
23
23
|
def process(input, options = {}, &block)
|
|
24
24
|
if input.is_a?(String)
|
|
25
|
-
|
|
25
|
+
Recovery.process_string(input, options, &block)
|
|
26
26
|
elsif input.respond_to?(:read)
|
|
27
|
-
block ? stream_io(input, options, &block) :
|
|
27
|
+
block ? stream_io(input, options, &block) : process(input.read, options)
|
|
28
28
|
else
|
|
29
29
|
raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
|
|
30
30
|
end
|
|
@@ -43,7 +43,7 @@ module SmarterJSON
|
|
|
43
43
|
if block
|
|
44
44
|
File.open(path, "r:#{encoding}") { |io| stream_io(io, options, &block) }
|
|
45
45
|
else
|
|
46
|
-
|
|
46
|
+
process(File.read(path, encoding: encoding), options)
|
|
47
47
|
end
|
|
48
48
|
end
|
|
49
49
|
|
|
@@ -57,10 +57,9 @@ module SmarterJSON
|
|
|
57
57
|
Parser.new(input, options).each_value(&block)
|
|
58
58
|
end
|
|
59
59
|
elsif options.fetch(:acceleration, true) && HAS_ACCELERATION
|
|
60
|
-
parse_c(input, options)
|
|
60
|
+
parse_c(input, options)
|
|
61
61
|
else
|
|
62
|
-
|
|
63
|
-
options.fetch(:warnings, false) ? [parser.parse, parser.warnings] : parser.parse
|
|
62
|
+
Parser.new(input, options).parse
|
|
64
63
|
end
|
|
65
64
|
end
|
|
66
65
|
|
|
@@ -68,12 +67,230 @@ module SmarterJSON
|
|
|
68
67
|
# each — bounded memory. Newline-delimited (NDJSON / JSONL); a single document
|
|
69
68
|
# spanning multiple lines is not supported by the streaming path.
|
|
70
69
|
def stream_io(io, options, &block)
|
|
71
|
-
io.
|
|
72
|
-
nil
|
|
70
|
+
Recovery.process_string(io.read, options, &block)
|
|
73
71
|
end
|
|
74
72
|
|
|
75
73
|
private_class_method :process_content, :stream_io
|
|
76
74
|
|
|
75
|
+
module Recovery
|
|
76
|
+
module_function
|
|
77
|
+
|
|
78
|
+
def process_string(input, options, &block)
|
|
79
|
+
return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
|
|
80
|
+
|
|
81
|
+
if wrapper_hint?(input)
|
|
82
|
+
payloads = extract_payloads(input, options)
|
|
83
|
+
return replay_payloads(payloads, options, &block) unless payloads.empty?
|
|
84
|
+
end
|
|
85
|
+
|
|
86
|
+
SmarterJSON.send(:process_content, input, options, &block)
|
|
87
|
+
rescue ParseError => e
|
|
88
|
+
raise if e.is_a?(EncodingError)
|
|
89
|
+
|
|
90
|
+
payloads = extract_payloads(input, options)
|
|
91
|
+
return replay_payloads(payloads, options, &block) unless payloads.empty?
|
|
92
|
+
|
|
93
|
+
raise
|
|
94
|
+
end
|
|
95
|
+
|
|
96
|
+
def wrapper_hint?(input)
|
|
97
|
+
return false unless input.valid_encoding?
|
|
98
|
+
|
|
99
|
+
input.match?(/```|<json\b|BEGIN_JSON\b/i) || input.match?(/\A[[:space:]]*(?:JSON|Final answer)[[:space:]]*:/i)
|
|
100
|
+
end
|
|
101
|
+
|
|
102
|
+
def replay_payloads(payloads, options, &block)
|
|
103
|
+
handler = options[:on_warning]
|
|
104
|
+
emit_wrapper_warnings(payloads, handler)
|
|
105
|
+
|
|
106
|
+
results = payloads.map do |payload|
|
|
107
|
+
SmarterJSON.send(:process_content, payload[:slice], options)
|
|
108
|
+
end
|
|
109
|
+
|
|
110
|
+
return results.each(&block).then { nil } if block_given?
|
|
111
|
+
return nil if results.empty?
|
|
112
|
+
return results.first if results.length == 1
|
|
113
|
+
|
|
114
|
+
results
|
|
115
|
+
end
|
|
116
|
+
|
|
117
|
+
def emit_wrapper_warnings(payloads, handler)
|
|
118
|
+
return unless handler
|
|
119
|
+
|
|
120
|
+
meta = payloads.first[:meta]
|
|
121
|
+
warn(handler, :prefix_text_ignored, "ignored non-JSON text before the payload", *meta[:first_pos]) if meta[:prefix]
|
|
122
|
+
warn(handler, :code_fence_stripped, "stripped markdown code fences around the payload", *meta[:first_pos]) if meta[:fence]
|
|
123
|
+
warn(handler, :wrapper_tag_stripped, "stripped wrapper tags around the payload", *meta[:first_pos]) if meta[:wrapper]
|
|
124
|
+
warn(handler, :suffix_text_ignored, "ignored non-JSON text after the payload", *meta[:last_pos]) if meta[:suffix]
|
|
125
|
+
end
|
|
126
|
+
|
|
127
|
+
def extract_payloads(input, options)
|
|
128
|
+
payloads = candidate_ranges(input).filter_map do |range|
|
|
129
|
+
slice = input.byteslice(range.begin, range.end - range.begin)
|
|
130
|
+
begin
|
|
131
|
+
SmarterJSON.send(:process_content, slice, options.merge(on_warning: nil))
|
|
132
|
+
{ slice: slice, range: range }
|
|
133
|
+
rescue ParseError
|
|
134
|
+
nil
|
|
135
|
+
end
|
|
136
|
+
end
|
|
137
|
+
meta = wrapper_meta(input, payloads.map { |p| p[:range] })
|
|
138
|
+
payloads.each { |payload| payload[:meta] = meta }
|
|
139
|
+
payloads
|
|
140
|
+
end
|
|
141
|
+
|
|
142
|
+
def wrapper_meta(input, ranges)
|
|
143
|
+
return { prefix: false, suffix: false, fence: false, wrapper: false } if ranges.empty?
|
|
144
|
+
|
|
145
|
+
first = ranges.first
|
|
146
|
+
last = ranges.last
|
|
147
|
+
prefix = input.byteslice(0, first.begin)
|
|
148
|
+
suffix = input.byteslice(last.end, input.bytesize - last.end)
|
|
149
|
+
{
|
|
150
|
+
prefix: substantive_text?(prefix),
|
|
151
|
+
suffix: substantive_text?(suffix),
|
|
152
|
+
fence: input.match?(/```/),
|
|
153
|
+
wrapper: input.match?(/<json\b|BEGIN_JSON\b/i),
|
|
154
|
+
first_pos: line_col_for(input, first.begin),
|
|
155
|
+
last_pos: line_col_for(input, last.begin)
|
|
156
|
+
}
|
|
157
|
+
end
|
|
158
|
+
|
|
159
|
+
def line_col_for(input, offset)
|
|
160
|
+
line = 1
|
|
161
|
+
col = 1
|
|
162
|
+
i = 0
|
|
163
|
+
while i < offset
|
|
164
|
+
b = input.getbyte(i)
|
|
165
|
+
break if b.nil?
|
|
166
|
+
|
|
167
|
+
if b == 0x0A
|
|
168
|
+
line += 1
|
|
169
|
+
col = 1
|
|
170
|
+
i += 1
|
|
171
|
+
elsif b == 0x0D
|
|
172
|
+
line += 1
|
|
173
|
+
col = 1
|
|
174
|
+
i += 1
|
|
175
|
+
i += 1 if i < offset && input.getbyte(i) == 0x0A
|
|
176
|
+
else
|
|
177
|
+
col += 1
|
|
178
|
+
i += 1
|
|
179
|
+
end
|
|
180
|
+
end
|
|
181
|
+
[line, col]
|
|
182
|
+
end
|
|
183
|
+
|
|
184
|
+
def substantive_text?(text)
|
|
185
|
+
return false if text.nil? || text.empty?
|
|
186
|
+
|
|
187
|
+
stripped = text.dup
|
|
188
|
+
stripped.gsub!(%r{/\*.*?\*/}m, "")
|
|
189
|
+
stripped.gsub!(/^\s*(?:#|\/\/).*$/, "")
|
|
190
|
+
!stripped.strip.empty? && !stripped.strip.match?(/\A(?:```[a-zA-Z0-9_-]*)?\z/) && !stripped.strip.match?(/\A(?:<\/?json>|BEGIN_JSON|END_JSON)\z/i)
|
|
191
|
+
end
|
|
192
|
+
|
|
193
|
+
def warn(handler, type, message, line, col)
|
|
194
|
+
handler.call(Warning.new(type, message, line, col))
|
|
195
|
+
end
|
|
196
|
+
|
|
197
|
+
def candidate_ranges(input)
|
|
198
|
+
ranges = []
|
|
199
|
+
stack = []
|
|
200
|
+
start_pos = nil
|
|
201
|
+
i = 0
|
|
202
|
+
mode = nil
|
|
203
|
+
while i < input.bytesize
|
|
204
|
+
b = input.getbyte(i)
|
|
205
|
+
if mode == :double
|
|
206
|
+
if b == 0x5C
|
|
207
|
+
i += 2
|
|
208
|
+
next
|
|
209
|
+
elsif b == 0x22
|
|
210
|
+
mode = nil
|
|
211
|
+
end
|
|
212
|
+
i += 1
|
|
213
|
+
next
|
|
214
|
+
elsif mode == :single
|
|
215
|
+
if b == 0x5C
|
|
216
|
+
i += 2
|
|
217
|
+
next
|
|
218
|
+
elsif b == 0x27
|
|
219
|
+
mode = nil
|
|
220
|
+
end
|
|
221
|
+
i += 1
|
|
222
|
+
next
|
|
223
|
+
elsif mode == :triple
|
|
224
|
+
if input.byteslice(i, 3) == "'''"
|
|
225
|
+
mode = nil
|
|
226
|
+
i += 3
|
|
227
|
+
else
|
|
228
|
+
i += 1
|
|
229
|
+
end
|
|
230
|
+
next
|
|
231
|
+
elsif mode == :line_comment
|
|
232
|
+
if [0x0A, 0x0D].include?(b)
|
|
233
|
+
mode = nil
|
|
234
|
+
else
|
|
235
|
+
i += 1
|
|
236
|
+
next
|
|
237
|
+
end
|
|
238
|
+
elsif mode == :block_comment
|
|
239
|
+
if input.byteslice(i, 2) == "*/"
|
|
240
|
+
mode = nil
|
|
241
|
+
i += 2
|
|
242
|
+
else
|
|
243
|
+
i += 1
|
|
244
|
+
end
|
|
245
|
+
next
|
|
246
|
+
else
|
|
247
|
+
if input.byteslice(i, 2) == "//"
|
|
248
|
+
mode = :line_comment
|
|
249
|
+
i += 2
|
|
250
|
+
next
|
|
251
|
+
elsif input.byteslice(i, 2) == "/*"
|
|
252
|
+
mode = :block_comment
|
|
253
|
+
i += 2
|
|
254
|
+
next
|
|
255
|
+
elsif b == 0x23
|
|
256
|
+
mode = :line_comment
|
|
257
|
+
i += 1
|
|
258
|
+
next
|
|
259
|
+
elsif b == 0x22
|
|
260
|
+
mode = :double
|
|
261
|
+
i += 1
|
|
262
|
+
next
|
|
263
|
+
elsif input.byteslice(i, 3) == "'''"
|
|
264
|
+
mode = :triple
|
|
265
|
+
i += 3
|
|
266
|
+
next
|
|
267
|
+
elsif b == 0x27
|
|
268
|
+
mode = :single
|
|
269
|
+
i += 1
|
|
270
|
+
next
|
|
271
|
+
elsif [0x7B, 0x5B].include?(b)
|
|
272
|
+
start_pos = i if stack.empty?
|
|
273
|
+
stack << b
|
|
274
|
+
elsif b == 0x7D
|
|
275
|
+
stack.pop if stack.last == 0x7B
|
|
276
|
+
if stack.empty? && start_pos
|
|
277
|
+
ranges << (start_pos...(i + 1))
|
|
278
|
+
start_pos = nil
|
|
279
|
+
end
|
|
280
|
+
elsif b == 0x5D
|
|
281
|
+
stack.pop if stack.last == 0x5B
|
|
282
|
+
if stack.empty? && start_pos
|
|
283
|
+
ranges << (start_pos...(i + 1))
|
|
284
|
+
start_pos = nil
|
|
285
|
+
end
|
|
286
|
+
end
|
|
287
|
+
end
|
|
288
|
+
i += 1
|
|
289
|
+
end
|
|
290
|
+
ranges
|
|
291
|
+
end
|
|
292
|
+
end
|
|
293
|
+
|
|
77
294
|
# Hand-rolled FSM single-pass parser.
|
|
78
295
|
# Layer 1: strict JSON (RFC 8259).
|
|
79
296
|
# Layer 2: JSON5 additions — line/block comments, trailing comma,
|
|
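Note on Recovery.candidate_ranges in the hunk above: it walks the input byte by byte, tracks string and comment modes, and records a range each time a top-level `{...}` or `[...]` closes. The sketch below is a simplified illustration of that idea, not the gem's code. It only understands double-quoted strings, and the method name `json_candidate_ranges` is an assumption made for this example.

```ruby
# Simplified sketch of bracket-balanced candidate scanning. Unlike the
# candidate_ranges method in the diff, it handles only double-quoted strings
# (no single quotes, no ''' strings, no comments).
def json_candidate_ranges(text)
  ranges = []
  stack = []         # open brackets seen so far
  start_pos = nil    # byte offset where the current top-level candidate began
  in_string = false
  i = 0
  while i < text.bytesize
    b = text.getbyte(i)
    if in_string
      if b == 0x5C                     # backslash: skip the escaped byte too
        i += 2
        next
      end
      in_string = false if b == 0x22   # closing double quote
    elsif b == 0x22                    # opening double quote
      in_string = true
    elsif b == 0x7B || b == 0x5B       # { or [
      start_pos = i if stack.empty?
      stack << b
    elsif b == 0x7D || b == 0x5D       # } or ]
      opener = b == 0x7D ? 0x7B : 0x5B
      stack.pop if stack.last == opener
      if stack.empty? && start_pos
        ranges << (start_pos...(i + 1))
        start_pos = nil
      end
    end
    i += 1
  end
  ranges
end

text = 'Sure! {"a": "}"} and also [1, 2] done.'
slices = json_candidate_ranges(text).map { |r| text.byteslice(r.begin, r.end - r.begin) }
# slices holds the two bracketed payloads; the "}" inside the string is ignored
```

In the diff, each candidate range is then trial-parsed with warnings disabled, and only the slices that parse are kept as payloads.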
@@ -143,14 +360,9 @@ module SmarterJSON
|
|
|
143
360
|
symbolize_keys: false, # Symbol keys instead of String
|
|
144
361
|
duplicate_key: :last_wins, # :last_wins | :first_wins | :raise
|
|
145
362
|
bigdecimal_load: :auto, # :auto | :float | :bigdecimal (Oj-compatible)
|
|
146
|
-
|
|
363
|
+
on_warning: nil, # a callable invoked once per non-fatal lenient fix (a SmarterJSON::Warning)
|
|
147
364
|
}.freeze
|
|
148
365
|
|
|
149
|
-
# Warnings collected during the parse (empty slots, empty values, dropped duplicate
|
|
150
|
-
# keys). Empty unless the parser was built with warnings: true. Public so the module
|
|
151
|
-
# functions can read it after parse / each_value.
|
|
152
|
-
attr_reader :warnings
|
|
153
|
-
|
|
154
366
|
def initialize(input, options = {})
|
|
155
367
|
raise ArgumentError, "input must be a String" unless input.is_a?(String)
|
|
156
368
|
|
|
@@ -158,8 +370,7 @@ module SmarterJSON
|
|
|
158
370
|
@symbolize_keys = opts[:symbolize_keys]
|
|
159
371
|
@duplicate_key = opts[:duplicate_key]
|
|
160
372
|
@bigdecimal_load = opts[:bigdecimal_load]
|
|
161
|
-
@
|
|
162
|
-
@warnings = []
|
|
373
|
+
@on_warning = opts[:on_warning]
|
|
163
374
|
|
|
164
375
|
encoding = opts[:encoding]
|
|
165
376
|
@input = encoding ? input.dup.force_encoding(encoding) : input
|
|
@@ -263,7 +474,7 @@ module SmarterJSON
|
|
|
263
474
|
# Commas are collapsing separators inside a container: an empty slot (leading,
|
|
264
475
|
# interior, or trailing comma) adds nothing. Skip it; the next iteration reads
|
|
265
476
|
# the following value/key or the closing bracket.
|
|
266
|
-
warn(:empty_slot, "extra comma — collapsed an empty slot")
|
|
477
|
+
warn(:empty_slot, "extra comma — collapsed an empty slot") if @on_warning && !vss
|
|
267
478
|
vss = false
|
|
268
479
|
advance(1)
|
|
269
480
|
elsif cur_obj
|
|
@@ -300,7 +511,7 @@ module SmarterJSON
|
|
|
300
511
|
elsif [RBRACE, COMMA].include?(b)
|
|
301
512
|
# key with a colon but no value -> null (don't consume } or ,; the loop does)
|
|
302
513
|
store_member(cur, key, nil)
|
|
303
|
-
warn(:empty_value, "key #{key.inspect} had no value — used null")
|
|
514
|
+
warn(:empty_value, "key #{key.inspect} had no value — used null") if @on_warning
|
|
304
515
|
vss = true
|
|
305
516
|
elsif b.nil?
|
|
306
517
|
raise error("unexpected end of input")
|
|
@@ -573,7 +784,7 @@ module SmarterJSON
|
|
|
573
784
|
if hash.key?(k)
|
|
574
785
|
raise error("duplicate key #{k.inspect}") if @duplicate_key == :raise
|
|
575
786
|
|
|
576
|
-
warn(:duplicate_key, "duplicate key #{k.inspect} — #{@duplicate_key}")
|
|
787
|
+
warn(:duplicate_key, "duplicate key #{k.inspect} — #{@duplicate_key}") if @on_warning
|
|
577
788
|
return if @duplicate_key == :first_wins
|
|
578
789
|
end
|
|
579
790
|
hash[k] = value
|
|
@@ -933,11 +1144,14 @@ module SmarterJSON
|
|
|
933
1144
|
value
|
|
934
1145
|
end
|
|
935
1146
|
|
|
936
|
-
#
|
|
1147
|
+
# Report a non-fatal lenient fix to the on_warning callable. The call-site guards
|
|
1148
|
+
# (`if @on_warning`) keep the message string from being built on the fast path; this
|
|
1149
|
+
# internal guard is the safety net so a forgotten call-site guard can't crash a
|
|
1150
|
+
# handler-less caller.
|
|
937
1151
|
def warn(type, message)
|
|
938
|
-
return unless @
|
|
1152
|
+
return unless @on_warning
|
|
939
1153
|
|
|
940
|
-
@
|
|
1154
|
+
@on_warning.call(Warning.new(type, message, @line, @col))
|
|
941
1155
|
end
|
|
942
1156
|
|
|
943
1157
|
def error(message)
|
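Note on the on_warning changes in parser.rb above: the collected `warnings` array is replaced by a callable that receives one Warning per lenient fix, and each call site is guarded so the message string is not built when no handler is set. The sketch below illustrates that pattern in isolation. `DemoWarning` and `collapse_empty_slots` are hypothetical stand-ins and are not part of the gem; the position values are placeholders.

```ruby
# Stand-in for SmarterJSON::Warning; the four fields match the arguments the
# diff passes to Warning.new (type, message, line, col).
DemoWarning = Struct.new(:type, :message, :line, :col)

# Hypothetical mini "parser": drops empty slots from a comma-separated list
# and reports each one to the optional on_warning callable.
def collapse_empty_slots(csv, on_warning: nil)
  csv.split(",", -1).each_with_index.filter_map do |slot, index|
    next slot.strip unless slot.strip.empty?

    # Call-site guard: the message is only built when a handler exists.
    if on_warning
      # line 1 and index + 1 are placeholder positions for this sketch
      on_warning.call(DemoWarning.new(:empty_slot, "empty slot at index #{index}", 1, index + 1))
    end
    nil
  end
end

seen = []
collapse_empty_slots("a,,b,", on_warning: ->(w) { seen << w.type })
# seen now holds one :empty_slot entry per collapsed slot
collapse_empty_slots("a,,b,") # no handler: the same result, no warnings built
```

Any object that responds to `call` works as the handler here, which is why a lambda, a Method object, or a small collector class can all be passed.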
data/lib/smarter_json/version.rb
CHANGED
data/lib/smarter_json/warning.rb
CHANGED
|
@@ -3,8 +3,8 @@
|
|
|
3
3
|
module SmarterJSON
|
|
4
4
|
# A non-fatal thing the parser worked around while staying lenient — e.g. an empty
|
|
5
5
|
# comma slot it collapsed, a key with no value it read as null, or a duplicate key
|
|
6
|
-
# it dropped.
|
|
7
|
-
#
|
|
6
|
+
# it dropped. Passed to the on_warning: callable (when process / process_file is given
|
|
7
|
+
# one) once per fix; otherwise the parser stays silent and builds no Warning at all.
|
|
8
8
|
#
|
|
9
9
|
# type — a Symbol you can branch on (:empty_slot, :empty_value, :duplicate_key)
|
|
10
10
|
# message — human-readable description
|
metadata
CHANGED
|
@@ -1,13 +1,13 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: smarter_json
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.8.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Tilo Sloboda
|
|
8
8
|
bindir: exe
|
|
9
9
|
cert_chain: []
|
|
10
|
-
date: 2026-06-
|
|
10
|
+
date: 2026-06-03 00:00:00.000000000 Z
|
|
11
11
|
dependencies:
|
|
12
12
|
- !ruby/object:Gem::Dependency
|
|
13
13
|
name: bigdecimal
|