smarter_json 0.9.9 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +25 -10
- data/README.md +74 -21
- data/docs/_introduction.md +1 -1
- data/docs/basic_read_api.md +22 -0
- data/docs/basic_write_api.md +1 -1
- data/docs/examples.md +22 -0
- data/docs/options.md +8 -7
- data/ext/smarter_json/smarter_json.c +27 -7
- data/lib/smarter_json/generator.rb +100 -65
- data/lib/smarter_json/options.rb +16 -3
- data/lib/smarter_json/parser.rb +76 -2
- data/lib/smarter_json/version.rb +1 -1
- metadata +14 -8
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: ca54f0d032730a5606b5a5d0fe4091c62c2d1ef70ebce6dd9283c91bb288a6b2
|
|
4
|
+
data.tar.gz: b4d859ef69ca0fa1935271ba99027f9d132ad8e558579da2b37d6bb542fc5076
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: c4c474e948779c8541fe7d6bc7a572cf1a131aea53dff2436ac0068228c777d020dd3c6e946b014e80bedbca03aec93fa35f8189294fb070935fb12db06fee24
|
|
7
|
+
data.tar.gz: 77082d230f4cccd2f913bb4a5508f8b15551230fde2c78b341ad71b08700b8a67bd1030a242ab3bd6f70252c66300d2fce7099964b5cf883229bbc73303305a7
|
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
|
@@ -1,20 +1,35 @@
|
|
|
1
1
|
|
|
2
2
|
# SmarterJSON Change Log
|
|
3
3
|
|
|
4
|
-
>
|
|
5
|
-
|
|
6
|
-
> ⚠️ **Interface change (since 0.9.7):**
|
|
4
|
+
> ⚠️ SmarterJSON **always returns an `Array`** of documents.
|
|
7
5
|
>
|
|
8
|
-
> `SmarterJSON.process` / `SmarterJSON.process_file`
|
|
9
|
-
>
|
|
10
|
-
>
|
|
11
|
-
>
|
|
12
|
-
|
|
13
|
-
Going forward this will be the supported interface.
|
|
6
|
+
> `SmarterJSON.process` / `SmarterJSON.process_file` return:
|
|
7
|
+
>
|
|
8
|
+
> - `[]` for no doc
|
|
9
|
+
> - `[doc]` for one doc
|
|
10
|
+
> - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs)
|
|
14
11
|
|
|
15
|
-
> ⚠️ We discourage the use of `process(input).first` / `[0]` because it silently drops potential additional documents
|
|
12
|
+
> ⚠️ We discourage the use of `process(input).first` / `process(input)[0]` because it silently drops potential additional documents
|
|
16
13
|
> Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads.
|
|
17
14
|
|
|
15
|
+
## 1.1.0 (2026-06-09)
|
|
16
|
+
|
|
17
|
+
RSpec tests: 1,038 → 1,070
|
|
18
|
+
|
|
19
|
+
- New `SmarterJSON.foreach(source)` - the streaming, composable sibling of `process_file`. `source` is a file path or an IO (a socket, `StringIO`, open `File`). Without a block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time, never loading the whole file, so a large NDJSON / JSONL stream can be filtered or transformed with `.select` / `.map` / `.lazy` / `.first`; with a block it streams and returns the document count, like `process_file`.
|
|
20
|
+
|
|
21
|
+
## 1.0.0 (2026-06-08)
|
|
22
|
+
|
|
23
|
+
RSpec tests: 1,038
|
|
24
|
+
|
|
25
|
+
- **The public interface is now stable** - `process`, `process_one`, `process_file`, `generate`, and the documented options; semantic versioning from here on.
|
|
26
|
+
- Unknown or wrongly-typed options now raise `ArgumentError` instead of being silently ignored, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught immediately.
|
|
27
|
+
- Input tagged `ASCII-8BIT` whose bytes are valid UTF-8 (e.g. a `Net::HTTP` `response.body`) is now read as UTF-8, so its string values compare equal to UTF-8 literals; ASCII-8BIT input that is not valid UTF-8 raises `SmarterJSON::EncodingError` (pass an explicit `encoding:` for legacy encodings).
|
|
28
|
+
- Object keys may now use smart/curly quotes too (e.g. JSON pasted from a word processor), not just string values.
|
|
29
|
+
- `SmarterJSON.generate` accepts `allow_nan: true` to emit `NaN` / `Infinity` / `-Infinity` (JSON5-style) instead of raising, so non-finite numbers round-trip; the default still raises.
|
|
30
|
+
- A numeric literal that overflows `Float` range (e.g. `1e400`) now reports a `:number_overflow` warning via `on_warning` instead of silently becoming `Infinity`.
|
|
31
|
+
- `SmarterJSON.generate` is now iterative (like the parser), so serializing a deeply nested structure no longer risks `SystemStackError` - reading and writing are both depth-safe.
|
|
32
|
+
|
|
18
33
|
## 0.9.9 (2026-06-07)
|
|
19
34
|
- Much faster pure-Ruby parsing (the path used without the C extension) - roughly 3× on string-heavy data, ~2× on number-heavy, ~1.7× on object-heavy (on a YJIT-enabled Ruby). Parsed values are unchanged.
|
|
20
35
|
|
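The 1.1.0 entry above describes a `CSV.foreach`-style contract: an `Enumerator` without a block, a document count with one. A minimal sketch of that contract using only the standard library follows. `DocStream` is a hypothetical stand-in, not SmarterJSON's implementation, and it assumes strict one-document-per-line input, which SmarterJSON does not require.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for a foreach-style reader (NOT SmarterJSON's code).
module DocStream
  def self.foreach(io)
    # No block: return a plain Enumerator that re-enters this method lazily.
    return enum_for(:foreach, io) unless block_given?

    count = 0
    io.each_line do |line|
      next if line.strip.empty?   # skip blank separator lines
      yield JSON.parse(line)      # one document in memory at a time
      count += 1
    end
    count                         # block form returns the number of documents
  end
end

ndjson = %({"level":"info"}\n{"level":"error"}\n{"level":"error"}\n)

count  = DocStream.foreach(StringIO.new(ndjson)) { |doc| doc }
errors = DocStream.foreach(StringIO.new(ndjson)).select { |doc| doc["level"] == "error" }
```

Returning `enum_for` of the same method keeps a single code path for both forms, which is presumably why the two forms behave alike in the changelog's description.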
data/README.md
CHANGED
|
@@ -2,10 +2,21 @@
|
|
|
2
2
|
|
|
3
3
|
 [](https://codecov.io/gh/tilo/smarter_json) <!-- [](https://rubygems.org/gems/smarter_json) --> [](https://rubygems.org/gems/smarter_json) [](https://www.ruby-toolbox.com/projects/smarter_json)
|
|
4
4
|
|
|
5
|
-
A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write - and in benchmarks it matches or beats Oj on every file. SmarterJSON is opinionated: we want your JSON processing to be successful. Traditional JSON parsers are strict - they stop at the first deviation - SmarterJSON keeps going - it optimizes for getting your data out, not for policing the JSON spec.
|
|
5
|
+
A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSONL, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write - and in benchmarks it matches or beats Oj on every file. SmarterJSON is opinionated: we want your JSON processing to be successful. Traditional JSON parsers are strict - they stop at the first deviation - SmarterJSON keeps going - it optimizes for getting your data out, not for policing the JSON spec.
|
|
6
6
|
|
|
7
7
|
> **SmarterJSON: one tool, no modes - want strict? Please use the stdlib `json` gem.**
|
|
8
8
|
|
|
9
|
+
## Features at a glance
|
|
10
|
+
|
|
11
|
+
- **Reads the whole human-JSON superset, no modes or flags** - strict JSON, NDJSON, JSONL, JSON5, HJSON, JSONC, plus comments, trailing commas, unquoted / single / triple / smart quotes, an implicit root object, `NaN` / `Infinity` / hex / underscores, Python & JavaScript literals, a UTF-8 BOM, mixed line endings, and any Ruby encoding (see [What it accepts](#what-it-accepts-beyond-strict-json) for the full list).
|
|
12
|
+
- **Every document from multi-document input, in one call** - `process` returns an `Array` of all of them; `process_one` returns the single value and warns if there was more than one (never raises; routed to `on_warning`, else `Rails.logger`, else `Kernel.warn`).
|
|
13
|
+
- **Streaming in bounded memory** - pass a block, or use `foreach(path_or_io)` for a composable `Enumerator` you can `.select` / `.map` / `.lazy` over.
|
|
14
|
+
- **Recovers JSON from LLM / markdown noise** - strips markdown code fences, surrounding prose, and `<json>` tags, and pulls every payload out of one messy blob.
|
|
15
|
+
- **Writes JSON too** - `generate` with pretty-printing, NDJSON, `sort_keys`, `ascii_only`, `script_safe`, `allow_nan`, and `coerce` (via `as_json`); iterative, so deeply nested data is depth-safe.
|
|
16
|
+
- **Keeps number precision** - `BigDecimal` by default (Oj-compatible), or `:float` / `:auto`.
|
|
17
|
+
- **Transparent leniency** - pass an optional `on_warning` callback to be handed every lenient fix (an empty slot collapsed, a duplicate key dropped, a code fence stripped, …); with no handler the parser stays silent and adds zero overhead.
|
|
18
|
+
- **Fast, and runs everywhere** - a C extension that matches or beats Oj, with a pure-Ruby fallback for platforms that can't build it. Stable, semantically versioned, thread-safe, Ruby 2.6+.
|
|
19
|
+
|
|
9
20
|
## Why SmarterJSON?
|
|
10
21
|
|
|
11
22
|
**Are you tired of seeing errors like these?**
|
|
@@ -40,7 +51,7 @@ A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSON5,
|
|
|
40
51
|
Traditional JSON parsers reject anything that isn't perfectly strict JSON. That means your code breaks on malformed data.
|
|
41
52
|
|
|
42
53
|
SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed** and **you shouldn't lose the whole document because of formatting errors.**
|
|
43
|
-
Give it strict JSON, NDJSON, JSON5, an HJSON-style config file, LLM-generated JSON, or a copy-pasted blob with comments and trailing commas - it just extracts the data from it.
|
|
54
|
+
Give it strict JSON, NDJSON, JSONL, JSON5, an HJSON-style config file, LLM-generated JSON, or a copy-pasted blob with comments and trailing commas - it just extracts the data from it.
|
|
44
55
|
When it is lenient, `smarter_json` isn't dropping data that exists - it's just not raising an eyebrow at a suspicious gap (like an extra comma).
|
|
45
56
|
|
|
46
57
|
A strict parser would refuse the whole document and recover nothing; `smarter_json` returns everything except the formatting error.
|
|
@@ -62,7 +73,7 @@ Three things set it apart:
|
|
|
62
73
|
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
|
|
63
74
|
- Implicit root object - a config file that starts with `key: value`, no outer `{}`
|
|
64
75
|
- `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
|
|
65
|
-
- UTF-8 BOM, smart/curly quotes, Python literals (`True` / `False` / `None`), JavaScript `undefined`
|
|
76
|
+
- UTF-8 BOM, smart/curly quotes (in keys and values), Python literals (`True` / `False` / `None`), JavaScript `undefined`
|
|
66
77
|
- Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
|
|
67
78
|
- Duplicate keys (last value wins by default; configurable)
|
|
68
79
|
|
|
@@ -73,13 +84,15 @@ It raises only on genuinely unreadable input (unterminated string, mismatched br
|
|
|
73
84
|
The lenient grammar is a superset of these human-JSON specs - listed once, here:
|
|
74
85
|
|
|
75
86
|
* [JSON5](https://json5.org/)
|
|
76
|
-
* [HJSON](https://hjson.github.io/)
|
|
87
|
+
* [HJSON](https://hjson.github.io/) <sup>†</sup>
|
|
77
88
|
* [JWCC / HuJSON](https://github.com/tailscale/hujson)
|
|
78
89
|
* [Nigel Tao](https://nigeltao.github.io/blog/2021/json-with-commas-comments.html)
|
|
79
90
|
* [JSONH](https://github.com/jsonh-org/Jsonh)
|
|
80
91
|
* [JSONC (VS Code)](https://jsonc.org/)
|
|
81
92
|
* [NDJSON / JSON Text Sequences (RFC 7464)](https://datatracker.ietf.org/doc/html/rfc7464).
|
|
82
93
|
|
|
94
|
+
<sup>†</sup> A deliberate subset. SmarterJSON's quoteless (unquoted) string values are single-line - it does **not** parse HJSON's unquoted multi-line strings; use a quoted or triple-quoted (`'''…'''`) string for multiline. This is by design: SmarterJSON is one deterministic, no-modes superset of the JSON-family dialects (JSON5 / HJSON / JSONC / …), so it adopts a feature only where it does not conflict with the others - and an unquoted string that may span newlines collides with newline-as-a-document-separator (NDJSON, implicit-root config), so it is left out.
|
|
95
|
+
|
|
83
96
|
## Installation
|
|
84
97
|
|
|
85
98
|
```ruby
|
|
@@ -130,7 +143,7 @@ See [Examples](#examples) below for multi-document input, streaming, and recover
|
|
|
130
143
|
|
|
131
144
|
## Stable interface & thread safety
|
|
132
145
|
|
|
133
|
-
The public interface is
|
|
146
|
+
The public interface is: `SmarterJSON.process`, `SmarterJSON.process_one`, `SmarterJSON.process_file`, `SmarterJSON.foreach`, `SmarterJSON.generate`, and the documented options in this README/docs are the supported surface. `SmarterJSON.process` and `SmarterJSON.process_file` always return an `Array` of documents; `process_one` returns the single document's value (or `nil`), and emits a warning if there is more than one doc.
|
|
134
147
|
|
|
135
148
|
Concurrent calls are safe. The processor and generator keep per-call state local, and the C extension only caches Ruby IDs / constants at load time; it does not share mutable state across calls.
|
|
136
149
|
|
|
@@ -165,26 +178,26 @@ SmarterJSON is a C extension (with a pure-Ruby fallback that runs everywhere). B
|
|
|
165
178
|
- **None of the others parse JSON5, HJSON-style config, or LLM-wrapped output.** Comments, trailing commas, unquoted keys, quoteless values, `'single quotes'`, markdown code fences, prose wrappers - all raise in Oj / `json` / Yajl; SmarterJSON parses them.
|
|
166
179
|
- **`json` and Yajl produce `Float` only - lossy on high-precision numbers.** On coordinate / scientific data (>16 significant digits) they silently round to `Float`, so they aren't a like-for-like comparison there. SmarterJSON (and Oj) keep full precision as `BigDecimal` by default.
|
|
167
180
|
|
|
168
|
-
Where a like-for-like comparison exists, here is SmarterJSON's C path against each parser. **Apple M4, Ruby 3.4.7, p10 of 40 runs.** Each cell is **SmarterJSON vs that parser** - "faster" means SmarterJSON wins. Ratios shift with hardware; run `rake report` in `json_benchmarks/` to reproduce.
|
|
181
|
+
Where a like-for-like comparison exists, here is SmarterJSON's C path against each parser. **Apple M4, Ruby 3.4.7, p10 of 40 runs (2026-06-07); the same picture holds on an Apple M1 Max.** Each cell is **SmarterJSON vs that parser** - "faster" means SmarterJSON wins. Ratios shift with hardware; run `rake report` in `json_benchmarks/` to reproduce.
|
|
169
182
|
|
|
170
183
|
| File | vs Oj/strict | vs `json` | vs Yajl |
|
|
171
184
|
| ----------------------------- | --------------- | ---------------------------- | --------------- |
|
|
172
|
-
| big_decimals <sup>†</sup> | **1.
|
|
173
|
-
| canada <sup>†</sup> | **
|
|
174
|
-
| citm_catalog | **1.
|
|
175
|
-
| citylots <sup>†</sup> | **3.
|
|
176
|
-
| config.jsonc | **1.1× faster** | 1.
|
|
177
|
-
| deeply_nested | **1.
|
|
178
|
-
| github_events |
|
|
185
|
+
| big_decimals <sup>†</sup> | **1.7× faster** | ≈ tied | **1.2× faster** |
|
|
186
|
+
| canada <sup>†</sup> | **7× faster** | ≈ tied | **2.1× faster** |
|
|
187
|
+
| citm_catalog | **1.3× faster** | 1.2× slower | **3.2× faster** |
|
|
188
|
+
| citylots <sup>†</sup> | **3.7× faster** | **2.0× faster** | **2.3× faster** |
|
|
189
|
+
| config.jsonc | **1.1× faster** | 1.2× slower | **3.6× faster** |
|
|
190
|
+
| deeply_nested | **1.2× faster** | **can't parse** <sup>‡</sup> | **4.1× faster** |
|
|
191
|
+
| github_events | ≈ tied | 1.1× slower | **2.7× faster** |
|
|
179
192
|
| string_array | ≈ tied | ≈ tied | **1.6× faster** |
|
|
180
|
-
| twitter | **1.
|
|
181
|
-
| usgs_earthquakes <sup>†</sup> | **1.
|
|
182
|
-
| weather_berlin | **1.
|
|
193
|
+
| twitter | **1.3× faster** | 1.2× slower | **3.2× faster** |
|
|
194
|
+
| usgs_earthquakes <sup>†</sup> | **1.4× faster** | 1.1× slower | **3.4× faster** |
|
|
195
|
+
| weather_berlin | **1.8× faster** | **1.1× faster** | **3.2× faster** |
|
|
183
196
|
|
|
184
197
|
<sup>†</sup> High-precision file. The row uses `decimal_precision: :float` (Float, like-for-like) for `canada` / `citylots` / `big_decimals` / `usgs`. SmarterJSON's **default** `:auto` keeps these decimals as `BigDecimal` (no precision loss, like Oj's default) - intrinsically slower than `Float`, so default-vs-`Float` would be apples-to-oranges. Against Oj's matching `BigDecimal` default, SmarterJSON is faster there too.
|
|
185
198
|
<sup>‡</sup> Not a measurement gap - `json` raises by default: it errors on multi-document / NDJSON input without a block, and caps nesting at 100 levels. SmarterJSON has neither limit.
|
|
186
199
|
|
|
187
|
-
In short: **matches or beats Oj/strict on every file** -
|
|
200
|
+
In short: **SmarterJSON's C path matches or beats Oj/strict on every file** (apples-to-apples - for the high-precision <sup>†</sup> files that means `decimal_precision: :float`, where Oj/strict also produces `Float`; with `:float`, float-heavy data like `canada` is **~7× faster**). It is **far faster than Yajl everywhere**, and **level-to-ahead of stdlib `json`** - `json` edges ahead only on a few object-heavy files (`citm`, `twitter`, `config.jsonc`, `github_events`, all within ~1.25×) and **can't parse `deeply_nested` at all**. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and **bit-for-bit identical to `JSON.parse`** - fast *and* exact, even at full double precision.
|
|
188
201
|
|
|
189
202
|
**Two notes on fair comparison:**
|
|
190
203
|
|
|
@@ -200,8 +213,8 @@ In short: **matches or beats Oj/strict on every file** - `string_array` is the
|
|
|
200
213
|
| `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` for a key repeated in one object (every repeat is also reported via `on_warning`) |
|
|
201
214
|
| `decimal_precision` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
|
|
202
215
|
| `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
|
|
203
|
-
| `encoding` | `
|
|
204
|
-
| `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
|
|
216
|
+
| `encoding` | `nil` | labels the input's encoding; `nil` keeps the input's own (no transcoding pass; see below) |
|
|
217
|
+
| `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`, `:number_overflow`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
|
|
205
218
|
|
|
206
219
|
## Examples
|
|
207
220
|
|
|
@@ -243,6 +256,46 @@ For input larger than memory, pass a block: each document is yielded as it is re
|
|
|
243
256
|
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
244
257
|
```
|
|
245
258
|
|
|
259
|
+
**Try it on a file you already have.** SmarterJSON reads **NDJSON / JSONL natively** - and Claude Code stores every session as a JSONL transcript (`~/.claude/projects/<project>/<session-id>.jsonl`, one JSON document per line). Walk yours, one record at a time:
|
|
260
|
+
|
|
261
|
+
```ruby
|
|
262
|
+
require "awesome_print" # optional - readable nested output
|
|
263
|
+
|
|
264
|
+
SmarterJSON.process_file("#{Dir.home}/.claude/projects/<project>/<session-id>.jsonl") do |entry|
|
|
265
|
+
ap entry # each line is a full document - a message, a tool call, a result, …
|
|
266
|
+
puts "-" * 80
|
|
267
|
+
end
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### Filtering and rewriting a large file (`foreach`)
|
|
271
|
+
|
|
272
|
+
`SmarterJSON.foreach(source)` is the composable sibling of `process_file`. `source` is a file path or any IO (a socket, a `StringIO`, an open `File`). With no block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time, so you can chain `.select` / `.map` and friends. Add `.lazy` to keep the whole chain bounded in memory, even when the filtered set is large:
|
|
273
|
+
|
|
274
|
+
```ruby
|
|
275
|
+
# Keep only the user/assistant turns of a transcript - one document in memory at a time
|
|
276
|
+
SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
|
|
277
|
+
.lazy
|
|
278
|
+
.select { |doc| %w[user assistant].include?(doc[:type]) }
|
|
279
|
+
.each { |doc| puts doc[:text] }
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
Because it streams both ends, you can **filter a big file down and rewrite it** without ever loading the whole thing:
|
|
283
|
+
|
|
284
|
+
```ruby
|
|
285
|
+
File.open("filtered.jsonl", "w") do |out|
|
|
286
|
+
SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
|
|
287
|
+
.lazy
|
|
288
|
+
.select { |doc| %w[user assistant].include?(doc[:type]) }
|
|
289
|
+
.each { |doc| out.puts SmarterJSON.generate(doc) }
|
|
290
|
+
end
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
Pass an IO instead of a path to stream straight from a socket or an HTTP response body - anything `IO`-like works (an IO is single-pass, read once):
|
|
294
|
+
|
|
295
|
+
```ruby
|
|
296
|
+
SmarterJSON.foreach(response_io).each { |event| handle(event) }
|
|
297
|
+
```
|
|
298
|
+
|
|
246
299
|
### Recovering JSON from LLM / markdown noise
|
|
247
300
|
|
|
248
301
|
When the payload is wrapped in markdown fences, surrounding prose, or tags, `process` (or `process_one` for a single payload) strips the wrapper and reads what's inside. (Clean JSON never pays for this - recovery only runs when a straight read fails.)
|
|
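The `foreach` section above distinguishes a plain `Enumerator` (whose `.select` walks the whole source and builds an Array) from a `.lazy` chain that stays bounded. The sketch below shows that difference with a standard-library `Enumerator` standing in for the document stream; the `source` enumerator and the `pulled` counter are illustrative only and do not involve SmarterJSON.

```ruby
# Stand-in for a document stream: an Enumerator that counts how many
# items were actually pulled from the underlying source.
pulled = 0
source = Enumerator.new do |y|
  1.upto(1_000) do |i|
    pulled += 1
    y << { "id" => i, "type" => (i.even? ? "user" : "system") }
  end
end

# Eager: select consumes the entire source and builds the full result Array,
# and only then does first(2) trim it.
eager = source.select { |doc| doc["type"] == "user" }.first(2)
eager_pulled = pulled

# Lazy: each document flows through the whole chain before the next is read,
# so iteration stops once first(2) has what it needs.
pulled = 0
lazy = source.lazy.select { |doc| doc["type"] == "user" }.first(2)
lazy_pulled = pulled

puts "eager pulled #{eager_pulled} docs, lazy pulled #{lazy_pulled}"
```

Both chains return the same two documents; only the amount of input read differs, which is the property that matters for a file larger than memory.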
@@ -295,11 +348,11 @@ TEXT
|
|
|
295
348
|
|
|
296
349
|
## Encoding
|
|
297
350
|
|
|
298
|
-
`encoding:` (default `
|
|
351
|
+
`encoding:` (default `nil`) labels what the input is - it does **not** transcode. With `nil`, SmarterJSON keeps the input's own encoding tag and emits string values with that same tag, the way `smarter_csv` does - **with one smart default:** input tagged `ASCII-8BIT` (BINARY) whose bytes are valid UTF-8 is treated as UTF-8. That is exactly how `Net::HTTP` and many HTTP libraries hand you a `response.body` (correct UTF-8 bytes, BINARY tag); without this, string values would come back tagged `ASCII-8BIT` and compare unequal to UTF-8 literals. If such `ASCII-8BIT` input is *not* valid UTF-8, it raises `SmarterJSON::EncodingError` rather than guess a legacy encoding - pass an explicit `encoding:` (e.g. `"ISO-8859-1"`) for that. Bytes invalid for an explicitly claimed encoding also raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
|
|
299
352
|
|
|
300
353
|
## Nesting & untrusted input
|
|
301
354
|
|
|
302
|
-
Both the C extension and the pure-Ruby engine are **iterative, not recursive** - they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue.
|
|
355
|
+
Both the C extension and the pure-Ruby engine are **iterative, not recursive** - they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue. **`generate` is iterative too**, so serializing a deeply nested Ruby structure can't overflow the stack either - reading *and* writing are both depth-safe.
|
|
303
356
|
|
|
304
357
|
The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you process untrusted input and want a hard cap, that's a planned opt-in guard - for now, size-limit upstream.
|
|
305
358
|
|
data/docs/_introduction.md
CHANGED
|
@@ -23,7 +23,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak
|
|
|
23
23
|
|
|
24
24
|
* **It reads multi-document input automatically - a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: it always returns an `Array` of the documents found (`[]` / `[doc]` / `[d1, d2, …]`). For the common single-document case, `SmarterJSON.process_one` returns the one value directly (and warns, never raises, if there was more than one). The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` - Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time. See [The Basic Read API](./basic_read_api.md).
|
|
25
25
|
|
|
26
|
-
* **It's fast.**
|
|
26
|
+
* **It's fast.** The C extension (with a pure-Ruby fallback that runs everywhere) is **faster than Oj/strict on every file** in our benchmark suite - up to **~7× faster on float-heavy data** with `decimal_precision: :float` - **far faster than Yajl**, and **level-to-ahead of the stdlib `json` C parser**, which can't even parse deeply-nested input. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and bit-for-bit identical to `JSON.parse`, so number-heavy data is fast and exact. Full per-file numbers (Apple M4 / M1, relative ratios) are in the [README Performance section](../README.md#performance).
|
|
27
27
|
|
|
28
28
|
* **It writes JSON too.** `SmarterJSON.generate` turns Ruby values into strict, interoperable JSON - or into NDJSON, one element per line, the exact inverse of reading NDJSON back into an Array. See [The Basic Write API](./basic_write_api.md).
|
|
29
29
|
|
data/docs/basic_read_api.md
CHANGED
|
@@ -103,6 +103,28 @@ SmarterJSON.process(io) { |doc| handle(doc) }
|
|
|
103
103
|
|
|
104
104
|
The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
|
|
105
105
|
|
|
106
|
+
## `SmarterJSON.foreach` - stream a file or IO, composably
|
|
107
|
+
|
|
108
|
+
`foreach` is the composable sibling of `process_file`. Its argument is a **file path or any IO** (a socket, a `StringIO`, an open `File`); a String is always a path, never content.
|
|
109
|
+
|
|
110
|
+
With a block it behaves exactly like the block form above - streams each document, returns the **document count**. Without a block it returns a plain `Enumerator` (like `CSV.foreach` - **not** an `Enumerator::Lazy`), so `.map` / `.select` return Arrays the usual way, and you can chain over the stream:
|
|
111
|
+
|
|
112
|
+
```ruby
|
|
113
|
+
SmarterJSON.foreach("events.ndjson").each { |event| EventJob.perform_async(event) } # like the block form
|
|
114
|
+
SmarterJSON.foreach("events.ndjson").select { |e| e["level"] == "error" } # => an Array of the matches
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
It reads one document at a time, so `foreach(path).first(3)` only reads ~3 documents off disk, and `.next` pulls them one by one. `.map` / `.select` read the source lazily but still build an Array of their *result*; to keep a whole pipeline bounded end to end (a large filtered set off a fat file), add `.lazy` at the call site:
|
|
118
|
+
|
|
119
|
+
```ruby
|
|
120
|
+
SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
|
|
121
|
+
.lazy
|
|
122
|
+
.select { |doc| %w[user assistant].include?(doc[:type]) }
|
|
123
|
+
.each { |doc| puts doc[:text] }
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Options are validated eagerly - a bad option key or value raises immediately, before any iteration. An **IO source is single-pass** (an IO can only be read once), so iterating the returned Enumerator a second time over the same IO yields nothing; a path-backed `foreach` re-opens the file and is re-iterable.
|
|
127
|
+
|
|
106
128
|
## The C extension and the pure-Ruby fallback
|
|
107
129
|
|
|
108
130
|
By default (`acceleration: true`) the C extension is used when it is compiled and loadable (`SmarterJSON::HAS_ACCELERATION` is then `true`); otherwise the pure-Ruby implementation runs and produces identical results. Pass `acceleration: false` to force the pure-Ruby path. See [Configuration Options](./options.md).
|
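The single-pass note in the `foreach` section above (an IO-backed Enumerator yields nothing on a second pass, a path-backed one re-opens the file) mirrors how Ruby's own line enumerators behave. A small sketch with standard-library calls only, using `StringIO#each_line` and `File.foreach` as stand-ins for the two kinds of source:

```ruby
require "stringio"
require "tempfile"

# An Enumerator over an already-open IO shares that IO's read position.
io = StringIO.new("one\ntwo\n")
from_io     = io.each_line
first_pass  = from_io.to_a   # reads both lines
second_pass = from_io.to_a   # the IO is already at EOF, so nothing is left

# A path-backed Enumerator re-opens the file on every iteration.
file = Tempfile.new("docs")
file.write("one\ntwo\n")
file.close
from_path   = File.foreach(file.path)
path_first  = from_path.to_a
path_second = from_path.to_a # same content again
file.unlink

puts "IO second pass: #{second_pass.size} lines, path second pass: #{path_second.size} lines"
```

If a stream has to be read twice, rewinding the IO (where it supports `rewind`) or passing a path are the usual options.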
data/docs/basic_write_api.md
CHANGED
|
@@ -58,7 +58,7 @@ SmarterJSON.generate(Float::INFINITY) # raises SmarterJSON::GenerateError -
|
|
|
58
58
|
SmarterJSON.generate(Float::NAN) # raises SmarterJSON::GenerateError - non-finite Float
|
|
59
59
|
```
|
|
60
60
|
|
|
61
|
-
(`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency,
|
|
61
|
+
(`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency; to *write* them, pass `allow_nan: true` and they're emitted as `NaN` / `Infinity` / `-Infinity` (JSON5-style, so SmarterJSON reads them back) - otherwise non-finite values raise, since they aren't valid strict JSON.)
|
|
62
62
|
|
|
63
63
|
By default `generate` is strict: it only writes the types above and raises on anything else. To serialize `Time`, `Date`, or your own objects, pass `coerce: true` - an unsupported value is then converted by its own `as_json` (whose result is re-emitted, so escaping/`indent`/`sort_keys` still apply) or, failing that, `to_json` (spliced verbatim):
|
|
64
64
|
|
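Returning to the `allow_nan` note in this hunk: the stdlib `json` generator has an option with the same name, and it shows the same raise-by-default pattern. The sketch below exercises stdlib `json` only, for comparison; SmarterJSON's own output is as described in the diff and is not verified here.

```ruby
require "json"

# Default: non-finite floats are rejected, since they are not valid strict JSON.
begin
  JSON.generate([Float::NAN])
  raised = false
rescue JSON::GeneratorError
  raised = true
end

# Opt in: the values are written as bare NaN / Infinity / -Infinity tokens.
text = JSON.generate([Float::NAN, Float::INFINITY, -Float::INFINITY], allow_nan: true)

puts "raised by default: #{raised}"
puts text
```

Output written this way is no longer strict JSON, so it only round-trips through readers that accept these tokens.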
data/docs/examples.md
CHANGED
|
@@ -83,6 +83,28 @@ For input larger than memory, pass a block. Each recovered document is yielded o
|
|
|
83
83
|
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
84
84
|
```
|
|
85
85
|
|
|
86
|
+
**A JSONL file you already have:** Claude Code stores each session as a JSONL transcript - `~/.claude/projects/<project>/<session-id>.jsonl`, one JSON document per line (a message, a tool call, a result, …). It reads the same way, one record at a time:
|
|
87
|
+
|
|
88
|
+
```ruby
|
|
89
|
+
require "awesome_print" # optional - for readable nested output
|
|
90
|
+
|
|
91
|
+
SmarterJSON.process_file("#{Dir.home}/.claude/projects/<project>/<session-id>.jsonl") do |entry|
|
|
92
|
+
ap entry # each line is a full document
|
|
93
|
+
puts "-" * 80
|
|
94
|
+
end
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
**Filter and rewrite as a stream - `SmarterJSON.foreach`:** `foreach(source)` is the composable sibling of `process_file`; `source` is a file path or any IO (a socket, a `StringIO`, an open `File`). Without a block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time, so it chains with `.select` / `.map`; add `.lazy` to keep the whole pipeline bounded in memory. This filters a transcript down to its user/assistant turns and writes a smaller file, never loading all of it:
|
|
98
|
+
|
|
99
|
+
```ruby
|
|
100
|
+
File.open("filtered.jsonl", "w") do |out|
|
|
101
|
+
SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
|
|
102
|
+
.lazy
|
|
103
|
+
.select { |doc| %w[user assistant].include?(doc[:type]) }
|
|
104
|
+
.each { |doc| out.puts SmarterJSON.generate(doc) }
|
|
105
|
+
end
|
|
106
|
+
```
|
|
107
|
+
|
|
86
108
|
### Example 6: Symbolize Keys
|
|
87
109
|
|
|
88
110
|
```ruby
|
data/docs/options.md
CHANGED
|
@@ -13,7 +13,7 @@
|
|
|
13
13
|
|
|
14
14
|
## Reading
|
|
15
15
|
|
|
16
|
-
These options are passed to [`SmarterJSON.process`](./basic_read_api.md), `SmarterJSON.process_one`, and `SmarterJSON.process_file` as the second argument; anything you set overrides the defaults below.
|
|
16
|
+
These options are passed to [`SmarterJSON.process`](./basic_read_api.md), `SmarterJSON.process_one`, and `SmarterJSON.process_file` as the second argument; anything you set overrides the defaults below. Configuration is strict: an unknown option key, or a known key given the wrong type or value, raises `ArgumentError` immediately - leniency applies to the JSON *input* you parse, not to the options you pass, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught right away.
|
|
17
17
|
|
|
18
18
|
| Option | Default | Explanation |
|
|
19
19
|
|-------------------|--------------|------------------------------------------------------------------------------------------------------------------------|
|
|
@@ -43,11 +43,11 @@ warns.map(&:type) # => [:empty_slot]
|
|
|
43
43
|
warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
|
|
44
44
|
```
|
|
45
45
|
|
|
46
|
-
The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`),
|
|
46
|
+
The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), `:duplicate_key` (a repeated key that was dropped), and `:number_overflow` (a numeric literal too large for `Float`, e.g. `1e400`, collapsed to `Infinity`), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
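The `:number_overflow` case can be reproduced in plain Ruby. The sketch below assumes a hypothetical helper, `to_float_with_warning`, and a handler that takes a type and the literal; the gem's actual handler signature is not shown here and may differ.

```ruby
# Convert a decimal literal to Float and report overflow through an optional handler.
# `to_float_with_warning` is a hypothetical helper, not part of SmarterJSON.
def to_float_with_warning(literal, on_warning = nil)
  value = literal.to_f
  # Only test for overflow when a handler is listening; the value is returned either way.
  on_warning.call(:number_overflow, literal) if on_warning && value.infinite?
  value
end

seen = []
handler = ->(type, literal) { seen << [type, literal] }

big   = to_float_with_warning("1e400", handler) # magnitude exceeds Float range
small = to_float_with_warning("1.5", handler)   # clean input, handler is not called
```

The overflowing literal still yields a value (`Infinity`), so the warning is the only signal that the data changed.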
|
|
47
47
|
|
|
48
48
|
### A note on `:encoding`
|
|
49
49
|
|
|
50
|
-
`:encoding` labels what the input *is*; it does not transcode.
|
|
50
|
+
`:encoding` labels what the input *is*; it does not transcode. With the default `nil`, SmarterJSON keeps the input's own encoding tag and emits string values with that tag, the same way `smarter_csv` handles encodings, **with one smart default:** input tagged `ASCII-8BIT` (BINARY) that is valid UTF-8 is treated as UTF-8. This is how `Net::HTTP` returns a `response.body`; without it, those string values would compare unequal to UTF-8 literals. `ASCII-8BIT` input that is *not* valid UTF-8 raises `SmarterJSON::EncodingError`; pass an explicit `:encoding` (e.g. `"ISO-8859-1"`) for genuinely legacy bytes. Bytes invalid for an explicitly claimed encoding also raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`). A UTF-8 BOM is handled automatically; UTF-16 / UTF-32 input is out of scope.
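The relabel-only default can be sketched as follows. `relabel_binary_utf8` is a hypothetical stand-alone helper, and `ArgumentError` stands in for `SmarterJSON::EncodingError` so the snippet runs without the gem.

```ruby
# Relabel (never transcode) binary-tagged input whose bytes are valid UTF-8.
# `relabel_binary_utf8` is a hypothetical helper sketching the default described above.
def relabel_binary_utf8(input)
  return input unless input.encoding == Encoding::ASCII_8BIT

  utf8 = input.dup.force_encoding(Encoding::UTF_8) # same bytes, new tag
  return utf8 if utf8.valid_encoding?

  # The gem raises SmarterJSON::EncodingError here; ArgumentError stands in for it.
  raise ArgumentError, "ASCII-8BIT input is not valid UTF-8; declare its encoding explicitly"
end

body  = "caf\u00E9".b               # UTF-8 bytes tagged ASCII-8BIT, like an HTTP body
fixed = relabel_binary_utf8(body)   # same bytes, now tagged UTF-8

body == "caf\u00E9"   # false: different encodings with non-ASCII content
fixed == "caf\u00E9"  # true after relabelling
```

A lone Latin-1 byte such as `"\xE9".b` is not valid UTF-8, so the helper raises instead of guessing a legacy encoding.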
|
|
51
51
|
|
|
52
52
|
### A note on `:decimal_precision`
|
|
53
53
|
|
|
@@ -59,14 +59,15 @@ These options are passed to [`SmarterJSON.generate`](./basic_write_api.md) as th
|
|
|
59
59
|
|
|
60
60
|
| Option | Default | Explanation |
|
|
61
61
|
|------------|---------|-----------------------------------------------------------------------------------------------------------------------------|
|
|
62
|
+
| `:allow_nan` | `false` | When `true`, non-finite `Float`/`BigDecimal` values emit the JSON5 barewords `NaN` / `Infinity` / `-Infinity` (which SmarterJSON reads back, so they round-trip). When `false` (the default), a non-finite number raises `SmarterJSON::GenerateError`, since such values aren't valid strict JSON. |
|
|
63
|
+
| `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
|
|
64
|
+
| `:coerce` | `false` | When `true`, a value that isn't natively supported is converted by its own `as_json` (the result is re-emitted, so the other options still apply) or, failing that, `to_json` (spliced verbatim). When `false` (the default), such a value raises `SmarterJSON::GenerateError`. |
|
|
62
65
|
| `:format` | `:json` | `:json` writes standard JSON (Hash → object, Array → array, scalar → scalar). `:ndjson` writes newline-delimited JSON: an Array becomes one element per line, any other value becomes a single line. |
|
|
63
66
|
| `:indent` | `0` | Spaces per nesting level for pretty-printing. `0` (the default) is compact output. Empty objects/arrays stay inline. Not allowed with `:ndjson` (a record must be a single line). |
|
|
64
|
-
| `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |
|
|
65
|
-
| `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
|
|
66
67
|
| `:script_safe` | `false` | Escape the `/` in `</` and the JS line separators U+2028 / U+2029, so output is safe to embed in an HTML `<script>` tag. |
|
|
67
|
-
| `:
|
|
68
|
+
| `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |
|
|
68
69
|
|
|
69
|
-
|
|
70
|
+
Configuration is validated up front: an unknown option key, a known key with the wrong type or value (a non-Symbol `:format`, a negative/non-Integer `:indent`, a non-boolean flag), or combining `:indent` with `:ndjson`, raises `ArgumentError`.
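The `:allow_nan` row in the table above boils down to a small mapping from non-finite numbers to JSON5 barewords. A minimal sketch for `Float` only, where `emit_number` is a hypothetical wrapper and `ArgumentError` stands in for `SmarterJSON::GenerateError`:

```ruby
# Map a non-finite Float to its JSON5 bareword.
# `infinite?` returns 1 for +Infinity, -1 for -Infinity, and nil otherwise.
def json5_non_finite(num)
  return "NaN" if num.nan?

  num.infinite? == 1 ? "Infinity" : "-Infinity"
end

# Hypothetical wrapper: finite numbers are emitted as-is; non-finite ones raise
# unless the caller opted in, mirroring the `:allow_nan` option.
def emit_number(num, allow_nan: false)
  return num.to_s if num.finite?
  raise ArgumentError, "cannot serialize non-finite #{num}" unless allow_nan

  json5_non_finite(num)
end
```

Keeping the default at "raise" means strict-JSON consumers never receive a bareword they cannot parse; the opt-in is for round-trips through a JSON5-aware reader.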
|
|
70
71
|
|
|
71
72
|
```ruby
|
|
72
73
|
SmarterJSON.generate([1, 2, 3]) # => "[1,2,3]" (default :json, a single JSON array)
|
data/ext/smarter_json/smarter_json.c
CHANGED
|
@@ -40,6 +40,7 @@ static ID fj_call_id; /* cached :call (invoking the on_warning handler) */
|
|
|
40
40
|
static VALUE fj_sym_empty_slot;
|
|
41
41
|
static VALUE fj_sym_empty_value;
|
|
42
42
|
static VALUE fj_sym_duplicate_key;
|
|
43
|
+
static VALUE fj_sym_number_overflow;
|
|
43
44
|
static ID fj_bigdecimal_id; /* cached BigDecimal() method id (set in Init) */
|
|
44
45
|
static ID fj_to_sym_id; /* cached :to_sym (symbolize_keys) */
|
|
45
46
|
static ID fj_key_p_id; /* cached :key? (non-default duplicate_key modes) */
|
|
@@ -262,6 +263,8 @@ static inline int fj_needs_ws_skip(int b) {
|
|
|
262
263
|
/* forward declarations (mutual recursion) */
|
|
263
264
|
static VALUE fj_parse_value(fj_state *st);
|
|
264
265
|
static VALUE fj_parse_member_value(fj_state *st);
|
|
266
|
+
static int fj_smart_quote_kind(fj_state *st);
|
|
267
|
+
static VALUE fj_parse_smart_string(fj_state *st, int kind);
|
|
265
268
|
|
|
266
269
|
static void fj_append_utf8(VALUE buf, unsigned long cp) {
|
|
267
270
|
char tmp[4];
|
|
@@ -579,7 +582,8 @@ static VALUE fj_float_strtod(const char *p, long n) {
|
|
|
579
582
|
}
|
|
580
583
|
|
|
581
584
|
/* e10 is the final base-10 exponent (already adjusted by the fraction length). */
|
|
582
|
-
static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
|
|
585
|
+
static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(fj_state *st, uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
|
|
586
|
+
double d;
|
|
583
587
|
/* Fast path by mantissa width (our scanner accumulates m10 exactly up to 18
|
|
584
588
|
digits, flagging overflow beyond):
|
|
585
589
|
1..18 digits -> Eisel-Lemire, correctly-rounded for any exact uint64 mantissa
|
|
@@ -589,10 +593,18 @@ static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, i
|
|
|
589
593
|
>18 digits / overflow / extreme exponent -> strtod (round-to-odd). */
|
|
590
594
|
if (!overflow && m10digits >= 1 && m10digits <= 18 && (long)m10digits + e10 >= -307) {
|
|
591
595
|
if (m10 == 0) return rb_float_new(neg ? -0.0 : 0.0);
|
|
592
|
-
|
|
596
|
+
d = fj_eisel_lemire_s2d(e10, m10, neg);
|
|
597
|
+
} else {
|
|
598
|
+
/* Fallback for >18 digits / extreme or subnormal exponents. */
|
|
599
|
+
d = RFLOAT_VALUE(fj_float_strtod(p, n));
|
|
593
600
|
}
|
|
594
|
-
/*
|
|
595
|
-
|
|
601
|
+
/* A finite literal whose magnitude exceeds Float range (e.g. 1e400) becomes
|
|
602
|
+
±Infinity, a silent data change. Report it via :number_overflow (the value is
|
|
603
|
+
still returned). The Infinity/NaN keywords take separate paths and never get here.
|
|
604
|
+
Gate isinf on a listening handler (matches the Ruby float_or_warn): no handler ->
|
|
605
|
+
no point detecting, and it keeps the test off the hot number path. */
|
|
606
|
+
if (st->on_warning != Qnil && isinf(d)) fj_warn(st, fj_sym_number_overflow, "number literal out of Float range - collapsed to Infinity");
|
|
607
|
+
return rb_float_new(d);
|
|
596
608
|
}
|
|
597
609
|
|
|
598
610
|
/* Scan an already-bounded quoteless token [p, p+n) exactly once: validate it as a
|
|
@@ -677,7 +689,7 @@ static int fj_try_decimal(fj_state *st, const char *p, long n, VALUE *out) {
|
|
|
677
689
|
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(p, n) > 16)) {
|
|
678
690
|
*out = fj_to_bigdecimal_token(p, n);
|
|
679
691
|
} else {
|
|
680
|
-
*out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, p, n);
|
|
692
|
+
*out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, p, n);
|
|
681
693
|
}
|
|
682
694
|
return 1;
|
|
683
695
|
}
|
|
@@ -789,7 +801,7 @@ static VALUE fj_parse_number(fj_state *st) {
|
|
|
789
801
|
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
|
|
790
802
|
return fj_to_bigdecimal_token(np, nlen);
|
|
791
803
|
}
|
|
792
|
-
return fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
|
|
804
|
+
return fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
|
|
793
805
|
}
|
|
794
806
|
|
|
795
807
|
static VALUE fj_parse_literal(fj_state *st, const char *word, VALUE value) {
|
|
@@ -842,6 +854,7 @@ static VALUE fj_parse_identifier_key(fj_state *st) {
|
|
|
842
854
|
|
|
843
855
|
static VALUE fj_parse_object_key(fj_state *st) {
|
|
844
856
|
int b = fj_byte(st);
|
|
857
|
+
int kind;
|
|
845
858
|
|
|
846
859
|
/* Quoted key. The common case has no escapes: intern straight from the buffer
|
|
847
860
|
* with no throwaway allocation. An escaped key (rare) falls through to the
|
|
@@ -862,6 +875,12 @@ static VALUE fj_parse_object_key(fj_state *st) {
|
|
|
862
875
|
return fj_parse_string(st, b);
|
|
863
876
|
}
|
|
864
877
|
|
|
878
|
+
/* A key may open with a smart/curly quote too (a word-processor paste curls the
|
|
879
|
+
* keys, not just the values), so route to the same reader the value path uses.
|
|
880
|
+
* Mirrors the Ruby fallback's parse_object_key; Hash#[]= dedups the key on store. */
|
|
881
|
+
kind = fj_smart_quote_kind(st);
|
|
882
|
+
if (kind) return fj_parse_smart_string(st, kind);
|
|
883
|
+
|
|
865
884
|
if (fj_is_key_start(b)) return fj_parse_identifier_key(st);
|
|
866
885
|
|
|
867
886
|
fj_error(st, "expected a key");
|
|
@@ -1197,7 +1216,7 @@ static int fj_try_member_number(fj_state *st, VALUE *out) {
|
|
|
1197
1216
|
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
|
|
1198
1217
|
*out = fj_to_bigdecimal_token(np, nlen);
|
|
1199
1218
|
} else {
|
|
1200
|
-
*out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
|
|
1219
|
+
*out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
|
|
1201
1220
|
}
|
|
1202
1221
|
return 1;
|
|
1203
1222
|
}
|
|
@@ -1625,6 +1644,7 @@ void Init_smarter_json(void) {
|
|
|
1625
1644
|
fj_sym_empty_slot = ID2SYM(rb_intern("empty_slot"));
|
|
1626
1645
|
fj_sym_empty_value = ID2SYM(rb_intern("empty_value"));
|
|
1627
1646
|
fj_sym_duplicate_key = ID2SYM(rb_intern("duplicate_key"));
|
|
1647
|
+
fj_sym_number_overflow = ID2SYM(rb_intern("number_overflow"));
|
|
1628
1648
|
fj_sym_encoding = ID2SYM(rb_intern("encoding"));
|
|
1629
1649
|
fj_sym_symbolize_keys = ID2SYM(rb_intern("symbolize_keys"));
|
|
1630
1650
|
fj_sym_first_wins = ID2SYM(rb_intern("first_wins"));
|
data/lib/smarter_json/generator.rb
CHANGED
|
@@ -34,7 +34,17 @@ module SmarterJSON
|
|
|
34
34
|
# (including multi-byte UTF-8) is emitted raw, which is valid JSON.
|
|
35
35
|
ESCAPE_RE = /["\\\x00-\x1f]/.freeze
|
|
36
36
|
|
|
37
|
+
# Strict configuration: an unknown writer option is a caller bug, so it raises
|
|
38
|
+
# rather than being silently ignored.
|
|
39
|
+
KNOWN_OPTIONS = %i[format indent ascii_only script_safe sort_keys coerce allow_nan].freeze
|
|
40
|
+
|
|
37
41
|
def initialize(options = {})
|
|
42
|
+
unknown = options.keys - KNOWN_OPTIONS
|
|
43
|
+
unless unknown.empty?
|
|
44
|
+
raise ArgumentError, "SmarterJSON.generate: unknown option#{unknown.size == 1 ? '' : 's'} " \
|
|
45
|
+
"#{unknown.map(&:inspect).join(', ')} - valid keys: #{KNOWN_OPTIONS.map(&:inspect).join(', ')}"
|
|
46
|
+
end
|
|
47
|
+
|
|
38
48
|
@format = options.fetch(:format, :json)
|
|
39
49
|
unless %i[json ndjson].include?(@format)
|
|
40
50
|
raise ArgumentError, "unknown writer format: #{@format.inspect} (expected :json or :ndjson)"
|
|
@@ -50,10 +60,11 @@ module SmarterJSON
|
|
|
50
60
|
|
|
51
61
|
@pretty = @indent > 0
|
|
52
62
|
|
|
53
|
-
@ascii_only = options
|
|
54
|
-
@script_safe = options
|
|
55
|
-
@sort_keys = options
|
|
56
|
-
@coerce = options
|
|
63
|
+
@ascii_only = boolean_option(options, :ascii_only) # escape non-ASCII as \uXXXX
|
|
64
|
+
@script_safe = boolean_option(options, :script_safe) # escape </ and U+2028 / U+2029
|
|
65
|
+
@sort_keys = boolean_option(options, :sort_keys) # emit object keys in sorted order
|
|
66
|
+
@coerce = boolean_option(options, :coerce) # convert unknown types via as_json / to_json
|
|
67
|
+
@allow_nan = boolean_option(options, :allow_nan) # emit NaN / Infinity / -Infinity (JSON5) instead of raising
|
|
57
68
|
@escape_re = build_escape_re
|
|
58
69
|
end
|
|
59
70
|
|
|
@@ -77,7 +88,48 @@ module SmarterJSON
|
|
|
77
88
|
|
|
78
89
|
private
|
|
79
90
|
|
|
80
|
-
|
|
91
|
+
# A boolean writer option must be exactly true or false β a wrong type is a
|
|
92
|
+
# caller bug, so it raises rather than being coerced or ignored.
|
|
93
|
+
def boolean_option(options, key)
|
|
94
|
+
value = options.fetch(key, false)
|
|
95
|
+
return value if value == true || value == false
|
|
96
|
+
|
|
97
|
+
raise ArgumentError, "#{key} must be true or false (got #{value.inspect})"
|
|
98
|
+
end
|
|
99
|
+
|
|
100
|
+
# Iterative serializer: an explicit frame stack (one frame per open container),
|
|
101
|
+
# mirroring the recursive structure but heap-allocated, so arbitrarily deep input
|
|
102
|
+
# cannot overflow the call stack (parity with the iterative parser). Output is
|
|
103
|
+
# byte-identical to the former recursive version. A frame is a small Array:
|
|
104
|
+
# [members, idx, is_hash, before_first, before_rest, colon, closer, level]
|
|
105
|
+
def emit(obj, buf)
|
|
106
|
+
stack = []
|
|
107
|
+
push_value(obj, 0, buf, stack)
|
|
108
|
+
until stack.empty?
|
|
109
|
+
frame = stack.last
|
|
110
|
+
members = frame[0]
|
|
111
|
+
i = frame[1]
|
|
112
|
+
if i == members.length
|
|
113
|
+
buf << frame[6] # closer
|
|
114
|
+
stack.pop
|
|
115
|
+
next
|
|
116
|
+
end
|
|
117
|
+
frame[1] = i + 1
|
|
118
|
+
buf << (i.zero? ? frame[3] : frame[4]) # opener-pad / separator-pad
|
|
119
|
+
if frame[2] # hash
|
|
120
|
+
k, v = members[i]
|
|
121
|
+
emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
|
|
122
|
+
buf << frame[5] # colon
|
|
123
|
+
push_value(v, frame[7] + 1, buf, stack)
|
|
124
|
+
else
|
|
125
|
+
push_value(members[i], frame[7] + 1, buf, stack)
|
|
126
|
+
end
|
|
127
|
+
end
|
|
128
|
+
end
|
|
129
|
+
|
|
130
|
+
# Emit one value at `level`: a scalar appends directly; a non-empty container writes
|
|
131
|
+
# its opener and pushes a frame for the driver above to walk (no recursion into it).
|
|
132
|
+
def push_value(obj, level, buf, stack)
|
|
81
133
|
case obj
|
|
82
134
|
when nil then buf << "null"
|
|
83
135
|
when true then buf << "true"
|
|
@@ -87,22 +139,30 @@ module SmarterJSON
|
|
|
87
139
|
when Integer then buf << obj.to_s
|
|
88
140
|
when Float then emit_float(obj, buf)
|
|
89
141
|
when BigDecimal then emit_bigdecimal(obj, buf)
|
|
90
|
-
when Array
|
|
91
|
-
|
|
142
|
+
when Array
|
|
143
|
+
return buf << "[]" if obj.empty? # empty stays inline, even in pretty mode
|
|
144
|
+
|
|
145
|
+
buf << (@pretty ? "[\n" : "[")
|
|
146
|
+
stack << container_frame(obj, false, level)
|
|
147
|
+
when Hash
|
|
148
|
+
return buf << "{}" if obj.empty? # empty stays inline, even in pretty mode
|
|
149
|
+
|
|
150
|
+
pairs = @sort_keys ? obj.sort_by { |k, _| k.is_a?(String) ? k : k.to_s } : obj.to_a
|
|
151
|
+
buf << (@pretty ? "{\n" : "{")
|
|
152
|
+
stack << container_frame(pairs, true, level)
|
|
92
153
|
else
|
|
93
|
-
return
|
|
154
|
+
return push_coerced(obj, level, buf, stack) if @coerce
|
|
94
155
|
|
|
95
156
|
raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize #{obj.class}"
|
|
96
157
|
end
|
|
97
158
|
end
|
|
98
159
|
|
|
99
|
-
# coerce: true β
|
|
100
|
-
#
|
|
101
|
-
#
|
|
102
|
-
|
|
103
|
-
def emit_coerced(obj, buf, level)
|
|
160
|
+
# With coerce: true, prefer as_json (re-emitted through the normal pipeline, so the
|
|
161
|
+
# escaping/format options still apply); else to_json (spliced as-is, so ascii_only /
|
|
162
|
+
# script_safe do not reach inside it); else raise.
|
|
163
|
+
def push_coerced(obj, level, buf, stack)
|
|
104
164
|
if obj.respond_to?(:as_json)
|
|
105
|
-
|
|
165
|
+
push_value(obj.as_json, level, buf, stack)
|
|
106
166
|
elsif obj.respond_to?(:to_json)
|
|
107
167
|
buf << obj.to_json
|
|
108
168
|
else
|
|
@@ -110,57 +170,16 @@ module SmarterJSON
|
|
|
110
170
|
end
|
|
111
171
|
end
|
|
112
172
|
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
pad = " " * (@indent * (level + 1))
|
|
118
|
-
buf << "[\n"
|
|
119
|
-
arr.each_with_index do |v, i|
|
|
120
|
-
buf << ",\n" unless i.zero?
|
|
121
|
-
buf << pad
|
|
122
|
-
emit(v, buf, level + 1)
|
|
123
|
-
end
|
|
124
|
-
buf << "\n" << (" " * (@indent * level)) << "]"
|
|
125
|
-
else
|
|
126
|
-
buf << "["
|
|
127
|
-
arr.each_with_index do |v, i|
|
|
128
|
-
buf << "," unless i.zero?
|
|
129
|
-
emit(v, buf, level)
|
|
130
|
-
end
|
|
131
|
-
buf << "]"
|
|
132
|
-
end
|
|
133
|
-
end
|
|
134
|
-
|
|
135
|
-
def emit_hash(hash, buf, level)
|
|
136
|
-
return buf << "{}" if hash.empty? # empty stays inline, even in pretty mode
|
|
137
|
-
|
|
138
|
-
pairs = @sort_keys ? hash.sort_by { |k, _| k.is_a?(String) ? k : k.to_s } : hash
|
|
139
|
-
|
|
173
|
+
# Build a frame for an open container at `level`, precomputing its punctuation/indent
|
|
174
|
+
# once (as the recursive version computed `pad` once per container).
|
|
175
|
+
def container_frame(members, is_hash, level)
|
|
176
|
+
close_glyph = is_hash ? "}" : "]"
|
|
140
177
|
if @pretty
|
|
141
|
-
pad
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
pairs.each do |k, v|
|
|
145
|
-
buf << ",\n" unless first
|
|
146
|
-
first = false
|
|
147
|
-
buf << pad
|
|
148
|
-
emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
|
|
149
|
-
buf << ": "
|
|
150
|
-
emit(v, buf, level + 1)
|
|
151
|
-
end
|
|
152
|
-
buf << "\n" << (" " * (@indent * level)) << "}"
|
|
178
|
+
pad = " " * (@indent * (level + 1))
|
|
179
|
+
padl = " " * (@indent * level)
|
|
180
|
+
[members, 0, is_hash, pad, ",\n#{pad}", ": ", "\n#{padl}#{close_glyph}", level]
|
|
153
181
|
else
|
|
154
|
-
|
|
155
|
-
first = true
|
|
156
|
-
pairs.each do |k, v|
|
|
157
|
-
buf << "," unless first
|
|
158
|
-
first = false
|
|
159
|
-
emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
|
|
160
|
-
buf << ":"
|
|
161
|
-
emit(v, buf, level)
|
|
162
|
-
end
|
|
163
|
-
buf << "}"
|
|
182
|
+
[members, 0, is_hash, "", ",", ":", close_glyph, level]
|
|
164
183
|
end
|
|
165
184
|
end
|
|
166
185
|
|
|
@@ -195,15 +214,31 @@ module SmarterJSON
|
|
|
195
214
|
end
|
|
196
215
|
|
|
197
216
|
def emit_float(flt, buf)
|
|
198
|
-
|
|
217
|
+
unless flt.finite?
|
|
218
|
+
raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite Float #{flt}" unless @allow_nan
|
|
219
|
+
|
|
220
|
+
return buf << non_finite_literal(flt)
|
|
221
|
+
end
|
|
199
222
|
|
|
200
223
|
buf << flt.to_s # Ruby's Float#to_s is shortest round-trippable; e-notation is valid JSON
|
|
201
224
|
end
|
|
202
225
|
|
|
203
226
|
def emit_bigdecimal(num, buf)
|
|
204
|
-
|
|
227
|
+
unless num.finite?
|
|
228
|
+
raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite BigDecimal" unless @allow_nan
|
|
229
|
+
|
|
230
|
+
return buf << non_finite_literal(num)
|
|
231
|
+
end
|
|
205
232
|
|
|
206
233
|
buf << num.to_s("F") # plain decimal notation (BigDecimal's default "0.1e1" is not valid JSON)
|
|
207
234
|
end
|
|
235
|
+
|
|
236
|
+
# JSON5-style literals for non-finite numbers, emitted only when allow_nan: true.
|
|
237
|
+
# `infinite?` returns 1 / -1 / nil for both Float and BigDecimal.
|
|
238
|
+
def non_finite_literal(num)
|
|
239
|
+
return "NaN" if num.nan?
|
|
240
|
+
|
|
241
|
+
num.infinite? == 1 ? "Infinity" : "-Infinity"
|
|
242
|
+
end
|
|
208
243
|
end
|
|
209
244
|
end
|
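The frame-stack technique used by the new `emit` / `push_value` pair in the generator diff can be shown in a much smaller form. This is an illustrative sketch only, not the gem's code: `compact_json` is a hypothetical name, it produces compact output only, and it quotes strings with `String#inspect`, which is adequate just for plain ASCII text.

```ruby
# Serialize nested Arrays/Hashes without recursion: one frame per open container.
# A frame is [members, next_index, is_hash, closer].
def compact_json(root)
  buf = +""
  stack = []
  # Scalars append directly; a container writes its opener and pushes a frame.
  push = lambda do |obj|
    case obj
    when nil then buf << "null"
    when true, false, Integer then buf << obj.to_s
    when String then buf << obj.inspect # adequate for plain ASCII strings only
    when Array
      buf << "["
      stack << [obj, 0, false, "]"]
    when Hash
      buf << "{"
      stack << [obj.to_a, 0, true, "}"]
    else raise ArgumentError, "cannot serialize #{obj.class}"
    end
  end

  push.call(root)
  until stack.empty?
    frame = stack.last
    members = frame[0]
    i = frame[1]
    if i == members.length # container exhausted: write its closer and drop the frame
      buf << frame[3]
      stack.pop
      next
    end
    frame[1] = i + 1
    buf << "," unless i.zero?
    if frame[2] # hash member: key, colon, then the value
      key, value = members[i]
      buf << key.to_s.inspect << ":"
      push.call(value)
    else
      push.call(members[i])
    end
  end
  buf
end
```

Because the pending work lives in a heap-allocated Array rather than on the call stack, nesting depth is limited by memory, not by the interpreter's stack size.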
data/lib/smarter_json/options.rb
CHANGED
|
@@ -7,7 +7,7 @@ module SmarterJSON
|
|
|
7
7
|
module Options
|
|
8
8
|
DEFAULT_OPTIONS = {
|
|
9
9
|
acceleration: true, # use the C extension when available; false forces pure Ruby
|
|
10
|
-
encoding: nil, # label the input's encoding (no transcoding); nil keeps the input's own
|
|
10
|
+
encoding: nil, # label the input's encoding (no transcoding); nil keeps the input's own (valid-UTF-8 ASCII-8BIT → UTF-8)
|
|
11
11
|
symbolize_keys: false, # Symbol keys instead of String
|
|
12
12
|
duplicate_key: :last_wins, # :last_wins | :first_wins (repeats are also reported via on_warning)
|
|
13
13
|
decimal_precision: :auto, # :auto | :float | :bigdecimal (Oj-compatible decimal handling)
|
|
@@ -24,17 +24,30 @@ module SmarterJSON
|
|
|
24
24
|
end
|
|
25
25
|
|
|
26
26
|
# Raise ArgumentError (consistent with the generator's option checks) listing
|
|
27
|
-
# every
|
|
28
|
-
#
|
|
27
|
+
# every problem at once. Configuration is strict: unlike the lenient *data*
|
|
28
|
+
# handling, an unknown option key or a bad value raises, so a caller's typo or
|
|
29
|
+
# wrong type is caught immediately instead of silently having no effect.
|
|
29
30
|
def validate_options!(options)
|
|
30
31
|
errors = []
|
|
31
32
|
|
|
33
|
+
unknown = options.keys - DEFAULT_OPTIONS.keys
|
|
34
|
+
unless unknown.empty?
|
|
35
|
+
errors << "unknown option#{unknown.size == 1 ? '' : 's'} #{unknown.map(&:inspect).join(', ')} " \
|
|
36
|
+
"- valid keys: #{DEFAULT_OPTIONS.keys.map(&:inspect).join(', ')}"
|
|
37
|
+
end
|
|
38
|
+
|
|
32
39
|
unless %i[auto float bigdecimal].include?(options[:decimal_precision])
|
|
33
40
|
errors << "decimal_precision must be :auto, :float, or :bigdecimal (got #{options[:decimal_precision].inspect})"
|
|
34
41
|
end
|
|
35
42
|
unless %i[last_wins first_wins].include?(options[:duplicate_key])
|
|
36
43
|
errors << "duplicate_key must be :last_wins or :first_wins (got #{options[:duplicate_key].inspect})"
|
|
37
44
|
end
|
|
45
|
+
unless [true, false].include?(options[:acceleration])
|
|
46
|
+
errors << "acceleration must be true or false (got #{options[:acceleration].inspect})"
|
|
47
|
+
end
|
|
48
|
+
unless [true, false].include?(options[:symbolize_keys])
|
|
49
|
+
errors << "symbolize_keys must be true or false (got #{options[:symbolize_keys].inspect})"
|
|
50
|
+
end
|
|
38
51
|
on_warning = options[:on_warning]
|
|
39
52
|
unless on_warning.nil? || on_warning.respond_to?(:call)
|
|
40
53
|
errors << "on_warning must be nil or a callable (got #{on_warning.class})"
|
data/lib/smarter_json/parser.rb
CHANGED
|
@@ -57,6 +57,41 @@ module SmarterJSON
|
|
|
57
57
|
end
|
|
58
58
|
end
|
|
59
59
|
|
|
60
|
+
# SmarterJSON.foreach(source, options = {}): the streaming, composable sibling of
|
|
61
|
+
# process_file, mirroring the stdlib convention (CSV.foreach / File.foreach): a
|
|
62
|
+
# plain Enumerator (NOT Enumerator::Lazy), so .map / .select behave the normal way
|
|
63
|
+
# and return an Array.
|
|
64
|
+
#
|
|
65
|
+
# `source` is a file path (opened and streamed from disk, like process_file) OR an
|
|
66
|
+
# IO (a socket, a StringIO, an open File), streamed directly from its current
|
|
67
|
+
# position. A String is always a path, never content. An IO source is single-pass:
|
|
68
|
+
# it can only be read once, so iterating the returned Enumerator a second time over
|
|
69
|
+
# the same IO yields nothing.
|
|
70
|
+
#
|
|
71
|
+
# Without a block: returns an Enumerator over each top-level document, reading one
|
|
72
|
+
# document at a time via readpartial; it never slurps the whole file the way
|
|
73
|
+
# process_file(path) does. So foreach(path).first(3) reads only ~3 documents off
|
|
74
|
+
# disk, and foreach(src).each { … } / .next stream in bounded memory. .map / .select
|
|
75
|
+
# read the source one document at a time but still build an Array of their result;
|
|
76
|
+
# for a chain that stays bounded end to end (a large filtered set off a fat file)
|
|
77
|
+
# opt into .lazy at the call site: foreach(src).lazy.select { … }.each { … }.
|
|
78
|
+
#
|
|
79
|
+
# With a block: streams each document and returns the document count, identical
|
|
80
|
+
# to process_file(path) { |doc| … } (or process(io) { |doc| … } for an IO).
|
|
81
|
+
#
|
|
82
|
+
# Options are validated eagerly (before the Enumerator is returned), so a bad
|
|
83
|
+
# option key or value fails fast rather than on first iteration.
|
|
84
|
+
def foreach(source, options = {}, &block)
|
|
85
|
+
options = Options.process_options(options)
|
|
86
|
+
return enum_for(:foreach, source, options) unless block
|
|
87
|
+
|
|
88
|
+
if source.respond_to?(:read) # an IO (socket, StringIO, open File): stream it directly
|
|
89
|
+
stream_io(source, options, &block)
|
|
90
|
+
else # a path: open the file and stream from disk
|
|
91
|
+
process_file(source, options, &block)
|
|
92
|
+
end
|
|
93
|
+
end
|
|
94
|
+
|
|
60
95
|
# SmarterJSON.process_one(input, options = {}) β the single-document accessor.
|
|
61
96
|
#
|
|
62
97
|
# Returns the first document's value (or nil when the input holds no documents).
|
|
@@ -109,6 +144,25 @@ module SmarterJSON
|
|
|
109
144
|
end
|
|
110
145
|
end
|
|
111
146
|
|
|
147
|
+
# Smart default for the nil :encoding option. A String tagged ASCII-8BIT (BINARY)
|
|
148
|
+
# is how Net::HTTP and many HTTP libraries hand back a response body even when the
|
|
149
|
+
# bytes are UTF-8. JSON's interchange encoding is UTF-8, so we relabel such input
|
|
150
|
+
# to UTF-8 when its bytes are valid UTF-8; otherwise string values would come back
|
|
151
|
+
# tagged ASCII-8BIT and compare unequal to UTF-8 literals (a silent footgun). When
|
|
152
|
+
# the bytes are NOT valid UTF-8 we raise EncodingError rather than guess a legacy
|
|
153
|
+
# encoding; pass an explicit :encoding for that. An explicit (non-nil) :encoding,
|
|
154
|
+
# or any non-BINARY tag, is left untouched (the per-path force_encoding / validation
|
|
155
|
+
# handles it). Only relabels, never transcodes.
|
|
156
|
+
def normalize_default_encoding(input, options)
|
|
157
|
+
return input unless options[:encoding].nil?
|
|
158
|
+
return input unless input.encoding == Encoding::ASCII_8BIT
|
|
159
|
+
|
|
160
|
+
utf8 = input.dup.force_encoding(Encoding::UTF_8)
|
|
161
|
+
return utf8 if utf8.valid_encoding?
|
|
162
|
+
|
|
163
|
+
raise EncodingError, "input is tagged ASCII-8BIT and is not valid UTF-8 - pass encoding: to declare its encoding"
|
|
164
|
+
end
|
|
165
|
+
|
|
112
166
|
# Stream documents from an IO incrementally, yielding each recovered top-level
|
|
113
167
|
# document without slurping the whole input into memory first.
|
|
114
168
|
def stream_io(io, options, &block)
|
|
@@ -411,6 +465,7 @@ module SmarterJSON
|
|
|
411
465
|
module_function
|
|
412
466
|
|
|
413
467
|
def process_string(input, options, &block)
|
|
468
|
+
input = SmarterJSON.send(:normalize_default_encoding, input, options)
|
|
414
469
|
return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
|
|
415
470
|
|
|
416
471
|
# Recovery is REACTIVE: parse first, and only fall back to wrapper extraction when
|
|
@@ -1220,6 +1275,12 @@ module SmarterJSON
|
|
|
1220
1275
|
b = byte
|
|
1221
1276
|
return parse_string(DQUOTE) if b == DQUOTE
|
|
1222
1277
|
return parse_string(SQUOTE) if b == SQUOTE
|
|
1278
|
+
|
|
1279
|
+
# A key may open with a smart/curly quote too (word-processor paste curls keys,
|
|
1280
|
+
# not just values), so route to the same reader values already use.
|
|
1281
|
+
kind = smart_quote_kind(@pos)
|
|
1282
|
+
return parse_smart_string(kind) if kind
|
|
1283
|
+
|
|
1223
1284
|
raise error("expected a key") unless b && key_start_byte?(b)
|
|
1224
1285
|
|
|
1225
1286
|
parse_identifier_key
|
|
@@ -1365,12 +1426,25 @@ module SmarterJSON
|
|
|
1365
1426
|
# than 16 significant digits (Oj's DEC_MAX threshold), else Float.
|
|
1366
1427
|
def decimal_value(body)
|
|
1367
1428
|
case @decimal_precision
|
|
1368
|
-
when :float then body
|
|
1429
|
+
when :float then float_or_warn(body)
|
|
1369
1430
|
when :bigdecimal then to_big_decimal(body)
|
|
1370
|
-
else significant_digits(body) > 16 ? to_big_decimal(body) : body
|
|
1431
|
+
else significant_digits(body) > 16 ? to_big_decimal(body) : float_or_warn(body)
|
|
1371
1432
|
end
|
|
1372
1433
|
end
|
|
1373
1434
|
|
|
1435
|
+
# A finite numeric literal whose magnitude exceeds Float range (e.g. 1e400) becomes
|
|
1436
|
+
# ±Infinity, a silent data change. Report it via :number_overflow (the value is still
|
|
1437
|
+
# returned; we warn rather than raise or invent). The Infinity/NaN *keywords* go through
|
|
1438
|
+
# a separate path and never reach here, so they don't warn.
|
|
1439
|
+
def float_or_warn(body)
|
|
1440
|
+
f = body.to_f
|
|
1441
|
+
# Only test for overflow when an on_warning handler is listening: `f.infinite?` is a
|
|
1442
|
+
# per-float method call we don't want on the hot number path otherwise, and with no
|
|
1443
|
+
# handler the warning would go nowhere anyway. Overflow is vanishingly rare.
|
|
1444
|
+
warn(:number_overflow, "number literal out of Float range - collapsed to #{f}") if @on_warning && f.infinite?
|
|
1445
|
+
f
|
|
1446
|
+
end
|
|
1447
|
+
|
|
1374
1448
|
# Count significant mantissa digits (leading zeros excluded, exponent ignored) to pick
|
|
1375
1449
|
# Float vs BigDecimal in :auto mode. A single byte-scan; the old three-regex version
|
|
1376
1450
|
# (strip exponent, strip non-digits, strip leading zeros, .length) ran on every float
|
data/lib/smarter_json/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,13 +1,13 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: smarter_json
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version:
|
|
4
|
+
version: 1.1.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Tilo Sloboda
|
|
8
8
|
bindir: exe
|
|
9
9
|
cert_chain: []
|
|
10
|
-
date: 2026-06-
|
|
10
|
+
date: 2026-06-09 00:00:00.000000000 Z
|
|
11
11
|
dependencies:
|
|
12
12
|
- !ruby/object:Gem::Dependency
|
|
13
13
|
name: bigdecimal
|
|
@@ -23,10 +23,16 @@ dependencies:
|
|
|
23
23
|
- - ">="
|
|
24
24
|
- !ruby/object:Gem::Version
|
|
25
25
|
version: '0'
|
|
26
|
-
description: '
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
26
|
+
description: 'A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON,
|
|
27
|
+
JSONL, JSON5, HJSON-style config, and the messy JSON-ish input humans and LLMs actually
|
|
28
|
+
write. Comments, trailing commas, single / unquoted / smart quotes, Python and
|
|
29
|
+
JS keywords, a UTF-8 BOM, and more all parse to the same Ruby objects, with no modes
|
|
30
|
+
or flags to set. Where a traditional parser stops at the first deviation and throws
|
|
31
|
+
away the whole document, SmarterJSON keeps going; it optimizes for getting your
|
|
32
|
+
data out, not for policing the JSON spec. It reads multi-document NDJSON / JSONL
|
|
33
|
+
in one call (and streams it with a block), and in benchmarks its C extension matches
|
|
34
|
+
or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON
|
|
35
|
+
processing to be successful.
|
|
30
36
|
|
|
31
37
|
'
|
|
32
38
|
email:
|
|
@@ -86,6 +92,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
86
92
|
requirements: []
|
|
87
93
|
rubygems_version: 3.6.9
|
|
88
94
|
specification_version: 4
|
|
89
|
-
summary:
|
|
90
|
-
|
|
95
|
+
summary: A lenient, fast JSON processor for Ruby that reads strict JSON, NDJSON, JSONL,
|
|
96
|
+
JSON5, HJSON, and the messy JSON humans and LLMs actually write.
|
|
91
97
|
test_files: []
|