smarter_json 0.9.9 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: fbc93b4afea26fc4d30c241ceb7823451cb7b777454441a9871f066b719f8c07
- data.tar.gz: a9a5801cab6a604e4d166d6d86aa48f41f5e1bb74fb58f39f283a0477ac78db4
+ metadata.gz: 84a73a6cf0785c67eb2dfaf87dc663b860d5afed6bf0816f861b6430d1f55475
+ data.tar.gz: 953aebf65ab855450a7d3b41c826c169242dd19a190f089da3f52df2da0b0a44
  SHA512:
- metadata.gz: c8749bd358973f0d284966a6c5fb42a4e71954c8884c337c5859f5f3e566b302d1a4c1dbc9961ed5ae46aaf5a9e019935fa46916bb858877a2ef34ecc99575c9
- data.tar.gz: 2f2efe9c89ae08bcf807e061ca734d13056c6fafc590d072d7870a705cf25e3d4932fff567c1b09320f5cbd6b42e02255180fded05c84b08f1032abb0594b8bf
+ metadata.gz: 8fe6e07fd99f1557a716fc37370dfc3c5cdb34e054d07fa812983d6cefea1e74108bb0e708fdb42f47a31ff8435aa0d6b8c80deeebab09a221bbd9992691be28
+ data.tar.gz: ec166df8863b136abc38df844bba2286b542de03f24d410b6e4a5914bfcf2165337add25c5731b92540328c5496ad4ec658243e21e4830687de9f44dcaaf4d54
data/.gitignore CHANGED
@@ -45,3 +45,4 @@ overage/
  .claude/
  CLAUDE.md
  INTERNAL_DEV_LOG.md
+ research
data/CHANGELOG.md CHANGED
@@ -1,20 +1,31 @@

  # SmarterJSON Change Log

- > 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧
-
- > ⚠️ **Interface change (since 0.9.7):**
+ > ⚠️ **New Interface (since 0.9.7):**
+ >
+ > SmarterJSON **always returns an `Array`** of documents.
  >
- > `SmarterJSON.process` / `SmarterJSON.process_file` now **always return an `Array`** of documents:
- > — `[]` for no doc
- > - `[doc]` for one doc
- > - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs).
-
- Going forward this will be the supported interface.
+ > `SmarterJSON.process` / `SmarterJSON.process_file` return:
+ >
+ > — `[]` for no doc
+ > - `[doc]` for one doc
+ > - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs)

- > ⚠️ We discourage the use of `process(input).first` / `[0]` because it silently drops potential additional documents
+ > ⚠️ We discourage the use of `process(input).first` / `process(input)[0]` because it silently drops potential additional documents
  > Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads.

+ ## 1.0.0 (2026-06-08)
+
+ RSpec tests: 1,034
+
+ - **The public interface is now stable** — `process`, `process_one`, `process_file`, `generate`, and the documented options; semantic versioning from here on.
+ - Unknown or wrongly-typed options now raise `ArgumentError` instead of being silently ignored, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught immediately.
+ - Input tagged `ASCII-8BIT` whose bytes are valid UTF-8 (e.g. a `Net::HTTP` `response.body`) is now read as UTF-8, so its string values compare equal to UTF-8 literals; ASCII-8BIT input that is not valid UTF-8 raises `SmarterJSON::EncodingError` (pass an explicit `encoding:` for legacy encodings).
+ - Object keys may now use smart/curly quotes too (e.g. JSON pasted from a word processor), not just string values.
+ - `SmarterJSON.generate` accepts `allow_nan: true` to emit `NaN` / `Infinity` / `-Infinity` (JSON5-style) instead of raising, so non-finite numbers round-trip; the default still raises.
+ - A numeric literal that overflows `Float` range (e.g. `1e400`) now reports a `:number_overflow` warning via `on_warning` instead of silently becoming `Infinity`.
+ - `SmarterJSON.generate` is now iterative (like the parser), so serializing a deeply nested structure no longer risks `SystemStackError` — reading and writing are both depth-safe.
+
  ## 0.9.9 (2026-06-07)
  - Much faster pure-Ruby parsing (the path used without the C extension) — roughly 3× on string-heavy data, ~2× on number-heavy, ~1.7× on object-heavy (on a YJIT-enabled Ruby). Parsed values are unchanged.

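The changelog's warning about `process(input).first` can be illustrated without the gem. The sketch below is an assumption-laden stand-in: `read_documents` is a hypothetical helper built on the stdlib `json` library that treats each non-blank line as one document, which only approximates the always-an-Array contract described above.

```ruby
require "json"

# Hypothetical stand-in for a multi-document reader (NOT SmarterJSON itself):
# one JSON document per non-blank line, always returned as an Array.
def read_documents(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

ndjson = <<~NDJSON
  {"id": 1}
  {"id": 2}
  {"id": 3}
NDJSON

docs = read_documents(ndjson)

# Taking only the first element discards documents 2 and 3 without any error.
only_first = docs.first
dropped    = docs.length - 1
```

Here `dropped` is 2, and nothing signals the loss, which is the situation the changelog says `process_one` is meant to surface with a warning.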
data/README.md CHANGED
@@ -62,7 +62,7 @@ Three things set it apart:
  - Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
  - Implicit root object — a config file that starts with `key: value`, no outer `{}`
  - `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
- - UTF-8 BOM, smart/curly quotes, Python literals (`True` / `False` / `None`), JavaScript `undefined`
+ - UTF-8 BOM, smart/curly quotes (in keys and values), Python literals (`True` / `False` / `None`), JavaScript `undefined`
  - Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
  - Duplicate keys (last value wins by default; configurable)

@@ -165,26 +165,26 @@ SmarterJSON is a C extension (with a pure-Ruby fallback that runs everywhere). B
  - **None of the others parse JSON5, HJSON-style config, or LLM-wrapped output.** Comments, trailing commas, unquoted keys, quoteless values, `'single quotes'`, markdown code fences, prose wrappers — all raise in Oj / `json` / Yajl; SmarterJSON parses them.
  - **`json` and Yajl produce `Float` only — lossy on high-precision numbers.** On coordinate / scientific data (>16 significant digits) they silently round to `Float`, so they aren't a like-for-like comparison there. SmarterJSON (and Oj) keep full precision as `BigDecimal` by default.

- Where a like-for-like comparison exists, here is SmarterJSON's C path against each parser. **Apple M4, Ruby 3.4.7, p10 of 40 runs.** Each cell is **SmarterJSON vs that parser** — "faster" means SmarterJSON wins. Ratios shift with hardware; run `rake report` in `json_benchmarks/` to reproduce.
+ Where a like-for-like comparison exists, here is SmarterJSON's C path against each parser. **Apple M4, Ruby 3.4.7, p10 of 40 runs (2026-06-07); the same picture holds on an Apple M1 Max.** Each cell is **SmarterJSON vs that parser** — "faster" means SmarterJSON wins. Ratios shift with hardware; run `rake report` in `json_benchmarks/` to reproduce.

  | File | vs Oj/strict | vs `json` | vs Yajl |
  | ----------------------------- | --------------- | ---------------------------- | --------------- |
- | big_decimals <sup>≠</sup> | **1.8× faster** | **1.1× faster** | **1.3× faster** |
- | canada <sup>≠</sup> | **8× faster** | 1.1× slower | **2.2× faster** |
- | citm_catalog | **1.6× faster** | 1.2× slower | **4.8× faster** |
- | citylots <sup>≠</sup> | **3.6× faster** | **2.0× faster** | **2.3× faster** |
- | config.jsonc | **1.1× faster** | 1.5× slower | **3.7× faster** |
- | deeply_nested | **1.4× faster** | **can't parse** <sup>‑</sup> | **5.1× faster** |
- | github_events | **1.2× faster** | ≈ tied | **3.1× faster** |
+ | big_decimals <sup>≠</sup> | **1.7× faster** | ≈ tied | **1.2× faster** |
+ | canada <sup>≠</sup> | **7× faster** | ≈ tied | **2.1× faster** |
+ | citm_catalog | **1.3× faster** | 1.2× slower | **3.2× faster** |
+ | citylots <sup>≠</sup> | **3.7× faster** | **2.0× faster** | **2.3× faster** |
+ | config.jsonc | **1.1× faster** | 1.2× slower | **3.6× faster** |
+ | deeply_nested | **1.2× faster** | **can't parse** <sup>‑</sup> | **4.1× faster** |
+ | github_events | ≈ tied | 1.1× slower | **2.7× faster** |
  | string_array | ≈ tied | ≈ tied | **1.6× faster** |
- | twitter | **1.4× faster** | 1.3× slower | **3.5× faster** |
- | usgs_earthquakes <sup>≠</sup> | **1.3× faster** | 1.5× slower | **3.6× faster** |
- | weather_berlin | **1.9× faster** | 1.1× slower | **3.5× faster** |
+ | twitter | **1.3× faster** | 1.2× slower | **3.2× faster** |
+ | usgs_earthquakes <sup>≠</sup> | **1.4× faster** | 1.1× slower | **3.4× faster** |
+ | weather_berlin | **1.8× faster** | **1.1× faster** | **3.2× faster** |

  <sup>≠</sup> High-precision file. The row uses `decimal_precision: :float` (Float, like-for-like) for `canada` / `citylots` / `big_decimals` / `usgs`. SmarterJSON's **default** `:auto` keeps these decimals as `BigDecimal` (no precision loss, like Oj's default) — intrinsically slower than `Float`, so default-vs-`Float` would be apples-to-oranges. Against Oj's matching `BigDecimal` default, SmarterJSON is faster there too.
  <sup>‑</sup> Not a measurement gap — `json` raises by default: it errors on multi-document / NDJSON input without a block, and caps nesting at 100 levels. SmarterJSON has neither limit.

- In short: **matches or beats Oj/strict on every file** — `string_array` is the one wash (within ~10%, and hardware-dependent: SmarterJSON edges ahead on an M1, Oj edges ahead on an M4) — **far faster than Yajl everywhere, and level-to-ahead of stdlib `json` on a like-for-like basis**, while parsing input `json` and Oj reject outright. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and **bit-for-bit identical to `JSON.parse`** — fast *and* exact, even at full double precision.
+ In short: **SmarterJSON's C path matches or beats Oj/strict on every file** (apples-to-apples — for the high-precision <sup>≠</sup> files that means `decimal_precision: :float`, where Oj/strict also produces `Float`; with `:float`, float-heavy data like `canada` is **~7× faster**). It is **far faster than Yajl everywhere**, and **level-to-ahead of stdlib `json`** — `json` edges ahead only on a few object-heavy files (`citm`, `twitter`, `config.jsonc`, `github_events`, all within ~1.25×) and **can't parse `deeply_nested` at all**. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and **bit-for-bit identical to `JSON.parse`** — fast *and* exact, even at full double precision.

  **Two notes on fair comparison:**

@@ -200,8 +200,8 @@ In short: **matches or beats Oj/strict on every file** — `string_array` is the
  | `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` for a key repeated in one object (every repeat is also reported via `on_warning`) |
  | `decimal_precision` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
  | `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
- | `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
- | `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
+ | `encoding` | `nil` | labels the input's encoding; `nil` keeps the input's own (no transcoding pass; see below) |
+ | `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`, `:number_overflow`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |

  ## Examples

@@ -295,11 +295,11 @@ TEXT

  ## Encoding

- `encoding:` (default `"UTF-8"`) labels what the input is — it does **not** trigger a transcoding pass. SmarterJSON works on the bytes in their native encoding and emits string values with the same encoding tag, the same way `smarter_csv` handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
+ `encoding:` (default `nil`) labels what the input is — it does **not** transcode. With `nil`, SmarterJSON keeps the input's own encoding tag and emits string values with that same tag, the way `smarter_csv` does — **with one smart default:** input tagged `ASCII-8BIT` (BINARY) whose bytes are valid UTF-8 is treated as UTF-8. That is exactly how `Net::HTTP` and many HTTP libraries hand you a `response.body` (correct UTF-8 bytes, BINARY tag); without this, string values would come back tagged `ASCII-8BIT` and compare unequal to UTF-8 literals. If such `ASCII-8BIT` input is *not* valid UTF-8, it raises `SmarterJSON::EncodingError` rather than guess a legacy encoding — pass an explicit `encoding:` (e.g. `"ISO-8859-1"`) for that. Bytes invalid for an explicitly claimed encoding also raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).

  ## Nesting & untrusted input

- Both the C extension and the pure-Ruby engine are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue.
+ Both the C extension and the pure-Ruby engine are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue. **`generate` is iterative too**, so serializing a deeply nested Ruby structure can't overflow the stack either — reading *and* writing are both depth-safe.

  The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you process untrusted input and want a hard cap, that's a planned opt-in guard — for now, size-limit upstream.

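The "iterative, not recursive" claim in the hunk above can be illustrated with a much smaller serializer. This is only a sketch under stated assumptions: `emit_iteratively` is a hypothetical name, it handles nothing but nested Arrays of Integers, and its two-element frames are far simpler than the gem's own frame layout.

```ruby
# Minimal iterative serializer for nested Arrays of Integers (illustration only).
# Each frame is [array, next_index]; the explicit stack lives on the heap, so
# nesting depth never grows the Ruby call stack.
def emit_iteratively(root)
  buf = +""
  stack = []

  # Append a scalar directly, or open a container and push a frame for it.
  push = lambda do |value|
    if value.is_a?(Array)
      buf << "["
      stack << [value, 0]
    else
      buf << value.to_s
    end
  end

  push.call(root)
  until stack.empty?
    frame = stack.last
    members, i = frame
    if i == members.length
      buf << "]"            # container exhausted: close it and drop its frame
      stack.pop
      next
    end
    frame[1] = i + 1
    buf << "," unless i.zero?
    push.call(members[i])   # never recurses; containers just push a frame
  end
  buf
end

# Wrap an empty array 100_000 times; the loop above walks it without recursion.
deep = []
100_000.times { deep = [deep] }
json = emit_iteratively(deep)
```

The design point is that each open container costs one small heap-allocated frame, so depth is limited by memory, not by the interpreter's call stack.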
@@ -23,7 +23,7 @@ Most JSON parsers reject anything that isn't perfectly strict JSON, and they mak

  * **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: it always returns an `Array` of the documents found (`[]` / `[doc]` / `[d1, d2, …]`). For the common single-document case, `SmarterJSON.process_one` returns the one value directly (and warns, never raises, if there was more than one). The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time. See [The Basic Read API](./basic_read_api.md).

- * **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) matches or beats Oj on every file we benchmark, and is competitive with the stdlib `json` C parser. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and bit-for-bit identical to `JSON.parse`, so number-heavy data is fast and exact.
+ * **It's fast.** The C extension (with a pure-Ruby fallback that runs everywhere) is **faster than Oj/strict on every file** in our benchmark suite — up to **~7× faster on float-heavy data** with `decimal_precision: :float` — **far faster than Yajl**, and **level-to-ahead of the stdlib `json` C parser**, which can't even parse deeply-nested input. Floats are decoded with the **Eisel-Lemire** algorithm (fast_float), correctly rounded and bit-for-bit identical to `JSON.parse`, so number-heavy data is fast and exact. Full per-file numbers (Apple M4 / M1, relative ratios) are in the [README Performance section](../README.md#performance).

  * **It writes JSON too.** `SmarterJSON.generate` turns Ruby values into strict, interoperable JSON — or into NDJSON, one element per line, the exact inverse of reading NDJSON back into an Array. See [The Basic Write API](./basic_write_api.md).

@@ -58,7 +58,7 @@ SmarterJSON.generate(Float::INFINITY) # raises SmarterJSON::GenerateError —
  SmarterJSON.generate(Float::NAN) # raises SmarterJSON::GenerateError — non-finite Float
  ```

- (`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency, but they are not valid JSON to *write*.)
+ (`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency; to *write* them, pass `allow_nan: true` and they're emitted as `NaN` / `Infinity` / `-Infinity` (JSON5-style, so SmarterJSON reads them back) — otherwise non-finite values raise, since they aren't valid strict JSON.)

  By default `generate` is strict: it only writes the types above and raises on anything else. To serialize `Time`, `Date`, or your own objects, pass `coerce: true` — an unsupported value is then converted by its own `as_json` (whose result is re-emitted, so escaping/`indent`/`sort_keys` still apply) or, failing that, `to_json` (spliced verbatim):

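The `allow_nan:` behaviour described in the hunk above amounts to one branch on finiteness. The following is a rough sketch, not the gem's code: `emit_float` and `NonFiniteError` are hypothetical names standing in for the writer's internal number emitter and for `SmarterJSON::GenerateError`.

```ruby
# Hypothetical stand-in for SmarterJSON::GenerateError.
NonFiniteError = Class.new(StandardError)

# Format one Float as JSON text (illustration only).
# Finite values print normally; non-finite values either raise (strict JSON)
# or are written as the JSON5 barewords when allow_nan is true.
def emit_float(value, allow_nan: false)
  return value.to_s if value.finite?
  raise NonFiniteError, "non-finite Float: #{value}" unless allow_nan

  if value.nan?
    "NaN"
  elsif value.positive?
    "Infinity"
  else
    "-Infinity"
  end
end

strict_error =
  begin
    emit_float(Float::INFINITY)
    nil
  rescue NonFiniteError => e
    e
  end

lenient = [Float::NAN, Float::INFINITY, -Float::INFINITY].map do |f|
  emit_float(f, allow_nan: true)
end
```

Keeping the strict path as the default means output stays valid for any standard JSON reader unless the caller opts in explicitly.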
data/docs/options.md CHANGED
@@ -13,7 +13,7 @@

  ## Reading

- These options are passed to [`SmarterJSON.process`](./basic_read_api.md), `SmarterJSON.process_one`, and `SmarterJSON.process_file` as the second argument; anything you set overrides the defaults below.
+ These options are passed to [`SmarterJSON.process`](./basic_read_api.md), `SmarterJSON.process_one`, and `SmarterJSON.process_file` as the second argument; anything you set overrides the defaults below. Configuration is strict: an unknown option key, or a known key given the wrong type or value, raises `ArgumentError` immediately — leniency applies to the JSON *input* you parse, not to the options you pass, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught right away.

  | Option | Default | Explanation |
  |-------------------|--------------|------------------------------------------------------------------------------------------------------------------------|
@@ -43,11 +43,11 @@ warns.map(&:type) # => [:empty_slot]
  warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
  ```

- The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
+ The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), `:duplicate_key` (a repeated key that was dropped), and `:number_overflow` (a numeric literal too large for `Float`, e.g. `1e400`, collapsed to `Infinity`), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.

  ### A note on `:encoding`

- `:encoding` labels what the input *is* — it does not transcode. SmarterJSON works on the bytes in their native encoding and emits string values with the same encoding tag, the same way `smarter_csv` handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`). A UTF-8 BOM is handled automatically; UTF-16 / UTF-32 input is out of scope.
+ `:encoding` labels what the input *is* — it does not transcode. With the default `nil`, SmarterJSON keeps the input's own encoding tag and emits string values with that tag, the same way `smarter_csv` handles encodings — **with one smart default:** input tagged `ASCII-8BIT` (BINARY) that is valid UTF-8 is treated as UTF-8. This is how `Net::HTTP` returns a `response.body`; without it, those string values would compare unequal to UTF-8 literals. `ASCII-8BIT` input that is *not* valid UTF-8 raises `SmarterJSON::EncodingError` — pass an explicit `:encoding` (e.g. `"ISO-8859-1"`) for genuinely-legacy bytes. Bytes invalid for an explicitly claimed encoding also raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`). A UTF-8 BOM is handled automatically; UTF-16 / UTF-32 input is out of scope.

  ### A note on `:decimal_precision`

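The `ASCII-8BIT` default described in the `:encoding` note can be reproduced with core String methods. This is a sketch under assumptions: `normalize_binary_input` is a hypothetical helper, and Ruby's built-in `EncodingError` stands in for `SmarterJSON::EncodingError`; how the gem actually implements the check is not shown in this diff.

```ruby
# Hypothetical helper mirroring the documented default (NOT the gem's code):
# BINARY-tagged input that holds valid UTF-8 is retagged as UTF-8, no transcoding.
def normalize_binary_input(input)
  return input unless input.encoding == Encoding::ASCII_8BIT

  candidate = input.dup.force_encoding(Encoding::UTF_8)
  # Refuse to guess a legacy encoding when the bytes are not valid UTF-8.
  raise EncodingError, "ASCII-8BIT input is not valid UTF-8" unless candidate.valid_encoding?

  candidate
end

# UTF-8 bytes carrying a BINARY tag, the shape an HTTP client often returns.
body = "{\"city\": \"Z\u00FCrich\"}".b

# Same bytes, different tag: the BINARY string does not equal the UTF-8 literal.
equal_before = (body == "{\"city\": \"Z\u00FCrich\"}")

normalized  = normalize_binary_input(body)
equal_after = (normalized == "{\"city\": \"Z\u00FCrich\"}")

# Latin-1 bytes (0xE9) are not valid UTF-8, so the helper raises instead of guessing.
legacy_error =
  begin
    normalize_binary_input("caf\xE9".b)
    nil
  rescue EncodingError => e
    e
  end
```

`force_encoding` only changes the tag and leaves the bytes alone, which matches the "labels, does not transcode" wording in the note.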
@@ -59,14 +59,15 @@ These options are passed to [`SmarterJSON.generate`](./basic_write_api.md) as th

  | Option | Default | Explanation |
  |------------|---------|-----------------------------------------------------------------------------------------------------------------------------|
+ | `:allow_nan` | `false` | When `true`, non-finite `Float`/`BigDecimal` values emit the JSON5 barewords `NaN` / `Infinity` / `-Infinity` (which SmarterJSON reads back, so they round-trip). When `false` (the default), a non-finite number raises `SmarterJSON::GenerateError` — they aren't valid strict JSON. |
+ | `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
+ | `:coerce` | `false` | When `true`, a value that isn't natively supported is converted by its own `as_json` (the result is re-emitted, so the other options still apply) or, failing that, `to_json` (spliced verbatim). When `false` (the default), such a value raises `SmarterJSON::GenerateError`. |
  | `:format` | `:json` | `:json` writes standard JSON (Hash → object, Array → array, scalar → scalar). `:ndjson` writes newline-delimited JSON: an Array becomes one element per line, any other value becomes a single line. |
  | `:indent` | `0` | Spaces per nesting level for pretty-printing. `0` (the default) is compact output. Empty objects/arrays stay inline. Not allowed with `:ndjson` (a record must be a single line). |
- | `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |
- | `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
  | `:script_safe` | `false` | Escape the `/` in `</` and the JS line separators U+2028 / U+2029, so output is safe to embed in an HTML `<script>` tag. |
- | `:coerce` | `false` | When `true`, a value that isn't natively supported is converted by its own `as_json` (the result is re-emitted, so the other options still apply) or, failing that, `to_json` (spliced verbatim). When `false` (the default), such a value raises `SmarterJSON::GenerateError`. |
+ | `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |

- Any other `:format` value, a negative/non-Integer `:indent`, or combining `:indent` with `:ndjson`, raises `ArgumentError`.
+ Configuration is validated up front: an unknown option key, a known key with the wrong type or value (a non-Symbol `:format`, a negative/non-Integer `:indent`, a non-boolean flag), or combining `:indent` with `:ndjson`, raises `ArgumentError`.

  ```ruby
  SmarterJSON.generate([1, 2, 3]) # => "[1,2,3]" (default :json — a single JSON array)
@@ -40,6 +40,7 @@ static ID fj_call_id; /* cached :call (invoking the on_warning handler) */
  static VALUE fj_sym_empty_slot;
  static VALUE fj_sym_empty_value;
  static VALUE fj_sym_duplicate_key;
+ static VALUE fj_sym_number_overflow;
  static ID fj_bigdecimal_id; /* cached BigDecimal() method id (set in Init) */
  static ID fj_to_sym_id; /* cached :to_sym (symbolize_keys) */
  static ID fj_key_p_id; /* cached :key? (non-default duplicate_key modes) */
@@ -262,6 +263,8 @@ static inline int fj_needs_ws_skip(int b) {
  /* forward declarations (mutual recursion) */
  static VALUE fj_parse_value(fj_state *st);
  static VALUE fj_parse_member_value(fj_state *st);
+ static int fj_smart_quote_kind(fj_state *st);
+ static VALUE fj_parse_smart_string(fj_state *st, int kind);

  static void fj_append_utf8(VALUE buf, unsigned long cp) {
    char tmp[4];
@@ -579,7 +582,8 @@ static VALUE fj_float_strtod(const char *p, long n) {
  }

  /* e10 is the final base-10 exponent (already adjusted by the fraction length). */
- static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
+ static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(fj_state *st, uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
+   double d;
    /* Fast path by mantissa width (our scanner accumulates m10 exactly up to 18
       digits, flagging overflow beyond):
       1..18 digits -> Eisel-Lemire, correctly-rounded for any exact uint64 mantissa
@@ -589,10 +593,18 @@ static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, i
       >18 digits / overflow / extreme exponent -> strtod (round-to-odd). */
    if (!overflow && m10digits >= 1 && m10digits <= 18 && (long)m10digits + e10 >= -307) {
      if (m10 == 0) return rb_float_new(neg ? -0.0 : 0.0);
-     return rb_float_new(fj_eisel_lemire_s2d(e10, m10, neg));
+     d = fj_eisel_lemire_s2d(e10, m10, neg);
+   } else {
+     /* Fallback for >18 digits / extreme or subnormal exponents. */
+     d = RFLOAT_VALUE(fj_float_strtod(p, n));
    }
-   /* Fallback for >18 digits / extreme or subnormal exponents. */
-   return fj_float_strtod(p, n);
+   /* A finite literal whose magnitude exceeds Float range (e.g. 1e400) becomes
+      ±Infinity — a silent data change. Report it via :number_overflow (the value is
+      still returned). The Infinity/NaN keywords take separate paths and never get here.
+      Gate isinf on a listening handler (matches the Ruby float_or_warn): no handler ->
+      no point detecting, and it keeps the test off the hot number path. */
+   if (st->on_warning != Qnil && isinf(d)) fj_warn(st, fj_sym_number_overflow, "number literal out of Float range — collapsed to Infinity");
+   return rb_float_new(d);
  }

  /* Scan an already-bounded quoteless token [p, p+n) exactly once: validate it as a
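The C comment above says this check "matches the Ruby float_or_warn", whose body is not part of this diff. A minimal pure-Ruby sketch of the same idea, assuming a handler that takes a type and a message (the gem's real handler receives a `SmarterJSON::Warning`), could look like this:

```ruby
# Hypothetical sketch of the overflow check; the real float_or_warn is not shown
# in this diff, and the (type, message) handler signature is an assumption.
def float_or_warn(token, on_warning = nil)
  value = Float(token) # Kernel#Float returns Infinity for an out-of-range literal like "1e400"
  # Only test for overflow when someone is listening, as the C comment describes.
  if on_warning && value.infinite?
    on_warning.call(:number_overflow, "number literal out of Float range")
  end
  value
end

seen = []
handler = ->(type, _message) { seen << type }

overflowed = float_or_warn("1e400", handler)  # reports :number_overflow, still returns a value
ordinary   = float_or_warn("1.5", handler)    # finite, so the handler is not called
silent     = float_or_warn("-1e400")          # no handler: nothing is reported
```

The value is returned in every case; the warning only makes the silent change from a finite literal to `Infinity` observable.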
@@ -677,7 +689,7 @@ static int fj_try_decimal(fj_state *st, const char *p, long n, VALUE *out) {
        (st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(p, n) > 16)) {
      *out = fj_to_bigdecimal_token(p, n);
    } else {
-     *out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, p, n);
+     *out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, p, n);
    }
    return 1;
  }
@@ -789,7 +801,7 @@ static VALUE fj_parse_number(fj_state *st) {
        (st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
      return fj_to_bigdecimal_token(np, nlen);
    }
-   return fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
+   return fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
  }

  static VALUE fj_parse_literal(fj_state *st, const char *word, VALUE value) {
@@ -842,6 +854,7 @@ static VALUE fj_parse_identifier_key(fj_state *st) {

  static VALUE fj_parse_object_key(fj_state *st) {
    int b = fj_byte(st);
+   int kind;

    /* Quoted key. The common case has no escapes: intern straight from the buffer
     * with no throwaway allocation. An escaped key (rare) falls through to the
@@ -862,6 +875,12 @@ static VALUE fj_parse_object_key(fj_state *st) {
      return fj_parse_string(st, b);
    }

+   /* A key may open with a smart/curly quote too (a word-processor paste curls the
+    * keys, not just the values) — route to the same reader the value path uses.
+    * Mirrors the Ruby fallback's parse_object_key; Hash#[]= dedups the key on store. */
+   kind = fj_smart_quote_kind(st);
+   if (kind) return fj_parse_smart_string(st, kind);
+
    if (fj_is_key_start(b)) return fj_parse_identifier_key(st);

    fj_error(st, "expected a key");
@@ -1197,7 +1216,7 @@ static int fj_try_member_number(fj_state *st, VALUE *out) {
        (st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
      *out = fj_to_bigdecimal_token(np, nlen);
    } else {
-     *out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
+     *out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
    }
    return 1;
  }
@@ -1625,6 +1644,7 @@ void Init_smarter_json(void) {
    fj_sym_empty_slot = ID2SYM(rb_intern("empty_slot"));
    fj_sym_empty_value = ID2SYM(rb_intern("empty_value"));
    fj_sym_duplicate_key = ID2SYM(rb_intern("duplicate_key"));
+   fj_sym_number_overflow = ID2SYM(rb_intern("number_overflow"));
    fj_sym_encoding = ID2SYM(rb_intern("encoding"));
    fj_sym_symbolize_keys = ID2SYM(rb_intern("symbolize_keys"));
    fj_sym_first_wins = ID2SYM(rb_intern("first_wins"));
@@ -34,7 +34,17 @@ module SmarterJSON
      # (including multi-byte UTF-8) is emitted raw — valid JSON.
      ESCAPE_RE = /["\\\x00-\x1f]/.freeze

+     # Strict configuration: an unknown writer option is a caller bug, so it raises
+     # rather than being silently ignored.
+     KNOWN_OPTIONS = %i[format indent ascii_only script_safe sort_keys coerce allow_nan].freeze
+
      def initialize(options = {})
+       unknown = options.keys - KNOWN_OPTIONS
+       unless unknown.empty?
+         raise ArgumentError, "SmarterJSON.generate: unknown option#{unknown.size == 1 ? '' : 's'} " \
+                              "#{unknown.map(&:inspect).join(', ')} — valid keys: #{KNOWN_OPTIONS.map(&:inspect).join(', ')}"
+       end
+
        @format = options.fetch(:format, :json)
        unless %i[json ndjson].include?(@format)
          raise ArgumentError, "unknown writer format: #{@format.inspect} (expected :json or :ndjson)"
@@ -50,10 +60,11 @@ module SmarterJSON

        @pretty = @indent > 0

-       @ascii_only = options.fetch(:ascii_only, false) # escape non-ASCII as \uXXXX
-       @script_safe = options.fetch(:script_safe, false) # escape </ and U+2028 / U+2029
-       @sort_keys = options.fetch(:sort_keys, false) # emit object keys in sorted order
-       @coerce = options.fetch(:coerce, false) # convert unknown types via as_json / to_json
+       @ascii_only = boolean_option(options, :ascii_only) # escape non-ASCII as \uXXXX
+       @script_safe = boolean_option(options, :script_safe) # escape </ and U+2028 / U+2029
+       @sort_keys = boolean_option(options, :sort_keys) # emit object keys in sorted order
+       @coerce = boolean_option(options, :coerce) # convert unknown types via as_json / to_json
+       @allow_nan = boolean_option(options, :allow_nan) # emit NaN / Infinity / -Infinity (JSON5) instead of raising
        @escape_re = build_escape_re
      end

@@ -77,7 +88,48 @@ module SmarterJSON
77
88
 
78
89
  private
79
90
 
80
- def emit(obj, buf, level = 0)
91
+ # A boolean writer option must be exactly true or false — a wrong type is a
92
+ # caller bug, so it raises rather than being coerced or ignored.
93
+ def boolean_option(options, key)
94
+ value = options.fetch(key, false)
95
+ return value if value == true || value == false
96
+
97
+ raise ArgumentError, "#{key} must be true or false (got #{value.inspect})"
98
+ end
99
+
100
+ # Iterative serializer — an explicit frame stack (one frame per open container),
101
+ # mirroring the recursive structure but heap-allocated, so arbitrarily deep input
102
+ # cannot overflow the call stack (parity with the iterative parser). Output is
103
+ # byte-identical to the former recursive version. A frame is a small Array:
104
+ # [members, idx, is_hash, before_first, before_rest, colon, closer, level]
105
+ def emit(obj, buf)
106
+ stack = []
107
+ push_value(obj, 0, buf, stack)
108
+ until stack.empty?
109
+ frame = stack.last
110
+ members = frame[0]
111
+ i = frame[1]
112
+ if i == members.length
113
+ buf << frame[6] # closer
114
+ stack.pop
115
+ next
116
+ end
117
+ frame[1] = i + 1
118
+ buf << (i.zero? ? frame[3] : frame[4]) # opener-pad / separator-pad
119
+ if frame[2] # hash
120
+ k, v = members[i]
121
+ emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
122
+ buf << frame[5] # colon
123
+ push_value(v, frame[7] + 1, buf, stack)
124
+ else
125
+ push_value(members[i], frame[7] + 1, buf, stack)
126
+ end
127
+ end
128
+ end
129
+
130
+ # Emit one value at `level`: a scalar appends directly; a non-empty container writes
131
+ # its opener and pushes a frame for the driver above to walk (no recursion into it).
132
+ def push_value(obj, level, buf, stack)
81
133
  case obj
82
134
  when nil then buf << "null"
83
135
  when true then buf << "true"
@@ -87,22 +139,30 @@ module SmarterJSON
87
139
  when Integer then buf << obj.to_s
88
140
  when Float then emit_float(obj, buf)
89
141
  when BigDecimal then emit_bigdecimal(obj, buf)
90
- when Array then emit_array(obj, buf, level)
91
- when Hash then emit_hash(obj, buf, level)
142
+ when Array
143
+ return buf << "[]" if obj.empty? # empty stays inline, even in pretty mode
144
+
145
+ buf << (@pretty ? "[\n" : "[")
146
+ stack << container_frame(obj, false, level)
147
+ when Hash
148
+ return buf << "{}" if obj.empty? # empty stays inline, even in pretty mode
149
+
150
+ pairs = @sort_keys ? obj.sort_by { |k, _| k.is_a?(String) ? k : k.to_s } : obj.to_a
151
+ buf << (@pretty ? "{\n" : "{")
152
+ stack << container_frame(pairs, true, level)
92
153
  else
93
- return emit_coerced(obj, buf, level) if @coerce
154
+ return push_coerced(obj, level, buf, stack) if @coerce
94
155
 
95
156
  raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize #{obj.class}"
96
157
  end
97
158
  end
98
159
 
99
- # coerce: true — let a value that isn't natively supported convert itself.
100
- # Prefer as_json (its result is re-emitted through the normal pipeline, so the
101
- # escaping/format options still apply); fall back to to_json (spliced as-is, so
102
- # ascii_only / script_safe do not reach inside it). Raise if it defines neither.
103
- def emit_coerced(obj, buf, level)
160
+ # coerce: true — prefer as_json (re-emitted through the normal pipeline, so the
161
+ # escaping/format options still apply); else to_json (spliced as-is, so ascii_only /
162
+ # script_safe do not reach inside it); else raise.
163
+ def push_coerced(obj, level, buf, stack)
104
164
  if obj.respond_to?(:as_json)
105
- emit(obj.as_json, buf, level)
165
+ push_value(obj.as_json, level, buf, stack)
106
166
  elsif obj.respond_to?(:to_json)
107
167
  buf << obj.to_json
108
168
  else
@@ -110,57 +170,16 @@ module SmarterJSON
110
170
  end
111
171
  end
112
172
 
113
- def emit_array(arr, buf, level)
114
- return buf << "[]" if arr.empty? # empty stays inline, even in pretty mode
115
-
116
- if @pretty
117
- pad = " " * (@indent * (level + 1))
118
- buf << "[\n"
119
- arr.each_with_index do |v, i|
120
- buf << ",\n" unless i.zero?
121
- buf << pad
122
- emit(v, buf, level + 1)
123
- end
124
- buf << "\n" << (" " * (@indent * level)) << "]"
125
- else
126
- buf << "["
127
- arr.each_with_index do |v, i|
128
- buf << "," unless i.zero?
129
- emit(v, buf, level)
130
- end
131
- buf << "]"
132
- end
133
- end
134
-
135
- def emit_hash(hash, buf, level)
136
- return buf << "{}" if hash.empty? # empty stays inline, even in pretty mode
137
-
138
- pairs = @sort_keys ? hash.sort_by { |k, _| k.is_a?(String) ? k : k.to_s } : hash
139
-
173
+ # Build a frame for an open container at `level`, precomputing its punctuation/indent
174
+ # once (as the recursive version computed `pad` once per container).
175
+ def container_frame(members, is_hash, level)
176
+ close_glyph = is_hash ? "}" : "]"
140
177
  if @pretty
141
- pad = " " * (@indent * (level + 1))
142
- buf << "{\n"
143
- first = true
144
- pairs.each do |k, v|
145
- buf << ",\n" unless first
146
- first = false
147
- buf << pad
148
- emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
149
- buf << ": "
150
- emit(v, buf, level + 1)
151
- end
152
- buf << "\n" << (" " * (@indent * level)) << "}"
178
+ pad = " " * (@indent * (level + 1))
179
+ padl = " " * (@indent * level)
180
+ [members, 0, is_hash, pad, ",\n#{pad}", ": ", "\n#{padl}#{close_glyph}", level]
153
181
  else
154
- buf << "{"
155
- first = true
156
- pairs.each do |k, v|
157
- buf << "," unless first
158
- first = false
159
- emit_string(k.is_a?(String) ? k : k.to_s, buf) # Symbol/other keys -> string
160
- buf << ":"
161
- emit(v, buf, level)
162
- end
163
- buf << "}"
182
+ [members, 0, is_hash, "", ",", ":", close_glyph, level]
164
183
  end
165
184
  end
166
185
 
@@ -195,15 +214,31 @@ module SmarterJSON
195
214
  end
196
215
 
197
216
  def emit_float(flt, buf)
198
- raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite Float #{flt}" unless flt.finite?
217
+ unless flt.finite?
218
+ raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite Float #{flt}" unless @allow_nan
219
+
220
+ return buf << non_finite_literal(flt)
221
+ end
199
222
 
200
223
  buf << flt.to_s # Ruby's Float#to_s is shortest round-trippable; e-notation is valid JSON
201
224
  end
202
225
 
203
226
  def emit_bigdecimal(num, buf)
204
- raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite BigDecimal" unless num.finite?
227
+ unless num.finite?
228
+ raise SmarterJSON::GenerateError, "SmarterJSON.generate cannot serialize non-finite BigDecimal" unless @allow_nan
229
+
230
+ return buf << non_finite_literal(num)
231
+ end
205
232
 
206
233
  buf << num.to_s("F") # plain decimal notation (BigDecimal's default "0.1e1" is not valid JSON)
207
234
  end
235
+
236
+ # JSON5-style literals for non-finite numbers, emitted only when allow_nan: true.
237
+ # `infinite?` returns 1 / -1 / nil for both Float and BigDecimal.
238
+ def non_finite_literal(num)
239
+ return "NaN" if num.nan?
240
+
241
+ num.infinite? == 1 ? "Infinity" : "-Infinity"
242
+ end
208
243
  end
209
244
  end
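Reviewer's sketch of the frame-stack technique the new `emit` / `push_value` pair uses, reduced to compact output only. This is not the gem's code: `emit_compact` and `quote` are hypothetical names, the frame layout is shortened to four slots, and string escaping is simplified to quotes and backslashes (the real writer also handles control characters, `ascii_only`, and `script_safe`).

```ruby
# Simplified string quoting: escapes only " and \ (an assumption made for brevity).
def quote(str)
  '"' + str.gsub(/["\\]/) { |c| "\\#{c}" } + '"'
end

# Serialize nested Arrays/Hashes without recursion.
# Each open container gets one frame: [members, next_index, is_hash, closer].
def emit_compact(root)
  buf = +""
  stack = []

  # Scalars append directly; a non-empty container writes its opener and
  # pushes a frame for the driver loop below to walk.
  push_value = lambda do |value|
    case value
    when nil     then buf << "null"
    when true    then buf << "true"
    when false   then buf << "false"
    when Integer then buf << value.to_s
    when String  then buf << quote(value)
    when Array
      if value.empty?
        buf << "[]"
      else
        buf << "["
        stack << [value, 0, false, "]"]
      end
    when Hash
      if value.empty?
        buf << "{}"
      else
        buf << "{"
        stack << [value.to_a, 0, true, "}"]
      end
    else
      raise ArgumentError, "cannot serialize #{value.class}"
    end
  end

  push_value.call(root)
  until stack.empty?
    frame = stack.last
    members = frame[0]
    i = frame[1]
    if i == members.length # container exhausted: close it and drop the frame
      buf << frame[3]
      stack.pop
      next
    end
    frame[1] = i + 1
    buf << "," unless i.zero?
    if frame[2] # hash member: key, colon, then the value
      key, val = members[i]
      buf << quote(key.to_s) << ":"
      push_value.call(val)
    else
      push_value.call(members[i])
    end
  end
  buf
end

puts emit_compact({ "a" => [1, nil, { b: true }], "c" => [] })
```

Because the pending work lives in a heap-allocated Array rather than in Ruby call frames, nesting depth is bounded by memory instead of by the VM stack, which is the property the comment in the diff claims for the real implementation.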
@@ -7,7 +7,7 @@ module SmarterJSON
7
7
  module Options
8
8
  DEFAULT_OPTIONS = {
9
9
  acceleration: true, # use the C extension when available; false forces pure Ruby
10
- encoding: nil, # label the input's encoding (no transcoding); nil keeps the input's own
10
+ encoding: nil, # label the input's encoding (no transcoding); nil keeps the input's own (valid-UTF-8 ASCII-8BIT → UTF-8)
11
11
  symbolize_keys: false, # Symbol keys instead of String
12
12
  duplicate_key: :last_wins, # :last_wins | :first_wins (repeats are also reported via on_warning)
13
13
  decimal_precision: :auto, # :auto | :float | :bigdecimal (Oj-compatible decimal handling)
@@ -24,17 +24,30 @@ module SmarterJSON
24
24
  end
25
25
 
26
26
  # Raise ArgumentError (consistent with the generator's option checks) listing
27
- # every invalid setting at once. Unknown keys are ignored, matching the lenient
28
- # design — an option SmarterJSON doesn't recognize simply has no effect.
27
+ # every problem at once. Configuration is strict — unlike the lenient *data*
28
+ # handling, an unknown option key or a bad value raises, so a caller's typo or
29
+ # wrong type is caught immediately instead of silently having no effect.
29
30
  def validate_options!(options)
30
31
  errors = []
31
32
 
33
+ unknown = options.keys - DEFAULT_OPTIONS.keys
34
+ unless unknown.empty?
35
+ errors << "unknown option#{unknown.size == 1 ? '' : 's'} #{unknown.map(&:inspect).join(', ')} " \
36
+ "— valid keys: #{DEFAULT_OPTIONS.keys.map(&:inspect).join(', ')}"
37
+ end
38
+
32
39
  unless %i[auto float bigdecimal].include?(options[:decimal_precision])
33
40
  errors << "decimal_precision must be :auto, :float, or :bigdecimal (got #{options[:decimal_precision].inspect})"
34
41
  end
35
42
  unless %i[last_wins first_wins].include?(options[:duplicate_key])
36
43
  errors << "duplicate_key must be :last_wins or :first_wins (got #{options[:duplicate_key].inspect})"
37
44
  end
45
+ unless [true, false].include?(options[:acceleration])
46
+ errors << "acceleration must be true or false (got #{options[:acceleration].inspect})"
47
+ end
48
+ unless [true, false].include?(options[:symbolize_keys])
49
+ errors << "symbolize_keys must be true or false (got #{options[:symbolize_keys].inspect})"
50
+ end
38
51
  on_warning = options[:on_warning]
39
52
  unless on_warning.nil? || on_warning.respond_to?(:call)
40
53
  errors << "on_warning must be nil or a callable (got #{on_warning.class})"
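Reviewer's sketch of the strict-configuration pattern this hunk introduces: collect every problem (unknown keys and bad values) and raise once. The names `default_options` and `check_options!` are hypothetical and only two options are modelled; the gem's real `validate_options!` covers more keys and may differ in detail.

```ruby
# Hypothetical two-option default set, standing in for the gem's DEFAULT_OPTIONS.
def default_options
  { symbolize_keys: false, duplicate_key: :last_wins }
end

# Validate caller-supplied options strictly and report all problems together.
def check_options!(options)
  merged = default_options.merge(options)
  errors = []

  unknown = options.keys - default_options.keys
  unless unknown.empty?
    errors << "unknown option#{unknown.size == 1 ? '' : 's'} #{unknown.map(&:inspect).join(', ')}"
  end
  unless [true, false].include?(merged[:symbolize_keys])
    errors << "symbolize_keys must be true or false (got #{merged[:symbolize_keys].inspect})"
  end
  unless %i[last_wins first_wins].include?(merged[:duplicate_key])
    errors << "duplicate_key must be :last_wins or :first_wins (got #{merged[:duplicate_key].inspect})"
  end

  raise ArgumentError, errors.join("; ") unless errors.empty?

  merged
end

begin
  check_options!({ symbolise_keys: true, duplicate_key: :random })
rescue ArgumentError => e
  puts e.message # one message naming both the typo and the bad value
end
```

Reporting everything in a single exception saves the caller from a fix-one-rerun-fix-the-next loop, which appears to be the intent of the "every problem at once" comment.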
@@ -109,6 +109,25 @@ module SmarterJSON
109
109
  end
110
110
  end
111
111
 
112
+ # Smart default for the nil :encoding option. A String tagged ASCII-8BIT (BINARY)
113
+ # is how Net::HTTP and many HTTP libraries hand back a response body even when the
114
+ # bytes are UTF-8. JSON's interchange encoding is UTF-8, so we relabel such input
115
+ # to UTF-8 when its bytes are valid UTF-8 — otherwise string values would come back
116
+ # tagged ASCII-8BIT and compare unequal to UTF-8 literals (a silent footgun). When
117
+ # the bytes are NOT valid UTF-8 we raise EncodingError rather than guess a legacy
118
+ # encoding — pass an explicit :encoding for that. An explicit (non-nil) :encoding,
119
+ # or any non-BINARY tag, is left untouched (the per-path force_encoding / validation
120
+ # handles it). Only relabels — never transcodes.
121
+ def normalize_default_encoding(input, options)
122
+ return input unless options[:encoding].nil?
123
+ return input unless input.encoding == Encoding::ASCII_8BIT
124
+
125
+ utf8 = input.dup.force_encoding(Encoding::UTF_8)
126
+ return utf8 if utf8.valid_encoding?
127
+
128
+ raise EncodingError, "input is tagged ASCII-8BIT and is not valid UTF-8 — pass encoding: to declare its encoding"
129
+ end
130
+
112
131
  # Stream documents from an IO incrementally, yielding each recovered top-level
113
132
  # document without slurping the whole input into memory first.
114
133
  def stream_io(io, options, &block)
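Reviewer's sketch of why the relabeling in `normalize_default_encoding` matters, using only core `String` methods. `relabel_binary_as_utf8` is a hypothetical stand-alone helper that mirrors the logic in the hunk above without the `options` check; it is not part of the gem's API.

```ruby
# Relabel a BINARY-tagged string as UTF-8 when its bytes allow it.
# The bytes are never changed; only the encoding tag is.
def relabel_binary_as_utf8(input)
  return input unless input.encoding == Encoding::ASCII_8BIT

  utf8 = input.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  raise EncodingError, "input is tagged ASCII-8BIT and is not valid UTF-8"
end

literal = "caf\u00e9"  # a UTF-8 string, as a source literal would be
body    = literal.b    # same bytes tagged ASCII-8BIT, as an HTTP body often arrives

puts body == literal                          # false: same bytes, incompatible tags
puts relabel_binary_as_utf8(body) == literal  # true once the tag is UTF-8
```

The first comparison is the "silent footgun" the comment describes: two strings with identical bytes compare unequal when one carries non-ASCII bytes under a BINARY tag.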
@@ -411,6 +430,7 @@ module SmarterJSON
411
430
  module_function
412
431
 
413
432
  def process_string(input, options, &block)
433
+ input = SmarterJSON.send(:normalize_default_encoding, input, options)
414
434
  return SmarterJSON.send(:process_content, input, options, &block) unless input.valid_encoding?
415
435
 
416
436
  # Recovery is REACTIVE: parse first, and only fall back to wrapper extraction when
@@ -1220,6 +1240,12 @@ module SmarterJSON
1220
1240
  b = byte
1221
1241
  return parse_string(DQUOTE) if b == DQUOTE
1222
1242
  return parse_string(SQUOTE) if b == SQUOTE
1243
+
1244
+ # A key may open with a smart/curly quote too (word-processor paste curls keys,
1245
+ # not just values) — route to the same reader values already use.
1246
+ kind = smart_quote_kind(@pos)
1247
+ return parse_smart_string(kind) if kind
1248
+
1223
1249
  raise error("expected a key") unless b && key_start_byte?(b)
1224
1250
 
1225
1251
  parse_identifier_key
@@ -1365,12 +1391,25 @@ module SmarterJSON
1365
1391
  # than 16 significant digits (Oj's DEC_MAX threshold), else Float.
1366
1392
  def decimal_value(body)
1367
1393
  case @decimal_precision
1368
- when :float then body.to_f
1394
+ when :float then float_or_warn(body)
1369
1395
  when :bigdecimal then to_big_decimal(body)
1370
- else significant_digits(body) > 16 ? to_big_decimal(body) : body.to_f
1396
+ else significant_digits(body) > 16 ? to_big_decimal(body) : float_or_warn(body)
1371
1397
  end
1372
1398
  end
1373
1399
 
1400
+ # A finite numeric literal whose magnitude exceeds Float range (e.g. 1e400) becomes
1401
+ # ±Infinity — a silent data change. Report it via :number_overflow (the value is still
1402
+ # returned; we warn rather than raise or invent). The Infinity/NaN *keywords* go through
1403
+ # a separate path and never reach here, so they don't warn.
1404
+ def float_or_warn(body)
1405
+ f = body.to_f
1406
+ # Only test for overflow when an on_warning handler is listening: `f.infinite?` is a
1407
+ # per-float method call we don't want on the hot number path otherwise, and with no
1408
+ # handler the warning would go nowhere anyway. Overflow is vanishingly rare.
1409
+ warn(:number_overflow, "number literal out of Float range — collapsed to #{f}") if @on_warning && f.infinite?
1410
+ f
1411
+ end
1412
+
1374
1413
  # Count significant mantissa digits (leading zeros excluded, exponent ignored) to pick
1375
1414
  # Float vs BigDecimal in :auto mode. A single byte-scan — the old three-regex version
1376
1415
  # (strip exponent, strip non-digits, strip leading zeros, .length) ran on every float
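Reviewer's sketch of the overflow check added in `float_or_warn`, written as a free-standing method so it can be run in isolation. The name `float_with_overflow_check` and the two-argument handler signature are assumptions for illustration; the gem routes the report through its own `warn` helper and `@on_warning` state.

```ruby
# Convert a numeric literal to Float and report overflow to an optional handler.
# The infinite? test is skipped entirely when no handler is given, matching the
# hunk's reasoning that the warning would have nowhere to go.
def float_with_overflow_check(literal, on_warning = nil)
  f = literal.to_f
  if on_warning && f.infinite?
    on_warning.call(:number_overflow, "number literal out of Float range, collapsed to #{f}")
  end
  f
end

seen = []
handler = ->(kind, message) { seen << [kind, message] }

float_with_overflow_check("1.5", handler)    # in range: no warning
float_with_overflow_check("1e400", handler)  # out of range: becomes Infinity and warns
puts seen.length
```

`String#to_f` never raises on an out-of-range literal, so without a check like this the collapse to infinity is invisible to the caller.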
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterJSON
4
- VERSION = "0.9.9"
4
+ VERSION = "1.0.0"
5
5
  end
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_json
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.9
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  bindir: exe
9
9
  cert_chain: []
10
- date: 2026-06-07 00:00:00.000000000 Z
10
+ date: 2026-06-09 00:00:00.000000000 Z
11
11
  dependencies:
12
12
  - !ruby/object:Gem::Dependency
13
13
  name: bigdecimal
@@ -23,10 +23,16 @@ dependencies:
23
23
  - - ">="
24
24
  - !ruby/object:Gem::Version
25
25
  version: '0'
26
- description: 'SmarterJSON is a permissive JSON/JSON5 parser: comments, trailing commas,
27
- different quote styles, Python/JS keywords, and more, all parse to the same Ruby
28
- objects. Purposely no strict mode, always best-effort, blazing fast. Handles BOM,
29
- smart quotes, messy input. Compatible with config/data files and API responses alike.
26
+ description: 'A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON,
27
+ JSON5, HJSON-style config, and the messy JSON-ish input humans and LLMs actually
28
+ write β€” comments, trailing commas, single / unquoted / smart quotes, Python and
29
+ JS keywords, a UTF-8 BOM, and more all parse to the same Ruby objects, with no modes
30
+ or flags to set. Where a traditional parser stops at the first deviation and throws
31
+ away the whole document, SmarterJSON keeps going — it optimizes for getting your
32
+ data out, not for policing the JSON spec. It reads multi-document NDJSON / JSONL
33
+ in one call (and streams it with a block), and in benchmarks its C extension matches
34
+ or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON
35
+ processing to be successful.
30
36
 
31
37
  '
32
38
  email:
@@ -86,6 +92,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
86
92
  requirements: []
87
93
  rubygems_version: 3.6.9
88
94
  specification_version: 4
89
- summary: 'SmarterJSON: A lenient, robust, streaming JSON parser for Ruby supporting
90
- JSON, JSON5, NDJSON, and HJSON-style input.'
95
+ summary: A lenient, fast JSON processor for Ruby — reads strict JSON, NDJSON, JSON5,
96
+ HJSON, and the messy JSON humans and LLMs actually write.
91
97
  test_files: []