smarter_json 0.5.1
- checksums.yaml +7 -0
- data/.gitignore +46 -0
- data/CHANGELOG.md +70 -0
- data/LICENSE.txt +21 -0
- data/README.md +110 -0
- data/Rakefile +22 -0
- data/docs/_introduction.md +48 -0
- data/docs/basic_read_api.md +72 -0
- data/docs/basic_write_api.md +91 -0
- data/docs/examples.md +140 -0
- data/docs/options.md +58 -0
- data/ext/smarter_json/extconf.rb +30 -0
- data/ext/smarter_json/smarter_json.c +1424 -0
- data/ext/smarter_json/smarter_json.h +9 -0
- data/ext/smarter_json/vendor/ryu.h +819 -0
- data/ext/smarter_json/vendor/ryu.md +22 -0
- data/lib/smarter_json/errors.rb +28 -0
- data/lib/smarter_json/generator.rb +117 -0
- data/lib/smarter_json/parser.rb +926 -0
- data/lib/smarter_json/version.rb +5 -0
- data/lib/smarter_json.rb +24 -0
- metadata +86 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: cbc1f54c56cf5a1fd4c569660faf9f4115e5e7ed8442ac9e8e7105bc880a3912
|
|
4
|
+
data.tar.gz: 5e68d7b1dafa55347cf5de1ee10e8ac39a97b645996fd0c58538799bbbb1191d
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 0e84fb4caf1fc9b192aa0e88f5111c3178557078abd8a31e7ce73373590e2487344c8df9ad61e718756e78a15239b460fd05b5f7a6fdbede35d8859f1873f5be
|
|
7
|
+
data.tar.gz: 9d808b8b3e8465ce7b12ae861053e67a77ef1550df425c6383ffe4b799ba401a6f927f692b09709942fdb9e690a1c9d25a5f4453d57b9b532e18f979d171199d
|
data/.gitignore
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# General ignores for a Ruby gem
|
|
2
|
+
*.gem
|
|
3
|
+
|
|
4
|
+
# Ignore Bundler lockfile for gems (leave out for gems, include for apps)
|
|
5
|
+
Gemfile.lock
|
|
6
|
+
|
|
7
|
+
# Ignore RSpec status persistence
|
|
8
|
+
.rspec_status
|
|
9
|
+
|
|
10
|
+
# Ignore build output
|
|
11
|
+
/pkg/
|
|
12
|
+
/tmp/
|
|
13
|
+
/log/
|
|
14
|
+
/coverage/
|
|
15
|
+
|
|
16
|
+
# Ignore RuboCop, Yard, docs, bundler
|
|
17
|
+
/.bundle/
|
|
18
|
+
.yardoc
|
|
19
|
+
/.yardoc/
|
|
20
|
+
/.byebug_history
|
|
21
|
+
/.pry_history
|
|
22
|
+
/.irb-history
|
|
23
|
+
|
|
24
|
+
# Ignore Mac, editor, IDE artifacts
|
|
25
|
+
.DS_Store
|
|
26
|
+
*~
|
|
27
|
+
*.swp
|
|
28
|
+
*.swo
|
|
29
|
+
*.tmp
|
|
30
|
+
.vscode/
|
|
31
|
+
.idea/
|
|
32
|
+
|
|
33
|
+
# Ignore node_modules just in case
|
|
34
|
+
node_modules/
|
|
35
|
+
|
|
36
|
+
# Ignore test coverage output
|
|
37
|
+
coverage/
|
|
38
|
+
|
|
39
|
+
# Ignore binary object files, extensions
|
|
40
|
+
*.so
|
|
41
|
+
*.o
|
|
42
|
+
*.bundle
|
|
43
|
+
*.rbc
|
|
44
|
+
|
|
45
|
+
.claude/
|
|
46
|
+
CLAUDE.md
|
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
|
|
2
|
+
# SmarterJSON Change Log
|
|
3
|
+
|
|
4
|
+
## 0.5.1 (2026-06-01)
|
|
5
|
+
- Unified the error classes under a single `SmarterJSON::Error` base: `ParseError` and `EncodingError` now inherit from it, and `generate` raises a new `GenerateError`. `rescue SmarterJSON::Error` now catches everything the gem raises.
|
|
6
|
+
- Added a CI test matrix (Ruby 2.6–4.0 + head, on Ubuntu and macOS).
|
|
7
|
+
- Fixed the C extension build on Ruby 2.6 (declare `rb_hash_bulk_insert`, which 2.6 exports but does not declare in its headers); set the minimum Ruby to 2.6.
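The unified error hierarchy from the 0.5.1 entry can be pictured with a minimal sketch. The module name `SketchJSON` is a hypothetical stand-in, and the exact parent of each class is an assumption; the entry only states that `ParseError` and `EncodingError` inherit from the common base and that `generate` raises `GenerateError`.

```ruby
# Illustrative only: SketchJSON is a hypothetical stand-in, not the gem's source.
module SketchJSON
  class Error < StandardError; end        # single base class
  class ParseError < Error; end           # raised while reading
  class EncodingError < ParseError; end   # assumed here to be a kind of ParseError
  class GenerateError < Error; end        # raised while writing
end

# One rescue clause on the base class catches every subclass.
def classify
  yield
  :ok
rescue SketchJSON::Error => e
  e.class.name
end

classify { raise SketchJSON::EncodingError, "invalid byte" } # => "SketchJSON::EncodingError"
classify { :nothing_raised }                                 # => :ok
```

Rescuing the base class is the usual choice at an application boundary; rescue a subclass when read and write failures need different handling.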
|
|
8
|
+
|
|
9
|
+
## 0.5.0 (2026-05-31 unreleased)
|
|
10
|
+
- add JSON generation, incl. NDJSON generation
|
|
11
|
+
- add test coverage
|
|
12
|
+
|
|
13
|
+
## 0.4.0 (2026-05-31 unreleased)
|
|
14
|
+
- rename `flex_json` -> `smarter_json`
|
|
15
|
+
|
|
16
|
+
## 0.3.10 (2026-05-31 unreleased)
|
|
17
|
+
- change interface to use `.process` and `.process_file`
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
## 0.3.9 (2026-05-31 unreleased)
|
|
21
|
+
- `parse` (no block) now handles any input automatically: 0 documents (empty / whitespace / comment-only) → `nil`, 1 document → the value itself, 2+ documents (NDJSON / JSONL / concatenated / whitespace-separated) → an Array of the values. It no longer raises on trailing content.
|
|
22
|
+
- Detection is free (the same trailing-content check that used to raise) and the single-document path allocates no Array, so single-value parsing is unchanged in speed.
|
|
23
|
+
- The block form (`parse(input) { |doc| … }`) is kept as the bounded-memory streaming path. `parse_file(path) { |doc| … }` now forwards the block too, so files stream the same way (previously the block was silently ignored). Bracketless comma lists (`1, 2, 3`) still raise — commas don't separate top-level documents (implicit-root array remains unsupported).
|
|
24
|
+
- The block form allows individual processing of each line in NDJSON files.
|
|
25
|
+
- Supersedes the earlier "raise on trailing content, match Oj" behavior.
|
|
26
|
+
|
|
27
|
+
## 0.3.8 (2026-05-30 unreleased)
|
|
28
|
+
- Reordered single-character checks so the more common byte is tested first (`-` before `+`).
|
|
29
|
+
- Quoteless-token boundary scan now uses a 256-byte class table: ordinary bytes are classified in one table lookup, and the lookahead byte is read only at a `#`/`/` instead of on every byte. Speeds up quoteless / config-style input (the lenient case the JSON benchmarks don't exercise).
|
|
30
|
+
|
|
31
|
+
## 0.3.7 (2026-05-30 unreleased)
|
|
32
|
+
- Escaped-string literal runs are bulk-copied with the NEON scanner instead of one byte at a time.
|
|
33
|
+
- Added branch hints (`__builtin_expect`) and prefetch to the hot string-scan loop. Sped up string-heavy files (string_array, github_events, twitter all 12–16% faster).
|
|
34
|
+
|
|
35
|
+
## 0.3.6 (2026-05-30 unreleased)
|
|
36
|
+
- Fast path for plain numbers inside objects/arrays (`fj_try_member_number`): one scan straight from the cursor, committing when the number meets a delimiter and falling back to the quoteless scanner otherwise. Skips the quoteless boundary scan + classify dispatch for the common case. Broad gains on number-in-container files (weather, canada, usgs, big_decimals).
|
|
37
|
+
|
|
38
|
+
## 0.3.5 (2026-05-30 unreleased)
|
|
39
|
+
- Rewrote `fj_parse_number` (top-level numbers) as a single pass: finds the token end and accumulates the mantissa/exponent at once, using the string's NUL terminator as a scan sentinel (no per-byte bounds check) and a digit loop that skips the underscore check until an underscore actually appears.
|
|
40
|
+
- Added `fj_try_decimal` for the quoteless path: validates and extracts the number in one scan, replacing the old three scans (validate + significant-digit count + mantissa extraction); skips the significant-digit scan when the number has ≤16 digits.
|
|
41
|
+
- Both number paths now build values through the shared `fj_int_from_parts` / `fj_float_from_parts` helpers so they can't drift; removed the now-dead `fj_validate_decimal` / `fj_int_value` / `fj_decimal_value`.
|
|
42
|
+
|
|
43
|
+
## 0.3.4 (2026-05-30 unreleased)
|
|
44
|
+
- Dropped a per-member Ruby method call (`key?`) that fired for every object member under the default duplicate-key mode — pure waste on object-heavy files (twitter, github_events, citm).
|
|
45
|
+
- Build objects and arrays from a C value stack with a pre-sized hash + bulk insert (and size-based duplicate detection), instead of inserting one member/element at a time.
|
|
46
|
+
- Added a per-parse key cache so repeated object keys are interned once instead of every occurrence.
|
|
47
|
+
|
|
48
|
+
## 0.3.3 (2026-05-30 unreleased)
|
|
49
|
+
- Vendored Ryū (Ulf Adams, Apache-2.0) for correctly-rounded string→double conversion: the mantissa is accumulated in one pass and converted with no `strtod`. Large win on float-heavy files (canada, big_decimals).
|
|
50
|
+
|
|
51
|
+
## 0.3.3 (2026-05-29 unreleased)
|
|
52
|
+
- performance fixes
|
|
53
|
+
|
|
54
|
+
## 0.3.2 (2026-05-29 unreleased)
|
|
55
|
+
- performance fixes
|
|
56
|
+
|
|
57
|
+
## 0.3.1 (2026-05-29 unreleased)
|
|
58
|
+
- performance fixes
|
|
59
|
+
|
|
60
|
+
## 0.3.0 (2026-05-29 unreleased)
|
|
61
|
+
- iterative parser
|
|
62
|
+
|
|
63
|
+
## 0.2.0 (2026-05-29 unreleased)
|
|
64
|
+
- recursive parser
|
|
65
|
+
|
|
66
|
+
## 0.1.1 (2026-05-29 unreleased)
|
|
67
|
+
- MVP complete
|
|
68
|
+
|
|
69
|
+
## 0.1.0 (2026-05-28 unreleased)
|
|
70
|
+
- Initial Ruby version
|
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
The MIT License (MIT)
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Tilo Sloboda
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
|
13
|
+
all copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,110 @@
|
|
|
1
|
+
# SmarterJSON
|
|
2
|
+
|
|
3
|
+
|
|
4
|
+
|
|
5
|
+
A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to succeed. Other parsers are strict and stop at the first deviation; SmarterJSON keeps going, because it optimizes for getting your data out, not for policing the JSON spec.
|
|
6
|
+
|
|
7
|
+
## Why SmarterJSON?
|
|
8
|
+
|
|
9
|
+
Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed.** Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas — it just parses it.
|
|
10
|
+
|
|
11
|
+
Three things set it apart:
|
|
12
|
+
|
|
13
|
+
1. **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
|
|
14
|
+
|
|
15
|
+
2. **It parses multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: one document returns its value, several documents return an `Array`, empty input returns `nil`. **Only SmarterJSON parses multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time.
|
|
16
|
+
|
|
17
|
+
3. **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and keeps it competitive with the stdlib `json` C parser, which is the fastest general-purpose Ruby JSON parser.
|
|
18
|
+
|
|
19
|
+
## What it accepts, beyond strict JSON
|
|
20
|
+
|
|
21
|
+
- `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
|
|
22
|
+
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
|
|
23
|
+
- Implicit root object — a config file that starts with `key: value`, no outer `{}`
|
|
24
|
+
- `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
|
|
25
|
+
- UTF-8 BOM, smart/curly quotes, Python literals (`True` / `False` / `None`), JavaScript `undefined`
|
|
26
|
+
- Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
|
|
27
|
+
- Duplicate keys (last value wins by default; configurable)
|
|
28
|
+
|
|
29
|
+
It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
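The comment rule in the list above (a `#` or `//` starts a comment only when preceded by whitespace) can be sketched in plain Ruby. `strip_line_comment` is a hypothetical helper written for illustration, not the gem's scanner; it also treats a marker at the very start of a line as a comment, and it ignores quoting entirely, both of which are assumptions.

```ruby
# Hypothetical helper, not part of smarter_json.
# A `#` or `//` counts as a comment start only at the beginning of the line
# or directly after a whitespace character.
def strip_line_comment(line)
  idx = line.index(%r{(?:\A|(?<=\s))(?:#|//)})
  idx ? line[0...idx].rstrip : line.rstrip
end

strip_line_comment("url: http://x.com")     # => "url: http://x.com"
strip_line_comment("port: 5432 # database") # => "port: 5432"
strip_line_comment("retries: 3 // default") # => "retries: 3"
```

The lookbehind is what keeps the `//` inside a URL from truncating the value: it follows `:`, not whitespace.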
|
|
30
|
+
|
|
31
|
+
## Installation
|
|
32
|
+
|
|
33
|
+
```ruby
|
|
34
|
+
# Gemfile
|
|
35
|
+
gem "smarter_json"
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
gem install smarter_json
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.
|
|
43
|
+
|
|
44
|
+
## Documentation
|
|
45
|
+
|
|
46
|
+
* [Introduction](docs/_introduction.md)
|
|
47
|
+
* [The Basic Read API](docs/basic_read_api.md)
|
|
48
|
+
* [The Basic Write API](docs/basic_write_api.md)
|
|
49
|
+
* [Configuration Options](docs/options.md)
|
|
50
|
+
* [Examples](docs/examples.md)
|
|
51
|
+
|
|
52
|
+
## Usage
|
|
53
|
+
|
|
54
|
+
```ruby
|
|
55
|
+
require "smarter_json"
|
|
56
|
+
|
|
57
|
+
SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
|
|
58
|
+
SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
|
|
59
|
+
SmarterJSON.process_file("config.json5") # read a file, then parse
|
|
60
|
+
|
|
61
|
+
# Multiple documents (NDJSON / JSONL / concatenated) — no block, no special method:
|
|
62
|
+
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
|
|
63
|
+
SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
|
|
64
|
+
SmarterJSON.process("") # => nil (zero documents)
|
|
65
|
+
|
|
66
|
+
# For input larger than memory, stream one document at a time with a block
|
|
67
|
+
# (process and process_file both forward the block):
|
|
68
|
+
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Options
|
|
72
|
+
|
|
73
|
+
| option | default | meaning |
|
|
74
|
+
|-------------------|--------------|-------------------------------------------------------------------------|
|
|
75
|
+
| `symbolize_keys` | `false` | return object keys as Symbols instead of Strings |
|
|
76
|
+
| `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` / `:raise` for repeated keys in one object |
|
|
77
|
+
| `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
|
|
78
|
+
| `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
|
|
79
|
+
| `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
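The three `duplicate_key` policies in the table can be illustrated with a small sketch. `build_object` is a hypothetical helper that folds key/value pairs into a Hash; it shows the semantics only and makes no claim about how the gem implements them (the error class used here is also an assumption).

```ruby
# Hypothetical helper illustrating the three duplicate_key policies.
def build_object(pairs, duplicate_key: :last_wins)
  pairs.each_with_object({}) do |(key, value), result|
    if result.key?(key)
      case duplicate_key
      when :first_wins then next # keep the earlier value
      when :raise then raise ArgumentError, "duplicate key: #{key.inspect}"
      end
    end
    result[key] = value # first occurrence, or :last_wins overwriting
  end
end

pairs = [["a", 1], ["b", 2], ["a", 3]]
build_object(pairs)                             # => {"a"=>3, "b"=>2}
build_object(pairs, duplicate_key: :first_wins) # => {"a"=>1, "b"=>2}
```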
|
|
80
|
+
|
|
81
|
+
## Performance
|
|
82
|
+
|
|
83
|
+
Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are **SmarterJSON/C** vs **Oj/strict** vs **stdlib `json`**, all producing `Float` (run `rake report` in `json_benchmarks/` for the full table — numbers vary run to run).
|
|
84
|
+
|
|
85
|
+
- **vs Oj:** SmarterJSON/C matches or beats Oj on nearly every file — typically **1.1–1.7× faster** (e.g. deeply-nested ~1.7×, citm ~1.3×, twitter ~1.3×, usgs/weather ~1.2–1.3×).
|
|
86
|
+
- **vs stdlib `json` (C):** competitive with the fastest Ruby JSON parser — it matches `json` on number- and string-heavy files (e.g. big_decimals, string_array) and trails by ~1.2–1.6× on others.
|
|
87
|
+
- **Numbers:** floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
|
|
88
|
+
|
|
89
|
+
**Two notes on fair comparison:**
|
|
90
|
+
|
|
91
|
+
- **NDJSON:** on multi-document files, **only SmarterJSON parses the input via plain `process`** — Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once — use it for input larger than RAM.
|
|
92
|
+
- **High-precision decimals (e.g. `canada.json`):** SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
|
|
93
|
+
|
|
94
|
+
## Encoding
|
|
95
|
+
|
|
96
|
+
`encoding:` (default `"UTF-8"`) labels what the input is — it does **not** trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way `smarter_csv` handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
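The difference between labeling an encoding and transcoding can be seen with Ruby's own String API. This is a sketch of the general idea using `force_encoding`, `valid_encoding?`, and `encode` from core Ruby; it does not show the gem's internals.

```ruby
raw = "caf\xE9".b # the four bytes of "café" in ISO-8859-1

# Labeling: the bytes stay exactly as they are, only the encoding tag changes.
labeled = raw.dup.force_encoding("ISO-8859-1")
labeled.bytes == raw.bytes # => true
labeled.valid_encoding?    # => true

# A wrong label is detectable: these bytes are not valid UTF-8.
mislabeled = raw.dup.force_encoding("UTF-8")
mislabeled.valid_encoding? # => false

# Transcoding, by contrast, rewrites the bytes (é becomes two bytes in UTF-8).
transcoded = labeled.encode("UTF-8")
transcoded.bytesize # => 5
```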
|
|
97
|
+
|
|
98
|
+
## Nesting & untrusted input
|
|
99
|
+
|
|
100
|
+
Both the C extension and the pure-Ruby parser are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) parses without issue.
|
|
101
|
+
|
|
102
|
+
The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you parse untrusted input and want a hard cap, that's a planned opt-in guard — for now, size-limit upstream of the parser.
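One way to size-limit upstream of the parser is a cheap pre-check on byte size and bracket depth. The sketch below is a hypothetical guard, not part of the gem: `max_nesting_depth` and `within_limits?` are illustrative names, the scan uses a plain counter instead of recursion, and it only understands double-quoted strings, so lenient quoting styles would need extra handling.

```ruby
# Hypothetical pre-check, not part of smarter_json.
# Measures bracket nesting with a counter (no recursion), skipping
# the contents of double-quoted strings.
def max_nesting_depth(text)
  depth = 0
  deepest = 0
  in_string = false
  escaped = false
  text.each_char do |ch|
    if in_string
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == "{" || ch == "["
      depth += 1
      deepest = depth if depth > deepest
    elsif ch == "}" || ch == "]"
      depth -= 1 if depth > 0
    end
  end
  deepest
end

# Reject input that is too large or too deeply nested before parsing it.
def within_limits?(text, max_bytes:, max_depth:)
  text.bytesize <= max_bytes && max_nesting_depth(text) <= max_depth
end

within_limits?('{"a": [1, {"b": "]]]["}]}', max_bytes: 1_000, max_depth: 10) # => true
within_limits?("[" * 1_000, max_bytes: 10_000, max_depth: 100)               # => false
```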
|
|
103
|
+
|
|
104
|
+
## Development
|
|
105
|
+
|
|
106
|
+
After checking out the repo, run `bin/setup` to install dependencies, then `rake compile` to build the C extension and `rake spec` to run the tests. The test suite runs every example against **both** the C and pure-Ruby paths, so the two stay behavior-identical.
|
|
107
|
+
|
|
108
|
+
## License
|
|
109
|
+
|
|
110
|
+
Available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
data/Rakefile
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "bundler/gem_tasks"
|
|
4
|
+
require "rspec/core/rake_task"
|
|
5
|
+
|
|
6
|
+
RSpec::Core::RakeTask.new(:spec)
|
|
7
|
+
|
|
8
|
+
require "rubocop/rake_task"
|
|
9
|
+
|
|
10
|
+
RuboCop::RakeTask.new
|
|
11
|
+
|
|
12
|
+
require "rake/extensiontask"
|
|
13
|
+
Rake::ExtensionTask.new("smarter_json") do |ext|
|
|
14
|
+
ext.ext_dir = "ext/smarter_json"
|
|
15
|
+
ext.lib_dir = "lib/smarter_json"
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
task spec: :compile
|
|
19
|
+
# rubocop is NOT in the default task: `bundle exec rake` = build + test only, so it
|
|
20
|
+
# runs on every Ruby in the CI matrix (incl. 2.6–2.7, where the latest rubocop won't
|
|
21
|
+
# install). Lint runs as its own CI step on one modern Ruby. Locally: `rake rubocop`.
|
|
22
|
+
task default: %i[clobber compile spec]
|
data/docs/_introduction.md
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
|
|
2
|
+
### Contents
|
|
3
|
+
|
|
4
|
+
* [**Introduction**](./_introduction.md)
|
|
5
|
+
* [The Basic Read API](./basic_read_api.md)
|
|
6
|
+
* [The Basic Write API](./basic_write_api.md)
|
|
7
|
+
* [Configuration Options](./options.md)
|
|
8
|
+
* [Examples](./examples.md)
|
|
9
|
+
|
|
10
|
+
--------------
|
|
11
|
+
|
|
12
|
+
# SmarterJSON Introduction
|
|
13
|
+
|
|
14
|
+
`smarter_json` is a fast, lenient JSON parser and writer for Ruby. It reads strict JSON, JSON5, HJSON-style config, newline-delimited JSON (NDJSON / JSONL), and the messy JSON-ish input humans actually paste — and in benchmarks it matches or beats Oj on nearly every file. It is opinionated: it optimizes for getting your data out, not for policing the JSON spec. Where other parsers stop at the first deviation, SmarterJSON keeps going.
|
|
15
|
+
|
|
16
|
+
## Why another JSON library?
|
|
17
|
+
|
|
18
|
+
Most JSON parsers reject anything that isn't perfectly strict JSON, and they make you tell them up front what shape the input is. SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed.** Give it strict JSON, JSON5, an HJSON-style config file, several concatenated documents, or a copy-pasted blob with comments and trailing commas — it just reads it.
|
|
19
|
+
|
|
20
|
+
## What sets it apart
|
|
21
|
+
|
|
22
|
+
* **One reader, no modes, no flags.** There is no `dialect:` option and no "strict mode" — `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the reader to match your input; it adapts to whatever you give it.
|
|
23
|
+
|
|
24
|
+
* **It reads multi-document input automatically — a distinguishing feature.** `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with **no block and no special method**: zero documents returns `nil`, one document returns its value, two or more return an `Array`. **Only SmarterJSON reads multi-document input via plain `process` — Oj and the stdlib `json` library raise without a block.** For input larger than memory, pass a block to stream one document at a time. See [The Basic Read API](./basic_read_api.md).
|
|
25
|
+
|
|
26
|
+
* **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and competitive with the stdlib `json` C parser. Floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
|
|
27
|
+
|
|
28
|
+
* **It writes JSON too.** `SmarterJSON.generate` turns Ruby values into strict, interoperable JSON — or into NDJSON, one element per line, the exact inverse of reading NDJSON back into an Array. See [The Basic Write API](./basic_write_api.md).
|
|
29
|
+
|
|
30
|
+
## What it accepts, beyond strict JSON
|
|
31
|
+
|
|
32
|
+
* `//`, `/* … */`, and `#` comments (a `#`/`//` only starts a comment when preceded by whitespace, so `url: http://x.com` reads as a string, not a truncated value)
|
|
33
|
+
* Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
|
|
34
|
+
* Implicit root object — a config file that starts with `key: value`, no outer `{}`
|
|
35
|
+
* `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
|
|
36
|
+
* UTF-8 BOM, smart/curly quotes, Python literals (`True` / `False` / `None`), JavaScript `undefined`
|
|
37
|
+
* Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
|
|
38
|
+
* Duplicate keys (last value wins by default; configurable — see [Configuration Options](./options.md))
|
|
39
|
+
|
|
40
|
+
It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
|
|
41
|
+
|
|
42
|
+
## Nesting & untrusted input
|
|
43
|
+
|
|
44
|
+
Both the C extension and the pure-Ruby parser are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (the stdlib `json` caps at 100). The trade-off: there is currently **no fixed nesting or input-size limit**, so size-limit untrusted input upstream of the parser.
|
|
45
|
+
|
|
46
|
+
---------------
|
|
47
|
+
|
|
48
|
+
NEXT: [The Basic Read API](./basic_read_api.md) | UP: [README](../README.md)
|
data/docs/basic_read_api.md
ADDED
|
@@ -0,0 +1,72 @@
|
|
|
1
|
+
|
|
2
|
+
### Contents
|
|
3
|
+
|
|
4
|
+
* [Introduction](./_introduction.md)
|
|
5
|
+
* [**The Basic Read API**](./basic_read_api.md)
|
|
6
|
+
* [The Basic Write API](./basic_write_api.md)
|
|
7
|
+
* [Configuration Options](./options.md)
|
|
8
|
+
* [Examples](./examples.md)
|
|
9
|
+
|
|
10
|
+
--------------
|
|
11
|
+
|
|
12
|
+
# SmarterJSON Basic Read API
|
|
13
|
+
|
|
14
|
+
Reading JSON has one entry point for content and one for files. Both accept the same [options](./options.md), and both take an optional block for streaming.
|
|
15
|
+
|
|
16
|
+
## `SmarterJSON.process` — read a String or an IO
|
|
17
|
+
|
|
18
|
+
```ruby
|
|
19
|
+
require "smarter_json"
|
|
20
|
+
|
|
21
|
+
SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
|
|
22
|
+
SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432} (no braces needed)
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
`process` is polymorphic: its first argument is **either a String of JSON content or an IO to read from**. A String is always treated as content, never as a filename — use `process_file` for paths.
|
|
26
|
+
|
|
27
|
+
```ruby
|
|
28
|
+
SmarterJSON.process(io) # an open IO (File, StringIO, socket, …) — reads it and parses
|
|
29
|
+
SmarterJSON.process(some_string) # JSON content
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
### Return value depends on how many documents the input holds
|
|
33
|
+
|
|
34
|
+
This is the distinguishing feature: `process` reads multi-document input (NDJSON / JSONL / concatenated / whitespace-separated) automatically, with no block and no special method.
|
|
35
|
+
|
|
36
|
+
```ruby
|
|
37
|
+
SmarterJSON.process("") # => nil (zero documents)
|
|
38
|
+
SmarterJSON.process('{"id":1}') # => {"id"=>1} (one document → the value itself)
|
|
39
|
+
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}] (two or more → an Array)
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
Documents are separated by whitespace, newlines, or simple concatenation — **not** by commas (a comma between top-level documents would be read as an implicit root array, which is not supported). Only SmarterJSON reads this via plain `process`: Oj and the stdlib `json` library raise without a block.
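The return-value rule (zero documents, one document, two or more) can be sketched with the stdlib `json` parser. Both helpers below are hypothetical and exist only to show the rule; the one-document-per-line split is a simplifying assumption and does not cover concatenated or whitespace-separated documents.

```ruby
require "json"

# Hypothetical helper: collapse a list of parsed documents as described above
# (0 documents -> nil, 1 -> the value itself, 2+ -> an Array).
def collapse_documents(docs)
  case docs.length
  when 0 then nil
  when 1 then docs.first
  else docs
  end
end

# Assumption: one document per non-blank line, parsed with the stdlib parser.
def read_documents(text)
  docs = text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
  collapse_documents(docs)
end

read_documents("")                      # => nil
read_documents('{"id":1}')              # => {"id"=>1}
read_documents(%({"id":1}\n{"id":2}\n)) # => [{"id"=>1}, {"id"=>2}]
```

Callers that always want an Array can wrap the result, for example with `Array(result)` when the documents are Hashes or `nil`.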
|
|
43
|
+
|
|
44
|
+
## `SmarterJSON.process_file` — read a file by path
|
|
45
|
+
|
|
46
|
+
```ruby
|
|
47
|
+
SmarterJSON.process_file("config.json5") # read the file, then parse — same return-value rules as process
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
`process_file` opens the file, reads it with the labeled [`encoding:`](./options.md) (default `"UTF-8"`, no transcoding pass), and parses it.
|
|
51
|
+
|
|
52
|
+
## Streaming with a block (bounded memory)
|
|
53
|
+
|
|
54
|
+
For input larger than memory, pass a block. Each top-level document is yielded as it is read, and the method returns `nil` (it never collects the documents into an Array). Both `process` and `process_file` forward the block.
|
|
55
|
+
|
|
56
|
+
```ruby
|
|
57
|
+
# Stream straight from disk, one document at a time — the whole file is never loaded:
|
|
58
|
+
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
59
|
+
|
|
60
|
+
# Same for an IO:
|
|
61
|
+
SmarterJSON.process(io) { |doc| handle(doc) }
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
The streaming path reads the input as newline-delimited documents (NDJSON / JSONL), one document per line. A single document that spans multiple lines is not supported by the streaming path — read it without a block instead.
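The bounded-memory idea behind the block form can be sketched with core Ruby and the stdlib `json` parser: read one line, parse it, yield it, and keep nothing. `each_document` is a hypothetical name used for illustration and is not the gem's implementation.

```ruby
require "json"
require "stringio"

# Hypothetical sketch: yield one parsed document per non-blank line.
# Nothing is collected, so memory use is bounded by the longest line.
def each_document(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty?
    yield JSON.parse(line)
  end
  nil # like the block form described above, return nil rather than an Array
end

ids = []
result = each_document(StringIO.new(%({"id":1}\n\n{"id":2}\n))) { |doc| ids << doc["id"] }
# ids is now [1, 2] and result is nil
```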
|
|
65
|
+
|
|
66
|
+
## The C extension and the pure-Ruby fallback
|
|
67
|
+
|
|
68
|
+
By default (`acceleration: :auto`) the C extension is used when it is compiled and loadable (`SmarterJSON::HAS_ACCELERATION` is then `true`); otherwise the pure-Ruby parser runs and produces identical results. Pass `acceleration: false` to force the pure-Ruby path. See [Configuration Options](./options.md).
|
|
69
|
+
|
|
70
|
+
---------------
|
|
71
|
+
|
|
72
|
+
PREVIOUS: [Introduction](./_introduction.md) | NEXT: [The Basic Write API](./basic_write_api.md) | UP: [README](../README.md)
|
data/docs/basic_write_api.md
ADDED
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
|
|
2
|
+
### Contents
|
|
3
|
+
|
|
4
|
+
* [Introduction](./_introduction.md)
|
|
5
|
+
* [The Basic Read API](./basic_read_api.md)
|
|
6
|
+
* [**The Basic Write API**](./basic_write_api.md)
|
|
7
|
+
* [Configuration Options](./options.md)
|
|
8
|
+
* [Examples](./examples.md)
|
|
9
|
+
|
|
10
|
+
--------------
|
|
11
|
+
|
|
12
|
+
# SmarterJSON Basic Write API
|
|
13
|
+
|
|
14
|
+
Writing JSON has one entry point: `SmarterJSON.generate`. It turns a Ruby value into a JSON String — strict, interoperable output by default, or NDJSON when you ask for it.
|
|
15
|
+
|
|
16
|
+
## `SmarterJSON.generate` — write a Ruby value as JSON
|
|
17
|
+
|
|
18
|
+
```ruby
|
|
19
|
+
require "smarter_json"
|
|
20
|
+
|
|
21
|
+
SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
|
|
22
|
+
SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
|
|
23
|
+
SmarterJSON.generate("hi") # => '"hi"'
|
|
24
|
+
SmarterJSON.generate(42) # => '42'
|
|
25
|
+
SmarterJSON.generate(nil) # => 'null'
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
The output is always **valid, strict JSON** — there is no lenient write mode. (We are lenient about what we *read*, strict about what we *write*, so the output interoperates with every other JSON parser.)
|
|
29
|
+
|
|
30
|
+
## How Ruby values map to JSON
|
|
31
|
+
|
|
32
|
+
| Ruby | JSON output |
|
|
33
|
+
|----------------------------------------|---------------------------------------------------------|
|
|
34
|
+
| `Hash` | object `{…}` — keys are stringified (Symbol keys too) |
|
|
35
|
+
| `Array` | array `[…]` |
|
|
36
|
+
| `String` | quoted string, escaped (see below) |
|
|
37
|
+
| `Symbol` | quoted string (`:sym` → `"sym"`) |
|
|
38
|
+
| `Integer` | number |
|
|
39
|
+
| `Float` | number (non-finite raises — see below) |
|
|
40
|
+
| `BigDecimal` | number, full precision (not a string) |
|
|
41
|
+
| `true` / `false` / `nil` | `true` / `false` / `null` |
|
|
42
|
+
|
|
43
|
+
```ruby
|
|
44
|
+
SmarterJSON.generate({ a: 1, b: :sym }) # => '{"a":1,"b":"sym"}' (Symbol key and value → strings)
|
|
45
|
+
SmarterJSON.generate(BigDecimal("65.613616999999977")) # => '65.613616999999977' (a number, full precision)
|
|
46
|
+
SmarterJSON.generate("café\tx") # => '"café\tx"' (control chars escaped, UTF-8 raw)
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
Strings escape `"`, `\`, and the control characters `0x00–0x1F`; everything else — including multi-byte UTF-8 — is emitted raw, which is valid JSON.
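The escaping rule can be sketched as a small pure-Ruby function. `escape_json_string` is a hypothetical helper; the choice of short escapes (`\n`, `\t`, and so on) where JSON defines them and `\u00XX` for the remaining control characters is an assumption about output style, and either form is valid JSON.

```ruby
# Hypothetical helper, not the gem's generator.
SHORT_ESCAPES = {
  "\"" => "\\\"", "\\" => "\\\\",
  "\b" => "\\b", "\f" => "\\f", "\n" => "\\n", "\r" => "\\r", "\t" => "\\t"
}.freeze

# Escape `"`, `\`, and control characters 0x00-0x1F; pass everything else
# (including multi-byte UTF-8) through unchanged.
def escape_json_string(str)
  body = str.gsub(/["\\\x00-\x1f]/) do |ch|
    SHORT_ESCAPES.fetch(ch) { format("\\u%04x", ch.ord) }
  end
  "\"#{body}\""
end

escape_json_string("café\tx") # => '"café\tx"'
escape_json_string("\u0001")  # => '"\u0001"'
```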
|
|
50
|
+
|
|
51
|
+
## What raises
|
|
52
|
+
|
|
53
|
+
`generate` raises `SmarterJSON::GenerateError` (a subclass of `SmarterJSON::Error`) on input it cannot represent as strict JSON:
|
|
54
|
+
|
|
55
|
+
```ruby
|
|
56
|
+
SmarterJSON.generate(Time.now) # raises SmarterJSON::Error — unsupported type
|
|
57
|
+
SmarterJSON.generate(Float::INFINITY) # raises SmarterJSON::Error — non-finite Float
|
|
58
|
+
SmarterJSON.generate(Float::NAN) # raises SmarterJSON::Error — non-finite Float
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
(`Infinity` and `NaN` are accepted on the *read* side as a leniency, but they are not valid JSON to *write*.)
|
|
62
|
+
|
|
63
|
+
## Writing NDJSON
|
|
64
|
+
|
|
65
|
+
Pass `format: :ndjson` to write newline-delimited JSON. An `Array` writes **one element per line**; any other value writes as a single line. For an Array of two or more elements this is the inverse of [reading NDJSON](./basic_read_api.md) back into an Array; an empty Array writes no lines, and a one-element Array writes a single document.
|
|
66
|
+
|
|
67
|
+
```ruby
|
|
68
|
+
SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
|
|
69
|
+
SmarterJSON.generate({ "id" => 1 }, format: :ndjson) # => "{\"id\":1}\n" (single value → one line)
|
|
70
|
+
SmarterJSON.generate([], format: :ndjson) # => "" (empty array → no lines)
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
Note the difference from the default `format: :json`, where a top-level Array is written as a single JSON array (`[…]`), not as NDJSON. See [Configuration Options](./options.md) for the full list of writer options.
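
The line-per-element rule can be sketched with Ruby's standard `json` library. This is a stand-in for illustration, not SmarterJSON's own code, and `to_ndjson` is a hypothetical name:

```ruby
require "json"

# Hypothetical stand-in built on the stdlib JSON module.
# An Array contributes one line per element; any other value is one line.
def to_ndjson(value)
  elements = value.is_a?(Array) ? value : [value]
  elements.map { |element| JSON.generate(element) + "\n" }.join
end

to_ndjson([{ "id" => 1 }, { "id" => 2 }]) # => "{\"id\":1}\n{\"id\":2}\n"
to_ndjson({ "id" => 1 })                  # => "{\"id\":1}\n"
to_ndjson([])                             # => ""
```

Every line, including the last, ends with a newline, so files written this way can be appended to without first checking for a trailing newline.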
|
|
74
|
+
|
|
75
|
+
## Round-tripping
|
|
76
|
+
|
|
77
|
+
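`process` and `generate` are inverses for values like the ones below. Symbols are written as strings, so by default they read back as Strings: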
|
|
78
|
+
|
|
79
|
+
```ruby
|
|
80
|
+
obj = { "a" => 1, "b" => [2, "three", nil, true] }
|
|
81
|
+
SmarterJSON.process(SmarterJSON.generate(obj)) == obj # => true
|
|
82
|
+
|
|
83
|
+
arr = [{ "id" => 1 }, { "id" => 2 }, { "id" => 3 }]
|
|
84
|
+
SmarterJSON.process(SmarterJSON.generate(arr, format: :ndjson)) == arr # => true
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
Check out the [RSpec tests](../spec/generator_spec.rb) for more examples.
|
|
88
|
+
|
|
89
|
+
---------------
|
|
90
|
+
|
|
91
|
+
PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Configuration Options](./options.md) | UP: [README](../README.md)
|
data/docs/examples.md
ADDED
|
@@ -0,0 +1,140 @@
|
|
|
1
|
+
|
|
2
|
+
### Contents
|
|
3
|
+
|
|
4
|
+
* [Introduction](./_introduction.md)
|
|
5
|
+
* [The Basic Read API](./basic_read_api.md)
|
|
6
|
+
* [The Basic Write API](./basic_write_api.md)
|
|
7
|
+
* [Configuration Options](./options.md)
|
|
8
|
+
* [**Examples**](./examples.md)
|
|
9
|
+
|
|
10
|
+
--------------
|
|
11
|
+
|
|
12
|
+
# Examples
|
|
13
|
+
|
|
14
|
+
**Rescue from `SmarterJSON::Error` (recommended):** SmarterJSON raises only on genuinely unparseable input (an unterminated string, a mismatched bracket), with line and column in the message. Rescuing from `SmarterJSON::Error` lets your application handle bad input gracefully.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
1. [Read a JSON String](#example-1-read-a-json-string)
|
|
19
|
+
2. [Read a JSON File](#example-2-read-a-json-file)
|
|
20
|
+
3. [Implicit Root Object (config-style, no braces)](#example-3-implicit-root-object-config-style-no-braces)
|
|
21
|
+
4. [Multiple Documents (NDJSON) → Array](#example-4-multiple-documents-ndjson--array)
|
|
22
|
+
5. [Streaming a Large File with a Block](#example-5-streaming-a-large-file-with-a-block)
|
|
23
|
+
6. [Symbolize Keys](#example-6-symbolize-keys)
|
|
24
|
+
7. [Duplicate Keys](#example-7-duplicate-keys)
|
|
25
|
+
8. [High-Precision Numbers: BigDecimal vs Float](#example-8-high-precision-numbers-bigdecimal-vs-float)
|
|
26
|
+
9. [Lenient Input: Comments, Trailing Commas, Unquoted Keys](#example-9-lenient-input-comments-trailing-commas-unquoted-keys)
|
|
27
|
+
10. [Write JSON](#example-10-write-json)
|
|
28
|
+
11. [Write NDJSON](#example-11-write-ndjson)
|
|
29
|
+
12. [Round-Trip Read and Write](#example-12-round-trip-read-and-write)
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
### Example 1: Read a JSON String
|
|
34
|
+
|
|
35
|
+
```ruby
|
|
36
|
+
require "smarter_json"
|
|
37
|
+
|
|
38
|
+
SmarterJSON.process('{"a": 1, "b": [2, 3]}') # => {"a"=>1, "b"=>[2, 3]}
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
### Example 2: Read a JSON File
|
|
42
|
+
|
|
43
|
+
```ruby
|
|
44
|
+
SmarterJSON.process_file("config.json") # => the parsed value
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
`process_file` opens the file, reads it using the given [`encoding:`](./options.md) option (default `"UTF-8"`), and parses it.
|
|
48
|
+
|
|
49
|
+
### Example 3: Implicit Root Object (config-style, no braces)
|
|
50
|
+
|
|
51
|
+
A config file that starts with `key: value` and has no outer `{}` is read as an object:
|
|
52
|
+
|
|
53
|
+
```ruby
|
|
54
|
+
SmarterJSON.process("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432}
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
### Example 4: Multiple Documents (NDJSON) → Array
|
|
58
|
+
|
|
59
|
+
Plain `process` reads NDJSON / JSONL / concatenated documents with no block and no special method. Zero documents → `nil`, one → its value, two or more → an `Array`:
|
|
60
|
+
|
|
61
|
+
```ruby
|
|
62
|
+
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3})) # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
|
|
63
|
+
SmarterJSON.process('{"id":1}') # => {"id"=>1}
|
|
64
|
+
SmarterJSON.process("") # => nil
|
|
65
|
+
```
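
The return-shape rule above (zero, one, or many documents) can be sketched as a small pure-Ruby function. `collapse_documents` is a hypothetical name used only for illustration; it is not part of SmarterJSON:

```ruby
# Hypothetical illustration of the zero / one / many rule.
# `documents` is the list of values parsed from the input, in order.
def collapse_documents(documents)
  case documents.length
  when 0 then nil              # no documents at all
  when 1 then documents.first  # a single document is returned bare
  else documents               # two or more come back as an Array
  end
end

collapse_documents([])                             # => nil
collapse_documents([{ "id" => 1 }])                # => {"id"=>1}
collapse_documents([{ "id" => 1 }, { "id" => 2 }]) # => [{"id"=>1}, {"id"=>2}]
```

One consequence of this rule is that a single document which is itself an Array has the same shape as several separate documents. Code that needs to tell the two apart may be better served by the block form in Example 5.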
|
|
66
|
+
|
|
67
|
+
### Example 5: Streaming a Large File with a Block
|
|
68
|
+
|
|
69
|
+
For input larger than memory, pass a block. Each document is yielded as it is read; the whole file is never loaded:
|
|
70
|
+
|
|
71
|
+
```ruby
|
|
72
|
+
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
### Example 6: Symbolize Keys
|
|
76
|
+
|
|
77
|
+
```ruby
|
|
78
|
+
SmarterJSON.process('{"a": 1, "b": 2}', symbolize_keys: true) # => {:a=>1, :b=>2}
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
### Example 7: Duplicate Keys
|
|
82
|
+
|
|
83
|
+
By default the last value wins. Choose `:first_wins` or `:raise` instead:
|
|
84
|
+
|
|
85
|
+
```ruby
|
|
86
|
+
SmarterJSON.process('{"a":1,"a":2}') # => {"a"=>2} (:last_wins, the default)
|
|
87
|
+
SmarterJSON.process('{"a":1,"a":2}', duplicate_key: :first_wins) # => {"a"=>1}
|
|
88
|
+
SmarterJSON.process('{"a":1,"a":2}', duplicate_key: :raise) # raises SmarterJSON::ParseError
|
|
89
|
+
```
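
The three policies can be sketched over a list of key/value pairs. This is an illustration under assumptions: `build_object` is a hypothetical helper, and it raises `ArgumentError` as a stand-in for the library's own error class.

```ruby
# Hypothetical illustration of the duplicate-key policies; not SmarterJSON code.
def build_object(pairs, duplicate_key: :last_wins)
  pairs.each_with_object({}) do |(key, value), object|
    if object.key?(key)
      case duplicate_key
      when :first_wins then next # keep the value already stored
      when :raise then raise ArgumentError, "duplicate key #{key.inspect}"
      end
    end
    object[key] = value # :last_wins overwrites the earlier value
  end
end

pairs = [["a", 1], ["a", 2]]
build_object(pairs)                             # => {"a"=>2}
build_object(pairs, duplicate_key: :first_wins) # => {"a"=>1}
```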
|
|
90
|
+
|
|
91
|
+
### Example 8: High-Precision Numbers: BigDecimal vs Float
|
|
92
|
+
|
|
93
|
+
The default `:auto` keeps high-precision decimals as `BigDecimal` (matching Oj). Force `Float` for raw speed when you don't need the precision:
|
|
94
|
+
|
|
95
|
+
```ruby
|
|
96
|
+
SmarterJSON.process("65.613616999999977") # => BigDecimal (:auto, the default)
|
|
97
|
+
SmarterJSON.process("65.613616999999977", bigdecimal_load: :float) # => 65.613616999999977 (a Float)
|
|
98
|
+
```
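
The precision difference can be seen with Ruby's own `BigDecimal` and `Float`, independent of SmarterJSON. A minimal sketch using the same literal:

```ruby
require "bigdecimal"

text   = "65.613616999999977"
exact  = BigDecimal(text) # keeps every decimal digit of the literal
approx = Float(text)      # nearest binary double; the 17th digit is not preserved

exact.to_s("F") == text          # => true
BigDecimal(approx.to_s) == exact # => false
```

A double near 65 carries at most 16 significant decimal digits in its shortest round-trip form, which is why the 17-digit literal survives only as a `BigDecimal`.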
|
|
99
|
+
|
|
100
|
+
### Example 9: Lenient Input: Comments, Trailing Commas, Unquoted Keys
|
|
101
|
+
|
|
102
|
+
```ruby
|
|
103
|
+
SmarterJSON.process(<<~JSON)
|
|
104
|
+
{
|
|
105
|
+
host: localhost, # unquoted key, quoteless value, and a trailing comma
|
|
106
|
+
port: 5432,
|
|
107
|
+
/* block comment */
|
|
108
|
+
url: http://example.com
|
|
109
|
+
}
|
|
110
|
+
JSON
|
|
111
|
+
# => {"host"=>"localhost", "port"=>5432, "url"=>"http://example.com"}
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
A `#` or `//` starts a comment only when it is preceded by whitespace, so `http://example.com` stays a string rather than being truncated.
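
The whitespace rule can be sketched with a regular expression. This is a simplified illustration, not the parser's implementation: `strip_line_comment` is a hypothetical helper, it ignores quoted strings and block comments, and treating a marker at the very start of a line as a comment is an assumption.

```ruby
# Hypothetical helper; not part of SmarterJSON.
# A comment marker counts only at the start of the line (assumption)
# or directly after a whitespace character.
COMMENT_START = %r{(?:\A|\s)(?:#|//)}

def strip_line_comment(line)
  index = line.index(COMMENT_START)
  index ? line[0...index].rstrip : line
end

strip_line_comment("host: localhost, # note") # => "host: localhost,"
strip_line_comment("url: http://example.com") # => "url: http://example.com"
```

In the URL the `//` follows a colon rather than whitespace, so no comment is detected and the value is kept whole.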
|
|
115
|
+
|
|
116
|
+
### Example 10: Write JSON
|
|
117
|
+
|
|
118
|
+
```ruby
|
|
119
|
+
SmarterJSON.generate({ "a" => 1, "b" => [2, 3] }) # => '{"a":1,"b":[2,3]}'
|
|
120
|
+
SmarterJSON.generate([1, 2, 3]) # => '[1,2,3]'
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
### Example 11: Write NDJSON
|
|
124
|
+
|
|
125
|
+
An Array writes one element per line:
|
|
126
|
+
|
|
127
|
+
```ruby
|
|
128
|
+
SmarterJSON.generate([{ "id" => 1 }, { "id" => 2 }], format: :ndjson) # => "{\"id\":1}\n{\"id\":2}\n"
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
### Example 12: Round-Trip Read and Write
|
|
132
|
+
|
|
133
|
+
```ruby
|
|
134
|
+
obj = { "a" => 1, "b" => [2, "three", nil, true] }
|
|
135
|
+
SmarterJSON.process(SmarterJSON.generate(obj)) == obj # => true
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
---------------
|
|
139
|
+
|
|
140
|
+
PREVIOUS: [Configuration Options](./options.md) | UP: [README](../README.md)
|