smarter_json 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 84a73a6cf0785c67eb2dfaf87dc663b860d5afed6bf0816f861b6430d1f55475
4
- data.tar.gz: 953aebf65ab855450a7d3b41c826c169242dd19a190f089da3f52df2da0b0a44
3
+ metadata.gz: ca54f0d032730a5606b5a5d0fe4091c62c2d1ef70ebce6dd9283c91bb288a6b2
4
+ data.tar.gz: b4d859ef69ca0fa1935271ba99027f9d132ad8e558579da2b37d6bb542fc5076
5
5
  SHA512:
6
- metadata.gz: 8fe6e07fd99f1557a716fc37370dfc3c5cdb34e054d07fa812983d6cefea1e74108bb0e708fdb42f47a31ff8435aa0d6b8c80deeebab09a221bbd9992691be28
7
- data.tar.gz: ec166df8863b136abc38df844bba2286b542de03f24d410b6e4a5914bfcf2165337add25c5731b92540328c5496ad4ec658243e21e4830687de9f44dcaaf4d54
6
+ metadata.gz: c4c474e948779c8541fe7d6bc7a572cf1a131aea53dff2436ac0068228c777d020dd3c6e946b014e80bedbca03aec93fa35f8189294fb070935fb12db06fee24
7
+ data.tar.gz: 77082d230f4cccd2f913bb4a5508f8b15551230fde2c78b341ad71b08700b8a67bd1030a242ab3bd6f70252c66300d2fce7099964b5cf883229bbc73303305a7
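`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. As a minimal sketch, assuming you have already extracted those archives from a downloaded `.gem` (a `.gem` is a plain tar archive, so e.g. `tar -xf smarter_json-1.1.0.gem` works), the stdlib `Digest` classes can check an archive against the 1.1.0 values above. `digest_matches?` is a hypothetical helper, not part of RubyGems:

```ruby
require "digest"

# Hypothetical helper: true only if both the SHA256 and the SHA512 hex digests
# of the file at `path` equal the published values.
def digest_matches?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# data.tar.gz digests copied from the 1.1.0 (+) side of checksums.yaml above.
DATA_SHA256 = "b4d859ef69ca0fa1935271ba99027f9d132ad8e558579da2b37d6bb542fc5076"
DATA_SHA512 = "77082d230f4cccd2f913bb4a5508f8b15551230fde2c78b341ad71b08700b8a67bd1030a242ab3bd6f70252c66300d2fce7099964b5cf883229bbc73303305a7"

# Only runs when an extracted data.tar.gz is present in the current directory.
if File.exist?("data.tar.gz")
  ok = digest_matches?("data.tar.gz", sha256: DATA_SHA256, sha512: DATA_SHA512)
  puts(ok ? "data.tar.gz matches 1.1.0" : "data.tar.gz does NOT match 1.1.0")
end
```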
data/CHANGELOG.md CHANGED
@@ -1,22 +1,26 @@
1
1
 
2
2
  # SmarterJSON Change Log
3
3
 
4
- > ⚠️ **New Interface (since 0.9.7):**
5
- >
6
- > SmarterJSON **always returns an `Array`** of documents.
4
+ > ⚠️ SmarterJSON **always returns an `Array`** of documents.
7
5
  >
8
6
  > `SmarterJSON.process` / `SmarterJSON.process_file` return:
9
7
  >
10
- > — `[]` for no doc
11
- > - `[doc]` for one doc
12
- > - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs)
8
+ > - `[]` for no doc
9
+ > - `[doc]` for one doc
10
+ > - `[d1, d2, …]` for several docs (NDJSON / JSONL / concatenated docs)
13
11
 
14
12
  > ⚠️ We discourage `process(input).first` / `process(input)[0]` because they silently drop any additional documents.
15
13
  > Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads.
16
14
 
15
+ ## 1.1.0 (2026-06-09)
16
+
17
+ RSpec tests: 1,038 → 1,070
18
+
19
+ - New `SmarterJSON.foreach(source)`: the streaming, composable sibling of `process_file`. `source` is a file path or an IO (a socket, `StringIO`, open `File`). Without a block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time and never loads the whole file, so a large NDJSON / JSONL stream can be filtered or transformed with `.select` / `.map` / `.lazy` / `.first`. With a block it streams each document and returns the document count, like `process_file`.
20
+
17
21
  ## 1.0.0 (2026-06-08)
18
22
 
19
- RSpec tests: 1,034
23
+ RSpec tests: 1,038
20
24
 
21
25
  - **The public interface is now stable** — `process`, `process_one`, `process_file`, `generate`, and the documented options; semantic versioning from here on.
22
26
  - Unknown or wrongly-typed options now raise `ArgumentError` instead of being silently ignored, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught immediately.
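The Array contract described in the change log above, and the advice to prefer `process_one` over `.first`, can be illustrated with a hypothetical helper `pick_one` that takes the Array `process` would return. It is a sketch of the documented behavior (the first value or `nil`; warn instead of raising when there are several documents), not SmarterJSON's code. The `on_warning:` callable receiving a message String is an assumption, and the `Rails.logger` fallback is left out:

```ruby
# Hypothetical stand-in for the documented process_one contract, applied to the
# Array that process would return. Not SmarterJSON's implementation; the
# Rails.logger fallback is omitted and on_warning is assumed to take a String.
def pick_one(docs, on_warning: nil)
  if docs.size > 1
    message = "expected 1 document, got #{docs.size}; returning the first"
    on_warning ? on_warning.call(message) : Kernel.warn(message)
  end
  docs.first # nil for [], the document itself for [doc]
end

warnings = []
collect  = ->(msg) { warnings << msg }

empty  = pick_one([], on_warning: collect)                             # no document
single = pick_one([{ "id" => 1 }], on_warning: collect)                # exactly one
many   = pick_one([{ "id" => 1 }, { "id" => 2 }], on_warning: collect) # extra doc reported, not raised

puts "empty=#{empty.inspect} single=#{single.inspect} many=#{many.inspect} warnings=#{warnings.size}"
```

By contrast, `process(input).first` would return the same first value for `many` but give no signal that a second document was discarded.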
data/README.md CHANGED
@@ -2,10 +2,21 @@
2
2
 
3
3
  ![Gem Version](https://img.shields.io/gem/v/smarter_json) [![codecov](https://codecov.io/gh/tilo/smarter_json/branch/main/graph/badge.svg)](https://codecov.io/gh/tilo/smarter_json) <!-- [![Downloads](https://img.shields.io/gem/dt/smarter_json)](https://rubygems.org/gems/smarter_json) --> [![RubyGems](https://img.shields.io/badge/RubyGems-smarter__json-brightgreen?logo=rubygems&logoColor=white)](https://rubygems.org/gems/smarter_json) [![Ruby Toolbox](https://img.shields.io/badge/Ruby%20Toolbox-smarter__json-brightgreen)](https://www.ruby-toolbox.com/projects/smarter_json)
4
4
 
5
- A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write — and in benchmarks it matches or beats Oj on every file. SmarterJSON is opinionated: we want your JSON processing to be successful. Traditional JSON parsers are strict - they stop at the first deviation - SmarterJSON keeps going - it optimizes for getting your data out, not for policing the JSON spec.
5
+ A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSONL, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in benchmarks it matches or beats Oj on every benchmark file. SmarterJSON is opinionated: we want your JSON processing to be successful. Traditional JSON parsers are strict: they stop at the first deviation. SmarterJSON keeps going; it optimizes for getting your data out, not for policing the JSON spec.
6
6
 
7
7
  > **SmarterJSON: one tool, no modes — want strict? Please use the stdlib `json` gem.**
8
8
 
9
+ ## Features at a glance
10
+
11
+ - **Reads the whole human-JSON superset, no modes or flags** — strict JSON, NDJSON, JSONL, JSON5, HJSON, JSONC, plus comments, trailing commas, unquoted / single / triple / smart quotes, an implicit root object, `NaN` / `Infinity` / hex / underscores, Python & JavaScript literals, a UTF-8 BOM, mixed line endings, and any Ruby encoding (see [What it accepts](#what-it-accepts-beyond-strict-json) for the full list).
12
+ - **Every document from multi-document input, in one call** — `process` returns an `Array` of all of them; `process_one` returns the first document's value and, if there was more than one, emits a warning instead of raising (routed to `on_warning`, else `Rails.logger`, else `Kernel.warn`).
13
+ - **Streaming in bounded memory** — pass a block, or use `foreach(path_or_io)` for a composable `Enumerator` you can `.select` / `.map` / `.lazy` over.
14
+ - **Recovers JSON from LLM / markdown noise** — strips markdown code fences, surrounding prose, and `<json>` tags, and pulls every payload out of one messy blob.
15
+ - **Writes JSON too** — `generate` with pretty-printing, NDJSON, `sort_keys`, `ascii_only`, `script_safe`, `allow_nan`, and `coerce` (via `as_json`); iterative, so deeply nested data is depth-safe.
16
+ - **Keeps number precision** — `BigDecimal` by default (Oj-compatible), or `:float` / `:auto`.
17
+ - **Transparent leniency** — pass an optional `on_warning` callback to be handed every lenient fix (an empty slot collapsed, a duplicate key dropped, a code fence stripped, …); with no handler the parser stays silent and adds zero overhead.
18
+ - **Fast, and runs everywhere** — a C extension that matches or beats Oj, with a pure-Ruby fallback for platforms that can't build it. Stable, semantically versioned, thread-safe, Ruby 2.6+.
19
+
9
20
  ## Why SmarterJSON?
10
21
 
11
22
  **Are you tired of seeing errors like these?**
@@ -40,7 +51,7 @@ A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON, JSON5,
40
51
  Traditional JSON parsers reject anything that isn't perfectly strict JSON. That means your code breaks on malformed data.
41
52
 
42
53
  SmarterJSON is built on the opposite principle: **you shouldn't have to care what flavor of JSON you were handed** and **you shouldn't lose the whole document because of formatting errors.**
43
- Give it strict JSON, NDJSON, JSON5, an HJSON-style config file, LLM-generated JSON, or a copy-pasted blob with comments and trailing commas — it just extracts the data from it.
54
+ Give it strict JSON, NDJSON, JSONL, JSON5, an HJSON-style config file, LLM-generated JSON, or a copy-pasted blob with comments and trailing commas — it just extracts the data from it.
44
55
  When it is lenient, `smarter_json` isn't dropping data that exists — it's just not raising an eyebrow at a suspicious gap (like an extra comma).
45
56
 
46
57
  A strict parser would refuse the whole document and recover nothing; `smarter_json` returns everything except the formatting error.
@@ -73,13 +84,15 @@ It raises only on genuinely unreadable input (unterminated string, mismatched br
73
84
  The lenient grammar is a superset of these human-JSON specs — listed once, here:
74
85
 
75
86
  * [JSON5](https://json5.org/)
76
- * [HJSON](https://hjson.github.io/)
87
+ * [HJSON](https://hjson.github.io/) <sup>†</sup>
77
88
  * [JWCC / HuJSON](https://github.com/tailscale/hujson)
78
89
  * [Nigel Tao](https://nigeltao.github.io/blog/2021/json-with-commas-comments.html)
79
90
  * [JSONH](https://github.com/jsonh-org/Jsonh)
80
91
  * [JSONC (VS Code)](https://jsonc.org/)
81
92
  * [NDJSON / JSON Text Sequences (RFC 7464)](https://datatracker.ietf.org/doc/html/rfc7464).
82
93
 
94
+ <sup>†</sup> A deliberate subset: SmarterJSON's quoteless (unquoted) string values are single-line, so it does **not** parse HJSON's unquoted multi-line strings; use a quoted or triple-quoted (`'''…'''`) string for multi-line values. This is by design. SmarterJSON is one deterministic, no-modes superset of the JSON-family dialects (JSON5 / HJSON / JSONC / …), so it adopts a feature only where it does not conflict with the others. An unquoted string that may span newlines collides with newline-as-document-separator (NDJSON, implicit-root config), so it is left out.
95
+
83
96
  ## Installation
84
97
 
85
98
  ```ruby
@@ -130,7 +143,7 @@ See [Examples](#examples) below for multi-document input, streaming, and recover
130
143
 
131
144
  ## Stable interface & thread safety
132
145
 
133
- The public interface is now considered stable: `SmarterJSON.process`, `SmarterJSON.process_one`, `SmarterJSON.process_file`, `SmarterJSON.generate`, and the documented options in this README/docs are the supported surface.
146
+ The supported public interface is `SmarterJSON.process`, `SmarterJSON.process_one`, `SmarterJSON.process_file`, `SmarterJSON.foreach`, `SmarterJSON.generate`, and the options documented in this README and the docs. Without a block, `process` and `process_file` always return an `Array` of documents; `process_one` returns the first document's value (or `nil` when there is none) and emits a warning if there is more than one document.
134
147
 
135
148
  Concurrent calls are safe. The processor and generator keep per-call state local, and the C extension only caches Ruby IDs / constants at load time; it does not share mutable state across calls.
136
149
 
@@ -243,6 +256,46 @@ For input larger than memory, pass a block: each document is yielded as it is re
243
256
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
244
257
  ```
245
258
 
259
+ **Try it on a file you already have.** SmarterJSON reads **NDJSON / JSONL natively** — and Claude Code stores every session as a JSONL transcript (`~/.claude/projects/<project>/<session-id>.jsonl`, one JSON document per line). Walk yours, one record at a time:
260
+
261
+ ```ruby
262
+ require "awesome_print" # optional gem for readable nested output; without it, delete this line and use `pp` instead of `ap`
263
+
264
+ SmarterJSON.process_file("#{Dir.home}/.claude/projects/<project>/<session-id>.jsonl") do |entry|
265
+   ap entry # each line is a full document — a message, a tool call, a result, …
266
+   puts "-" * 80
267
+ end
268
+ ```
269
+
270
+ ### Filtering and rewriting a large file (`foreach`)
271
+
272
+ `SmarterJSON.foreach(source)` is the composable sibling of `process_file`. `source` is a file path or any IO (a socket, a `StringIO`, an open `File`). With no block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time, so you can chain `.select` / `.map` and friends. Add `.lazy` to keep the whole chain bounded in memory, even when the filtered set is large:
273
+
274
+ ```ruby
275
+ # Keep only the user/assistant turns of a transcript — one document in memory at a time
276
+ SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
277
+   .lazy
278
+   .select { |doc| %w[user assistant].include?(doc[:type]) }
279
+   .each { |doc| puts doc[:text] }
280
+ ```
281
+
282
+ Because reading is streamed and each match is written out as soon as it is read, you can **filter a big file down and rewrite it** without ever loading the whole thing:
283
+
284
+ ```ruby
285
+ File.open("filtered.jsonl", "w") do |out|
286
+   SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
287
+     .lazy
288
+     .select { |doc| %w[user assistant].include?(doc[:type]) }
289
+     .each { |doc| out.puts SmarterJSON.generate(doc) }
290
+ end
291
+ ```
292
+
293
+ Pass an IO instead of a path to stream straight from a socket or an HTTP response body; any object that responds to `read` is treated as an IO. An IO is single-pass (it can be read only once):
294
+
295
+ ```ruby
296
+ SmarterJSON.foreach(response_io).each { |event| handle(event) }
297
+ ```
298
+
246
299
  ### Recovering JSON from LLM / markdown noise
247
300
 
248
301
  When the payload is wrapped in markdown fences, surrounding prose, or tags, `process` (or `process_one` for a single payload) strips the wrapper and reads what's inside. (Clean JSON never pays for this — recovery only runs when a straight read fails.)
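For comparison with the `foreach` filter-and-rewrite example above: when the input is strict NDJSON (exactly one valid JSON document per line), a similar pipeline can be sketched with only the stdlib `json` library and `each_line`. This is an illustrative stand-in, not SmarterJSON. It does not frame multi-line documents, skip comments, or recover lenient syntax, so any malformed line raises `JSON::ParserError`. The method name `filter_ndjson` is hypothetical, and `symbolize_names:` is the stdlib spelling of what SmarterJSON calls `symbolize_keys:`:

```ruby
require "json"
require "stringio"

# Hypothetical stdlib-only analogue of the foreach filter-and-rewrite example,
# for STRICT NDJSON only: one valid JSON document per line, blank lines skipped.
# Nothing here is lenient; a malformed line raises JSON::ParserError.
def filter_ndjson(input, output, types)
  input.each_line.lazy
       .reject { |line| line.strip.empty? }
       .map { |line| JSON.parse(line, symbolize_names: true) }
       .select { |doc| types.include?(doc[:type]) }
       .each { |doc| output.puts(JSON.generate(doc)) }
  output
end

source = StringIO.new(<<~NDJSON)
  {"type":"user","text":"hi"}
  {"type":"progress","pct":50}

  {"type":"assistant","text":"hello"}
NDJSON

result = filter_ndjson(source, StringIO.new, %w[user assistant])
puts result.string
```

With real files, pass an open `File` as `input` (for example `File.open("session.jsonl")`) and a `File` opened with `"w"` as `output`; `each_line` reads one line at a time, so memory stays bounded by the longest line.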
@@ -103,6 +103,28 @@ SmarterJSON.process(io) { |doc| handle(doc) }
103
103
 
104
104
  The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed `\n` / `\r\n` / `\r` line endings and comment-only separators between documents.
105
105
 
106
+ ## `SmarterJSON.foreach` — stream a file or IO, composably
107
+
108
+ `foreach` is the composable sibling of `process_file`. Its argument is a **file path or any IO** (a socket, a `StringIO`, an open `File`); a String is always a path, never content.
109
+
110
+ With a block it behaves exactly like the block form above — streams each document, returns the **document count**. Without a block it returns a plain `Enumerator` (like `CSV.foreach` — **not** an `Enumerator::Lazy`), so `.map` / `.select` return Arrays the usual way, and you can chain over the stream:
111
+
112
+ ```ruby
113
+ SmarterJSON.foreach("events.ndjson").each { |event| EventJob.perform_async(event) } # like the block form
114
+ SmarterJSON.foreach("events.ndjson").select { |e| e["level"] == "error" } # => an Array of the matches
115
+ ```
116
+
117
+ It reads one document at a time, so `foreach(path).first(3)` only reads ~3 documents off disk, and `.next` pulls them one by one. `.map` / `.select` read the source lazily but still build an Array of their *result*; to keep a whole pipeline bounded end to end (a large filtered set off a fat file), add `.lazy` at the call site:
118
+
119
+ ```ruby
120
+ SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
121
+   .lazy
122
+   .select { |doc| %w[user assistant].include?(doc[:type]) }
123
+   .each { |doc| puts doc[:text] }
124
+ ```
125
+
126
+ Options are validated eagerly — a bad option key or value raises immediately, before any iteration. An **IO source is single-pass** (an IO can only be read once), so iterating the returned Enumerator a second time over the same IO yields nothing; a path-backed `foreach` re-opens the file and is re-iterable.
127
+
106
128
  ## The C extension and the pure-Ruby fallback
107
129
 
108
130
  By default (`acceleration: true`) the C extension is used when it is compiled and loadable (`SmarterJSON::HAS_ACCELERATION` is then `true`); otherwise the pure-Ruby implementation runs and produces identical results. Pass `acceleration: false` to force the pure-Ruby path. See [Configuration Options](./options.md).
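The single-pass (IO) versus re-iterable (path) behavior described in the `foreach` section above follows from how such an Enumerator is usually built: `enum_for` captures the arguments, and every iteration calls the method again. A path is therefore re-opened on each pass, while an IO simply continues from wherever it was left. A minimal sketch, assuming a strict one-document-per-line parser in place of SmarterJSON's document framing; `ToyNDJSON` is hypothetical and only mirrors the dispatch shown in the library diff (`respond_to?(:read)` for an IO, otherwise a path):

```ruby
require "json"
require "stringio"
require "tempfile"

# Hypothetical sketch of the foreach shape: no block => Enumerator; IO vs path dispatch.
# A strict one-document-per-line parser stands in for SmarterJSON's document framing.
module ToyNDJSON
  def self.foreach(source, &block)
    return enum_for(:foreach, source) unless block

    if source.respond_to?(:read) # an IO: read from its current position
      stream(source, &block)
    else                         # a path: opened afresh on every call
      File.open(source) { |io| stream(io, &block) }
    end
  end

  # Yields each parsed line and returns the document count, like the block form.
  def self.stream(io)
    count = 0
    io.each_line do |line|
      next if line.strip.empty?

      yield JSON.parse(line)
      count += 1
    end
    count
  end
end

ndjson = %({"a":1}\n{"a":2}\n)

# IO source: the second pass finds the StringIO already at EOF.
io_enum = ToyNDJSON.foreach(StringIO.new(ndjson))
io_passes = [io_enum.to_a.size, io_enum.to_a.size]

# Path source: each pass re-opens the file, so it is re-iterable.
path_passes = nil
block_count = nil
Tempfile.create(["docs", ".ndjson"]) do |f|
  f.write(ndjson)
  f.flush
  path_enum = ToyNDJSON.foreach(f.path)
  path_passes = [path_enum.to_a.size, path_enum.to_a.size]
  block_count = ToyNDJSON.foreach(f.path) { |doc| doc } # block form returns the count
end

puts "IO passes: #{io_passes.inspect}, path passes: #{path_passes.inspect}, block form: #{block_count}"
```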
data/docs/examples.md CHANGED
@@ -83,6 +83,28 @@ For input larger than memory, pass a block. Each recovered document is yielded o
83
83
  SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
84
84
  ```
85
85
 
86
+ **A JSONL file you already have:** Claude Code stores each session as a JSONL transcript — `~/.claude/projects/<project>/<session-id>.jsonl`, one JSON document per line (a message, a tool call, a result, …). It reads the same way, one record at a time:
87
+
88
+ ```ruby
89
+ require "awesome_print" # optional gem for readable nested output; without it, delete this line and use `pp` instead of `ap`
90
+
91
+ SmarterJSON.process_file("#{Dir.home}/.claude/projects/<project>/<session-id>.jsonl") do |entry|
92
+   ap entry # each line is a full document
93
+   puts "-" * 80
94
+ end
95
+ ```
96
+
97
+ **Filter and rewrite as a stream — `SmarterJSON.foreach`:** `foreach(source)` is the composable sibling of `process_file`; `source` is a file path or any IO (a socket, a `StringIO`, an open `File`). Without a block it returns a plain `Enumerator` (like `CSV.foreach`) that reads one document at a time, so it chains with `.select` / `.map`; add `.lazy` to keep the whole pipeline bounded in memory. This filters a transcript down to its user/assistant turns and writes a smaller file, never loading all of it:
98
+
99
+ ```ruby
100
+ File.open("filtered.jsonl", "w") do |out|
101
+   SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
102
+     .lazy
103
+     .select { |doc| %w[user assistant].include?(doc[:type]) }
104
+     .each { |doc| out.puts SmarterJSON.generate(doc) }
105
+ end
106
+ ```
107
+
86
108
  ### Example 6: Symbolize Keys
87
109
 
88
110
  ```ruby
@@ -57,6 +57,41 @@ module SmarterJSON
57
57
  end
58
58
  end
59
59
 
60
+ # SmarterJSON.foreach(source, options = {}) — the streaming, composable sibling of
61
+ # process_file, mirroring the stdlib convention (CSV.foreach / File.foreach): a
62
+ # plain Enumerator (NOT Enumerator::Lazy), so .map / .select behave the normal way
63
+ # and return an Array.
64
+ #
65
+ # `source` is a file path (opened and streamed from disk, like process_file) OR an
66
+ # IO — a socket, a StringIO, an open File — streamed directly from its current
67
+ # position. A String is always a path, never content. An IO source is single-pass:
68
+ # it can only be read once, so iterating the returned Enumerator a second time over
69
+ # the same IO yields nothing.
70
+ #
71
+ # Without a block: returns an Enumerator over each top-level document, reading one
72
+ # document at a time via readpartial — it never slurps the whole file the way
73
+ # process_file(path) does. So foreach(path).first(3) reads only ~3 documents off
74
+ # disk, and foreach(src).each { … } / .next stream in bounded memory. .map / .select
75
+ # read the source one document at a time but still build an Array of their result;
76
+ # for a chain that stays bounded end to end (a large filtered set off a fat file)
77
+ # opt into .lazy at the call site: foreach(src).lazy.select { … }.each { … }.
78
+ #
79
+ # With a block: streams each document and returns the document count — identical
80
+ # to process_file(path) { |doc| … } (or process(io) { |doc| … } for an IO).
81
+ #
82
+ # Options are validated eagerly (before the Enumerator is returned), so a bad
83
+ # option key or value fails fast rather than on first iteration.
84
+ def foreach(source, options = {}, &block)
85
+   options = Options.process_options(options)
86
+   return enum_for(:foreach, source, options) unless block
87
+
88
+   if source.respond_to?(:read) # an IO (socket, StringIO, open File) — stream it directly
89
+     stream_io(source, options, &block)
90
+   else                         # a path — open the file and stream from disk
91
+     process_file(source, options, &block)
92
+   end
93
+ end
94
+
60
95
  # SmarterJSON.process_one(input, options = {}) — the single-document accessor.
61
96
  #
62
97
  # Returns the first document's value (or nil when the input holds no documents).
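The comment above says `foreach(path).first(3)` reads only about three documents. That rests on standard `Enumerator` behavior: `first(n)` stops iterating as soon as it has `n` elements, and `.lazy` pulls each element through the whole chain before fetching the next. This can be checked with a toy line-based enumerator and a hypothetical `CountingIO` wrapper that counts the lines actually read. Both names are illustrative only; SmarterJSON itself reads in chunks via `readpartial`, which is presumably why the comment hedges with *~3*:

```ruby
require "json"
require "stringio"

# Toy stand-in (hypothetical): a plain Enumerator that parses one line per document, on demand.
module LineDocs
  def self.each_doc(io)
    return enum_for(:each_doc, io) unless block_given?

    io.each_line { |line| yield JSON.parse(line) unless line.strip.empty? }
  end
end

# Hypothetical IO wrapper that counts how many lines were actually pulled from it.
class CountingIO
  attr_reader :reads

  def initialize(io)
    @io = io
    @reads = 0
  end

  def each_line
    @io.each_line do |line|
      @reads += 1
      yield line
    end
  end
end

def thousand_docs
  StringIO.new((1..1_000).map { |i| %({"n":#{i}}\n) }.join)
end

# first(3) stops after three documents instead of reading all 1,000.
plain = CountingIO.new(thousand_docs)
first_three = LineDocs.each_doc(plain).first(3)
puts "first(3): #{first_three.map { |d| d["n"] }.inspect}, lines read: #{plain.reads}"

# lazy.select + first(2) pulls only as many lines as it needs to find two matches.
counted = CountingIO.new(thousand_docs)
evens = LineDocs.each_doc(counted).lazy.select { |d| d["n"].even? }.first(2)
puts "lazy evens: #{evens.map { |d| d["n"] }.inspect}, lines read: #{counted.reads}"
```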
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterJSON
4
- VERSION = "1.0.0"
4
+ VERSION = "1.1.0"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_json
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
@@ -24,7 +24,7 @@ dependencies:
24
24
  - !ruby/object:Gem::Version
25
25
  version: '0'
26
26
  description: 'A lenient, fast JSON processor for Ruby. It extracts strict JSON, NDJSON,
27
- JSON5, HJSON-style config, and the messy JSON-ish input humans and LLMs actually
27
+ JSONL, JSON5, HJSON-style config, and the messy JSON-ish input humans and LLMs actually
28
28
  write — comments, trailing commas, single / unquoted / smart quotes, Python and
29
29
  JS keywords, a UTF-8 BOM, and more all parse to the same Ruby objects, with no modes
30
30
  or flags to set. Where a traditional parser stops at the first deviation and throws
@@ -92,6 +92,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
92
92
  requirements: []
93
93
  rubygems_version: 3.6.9
94
94
  specification_version: 4
95
- summary: A lenient, fast JSON processor for Ruby — reads strict JSON, NDJSON, JSON5,
96
- HJSON, and the messy JSON humans and LLMs actually write.
95
+ summary: A lenient, fast JSON processor for Ruby — reads strict JSON, NDJSON, JSONL,
96
+ JSON5, HJSON, and the messy JSON humans and LLMs actually write.
97
97
  test_files: []