json-repair 0.10.0 → 0.11.0
- checksums.yaml +4 -4
- data/.rubocop.yml +2 -4
- data/CHANGELOG.md +20 -0
- data/CLAUDE.md +19 -5
- data/README.md +7 -1
- data/lib/json/repair/string_utils.rb +4 -0
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repairer.rb +57 -0
- data/sig/json/repair/string_utils.rbs +2 -0
- data/sig/json/repairer.rbs +4 -0
- metadata +9 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 0c130bbea1b9299e31e5bfa8db873b09fd911715b7125fda6ee60a101353be5f
|
|
4
|
+
data.tar.gz: 80f6e2fe16669210505e45c99a443eaa3ce6b8b3c10e3967633885f91a9d057b
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 4e2c05c45ebf1cf149705021faa611c22e6e2e0de48d15d2496da4e935291abe6bf185cde756263c6aad210f73ca2e54e4c965826b14bdbb0fa9ffa47bf1684f
|
|
7
|
+
data.tar.gz: 180b477cea0b27c813b664bfcda75eb28b59453d1fe4ce90781cf298c870aa8518411e689a37ddc30882aa5f376c351ff718600411d97df845f67fd87b593d92
|
data/.rubocop.yml
CHANGED
|
@@ -1,10 +1,6 @@
|
|
|
1
1
|
AllCops:
|
|
2
2
|
TargetRubyVersion: 3.0
|
|
3
3
|
|
|
4
|
-
Metrics/BlockLength:
|
|
5
|
-
Exclude:
|
|
6
|
-
- spec/**/*
|
|
7
|
-
|
|
8
4
|
Style/Documentation:
|
|
9
5
|
Enabled: false
|
|
10
6
|
|
|
@@ -33,6 +29,8 @@ Metrics/BlockLength:
|
|
|
33
29
|
Exclude:
|
|
34
30
|
- lib/json/repairer.rb
|
|
35
31
|
- spec/**/*
|
|
32
|
+
# restore RuboCop's default exclusion, lost when Exclude is overridden
|
|
33
|
+
- '*.gemspec'
|
|
36
34
|
|
|
37
35
|
Metrics/BlockNesting:
|
|
38
36
|
Exclude:
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,25 @@
|
|
|
1
1
|
# Changes
|
|
2
2
|
|
|
3
|
+
### 2026-06-12 (0.11.0)
|
|
4
|
+
|
|
5
|
+
* Repair object string values with unescaped quotes around a colon
|
|
6
|
+
("doubled colon"): `{"a": "b": "c"}` → `{"a":"b\": \"c"}` — the
|
|
7
|
+
value reads as `b": "c`, the unescaped-quotes interpretation. The
|
|
8
|
+
merge preserves the literal characters between the strings
|
|
9
|
+
(whitespace, original quote style) and repeats greedily
|
|
10
|
+
(`{"a": "b": "c": "d"}` → value `b": "c": "d`). Only the
|
|
11
|
+
string–colon–string shape is repaired: non-string shapes like
|
|
12
|
+
`{"a": "b": 1}` or `{"a": 1: 2}` still raise `JSONRepairError`
|
|
13
|
+
rather than silently dropping data (Python `json_repair` drops the
|
|
14
|
+
`: 1` there). Previously all of these raised "Object key expected".
|
|
15
|
+
Deliberate divergence from upstream
|
|
16
|
+
[jsonrepair](https://github.com/josdejong/jsonrepair) (raises as of
|
|
17
|
+
v3.14.0), matching
|
|
18
|
+
[Go json-repair](https://github.com/RealAlexandreAI/json-repair)
|
|
19
|
+
and Python
|
|
20
|
+
[`json_repair`](https://github.com/mangiucugna/json_repair) on the
|
|
21
|
+
canonical case.
|
|
22
|
+
|
|
3
23
|
### 2026-06-11 (0.10.0)
|
|
4
24
|
|
|
5
25
|
* Repair Markdown list markers in front of top-level values:
|
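The escaped form in the 0.11.0 changelog entry can be checked with the standard library alone. The sketch below is not the gem's code: `serialize_merged` is a hypothetical helper that serializes an already merged value such as `b": "c`, to show why the repaired output contains `\"` around the colon.

```ruby
require "json"

# Hypothetical helper (not part of json-repair): serialize an already
# merged string value under the key "a" using stdlib JSON.generate.
def serialize_merged(value)
  JSON.generate({ "a" => value })
end

# The merged value from the changelog, with its literal inner quotes.
puts serialize_merged('b": "c')       # prints {"a":"b\": \"c"}

# The greedy case keeps every inner quote, each one escaped.
puts serialize_merged('b": "c": "d')  # prints {"a":"b\": \"c\": \"d"}

# Parsing the serialized form gives back the merged value unchanged.
puts JSON.parse(serialize_merged('b": "c'))["a"]  # prints b": "c
```

Only the inner double quotes need escaping; the colon and the whitespace between the two original strings are ordinary string content.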
data/CLAUDE.md
CHANGED
|
@@ -10,7 +10,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
|
|
10
10
|
- `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
|
|
11
11
|
- `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
|
|
12
12
|
- `bin/console` — IRB with the gem preloaded.
|
|
13
|
-
- `bundle exec rake
|
|
13
|
+
- `bundle exec rake bench` — benchmark-ips regression baseline (`benchmark/run.rb`); run before/after perf-sensitive changes.
|
|
14
|
+
- `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org (release prompts for a rubygems MFA OTP).
|
|
14
15
|
- Type checking: `Steepfile` checks `lib/` against `sig/`. `bundle exec steep check` (typecheck) and `bundle exec rbs validate` (sig syntax) both run in CI and as part of the default rake task. `steep` and `rbs` are dev dependencies in the `Gemfile`.
|
|
15
16
|
|
|
16
17
|
Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
|
|
@@ -19,9 +20,15 @@ Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
|
|
|
19
20
|
|
|
20
21
|
This gem is a **Ruby port of the [josdejong/jsonrepair](https://github.com/josdejong/jsonrepair) TypeScript library**. The upstream version currently mirrored is tracked in `CHANGELOG.md` (presently v3.14.0). When syncing upstream changes, the goal is parity with the JS implementation, not idiomatic refactoring — keep method names, control flow, and repair heuristics aligned with the JS source so future syncs stay tractable.
|
|
21
22
|
|
|
23
|
+
A few repair heuristics deliberately go **beyond** upstream (leading-dot numbers like `.5`, Markdown list markers like `- {...}`). Each such site carries a "Divergence from upstream" comment in the code and a CHANGELOG note — keep that convention when adding more, so future upstream syncs can tell ported behavior from local extensions.
|
|
24
|
+
|
|
22
25
|
### Entry point
|
|
23
26
|
|
|
24
|
-
`JSON.repair(str)` in `lib/json/repair.rb`
|
|
27
|
+
`JSON.repair(str)` in `lib/json/repair.rb` first tries stdlib `JSON.parse` (fast path; opt out with `skip_json_loads: true`), and falls back to `JSON::Repairer.new(str).repair` when that raises. Either way the result is re-serialized with `JSON.generate`, so **output is canonical** — whitespace collapsed, numbers normalized, duplicate keys last-write-wins — and both paths agree on it. A `REPAIR_REQUIRED_PATTERN` regex routes inputs containing comments or invalid escapes straight to the Repairer even though the bundled `json` gem would accept them. `return_objects: true` returns the parsed Ruby value instead of a string; `JSON.repair_file(path)` / `JSON.repair_io(io)` are convenience wrappers forwarding both options.
|
|
28
|
+
|
|
29
|
+
`JSON::JSONRepairError` is the only error raised for unrecoverable inputs; it exposes the failure `#position`. If the Repairer ever emits a string stdlib cannot parse (a Repairer bug), the `JSON::ParserError` is wrapped in `JSONRepairError` rather than leaked.
|
|
30
|
+
|
|
31
|
+
`exe/json-repair` is a CLI wrapper (`lib/json/repair/cli.rb`) reading stdin or a file, writing stdout, `--output FILE`, or `--overwrite`.
|
|
25
32
|
|
|
26
33
|
### The parser (`lib/json/repairer.rb`)
|
|
27
34
|
|
|
@@ -43,7 +50,7 @@ Two patterns recur and are worth knowing before editing:
|
|
|
43
50
|
- **Backtracking via snapshots.** Methods like `parse_string` capture `i_before = @index` and `o_before = @output.length` before tentatively consuming input. If a later check (e.g. "the end quote turned out not to be a real end quote") fails, they restore both and re-invoke themselves with different flags (e.g. `stop_at_delimiter: true`, `stop_at_index: …`). Preserve this pattern when modifying string/number parsing.
|
|
44
51
|
- **Repair-by-rewriting-tail.** Helpers like `insert_before_last_whitespace(@output, ',')` and `@output = strip_last_occurrence(@output, ',')` patch the already-emitted output to fix things like missing or trailing commas. These run *after* the malformed input has been partially emitted — they are the mechanism for "I now realize that earlier token needed a comma after it."
|
|
45
52
|
|
|
46
|
-
`repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
|
|
53
|
+
`repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), skipping Markdown list markers like `- ` / `* ` / `1. ` before the root value and each newline-delimited line (`markdown_list_marker_length` / `skip_markdown_list_marker` — top-level only, never inside nested structures), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
|
|
47
54
|
|
|
48
55
|
### Shared helpers (`lib/json/repair/string_utils.rb`)
|
|
49
56
|
|
|
@@ -62,6 +69,13 @@ RBS signatures mirror the public surface of `JSON.repair`, `JSON::Repairer`, and
|
|
|
62
69
|
|
|
63
70
|
### Test layout
|
|
64
71
|
|
|
65
|
-
- `spec/json_spec.rb` — the substantive behavioral suite (
|
|
72
|
+
- `spec/json_spec.rb` — the substantive behavioral suite (130+ examples, hundreds of assertions, covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
|
|
73
|
+
- `spec/json/repair/cli_spec.rb` — the `exe/json-repair` CLI (argument handling, IO errors, exit codes).
|
|
74
|
+
- `spec/json/repair/string_utils_spec.rb` — direct unit coverage for a few `StringUtils` edge cases the behavioral suite can't reach.
|
|
66
75
|
- `spec/json/repair_spec.rb` — sanity check on `JSON::Repair::VERSION` only.
|
|
67
|
-
-
|
|
76
|
+
- SimpleCov enforces 100% line and branch coverage on the full run, so a filtered `rspec -e ...` run "fails" at the coverage gate even when all selected examples pass — ignore that exit code during TDD.
|
|
77
|
+
- `.rspec_status` is gitignored (local pass/fail tracking for `--only-failures` / `--next-failure`).
|
|
78
|
+
|
|
79
|
+
## Local planning notes
|
|
80
|
+
|
|
81
|
+
`docs/` (the `TODO.md` backlog plus design specs and implementation plans under `docs/superpowers/`) is gitignored local planning material — read it for context, update it as work completes, but never commit anything under `docs/`.
|
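The entry-point flow described in the CLAUDE.md hunk (stdlib fast path, fallback to a repairer, canonical re-serialization, wrapping of a leaked `JSON::ParserError`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the gem's implementation: `repair_sketch`, `SketchRepairError`, and the toy repairer lambda are hypothetical names, and the `REPAIR_REQUIRED_PATTERN` routing is left out.

```ruby
require "json"

# Hypothetical error class standing in for JSON::JSONRepairError.
class SketchRepairError < StandardError; end

# Hypothetical stand-in for JSON.repair. It tries stdlib JSON.parse first,
# falls back to a caller-supplied repairer when that raises, and
# re-serializes with JSON.generate on both paths.
def repair_sketch(str, repairer:, return_objects: false)
  value =
    begin
      JSON.parse(str) # fast path: input is already valid JSON
    rescue JSON::ParserError
      begin
        JSON.parse(repairer.call(str)) # slow path: repair, then parse
      rescue JSON::ParserError => e
        # A repairer that emits unparseable text is a bug; wrap it.
        raise SketchRepairError, "repairer emitted unparseable output: #{e.message}"
      end
    end
  return_objects ? value : JSON.generate(value)
end

# Toy repairer (an assumption for this sketch): it only quotes bare keys.
toy_repairer = ->(s) { s.gsub(/([{,]\s*)([A-Za-z_]\w*)(\s*:)/, '\1"\2"\3') }

puts repair_sketch('{ "a" : [1, 2 ] }', repairer: toy_repairer) # prints {"a":[1,2]}
puts repair_sketch('{a: 1}', repairer: toy_repairer)            # prints {"a":1}
```

Because both paths end in `JSON.generate`, valid and repaired inputs come out in the same canonical form, which is the property the hunk calls out.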
data/README.md
CHANGED
|
@@ -4,7 +4,7 @@ This is a Ruby gem designed to repair broken JSON strings. Inspired by and based
|
|
|
4
4
|
|
|
5
5
|
## Installation
|
|
6
6
|
|
|
7
|
-
Add this gem to your application's
|
|
7
|
+
Add this gem to your application's Gemfile by executing:
|
|
8
8
|
|
|
9
9
|
```bash
|
|
10
10
|
$ bundle add json-repair
|
|
@@ -37,6 +37,12 @@ Markdown markup in LLM output is handled too: fenced code blocks like `` ```json
|
|
|
37
37
|
JSON.repair("- {\"a\": 1}\n- {\"b\": 2}") # => '[{"a":1},{"b":2}]'
|
|
38
38
|
```
|
|
39
39
|
|
|
40
|
+
Object values containing unescaped quotes around a colon are merged back into a single string value:
|
|
41
|
+
|
|
42
|
+
```ruby
|
|
43
|
+
JSON.repair('{"a": "b": "c"}') # => '{"a":"b\": \"c"}' — the value reads as 'b": "c'
|
|
44
|
+
```
|
|
45
|
+
|
|
40
46
|
Pass `return_objects: true` to get the parsed Ruby value (Hash, Array, or scalar) instead of a string:
|
|
41
47
|
|
|
42
48
|
```ruby
|
data/lib/json/repair/string_utils.rb
CHANGED
|
@@ -127,6 +127,10 @@ module JSON
|
|
|
127
127
|
whitespace_except_newline?(char) || special_whitespace?(char)
|
|
128
128
|
end
|
|
129
129
|
|
|
130
|
+
def whitespace_or_special?(char)
|
|
131
|
+
whitespace?(char) || special_whitespace?(char)
|
|
132
|
+
end
|
|
133
|
+
|
|
130
134
|
def quote?(char)
|
|
131
135
|
double_quote_like?(char) || single_quote_like?(char)
|
|
132
136
|
end
|
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repairer.rb
CHANGED
|
@@ -294,6 +294,10 @@ module JSON
|
|
|
294
294
|
end
|
|
295
295
|
# :nocov:
|
|
296
296
|
end
|
|
297
|
+
|
|
298
|
+
# repair: an object string value with unescaped quotes around a
|
|
299
|
+
# colon, like {"a": "b": "c"}
|
|
300
|
+
repair_doubled_colon if processed_value
|
|
297
301
|
end
|
|
298
302
|
|
|
299
303
|
if @json[@index] == CLOSING_BRACE
|
|
@@ -307,6 +311,59 @@ module JSON
|
|
|
307
311
|
true
|
|
308
312
|
end
|
|
309
313
|
|
|
314
|
+
# Repair an object value with unescaped quotes around a colon, like
|
|
315
|
+
# {"a": "b": "c"}, by merging it all into one string value: 'b": "c'
|
|
316
|
+
# (the unescaped-quotes reading of the input). Greedy: keeps merging
|
|
317
|
+
# while another `: "..."` follows. Only the string-colon-string
|
|
318
|
+
# shape is repaired; anything else falls through to the regular
|
|
319
|
+
# error paths. Divergence from upstream (which raises "Object key
|
|
320
|
+
# expected" as of v3.14.0), matching the Go and Python json-repair
|
|
321
|
+
# libraries on the canonical case.
|
|
322
|
+
def repair_doubled_colon
|
|
323
|
+
loop do
|
|
324
|
+
colon = @index
|
|
325
|
+
# :nocov: kept for symmetry with the start_quote scan below; unreachable
|
|
326
|
+
# because @index never rests on whitespace here. On first entry,
|
|
327
|
+
# parse_value ends with parse_whitespace_and_skip_comments. On greedy
|
|
328
|
+
# re-entry, every parse_string exit leaves @index off-whitespace: the
|
|
329
|
+
# EOF path (nil is not whitespace), the stop_at_index path (a
|
|
330
|
+
# prev_non_whitespace_index position), and the end-quote path (ends in
|
|
331
|
+
# parse_concatenated_string, whose leading whitespace skip consumes
|
|
332
|
+
# newlines too). If a future parse_value/parse_string change breaks
|
|
333
|
+
# that, this scan becomes live and the :nocov: will hide it.
|
|
334
|
+
colon += 1 while whitespace_or_special?(@json[colon])
|
|
335
|
+
# :nocov:
|
|
336
|
+
return unless @json[colon] == COLON
|
|
337
|
+
|
|
338
|
+
# scan past special whitespace too (unlike prev_non_whitespace_index):
|
|
339
|
+
# parse_whitespace treats NBSP and friends as whitespace, so this
|
|
340
|
+
# repair should as well. The value's last character (at worst the
|
|
341
|
+
# object's opening brace) always stops the scan before index 0.
|
|
342
|
+
end_quote = colon - 1
|
|
343
|
+
end_quote -= 1 while whitespace_or_special?(@json[end_quote])
|
|
344
|
+
return unless quote?(@json[end_quote])
|
|
345
|
+
|
|
346
|
+
start_quote = colon + 1
|
|
347
|
+
start_quote += 1 while whitespace_or_special?(@json[start_quote])
|
|
348
|
+
return unless quote?(@json[start_quote])
|
|
349
|
+
|
|
350
|
+
# repair: replace the end quote already emitted (plus any copied
|
|
351
|
+
# trailing whitespace) with the literal input span from that end
|
|
352
|
+
# quote through the next start quote, escaped as string content
|
|
353
|
+
@output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
|
|
354
|
+
@json[end_quote..start_quote].each_char do |char|
|
|
355
|
+
@output << (char == DOUBLE_QUOTE ? '\"' : CONTROL_CHARACTERS.fetch(char, char))
|
|
356
|
+
end
|
|
357
|
+
|
|
358
|
+
# let parse_string consume the rest of the merged string, then
|
|
359
|
+
# drop the start quote it emits (already emitted escaped above)
|
|
360
|
+
@index = start_quote
|
|
361
|
+
start = @output.length
|
|
362
|
+
parse_string
|
|
363
|
+
@output = remove_at_index(@output, start, 1)
|
|
364
|
+
end
|
|
365
|
+
end
|
|
366
|
+
|
|
310
367
|
def skip_character(char)
|
|
311
368
|
if @json[@index] == char
|
|
312
369
|
@index += 1
|
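The three checks in `repair_doubled_colon` (a colon at the current position, a quote before it, a quote after it) can be tried in isolation. The sketch below is a simplified, hypothetical stand-in: `doubled_colon_span` and `escape_span` are not gem methods, they work on a plain string and index instead of parser state, and they recognise only ASCII whitespace and ASCII quotes, whereas the gem's `whitespace_or_special?` and `quote?` cover more characters.

```ruby
# Assumed character sets for this sketch only.
SKETCH_WHITESPACE = [" ", "\t", "\n", "\r"].freeze
SKETCH_QUOTES = ['"', "'"].freeze

# Returns the range from the end quote through the next start quote when
# the input at `index` has the string-colon-string shape, otherwise nil.
def doubled_colon_span(json, index)
  colon = index
  colon += 1 while SKETCH_WHITESPACE.include?(json[colon])
  return nil unless json[colon] == ":"

  # Scan backwards over whitespace to the character that ended the value.
  end_quote = colon - 1
  end_quote -= 1 while end_quote >= 0 && SKETCH_WHITESPACE.include?(json[end_quote])
  return nil unless end_quote >= 0 && SKETCH_QUOTES.include?(json[end_quote])

  # Scan forwards over whitespace to the character that starts the next token.
  start_quote = colon + 1
  start_quote += 1 while SKETCH_WHITESPACE.include?(json[start_quote])
  return nil unless SKETCH_QUOTES.include?(json[start_quote])

  end_quote..start_quote
end

# Escapes the literal span so it can sit inside a JSON string.
def escape_span(span)
  span.each_char.map { |char| char == '"' ? '\\"' : char }.join
end

input = '{"a": "b": "c"}'
span = doubled_colon_span(input, 9) # index 9 is the second colon
puts escape_span(input[span])       # prints \": \"
```

For `{"a": "b": 1}` and `{"a": 1: 2}` the same function returns `nil`, which corresponds to the cases the changelog says still raise instead of being merged.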
data/sig/json/repairer.rbs
CHANGED
|
@@ -52,6 +52,10 @@ module JSON
|
|
|
52
52
|
# Parse an object like '{"key": "value"}'
|
|
53
53
|
def parse_object: () -> bool
|
|
54
54
|
|
|
55
|
+
# Repair an object value with unescaped quotes around a colon,
|
|
56
|
+
# like {"a": "b": "c"}, by merging it into one string value.
|
|
57
|
+
def repair_doubled_colon: () -> void
|
|
58
|
+
|
|
55
59
|
def skip_character: (::String char) -> bool
|
|
56
60
|
|
|
57
61
|
# Skip ellipsis like "[1,2,3,...]" or "[1,2,3,...,9]" or "[...,7,8,9]"
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: json-repair
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.11.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Aleksandr Zykov
|
|
@@ -9,7 +9,11 @@ bindir: exe
|
|
|
9
9
|
cert_chain: []
|
|
10
10
|
date: 1980-01-02 00:00:00.000000000 Z
|
|
11
11
|
dependencies: []
|
|
12
|
-
description:
|
|
12
|
+
description: 'Repairs broken JSON: missing quotes and commas, unclosed brackets, trailing
|
|
13
|
+
commas, unquoted keys, single quotes, comments, Python constants, NDJSON, Markdown
|
|
14
|
+
code fences and list markers in LLM output, truncated documents, and more. A Ruby
|
|
15
|
+
port of the jsonrepair JavaScript library — useful whenever JSON from LLMs, APIs,
|
|
16
|
+
or logs does not strictly follow the standard.'
|
|
13
17
|
email:
|
|
14
18
|
- alexandrz@gmail.com
|
|
15
19
|
executables:
|
|
@@ -44,6 +48,8 @@ metadata:
|
|
|
44
48
|
homepage_uri: https://github.com/sashazykov/json-repair-rb
|
|
45
49
|
source_code_uri: https://github.com/sashazykov/json-repair-rb
|
|
46
50
|
changelog_uri: https://github.com/sashazykov/json-repair-rb/blob/main/CHANGELOG.md
|
|
51
|
+
documentation_uri: https://rubydoc.info/gems/json-repair
|
|
52
|
+
bug_tracker_uri: https://github.com/sashazykov/json-repair-rb/issues
|
|
47
53
|
rdoc_options: []
|
|
48
54
|
require_paths:
|
|
49
55
|
- lib
|
|
@@ -60,5 +66,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
60
66
|
requirements: []
|
|
61
67
|
rubygems_version: 3.6.9
|
|
62
68
|
specification_version: 4
|
|
63
|
-
summary:
|
|
69
|
+
summary: Repair invalid or malformed JSON documents, including LLM output.
|
|
64
70
|
test_files: []
|