json-repair 0.7.0 → 0.11.0
- checksums.yaml +4 -4
- data/.rubocop.yml +2 -4
- data/CHANGELOG.md +69 -0
- data/CLAUDE.md +19 -5
- data/README.md +27 -1
- data/Rakefile +3 -0
- data/lib/json/repair/string_utils.rb +31 -17
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repair.rb +23 -1
- data/lib/json/repairer.rb +131 -4
- data/sig/json/repair/string_utils.rbs +40 -32
- data/sig/json/repair.rbs +21 -3
- data/sig/json/repairer.rbs +47 -28
- metadata +9 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0c130bbea1b9299e31e5bfa8db873b09fd911715b7125fda6ee60a101353be5f
+  data.tar.gz: 80f6e2fe16669210505e45c99a443eaa3ce6b8b3c10e3967633885f91a9d057b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4e2c05c45ebf1cf149705021faa611c22e6e2e0de48d15d2496da4e935291abe6bf185cde756263c6aad210f73ca2e54e4c965826b14bdbb0fa9ffa47bf1684f
+  data.tar.gz: 180b477cea0b27c813b664bfcda75eb28b59453d1fe4ce90781cf298c870aa8518411e689a37ddc30882aa5f376c351ff718600411d97df845f67fd87b593d92
data/.rubocop.yml
CHANGED
@@ -1,10 +1,6 @@
 AllCops:
   TargetRubyVersion: 3.0

-Metrics/BlockLength:
-  Exclude:
-    - spec/**/*
-
 Style/Documentation:
   Enabled: false

@@ -33,6 +29,8 @@ Metrics/BlockLength:
   Exclude:
     - lib/json/repairer.rb
     - spec/**/*
+    # restore RuboCop's default exclusion, lost when Exclude is overridden
+    - '*.gemspec'

 Metrics/BlockNesting:
   Exclude:
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,74 @@
 # Changes

+### 2026-06-12 (0.11.0)
+
+* Repair object string values with unescaped quotes around a colon
+  ("doubled colon"): `{"a": "b": "c"}` → `{"a":"b\": \"c"}` — the
+  value reads as `b": "c`, the unescaped-quotes interpretation. The
+  merge preserves the literal characters between the strings
+  (whitespace, original quote style) and repeats greedily
+  (`{"a": "b": "c": "d"}` → value `b": "c": "d`). Only the
+  string–colon–string shape is repaired: non-string shapes like
+  `{"a": "b": 1}` or `{"a": 1: 2}` still raise `JSONRepairError`
+  rather than silently dropping data (Python `json_repair` drops the
+  `: 1` there). Previously all of these raised "Object key expected".
+  Deliberate divergence from upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair) (raises as of
+  v3.14.0), matching
+  [Go json-repair](https://github.com/RealAlexandreAI/json-repair)
+  and Python
+  [`json_repair`](https://github.com/mangiucugna/json_repair) on the
+  canonical case.
+
+### 2026-06-11 (0.10.0)
+
+* Repair Markdown list markers in front of top-level values:
+  `- {"a": 1}` → `{"a":1}`, and multi-line lists become arrays via the
+  existing newline-delimited JSON handling
+  (`"- {\"a\": 1}\n- {\"b\": 2}"` → `[{"a":1},{"b":2}]`). Bullet
+  markers `-`, `*`, `+` and ordered markers like `1.` / `2)` (up to
+  nine digits, the CommonMark limit) are recognized at the start of
+  the root value and of each newline-delimited line, only when
+  followed by same-line whitespace and a value — so `-5`, a trailing
+  `"- "`, and newline-delimited decimals like `"1.5\n2.5"` keep their
+  number readings, and nothing changes inside nested structures.
+  Previously these inputs raised `JSONRepairError`; two non-raising
+  behaviors change for the better: `"3\n- 5\n7"` now repairs to
+  `[3,5,7]` instead of the corrupt `[3,0,5,7]`, and a single-line
+  `* text` becomes `"text"` instead of `"* text"`. Deliberate
+  divergence from upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair) (no Markdown
+  list handling as of v3.14.0), and more precise than Python
+  [`json_repair`](https://github.com/mangiucugna/json_repair), which
+  collapses scalar list items to `""`.
+
+### 2026-06-11 (0.9.0)
+
+* Repair numbers missing the digit before their decimal point:
+  `.5` → `0.5`, `-.5` → `-0.5`, and truncated forms like `.` → `0.0`.
+  Previously these leaked a raw stdlib `JSON::ParserError` out of
+  `JSON.repair` because the repairer emitted the leading-dot number
+  unchanged (invalid JSON) and the canonical-output re-parse choked on
+  it. This is a deliberate divergence from upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair) (which leaves
+  leading-dot numbers unrepaired as of v3.14.0), matching
+  [dirty-json](https://github.com/RyanMarcus/dirty-json) behavior.
+* `JSON.repair` now guards its error contract: if the repairer ever
+  emits a string stdlib JSON cannot parse (a repairer bug), the stdlib
+  error is wrapped in `JSON::JSONRepairError` instead of leaking
+  `JSON::ParserError` to callers.
+
+### 2026-05-15 (0.8.0)
+
+* `JSON.repair_file(path)` and `JSON.repair_io(io)` convenience
+  wrappers around `JSON.repair`. `repair_file` reads a path from disk
+  (accepts a `String` or `Pathname`); `repair_io` reads from any
+  object responding to `#read` (e.g. `File`, `StringIO`, `$stdin`)
+  without closing it. Both forward `return_objects:` and
+  `skip_json_loads:` through to `JSON.repair`. Mirrors Python's
+  [`json_repair`](https://github.com/mangiucugna/json_repair)
+  `load` / `from_file` helpers.
+
 ### 2026-05-12 (0.7.0)

 * `JSON.repair` now always returns canonical JSON via
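The list-marker rules in the 0.10.0 entry above can be approximated with a single regular expression. This is a minimal sketch, not the gem's implementation: `MARKER` and `list_marker_length` are hypothetical names, and the set of characters treated as "the start of a value" is a simplifying assumption.

```ruby
# Hypothetical approximation of the 0.10.0 marker rules; not the gem's code.
# A marker is a bullet (-, *, +) or 1 to 9 digits followed by "." or ")",
# then same-line whitespace, then something that looks like a value.
MARKER = /\A(?<marker>[-*+]|\d{1,9}[.)])[ \t]+(?=[\[{"'\w.-])/

# Returns the marker's length, or nil when the line does not start with one.
def list_marker_length(line)
  match = MARKER.match(line)
  match && match[:marker].length
end

p list_marker_length('- {"a": 1}')  # bullet marker "-"
p list_marker_length('12) [1, 2]')  # ordered marker "12)"
p list_marker_length('-5')          # no whitespace after "-": stays a number
p list_marker_length('1.5')         # a decimal, not an ordered marker
p list_marker_length('- ')          # nothing follows the marker
```

Requiring whitespace and a following value is what keeps `-5` and `1.5` on their number readings, as the changelog entry describes.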
data/CLAUDE.md
CHANGED
@@ -10,7 +10,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 - `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
 - `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
 - `bin/console` — IRB with the gem preloaded.
-- `bundle exec rake
+- `bundle exec rake bench` — benchmark-ips regression baseline (`benchmark/run.rb`); run before/after perf-sensitive changes.
+- `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org (release prompts for a rubygems MFA OTP).
 - Type checking: `Steepfile` checks `lib/` against `sig/`. `bundle exec steep check` (typecheck) and `bundle exec rbs validate` (sig syntax) both run in CI and as part of the default rake task. `steep` and `rbs` are dev dependencies in the `Gemfile`.

 Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
@@ -19,9 +20,15 @@ Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.

 This gem is a **Ruby port of the [josdejong/jsonrepair](https://github.com/josdejong/jsonrepair) TypeScript library**. The upstream version currently mirrored is tracked in `CHANGELOG.md` (presently v3.14.0). When syncing upstream changes, the goal is parity with the JS implementation, not idiomatic refactoring — keep method names, control flow, and repair heuristics aligned with the JS source so future syncs stay tractable.

+A few repair heuristics deliberately go **beyond** upstream (leading-dot numbers like `.5`, Markdown list markers like `- {...}`). Each such site carries a "Divergence from upstream" comment in the code and a CHANGELOG note — keep that convention when adding more, so future upstream syncs can tell ported behavior from local extensions.
+
 ### Entry point

-`JSON.repair(str)` in `lib/json/repair.rb`
+`JSON.repair(str)` in `lib/json/repair.rb` first tries stdlib `JSON.parse` (fast path; opt out with `skip_json_loads: true`), and falls back to `JSON::Repairer.new(str).repair` when that raises. Either way the result is re-serialized with `JSON.generate`, so **output is canonical** — whitespace collapsed, numbers normalized, duplicate keys last-write-wins — and both paths agree on it. A `REPAIR_REQUIRED_PATTERN` regex routes inputs containing comments or invalid escapes straight to the Repairer even though the bundled `json` gem would accept them. `return_objects: true` returns the parsed Ruby value instead of a string; `JSON.repair_file(path)` / `JSON.repair_io(io)` are convenience wrappers forwarding both options.
+
+`JSON::JSONRepairError` is the only error raised for unrecoverable inputs; it exposes the failure `#position`. If the Repairer ever emits a string stdlib cannot parse (a Repairer bug), the `JSON::ParserError` is wrapped in `JSONRepairError` rather than leaked.
+
+`exe/json-repair` is a CLI wrapper (`lib/json/repair/cli.rb`) reading stdin or a file, writing stdout, `--output FILE`, or `--overwrite`.

 ### The parser (`lib/json/repairer.rb`)

@@ -43,7 +50,7 @@ Two patterns recur and are worth knowing before editing:
 - **Backtracking via snapshots.** Methods like `parse_string` capture `i_before = @index` and `o_before = @output.length` before tentatively consuming input. If a later check (e.g. "the end quote turned out not to be a real end quote") fails, they restore both and re-invoke themselves with different flags (e.g. `stop_at_delimiter: true`, `stop_at_index: …`). Preserve this pattern when modifying string/number parsing.
 - **Repair-by-rewriting-tail.** Helpers like `insert_before_last_whitespace(@output, ',')` and `@output = strip_last_occurrence(@output, ',')` patch the already-emitted output to fix things like missing or trailing commas. These run *after* the malformed input has been partially emitted — they are the mechanism for "I now realize that earlier token needed a comma after it."

-`repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
+`repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), skipping Markdown list markers like `- ` / `* ` / `1. ` before the root value and each newline-delimited line (`markdown_list_marker_length` / `skip_markdown_list_marker` — top-level only, never inside nested structures), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.

 ### Shared helpers (`lib/json/repair/string_utils.rb`)

@@ -62,6 +69,13 @@ RBS signatures mirror the public surface of `JSON.repair`, `JSON::Repairer`, and

 ### Test layout

-- `spec/json_spec.rb` — the substantive behavioral suite (
+- `spec/json_spec.rb` — the substantive behavioral suite (130+ examples, hundreds of assertions, covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
+- `spec/json/repair/cli_spec.rb` — the `exe/json-repair` CLI (argument handling, IO errors, exit codes).
+- `spec/json/repair/string_utils_spec.rb` — direct unit coverage for a few `StringUtils` edge cases the behavioral suite can't reach.
 - `spec/json/repair_spec.rb` — sanity check on `JSON::Repair::VERSION` only.
--
+- SimpleCov enforces 100% line and branch coverage on the full run, so a filtered `rspec -e ...` run "fails" at the coverage gate even when all selected examples pass — ignore that exit code during TDD.
+- `.rspec_status` is gitignored (local pass/fail tracking for `--only-failures` / `--next-failure`).
+
+## Local planning notes
+
+`docs/` (the `TODO.md` backlog plus design specs and implementation plans under `docs/superpowers/`) is gitignored local planning material — read it for context, update it as work completes, but never commit anything under `docs/`.
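The entry-point flow described in the CLAUDE.md hunk above (a stdlib fast path, a fallback on `JSON::ParserError`, then one shared `JSON.generate` step) can be sketched with the standard library alone. `NAIVE_REPAIRER` and `canonical_json` are hypothetical names; the lambda is a deliberately naive stand-in for the real `JSON::Repairer` and only strips trailing commas.

```ruby
require 'json'

# Hypothetical stand-in for the Repairer: strips trailing commas only.
# Naive on purpose (it would also touch commas inside string values).
NAIVE_REPAIRER = ->(text) { text.gsub(/,\s*([}\]])/, '\1') }

# Fast path first, fallback on ParserError, then one shared re-serialization
# step so both paths produce the same canonical string.
def canonical_json(text, skip_fast_path: false)
  parsed =
    begin
      raise JSON::ParserError, 'fast path skipped' if skip_fast_path

      JSON.parse(text)
    rescue JSON::ParserError
      JSON.parse(NAIVE_REPAIRER.call(text))
    end
  JSON.generate(parsed)
end

puts canonical_json('{ "a" : 1.50 }')    # valid input, fast path
puts canonical_json('{ "a" : 1.50, }')   # trailing comma, fallback path
```

Because both branches end in the same `JSON.generate` call, the two inputs print the same string, which is the "both paths agree" property the hunk describes.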
data/README.md
CHANGED
@@ -4,7 +4,7 @@ This is a Ruby gem designed to repair broken JSON strings. Inspired by and based

 ## Installation

-Add this gem to your application's
+Add this gem to your application's Gemfile by executing:

 ```bash
 $ bundle add json-repair
@@ -31,6 +31,18 @@ puts repaired_json # Outputs: {"name":"Alice","age":25}

 The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.

+Markdown markup in LLM output is handled too: fenced code blocks like `` ```json `` are stripped, and list markers (`-`, `*`, `+`, `1.`) in front of top-level values are removed — a multi-line list becomes an array:
+
+```ruby
+JSON.repair("- {\"a\": 1}\n- {\"b\": 2}") # => '[{"a":1},{"b":2}]'
+```
+
+Object values containing unescaped quotes around a colon are merged back into a single string value:
+
+```ruby
+JSON.repair('{"a": "b": "c"}') # => '{"a":"b\": \"c"}' — the value reads as 'b": "c'
+```
+
 Pass `return_objects: true` to get the parsed Ruby value (Hash, Array, or scalar) instead of a string:

 ```ruby
@@ -53,6 +65,20 @@ If you need the parsed Ruby value instead of a string, pass `return_objects: tru

 `skip_json_loads: true` skips the stdlib `JSON.parse` attempt and routes the input straight through the repairer. The output is the same; the option is purely a performance knob for callers who know their input will need repair.

+### Reading from a file or IO
+
+`JSON.repair_file(path)` reads a file from disk and repairs its contents. `JSON.repair_io(io)` does the same with any object that responds to `#read` (e.g. `File`, `StringIO`, `$stdin`). Both forward `return_objects:` and `skip_json_loads:` to `JSON.repair`.
+
+```ruby
+JSON.repair_file('broken.json')
+JSON.repair_file('broken.json', return_objects: true)
+
+File.open('broken.json') { |io| JSON.repair_io(io) }
+JSON.repair_io($stdin)
+```
+
+`JSON.repair_io` does not close the IO — the caller manages its lifecycle.
+
 ## Command line

 The gem ships a `json-repair` executable. It reads from stdin or a file and writes to stdout, `--output FILE`, or back over the input file with `--overwrite`.
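The IO contract in the README hunk above (read from anything that responds to `#read`, never close it) can be illustrated without the gem installed. `canonical_from_io` is a hypothetical stand-in that uses only stdlib `JSON`, so unlike `JSON.repair_io` it needs input that is already valid.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for the wrapper's IO handling: read everything,
# never close. Only stdlib JSON is used, so the input must already be valid.
def canonical_from_io(io)
  text = io.read || ''
  JSON.generate(JSON.parse(text))
end

io = StringIO.new('{ "name": "Alice", "age": 25 }')
result = canonical_from_io(io)

puts result
puts io.closed? # the caller still owns the IO and decides when to close it
```

Leaving the IO open means the same helper works for `$stdin` and for a block-scoped `File.open`, where closing is already handled by the caller.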
data/Rakefile
CHANGED
@@ -19,6 +19,9 @@ task :steep do
   sh 'bundle exec steep check'
 end

+desc 'Type-check: rbs validate + steep check'
+task typecheck: %i[rbs steep]
+
 desc 'Run benchmark/run.rb (regression baseline for JSON.repair)'
 task :bench do
   ruby '-Ilib', 'benchmark/run.rb'
data/lib/json/repair/string_utils.rb
CHANGED
@@ -60,13 +60,14 @@ module JSON

       # Functions to check character chars
       def hex?(char)
-
-        (char >=
-
+        !char.nil? &&
+          ((char >= ZERO && char <= NINE) ||
+            (char >= UPPERCASE_A && char <= UPPERCASE_F) ||
+            (char >= LOWERCASE_A && char <= LOWERCASE_F))
       end

       def digit?(char)
-        char && char >= ZERO && char <= NINE
+        !char.nil? && char >= ZERO && char <= NINE
       end

       def valid_string_character?(char)
@@ -74,11 +75,11 @@ module JSON
       end

       def delimiter?(char)
-        REGEX_DELIMITER.match?(char)
+        !char.nil? && REGEX_DELIMITER.match?(char)
       end

       def unquoted_string_delimiter?(char)
-        REGEX_UNQUOTED_STRING_DELIMITER.match?(char)
+        !char.nil? && REGEX_UNQUOTED_STRING_DELIMITER.match?(char)
       end

       REGEX_FUNCTION_NAME_CHAR_START = /\A[a-zA-Z_$]\z/
@@ -93,19 +94,19 @@ module JSON
       end

       def start_of_value?(char)
-        REGEX_START_OF_VALUE.match?(char) ||
+        !char.nil? && (REGEX_START_OF_VALUE.match?(char) || quote?(char))
       end

       def control_character?(char)
-        [NEWLINE, RETURN, TAB, BACKSPACE, FORM_FEED].include?(char)
+        !char.nil? && [NEWLINE, RETURN, TAB, BACKSPACE, FORM_FEED].include?(char)
       end

       def whitespace?(char)
-        [SPACE, NEWLINE, TAB, RETURN].include?(char)
+        !char.nil? && [SPACE, NEWLINE, TAB, RETURN].include?(char)
       end

       def whitespace_except_newline?(char)
-        [SPACE, TAB, RETURN].include?(char)
+        !char.nil? && [SPACE, TAB, RETURN].include?(char)
       end

       def special_whitespace?(char)
@@ -122,6 +123,14 @@ module JSON
           (char >= EN_QUAD && char <= ZERO_WIDTH_SPACE)
       end

+      def same_line_whitespace?(char)
+        whitespace_except_newline?(char) || special_whitespace?(char)
+      end
+
+      def whitespace_or_special?(char)
+        whitespace?(char) || special_whitespace?(char)
+      end
+
       def quote?(char)
         double_quote_like?(char) || single_quote_like?(char)
       end
@@ -135,20 +144,25 @@ module JSON
       end

       def double_quote_like?(char)
-        [DOUBLE_QUOTE, DOUBLE_QUOTE_LEFT, DOUBLE_QUOTE_RIGHT].include?(char)
+        !char.nil? && [DOUBLE_QUOTE, DOUBLE_QUOTE_LEFT, DOUBLE_QUOTE_RIGHT].include?(char)
       end

       def single_quote_like?(char)
-        [QUOTE, QUOTE_LEFT, QUOTE_RIGHT, GRAVE_ACCENT, ACUTE_ACCENT].include?(char)
+        !char.nil? && [QUOTE, QUOTE_LEFT, QUOTE_RIGHT, GRAVE_ACCENT, ACUTE_ACCENT].include?(char)
       end

-      # Strip last occurrence of text_to_strip from text
+      # Strip last occurrence of text_to_strip from text.
+      #
+      # `|| ''` on the slices below (and in `insert_before_last_whitespace` /
+      # `remove_at_index`) is for steep's nil-narrowing: `String#[range]` is
+      # typed `String?`, but every call site here keeps indices within
+      # `0..text.length`, so the slices never actually return `nil`.
       def strip_last_occurrence(text, text_to_strip, strip_remaining_text: false)
         index = text.rindex(text_to_strip)
         return text unless index

-        remaining_text = strip_remaining_text ? '' : text[index + 1..]
-        text[0...index] + remaining_text
+        remaining_text = strip_remaining_text ? '' : (text[index + 1..] || '')
+        (text[0...index] || '') + remaining_text
       end

       def insert_before_last_whitespace(text, text_to_insert)
@@ -158,7 +172,7 @@ module JSON

         index -= 1 while whitespace?(text[index - 1])

-        text[0...index] + text_to_insert + text[index..]
+        (text[0...index] || '') + text_to_insert + (text[index..] || '')
       end

       # Parse keywords true, false, null
@@ -187,7 +201,7 @@ module JSON
       end

       def remove_at_index(text, start, count)
-        text[0...start] + text[start + count..]
+        (text[0...start] || '') + (text[start + count..] || '')
       end

       def ends_with_comma_or_newline?(text)
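Two plain-Ruby behaviors sit behind the changes in this file: the old `char && ...` form of `digit?` returns `nil` rather than `false` past the end of input, and `String#[]` with a range returns `nil` only when the range starts beyond the end of the string. The sketch below reproduces both outside the gem; `digit_before?` and `digit_after?` are hypothetical names for the removed and added bodies of `digit?`.

```ruby
ZERO = '0'
NINE = '9'

# Shape of the removed line: returns nil (falsy, but not false) for nil input.
def digit_before?(char)
  char && char >= ZERO && char <= NINE
end

# Shape of the added line: always a strict true or false.
def digit_after?(char)
  !char.nil? && char >= ZERO && char <= NINE
end

p digit_before?(nil)
p digit_after?(nil)
p digit_after?('7')

# String#[] with a range returns nil only when the range starts past the end,
# which is why the `|| ''` fallbacks in the hunk never fire at runtime.
text = 'abc'
p text[3..]            # start == length: empty string
p text[4..]            # start > length: nil
p(text[4..] || '')     # the fallback turns that nil into an empty string
```

A strict boolean return presumably matters for the `-> bool` signatures in `sig/`; that motivation is an inference from the RBS changes, not something the diff states for the predicates.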
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repair.rb
CHANGED
@@ -20,6 +20,22 @@ module JSON
     return_objects ? parsed : JSON.generate(parsed)
   end

+  # Inlined rather than calling `repair(...)` so the literal-bool overloads
+  # in sig/json/repair.rbs narrow correctly per caller — forwarding a
+  # `bool`-typed `return_objects` will not resolve against the literal-
+  # `true`/`false` overloads on `JSON.repair`.
+  def self.repair_io(io, return_objects: false, skip_json_loads: false)
+    json = io.read || ''
+    parsed = skip_json_loads ? repaired_parse(json) : tolerant_parse(json)
+    return_objects ? parsed : JSON.generate(parsed)
+  end
+
+  def self.repair_file(path, return_objects: false, skip_json_loads: false)
+    json = File.read(path.to_s)
+    parsed = skip_json_loads ? repaired_parse(json) : tolerant_parse(json)
+    return_objects ? parsed : JSON.generate(parsed)
+  end
+
   def self.tolerant_parse(json)
     JSON.parse(json)
   rescue JSON::ParserError
@@ -27,8 +43,14 @@ module JSON
   end
   private_class_method :tolerant_parse

+  # The rescue guards the JSONRepairError-only error contract: if the
+  # Repairer ever emits a string stdlib JSON cannot parse (a Repairer bug),
+  # wrap the stdlib error instead of leaking JSON::ParserError to callers.
   def self.repaired_parse(json)
-
+    repaired = Repairer.new(json).repair
+    JSON.parse(repaired)
+  rescue JSON::ParserError => e
+    raise JSONRepairError, "Internal error: repaired output is not valid JSON (#{e.message})"
   end
   private_class_method :repaired_parse
 end
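The error-contract guard added to `repaired_parse` follows a common rescue-and-wrap shape. The sketch below shows that shape in isolation, assuming a hypothetical `RepairError` class in place of `JSON::JSONRepairError` and a hypothetical `parse_repaired` helper; it does not call the gem.

```ruby
require 'json'

# Hypothetical error class standing in for JSON::JSONRepairError.
class RepairError < StandardError; end

# Parse text that a repair step claims is valid; translate stdlib failures so
# callers only ever need to rescue one error class.
def parse_repaired(text)
  JSON.parse(text)
rescue JSON::ParserError => e
  raise RepairError, "Internal error: repaired output is not valid JSON (#{e.message})"
end

begin
  # A leading-dot number is invalid JSON, so stdlib raises ParserError here.
  parse_repaired('{"a": .5}')
rescue RepairError => e
  puts "wrapped as #{e.class}, caused by #{e.cause.class}"
end
```

Ruby records the original exception as `#cause` when raising inside a `rescue`, so the stdlib error remains available for debugging even though callers see a single error class.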
data/lib/json/repairer.rb
CHANGED
@@ -37,6 +37,12 @@ module JSON
     def repair
       parse_markdown_code_block(MARKDOWN_OPEN_BLOCKS)

+      # repair: skip a Markdown list marker before the root value
+      # (and any comments before it, which parse_value would otherwise
+      # only consume after the marker check has already failed)
+      parse_whitespace_and_skip_comments
+      skip_markdown_list_marker
+
       processed = parse_value

       throw_unexpected_end unless processed
@@ -46,7 +52,8 @@ module JSON
       processed_comma = parse_character(COMMA)
       parse_whitespace_and_skip_comments if processed_comma

-      if start_of_value?(@json[@index])
+      if (start_of_value?(@json[@index]) || markdown_list_marker_length) &&
+         ends_with_comma_or_newline?(@output)
         # start of a new value after end of the root level object: looks like
         # newline delimited JSON -> turn into a root level array
         unless processed_comma
@@ -170,6 +177,52 @@ module JSON
       false
     end

+    # Look ahead from @index for a Markdown list marker like "- ", "* ",
+    # "+ ", or "12. " that precedes a value. Returns the marker's length,
+    # or nil when there is no marker. Only consulted at the top level —
+    # the root value and each newline-delimited value — never inside
+    # nested structures. A marker must be followed by same-line
+    # whitespace and a value, so "-5", a trailing "- ", and "-\n{...}"
+    # keep their number readings. Ordered markers are capped at nine
+    # digits (the CommonMark limit) so long truncated decimals are not
+    # mistaken for markers. Divergence from upstream (no Markdown list
+    # handling as of v3.14.0): LLMs frequently emit JSON values as
+    # Markdown list items.
+    def markdown_list_marker_length
+      j = @index
+
+      if [MINUS, ASTERISK, PLUS].include?(@json[j])
+        j += 1
+      elsif digit?(@json[j])
+        j += 1 while digit?(@json[j]) && j - @index < 9
+        return nil unless [DOT, CLOSE_PARENTHESIS].include?(@json[j])
+
+        j += 1
+      else
+        return nil
+      end
+
+      marker_length = j - @index
+      return nil unless same_line_whitespace?(@json[j])
+
+      j += 1 while same_line_whitespace?(@json[j])
+      # a leading-dot number like ".5" is also a value here: parse_number
+      # repairs it to "0.5" even though start_of_value? does not match it
+      return nil unless start_of_value?(@json[j]) || @json[j] == DOT
+
+      marker_length
+    end
+
+    # Repair a value behind a Markdown list marker, like "- {"a":1}",
+    # by skipping the marker. See markdown_list_marker_length.
+    def skip_markdown_list_marker
+      length = markdown_list_marker_length
+      return false unless length
+
+      @index += length
+      true
+    end
+
     # Parse an object like '{"key": "value"}'
     def parse_object
       return false unless @json[@index] == OPENING_BRACE
@@ -241,6 +294,10 @@ module JSON
           end
           # :nocov:
         end
+
+        # repair: an object string value with unescaped quotes around a
+        # colon, like {"a": "b": "c"}
+        repair_doubled_colon if processed_value
       end

       if @json[@index] == CLOSING_BRACE
@@ -254,6 +311,59 @@ module JSON
       true
     end

+    # Repair an object value with unescaped quotes around a colon, like
+    # {"a": "b": "c"}, by merging it all into one string value: 'b": "c'
+    # (the unescaped-quotes reading of the input). Greedy: keeps merging
+    # while another `: "..."` follows. Only the string-colon-string
+    # shape is repaired; anything else falls through to the regular
+    # error paths. Divergence from upstream (which raises "Object key
+    # expected" as of v3.14.0), matching the Go and Python json-repair
+    # libraries on the canonical case.
+    def repair_doubled_colon
+      loop do
+        colon = @index
+        # :nocov: kept for symmetry with the start_quote scan below; unreachable
+        # because @index never rests on whitespace here. On first entry,
+        # parse_value ends with parse_whitespace_and_skip_comments. On greedy
+        # re-entry, every parse_string exit leaves @index off-whitespace: the
+        # EOF path (nil is not whitespace), the stop_at_index path (a
+        # prev_non_whitespace_index position), and the end-quote path (ends in
+        # parse_concatenated_string, whose leading whitespace skip consumes
+        # newlines too). If a future parse_value/parse_string change breaks
+        # that, this scan becomes live and the :nocov: will hide it.
+        colon += 1 while whitespace_or_special?(@json[colon])
+        # :nocov:
+        return unless @json[colon] == COLON
+
+        # scan past special whitespace too (unlike prev_non_whitespace_index):
+        # parse_whitespace treats NBSP and friends as whitespace, so this
+        # repair should as well. The value's last character (at worst the
+        # object's opening brace) always stops the scan before index 0.
+        end_quote = colon - 1
+        end_quote -= 1 while whitespace_or_special?(@json[end_quote])
+        return unless quote?(@json[end_quote])
+
+        start_quote = colon + 1
+        start_quote += 1 while whitespace_or_special?(@json[start_quote])
+        return unless quote?(@json[start_quote])
+
+        # repair: replace the end quote already emitted (plus any copied
+        # trailing whitespace) with the literal input span from that end
+        # quote through the next start quote, escaped as string content
+        @output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
+        @json[end_quote..start_quote].each_char do |char|
+          @output << (char == DOUBLE_QUOTE ? '\"' : CONTROL_CHARACTERS.fetch(char, char))
+        end
+
+        # let parse_string consume the rest of the merged string, then
+        # drop the start quote it emits (already emitted escaped above)
+        @index = start_quote
+        start = @output.length
+        parse_string
+        @output = remove_at_index(@output, start, 1)
+      end
+    end
+
     def skip_character(char)
       if @json[@index] == char
         @index += 1
@@ -570,7 +680,9 @@ module JSON
           repair_number_ending_with_numeric_symbol(start)
           return true
         end
-
+        # also accept a dot so "-.5" continues into the fraction branch
+        # below (divergence from upstream, which leaves "-.5" unrepaired)
+        unless digit?(@json[@index]) || @json[@index] == DOT
           @index = start
           return false
         end
@@ -620,7 +732,7 @@ module JSON
         num = @json[start...@index]
         has_invalid_leading_zero = num.match?(/^0\d/)

-        @output << (has_invalid_leading_zero ? "\"#{num}\"" : num)
+        @output << (has_invalid_leading_zero ? "\"#{num}\"" : repair_leading_dot_number(num))
         return true
       end

@@ -711,7 +823,18 @@ module JSON
       # repair numbers cut off at the end
       # this will only be called when we end after a '.', '-', or 'e' and does not
       # change the number more than it needs to make it valid JSON
-      @output << "#{@json[start...@index]}0"
+      @output << repair_leading_dot_number("#{@json[start...@index]}0")
+    end
+
+    # Repair a number missing its digit before the decimal point, like ".5"
+    # or "-.5", into "0.5" / "-0.5". Divergence from upstream, which emits
+    # the invalid leading-dot number unchanged. The guard keeps the common
+    # case (a number that needs no repair) allocation-free; `sub` copies
+    # its receiver even when the pattern does not match.
+    def repair_leading_dot_number(num)
+      return num unless num.start_with?('.', '-.')
+
+      num.sub(/\A(?<sign>-?)\./, '\k<sign>0.')
     end

     # Parse and repair Newline Delimited JSON (NDJSON):
@@ -732,6 +855,10 @@ module JSON
           end
         end

+        # repair: skip a Markdown list marker before the next value
+        parse_whitespace_and_skip_comments
+        skip_markdown_list_marker
+
         processed_value = parse_value
       end

|
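The leading-dot repair added in the hunks above can be tried on its own. The sketch below copies the body of `repair_leading_dot_number` from the added lines and lifts it out of the repairer class so that it runs standalone; the surrounding parser state (`@json`, `@index`, `@output`) is deliberately left out, so this shows only the string rewrite, not how the gem wires it into `parse_number`.

```ruby
# Standalone copy of the helper shown in the diff (assumption: it behaves the
# same outside the class, since it only uses its argument).
def repair_leading_dot_number(num)
  # Fast path: a number that needs no repair is returned as the same object.
  return num unless num.start_with?('.', '-.')

  # Keep the optional minus sign, then put the missing zero before the dot.
  num.sub(/\A(?<sign>-?)\./, '\k<sign>0.')
end

puts repair_leading_dot_number('.5')   # leading dot
puts repair_leading_dot_number('-.5')  # signed leading dot
puts repair_leading_dot_number('12.5') # already valid, returned unchanged
```

The `start_with?` guard matters only for allocation: without it, `sub` would still return the right text, but as a fresh copy of the string on every call.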
data/sig/json/repair/string_utils.rbs
CHANGED
|
@@ -1,9 +1,9 @@
|
|
|
1
1
|
module JSON
|
|
2
2
|
module Repair
|
|
3
3
|
module StringUtils
|
|
4
|
-
@output:
|
|
4
|
+
@output: ::String
|
|
5
5
|
|
|
6
|
-
@index:
|
|
6
|
+
@index: ::Integer
|
|
7
7
|
|
|
8
8
|
# Constants for character chars
|
|
9
9
|
BACKSLASH: "\\"
|
|
@@ -24,17 +24,17 @@ module JSON
|
|
|
24
24
|
|
|
25
25
|
CLOSE_PARENTHESIS: ")"
|
|
26
26
|
|
|
27
|
-
SPACE:
|
|
27
|
+
SPACE: ::String
|
|
28
28
|
|
|
29
|
-
NEWLINE:
|
|
29
|
+
NEWLINE: ::String
|
|
30
30
|
|
|
31
|
-
TAB:
|
|
31
|
+
TAB: ::String
|
|
32
32
|
|
|
33
|
-
RETURN:
|
|
33
|
+
RETURN: ::String
|
|
34
34
|
|
|
35
|
-
BACKSPACE:
|
|
35
|
+
BACKSPACE: ::String
|
|
36
36
|
|
|
37
|
-
FORM_FEED:
|
|
37
|
+
FORM_FEED: ::String
|
|
38
38
|
|
|
39
39
|
DOUBLE_QUOTE: "\""
|
|
40
40
|
|
|
@@ -110,56 +110,64 @@ module JSON
|
|
|
110
110
|
|
|
111
111
|
REGEX_FUNCTION_NAME_CHAR: ::Regexp
|
|
112
112
|
|
|
113
|
-
# Functions to check character chars
|
|
114
|
-
|
|
113
|
+
# Functions to check character chars.
|
|
114
|
+
# `char` is `::String?` because every caller passes `@json[@index]`,
|
|
115
|
+
# which is `nil` past the end of input. The predicates either guard
|
|
116
|
+
# against `nil` explicitly or rely on `Array#include?` / `==` /
|
|
117
|
+
# `Regexp#match?` returning a safe value for `nil`.
|
|
118
|
+
def hex?: (::String? char) -> bool
|
|
115
119
|
|
|
116
|
-
def digit?: (
|
|
120
|
+
def digit?: (::String? char) -> bool
|
|
117
121
|
|
|
118
|
-
def valid_string_character?: (
|
|
122
|
+
def valid_string_character?: (::String char) -> bool
|
|
119
123
|
|
|
120
|
-
def delimiter?: (
|
|
124
|
+
def delimiter?: (::String? char) -> bool
|
|
121
125
|
|
|
122
|
-
def unquoted_string_delimiter?: (
|
|
126
|
+
def unquoted_string_delimiter?: (::String? char) -> bool
|
|
123
127
|
|
|
124
|
-
def function_name_char_start?: (
|
|
128
|
+
def function_name_char_start?: (::String? char) -> bool
|
|
125
129
|
|
|
126
|
-
def function_name_char?: (
|
|
130
|
+
def function_name_char?: (::String? char) -> bool
|
|
127
131
|
|
|
128
|
-
def start_of_value?: (
|
|
132
|
+
def start_of_value?: (::String? char) -> bool
|
|
129
133
|
|
|
130
|
-
def control_character?: (
|
|
134
|
+
def control_character?: (::String? char) -> bool
|
|
131
135
|
|
|
132
|
-
def whitespace?: (
|
|
136
|
+
def whitespace?: (::String? char) -> bool
|
|
133
137
|
|
|
134
|
-
def whitespace_except_newline?: (
|
|
138
|
+
def whitespace_except_newline?: (::String? char) -> bool
|
|
135
139
|
|
|
136
|
-
def special_whitespace?: (
|
|
140
|
+
def special_whitespace?: (::String? char) -> bool
|
|
137
141
|
|
|
138
|
-
def
|
|
142
|
+
def same_line_whitespace?: (::String? char) -> bool
|
|
139
143
|
|
|
140
|
-
def
|
|
144
|
+
def whitespace_or_special?: (::String? char) -> bool
|
|
141
145
|
|
|
142
|
-
def
|
|
146
|
+
def quote?: (::String? char) -> bool
|
|
143
147
|
|
|
144
|
-
def
|
|
148
|
+
def double_quote?: (::String? char) -> bool
|
|
145
149
|
|
|
146
|
-
def
|
|
150
|
+
def single_quote?: (::String? char) -> bool
|
|
151
|
+
|
|
152
|
+
def double_quote_like?: (::String? char) -> bool
|
|
153
|
+
|
|
154
|
+
def single_quote_like?: (::String? char) -> bool
|
|
147
155
|
|
|
148
156
|
# Strip last occurrence of text_to_strip from text
|
|
149
|
-
def strip_last_occurrence: (
|
|
157
|
+
def strip_last_occurrence: (::String text, ::String text_to_strip, ?strip_remaining_text: bool) -> ::String
|
|
150
158
|
|
|
151
|
-
def insert_before_last_whitespace: (
|
|
159
|
+
def insert_before_last_whitespace: (::String text, ::String text_to_insert) -> ::String
|
|
152
160
|
|
|
153
161
|
# Parse keywords true, false, null
|
|
154
162
|
# Repair Python keywords True, False, None
|
|
155
163
|
# Repair Ruby keyword nil
|
|
156
|
-
def parse_keywords: () ->
|
|
164
|
+
def parse_keywords: () -> bool
|
|
157
165
|
|
|
158
|
-
def parse_keyword: (
|
|
166
|
+
def parse_keyword: (::String name, ::String value) -> bool
|
|
159
167
|
|
|
160
|
-
def remove_at_index: (
|
|
168
|
+
def remove_at_index: (::String text, ::Integer start, ::Integer count) -> ::String
|
|
161
169
|
|
|
162
|
-
def ends_with_comma_or_newline?: (
|
|
170
|
+
def ends_with_comma_or_newline?: (::String text) -> bool
|
|
163
171
|
end
|
|
164
172
|
end
|
|
165
173
|
end
|
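The signature comment above explains why the predicates take `::String?`: indexing a string one position past its end yields `nil`, and `Array#include?`, `==`, and `Regexp#match?` all return `false` for `nil` rather than raising. A minimal sketch of that behaviour follows; the three `nil_safe_*` helpers are hypothetical stand-ins written for this illustration, not the gem's own predicate bodies.

```ruby
# Hypothetical stand-ins for character predicates; each relies on a core
# method that already tolerates nil, so no explicit nil guard is needed.
def nil_safe_digit?(char)
  %w[0 1 2 3 4 5 6 7 8 9].include?(char) # Array#include?(nil) is false here
end

def nil_safe_double_quote?(char)
  char == '"' # nil == '"' is false
end

def nil_safe_hex?(char)
  /\A[0-9A-Fa-f]\z/.match?(char) # Regexp#match?(nil) is false
end

json = '[1'
past_end = json[json.length] # nil: the index is one past the last character

puts past_end.inspect
puts nil_safe_digit?(past_end)
puts nil_safe_double_quote?(past_end)
puts nil_safe_hex?(past_end)
```

A predicate that calls a `String` method directly on `char` (for example `char.ord`) would need an explicit guard instead, which may be why `valid_string_character?` keeps a non-optional `::String` parameter.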
data/sig/json/repair.rbs
CHANGED
|
@@ -1,4 +1,10 @@
|
|
|
1
1
|
module JSON
|
|
2
|
+
# Recursive type for any `JSON.parse` result. Mirrors what stdlib's
|
|
3
|
+
# `JSON.parse` produces (and the JS upstream emits): scalars, arrays,
|
|
4
|
+
# and objects of the same. Used in place of `untyped` for the
|
|
5
|
+
# `return_objects: true` and internal `*_parse` paths.
|
|
6
|
+
type json_value = ::Hash[::String, json_value] | ::Array[json_value] | ::String | ::Integer | ::Float | bool | nil
|
|
7
|
+
|
|
2
8
|
class JSONRepairError < StandardError
|
|
3
9
|
attr_reader position: ::Integer?
|
|
4
10
|
|
|
@@ -9,13 +15,25 @@ module JSON
|
|
|
9
15
|
VERSION: ::String
|
|
10
16
|
end
|
|
11
17
|
|
|
18
|
+
interface _Readable
|
|
19
|
+
def read: () -> ::String?
|
|
20
|
+
end
|
|
21
|
+
|
|
12
22
|
def self.repair: (::String json, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
13
|
-
| (::String json, return_objects: true, ?skip_json_loads: bool) ->
|
|
23
|
+
| (::String json, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
14
24
|
| (::String json, ?skip_json_loads: bool) -> ::String
|
|
15
25
|
|
|
26
|
+
def self.repair_io: (_Readable io, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
27
|
+
| (_Readable io, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
28
|
+
| (_Readable io, ?skip_json_loads: bool) -> ::String
|
|
29
|
+
|
|
30
|
+
def self.repair_file: (::String | ::Pathname path, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
31
|
+
| (::String | ::Pathname path, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
32
|
+
| (::String | ::Pathname path, ?skip_json_loads: bool) -> ::String
|
|
33
|
+
|
|
16
34
|
private
|
|
17
35
|
|
|
18
|
-
def self.tolerant_parse: (::String json) ->
|
|
36
|
+
def self.tolerant_parse: (::String json) -> json_value
|
|
19
37
|
|
|
20
|
-
def self.repaired_parse: (::String json) ->
|
|
38
|
+
def self.repaired_parse: (::String json) -> json_value
|
|
21
39
|
end
|
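The recursive `json_value` alias introduced above can be read as a runtime check. The sketch below is one way to express the same shape in plain Ruby using only the standard library; `json_value?` is a hypothetical helper for illustration and is not part of the gem.

```ruby
require 'json'

# Hypothetical runtime mirror of the RBS alias:
#   Hash[String, json_value] | Array[json_value] | String | Integer | Float | bool | nil
def json_value?(value)
  case value
  when String, Integer, Float, true, false, nil
    true
  when Array
    value.all? { |item| json_value?(item) }
  when Hash
    # Keys must be strings, values must themselves be JSON values.
    value.all? { |key, item| key.is_a?(String) && json_value?(item) }
  else
    false
  end
end

parsed = JSON.parse('{"a": [1, 2.5, "x", true, null]}')
puts json_value?(parsed)   # structure produced by stdlib JSON.parse
puts json_value?({ a: 1 }) # symbol key, so not covered by the alias
```

The `_Readable` interface in the same file asks only for a `read` method, so objects such as `StringIO` or an open `File` should satisfy it structurally.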
data/sig/json/repairer.rbs
CHANGED
|
@@ -11,7 +11,7 @@ module JSON
|
|
|
11
11
|
# `lib/json/repairer.rb`).
|
|
12
12
|
@json: untyped
|
|
13
13
|
|
|
14
|
-
@index: Integer
|
|
14
|
+
@index: ::Integer
|
|
15
15
|
|
|
16
16
|
@output: ::String
|
|
17
17
|
|
|
@@ -31,25 +31,36 @@ module JSON
|
|
|
31
31
|
|
|
32
32
|
private
|
|
33
33
|
|
|
34
|
-
def parse_value: () ->
|
|
34
|
+
def parse_value: () -> bool
|
|
35
35
|
|
|
36
|
-
def parse_whitespace: (?skip_newline: bool) ->
|
|
36
|
+
def parse_whitespace: (?skip_newline: bool) -> bool
|
|
37
37
|
|
|
38
|
-
def parse_comment: () ->
|
|
38
|
+
def parse_comment: () -> bool
|
|
39
39
|
|
|
40
40
|
# Find and skip over a Markdown fenced code block
|
|
41
|
-
def parse_markdown_code_block: (::Array[::String] blocks) ->
|
|
41
|
+
def parse_markdown_code_block: (::Array[::String] blocks) -> bool
|
|
42
42
|
|
|
43
|
-
def skip_markdown_code_block: (::Array[::String] blocks) ->
|
|
43
|
+
def skip_markdown_code_block: (::Array[::String] blocks) -> bool
|
|
44
|
+
|
|
45
|
+
# Look ahead for a Markdown list marker like "- " or "12. " that
|
|
46
|
+
# precedes a value; returns the marker's length, or nil when there
|
|
47
|
+
# is no marker.
|
|
48
|
+
def markdown_list_marker_length: () -> ::Integer?
|
|
49
|
+
|
|
50
|
+
def skip_markdown_list_marker: () -> bool
|
|
44
51
|
|
|
45
52
|
# Parse an object like '{"key": "value"}'
|
|
46
|
-
def parse_object: () ->
|
|
53
|
+
def parse_object: () -> bool
|
|
47
54
|
|
|
48
|
-
|
|
55
|
+
# Repair an object value with unescaped quotes around a colon,
|
|
56
|
+
# like {"a": "b": "c"}, by merging it into one string value.
|
|
57
|
+
def repair_doubled_colon: () -> void
|
|
58
|
+
|
|
59
|
+
def skip_character: (::String char) -> bool
|
|
49
60
|
|
|
50
61
|
# Skip ellipsis like "[1,2,3,...]" or "[1,2,3,...,9]" or "[...,7,8,9]"
|
|
51
62
|
# or a similar construct in objects.
|
|
52
|
-
def skip_ellipsis: () ->
|
|
63
|
+
def skip_ellipsis: () -> void
|
|
53
64
|
|
|
54
65
|
# Parse a string enclosed by double quotes "...". Can contain escaped quotes
|
|
55
66
|
# Repair strings enclosed in single quotes or special quotes
|
|
@@ -62,51 +73,59 @@ module JSON
|
|
|
62
73
|
# more conservative way, stopping the string at the first next delimiter
|
|
63
74
|
# and fixing the string by inserting a quote there, or stopping at a
|
|
64
75
|
# stop index detected in the first iteration.
|
|
65
|
-
def parse_string: (?stop_at_delimiter: bool, ?stop_at_index: ::Integer) ->
|
|
76
|
+
def parse_string: (?stop_at_delimiter: bool, ?stop_at_index: ::Integer) -> bool
|
|
66
77
|
|
|
67
78
|
# Repair an unquoted string by adding quotes around it
|
|
68
79
|
# Repair a MongoDB function call like NumberLong("2")
|
|
69
80
|
# Repair a JSONP function call like callback({...});
|
|
70
|
-
def parse_unquoted_string: (bool is_key) ->
|
|
81
|
+
def parse_unquoted_string: (bool is_key) -> bool
|
|
71
82
|
|
|
72
83
|
# Parse a regular expression literal like /foo/ or /foo\/bar/
|
|
73
|
-
def parse_regex: () ->
|
|
84
|
+
def parse_regex: () -> bool
|
|
74
85
|
|
|
75
|
-
def parse_character: (
|
|
86
|
+
def parse_character: (::String char) -> bool
|
|
76
87
|
|
|
77
|
-
def parse_whitespace_and_skip_comments: (?skip_newline: bool) ->
|
|
88
|
+
def parse_whitespace_and_skip_comments: (?skip_newline: bool) -> bool
|
|
78
89
|
|
|
79
90
|
# Parse a number like 2.4 or 2.4e6
|
|
80
|
-
def parse_number: () ->
|
|
91
|
+
def parse_number: () -> bool
|
|
81
92
|
|
|
82
|
-
def at_end_of_number?: () ->
|
|
93
|
+
def at_end_of_number?: () -> bool
|
|
83
94
|
|
|
84
95
|
# Parse an array like '["item1", "item2", ...]'
|
|
85
|
-
def parse_array: () ->
|
|
96
|
+
def parse_array: () -> bool
|
|
86
97
|
|
|
87
|
-
def prev_non_whitespace_index: (
|
|
98
|
+
def prev_non_whitespace_index: (::Integer start) -> ::Integer
|
|
88
99
|
|
|
89
100
|
# Repair concatenated strings like "hello" + "world", change this into "helloworld"
|
|
90
|
-
def parse_concatenated_string: () ->
|
|
101
|
+
def parse_concatenated_string: () -> bool
|
|
102
|
+
|
|
103
|
+
def repair_number_ending_with_numeric_symbol: (::Integer start) -> void
|
|
91
104
|
|
|
92
|
-
|
|
105
|
+
# Repair a number missing its digit before the decimal point, like ".5"
|
|
106
|
+
# or "-.5", into "0.5" / "-0.5".
|
|
107
|
+
def repair_leading_dot_number: (::String num) -> ::String
|
|
93
108
|
|
|
94
109
|
# Parse and repair Newline Delimited JSON (NDJSON):
|
|
95
110
|
# multiple JSON objects separated by a newline character
|
|
96
|
-
def parse_newline_delimited_json: () ->
|
|
111
|
+
def parse_newline_delimited_json: () -> void
|
|
97
112
|
|
|
98
|
-
def skip_escape_character: () ->
|
|
113
|
+
def skip_escape_character: () -> bool
|
|
99
114
|
|
|
100
|
-
|
|
115
|
+
# `bot` (bottom) because these always raise — steep needs this to
|
|
116
|
+
# treat their call sites as unreachable so methods like `repair`
|
|
117
|
+
# type-check (the trailing `throw_unexpected_character` must not
|
|
118
|
+
# contribute `void` to the method's union return type).
|
|
119
|
+
def throw_invalid_character: (::String char) -> bot
|
|
101
120
|
|
|
102
|
-
def throw_unexpected_character: () ->
|
|
121
|
+
def throw_unexpected_character: () -> bot
|
|
103
122
|
|
|
104
|
-
def throw_unexpected_end: () ->
|
|
123
|
+
def throw_unexpected_end: () -> bot
|
|
105
124
|
|
|
106
|
-
def throw_object_key_expected: () ->
|
|
125
|
+
def throw_object_key_expected: () -> bot
|
|
107
126
|
|
|
108
|
-
def throw_colon_expected: () ->
|
|
127
|
+
def throw_colon_expected: () -> bot
|
|
109
128
|
|
|
110
|
-
def throw_invalid_unicode_character: () ->
|
|
129
|
+
def throw_invalid_unicode_character: () -> bot
|
|
111
130
|
end
|
|
112
131
|
end
|
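The `markdown_list_marker_length` signature above returns the marker's length, or `nil` when there is no marker. The sketch below shows one way such a look-ahead could work. It is an assumption-laden illustration, not the gem's implementation: it takes the text and index as explicit arguments instead of reading `@json` and `@index`, recognises only the two forms named in the comment (`- ` and digits followed by `. `), and does not check that a value actually follows.

```ruby
# Hypothetical look-ahead: length of a Markdown list marker starting at
# `index`, or nil when the text there is not a marker.
def markdown_list_marker_length(text, index)
  rest = text[index..]
  return nil if rest.nil? # index lies beyond the end of the text

  # A dash or "digits + dot", followed by one space or tab. Requiring the
  # whitespace keeps "-5" and "12.5" from being mistaken for markers.
  match = rest.match(/\A(?:-|\d+\.)[ \t]/)
  match && match[0].length
end

puts markdown_list_marker_length('- {"a": 1}', 0).inspect
puts markdown_list_marker_length('12. [1, 2]', 0).inspect
puts markdown_list_marker_length('-5', 0).inspect
puts markdown_list_marker_length('12.5', 0).inspect
```

Returning `nil` instead of `0` for "no marker" matches the `::Integer?` return type in the signature and lets a caller such as `skip_markdown_list_marker` branch on presence alone.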
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: json-repair
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.11.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Aleksandr Zykov
|
|
@@ -9,7 +9,11 @@ bindir: exe
|
|
|
9
9
|
cert_chain: []
|
|
10
10
|
date: 1980-01-02 00:00:00.000000000 Z
|
|
11
11
|
dependencies: []
|
|
12
|
-
description:
|
|
12
|
+
description: 'Repairs broken JSON: missing quotes and commas, unclosed brackets, trailing
|
|
13
|
+
commas, unquoted keys, single quotes, comments, Python constants, NDJSON, Markdown
|
|
14
|
+
code fences and list markers in LLM output, truncated documents, and more. A Ruby
|
|
15
|
+
port of the jsonrepair JavaScript library — useful whenever JSON from LLMs, APIs,
|
|
16
|
+
or logs does not strictly follow the standard.'
|
|
13
17
|
email:
|
|
14
18
|
- alexandrz@gmail.com
|
|
15
19
|
executables:
|
|
@@ -44,6 +48,8 @@ metadata:
|
|
|
44
48
|
homepage_uri: https://github.com/sashazykov/json-repair-rb
|
|
45
49
|
source_code_uri: https://github.com/sashazykov/json-repair-rb
|
|
46
50
|
changelog_uri: https://github.com/sashazykov/json-repair-rb/blob/main/CHANGELOG.md
|
|
51
|
+
documentation_uri: https://rubydoc.info/gems/json-repair
|
|
52
|
+
bug_tracker_uri: https://github.com/sashazykov/json-repair-rb/issues
|
|
47
53
|
rdoc_options: []
|
|
48
54
|
require_paths:
|
|
49
55
|
- lib
|
|
@@ -60,5 +66,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
60
66
|
requirements: []
|
|
61
67
|
rubygems_version: 3.6.9
|
|
62
68
|
specification_version: 4
|
|
63
|
-
summary:
|
|
69
|
+
summary: Repair invalid or malformed JSON documents, including LLM output.
|
|
64
70
|
test_files: []
|