json-repair 0.7.0 → 0.10.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +49 -0
- data/README.md +20 -0
- data/Rakefile +3 -0
- data/lib/json/repair/string_utils.rb +27 -17
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repair.rb +23 -1
- data/lib/json/repairer.rb +74 -4
- data/sig/json/repair/string_utils.rbs +38 -32
- data/sig/json/repair.rbs +21 -3
- data/sig/json/repairer.rbs +43 -28
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 6de36fcd3ab73ce63e1f9367d4e86dd7373dd7430b2dc87d451d75dd1f5fd685
|
|
4
|
+
data.tar.gz: 7221ab8c14253abf6cf4d40ff8fb8e6bd93c20504a8b5e6608c93a159420dc2f
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 5083d8d8ada9a0b0a67beb8cb8ad1f1af53435e4e5c99632de5c68e19fd02d6dc85567b8c44cfc3068ae44118934c073d5a1acc22abf7f3d089b2add3899e083
|
|
7
|
+
data.tar.gz: 01d4eb164137dcb0e5b3b11703e2b1099efff6749f633466c3b37e1cccec07904e04006ddca1961d30702b5f133e35b88f8c37edce2bfa72ad61a6172dbbbada
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,54 @@
|
|
|
1
1
|
# Changes
|
|
2
2
|
|
|
3
|
+
### 2026-06-11 (0.10.0)
|
|
4
|
+
|
|
5
|
+
* Repair Markdown list markers in front of top-level values:
|
|
6
|
+
`- {"a": 1}` → `{"a":1}`, and multi-line lists become arrays via the
|
|
7
|
+
existing newline-delimited JSON handling
|
|
8
|
+
(`"- {\"a\": 1}\n- {\"b\": 2}"` → `[{"a":1},{"b":2}]`). Bullet
|
|
9
|
+
markers `-`, `*`, `+` and ordered markers like `1.` / `2)` (up to
|
|
10
|
+
nine digits, the CommonMark limit) are recognized at the start of
|
|
11
|
+
the root value and of each newline-delimited line, only when
|
|
12
|
+
followed by same-line whitespace and a value — so `-5`, a trailing
|
|
13
|
+
`"- "`, and newline-delimited decimals like `"1.5\n2.5"` keep their
|
|
14
|
+
number readings, and nothing changes inside nested structures.
|
|
15
|
+
Previously these inputs raised `JSONRepairError`; two non-raising
|
|
16
|
+
behaviors change for the better: `"3\n- 5\n7"` now repairs to
|
|
17
|
+
`[3,5,7]` instead of the corrupt `[3,0,5,7]`, and a single-line
|
|
18
|
+
`* text` becomes `"text"` instead of `"* text"`. Deliberate
|
|
19
|
+
divergence from upstream
|
|
20
|
+
[jsonrepair](https://github.com/josdejong/jsonrepair) (no Markdown
|
|
21
|
+
list handling as of v3.14.0), and more precise than Python
|
|
22
|
+
[`json_repair`](https://github.com/mangiucugna/json_repair), which
|
|
23
|
+
collapses scalar list items to `""`.
|
|
24
|
+
|
|
25
|
+
### 2026-06-11 (0.9.0)
|
|
26
|
+
|
|
27
|
+
* Repair numbers missing the digit before their decimal point:
|
|
28
|
+
`.5` → `0.5`, `-.5` → `-0.5`, and truncated forms like `.` → `0.0`.
|
|
29
|
+
Previously these leaked a raw stdlib `JSON::ParserError` out of
|
|
30
|
+
`JSON.repair` because the repairer emitted the leading-dot number
|
|
31
|
+
unchanged (invalid JSON) and the canonical-output re-parse choked on
|
|
32
|
+
it. This is a deliberate divergence from upstream
|
|
33
|
+
[jsonrepair](https://github.com/josdejong/jsonrepair) (which leaves
|
|
34
|
+
leading-dot numbers unrepaired as of v3.14.0), matching
|
|
35
|
+
[dirty-json](https://github.com/RyanMarcus/dirty-json) behavior.
|
|
36
|
+
* `JSON.repair` now guards its error contract: if the repairer ever
|
|
37
|
+
emits a string stdlib JSON cannot parse (a repairer bug), the stdlib
|
|
38
|
+
error is wrapped in `JSON::JSONRepairError` instead of leaking
|
|
39
|
+
`JSON::ParserError` to callers.
|
|
40
|
+
|
|
41
|
+
### 2026-05-15 (0.8.0)
|
|
42
|
+
|
|
43
|
+
* `JSON.repair_file(path)` and `JSON.repair_io(io)` convenience
|
|
44
|
+
wrappers around `JSON.repair`. `repair_file` reads a path from disk
|
|
45
|
+
(accepts a `String` or `Pathname`); `repair_io` reads from any
|
|
46
|
+
object responding to `#read` (e.g. `File`, `StringIO`, `$stdin`)
|
|
47
|
+
without closing it. Both forward `return_objects:` and
|
|
48
|
+
`skip_json_loads:` through to `JSON.repair`. Mirrors Python's
|
|
49
|
+
[`json_repair`](https://github.com/mangiucugna/json_repair)
|
|
50
|
+
`load` / `from_file` helpers.
|
|
51
|
+
|
|
3
52
|
### 2026-05-12 (0.7.0)
|
|
4
53
|
|
|
5
54
|
* `JSON.repair` now always returns canonical JSON via
|
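The 0.10.0 entry above spells out when a leading `-`, `*`, `+`, `1.` or `2)` counts as a Markdown list marker. The rule can be sketched on its own as a single regex. This is only an approximation under stated assumptions: `LIST_MARKER` and `strip_list_marker` are hypothetical names that do not exist in the gem, and the lookahead merely guesses at what the gem treats as the start of a value.

```ruby
# Hypothetical approximation of the marker rule from the changelog entry.
# The gem itself scans character by character inside its Repairer class.
#
#   [-*+] or 1-9 digits followed by "." or ")"   the marker
#   [ \t]+                                       required same-line whitespace
#   lookahead                                    something value-like must follow
LIST_MARKER = /\A(?:[-*+]|\d{1,9}[.)])[ \t]+(?=[\[{"'\w.\-])/

# Remove one leading list marker, or return the line unchanged.
def strip_list_marker(line)
  line.sub(LIST_MARKER, '')
end

puts strip_list_marker('- {"a": 1}')  # {"a": 1}
puts strip_list_marker('2) [1, 2]')   # [1, 2]
puts strip_list_marker('-5')          # -5  (no whitespace after "-": a number)
puts strip_list_marker('1.5')         # 1.5 (no whitespace after "1.": a decimal)
```

The sketch covers only per-line marker detection. Turning several marked lines into one array is, per the changelog, left to the existing newline-delimited JSON handling.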
data/README.md
CHANGED
|
@@ -31,6 +31,12 @@ puts repaired_json # Outputs: {"name":"Alice","age":25}
|
|
|
31
31
|
|
|
32
32
|
The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.
|
|
33
33
|
|
|
34
|
+
Markdown markup in LLM output is handled too: fenced code blocks like `` ```json `` are stripped, and list markers (`-`, `*`, `+`, `1.`) in front of top-level values are removed — a multi-line list becomes an array:
|
|
35
|
+
|
|
36
|
+
```ruby
|
|
37
|
+
JSON.repair("- {\"a\": 1}\n- {\"b\": 2}") # => '[{"a":1},{"b":2}]'
|
|
38
|
+
```
|
|
39
|
+
|
|
34
40
|
Pass `return_objects: true` to get the parsed Ruby value (Hash, Array, or scalar) instead of a string:
|
|
35
41
|
|
|
36
42
|
```ruby
|
|
@@ -53,6 +59,20 @@ If you need the parsed Ruby value instead of a string, pass `return_objects: tru
|
|
|
53
59
|
|
|
54
60
|
`skip_json_loads: true` skips the stdlib `JSON.parse` attempt and routes the input straight through the repairer. The output is the same; the option is purely a performance knob for callers who know their input will need repair.
|
|
55
61
|
|
|
62
|
+
### Reading from a file or IO
|
|
63
|
+
|
|
64
|
+
`JSON.repair_file(path)` reads a file from disk and repairs its contents. `JSON.repair_io(io)` does the same with any object that responds to `#read` (e.g. `File`, `StringIO`, `$stdin`). Both forward `return_objects:` and `skip_json_loads:` to `JSON.repair`.
|
|
65
|
+
|
|
66
|
+
```ruby
|
|
67
|
+
JSON.repair_file('broken.json')
|
|
68
|
+
JSON.repair_file('broken.json', return_objects: true)
|
|
69
|
+
|
|
70
|
+
File.open('broken.json') { |io| JSON.repair_io(io) }
|
|
71
|
+
JSON.repair_io($stdin)
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
`JSON.repair_io` does not close the IO — the caller manages its lifecycle.
|
|
75
|
+
|
|
56
76
|
## Command line
|
|
57
77
|
|
|
58
78
|
The gem ships a `json-repair` executable. It reads from stdin or a file and writes to stdout, `--output FILE`, or back over the input file with `--overwrite`.
|
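The README text above says `JSON.repair_io` accepts any object responding to `#read` and leaves closing to the caller. That contract can be sketched with the standard library alone. `read_for_repair` is a hypothetical stand-in for the reading step (it mirrors the `io.read || ''` line in `lib/json/repair.rb`) and does not call the gem.

```ruby
require 'stringio'

# Hypothetical stand-in for the reading step of JSON.repair_io:
# read everything, fall back to '' if #read returns nil, never close.
def read_for_repair(io)
  io.read || ''
end

io = StringIO.new("{name: 'Alice'}")
text = read_for_repair(io)

puts text        # {name: 'Alice'}
puts io.closed?  # false: the caller still owns the IO
io.close         # the caller closes it when done
```

Leaving the IO open is what makes passing `$stdin` or a handle from a `File.open` block safe: the wrapper never ends a lifecycle it did not start.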
data/Rakefile
CHANGED
|
@@ -19,6 +19,9 @@ task :steep do
|
|
|
19
19
|
sh 'bundle exec steep check'
|
|
20
20
|
end
|
|
21
21
|
|
|
22
|
+
desc 'Type-check: rbs validate + steep check'
|
|
23
|
+
task typecheck: %i[rbs steep]
|
|
24
|
+
|
|
22
25
|
desc 'Run benchmark/run.rb (regression baseline for JSON.repair)'
|
|
23
26
|
task :bench do
|
|
24
27
|
ruby '-Ilib', 'benchmark/run.rb'
|
data/lib/json/repair/string_utils.rb
CHANGED
|
@@ -60,13 +60,14 @@ module JSON
|
|
|
60
60
|
|
|
61
61
|
# Functions to check character chars
|
|
62
62
|
def hex?(char)
|
|
63
|
-
|
|
64
|
-
(char >=
|
|
65
|
-
|
|
63
|
+
!char.nil? &&
|
|
64
|
+
((char >= ZERO && char <= NINE) ||
|
|
65
|
+
(char >= UPPERCASE_A && char <= UPPERCASE_F) ||
|
|
66
|
+
(char >= LOWERCASE_A && char <= LOWERCASE_F))
|
|
66
67
|
end
|
|
67
68
|
|
|
68
69
|
def digit?(char)
|
|
69
|
-
char && char >= ZERO && char <= NINE
|
|
70
|
+
!char.nil? && char >= ZERO && char <= NINE
|
|
70
71
|
end
|
|
71
72
|
|
|
72
73
|
def valid_string_character?(char)
|
|
@@ -74,11 +75,11 @@ module JSON
|
|
|
74
75
|
end
|
|
75
76
|
|
|
76
77
|
def delimiter?(char)
|
|
77
|
-
REGEX_DELIMITER.match?(char)
|
|
78
|
+
!char.nil? && REGEX_DELIMITER.match?(char)
|
|
78
79
|
end
|
|
79
80
|
|
|
80
81
|
def unquoted_string_delimiter?(char)
|
|
81
|
-
REGEX_UNQUOTED_STRING_DELIMITER.match?(char)
|
|
82
|
+
!char.nil? && REGEX_UNQUOTED_STRING_DELIMITER.match?(char)
|
|
82
83
|
end
|
|
83
84
|
|
|
84
85
|
REGEX_FUNCTION_NAME_CHAR_START = /\A[a-zA-Z_$]\z/
|
|
@@ -93,19 +94,19 @@ module JSON
|
|
|
93
94
|
end
|
|
94
95
|
|
|
95
96
|
def start_of_value?(char)
|
|
96
|
-
REGEX_START_OF_VALUE.match?(char) ||
|
|
97
|
+
!char.nil? && (REGEX_START_OF_VALUE.match?(char) || quote?(char))
|
|
97
98
|
end
|
|
98
99
|
|
|
99
100
|
def control_character?(char)
|
|
100
|
-
[NEWLINE, RETURN, TAB, BACKSPACE, FORM_FEED].include?(char)
|
|
101
|
+
!char.nil? && [NEWLINE, RETURN, TAB, BACKSPACE, FORM_FEED].include?(char)
|
|
101
102
|
end
|
|
102
103
|
|
|
103
104
|
def whitespace?(char)
|
|
104
|
-
[SPACE, NEWLINE, TAB, RETURN].include?(char)
|
|
105
|
+
!char.nil? && [SPACE, NEWLINE, TAB, RETURN].include?(char)
|
|
105
106
|
end
|
|
106
107
|
|
|
107
108
|
def whitespace_except_newline?(char)
|
|
108
|
-
[SPACE, TAB, RETURN].include?(char)
|
|
109
|
+
!char.nil? && [SPACE, TAB, RETURN].include?(char)
|
|
109
110
|
end
|
|
110
111
|
|
|
111
112
|
def special_whitespace?(char)
|
|
@@ -122,6 +123,10 @@ module JSON
|
|
|
122
123
|
(char >= EN_QUAD && char <= ZERO_WIDTH_SPACE)
|
|
123
124
|
end
|
|
124
125
|
|
|
126
|
+
def same_line_whitespace?(char)
|
|
127
|
+
whitespace_except_newline?(char) || special_whitespace?(char)
|
|
128
|
+
end
|
|
129
|
+
|
|
125
130
|
def quote?(char)
|
|
126
131
|
double_quote_like?(char) || single_quote_like?(char)
|
|
127
132
|
end
|
|
@@ -135,20 +140,25 @@ module JSON
|
|
|
135
140
|
end
|
|
136
141
|
|
|
137
142
|
def double_quote_like?(char)
|
|
138
|
-
[DOUBLE_QUOTE, DOUBLE_QUOTE_LEFT, DOUBLE_QUOTE_RIGHT].include?(char)
|
|
143
|
+
!char.nil? && [DOUBLE_QUOTE, DOUBLE_QUOTE_LEFT, DOUBLE_QUOTE_RIGHT].include?(char)
|
|
139
144
|
end
|
|
140
145
|
|
|
141
146
|
def single_quote_like?(char)
|
|
142
|
-
[QUOTE, QUOTE_LEFT, QUOTE_RIGHT, GRAVE_ACCENT, ACUTE_ACCENT].include?(char)
|
|
147
|
+
!char.nil? && [QUOTE, QUOTE_LEFT, QUOTE_RIGHT, GRAVE_ACCENT, ACUTE_ACCENT].include?(char)
|
|
143
148
|
end
|
|
144
149
|
|
|
145
|
-
# Strip last occurrence of text_to_strip from text
|
|
150
|
+
# Strip last occurrence of text_to_strip from text.
|
|
151
|
+
#
|
|
152
|
+
# `|| ''` on the slices below (and in `insert_before_last_whitespace` /
|
|
153
|
+
# `remove_at_index`) is for steep's nil-narrowing: `String#[range]` is
|
|
154
|
+
# typed `String?`, but every call site here keeps indices within
|
|
155
|
+
# `0..text.length`, so the slices never actually return `nil`.
|
|
146
156
|
def strip_last_occurrence(text, text_to_strip, strip_remaining_text: false)
|
|
147
157
|
index = text.rindex(text_to_strip)
|
|
148
158
|
return text unless index
|
|
149
159
|
|
|
150
|
-
remaining_text = strip_remaining_text ? '' : text[index + 1..]
|
|
151
|
-
text[0...index] + remaining_text
|
|
160
|
+
remaining_text = strip_remaining_text ? '' : (text[index + 1..] || '')
|
|
161
|
+
(text[0...index] || '') + remaining_text
|
|
152
162
|
end
|
|
153
163
|
|
|
154
164
|
def insert_before_last_whitespace(text, text_to_insert)
|
|
@@ -158,7 +168,7 @@ module JSON
|
|
|
158
168
|
|
|
159
169
|
index -= 1 while whitespace?(text[index - 1])
|
|
160
170
|
|
|
161
|
-
text[0...index] + text_to_insert + text[index..]
|
|
171
|
+
(text[0...index] || '') + text_to_insert + (text[index..] || '')
|
|
162
172
|
end
|
|
163
173
|
|
|
164
174
|
# Parse keywords true, false, null
|
|
@@ -187,7 +197,7 @@ module JSON
|
|
|
187
197
|
end
|
|
188
198
|
|
|
189
199
|
def remove_at_index(text, start, count)
|
|
190
|
-
text[0...start] + text[start + count..]
|
|
200
|
+
(text[0...start] || '') + (text[start + count..] || '')
|
|
191
201
|
end
|
|
192
202
|
|
|
193
203
|
def ends_with_comma_or_newline?(text)
|
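The `!char.nil? &&` guards and the `|| ''` fallbacks in the hunks above both follow from how Ruby indexes strings. A short illustration in plain Ruby; the `digit?` below is a simplified standalone copy that compares against literal characters instead of the module's `ZERO` / `NINE` constants.

```ruby
# Indexing one past the end of a String yields nil, which is what the
# character predicates receive at end of input.
json = '12'
p json[2]          # nil

# Simplified standalone predicate in the style of the diff: the explicit
# nil check returns false instead of calling >= on nil.
def digit?(char)
  !char.nil? && char >= '0' && char <= '9'
end

p digit?(json[0])  # true
p digit?(json[2])  # false

# Range slices return "" exactly at the end and nil only beyond it, so
# with indices kept within 0..text.length the `|| ''` never fires.
p 'abc'[3..]       # ""
p 'abc'[4..]       # nil
```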
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repair.rb
CHANGED
|
@@ -20,6 +20,22 @@ module JSON
|
|
|
20
20
|
return_objects ? parsed : JSON.generate(parsed)
|
|
21
21
|
end
|
|
22
22
|
|
|
23
|
+
# Inlined rather than calling `repair(...)` so the literal-bool overloads
|
|
24
|
+
# in sig/json/repair.rbs narrow correctly per caller — forwarding a
|
|
25
|
+
# `bool`-typed `return_objects` will not resolve against the literal-
|
|
26
|
+
# `true`/`false` overloads on `JSON.repair`.
|
|
27
|
+
def self.repair_io(io, return_objects: false, skip_json_loads: false)
|
|
28
|
+
json = io.read || ''
|
|
29
|
+
parsed = skip_json_loads ? repaired_parse(json) : tolerant_parse(json)
|
|
30
|
+
return_objects ? parsed : JSON.generate(parsed)
|
|
31
|
+
end
|
|
32
|
+
|
|
33
|
+
def self.repair_file(path, return_objects: false, skip_json_loads: false)
|
|
34
|
+
json = File.read(path.to_s)
|
|
35
|
+
parsed = skip_json_loads ? repaired_parse(json) : tolerant_parse(json)
|
|
36
|
+
return_objects ? parsed : JSON.generate(parsed)
|
|
37
|
+
end
|
|
38
|
+
|
|
23
39
|
def self.tolerant_parse(json)
|
|
24
40
|
JSON.parse(json)
|
|
25
41
|
rescue JSON::ParserError
|
|
@@ -27,8 +43,14 @@ module JSON
|
|
|
27
43
|
end
|
|
28
44
|
private_class_method :tolerant_parse
|
|
29
45
|
|
|
46
|
+
# The rescue guards the JSONRepairError-only error contract: if the
|
|
47
|
+
# Repairer ever emits a string stdlib JSON cannot parse (a Repairer bug),
|
|
48
|
+
# wrap the stdlib error instead of leaking JSON::ParserError to callers.
|
|
30
49
|
def self.repaired_parse(json)
|
|
31
|
-
|
|
50
|
+
repaired = Repairer.new(json).repair
|
|
51
|
+
JSON.parse(repaired)
|
|
52
|
+
rescue JSON::ParserError => e
|
|
53
|
+
raise JSONRepairError, "Internal error: repaired output is not valid JSON (#{e.message})"
|
|
32
54
|
end
|
|
33
55
|
private_class_method :repaired_parse
|
|
34
56
|
end
|
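The rescue added to `repaired_parse` above wraps a stdlib parse failure so that callers see a single error type. The pattern can be shown in isolation with only the `json` standard library. `SketchRepairError` and `parse_repaired` are hypothetical names used for illustration; the gem's own class is `JSON::JSONRepairError`.

```ruby
require 'json'

# Hypothetical error class standing in for JSON::JSONRepairError.
class SketchRepairError < StandardError; end

# Hypothetical helper: parse a string that is supposed to be valid JSON
# already, translating a stdlib failure into the library's own error.
def parse_repaired(repaired)
  JSON.parse(repaired)
rescue JSON::ParserError => e
  raise SketchRepairError, "Internal error: repaired output is not valid JSON (#{e.message})"
end

p parse_repaired('{"a": 0.5}')['a']  # 0.5

begin
  parse_repaired('{"a": .5}')        # a leading-dot number is not valid JSON
rescue SketchRepairError => e
  puts e.class                       # SketchRepairError
end
```

With this shape, a caller that rescues only the library's error class is not surprised by `JSON::ParserError` when the repairer emits invalid output.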
data/lib/json/repairer.rb
CHANGED
|
@@ -37,6 +37,12 @@ module JSON
|
|
|
37
37
|
def repair
|
|
38
38
|
parse_markdown_code_block(MARKDOWN_OPEN_BLOCKS)
|
|
39
39
|
|
|
40
|
+
# repair: skip a Markdown list marker before the root value
|
|
41
|
+
# (and any comments before it, which parse_value would otherwise
|
|
42
|
+
# only consume after the marker check has already failed)
|
|
43
|
+
parse_whitespace_and_skip_comments
|
|
44
|
+
skip_markdown_list_marker
|
|
45
|
+
|
|
40
46
|
processed = parse_value
|
|
41
47
|
|
|
42
48
|
throw_unexpected_end unless processed
|
|
@@ -46,7 +52,8 @@ module JSON
|
|
|
46
52
|
processed_comma = parse_character(COMMA)
|
|
47
53
|
parse_whitespace_and_skip_comments if processed_comma
|
|
48
54
|
|
|
49
|
-
if start_of_value?(@json[@index])
|
|
55
|
+
if (start_of_value?(@json[@index]) || markdown_list_marker_length) &&
|
|
56
|
+
ends_with_comma_or_newline?(@output)
|
|
50
57
|
# start of a new value after end of the root level object: looks like
|
|
51
58
|
# newline delimited JSON -> turn into a root level array
|
|
52
59
|
unless processed_comma
|
|
@@ -170,6 +177,52 @@ module JSON
|
|
|
170
177
|
false
|
|
171
178
|
end
|
|
172
179
|
|
|
180
|
+
# Look ahead from @index for a Markdown list marker like "- ", "* ",
|
|
181
|
+
# "+ ", or "12. " that precedes a value. Returns the marker's length,
|
|
182
|
+
# or nil when there is no marker. Only consulted at the top level —
|
|
183
|
+
# the root value and each newline-delimited value — never inside
|
|
184
|
+
# nested structures. A marker must be followed by same-line
|
|
185
|
+
# whitespace and a value, so "-5", a trailing "- ", and "-\n{...}"
|
|
186
|
+
# keep their number readings. Ordered markers are capped at nine
|
|
187
|
+
# digits (the CommonMark limit) so long truncated decimals are not
|
|
188
|
+
# mistaken for markers. Divergence from upstream (no Markdown list
|
|
189
|
+
# handling as of v3.14.0): LLMs frequently emit JSON values as
|
|
190
|
+
# Markdown list items.
|
|
191
|
+
def markdown_list_marker_length
|
|
192
|
+
j = @index
|
|
193
|
+
|
|
194
|
+
if [MINUS, ASTERISK, PLUS].include?(@json[j])
|
|
195
|
+
j += 1
|
|
196
|
+
elsif digit?(@json[j])
|
|
197
|
+
j += 1 while digit?(@json[j]) && j - @index < 9
|
|
198
|
+
return nil unless [DOT, CLOSE_PARENTHESIS].include?(@json[j])
|
|
199
|
+
|
|
200
|
+
j += 1
|
|
201
|
+
else
|
|
202
|
+
return nil
|
|
203
|
+
end
|
|
204
|
+
|
|
205
|
+
marker_length = j - @index
|
|
206
|
+
return nil unless same_line_whitespace?(@json[j])
|
|
207
|
+
|
|
208
|
+
j += 1 while same_line_whitespace?(@json[j])
|
|
209
|
+
# a leading-dot number like ".5" is also a value here: parse_number
|
|
210
|
+
# repairs it to "0.5" even though start_of_value? does not match it
|
|
211
|
+
return nil unless start_of_value?(@json[j]) || @json[j] == DOT
|
|
212
|
+
|
|
213
|
+
marker_length
|
|
214
|
+
end
|
|
215
|
+
|
|
216
|
+
# Repair a value behind a Markdown list marker, like "- {"a":1}",
|
|
217
|
+
# by skipping the marker. See markdown_list_marker_length.
|
|
218
|
+
def skip_markdown_list_marker
|
|
219
|
+
length = markdown_list_marker_length
|
|
220
|
+
return false unless length
|
|
221
|
+
|
|
222
|
+
@index += length
|
|
223
|
+
true
|
|
224
|
+
end
|
|
225
|
+
|
|
173
226
|
# Parse an object like '{"key": "value"}'
|
|
174
227
|
def parse_object
|
|
175
228
|
return false unless @json[@index] == OPENING_BRACE
|
|
@@ -570,7 +623,9 @@ module JSON
|
|
|
570
623
|
repair_number_ending_with_numeric_symbol(start)
|
|
571
624
|
return true
|
|
572
625
|
end
|
|
573
|
-
|
|
626
|
+
# also accept a dot so "-.5" continues into the fraction branch
|
|
627
|
+
# below (divergence from upstream, which leaves "-.5" unrepaired)
|
|
628
|
+
unless digit?(@json[@index]) || @json[@index] == DOT
|
|
574
629
|
@index = start
|
|
575
630
|
return false
|
|
576
631
|
end
|
|
@@ -620,7 +675,7 @@ module JSON
|
|
|
620
675
|
num = @json[start...@index]
|
|
621
676
|
has_invalid_leading_zero = num.match?(/^0\d/)
|
|
622
677
|
|
|
623
|
-
@output << (has_invalid_leading_zero ? "\"#{num}\"" : num)
|
|
678
|
+
@output << (has_invalid_leading_zero ? "\"#{num}\"" : repair_leading_dot_number(num))
|
|
624
679
|
return true
|
|
625
680
|
end
|
|
626
681
|
|
|
@@ -711,7 +766,18 @@ module JSON
|
|
|
711
766
|
# repair numbers cut off at the end
|
|
712
767
|
# this will only be called when we end after a '.', '-', or 'e' and does not
|
|
713
768
|
# change the number more than it needs to make it valid JSON
|
|
714
|
-
@output << "#{@json[start...@index]}0"
|
|
769
|
+
@output << repair_leading_dot_number("#{@json[start...@index]}0")
|
|
770
|
+
end
|
|
771
|
+
|
|
772
|
+
# Repair a number missing its digit before the decimal point, like ".5"
|
|
773
|
+
# or "-.5", into "0.5" / "-0.5". Divergence from upstream, which emits
|
|
774
|
+
# the invalid leading-dot number unchanged. The guard keeps the common
|
|
775
|
+
# case (a number that needs no repair) allocation-free; `sub` copies
|
|
776
|
+
# its receiver even when the pattern does not match.
|
|
777
|
+
def repair_leading_dot_number(num)
|
|
778
|
+
return num unless num.start_with?('.', '-.')
|
|
779
|
+
|
|
780
|
+
num.sub(/\A(?<sign>-?)\./, '\k<sign>0.')
|
|
715
781
|
end
|
|
716
782
|
|
|
717
783
|
# Parse and repair Newline Delimited JSON (NDJSON):
|
|
@@ -732,6 +798,10 @@ module JSON
|
|
|
732
798
|
end
|
|
733
799
|
end
|
|
734
800
|
|
|
801
|
+
# repair: skip a Markdown list marker before the next value
|
|
802
|
+
parse_whitespace_and_skip_comments
|
|
803
|
+
skip_markdown_list_marker
|
|
804
|
+
|
|
735
805
|
processed_value = parse_value
|
|
736
806
|
end
|
|
737
807
|
|
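The leading-dot repair in the hunks above is small enough to try outside the class. The method below copies the guard and substitution from the diff into a standalone definition; only the surrounding `puts` calls are added for illustration.

```ruby
# Same guard and substitution as the diff's repair_leading_dot_number,
# lifted out of the Repairer class so it can run on its own.
def repair_leading_dot_number(num)
  # Numbers that need no repair are returned as-is, without copying.
  return num unless num.start_with?('.', '-.')

  # Re-emit the optional sign, then insert the missing "0" before the dot.
  num.sub(/\A(?<sign>-?)\./, '\k<sign>0.')
end

puts repair_leading_dot_number('.5')   # 0.5
puts repair_leading_dot_number('-.5')  # -0.5
puts repair_leading_dot_number('.0')   # 0.0
puts repair_leading_dot_number('2.5')  # 2.5
```

The `.0` case corresponds to a truncated `.`: per the diff, the cut-off-number path first appends `0` and then applies this helper, which gives `0.0`.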
data/sig/json/repair/string_utils.rbs
CHANGED
|
@@ -1,9 +1,9 @@
|
|
|
1
1
|
module JSON
|
|
2
2
|
module Repair
|
|
3
3
|
module StringUtils
|
|
4
|
-
@output:
|
|
4
|
+
@output: ::String
|
|
5
5
|
|
|
6
|
-
@index:
|
|
6
|
+
@index: ::Integer
|
|
7
7
|
|
|
8
8
|
# Constants for character chars
|
|
9
9
|
BACKSLASH: "\\"
|
|
@@ -24,17 +24,17 @@ module JSON
|
|
|
24
24
|
|
|
25
25
|
CLOSE_PARENTHESIS: ")"
|
|
26
26
|
|
|
27
|
-
SPACE:
|
|
27
|
+
SPACE: ::String
|
|
28
28
|
|
|
29
|
-
NEWLINE:
|
|
29
|
+
NEWLINE: ::String
|
|
30
30
|
|
|
31
|
-
TAB:
|
|
31
|
+
TAB: ::String
|
|
32
32
|
|
|
33
|
-
RETURN:
|
|
33
|
+
RETURN: ::String
|
|
34
34
|
|
|
35
|
-
BACKSPACE:
|
|
35
|
+
BACKSPACE: ::String
|
|
36
36
|
|
|
37
|
-
FORM_FEED:
|
|
37
|
+
FORM_FEED: ::String
|
|
38
38
|
|
|
39
39
|
DOUBLE_QUOTE: "\""
|
|
40
40
|
|
|
@@ -110,56 +110,62 @@ module JSON
|
|
|
110
110
|
|
|
111
111
|
REGEX_FUNCTION_NAME_CHAR: ::Regexp
|
|
112
112
|
|
|
113
|
-
# Functions to check character chars
|
|
114
|
-
|
|
113
|
+
# Functions to check character chars.
|
|
114
|
+
# `char` is `::String?` because every caller passes `@json[@index]`,
|
|
115
|
+
# which is `nil` past the end of input. The predicates either guard
|
|
116
|
+
# against `nil` explicitly or rely on `Array#include?` / `==` /
|
|
117
|
+
# `Regexp#match?` returning a safe value for `nil`.
|
|
118
|
+
def hex?: (::String? char) -> bool
|
|
115
119
|
|
|
116
|
-
def digit?: (
|
|
120
|
+
def digit?: (::String? char) -> bool
|
|
117
121
|
|
|
118
|
-
def valid_string_character?: (
|
|
122
|
+
def valid_string_character?: (::String char) -> bool
|
|
119
123
|
|
|
120
|
-
def delimiter?: (
|
|
124
|
+
def delimiter?: (::String? char) -> bool
|
|
121
125
|
|
|
122
|
-
def unquoted_string_delimiter?: (
|
|
126
|
+
def unquoted_string_delimiter?: (::String? char) -> bool
|
|
123
127
|
|
|
124
|
-
def function_name_char_start?: (
|
|
128
|
+
def function_name_char_start?: (::String? char) -> bool
|
|
125
129
|
|
|
126
|
-
def function_name_char?: (
|
|
130
|
+
def function_name_char?: (::String? char) -> bool
|
|
127
131
|
|
|
128
|
-
def start_of_value?: (
|
|
132
|
+
def start_of_value?: (::String? char) -> bool
|
|
129
133
|
|
|
130
|
-
def control_character?: (
|
|
134
|
+
def control_character?: (::String? char) -> bool
|
|
131
135
|
|
|
132
|
-
def whitespace?: (
|
|
136
|
+
def whitespace?: (::String? char) -> bool
|
|
133
137
|
|
|
134
|
-
def whitespace_except_newline?: (
|
|
138
|
+
def whitespace_except_newline?: (::String? char) -> bool
|
|
135
139
|
|
|
136
|
-
def special_whitespace?: (
|
|
140
|
+
def special_whitespace?: (::String? char) -> bool
|
|
137
141
|
|
|
138
|
-
def
|
|
142
|
+
def same_line_whitespace?: (::String? char) -> bool
|
|
139
143
|
|
|
140
|
-
def
|
|
144
|
+
def quote?: (::String? char) -> bool
|
|
141
145
|
|
|
142
|
-
def
|
|
146
|
+
def double_quote?: (::String? char) -> bool
|
|
143
147
|
|
|
144
|
-
def
|
|
148
|
+
def single_quote?: (::String? char) -> bool
|
|
145
149
|
|
|
146
|
-
def
|
|
150
|
+
def double_quote_like?: (::String? char) -> bool
|
|
151
|
+
|
|
152
|
+
def single_quote_like?: (::String? char) -> bool
|
|
147
153
|
|
|
148
154
|
# Strip last occurrence of text_to_strip from text
|
|
149
|
-
def strip_last_occurrence: (
|
|
155
|
+
def strip_last_occurrence: (::String text, ::String text_to_strip, ?strip_remaining_text: bool) -> ::String
|
|
150
156
|
|
|
151
|
-
def insert_before_last_whitespace: (
|
|
157
|
+
def insert_before_last_whitespace: (::String text, ::String text_to_insert) -> ::String
|
|
152
158
|
|
|
153
159
|
# Parse keywords true, false, null
|
|
154
160
|
# Repair Python keywords True, False, None
|
|
155
161
|
# Repair Ruby keyword nil
|
|
156
|
-
def parse_keywords: () ->
|
|
162
|
+
def parse_keywords: () -> bool
|
|
157
163
|
|
|
158
|
-
def parse_keyword: (
|
|
164
|
+
def parse_keyword: (::String name, ::String value) -> bool
|
|
159
165
|
|
|
160
|
-
def remove_at_index: (
|
|
166
|
+
def remove_at_index: (::String text, ::Integer start, ::Integer count) -> ::String
|
|
161
167
|
|
|
162
|
-
def ends_with_comma_or_newline?: (
|
|
168
|
+
def ends_with_comma_or_newline?: (::String text) -> bool
|
|
163
169
|
end
|
|
164
170
|
end
|
|
165
171
|
end
|
data/sig/json/repair.rbs
CHANGED
|
@@ -1,4 +1,10 @@
|
|
|
1
1
|
module JSON
|
|
2
|
+
# Recursive type for any `JSON.parse` result. Mirrors what stdlib's
|
|
3
|
+
# `JSON.parse` produces (and the JS upstream emits): scalars, arrays,
|
|
4
|
+
# and objects of the same. Used in place of `untyped` for the
|
|
5
|
+
# `return_objects: true` and internal `*_parse` paths.
|
|
6
|
+
type json_value = ::Hash[::String, json_value] | ::Array[json_value] | ::String | ::Integer | ::Float | bool | nil
|
|
7
|
+
|
|
2
8
|
class JSONRepairError < StandardError
|
|
3
9
|
attr_reader position: ::Integer?
|
|
4
10
|
|
|
@@ -9,13 +15,25 @@ module JSON
|
|
|
9
15
|
VERSION: ::String
|
|
10
16
|
end
|
|
11
17
|
|
|
18
|
+
interface _Readable
|
|
19
|
+
def read: () -> ::String?
|
|
20
|
+
end
|
|
21
|
+
|
|
12
22
|
def self.repair: (::String json, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
13
|
-
| (::String json, return_objects: true, ?skip_json_loads: bool) ->
|
|
23
|
+
| (::String json, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
14
24
|
| (::String json, ?skip_json_loads: bool) -> ::String
|
|
15
25
|
|
|
26
|
+
def self.repair_io: (_Readable io, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
27
|
+
| (_Readable io, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
28
|
+
| (_Readable io, ?skip_json_loads: bool) -> ::String
|
|
29
|
+
|
|
30
|
+
def self.repair_file: (::String | ::Pathname path, return_objects: false, ?skip_json_loads: bool) -> ::String
|
|
31
|
+
| (::String | ::Pathname path, return_objects: true, ?skip_json_loads: bool) -> json_value
|
|
32
|
+
| (::String | ::Pathname path, ?skip_json_loads: bool) -> ::String
|
|
33
|
+
|
|
16
34
|
private
|
|
17
35
|
|
|
18
|
-
def self.tolerant_parse: (::String json) ->
|
|
36
|
+
def self.tolerant_parse: (::String json) -> json_value
|
|
19
37
|
|
|
20
|
-
def self.repaired_parse: (::String json) ->
|
|
38
|
+
def self.repaired_parse: (::String json) -> json_value
|
|
21
39
|
end
|
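The recursive `json_value` alias introduced above can be read as a runtime predicate. The sketch below is a hypothetical illustration of the shape the alias describes; `json_value?` is not part of the gem, and RBS aliases are checked statically rather than at run time.

```ruby
# Hypothetical runtime check mirroring the json_value alias: a Hash with
# String keys, an Array, a String, Integer, Float, true/false, or nil.
def json_value?(value)
  case value
  when Hash
    value.all? { |k, v| k.is_a?(String) && json_value?(v) }
  when Array
    value.all? { |v| json_value?(v) }
  when String, Integer, Float, true, false, nil
    true
  else
    false
  end
end

p json_value?({ 'a' => [1, 2.5, nil, true] })  # true
p json_value?({ a: 1 })                        # false (Symbol key)
p json_value?(:sym)                            # false
```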
data/sig/json/repairer.rbs
CHANGED
|
@@ -11,7 +11,7 @@ module JSON
|
|
|
11
11
|
# `lib/json/repairer.rb`).
|
|
12
12
|
@json: untyped
|
|
13
13
|
|
|
14
|
-
@index: Integer
|
|
14
|
+
@index: ::Integer
|
|
15
15
|
|
|
16
16
|
@output: ::String
|
|
17
17
|
|
|
@@ -31,25 +31,32 @@ module JSON
|
|
|
31
31
|
|
|
32
32
|
private
|
|
33
33
|
|
|
34
|
-
def parse_value: () ->
|
|
34
|
+
def parse_value: () -> bool
|
|
35
35
|
|
|
36
|
-
def parse_whitespace: (?skip_newline: bool) ->
|
|
36
|
+
def parse_whitespace: (?skip_newline: bool) -> bool
|
|
37
37
|
|
|
38
|
-
def parse_comment: () ->
|
|
38
|
+
def parse_comment: () -> bool
|
|
39
39
|
|
|
40
40
|
# Find and skip over a Markdown fenced code block
|
|
41
|
-
def parse_markdown_code_block: (::Array[::String] blocks) ->
|
|
41
|
+
def parse_markdown_code_block: (::Array[::String] blocks) -> bool
|
|
42
42
|
|
|
43
|
-
def skip_markdown_code_block: (::Array[::String] blocks) ->
|
|
43
|
+
def skip_markdown_code_block: (::Array[::String] blocks) -> bool
|
|
44
|
+
|
|
45
|
+
# Look ahead for a Markdown list marker like "- " or "12. " that
|
|
46
|
+
# precedes a value; returns the marker's length, or nil when there
|
|
47
|
+
# is no marker.
|
|
48
|
+
def markdown_list_marker_length: () -> ::Integer?
|
|
49
|
+
|
|
50
|
+
def skip_markdown_list_marker: () -> bool
|
|
44
51
|
|
|
45
52
|
# Parse an object like '{"key": "value"}'
|
|
46
|
-
def parse_object: () ->
|
|
53
|
+
def parse_object: () -> bool
|
|
47
54
|
|
|
48
|
-
def skip_character: (
|
|
55
|
+
def skip_character: (::String char) -> bool
|
|
49
56
|
|
|
50
57
|
# Skip ellipsis like "[1,2,3,...]" or "[1,2,3,...,9]" or "[...,7,8,9]"
|
|
51
58
|
# or a similar construct in objects.
|
|
52
|
-
def skip_ellipsis: () ->
|
|
59
|
+
def skip_ellipsis: () -> void
|
|
53
60
|
|
|
54
61
|
# Parse a string enclosed by double quotes "...". Can contain escaped quotes
|
|
55
62
|
# Repair strings enclosed in single quotes or special quotes
|
|
@@ -62,51 +69,59 @@ module JSON
|
|
|
62
69
|
# more conservative way, stopping the string at the first next delimiter
|
|
63
70
|
# and fixing the string by inserting a quote there, or stopping at a
|
|
64
71
|
# stop index detected in the first iteration.
|
|
65
|
-
def parse_string: (?stop_at_delimiter: bool, ?stop_at_index: ::Integer) ->
|
|
72
|
+
def parse_string: (?stop_at_delimiter: bool, ?stop_at_index: ::Integer) -> bool
|
|
66
73
|
|
|
67
74
|
# Repair an unquoted string by adding quotes around it
|
|
68
75
|
# Repair a MongoDB function call like NumberLong("2")
|
|
69
76
|
# Repair a JSONP function call like callback({...});
|
|
70
|
-
def parse_unquoted_string: (bool is_key) ->
|
|
77
|
+
def parse_unquoted_string: (bool is_key) -> bool
|
|
71
78
|
|
|
72
79
|
# Parse a regular expression literal like /foo/ or /foo\/bar/
|
|
73
|
-
def parse_regex: () ->
|
|
80
|
+
def parse_regex: () -> bool
|
|
74
81
|
|
|
75
|
-
def parse_character: (
|
|
82
|
+
def parse_character: (::String char) -> bool
|
|
76
83
|
|
|
77
|
-
def parse_whitespace_and_skip_comments: (?skip_newline: bool) ->
|
|
84
|
+
def parse_whitespace_and_skip_comments: (?skip_newline: bool) -> bool
|
|
78
85
|
|
|
79
86
|
# Parse a number like 2.4 or 2.4e6
|
|
80
|
-
def parse_number: () ->
|
|
87
|
+
def parse_number: () -> bool
|
|
81
88
|
|
|
82
|
-
def at_end_of_number?: () ->
|
|
89
|
+
def at_end_of_number?: () -> bool
|
|
83
90
|
|
|
84
91
|
# Parse an array like '["item1", "item2", ...]'
|
|
85
|
-
def parse_array: () ->
|
|
92
|
+
def parse_array: () -> bool
|
|
86
93
|
|
|
87
|
-
def prev_non_whitespace_index: (
|
|
94
|
+
def prev_non_whitespace_index: (::Integer start) -> ::Integer
|
|
88
95
|
|
|
89
96
|
# Repair concatenated strings like "hello" + "world", change this into "helloworld"
|
|
90
|
-
def parse_concatenated_string: () ->
|
|
97
|
+
def parse_concatenated_string: () -> bool
|
|
98
|
+
|
|
99
|
+
def repair_number_ending_with_numeric_symbol: (::Integer start) -> void
|
|
91
100
|
|
|
92
|
-
|
|
101
|
+
# Repair a number missing its digit before the decimal point, like ".5"
|
|
102
|
+
# or "-.5", into "0.5" / "-0.5".
|
|
103
|
+
def repair_leading_dot_number: (::String num) -> ::String
|
|
93
104
|
|
|
94
105
|
# Parse and repair Newline Delimited JSON (NDJSON):
|
|
95
106
|
# multiple JSON objects separated by a newline character
|
|
96
|
-
def parse_newline_delimited_json: () ->
|
|
107
|
+
def parse_newline_delimited_json: () -> void
|
|
97
108
|
|
|
98
|
-
def skip_escape_character: () ->
|
|
109
|
+
def skip_escape_character: () -> bool
|
|
99
110
|
|
|
100
|
-
|
|
111
|
+
# `bot` (bottom) because these always raise — steep needs this to
|
|
112
|
+
# treat their call sites as unreachable so methods like `repair`
|
|
113
|
+
# type-check (the trailing `throw_unexpected_character` must not
|
|
114
|
+
# contribute `void` to the method's union return type).
|
|
115
|
+
def throw_invalid_character: (::String char) -> bot
|
|
101
116
|
|
|
102
|
-
def throw_unexpected_character: () ->
|
|
117
|
+
def throw_unexpected_character: () -> bot
|
|
103
118
|
|
|
104
|
-
def throw_unexpected_end: () ->
|
|
119
|
+
def throw_unexpected_end: () -> bot
|
|
105
120
|
|
|
106
|
-
def throw_object_key_expected: () ->
|
|
121
|
+
def throw_object_key_expected: () -> bot
|
|
107
122
|
|
|
108
|
-
def throw_colon_expected: () ->
|
|
123
|
+
def throw_colon_expected: () -> bot
|
|
109
124
|
|
|
110
|
-
def throw_invalid_unicode_character: () ->
|
|
125
|
+
def throw_invalid_unicode_character: () -> bot
|
|
111
126
|
end
|
|
112
127
|
end
|