json-repair 0.11.1 → 0.11.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +10 -0
- data/CHANGELOG.md +44 -1
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repairer.rb +41 -6
- data/sig/json/repairer.rbs +2 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 69085d74f416811c4ac11ca7cfe2e9545a6cecdaeb32de96532932c99ab4aaf3
+  data.tar.gz: 4deee8e6715200ae693144a2c8cab914b9e8c78c3b539671f7632ce39d7b77f3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31242bd165c070b1836d85a3fca120b5853a6e1ed715dbaeb75aa026563e033a7af491e318c119f3f3f899c4a7bb55382fc998372d7b55953358dc95d9c526be
+  data.tar.gz: b8a5a58a36d1c2b36922205f2b3b8d33e89e570fc215ed02ef288577a8d1325d96fa21cda9267ca78f44754b00a3150e5f6d149d6087c9f020e23090d47a13f2
data/.rubocop.yml
CHANGED
@@ -1,5 +1,15 @@
+# Merge our Exclude lists with RuboCop's defaults (vendor/**/*, tmp/**/*, …)
+# instead of replacing them — CI vendors gems into vendor/bundle, which
+# RuboCop must keep skipping.
+inherit_mode:
+  merge:
+    - Exclude
+
 AllCops:
   TargetRubyVersion: 3.0
+  Exclude:
+    # gitignored local planning notes and scratch tooling (see CLAUDE.md)
+    - docs/**/*
 
 Style/Documentation:
   Enabled: false
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,49 @@
 # Changes
 
-### 2026-06-12 (0.11.
+### 2026-06-12 (0.11.3)
+
+* Fix infinite recursion (`SystemStackError`) on a quoted string
+  followed by a backslash-escaped delimiter, like `["y"\, "z"]`. The
+  missing-end-quote retry in `parse_string` stops at the comma it
+  detected in the first pass, but the invalid-escape repair consumed
+  `\,` as one two-character step, jumping over the stop index and
+  re-firing the retry with identical arguments forever — violating the
+  contract that `JSONRepairError` is the only error raised. The escaped
+  delimiter now ends the string there and the dangling backslash is
+  dropped (the standard invalid-escape repair): `["y"\, "z"]` →
+  `["y\"","z"]`. The stop-index check is also hardened from `==` to
+  `>=` so no future multi-character advance can step over it and
+  recurse. Deliberate divergence from upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair), which crashes
+  with "Maximum call stack size exceeded" on the same input as of
+  v3.14.0 (still its latest release). Found by differential fuzzing
+  during the 0.11.2 work and re-validated the same way: across a
+  240-input grid of escape-adjacent shapes, only previously-crashing
+  inputs changed behavior — object shapes like `{"k": "y"\, "z"}` now
+  raise the same "Colon expected" as their backslash-free analog
+  `{"k": "y", "z"}`. Benchmarks flat vs 0.11.2.
+
+### 2026-06-12 (0.11.2)
+
+* Fix the 0.11.0 doubled-colon repair silently mangling objects with a
+  stray junk word between pairs. `{"value_1": true, COMMENT "value_2":
+  "data"}` returned `{"value_1":true,"COMMENT":"value_2\": \"data"}`
+  (the junk word became a key and swallowed the real pair), and
+  `{ "key": "value" COMMENT "key2": "value2" }` returned a single
+  glued string value. Both shapes now raise "Object key expected"
+  again at the same positions as upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair) v3.14.0,
+  restoring the pre-0.11.0 behavior: the merge is skipped when the
+  pair already needed a missing-colon repair or the value string was
+  itself salvaged by the unescaped-quote repair — signals that the
+  pair was malformed in a way the merge would compound, not fix. The
+  salvage signal survives string concatenation: in
+  `{"a": "b" x "c" + "d": "e"}` the `+ "d"` segment no longer clears
+  it (caught in review by Copilot). All
+  0.11.0 repairs (canonical, greedy, escaped quotes, unquoted
+  keys/values) are unchanged. Go and Python `json_repair` instead
+  drop the junk word; we deliberately keep raising rather than
+  silently discarding input (see the 0.11.0 note).
 
 * Fix a `TypeError` crash on input ending in a lone backslash inside a
   string: `"abc\` now repairs to `"abc"` (likewise `"\` → `""`,
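The 0.11.3 entry turns on the difference between `==` and `>=` when a scanner can advance more than one character per step. The Ruby sketch below is a toy and not the gem's `parse_string`: the method name `stop_position` and the `strict_equality:` flag are hypothetical, and the two-character advance on a backslash only imitates the invalid-escape step the changelog describes.

```ruby
# Toy scanner (an illustration, not the gem's parser): walks `text` from
# index 0 and returns the index at which the stop check fires, or nil if
# it never fires. A backslash advances two characters in a single step.
def stop_position(text, stop_at_index, strict_equality:)
  index = 0
  while index < text.length
    reached = strict_equality ? index == stop_at_index : index >= stop_at_index
    # the `>= 0` guard keeps the -1 "no stop index" sentinel from matching
    return index if stop_at_index >= 0 && reached

    index += text[index] == '\\' ? 2 : 1
  end
  nil
end

text = 'y"\\, "z"'     # the eight characters y"\, "z"
stop = text.index(',') # the comma sits right after the backslash

p stop_position(text, stop, strict_equality: true)
p stop_position(text, stop, strict_equality: false)
```

With the `==` comparison the scan steps from the backslash straight past the comma, so the check never fires and a caller that retries on that outcome would retry forever. With `>=` the check fires at the first index at or beyond the stop.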
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repairer.rb
CHANGED
@@ -32,6 +32,7 @@ module JSON
   @json = json
   @index = 0
   @output = +''
+  @repaired_unescaped_quote = false
 end
 
 def repair
@@ -295,8 +296,14 @@ module JSON
   end
 
   # repair: an object string value with unescaped quotes around a
-  # colon, like {"a": "b": "c"}
-
+  # colon, like {"a": "b": "c"}. Skipped when this pair already
+  # needed a repair that makes the merge compound garbage: a
+  # missing colon (the "key" was a stray junk word, like
+  # {"v1": true, COMMENT "v2": "data"}) or a value glued together
+  # by the unescaped-quote repair (like
+  # {"k": "v" COMMENT "k2": "v2"}); both keep raising, matching
+  # upstream
+  repair_doubled_colon if processed_value && processed_colon && !@repaired_unescaped_quote
 end
 
 if @json[@index] == CLOSING_BRACE
@@ -315,9 +322,13 @@ module JSON
 # (the unescaped-quotes reading of the input). Greedy: keeps merging
 # while another `: "..."` follows. Only the string-colon-string
 # shape is repaired; anything else falls through to the regular
-# error paths.
-#
-#
+# error paths. The call site additionally requires the pair's colon
+# to be present in the input and the value string to have parsed
+# without the unescaped-quote repair (@repaired_unescaped_quote) —
+# when either repair already fired, the pair was malformed in a way
+# this merge would compound, not fix. Divergence from upstream
+# (which raises "Object key expected" as of v3.14.0), matching the
+# Go and Python json-repair libraries on the canonical case.
 def repair_doubled_colon
   loop do
     colon = @index
@@ -399,6 +410,9 @@ module JSON
 # and fixing the string by inserting a quote there, or stopping at a
 # stop index detected in the first iteration.
 def parse_string(stop_at_delimiter: false, stop_at_index: -1)
+  # fresh parse (the backtracking re-invocations below rebuild the
+  # string from scratch, so they reset too); see repair_doubled_colon
+  @repaired_unescaped_quote = false
   skip_escape_chars = @json[@index] == BACKSLASH
   if skip_escape_chars
     # repair: remove the first escape character
@@ -449,7 +463,13 @@ module JSON
   return true
 end
 
-
+# >= with a sentinel guard, not ==. Divergence from upstream (which
+# compares with == as of v3.14.0): a multi-character advance below
+# can step over the stop index, and resuming the comma-path retry
+# from beyond it would re-fire that retry with identical arguments
+# forever. The invalid-escape repair below avoids the only known
+# overshoot; this is the backstop guaranteeing termination.
+if stop_at_index >= 0 && @index >= stop_at_index
   # use the stop index detected in the first iteration, and repair end quote
   str = insert_before_last_whitespace(str, '"')
   @output << str
@@ -508,6 +528,7 @@ module JSON
 
   # repair unescaped quote
   str = "#{str[...o_quote]}\\#{str[o_quote..]}"
+  @repaired_unescaped_quote = true
 elsif stop_at_delimiter && unquoted_string_delimiter?(@json[@index])
   # we're in the mode to stop the string at the first delimiter
   # because there is an end quote missing
@@ -554,6 +575,15 @@ module JSON
   # repair a backslash escaped newline (like in Bash scripts)
   str << '\n'
   @index += 2
+elsif @index + 1 == stop_at_index
+  # repair invalid escape character: remove it — but the escaped
+  # character is the delimiter the comma-path retry said to stop
+  # at, so drop only the backslash and let the stop check above
+  # fire there, keeping the delimiter a delimiter. Divergence
+  # from upstream, which consumes both characters, jumps the stop
+  # index, and crashes ("Maximum call stack size exceeded" on
+  # inputs like `["y"\, "z"]` as of v3.14.0).
+  @index += 1
 else
   # repair invalid escape character: remove it
   str << char
@@ -807,7 +837,12 @@ module JSON
 # repair: remove the end quote of the first string
 @output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
 start = @output.length
+# the segments form one logical string value: keep the doubled-colon
+# guard's flag set when an earlier segment needed the unescaped-quote
+# repair (parse_string resets it on entry)
+repaired_earlier_segment = @repaired_unescaped_quote
 parsed_str = parse_string
+@repaired_unescaped_quote ||= repaired_earlier_segment
 @output = if parsed_str
   # repair: remove the start quote of the second string
   remove_at_index(@output, start, 1)
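The last hunk saves the flag before calling `parse_string`, which resets it on entry, and then ORs the saved value back in so that a repair in an earlier segment is not forgotten. A minimal sketch of that save-and-merge pattern follows; the class `SegmentParser` and its methods are hypothetical stand-ins and not part of the gem, and each segment is reduced to a boolean saying whether it needed a repair.

```ruby
# Hypothetical stand-in for a parser whose per-segment method resets a
# "repair happened" flag on entry, as the diff says parse_string does.
class SegmentParser
  def initialize
    @repaired = false
  end

  # Resets the flag, then sets it if this segment needed a repair.
  def parse_segment(needs_repair)
    @repaired = false
    @repaired = true if needs_repair
  end

  # Treats all segments as one logical value: the flag stays set once any
  # segment needed a repair, even though parse_segment resets it each time.
  def parse_concatenated(segments)
    segments.each do |needs_repair|
      repaired_earlier = @repaired     # save before the reset
      parse_segment(needs_repair)
      @repaired ||= repaired_earlier   # merge the saved value back in
    end
    @repaired
  end
end

p SegmentParser.new.parse_concatenated([true, false])  # => true
p SegmentParser.new.parse_concatenated([false, false]) # => false
```

Without the save and the `||=`, the first call would report `false`, because the second segment's reset would wipe out the repair recorded for the first.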
data/sig/json/repairer.rbs
CHANGED