json-repair 0.11.2 → 0.12.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +54 -0
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repairer.rb +45 -8
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aef10e86ea82fb56d9666ad7470317cf751f00730ef51910094ce8b2ee876a53
+  data.tar.gz: 682f0cdacc02896687e6c39e534f92f0beda52110679da8b7b7f43721aa6c2a4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 572225e5c09ac6ab7795d21d179e9ac07bc2c3bffedb5f1df48afa5a33ab8a73923af90429730532a822f3a78004181f3a9d2d7a2bddf17b9b649819b169ec65
+  data.tar.gz: a62311d2002b538b81132f6efb8525e6c6801baaa5a8f9eac020abbc2726e5ed783ee2719aeed9dc77301492f869f79deb807b9eee74f3a56e0a37d4712b7279
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,59 @@
 # Changes
 
+### 2026-06-12 (0.12.0)
+
+* Repair the three known input families that raised `Internal error:
+  repaired output is not valid JSON` — cases where upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair) (v3.14.0, still
+  its latest release) emits invalid JSON and this gem's canonical
+  re-serialize guard caught it but blamed the Repairer. All three are
+  deliberate divergences from upstream, commented at each site:
+  * A stray `e`/`E` with no mantissa is now an unquoted string instead
+    of an empty-mantissa exponent: `[e]` → `["e"]`, `[e5]` → `["e5"]`,
+    `[truee]` → `[true,"e"]`, `{"k": e}` → `{"k":"e"}` (upstream emits
+    `e0` / raw `e5`). Numbers truncated at a real exponent (`[2e]` →
+    `[2.0]`) are unchanged.
+  * Negative leading-zero numbers are quoted like positive ones:
+    `{"n": -05}` → `{"n":"-05"}`, matching the existing `{"n": 05}` →
+    `{"n":"05"}` (upstream emits `-05` unrepaired). The same rule now
+    also covers the truncated-number repair, which bypassed it:
+    `[05e]` → `["05e0"]`, `00.` → `"00.0"` (upstream emits `05e0` /
+    `00.0` unrepaired). Valid `-0` / `-0.5` / `0e` / `0.` are
+    unchanged.
+  * The trailing-comma repair no longer strips a comma belonging to the
+    enclosing container when an inner object/array fails on its first
+    key or value: `[{{]` → `[{},{}]`, `[1,[}]` → `[1,[]]`,
+    `{"a": 1, "b": [}` → `{"a":1,"b":[]}` (upstream emits `[{}{}]`,
+    `[1[]]`, `{"a": 1 "b": []}`).
+  Validated by differential testing against upstream over a 270-input
+  grid of these shapes in every container context: the only behavior
+  changes vs 0.11.3 are the 123 previously-`Internal error` inputs now
+  repairing (or, for `e+` shapes where upstream emits invalid `e+0`,
+  raising a clean position-bearing error). Benchmarks flat.
+
+### 2026-06-12 (0.11.3)
+
+* Fix infinite recursion (`SystemStackError`) on a quoted string
+  followed by a backslash-escaped delimiter, like `["y"\, "z"]`. The
+  missing-end-quote retry in `parse_string` stops at the comma it
+  detected in the first pass, but the invalid-escape repair consumed
+  `\,` as one two-character step, jumping over the stop index and
+  re-firing the retry with identical arguments forever — violating the
+  contract that `JSONRepairError` is the only error raised. The escaped
+  delimiter now ends the string there and the dangling backslash is
+  dropped (the standard invalid-escape repair): `["y"\, "z"]` →
+  `["y\"","z"]`. The stop-index check is also hardened from `==` to
+  `>=` so no future multi-character advance can step over it and
+  recurse. Deliberate divergence from upstream
+  [jsonrepair](https://github.com/josdejong/jsonrepair), which crashes
+  with "Maximum call stack size exceeded" on the same input as of
+  v3.14.0 (still its latest release). Found by differential fuzzing
+  during the 0.11.2 work and re-validated the same way: across a
+  240-input grid of escape-adjacent shapes, only previously-crashing
+  inputs changed behavior — object shapes like `{"k": "y"\, "z"}` now
+  raise the same "Colon expected" as their backslash-free analog
+  `{"k": "y", "z"}`. Benchmarks flat vs 0.11.2.
+
 ### 2026-06-12 (0.11.2)
 
 * Fix the 0.11.0 doubled-colon repair silently mangling objects with a
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repairer.rb
CHANGED
@@ -237,6 +237,7 @@ module JSON
 
 initial = true
 while @index < @json.length && @json[@index] != CLOSING_BRACE
+  first_pair = initial
   if initial
     initial = false
   else
@@ -255,8 +256,12 @@ module JSON
 if @json[@index] == CLOSING_BRACE || @json[@index] == OPENING_BRACE ||
    @json[@index] == CLOSING_BRACKET || @json[@index] == OPENING_BRACKET ||
    @json[@index].nil?
-  # repair trailing comma
-
+  # repair trailing comma — but only the one this object's own loop
+  # emitted or inserted; on the first pair the buffer's last
+  # comma belongs to the enclosing container, like in [{{] or
+  # {"a": 1, "b": {] (divergence from upstream, which strips
+  # the parent's comma and emits invalid JSON like [{}{}])
+  @output = strip_last_occurrence(@output, ',') unless first_pair
 else
   throw_object_key_expected
 end
@@ -463,7 +468,13 @@ module JSON
   return true
 end
 
-
+# >= with a sentinel guard, not ==. Divergence from upstream (which
+# compares with == as of v3.14.0): a multi-character advance below
+# can step over the stop index, and resuming the comma-path retry
+# from beyond it would re-fire that retry with identical arguments
+# forever. The invalid-escape repair below avoids the only known
+# overshoot; this is the backstop guaranteeing termination.
+if stop_at_index >= 0 && @index >= stop_at_index
   # use the stop index detected in the first iteration, and repair end quote
   str = insert_before_last_whitespace(str, '"')
   @output << str
@@ -569,6 +580,15 @@ module JSON
   # repair a backslash escaped newline (like in Bash scripts)
   str << '\n'
   @index += 2
+elsif @index + 1 == stop_at_index
+  # repair invalid escape character: remove it — but the escaped
+  # character is the delimiter the comma-path retry said to stop
+  # at, so drop only the backslash and let the stop check above
+  # fire there, keeping the delimiter a delimiter. Divergence
+  # from upstream, which consumes both characters, jumps the stop
+  # index, and crashes ("Maximum call stack size exceeded" on
+  # inputs like `["y"\, "z"]` as of v3.14.0).
+  @index += 1
 else
   # repair invalid escape character: remove it
   str << char
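The interaction between the stop-index check and the two-character escape step in the two hunks above can be illustrated with a toy scanner. This is a simplified sketch, not the gem's `parse_string`: `stop_check_fires?` and its `strict:` flag are hypothetical names, and the unconditional two-character step stands in for the old invalid-escape repair.

```ruby
# Hypothetical scanner (not the gem's code): walks `text` and reports whether
# the stop check ever fires. `strict: true` models the old `==` comparison,
# `strict: false` models the new `>=` comparison with the sentinel guard.
def stop_check_fires?(text, stop_at_index, strict:)
  index = 0
  while index < text.length
    hit = if strict
            index == stop_at_index
          else
            stop_at_index >= 0 && index >= stop_at_index
          end
    return true if hit

    # A backslash consumes two characters at once (the escape plus its target),
    # which is the multi-character advance that can step over the stop index.
    index += text[index] == '\\' ? 2 : 1
  end
  false
end

text = 'y"\\, "z"'      # the characters: y " \ , space " z "
stop = text.index(',')  # assume the retry says "stop at the comma" (index 3)

stop_check_fires?(text, stop, strict: true)   # => false, the step jumps from 2 to 4
stop_check_fires?(text, stop, strict: false)  # => true, caught at index 4
stop_check_fires?(text, -1, strict: false)    # => false, -1 is the "no stop" sentinel
```

With `==` the scanner never lands exactly on the stop index, which in the real parser meant the retry fired again with identical arguments. The `>=` form still terminates after an overshoot, and the `stop_at_index >= 0` guard keeps the sentinel from matching every position.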
@@ -723,7 +743,13 @@ module JSON
   @index += 1 while digit?(@json[@index])
 end
 
-
+# Divergence from upstream: only enter the exponent branch when a
+# mantissa was consumed — at this point @index > start implies at
+# least one digit (the '-' and '.' paths reset otherwise). Upstream
+# accepts a bare "e"/"E" here and emits invalid JSON like `e0` or
+# raw `e5`; declining lets the token fall through to
+# parse_unquoted_string, matching how "-e5" already becomes "-e5".
+if @index > start && @json[@index] && @json[@index].downcase == 'e'
   @index += 1
   @index += 1 if ['-', '+'].include?(@json[@index])
   if at_end_of_number?
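The mantissa guard in the hunk above can be tried in isolation. The following is a minimal sketch under simplifying assumptions: `enters_exponent_branch?` is a hypothetical helper, and it only scans an unsigned integer mantissa, whereas the real `parse_number` also handles a leading `-` and a fractional part.

```ruby
# Hypothetical helper (not part of the gem): consumes leading digits, then
# applies the same condition as the patched line to decide whether the
# exponent branch would be entered.
def enters_exponent_branch?(token)
  start = 0
  index = start
  index += 1 while index < token.length && token[index].match?(/\d/)

  # index > start means at least one mantissa digit was consumed.
  index > start && !token[index].nil? && token[index].downcase == 'e'
end

enters_exponent_branch?('e')     # => false, bare exponent marker, no mantissa
enters_exponent_branch?('e5')    # => false
enters_exponent_branch?('2e')    # => true, number truncated at a real exponent
enters_exponent_branch?('12E3')  # => true
enters_exponent_branch?('5')     # => false, no exponent marker at all
```

Tokens that fail the guard are left for the unquoted-string path, which is how `[e5]` ends up as `["e5"]` in the changelog examples.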
@@ -746,7 +772,9 @@ module JSON
 if @index > start
   # repair a number with leading zeros like "00789"
   num = @json[start...@index]
-
+  # the optional sign quotes "-05" like "05" (divergence from
+  # upstream, whose unsigned check lets "-05" through unrepaired)
+  has_invalid_leading_zero = num.match?(/^-?0\d/)
 
   @output << (has_invalid_leading_zero ? "\"#{num}\"" : repair_leading_dot_number(num))
   return true
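The effect of the signed leading-zero pattern can be checked directly against the tokens listed in the changelog. This is a small sketch: `quote_if_leading_zero` is a hypothetical name, and it returns valid tokens untouched instead of passing them through `repair_leading_dot_number` as the gem does.

```ruby
# Hypothetical helper (not part of the gem): quotes a number token when it has
# an invalid leading zero, using the same pattern as the patched line.
def quote_if_leading_zero(num)
  num.match?(/^-?0\d/) ? "\"#{num}\"" : num
end

quote_if_leading_zero('05')    # => "\"05\""
quote_if_leading_zero('-05')   # => "\"-05\"", newly quoted thanks to the optional sign
quote_if_leading_zero('05e0')  # => "\"05e0\""
quote_if_leading_zero('00.0')  # => "\"00.0\""
quote_if_leading_zero('-0')    # => "-0", valid JSON number, left alone
quote_if_leading_zero('-0.5')  # => "-0.5"
quote_if_leading_zero('0e0')   # => "0e0"
```

The pattern only fires when a zero is followed directly by another digit, so `0.`, `0e` and `-0` style tokens stay numbers. Ruby's `^` anchors at the start of a line, which is equivalent to the start of the string here because number tokens contain no newlines.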
@@ -771,6 +799,7 @@ module JSON
 
 initial = true
 while @index < @json.length && @json[@index] != CLOSING_BRACKET
+  first_item = initial
   if initial
     initial = false
   else
@@ -784,8 +813,12 @@ module JSON
   processed_value = parse_value
   next if processed_value
 
-  # repair trailing comma
-
+  # repair trailing comma — but only the one this array's own loop
+  # emitted or inserted; on the first item the buffer's last
+  # comma belongs to the enclosing container, like in [1,[}] or
+  # {"a": 1, "b": [} (divergence from upstream, which strips
+  # the parent's comma and emits invalid JSON like [1[]])
+  @output = strip_last_occurrence(@output, ',') unless first_item
   break
 end
 
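The `first_pair` / `first_item` guard added to the object and array loops can be sketched on a plain output buffer. Everything here is illustrative: `strip_last` is a stand-in for the gem's `strip_last_occurrence` (whose exact behavior is assumed, not confirmed), and `close_container` is a hypothetical helper that models only the moment an inner container gives up and closes.

```ruby
# Stand-in for strip_last_occurrence (assumed behavior): removes the last
# occurrence of `char` from `text`, or returns `text` unchanged if absent.
def strip_last(text, char)
  i = text.rindex(char)
  i ? text[0...i] + text[(i + 1)..-1] : text
end

# Hypothetical helper: an inner container stops parsing and emits its closer.
# The comma is only stripped when this container's own loop could have
# produced it, that is, when it is not on its first item.
def close_container(output, closer, first_item:)
  output = strip_last(output, ',') unless first_item
  output + closer
end

# Input [1,[}] : the inner array fails on its first item, so the last comma in
# the buffer belongs to the outer array and must survive.
close_container('[1,[', ']', first_item: true)   # => "[1,[]"

# Ignoring the guard reproduces the invalid shape the changelog attributes to
# upstream.
close_container('[1,[', ']', first_item: false)  # => "[1[]"

# A genuine trailing comma emitted by the same loop is still removed.
close_container('[1,2,', ']', first_item: false) # => "[1,2]"
```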
@@ -844,7 +877,11 @@ module JSON
   # repair numbers cut off at the end
   # this will only be called when we end after a '.', '-', or 'e' and does not
   # change the number more than it needs to make it valid JSON
-
+  num = "#{@json[start...@index]}0"
+  # quote a padded token that has an invalid leading zero, like "05e" ->
+  # "05e0", applying the same rule as the end of parse_number (divergence
+  # from upstream, which emits the invalid number raw)
+  @output << (num.match?(/^-?0\d/) ? "\"#{num}\"" : repair_leading_dot_number(num))
 end
 
 # Repair a number missing its digit before the decimal point, like ".5"