json-repair 0.11.2 → 0.12.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2d7b2f30c2451f62471beabed342ff8360f644e4ae703fa3e0f5490e44d4f0d1
- data.tar.gz: 254abaa4ab104a0cc6650d02f099ebb00b0600ef213b00866da266159b6c1a37
+ metadata.gz: aef10e86ea82fb56d9666ad7470317cf751f00730ef51910094ce8b2ee876a53
+ data.tar.gz: 682f0cdacc02896687e6c39e534f92f0beda52110679da8b7b7f43721aa6c2a4
  SHA512:
- metadata.gz: bfd417574888a7ff44a03870f48ab6e02c7bb7ae80189262e093425ba52e3943982f6ab36170c1cac6a3871e3d3379469b8319bbaa6defe8a0d473a7be6fe2a9
- data.tar.gz: 208611ef2a09d4e3a5c91afe98334dfc7443a3db56de9624cfa82a3dea1e0c0480de18227646a96dac3c46f64af526e229f70dd9de4cd41cc1160a703b132a0b
+ metadata.gz: 572225e5c09ac6ab7795d21d179e9ac07bc2c3bffedb5f1df48afa5a33ab8a73923af90429730532a822f3a78004181f3a9d2d7a2bddf17b9b649819b169ec65
+ data.tar.gz: a62311d2002b538b81132f6efb8525e6c6801baaa5a8f9eac020abbc2726e5ed783ee2719aeed9dc77301492f869f79deb807b9eee74f3a56e0a37d4712b7279
data/CHANGELOG.md CHANGED
@@ -1,5 +1,59 @@
  # Changes
 
+ ### 2026-06-12 (0.12.0)
+
+ * Repair the three known input families that raised `Internal error:
+ repaired output is not valid JSON` — cases where upstream
+ [jsonrepair](https://github.com/josdejong/jsonrepair) (v3.14.0, still
+ its latest release) emits invalid JSON and this gem's canonical
+ re-serialize guard caught it but blamed the Repairer. All three are
+ deliberate divergences from upstream, commented at each site:
+ * A stray `e`/`E` with no mantissa is now an unquoted string instead
+ of an empty-mantissa exponent: `[e]` → `["e"]`, `[e5]` → `["e5"]`,
+ `[truee]` → `[true,"e"]`, `{"k": e}` → `{"k":"e"}` (upstream emits
+ `e0` / raw `e5`). Numbers truncated at a real exponent (`[2e]` →
+ `[2.0]`) are unchanged.
+ * Negative leading-zero numbers are quoted like positive ones:
+ `{"n": -05}` → `{"n":"-05"}`, matching the existing `{"n": 05}` →
+ `{"n":"05"}` (upstream emits `-05` unrepaired). The same rule now
+ also covers the truncated-number repair, which bypassed it:
+ `[05e]` → `["05e0"]`, `00.` → `"00.0"` (upstream emits `05e0` /
+ `00.0` unrepaired). Valid `-0` / `-0.5` / `0e` / `0.` are
+ unchanged.
+ * The trailing-comma repair no longer strips a comma belonging to the
+ enclosing container when an inner object/array fails on its first
+ key or value: `[{{]` → `[{},{}]`, `[1,[}]` → `[1,[]]`,
+ `{"a": 1, "b": [}` → `{"a":1,"b":[]}` (upstream emits `[{}{}]`,
+ `[1[]]`, `{"a": 1 "b": []}`).
+ Validated by differential testing against upstream over a 270-input
+ grid of these shapes in every container context: the only behavior
+ changes vs 0.11.3 are the 123 previously-`Internal error` inputs now
+ repairing (or, for `e+` shapes where upstream emits invalid `e+0`,
+ raising a clean position-bearing error). Benchmarks flat.
+
+ ### 2026-06-12 (0.11.3)
+
+ * Fix infinite recursion (`SystemStackError`) on a quoted string
+ followed by a backslash-escaped delimiter, like `["y"\, "z"]`. The
+ missing-end-quote retry in `parse_string` stops at the comma it
+ detected in the first pass, but the invalid-escape repair consumed
+ `\,` as one two-character step, jumping over the stop index and
+ re-firing the retry with identical arguments forever — violating the
+ contract that `JSONRepairError` is the only error raised. The escaped
+ delimiter now ends the string there and the dangling backslash is
+ dropped (the standard invalid-escape repair): `["y"\, "z"]` →
+ `["y\"","z"]`. The stop-index check is also hardened from `==` to
+ `>=` so no future multi-character advance can step over it and
+ recurse. Deliberate divergence from upstream
+ [jsonrepair](https://github.com/josdejong/jsonrepair), which crashes
+ with "Maximum call stack size exceeded" on the same input as of
+ v3.14.0 (still its latest release). Found by differential fuzzing
+ during the 0.11.2 work and re-validated the same way: across a
+ 240-input grid of escape-adjacent shapes, only previously-crashing
+ inputs changed behavior — object shapes like `{"k": "y"\, "z"}` now
+ raise the same "Colon expected" as their backslash-free analog
+ `{"k": "y", "z"}`. Benchmarks flat vs 0.11.2.
+
  ### 2026-06-12 (0.11.2)
 
  * Fix the 0.11.0 doubled-colon repair silently mangling objects with a
@@ -2,6 +2,6 @@
 
  module JSON
  module Repair
- VERSION = '0.11.2'
+ VERSION = '0.12.0'
  end
  end
data/lib/json/repairer.rb CHANGED
@@ -237,6 +237,7 @@ module JSON
 
  initial = true
  while @index < @json.length && @json[@index] != CLOSING_BRACE
+ first_pair = initial
  if initial
  initial = false
  else
@@ -255,8 +256,12 @@ module JSON
  if @json[@index] == CLOSING_BRACE || @json[@index] == OPENING_BRACE ||
  @json[@index] == CLOSING_BRACKET || @json[@index] == OPENING_BRACKET ||
  @json[@index].nil?
- # repair trailing comma
- @output = strip_last_occurrence(@output, ',')
+ # repair trailing comma — but only the one this object's own loop
+ # emitted or inserted; on the first pair the buffer's last
+ # comma belongs to the enclosing container, like in [{{] or
+ # {"a": 1, "b": {] (divergence from upstream, which strips
+ # the parent's comma and emits invalid JSON like [{}{}])
+ @output = strip_last_occurrence(@output, ',') unless first_pair
  else
  throw_object_key_expected
  end
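The effect of the `first_pair` guard (and of its `first_item` counterpart in the array loop) can be sketched in isolation. The helper names below are hypothetical, and `strip_last_comma` is only an assumed stand-in for the gem's `strip_last_occurrence`, whose body does not appear in this diff.

```ruby
# Assumed stand-in for strip_last_occurrence(output, ','): removes the last
# comma in the buffer, wherever it sits.
def strip_last_comma(output)
  idx = output.rindex(',')
  return output if idx.nil?

  output[0...idx] + output[(idx + 1)..-1]
end

# Hypothetical model of an inner container giving up on a key or value.
# `buffer` is the output written so far, including the inner opening bracket.
def close_failed_container(buffer, closer, first_element:)
  # On the first element, the last comma in the buffer was written by the
  # enclosing container, so it is left alone.
  buffer = strip_last_comma(buffer) unless first_element
  buffer + closer
end

# Input [1,[}] : the inner array fails on its first value.
puts close_failed_container('[1,[', ']', first_element: true) + ']'
# Same buffer without the guard: the parent's comma is lost.
puts close_failed_container('[1,[', ']', first_element: false) + ']'
```

With the guard the sketch yields `[1,[]]`; without it, `[1[]]`, which is the invalid shape the changelog attributes to upstream. A genuine trailing comma after a later element (buffer `[1,[2,`) is still stripped.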
@@ -463,7 +468,13 @@ module JSON
  return true
  end
 
- if @index == stop_at_index
+ # >= with a sentinel guard, not ==. Divergence from upstream (which
+ # compares with == as of v3.14.0): a multi-character advance below
+ # can step over the stop index, and resuming the comma-path retry
+ # from beyond it would re-fire that retry with identical arguments
+ # forever. The invalid-escape repair below avoids the only known
+ # overshoot; this is the backstop guaranteeing termination.
+ if stop_at_index >= 0 && @index >= stop_at_index
  # use the stop index detected in the first iteration, and repair end quote
  str = insert_before_last_whitespace(str, '"')
  @output << str
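Why `>=` terminates where `==` may not can be shown with a toy cursor. This is a minimal sketch, not the gem's parser: `stop_check_fires?` and its arguments are hypothetical, and the sketch assumes the "no stop index" sentinel is a negative number, which the `stop_at_index >= 0` guard suggests but the diff does not state.

```ruby
# Hypothetical model: a cursor moves by the given advances; the stop check is
# evaluated after each move using one of two comparisons.
def stop_check_fires?(advances, stop_at_index, comparison)
  index = 0
  advances.each do |advance|
    index += advance
    fired =
      case comparison
      when :equal then index == stop_at_index
      when :at_or_past then stop_at_index >= 0 && index >= stop_at_index
      else false
      end
    return true if fired
  end
  false
end

# A two-character step (like consuming `\,` at once) jumps from 2 to 4,
# stepping over a stop index of 3.
puts stop_check_fires?([1, 1, 2], 3, :equal)
puts stop_check_fires?([1, 1, 2], 3, :at_or_past)
```

With `:equal` the check never fires on the overshooting sequence, while `:at_or_past` fires as soon as the cursor passes the stop index. The sentinel guard keeps the `>=` form from firing when no stop index was set.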
@@ -569,6 +580,15 @@ module JSON
  # repair a backslash escaped newline (like in Bash scripts)
  str << '\n'
  @index += 2
+ elsif @index + 1 == stop_at_index
+ # repair invalid escape character: remove it — but the escaped
+ # character is the delimiter the comma-path retry said to stop
+ # at, so drop only the backslash and let the stop check above
+ # fire there, keeping the delimiter a delimiter. Divergence
+ # from upstream, which consumes both characters, jumps the stop
+ # index, and crashes ("Maximum call stack size exceeded" on
+ # inputs like `["y"\, "z"]` as of v3.14.0).
+ @index += 1
  else
  # repair invalid escape character: remove it
  str << char
@@ -723,7 +743,13 @@ module JSON
  @index += 1 while digit?(@json[@index])
  end
 
- if @json[@index] && @json[@index].downcase == 'e'
+ # Divergence from upstream: only enter the exponent branch when a
+ # mantissa was consumed — at this point @index > start implies at
+ # least one digit (the '-' and '.' paths reset otherwise). Upstream
+ # accepts a bare "e"/"E" here and emits invalid JSON like `e0` or
+ # raw `e5`; declining lets the token fall through to
+ # parse_unquoted_string, matching how "-e5" already becomes "-e5".
+ if @index > start && @json[@index] && @json[@index].downcase == 'e'
  @index += 1
  @index += 1 if ['-', '+'].include?(@json[@index])
  if at_end_of_number?
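The mantissa guard can be illustrated with a much-reduced scanner. This is a sketch under simplifying assumptions: `exponent_branch?` is a hypothetical name, and sign and decimal-point handling are left out, so it only models "were any digits consumed before the `e`".

```ruby
# Hypothetical reduction of the check: scan leading digits, then decide
# whether the exponent branch may be entered at the next character.
def exponent_branch?(token, require_mantissa:)
  index = 0
  index += 1 while token[index]&.match?(/\d/)

  mantissa_seen = index.positive?
  at_exponent = token[index].to_s.casecmp?('e')
  at_exponent && (mantissa_seen || !require_mantissa)
end

%w[e e5 2e 25].each do |token|
  old = exponent_branch?(token, require_mantissa: false)
  new = exponent_branch?(token, require_mantissa: true)
  puts "#{token}: old=#{old} new=#{new}"
end
```

Under the new rule a bare `e` or `e5` no longer enters the exponent branch and can fall through to unquoted-string handling, while `2e` still does.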
@@ -746,7 +772,9 @@ module JSON
  if @index > start
  # repair a number with leading zeros like "00789"
  num = @json[start...@index]
- has_invalid_leading_zero = num.match?(/^0\d/)
+ # the optional sign quotes "-05" like "05" (divergence from
+ # upstream, whose unsigned check lets "-05" through unrepaired)
+ has_invalid_leading_zero = num.match?(/^-?0\d/)
 
  @output << (has_invalid_leading_zero ? "\"#{num}\"" : repair_leading_dot_number(num))
  return true
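The two patterns in this hunk can be compared directly. In the sketch below only the regular expressions come from the diff; `quote_leading_zero` is a hypothetical helper, and the `repair_leading_dot_number` step is omitted.

```ruby
# Hypothetical helper: quote the token when the chosen pattern flags an
# invalid leading zero, otherwise return it untouched.
def quote_leading_zero(num, allow_sign:)
  pattern = allow_sign ? /^-?0\d/ : /^0\d/
  num.match?(pattern) ? "\"#{num}\"" : num
end

%w[05 -05 -0 -0.5 0.5 10].each do |num|
  old = quote_leading_zero(num, allow_sign: false)
  new = quote_leading_zero(num, allow_sign: true)
  puts format('%-5s old: %-6s new: %s', num, old, new)
end
```

Only `-05` changes between the two patterns: it was passed through raw and is now quoted. Valid tokens such as `-0` and `-0.5` are unaffected because the zero is not followed by a digit.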
@@ -771,6 +799,7 @@ module JSON
 
  initial = true
  while @index < @json.length && @json[@index] != CLOSING_BRACKET
+ first_item = initial
  if initial
  initial = false
  else
@@ -784,8 +813,12 @@ module JSON
  processed_value = parse_value
  next if processed_value
 
- # repair trailing comma
- @output = strip_last_occurrence(@output, ',')
+ # repair trailing comma — but only the one this array's own loop
+ # emitted or inserted; on the first item the buffer's last
+ # comma belongs to the enclosing container, like in [1,[}] or
+ # {"a": 1, "b": [} (divergence from upstream, which strips
+ # the parent's comma and emits invalid JSON like [1[]])
+ @output = strip_last_occurrence(@output, ',') unless first_item
  break
  end
 
@@ -844,7 +877,11 @@ module JSON
  # repair numbers cut off at the end
  # this will only be called when we end after a '.', '-', or 'e' and does not
  # change the number more than it needs to make it valid JSON
- @output << repair_leading_dot_number("#{@json[start...@index]}0")
+ num = "#{@json[start...@index]}0"
+ # quote a padded token that has an invalid leading zero, like "05e" ->
+ # "05e0", applying the same rule as the end of parse_number (divergence
+ # from upstream, which emits the invalid number raw)
+ @output << (num.match?(/^-?0\d/) ? "\"#{num}\"" : repair_leading_dot_number(num))
  end
 
  # Repair a number missing its digit before the decimal point, like ".5"
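The truncated-number hunk above pads the token with `0` first and applies the leading-zero check afterwards. A minimal sketch of that order of operations follows; `pad_truncated_number` is a hypothetical name, and the gem's `repair_leading_dot_number` step and its later re-serialization are left out, so valid results are shown as padded text only.

```ruby
# Hypothetical reduction: append "0" to a number cut off after '.', '-' or
# 'e', then quote the result if it has an invalid leading zero.
def pad_truncated_number(token)
  padded = "#{token}0"
  padded.match?(/^-?0\d/) ? "\"#{padded}\"" : padded
end

%w[05e 00. 0. 0e 2e].each do |token|
  puts "#{token} -> #{pad_truncated_number(token)}"
end
```

The first two tokens come out quoted (`"05e0"`, `"00.0"`), matching the changelog examples, while `0.`, `0e` and `2e` are padded to valid numbers and left unquoted.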
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: json-repair
  version: !ruby/object:Gem::Version
- version: 0.11.2
+ version: 0.12.0
  platform: ruby
  authors:
  - Aleksandr Zykov