json-repair 0.11.1 → 0.11.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 288e3502829f51d11dbf2c3a9ab45f04dd44a1fa9ae5e00c1537f47c215aad96
-   data.tar.gz: 1ab99e5121ef3e73066157569bd85379a527cc1f44ecd7087d4f058133c4fcf0
+   metadata.gz: 69085d74f416811c4ac11ca7cfe2e9545a6cecdaeb32de96532932c99ab4aaf3
+   data.tar.gz: 4deee8e6715200ae693144a2c8cab914b9e8c78c3b539671f7632ce39d7b77f3
  SHA512:
-   metadata.gz: ffc4cd085d9a6aa5b45f3ee605a0fa043e20c68d63fdaab5c736acecbb17f160066d4412a76496300614545577d74d5ca4b232d4354adf625913753f8c0e8477
-   data.tar.gz: 190ea601a010b401fdf0c6aa52670a030ce81d39ebcd1af1f81c9fb2399baf6d074f3135b3b05814d1bcfcd24db53321a779b760c8eebe8d5a9ce2e029476754
+   metadata.gz: 31242bd165c070b1836d85a3fca120b5853a6e1ed715dbaeb75aa026563e033a7af491e318c119f3f3f899c4a7bb55382fc998372d7b55953358dc95d9c526be
+   data.tar.gz: b8a5a58a36d1c2b36922205f2b3b8d33e89e570fc215ed02ef288577a8d1325d96fa21cda9267ca78f44754b00a3150e5f6d149d6087c9f020e23090d47a13f2
data/.rubocop.yml CHANGED
@@ -1,5 +1,15 @@
+ # Merge our Exclude lists with RuboCop's defaults (vendor/**/*, tmp/**/*, …)
+ # instead of replacing them — CI vendors gems into vendor/bundle, which
+ # RuboCop must keep skipping.
+ inherit_mode:
+   merge:
+     - Exclude
+
  AllCops:
    TargetRubyVersion: 3.0
+   Exclude:
+     # gitignored local planning notes and scratch tooling (see CLAUDE.md)
+     - docs/**/*
 
  Style/Documentation:
    Enabled: false
data/CHANGELOG.md CHANGED
@@ -1,6 +1,49 @@
  # Changes
 
- ### 2026-06-12 (0.11.1)
+ ### 2026-06-12 (0.11.3)
+
+ * Fix infinite recursion (`SystemStackError`) on a quoted string
+ followed by a backslash-escaped delimiter, like `["y"\, "z"]`. The
+ missing-end-quote retry in `parse_string` stops at the comma it
+ detected in the first pass, but the invalid-escape repair consumed
+ `\,` as one two-character step, jumping over the stop index and
+ re-firing the retry with identical arguments forever — violating the
+ contract that `JSONRepairError` is the only error raised. The escaped
+ delimiter now ends the string there and the dangling backslash is
+ dropped (the standard invalid-escape repair): `["y"\, "z"]` →
+ `["y\"","z"]`. The stop-index check is also hardened from `==` to
+ `>=` so no future multi-character advance can step over it and
+ recurse. Deliberate divergence from upstream
+ [jsonrepair](https://github.com/josdejong/jsonrepair), which crashes
+ with "Maximum call stack size exceeded" on the same input as of
+ v3.14.0 (still its latest release). Found by differential fuzzing
+ during the 0.11.2 work and re-validated the same way: across a
+ 240-input grid of escape-adjacent shapes, only previously-crashing
+ inputs changed behavior — object shapes like `{"k": "y"\, "z"}` now
+ raise the same "Colon expected" as their backslash-free analog
+ `{"k": "y", "z"}`. Benchmarks flat vs 0.11.2.
+
+ ### 2026-06-12 (0.11.2)
+
+ * Fix the 0.11.0 doubled-colon repair silently mangling objects with a
+ stray junk word between pairs. `{"value_1": true, COMMENT "value_2":
+ "data"}` returned `{"value_1":true,"COMMENT":"value_2\": \"data"}`
+ (the junk word became a key and swallowed the real pair), and
+ `{ "key": "value" COMMENT "key2": "value2" }` returned a single
+ glued string value. Both shapes now raise "Object key expected"
+ again at the same positions as upstream
+ [jsonrepair](https://github.com/josdejong/jsonrepair) v3.14.0,
+ restoring the pre-0.11.0 behavior: the merge is skipped when the
+ pair already needed a missing-colon repair or the value string was
+ itself salvaged by the unescaped-quote repair — signals that the
+ pair was malformed in a way the merge would compound, not fix. The
+ salvage signal survives string concatenation: in
+ `{"a": "b" x "c" + "d": "e"}` the `+ "d"` segment no longer clears
+ it (caught in review by Copilot). All
+ 0.11.0 repairs (canonical, greedy, escaped quotes, unquoted
+ keys/values) are unchanged. Go and Python `json_repair` instead
+ drop the junk word; we deliberately keep raising rather than
+ silently discarding input (see the 0.11.0 note).
 
  * Fix a `TypeError` crash on input ending in a lone backslash inside a
  string: `"abc\` now repairs to `"abc"` (likewise `"\` → `""`,
@@ -2,6 +2,6 @@
 
  module JSON
    module Repair
-     VERSION = '0.11.1'
+     VERSION = '0.11.3'
    end
  end
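The 0.11.3 changelog entry describes a scan that steps over its stop index when one step consumes two characters. The sketch below is a toy model of that index arithmetic only, under stated assumptions: `scan_until` and its `strict:` flag are hypothetical names for illustration and are not part of the gem, and the toy returns `nil` where the real parser would re-fire its retry.

```ruby
# Toy scanner, NOT the gem's parse_string: it only models how the index
# advances. A backslash consumes two characters, anything else consumes one.
def scan_until(text, stop_at_index, strict:)
  index = 0
  while index < text.length
    reached =
      if strict
        index == stop_at_index                        # equality check, can be stepped over
      else
        stop_at_index >= 0 && index >= stop_at_index  # >= check with a sentinel guard
      end
    return index if reached

    index += text[index] == '\\' ? 2 : 1
  end
  nil # ran off the end without ever matching the stop index
end

input = '"y"\\, "z"'      # the nine characters  "y"\, "z"
comma = input.index(',')  # index of the delimiter the retry would stop at

strict_result = scan_until(input, comma, strict: true)
relaxed_result = scan_until(input, comma, strict: false)

puts "strict:  #{strict_result.inspect}"
puts "relaxed: #{relaxed_result.inspect}"
```

With the equality check the two-character step moves the index from the backslash straight past the comma, so the stop condition never fires. The `>=` form still fires on the first position at or beyond the stop index, which is why it works as a termination backstop even if some other multi-character advance is added later.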
data/lib/json/repairer.rb CHANGED
@@ -32,6 +32,7 @@ module JSON
  @json = json
  @index = 0
  @output = +''
+ @repaired_unescaped_quote = false
  end
 
  def repair
  def repair
@@ -295,8 +296,14 @@ module JSON
  end
 
  # repair: an object string value with unescaped quotes around a
- # colon, like {"a": "b": "c"}
- repair_doubled_colon if processed_value
+ # colon, like {"a": "b": "c"}. Skipped when this pair already
+ # needed a repair that makes the merge compound garbage: a
+ # missing colon (the "key" was a stray junk word, like
+ # {"v1": true, COMMENT "v2": "data"}) or a value glued together
+ # by the unescaped-quote repair (like
+ # {"k": "v" COMMENT "k2": "v2"}); both keep raising, matching
+ # upstream
+ repair_doubled_colon if processed_value && processed_colon && !@repaired_unescaped_quote
  end
 
  if @json[@index] == CLOSING_BRACE
@@ -315,9 +322,13 @@ module JSON
  # (the unescaped-quotes reading of the input). Greedy: keeps merging
  # while another `: "..."` follows. Only the string-colon-string
  # shape is repaired; anything else falls through to the regular
- # error paths. Divergence from upstream (which raises "Object key
- # expected" as of v3.14.0), matching the Go and Python json-repair
- # libraries on the canonical case.
+ # error paths. The call site additionally requires the pair's colon
+ # to be present in the input and the value string to have parsed
+ # without the unescaped-quote repair (@repaired_unescaped_quote) —
+ # when either repair already fired, the pair was malformed in a way
+ # this merge would compound, not fix. Divergence from upstream
+ # (which raises "Object key expected" as of v3.14.0), matching the
+ # Go and Python json-repair libraries on the canonical case.
  def repair_doubled_colon
  loop do
  colon = @index
@@ -399,6 +410,9 @@ module JSON
  # and fixing the string by inserting a quote there, or stopping at a
  # stop index detected in the first iteration.
  def parse_string(stop_at_delimiter: false, stop_at_index: -1)
+ # fresh parse (the backtracking re-invocations below rebuild the
+ # string from scratch, so they reset too); see repair_doubled_colon
+ @repaired_unescaped_quote = false
  skip_escape_chars = @json[@index] == BACKSLASH
  if skip_escape_chars
  # repair: remove the first escape character
@@ -449,7 +463,13 @@ module JSON
  return true
  end
 
- if @index == stop_at_index
+ # >= with a sentinel guard, not ==. Divergence from upstream (which
+ # compares with == as of v3.14.0): a multi-character advance below
+ # can step over the stop index, and resuming the comma-path retry
+ # from beyond it would re-fire that retry with identical arguments
+ # forever. The invalid-escape repair below avoids the only known
+ # overshoot; this is the backstop guaranteeing termination.
+ if stop_at_index >= 0 && @index >= stop_at_index
  # use the stop index detected in the first iteration, and repair end quote
  str = insert_before_last_whitespace(str, '"')
  @output << str
@@ -508,6 +528,7 @@ module JSON
 
  # repair unescaped quote
  str = "#{str[...o_quote]}\\#{str[o_quote..]}"
+ @repaired_unescaped_quote = true
  elsif stop_at_delimiter && unquoted_string_delimiter?(@json[@index])
  # we're in the mode to stop the string at the first delimiter
  # because there is an end quote missing
@@ -554,6 +575,15 @@ module JSON
  # repair a backslash escaped newline (like in Bash scripts)
  str << '\n'
  @index += 2
+ elsif @index + 1 == stop_at_index
+ # repair invalid escape character: remove it — but the escaped
+ # character is the delimiter the comma-path retry said to stop
+ # at, so drop only the backslash and let the stop check above
+ # fire there, keeping the delimiter a delimiter. Divergence
+ # from upstream, which consumes both characters, jumps the stop
+ # index, and crashes ("Maximum call stack size exceeded" on
+ # inputs like `["y"\, "z"]` as of v3.14.0).
+ @index += 1
  else
  # repair invalid escape character: remove it
  str << char
@@ -807,7 +837,12 @@ module JSON
  # repair: remove the end quote of the first string
  @output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
  start = @output.length
+ # the segments form one logical string value: keep the doubled-colon
+ # guard's flag set when an earlier segment needed the unescaped-quote
+ # repair (parse_string resets it on entry)
+ repaired_earlier_segment = @repaired_unescaped_quote
  parsed_str = parse_string
+ @repaired_unescaped_quote ||= repaired_earlier_segment
  @output = if parsed_str
  # repair: remove the start quote of the second string
  remove_at_index(@output, start, 1)
@@ -15,6 +15,8 @@ module JSON
 
  @output: ::String
 
+ @repaired_unescaped_quote: bool
+
  include Repair::StringUtils
 
  CONTROL_CHARACTERS: ::Hash[::String, "\\b" | "\\f" | "\\n" | "\\r" | "\\t"]
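The string-concatenation hunk in repairer.rb saves the repair flag before calling `parse_string` (which resets it on entry) and then ORs the saved value back in. A minimal sketch of that save-and-restore pattern follows. `ToyParser`, `parse_segment`, and `parse_concatenated` are hypothetical names, and the `!` marker standing in for "this segment needed a repair" is an assumption made purely for illustration.

```ruby
# Toy model of a flag that a parse step resets on entry, NOT the gem's Repairer.
class ToyParser
  attr_reader :repaired

  def initialize
    @repaired = false
  end

  # Stand-in for parse_string: resets the flag on entry and sets it when the
  # segment needed a repair (signalled here by a "!" in the segment).
  def parse_segment(segment)
    @repaired = false
    @repaired = true if segment.include?('!')
    segment.delete('!')
  end

  # Stand-in for the concatenation path: all segments form one logical value,
  # so a repair in any earlier segment must stay visible afterwards.
  def parse_concatenated(segments)
    segments.map do |segment|
      repaired_earlier = @repaired      # save before the reset
      parsed = parse_segment(segment)
      @repaired ||= repaired_earlier    # restore if this segment cleared it
      parsed
    end.join
  end
end

parser = ToyParser.new
result = parser.parse_concatenated(['a!', 'b'])
puts "#{result} repaired=#{parser.repaired}"
```

Without the two bookkeeping lines around `parse_segment`, the clean second segment would reset the flag and the caller would see `repaired` as `false`, which mirrors the case the changelog describes for `{"a": "b" x "c" + "d": "e"}`.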
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: json-repair
  version: !ruby/object:Gem::Version
-   version: 0.11.1
+   version: 0.11.3
  platform: ruby
  authors:
  - Aleksandr Zykov