json-repair 0.10.0 → 0.11.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 6de36fcd3ab73ce63e1f9367d4e86dd7373dd7430b2dc87d451d75dd1f5fd685
-   data.tar.gz: 7221ab8c14253abf6cf4d40ff8fb8e6bd93c20504a8b5e6608c93a159420dc2f
+   metadata.gz: 288e3502829f51d11dbf2c3a9ab45f04dd44a1fa9ae5e00c1537f47c215aad96
+   data.tar.gz: 1ab99e5121ef3e73066157569bd85379a527cc1f44ecd7087d4f058133c4fcf0
  SHA512:
-   metadata.gz: 5083d8d8ada9a0b0a67beb8cb8ad1f1af53435e4e5c99632de5c68e19fd02d6dc85567b8c44cfc3068ae44118934c073d5a1acc22abf7f3d089b2add3899e083
-   data.tar.gz: 01d4eb164137dcb0e5b3b11703e2b1099efff6749f633466c3b37e1cccec07904e04006ddca1961d30702b5f133e35b88f8c37edce2bfa72ad61a6172dbbbada
+   metadata.gz: ffc4cd085d9a6aa5b45f3ee605a0fa043e20c68d63fdaab5c736acecbb17f160066d4412a76496300614545577d74d5ca4b232d4354adf625913753f8c0e8477
+   data.tar.gz: 190ea601a010b401fdf0c6aa52670a030ce81d39ebcd1af1f81c9fb2399baf6d074f3135b3b05814d1bcfcd24db53321a779b760c8eebe8d5a9ce2e029476754
data/.rubocop.yml CHANGED
@@ -1,10 +1,6 @@
  AllCops:
    TargetRubyVersion: 3.0

- Metrics/BlockLength:
-   Exclude:
-     - spec/**/*
-
  Style/Documentation:
    Enabled: false

@@ -33,6 +29,8 @@ Metrics/BlockLength:
    Exclude:
      - lib/json/repairer.rb
      - spec/**/*
+     # restore RuboCop's default exclusion, lost when Exclude is overridden
+     - '*.gemspec'

  Metrics/BlockNesting:
    Exclude:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,37 @@
  # Changes

+ ### 2026-06-12 (0.11.1)
+
+ * Fix a `TypeError` crash on input ending in a lone backslash inside a
+   string: `"abc\` now repairs to `"abc"` (likewise `"\` → `""`,
+   `["abc\` → `["abc"]`, `{"a": "b\` → `{"a":"b"}`), matching upstream
+   [jsonrepair](https://github.com/josdejong/jsonrepair) v3.14.0. This
+   was a porting bug — JS `charAt` past EOF returns `''` where Ruby
+   `String#[]` returns `nil`, so the invalid-escape repair in
+   `parse_string` crashed on `str << nil` instead of ending the string,
+   violating the contract that `JSONRepairError` is the only error
+   raised. Found by differential fuzzing during the 0.11.0 review.
+
+ ### 2026-06-12 (0.11.0)
+
+ * Repair object string values with unescaped quotes around a colon
+   ("doubled colon"): `{"a": "b": "c"}` → `{"a":"b\": \"c"}` — the
+   value reads as `b": "c`, the unescaped-quotes interpretation. The
+   merge preserves the literal characters between the strings
+   (whitespace, original quote style) and repeats greedily
+   (`{"a": "b": "c": "d"}` → value `b": "c": "d`). Only the
+   string–colon–string shape is repaired: non-string shapes like
+   `{"a": "b": 1}` or `{"a": 1: 2}` still raise `JSONRepairError`
+   rather than silently dropping data (Python `json_repair` drops the
+   `: 1` there). Previously all of these raised "Object key expected".
+   Deliberate divergence from upstream
+   [jsonrepair](https://github.com/josdejong/jsonrepair) (raises as of
+   v3.14.0), matching
+   [Go json-repair](https://github.com/RealAlexandreAI/json-repair)
+   and Python
+   [`json_repair`](https://github.com/mangiucugna/json_repair) on the
+   canonical case.
+
  ### 2026-06-11 (0.10.0)

  * Repair Markdown list markers in front of top-level values:
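The value readings quoted in the 0.11.0 entry can be cross-checked with the standard library alone. The sketch below does not call json-repair; it assumes only Ruby's bundled `json` and shows that serializing the merged values produces the escaped form the changelog lists. The variable names are illustrative.

```ruby
require 'json'

# Values as the changelog reads them (plain Ruby strings, not gem output).
single = 'b": "c'
greedy = 'b": "c": "d'

# Serializing escapes every embedded double quote as \".
single_json = JSON.generate({ 'a' => single })
greedy_json = JSON.generate({ 'a' => greedy })

puts single_json # {"a":"b\": \"c"}
puts greedy_json # {"a":"b\": \"c\": \"d"}

# Round-trip: parsing the escaped form recovers the original reading.
round_trip = JSON.parse(single_json)['a']
puts round_trip # b": "c
```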
data/CLAUDE.md CHANGED
@@ -10,18 +10,25 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
  - `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
  - `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
  - `bin/console` — IRB with the gem preloaded.
- - `bundle exec rake install` / `bundle exec rake release` local install / publish to rubygems.org.
+ - `bundle exec rake bench` benchmark-ips regression baseline (`benchmark/run.rb`); run before/after perf-sensitive changes.
+ - `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org (release prompts for a rubygems MFA OTP).
  - Type checking: `Steepfile` checks `lib/` against `sig/`. `bundle exec steep check` (typecheck) and `bundle exec rbs validate` (sig syntax) both run in CI and as part of the default rake task. `steep` and `rbs` are dev dependencies in the `Gemfile`.

- Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
+ Ruby `>= 3.0.0` is required (per gemspec). CI runs against all currently maintained Ruby branches (3.3, 3.4, 4.0).

  ## Architecture

  This gem is a **Ruby port of the [josdejong/jsonrepair](https://github.com/josdejong/jsonrepair) TypeScript library**. The upstream version currently mirrored is tracked in `CHANGELOG.md` (presently v3.14.0). When syncing upstream changes, the goal is parity with the JS implementation, not idiomatic refactoring — keep method names, control flow, and repair heuristics aligned with the JS source so future syncs stay tractable.

+ A few repair heuristics deliberately go **beyond** upstream (leading-dot numbers like `.5`, Markdown list markers like `- {...}`). Each such site carries a "Divergence from upstream" comment in the code and a CHANGELOG note — keep that convention when adding more, so future upstream syncs can tell ported behavior from local extensions.
+
  ### Entry point

- `JSON.repair(str)` in `lib/json/repair.rb` is a thin wrapper that constructs `JSON::Repairer.new(str).repair`. `JSON::JSONRepairError` is the only error raised for unrecoverable inputs.
+ `JSON.repair(str)` in `lib/json/repair.rb` first tries stdlib `JSON.parse` (fast path; opt out with `skip_json_loads: true`), and falls back to `JSON::Repairer.new(str).repair` when that raises. Either way the result is re-serialized with `JSON.generate`, so **output is canonical** — whitespace collapsed, numbers normalized, duplicate keys last-write-wins — and both paths agree on it. A `REPAIR_REQUIRED_PATTERN` regex routes inputs containing comments or invalid escapes straight to the Repairer even though the bundled `json` gem would accept them. `return_objects: true` returns the parsed Ruby value instead of a string; `JSON.repair_file(path)` / `JSON.repair_io(io)` are convenience wrappers forwarding both options.
+
+ `JSON::JSONRepairError` is the only error raised for unrecoverable inputs; it exposes the failure `#position`. If the Repairer ever emits a string stdlib cannot parse (a Repairer bug), the `JSON::ParserError` is wrapped in `JSONRepairError` rather than leaked.
+
+ `exe/json-repair` is a CLI wrapper (`lib/json/repair/cli.rb`) reading stdin or a file, writing stdout, `--output FILE`, or `--overwrite`.

  ### The parser (`lib/json/repairer.rb`)

@@ -43,7 +50,7 @@ Two patterns recur and are worth knowing before editing:
  - **Backtracking via snapshots.** Methods like `parse_string` capture `i_before = @index` and `o_before = @output.length` before tentatively consuming input. If a later check (e.g. "the end quote turned out not to be a real end quote") fails, they restore both and re-invoke themselves with different flags (e.g. `stop_at_delimiter: true`, `stop_at_index: …`). Preserve this pattern when modifying string/number parsing.
  - **Repair-by-rewriting-tail.** Helpers like `insert_before_last_whitespace(@output, ',')` and `@output = strip_last_occurrence(@output, ',')` patch the already-emitted output to fix things like missing or trailing commas. These run *after* the malformed input has been partially emitted — they are the mechanism for "I now realize that earlier token needed a comma after it."

- `repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
+ `repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), skipping Markdown list markers like `- ` / `* ` / `1. ` before the root value and each newline-delimited line (`markdown_list_marker_length` / `skip_markdown_list_marker` — top-level only, never inside nested structures), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.

  ### Shared helpers (`lib/json/repair/string_utils.rb`)

@@ -62,6 +69,13 @@ RBS signatures mirror the public surface of `JSON.repair`, `JSON::Repairer`, and

  ### Test layout

- - `spec/json_spec.rb` — the substantive behavioral suite (700+ examples covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
+ - `spec/json_spec.rb` — the substantive behavioral suite (130+ examples, hundreds of assertions, covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
+ - `spec/json/repair/cli_spec.rb` — the `exe/json-repair` CLI (argument handling, IO errors, exit codes).
+ - `spec/json/repair/string_utils_spec.rb` — direct unit coverage for a few `StringUtils` edge cases the behavioral suite can't reach.
  - `spec/json/repair_spec.rb` — sanity check on `JSON::Repair::VERSION` only.
- - `.rspec_status` is committed and tracks per-example pass/fail so `--only-failures` / `--next-failure` work across runs.
+ - SimpleCov enforces 100% line and branch coverage on the full run, so a filtered `rspec -e ...` run "fails" at the coverage gate even when all selected examples pass — ignore that exit code during TDD.
+ - `.rspec_status` is gitignored (local pass/fail tracking for `--only-failures` / `--next-failure`).
+
+ ## Local planning notes
+
+ `docs/` (the `TODO.md` backlog plus design specs and implementation plans under `docs/superpowers/`) is gitignored local planning material — read it for context, update it as work completes, but never commit anything under `docs/`.
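The entry-point flow described in the CLAUDE.md changes (stdlib fast path, fallback to a repairer, re-serialization on both paths) can be sketched without the gem. Everything here is a hypothetical stand-in: `repair_sketch` is not the gem's `JSON.repair`, and the block passed to it is a toy repair that only drops a trailing comma. Only the `skip_json_loads` option name is taken from the text above.

```ruby
require 'json'

# Hypothetical stand-in for the entry point: try stdlib JSON.parse first,
# fall back to a caller-supplied repair block, and re-serialize either way
# so both paths produce the same canonical output.
def repair_sketch(str, skip_json_loads: false)
  unless skip_json_loads
    begin
      return JSON.generate(JSON.parse(str))
    rescue JSON::ParserError
      # not valid JSON: fall through to the repair block
    end
  end
  JSON.generate(JSON.parse(yield(str)))
end

# Toy repair (assumption for illustration): drop a trailing comma before "}".
toy_repair = ->(s) { s.sub(/,\s*\}\z/, '}') }

fast = repair_sketch('{ "a" : [1, 2 ] }', &toy_repair) # fast path
slow = repair_sketch('{"a": 1,}', &toy_repair)         # fallback path

puts fast # {"a":[1,2]}
puts slow # {"a":1}
```

Re-serializing on the fast path as well is what keeps the output independent of which path handled the input.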
data/README.md CHANGED
@@ -4,7 +4,7 @@ This is a Ruby gem designed to repair broken JSON strings. Inspired by and based

  ## Installation

- Add this gem to your application's Gemfield by executing:
+ Add this gem to your application's Gemfile by executing:

  ```bash
  $ bundle add json-repair
@@ -37,6 +37,12 @@ Markdown markup in LLM output is handled too: fenced code blocks like `` ```json
  JSON.repair("- {\"a\": 1}\n- {\"b\": 2}") # => '[{"a":1},{"b":2}]'
  ```

+ Object values containing unescaped quotes around a colon are merged back into a single string value:
+
+ ```ruby
+ JSON.repair('{"a": "b": "c"}') # => '{"a":"b\": \"c"}' — the value reads as 'b": "c'
+ ```
+
  Pass `return_objects: true` to get the parsed Ruby value (Hash, Array, or scalar) instead of a string:

  ```ruby
data/lib/json/repair/string_utils.rb CHANGED
@@ -127,6 +127,10 @@ module JSON
          whitespace_except_newline?(char) || special_whitespace?(char)
        end

+       def whitespace_or_special?(char)
+         whitespace?(char) || special_whitespace?(char)
+       end
+
        def quote?(char)
          double_quote_like?(char) || single_quote_like?(char)
        end
@@ -2,6 +2,6 @@

  module JSON
    module Repair
-     VERSION = '0.10.0'
+     VERSION = '0.11.1'
    end
  end
data/lib/json/repairer.rb CHANGED
@@ -68,7 +68,7 @@ module JSON
        end

        # repair redundant end quotes
-       while @json[@index] == CLOSING_BRACE || @json[@index] == CLOSING_BRACKET
+       while [CLOSING_BRACE, CLOSING_BRACKET].include?(@json[@index])
          @index += 1
          parse_whitespace_and_skip_comments
        end
@@ -236,7 +236,6 @@ module JSON

        initial = true
        while @index < @json.length && @json[@index] != CLOSING_BRACE
-         processed_comma = true
          if initial
            initial = false
          else
@@ -294,6 +293,10 @@ module JSON
            end
            # :nocov:
          end
+
+         # repair: an object string value with unescaped quotes around a
+         # colon, like {"a": "b": "c"}
+         repair_doubled_colon if processed_value
        end

        if @json[@index] == CLOSING_BRACE
@@ -307,6 +310,59 @@ module JSON
        true
      end

+     # Repair an object value with unescaped quotes around a colon, like
+     # {"a": "b": "c"}, by merging it all into one string value: 'b": "c'
+     # (the unescaped-quotes reading of the input). Greedy: keeps merging
+     # while another `: "..."` follows. Only the string-colon-string
+     # shape is repaired; anything else falls through to the regular
+     # error paths. Divergence from upstream (which raises "Object key
+     # expected" as of v3.14.0), matching the Go and Python json-repair
+     # libraries on the canonical case.
+     def repair_doubled_colon
+       loop do
+         colon = @index
+         # :nocov: kept for symmetry with the start_quote scan below; unreachable
+         # because @index never rests on whitespace here. On first entry,
+         # parse_value ends with parse_whitespace_and_skip_comments. On greedy
+         # re-entry, every parse_string exit leaves @index off-whitespace: the
+         # EOF path (nil is not whitespace), the stop_at_index path (a
+         # prev_non_whitespace_index position), and the end-quote path (ends in
+         # parse_concatenated_string, whose leading whitespace skip consumes
+         # newlines too). If a future parse_value/parse_string change breaks
+         # that, this scan becomes live and the :nocov: will hide it.
+         colon += 1 while whitespace_or_special?(@json[colon])
+         # :nocov:
+         return unless @json[colon] == COLON
+
+         # scan past special whitespace too (unlike prev_non_whitespace_index):
+         # parse_whitespace treats NBSP and friends as whitespace, so this
+         # repair should as well. The value's last character (at worst the
+         # object's opening brace) always stops the scan before index 0.
+         end_quote = colon - 1
+         end_quote -= 1 while whitespace_or_special?(@json[end_quote])
+         return unless quote?(@json[end_quote])
+
+         start_quote = colon + 1
+         start_quote += 1 while whitespace_or_special?(@json[start_quote])
+         return unless quote?(@json[start_quote])
+
+         # repair: replace the end quote already emitted (plus any copied
+         # trailing whitespace) with the literal input span from that end
+         # quote through the next start quote, escaped as string content
+         @output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
+         @json[end_quote..start_quote].each_char do |char|
+           @output << (char == DOUBLE_QUOTE ? '\"' : CONTROL_CHARACTERS.fetch(char, char))
+         end
+
+         # let parse_string consume the rest of the merged string, then
+         # drop the start quote it emits (already emitted escaped above)
+         @index = start_quote
+         start = @output.length
+         parse_string
+         @output = remove_at_index(@output, start, 1)
+       end
+     end
+
      def skip_character(char)
        if @json[@index] == char
          @index += 1
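The merge step in `repair_doubled_colon` can be traced by hand on the canonical input. This is a simplified sketch with plain strings, not the gem's code: it hard-codes the colon position, handles only ASCII spaces, skips the control-character mapping, and replaces `parse_string` with a direct copy up to the next quote. All local names are illustrative.

```ruby
require 'json'

input  = '{"a": "b": "c"}'
output = +'{"a":"b"' # what has been emitted once "b" was parsed

colon       = 9 # index of the second colon in input
end_quote   = colon - 1
start_quote = colon + 1
start_quote += 1 while input[start_quote] == ' '

# Drop the end quote already emitted, then re-emit the literal span from
# that end quote through the next start quote, escaped as string content.
output = output[0...output.rindex('"')]
input[end_quote..start_quote].each_char do |char|
  output << (char == '"' ? '\"' : char)
end

# Stand-in for parse_string: copy the rest of the string body and its end quote.
closing = input.index('"', start_quote + 1)
output << input[(start_quote + 1)..closing]
output << '}'

puts output # {"a":"b\": \"c"}
merged_value = JSON.parse(output)['a']
puts merged_value # b": "c
```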
@@ -474,7 +530,9 @@ module JSON
              return true
            elsif @json[@index] == BACKSLASH
              # handle escaped content like \n or ★
-             char = @json[@index + 1]
+             # nil at EOF: '' mirrors JS charAt, making the invalid-escape
+             # repair below a no-op that ends the string
+             char = @json[@index + 1] || ''
              escape_char = ESCAPE_CHARACTERS[char]
              if escape_char
                str << @json[@index, 2]
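The `|| ''` change above hinges on how Ruby indexes past the end of a string. A minimal standalone sketch of the failure and the fix, using a local `escapes` hash as a hypothetical stand-in for the gem's `ESCAPE_CHARACTERS`:

```ruby
json  = '"abc\\'        # five characters: "abc followed by a lone backslash
index = json.length - 1 # position of the backslash

escapes = { 'n' => "\n" } # hypothetical stand-in for ESCAPE_CHARACTERS

# Before the fix: indexing one past the last character yields nil,
# and appending nil to a String raises TypeError.
char = json[index + 1]
error = nil
begin
  +'"abc' << char
rescue TypeError => e
  error = e
end
puts error.class # TypeError

# After the fix: '' behaves like JS charAt past EOF. The lookup misses,
# and appending '' leaves the string unchanged.
char = json[index + 1] || ''
str = +'"abc'
str << char unless escapes[char]
puts str # "abc
```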
@@ -141,6 +141,8 @@ module JSON

        def same_line_whitespace?: (::String? char) -> bool

+       def whitespace_or_special?: (::String? char) -> bool
+
        def quote?: (::String? char) -> bool

        def double_quote?: (::String? char) -> bool
@@ -52,6 +52,10 @@ module JSON
      # Parse an object like '{"key": "value"}'
      def parse_object: () -> bool

+     # Repair an object value with unescaped quotes around a colon,
+     # like {"a": "b": "c"}, by merging it into one string value.
+     def repair_doubled_colon: () -> void
+
      def skip_character: (::String char) -> bool

      # Skip ellipsis like "[1,2,3,...]" or "[1,2,3,...,9]" or "[...,7,8,9]"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: json-repair
  version: !ruby/object:Gem::Version
-   version: 0.10.0
+   version: 0.11.1
  platform: ruby
  authors:
  - Aleksandr Zykov
@@ -9,7 +9,11 @@ bindir: exe
  cert_chain: []
  date: 1980-01-02 00:00:00.000000000 Z
  dependencies: []
- description: This is a simple gem that repairs broken JSON strings.
+ description: 'Repairs broken JSON: missing quotes and commas, unclosed brackets, trailing
+   commas, unquoted keys, single quotes, comments, Python constants, NDJSON, Markdown
+   code fences and list markers in LLM output, truncated documents, and more. A Ruby
+   port of the jsonrepair JavaScript library — useful whenever JSON from LLMs, APIs,
+   or logs does not strictly follow the standard.'
  email:
  - alexandrz@gmail.com
  executables:
@@ -44,6 +48,8 @@ metadata:
    homepage_uri: https://github.com/sashazykov/json-repair-rb
    source_code_uri: https://github.com/sashazykov/json-repair-rb
    changelog_uri: https://github.com/sashazykov/json-repair-rb/blob/main/CHANGELOG.md
+   documentation_uri: https://rubydoc.info/gems/json-repair
+   bug_tracker_uri: https://github.com/sashazykov/json-repair-rb/issues
  rdoc_options: []
  require_paths:
  - lib
@@ -60,5 +66,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements: []
  rubygems_version: 3.6.9
  specification_version: 4
- summary: Repairs broken JSON strings.
+ summary: Repair invalid or malformed JSON documents, including LLM output.
  test_files: []