json-repair 0.5.0 → 0.7.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -0
- data/CHANGELOG.md +36 -0
- data/CLAUDE.md +2 -2
- data/README.md +25 -1
- data/Rakefile +16 -1
- data/Steepfile +5 -0
- data/lib/json/repair/cli.rb +7 -7
- data/lib/json/repair/version.rb +1 -1
- data/lib/json/repair.rb +17 -2
- data/lib/json/repairer.rb +14 -4
- data/sig/json/repair/cli.rbs +34 -3
- data/sig/json/repair.rbs +9 -1
- data/sig/json/repairer.rbs +10 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fdf2958528b936eba0faf6c724a20d10a3a3e0b95329bc1866740c99432f6fe3
+  data.tar.gz: 33fa97d2c7689ea594723e5d0d412646cf57a2ebdbf79f3b62947d4928963f32
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 587323c57caac3e53af1da24cfeee47df18ff738516d8fd2862d37a547c6e864cf5653c8b43427ec58aa47be3f2e958c991208b36de098c5ffc9503f2c64a0a2
+  data.tar.gz: f5380cfdca6a4f60833ab57fc8465715b546415fa4f6ed5781b6bb24653ce324d2b6d928f9a5ff87b4deead145391303c1e805bf0fbff238b9ef1c7024a1f4aa
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,41 @@
 # Changes
 
+### 2026-05-12 (0.7.0)
+
+* `JSON.repair` now always returns canonical JSON via
+  `JSON.generate`. When the input is already valid JSON, stdlib
+  `JSON.parse` handles it directly; when it isn't, the repairer
+  produces an intermediate string that's then re-parsed and serialized
+  the same way. Both paths converge on the same output for any given
+  input, so `JSON.repair(json)` and
+  `JSON.repair(json, skip_json_loads: true)` agree on result and only
+  differ in how they got there.
+* **Breaking:** outputs are now canonical instead of preserving the
+  input's exact formatting. Whitespace is collapsed
+  (`'{"a": 1}'` → `'{"a":1}'`), numbers are normalized
+  (`2300e3` → `2300000.0`, `-0` → `0`), `\uXXXX` escapes are decoded
+  to their literal characters, `\/` is unescaped to `/`, and objects
+  with duplicate keys are collapsed to the last-write-wins form
+  (`{"a":1,"a":2}` → `{"a":2}`). Callers that need a parsed Ruby
+  value can opt out of the final `JSON.generate` step with
+  `return_objects: true`.
+* `skip_json_loads:` keyword argument added (default `false`,
+  mirroring Python's
+  [`json_repair`](https://github.com/mangiucugna/json_repair)
+  option). Passing `true` skips the stdlib `JSON.parse` fast attempt
+  and routes the input through the repairer first; the final output
+  is identical, so the option is purely a performance knob for
+  callers who know their input will need repair.
+
+### 2026-05-12 (0.6.0)
+
+* `JSON.repair` accepts a `return_objects:` keyword argument. Pass
+  `return_objects: true` to receive the parsed Ruby value (Hash, Array,
+  or scalar) instead of a serialized JSON string. Default is `false`,
+  preserving the existing return-a-string contract. Mirrors Python's
+  `return_objects` option on
+  [`json_repair`](https://github.com/mangiucugna/json_repair).
+
 ### 2026-05-12 (0.5.0)
 
 * `JSON::JSONRepairError#position` returns the input index at which the
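The canonicalization described in the 0.7.0 entry can be reproduced with the standard library alone, because the final step is a `JSON.parse` followed by `JSON.generate`. The sketch below is an illustration rather than the gem's code: `canonicalize` is a hypothetical helper name, and it only handles input that is already valid JSON.

```ruby
require 'json'

# Hypothetical helper (not part of json-repair): round-trip valid JSON
# through the stdlib parser and generator to obtain its canonical form.
def canonicalize(text)
  JSON.generate(JSON.parse(text))
end

puts canonicalize('{"a": 1}')       # whitespace is collapsed
puts canonicalize('2300e3')         # exponent form becomes a plain float
puts canonicalize('"a\/b"')         # the escaped slash is unescaped
puts canonicalize('"\u00e9"')       # the \uXXXX escape becomes a literal character
puts canonicalize('{"a":1,"a":2}')  # duplicate keys: the last value wins
```

Depending on the installed version of the `json` gem, the duplicate-key case may also print a deprecation warning; the parsed result is still the last value.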
data/CLAUDE.md
CHANGED
@@ -5,13 +5,13 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Commands
 
 - `bin/setup` — install dependencies via Bundler.
-- `bundle exec rake` — default task; runs
+- `bundle exec rake` — default task; runs RSpec, RuboCop, RBS validate, and Steep.
 - `bundle exec rspec` — run the test suite.
 - `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
 - `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
 - `bin/console` — IRB with the gem preloaded.
 - `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org.
-- Type checking: `Steepfile` checks `lib/` against `sig/`.
+- Type checking: `Steepfile` checks `lib/` against `sig/`. `bundle exec steep check` (typecheck) and `bundle exec rbs validate` (sig syntax) both run in CI and as part of the default rake task. `steep` and `rbs` are dev dependencies in the `Gemfile`.
 
 Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
 
data/README.md
CHANGED
@@ -26,11 +26,33 @@ require 'json/repair'
 # Example of repairing a JSON string
 broken_json = '{name: Alice, "age": 25,}'
 repaired_json = JSON.repair(broken_json)
-puts repaired_json # Outputs: {"name":
+puts repaired_json # Outputs: {"name":"Alice","age":25}
 ```
 
 The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.
 
+Pass `return_objects: true` to get the parsed Ruby value (Hash, Array, or scalar) instead of a string:
+
+```ruby
+JSON.repair('{a: 1, b: [2, 3,]}', return_objects: true)
+# => {"a" => 1, "b" => [2, 3]}
+```
+
+### Canonical output
+
+`JSON.repair` returns canonical JSON via `JSON.generate`. When the input is already valid, stdlib `JSON.parse` handles it; otherwise the repairer fixes it up and the result is re-serialized the same way. Either way, the output is the canonical form of the parsed value — whitespace is collapsed, numbers are normalized, `\uXXXX` escapes are decoded to literal characters, and objects with duplicate keys collapse to last-write-wins.
+
+```ruby
+JSON.repair('{"a": 1}') # => '{"a":1}'
+JSON.repair('{a:1}') # => '{"a":1}'
+JSON.repair('2300e3') # => '2300000.0'
+JSON.repair('{"a":1,"a":2}') # => '{"a":2}'
+```
+
+If you need the parsed Ruby value instead of a string, pass `return_objects: true` (covered above).
+
+`skip_json_loads: true` skips the stdlib `JSON.parse` attempt and routes the input straight through the repairer. The output is the same; the option is purely a performance knob for callers who know their input will need repair.
+
 ## Command line
 
 The gem ships a `json-repair` executable. It reads from stdin or a file and writes to stdout, `--output FILE`, or back over the input file with `--overwrite`.
@@ -50,6 +72,8 @@ Run `json-repair --help` for the full list of options.
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
+Run `bundle exec rake bench` for a `benchmark-ips` regression baseline across four canned scenarios (valid mixed JSON, broken LLM-style output, a large array, deeply nested objects). The harness lives under `benchmark/` and is not shipped in the published gem.
+
 To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
 
 ## Contributing
data/Rakefile
CHANGED
@@ -9,4 +9,19 @@ require 'rubocop/rake_task'
 
 RuboCop::RakeTask.new
 
-
+desc 'Validate RBS signatures in sig/'
+task :rbs do
+  sh 'bundle exec rbs validate'
+end
+
+desc 'Run Steep type check against sig/'
+task :steep do
+  sh 'bundle exec steep check'
+end
+
+desc 'Run benchmark/run.rb (regression baseline for JSON.repair)'
+task :bench do
+  ruby '-Ilib', 'benchmark/run.rb'
+end
+
+task default: %i[spec rubocop rbs steep]
data/Steepfile
CHANGED
data/lib/json/repair/cli.rb
CHANGED
@@ -34,7 +34,8 @@ module JSON
 
       def run(argv)
         positional = catch(:halt) { parser.parse(argv) }
-
+        halt = @halt
+        return halt if halt
 
         input_path = positional.first
         return 1 unless validate(positional, input_path)
@@ -61,7 +62,7 @@ module JSON
       end
 
       def read_input(input_path)
-        raw = input_path ? File.read(input_path) : @stdin.read
+        raw = (input_path ? File.read(input_path) : @stdin.read).to_s
         raw.force_encoding(Encoding::UTF_8)
         raise JSON::JSONRepairError, 'input is not valid UTF-8' unless raw.valid_encoding?
 
@@ -69,13 +70,12 @@ module JSON
       end
 
       def write_output(repaired, input_path)
-        if @overwrite
+        if @overwrite && input_path
           replace_in_place(input_path, repaired)
-        elsif @output_path
-          File.write(
+        elsif (output_path = @output_path)
+          File.write(output_path, repaired)
         else
-          @stdout.write(repaired)
-          @stdout.write("\n") unless repaired.end_with?("\n")
+          @stdout.write(repaired, "\n")
         end
       end
 
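The stdout branch of `write_output` now appends a newline unconditionally, which presumably relies on `JSON.generate` output never ending in one. As a rough sketch of that single multi-argument `write` call, with `StringIO` standing in for `@stdout` and a hypothetical `emit` helper that is not part of the gem:

```ruby
require 'json'
require 'stringio'

# Hypothetical helper mirroring the new stdout branch: one write call that
# receives the payload and the newline as separate arguments.
def emit(stream, repaired)
  stream.write(repaired, "\n")
end

out = StringIO.new
emit(out, JSON.generate({ 'a' => 1 }))
p out.string
```

Multi-argument `write` is available on both `IO` and `StringIO` in the Ruby versions the gemspec allows (`>= 3.0.0`).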
data/lib/json/repair/version.rb
CHANGED
data/lib/json/repair.rb
CHANGED
@@ -1,5 +1,7 @@
 # frozen_string_literal: true
 
+require 'json'
+
 require_relative 'repair/version'
 require_relative 'repairer'
 
@@ -13,7 +15,20 @@ module JSON
     end
   end
 
-  def self.repair(json)
-
+  def self.repair(json, return_objects: false, skip_json_loads: false)
+    parsed = skip_json_loads ? repaired_parse(json) : tolerant_parse(json)
+    return_objects ? parsed : JSON.generate(parsed)
+  end
+
+  def self.tolerant_parse(json)
+    JSON.parse(json)
+  rescue JSON::ParserError
+    repaired_parse(json)
+  end
+  private_class_method :tolerant_parse
+
+  def self.repaired_parse(json)
+    JSON.parse(Repairer.new(json).repair)
   end
+  private_class_method :repaired_parse
 end
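The control flow in this hunk, a strict parse first and the repairer only on failure, can be tried in isolation. The following is a minimal sketch under stated assumptions: `naive_repair` is a hypothetical stand-in that only strips trailing commas (the real `Repairer` handles far more), and `repair_sketch` is an illustrative name rather than the gem's API.

```ruby
require 'json'

# Hypothetical stand-in for JSON::Repairer: removes a trailing comma before
# a closing brace or bracket. It does not look inside string literals, so it
# is only suitable for this demonstration.
def naive_repair(text)
  text.gsub(/,\s*([}\]])/, '\1')
end

# Illustrative mirror of the diff's flow: strict parse first, repair on
# ParserError, then optionally serialize back to canonical JSON.
def repair_sketch(text, return_objects: false, skip_json_loads: false)
  parsed =
    if skip_json_loads
      JSON.parse(naive_repair(text))
    else
      begin
        JSON.parse(text)
      rescue JSON::ParserError
        JSON.parse(naive_repair(text))
      end
    end
  return_objects ? parsed : JSON.generate(parsed)
end

puts repair_sketch('{"a": 1, "b": [2, 3,],}')
p repair_sketch('{"a": 1,}', return_objects: true)
```

Because both branches end in the same `JSON.parse` result, `skip_json_loads` changes only which path is taken, which matches the changelog's description of it as a performance knob.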
data/lib/json/repairer.rb
CHANGED
@@ -227,9 +227,19 @@ module JSON
         if processed_colon || truncated_text
           # repair missing object value
           @output << 'null'
+          # :nocov:
         else
+          # Unreachable through JSON.repair: if we got here, the colon-repair
+          # branch above ran, which required start_of_value? to be true. Every
+          # char that satisfies start_of_value? (see REGEX_START_OF_VALUE plus
+          # quote chars) is consumable by some parse_* method, so parse_value
+          # cannot return false in this state. Preserved for parity with the
+          # upstream JS parser; if a future change to REGEX_START_OF_VALUE or
+          # parse_unquoted_string invalidates that invariant, this branch
+          # becomes live and the :nocov: will hide it.
           throw_colon_expected
         end
+        # :nocov:
       end
     end
 
@@ -725,10 +735,10 @@ module JSON
         processed_value = parse_value
       end
 
-
-
-
-
+      # repair: remove trailing comma
+      # (the `while processed_value` loop above only exits when processed_value
+      # is falsy, so the upstream JS `if (!processedValue)` guard is redundant)
+      @output = strip_last_occurrence(@output, ',')
 
       # repair: wrap the output inside array brackets
       @output = "[\n#{@output}\n]"
data/sig/json/repair/cli.rbs
CHANGED
@@ -1,16 +1,47 @@
 module JSON
   module Repair
     class CLI
-      # `::IO | ::StringIO` because `::StringIO` is not an `::IO` subclass
-      # specs inject `::StringIO` instances
-      # that implements `#read` / `#write` / `#puts` would work too.
+      # `::IO | ::StringIO` because `::StringIO` is not an `::IO` subclass
+      # and the specs inject `::StringIO` instances.
       type stream = ::IO | ::StringIO
 
+      # Marked `private_constant` in `lib/json/repair/cli.rb`. RBS has no
+      # `private_constant` syntax, so the declaration is unavoidably public
+      # in the signature; do not rely on it from outside this class.
+      OVERWRITE_DESC: ::String
+
+      @stdin: stream
+      @stdout: stream
+      @stderr: stream
+      @output_path: ::String?
+      @halt: ::Integer?
+      @overwrite: bool
+
       def self.call: (::Array[::String] argv, ?stdin: stream, ?stdout: stream, ?stderr: stream) -> ::Integer
 
       def initialize: (?stdin: stream, ?stdout: stream, ?stderr: stream) -> void
 
       def call: (::Array[::String] argv) -> ::Integer
+
+      private
+
+      def run: (::Array[::String] argv) -> ::Integer
+
+      def validate: (::Array[::String] positional, ::String? input_path) -> bool
+
+      def validation_error: (::Array[::String] positional, ::String? input_path) -> ::String?
+
+      def read_input: (::String? input_path) -> ::String
+
+      def write_output: (::String repaired, ::String? input_path) -> void
+
+      def replace_in_place: (::String input_path, ::String repaired) -> void
+
+      def parser: () -> ::OptionParser
+
+      def define_options: (::OptionParser opts) -> void
+
+      def halt_with: (::String message) -> void
     end
   end
 end
data/sig/json/repair.rbs
CHANGED
@@ -9,5 +9,13 @@ module JSON
     VERSION: ::String
   end
 
-  def self.repair: (::String json) -> ::String
+  def self.repair: (::String json, return_objects: false, ?skip_json_loads: bool) -> ::String
+                 | (::String json, return_objects: true, ?skip_json_loads: bool) -> untyped
+                 | (::String json, ?skip_json_loads: bool) -> ::String
+
+  private
+
+  def self.tolerant_parse: (::String json) -> untyped
+
+  def self.repaired_parse: (::String json) -> untyped
 end
data/sig/json/repairer.rbs
CHANGED
@@ -1,6 +1,15 @@
 module JSON
   class Repairer
-
+    # `untyped` (not `::String`) because the parser idiomatically uses
+    # `@json[@index]` past EOF, relying on Ruby returning `nil` as a
+    # sentinel. Declaring `@json: ::String` makes steep infer `String?`
+    # from `String#[]` and rejects ~15 downstream call sites that pass
+    # the result to methods expecting `String`. Tightening would either
+    # require pervasive nil-extraction at every indexing site or a
+    # private accessor — both of which break parity with the upstream
+    # JS source (see CLAUDE.md and `.rubocop.yml` exceptions for
+    # `lib/json/repairer.rb`).
+    @json: untyped
 
     @index: Integer
 