json-repair 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +0 -0
- data/CLAUDE.md +67 -0
- data/README.md +15 -0
- data/exe/json-repair +6 -0
- data/lib/json/repair/cli.rb +133 -0
- data/lib/json/repair/version.rb +1 -1
- data/sig/json/repair/cli.rbs +16 -0
- metadata +7 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1873dc4a32f871c8dc79227074438b0118837f32bc3f36d004510604b265de4a
+  data.tar.gz: 33e15f3df710287842b41aa0a7490b239e5022d7fa53433ae272a103dc34d242
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a61ee9fa2a2220c494ded4bba09da73e5366a1af65e5cd6ae4ed08141c93db2ad445f5fe5571b7778e0693b5826e0c2506d042d0b3f8a91c8f78e9f6b213ca08
+  data.tar.gz: 8f1953f3959e5a571e0c6a02de0dab1ad775b007c4024ba21f7ae8b1d70e235b145a1de90bd1ddbf29d1ee16d36649dc68dceca6d536bc5f343ff4c150a7b2a6
data/CHANGELOG.md
CHANGED
Binary file
data/CLAUDE.md
ADDED
@@ -0,0 +1,67 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Commands
+
+- `bin/setup` — install dependencies via Bundler.
+- `bundle exec rake` — default task; runs both RSpec and RuboCop.
+- `bundle exec rspec` — run the test suite.
+- `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
+- `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
+- `bin/console` — IRB with the gem preloaded.
+- `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org.
+- Type checking: `Steepfile` checks `lib/` against `sig/`. Run `bundle exec steep check` if/when steep is installed (not in `Gemfile` by default).
+
+Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
+
+## Architecture
+
+This gem is a **Ruby port of the [josdejong/jsonrepair](https://github.com/josdejong/jsonrepair) TypeScript library**. The upstream version currently mirrored is tracked in `CHANGELOG.md` (presently v3.14.0). When syncing upstream changes, the goal is parity with the JS implementation, not idiomatic refactoring — keep method names, control flow, and repair heuristics aligned with the JS source so future syncs stay tractable.
+
+### Entry point
+
+`JSON.repair(str)` in `lib/json/repair.rb` is a thin wrapper that constructs `JSON::Repairer.new(str).repair`. `JSON::JSONRepairError` is the only error raised for unrecoverable inputs.
+
+### The parser (`lib/json/repairer.rb`)
+
+A single-pass, hand-written recursive-descent parser. State is three instance variables:
+
+- `@json` — the input string (read-only after init).
+- `@index` — the current cursor into `@json`.
+- `@output` — a mutable `+''` buffer that accumulates the *repaired* JSON. The parser writes directly to `@output` as it walks; it does not build an AST.
+
+Each `parse_*` method (`parse_value`, `parse_object`, `parse_array`, `parse_string`, `parse_number`, `parse_keywords`, `parse_unquoted_string`, `parse_regex`, `parse_comment`, `parse_markdown_code_block`, …) follows a contract:
+
+1. Returns truthy if it consumed something, falsy otherwise.
+2. On success, advances `@index` past the consumed input and appends the *valid* JSON form to `@output`.
+3. On a recoverable mismatch (missing quote, missing comma, trailing comma, wrong quote style, etc.) it performs an in-place repair on `@output` using helpers like `insert_before_last_whitespace`, `strip_last_occurrence`, or `remove_at_index`.
+4. On an unrecoverable error it calls one of the `throw_*` helpers, which raise `JSON::JSONRepairError`.
+
+Two patterns recur and are worth knowing before editing:
+
+- **Backtracking via snapshots.** Methods like `parse_string` capture `i_before = @index` and `o_before = @output.length` before tentatively consuming input. If a later check (e.g. "the end quote turned out not to be a real end quote") fails, they restore both and re-invoke themselves with different flags (e.g. `stop_at_delimiter: true`, `stop_at_index: …`). Preserve this pattern when modifying string/number parsing.
+- **Repair-by-rewriting-tail.** Helpers like `insert_before_last_whitespace(@output, ',')` and `@output = strip_last_occurrence(@output, ',')` patch the already-emitted output to fix things like missing or trailing commas. These run *after* the malformed input has been partially emitted — they are the mechanism for "I now realize that earlier token needed a comma after it."
+
+`repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
+
+### Shared helpers (`lib/json/repair/string_utils.rb`)
+
+`JSON::Repair::StringUtils` is a mixin included into `Repairer`. It holds:
+
+- Character constants (`OPENING_BRACE`, `BACKSLASH`, smart-quote variants, special whitespace code points, etc.) used in lieu of magic literals.
+- Character-class predicates (`digit?`, `hex?`, `quote?`, `single_quote_like?`, `delimiter?`, `whitespace?`, `special_whitespace?`, `start_of_value?`, …).
+- The keyword machinery — `parse_keywords` / `parse_keyword` — which converts Python `True`/`False`/`None` and Ruby `nil` into their JSON equivalents in addition to recognizing `true`/`false`/`null`.
+- Output-buffer surgery helpers: `strip_last_occurrence`, `insert_before_last_whitespace`, `remove_at_index`, `ends_with_comma_or_newline?`.
+
+Because the mixin reads `@json`, `@index`, and `@output` directly (notably inside `parse_keyword`), it is **not standalone** — it is coupled to `Repairer`'s state and should only be mixed into classes that own those ivars.
+
+### Type signatures (`sig/`)
+
+RBS signatures mirror the public surface of `JSON.repair`, `JSON::Repairer`, and `JSON::Repair::StringUtils`. Update them in lockstep with `lib/` changes; the `Steepfile` will surface drift.
+
+### Test layout
+
+- `spec/json_spec.rb` — the substantive behavioral suite (700+ examples covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
+- `spec/json/repair_spec.rb` — sanity check on `JSON::Repair::VERSION` only.
+- `.rspec_status` is committed and tracks per-example pass/fail so `--only-failures` / `--next-failure` work across runs.
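The repair-by-rewriting-tail pattern described in CLAUDE.md can be sketched in isolation. The helper names below come from that file, but the `TailRepairSketch` module and both method bodies are assumptions written for illustration; the gem's actual implementations in `lib/json/repair/string_utils.rb` are not part of this diff and may differ.

```ruby
# Illustrative sketch only: the bodies are assumptions, not the gem's source.
module TailRepairSketch
  module_function

  # Insert `text_to_insert` just before any trailing whitespace in `text`.
  def insert_before_last_whitespace(text, text_to_insert)
    index = text.length
    index -= 1 while index.positive? && text[index - 1].match?(/\s/)
    text[0...index] + text_to_insert + text[index..]
  end

  # Remove the last occurrence of `text_to_strip`, keeping what follows it.
  def strip_last_occurrence(text, text_to_strip)
    index = text.rindex(text_to_strip)
    return text unless index

    text[0...index] + text[(index + text_to_strip.length)..]
  end
end

# A missing comma is patched into output that was already emitted.
with_comma = TailRepairSketch.insert_before_last_whitespace('{"a":1 ', ',')

# A trailing comma is dropped from output that was already emitted.
without_comma = TailRepairSketch.strip_last_occurrence('[1,2, ', ',')
```

Both helpers return a new string instead of mutating the buffer, which matches the `@output = strip_last_occurrence(@output, ',')` call shape quoted in CLAUDE.md; whether the real `insert_before_last_whitespace` mutates or returns is not visible in this diff.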
data/README.md
CHANGED
@@ -31,6 +31,21 @@ puts repaired_json # Outputs: {"name": "Alice", "age": 25}
 
 The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.
 
+## Command line
+
+The gem ships a `json-repair` executable. It reads from stdin or a file and writes to stdout, `--output FILE`, or back over the input file with `--overwrite`.
+
+```bash
+$ echo '{a:1,}' | json-repair
+{"a":1}
+
+$ json-repair broken.json
+$ json-repair broken.json -o fixed.json
+$ json-repair broken.json --overwrite
+```
+
+Run `json-repair --help` for the full list of options.
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
data/lib/json/repair/cli.rb
ADDED
@@ -0,0 +1,133 @@
+# frozen_string_literal: true
+
+require 'fileutils'
+require 'optparse'
+require 'tempfile'
+require_relative '../repair'
+
+module JSON
+  module Repair
+    class CLI
+      def self.call(argv, stdin: $stdin, stdout: $stdout, stderr: $stderr)
+        new(stdin: stdin, stdout: stdout, stderr: stderr).call(argv)
+      end
+
+      def initialize(stdin: $stdin, stdout: $stdout, stderr: $stderr)
+        @stdin = stdin
+        @stdout = stdout
+        @stderr = stderr
+      end
+
+      # Reset per-invocation state so a single instance can be safely reused
+      # (e.g. `cli = CLI.new; cli.call(['-v']); cli.call(['x'])`).
+      def call(argv)
+        @output_path = @halt = nil
+        @overwrite = false
+        run(argv)
+      rescue OptionParser::ParseError, JSON::JSONRepairError, SystemCallError, IOError,
+             SystemStackError => e
+        @stderr.puts "json-repair: #{e.message}"
+        1
+      end
+
+      private
+
+      def run(argv)
+        positional = catch(:halt) { parser.parse(argv) }
+        return @halt if @halt
+
+        input_path = positional.first
+        return 1 unless validate(positional, input_path)
+
+        repaired = JSON.repair(read_input(input_path))
+        write_output(repaired, input_path)
+        0
+      end
+
+      def validate(positional, input_path)
+        error = validation_error(positional, input_path)
+        return true unless error
+
+        @stderr.puts "json-repair: #{error}"
+        false
+      end
+
+      def validation_error(positional, input_path)
+        return "unexpected argument: #{positional[1]}" if positional.length > 1
+        return '--overwrite requires a filename' if @overwrite && input_path.nil?
+        return '--overwrite and --output are mutually exclusive' if @overwrite && @output_path
+
+        nil
+      end
+
+      def read_input(input_path)
+        raw = input_path ? File.read(input_path) : @stdin.read
+        raw.force_encoding(Encoding::UTF_8)
+        raise JSON::JSONRepairError, 'input is not valid UTF-8' unless raw.valid_encoding?
+
+        raw
+      end
+
+      def write_output(repaired, input_path)
+        if @overwrite
+          replace_in_place(input_path, repaired)
+        elsif @output_path
+          File.write(@output_path, repaired)
+        else
+          @stdout.write(repaired)
+          @stdout.write("\n") unless repaired.end_with?("\n")
+        end
+      end
+
+      # Write to a uniquely-named tempfile alongside the input, then move it
+      # over the original. Tempfile.create uses O_EXCL + a random suffix, so
+      # the temp path is safe against symlink / clobber races; FileUtils.mv
+      # with force: true handles cross-device renames and Windows, where
+      # File.rename cannot overwrite an existing destination. The original
+      # file's mode is preserved (Tempfile defaults to 0600).
+      #
+      # Symlinks are followed via File.realpath so the underlying file is
+      # rewritten in place and the link is left pointing at it; otherwise
+      # the rename would replace the link itself with a regular file.
+      def replace_in_place(input_path, repaired)
+        real_path = File.realpath(input_path)
+        original_mode = File.stat(real_path).mode
+        Tempfile.create(['json-repair', '.tmp'], File.dirname(real_path)) do |tmp|
+          tmp.write(repaired)
+          tmp.close
+          File.chmod(original_mode, tmp.path)
+          FileUtils.mv(tmp.path, real_path, force: true)
+        end
+      end
+
+      def parser
+        OptionParser.new do |opts|
+          opts.banner = 'Usage: json-repair [filename] [options]'
+          opts.separator ''
+          opts.separator 'Repair a broken JSON document. Reads stdin when no filename is given.'
+          opts.separator ''
+          define_options(opts)
+        end
+      end
+
+      OVERWRITE_DESC = 'Replace the input file in place (requires filename; conflicts with --output)'
+      private_constant :OVERWRITE_DESC
+
+      def define_options(opts)
+        opts.on('-o', '--output FILE', 'Write repaired JSON to FILE') { |f| @output_path = f }
+        opts.on('--overwrite', OVERWRITE_DESC) { @overwrite = true }
+        opts.on('-v', '--version', 'Print version and exit') { halt_with(JSON::Repair::VERSION) }
+        opts.on('-h', '--help', 'Print this help and exit') { halt_with(opts.help) }
+      end
+
+      # Print to stdout and short-circuit `parser.parse` so trailing args
+      # after --version/--help do not raise OptionParser::ParseError and
+      # flip the exit code (the option text promises "...and exit").
+      def halt_with(message)
+        @stdout.puts message
+        @halt = 0
+        throw :halt
+      end
+    end
+  end
+end
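The `catch(:halt)` / `throw :halt` short-circuit used by `run` and `halt_with` can be tried on its own. The sketch below is a stand-alone approximation under stated assumptions: `parse_with_halt` and the version string `0.0.0-sketch` are hypothetical names invented here, and only `OptionParser`, `catch`, and `throw` are real Ruby APIs.

```ruby
require 'optparse'
require 'stringio'

# Hypothetical helper (not part of the gem): parses argv and reports whether
# an "...and exit" option halted parsing early.
def parse_with_halt(argv, out)
  halted = false
  parser = OptionParser.new do |opts|
    opts.on('-v', '--version', 'Print version and exit') do
      out.puts '0.0.0-sketch' # placeholder version string
      halted = true
      throw :halt # unwinds out of parser.parse immediately
    end
  end

  positional = catch(:halt) { parser.parse(argv) }
  halted ? :halted : positional
end

out = StringIO.new
# '--bogus' is never inspected because parsing stops at '-v'.
first = parse_with_halt(['-v', '--bogus'], out)
# Without a halting option, the positional arguments come back.
second = parse_with_halt(['broken.json'], out)
```

Because the throw leaves `parser.parse` before later arguments are examined, an unknown flag after `-v` does not raise `OptionParser::InvalidOption`, which is the behavior the comment above `halt_with` describes.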
data/sig/json/repair/cli.rbs
ADDED
@@ -0,0 +1,16 @@
+module JSON
+  module Repair
+    class CLI
+      # `::IO | ::StringIO` because `::StringIO` is not an `::IO` subclass; the
+      # specs inject `::StringIO` instances and any other duck-typed stream
+      # that implements `#read` / `#write` / `#puts` would work too.
+      type stream = ::IO | ::StringIO
+
+      def self.call: (::Array[::String] argv, ?stdin: stream, ?stdout: stream, ?stderr: stream) -> ::Integer
+
+      def initialize: (?stdin: stream, ?stdout: stream, ?stderr: stream) -> void
+
+      def call: (::Array[::String] argv) -> ::Integer
+    end
+  end
+end
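The signature comment notes that `::StringIO` is not an `::IO` subclass, which is why the `stream` alias is a union. A minimal check of that claim follows, using a hypothetical `emit` helper that is not part of the gem and only loosely mirrors the trailing-newline handling in `write_output`:

```ruby
require 'stringio'

# Hypothetical helper: writes text to any stream that responds to #write,
# adding a trailing newline only when one is missing.
def emit(stream, text)
  stream.write(text)
  stream.write("\n") unless text.end_with?("\n")
end

buffer = StringIO.new
emit(buffer, '{"a":1}')

is_io = buffer.is_a?(IO)  # StringIO does not inherit from IO
captured = buffer.string  # everything written to the in-memory stream
```

The helper works with `StringIO` purely through duck typing, so the same call would accept `$stdout` or an open `File`.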
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: json-repair
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Aleksandr Zykov
@@ -12,23 +12,28 @@ dependencies: []
 description: This is a simple gem that repairs broken JSON strings.
 email:
 - alexandrz@gmail.com
-executables:
+executables:
+- json-repair
 extensions: []
 extra_rdoc_files: []
 files:
 - ".rspec"
 - ".rubocop.yml"
 - CHANGELOG.md
+- CLAUDE.md
 - CODE_OF_CONDUCT.md
 - LICENSE.txt
 - README.md
 - Rakefile
 - Steepfile
+- exe/json-repair
 - lib/json/repair.rb
+- lib/json/repair/cli.rb
 - lib/json/repair/string_utils.rb
 - lib/json/repair/version.rb
 - lib/json/repairer.rb
 - sig/json/repair.rbs
+- sig/json/repair/cli.rbs
 - sig/json/repair/string_utils.rbs
 - sig/json/repairer.rbs
 homepage: https://github.com/sashazykov/json-repair-rb