json-repair 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b50bffd203f06c7b7d2fa875802b66dd6fc944d01fe7c7f6c349233b2dc73d60
4
- data.tar.gz: f018489c9572a61a72e9784af8f2b2fec335e215933ad0efea1265cff7d7be4e
3
+ metadata.gz: 1873dc4a32f871c8dc79227074438b0118837f32bc3f36d004510604b265de4a
4
+ data.tar.gz: 33e15f3df710287842b41aa0a7490b239e5022d7fa53433ae272a103dc34d242
5
5
  SHA512:
6
- metadata.gz: 7b1047f154815fde7e587fac75c1316ccf799a3ce73c8d5da97ae94305cc8368ea745f7472972a552da4a9df61acb54730d39a2080eb5e61e9fc46d234bbcfd0
7
- data.tar.gz: 25d80f6b35509da21cbf0932a67187bd8f69ccac2f06c15e93dbc74a45c0ae4018145051f6d601df5b651aa33036d8cd4dcafefe800f5533110271a38500b4d7
6
+ metadata.gz: a61ee9fa2a2220c494ded4bba09da73e5366a1af65e5cd6ae4ed08141c93db2ad445f5fe5571b7778e0693b5826e0c2506d042d0b3f8a91c8f78e9f6b213ca08
7
+ data.tar.gz: 8f1953f3959e5a571e0c6a02de0dab1ad775b007c4024ba21f7ae8b1d70e235b145a1de90bd1ddbf29d1ee16d36649dc68dceca6d536bc5f343ff4c150a7b2a6
data/CHANGELOG.md CHANGED
Binary file
data/CLAUDE.md ADDED
@@ -0,0 +1,67 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Commands
6
+
7
+ - `bin/setup` — install dependencies via Bundler.
8
+ - `bundle exec rake` — default task; runs both RSpec and RuboCop.
9
+ - `bundle exec rspec` — run the test suite.
10
+ - `bundle exec rspec spec/json_spec.rb:42` — run a single example by line number; nearly all behavioral specs live in `spec/json_spec.rb`.
11
+ - `bundle exec rubocop` — lint. Project-specific exclusions in `.rubocop.yml` deliberately disable several `Metrics/*` cops for `lib/json/repairer.rb` and `lib/json/repair/string_utils.rb` because the parser is long by design — don't try to "fix" it by chopping methods up.
12
+ - `bin/console` — IRB with the gem preloaded.
13
+ - `bundle exec rake install` / `bundle exec rake release` — local install / publish to rubygems.org.
14
+ - Type checking: `Steepfile` checks `lib/` against `sig/`. Run `bundle exec steep check` if/when steep is installed (not in `Gemfile` by default).
15
+
16
+ Ruby `>= 3.0.0` is required (per gemspec). CI runs against Ruby 3.3.1.
17
+
18
+ ## Architecture
19
+
20
+ This gem is a **Ruby port of the [josdejong/jsonrepair](https://github.com/josdejong/jsonrepair) TypeScript library**. The upstream version currently mirrored is tracked in `CHANGELOG.md` (presently v3.14.0). When syncing upstream changes, the goal is parity with the JS implementation, not idiomatic refactoring — keep method names, control flow, and repair heuristics aligned with the JS source so future syncs stay tractable.
21
+
22
+ ### Entry point
23
+
24
+ `JSON.repair(str)` in `lib/json/repair.rb` is a thin wrapper that constructs `JSON::Repairer.new(str).repair`. `JSON::JSONRepairError` is the only error raised for unrecoverable inputs.
25
+
26
+ ### The parser (`lib/json/repairer.rb`)
27
+
28
+ A single-pass, hand-written recursive-descent parser. State is three instance variables:
29
+
30
+ - `@json` — the input string (read-only after init).
31
+ - `@index` — the current cursor into `@json`.
32
+ - `@output` — a mutable `+''` buffer that accumulates the *repaired* JSON. The parser writes directly to `@output` as it walks; it does not build an AST.
33
+
34
+ Each `parse_*` method (`parse_value`, `parse_object`, `parse_array`, `parse_string`, `parse_number`, `parse_keywords`, `parse_unquoted_string`, `parse_regex`, `parse_comment`, `parse_markdown_code_block`, …) follows a contract:
35
+
36
+ 1. Returns truthy if it consumed something, falsy otherwise.
37
+ 2. On success, advances `@index` past the consumed input and appends the *valid* JSON form to `@output`.
38
+ 3. On a recoverable mismatch (missing quote, missing comma, trailing comma, wrong quote style, etc.) it performs an in-place repair on `@output` using helpers like `insert_before_last_whitespace`, `strip_last_occurrence`, or `remove_at_index`.
39
+ 4. On an unrecoverable error it calls one of the `throw_*` helpers, which raise `JSON::JSONRepairError`.
40
+
41
+ Two patterns recur and are worth knowing before editing:
42
+
43
+ - **Backtracking via snapshots.** Methods like `parse_string` capture `i_before = @index` and `o_before = @output.length` before tentatively consuming input. If a later check (e.g. "the end quote turned out not to be a real end quote") fails, they restore both and re-invoke themselves with different flags (e.g. `stop_at_delimiter: true`, `stop_at_index: …`). Preserve this pattern when modifying string/number parsing.
44
+ - **Repair-by-rewriting-tail.** Helpers like `insert_before_last_whitespace(@output, ',')` and `@output = strip_last_occurrence(@output, ',')` patch the already-emitted output to fix things like missing or trailing commas. These run *after* the malformed input has been partially emitted — they are the mechanism for "I now realize that earlier token needed a comma after it."
45
+
46
+ `repair` (the public method) drives `parse_value` then handles top-level concerns: stripping Markdown fences (` ```json ... ``` `), converting newline-delimited JSON at the root into an array, dropping redundant trailing braces/brackets, and rejecting any non-whitespace trailing garbage.
47
+
48
+ ### Shared helpers (`lib/json/repair/string_utils.rb`)
49
+
50
+ `JSON::Repair::StringUtils` is a mixin included into `Repairer`. It holds:
51
+
52
+ - Character constants (`OPENING_BRACE`, `BACKSLASH`, smart-quote variants, special whitespace code points, etc.) used in lieu of magic literals.
53
+ - Character-class predicates (`digit?`, `hex?`, `quote?`, `single_quote_like?`, `delimiter?`, `whitespace?`, `special_whitespace?`, `start_of_value?`, …).
54
+ - The keyword machinery — `parse_keywords` / `parse_keyword` — which converts Python `True`/`False`/`None` and Ruby `nil` into their JSON equivalents in addition to recognizing `true`/`false`/`null`.
55
+ - Output-buffer surgery helpers: `strip_last_occurrence`, `insert_before_last_whitespace`, `remove_at_index`, `ends_with_comma_or_newline?`.
56
+
57
+ Because the mixin reads `@json`, `@index`, and `@output` directly (notably inside `parse_keyword`), it is **not standalone** — it is coupled to `Repairer`'s state and should only be mixed into classes that own those ivars.
58
+
59
+ ### Type signatures (`sig/`)
60
+
61
+ RBS signatures mirror the public surface of `JSON.repair`, `JSON::Repairer`, and `JSON::Repair::StringUtils`. Update them in lockstep with `lib/` changes; the `Steepfile` will surface drift.
62
+
63
+ ### Test layout
64
+
65
+ - `spec/json_spec.rb` — the substantive behavioral suite (700+ examples covering every repair heuristic). New behavior — and every sync from upstream — belongs here.
66
+ - `spec/json/repair_spec.rb` — sanity check on `JSON::Repair::VERSION` only.
67
+ - `.rspec_status` is committed and tracks per-example pass/fail so `--only-failures` / `--next-failure` work across runs.
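Note on the "repair-by-rewriting-tail" pattern described in the CLAUDE.md above: the idea can be tried out with a small standalone sketch. The two helpers below are simplified stand-ins written for this note. Their names match the ones CLAUDE.md mentions, but the bodies and exact signatures are assumptions and may differ from the gem's implementation.

```ruby
# Hypothetical, simplified stand-ins for the helpers named in CLAUDE.md.
# The gem's real implementations may differ.

# Insert `text_to_insert` just before any trailing whitespace of `text`.
def insert_before_last_whitespace(text, text_to_insert)
  index = text.length
  index -= 1 while index.positive? && [' ', "\n", "\t", "\r"].include?(text[index - 1])
  text[0...index] + text_to_insert + text[index..]
end

# Remove the last occurrence of `needle` from `text`, if there is one.
def strip_last_occurrence(text, needle)
  index = text.rindex(needle)
  return text if index.nil?

  text[0...index] + text[index + needle.length..]
end

# The parser has already emitted `{"a": 1 ` and only now sees another key,
# so the missing comma is patched into the tail of the output buffer.
output = +'{"a": 1 '
output = insert_before_last_whitespace(output, ',')
output << '"b": 2}'
puts output # => {"a": 1, "b": 2}

# A trailing comma was emitted before the closing bracket was seen.
trailing = strip_last_occurrence('[1, 2,', ',') + ']'
puts trailing # => [1, 2]
```

Both helpers return a new string rather than mutating the buffer, which is why the call sites quoted in CLAUDE.md reassign `@output`.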
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # JSON::Repair [![Gem Version](https://badge.fury.io/rb/json-repair.svg)](https://badge.fury.io/rb/json-repair) [![Build Status](https://github.com/sashazykov/json-repair-rb/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/sashazykov/json-repair-rb/actions)
1
+ # JSON::Repair [![Gem Version](https://badge.fury.io/rb/json-repair.svg)](https://badge.fury.io/rb/json-repair) [![Build Status](https://github.com/sashazykov/json-repair-rb/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/sashazykov/json-repair-rb/actions) [![Stand With Ukraine](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/badges/StandWithUkraine.svg)](https://stand-with-ukraine.pp.ua)
2
2
 
3
3
  This is a Ruby gem designed to repair broken JSON strings. Inspired by and based on the [jsonrepair js library](https://github.com/josdejong/jsonrepair/). It efficiently handles and corrects malformed JSON data, making it especially useful in scenarios where JSON output from LLMs might not strictly adhere to JSON standards. Whether it's missing quotes, misplaced commas, or unexpected characters, it ensures that the JSON data is valid and can be parsed correctly.
4
4
 
@@ -31,6 +31,21 @@ puts repaired_json # Outputs: {"name": "Alice", "age": 25}
31
31
 
32
32
  The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.
33
33
 
34
+ ## Command line
35
+
36
+ The gem ships a `json-repair` executable. It reads from stdin or a file and writes to stdout, `--output FILE`, or back over the input file with `--overwrite`.
37
+
38
+ ```bash
39
+ $ echo '{a:1,}' | json-repair
40
+ {"a":1}
41
+
42
+ $ json-repair broken.json
43
+ $ json-repair broken.json -o fixed.json
44
+ $ json-repair broken.json --overwrite
45
+ ```
46
+
47
+ Run `json-repair --help` for the full list of options.
48
+
34
49
  ## Development
35
50
 
36
51
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
data/Steepfile ADDED
@@ -0,0 +1,6 @@
1
+ # frozen_string_literal: true
2
+
3
+ target :lib do
4
+ signature 'sig'
5
+ check 'lib'
6
+ end
data/exe/json-repair ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'json/repair/cli'
5
+
6
+ exit JSON::Repair::CLI.call(ARGV)
@@ -0,0 +1,133 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'fileutils'
4
+ require 'optparse'
5
+ require 'tempfile'
6
+ require_relative '../repair'
7
+
8
+ module JSON
9
+ module Repair
10
+ class CLI
11
+ def self.call(argv, stdin: $stdin, stdout: $stdout, stderr: $stderr)
12
+ new(stdin: stdin, stdout: stdout, stderr: stderr).call(argv)
13
+ end
14
+
15
+ def initialize(stdin: $stdin, stdout: $stdout, stderr: $stderr)
16
+ @stdin = stdin
17
+ @stdout = stdout
18
+ @stderr = stderr
19
+ end
20
+
21
+ # Reset per-invocation state so a single instance can be safely reused
22
+ # (e.g. `cli = CLI.new; cli.call(['-v']); cli.call(['x'])`).
23
+ def call(argv)
24
+ @output_path = @halt = nil
25
+ @overwrite = false
26
+ run(argv)
27
+ rescue OptionParser::ParseError, JSON::JSONRepairError, SystemCallError, IOError,
28
+ SystemStackError => e
29
+ @stderr.puts "json-repair: #{e.message}"
30
+ 1
31
+ end
32
+
33
+ private
34
+
35
+ def run(argv)
36
+ positional = catch(:halt) { parser.parse(argv) }
37
+ return @halt if @halt
38
+
39
+ input_path = positional.first
40
+ return 1 unless validate(positional, input_path)
41
+
42
+ repaired = JSON.repair(read_input(input_path))
43
+ write_output(repaired, input_path)
44
+ 0
45
+ end
46
+
47
+ def validate(positional, input_path)
48
+ error = validation_error(positional, input_path)
49
+ return true unless error
50
+
51
+ @stderr.puts "json-repair: #{error}"
52
+ false
53
+ end
54
+
55
+ def validation_error(positional, input_path)
56
+ return "unexpected argument: #{positional[1]}" if positional.length > 1
57
+ return '--overwrite requires a filename' if @overwrite && input_path.nil?
58
+ return '--overwrite and --output are mutually exclusive' if @overwrite && @output_path
59
+
60
+ nil
61
+ end
62
+
63
+ def read_input(input_path)
64
+ raw = input_path ? File.read(input_path) : @stdin.read
65
+ raw.force_encoding(Encoding::UTF_8)
66
+ raise JSON::JSONRepairError, 'input is not valid UTF-8' unless raw.valid_encoding?
67
+
68
+ raw
69
+ end
70
+
71
+ def write_output(repaired, input_path)
72
+ if @overwrite
73
+ replace_in_place(input_path, repaired)
74
+ elsif @output_path
75
+ File.write(@output_path, repaired)
76
+ else
77
+ @stdout.write(repaired)
78
+ @stdout.write("\n") unless repaired.end_with?("\n")
79
+ end
80
+ end
81
+
82
+ # Write to a uniquely-named tempfile alongside the input, then move it
83
+ # over the original. Tempfile.create uses O_EXCL + a random suffix, so
84
+ # the temp path is safe against symlink / clobber races; FileUtils.mv
85
+ # with force: true handles cross-device renames and Windows, where
86
+ # File.rename cannot overwrite an existing destination. The original
87
+ # file's mode is preserved (Tempfile defaults to 0600).
88
+ #
89
+ # Symlinks are followed via File.realpath so the underlying file is
90
+ # rewritten in place and the link is left pointing at it; otherwise
91
+ # the rename would replace the link itself with a regular file.
92
+ def replace_in_place(input_path, repaired)
93
+ real_path = File.realpath(input_path)
94
+ original_mode = File.stat(real_path).mode
95
+ Tempfile.create(['json-repair', '.tmp'], File.dirname(real_path)) do |tmp|
96
+ tmp.write(repaired)
97
+ tmp.close
98
+ File.chmod(original_mode, tmp.path)
99
+ FileUtils.mv(tmp.path, real_path, force: true)
100
+ end
101
+ end
102
+
103
+ def parser
104
+ OptionParser.new do |opts|
105
+ opts.banner = 'Usage: json-repair [filename] [options]'
106
+ opts.separator ''
107
+ opts.separator 'Repair a broken JSON document. Reads stdin when no filename is given.'
108
+ opts.separator ''
109
+ define_options(opts)
110
+ end
111
+ end
112
+
113
+ OVERWRITE_DESC = 'Replace the input file in place (requires filename; conflicts with --output)'
114
+ private_constant :OVERWRITE_DESC
115
+
116
+ def define_options(opts)
117
+ opts.on('-o', '--output FILE', 'Write repaired JSON to FILE') { |f| @output_path = f }
118
+ opts.on('--overwrite', OVERWRITE_DESC) { @overwrite = true }
119
+ opts.on('-v', '--version', 'Print version and exit') { halt_with(JSON::Repair::VERSION) }
120
+ opts.on('-h', '--help', 'Print this help and exit') { halt_with(opts.help) }
121
+ end
122
+
123
+ # Print to stdout and short-circuit `parser.parse` so trailing args
124
+ # after --version/--help do not raise OptionParser::ParseError and
125
+ # flip the exit code (the option text promises "...and exit").
126
+ def halt_with(message)
127
+ @stdout.puts message
128
+ @halt = 0
129
+ throw :halt
130
+ end
131
+ end
132
+ end
133
+ end
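Note on the CLI class above: it combines two techniques that are easy to miss in a diff. It takes its streams as keyword arguments and returns an exit status instead of calling `exit`, and it uses `catch`/`throw` to leave `OptionParser#parse` early for `--version` and `--help`. The sketch below reproduces only those two ideas using the standard library. `MiniCLI` and its placeholder `VERSION` are hypothetical names for illustration and are not part of the gem.

```ruby
require 'optparse'
require 'stringio'

# Hypothetical miniature of the pattern: injectable streams plus
# catch/throw to stop option parsing early. Not part of the gem.
class MiniCLI
  VERSION = '0.0.0' # placeholder version for the sketch

  def initialize(stdout: $stdout, stderr: $stderr)
    @stdout = stdout
    @stderr = stderr
  end

  # Returns an exit status instead of calling `exit`, so it is easy to test.
  def call(argv)
    @halt = nil
    positional = catch(:halt) { parser.parse(argv) }
    return @halt if @halt

    @stdout.puts "positional: #{positional.inspect}"
    0
  rescue OptionParser::ParseError => e
    @stderr.puts "mini-cli: #{e.message}"
    1
  end

  private

  def parser
    OptionParser.new do |opts|
      opts.on('-v', '--version', 'Print version and exit') { halt_with(VERSION) }
    end
  end

  # Print, record exit status 0, and unwind out of `parser.parse`.
  def halt_with(message)
    @stdout.puts message
    @halt = 0
    throw :halt
  end
end

out = StringIO.new
err = StringIO.new
cli = MiniCLI.new(stdout: out, stderr: err)

status_version = cli.call(['-v', '--bogus']) # --bogus is never parsed
status_invalid = cli.call(['--bogus'])       # reported on stderr, status 1
```

Because the streams are `StringIO` objects here, a spec can inspect `out.string` and `err.string` directly without spawning a subprocess, which is presumably why the real class accepts `stdin:`, `stdout:`, and `stderr:` as well.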
@@ -35,21 +35,28 @@ module JSON
35
35
  LOWERCASE_E = 'e' # 0x65
36
36
  UPPERCASE_F = 'F' # 0x46
37
37
  LOWERCASE_F = 'f' # 0x66
38
- NON_BREAKING_SPACE = "\u00a0" # 0xa0
39
- EN_QUAD = "\u2000" # 0x2000
40
- HAIR_SPACE = "\u200a" # 0x200a
41
- NARROW_NO_BREAK_SPACE = "\u202f" # 0x202f
42
- MEDIUM_MATHEMATICAL_SPACE = "\u205f" # 0x205f
43
- IDEOGRAPHIC_SPACE = "\u3000" # 0x3000
44
- DOUBLE_QUOTE_LEFT = "\u201c" # 0x201c
45
- DOUBLE_QUOTE_RIGHT = "\u201d" # 0x201d
46
- QUOTE_LEFT = "\u2018" # 0x2018
47
- QUOTE_RIGHT = "\u2019" # 0x2019
38
+ NON_BREAKING_SPACE = ' ' # 0xa0
39
+ MONGOLIAN_VOWEL_SEPARATOR = '᠎' # 0x180e
40
+ EN_QUAD = ' ' # 0x2000
41
+ ZERO_WIDTH_SPACE = '​' # 0x200b
42
+ NARROW_NO_BREAK_SPACE = ' ' # 0x202f
43
+ MEDIUM_MATHEMATICAL_SPACE = ' ' # 0x205f
44
+ IDEOGRAPHIC_SPACE = ' ' # 0x3000
45
+ ZERO_WIDTH_NO_BREAK_SPACE = '' # 0xfeff
46
+ DOUBLE_QUOTE_LEFT = '“' # 0x201c
47
+ DOUBLE_QUOTE_RIGHT = '”' # 0x201d
48
+ QUOTE_LEFT = '‘' # 0x2018
49
+ QUOTE_RIGHT = '’' # 0x2019
48
50
  GRAVE_ACCENT = '`' # 0x0060
49
- ACUTE_ACCENT = "\u00b4" # 0x00b4
51
+ ACUTE_ACCENT = '´' # 0x00b4
50
52
 
51
53
  REGEX_DELIMITER = %r{^[,:\[\]/{}()\n+]+$}
54
+ REGEX_UNQUOTED_STRING_DELIMITER = %r{^[,\[\]/{}\n+]+$}
52
55
  REGEX_START_OF_VALUE = /^[\[{\w-]$/
56
+ # matches "https://" and other schemas
57
+ REGEX_URL_START = %r{^(http|https|ftp|mailto|file|data|irc)://$}
58
+ # matches all valid URL characters EXCEPT "[", "]", and "," (important JSON delimiters)
59
+ REGEX_URL_CHAR = %r{^[A-Za-z0-9\-._~:/?#@!$&'()*+;=]$}
53
60
 
54
61
  # Functions to check character chars
55
62
  def hex?(char)
@@ -70,8 +77,19 @@ module JSON
70
77
  REGEX_DELIMITER.match?(char)
71
78
  end
72
79
 
73
- def delimiter_except_slash?(char)
74
- delimiter?(char) && char != SLASH
80
+ def unquoted_string_delimiter?(char)
81
+ REGEX_UNQUOTED_STRING_DELIMITER.match?(char)
82
+ end
83
+
84
+ REGEX_FUNCTION_NAME_CHAR_START = /\A[a-zA-Z_$]\z/
85
+ REGEX_FUNCTION_NAME_CHAR = /\A[a-zA-Z0-9_$]\z/
86
+
87
+ def function_name_char_start?(char)
88
+ !char.nil? && REGEX_FUNCTION_NAME_CHAR_START.match?(char)
89
+ end
90
+
91
+ def function_name_char?(char)
92
+ !char.nil? && REGEX_FUNCTION_NAME_CHAR.match?(char)
75
93
  end
76
94
 
77
95
  def start_of_value?(char)
@@ -86,11 +104,22 @@ module JSON
86
104
  [SPACE, NEWLINE, TAB, RETURN].include?(char)
87
105
  end
88
106
 
107
+ def whitespace_except_newline?(char)
108
+ [SPACE, TAB, RETURN].include?(char)
109
+ end
110
+
89
111
  def special_whitespace?(char)
112
+ return false unless char
113
+
90
114
  [
91
- NON_BREAKING_SPACE, NARROW_NO_BREAK_SPACE, MEDIUM_MATHEMATICAL_SPACE, IDEOGRAPHIC_SPACE
115
+ NON_BREAKING_SPACE,
116
+ MONGOLIAN_VOWEL_SEPARATOR,
117
+ NARROW_NO_BREAK_SPACE,
118
+ MEDIUM_MATHEMATICAL_SPACE,
119
+ IDEOGRAPHIC_SPACE,
120
+ ZERO_WIDTH_NO_BREAK_SPACE
92
121
  ].include?(char) ||
93
- (char >= EN_QUAD && char <= HAIR_SPACE)
122
+ (char >= EN_QUAD && char <= ZERO_WIDTH_SPACE)
94
123
  end
95
124
 
96
125
  def quote?(char)
@@ -149,7 +178,7 @@ module JSON
149
178
 
150
179
  def parse_keyword(name, value)
151
180
  if @json[@index, name.length] == name
152
- @output += value
181
+ @output << value
153
182
  @index += name.length
154
183
  true
155
184
  else
@@ -161,10 +190,6 @@ module JSON
161
190
  text[0...start] + text[start + count..]
162
191
  end
163
192
 
164
- def function_name?(text)
165
- /^\w+$/.match?(text)
166
- end
167
-
168
193
  def ends_with_comma_or_newline?(text)
169
194
  /[,\n][ \t\r]*$/.match?(text)
170
195
  end
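Note on the `special_whitespace?` change above: the new constants are written as literal characters, several of which are invisible in a rendered diff. The sketch below restates the predicate with explicit `\u` escapes so the affected code points are visible. It is a standalone copy made for illustration and assumes single-character input, as the comparison-based range check would behave differently for longer strings.

```ruby
# Sketch only: the same code points as in the diff, written as escapes.
EN_QUAD = "\u2000"
ZERO_WIDTH_SPACE = "\u200b" # new upper bound (previously HAIR_SPACE, U+200A)

# Individually listed code points: NBSP, Mongolian vowel separator,
# narrow NBSP, medium mathematical space, ideographic space, BOM/ZWNBSP.
SINGLES = ["\u00a0", "\u180e", "\u202f", "\u205f", "\u3000", "\ufeff"].freeze

def special_whitespace?(char)
  return false unless char # nil guard added in this version

  SINGLES.include?(char) || (char >= EN_QUAD && char <= ZERO_WIDTH_SPACE)
end

p special_whitespace?("\u200b") # => true  (inside the U+2000..U+200B range)
p special_whitespace?("\u180e") # => true  (listed individually)
p special_whitespace?("\u200c") # => false (just past the range)
p special_whitespace?(' ')      # => false (ordinary space)
p special_whitespace?(nil)      # => false
```

The range check relies on UTF-8 strings comparing in code point order, so a single comparison pair covers every character from U+2000 through U+200B.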
@@ -2,6 +2,6 @@
2
2
 
3
3
  module JSON
4
4
  module Repair
5
- VERSION = '0.2.0'
5
+ VERSION = '0.4.0'
6
6
  end
7
7
  end