json-repair 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 17c6b285b495a3c053ae838e701205ec682f7648856cadb52d00da5d348f393c
4
+ data.tar.gz: 644acd7c8840e0a1edf4399297c1bbcc17739cdd13ab6ee9102880981a00bc84
5
+ SHA512:
6
+ metadata.gz: 6bbe9f8d1e5558ab344987a867dc1aae6859c7e33b537cf20b78d97b5db4734c33388d7514b5db85decf8ca156d91efd4fe98d7cd8b42bc2c2fa81822a0ce9bd
7
+ data.tar.gz: '0175953daedfe95efb9dc1777d38c2e6034e1822b0463fe50cf52dc946a17cd5a363bab862bc776399bd2e34c249b682298c2989f5ce7fa959cd667c4845d013'
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,42 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.0
3
+
4
+ Metrics/BlockLength:
5
+ Exclude:
6
+ - spec/**/*
7
+
8
+ Style/Documentation:
9
+ Enabled: false
10
+
11
+ Metrics/ClassLength:
12
+ Exclude:
13
+ - lib/json/repair/repairer.rb
14
+
15
+ Metrics/AbcSize:
16
+ Exclude:
17
+ - lib/json/repair/repairer.rb
18
+
19
+ Metrics/MethodLength:
20
+ Exclude:
21
+ - lib/json/repair/repairer.rb
22
+
23
+ Metrics/CyclomaticComplexity:
24
+ Exclude:
25
+ - lib/json/repair/repairer.rb
26
+
27
+ Metrics/PerceivedComplexity:
28
+ Exclude:
29
+ - lib/json/repair/repairer.rb
30
+
31
+ Metrics/BlockLength:
32
+ Exclude:
33
+ - lib/json/repair/repairer.rb
34
+ - spec/**/*
35
+
36
+ Metrics/BlockNesting:
37
+ Exclude:
38
+ - lib/json/repair/repairer.rb
39
+
40
+ Metrics/ModuleLength:
41
+ Exclude:
42
+ - lib/json/repair/string_utils.rb
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ # Changes
2
+
3
+ ### 2024-05-23 (0.1.0)
4
+
5
+ * Initial setup
@@ -0,0 +1,84 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
8
+
9
+ ## Our Standards
10
+
11
+ Examples of behavior that contributes to a positive environment for our community include:
12
+
13
+ * Demonstrating empathy and kindness toward other people
14
+ * Being respectful of differing opinions, viewpoints, and experiences
15
+ * Giving and gracefully accepting constructive feedback
16
+ * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
17
+ * Focusing on what is best not just for us as individuals, but for the overall community
18
+
19
+ Examples of unacceptable behavior include:
20
+
21
+ * The use of sexualized language or imagery, and sexual attention or
22
+ advances of any kind
23
+ * Trolling, insulting or derogatory comments, and personal or political attacks
24
+ * Public or private harassment
25
+ * Publishing others' private information, such as a physical or email
26
+ address, without their explicit permission
27
+ * Other conduct which could reasonably be considered inappropriate in a
28
+ professional setting
29
+
30
+ ## Enforcement Responsibilities
31
+
32
+ Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
33
+
34
+ Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
35
+
36
+ ## Scope
37
+
38
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
39
+
40
+ ## Enforcement
41
+
42
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at alexandrz@gmail.com. All complaints will be reviewed and investigated promptly and fairly.
43
+
44
+ All community leaders are obligated to respect the privacy and security of the reporter of any incident.
45
+
46
+ ## Enforcement Guidelines
47
+
48
+ Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
49
+
50
+ ### 1. Correction
51
+
52
+ **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
53
+
54
+ **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
55
+
56
+ ### 2. Warning
57
+
58
+ **Community Impact**: A violation through a single incident or series of actions.
59
+
60
+ **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
61
+
62
+ ### 3. Temporary Ban
63
+
64
+ **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
65
+
66
+ **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
67
+
68
+ ### 4. Permanent Ban
69
+
70
+ **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
71
+
72
+ **Consequence**: A permanent ban from any sort of public interaction within the community.
73
+
74
+ ## Attribution
75
+
76
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
77
+ available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
78
+
79
+ Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
80
+
81
+ [homepage]: https://www.contributor-covenant.org
82
+
83
+ For answers to common questions about this code of conduct, see the FAQ at
84
+ https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
data/LICENSE.txt ADDED
@@ -0,0 +1,7 @@
1
+ The ISC License
2
+
3
+ Copyright (c) 2024 by Aleksandr Zykov
4
+
5
+ Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS" AND ISC DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL ISC BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,50 @@
1
+ # JSON::Repair
2
+
3
+ This Ruby gem repairs broken JSON strings. It is inspired by the [jsonrepair js library](https://github.com/josdejong/jsonrepair/). It corrects malformed JSON data, which makes it especially useful when JSON output from LLMs does not strictly adhere to the JSON standard. Whether the problem is missing quotes, misplaced commas, or unexpected characters, it attempts to produce valid JSON that can be parsed correctly.
4
+
5
+ ## Installation
6
+
7
+ Add this gem to your application's Gemfile by executing:
8
+
9
+ ```bash
10
+ $ bundle add json-repair
11
+ ```
12
+
13
+ Alternatively, if you are not using Bundler to manage your dependencies:
14
+
15
+ ```bash
16
+ $ gem install json-repair
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ Using JSON::Repair is straightforward. Simply call the `repair` method with a JSON string as an argument:
22
+
23
+ ```ruby
24
+ require 'json/repair'
25
+
26
+ # Example of repairing a JSON string
27
+ broken_json = '{name: Alice, "age": 25,}'
28
+ repaired_json = JSON::Repair.repair(broken_json)
29
+ puts repaired_json # Outputs: {"name": "Alice", "age": 25}
30
+ ```
31
+
32
+ The `repair` method takes a string containing JSON data and returns a corrected version of this string, ensuring it is valid JSON.
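To see why a repair step is useful, the sketch below relies only on Ruby's standard `json` library. It checks that the broken input from the example above is rejected by `JSON.parse`, while the repaired string shown in the example's comment parses cleanly. The helper name `parseable?` is an assumption made for illustration and is not part of the gem.

```ruby
require 'json'

# Hypothetical helper (not part of json-repair): true when the text is valid JSON.
def parseable?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

broken_json   = '{name: Alice, "age": 25,}'
repaired_json = '{"name": "Alice", "age": 25}' # the output documented in the example above

puts parseable?(broken_json)   # false: unquoted key and value, trailing comma
puts parseable?(repaired_json) # true
puts JSON.parse(repaired_json)['name'] # Alice
```

A check like this can also be placed after a call to `JSON::Repair.repair` to confirm that the returned string is accepted by the parser the application actually uses.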
33
+
34
+ ## Development
35
+
36
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
37
+
38
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
39
+
40
+ ## Contributing
41
+
42
+ Bug reports and pull requests are welcome on GitHub at https://github.com/sashazykov/json-repair-rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/sashazykov/json-repair-rb/blob/main/CODE_OF_CONDUCT.md).
43
+
44
+ ## License
45
+
46
+ The gem is available as open source under the terms of the [ISC License](https://opensource.org/licenses/ISC).
47
+
48
+ ## Code of Conduct
49
+
50
+ Everyone interacting in the JSON::Repair project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the [code of conduct](https://github.com/sashazykov/json-repair-rb/blob/main/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rspec/core/rake_task'
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require 'rubocop/rake_task'
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
@@ -0,0 +1,647 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'string_utils'
4
+
5
+ module JSON
6
+ module Repair
7
+ class Repairer
8
+ include StringUtils
9
+
10
+ CONTROL_CHARACTERS = {
11
+ "\b" => '\b',
12
+ "\f" => '\f',
13
+ "\n" => '\n',
14
+ "\r" => '\r',
15
+ "\t" => '\t'
16
+ }.freeze
17
+
18
+ ESCAPE_CHARACTERS = {
19
+ '"' => '"',
20
+ '\\' => '\\',
21
+ '/' => '/',
22
+ 'b' => "\b",
23
+ 'f' => "\f",
24
+ 'n' => "\n",
25
+ 'r' => "\r",
26
+ 't' => "\t"
27
+ }.freeze
28
+
29
+ def initialize(json)
30
+ @json = json
31
+ @index = 0
32
+ @output = ''
33
+ end
34
+
35
+ def repair
36
+ processed = parse_value
37
+
38
+ throw_unexpected_end unless processed
39
+
40
+ processed_comma = parse_character(COMMA)
41
+ parse_whitespace_and_skip_comments if processed_comma
42
+
43
+ if start_of_value?(@json[@index]) && ends_with_comma_or_newline?(@output)
44
+ # start of a new value after end of the root level object: looks like
45
+ # newline delimited JSON -> turn into a root level array
46
+ unless processed_comma
47
+ # repair missing comma
48
+ @output = insert_before_last_whitespace(@output, ',')
49
+ end
50
+
51
+ parse_newline_delimited_json
52
+ elsif processed_comma
53
+ # repair: remove trailing comma
54
+ @output = strip_last_occurrence(@output, ',')
55
+ end
56
+
57
+ # repair redundant closing braces and brackets
58
+ while @json[@index] == CLOSING_BRACE || @json[@index] == CLOSING_BRACKET
59
+ @index += 1
60
+ parse_whitespace_and_skip_comments
61
+ end
62
+
63
+ if @index >= @json.length
64
+ # reached the end of the document properly
65
+ return @output
66
+ end
67
+
68
+ throw_unexpected_character
69
+ end
70
+
71
+ private
72
+
73
+ def parse_value
74
+ parse_whitespace_and_skip_comments
75
+ process = parse_object || parse_array || parse_string || parse_number || parse_keywords || parse_unquoted_string
76
+ parse_whitespace_and_skip_comments
77
+
78
+ process
79
+ end
80
+
81
+ def parse_whitespace
82
+ whitespace = ''
83
+ while @json[@index] && (whitespace?(@json[@index]) || special_whitespace?(@json[@index]))
84
+ whitespace += whitespace?(@json[@index]) ? @json[@index] : ' '
85
+
86
+ @index += 1
87
+ end
88
+
89
+ unless whitespace.empty?
90
+ @output += whitespace
91
+ return true
92
+ end
93
+
94
+ false
95
+ end
96
+
97
+ def parse_comment
98
+ if @json[@index] == '/' && @json[@index + 1] == '*'
99
+ # Block comment
100
+ @index += 2
101
+ @index += 1 until @json[@index].nil? || (@json[@index] == '*' && @json[@index + 1] == '/')
102
+ @index += 2
103
+ true
104
+ elsif @json[@index] == '/' && @json[@index + 1] == '/'
105
+ # Line comment
106
+ @index += 2
107
+ @index += 1 until @json[@index].nil? || @json[@index] == "\n"
108
+ true
109
+ else
110
+ false
111
+ end
112
+ end
113
+
114
+ # Parse an object like '{"key": "value"}'
115
+ def parse_object
116
+ return false unless @json[@index] == OPENING_BRACE
117
+
118
+ @output += '{'
119
+ @index += 1
120
+ parse_whitespace_and_skip_comments
121
+
122
+ # repair: skip leading comma like in {, message: "hi"}
123
+ parse_whitespace_and_skip_comments if skip_character(COMMA)
124
+
125
+ initial = true
126
+ while @index < @json.length && @json[@index] != CLOSING_BRACE
127
+ processed_comma = true
128
+ if initial
129
+ initial = false
130
+ else
131
+ processed_comma = parse_character(COMMA)
132
+ unless processed_comma
133
+ # repair missing comma
134
+ @output = insert_before_last_whitespace(@output, ',')
135
+ end
136
+ parse_whitespace_and_skip_comments
137
+ end
138
+
139
+ skip_ellipsis
140
+
141
+ processed_key = parse_string || parse_unquoted_string
142
+ unless processed_key
143
+ if @json[@index] == CLOSING_BRACE || @json[@index] == OPENING_BRACE ||
144
+ @json[@index] == CLOSING_BRACKET || @json[@index] == OPENING_BRACKET ||
145
+ @json[@index].nil?
146
+ # repair trailing comma
147
+ @output = strip_last_occurrence(@output, ',')
148
+ else
149
+ throw_object_key_expected
150
+ end
151
+ break
152
+ end
153
+
154
+ parse_whitespace_and_skip_comments
155
+ processed_colon = parse_character(COLON)
156
+ truncated_text = @index >= @json.length
157
+ unless processed_colon
158
+ if start_of_value?(@json[@index]) || truncated_text
159
+ # repair missing colon
160
+ @output = insert_before_last_whitespace(@output, ':')
161
+ else
162
+ throw_colon_expected
163
+ end
164
+ end
165
+
166
+ processed_value = parse_value
167
+ unless processed_value
168
+ if processed_colon || truncated_text
169
+ # repair missing object value
170
+ @output += 'null'
171
+ else
172
+ throw_colon_expected
173
+ end
174
+ end
175
+ end
176
+
177
+ if @json[@index] == CLOSING_BRACE
178
+ @output += '}'
179
+ @index += 1
180
+ else
181
+ # repair missing closing brace
182
+ @output = insert_before_last_whitespace(@output, '}')
183
+ end
184
+
185
+ true
186
+ end
187
+
188
+ def skip_character(char)
189
+ if @json[@index] == char
190
+ @index += 1
191
+ true
192
+ else
193
+ false
194
+ end
195
+ end
196
+
197
+ # Skip ellipsis like "[1,2,3,...]" or "[1,2,3,...,9]" or "[...,7,8,9]"
198
+ # or a similar construct in objects.
199
+ def skip_ellipsis
200
+ parse_whitespace_and_skip_comments
201
+
202
+ if @json[@index] == DOT &&
203
+ @json[@index + 1] == DOT &&
204
+ @json[@index + 2] == DOT
205
+ # repair: remove the ellipsis (three dots) and optionally a comma
206
+ @index += 3
207
+ parse_whitespace_and_skip_comments
208
+ skip_character(COMMA)
209
+ end
210
+ end
211
+
212
+ # Parse a string enclosed by double quotes "...". Can contain escaped quotes
213
+ # Repair strings enclosed in single quotes or special quotes
214
+ # Repair an escaped string
215
+ #
216
+ # The function can run in two stages:
217
+ # - First, it assumes the string has a valid end quote
218
+ # - If it turns out that the string does not have a valid end quote followed
219
+ # by a delimiter (which should be the case), the function runs again in a
220
+ # more conservative way, stopping the string at the first next delimiter
221
+ # and fixing the string by inserting a quote there.
222
+ def parse_string(stop_at_delimiter: false)
223
+ if @json[@index] == BACKSLASH
224
+ # repair: remove the first escape character
225
+ @index += 1
226
+ skip_escape_chars = true
227
+ end
228
+
229
+ if quote?(@json[@index])
230
+ # double quotes are correct JSON,
231
+ # single quotes come from JavaScript for example, we assume it will have a correct single end quote too
232
+ # otherwise, we will match any double-quote-like start with a double-quote-like end,
233
+ # or any single-quote-like start with a single-quote-like end
234
+ is_end_quote = if double_quote?(@json[@index])
235
+ method(:double_quote?)
236
+ elsif single_quote?(@json[@index])
237
+ method(:single_quote?)
238
+ elsif single_quote_like?(@json[@index])
239
+ method(:single_quote_like?)
240
+ else
241
+ method(:double_quote_like?)
242
+ end
243
+
244
+ i_before = @index
245
+ o_before = @output.length
246
+
247
+ str = '"'
248
+ @index += 1
249
+
250
+ loop do
251
+ if @index >= @json.length
252
+ # end of text, we are missing an end quote
253
+
254
+ i_prev = prev_non_whitespace_index(@index - 1)
255
+ if !stop_at_delimiter && delimiter?(@json[i_prev])
256
+ # if the text ends with a delimiter, like ["hello],
257
+ # so the missing end quote should be inserted before this delimiter
258
+ # retry parsing the string, stopping at the first next delimiter
259
+ @index = i_before
260
+ @output = @output[0...o_before]
261
+
262
+ return parse_string(stop_at_delimiter: true)
263
+ end
264
+
265
+ # repair missing quote
266
+ str = insert_before_last_whitespace(str, '"')
267
+ @output += str
268
+
269
+ return true
270
+ elsif is_end_quote.call(@json[@index])
271
+ # end quote
272
+ i_quote = @index
273
+ o_quote = str.length
274
+ str += '"'
275
+ @index += 1
276
+ @output += str
277
+
278
+ parse_whitespace_and_skip_comments
279
+
280
+ if stop_at_delimiter ||
281
+ @index >= @json.length ||
282
+ delimiter?(@json[@index]) ||
283
+ quote?(@json[@index]) ||
284
+ digit?(@json[@index])
285
+ # The quote is followed by the end of the text, a delimiter, or a next value
286
+ parse_concatenated_string
287
+
288
+ return true
289
+ end
290
+
291
+ if delimiter?(@json[prev_non_whitespace_index(i_quote - 1)])
292
+ # This is not the right end quote: it is preceded by a delimiter,
293
+ # and NOT followed by a delimiter. So, there is an end quote missing
294
+ # parse the string again and then stop at the first next delimiter
295
+ @index = i_before
296
+ @output = @output[...o_before]
297
+
298
+ return parse_string(stop_at_delimiter: true)
299
+ end
300
+
301
+ # revert to right after the quote but before any whitespace, and continue parsing the string
302
+ @output = @output[...o_before]
303
+ @index = i_quote + 1
304
+
305
+ # repair unescaped quote
306
+ str = "#{str[...o_quote]}\\#{str[o_quote..]}"
307
+ elsif stop_at_delimiter && delimiter?(@json[@index])
308
+ # we're in the mode to stop the string at the first delimiter
309
+ # because there is an end quote missing
310
+
311
+ # repair missing quote
312
+ str = insert_before_last_whitespace(str, '"')
313
+ @output += str
314
+
315
+ parse_concatenated_string
316
+
317
+ return true
318
+ elsif @json[@index] == BACKSLASH
319
+ # handle escaped content like \n or \u2605
320
+ char = @json[@index + 1]
321
+ escape_char = ESCAPE_CHARACTERS[char]
322
+ if escape_char
323
+ str += @json[@index, 2]
324
+ @index += 2
325
+ elsif char == 'u'
326
+ j = 2
327
+ j += 1 while j < 6 && @json[@index + j] && hex?(@json[@index + j])
328
+ if j == 6
329
+ str += @json[@index, 6]
330
+ @index += 6
331
+ elsif @index + j >= @json.length
332
+ # repair invalid or truncated unicode char at the end of the text
333
+ # by removing the unicode char and ending the string here
334
+ @index = @json.length
335
+ else
336
+ throw_invalid_unicode_character
337
+ end
338
+ else
339
+ # repair invalid escape character: remove it
340
+ str += char.to_s # char is nil when the backslash is the last character of the input
341
+ @index += 2
342
+ end
343
+ else
344
+ # handle regular characters
345
+ char = @json[@index]
346
+
347
+ if char == DOUBLE_QUOTE && @json[@index - 1] != BACKSLASH
348
+ # repair unescaped double quote
349
+ str += "\\#{char}"
350
+ elsif control_character?(char)
351
+ # unescaped control character
352
+ str += CONTROL_CHARACTERS[char]
353
+ else
354
+ throw_invalid_character(char) unless valid_string_character?(char)
355
+ str += char
356
+ end
357
+
358
+ @index += 1
359
+ end
360
+
361
+ if skip_escape_chars
362
+ # repair: skipped escape character (nothing to do)
363
+ skip_escape_character
364
+ end
365
+ end
366
+ end
367
+
368
+ false
369
+ end
370
+
371
+ # Repair an unquoted string by adding quotes around it
372
+ # Repair a MongoDB function call like NumberLong("2")
373
+ # Repair a JSONP function call like callback({...});
374
+ def parse_unquoted_string
375
+ start = @index
376
+ @index += 1 while @index < @json.length && !delimiter_except_slash?(@json[@index]) && !quote?(@json[@index])
377
+ return if @index <= start
378
+
379
+ if @json[@index] == '(' && function_name?(@json[start...@index].strip)
380
+ # Repair a MongoDB function call like NumberLong("2")
381
+ # Repair a JSONP function call like callback({...});
382
+ @index += 1
383
+
384
+ parse_value
385
+
386
+ if @json[@index] == ')'
387
+ # Repair: skip close bracket of function call
388
+ @index += 1
389
+ # Repair: skip semicolon after JSONP call
390
+ @index += 1 if @json[@index] == ';'
391
+ end
392
+ else
393
+ # Repair unquoted string
394
+ # Also, repair undefined into null
395
+
396
+ # First, go back to prevent getting trailing whitespaces in the string
397
+ @index -= 1 while whitespace?(@json[@index - 1]) && @index.positive?
398
+
399
+ symbol = @json[start...@index]
400
+ @output += symbol == 'undefined' ? 'null' : symbol.inspect
401
+
402
+ if @json[@index] == '"'
403
+ # We had a missing start quote, but now we encountered the end quote, so we can skip that one
404
+ @index += 1
405
+ end
406
+ end
407
+
408
+ true
409
+ end
410
+
411
+ def parse_character(char)
412
+ if @json[@index] == char
413
+ @output += @json[@index]
414
+ @index += 1
415
+ true
416
+ else
417
+ false
418
+ end
419
+ end
420
+
421
+ def parse_whitespace_and_skip_comments
422
+ start = @index
423
+
424
+ changed = parse_whitespace
425
+ loop do
426
+ changed = parse_comment
427
+ changed = parse_whitespace if changed
428
+ break unless changed
429
+ end
430
+
431
+ @index > start
432
+ end
433
+
434
+ # Parse a number like 2.4 or 2.4e6
435
+ def parse_number
436
+ start = @index
437
+ if @json[@index] == '-'
438
+ @index += 1
439
+ if at_end_of_number?
440
+ repair_number_ending_with_numeric_symbol(start)
441
+ return true
442
+ end
443
+ unless digit?(@json[@index])
444
+ @index = start
445
+ return false
446
+ end
447
+ end
448
+
449
+ # Note that in JSON leading zeros like "00789" are not allowed.
450
+ # We will allow all leading zeros here though and at the end of parse_number
451
+ # check against leading zeros and repair that if needed.
452
+ # Leading zeros can have meaning, so we should not clear them.
453
+ @index += 1 while digit?(@json[@index])
454
+
455
+ if @json[@index] == '.'
456
+ @index += 1
457
+ if at_end_of_number?
458
+ repair_number_ending_with_numeric_symbol(start)
459
+ return true
460
+ end
461
+ unless digit?(@json[@index])
462
+ @index = start
463
+ return false
464
+ end
465
+ @index += 1 while digit?(@json[@index])
466
+ end
467
+
468
+ if @json[@index] && @json[@index].downcase == 'e'
469
+ @index += 1
470
+ @index += 1 if ['-', '+'].include?(@json[@index])
471
+ if at_end_of_number?
472
+ repair_number_ending_with_numeric_symbol(start)
473
+ return true
474
+ end
475
+ unless digit?(@json[@index])
476
+ @index = start
477
+ return false
478
+ end
479
+ @index += 1 while digit?(@json[@index])
480
+ end
481
+
482
+ # if we're not at the end of the number by this point, allow this to be parsed as another type
483
+ unless at_end_of_number?
484
+ @index = start
485
+ return false
486
+ end
487
+
488
+ if @index > start
489
+ # repair a number with leading zeros like "00789"
490
+ num = @json[start...@index]
491
+ has_invalid_leading_zero = num.match?(/^0\d/)
492
+
493
+ @output += has_invalid_leading_zero ? "\"#{num}\"" : num
494
+ return true
495
+ end
496
+
497
+ false
498
+ end
499
+
500
+ def at_end_of_number?
501
+ @index >= @json.length || delimiter?(@json[@index]) || whitespace?(@json[@index])
502
+ end
503
+
504
+ # Parse an array like '["item1", "item2", ...]'
505
+ def parse_array
506
+ if @json[@index] == OPENING_BRACKET
507
+ @output += '['
508
+ @index += 1
509
+ parse_whitespace_and_skip_comments
510
+
511
+ # repair: skip leading comma like in [,1,2,3]
512
+ parse_whitespace_and_skip_comments if skip_character(COMMA)
513
+
514
+ initial = true
515
+ while @index < @json.length && @json[@index] != CLOSING_BRACKET
516
+ if initial
517
+ initial = false
518
+ else
519
+ processed_comma = parse_character(COMMA)
520
+ # repair missing comma
521
+ @output = insert_before_last_whitespace(@output, ',') unless processed_comma
522
+ end
523
+
524
+ skip_ellipsis
525
+
526
+ processed_value = parse_value
527
+ next if processed_value
528
+
529
+ # repair trailing comma
530
+ @output = strip_last_occurrence(@output, ',')
531
+ break
532
+ end
533
+
534
+ if @json[@index] == CLOSING_BRACKET
535
+ @output += ']'
536
+ @index += 1
537
+ else
538
+ # repair missing closing array bracket
539
+ @output = insert_before_last_whitespace(@output, ']')
540
+ end
541
+
542
+ true
543
+ else
544
+ false
545
+ end
546
+ end
547
+
548
+ def prev_non_whitespace_index(start)
549
+ prev = start
550
+ prev -= 1 while prev.positive? && whitespace?(@json[prev])
551
+ prev
552
+ end
553
+
554
+ # Repair concatenated strings like "hello" + "world", change this into "helloworld"
555
+ def parse_concatenated_string
556
+ processed = false
557
+
558
+ parse_whitespace_and_skip_comments
559
+ while @json[@index] == PLUS
560
+ processed = true
561
+ @index += 1
562
+ parse_whitespace_and_skip_comments
563
+
564
+ # repair: remove the end quote of the first string
565
+ @output = strip_last_occurrence(@output, '"', strip_remaining_text: true)
566
+ start = @output.length
567
+ parsed_str = parse_string
568
+ @output = if parsed_str
569
+ # repair: remove the start quote of the second string
570
+ remove_at_index(@output, start, 1)
571
+ else
572
+ # repair: the '+' is not followed by a string, so drop it and restore the end quote stripped above
573
+ insert_before_last_whitespace(@output, '"')
574
+ end
575
+ end
576
+
577
+ processed
578
+ end
579
+
580
+ def repair_number_ending_with_numeric_symbol(start)
581
+ # repair numbers cut off at the end
582
+ # this will only be called when we end after a '.', '-', or 'e' and does not
583
+ # change the number more than it needs to make it valid JSON
584
+ @output += "#{@json[start...@index]}0"
585
+ end
586
+
587
+ # Parse and repair Newline Delimited JSON (NDJSON):
588
+ # multiple JSON objects separated by a newline character
589
+ def parse_newline_delimited_json
590
+ # repair NDJSON
591
+ initial = true
592
+ processed_value = true
593
+ while processed_value
594
+ if initial
595
+ initial = false
596
+ else
597
+ # parse optional comma, insert when missing
598
+ processed_comma = parse_character(COMMA)
599
+ unless processed_comma
600
+ # repair: add missing comma
601
+ @output = insert_before_last_whitespace(@output, ',')
602
+ end
603
+ end
604
+
605
+ processed_value = parse_value
606
+ end
607
+
608
+ unless processed_value
609
+ # repair: remove trailing comma
610
+ @output = strip_last_occurrence(@output, ',')
611
+ end
612
+
613
+ # repair: wrap the output inside array brackets
614
+ @output = "[\n#{@output}\n]"
615
+ end
616
+
617
+ def skip_escape_character
618
+ skip_character(BACKSLASH)
619
+ end
620
+
621
+ def throw_invalid_character(char)
622
+ raise JSONRepairError, "Invalid character #{char.inspect} at index #{@index}"
623
+ end
624
+
625
+ def throw_unexpected_character
626
+ raise JSONRepairError, "Unexpected character #{@json[@index].inspect} at index #{@index}"
627
+ end
628
+
629
+ def throw_unexpected_end
630
+ raise JSONRepairError, 'Unexpected end of json string'
631
+ end
632
+
633
+ def throw_object_key_expected
634
+ raise JSONRepairError, 'Object key expected'
635
+ end
636
+
637
+ def throw_colon_expected
638
+ raise JSONRepairError, 'Colon expected'
639
+ end
640
+
641
+ def throw_invalid_unicode_character
642
+ chars = @json[@index, 6]
643
+ raise JSONRepairError, "Invalid unicode character #{chars.inspect} at index #{@index}"
644
+ end
645
+ end
646
+ end
647
+ end
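Most of the repairs in the file above go through two small string helpers, `insert_before_last_whitespace` and `strip_last_occurrence`. The standalone sketch below copies their logic as it appears in this gem's `StringUtils` module, so their effect on the output buffer can be tried in isolation. The `Helpers` module name and the sample strings are assumptions made for illustration.

```ruby
# Standalone copy of two helpers, for experimentation only.
module Helpers
  module_function

  def whitespace?(char)
    [' ', "\n", "\t", "\r"].include?(char)
  end

  # Insert text_to_insert before any trailing whitespace of text.
  def insert_before_last_whitespace(text, text_to_insert)
    index = text.length
    return text + text_to_insert unless whitespace?(text[index - 1])

    index -= 1 while whitespace?(text[index - 1])
    text[0...index] + text_to_insert + text[index..]
  end

  # Remove the last occurrence of text_to_strip, optionally with everything after it.
  def strip_last_occurrence(text, text_to_strip, strip_remaining_text: false)
    index = text.rindex(text_to_strip)
    return text unless index

    remaining_text = strip_remaining_text ? '' : text[index + 1..]
    text[0...index] + remaining_text
  end
end

# Missing comma between two values: the comma lands before the trailing newline.
puts Helpers.insert_before_last_whitespace("[1\n", ',').inspect # "[1,\n"
# Trailing comma before a closing bracket: the comma is dropped, whitespace is kept.
puts Helpers.strip_last_occurrence('[1, 2, ', ',').inspect      # "[1, 2 "
```

Because both helpers return a new string rather than mutating their argument, the repairer can reassign `@output` after each repair and leave the original whitespace layout of the input untouched.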
@@ -0,0 +1,173 @@
1
+ # frozen_string_literal: true
2
+
3
+ module JSON
4
+ module Repair
5
+ module StringUtils
6
+ # Character constants
7
+ BACKSLASH = '\\' # 0x5c
8
+ SLASH = '/' # 0x2f
9
+ ASTERISK = '*' # 0x2a
10
+ OPENING_BRACE = '{' # 0x7b
11
+ CLOSING_BRACE = '}' # 0x7d
12
+ OPENING_BRACKET = '[' # 0x5b
13
+ CLOSING_BRACKET = ']' # 0x5d
14
+ OPEN_PARENTHESIS = '(' # 0x28
15
+ CLOSE_PARENTHESIS = ')' # 0x29
16
+ SPACE = ' ' # 0x20
17
+ NEWLINE = "\n" # 0xa
18
+ TAB = "\t" # 0x9
19
+ RETURN = "\r" # 0xd
20
+ BACKSPACE = "\b" # 0x08
21
+ FORM_FEED = "\f" # 0x0c
22
+ DOUBLE_QUOTE = '"' # 0x0022
23
+ PLUS = '+' # 0x2b
24
+ MINUS = '-' # 0x2d
25
+ QUOTE = "'" # 0x27
26
+ ZERO = '0' # 0x30
27
+ NINE = '9' # 0x39
28
+ COMMA = ',' # 0x2c
29
+ DOT = '.' # 0x2e
30
+ COLON = ':' # 0x3a
31
+ SEMICOLON = ';' # 0x3b
32
+ UPPERCASE_A = 'A' # 0x41
33
+ LOWERCASE_A = 'a' # 0x61
34
+ UPPERCASE_E = 'E' # 0x45
35
+ LOWERCASE_E = 'e' # 0x65
36
+ UPPERCASE_F = 'F' # 0x46
37
+ LOWERCASE_F = 'f' # 0x66
38
+ NON_BREAKING_SPACE = "\u00a0" # 0xa0
39
+ EN_QUAD = "\u2000" # 0x2000
40
+ HAIR_SPACE = "\u200a" # 0x200a
41
+ NARROW_NO_BREAK_SPACE = "\u202f" # 0x202f
42
+ MEDIUM_MATHEMATICAL_SPACE = "\u205f" # 0x205f
43
+ IDEOGRAPHIC_SPACE = "\u3000" # 0x3000
44
+ DOUBLE_QUOTE_LEFT = "\u201c" # 0x201c
45
+ DOUBLE_QUOTE_RIGHT = "\u201d" # 0x201d
46
+ QUOTE_LEFT = "\u2018" # 0x2018
47
+ QUOTE_RIGHT = "\u2019" # 0x2019
48
+ GRAVE_ACCENT = '`' # 0x0060
49
+ ACUTE_ACCENT = "\u00b4" # 0x00b4
50
+
51
+ REGEX_DELIMITER = %r{^[,:\[\]/{}()\n+]+$}
52
+ REGEX_START_OF_VALUE = /^[\[{\w-]$/
53
+
54
+ # Predicates for classifying single characters
55
+ def hex?(char)
56
+ (char >= ZERO && char <= NINE) ||
57
+ (char >= UPPERCASE_A && char <= UPPERCASE_F) ||
58
+ (char >= LOWERCASE_A && char <= LOWERCASE_F)
59
+ end
60
+
61
+ def digit?(char)
62
+ char && char >= ZERO && char <= NINE
63
+ end
64
+
65
+ def valid_string_character?(char)
66
+ char.ord >= 0x20 && char.ord <= 0x10ffff
67
+ end
68
+
69
+ def delimiter?(char)
70
+ REGEX_DELIMITER.match?(char)
71
+ end
72
+
73
+ def delimiter_except_slash?(char)
74
+ delimiter?(char) && char != SLASH
75
+ end
76
+
77
+ def start_of_value?(char)
78
+ REGEX_START_OF_VALUE.match?(char) || (char && quote?(char))
79
+ end
80
+
81
+ def control_character?(char)
82
+ [NEWLINE, RETURN, TAB, BACKSPACE, FORM_FEED].include?(char)
83
+ end
84
+
85
+ def whitespace?(char)
86
+ [SPACE, NEWLINE, TAB, RETURN].include?(char)
87
+ end
88
+
89
+ def special_whitespace?(char)
90
+ [
91
+ NON_BREAKING_SPACE, NARROW_NO_BREAK_SPACE, MEDIUM_MATHEMATICAL_SPACE, IDEOGRAPHIC_SPACE
92
+ ].include?(char) ||
93
+ (!char.nil? && char >= EN_QUAD && char <= HAIR_SPACE)
94
+ end
95
+
96
+ def quote?(char)
97
+ double_quote_like?(char) || single_quote_like?(char)
98
+ end
99
+
100
+ def double_quote?(char)
101
+ char == DOUBLE_QUOTE
102
+ end
103
+
104
+ def single_quote?(char)
105
+ char == QUOTE
106
+ end
107
+
108
+ def double_quote_like?(char)
109
+ [DOUBLE_QUOTE, DOUBLE_QUOTE_LEFT, DOUBLE_QUOTE_RIGHT].include?(char)
110
+ end
111
+
112
+ def single_quote_like?(char)
113
+ [QUOTE, QUOTE_LEFT, QUOTE_RIGHT, GRAVE_ACCENT, ACUTE_ACCENT].include?(char)
114
+ end
115
+
116
+ # Strip last occurrence of text_to_strip from text
117
+ def strip_last_occurrence(text, text_to_strip, strip_remaining_text: false)
118
+ index = text.rindex(text_to_strip)
119
+ return text unless index
120
+
121
+ remaining_text = strip_remaining_text ? '' : text[(index + text_to_strip.length)..]
122
+ text[0...index] + remaining_text
123
+ end
124
+
125
+ def insert_before_last_whitespace(text, text_to_insert)
126
+ index = text.length
127
+
128
+ return text + text_to_insert unless whitespace?(text[index - 1])
129
+
130
+ index -= 1 while whitespace?(text[index - 1])
131
+
132
+ text[0...index] + text_to_insert + text[index..]
133
+ end
134
+
135
+ # Parse keywords true, false, null
136
+ # Repair Python keywords True, False, None
137
+ # Repair Ruby keyword nil
138
+ def parse_keywords
139
+ parse_keyword('true', 'true') ||
140
+ parse_keyword('false', 'false') ||
141
+ parse_keyword('null', 'null') ||
142
+ # Repair Python keywords True, False, None
143
+ parse_keyword('True', 'true') ||
144
+ parse_keyword('False', 'false') ||
145
+ parse_keyword('None', 'null') ||
146
+ # Repair Ruby keyword nil
147
+ parse_keyword('nil', 'null')
148
+ end
149
+
150
+ def parse_keyword(name, value)
151
+ if @json[@index, name.length] == name
152
+ @output += value
153
+ @index += name.length
154
+ true
155
+ else
156
+ false
157
+ end
158
+ end
159
+
160
+ def remove_at_index(text, start, count)
161
+ text[0...start] + text[start + count..]
162
+ end
163
+
164
+ def function_name?(text)
165
+ /\A\w+\z/.match?(text)
166
+ end
167
+
168
+ def ends_with_comma_or_newline?(text)
169
+ /[,\n][ \t\r]*\z/.match?(text)
170
+ end
171
+ end
172
+ end
173
+ end
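The two text helpers in this file are easiest to follow with concrete input. The following is a minimal stand-alone sketch adapted from them, not the gem's own code: the module name `StringUtilsSketch` is invented for illustration, and the sketch assumes `strip_last_occurrence` should skip the full length of `text_to_strip` (for one-character arguments this is the same as skipping a single character).

```ruby
# frozen_string_literal: true

# Hypothetical module for illustration only; the name is not part of the gem.
module StringUtilsSketch
  module_function

  def whitespace?(char)
    [' ', "\n", "\t", "\r"].include?(char)
  end

  # Removes the last occurrence of text_to_strip. With strip_remaining_text,
  # everything after that occurrence is dropped as well.
  # Assumption of this sketch: the whole of text_to_strip is skipped.
  def strip_last_occurrence(text, text_to_strip, strip_remaining_text: false)
    index = text.rindex(text_to_strip)
    return text unless index

    remaining_text = strip_remaining_text ? '' : text[(index + text_to_strip.length)..]
    text[0...index] + remaining_text
  end

  # Inserts text_to_insert in front of any trailing run of whitespace.
  def insert_before_last_whitespace(text, text_to_insert)
    index = text.length
    return text + text_to_insert unless whitespace?(text[index - 1])

    index -= 1 while whitespace?(text[index - 1])
    text[0...index] + text_to_insert + text[index..]
  end
end

# Dropping a trailing comma before a closing bracket.
p StringUtilsSketch.strip_last_occurrence('[1, 2, ]', ',')
# => "[1, 2 ]"

# Dropping the comma and everything after it.
p StringUtilsSketch.strip_last_occurrence('[1, 2, ]', ',', strip_remaining_text: true)
# => "[1, 2"

# Closing an object while keeping the trailing newline at the end.
p StringUtilsSketch.insert_before_last_whitespace("{\"a\": 1\n", '}')
# => "{\"a\": 1}\n"
```

Both helpers return new strings and leave their arguments untouched, which is what allows them to be applied to an accumulated output buffer without side effects.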
data/lib/json/repair/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module JSON
4
+ module Repair
5
+ VERSION = '0.1.0'
6
+ end
7
+ end
data/lib/json/repair.rb ADDED
@@ -0,0 +1,14 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'repair/version'
4
+ require_relative 'repair/repairer'
5
+
6
+ module JSON
7
+ module Repair
8
+ class JSONRepairError < StandardError; end
9
+
10
+ def self.repair(json)
11
+ Repairer.new(json).repair
12
+ end
13
+ end
14
+ end
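The entry point above is a thin facade: the module-level `repair` builds a `Repairer` and delegates to it. The sketch below mirrors that shape with a heavily reduced stand-in so it runs without the gem installed. `RepairSketch` and `KeywordRepairer` are hypothetical names, and the stand-in only performs the keyword rewriting shown in `parse_keywords` (Python `True`/`False`/`None`, Ruby `nil`). Unlike a real repairer it does not track string boundaries, so it would also rewrite these words inside quoted strings.

```ruby
# frozen_string_literal: true

# Hypothetical, reduced stand-in for illustration; not the gem's Repairer.
module RepairSketch
  class KeywordRepairer
    # Source keyword => JSON keyword written to the output.
    KEYWORDS = {
      'true' => 'true', 'false' => 'false', 'null' => 'null',
      'True' => 'true', 'False' => 'false', 'None' => 'null', # Python
      'nil' => 'null' # Ruby
    }.freeze

    def initialize(json)
      @json = json
      @index = 0
      @output = +''
    end

    # Walks the input once, rewriting keywords and copying everything else.
    def repair
      while @index < @json.length
        next if parse_keywords

        @output << @json[@index]
        @index += 1
      end
      @output
    end

    private

    def parse_keywords
      KEYWORDS.any? { |name, value| parse_keyword(name, value) }
    end

    # Consumes `name` at the current index and emits `value` if it matches.
    def parse_keyword(name, value)
      return false unless @json[@index, name.length] == name

      @output << value
      @index += name.length
      true
    end
  end

  # Same facade shape as the module-level repair method in the diff.
  def self.repair(json)
    KeywordRepairer.new(json).repair
  end
end

puts RepairSketch.repair('[True, False, None, nil]')
# prints [true, false, null, null]
```

Keeping the cursor (`@index`) and the output buffer as instance state is why the facade creates a fresh object per call: each repair starts from a clean position and an empty buffer.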
data/sig/json/repair.rbs ADDED
@@ -0,0 +1,7 @@
1
+ module JSON
2
+ module Repair
3
+ VERSION: String
4
+
5
+ def self.repair: (String) -> String?
6
+ end
7
+ end
metadata ADDED
@@ -0,0 +1,59 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: json-repair
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Aleksandr Zykov
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-05-24 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: This is a simple gem that repairs broken JSON strings.
14
+ email:
15
+ - alexandrz@gmail.com
16
+ executables: []
17
+ extensions: []
18
+ extra_rdoc_files: []
19
+ files:
20
+ - ".rspec"
21
+ - ".rubocop.yml"
22
+ - CHANGELOG.md
23
+ - CODE_OF_CONDUCT.md
24
+ - LICENSE.txt
25
+ - README.md
26
+ - Rakefile
27
+ - lib/json/repair.rb
28
+ - lib/json/repair/repairer.rb
29
+ - lib/json/repair/string_utils.rb
30
+ - lib/json/repair/version.rb
31
+ - sig/json/repair.rbs
32
+ homepage: https://github.com/sashazykov/json-repair-rb
33
+ licenses:
34
+ - ISC
35
+ metadata:
36
+ allowed_push_host: https://rubygems.org
37
+ homepage_uri: https://github.com/sashazykov/json-repair-rb
38
+ source_code_uri: https://github.com/sashazykov/json-repair-rb
39
+ changelog_uri: https://github.com/sashazykov/json-repair-rb/blob/main/CHANGELOG.md
40
+ post_install_message:
41
+ rdoc_options: []
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - ">="
47
+ - !ruby/object:Gem::Version
48
+ version: 3.0.0
49
+ required_rubygems_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ requirements: []
55
+ rubygems_version: 3.5.10
56
+ signing_key:
57
+ specification_version: 4
58
+ summary: Repairs broken JSON strings.
59
+ test_files: []