json_completer 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (5) hide show
  1. checksums.yaml +7 -0
  2. data/LICENSE +21 -0
  3. data/README.md +92 -0
  4. data/lib/json_completer.rb +726 -0
  5. metadata +78 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 2fe03f14437a3cfd88193b2cfc7a7f156e116632b1937a42ea6c4a1aefffa7c2
4
+ data.tar.gz: b20c23c7843a3ff8f18f0110ed824b54d1306eb440309186efc00c71317d33ba
5
+ SHA512:
6
+ metadata.gz: 9815201cb51addf45defae03cb710502ed93091208ec13c404865fdbbd58be2b20773334e3528beef4d82bd93cb3de81d2de22e5ecad996aebdded6e3a138b87
7
+ data.tar.gz: 261db1237466e85281eb969d90b7f6d90555c72955df728d001122db6f0d0cfd1546e399d3f7185d869d0e689108d49cb63347fffd5f5484dfe24c311e22d193
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Aha! Labs Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,92 @@
1
+ # JsonCompleter
2
+
3
+ A Ruby gem that converts partial JSON strings into valid JSON with high-performance incremental parsing. Efficiently processes streaming JSON with O(n) complexity for new data by maintaining parsing state between chunks. Handles truncated primitives, missing values, and unclosed structures without reprocessing previously parsed data.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'json_completer'
11
+ ```
12
+
13
+ And then execute:
14
+
15
+ ```bash
16
+ bundle install
17
+ ```
18
+
19
+ Or install it yourself as:
20
+
21
+ ```bash
22
+ gem install json_completer
23
+ ```
24
+
25
+ ## Usage
26
+
27
+ ### Basic Usage
28
+
29
+ Complete partial JSON strings in one call:
30
+
31
+ ```ruby
32
+ require 'json_completer'
33
+
34
+ # Complete truncated JSON
35
+ JsonCompleter.complete('{"name": "John", "age":')
36
+ # => '{"name": "John", "age": null}'
37
+
38
+ # Handle incomplete strings
39
+ JsonCompleter.complete('{"message": "Hello wo')
40
+ # => '{"message": "Hello wo"}'
41
+
42
+ # Fix unclosed structures
43
+ JsonCompleter.complete('[1, 2, {"key": "value"')
44
+ # => '[1, 2, {"key": "value"}]'
45
+ ```
46
+
47
+ ### Incremental Processing
48
+
49
+ For streaming scenarios where JSON arrives in chunks. Each call processes only new data (O(n) complexity) by maintaining parsing state, making it highly efficient for large streaming responses:
50
+
51
+ ```ruby
52
+ completer = JsonCompleter.new
53
+
54
+ # Process first chunk
55
+ result1 = completer.complete('{"users": [{"name": "')
56
+ # => '{"users": [{"name": ""}]}'
57
+
58
+ # Process additional data
59
+ result2 = completer.complete('{"users": [{"name": "Alice"}')
60
+ # => '{"users": [{"name": "Alice"}]}'
61
+
62
+ # Final complete JSON
63
+ result3 = completer.complete('{"users": [{"name": "Alice"}, {"name": "Bob"}]}')
64
+ # => '{"users": [{"name": "Alice"}, {"name": "Bob"}]}'
65
+ ```
66
+
67
+ #### Performance Characteristics
68
+
69
+ - **Zero reprocessing**: Maintains parsing state to avoid reparsing previously processed data
70
+ - **Linear complexity**: Each chunk processed in O(n) time where n = new data size, not total size
71
+ - **Memory efficient**: Uses token-based accumulation with minimal state overhead
72
+ - **Context preservation**: Tracks nested structures without full document analysis
73
+
74
+ ### Common Use Cases
75
+
76
+ - **High-performance streaming JSON**: Process large JSON responses efficiently as data arrives over network connections
77
+ - **Truncated API responses**: Complete JSON that was cut off due to size limits
78
+ - **Log parsing**: Handle incomplete JSON entries in log files
79
+
80
+ ## Contributing
81
+
82
+ 1. Fork the repository
83
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
84
+ 3. Make your changes and add tests
85
+ 4. Run the test suite (`bundle exec rspec`)
86
+ 5. Commit your changes (`git commit -am 'Add some feature'`)
87
+ 6. Push to the branch (`git push origin my-new-feature`)
88
+ 7. Create a new Pull Request
89
+
90
+ ## License
91
+
92
+ This gem is available as open source under the terms of the [MIT License](LICENSE).
@@ -0,0 +1,726 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'stringio'
4
+
5
+ # JsonCompleter attempts to turn partial JSON strings into valid JSON.
6
+ # It handles incomplete primitives, missing values, and unclosed structures.
7
+ class JsonCompleter
8
+ STRUCTURE_CHARS = ['[', '{', ',', ':'].to_set.freeze
9
+ KEYWORD_MAP = { 't' => 'true', 'f' => 'false', 'n' => 'null' }.freeze
10
+ VALID_PRIMITIVES = %w[true false null].to_set.freeze
11
+
12
+ # Parsing state for incremental processing
13
+ ParsingState = Struct.new(
14
+ :output_tokens, :context_stack, :last_index, :input_length,
15
+ :incomplete_string_start, :incomplete_string_buffer,
16
+ :incomplete_string_escape_state, keyword_init: true
17
+ ) do
18
+ def initialize(
19
+ output_tokens: [], context_stack: [], last_index: 0, input_length: 0,
20
+ incomplete_string_start: nil, incomplete_string_buffer: nil,
21
+ incomplete_string_escape_state: nil
22
+ )
23
+ super
24
+ end
25
+ end
26
+
27
+ def self.complete(partial_json)
28
+ new.complete(partial_json)
29
+ end
30
+
31
+ # Creates a new parsing state for incremental processing
32
+ def self.new_state
33
+ ParsingState.new
34
+ end
35
+
36
+ def initialize(state = self.class.new_state)
37
+ @state = state
38
+ end
39
+
40
+ # Incrementally completes JSON using previous parsing state to avoid reprocessing.
41
+ #
42
+ # @param partial_json [String] The current partial JSON string (full accumulated input).
43
+ # @return [String] Completed JSON.
44
+ def complete(partial_json)
45
+ input = partial_json
46
+
47
+ # Initialize or reuse state
48
+ if @state.nil? || @state.input_length > input.length
49
+ # Fresh start or input was truncated - start over
50
+ @state = ParsingState.new
51
+ end
52
+
53
+ return input if input.empty?
54
+ return input if valid_json_primitive_or_document?(input)
55
+
56
+ # If input hasn't grown since last time, just return completed version of existing state
57
+ if @state.input_length == input.length && !@state.output_tokens.empty?
58
+ return finalize_completion(@state.output_tokens.dup, @state.context_stack.dup)
59
+ end
60
+
61
+ # Handle incomplete string from previous state
62
+ output_tokens = @state.output_tokens.dup
63
+ context_stack = @state.context_stack.dup
64
+ index = @state.last_index
65
+ length = input.length
66
+ incomplete_string_start = nil
67
+ incomplete_string_buffer = nil
68
+ incomplete_string_escape_state = nil
69
+
70
+ # If we had an incomplete string, continue from where we left off
71
+ if @state.incomplete_string_start
72
+ incomplete_string_start = @state.incomplete_string_start
73
+ incomplete_string_buffer = @state.incomplete_string_buffer || StringIO.new('"')
74
+ incomplete_string_escape_state = @state.incomplete_string_escape_state
75
+ # Remove the auto-completed string from output_tokens since we'll add the real one
76
+ output_tokens.pop if output_tokens.last&.start_with?('"') && output_tokens.last.end_with?('"')
77
+ end
78
+
79
+ # Process from the current index
80
+ while index < length
81
+ # Special case: continuing an incomplete string
82
+ if incomplete_string_buffer && index == @state.last_index
83
+ str_value, new_index, terminated, new_buffer, new_escape_state = continue_parsing_string(
84
+ input, incomplete_string_buffer, incomplete_string_escape_state
85
+ )
86
+ if terminated
87
+ output_tokens << str_value
88
+ incomplete_string_start = nil
89
+ incomplete_string_buffer = nil
90
+ incomplete_string_escape_state = nil
91
+ # Continue processing from where string ended
92
+ index = new_index
93
+ else
94
+ # String still incomplete, save state
95
+ incomplete_string_buffer = new_buffer
96
+ incomplete_string_escape_state = new_escape_state
97
+ # We've consumed everything
98
+ index = length
99
+ end
100
+ next
101
+ end
102
+
103
+ char = input[index]
104
+ last_significant_char_in_output = get_last_significant_char(output_tokens)
105
+
106
+ case char
107
+ when '{'
108
+ ensure_comma_before_new_item(output_tokens, context_stack, last_significant_char_in_output)
109
+ ensure_colon_if_value_expected(output_tokens, context_stack, last_significant_char_in_output)
110
+ output_tokens << char
111
+ context_stack << '{'
112
+ index += 1
113
+ when '['
114
+ ensure_comma_before_new_item(output_tokens, context_stack, last_significant_char_in_output)
115
+ ensure_colon_if_value_expected(output_tokens, context_stack, last_significant_char_in_output)
116
+ output_tokens << char
117
+ context_stack << '['
118
+ index += 1
119
+ when '}'
120
+ # Do not repair missing object values - preserve invalid JSON
121
+ remove_trailing_comma(output_tokens)
122
+ output_tokens << char
123
+ context_stack.pop if !context_stack.empty? && context_stack.last == '{'
124
+ index += 1
125
+ when ']'
126
+ # Do not repair trailing commas in arrays - preserve invalid JSON
127
+ output_tokens << char
128
+ context_stack.pop if !context_stack.empty? && context_stack.last == '['
129
+ index += 1
130
+ when '"' # Start of a string (key or value)
131
+ # Start of a new string (incomplete strings are handled at the top of the loop)
132
+ ensure_comma_before_new_item(output_tokens, context_stack, last_significant_char_in_output)
133
+ ensure_colon_if_value_expected(output_tokens, context_stack, last_significant_char_in_output)
134
+
135
+ string_start_index = index
136
+ str_value, consumed, terminated, new_buffer, new_escape_state = parse_string_with_state(input, index)
137
+
138
+ if terminated
139
+ output_tokens << str_value
140
+ incomplete_string_start = nil
141
+ incomplete_string_buffer = nil
142
+ incomplete_string_escape_state = nil
143
+ else
144
+ # String incomplete, save state for next call
145
+ # Don't add to output_tokens yet - will be added during finalization
146
+ incomplete_string_start = string_start_index
147
+ incomplete_string_buffer = new_buffer
148
+ incomplete_string_escape_state = new_escape_state
149
+ end
150
+ index += consumed
151
+ when ':'
152
+ # If the char before ':' was a comma, it's likely {"a":1, :"b":2} which is invalid.
153
+ # Or if it was an opening brace/bracket.
154
+ # Standard JSON doesn't allow this, but we aim to fix.
155
+ # A colon should typically follow a string key.
156
+ # If last char in output was a comma, remove it.
157
+ remove_trailing_comma(output_tokens) if last_significant_char_in_output == ','
158
+ output_tokens << char
159
+ index += 1
160
+ when ','
161
+ # Handle cases like `[,` or `{,` or `,,` but do NOT repair `{"key":,` (missing object values)
162
+ # if last_significant_char_in_output && STRUCTURE_CHARS.include?(last_significant_char_in_output) && last_significant_char_in_output != ':'
163
+ # output_tokens << 'null'
164
+ # end
165
+ remove_trailing_comma(output_tokens) # Avoid double commas
166
+ output_tokens << char
167
+ index += 1
168
+ when 't', 'f', 'n' # true, false, null
169
+ ensure_comma_before_new_item(output_tokens, context_stack, last_significant_char_in_output)
170
+ ensure_colon_if_value_expected(output_tokens, context_stack, last_significant_char_in_output)
171
+
172
+ keyword_val, consumed = consume_and_complete_keyword(input, index, KEYWORD_MAP[char.downcase])
173
+ output_tokens << keyword_val
174
+ index += consumed
175
+ when '-', '0'..'9' # Number
176
+ ensure_comma_before_new_item(output_tokens, context_stack, last_significant_char_in_output)
177
+ ensure_colon_if_value_expected(output_tokens, context_stack, last_significant_char_in_output)
178
+
179
+ num_str, consumed = parse_number(input, index)
180
+ output_tokens << num_str
181
+ index += consumed
182
+ when /\s/ # Whitespace
183
+ # Preserve whitespace as-is
184
+ output_tokens << char
185
+ index += 1
186
+ else # Unknown characters
187
+ # For now, skip unknown characters as they are not part of JSON structure.
188
+ # More advanced handling could try to wrap them in strings if contextually appropriate.
189
+ index += 1
190
+ end
191
+ end
192
+
193
+ # Update state
194
+ updated_state = ParsingState.new(
195
+ output_tokens: output_tokens,
196
+ context_stack: context_stack,
197
+ last_index: index,
198
+ input_length: length,
199
+ incomplete_string_start: incomplete_string_start,
200
+ incomplete_string_buffer: incomplete_string_buffer,
201
+ incomplete_string_escape_state: incomplete_string_escape_state
202
+ )
203
+
204
+ # Return completed JSON and updated state
205
+ completed_json = finalize_completion(output_tokens.dup, context_stack.dup, incomplete_string_buffer)
206
+ @state = updated_state
207
+
208
+ completed_json
209
+ end
210
+
211
+ private
212
+
213
+ # Finalizes the completion by handling post-processing and cleanup
214
+ def finalize_completion(output_tokens, context_stack, incomplete_string_buffer = nil)
215
+ # If we have an incomplete string buffer, add it with closing quote
216
+ if incomplete_string_buffer
217
+ buffer_str = incomplete_string_buffer.string
218
+ # Remove incomplete escape sequences at the end
219
+
220
+ # Count consecutive trailing backslashes
221
+ trailing_backslashes = 0
222
+ idx = buffer_str.length - 1
223
+ while idx >= 0 && buffer_str[idx] == '\\'
224
+ trailing_backslashes += 1
225
+ idx -= 1
226
+ end
227
+
228
+ # If odd number of trailing backslashes, remove the last one (incomplete escape)
229
+ # If even number, they're all paired as escaped backslashes, don't remove any
230
+ buffer_str = buffer_str[0...-1] if trailing_backslashes.odd?
231
+
232
+ # Check for incomplete unicode escape after handling backslashes
233
+ if buffer_str =~ /\\u[0-9a-fA-F]{0,3}\z/ # Incomplete unicode
234
+ buffer_str = buffer_str.sub(/\\u[0-9a-fA-F]{0,3}\z/, '')
235
+ end
236
+
237
+ # Always add closing quote for incomplete strings
238
+ # (incomplete_string_buffer only exists when string wasn't terminated)
239
+ buffer_str += '"'
240
+ output_tokens << buffer_str
241
+ end
242
+
243
+ # Post-loop cleanup and final completions
244
+ last_sig_char_final = get_last_significant_char(output_tokens)
245
+
246
+ # If the last significant character suggests an incomplete structure:
247
+ unless context_stack.empty?
248
+ current_ctx = context_stack.last
249
+ if current_ctx == '{' # Inside an object
250
+ if last_sig_char_final == '"' # Just a key, e.g., {"key"
251
+ # Check if this is a key (not a value) by looking at the context
252
+ # If the previous significant character before this string was '{' or ',', it's a key
253
+ prev_sig_char = get_previous_significant_char(output_tokens)
254
+ output_tokens << ':' << 'null' if ['{', ','].include?(prev_sig_char)
255
+ elsif last_sig_char_final == ':' # Key with colon, e.g., {"key":
256
+ output_tokens << 'null'
257
+ end
258
+ elsif current_ctx == '[' # Inside an array
259
+ output_tokens << 'null' if last_sig_char_final == ',' # Value then comma, e.g., [1,
260
+ end
261
+ end
262
+
263
+ # Close any remaining open structures
264
+ until context_stack.empty?
265
+ opener = context_stack.pop
266
+ remove_trailing_comma(output_tokens) # Clean up before closing
267
+ output_tokens << (opener == '{' ? '}' : ']')
268
+ end
269
+
270
+ # Join tokens. A simple join might not be ideal for formatting.
271
+ # A more sophisticated join would handle spaces around colons/commas.
272
+ # For basic validity, this should be okay.
273
+ reassembled_json = output_tokens.join
274
+
275
+ # Final check: if the reassembled JSON is just a standalone comma or colon, it's invalid.
276
+ # Return something more sensible like "null" or empty string.
277
+ return 'null' if reassembled_json.match?(/\A\s*[,:]\s*\z/)
278
+
279
+ reassembled_json
280
+ end
281
+
282
+ # Parses a new JSON string and returns parsing state for incremental processing
283
+ # Returns [string_value, consumed_characters, was_terminated, buffer, escape_state]
284
+ def parse_string_with_state(input, index)
285
+ start_index = index
286
+ output_str = StringIO.new
287
+ # Initial quote
288
+ output_str << input[index]
289
+ index += 1
290
+ terminated = false
291
+ escape_state = nil
292
+
293
+ while index < input.length
294
+ char = input[index]
295
+
296
+ if escape_state == :backslash
297
+ # We're in an escape sequence
298
+ if char == 'u'
299
+ escape_state = { type: :unicode, hex: String.new }
300
+ output_str << 'u' # Don't double the backslash
301
+ index += 1
302
+ else
303
+ # Regular escape sequence
304
+ output_str << char
305
+ index += 1
306
+ escape_state = nil
307
+ end
308
+ elsif escape_state.is_a?(Hash) && escape_state[:type] == :unicode
309
+ # Collecting unicode hex digits
310
+ if char.match?(/[0-9a-fA-F]/)
311
+ escape_state[:hex] << char
312
+ output_str << char
313
+ index += 1
314
+ if escape_state[:hex].length == 4
315
+ # Unicode escape complete
316
+ escape_state = nil
317
+ end
318
+ else
319
+ # Invalid unicode escape - don't include it and close string
320
+ # Remove the incomplete unicode escape
321
+ str_so_far = output_str.string
322
+ if str_so_far =~ /\\u[0-9a-fA-F]*\z/
323
+ str_so_far = str_so_far.sub(/\\u[0-9a-fA-F]*\z/, '')
324
+ output_str = StringIO.new(str_so_far)
325
+ end
326
+ output_str << '"'
327
+ return [output_str.string, index - start_index, false, nil, nil]
328
+ end
329
+ elsif char == '\\'
330
+ output_str << char
331
+ escape_state = :backslash
332
+ index += 1
333
+ elsif char == '"'
334
+ output_str << char
335
+ terminated = true
336
+ index += 1
337
+ break
338
+ else
339
+ output_str << char
340
+ index += 1
341
+ end
342
+ end
343
+
344
+ if terminated
345
+ [output_str.string, index - start_index, true, nil, nil]
346
+ else
347
+ # String incomplete - DON'T add closing quote here, it will be added during finalization
348
+ [output_str.string, index - start_index, false, output_str, escape_state]
349
+ end
350
+ end
351
+
352
+ # Continues parsing an incomplete string from saved state
353
+ # Returns [string_value, new_index, was_terminated, buffer, escape_state]
354
+ def continue_parsing_string(input, buffer, escape_state)
355
+ # Buffer should not have closing quote - we removed it from parse_string_with_state
356
+
357
+ index = @state.last_index
358
+ terminated = false
359
+
360
+ while index < input.length
361
+ char = input[index]
362
+
363
+ if escape_state == :backslash
364
+ # We're in an escape sequence
365
+ if char == 'u'
366
+ escape_state = { type: :unicode, hex: String.new }
367
+ buffer << 'u' # Don't double the backslash
368
+ index += 1
369
+ else
370
+ # Regular escape sequence
371
+ buffer << char
372
+ index += 1
373
+ escape_state = nil
374
+ end
375
+ elsif escape_state.is_a?(Hash) && escape_state[:type] == :unicode
376
+ # Collecting unicode hex digits
377
+ if char.match?(/[0-9a-fA-F]/)
378
+ escape_state[:hex] << char
379
+ buffer << char
380
+ index += 1
381
+ if escape_state[:hex].length == 4
382
+ # Unicode escape complete
383
+ escape_state = nil
384
+ end
385
+ else
386
+ # Invalid unicode escape - don't include it and close string
387
+ # Remove the incomplete unicode escape
388
+ str_so_far = buffer.string
389
+ if str_so_far =~ /\\u[0-9a-fA-F]*\z/
390
+ str_so_far = str_so_far.sub(/\\u[0-9a-fA-F]*\z/, '')
391
+ buffer = StringIO.new(str_so_far)
392
+ end
393
+ buffer << '"'
394
+ return [buffer.string, index, false, nil, nil]
395
+ end
396
+ elsif char == '\\'
397
+ buffer << char
398
+ escape_state = :backslash
399
+ index += 1
400
+ elsif char == '"'
401
+ buffer << char
402
+ terminated = true
403
+ index += 1
404
+ break
405
+ else
406
+ buffer << char
407
+ index += 1
408
+ end
409
+ end
410
+
411
+ if terminated
412
+ [buffer.string, index, true, nil, nil]
413
+ else
414
+ # String still incomplete - DON'T add quote here
415
+ [buffer.string, index, false, buffer, escape_state]
416
+ end
417
+ end
418
+
419
+ # Parses a JSON string starting at the given index.
420
+ # Handles unterminated strings by closing them.
421
+ # Returns [string_value, consumed_characters, was_terminated]
422
+ def parse_string_with_termination_info(input, index)
423
+ start_index = index
424
+ output_str = StringIO.new
425
+ output_str << input[index] # Initial quote
426
+ index += 1
427
+ terminated = false
428
+
429
+ while index < input.length
430
+ char = input[index]
431
+
432
+ if char == '\\' && index + 1 < input.length
433
+ next_char = input[index + 1]
434
+ if next_char == 'u'
435
+ # Handle unicode escape sequence
436
+ index += 2 # Skip '\u'
437
+ hex_digits = String.new
438
+
439
+ # Collect up to 4 hex digits
440
+ while hex_digits.length < 4 && index < input.length && input[index].match?(/[0-9a-fA-F]/)
441
+ hex_digits << input[index]
442
+ index += 1
443
+ end
444
+
445
+ if hex_digits.length == 4
446
+ # Complete unicode escape
447
+ output_str << '\\u' << hex_digits
448
+ else
449
+ # Incomplete unicode escape - remove it entirely and close string
450
+ output_str << '"'
451
+ return [output_str.string, index - start_index, false]
452
+ end
453
+ else
454
+ # Regular escape sequence
455
+ output_str << char << next_char
456
+ index += 2
457
+ end
458
+ elsif char == '"'
459
+ output_str << char
460
+ terminated = true
461
+ index += 1
462
+ break
463
+ else
464
+ output_str << char
465
+ index += 1
466
+ end
467
+ end
468
+
469
+ output_str << '"' unless terminated # Close if unterminated
470
+ [output_str.string, index - start_index, terminated]
471
+ end
472
+
473
+ # Parses a JSON string starting at the given index.
474
+ # Handles unterminated strings by closing them.
475
+ def parse_string(input, index)
476
+ start_index = index
477
+ output_str = StringIO.new
478
+ output_str << input[index] # Initial quote
479
+ index += 1
480
+ terminated = false
481
+
482
+ while index < input.length
483
+ char = input[index]
484
+
485
+ if char == '\\' && index + 1 < input.length
486
+ next_char = input[index + 1]
487
+ if next_char == 'u'
488
+ # Handle unicode escape sequence
489
+ index += 2 # Skip '\u'
490
+ hex_digits = String.new
491
+
492
+ # Collect up to 4 hex digits
493
+ while hex_digits.length < 4 && index < input.length && input[index].match?(/[0-9a-fA-F]/)
494
+ hex_digits << input[index]
495
+ index += 1
496
+ end
497
+
498
+ if hex_digits.length == 4
499
+ # Complete unicode escape
500
+ output_str << '\\u' << hex_digits
501
+ else
502
+ # Incomplete unicode escape - remove it entirely and close string
503
+ output_str << '"'
504
+ return [output_str.string, index - start_index]
505
+ end
506
+ else
507
+ # Regular escape sequence
508
+ output_str << char << next_char
509
+ index += 2
510
+ end
511
+ elsif char == '"'
512
+ output_str << char
513
+ terminated = true
514
+ index += 1
515
+ break
516
+ else
517
+ output_str << char
518
+ index += 1
519
+ end
520
+ end
521
+
522
+ output_str << '"' unless terminated # Close if unterminated
523
+ [output_str.string, index - start_index]
524
+ end
525
+
526
+ # Parses a JSON number starting at the given index.
527
+ # Completes numbers like "1." to "1.0".
528
+ def parse_number(input, index)
529
+ start_index = index
530
+ num_str = StringIO.new
531
+
532
+ # Optional leading minus
533
+ if input[index] == '-'
534
+ num_str << input[index]
535
+ index += 1
536
+ end
537
+
538
+ # Integer part
539
+ digits_before_dot = false
540
+ while index < input.length && input[index] >= '0' && input[index] <= '9'
541
+ num_str << input[index]
542
+ index += 1
543
+ digits_before_dot = true
544
+ end
545
+
546
+ # Decimal part
547
+ has_dot = false
548
+ if index < input.length && input[index] == '.'
549
+ has_dot = true
550
+ num_str << input[index]
551
+ index += 1
552
+ digits_after_dot = false
553
+ while index < input.length && input[index] >= '0' && input[index] <= '9'
554
+ num_str << input[index]
555
+ index += 1
556
+ digits_after_dot = true
557
+ end
558
+ num_str << '0' unless digits_after_dot # Append '0' if it's just "X." or "."
559
+ end
560
+
561
+ # If it was just "." or "-."
562
+ current_val = num_str.string
563
+ if current_val == '.'
564
+ num_str = StringIO.new # Reset
565
+ num_str << '0.0'
566
+ elsif current_val == '-.'
567
+ num_str = StringIO.new # Reset
568
+ num_str << '-0.0'
569
+ elsif current_val == '-' # Only a minus sign
570
+ num_str = StringIO.new # Reset
571
+ num_str << '0' # Or -0, but JSON standard usually serializes -0 as 0
572
+ elsif !digits_before_dot && has_dot # e.g. ".5" -> "0.5"
573
+ val = num_str.string
574
+ num_str = StringIO.new
575
+ num_str << '0' << val
576
+ end
577
+
578
+ # Exponent part
579
+ if index < input.length && (input[index].downcase == 'e')
580
+ # Check if there was a number before 'e'
581
+ temp_num_val = num_str.string
582
+ if temp_num_val.empty? || temp_num_val == '-' || temp_num_val == '.' || temp_num_val == '-.'
583
+ # Invalid start for exponent, stop number parsing here
584
+ return [
585
+ if temp_num_val == '-'
586
+ '0'
587
+ else
588
+ (temp_num_val.include?('.') ? temp_num_val + '0' : temp_num_val)
589
+ end,
590
+ index - start_index
591
+ ]
592
+ end
593
+
594
+ num_str << input[index] # 'e' or 'E'
595
+ index += 1
596
+ if index < input.length && ['+', '-'].include?(input[index])
597
+ num_str << input[index]
598
+ index += 1
599
+ end
600
+ exponent_digits = false
601
+ while index < input.length && input[index] >= '0' && input[index] <= '9'
602
+ num_str << input[index]
603
+ index += 1
604
+ exponent_digits = true
605
+ end
606
+ # If 'e' was added but no digits followed, it's incomplete.
607
+ # JSON requires digits after 'e'. We might strip 'e' or add '0'.
608
+ # For robustness, let's add '0' if exponent is present but lacks digits.
609
+ num_str << '0' unless exponent_digits
610
+ end
611
+
612
+ final_num_str = num_str.string
613
+ # If the number is empty (e.g. bad start) or just "-", default to "0"
614
+ return ['0', index - start_index] if final_num_str.empty? || final_num_str == '-'
615
+
616
+ [final_num_str, index - start_index]
617
+ end
618
+
619
+ # Consumes characters from input that match the start of a keyword (true, false, null)
620
+ # and returns the completed keyword and number of characters consumed.
621
+ def consume_and_complete_keyword(input, index, target_keyword)
622
+ consumed_count = 0
623
+ (0...target_keyword.length).each do |k_idx|
624
+ break if index + k_idx >= input.length
625
+
626
+ break unless input[index + k_idx].downcase == target_keyword[k_idx]
627
+
628
+ consumed_count += 1
629
+
630
+ # Mismatch
631
+ end
632
+ # If at least the first char matched, we complete to the target_keyword
633
+ return [target_keyword, consumed_count] if consumed_count.positive?
634
+
635
+ # Fallback (should not be reached if called correctly, i.e., input[index] is t,f, or n)
636
+ # This indicates the char was not the start of the expected keyword.
637
+ # This case should be handled by the main loop's "else" (skip unknown char).
638
+ # For safety, if it's called, treat the single char as a token to be skipped later.
639
+ [input[index], 1]
640
+ end
641
+
642
+ # Gets the last non-whitespace character from the output tokens array.
643
+ def get_last_significant_char(output_tokens)
644
+ (output_tokens.length - 1).downto(0) do |i|
645
+ token = output_tokens[i]
646
+ stripped_token = token.strip
647
+ return stripped_token[-1] unless stripped_token.empty?
648
+ end
649
+ nil
650
+ end
651
+
652
+ # Gets the second-to-last non-whitespace character from the output tokens array.
653
+ def get_previous_significant_char(output_tokens)
654
+ significant_chars = []
655
+ (output_tokens.length - 1).downto(0) do |i|
656
+ token = output_tokens[i]
657
+ stripped_token = token.strip
658
+ unless stripped_token.empty?
659
+ significant_chars << stripped_token[-1]
660
+ return significant_chars[1] if significant_chars.length >= 2
661
+ end
662
+ end
663
+ nil
664
+ end
665
+
666
+ # Ensures a comma is added if needed before a new item in an array or object.
667
+ def ensure_comma_before_new_item(output_tokens, context_stack, last_sig_char)
668
+ return if output_tokens.empty? || context_stack.empty? || last_sig_char.nil?
669
+
670
+ # No comma needed right after an opener, a colon, or another comma.
671
+ return if STRUCTURE_CHARS.include?(last_sig_char)
672
+
673
+ # If last_sig_char indicates a completed value/key:
674
+ # (e.g., string quote, true/false/null end, number, or closing bracket/brace)
675
+ # Add a comma if we are in an array or object.
676
+ return unless context_stack.last == '[' || (context_stack.last == '{' && last_sig_char != ':')
677
+
678
+ output_tokens << ','
679
+ end
680
+
681
+ # Ensures a colon is added if a value is expected after a key in an object.
682
+ def ensure_colon_if_value_expected(output_tokens, context_stack, last_sig_char)
683
+ return if output_tokens.empty? || context_stack.empty? || last_sig_char.nil?
684
+
685
+ return unless context_stack.last == '{' && last_sig_char == '"' # In object, and last thing was a key (string)
686
+
687
+ output_tokens << ':'
688
+ end
689
+
690
+ # Removes a trailing comma from the output_tokens if present.
691
+ def remove_trailing_comma(output_tokens)
692
+ last_token_idx = -1
693
+ (output_tokens.length - 1).downto(0) do |i|
694
+ unless output_tokens[i].strip.empty?
695
+ last_token_idx = i
696
+ break
697
+ end
698
+ end
699
+
700
+ return unless last_token_idx != -1 && output_tokens[last_token_idx].strip == ','
701
+
702
+ output_tokens.slice!(last_token_idx)
703
+ # Also remove any whitespace tokens that were before this comma and are now effectively trailing
704
+ while last_token_idx.positive? && output_tokens[last_token_idx - 1].strip.empty?
705
+ output_tokens.slice!(last_token_idx - 1)
706
+ last_token_idx -= 1
707
+ end
708
+ end
709
+
710
+ # Checks if a string is a valid JSON primitive or a complete JSON document.
711
+ # This is a helper for early exit if input is already fine.
712
+ def valid_json_primitive_or_document?(str)
713
+ # Check for simple primitives first
714
+ return true if VALID_PRIMITIVES.include?(str)
715
+ # Check for valid number (simplified regex, full JSON number is complex)
716
+ # Allows integers, floats, but not ending with '.' or 'e'/'E' without digits
717
+ if str.match?(/\A-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?\z/) &&
718
+ !str.end_with?('.') && !str.match?(/[eE][+-]?$/)
719
+ return true
720
+ end
721
+ # Check for valid string literal
722
+ return true if str.match?(/\A"(?:[^"\\]|\\.)*"\z/)
723
+
724
+ false
725
+ end
726
+ end
metadata ADDED
@@ -0,0 +1,78 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: json_completer
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Aha! (www.aha.io)
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2025-09-22 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rspec
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '3.4'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '3.4'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rubocop
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.80'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.80'
41
+ description: A Ruby library that completes incomplete JSON strings by handling truncated
42
+ primitives, missing values, and unclosed structures. Supports incremental processing
43
+ for streaming scenarios.
44
+ email:
45
+ - support@aha.io
46
+ executables: []
47
+ extensions: []
48
+ extra_rdoc_files: []
49
+ files:
50
+ - LICENSE
51
+ - README.md
52
+ - lib/json_completer.rb
53
+ homepage: https://github.com/aha-app/json_completer
54
+ licenses:
55
+ - MIT
56
+ metadata:
57
+ homepage_uri: https://github.com/aha-app/json_completer
58
+ source_code_uri: https://github.com/aha-app/json_completer
59
+ post_install_message:
60
+ rdoc_options: []
61
+ require_paths:
62
+ - lib
63
+ required_ruby_version: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ version: '3.0'
68
+ required_rubygems_version: !ruby/object:Gem::Requirement
69
+ requirements:
70
+ - - ">="
71
+ - !ruby/object:Gem::Version
72
+ version: '0'
73
+ requirements: []
74
+ rubygems_version: 3.5.11
75
+ signing_key:
76
+ specification_version: 4
77
+ summary: Converts partial JSON strings into valid JSON with incremental parsing support
78
+ test_files: []