json-repair 0.28.3__tar.gz → 0.29.0__tar.gz

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: json_repair
3
- Version: 0.28.3
3
+ Version: 0.29.0
4
4
  Summary: A package to repair broken json strings
5
5
  Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
6
6
  License: MIT License
@@ -45,6 +45,21 @@ This simple package can be used to fix an invalid json string. To know all cases
45
45
 
46
46
  Inspired by https://github.com/josdejong/jsonrepair
47
47
 
48
+ ---
49
+ # How to cite
50
+ If you are using this library in your academic work (as I know many folks are), please find the BibTeX here:
51
+
52
+ @software{Baccianella_JSON_Repair_-_2024,
53
+ author = {Baccianella, Stefano},
54
+ month = aug,
55
+ title = {{JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs}},
56
+ url = {https://github.com/mangiucugna/json_repair},
57
+ version = {0.28.3},
58
+ year = {2024}
59
+ }
60
+
61
+ Thank you for citing my work and please send me a link to the paper if you can!
62
+
48
63
  ---
49
64
  # Offer me a beer
50
65
  If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
@@ -82,6 +97,18 @@ or just
82
97
 
83
98
  decoded_object = json_repair.repair_json(json_string, return_objects=True)
84
99
 
100
+ ### Avoid this antipattern
101
+ Some users of this library adopt the following pattern:
102
+
103
+ obj = {}
104
+ try:
105
+ obj = json.loads(string)
106
+ except json.JSONDecodeError as e:
107
+ obj = json_repair.loads(string)
108
+ ...
109
+
110
+ This is wasteful because `json_repair` already checks whether the JSON is valid. If you still want to keep that pattern, add `skip_json_loads=True` to the `json_repair` call so the check is not repeated, as explained in the section below.
111
+
85
112
  ### Read json from a file or file descriptor
86
113
 
87
114
  JSON repair also provides a drop-in replacement for `json.load()`:
@@ -122,6 +149,32 @@ Some rules of thumb to use:
122
149
  - Setting `return_objects=True` will always be faster because the parser returns an object already and it doesn't have to serialize that object to JSON
123
150
  - `skip_json_loads` is faster only if you know for certain that the string is not valid JSON
124
151
  - If you are having issues with escaping, pass the string as a **raw** string like: `r"string with escaping\""`
152
+
153
+ ### Use json_repair from CLI
154
+
155
+ Install the library for command-line use with:
156
+ ```
157
+ pipx install json-repair
158
+ ```
159
+ then run
160
+ ```
161
+ $ json_repair -h
162
+
163
+ usage: json_repair [-h] [-i] [--ensure_ascii] [--indent INDENT] filename
164
+
165
+ Repair and parse JSON files.
166
+
167
+ positional arguments:
168
+ filename The JSON file to repair
169
+
170
+ options:
171
+ -h, --help show this help message and exit
172
+ -i, --inline Replace the file inline instead of returning the output to stdout
173
+ --ensure_ascii Pass the ensure_ascii parameter to json.dumps()
174
+ --indent INDENT Number of spaces for indentation (Default 2)
175
+ ```
176
+ to learn how to use it
177
+
125
178
  ## Adding to requirements
126
179
  **Please pin this library only on the major version!**
127
180
 
@@ -7,6 +7,21 @@ This simple package can be used to fix an invalid json string. To know all cases
7
7
 
8
8
  Inspired by https://github.com/josdejong/jsonrepair
9
9
 
10
+ ---
11
+ # How to cite
12
+ If you are using this library in your academic work (as I know many folks are), please find the BibTeX here:
13
+
14
+ @software{Baccianella_JSON_Repair_-_2024,
15
+ author = {Baccianella, Stefano},
16
+ month = aug,
17
+ title = {{JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs}},
18
+ url = {https://github.com/mangiucugna/json_repair},
19
+ version = {0.28.3},
20
+ year = {2024}
21
+ }
22
+
23
+ Thank you for citing my work and please send me a link to the paper if you can!
24
+
10
25
  ---
11
26
  # Offer me a beer
12
27
  If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
@@ -44,6 +59,18 @@ or just
44
59
 
45
60
  decoded_object = json_repair.repair_json(json_string, return_objects=True)
46
61
 
62
+ ### Avoid this antipattern
63
+ Some users of this library adopt the following pattern:
64
+
65
+ obj = {}
66
+ try:
67
+ obj = json.loads(string)
68
+ except json.JSONDecodeError as e:
69
+ obj = json_repair.loads(string)
70
+ ...
71
+
72
+ This is wasteful because `json_repair` already checks whether the JSON is valid. If you still want to keep that pattern, add `skip_json_loads=True` to the `json_repair` call so the check is not repeated, as explained in the section below.
73
+
47
74
  ### Read json from a file or file descriptor
48
75
 
49
76
  JSON repair also provides a drop-in replacement for `json.load()`:
@@ -84,6 +111,32 @@ Some rules of thumb to use:
84
111
  - Setting `return_objects=True` will always be faster because the parser returns an object already and it doesn't have to serialize that object to JSON
85
112
  - `skip_json_loads` is faster only if you know for certain that the string is not valid JSON
86
113
  - If you are having issues with escaping, pass the string as a **raw** string like: `r"string with escaping\""`
114
+
115
+ ### Use json_repair from CLI
116
+
117
+ Install the library for command-line use with:
118
+ ```
119
+ pipx install json-repair
120
+ ```
121
+ then run
122
+ ```
123
+ $ json_repair -h
124
+
125
+ usage: json_repair [-h] [-i] [--ensure_ascii] [--indent INDENT] filename
126
+
127
+ Repair and parse JSON files.
128
+
129
+ positional arguments:
130
+ filename The JSON file to repair
131
+
132
+ options:
133
+ -h, --help show this help message and exit
134
+ -i, --inline Replace the file inline instead of returning the output to stdout
135
+ --ensure_ascii Pass the ensure_ascii parameter to json.dumps()
136
+ --indent INDENT Number of spaces for indentation (Default 2)
137
+ ```
138
+ to learn how to use it
139
+
87
140
  ## Adding to requirements
88
141
  **Please pin this library only on the major version!**
89
142
 
@@ -3,7 +3,7 @@ requires = ["setuptools>=61.0"]
3
3
  build-backend = "setuptools.build_meta"
4
4
  [project]
5
5
  name = "json_repair"
6
- version = "0.28.3"
6
+ version = "0.29.0"
7
7
  license = {file = "LICENSE"}
8
8
  authors = [
9
9
  { name="Stefano Baccianella", email="4247706+mangiucugna@users.noreply.github.com" },
@@ -29,3 +29,5 @@ pythonpath = [
29
29
  "pkgname" = ["py.typed"]
30
30
  [tool.setuptools.packages.find]
31
31
  where = ["src"]
32
+ [project.scripts]
33
+ json_repair = "json_repair.__main__:cli"
@@ -0,0 +1,4 @@
1
+ from .json_repair import cli
2
+
3
+ if __name__ == "__main__":
4
+ cli()
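The `[project.scripts]` entry above maps the `json_repair` command to a `module:function` reference. As a rough illustration of how such a reference is interpreted, the sketch below resolves one by hand. `resolve_entry_point` is a hypothetical helper written for this note, not part of the package or of any launcher, and it targets the standard-library `json:dumps` as a stand-in so that it runs without `json_repair` installed.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a "module:attribute" reference to the object it names.

    Hypothetical helper: a simplified version of what a console-script
    launcher does before calling the function.
    """
    module_name, _, attr_path = spec.partition(":")
    obj: Any = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. "Class.method".
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


# Stand-in target from the standard library; the real entry in this diff is
# "json_repair.__main__:cli".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```

With the package installed, the same reference is what lets `python -m json_repair` and the `json_repair` command reach the same `cli()` function.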
@@ -22,26 +22,62 @@ If something is wrong (a missing parentheses or quotes for example) it will use
22
22
  All supported use cases are in the unit tests
23
23
  """
24
24
 
25
+ import argparse
25
26
  import os
27
+ import sys
26
28
  import json
27
29
  from typing import Any, Dict, List, Optional, Union, TextIO, Tuple, Literal
28
30
 
29
31
 
30
32
  class StringFileWrapper:
31
33
  # This is a trick to simplify the code, transform the filedescriptor handling into a string handling
32
- def __init__(self, fd: TextIO) -> None:
34
+ def __init__(self, fd: TextIO, CHUNK_LENGTH: int) -> None:
33
35
  self.fd = fd
34
36
  self.length: int = 0
37
+ # Buffers are 1MB strings that are read from the file
38
+ # and kept in memory to keep reads low
39
+ self.buffers: dict[int, str] = {}
40
+ # CHUNK_LENGTH is in bytes
41
+ if not CHUNK_LENGTH or CHUNK_LENGTH < 2:
42
+ CHUNK_LENGTH = 1_000_000
43
+ self.buffer_length = CHUNK_LENGTH
44
+
45
+ def get_buffer(self, index: int) -> str:
46
+ if self.buffers.get(index) is None:
47
+ self.fd.seek(index * self.buffer_length)
48
+ self.buffers[index] = self.fd.read(self.buffer_length)
49
+ # Save memory by keeping max 2MB buffer chunks and min 2 chunks
50
+ if len(self.buffers) > max(2, 2_000_000 / self.buffer_length):
51
+ oldest_key = next(iter(self.buffers))
52
+ if oldest_key != index:
53
+ self.buffers.pop(oldest_key)
54
+ return self.buffers[index]
35
55
 
36
56
  def __getitem__(self, index: Union[int, slice]) -> str:
57
+ # The buffers are addressed like rows and columns of RAM:
58
+ # self.buffers[index]: the row, a chunk of length CHUNK_LENGTH (1MB by default); index is `i` // CHUNK_LENGTH
59
+ # self.buffers[index][j]: the column within that row; j is `i` % CHUNK_LENGTH
37
60
  if isinstance(index, slice):
38
- self.fd.seek(index.start)
39
- value = self.fd.read(index.stop - index.start)
40
- self.fd.seek(index.start)
41
- return value
61
+ buffer_index = index.start // self.buffer_length
62
+ buffer_end = index.stop // self.buffer_length
63
+ if buffer_index == buffer_end:
64
+ return self.get_buffer(buffer_index)[
65
+ index.start % self.buffer_length : index.stop % self.buffer_length
66
+ ]
67
+ else:
68
+ start_slice = self.get_buffer(buffer_index)[
69
+ index.start % self.buffer_length :
70
+ ]
71
+ end_slice = self.get_buffer(buffer_end)[
72
+ : index.stop % self.buffer_length
73
+ ]
74
+ middle_slices = [
75
+ self.get_buffer(i) for i in range(buffer_index + 1, buffer_end)
76
+ ]
77
+ return start_slice + "".join(middle_slices) + end_slice
42
78
  else:
43
- self.fd.seek(index)
44
- return self.fd.read(1)
79
+ buffer_index = index // self.buffer_length
80
+ return self.get_buffer(buffer_index)[index % self.buffer_length]
45
81
 
46
82
  def __len__(self) -> int:
47
83
  if self.length < 1:
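The hunk above replaces per-character `seek`/`read` calls with a small cache of fixed-size chunks. A minimal sketch of that idea follows. It is a simplified stand-in, not the library's `StringFileWrapper`: the class name `ChunkedReader` and the `max_chunks` parameter are hypothetical, slices are not handled, and it is exercised on an `io.StringIO`, where seek offsets are plain character positions.

```python
import io
from typing import Dict, TextIO


class ChunkedReader:
    """Index into a text stream through a small cache of fixed-size chunks."""

    def __init__(self, fd: TextIO, chunk_length: int = 1_000_000, max_chunks: int = 2) -> None:
        self.fd = fd
        self.chunk_length = chunk_length
        self.max_chunks = max_chunks
        # dicts keep insertion order, so the first key is the oldest chunk
        self.chunks: Dict[int, str] = {}

    def get_chunk(self, number: int) -> str:
        if number not in self.chunks:
            self.fd.seek(number * self.chunk_length)
            self.chunks[number] = self.fd.read(self.chunk_length)
            if len(self.chunks) > self.max_chunks:
                # Evict the oldest chunk; the one just loaded was inserted
                # last, so it is never the one removed.
                self.chunks.pop(next(iter(self.chunks)))
        return self.chunks[number]

    def __getitem__(self, i: int) -> str:
        # Row is the chunk number, column is the offset inside that chunk.
        row, col = divmod(i, self.chunk_length)
        return self.get_chunk(row)[col]


text = '{"key": "value"}'
reader = ChunkedReader(io.StringIO(text), chunk_length=4)
rebuilt = "".join(reader[i] for i in range(len(text)))
print(rebuilt == text, len(reader.chunks))  # True 2
```

A tiny `chunk_length` such as 4 is only useful for exercising the chunk boundaries; sequential parsing mostly hits the cached chunk, so reads drop from one per character to one per chunk.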
@@ -69,13 +105,14 @@ class JSONParser:
69
105
  json_str: Union[str, StringFileWrapper],
70
106
  json_fd: Optional[TextIO],
71
107
  logging: Optional[bool],
108
+ json_fd_chunk_length: int = 0,
72
109
  ) -> None:
73
110
  # The string to parse
74
111
  self.json_str = json_str
75
112
  # Alternatively, the file descriptor with a json file in it
76
113
  if json_fd:
77
114
  # This is a trick we do to treat the file wrapper as an array
78
- self.json_str = StringFileWrapper(json_fd)
115
+ self.json_str = StringFileWrapper(json_fd, json_fd_chunk_length)
79
116
  # Index is our iterator that will keep track of which character we are looking at right now
80
117
  self.index: int = 0
81
118
  # This is used in the object member parsing to manage the special cases of missing quotes in key or value
@@ -639,6 +676,7 @@ def repair_json(
639
676
  logging: bool = False,
640
677
  json_fd: Optional[TextIO] = None,
641
678
  ensure_ascii: bool = True,
679
+ chunk_length: int = 0,
642
680
  ) -> Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]:
643
681
  """
644
682
  Given a json formatted string, it will try to decode it and, if it fails, it will try to fix it.
@@ -647,7 +685,7 @@ def repair_json(
647
685
  When `skip_json_loads=True` is passed, it will not call the built-in json.loads() function
648
686
  When `logging=True` is passed, it will return a tuple with the repaired json and a log of all repair actions
649
687
  """
650
- parser = JSONParser(json_str, json_fd, logging)
688
+ parser = JSONParser(json_str, json_fd, logging, chunk_length)
651
689
  if skip_json_loads:
652
690
  parsed_json = parser.parse()
653
691
  else:
@@ -683,7 +721,10 @@ def loads(
683
721
 
684
722
 
685
723
  def load(
686
- fd: TextIO, skip_json_loads: bool = False, logging: bool = False
724
+ fd: TextIO,
725
+ skip_json_loads: bool = False,
726
+ logging: bool = False,
727
+ chunk_length: int = 0,
687
728
  ) -> Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]:
688
729
  """
689
730
  This function works like `json.load()` except that it will fix your JSON in the process.
@@ -691,6 +732,7 @@ def load(
691
732
  """
692
733
  return repair_json(
693
734
  json_fd=fd,
735
+ chunk_length=chunk_length,
694
736
  return_objects=True,
695
737
  skip_json_loads=skip_json_loads,
696
738
  logging=logging,
@@ -701,12 +743,62 @@ def from_file(
701
743
  filename: str,
702
744
  skip_json_loads: bool = False,
703
745
  logging: bool = False,
746
+ chunk_length: int = 0,
704
747
  ) -> Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]:
705
748
  """
706
749
  This function is a wrapper around `load()` so you can pass the filename as string
707
750
  """
708
751
  fd = open(filename)
709
- jsonobj = load(fd, skip_json_loads, logging)
752
+ jsonobj = load(
753
+ fd=fd,
754
+ skip_json_loads=skip_json_loads,
755
+ logging=logging,
756
+ chunk_length=chunk_length,
757
+ )
710
758
  fd.close()
711
759
 
712
760
  return jsonobj
761
+
762
+
763
+ def cli(): # pragma: no cover
764
+ parser = argparse.ArgumentParser(description="Repair and parse JSON files.")
765
+ parser.add_argument("filename", help="The JSON file to repair")
766
+ parser.add_argument(
767
+ "-i",
768
+ "--inline",
769
+ action="store_true",
770
+ help="Replace the file inline instead of returning the output to stdout",
771
+ )
772
+ parser.add_argument(
773
+ "--ensure_ascii",
774
+ action="store_true",
775
+ help="Pass the ensure_ascii parameter to json.dumps()",
776
+ )
777
+ parser.add_argument(
778
+ "--indent",
779
+ type=int,
780
+ default=2,
781
+ help="Number of spaces for indentation (Default 2)",
782
+ )
783
+
784
+ args = parser.parse_args()
785
+
786
+ ensure_ascii = False
787
+ if args.ensure_ascii:
788
+ ensure_ascii = True
789
+ try:
790
+ result = from_file(args.filename)
791
+
792
+ if args.inline:
793
+ fd = open(args.filename, mode="w")
794
+ json.dump(result, fd, indent=args.indent, ensure_ascii=ensure_ascii)
795
+ fd.close()
796
+ else:
797
+ print(json.dumps(result, indent=args.indent, ensure_ascii=ensure_ascii))
798
+ except Exception as e:
799
+ print(f"Error: {str(e)}", file=sys.stderr)
800
+ sys.exit(1)
801
+
802
+
803
+ if __name__ == "__main__": # pragma: no cover
804
+ cli()
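The argument handling in `cli()` above can be tried without touching any file. The sketch below rebuilds only the parser, with the same flags as in the diff, and parses a sample command line; `build_parser` and the `broken.json` filename are hypothetical names used for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same arguments as the cli() function in the diff, parser only.
    parser = argparse.ArgumentParser(
        prog="json_repair", description="Repair and parse JSON files."
    )
    parser.add_argument("filename", help="The JSON file to repair")
    parser.add_argument(
        "-i",
        "--inline",
        action="store_true",
        help="Replace the file inline instead of returning the output to stdout",
    )
    parser.add_argument(
        "--ensure_ascii",
        action="store_true",
        help="Pass the ensure_ascii parameter to json.dumps()",
    )
    parser.add_argument(
        "--indent",
        type=int,
        default=2,
        help="Number of spaces for indentation (Default 2)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--indent", "4", "broken.json"])
print(args.filename, args.inline, args.ensure_ascii, args.indent)
# broken.json False False 4
```

Because `--ensure_ascii` is a `store_true` flag, the command line defaults to `ensure_ascii=False`, whereas `repair_json()` in the same file declares `ensure_ascii: bool = True`.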
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: json_repair
3
- Version: 0.28.3
3
+ Version: 0.29.0
4
4
  Summary: A package to repair broken json strings
5
5
  Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
6
6
  License: MIT License
@@ -45,6 +45,21 @@ This simple package can be used to fix an invalid json string. To know all cases
45
45
 
46
46
  Inspired by https://github.com/josdejong/jsonrepair
47
47
 
48
+ ---
49
+ # How to cite
50
+ If you are using this library in your academic work (as I know many folks are), please find the BibTeX here:
51
+
52
+ @software{Baccianella_JSON_Repair_-_2024,
53
+ author = {Baccianella, Stefano},
54
+ month = aug,
55
+ title = {{JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs}},
56
+ url = {https://github.com/mangiucugna/json_repair},
57
+ version = {0.28.3},
58
+ year = {2024}
59
+ }
60
+
61
+ Thank you for citing my work and please send me a link to the paper if you can!
62
+
48
63
  ---
49
64
  # Offer me a beer
50
65
  If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
@@ -82,6 +97,18 @@ or just
82
97
 
83
98
  decoded_object = json_repair.repair_json(json_string, return_objects=True)
84
99
 
100
+ ### Avoid this antipattern
101
+ Some users of this library adopt the following pattern:
102
+
103
+ obj = {}
104
+ try:
105
+ obj = json.loads(string)
106
+ except json.JSONDecodeError as e:
107
+ obj = json_repair.loads(string)
108
+ ...
109
+
110
+ This is wasteful because `json_repair` already checks whether the JSON is valid. If you still want to keep that pattern, add `skip_json_loads=True` to the `json_repair` call so the check is not repeated, as explained in the section below.
111
+
85
112
  ### Read json from a file or file descriptor
86
113
 
87
114
  JSON repair also provides a drop-in replacement for `json.load()`:
@@ -122,6 +149,32 @@ Some rules of thumb to use:
122
149
  - Setting `return_objects=True` will always be faster because the parser returns an object already and it doesn't have to serialize that object to JSON
123
150
  - `skip_json_loads` is faster only if you know for certain that the string is not valid JSON
124
151
  - If you are having issues with escaping, pass the string as a **raw** string like: `r"string with escaping\""`
152
+
153
+ ### Use json_repair from CLI
154
+
155
+ Install the library for command-line use with:
156
+ ```
157
+ pipx install json-repair
158
+ ```
159
+ then run
160
+ ```
161
+ $ json_repair -h
162
+
163
+ usage: json_repair [-h] [-i] [--ensure_ascii] [--indent INDENT] filename
164
+
165
+ Repair and parse JSON files.
166
+
167
+ positional arguments:
168
+ filename The JSON file to repair
169
+
170
+ options:
171
+ -h, --help show this help message and exit
172
+ -i, --inline Replace the file inline instead of returning the output to stdout
173
+ --ensure_ascii Pass the ensure_ascii parameter to json.dumps()
174
+ --indent INDENT Number of spaces for indentation (Default 2)
175
+ ```
176
+ to learn how to use it
177
+
125
178
  ## Adding to requirements
126
179
  **Please pin this library only on the major version!**
127
180
 
@@ -2,11 +2,13 @@ LICENSE
2
2
  README.md
3
3
  pyproject.toml
4
4
  src/json_repair/__init__.py
5
+ src/json_repair/__main__.py
5
6
  src/json_repair/json_repair.py
6
7
  src/json_repair/py.typed
7
8
  src/json_repair.egg-info/PKG-INFO
8
9
  src/json_repair.egg-info/SOURCES.txt
9
10
  src/json_repair.egg-info/dependency_links.txt
11
+ src/json_repair.egg-info/entry_points.txt
10
12
  src/json_repair.egg-info/top_level.txt
11
13
  tests/test_coverage.py
12
14
  tests/test_json_repair.py
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ json_repair = json_repair.__main__:cli
@@ -0,0 +1,265 @@
1
+ from src.json_repair.json_repair import from_file, repair_json, loads
2
+
3
+ def test_basic_types_valid():
4
+ assert repair_json("True", return_objects=True) == ""
5
+ assert repair_json("False", return_objects=True) == ""
6
+ assert repair_json("Null", return_objects=True) == ""
7
+ assert repair_json("1", return_objects=True) == 1
8
+ assert repair_json("[]", return_objects=True) == []
9
+ assert repair_json("[1, 2, 3, 4]", return_objects=True) == [1, 2, 3, 4]
10
+ assert repair_json("{}", return_objects=True) == {}
11
+ assert repair_json('{ "key": "value", "key2": 1, "key3": True }', return_objects=True) == { "key": "value", "key2": 1, "key3": True }
12
+
13
+ def test_basic_types_invalid():
14
+ assert repair_json("true", return_objects=True) == True
15
+ assert repair_json("false", return_objects=True) == False
16
+ assert repair_json("null", return_objects=True) == None
17
+ assert repair_json("1.2", return_objects=True) == 1.2
18
+ assert repair_json("[", return_objects=True) == []
19
+ assert repair_json("[1, 2, 3, 4", return_objects=True) == [1, 2, 3, 4]
20
+ assert repair_json("{", return_objects=True) == {}
21
+ assert repair_json('{ "key": value, "key2": 1 "key3": null }', return_objects=True) == { "key": "value", "key2": 1, "key3": None }
22
+
23
+ def test_valid_json():
24
+ assert (
25
+ repair_json('{"name": "John", "age": 30, "city": "New York"}')
26
+ == '{"name": "John", "age": 30, "city": "New York"}'
27
+ )
28
+ assert (
29
+ repair_json('{"employees":["John", "Anna", "Peter"]} ')
30
+ == '{"employees": ["John", "Anna", "Peter"]}'
31
+ )
32
+ assert repair_json('{"key": "value:value"}') == '{"key": "value:value"}'
33
+ assert (
34
+ repair_json('{"text": "The quick brown fox,"}')
35
+ == '{"text": "The quick brown fox,"}'
36
+ )
37
+ assert (
38
+ repair_json('{"text": "The quick brown fox won\'t jump"}')
39
+ == '{"text": "The quick brown fox won\'t jump"}'
40
+ )
41
+ assert repair_json('{"key": ""') == '{"key": ""}'
42
+ assert (
43
+ repair_json('{"key1": {"key2": [1, 2, 3]}}') == '{"key1": {"key2": [1, 2, 3]}}'
44
+ )
45
+ assert (
46
+ repair_json('{"key": 12345678901234567890}') == '{"key": 12345678901234567890}'
47
+ )
48
+ assert repair_json('{"key": "value\u263A"}') == '{"key": "value\\u263a"}'
49
+ assert repair_json('{"key": "value\\nvalue"}') == '{"key": "value\\nvalue"}'
50
+
51
+ def test_brackets_edge_cases():
52
+ assert repair_json("[{]") == "[{}]"
53
+ assert repair_json(" { } ") == "{}"
54
+ assert repair_json("[") == "[]"
55
+ assert repair_json("]") == '""'
56
+ assert repair_json("{") == "{}"
57
+ assert repair_json("}") == '""'
58
+ assert repair_json('{"') == '{}'
59
+ assert repair_json('["') == '[]'
60
+ assert repair_json('{foo: [}') == '{"foo": []}'
61
+
62
+ def test_general_edge_cases():
63
+ assert repair_json("\"") == '""'
64
+ assert repair_json("\n") == '""'
65
+ assert repair_json(" ") == '""'
66
+ assert repair_json("[[1\n\n]") == "[[1]]"
67
+ assert repair_json("string") == '""'
68
+ assert repair_json("stringbeforeobject {}") == '{}'
69
+
70
+ def test_mixed_data_types():
71
+ assert repair_json(' {"key": true, "key2": false, "key3": null}') == '{"key": true, "key2": false, "key3": null}'
72
+ assert repair_json('{"key": TRUE, "key2": FALSE, "key3": Null} ') == '{"key": true, "key2": false, "key3": null}'
73
+
74
+ def test_missing_and_mixed_quotes():
75
+ assert repair_json("{'key': 'string', 'key2': false, \"key3\": null, \"key4\": unquoted}") == '{"key": "string", "key2": false, "key3": null, "key4": "unquoted"}'
76
+ assert (
77
+ repair_json('{"name": "John", "age": 30, "city": "New York')
78
+ == '{"name": "John", "age": 30, "city": "New York"}'
79
+ )
80
+ assert (
81
+ repair_json('{"name": "John", "age": 30, city: "New York"}')
82
+ == '{"name": "John", "age": 30, "city": "New York"}'
83
+ )
84
+ assert (
85
+ repair_json('{"name": "John", "age": 30, "city": New York}')
86
+ == '{"name": "John", "age": 30, "city": "New York"}'
87
+ )
88
+ assert (
89
+ repair_json('{"name": John, "age": 30, "city": "New York"}')
90
+ == '{"name": "John", "age": 30, "city": "New York"}'
91
+ )
92
+ assert repair_json('{“slanted_delimiter”: "value"}') == '{"slanted_delimiter": "value"}'
93
+ assert (
94
+ repair_json('{"name": "John", "age": 30, "city": "New')
95
+ == '{"name": "John", "age": 30, "city": "New"}'
96
+ )
97
+ assert repair_json('[{"key": "value", COMMENT "notes": "lorem "ipsum", sic." }]') == '[{"key": "value", "notes": "lorem \\"ipsum\\", sic."}]'
98
+ assert repair_json('{"key": ""value"}') == '{"key": "value"}'
99
+ assert repair_json('{"key": "value", 5: "value"}') == '{"key": "value", "5": "value"}'
100
+ assert repair_json('{"foo": "\\"bar\\""') == '{"foo": "\\"bar\\""}'
101
+ assert repair_json('{"" key":"val"') == '{" key": "val"}'
102
+ assert repair_json('{"key": value "key2" : "value2" ') == '{"key": "value", "key2": "value2"}'
103
+
104
+ def test_array_edge_cases():
105
+ assert repair_json("[1, 2, 3,") == "[1, 2, 3]"
106
+ assert repair_json("[1, 2, 3, ...]") == "[1, 2, 3]"
107
+ assert repair_json("[1, 2, ... , 3]") == "[1, 2, 3]"
108
+ assert repair_json("[1, 2, '...', 3]") == '[1, 2, "...", 3]'
109
+ assert repair_json("[true, false, null, ...]") == '[true, false, null]'
110
+ assert repair_json('["a" "b" "c" 1') == '["a", "b", "c", 1]'
111
+ assert repair_json('{"employees":["John", "Anna",') == '{"employees": ["John", "Anna"]}'
112
+ assert repair_json('{"employees":["John", "Anna", "Peter') == '{"employees": ["John", "Anna", "Peter"]}'
113
+ assert repair_json('{"key1": {"key2": [1, 2, 3') == '{"key1": {"key2": [1, 2, 3]}}'
114
+
115
+ def test_escaping():
116
+ assert repair_json("'\"'") == '""'
117
+ assert repair_json("{\"key\": 'string\"\n\t\le'") == '{"key": "string\\"\\n\\tle"}'
118
+ assert repair_json(r'{"real_content": "Some string: Some other string \t Some string <a href=\"https://domain.com\">Some link</a>"') == r'{"real_content": "Some string: Some other string \t Some string <a href=\"https://domain.com\">Some link</a>"}'
119
+ assert repair_json('{"key_1\n": "value"}') == '{"key_1": "value"}'
120
+ assert repair_json('{"key\t_": "value"}') == '{"key\\t_": "value"}'
121
+
122
+
123
+ def test_object_edge_cases():
124
+ assert repair_json('{ ') == '{}'
125
+ assert repair_json('{"": "value"') == '{"": "value"}'
126
+ assert repair_json('{"value_1": true, COMMENT "value_2": "data"}') == '{"value_1": true, "value_2": "data"}'
127
+ assert repair_json('{"value_1": true, SHOULD_NOT_EXIST "value_2": "data" AAAA }') == '{"value_1": true, "value_2": "data"}'
128
+ assert repair_json('{"" : true, "key2": "value2"}') == '{"": true, "key2": "value2"}'
129
+ assert repair_json('{""answer"":[{""traits"":''Female aged 60+'',""answer1"":""5""}]}') == '{"answer": [{"traits": "Female aged 60+", "answer1": "5"}]}'
130
+ assert repair_json('{ "words": abcdef", "numbers": 12345", "words2": ghijkl" }') == '{"words": "abcdef", "numbers": 12345, "words2": "ghijkl"}'
131
+ assert repair_json('''{"number": 1,"reason": "According...""ans": "YES"}''') == '{"number": 1, "reason": "According...", "ans": "YES"}'
132
+ assert repair_json('''{ "a" : "{ b": {} }" }''') == '{"a": "{ b"}'
133
+ assert repair_json("""{"b": "xxxxx" true}""") == '{"b": "xxxxx"}'
134
+ assert repair_json('{"key": "Lorem "ipsum" s,"}') == '{"key": "Lorem \\"ipsum\\" s,"}'
135
+ assert repair_json('{"lorem": ipsum, sic, datum.",}') == '{"lorem": "ipsum, sic, datum."}'
136
+ assert repair_json('{"lorem": sic tamet. "ipsum": sic tamet, quick brown fox. "sic": ipsum}') == '{"lorem": "sic tamet.", "ipsum": "sic tamet", "sic": "ipsum"}'
137
+ assert repair_json('{"key":value, " key2":"value2" }') == '{"key": "value", " key2": "value2"}'
138
+ assert repair_json('{"key":value "key2":"value2" }') == '{"key": "value", "key2": "value2"}'
139
+
140
+ def test_number_edge_cases():
141
+ assert repair_json(' - { "test_key": ["test_value", "test_value2"] }') == '{"test_key": ["test_value", "test_value2"]}'
142
+ assert repair_json('{"key": 1/3}') == '{"key": "1/3"}'
143
+ assert repair_json('{"key": .25}') == '{"key": 0.25}'
144
+ assert repair_json('{"here": "now", "key": 1/3, "foo": "bar"}') == '{"here": "now", "key": "1/3", "foo": "bar"}'
145
+ assert repair_json('{"key": 12345/67890}') == '{"key": "12345/67890"}'
146
+ assert repair_json('[105,12') == '[105, 12]'
147
+ assert repair_json('{"key", 105,12,') == '{"key": "105,12"}'
148
+ assert repair_json('{"key": 1/3, "foo": "bar"}') == '{"key": "1/3", "foo": "bar"}'
149
+ assert repair_json('{"key": 10-20}') == '{"key": "10-20"}'
150
+ assert repair_json('{"key": 1.1.1}') == '{"key": "1.1.1"}'
151
+ assert repair_json('[- ') == '[]'
152
+
153
+ def test_markdown():
154
+ assert repair_json('{ "content": "[LINK]("https://google.com")" }') == '{"content": "[LINK](\\"https://google.com\\")"}'
155
+ assert repair_json('{ "content": "[LINK](" }') == '{"content": "[LINK]("}'
156
+ assert repair_json('{ "content": "[LINK](", "key": true }') == '{"content": "[LINK](", "key": true}'
157
+
158
+ def test_leading_trailing_characters():
159
+ assert repair_json('````{ "key": "value" }```') == '{"key": "value"}'
160
+ assert repair_json("""{ "a": "", "b": [ { "c": 1} ] \n}```""") == '{"a": "", "b": [{"c": 1}]}'
161
+ assert repair_json("Based on the information extracted, here is the filled JSON output: ```json { 'a': 'b' } ```") == '{"a": "b"}'
162
+ assert repair_json("""
163
+ The next 64 elements are:
164
+ ```json
165
+ { "key": "value" }
166
+ ```""") == '{"key": "value"}'
167
+ def test_multiple_jsons():
168
+ assert repair_json("[]{}") == "[[], {}]"
169
+ assert repair_json("{}[]{}") == "[{}, [], {}]"
170
+ assert repair_json('{"key":"value"}[1,2,3,True]') == '[{"key": "value"}, [1, 2, 3, true]]'
171
+ assert repair_json('lorem ```json {"key":"value"} ``` ipsum ```json [1,2,3,True] ``` 42') == '[{"key": "value"}, [1, 2, 3, true]]'
172
+
173
+ def test_repair_json_with_objects():
174
+ # Test with valid JSON strings
175
+ assert repair_json("[]", return_objects=True) == []
176
+ assert repair_json("{}", return_objects=True) == {}
177
+ assert repair_json('{"key": true, "key2": false, "key3": null}', return_objects=True) == {"key": True, "key2": False, "key3": None}
178
+ assert repair_json('{"name": "John", "age": 30, "city": "New York"}', return_objects=True) == {
179
+ "name": "John",
180
+ "age": 30,
181
+ "city": "New York",
182
+ }
183
+ assert repair_json("[1, 2, 3, 4]", return_objects=True) == [1, 2, 3, 4]
184
+ assert repair_json('{"employees":["John", "Anna", "Peter"]} ', return_objects=True) == {
185
+ "employees": ["John", "Anna", "Peter"]
186
+ }
187
+ assert repair_json('''
188
+ {
189
+ "resourceType": "Bundle",
190
+ "id": "1",
191
+ "type": "collection",
192
+ "entry": [
193
+ {
194
+ "resource": {
195
+ "resourceType": "Patient",
196
+ "id": "1",
197
+ "name": [
198
+ {"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."},
199
+ {"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}
200
+ ]
201
+ }
202
+ }
203
+ ]
204
+ }
205
+ ''', return_objects=True) == {"resourceType": "Bundle", "id": "1", "type": "collection", "entry": [{"resource": {"resourceType": "Patient", "id": "1", "name": [{"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}, {"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}]}}]}
206
+ assert repair_json('{\n"html": "<h3 id="aaa">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}', return_objects=True) == {'html': '<h3 id="aaa">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>'}
207
+ assert repair_json("""
208
+ [
209
+ {
210
+ "foo": "Foo bar baz",
211
+ "tag": "#foo-bar-baz"
212
+ },
213
+ {
214
+ "foo": "foo bar "foobar" foo bar baz.",
215
+ "tag": "#foo-bar-foobar"
216
+ }
217
+ ]
218
+ """, return_objects=True) == [{"foo": "Foo bar baz", "tag": "#foo-bar-baz"},{"foo": "foo bar \"foobar\" foo bar baz.", "tag": "#foo-bar-foobar" }]
219
+
220
+ def test_repair_json_skip_json_loads():
221
+ assert repair_json('{"key": true, "key2": false, "key3": null}', skip_json_loads=True) == '{"key": true, "key2": false, "key3": null}'
222
+ assert repair_json('{"key": true, "key2": false, "key3": null}', return_objects=True, skip_json_loads=True) == {"key": True, "key2": False, "key3": None}
223
+ assert repair_json('{"key": true, "key2": false, "key3": }', skip_json_loads=True) == '{"key": true, "key2": false, "key3": ""}'
224
+ assert loads('{"key": true, "key2": false, "key3": }', skip_json_loads=True) == {"key": True, "key2": False, "key3": ""}
225
+
226
+
227
+ def test_repair_json_from_file():
228
+ import os.path
229
+ import pathlib
230
+ import tempfile
231
+
232
+ path = pathlib.Path(__file__).parent.resolve()
233
+
234
+ # Use chunk_length 2 to test the buffering feature
235
+ assert from_file(filename=os.path.join(path,"invalid.json")) == [{"_id": "655b66256574f09bdae8abe8", "index": 0, "guid": "31082ae3-b0f3-4406-90f4-cc450bd4379d", "isActive": False, "balance": "$2,562.78", "picture": "http://placehold.it/32x32", "age": 32, "eyeColor": "brown", "name": "Glover Rivas", "gender": "male", "company": "EMPIRICA", "email": "gloverrivas@empirica.com", "phone": "+1 (842) 507-3063", "address": "536 Montague Terrace, Jenkinsville, Kentucky, 2235", "about": "Mollit consectetur excepteur voluptate tempor dolore ullamco enim irure ullamco non enim officia. Voluptate occaecat proident laboris ea Lorem cupidatat reprehenderit nisi nisi aliqua. Amet nulla ipsum deserunt excepteur amet ad aute aute ex. Et enim minim sit veniam est quis dolor nisi sunt quis eiusmod in. Amet eiusmod cillum sunt occaecat dolor laboris voluptate in eiusmod irure aliqua duis.", "registered": "2023-11-18T09:32:36 -01:00", "latitude": 36.26102, "longitude": -91.304608, "tags": ["non", "tempor", "do", "ullamco", "dolore", "sunt", "ipsum"], "friends": [{"id": 0, "name": "Cara Shepherd"}, {"id": 1, "name": "Mason Farley"}, {"id": 2, "name": "Harriet Cochran"}], "greeting": "Hello, Glover Rivas! You have 7 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b662585364bc57278bb6f", "index": 1, "guid": "0dea7a3a-f812-4dde-b78d-7a9b58e5da05", "isActive": True, "balance": "$1,359.48", "picture": "http://placehold.it/32x32", "age": 38, "eyeColor": "brown", "name": "Brandi Moreno", "gender": "female", "company": "MARQET", "email": "brandimoreno@marqet.com", "phone": "+1 (850) 434-2077", "address": "537 Doone Court, Waiohinu, Michigan, 3215", "about": "Irure proident adipisicing do Lorem do incididunt in laborum in eiusmod eiusmod ad elit proident. Eiusmod dolor ex magna magna occaecat. Nulla deserunt velit ex exercitation et irure sunt. Cupidatat ut excepteur ea quis labore sint cupidatat incididunt amet eu consectetur cillum ipsum proident. 
Occaecat exercitation aute laborum dolor proident reprehenderit laborum in voluptate culpa. Exercitation nulla adipisicing culpa aute est deserunt ea nisi deserunt consequat occaecat ut et non. Incididunt ex exercitation dolor dolor anim cillum dolore.", "registered": "2015-09-03T11:47:15 -02:00", "latitude": -19.768953, "longitude": 8.948458, "tags": ["laboris", "occaecat", "laborum", "laborum", "ex", "cillum", "occaecat"], "friends": [{"id": 0, "name": "Erna Kelly"}, {"id": 1, "name": "Black Mays"}, {"id": 2, "name": "Davis Buck"}], "greeting": "Hello, Brandi Moreno! You have 1 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625870da431bcf5e0c2", "index": 2, "guid": "b17f6e3f-c898-4334-abbf-05cf222f143b", "isActive": False, "balance": "$1,493.77", "picture": "http://placehold.it/32x32", "age": 20, "eyeColor": "brown", "name": "Moody Meadows", "gender": "male", "company": "OPTIQUE", "email": "moodymeadows@optique.com", "phone": "+1 (993) 566-3041", "address": "766 Osborn Street, Bath, Maine, 7666", "about": "Non commodo excepteur nostrud qui adipisicing aliquip dolor minim nulla culpa proident. In ad cupidatat ea mollit ex est do deserunt proident nostrud. Cillum id id eiusmod amet exercitation nostrud cillum sunt deserunt dolore deserunt eiusmod mollit. Ut ex tempor ad laboris voluptate labore id officia fugiat exercitation amet.", "registered": "2015-01-16T02:48:28 -01:00", "latitude": -25.847327, "longitude": 63.95991, "tags": ["aute", "commodo", "adipisicing", "nostrud", "duis", "mollit", "ut"], "friends": [{"id": 0, "name": "Lacey Cash"}, {"id": 1, "name": "Gabrielle Harmon"}, {"id": 2, "name": "Ellis Lambert"}], "greeting": "Hello, Moody Meadows! 
You have 4 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b6625f3e1bf422220854e", "index": 3, "guid": "92229883-2bfd-4974-a08c-1b506b372e46", "isActive": False, "balance": "$2,215.34", "picture": "http://placehold.it/32x32", "age": 22, "eyeColor": "brown", "name": "Heath Nguyen", "gender": "male", "company": "BLEENDOT", "email": "heathnguyen@bleendot.com", "phone": "+1 (989) 512-2797", "address": "135 Milton Street, Graniteville, Nebraska, 276", "about": "Consequat aliquip irure Lorem cupidatat nulla magna ullamco nulla voluptate adipisicing anim consectetur tempor aliquip. Magna aliqua nulla eu tempor esse proident. Proident fugiat ad ex Lorem reprehenderit dolor aliquip labore labore aliquip. Deserunt aute enim ea minim officia anim culpa sint commodo. Cillum consectetur excepteur aliqua exercitation Lorem veniam voluptate.", "registered": "2016-07-06T01:31:07 -02:00", "latitude": -60.997048, "longitude": -102.397885, "tags": ["do", "ad", "consequat", "irure", "tempor", "elit", "minim"], "friends": [{"id": 0, "name": "Walker Hernandez"}, {"id": 1, "name": "Maria Lane"}, {"id": 2, "name": "Mcknight Barron"}], "greeting": "Hello, Heath Nguyen! You have 4 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625519a5b5e4b6742bf", "index": 4, "guid": "c5dc685f-6d0d-4173-b4cf-f5df29a1e8ef", "isActive": True, "balance": "$1,358.90", "picture": "http://placehold.it/32x32", "age": 33, "eyeColor": "brown", "name": "Deidre Duke", "gender": "female", "company": "OATFARM", "email": "deidreduke@oatfarm.com", "phone": "+1 (875) 587-3256", "address": "487 Schaefer Street, Wattsville, West Virginia, 4506", "about": "Laboris eu nulla esse magna sit eu deserunt non est aliqua exercitation commodo. Ad occaecat qui qui laborum dolore anim Lorem. Est qui occaecat irure enim deserunt enim aliqua ex deserunt incididunt esse. Quis in minim laboris proident non mollit. Magna ea do labore commodo. 
Et elit esse esse occaecat officia ipsum nisi.", "registered": "2021-09-12T04:17:08 -02:00", "latitude": 68.609781, "longitude": -87.509134, "tags": ["mollit", "cupidatat", "irure", "sit", "consequat", "anim", "fugiat"], "friends": [{"id": 0, "name": "Bean Paul"}, {"id": 1, "name": "Cochran Hubbard"}, {"id": 2, "name": "Rodgers Atkinson"}], "greeting": "Hello, Deidre Duke! You have 6 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625a19b3f7e5f82f0ea", "index": 5, "guid": "75f3c264-baa1-47a0-b21c-4edac23d9935", "isActive": True, "balance": "$3,554.36", "picture": "http://placehold.it/32x32", "age": 26, "eyeColor": "blue", "name": "Lydia Holland", "gender": "female", "company": "ESCENTA", "email": "lydiaholland@escenta.com", "phone": "+1 (927) 482-3436", "address": "554 Rockaway Parkway, Kohatk, Montana, 6316", "about": "Consectetur ea est labore commodo laborum mollit pariatur non enim. Est dolore et non laboris tempor. Ea incididunt ut adipisicing cillum labore officia tempor eiusmod commodo. Cillum fugiat ex consectetur ut nostrud anim nostrud exercitation ut duis in ea. Eu et id fugiat est duis eiusmod ullamco quis officia minim sint ea nisi in.", "registered": "2018-03-13T01:48:56 -01:00", "latitude": -88.495799, "longitude": 71.840667, "tags": ["veniam", "minim", "consequat", "consequat", "incididunt", "consequat", "elit"], "friends": [{"id": 0, "name": "Debra Massey"}, {"id": 1, "name": "Weiss Savage"}, {"id": 2, "name": "Shannon Guerra"}], "greeting": "Hello, Lydia Holland! You have 5 unread messages.", "favoriteFruit": "banana"}]
+ assert from_file(filename=os.path.join(path,"invalid.json"), chunk_length=2) == [{"_id": "655b66256574f09bdae8abe8", "index": 0, "guid": "31082ae3-b0f3-4406-90f4-cc450bd4379d", "isActive": False, "balance": "$2,562.78", "picture": "http://placehold.it/32x32", "age": 32, "eyeColor": "brown", "name": "Glover Rivas", "gender": "male", "company": "EMPIRICA", "email": "gloverrivas@empirica.com", "phone": "+1 (842) 507-3063", "address": "536 Montague Terrace, Jenkinsville, Kentucky, 2235", "about": "Mollit consectetur excepteur voluptate tempor dolore ullamco enim irure ullamco non enim officia. Voluptate occaecat proident laboris ea Lorem cupidatat reprehenderit nisi nisi aliqua. Amet nulla ipsum deserunt excepteur amet ad aute aute ex. Et enim minim sit veniam est quis dolor nisi sunt quis eiusmod in. Amet eiusmod cillum sunt occaecat dolor laboris voluptate in eiusmod irure aliqua duis.", "registered": "2023-11-18T09:32:36 -01:00", "latitude": 36.26102, "longitude": -91.304608, "tags": ["non", "tempor", "do", "ullamco", "dolore", "sunt", "ipsum"], "friends": [{"id": 0, "name": "Cara Shepherd"}, {"id": 1, "name": "Mason Farley"}, {"id": 2, "name": "Harriet Cochran"}], "greeting": "Hello, Glover Rivas! You have 7 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b662585364bc57278bb6f", "index": 1, "guid": "0dea7a3a-f812-4dde-b78d-7a9b58e5da05", "isActive": True, "balance": "$1,359.48", "picture": "http://placehold.it/32x32", "age": 38, "eyeColor": "brown", "name": "Brandi Moreno", "gender": "female", "company": "MARQET", "email": "brandimoreno@marqet.com", "phone": "+1 (850) 434-2077", "address": "537 Doone Court, Waiohinu, Michigan, 3215", "about": "Irure proident adipisicing do Lorem do incididunt in laborum in eiusmod eiusmod ad elit proident. Eiusmod dolor ex magna magna occaecat. Nulla deserunt velit ex exercitation et irure sunt. Cupidatat ut excepteur ea quis labore sint cupidatat incididunt amet eu consectetur cillum ipsum proident. 
Occaecat exercitation aute laborum dolor proident reprehenderit laborum in voluptate culpa. Exercitation nulla adipisicing culpa aute est deserunt ea nisi deserunt consequat occaecat ut et non. Incididunt ex exercitation dolor dolor anim cillum dolore.", "registered": "2015-09-03T11:47:15 -02:00", "latitude": -19.768953, "longitude": 8.948458, "tags": ["laboris", "occaecat", "laborum", "laborum", "ex", "cillum", "occaecat"], "friends": [{"id": 0, "name": "Erna Kelly"}, {"id": 1, "name": "Black Mays"}, {"id": 2, "name": "Davis Buck"}], "greeting": "Hello, Brandi Moreno! You have 1 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625870da431bcf5e0c2", "index": 2, "guid": "b17f6e3f-c898-4334-abbf-05cf222f143b", "isActive": False, "balance": "$1,493.77", "picture": "http://placehold.it/32x32", "age": 20, "eyeColor": "brown", "name": "Moody Meadows", "gender": "male", "company": "OPTIQUE", "email": "moodymeadows@optique.com", "phone": "+1 (993) 566-3041", "address": "766 Osborn Street, Bath, Maine, 7666", "about": "Non commodo excepteur nostrud qui adipisicing aliquip dolor minim nulla culpa proident. In ad cupidatat ea mollit ex est do deserunt proident nostrud. Cillum id id eiusmod amet exercitation nostrud cillum sunt deserunt dolore deserunt eiusmod mollit. Ut ex tempor ad laboris voluptate labore id officia fugiat exercitation amet.", "registered": "2015-01-16T02:48:28 -01:00", "latitude": -25.847327, "longitude": 63.95991, "tags": ["aute", "commodo", "adipisicing", "nostrud", "duis", "mollit", "ut"], "friends": [{"id": 0, "name": "Lacey Cash"}, {"id": 1, "name": "Gabrielle Harmon"}, {"id": 2, "name": "Ellis Lambert"}], "greeting": "Hello, Moody Meadows! 
You have 4 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b6625f3e1bf422220854e", "index": 3, "guid": "92229883-2bfd-4974-a08c-1b506b372e46", "isActive": False, "balance": "$2,215.34", "picture": "http://placehold.it/32x32", "age": 22, "eyeColor": "brown", "name": "Heath Nguyen", "gender": "male", "company": "BLEENDOT", "email": "heathnguyen@bleendot.com", "phone": "+1 (989) 512-2797", "address": "135 Milton Street, Graniteville, Nebraska, 276", "about": "Consequat aliquip irure Lorem cupidatat nulla magna ullamco nulla voluptate adipisicing anim consectetur tempor aliquip. Magna aliqua nulla eu tempor esse proident. Proident fugiat ad ex Lorem reprehenderit dolor aliquip labore labore aliquip. Deserunt aute enim ea minim officia anim culpa sint commodo. Cillum consectetur excepteur aliqua exercitation Lorem veniam voluptate.", "registered": "2016-07-06T01:31:07 -02:00", "latitude": -60.997048, "longitude": -102.397885, "tags": ["do", "ad", "consequat", "irure", "tempor", "elit", "minim"], "friends": [{"id": 0, "name": "Walker Hernandez"}, {"id": 1, "name": "Maria Lane"}, {"id": 2, "name": "Mcknight Barron"}], "greeting": "Hello, Heath Nguyen! You have 4 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625519a5b5e4b6742bf", "index": 4, "guid": "c5dc685f-6d0d-4173-b4cf-f5df29a1e8ef", "isActive": True, "balance": "$1,358.90", "picture": "http://placehold.it/32x32", "age": 33, "eyeColor": "brown", "name": "Deidre Duke", "gender": "female", "company": "OATFARM", "email": "deidreduke@oatfarm.com", "phone": "+1 (875) 587-3256", "address": "487 Schaefer Street, Wattsville, West Virginia, 4506", "about": "Laboris eu nulla esse magna sit eu deserunt non est aliqua exercitation commodo. Ad occaecat qui qui laborum dolore anim Lorem. Est qui occaecat irure enim deserunt enim aliqua ex deserunt incididunt esse. Quis in minim laboris proident non mollit. Magna ea do labore commodo. 
Et elit esse esse occaecat officia ipsum nisi.", "registered": "2021-09-12T04:17:08 -02:00", "latitude": 68.609781, "longitude": -87.509134, "tags": ["mollit", "cupidatat", "irure", "sit", "consequat", "anim", "fugiat"], "friends": [{"id": 0, "name": "Bean Paul"}, {"id": 1, "name": "Cochran Hubbard"}, {"id": 2, "name": "Rodgers Atkinson"}], "greeting": "Hello, Deidre Duke! You have 6 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625a19b3f7e5f82f0ea", "index": 5, "guid": "75f3c264-baa1-47a0-b21c-4edac23d9935", "isActive": True, "balance": "$3,554.36", "picture": "http://placehold.it/32x32", "age": 26, "eyeColor": "blue", "name": "Lydia Holland", "gender": "female", "company": "ESCENTA", "email": "lydiaholland@escenta.com", "phone": "+1 (927) 482-3436", "address": "554 Rockaway Parkway, Kohatk, Montana, 6316", "about": "Consectetur ea est labore commodo laborum mollit pariatur non enim. Est dolore et non laboris tempor. Ea incididunt ut adipisicing cillum labore officia tempor eiusmod commodo. Cillum fugiat ex consectetur ut nostrud anim nostrud exercitation ut duis in ea. Eu et id fugiat est duis eiusmod ullamco quis officia minim sint ea nisi in.", "registered": "2018-03-13T01:48:56 -01:00", "latitude": -88.495799, "longitude": 71.840667, "tags": ["veniam", "minim", "consequat", "consequat", "incididunt", "consequat", "elit"], "friends": [{"id": 0, "name": "Debra Massey"}, {"id": 1, "name": "Weiss Savage"}, {"id": 2, "name": "Shannon Guerra"}], "greeting": "Hello, Lydia Holland! You have 5 unread messages.", "favoriteFruit": "banana"}]
+
+
+     # Create a temporary file
+     temp_fd, temp_path = tempfile.mkstemp(suffix=".json")
+     try:
+         # Write content to the temporary file
+         with os.fdopen(temp_fd, 'w') as tmp:
+             tmp.write("{key:value}")
+         assert from_file(filename=temp_path, logging=True) == ({'key': 'value'}, [{'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object key context, we found a :, stopping here',}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}, {'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object value context, we found a , or } and we couldn\'t determine that a right delimiter was present. Stopping here'}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}])
+         assert from_file(filename=temp_path, logging=True, chunk_length=2) == ({'key': 'value'}, [{'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object key context, we found a :, stopping here',}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}, {'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object value context, we found a , or } and we couldn\'t determine that a right delimiter was present. Stopping here'}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}])
+     finally:
+         # Clean up - delete the temporary file
+         os.remove(temp_path)
+
+     # Create a temporary file
+     temp_fd, temp_path = tempfile.mkstemp(suffix=".json")
+     try:
+         # Write content to the temporary file
+         with os.fdopen(temp_fd, 'w') as tmp:
+             tmp.write('x' * 5 * 1024 * 1024) # 5 MB
+         assert from_file(filename=temp_path, logging=True) == ('', [])
+
+     finally:
+         # Clean up - delete the temporary file
+         os.remove(temp_path)
+
+
+ def test_ensure_ascii():
+     assert repair_json("{'test_中国人_ascii':'统一码'}", ensure_ascii=False) == '{"test_中国人_ascii": "统一码"}'
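The `test_ensure_ascii` case above passes `ensure_ascii=False` to `repair_json` and expects the non-ASCII key and value to come back unescaped. The standard library's `json.dumps` has a flag with the same name. The sketch below uses only `json.dumps` to show what the two settings produce, on the assumption (not confirmed by this diff) that `repair_json` hands the flag through to it.

```python
import json

# Same payload as in the test above, already parsed into a Python dict.
payload = {"test_中国人_ascii": "统一码"}

# Default behaviour: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are emitted as they are.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object, so the flag only affects how the serialized text looks, not what it means.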
@@ -19,7 +19,7 @@ def test_true_true_correct(benchmark):
      mean_time = benchmark.stats.get("median")

      # Define your time threshold in seconds
-     max_time = 14 / 10 ** 4 # 1.4 millisecond
+     max_time = 15 / 10 ** 4 # 1.5 millisecond

      # Assert that the average time is below the threshold
      assert mean_time < max_time, f"Benchmark exceeded threshold: {mean_time:.3f}s > {max_time:.3f}s"
@@ -31,7 +31,7 @@ def test_true_true_incorrect(benchmark):
      mean_time = benchmark.stats.get("median")

      # Define your time threshold in seconds
-     max_time = 14 / 10 ** 4 # 1.4 millisecond
+     max_time = 15 / 10 ** 4 # 1.5 millisecond

      # Assert that the average time is below the threshold
      assert mean_time < max_time, f"Benchmark exceeded threshold: {mean_time:.3f}s > {max_time:.3f}s"
@@ -53,7 +53,7 @@ def test_true_false_incorrect(benchmark):
      mean_time = benchmark.stats.get("median")

      # Define your time threshold in seconds
-     max_time = 14 / 10 ** 4 # 1.4 millisecond
+     max_time = 15 / 10 ** 4 # 1.5 millisecond

      # Assert that the average time is below the threshold
      assert mean_time < max_time, f"Benchmark exceeded threshold: {mean_time:.3f}s > {max_time:.3f}s"
@@ -64,7 +64,7 @@ def test_false_true_correct(benchmark):
      mean_time = benchmark.stats.get("median")

      # Define your time threshold in seconds
-     max_time = 14 / 10 ** 4 # 1.4 millisecond
+     max_time = 15 / 10 ** 4 # 1.5 millisecond

      # Assert that the average time is below the threshold
      assert mean_time < max_time, f"Benchmark exceeded threshold: {mean_time:.3f}s > {max_time:.3f}s"
@@ -75,7 +75,7 @@ def test_false_true_incorrect(benchmark):
      mean_time = benchmark.stats.get("median")

      # Define your time threshold in seconds
-     max_time = 14 / 10 ** 4 # 1.4 millisecond
+     max_time = 15 / 10 ** 4 # 1.5 millisecond

      # Assert that the average time is below the threshold
      assert mean_time < max_time, f"Benchmark exceeded threshold: {mean_time:.3f}s > {max_time:.3f}s"
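The five hunks above all make the same change: the ceiling on the median benchmark time moves from 1.4 ms to 1.5 ms. The `.3f` format in the assertion message rounds both values to whole milliseconds, so a near miss can print as two identical numbers. Below is a minimal sketch of the same kind of check that reports in milliseconds instead. It uses only `timeit` and `statistics` rather than the `benchmark` fixture, and `median_seconds`, `check_threshold`, and the sample workload are hypothetical names for illustration, not part of the package's test suite.

```python
import statistics
import timeit


def median_seconds(func, repeat=15, number=100):
    """Return the median per-call wall time of func, in seconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return statistics.median(total / number for total in totals)


def check_threshold(median, max_time):
    """Raise AssertionError unless median (seconds) is below max_time (seconds)."""
    assert median < max_time, (
        f"Benchmark exceeded threshold: "
        f"{median * 1e3:.3f} ms > {max_time * 1e3:.3f} ms"
    )


if __name__ == "__main__":
    max_time = 15 / 10 ** 4  # 1.5 milliseconds, as in the updated tests
    # Hypothetical workload standing in for the benchmarked repair call.
    median = median_seconds(lambda: sorted(range(1000, 0, -1)))
    check_threshold(median, max_time)
    print(f"median: {median * 1e3:.3f} ms")
```

Taking the median rather than the mean keeps a single slow run from failing the check, which matches the `benchmark.stats.get("median")` lookup in the tests even though the variable there is called `mean_time`.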
@@ -1,247 +0,0 @@
- from src.json_repair.json_repair import from_file, repair_json, loads
-
- def test_basic_types_valid():
-     assert repair_json("True", return_objects=True) == ""
-     assert repair_json("False", return_objects=True) == ""
-     assert repair_json("Null", return_objects=True) == ""
-     assert repair_json("1", return_objects=True) == 1
-     assert repair_json("[]", return_objects=True) == []
-     assert repair_json("[1, 2, 3, 4]", return_objects=True) == [1, 2, 3, 4]
-     assert repair_json("{}", return_objects=True) == {}
-     assert repair_json('{ "key": "value", "key2": 1, "key3": True }', return_objects=True) == { "key": "value", "key2": 1, "key3": True }
-
- def test_basic_types_invalid():
-     assert repair_json("true", return_objects=True) == True
-     assert repair_json("false", return_objects=True) == False
-     assert repair_json("null", return_objects=True) == None
-     assert repair_json("1.2", return_objects=True) == 1.2
-     assert repair_json("[", return_objects=True) == []
-     assert repair_json("[1, 2, 3, 4", return_objects=True) == [1, 2, 3, 4]
-     assert repair_json("{", return_objects=True) == {}
-     assert repair_json('{ "key": value, "key2": 1 "key3": null }', return_objects=True) == { "key": "value", "key2": 1, "key3": None }
-
- def test_valid_json():
-     assert (
-         repair_json('{"name": "John", "age": 30, "city": "New York"}')
-         == '{"name": "John", "age": 30, "city": "New York"}'
-     )
-     assert (
-         repair_json('{"employees":["John", "Anna", "Peter"]} ')
-         == '{"employees": ["John", "Anna", "Peter"]}'
-     )
-     assert repair_json('{"key": "value:value"}') == '{"key": "value:value"}'
-     assert (
-         repair_json('{"text": "The quick brown fox,"}')
-         == '{"text": "The quick brown fox,"}'
-     )
-     assert (
-         repair_json('{"text": "The quick brown fox won\'t jump"}')
-         == '{"text": "The quick brown fox won\'t jump"}'
-     )
-     assert repair_json('{"key": ""') == '{"key": ""}'
-     assert (
-         repair_json('{"key1": {"key2": [1, 2, 3]}}') == '{"key1": {"key2": [1, 2, 3]}}'
-     )
-     assert (
-         repair_json('{"key": 12345678901234567890}') == '{"key": 12345678901234567890}'
-     )
-     assert repair_json('{"key": "value\u263A"}') == '{"key": "value\\u263a"}'
-     assert repair_json('{"key": "value\\nvalue"}') == '{"key": "value\\nvalue"}'
-
- def test_brackets_edge_cases():
-     assert repair_json("[{]") == "[{}]"
-     assert repair_json(" { } ") == "{}"
-     assert repair_json("[") == "[]"
-     assert repair_json("]") == '""'
-     assert repair_json("{") == "{}"
-     assert repair_json("}") == '""'
-     assert repair_json('{"') == '{}'
-     assert repair_json('["') == '[]'
-     assert repair_json('{foo: [}') == '{"foo": []}'
-
- def test_general_edge_cases():
-     assert repair_json("\"") == '""'
-     assert repair_json("\n") == '""'
-     assert repair_json(" ") == '""'
-     assert repair_json("[[1\n\n]") == "[[1]]"
-     assert repair_json("string") == '""'
-     assert repair_json("stringbeforeobject {}") == '{}'
-
- def test_mixed_data_types():
-     assert repair_json(' {"key": true, "key2": false, "key3": null}') == '{"key": true, "key2": false, "key3": null}'
-     assert repair_json('{"key": TRUE, "key2": FALSE, "key3": Null} ') == '{"key": true, "key2": false, "key3": null}'
-
- def test_missing_and_mixed_quotes():
-     assert repair_json("{'key': 'string', 'key2': false, \"key3\": null, \"key4\": unquoted}") == '{"key": "string", "key2": false, "key3": null, "key4": "unquoted"}'
-     assert (
-         repair_json('{"name": "John", "age": 30, "city": "New York')
-         == '{"name": "John", "age": 30, "city": "New York"}'
-     )
-     assert (
-         repair_json('{"name": "John", "age": 30, city: "New York"}')
-         == '{"name": "John", "age": 30, "city": "New York"}'
-     )
-     assert (
-         repair_json('{"name": "John", "age": 30, "city": New York}')
-         == '{"name": "John", "age": 30, "city": "New York"}'
-     )
-     assert (
-         repair_json('{"name": John, "age": 30, "city": "New York"}')
-         == '{"name": "John", "age": 30, "city": "New York"}'
-     )
-     assert repair_json('{“slanted_delimiter”: "value"}') == '{"slanted_delimiter": "value"}'
-     assert (
-         repair_json('{"name": "John", "age": 30, "city": "New')
-         == '{"name": "John", "age": 30, "city": "New"}'
-     )
-     assert repair_json('[{"key": "value", COMMENT "notes": "lorem "ipsum", sic." }]') == '[{"key": "value", "notes": "lorem \\"ipsum\\", sic."}]'
-     assert repair_json('{"key": ""value"}') == '{"key": "value"}'
-     assert repair_json('{"key": "value", 5: "value"}') == '{"key": "value", "5": "value"}'
-     assert repair_json('{"foo": "\\"bar\\""') == '{"foo": "\\"bar\\""}'
-     assert repair_json('{"" key":"val"') == '{" key": "val"}'
-     assert repair_json('{"key": value "key2" : "value2" ') == '{"key": "value", "key2": "value2"}'
-
- def test_array_edge_cases():
-     assert repair_json("[1, 2, 3,") == "[1, 2, 3]"
-     assert repair_json("[1, 2, 3, ...]") == "[1, 2, 3]"
-     assert repair_json("[1, 2, ... , 3]") == "[1, 2, 3]"
-     assert repair_json("[1, 2, '...', 3]") == '[1, 2, "...", 3]'
-     assert repair_json("[true, false, null, ...]") == '[true, false, null]'
-     assert repair_json('["a" "b" "c" 1') == '["a", "b", "c", 1]'
-     assert repair_json('{"employees":["John", "Anna",') == '{"employees": ["John", "Anna"]}'
-     assert repair_json('{"employees":["John", "Anna", "Peter') == '{"employees": ["John", "Anna", "Peter"]}'
-     assert repair_json('{"key1": {"key2": [1, 2, 3') == '{"key1": {"key2": [1, 2, 3]}}'
-
- def test_escaping():
-     assert repair_json("'\"'") == '""'
-     assert repair_json("{\"key\": 'string\"\n\t\le'") == '{"key": "string\\"\\n\\tle"}'
-     assert repair_json(r'{"real_content": "Some string: Some other string \t Some string <a href=\"https://domain.com\">Some link</a>"') == r'{"real_content": "Some string: Some other string \t Some string <a href=\"https://domain.com\">Some link</a>"}'
-     assert repair_json('{"key_1\n": "value"}') == '{"key_1": "value"}'
-     assert repair_json('{"key\t_": "value"}') == '{"key\\t_": "value"}'
-
-
- def test_object_edge_cases():
-     assert repair_json('{ ') == '{}'
-     assert repair_json('{"": "value"') == '{"": "value"}'
-     assert repair_json('{"value_1": true, COMMENT "value_2": "data"}') == '{"value_1": true, "value_2": "data"}'
-     assert repair_json('{"value_1": true, SHOULD_NOT_EXIST "value_2": "data" AAAA }') == '{"value_1": true, "value_2": "data"}'
-     assert repair_json('{"" : true, "key2": "value2"}') == '{"": true, "key2": "value2"}'
-     assert repair_json('{""answer"":[{""traits"":''Female aged 60+'',""answer1"":""5""}]}') == '{"answer": [{"traits": "Female aged 60+", "answer1": "5"}]}'
-     assert repair_json('{ "words": abcdef", "numbers": 12345", "words2": ghijkl" }') == '{"words": "abcdef", "numbers": 12345, "words2": "ghijkl"}'
-     assert repair_json('''{"number": 1,"reason": "According...""ans": "YES"}''') == '{"number": 1, "reason": "According...", "ans": "YES"}'
-     assert repair_json('''{ "a" : "{ b": {} }" }''') == '{"a": "{ b"}'
-     assert repair_json("""{"b": "xxxxx" true}""") == '{"b": "xxxxx"}'
-     assert repair_json('{"key": "Lorem "ipsum" s,"}') == '{"key": "Lorem \\"ipsum\\" s,"}'
-     assert repair_json('{"lorem": ipsum, sic, datum.",}') == '{"lorem": "ipsum, sic, datum."}'
-     assert repair_json('{"lorem": sic tamet. "ipsum": sic tamet, quick brown fox. "sic": ipsum}') == '{"lorem": "sic tamet.", "ipsum": "sic tamet", "sic": "ipsum"}'
-     assert repair_json('{"key":value, " key2":"value2" }') == '{"key": "value", " key2": "value2"}'
-     assert repair_json('{"key":value "key2":"value2" }') == '{"key": "value", "key2": "value2"}'
-
- def test_number_edge_cases():
-     assert repair_json(' - { "test_key": ["test_value", "test_value2"] }') == '{"test_key": ["test_value", "test_value2"]}'
-     assert repair_json('{"key": 1/3}') == '{"key": "1/3"}'
-     assert repair_json('{"key": .25}') == '{"key": 0.25}'
-     assert repair_json('{"here": "now", "key": 1/3, "foo": "bar"}') == '{"here": "now", "key": "1/3", "foo": "bar"}'
-     assert repair_json('{"key": 12345/67890}') == '{"key": "12345/67890"}'
-     assert repair_json('[105,12') == '[105, 12]'
-     assert repair_json('{"key", 105,12,') == '{"key": "105,12"}'
-     assert repair_json('{"key": 1/3, "foo": "bar"}') == '{"key": "1/3", "foo": "bar"}'
-     assert repair_json('{"key": 10-20}') == '{"key": "10-20"}'
-     assert repair_json('{"key": 1.1.1}') == '{"key": "1.1.1"}'
-     assert repair_json('[- ') == '[]'
-
- def test_markdown():
-     assert repair_json('{ "content": "[LINK]("https://google.com")" }') == '{"content": "[LINK](\\"https://google.com\\")"}'
-     assert repair_json('{ "content": "[LINK](" }') == '{"content": "[LINK]("}'
-     assert repair_json('{ "content": "[LINK](", "key": true }') == '{"content": "[LINK](", "key": true}'
-
- def test_leading_trailing_characters():
-     assert repair_json('````{ "key": "value" }```') == '{"key": "value"}'
-     assert repair_json("""{ "a": "", "b": [ { "c": 1} ] \n}```""") == '{"a": "", "b": [{"c": 1}]}'
-     assert repair_json("Based on the information extracted, here is the filled JSON output: ```json { 'a': 'b' } ```") == '{"a": "b"}'
-     assert repair_json("""
- The next 64 elements are:
- ```json
- { "key": "value" }
- ```""") == '{"key": "value"}'
- def test_multiple_jsons():
-     assert repair_json("[]{}") == "[[], {}]"
-     assert repair_json("{}[]{}") == "[{}, [], {}]"
-     assert repair_json('{"key":"value"}[1,2,3,True]') == '[{"key": "value"}, [1, 2, 3, true]]'
-     assert repair_json('lorem ```json {"key":"value"} ``` ipsum ```json [1,2,3,True] ``` 42') == '[{"key": "value"}, [1, 2, 3, true]]'
-
- def test_repair_json_with_objects():
-     # Test with valid JSON strings
-     assert repair_json("[]", return_objects=True) == []
-     assert repair_json("{}", return_objects=True) == {}
-     assert repair_json('{"key": true, "key2": false, "key3": null}', return_objects=True) == {"key": True, "key2": False, "key3": None}
-     assert repair_json('{"name": "John", "age": 30, "city": "New York"}', return_objects=True) == {
-         "name": "John",
-         "age": 30,
-         "city": "New York",
-     }
-     assert repair_json("[1, 2, 3, 4]", return_objects=True) == [1, 2, 3, 4]
-     assert repair_json('{"employees":["John", "Anna", "Peter"]} ', return_objects=True) == {
-         "employees": ["John", "Anna", "Peter"]
-     }
-     assert repair_json('''
- {
- "resourceType": "Bundle",
- "id": "1",
- "type": "collection",
- "entry": [
- {
- "resource": {
- "resourceType": "Patient",
- "id": "1",
- "name": [
- {"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."},
- {"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}
- ]
- }
- }
- ]
- }
- ''', return_objects=True) == {"resourceType": "Bundle", "id": "1", "type": "collection", "entry": [{"resource": {"resourceType": "Patient", "id": "1", "name": [{"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}, {"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}]}}]}
-     assert repair_json('{\n"html": "<h3 id="aaa">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}', return_objects=True) == {'html': '<h3 id="aaa">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>'}
-     assert repair_json("""
- [
- {
- "foo": "Foo bar baz",
- "tag": "#foo-bar-baz"
- },
- {
- "foo": "foo bar "foobar" foo bar baz.",
- "tag": "#foo-bar-foobar"
- }
- ]
- """, return_objects=True) == [{"foo": "Foo bar baz", "tag": "#foo-bar-baz"},{"foo": "foo bar \"foobar\" foo bar baz.", "tag": "#foo-bar-foobar" }]
-
- def test_repair_json_skip_json_loads():
-     assert repair_json('{"key": true, "key2": false, "key3": null}', skip_json_loads=True) == '{"key": true, "key2": false, "key3": null}'
-     assert repair_json('{"key": true, "key2": false, "key3": null}', return_objects=True, skip_json_loads=True) == {"key": True, "key2": False, "key3": None}
-     assert repair_json('{"key": true, "key2": false, "key3": }', skip_json_loads=True) == '{"key": true, "key2": false, "key3": ""}'
-     assert loads('{"key": true, "key2": false, "key3": }', skip_json_loads=True) == {"key": True, "key2": False, "key3": ""}
-
-
- def test_repair_json_from_file():
-     import os.path
-     import pathlib
-     path = pathlib.Path(__file__).parent.resolve()
-
- assert from_file(os.path.join(path,"invalid.json")) == [{"_id": "655b66256574f09bdae8abe8", "index": 0, "guid": "31082ae3-b0f3-4406-90f4-cc450bd4379d", "isActive": False, "balance": "$2,562.78", "picture": "http://placehold.it/32x32", "age": 32, "eyeColor": "brown", "name": "Glover Rivas", "gender": "male", "company": "EMPIRICA", "email": "gloverrivas@empirica.com", "phone": "+1 (842) 507-3063", "address": "536 Montague Terrace, Jenkinsville, Kentucky, 2235", "about": "Mollit consectetur excepteur voluptate tempor dolore ullamco enim irure ullamco non enim officia. Voluptate occaecat proident laboris ea Lorem cupidatat reprehenderit nisi nisi aliqua. Amet nulla ipsum deserunt excepteur amet ad aute aute ex. Et enim minim sit veniam est quis dolor nisi sunt quis eiusmod in. Amet eiusmod cillum sunt occaecat dolor laboris voluptate in eiusmod irure aliqua duis.", "registered": "2023-11-18T09:32:36 -01:00", "latitude": 36.26102, "longitude": -91.304608, "tags": ["non", "tempor", "do", "ullamco", "dolore", "sunt", "ipsum"], "friends": [{"id": 0, "name": "Cara Shepherd"}, {"id": 1, "name": "Mason Farley"}, {"id": 2, "name": "Harriet Cochran"}], "greeting": "Hello, Glover Rivas! You have 7 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b662585364bc57278bb6f", "index": 1, "guid": "0dea7a3a-f812-4dde-b78d-7a9b58e5da05", "isActive": True, "balance": "$1,359.48", "picture": "http://placehold.it/32x32", "age": 38, "eyeColor": "brown", "name": "Brandi Moreno", "gender": "female", "company": "MARQET", "email": "brandimoreno@marqet.com", "phone": "+1 (850) 434-2077", "address": "537 Doone Court, Waiohinu, Michigan, 3215", "about": "Irure proident adipisicing do Lorem do incididunt in laborum in eiusmod eiusmod ad elit proident. Eiusmod dolor ex magna magna occaecat. Nulla deserunt velit ex exercitation et irure sunt. Cupidatat ut excepteur ea quis labore sint cupidatat incididunt amet eu consectetur cillum ipsum proident. 
Occaecat exercitation aute laborum dolor proident reprehenderit laborum in voluptate culpa. Exercitation nulla adipisicing culpa aute est deserunt ea nisi deserunt consequat occaecat ut et non. Incididunt ex exercitation dolor dolor anim cillum dolore.", "registered": "2015-09-03T11:47:15 -02:00", "latitude": -19.768953, "longitude": 8.948458, "tags": ["laboris", "occaecat", "laborum", "laborum", "ex", "cillum", "occaecat"], "friends": [{"id": 0, "name": "Erna Kelly"}, {"id": 1, "name": "Black Mays"}, {"id": 2, "name": "Davis Buck"}], "greeting": "Hello, Brandi Moreno! You have 1 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625870da431bcf5e0c2", "index": 2, "guid": "b17f6e3f-c898-4334-abbf-05cf222f143b", "isActive": False, "balance": "$1,493.77", "picture": "http://placehold.it/32x32", "age": 20, "eyeColor": "brown", "name": "Moody Meadows", "gender": "male", "company": "OPTIQUE", "email": "moodymeadows@optique.com", "phone": "+1 (993) 566-3041", "address": "766 Osborn Street, Bath, Maine, 7666", "about": "Non commodo excepteur nostrud qui adipisicing aliquip dolor minim nulla culpa proident. In ad cupidatat ea mollit ex est do deserunt proident nostrud. Cillum id id eiusmod amet exercitation nostrud cillum sunt deserunt dolore deserunt eiusmod mollit. Ut ex tempor ad laboris voluptate labore id officia fugiat exercitation amet.", "registered": "2015-01-16T02:48:28 -01:00", "latitude": -25.847327, "longitude": 63.95991, "tags": ["aute", "commodo", "adipisicing", "nostrud", "duis", "mollit", "ut"], "friends": [{"id": 0, "name": "Lacey Cash"}, {"id": 1, "name": "Gabrielle Harmon"}, {"id": 2, "name": "Ellis Lambert"}], "greeting": "Hello, Moody Meadows! 
You have 4 unread messages.", "favoriteFruit": "strawberry"}, {"_id": "655b6625f3e1bf422220854e", "index": 3, "guid": "92229883-2bfd-4974-a08c-1b506b372e46", "isActive": False, "balance": "$2,215.34", "picture": "http://placehold.it/32x32", "age": 22, "eyeColor": "brown", "name": "Heath Nguyen", "gender": "male", "company": "BLEENDOT", "email": "heathnguyen@bleendot.com", "phone": "+1 (989) 512-2797", "address": "135 Milton Street, Graniteville, Nebraska, 276", "about": "Consequat aliquip irure Lorem cupidatat nulla magna ullamco nulla voluptate adipisicing anim consectetur tempor aliquip. Magna aliqua nulla eu tempor esse proident. Proident fugiat ad ex Lorem reprehenderit dolor aliquip labore labore aliquip. Deserunt aute enim ea minim officia anim culpa sint commodo. Cillum consectetur excepteur aliqua exercitation Lorem veniam voluptate.", "registered": "2016-07-06T01:31:07 -02:00", "latitude": -60.997048, "longitude": -102.397885, "tags": ["do", "ad", "consequat", "irure", "tempor", "elit", "minim"], "friends": [{"id": 0, "name": "Walker Hernandez"}, {"id": 1, "name": "Maria Lane"}, {"id": 2, "name": "Mcknight Barron"}], "greeting": "Hello, Heath Nguyen! You have 4 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625519a5b5e4b6742bf", "index": 4, "guid": "c5dc685f-6d0d-4173-b4cf-f5df29a1e8ef", "isActive": True, "balance": "$1,358.90", "picture": "http://placehold.it/32x32", "age": 33, "eyeColor": "brown", "name": "Deidre Duke", "gender": "female", "company": "OATFARM", "email": "deidreduke@oatfarm.com", "phone": "+1 (875) 587-3256", "address": "487 Schaefer Street, Wattsville, West Virginia, 4506", "about": "Laboris eu nulla esse magna sit eu deserunt non est aliqua exercitation commodo. Ad occaecat qui qui laborum dolore anim Lorem. Est qui occaecat irure enim deserunt enim aliqua ex deserunt incididunt esse. Quis in minim laboris proident non mollit. Magna ea do labore commodo. 
Et elit esse esse occaecat officia ipsum nisi.", "registered": "2021-09-12T04:17:08 -02:00", "latitude": 68.609781, "longitude": -87.509134, "tags": ["mollit", "cupidatat", "irure", "sit", "consequat", "anim", "fugiat"], "friends": [{"id": 0, "name": "Bean Paul"}, {"id": 1, "name": "Cochran Hubbard"}, {"id": 2, "name": "Rodgers Atkinson"}], "greeting": "Hello, Deidre Duke! You have 6 unread messages.", "favoriteFruit": "apple"}, {"_id": "655b6625a19b3f7e5f82f0ea", "index": 5, "guid": "75f3c264-baa1-47a0-b21c-4edac23d9935", "isActive": True, "balance": "$3,554.36", "picture": "http://placehold.it/32x32", "age": 26, "eyeColor": "blue", "name": "Lydia Holland", "gender": "female", "company": "ESCENTA", "email": "lydiaholland@escenta.com", "phone": "+1 (927) 482-3436", "address": "554 Rockaway Parkway, Kohatk, Montana, 6316", "about": "Consectetur ea est labore commodo laborum mollit pariatur non enim. Est dolore et non laboris tempor. Ea incididunt ut adipisicing cillum labore officia tempor eiusmod commodo. Cillum fugiat ex consectetur ut nostrud anim nostrud exercitation ut duis in ea. Eu et id fugiat est duis eiusmod ullamco quis officia minim sint ea nisi in.", "registered": "2018-03-13T01:48:56 -01:00", "latitude": -88.495799, "longitude": 71.840667, "tags": ["veniam", "minim", "consequat", "consequat", "incididunt", "consequat", "elit"], "friends": [{"id": 0, "name": "Debra Massey"}, {"id": 1, "name": "Weiss Savage"}, {"id": 2, "name": "Shannon Guerra"}], "greeting": "Hello, Lydia Holland! You have 5 unread messages.", "favoriteFruit": "banana"}]
-
- import tempfile
- # Create a temporary file
- temp_fd, temp_path = tempfile.mkstemp(suffix=".json")
- try:
- # Write content to the temporary file
- with os.fdopen(temp_fd, 'w') as tmp:
- tmp.write("{key:value}")
- assert from_file(temp_path, logging=True) == ({'key': 'value'}, [{'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object key context, we found a :, stopping here',}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}, {'text': 'While parsing a string, we found a literal instead of a quote', 'context': '{key:value}'}, {'text': 'While parsing a string, we found no starting quote. Will add the quote back', 'context': '{key:value}'}, {'context': '{key:value}', 'text': 'While parsing a string missing the left delimiter in object value context, we found a , or } and we couldn\'t determine that a right delimiter was present. Stopping here'}, {'text': 'While parsing a string, we missed the closing quote, ignoring', 'context': '{key:value}'}])
- finally:
- # Clean up - delete the temporary file
- os.remove(temp_path)
-
- def test_ensure_ascii():
- assert repair_json("{'test_中国人_ascii':'统一码'}", ensure_ascii=False) == '{"test_中国人_ascii": "统一码"}'
File without changes
File without changes