json-repair 0.29.10__tar.gz → 0.30.1__tar.gz
- {json_repair-0.29.10/src/json_repair.egg-info → json_repair-0.30.1}/PKG-INFO +35 -4
- {json_repair-0.29.10 → json_repair-0.30.1}/README.md +33 -2
- {json_repair-0.29.10 → json_repair-0.30.1}/pyproject.toml +2 -2
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/json_context.py +3 -5
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/json_parser.py +22 -3
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/json_repair.py +71 -18
- {json_repair-0.29.10 → json_repair-0.30.1/src/json_repair.egg-info}/PKG-INFO +35 -4
- {json_repair-0.29.10 → json_repair-0.30.1}/tests/test_json_repair.py +3 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/LICENSE +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/setup.cfg +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/__init__.py +3 -3
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/__main__.py +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/py.typed +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair/string_file_wrapper.py +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair.egg-info/SOURCES.txt +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair.egg-info/dependency_links.txt +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair.egg-info/entry_points.txt +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/src/json_repair.egg-info/top_level.txt +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/tests/test_coverage.py +0 -0
- {json_repair-0.29.10 → json_repair-0.30.1}/tests/test_performance.py +0 -0
@@ -1,6 +1,6 @@
|
|
1
1
|
Metadata-Version: 2.1
|
2
2
|
Name: json_repair
|
3
|
-
Version: 0.
|
3
|
+
Version: 0.30.1
|
4
4
|
Summary: A package to repair broken json strings
|
5
5
|
Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
|
6
6
|
License: MIT License
|
@@ -32,19 +32,23 @@ Keywords: JSON,REPAIR,LLM,PARSER
|
|
32
32
|
Classifier: Programming Language :: Python :: 3
|
33
33
|
Classifier: License :: OSI Approved :: MIT License
|
34
34
|
Classifier: Operating System :: OS Independent
|
35
|
-
Requires-Python: >=3.
|
35
|
+
Requires-Python: >=3.9
|
36
36
|
Description-Content-Type: text/markdown
|
37
37
|
License-File: LICENSE
|
38
38
|
|
39
39
|
[](https://pypi.org/project/json-repair/)
|
40
|
-

|
41
41
|
[](https://pypi.org/project/json-repair/)
|
42
42
|
[](https://github.com/sponsors/mangiucugna)
|
43
|
+
[](https://github.com/mangiucugna/json_repair/stargazers)
|
44
|
+
|
43
45
|
|
44
46
|
This simple package can be used to fix an invalid json string. To know all cases in which this package will work, check out the unit test.
|
45
47
|
|
46
48
|
Inspired by https://github.com/josdejong/jsonrepair
|
47
49
|
|
50
|
+

|
51
|
+
|
48
52
|
---
|
49
53
|
# Offer me a beer
|
50
54
|
If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
|
@@ -54,6 +58,8 @@ If you find this library useful, you can help me by donating toward my monthly b
|
|
54
58
|
# Demo
|
55
59
|
If you are unsure if this library will fix your specific problem, or simply want your json validated online, you can visit the demo site on GitHub pages: https://mangiucugna.github.io/json_repair/
|
56
60
|
|
61
|
+
Or hear an [audio deepdive generate by Google's NotebookLM](https://notebooklm.google.com/notebook/05312bb3-f6f3-4e49-a99b-bd51db64520b/audio) for an introduction to the module
|
62
|
+
|
57
63
|
---
|
58
64
|
|
59
65
|
# Motivation
|
@@ -64,6 +70,11 @@ I searched for a lightweight python package that was able to reliably fix this p
|
|
64
70
|
|
65
71
|
*So I wrote one*
|
66
72
|
|
73
|
+
### Wouldn't GPT-4o Structured Output make this library outdated?
|
74
|
+
|
75
|
+
As part of my job we use OpenAI APIs and we noticed that even with structured output sometimes the result isn't a fully valid json.
|
76
|
+
So we still use this library to cover those outliers.
|
77
|
+
|
67
78
|
# Supported use cases
|
68
79
|
|
69
80
|
### Fixing Syntax Errors in JSON
|
@@ -144,6 +155,26 @@ and another method to read from a file:
|
|
144
155
|
|
145
156
|
Keep in mind that the library will not catch any IO-related exception and those will need to be managed by you
|
146
157
|
|
158
|
+
### Non-Latin characters
|
159
|
+
|
160
|
+
When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass `ensure_ascii=False` to `repair_json()` in order to preserve the non-Latin characters in the output.
|
161
|
+
|
162
|
+
Here's an example using Chinese characters:
|
163
|
+
|
164
|
+
repair_json("{'test_chinese_ascii':'统一码'}")
|
165
|
+
|
166
|
+
will return
|
167
|
+
|
168
|
+
{"test_chinese_ascii": "\u7edf\u4e00\u7801"}
|
169
|
+
|
170
|
+
Instead passing `ensure_ascii=False`:
|
171
|
+
|
172
|
+
repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)
|
173
|
+
|
174
|
+
will return
|
175
|
+
|
176
|
+
{"test_chinese_ascii": "统一码"}
|
177
|
+
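The escaped and unescaped outputs in the README text above match what the standard library's `json.dumps` produces with its own `ensure_ascii` flag. The sketch below uses only the standard library, not `json_repair`; it assumes `repair_json()` forwards the flag to `json.dumps`, which this diff does not show directly.

```python
import json

data = {"test_chinese_ascii": "统一码"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False keeps the original characters in the output string.
preserved = json.dumps(data, ensure_ascii=False)

print(escaped)    # {"test_chinese_ascii": "\u7edf\u4e00\u7801"}
print(preserved)  # {"test_chinese_ascii": "统一码"}

# Both strings decode to the same Python object.
assert json.loads(escaped) == json.loads(preserved) == data
```

Either form is valid JSON; the flag only changes how the text looks to a human reader or to a consumer that compares strings byte for byte.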
|
147
178
|
### Performance considerations
|
148
179
|
If you find this library too slow because is using `json.loads()` you can skip that by passing `skip_json_loads=True` to `repair_json`. Like:
|
149
180
|
|
@@ -226,7 +257,7 @@ This module will parse the JSON file following the BNF definition:
|
|
226
257
|
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
|
227
258
|
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
|
228
259
|
|
229
|
-
If something is wrong (a missing
|
260
|
+
If something is wrong (a missing parentheses or quotes for example) it will use a few simple heuristics to fix the JSON string:
|
230
261
|
- Add the missing parentheses if the parser believes that the array or object should be closed
|
231
262
|
- Quote strings or add missing single quotes
|
232
263
|
- Adjust whitespaces and remove line breaks
|
@@ -1,12 +1,16 @@
|
|
1
1
|
[](https://pypi.org/project/json-repair/)
|
2
|
-

|
3
3
|
[](https://pypi.org/project/json-repair/)
|
4
4
|
[](https://github.com/sponsors/mangiucugna)
|
5
|
+
[](https://github.com/mangiucugna/json_repair/stargazers)
|
6
|
+
|
5
7
|
|
6
8
|
This simple package can be used to fix an invalid json string. To know all cases in which this package will work, check out the unit test.
|
7
9
|
|
8
10
|
Inspired by https://github.com/josdejong/jsonrepair
|
9
11
|
|
12
|
+

|
13
|
+
|
10
14
|
---
|
11
15
|
# Offer me a beer
|
12
16
|
If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
|
@@ -16,6 +20,8 @@ If you find this library useful, you can help me by donating toward my monthly b
|
|
16
20
|
# Demo
|
17
21
|
If you are unsure if this library will fix your specific problem, or simply want your json validated online, you can visit the demo site on GitHub pages: https://mangiucugna.github.io/json_repair/
|
18
22
|
|
23
|
+
Or hear an [audio deepdive generate by Google's NotebookLM](https://notebooklm.google.com/notebook/05312bb3-f6f3-4e49-a99b-bd51db64520b/audio) for an introduction to the module
|
24
|
+
|
19
25
|
---
|
20
26
|
|
21
27
|
# Motivation
|
@@ -26,6 +32,11 @@ I searched for a lightweight python package that was able to reliably fix this p
|
|
26
32
|
|
27
33
|
*So I wrote one*
|
28
34
|
|
35
|
+
### Wouldn't GPT-4o Structured Output make this library outdated?
|
36
|
+
|
37
|
+
As part of my job we use OpenAI APIs and we noticed that even with structured output sometimes the result isn't a fully valid json.
|
38
|
+
So we still use this library to cover those outliers.
|
39
|
+
|
29
40
|
# Supported use cases
|
30
41
|
|
31
42
|
### Fixing Syntax Errors in JSON
|
@@ -106,6 +117,26 @@ and another method to read from a file:
|
|
106
117
|
|
107
118
|
Keep in mind that the library will not catch any IO-related exception and those will need to be managed by you
|
108
119
|
|
120
|
+
### Non-Latin characters
|
121
|
+
|
122
|
+
When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass `ensure_ascii=False` to `repair_json()` in order to preserve the non-Latin characters in the output.
|
123
|
+
|
124
|
+
Here's an example using Chinese characters:
|
125
|
+
|
126
|
+
repair_json("{'test_chinese_ascii':'统一码'}")
|
127
|
+
|
128
|
+
will return
|
129
|
+
|
130
|
+
{"test_chinese_ascii": "\u7edf\u4e00\u7801"}
|
131
|
+
|
132
|
+
Instead passing `ensure_ascii=False`:
|
133
|
+
|
134
|
+
repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)
|
135
|
+
|
136
|
+
will return
|
137
|
+
|
138
|
+
{"test_chinese_ascii": "统一码"}
|
139
|
+
|
109
140
|
### Performance considerations
|
110
141
|
If you find this library too slow because is using `json.loads()` you can skip that by passing `skip_json_loads=True` to `repair_json`. Like:
|
111
142
|
|
@@ -188,7 +219,7 @@ This module will parse the JSON file following the BNF definition:
|
|
188
219
|
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
|
189
220
|
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
|
190
221
|
|
191
|
-
If something is wrong (a missing
|
222
|
+
If something is wrong (a missing parentheses or quotes for example) it will use a few simple heuristics to fix the JSON string:
|
192
223
|
- Add the missing parentheses if the parser believes that the array or object should be closed
|
193
224
|
- Quote strings or add missing single quotes
|
194
225
|
- Adjust whitespaces and remove line breaks
|
@@ -3,7 +3,7 @@ requires = ["setuptools>=61.0"]
|
|
3
3
|
build-backend = "setuptools.build_meta"
|
4
4
|
[project]
|
5
5
|
name = "json_repair"
|
6
|
-
version = "0.
|
6
|
+
version = "0.30.1"
|
7
7
|
license = {file = "LICENSE"}
|
8
8
|
authors = [
|
9
9
|
{ name="Stefano Baccianella", email="4247706+mangiucugna@users.noreply.github.com" },
|
@@ -11,7 +11,7 @@ authors = [
|
|
11
11
|
description = "A package to repair broken json strings"
|
12
12
|
keywords = ["JSON", "REPAIR", "LLM", "PARSER"]
|
13
13
|
readme = "README.md"
|
14
|
-
requires-python = ">=3.
|
14
|
+
requires-python = ">=3.9"
|
15
15
|
classifiers = [
|
16
16
|
"Programming Language :: Python :: 3",
|
17
17
|
"License :: OSI Approved :: MIT License",
|
@@ -24,11 +24,9 @@ class JsonContext:
|
|
24
24
|
Returns:
|
25
25
|
None
|
26
26
|
"""
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
self.current = value
|
31
|
-
self.empty = False
|
27
|
+
self.context.append(value)
|
28
|
+
self.current = value
|
29
|
+
self.empty = False
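The three added lines push the new value onto a list and then update `current` and `empty`. A minimal sketch of that stack pattern follows. The class name `ContextStack`, the enum member names, and the behaviour of `reset()` are assumptions made for illustration, since the hunk shows only part of the real `JsonContext` class.

```python
from enum import Enum, auto
from typing import List, Optional


class ContextValues(Enum):
    # Hypothetical member names, chosen for illustration only.
    OBJECT_KEY = auto()
    OBJECT_VALUE = auto()
    ARRAY = auto()


class ContextStack:
    """Tracks which JSON construct a parser is currently inside."""

    def __init__(self) -> None:
        self.context: List[ContextValues] = []
        self.current: Optional[ContextValues] = None
        self.empty: bool = True

    def set(self, value: ContextValues) -> None:
        # Mirrors the three added lines in the hunk.
        self.context.append(value)
        self.current = value
        self.empty = False

    def reset(self) -> None:
        # Assumed behaviour: drop the innermost context and restore the previous one.
        if self.context:
            self.context.pop()
        self.current = self.context[-1] if self.context else None
        self.empty = not self.context


stack = ContextStack()
stack.set(ContextValues.OBJECT_KEY)
stack.set(ContextValues.ARRAY)
stack.reset()
print(stack.current)  # ContextValues.OBJECT_KEY
```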
|
32
30
|
|
33
31
|
def reset(self) -> None:
|
34
32
|
"""
|
@@ -1,7 +1,7 @@
|
|
1
|
-
from typing import Any, Dict, List,
|
1
|
+
from typing import Any, Dict, List, Literal, Optional, TextIO, Tuple, Union
|
2
2
|
|
3
|
+
from .json_context import ContextValues, JsonContext
|
3
4
|
from .string_file_wrapper import StringFileWrapper
|
4
|
-
from .json_context import JsonContext, ContextValues
|
5
5
|
|
6
6
|
JSONReturnType = Union[Dict[str, Any], List[Any], str, float, int, bool, None]
|
7
7
|
|
@@ -314,10 +314,19 @@ class JSONParser:
|
|
314
314
|
if next_c:
|
315
315
|
i += 1
|
316
316
|
# found a delimiter, now we need to check that is followed strictly by a comma or brace
|
317
|
+
# or the string ended
|
317
318
|
i = self.skip_whitespaces_at(idx=i, move_main_index=False)
|
318
319
|
next_c = self.get_char_at(i)
|
319
|
-
if next_c
|
320
|
+
if not next_c or next_c in [",", "}"]:
|
320
321
|
rstring_delimiter_missing = False
|
322
|
+
else:
|
323
|
+
# OK but this could still be some garbage at the end of the string
|
324
|
+
# So we need to check if we find a new lstring_delimiter afterwards
|
325
|
+
# If we do, this is a missing delimiter
|
326
|
+
i = self.skip_to_character(character=lstring_delimiter, idx=i)
|
327
|
+
next_c = self.get_char_at(i)
|
328
|
+
if not next_c:
|
329
|
+
rstring_delimiter_missing = False
|
321
330
|
else:
|
322
331
|
# skip any whitespace first
|
323
332
|
i = self.skip_whitespaces_at(idx=1, move_main_index=False)
|
@@ -329,6 +338,16 @@ class JSONParser:
|
|
329
338
|
# Ok it's not right after the comma
|
330
339
|
# Let's ignore
|
331
340
|
rstring_delimiter_missing = False
|
341
|
+
# Check that j was not out of bound
|
342
|
+
elif self.get_char_at(j):
|
343
|
+
# Check for an unmatched opening brace in string_acc
|
344
|
+
for c in reversed(string_acc):
|
345
|
+
if c == "{":
|
346
|
+
# Ok then this is part of the string
|
347
|
+
rstring_delimiter_missing = False
|
348
|
+
break
|
349
|
+
elif c == "}":
|
350
|
+
break
|
332
351
|
if rstring_delimiter_missing:
|
333
352
|
self.log(
|
334
353
|
"While parsing a string missing the left delimiter in object value context, we found a , or } and we couldn't determine that a right delimiter was present. Stopping here",
|
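The backward scan over `string_acc` added in this hunk can be isolated as a small helper. The function name is hypothetical and the sketch leaves out all parser state; it shows only the rule that the most recent brace decides whether a following `}` still belongs to the string.

```python
def has_unmatched_open_brace(string_acc: str) -> bool:
    """Return True if the last brace seen in `string_acc` is an opening one."""
    for c in reversed(string_acc):
        if c == "{":
            # An opening brace with no closing brace after it:
            # a following "}" is still part of the string.
            return True
        if c == "}":
            # The most recent brace is already a closing one.
            return False
    # No braces at all.
    return False


print(has_unmatched_open_brace("words{words in brackets"))   # True
print(has_unmatched_open_brace("words{words in brackets}"))  # False
print(has_unmatched_open_brace("words"))                     # False
```

The first case corresponds to the new test input `{text:words{words in brackets}}`, where the inner `}` has to stay inside the unquoted string value.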
@@ -23,9 +23,10 @@ All supported use cases are in the unit tests
|
|
23
23
|
"""
|
24
24
|
|
25
25
|
import argparse
|
26
|
-
import sys
|
27
26
|
import json
|
28
|
-
|
27
|
+
import sys
|
28
|
+
from typing import Dict, List, Optional, TextIO, Tuple, Union
|
29
|
+
|
29
30
|
from .json_parser import JSONParser, JSONReturnType
|
30
31
|
|
31
32
|
|
@@ -40,10 +41,18 @@ def repair_json(
|
|
40
41
|
) -> Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]:
|
41
42
|
"""
|
42
43
|
Given a json formatted string, it will try to decode it and, if it fails, it will try to fix it.
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
44
|
+
|
45
|
+
Args:
|
46
|
+
json_str (str, optional): The JSON string to repair. Defaults to an empty string.
|
47
|
+
return_objects (bool, optional): If True, return the decoded data structure. Defaults to False.
|
48
|
+
skip_json_loads (bool, optional): If True, skip calling the built-in json.loads() function to verify that the json is valid before attempting to repair. Defaults to False.
|
49
|
+
logging (bool, optional): If True, return a tuple with the repaired json and a log of all repair actions. Defaults to False.
|
50
|
+
json_fd (Optional[TextIO], optional): File descriptor for JSON input. Do not use! Use `from_file` or `load` instead. Defaults to None.
|
51
|
+
ensure_ascii (bool, optional): Set to False to avoid converting non-latin characters to ascii (for example when using chinese characters). Defaults to True. Ignored if `skip_json_loads` is True.
|
52
|
+
chunk_length (int, optional): Size in bytes of the file chunks to read at once. Ignored if `json_fd` is None. Do not use! Use `from_file` or `load` instead. Defaults to 1MB.
|
53
|
+
|
54
|
+
Returns:
|
55
|
+
Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]: The repaired JSON or a tuple with the repaired JSON and repair log.
|
47
56
|
"""
|
48
57
|
parser = JSONParser(json_str, json_fd, logging, chunk_length)
|
49
58
|
if skip_json_loads:
|
@@ -71,6 +80,14 @@ def loads(
|
|
71
80
|
"""
|
72
81
|
This function works like `json.loads()` except that it will fix your JSON in the process.
|
73
82
|
It is a wrapper around the `repair_json()` function with `return_objects=True`.
|
83
|
+
|
84
|
+
Args:
|
85
|
+
json_str (str): The JSON string to load and repair.
|
86
|
+
skip_json_loads (bool, optional): If True, skip calling the built-in json.loads() function to verify that the json is valid before attempting to repair. Defaults to False.
|
87
|
+
logging (bool, optional): If True, return a tuple with the repaired json and a log of all repair actions. Defaults to False.
|
88
|
+
|
89
|
+
Returns:
|
90
|
+
Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]: The repaired JSON object or a tuple with the repaired JSON object and repair log.
|
74
91
|
"""
|
75
92
|
return repair_json(
|
76
93
|
json_str=json_str,
|
@@ -89,6 +106,15 @@ def load(
|
|
89
106
|
"""
|
90
107
|
This function works like `json.load()` except that it will fix your JSON in the process.
|
91
108
|
It is a wrapper around the `repair_json()` function with `json_fd=fd` and `return_objects=True`.
|
109
|
+
|
110
|
+
Args:
|
111
|
+
fd (TextIO): File descriptor for JSON input.
|
112
|
+
skip_json_loads (bool, optional): If True, skip calling the built-in json.loads() function to verify that the json is valid before attempting to repair. Defaults to False.
|
113
|
+
logging (bool, optional): If True, return a tuple with the repaired json and a log of all repair actions. Defaults to False.
|
114
|
+
chunk_length (int, optional): Size in bytes of the file chunks to read at once. Defaults to 1MB.
|
115
|
+
|
116
|
+
Returns:
|
117
|
+
Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]: The repaired JSON object or a tuple with the repaired JSON object and repair log.
|
92
118
|
"""
|
93
119
|
return repair_json(
|
94
120
|
json_fd=fd,
|
@@ -107,20 +133,48 @@ def from_file(
|
|
107
133
|
) -> Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]:
|
108
134
|
"""
|
109
135
|
This function is a wrapper around `load()` so you can pass the filename as string
|
136
|
+
|
137
|
+
Args:
|
138
|
+
filename (str): The name of the file containing JSON data to load and repair.
|
139
|
+
skip_json_loads (bool, optional): If True, skip calling the built-in json.loads() function to verify that the json is valid before attempting to repair. Defaults to False.
|
140
|
+
logging (bool, optional): If True, return a tuple with the repaired json and a log of all repair actions. Defaults to False.
|
141
|
+
chunk_length (int, optional): Size in bytes of the file chunks to read at once. Defaults to 1MB.
|
142
|
+
|
143
|
+
Returns:
|
144
|
+
Union[JSONReturnType, Tuple[JSONReturnType, List[Dict[str, str]]]]: The repaired JSON object or a tuple with the repaired JSON object and repair log.
|
110
145
|
"""
|
111
|
-
|
112
|
-
|
113
|
-
|
114
|
-
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
fd.close()
|
146
|
+
with open(filename) as fd:
|
147
|
+
jsonobj = load(
|
148
|
+
fd=fd,
|
149
|
+
skip_json_loads=skip_json_loads,
|
150
|
+
logging=logging,
|
151
|
+
chunk_length=chunk_length,
|
152
|
+
)
|
119
153
|
|
120
154
|
return jsonobj
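The new `from_file()` body opens the file in a `with` block, so the file is closed even when loading raises. The sketch below demonstrates that property with the standard library only, using `json.load` as a stand-in for the package's `load()`; the temporary file and variable names are illustrative.

```python
import json
import os
import tempfile

# Write deliberately invalid JSON to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write("{not valid json")
    path = tmp.name

handle = None
try:
    with open(path) as fd:
        handle = fd
        json.load(fd)  # raises json.JSONDecodeError
except json.JSONDecodeError:
    pass

# The context manager closed the file even though parsing failed.
print(handle.closed)  # True
os.remove(path)
```

With an explicit `fd.close()` placed after the call, an exception raised while loading would skip the close unless the code also wrapped it in `try`/`finally`.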
|
121
155
|
|
122
156
|
|
123
157
|
def cli(inline_args: Optional[List[str]] = None) -> int:
|
158
|
+
"""
|
159
|
+
Command-line interface for repairing and parsing JSON files.
|
160
|
+
|
161
|
+
Args:
|
162
|
+
inline_args (Optional[List[str]]): List of command-line arguments for testing purposes. Defaults to None.
|
163
|
+
- filename (str): The JSON file to repair
|
164
|
+
- -i, --inline (bool): Replace the file inline instead of returning the output to stdout.
|
165
|
+
- -o, --output TARGET (str): If specified, the output will be written to TARGET filename instead of stdout.
|
166
|
+
- --ensure_ascii (bool): Pass ensure_ascii=True to json.dumps(). Will pass False otherwise.
|
167
|
+
- --indent INDENT (int): Number of spaces for indentation (Default 2).
|
168
|
+
|
169
|
+
Returns:
|
170
|
+
int: Exit code of the CLI operation.
|
171
|
+
|
172
|
+
Raises:
|
173
|
+
Exception: Any exception that occurs during file processing.
|
174
|
+
|
175
|
+
Example:
|
176
|
+
>>> cli(['example.json', '--indent', '4'])
|
177
|
+
"""
|
124
178
|
parser = argparse.ArgumentParser(description="Repair and parse JSON files.")
|
125
179
|
parser.add_argument("filename", help="The JSON file to repair")
|
126
180
|
parser.add_argument(
|
@@ -166,14 +220,13 @@ def cli(inline_args: Optional[List[str]] = None) -> int:
|
|
166
220
|
result = from_file(args.filename)
|
167
221
|
|
168
222
|
if args.inline or args.output:
|
169
|
-
|
170
|
-
|
171
|
-
fd.close()
|
223
|
+
with open(args.output or args.filename, mode="w") as fd:
|
224
|
+
json.dump(result, fd, indent=args.indent, ensure_ascii=ensure_ascii)
|
172
225
|
else:
|
173
226
|
print(json.dumps(result, indent=args.indent, ensure_ascii=ensure_ascii))
|
174
227
|
except Exception as e: # pragma: no cover
|
175
228
|
print(f"Error: {str(e)}", file=sys.stderr)
|
176
|
-
|
229
|
+
return 1
|
177
230
|
|
178
231
|
return 0 # Success
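After this hunk the error path returns 1 and the success path returns 0, which lets a caller forward the result to `sys.exit()`. A reduced sketch of that shape, assuming a single positional argument and using `json.load` in place of `from_file()`, so it only reads valid JSON and repairs nothing:

```python
import argparse
import json
import sys
from typing import List, Optional


def cli(inline_args: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Pretty-print a JSON file.")
    parser.add_argument("filename", help="The JSON file to read")
    parser.add_argument("--indent", type=int, default=2)
    # parse_args(None) falls back to sys.argv[1:], so tests can pass a list instead.
    args = parser.parse_args(inline_args)

    try:
        with open(args.filename) as fd:
            result = json.load(fd)  # stand-in for from_file()
        print(json.dumps(result, indent=args.indent))
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1  # failure
    return 0  # success
```

A script entry point would typically call `sys.exit(cli())`, so that shell pipelines can detect a failed run from the exit status.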
|
179
232
|
|
@@ -1,6 +1,6 @@
|
|
1
1
|
Metadata-Version: 2.1
|
2
2
|
Name: json_repair
|
3
|
-
Version: 0.
|
3
|
+
Version: 0.30.1
|
4
4
|
Summary: A package to repair broken json strings
|
5
5
|
Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
|
6
6
|
License: MIT License
|
@@ -32,19 +32,23 @@ Keywords: JSON,REPAIR,LLM,PARSER
|
|
32
32
|
Classifier: Programming Language :: Python :: 3
|
33
33
|
Classifier: License :: OSI Approved :: MIT License
|
34
34
|
Classifier: Operating System :: OS Independent
|
35
|
-
Requires-Python: >=3.
|
35
|
+
Requires-Python: >=3.9
|
36
36
|
Description-Content-Type: text/markdown
|
37
37
|
License-File: LICENSE
|
38
38
|
|
39
39
|
[](https://pypi.org/project/json-repair/)
|
40
|
-

|
41
41
|
[](https://pypi.org/project/json-repair/)
|
42
42
|
[](https://github.com/sponsors/mangiucugna)
|
43
|
+
[](https://github.com/mangiucugna/json_repair/stargazers)
|
44
|
+
|
43
45
|
|
44
46
|
This simple package can be used to fix an invalid json string. To know all cases in which this package will work, check out the unit test.
|
45
47
|
|
46
48
|
Inspired by https://github.com/josdejong/jsonrepair
|
47
49
|
|
50
|
+

|
51
|
+
|
48
52
|
---
|
49
53
|
# Offer me a beer
|
50
54
|
If you find this library useful, you can help me by donating toward my monthly beer budget here: https://github.com/sponsors/mangiucugna
|
@@ -54,6 +58,8 @@ If you find this library useful, you can help me by donating toward my monthly b
|
|
54
58
|
# Demo
|
55
59
|
If you are unsure if this library will fix your specific problem, or simply want your json validated online, you can visit the demo site on GitHub pages: https://mangiucugna.github.io/json_repair/
|
56
60
|
|
61
|
+
Or hear an [audio deepdive generate by Google's NotebookLM](https://notebooklm.google.com/notebook/05312bb3-f6f3-4e49-a99b-bd51db64520b/audio) for an introduction to the module
|
62
|
+
|
57
63
|
---
|
58
64
|
|
59
65
|
# Motivation
|
@@ -64,6 +70,11 @@ I searched for a lightweight python package that was able to reliably fix this p
|
|
64
70
|
|
65
71
|
*So I wrote one*
|
66
72
|
|
73
|
+
### Wouldn't GPT-4o Structured Output make this library outdated?
|
74
|
+
|
75
|
+
As part of my job we use OpenAI APIs and we noticed that even with structured output sometimes the result isn't a fully valid json.
|
76
|
+
So we still use this library to cover those outliers.
|
77
|
+
|
67
78
|
# Supported use cases
|
68
79
|
|
69
80
|
### Fixing Syntax Errors in JSON
|
@@ -144,6 +155,26 @@ and another method to read from a file:
|
|
144
155
|
|
145
156
|
Keep in mind that the library will not catch any IO-related exception and those will need to be managed by you
|
146
157
|
|
158
|
+
### Non-Latin characters
|
159
|
+
|
160
|
+
When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass `ensure_ascii=False` to `repair_json()` in order to preserve the non-Latin characters in the output.
|
161
|
+
|
162
|
+
Here's an example using Chinese characters:
|
163
|
+
|
164
|
+
repair_json("{'test_chinese_ascii':'统一码'}")
|
165
|
+
|
166
|
+
will return
|
167
|
+
|
168
|
+
{"test_chinese_ascii": "\u7edf\u4e00\u7801"}
|
169
|
+
|
170
|
+
Instead passing `ensure_ascii=False`:
|
171
|
+
|
172
|
+
repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)
|
173
|
+
|
174
|
+
will return
|
175
|
+
|
176
|
+
{"test_chinese_ascii": "统一码"}
|
177
|
+
|
147
178
|
### Performance considerations
|
148
179
|
If you find this library too slow because is using `json.loads()` you can skip that by passing `skip_json_loads=True` to `repair_json`. Like:
|
149
180
|
|
@@ -226,7 +257,7 @@ This module will parse the JSON file following the BNF definition:
|
|
226
257
|
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
|
227
258
|
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
|
228
259
|
|
229
|
-
If something is wrong (a missing
|
260
|
+
If something is wrong (a missing parentheses or quotes for example) it will use a few simple heuristics to fix the JSON string:
|
230
261
|
- Add the missing parentheses if the parser believes that the array or object should be closed
|
231
262
|
- Quote strings or add missing single quotes
|
232
263
|
- Adjust whitespaces and remove line breaks
|
@@ -146,6 +146,9 @@ def test_object_edge_cases():
|
|
146
146
|
assert repair_json('{"key":value, " key2":"value2" }') == '{"key": "value", " key2": "value2"}'
|
147
147
|
assert repair_json('{"key":value "key2":"value2" }') == '{"key": "value", "key2": "value2"}'
|
148
148
|
assert repair_json("{'text': 'words{words in brackets}more words'}") == '{"text": "words{words in brackets}more words"}'
|
149
|
+
assert repair_json('{text:words{words in brackets}}') == '{"text": "words{words in brackets}"}'
|
150
|
+
assert repair_json('{text:words{words in brackets}m}') == '{"text": "words{words in brackets}m"}'
|
151
|
+
assert repair_json('{"key": "value, value2"```') == '{"key": "value, value2"}'
|
149
152
|
|
150
153
|
def test_number_edge_cases():
|
151
154
|
assert repair_json(' - { "test_key": ["test_value", "test_value2"] }') == '{"test_key": ["test_value", "test_value2"]}'
|
File without changes
|
File without changes
|
@@ -1,4 +1,4 @@
|
|
1
|
-
from .json_repair import repair_json as repair_json
|
2
|
-
from .json_repair import loads as loads
|
3
|
-
from .json_repair import load as load
|
4
1
|
from .json_repair import from_file as from_file
|
2
|
+
from .json_repair import load as load
|
3
|
+
from .json_repair import loads as loads
|
4
|
+
from .json_repair import repair_json as repair_json
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|
File without changes
|