PyPI - json-repair - Versions diffs - 0.55.1__tar.gz → 0.56.0__tar.gz - Mend

json-repair 0.55.1 tar.gz → 0.56.0 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

{json_repair-0.55.1 → json_repair-0.56.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: json_repair
-Version: 0.55.1
+Version: 0.56.0
 Summary: A package to repair broken json strings
 Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
 License-Expression: MIT
@@ -13,6 +13,9 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Provides-Extra: schema
+Requires-Dist: jsonschema>=4.21; extra == "schema"
+Requires-Dist: pydantic>=2; extra == "schema"
 Dynamic: license-file
 [![PyPI](https://img.shields.io/pypi/v/json-repair)](https://pypi.org/project/json-repair/)
@@ -190,6 +193,58 @@ In strict mode the parser raises `ValueError` as soon as it encounters structura
 Strict mode still honors `skip_json_loads=True`; combining them lets you skip the initial `json.loads` check but still enforce strict parsing rules.
+### Schema-guided repairs
+**Alpha feature (not yet in stable releases).** Schema-guided repairs are currently shipped only in alpha builds (e.g., `0.56.0-alpha.*`). The API and behavior may change or break between alpha releases.
+You can guide repairs with a JSON Schema (or a Pydantic v2 model). When enabled, the parser will:
+- Fill missing values (defaults, required fields).
+- Coerce scalars where safe (e.g., `"1"` → `1` for integer fields).
+- Drop properties/items that the schema disallows.
+This is especially useful when you need deterministic, schema-valid outputs for downstream validation, storage, or typed processing. If the input cannot be repaired to satisfy the schema, `json_repair` raises `ValueError`.
+Install the optional dependencies:
+    pip install 'json-repair[schema]'
+(For CLI usage, you can also use `pipx install 'json-repair[schema]'`.)
+Schema guidance is skipped for already-valid JSON unless you pass `skip_json_loads=True` (this forces the parser to run even on valid JSON). Schema guidance is mutually exclusive with `strict=True`.
+```
+from json_repair import repair_json
+schema = {
+    "type": "object",
+    "properties": {"value": {"type": "integer"}},
+    "required": ["value"],
+}
+repair_json('{"value": "1"}', schema=schema, skip_json_loads=True, return_objects=True)
+```
+Pydantic v2 model example:
+```
+from pydantic import BaseModel, Field
+from json_repair import repair_json
+class Payload(BaseModel):
+    value: int
+    tags: list[str] = Field(default_factory=list)
+repair_json(
+    '{"value": "1", "tags": }',
+    schema=Payload,
+    skip_json_loads=True,
+    return_objects=True,
+)
+```
 ### Use json_repair with streaming
 Sometimes you are streaming some data and want to repair the JSON coming from it. Normally this won't work but you can pass `stream_stable` to `repair_json()` or `loads()` to make it work:
@@ -207,7 +262,9 @@ pipx install json-repair
 to know all options available:
 ```
 $ json_repair -h
-usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT] [filename]
+usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT]
+                   [--skip-json-loads] [--schema SCHEMA] [--schema-model MODEL]
+                   [--strict] [filename]
 Repair and parse JSON files.
@@ -221,6 +278,9 @@ options:
                         If specified, the output will be written to TARGET filename instead of stdout
   --ensure_ascii        Pass ensure_ascii=True to json.dumps()
   --indent INDENT       Number of spaces for indentation (Default 2)
+  --skip-json-loads     Skip initial json.loads validation (needed to force schema on valid JSON)
+  --schema SCHEMA       Path to a JSON Schema file that guides repairs
+  --schema-model MODEL  Pydantic v2 model in 'module:ClassName' form that guides repairs
   --strict              Raise on duplicate keys, missing separators, empty keys/values, and similar structural issues instead of repairing them
 ```
@@ -274,8 +334,15 @@ If something is wrong (a missing parentheses or quotes for example) it will use
 I am sure some corner cases will be missing, if you have examples please open an issue or even better push a PR
+# Contributing
+If you want to contribute, start with `CONTRIBUTING.md` and read the Code Wiki writeup for a tour of the codebase and key entry points: https://codewiki.google/github.com/mangiucugna/json_repair
 # How to develop
-Just create a virtual environment with `requirements.txt`, the setup uses [pre-commit](https://pre-commit.com/) to make sure all tests are run.
+Use `uv` to set up the dev environment and run tooling:
+    uv sync --group dev
+    uv run pre-commit run --all-files
+    uv run pytest
 Make sure that the Github Actions running after pushing a new commit don't fail as well.
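The new "Schema-guided repairs" section lists three behaviors: filling missing values, coercing scalars where safe, and dropping what the schema disallows. The toy pass below replays those ideas on an already-parsed object so the effect is easy to see. It is only a sketch under stated assumptions: it is not json_repair's `SchemaRepairer`, `apply_schema` is a hypothetical name, it understands just `type`, `properties`, `required`, `default`, `items` and `additionalProperties: false`, and it raises `ValueError` for a required field with no default (mirroring the README's statement that unrepairable input raises `ValueError`).

```python
import copy
from typing import Any


def apply_schema(value: Any, schema: dict[str, Any], path: str = "$") -> Any:
    """Toy schema pass (hypothetical; not json_repair's SchemaRepairer).

    Supports only: type, properties, required, default, items,
    and additionalProperties: false.
    """
    kind = schema.get("type")
    if kind == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        closed = schema.get("additionalProperties", True) is False
        result: dict[str, Any] = {}
        for key, item in value.items():
            if key in props:
                result[key] = apply_schema(item, props[key], f"{path}.{key}")
            elif not closed:
                result[key] = item
            # Otherwise the property is disallowed and silently dropped.
        for key, sub in props.items():
            # Fill missing values from schema defaults (copied, never shared).
            if key not in result and "default" in sub:
                result[key] = copy.deepcopy(sub["default"])
        for key in schema.get("required", []):
            if key not in result:
                raise ValueError(f"{path}.{key}: required and no default to fill")
        return result
    if kind == "array" and isinstance(value, list):
        items = schema.get("items", {})
        return [apply_schema(v, items, f"{path}[{i}]") for i, v in enumerate(value)]
    if kind == "integer" and isinstance(value, str):
        # Coerce "1" -> 1 only when int() accepts the whole string.
        try:
            return int(value.strip())
        except ValueError:
            raise ValueError(f"{path}: cannot coerce {value!r} to integer") from None
    if kind == "string" and isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return value


schema = {
    "type": "object",
    "properties": {
        "value": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}, "default": []},
    },
    "required": ["value"],
    "additionalProperties": False,
}
print(apply_schema({"value": "1", "extra": True}, schema))
# {'value': 1, 'tags': []}
```

The `path` argument builds locations such as `$.tags[0]` for error messages, loosely echoing the `path: str = "$"` parameter the diff threads through `parse_object` and `parse_array`.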

json_repair-0.55.1/src/json_repair.egg-info/PKG-INFO → json_repair-0.56.0/README.md RENAMED

@@ -1,20 +1,3 @@
-Metadata-Version: 2.4
-Name: json_repair
-Version: 0.55.1
-Summary: A package to repair broken json strings
-Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
-License-Expression: MIT
-Project-URL: Homepage, https://github.com/mangiucugna/json_repair/
-Project-URL: Bug Tracker, https://github.com/mangiucugna/json_repair/issues
-Project-URL: Live demo, https://mangiucugna.github.io/json_repair/
-Keywords: JSON,REPAIR,LLM,PARSER
-Classifier: Programming Language :: Python :: 3
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Dynamic: license-file
 [![PyPI](https://img.shields.io/pypi/v/json-repair)](https://pypi.org/project/json-repair/)
 ![Python version](https://img.shields.io/badge/python-3.10+-important)
 [![PyPI downloads](https://img.shields.io/pypi/dm/json-repair)](https://pypi.org/project/json-repair/)
@@ -190,6 +173,58 @@ In strict mode the parser raises `ValueError` as soon as it encounters structura
 Strict mode still honors `skip_json_loads=True`; combining them lets you skip the initial `json.loads` check but still enforce strict parsing rules.
+### Schema-guided repairs
+**Alpha feature (not yet in stable releases).** Schema-guided repairs are currently shipped only in alpha builds (e.g., `0.56.0-alpha.*`). The API and behavior may change or break between alpha releases.
+You can guide repairs with a JSON Schema (or a Pydantic v2 model). When enabled, the parser will:
+- Fill missing values (defaults, required fields).
+- Coerce scalars where safe (e.g., `"1"` → `1` for integer fields).
+- Drop properties/items that the schema disallows.
+This is especially useful when you need deterministic, schema-valid outputs for downstream validation, storage, or typed processing. If the input cannot be repaired to satisfy the schema, `json_repair` raises `ValueError`.
+Install the optional dependencies:
+    pip install 'json-repair[schema]'
+(For CLI usage, you can also use `pipx install 'json-repair[schema]'`.)
+Schema guidance is skipped for already-valid JSON unless you pass `skip_json_loads=True` (this forces the parser to run even on valid JSON). Schema guidance is mutually exclusive with `strict=True`.
+```
+from json_repair import repair_json
+schema = {
+    "type": "object",
+    "properties": {"value": {"type": "integer"}},
+    "required": ["value"],
+}
+repair_json('{"value": "1"}', schema=schema, skip_json_loads=True, return_objects=True)
+```
+Pydantic v2 model example:
+```
+from pydantic import BaseModel, Field
+from json_repair import repair_json
+class Payload(BaseModel):
+    value: int
+    tags: list[str] = Field(default_factory=list)
+repair_json(
+    '{"value": "1", "tags": }',
+    schema=Payload,
+    skip_json_loads=True,
+    return_objects=True,
+)
+```
 ### Use json_repair with streaming
 Sometimes you are streaming some data and want to repair the JSON coming from it. Normally this won't work but you can pass `stream_stable` to `repair_json()` or `loads()` to make it work:
@@ -207,7 +242,9 @@ pipx install json-repair
 to know all options available:
 ```
 $ json_repair -h
-usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT] [filename]
+usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT]
+                   [--skip-json-loads] [--schema SCHEMA] [--schema-model MODEL]
+                   [--strict] [filename]
 Repair and parse JSON files.
@@ -221,6 +258,9 @@ options:
                         If specified, the output will be written to TARGET filename instead of stdout
   --ensure_ascii        Pass ensure_ascii=True to json.dumps()
   --indent INDENT       Number of spaces for indentation (Default 2)
+  --skip-json-loads     Skip initial json.loads validation (needed to force schema on valid JSON)
+  --schema SCHEMA       Path to a JSON Schema file that guides repairs
+  --schema-model MODEL  Pydantic v2 model in 'module:ClassName' form that guides repairs
   --strict              Raise on duplicate keys, missing separators, empty keys/values, and similar structural issues instead of repairing them
 ```
@@ -274,8 +314,15 @@ If something is wrong (a missing parentheses or quotes for example) it will use
 I am sure some corner cases will be missing, if you have examples please open an issue or even better push a PR
+# Contributing
+If you want to contribute, start with `CONTRIBUTING.md` and read the Code Wiki writeup for a tour of the codebase and key entry points: https://codewiki.google/github.com/mangiucugna/json_repair
 # How to develop
-Just create a virtual environment with `requirements.txt`, the setup uses [pre-commit](https://pre-commit.com/) to make sure all tests are run.
+Use `uv` to set up the dev environment and run tooling:
+    uv sync --group dev
+    uv run pre-commit run --all-files
+    uv run pytest
 Make sure that the Github Actions running after pushing a new commit don't fail as well.
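The new `--schema-model MODEL` option expects a Pydantic v2 model written as `module:ClassName`. How the CLI resolves that string is not part of this diff; a common way to turn such a spec into an object is `importlib.import_module` followed by `getattr`, sketched below. `load_object` is a hypothetical helper, and whether json_repair additionally checks that the result is a `BaseModel` subclass is not confirmed here.

```python
import importlib
from typing import Any


def load_object(spec: str) -> Any:
    """Resolve a 'module:ClassName' string (hypothetical helper, not json_repair's code)."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"Expected 'module:ClassName', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Allow dotted attribute paths such as 'pkg.mod:Outer.Inner'.
    for attr in attr_path.split("."):
        try:
            obj = getattr(obj, attr)
        except AttributeError:
            raise ValueError(f"{spec!r}: attribute {attr!r} not found") from None
    return obj


# Demonstrated with a stdlib class so it runs without pydantic installed.
from collections import OrderedDict

print(load_object("collections:OrderedDict") is OrderedDict)
```

Splitting on the first `:` keeps dotted module paths (`pkg.models:Payload`) unambiguous, which is presumably why the `module:ClassName` convention is popular for CLI entry points.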

{json_repair-0.55.1 → json_repair-0.56.0}/pyproject.toml RENAMED

@@ -3,7 +3,7 @@ requires = ["setuptools>=61.0"]
 build-backend = "setuptools.build_meta"
 [project]
 name = "json_repair"
-version = "0.55.1"
+version = "0.56.0"
 license = "MIT"
 license-files = ["LICENSE"]
 authors = [
@@ -21,6 +21,33 @@ classifiers = [
 "Homepage" = "https://github.com/mangiucugna/json_repair/"
 "Bug Tracker" = "https://github.com/mangiucugna/json_repair/issues"
 "Live demo" = "https://mangiucugna.github.io/json_repair/"
+[project.optional-dependencies]
+schema = [
+  "jsonschema>=4.21",
+  "pydantic>=2",
+]
+[dependency-groups]
+dev = [
+    "coverage",
+    "jsonschema",
+    "mypy",
+    "pydantic",
+    "pre-commit",
+    "pytest",
+    "pytest-benchmark",
+    "ty",
+]
+test = [
+    "coverage",
+    "jsonschema",
+    "pydantic",
+    "pytest",
+    "pytest-benchmark",
+]
+typecheck = [
+    "mypy",
+    "ty",
+]
 [tool.pytest.ini_options]
 pythonpath = [
   "."
@@ -69,13 +96,19 @@ target-version = "py313"
 # Flake8-bugbear – catches real-world Python footguns - B
 # Flake8-builtins - A
 # Flake8-comprehensions  - C4
+# Flake8-blind-except - BLE
 # Flake8-commas - COM
+# Flake8-print - T20
 # Flake8-quotes - Q
+# Flake8-return - RET
 # Flake8-tidy-imports - TID
 # Flake8-unused-arguments - ARG
+# Refurb - FURB
 # Isort - I
 # Mccabe – code complexity warnings - C90
 # PEP 8 Naming convention - N
+# Perflint - PERF
+# Flake8-use-pathlib - PTH
 # Pycodestyle - E, W
 # Pyflakes - F
 # Pylint - PLC, PLE, PLR, PLW
@@ -83,10 +116,11 @@ target-version = "py313"
 # Pyupgrade – safe modernization (e.g., str() → f"") - UP
 # Ruff specific - RUF
 # Simplifications (e.g., if x == True → if x) - SIM
-select = ['A', 'ARG', 'B', 'C4', 'COM', 'C90', 'E', 'F', 'I', 'N', 'PLC', 'PLE', 'PLW', 'PT', 'Q', 'S', 'SIM', 'TID', 'UP', 'W']
+select = ['A', 'ARG', 'B', 'BLE', 'C4', 'COM', 'C90', 'E', 'F', 'FURB', 'I', 'N', 'PERF', 'PLC', 'PLE', 'PLW', 'PT', 'PTH', 'Q', 'RET', 'S', 'SIM', 'T20', 'TID', 'UP', 'W']
 # Only enable these RUF rules
 extend-select = [
   "RUF001",  # ambiguous Unicode
+  "RUF100",  # unused noqa
   "RUF012",  # mutable default arguments
   "RUF013",  # unnecessary super()
   "RUF016",  # unnecessary else after return (optional)
@@ -117,5 +151,7 @@ line-ending = "auto"
 [tool.ruff.lint.per-file-ignores]
 # Explicit re-exports is fine in __init__.py, still a code smell elsewhere.
 "__init__.py" = ["PLC0414"]
+"src/json_repair/json_repair.py" = ["T201"]
+"tests/profiler.py" = ["T201"]
 [tool.mypy]
 strict = true
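The new `[project.optional-dependencies]` table (surfaced in PKG-INFO as `Provides-Extra: schema`) means `jsonschema` and `pydantic` are only present when users install `json-repair[schema]`. Code that relies on an extra usually checks for it before use. A minimal sketch with `importlib.util.find_spec`, where the helper names and the error message are assumptions rather than json_repair's actual behavior:

```python
import importlib.util

# Top-level modules provided by the "schema" extra, per [project.optional-dependencies].
SCHEMA_EXTRA_MODULES = ("jsonschema", "pydantic")


def missing_modules(names: tuple[str, ...] = SCHEMA_EXTRA_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_schema_extra() -> None:
    """Raise a friendly ImportError when the optional extra is not installed."""
    missing = missing_modules()
    if missing:
        raise ImportError(
            f"Schema support needs {', '.join(missing)}; "
            "install it with: pip install 'json-repair[schema]'",
        )
```

`find_spec` locates a module without importing it, so the check stays cheap even when the extra is present.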

{json_repair-0.55.1 → json_repair-0.56.0}/src/json_repair/json_parser.py RENAMED

@@ -1,4 +1,5 @@
-from typing import TextIO
+from collections.abc import Callable
+from typing import TYPE_CHECKING, Any, TextIO
 from .parse_array import parse_array as _parse_array
 from .parse_comment import parse_comment as _parse_comment
@@ -10,11 +11,18 @@ from .utils.json_context import JsonContext
 from .utils.object_comparer import ObjectComparer
 from .utils.string_file_wrapper import StringFileWrapper
+if TYPE_CHECKING:
+    from .schema_repair import SchemaRepairer
 class JSONParser:
     # Split the parse methods into separate files because this one was like 3000 lines
-    def parse_array(self) -> list[JSONReturnType]:
-        return _parse_array(self)
+    def parse_array(
+        self,
+        schema: dict[str, Any] | bool | None = None,
+        path: str = "$",
+    ) -> list[JSONReturnType]:
+        return _parse_array(self, schema, path)
     def parse_comment(self) -> JSONReturnType:
         return _parse_comment(self)
@@ -22,8 +30,12 @@ class JSONParser:
     def parse_number(self) -> JSONReturnType:
         return _parse_number(self)
-    def parse_object(self) -> JSONReturnType:
-        return _parse_object(self)
+    def parse_object(
+        self,
+        schema: dict[str, Any] | bool | None = None,
+        path: str = "$",
+    ) -> JSONReturnType:
+        return _parse_object(self, schema, path)
     def parse_string(self) -> JSONReturnType:
         return _parse_string(self)
@@ -53,8 +65,8 @@ class JSONParser:
         # We could add a guard in the code for each call but that would make this code unreadable, so here's this neat trick
         # Replace self.log with a noop
         self.logging = logging
+        self.logger: list[dict[str, str]] = []
         if logging:
-            self.logger: list[dict[str, str]] = []
             self.log = self._log
         else:
             # No-op
@@ -71,11 +83,26 @@ class JSONParser:
         # may not be desirable in some use cases and the user would prefer json_repair to return an exception.
         # So strict mode was added to disable some of those heuristics.
         self.strict = strict
+        self.schema_repairer: SchemaRepairer | None = None
     def parse(
         self,
-    ) -> JSONReturnType | tuple[JSONReturnType, list[dict[str, str]]]:
-        json = self.parse_json()
+    ) -> JSONReturnType:
+        return self._parse_top_level(self.parse_json)
+    def parse_with_schema(
+        self,
+        repairer: "SchemaRepairer",
+        schema: dict[str, Any] | bool,
+    ) -> JSONReturnType:
+        """Parse with schema guidance enabled for all nested values."""
+        self.schema_repairer = repairer
+        return self._parse_top_level(lambda: self.parse_json(schema, "$"))
+    # Consolidate top-level parsing so we handle multiple sequential JSON values consistently
+    # (including update semantics and strict-mode validation).
+    def _parse_top_level(self, parse_element: Callable[[], JSONReturnType]) -> JSONReturnType:
+        json = parse_element()
         if self.index < len(self.json_str):
             self.log(
                 "The parser returned early, checking if there's more json elements",
@@ -83,19 +110,17 @@ class JSONParser:
             json = [json]
             while self.index < len(self.json_str):
                 self.context.reset()
-                j = self.parse_json()
+                j = parse_element()
                 if j:
                     if ObjectComparer.is_same_object(json[-1], j):
-                        # replace the last entry with the new one since the new one seems an update
+                        # Treat repeated objects as updates: keep the newest value.
                         json.pop()
                     else:
                         if not json[-1]:
                             json.pop()
                     json.append(j)
                 else:
-                    # this was a bust, move the index
                     self.index += 1
-            # If nothing extra was found, don't return an array
             if len(json) == 1:
                 self.log(
                     "There were no more elements, returning the element without the array",
@@ -106,38 +131,51 @@ class JSONParser:
                     "Multiple top-level JSON elements found in strict mode, raising an error",
                 )
                 raise ValueError("Multiple top-level JSON elements found in strict mode.")
-        if self.logging:
-            return json, self.logger
-        else:
-            return json
+        return json
     def parse_json(
         self,
+        schema: dict[str, Any] | bool | None = None,
+        path: str = "$",
     ) -> JSONReturnType:
+        """Parse the next JSON value and, when configured, enforce schema constraints."""
+        repairer = self.schema_repairer if self.schema_repairer is not None and schema not in (None, True) else None
+        if repairer is not None:
+            # Resolve references once and decide whether schema-guided repairs are needed.
+            schema = repairer.resolve_schema(schema)
+            if schema is True:
+                repairer = None
+            elif schema is False:
+                raise ValueError("Schema does not allow any values.")
         while True:
             char = self.get_char_at()
             # None means that we are at the end of the string provided
             if char is None:
                 return ""
             # <object> starts with '{'
-            elif char == "{":
+            if char == "{":
                 self.index += 1
-                return self.parse_object()
+                value = self.parse_object(schema, path) if repairer else self.parse_object()
+                return repairer.repair_value(value, schema, path) if repairer else value
             # <array> starts with '['
-            elif char == "[":
+            if char == "[":
                 self.index += 1
-                return self.parse_array()
+                value = self.parse_array(schema, path) if repairer else self.parse_array()
+                return repairer.repair_value(value, schema, path) if repairer else value
             # <string> starts with a quote
-            elif not self.context.empty and (char in STRING_DELIMITERS or char.isalpha()):
-                return self.parse_string()
+            if not self.context.empty and (char in STRING_DELIMITERS or char.isalpha()):
+                value = self.parse_string()
+                return repairer.repair_value(value, schema, path) if repairer else value
             # <number> starts with [0-9] or minus
-            elif not self.context.empty and (char.isdigit() or char == "-" or char == "."):
-                return self.parse_number()
-            elif char in ["#", "/"]:
-                return self.parse_comment()
+            if not self.context.empty and (char.isdigit() or char == "-" or char == "."):
+                value = self.parse_number()
+                return repairer.repair_value(value, schema, path) if repairer else value
+            if char in ["#", "/"]:
+                value = self.parse_comment()
+                return repairer.repair_value(value, schema, path) if repairer else value
             # If everything else fails, we just ignore and move on
-            else:
-                self.index += 1
+            self.index += 1
     def get_char_at(self, count: int = 0) -> str | None:
         # Why not use something simpler? Because try/except in python is a faster alternative to an "if" statement that is often True
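In 0.56.0, `parse()` delegates to `_parse_top_level`, which takes the element parser as a callable so plain and schema-guided parsing share the handling of several concatenated top-level values. The merge rules can be replayed on values the parser has already returned one by one. In this sketch `same_shape` is a hypothetical stand-in for `ObjectComparer.is_same_object`, whose exact rules are not shown in the diff (here: two dicts with the same keys, or two lists), and the strict-mode branch assumes the elided lines raise whenever more than one element survives.

```python
from typing import Any


def same_shape(a: Any, b: Any) -> bool:
    """Hypothetical stand-in for ObjectComparer.is_same_object."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys()
    return isinstance(a, list) and isinstance(b, list)


def consolidate(values: list[Any], strict: bool = False) -> Any:
    """Replay the top-level merge over successively parsed values."""
    if not values:
        return ""  # parse_json returns "" at end of input
    result = [values[0]]
    for value in values[1:]:
        if not value:
            # The parser just advances the index past a falsy result.
            continue
        if same_shape(result[-1], value) or not result[-1]:
            # A repeated object is treated as an update (keep the newest);
            # an empty previous element is discarded as well.
            result.pop()
        result.append(value)
    if len(result) == 1:
        return result[0]  # no extra elements: don't wrap in a list
    if strict:
        raise ValueError("Multiple top-level JSON elements found in strict mode.")
    return result


print(consolidate([{"a": 1}, {"a": 2}]))
# {'a': 2}
```

Passing a callable rather than duplicating the loop is what lets `parse_with_schema` reuse this logic with `lambda: self.parse_json(schema, "$")`.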
