PyPI - codeboarding - Versions diffs - 0.10.4__tar.gz → 0.11.0__tar.gz - Mend

codeboarding 0.10.4__tar.gz → 0.11.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (200)

{codeboarding-0.10.4/codeboarding.egg-info → codeboarding-0.11.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeboarding
-Version: 0.10.4
+Version: 0.11.0
 Summary: Interactive Diagrams for Code
 Author: CodeBoarding Team
 License-Expression: MIT
@@ -24,6 +24,7 @@ Requires-Dist: fastapi>=0.115
 Requires-Dist: filelock>=3.12
 Requires-Dist: gitpython>=3.1
 Requires-Dist: google-api-core>=2.10
+Requires-Dist: jsonpatch>=1.33
 Requires-Dist: jsonschema>=4.25
 Requires-Dist: langchain>=1.2
 Requires-Dist: langchain-anthropic>=1.3
@@ -42,6 +43,15 @@ Requires-Dist: pathspec>=0.12
 Requires-Dist: pyyaml>=6.0
 Requires-Dist: regex>=2024.11
 Requires-Dist: rich>=12.6
+Requires-Dist: tree-sitter>=0.23
+Requires-Dist: tree-sitter-c-sharp>=0.23
+Requires-Dist: tree-sitter-go>=0.23
+Requires-Dist: tree-sitter-java>=0.23
+Requires-Dist: tree-sitter-javascript>=0.23
+Requires-Dist: tree-sitter-php>=0.23
+Requires-Dist: tree-sitter-python>=0.23
+Requires-Dist: tree-sitter-rust>=0.23
+Requires-Dist: tree-sitter-typescript>=0.23
 Requires-Dist: trustcall>=0.0.39
 Requires-Dist: uvicorn>=0.23
 Provides-Extra: dev
@@ -110,10 +120,10 @@ codeboarding-setup
 ```bash
 # Analyze a local repository (output goes to /path/to/repo/.codeboarding/)
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 # Analyze a remote GitHub repository (cloned to cwd/repo_name/, output to cwd/repo_name/.codeboarding/)
-codeboarding https://github.com/user/repo
+codeboarding full https://github.com/user/repo
 ```
 ### Python API
@@ -188,19 +198,21 @@ Shell environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) always
 ## CLI Reference
 ```
-codeboarding [REPO_URL ...]           # remote: clone + analyze
-codeboarding --local PATH             # local: analyze in-place
+codeboarding full [REPO_URL ...]           # remote: clone + analyze
+codeboarding full --local PATH             # local: analyze in-place
+codeboarding incremental --local PATH      # re-analyze only changed parts
+codeboarding partial --local PATH --component-id ID   # update one component
 ```
 | Option | Description |
 |---|---|
 | `--local PATH` | Analyze a local repository (output: `PATH/.codeboarding/`) |
 | `--depth-level INT` | Diagram depth (default: 1) |
-| `--incremental` | Smart incremental update (only re-analyze changed files) |
-| `--full` | Force full reanalysis, skip incremental detection |
-| `--partial-component-id ID` | Update a single component by its ID |
+| `--force` | (full only) Force full reanalysis, skip cached static analysis |
+| `--base-ref REF` / `--target-ref REF` | (incremental only) Git refs to diff |
+| `--component-id ID` | (partial only) ID of the component to update |
 | `--binary-location PATH` | Custom path to language server binaries (overrides `~/.codeboarding/servers/`) |
-| `--upload` | Upload results to GeneratedOnBoardings repo (remote only) |
+| `--upload` | (full, remote only) Upload results to GeneratedOnBoardings repo |
 | `--enable-monitoring` | Enable run monitoring |
 ---
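
Note for callers of the 0.10.4 command line: the option table above replaces mode flags with subcommands. A minimal sketch of that mapping follows, assuming only the flags shown in this diff. `migrate_argv` is a hypothetical helper, not part of codeboarding, and treating the old `--full` as the new `--force` is an assumption based on the similar descriptions. The `--flag=value` spelling is not handled.

```python
def migrate_argv(old_args: list[str]) -> list[str]:
    """Rewrite 0.10.4-style arguments into the 0.11.0 subcommand form."""
    args = list(old_args)  # work on a copy so the caller's list is untouched

    if "--partial-component-id" in args:
        # Old: --partial-component-id ID  ->  New: partial ... --component-id ID
        args[args.index("--partial-component-id")] = "--component-id"
        return ["partial", *args]

    if "--incremental" in args:
        # Old: --incremental  ->  New: incremental subcommand, flag dropped
        args.remove("--incremental")
        return ["incremental", *args]

    if "--full" in args:
        # Assumption: the old --full flag corresponds to the new --force flag.
        args[args.index("--full")] = "--force"

    # Everything else was a full analysis in 0.10.4.
    return ["full", *args]


print(migrate_argv(["--local", "/path/to/repo", "--incremental"]))
```

The returned list is meant to follow the program name, for example `["codeboarding", *migrate_argv(old_args)]`.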

{codeboarding-0.10.4 → codeboarding-0.11.0}/PYPI.md RENAMED

@@ -50,10 +50,10 @@ codeboarding-setup
 ```bash
 # Analyze a local repository (output goes to /path/to/repo/.codeboarding/)
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 # Analyze a remote GitHub repository (cloned to cwd/repo_name/, output to cwd/repo_name/.codeboarding/)
-codeboarding https://github.com/user/repo
+codeboarding full https://github.com/user/repo
 ```
 ### Python API
@@ -128,19 +128,21 @@ Shell environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) always
 ## CLI Reference
 ```
-codeboarding [REPO_URL ...]           # remote: clone + analyze
-codeboarding --local PATH             # local: analyze in-place
+codeboarding full [REPO_URL ...]           # remote: clone + analyze
+codeboarding full --local PATH             # local: analyze in-place
+codeboarding incremental --local PATH      # re-analyze only changed parts
+codeboarding partial --local PATH --component-id ID   # update one component
 ```
 | Option | Description |
 |---|---|
 | `--local PATH` | Analyze a local repository (output: `PATH/.codeboarding/`) |
 | `--depth-level INT` | Diagram depth (default: 1) |
-| `--incremental` | Smart incremental update (only re-analyze changed files) |
-| `--full` | Force full reanalysis, skip incremental detection |
-| `--partial-component-id ID` | Update a single component by its ID |
+| `--force` | (full only) Force full reanalysis, skip cached static analysis |
+| `--base-ref REF` / `--target-ref REF` | (incremental only) Git refs to diff |
+| `--component-id ID` | (partial only) ID of the component to update |
 | `--binary-location PATH` | Custom path to language server binaries (overrides `~/.codeboarding/servers/`) |
-| `--upload` | Upload results to GeneratedOnBoardings repo (remote only) |
+| `--upload` | (full, remote only) Upload results to GeneratedOnBoardings repo |
 | `--enable-monitoring` | Enable run monitoring |
 ---

{codeboarding-0.10.4 → codeboarding-0.11.0}/README.md RENAMED

@@ -70,7 +70,7 @@ For a deeper architecture walkthrough, see [`.codeboarding/overview.md`](.codebo
 uv sync --frozen
 source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 python install.py
-python main.py --local /path/to/repo
+python main.py full --local /path/to/repo
 ```
 ### Use the packaged CLI
@@ -80,7 +80,7 @@ Requires **Python 3.12 or 3.13**. The recommended install method is [pipx](https
 ```bash
 pipx install codeboarding --python python3.12
 codeboarding-setup
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 ```
 Or, if you prefer pip, install into a virtual environment (not the global Python):
@@ -88,7 +88,7 @@ Or, if you prefer pip, install into a virtual environment (not the global Python
 ```bash
 pip install codeboarding
 codeboarding-setup
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 ```
 Output is written to `/path/to/repo/.codeboarding/`.
@@ -120,19 +120,19 @@ Shell environment variables such as `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOG
 ```bash
 # Analyze a local repository
-python main.py --local ./my-project
+python main.py full --local ./my-project
 # Increase diagram depth
-python main.py --local ./my-project --depth-level 2
+python main.py full --local ./my-project --depth-level 2
 # Re-analyze only changed parts when possible
-python main.py --local ./my-project --incremental
+python main.py incremental --local ./my-project
 # Update a single component by ID
-python main.py --local ./my-project --partial-component-id "1.2"
+python main.py partial --local ./my-project --component-id "1.2"
 # Analyze a remote GitHub repository
-python main.py https://github.com/pytorch/pytorch
+python main.py full https://github.com/pytorch/pytorch
 ```
 ## Where to use it
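
The README examples above now pick the mode with a subcommand (`full`, `incremental`, `partial`) rather than a flag. One way such a layout could be expressed with `argparse` sub-parsers is sketched here. `build_parser` and the option wiring are illustrative assumptions drawn from the CLI reference in this diff, not the package's actual parser; in particular, sharing `--depth-level` across all three subcommands is a guess.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codeboarding")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options assumed to be shared by every subcommand.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--local", metavar="PATH")
    common.add_argument("--depth-level", type=int, default=1)

    # full: remote URLs or --local, optional --force
    full = sub.add_parser("full", parents=[common])
    full.add_argument("repo_urls", nargs="*", metavar="REPO_URL")
    full.add_argument("--force", action="store_true")

    # incremental: optional git refs to diff
    inc = sub.add_parser("incremental", parents=[common])
    inc.add_argument("--base-ref", metavar="REF")
    inc.add_argument("--target-ref", metavar="REF")

    # partial: exactly one component to update
    partial = sub.add_parser("partial", parents=[common])
    partial.add_argument("--component-id", required=True, metavar="ID")
    return parser


args = build_parser().parse_args(
    ["partial", "--local", "./my-project", "--component-id", "1.2"]
)
print(args.command, args.local, args.component_id)
```

With sub-parsers, mode-specific options such as `--component-id` are rejected under the wrong subcommand by the parser itself, which matches the "(full only)", "(incremental only)", and "(partial only)" annotations in the option table.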

{codeboarding-0.10.4 → codeboarding-0.11.0}/agents/agent.py RENAMED

@@ -1,6 +1,5 @@
 import json
 import logging
-import time
 from pathlib import Path
 from google.api_core.exceptions import ResourceExhausted
@@ -15,6 +14,7 @@ from pydantic import ValidationError
 from trustcall import create_extractor
 from agents.prompts import get_validation_feedback_message
+from agents.retry import RetryAction, RetryDecision, default_backoff, with_retries
 from agents.tools.base import RepoContext
 from agents.tools.toolkit import CodeBoardingToolkit
 from agents.validation import ValidationResult, score_validation_results, VALIDATOR_WEIGHTS, DEFAULT_VALIDATOR_WEIGHT
@@ -96,86 +96,69 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
     def _invoke(self, prompt, callbacks: list | None = None) -> str:
         """Unified agent invocation method with timeout and exponential backoff.
-        Uses exponential backoff based on total attempts, with different multipliers
-        for different error types. This ensures backoff increases appropriately even
-        when errors alternate between types.
+        Classification applied per exception:
+        - ``TimeoutError``: backoff ``min(10·2^n, 120)``, raise on exhaustion.
+        - ``ResourceExhausted``: backoff ``min(30·2^n, 300)``, raise on exhaustion.
+        - ``status_code == 404``: raise immediately (retired model ID, etc.).
+        - Other exceptions: backoff ``min(10·2^n, 120)``, return fallback string
+          on exhaustion (non-raising — callers treat the fallback as a failed run).
         """
-        max_retries = 5
-        for attempt in range(max_retries):
+        max_attempts = 5
+        # Counter captured by the closure so we can vary the per-attempt timeout
+        # without reaching into the retry helper.
+        attempt_counter = [0]
+        def call_once() -> str:
+            attempt = attempt_counter[0]
+            attempt_counter[0] += 1
             timeout_seconds = 300 if attempt == 0 else 600
-            try:
-                callback_list = callbacks or []
-                # Always append monitoring callback - logging config controls output
-                callback_list.append(MONITORING_CALLBACK)
-                callback_list.append(self.agent_monitoring_callback)
-                logger.info(
-                    f"Starting agent.invoke() [attempt {attempt + 1}/{max_retries}] with prompt length: {len(prompt)}, timeout: {timeout_seconds}s"
-                )
-                response = self._invoke_with_timeout(
-                    timeout_seconds=timeout_seconds, callback_list=callback_list, prompt=prompt
-                )
-                logger.info(
-                    f"Completed agent.invoke() - message count: {len(response['messages'])}, last message type: {type(response['messages'][-1])}"
+            callback_list = (callbacks or []) + [MONITORING_CALLBACK, self.agent_monitoring_callback]
+            logger.info(
+                f"Starting agent.invoke() [attempt {attempt + 1}/{max_attempts}] with prompt length: {len(prompt)}, timeout: {timeout_seconds}s"
+            )
+            response = self._invoke_with_timeout(
+                timeout_seconds=timeout_seconds, callback_list=callback_list, prompt=prompt
+            )
+            logger.info(
+                f"Completed agent.invoke() - message count: {len(response['messages'])}, last message type: {type(response['messages'][-1])}"
+            )
+            agent_response = response["messages"][-1]
+            assert isinstance(agent_response, AIMessage), f"Expected AIMessage, but got {type(agent_response)}"
+            if isinstance(agent_response.content, str):
+                return agent_response.content
+            if isinstance(agent_response.content, list):
+                return "".join(str(m) if not isinstance(m, str) else m for m in agent_response.content)
+            return ""  # unreachable for AIMessage but satisfies typing
+        def classify(exc: Exception, attempt: int) -> RetryDecision:
+            if getattr(exc, "status_code", None) == 404:
+                logger.error(f"Permanent HTTP 404 — not retrying: {type(exc).__name__}: {exc}")
+                return RetryDecision(action=RetryAction.GIVE_UP)
+            if isinstance(exc, ResourceExhausted):
+                return RetryDecision(
+                    action=RetryAction.RETRY,
+                    backoff_s=default_backoff(attempt, initial_s=30.0, multiplier=2.0, max_s=300.0),
                 )
+            # TimeoutError + generic Exception share the same backoff.
+            return RetryDecision(
+                action=RetryAction.RETRY,
+                backoff_s=default_backoff(attempt, initial_s=10.0, multiplier=2.0, max_s=120.0),
+            )
-                agent_response = response["messages"][-1]
-                assert isinstance(agent_response, AIMessage), f"Expected AIMessage, but got {type(agent_response)}"
-                if isinstance(agent_response.content, str):
-                    return agent_response.content
-                if isinstance(agent_response.content, list):
-                    return "".join(
-                        [
-                            str(message) if not isinstance(message, str) else message
-                            for message in agent_response.content
-                        ]
-                    )
-            except TimeoutError as e:
-                if attempt < max_retries - 1:
-                    # Exponential backoff: 10s * 2^attempt (10s, 20s, 40s, 80s)
-                    delay = min(10 * (2**attempt), 120)
-                    logger.warning(
-                        f"Agent invocation timed out after {timeout_seconds}s, retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                else:
-                    logger.error(f"Agent invocation timed out after {timeout_seconds}s on final attempt")
-                    raise
-            except ResourceExhausted as e:
-                if attempt < max_retries - 1:
-                    # Longer backoff for rate limits: 30s * 2^attempt (30s, 60s, 120s, 240s)
-                    delay = min(30 * (2**attempt), 300)
-                    logger.warning(
-                        f"ResourceExhausted (rate limit): {e}\n"
-                        f"Retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                else:
-                    logger.error(f"Max retries ({max_retries}) reached. ResourceExhausted: {e}")
-                    raise
-            except Exception as e:
-                # HTTP 404 (e.g. retired model ID) is permanent — retrying won't help.
-                if getattr(e, "status_code", None) == 404:
-                    logger.error(f"Permanent HTTP 404 — not retrying: {type(e).__name__}: {e}")
-                    raise
-                # Other errors (network, parsing, etc.) get standard exponential backoff
-                if attempt < max_retries - 1:
-                    delay = min(10 * (2**attempt), 120)
-                    logger.warning(
-                        f"Agent error: {type(e).__name__}: {e}, retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                # On final attempt, fall through to return error message below
-        logger.error("Max retries reached. Failed to get response from the agent.")
-        return "Could not get response from the agent."
+        def on_exhausted(exc: Exception) -> str:
+            # Typed exceptions surface the original error; only generic falls through
+            # to the historic fallback string that callers have long relied on.
+            if isinstance(exc, (TimeoutError, ResourceExhausted)):
+                raise exc
+            return "Could not get response from the agent."
+        return with_retries(
+            call_once,
+            max_attempts=max_attempts,
+            classify=classify,
+            on_exhausted=on_exhausted,
+            log_prefix="Agent invocation",
+        )
     def _invoke_with_timeout(self, timeout_seconds: int, callback_list: list, prompt: str):
         """Invoke agent with a timeout using threading."""
@@ -336,18 +319,27 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         return best_result
     def _parse_response(self, prompt, response, return_type, max_retries=5, attempt=0):
-        if attempt >= max_retries:
-            logger.error(f"Max retries ({max_retries}) reached for parsing response: {response}")
-            raise Exception(f"Max retries reached for parsing response: {response}")
-        extractor = create_extractor(self.parsing_llm, tools=[return_type], tool_choice=return_type.__name__)
         if response is None or response.strip() == "":
             logger.error(f"Empty response for prompt: {prompt}")
-        try:
-            result = extractor.invoke(
-                return_type.extractor_str() + response,
-                config={"callbacks": [MONITORING_CALLBACK, self.agent_monitoring_callback]},
-            )
+        def call_once():
+            # Extractor is rebuilt on every attempt — previous trustcall state
+            # may have corrupted attributes (see the tool_call_id bug below).
+            extractor = create_extractor(self.parsing_llm, tools=[return_type], tool_choice=return_type.__name__)
+            try:
+                result = extractor.invoke(
+                    return_type.extractor_str() + response,
+                    config={"callbacks": [MONITORING_CALLBACK, self.agent_monitoring_callback]},
+                )
+            except AttributeError as e:
+                # Trustcall bug: https://github.com/hinthornw/trustcall/issues/47
+                # 'ExtractionState' object has no attribute 'tool_call_id' during validation retry.
+                # Treat as a non-retriable fallback to the Pydantic parser.
+                if "tool_call_id" in str(e):
+                    logger.warning(f"Trustcall bug encountered, falling back to Pydantic parser: {e}")
+                    parser = PydanticOutputParser(pydantic_object=return_type)
+                    return self._try_parse(response, parser)
+                raise
             if "responses" in result and len(result["responses"]) != 0:
                 return return_type.model_validate(result["responses"][0])
             if "messages" in result and len(result["messages"]) != 0:
@@ -358,38 +350,36 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
                 return self._try_parse(message, parser)
             parser = PydanticOutputParser(pydantic_object=return_type)
             return self._try_parse(response, parser)
-        except EmptyExtractorMessageError as e:
-            logger.warning(f"{e} (attempt {attempt + 1}/{max_retries})")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except AttributeError as e:
-            # Workaround for trustcall bug: https://github.com/hinthornw/trustcall/issues/47
-            # 'ExtractionState' object has no attribute 'tool_call_id' occurs during validation retry
-            if "tool_call_id" in str(e):
-                logger.warning(f"Trustcall bug encountered, falling back to Pydantic parser: {e}")
-                parser = PydanticOutputParser(pydantic_object=return_type)
-                return self._try_parse(response, parser)
-            raise
-        except IndexError as e:
-            # try to parse with the json parser if possible
-            logger.warning(f"IndexError while parsing response (attempt {attempt + 1}/{max_retries}): {e}")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except (json.JSONDecodeError, ValueError) as e:
-            logger.warning(f"Parse error (attempt {attempt + 1}/{max_retries}): {e}")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except ResourceExhausted as e:
-            # Parsing uses exponential backoff for rate limits
-            if attempt < max_retries - 1:
-                # Exponential backoff: 30s * 2^attempt, capped at 300s
-                delay = min(30 * (2**attempt), 300)
-                logger.warning(
-                    f"ResourceExhausted during parsing (rate limit): {e}\n"
-                    f"Retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
+        def classify(exc: Exception, attempt: int) -> RetryDecision:
+            if isinstance(exc, ResourceExhausted):
+                return RetryDecision(
+                    action=RetryAction.RETRY,
+                    backoff_s=default_backoff(attempt, initial_s=30.0, multiplier=2.0, max_s=300.0),
                 )
-                time.sleep(delay)
-                return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-            else:
-                logger.error(f"Resource exhausted on final parsing attempt: {e}")
-                raise
+            if isinstance(exc, (EmptyExtractorMessageError, IndexError, json.JSONDecodeError, ValueError)):
+                return RetryDecision(action=RetryAction.RETRY_NOW)
+            # AttributeError (non-tool_call_id) and any other exception: give up.
+            return RetryDecision(action=RetryAction.GIVE_UP)
+        def on_exhausted(exc: Exception):
+            # Preserve historic shape: ResourceExhausted surfaces the original exception;
+            # parse-error exhaustion wraps with a descriptive message naming the response.
+            if isinstance(exc, ResourceExhausted):
+                logger.error(f"Resource exhausted on final parsing attempt: {exc}")
+                raise exc
+            logger.error(f"Max retries ({max_retries}) reached for parsing response: {response}")
+            raise Exception(f"Max retries reached for parsing response: {response}")
+        # ``attempt`` kwarg kept for backwards-compat with callers that passed it;
+        # the effective attempt count is ``max_retries - attempt``.
+        return with_retries(
+            call_once,
+            max_attempts=max(1, max_retries - attempt),
+            classify=classify,
+            on_exhausted=on_exhausted,
+            log_prefix="Parse response",
+        )
     def _try_parse(self, message_content, parser):
         try:
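
Both `_invoke` and `_parse_response` now delegate their loops to `with_retries` from `agents/retry.py`, whose body is not part of this excerpt. The sketch below shows one shape such a helper could take, inferred only from the call sites above. The names mirror the import in this diff, but every signature and behaviour here is an assumption, and the `sleep` parameter is a hypothetical addition to make the sketch testable.

```python
import logging
import time
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class RetryAction(Enum):
    RETRY = auto()      # sleep for backoff_s, then try again
    RETRY_NOW = auto()  # try again without sleeping
    GIVE_UP = auto()    # re-raise immediately


@dataclass
class RetryDecision:
    action: RetryAction
    backoff_s: float = 0.0


def default_backoff(attempt: int, *, initial_s: float, multiplier: float, max_s: float) -> float:
    """Exponential backoff: min(initial_s * multiplier**attempt, max_s)."""
    return min(initial_s * multiplier**attempt, max_s)


def with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int,
    classify: Callable[[Exception, int], RetryDecision],
    on_exhausted: Callable[[Exception], T],
    log_prefix: str = "call",
    sleep: Callable[[float], None] = time.sleep,  # hypothetical test hook
) -> T:
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            decision = classify(exc, attempt)
            if decision.action is RetryAction.GIVE_UP:
                raise  # permanent failure, e.g. HTTP 404
            logger.warning("%s failed (attempt %d/%d): %s", log_prefix, attempt + 1, max_attempts, exc)
            if attempt == max_attempts - 1:
                # Out of attempts: the caller decides to raise or return a fallback.
                return on_exhausted(exc)
            if decision.action is RetryAction.RETRY:
                sleep(decision.backoff_s)
    raise ValueError("max_attempts must be at least 1")
```

With the parameters used in this diff, the sleep schedule in this sketch is 10, 20, 40, 80 s for timeouts and generic errors and 30, 60, 120, 240 s for `ResourceExhausted`. Five attempts allow only four sleeps, so neither cap is reached. Note also that `_parse_response` passes `max_attempts=max(1, max_retries - attempt)`, so a caller supplying `attempt >= max_retries` now gets one attempt, where the 0.10.4 code raised before trying.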

{codeboarding-0.10.4 → codeboarding-0.11.0}/agents/agent_responses.py RENAMED

@@ -160,6 +160,16 @@ class MethodEntry(BaseModel):
             node_type=method_change.node_type,
         )
+    @classmethod
+    def from_node(cls, node) -> MethodEntry:
+        """Build from a ``static_analyzer.Node``. Accepts ``Any`` to avoid a hard dep."""
+        return cls(
+            qualified_name=node.fully_qualified_name,
+            start_line=node.line_start,
+            end_line=node.line_end,
+            node_type=node.type.name,
+        )
 class FileMethodGroup(BaseModel):
     """All methods/functions belonging to a component within a single file."""

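The new `MethodEntry.from_node` classmethod reads four attributes from its argument and does not import the analyzer's `Node` type. A self-contained sketch of that duck-typed conversion is given here. `NodeType`, `Node`, and the dataclass version of `MethodEntry` are stand-ins invented for illustration; the real model is a Pydantic `BaseModel` and the real node class lives in `static_analyzer`.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):  # hypothetical stand-in for the analyzer's node kind enum
    FUNCTION = 1
    METHOD = 2


@dataclass
class Node:  # stand-in exposing only the attributes from_node reads
    fully_qualified_name: str
    line_start: int
    line_end: int
    type: NodeType


@dataclass
class MethodEntry:  # plain-dataclass stand-in for the Pydantic model
    qualified_name: str
    start_line: int
    end_line: int
    node_type: str

    @classmethod
    def from_node(cls, node) -> "MethodEntry":
        # Any object with these four attributes works; no Node import is needed.
        return cls(
            qualified_name=node.fully_qualified_name,
            start_line=node.line_start,
            end_line=node.line_end,
            node_type=node.type.name,  # enum member name, e.g. "METHOD"
        )


entry = MethodEntry.from_node(Node("pkg.mod.Cls.run", 10, 42, NodeType.METHOD))
print(entry)
```
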
codeboarding-0.11.0/agents/analysis_patcher.py ADDED

@@ -0,0 +1,206 @@
+"""EASE-encoded JSON Patch flow for incremental sub-analysis updates.
+Given impacted components from the tracer, extracts the parent sub-analysis,
+EASE-encodes it, asks the LLM for RFC 6902 patches, applies them, validates
+the result, and merges back into the full analysis.
+Structured output uses trustcall's ``create_extractor`` — the same pattern
+``agents/agent.py`` uses for parsing LLM responses into Pydantic models. The
+extractor binds ``AnalysisPatch`` as the tool schema and forces the LLM to emit
+a tool call matching it, so ``result["responses"][0]`` is already schema-valid.
+"""
+import json
+import logging
+from typing import Any
+import jsonpatch
+from langchain_core.language_models import BaseChatModel
+from pydantic import ValidationError
+from trustcall import create_extractor
+from agents.agent_responses import AnalysisInsights
+from agents.prompts.prompt_factory import get_patch_system_message
+from diagram_analysis.ease import ease_decode, ease_encode
+from diagram_analysis.incremental.models import AnalysisPatch, ImpactedComponent
+logger = logging.getLogger(__name__)
+MAX_PATCH_RETRIES = 3
+# Fields excluded from the patching surface. `files` is the file-level index
+# managed separately by the analysis pipeline; patching it here would race with
+# the static-analysis update path.
+_PATCH_EXCLUDE_TOP_LEVEL: set[str] = {"files"}
+# ---------------------------------------------------------------------------
+# Pydantic-native serialization
+# ---------------------------------------------------------------------------
+def _sub_analysis_to_dict(sub: AnalysisInsights) -> dict[str, Any]:
+    """Serialize a sub-analysis to a plain dict via Pydantic's model_dump."""
+    return sub.model_dump(mode="json", exclude=_PATCH_EXCLUDE_TOP_LEVEL, exclude_none=False)
+# ---------------------------------------------------------------------------
+# EASE encoding / decoding (walks nested arrays)
+# ---------------------------------------------------------------------------
+def _encode_sub_analysis(sub_dict: dict[str, Any]) -> dict[str, Any]:
+    """EASE-encode a sub-analysis dict."""
+    encoded = ease_encode(sub_dict, ["components", "components_relations"])
+    if isinstance(encoded.get("components"), dict):
+        for key, comp in encoded["components"].items():
+            if key == "display_order" or not isinstance(comp, dict):
+                continue
+            encoded["components"][key] = ease_encode(comp, ["key_entities", "file_methods"])
+            if isinstance(encoded["components"][key].get("file_methods"), dict):
+                for fm_key, fm in encoded["components"][key]["file_methods"].items():
+                    if fm_key == "display_order" or not isinstance(fm, dict):
+                        continue
+                    encoded["components"][key]["file_methods"][fm_key] = ease_encode(fm, ["methods"])
+    return encoded
+def _decode_sub_analysis(encoded: dict[str, Any]) -> dict[str, Any]:
+    """Decode EASE-encoded sub-analysis back to plain arrays."""
+    if isinstance(encoded.get("components"), dict):
+        for key, comp in encoded["components"].items():
+            if key == "display_order" or not isinstance(comp, dict):
+                continue
+            if isinstance(comp.get("file_methods"), dict):
+                for fm_key, fm in comp["file_methods"].items():
+                    if fm_key == "display_order" or not isinstance(fm, dict):
+                        continue
+                    comp["file_methods"][fm_key] = ease_decode(fm, ["methods"])
+            encoded["components"][key] = ease_decode(comp, ["key_entities", "file_methods"])
+    return ease_decode(encoded, ["components", "components_relations"])
+# ---------------------------------------------------------------------------
+# Prompt construction
+# ---------------------------------------------------------------------------
+def _build_patch_prompt(
+    encoded_sub: dict[str, Any],
+    impact: ImpactedComponent,
+    sub_analysis_id: str,
+) -> str:
+    parts = [
+        "# Current Sub-Analysis (EASE-encoded)\n",
+        f"```json\n{json.dumps(encoded_sub, indent=2)}\n```\n",
+        "# Impact Dossier\n",
+        f"Sub-analysis ID: {sub_analysis_id}\n",
+        f"Impacted methods: {', '.join(impact.impacted_methods)}\n",
+        "\nGenerate a patch to update component descriptions, key_entities, and relations",
+        "to reflect the semantic changes indicated by the impacted methods.",
+        "Respond with sub_analysis_id, reasoning, and patches (list of op/path/value).",
+    ]
+    return "\n".join(parts)
+# ---------------------------------------------------------------------------
+# Apply patches
+# ---------------------------------------------------------------------------
+def _apply_patches(encoded: dict[str, Any], patch_ops: list[dict]) -> dict[str, Any]:
+    """Apply RFC 6902 patches to an EASE-encoded dict."""
+    patch = jsonpatch.JsonPatch(patch_ops)
+    return patch.apply(encoded)
+# ---------------------------------------------------------------------------
+# Validation via Pydantic native round-trip
+# ---------------------------------------------------------------------------
+def _validate_patched(decoded: dict[str, Any], original: AnalysisInsights) -> AnalysisInsights | None:
+    """Validate that a decoded sub-analysis round-trips to AnalysisInsights.
+    The decoded dict is missing excluded fields (see ``_PATCH_EXCLUDE_TOP_LEVEL``).
+    Those fields are re-grafted directly from the original model so the result
+    is complete. We cannot re-dump them because ``exclude=True`` Field settings
+    drop them entirely — we copy the live model attributes instead.
+    """
+    try:
+        validated = AnalysisInsights.model_validate(decoded)
+        for field_name in _PATCH_EXCLUDE_TOP_LEVEL:
+            original_value = getattr(original, field_name, None)
+            if original_value is not None:
+                setattr(validated, field_name, original_value)
+        return validated
+    except (ValidationError, KeyError, TypeError) as exc:
+        logger.warning("Patched sub-analysis failed validation: %s", exc)
+        return None
+# ---------------------------------------------------------------------------
+# Main patch flow
+# ---------------------------------------------------------------------------
+def patch_sub_analysis(
+    sub_analysis: AnalysisInsights,
+    sub_analysis_id: str,
+    impact: ImpactedComponent,
+    parsing_llm: BaseChatModel,
+) -> AnalysisInsights | None:
+    """Patch a single sub-analysis using EASE-encoded JSON patches.
+    Returns the patched AnalysisInsights, or None if patching fails after retries.
+    """
+    sub_dict = _sub_analysis_to_dict(sub_analysis)
+    encoded = _encode_sub_analysis(sub_dict)
+    prompt = _build_patch_prompt(encoded, impact, sub_analysis_id)
+    extractor = create_extractor(parsing_llm, tools=[AnalysisPatch], tool_choice=AnalysisPatch.__name__)
+    last_error = ""
+    for attempt in range(MAX_PATCH_RETRIES):
+        try:
+            full_prompt = get_patch_system_message() + "\n\n" + prompt
+            if last_error:
+                full_prompt += f"\n\nPrevious attempt failed validation: {last_error}\nPlease fix the patch."
+            result = extractor.invoke(full_prompt)
+            if "responses" not in result or not result["responses"]:
+                logger.warning("Patch extractor returned no responses (attempt %d)", attempt + 1)
+                continue
+            patch_response = AnalysisPatch.model_validate(result["responses"][0])
+            patch_ops = [op.model_dump(exclude_none=True) for op in patch_response.patches]
+            if not patch_ops:
+                logger.info("LLM returned empty patch for %s — no changes needed", sub_analysis_id)
+                return sub_analysis
+            patched = _apply_patches(encoded, patch_ops)
+            decoded = _decode_sub_analysis(patched)
+            validated = _validate_patched(decoded, sub_analysis)
+            if validated is not None:
+                logger.info("Successfully patched sub-analysis %s on attempt %d", sub_analysis_id, attempt + 1)
+                return validated
+            last_error = "Decoded result failed Pydantic validation"
+        except jsonpatch.JsonPatchException as exc:
+            last_error = f"JSON Patch application error: {exc}"
+            logger.warning("Patch apply failed for %s (attempt %d): %s", sub_analysis_id, attempt + 1, exc)
+        except Exception as exc:
+            last_error = str(exc)
+            logger.warning("Patch flow error for %s (attempt %d): %s", sub_analysis_id, attempt + 1, exc)
+    logger.error("Patching failed for sub-analysis %s after %d attempts", sub_analysis_id, MAX_PATCH_RETRIES)
+    return None
+# ---------------------------------------------------------------------------
+# Merge patched sub-analyses back
+# ---------------------------------------------------------------------------
+def merge_patched_sub_analyses(
+    sub_analyses: dict[str, AnalysisInsights],
+    patched: dict[str, AnalysisInsights],
+) -> None:
+    """Merge patched sub-analyses back into the analysis structures. Mutates in place."""
+    for sub_id, patched_sub in patched.items():
+        if sub_id in sub_analyses:
+            sub_analyses[sub_id] = patched_sub
+            logger.info("Merged patched sub-analysis %s", sub_id)
+        else:
+            logger.warning("Patched sub-analysis %s not found in existing sub_analyses", sub_id)
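
The module above encodes arrays before asking for RFC 6902 patches and decodes them afterwards, but `ease_encode` and `ease_decode` themselves are not shown in this excerpt. The sketch below is a guess at the idea, based only on the `display_order` key that the encode and decode walkers skip: each list becomes a dict of keyed items plus an explicit ordering, so a patch path addresses an item by key rather than by position. The functions `encode_array`, `decode_array`, and `apply_replace`, and the `k0`, `k1` key scheme, are hypothetical. `apply_replace` is a simplified stand-in for the `jsonpatch` library the module actually uses and handles only a single `replace` op on dict paths.

```python
import copy
from typing import Any


def encode_array(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Replace a list with a dict of keyed items plus an explicit ordering."""
    keys = [f"k{i}" for i in range(len(items))]  # hypothetical key scheme
    encoded: dict[str, Any] = dict(zip(keys, items))
    encoded["display_order"] = keys
    return encoded


def decode_array(encoded: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the list by following display_order."""
    return [encoded[key] for key in encoded["display_order"]]


def apply_replace(doc: dict[str, Any], path: str, value: Any) -> dict[str, Any]:
    """Apply one RFC 6902 'replace' op; dict-only paths, no ~0/~1 escapes."""
    result = copy.deepcopy(doc)  # leave the input document untouched
    *parents, leaf = path.lstrip("/").split("/")
    target = result
    for token in parents:
        target = target[token]
    if leaf not in target:
        # RFC 6902 requires the target of 'replace' to exist.
        raise KeyError(f"replace target does not exist: {path}")
    target[leaf] = value
    return result


components = [
    {"name": "Parser", "description": "old text"},
    {"name": "Renderer", "description": "draws diagrams"},
]
encoded = {"components": encode_array(components)}
patched = apply_replace(encoded, "/components/k0/description", "new text")
decoded = decode_array(patched["components"])
print(decoded[0]["description"])
```

Under this assumption, a path such as `/components/k0/description` keeps pointing at the same component even if other components are added or removed, whereas an index path such as `/components/0/description` would shift.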
