PyPI - codeboarding - Versions diffs - 0.10.4__tar.gz → 0.12.0__tar.gz - Mend

codeboarding 0.10.4__tar.gz → 0.12.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (203)

{codeboarding-0.10.4/codeboarding.egg-info → codeboarding-0.12.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeboarding
-Version: 0.10.4
+Version: 0.12.0
 Summary: Interactive Diagrams for Code
 Author: CodeBoarding Team
 License-Expression: MIT
@@ -18,12 +18,12 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: docker>=7.1
 Requires-Dist: dotenv>=0.9
-Requires-Dist: duckdb>=1.3
 Requires-Dist: dulwich>=0.22
 Requires-Dist: fastapi>=0.115
 Requires-Dist: filelock>=3.12
 Requires-Dist: gitpython>=3.1
 Requires-Dist: google-api-core>=2.10
+Requires-Dist: jsonpatch>=1.33
 Requires-Dist: jsonschema>=4.25
 Requires-Dist: langchain>=1.2
 Requires-Dist: langchain-anthropic>=1.3
@@ -33,6 +33,7 @@ Requires-Dist: langchain-community>=0.4
 Requires-Dist: langchain-google-genai>=3.1
 Requires-Dist: langchain-ollama>=1.0
 Requires-Dist: langchain-openai>=1.1
+Requires-Dist: leidenalg>=0.10
 Requires-Dist: markdown>=3.8
 Requires-Dist: markdown-it-py>=3.0
 Requires-Dist: markitdown>=0.1
@@ -42,6 +43,15 @@ Requires-Dist: pathspec>=0.12
 Requires-Dist: pyyaml>=6.0
 Requires-Dist: regex>=2024.11
 Requires-Dist: rich>=12.6
+Requires-Dist: tree-sitter>=0.23
+Requires-Dist: tree-sitter-c-sharp>=0.23
+Requires-Dist: tree-sitter-go>=0.23
+Requires-Dist: tree-sitter-java>=0.23
+Requires-Dist: tree-sitter-javascript>=0.23
+Requires-Dist: tree-sitter-php>=0.23
+Requires-Dist: tree-sitter-python>=0.23
+Requires-Dist: tree-sitter-rust>=0.23
+Requires-Dist: tree-sitter-typescript>=0.23
 Requires-Dist: trustcall>=0.0.39
 Requires-Dist: uvicorn>=0.23
 Provides-Extra: dev
@@ -110,10 +120,10 @@ codeboarding-setup
 ```bash
 # Analyze a local repository (output goes to /path/to/repo/.codeboarding/)
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 # Analyze a remote GitHub repository (cloned to cwd/repo_name/, output to cwd/repo_name/.codeboarding/)
-codeboarding https://github.com/user/repo
+codeboarding full https://github.com/user/repo
 ```
 ### Python API
@@ -188,19 +198,21 @@ Shell environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) always
 ## CLI Reference
 ```
-codeboarding [REPO_URL ...]           # remote: clone + analyze
-codeboarding --local PATH             # local: analyze in-place
+codeboarding full [REPO_URL ...]           # remote: clone + analyze
+codeboarding full --local PATH             # local: analyze in-place
+codeboarding incremental --local PATH      # re-analyze only changed parts
+codeboarding partial --local PATH --component-id ID   # update one component
 ```
 | Option | Description |
 |---|---|
 | `--local PATH` | Analyze a local repository (output: `PATH/.codeboarding/`) |
 | `--depth-level INT` | Diagram depth (default: 1) |
-| `--incremental` | Smart incremental update (only re-analyze changed files) |
-| `--full` | Force full reanalysis, skip incremental detection |
-| `--partial-component-id ID` | Update a single component by its ID |
+| `--force` | (full only) Force full reanalysis, skip cached static analysis |
+| `--base-ref REF` / `--target-ref REF` | (incremental only) Git refs to diff |
+| `--component-id ID` | (partial only) ID of the component to update |
 | `--binary-location PATH` | Custom path to language server binaries (overrides `~/.codeboarding/servers/`) |
-| `--upload` | Upload results to GeneratedOnBoardings repo (remote only) |
+| `--upload` | (full, remote only) Upload results to GeneratedOnBoardings repo |
 | `--enable-monitoring` | Enable run monitoring |
 ---
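The 0.12.0 CLI moves the analysis mode out of flags (`--incremental`, `--full`, `--partial-component-id`) and into three subcommands. Below is a minimal sketch of that layout using `argparse` subparsers. CodeBoarding's real parser is not part of this diff, so `build_parser` is a hypothetical name. The sketch also assumes that `--local`, `--depth-level`, `--binary-location` and `--enable-monitoring` are shared by every subcommand, inferred only from the table not marking them as restricted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the 0.12.0 subcommand layout (not CodeBoarding's actual parser)."""
    # Options without a "(... only)" qualifier in the CLI reference; assumed shared by all modes.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--local", metavar="PATH", help="Analyze a local repository in place")
    common.add_argument("--depth-level", type=int, default=1, help="Diagram depth")
    common.add_argument("--binary-location", metavar="PATH", help="Custom path to language server binaries")
    common.add_argument("--enable-monitoring", action="store_true", help="Enable run monitoring")

    parser = argparse.ArgumentParser(prog="codeboarding")
    sub = parser.add_subparsers(dest="command", required=True)

    # full: remote URLs or --local; --force and --upload exist only here.
    full = sub.add_parser("full", parents=[common], help="Clone (if remote) and analyze")
    full.add_argument("repo_urls", nargs="*", metavar="REPO_URL")
    full.add_argument("--force", action="store_true", help="Skip cached static analysis")
    # The "remote only" restriction on --upload is not enforced in this sketch.
    full.add_argument("--upload", action="store_true", help="Upload results")

    # incremental: the git refs to diff are only meaningful here.
    inc = sub.add_parser("incremental", parents=[common], help="Re-analyze only changed parts")
    inc.add_argument("--base-ref", metavar="REF")
    inc.add_argument("--target-ref", metavar="REF")

    # partial: the component ID is mandatory for this mode.
    part = sub.add_parser("partial", parents=[common], help="Update one component")
    part.add_argument("--component-id", metavar="ID", required=True)

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["partial", "--local", "./my-project", "--component-id", "1.2"])
    print(args.command, args.component_id, args.depth_level)  # partial 1.2 1
```

Keeping `--force`, `--base-ref`/`--target-ref` and `--component-id` on their own subparsers means argparse rejects a combination such as `incremental --force` at parse time, rather than leaving the conflict to be detected later in the run.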

{codeboarding-0.10.4 → codeboarding-0.12.0}/PYPI.md RENAMED

@@ -50,10 +50,10 @@ codeboarding-setup
 ```bash
 # Analyze a local repository (output goes to /path/to/repo/.codeboarding/)
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 # Analyze a remote GitHub repository (cloned to cwd/repo_name/, output to cwd/repo_name/.codeboarding/)
-codeboarding https://github.com/user/repo
+codeboarding full https://github.com/user/repo
 ```
 ### Python API
@@ -128,19 +128,21 @@ Shell environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) always
 ## CLI Reference
 ```
-codeboarding [REPO_URL ...]           # remote: clone + analyze
-codeboarding --local PATH             # local: analyze in-place
+codeboarding full [REPO_URL ...]           # remote: clone + analyze
+codeboarding full --local PATH             # local: analyze in-place
+codeboarding incremental --local PATH      # re-analyze only changed parts
+codeboarding partial --local PATH --component-id ID   # update one component
 ```
 | Option | Description |
 |---|---|
 | `--local PATH` | Analyze a local repository (output: `PATH/.codeboarding/`) |
 | `--depth-level INT` | Diagram depth (default: 1) |
-| `--incremental` | Smart incremental update (only re-analyze changed files) |
-| `--full` | Force full reanalysis, skip incremental detection |
-| `--partial-component-id ID` | Update a single component by its ID |
+| `--force` | (full only) Force full reanalysis, skip cached static analysis |
+| `--base-ref REF` / `--target-ref REF` | (incremental only) Git refs to diff |
+| `--component-id ID` | (partial only) ID of the component to update |
 | `--binary-location PATH` | Custom path to language server binaries (overrides `~/.codeboarding/servers/`) |
-| `--upload` | Upload results to GeneratedOnBoardings repo (remote only) |
+| `--upload` | (full, remote only) Upload results to GeneratedOnBoardings repo |
 | `--enable-monitoring` | Enable run monitoring |
 ---

{codeboarding-0.10.4 → codeboarding-0.12.0}/README.md RENAMED

@@ -70,7 +70,7 @@ For a deeper architecture walkthrough, see [`.codeboarding/overview.md`](.codebo
 uv sync --frozen
 source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 python install.py
-python main.py --local /path/to/repo
+python main.py full --local /path/to/repo
 ```
 ### Use the packaged CLI
@@ -80,7 +80,7 @@ Requires **Python 3.12 or 3.13**. The recommended install method is [pipx](https
 ```bash
 pipx install codeboarding --python python3.12
 codeboarding-setup
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 ```
 Or, if you prefer pip, install into a virtual environment (not the global Python):
@@ -88,7 +88,7 @@ Or, if you prefer pip, install into a virtual environment (not the global Python
 ```bash
 pip install codeboarding
 codeboarding-setup
-codeboarding --local /path/to/repo
+codeboarding full --local /path/to/repo
 ```
 Output is written to `/path/to/repo/.codeboarding/`.
@@ -120,19 +120,19 @@ Shell environment variables such as `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOG
 ```bash
 # Analyze a local repository
-python main.py --local ./my-project
+python main.py full --local ./my-project
 # Increase diagram depth
-python main.py --local ./my-project --depth-level 2
+python main.py full --local ./my-project --depth-level 2
 # Re-analyze only changed parts when possible
-python main.py --local ./my-project --incremental
+python main.py incremental --local ./my-project
 # Update a single component by ID
-python main.py --local ./my-project --partial-component-id "1.2"
+python main.py partial --local ./my-project --component-id "1.2"
 # Analyze a remote GitHub repository
-python main.py https://github.com/pytorch/pytorch
+python main.py full https://github.com/pytorch/pytorch
 ```
 ## Where to use it
@@ -143,7 +143,7 @@ python main.py https://github.com/pytorch/pytorch
 ## Supported stack
-- Languages: Python, TypeScript, JavaScript, Java, Go, PHP, Rust.
+- Languages: Python, TypeScript, JavaScript, Java, Go, PHP, Rust, C#.
 - LLM providers: OpenAI, Anthropic, Google, Vercel AI Gateway, AWS Bedrock, Ollama, OpenRouter, and more.
 ## Examples
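Scripts that call the 0.10.4 CLI must move the mode flag into a subcommand, as the README examples in this hunk show. The helper below, `migrate_argv`, is hypothetical; the package does not ship it. It handles only the space-separated flag forms. Mapping `--full` to `full --force` is an assumption. The old flag skipped incremental detection, while the new one skips cached static analysis, so the two are close but not documented as equivalent.

```python
def migrate_argv(old: list[str]) -> list[str]:
    """Hypothetical helper: rewrite a 0.10.4-style argument list for the 0.12.0 CLI.

    Mapping, inferred from the README examples in this diff:
      (no mode flag)              -> "full"
      --incremental               -> "incremental"
      --partial-component-id ID   -> "partial" ... --component-id ID
      --full                      -> "full" ... --force   (assumption, see lead-in)
    Every other argument is passed through unchanged and in its original order.
    """
    command = "full"
    rest: list[str] = []   # arguments copied through as-is
    extra: list[str] = []  # arguments introduced by the mapping
    i = 0
    while i < len(old):
        arg = old[i]
        if arg == "--incremental":
            command = "incremental"
        elif arg == "--full":
            command = "full"
            extra.append("--force")
        elif arg == "--partial-component-id":
            if i + 1 >= len(old):
                raise ValueError("--partial-component-id requires an ID")
            command = "partial"
            extra += ["--component-id", old[i + 1]]
            i += 1  # skip the ID value consumed above
        else:
            rest.append(arg)
        i += 1
    return [command, *rest, *extra]


if __name__ == "__main__":
    print(migrate_argv(["--local", "./my-project", "--partial-component-id", "1.2"]))
    # ['partial', '--local', './my-project', '--component-id', '1.2']
```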

{codeboarding-0.10.4 → codeboarding-0.12.0}/agents/agent.py RENAMED

@@ -1,12 +1,11 @@
 import json
 import logging
-import time
 from pathlib import Path
 from google.api_core.exceptions import ResourceExhausted
 from langchain_core.exceptions import OutputParserException
 from langchain_core.language_models import BaseChatModel
-from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
+from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
 from langchain_core.output_parsers import PydanticOutputParser
 from langchain_core.prompts import PromptTemplate
 from langchain.agents import create_agent
@@ -15,11 +14,13 @@ from pydantic import ValidationError
 from trustcall import create_extractor
 from agents.prompts import get_validation_feedback_message
+from agents.retry import RetryAction, RetryDecision, default_backoff, with_retries
 from agents.tools.base import RepoContext
 from agents.tools.toolkit import CodeBoardingToolkit
 from agents.validation import ValidationResult, score_validation_results, VALIDATOR_WEIGHTS, DEFAULT_VALIDATOR_WEIGHT
 from monitoring.mixin import MonitoringMixin
 from repo_utils.ignore import RepoIgnoreManager
+from agents.agent_responses import LLMBaseModel
 from agents.llm_config import MONITORING_CALLBACK
 from static_analyzer.analysis_result import StaticAnalysisResults
 from static_analyzer.reference_resolve_mixin import ReferenceResolverMixin
@@ -43,10 +44,10 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         ReferenceResolverMixin.__init__(self, repo_dir, static_analysis)
         MonitoringMixin.__init__(self)
         self.parsing_llm = parsing_llm
+        self.agent_llm = agent_llm
         self.repo_dir = repo_dir
         self.ignore_manager = RepoIgnoreManager(repo_dir)
-        # Initialize the professional toolkit
         context = RepoContext(repo_dir=repo_dir, ignore_manager=self.ignore_manager, static_analysis=static_analysis)
         self.toolkit = CodeBoardingToolkit(context=context)
@@ -96,86 +97,69 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
     def _invoke(self, prompt, callbacks: list | None = None) -> str:
         """Unified agent invocation method with timeout and exponential backoff.
-        Uses exponential backoff based on total attempts, with different multipliers
-        for different error types. This ensures backoff increases appropriately even
-        when errors alternate between types.
+        Classification applied per exception:
+        - ``TimeoutError``: backoff ``min(10·2^n, 120)``, raise on exhaustion.
+        - ``ResourceExhausted``: backoff ``min(30·2^n, 300)``, raise on exhaustion.
+        - ``status_code == 404``: raise immediately (retired model ID, etc.).
+        - Other exceptions: backoff ``min(10·2^n, 120)``, return fallback string
+          on exhaustion (non-raising — callers treat the fallback as a failed run).
         """
-        max_retries = 5
-        for attempt in range(max_retries):
+        max_attempts = 5
+        # Counter captured by the closure so we can vary the per-attempt timeout
+        # without reaching into the retry helper.
+        attempt_counter = [0]
+        def call_once() -> str:
+            attempt = attempt_counter[0]
+            attempt_counter[0] += 1
             timeout_seconds = 300 if attempt == 0 else 600
-            try:
-                callback_list = callbacks or []
-                # Always append monitoring callback - logging config controls output
-                callback_list.append(MONITORING_CALLBACK)
-                callback_list.append(self.agent_monitoring_callback)
-                logger.info(
-                    f"Starting agent.invoke() [attempt {attempt + 1}/{max_retries}] with prompt length: {len(prompt)}, timeout: {timeout_seconds}s"
-                )
-                response = self._invoke_with_timeout(
-                    timeout_seconds=timeout_seconds, callback_list=callback_list, prompt=prompt
-                )
-                logger.info(
-                    f"Completed agent.invoke() - message count: {len(response['messages'])}, last message type: {type(response['messages'][-1])}"
+            callback_list = (callbacks or []) + [MONITORING_CALLBACK, self.agent_monitoring_callback]
+            logger.info(
+                f"Starting agent.invoke() [attempt {attempt + 1}/{max_attempts}] with prompt length: {len(prompt)}, timeout: {timeout_seconds}s"
+            )
+            response = self._invoke_with_timeout(
+                timeout_seconds=timeout_seconds, callback_list=callback_list, prompt=prompt
+            )
+            logger.info(
+                f"Completed agent.invoke() - message count: {len(response['messages'])}, last message type: {type(response['messages'][-1])}"
+            )
+            agent_response = response["messages"][-1]
+            assert isinstance(agent_response, AIMessage), f"Expected AIMessage, but got {type(agent_response)}"
+            if isinstance(agent_response.content, str):
+                return agent_response.content
+            if isinstance(agent_response.content, list):
+                return "".join(str(m) if not isinstance(m, str) else m for m in agent_response.content)
+            return ""  # unreachable for AIMessage but satisfies typing
+        def classify(exc: Exception, attempt: int) -> RetryDecision:
+            if getattr(exc, "status_code", None) == 404:
+                logger.error(f"Permanent HTTP 404 — not retrying: {type(exc).__name__}: {exc}")
+                return RetryDecision(action=RetryAction.GIVE_UP)
+            if isinstance(exc, ResourceExhausted):
+                return RetryDecision(
+                    action=RetryAction.RETRY,
+                    backoff_s=default_backoff(attempt, initial_s=30.0, multiplier=2.0, max_s=300.0),
                 )
+            # TimeoutError + generic Exception share the same backoff.
+            return RetryDecision(
+                action=RetryAction.RETRY,
+                backoff_s=default_backoff(attempt, initial_s=10.0, multiplier=2.0, max_s=120.0),
+            )
-                agent_response = response["messages"][-1]
-                assert isinstance(agent_response, AIMessage), f"Expected AIMessage, but got {type(agent_response)}"
-                if isinstance(agent_response.content, str):
-                    return agent_response.content
-                if isinstance(agent_response.content, list):
-                    return "".join(
-                        [
-                            str(message) if not isinstance(message, str) else message
-                            for message in agent_response.content
-                        ]
-                    )
-            except TimeoutError as e:
-                if attempt < max_retries - 1:
-                    # Exponential backoff: 10s * 2^attempt (10s, 20s, 40s, 80s)
-                    delay = min(10 * (2**attempt), 120)
-                    logger.warning(
-                        f"Agent invocation timed out after {timeout_seconds}s, retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                else:
-                    logger.error(f"Agent invocation timed out after {timeout_seconds}s on final attempt")
-                    raise
-            except ResourceExhausted as e:
-                if attempt < max_retries - 1:
-                    # Longer backoff for rate limits: 30s * 2^attempt (30s, 60s, 120s, 240s)
-                    delay = min(30 * (2**attempt), 300)
-                    logger.warning(
-                        f"ResourceExhausted (rate limit): {e}\n"
-                        f"Retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                else:
-                    logger.error(f"Max retries ({max_retries}) reached. ResourceExhausted: {e}")
-                    raise
-            except Exception as e:
-                # HTTP 404 (e.g. retired model ID) is permanent — retrying won't help.
-                if getattr(e, "status_code", None) == 404:
-                    logger.error(f"Permanent HTTP 404 — not retrying: {type(e).__name__}: {e}")
-                    raise
-                # Other errors (network, parsing, etc.) get standard exponential backoff
-                if attempt < max_retries - 1:
-                    delay = min(10 * (2**attempt), 120)
-                    logger.warning(
-                        f"Agent error: {type(e).__name__}: {e}, retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
-                    )
-                    time.sleep(delay)
-                # On final attempt, fall through to return error message below
-        logger.error("Max retries reached. Failed to get response from the agent.")
-        return "Could not get response from the agent."
+        def on_exhausted(exc: Exception) -> str:
+            # Typed exceptions surface the original error; only generic falls through
+            # to the historic fallback string that callers have long relied on.
+            if isinstance(exc, (TimeoutError, ResourceExhausted)):
+                raise exc
+            return "Could not get response from the agent."
+        return with_retries(
+            call_once,
+            max_attempts=max_attempts,
+            classify=classify,
+            on_exhausted=on_exhausted,
+            log_prefix="Agent invocation",
+        )
     def _invoke_with_timeout(self, timeout_seconds: int, callback_list: list, prompt: str):
         """Invoke agent with a timeout using threading."""
@@ -217,10 +201,10 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         except Empty:
             raise RuntimeError("Agent invocation completed but no result was returned")
-    def _parse_invoke(self, prompt: str, type: type):
+    def _parse_invoke(self, prompt: str, type: type, include_hidden: bool = False):
         response = self._invoke(prompt)
         assert isinstance(response, str), f"Expected a string as response type got {response}"
-        return self._parse_response(prompt, response, type)
+        return self._parse_response(prompt, response, type, include_hidden=include_hidden)
     def _score_result(self, result, validators: list, context) -> tuple[float, list[tuple[float, str]]]:
         """Run all validators on a result and return (score, prioritized_feedback).
@@ -250,7 +234,13 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         return score, weighted_feedback
     def _validation_invoke(
-        self, prompt: str, return_type: type, validators: list, context, max_validation_attempts: int = 1
+        self,
+        prompt: str,
+        return_type: type,
+        validators: list,
+        context,
+        max_validation_attempts: int = 1,
+        include_hidden: bool = False,
     ):
         """
         Invoke LLM with validation, feedback loop, and best-of-N selection.
@@ -278,7 +268,12 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         # Compute the maximum possible score so we can detect a perfect result
         max_possible_score = sum(VALIDATOR_WEIGHTS.get(v.__name__, DEFAULT_VALIDATOR_WEIGHT) for v in validators)
-        result = self._parse_invoke(prompt, return_type)
+        result = self._parse_invoke(prompt, return_type, include_hidden=include_hidden)
+        logger.info(
+            "[Validation] Parsed %s: %s",
+            return_type.__name__,
+            result.llm_str()[:500],
+        )
         # Track the best candidate across all attempts
         best_result = result
@@ -331,79 +326,74 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
                 f"[Validation] Preparing attempt {attempt + 1}/{max_validation_attempts} "
                 f"with {len(weighted_feedback)} feedback items"
             )
-            result = self._parse_invoke(feedback_prompt, return_type)
+            result = self._parse_invoke(feedback_prompt, return_type, include_hidden=include_hidden)
         return best_result
-    def _parse_response(self, prompt, response, return_type, max_retries=5, attempt=0):
-        if attempt >= max_retries:
-            logger.error(f"Max retries ({max_retries}) reached for parsing response: {response}")
-            raise Exception(f"Max retries reached for parsing response: {response}")
-        extractor = create_extractor(self.parsing_llm, tools=[return_type], tool_choice=return_type.__name__)
+    def _parse_response(self, prompt, response, return_type, max_retries=5, attempt=0, include_hidden: bool = False):
         if response is None or response.strip() == "":
             logger.error(f"Empty response for prompt: {prompt}")
-        try:
-            result = extractor.invoke(
-                return_type.extractor_str() + response,
-                config={"callbacks": [MONITORING_CALLBACK, self.agent_monitoring_callback]},
+        if include_hidden and issubclass(return_type, LLMBaseModel):
+            schema = return_type.model_json_schema(include_hidden=True)
+            parser = PydanticOutputParser(pydantic_object=return_type)
+            format_instructions = (
+                f"The output should be formatted as a JSON instance that conforms to the JSON schema below.\n"
+                f"Here is the output schema:\n```json\n{json.dumps(schema, indent=2)}\n```"
             )
-            if "responses" in result and len(result["responses"]) != 0:
-                return return_type.model_validate(result["responses"][0])
-            if "messages" in result and len(result["messages"]) != 0:
-                message = result["messages"][0].content
-                parser = PydanticOutputParser(pydantic_object=return_type)
-                if not message:
-                    raise EmptyExtractorMessageError("Extractor returned empty message content")
-                return self._try_parse(message, parser)
+        else:
             parser = PydanticOutputParser(pydantic_object=return_type)
-            return self._try_parse(response, parser)
-        except EmptyExtractorMessageError as e:
-            logger.warning(f"{e} (attempt {attempt + 1}/{max_retries})")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except AttributeError as e:
-            # Workaround for trustcall bug: https://github.com/hinthornw/trustcall/issues/47
-            # 'ExtractionState' object has no attribute 'tool_call_id' occurs during validation retry
-            if "tool_call_id" in str(e):
-                logger.warning(f"Trustcall bug encountered, falling back to Pydantic parser: {e}")
-                parser = PydanticOutputParser(pydantic_object=return_type)
-                return self._try_parse(response, parser)
-            raise
-        except IndexError as e:
-            # try to parse with the json parser if possible
-            logger.warning(f"IndexError while parsing response (attempt {attempt + 1}/{max_retries}): {e}")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except (json.JSONDecodeError, ValueError) as e:
-            logger.warning(f"Parse error (attempt {attempt + 1}/{max_retries}): {e}")
-            return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-        except ResourceExhausted as e:
-            # Parsing uses exponential backoff for rate limits
-            if attempt < max_retries - 1:
-                # Exponential backoff: 30s * 2^attempt, capped at 300s
-                delay = min(30 * (2**attempt), 300)
-                logger.warning(
-                    f"ResourceExhausted during parsing (rate limit): {e}\n"
-                    f"Retrying in {delay}s... (attempt {attempt + 1}/{max_retries})"
+            format_instructions = parser.get_format_instructions()
+        def call_once():
+            try:
+                result = self._structured_parse(response, parser, format_instructions=format_instructions)
+                logger.debug("[parse_response] structured_parse succeeded for %s", return_type.__name__)
+                return result
+            except Exception as e:
+                logger.warning("[parse_response] structured_parse failed for %s: %s", return_type.__name__, e)
+            return self._extractor_parse(response, return_type, parser, include_hidden=include_hidden)
+        def classify(exc: Exception, attempt: int) -> RetryDecision:
+            if isinstance(exc, ResourceExhausted):
+                return RetryDecision(
+                    action=RetryAction.RETRY,
+                    backoff_s=default_backoff(attempt, initial_s=30.0, multiplier=2.0, max_s=300.0),
                 )
-                time.sleep(delay)
-                return self._parse_response(prompt, response, return_type, max_retries, attempt + 1)
-            else:
-                logger.error(f"Resource exhausted on final parsing attempt: {e}")
-                raise
+            if isinstance(exc, (EmptyExtractorMessageError, IndexError, json.JSONDecodeError, ValueError)):
+                return RetryDecision(action=RetryAction.RETRY_NOW)
+            return RetryDecision(action=RetryAction.GIVE_UP)
+        def on_exhausted(exc: Exception):
+            if isinstance(exc, ResourceExhausted):
+                logger.error(f"Resource exhausted on final parsing attempt: {exc}")
+                raise exc
+            logger.error(f"Max retries ({max_retries}) reached for parsing response: {response}")
+            raise Exception(f"Max retries reached for parsing response: {response}")
-    def _try_parse(self, message_content, parser):
-        try:
-            prompt_template = """You are an JSON expert. Here you need to extract information in the following json format: {format_instructions}
+        return with_retries(
+            call_once,
+            max_attempts=max(1, max_retries - attempt),
+            classify=classify,
+            on_exhausted=on_exhausted,
+            log_prefix="Parse response",
+        )
-            Here is the content to parse and fix: {adjective}
+    def _structured_parse(self, message_content, parser, format_instructions: str | None = None):
+        if format_instructions is None:
+            format_instructions = parser.get_format_instructions()
+        prompt_template = """You are a JSON expert. Here you need to extract information in the following json format: {format_instructions}
-            Please provide only the JSON output without any additional text."""
-            prompt = PromptTemplate(
-                template=prompt_template,
-                input_variables=["adjective"],
-                partial_variables={"format_instructions": parser.get_format_instructions()},
-            )
-            chain = prompt | self.parsing_llm | parser
+        Here is the content to parse and fix: {adjective}
+        Please provide only the JSON output without any additional text."""
+        prompt = PromptTemplate(
+            template=prompt_template,
+            input_variables=["adjective"],
+            partial_variables={"format_instructions": format_instructions},
+        )
+        chain = prompt | self.parsing_llm | parser
+        try:
             return chain.invoke(
                 {"adjective": message_content},
                 config={"callbacks": [MONITORING_CALLBACK, self.agent_monitoring_callback]},
@@ -411,7 +401,28 @@ class CodeBoardingAgent(ReferenceResolverMixin, MonitoringMixin):
         except (ValidationError, OutputParserException):
             for _, v in json.loads(message_content).items():
                 try:
-                    return self._try_parse(json.dumps(v), parser)
+                    return self._structured_parse(json.dumps(v), parser)
                 except:
                     pass
         raise ValueError(f"Couldn't parse {message_content}")
+    def _extractor_parse(self, response, return_type, parser, include_hidden: bool = False):
+        extractor = create_extractor(self.parsing_llm, tools=[return_type], tool_choice=return_type.__name__)
+        try:
+            result = extractor.invoke(
+                return_type.extractor_str(include_hidden=include_hidden) + response,
+                config={"callbacks": [MONITORING_CALLBACK, self.agent_monitoring_callback]},
+            )
+        except AttributeError as e:
+            if "tool_call_id" in str(e):
+                logger.warning(f"Trustcall bug encountered: {e}")
+                raise
+            raise
+        if "responses" in result and len(result["responses"]) != 0:
+            return return_type.model_validate(result["responses"][0])
+        if "messages" in result and len(result["messages"]) != 0:
+            message = result["messages"][0].content
+            if not message:
+                raise EmptyExtractorMessageError("Extractor returned empty message content")
+            return self._structured_parse(message, parser)
+        raise EmptyExtractorMessageError("Extractor returned no responses and no messages")
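The new `agents/retry.py` module is not among the hunks shown here, so the exact behavior of `with_retries`, `RetryDecision`, `RetryAction` and `default_backoff` is not visible. The sketch below is a hypothetical reimplementation that is only consistent with how `_invoke` and `_parse_response` call them. `classify` runs before the exhaustion check, so `GIVE_UP` (the 404 case) re-raises at once. `RETRY` sleeps for `backoff_s`, `RETRY_NOW` retries without sleeping, and the final failure goes to `on_exhausted`, whose return value becomes the result. `default_backoff` computes `min(initial_s * multiplier**attempt, max_s)`, matching the formulas in the new `_invoke` docstring. The `sleep` parameter is an addition for testability.

```python
import enum
import logging
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class RetryAction(enum.Enum):
    RETRY = "retry"          # sleep for backoff_s, then try again
    RETRY_NOW = "retry_now"  # try again immediately
    GIVE_UP = "give_up"      # re-raise the exception, no further attempts


@dataclass(frozen=True)
class RetryDecision:
    action: RetryAction
    backoff_s: float = 0.0


def default_backoff(attempt: int, initial_s: float, multiplier: float, max_s: float) -> float:
    """Capped exponential backoff for a zero-based attempt index."""
    return min(initial_s * multiplier**attempt, max_s)


def with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int,
    classify: Callable[[Exception, int], RetryDecision],
    on_exhausted: Callable[[Exception], T],
    log_prefix: str = "",
    sleep: Callable[[float], None] = time.sleep,  # assumption: injectable for tests
) -> T:
    """Hypothetical sketch: call fn until it succeeds, classify says GIVE_UP, or attempts run out."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            decision = classify(exc, attempt)
            if decision.action is RetryAction.GIVE_UP:
                raise  # permanent failure: surface the original exception
            if attempt == max_attempts - 1:
                return on_exhausted(exc)  # may return a fallback or re-raise
            logger.warning("%s failed (%s), attempt %d/%d", log_prefix, exc, attempt + 1, max_attempts)
            if decision.action is RetryAction.RETRY:
                sleep(decision.backoff_s)
    # Only reachable when the loop body never ran.
    raise ValueError("max_attempts must be >= 1")
```

With `max_attempts=5`, a generic error produces sleeps of 10, 20, 40 and 80 s before the fifth and final attempt, and `ResourceExhausted` produces 30, 60, 120 and 240 s; these match the schedules in the comments the refactor removed. Because `call_once` takes no arguments, `_invoke` tracks the attempt number in a one-element list (`attempt_counter`). That lets it use a 300 s timeout on the first attempt and 600 s afterwards; a `nonlocal` integer would serve equally well.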

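When the parsing chain rejects its input, `_structured_parse` falls back to decoding the content as JSON and retrying on each top-level value. This recovers cases where the model wrapped the expected object in an extra key, and since each retry goes through `_structured_parse` again, it can descend more than one level. Below is a standalone sketch of that fallback. A plain `validate` callable stands in for the LangChain chain and `PydanticOutputParser`, and `parse_with_unwrap`, `validate` and `max_depth` are hypothetical names. Unlike the diffed code, it does not send anything back to an LLM, it bounds the recursion depth, and it catches only `ValueError` instead of using a bare `except:`.

```python
import json
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def parse_with_unwrap(text: str, validate: Callable[[Any], T], max_depth: int = 3) -> T:
    """Hypothetical sketch: validate decoded JSON, else retry on each value of a JSON object.

    `validate` must raise ValueError when the data does not have the expected shape.
    """
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    try:
        return validate(data)
    except ValueError:
        if max_depth <= 0 or not isinstance(data, dict):
            raise
        for value in data.values():
            try:
                # Re-serialize, mirroring the original's json.dumps(v) before recursing.
                return parse_with_unwrap(json.dumps(value), validate, max_depth - 1)
            except ValueError:
                continue
        raise ValueError(f"Couldn't parse {text}")


def validate_component(data: Any) -> str:
    """Example validator: accept only an object with a string "name" field."""
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("expected an object with a string 'name'")
    return data["name"]


if __name__ == "__main__":
    print(parse_with_unwrap('{"result": {"name": "Parser"}}', validate_component))  # Parser
```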