PyPI - synth-ai - Versions diffs - 0.2.16__py3-none-any.whl → 0.2.17__py3-none-any.whl - Mend

synth-ai 0.2.16 (py3-none-any.whl) → 0.2.17 (py3-none-any.whl)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of synth-ai might be problematic.

Files changed (192)

examples/analyze_semantic_words.sh +2 -2
examples/blog_posts/pokemon_vl/README.md +98 -0
examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +25 -0
examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +42 -0
examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
examples/blog_posts/warming_up_to_rl/README.md +158 -0
examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +41 -0
examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
examples/multi_step/configs/crafter_rl_outcome.toml +1 -1
examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +65 -107
examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -1
examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -1
examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
examples/multi_step/configs/verilog_rl_lora.toml +80 -123
examples/qwen_coder/configs/coder_lora_30b.toml +1 -3
examples/qwen_coder/configs/coder_lora_4b.toml +4 -1
examples/qwen_coder/configs/coder_lora_small.toml +1 -3
examples/qwen_vl/README.md +10 -12
examples/qwen_vl/SETUP_COMPLETE.md +7 -8
examples/qwen_vl/VISION_TESTS_COMPLETE.md +2 -3
examples/qwen_vl/collect_data_via_cli.md +76 -84
examples/qwen_vl/collect_vision_traces.py +4 -4
examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +40 -57
examples/qwen_vl/configs/crafter_vlm_sft_example.toml +1 -2
examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +20 -37
examples/qwen_vl/configs/eval_gpt5nano_vision.toml +21 -40
examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
examples/qwen_vl/configs/{filter_qwen2vl_sft.toml → filter_qwen3vl_sft.toml} +4 -5
examples/qwen_vl/configs/filter_vision_sft.toml +2 -3
examples/qwen_vl/crafter_qwen_vl_agent.py +5 -5
examples/qwen_vl/run_vision_comparison.sh +6 -7
examples/rl/README.md +5 -5
examples/rl/configs/rl_from_base_qwen.toml +26 -1
examples/rl/configs/rl_from_base_qwen17.toml +5 -2
examples/rl/task_app/README.md +1 -2
examples/rl/task_app/math_single_step.py +2 -2
examples/run_crafter_demo.sh +2 -2
examples/sft/README.md +1 -1
examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -1
examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -1
examples/swe/task_app/README.md +32 -2
examples/swe/task_app/grpo_swe_mini.py +4 -0
examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
examples/swe/task_app/hosted/envs/mini_swe/environment.py +37 -10
examples/swe/task_app/hosted/inference/openai_client.py +4 -4
examples/swe/task_app/morph_backend.py +178 -0
examples/task_apps/crafter/task_app/README.md +1 -1
examples/task_apps/crafter/task_app/grpo_crafter.py +66 -3
examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +4 -26
examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +17 -49
examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +13 -5
examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +15 -1
examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
examples/task_apps/math/README.md +1 -2
examples/task_apps/pokemon_red/README.md +3 -4
examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
examples/task_apps/pokemon_red/task_app.py +36 -5
examples/task_apps/sokoban/README.md +2 -3
examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
examples/vlm/configs/crafter_vlm_gpt4o.toml +4 -1
examples/warming_up_to_rl/configs/crafter_fft.toml +4 -1
examples/warming_up_to_rl/configs/crafter_fft_4b.toml +0 -2
examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -2
examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
examples/warming_up_to_rl/task_app/README.md +1 -1
examples/warming_up_to_rl/task_app/grpo_crafter.py +134 -3
examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +1 -1
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +3 -27
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -1
examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +4 -4
examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +6 -3
examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +5 -0
synth_ai/api/train/builders.py +9 -3
synth_ai/api/train/cli.py +125 -10
synth_ai/api/train/configs/__init__.py +8 -1
synth_ai/api/train/configs/rl.py +32 -7
synth_ai/api/train/configs/sft.py +6 -2
synth_ai/api/train/configs/shared.py +59 -2
synth_ai/auth/credentials.py +119 -0
synth_ai/cli/__init__.py +12 -4
synth_ai/cli/commands/__init__.py +17 -0
synth_ai/cli/commands/demo/__init__.py +6 -0
synth_ai/cli/commands/demo/core.py +163 -0
synth_ai/cli/commands/deploy/__init__.py +23 -0
synth_ai/cli/commands/deploy/core.py +614 -0
synth_ai/cli/commands/deploy/errors.py +72 -0
synth_ai/cli/commands/deploy/validation.py +11 -0
synth_ai/cli/commands/eval/__init__.py +19 -0
synth_ai/cli/commands/eval/core.py +1109 -0
synth_ai/cli/commands/eval/errors.py +81 -0
synth_ai/cli/commands/eval/validation.py +133 -0
synth_ai/cli/commands/filter/__init__.py +12 -0
synth_ai/cli/commands/filter/core.py +388 -0
synth_ai/cli/commands/filter/errors.py +55 -0
synth_ai/cli/commands/filter/validation.py +77 -0
synth_ai/cli/commands/help/__init__.py +177 -0
synth_ai/cli/commands/help/core.py +73 -0
synth_ai/cli/commands/status/__init__.py +64 -0
synth_ai/cli/commands/status/client.py +192 -0
synth_ai/cli/commands/status/config.py +92 -0
synth_ai/cli/commands/status/errors.py +20 -0
synth_ai/cli/commands/status/formatters.py +164 -0
synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
synth_ai/cli/commands/status/subcommands/files.py +79 -0
synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
synth_ai/cli/commands/status/subcommands/models.py +79 -0
synth_ai/cli/commands/status/subcommands/runs.py +81 -0
synth_ai/cli/commands/status/subcommands/summary.py +47 -0
synth_ai/cli/commands/status/utils.py +114 -0
synth_ai/cli/commands/train/__init__.py +53 -0
synth_ai/cli/commands/train/core.py +21 -0
synth_ai/cli/commands/train/errors.py +117 -0
synth_ai/cli/commands/train/judge_schemas.py +199 -0
synth_ai/cli/commands/train/judge_validation.py +304 -0
synth_ai/cli/commands/train/validation.py +443 -0
synth_ai/cli/demo.py +2 -162
synth_ai/cli/deploy/__init__.py +28 -0
synth_ai/cli/deploy/core.py +5 -0
synth_ai/cli/deploy/errors.py +23 -0
synth_ai/cli/deploy/validation.py +5 -0
synth_ai/cli/eval/__init__.py +36 -0
synth_ai/cli/eval/core.py +5 -0
synth_ai/cli/eval/errors.py +31 -0
synth_ai/cli/eval/validation.py +5 -0
synth_ai/cli/filter/__init__.py +28 -0
synth_ai/cli/filter/core.py +5 -0
synth_ai/cli/filter/errors.py +23 -0
synth_ai/cli/filter/validation.py +5 -0
synth_ai/cli/modal_serve/__init__.py +12 -0
synth_ai/cli/modal_serve/core.py +14 -0
synth_ai/cli/modal_serve/errors.py +8 -0
synth_ai/cli/modal_serve/validation.py +11 -0
synth_ai/cli/serve/__init__.py +12 -0
synth_ai/cli/serve/core.py +14 -0
synth_ai/cli/serve/errors.py +8 -0
synth_ai/cli/serve/validation.py +11 -0
synth_ai/cli/setup.py +20 -265
synth_ai/cli/status.py +7 -126
synth_ai/cli/task_app_deploy.py +1 -10
synth_ai/cli/task_app_modal_serve.py +4 -9
synth_ai/cli/task_app_serve.py +4 -11
synth_ai/cli/task_apps.py +58 -1487
synth_ai/cli/train/__init__.py +12 -0
synth_ai/cli/train/core.py +21 -0
synth_ai/cli/train/errors.py +8 -0
synth_ai/cli/train/validation.py +24 -0
synth_ai/cli/train.py +1 -14
synth_ai/demos/crafter/grpo_crafter_task_app.py +1 -1
synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
synth_ai/environments/examples/red/engine.py +33 -12
synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
synth_ai/environments/examples/red/environment.py +26 -0
synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
synth_ai/http.py +12 -0
synth_ai/judge_schemas.py +10 -11
synth_ai/learning/rl/client.py +3 -1
synth_ai/streaming/__init__.py +29 -0
synth_ai/streaming/config.py +94 -0
synth_ai/streaming/handlers.py +469 -0
synth_ai/streaming/streamer.py +301 -0
synth_ai/streaming/types.py +95 -0
synth_ai/task/validators.py +2 -2
synth_ai/tracing_v3/migration_helper.py +1 -2
synth_ai/utils/env.py +25 -18
synth_ai/utils/http.py +4 -1
synth_ai/utils/modal.py +2 -2
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/METADATA +8 -3
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/RECORD +184 -109
examples/qwen_vl/configs/eval_qwen2vl_vision.toml +0 -44
synth_ai/cli/tui.py +0 -62
synth_ai/tui/__init__.py +0 -5
synth_ai/tui/__main__.py +0 -13
synth_ai/tui/cli/__init__.py +0 -1
synth_ai/tui/cli/query_experiments.py +0 -164
synth_ai/tui/cli/query_experiments_v3.py +0 -164
synth_ai/tui/dashboard.py +0 -911
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/WHEEL +0 -0
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/entry_points.txt +0 -0
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/licenses/LICENSE +0 -0
{synth_ai-0.2.16.dist-info → synth_ai-0.2.17.dist-info}/top_level.txt +0 -0

examples/swe/task_app/morph_backend.py ADDED

@@ -0,0 +1,178 @@
+"""Utility classes for running swe-mini environments on Morph Cloud."""
+from __future__ import annotations
+import contextlib
+import os
+import shlex
+import time
+from dataclasses import dataclass, field
+from typing import Any, Dict
+_IMPORT_ERROR: Exception | None = None
+try:  # pragma: no cover - optional dependency
+    from morphcloud.api import MorphCloudClient
+except Exception as exc:  # pragma: no cover - optional dependency
+    MorphCloudClient = None  # type: ignore[assignment]
+    _IMPORT_ERROR = exc
+def _quote_env_var(key: str, value: str) -> str:
+    """Return a safe shell export statement."""
+    return f"export {key}={shlex.quote(value)}"
+def _now() -> float:
+    return time.time()
+@dataclass
+class MorphSandboxBackend:
+    """Thin wrapper around Morph Cloud instances for command execution.
+    The API mirrors the subset consumed by :class:`MiniSweEnvironmentWrapper`:
+    we expose an ``execute`` method that matches the mini-swe environment shape.
+    """
+    snapshot_id: str | None = None
+    image_id: str | None = None
+    cwd: str = "/workspace"
+    env: Dict[str, str] | None = None
+    metadata: Dict[str, str] | None = None
+    vcpus: int = 4
+    memory_mb: int = 8192
+    disk_mb: int = 65536
+    startup_timeout: int = 600
+    _client: MorphCloudClient = field(init=False)
+    _instance: Any = field(init=False, default=None)
+    _last_exec: Dict[str, Any] = field(init=False, default_factory=dict)
+    _started_at: float | None = field(init=False, default=None)
+    def __post_init__(self) -> None:
+        if MorphCloudClient is None:  # pragma: no cover - optional dependency
+            raise RuntimeError(
+                "morphcloud package is required for Morph environments. "
+                "Install with `pip install morphcloud`."
+            ) from _IMPORT_ERROR
+        api_key = os.getenv("MORPH_API_KEY", "")
+        if not api_key:
+            raise RuntimeError("Set MORPH_API_KEY before using the Morph backend.")
+        # Normalise metadata/env early to avoid shared references.
+        self.metadata = {str(k): str(v) for k, v in (self.metadata or {}).items()}
+        self.env = {str(k): str(v) for k, v in (self.env or {}).items()}
+        self.cwd = self.cwd or "/workspace"
+        self._client = MorphCloudClient()
+    # Public API -----------------------------------------------------------------
+    def execute(self, command: str, timeout: int | None = None) -> Dict[str, Any]:
+        """Execute ``command`` inside the Morph instance."""
+        if not command.strip():
+            command = "true"
+        instance = self._ensure_instance()
+        script_parts = []
+        for key, value in self.env.items():
+            script_parts.append(_quote_env_var(key, value))
+        if self.cwd:
+            script_parts.append(f"cd {shlex.quote(self.cwd)}")
+        script_parts.append(command)
+        script = " && ".join(script_parts)
+        if timeout:
+            wrapped = f"timeout {int(timeout)}s bash -lc {shlex.quote(script)}"
+        else:
+            wrapped = script
+        shell_cmd = f"bash -lc {shlex.quote(wrapped)}"
+        started = _now()
+        result = instance.exec(shell_cmd)
+        duration = _now() - started
+        payload = {
+            "output": (result.stdout or ""),
+            "stderr": (result.stderr or ""),
+            "returncode": getattr(result, "exit_code", None),
+            "duration": duration,
+        }
+        self._last_exec = payload
+        return payload
+    def close(self) -> None:
+        """Stops the Morph instance if one is running."""
+        instance = getattr(self, "_instance", None)
+        if not instance:
+            return
+        try:
+            instance.stop()
+        except Exception:  # pragma: no cover - best-effort shutdown
+            pass
+        finally:
+            self._instance = None
+    # Internal helpers -----------------------------------------------------------
+    def _ensure_instance(self):
+        instance = getattr(self, "_instance", None)
+        if instance is not None:
+            return instance
+        snapshot_id = (
+            self.snapshot_id
+            or os.getenv("SWE_MINI_MORPH_SNAPSHOT_ID")
+            or os.getenv("MORPH_SNAPSHOT_ID")
+        )
+        metadata = dict(self.metadata)
+        if snapshot_id:
+            instance = self._client.instances.start(snapshot_id=snapshot_id, metadata=metadata or None)
+        else:
+            image_id = (
+                self.image_id
+                or os.getenv("SWE_MINI_MORPH_IMAGE_ID")
+                or os.getenv("MORPH_IMAGE_ID")
+                or "morphvm-minimal"
+            )
+            snapshot = self._client.snapshots.create(
+                image_id=image_id,
+                vcpus=self.vcpus,
+                memory=self.memory_mb,
+                disk_size=self.disk_mb,
+            )
+            instance = self._client.instances.start(snapshot_id=snapshot.id, metadata=metadata or None)
+            self.snapshot_id = snapshot.id
+        self._instance = instance
+        self._started_at = _now()
+        self._wait_until_ready(instance)
+        self._ensure_cwd(instance)
+        return instance
+    def _wait_until_ready(self, instance) -> None:
+        deadline = _now() + float(self.startup_timeout)
+        while True:
+            try:
+                instance.wait_until_ready()
+                break
+            except Exception as exc:  # pragma: no cover - SDK may raise while polling
+                if _now() > deadline:
+                    raise TimeoutError(f"Morph instance did not become ready within {self.startup_timeout}s") from exc
+                time.sleep(5.0)
+    def _ensure_cwd(self, instance) -> None:
+        if not self.cwd:
+            return
+        try:
+            instance.exec(f"bash -lc {shlex.quote(f'mkdir -p {self.cwd}')}")
+        except Exception as exc:  # pragma: no cover - surface friendly error
+            raise RuntimeError(f"Failed to create remote workspace {self.cwd!r}: {exc}") from exc
+    def __del__(self) -> None:  # pragma: no cover - defensive cleanup
+        with contextlib.suppress(Exception):
+            self.close()
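
Reviewer note: the quoting in `execute` above is easier to follow in isolation. The sketch below rebuilds only the string composition (env exports, `cd`, optional `timeout`, outer `bash -lc`) as a standalone function. The name `build_shell_command` and its parameters are hypothetical and do not appear in the package. Nothing here contacts Morph Cloud.

```python
import shlex


def build_shell_command(command, env=None, cwd="/workspace", timeout=None):
    """Compose the string handed to the remote shell, mirroring ``execute``.

    Hypothetical helper for illustration; not part of synth-ai.
    """
    if not command.strip():
        command = "true"  # empty commands become a no-op, as in the diff
    # One quoted export per environment variable.
    parts = [f"export {key}={shlex.quote(value)}" for key, value in (env or {}).items()]
    if cwd:
        parts.append(f"cd {shlex.quote(cwd)}")
    parts.append(command)
    script = " && ".join(parts)
    if timeout:
        # With a timeout the script is quoted once for the inner bash ...
        wrapped = f"timeout {int(timeout)}s bash -lc {shlex.quote(script)}"
    else:
        wrapped = script
    # ... and the result is quoted again for the outer bash.
    return f"bash -lc {shlex.quote(wrapped)}"


print(build_shell_command("ls -la", env={"FOO": "a b"}, timeout=30))
```

With a timeout set, the command runs under two nested `bash -lc` shells, which is why the script is quoted twice. `shlex.split` can be used to peel each layer back off when debugging.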

examples/task_apps/crafter/task_app/README.md CHANGED

@@ -6,7 +6,7 @@ underlying FastAPI plumbing.
 ## Local development
 ```bash
-uvx synth-ai serve grpo-crafter --port 8001
+uvx synth-ai deploy --runtime uvicorn grpo-crafter --port 8001
 # Optional extras:
 #   --env-file path/to/.env    # load additional environment variables
 #   --reload                   # enable uvicorn auto-reload

examples/task_apps/crafter/task_app/grpo_crafter.py CHANGED

@@ -9,9 +9,13 @@ import sys
 from collections.abc import Iterable, Sequence
 from contextlib import suppress
 from dataclasses import dataclass
+from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
+from fastapi import HTTPException
+from pydantic import BaseModel
 from synth_ai.task.apps import ModalDeploymentConfig, TaskAppEntry, register_task_app
 from synth_ai.task.contracts import RolloutMetrics, RolloutMode, RolloutRequest, RolloutResponse, TaskInfo
 from synth_ai.task.datasets import TaskDatasetRegistry, TaskDatasetSpec
@@ -657,6 +661,14 @@ def _resolve_trace_correlation_id(policy_cfg: dict[str, Any], mode: Any = None)
 async def rollout_executor(request: RolloutRequest, fastapi_request) -> RolloutResponse:
     request = _coerce_math_to_crafter(request)
+    record_cfg = request.record.model_copy(
+        update={
+            "return_trace": True,
+            "trace_format": "structured",
+        }
+    )
+    request = request.model_copy(update={"record": record_cfg})
     policy_cfg = dict(request.policy.config or {})
     logger.info(
         "ROLLOUT_EXEC: incoming policy config keys=%s inference_url=%s run_id=%s mode=%s",
@@ -800,11 +812,38 @@ async def rollout_executor(request: RolloutRequest, fastapi_request) -> RolloutR
         trace_correlation_id,
     )
     data = legacy_response.model_dump()
+    logger.debug(
+        "ROLLOUT_EXEC: legacy response keys=%s has_trace=%s",
+        sorted(data.keys()),
+        bool(data.get("trace")),
+    )
     metrics = data.get("metrics", {}) or {}
     metrics.setdefault("outcome_score", None)
     metrics.setdefault("events_score", None)
     metrics.setdefault("details", {})
     data["metrics"] = metrics
+    if data.get("trace") is None:
+        legacy_trace = getattr(legacy_response, "trace", None)
+        if legacy_trace is not None:
+            data["trace"] = legacy_trace
+        else:
+            tracer_factory = getattr(fastapi_request.app.state, "session_tracer_factory", None)
+            if callable(tracer_factory):
+                tracer = tracer_factory()
+                logger.debug("ROLLOUT_EXEC: trace backfill factory=%s", type(tracer))
+                if isinstance(tracer, SessionTracer):
+                    try:
+                        await tracer.initialize()
+                        if tracer.db is not None:
+                            trace_row = await tracer.db.get_session_trace(request.run_id)
+                            if trace_row is not None:
+                                data["trace"] = trace_row
+                    except Exception as exc:
+                        logger.warning("TRACE_BACKFILL_FAIL: %s", exc)
+                    finally:
+                        with suppress(Exception):
+                            await tracer.close()
     # Add trace_correlation_id at TOP-LEVEL (REQUIRED for RL training pipeline)
     # Use fallback if somehow missing
@@ -820,12 +859,30 @@ async def rollout_executor(request: RolloutRequest, fastapi_request) -> RolloutR
     if isinstance(policy_cfg.get("inference_url"), str) and policy_cfg["inference_url"]:
         existing_meta.setdefault("inference_url", policy_cfg["inference_url"])
     data["pipeline_metadata"] = existing_meta
     # Add trace_correlation_id to each trajectory (required for RL training pipeline)
     if "trajectories" in data:
+        normalized_trajs: list[dict[str, Any]] = []
         for traj in data.get("trajectories", []):
-            if isinstance(traj, dict):
-                traj["trace_correlation_id"] = final_cid
+            if isinstance(traj, BaseModel):
+                traj_dict = traj.model_dump()
+            elif isinstance(traj, dict):
+                traj_dict = dict(traj)
+            else:
+                continue
+            traj_dict["trace_correlation_id"] = final_cid
+            if not traj_dict.get("inference_url"):
+                inferred_url = policy_cfg.get("inference_url")
+                if inferred_url:
+                    traj_dict["inference_url"] = inferred_url
+            normalized_trajs.append(traj_dict)
+        if normalized_trajs:
+            data["trajectories"] = normalized_trajs
+            logger.info(
+                "ROLLOUT_EXEC: normalized trajectory sample run_id=%s inference_url=%s",
+                request.run_id,
+                normalized_trajs[0].get("inference_url") if normalized_trajs else None,
+            )
     logger.info(
         "ROLLOUT_EXEC: final pipeline metadata run_id=%s metadata=%s",
         request.run_id,
@@ -844,6 +901,12 @@ async def rollout_executor(request: RolloutRequest, fastapi_request) -> RolloutR
             request.run_id,
             existing_meta,
         )
+    if data.get("trace") is None:
+        raise HTTPException(
+            status_code=500,
+            detail="trace_payload_missing: task app did not emit a SessionTrace",
+        )
     # ASSERTION: Verify trace_correlation_id is present in response at all required levels
     assert "trace_correlation_id" in data, (
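
Reviewer note: the trajectory hunk above replaces in-place mutation of dict trajectories with a rebuilt list of plain dicts. A minimal sketch of that normalization follows, under some assumptions: `normalize_trajectories` and `FakeTrajectory` are hypothetical names, and a `model_dump` attribute check stands in for the diff's `isinstance(traj, BaseModel)` test so the example runs without pydantic.

```python
from typing import Any


class FakeTrajectory:
    """Hypothetical stand-in for a pydantic model; exposes only model_dump()."""

    def model_dump(self) -> dict[str, Any]:
        return {"steps": [1], "inference_url": "http://kept"}


def normalize_trajectories(trajectories, correlation_id, inference_url=None):
    """Return plain-dict trajectories stamped with the correlation id."""
    normalized: list[dict[str, Any]] = []
    for traj in trajectories:
        if hasattr(traj, "model_dump"):  # assumption: duck-typed BaseModel check
            traj_dict = traj.model_dump()
        elif isinstance(traj, dict):
            traj_dict = dict(traj)  # shallow copy; the caller's dict is left alone
        else:
            continue  # other shapes are skipped, as in the diff
        traj_dict["trace_correlation_id"] = correlation_id
        # Backfill the URL only when the trajectory does not already carry one.
        if not traj_dict.get("inference_url") and inference_url:
            traj_dict["inference_url"] = inference_url
        normalized.append(traj_dict)
    return normalized


result = normalize_trajectories([{"steps": []}, FakeTrajectory(), 42], "cid-1", "http://x")
```

One behavioural difference from 0.2.16 is visible here: entries that are neither models nor dicts are dropped from the output instead of being passed through untouched.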

examples/task_apps/crafter/task_app/grpo_crafter_task_app.py CHANGED

@@ -3,7 +3,7 @@
 This module now delegates to the TaskAppConfig defined in the colocated example at
 `examples/task_apps/crafter/task_app/grpo_crafter.py`. It is kept for legacy usage
 (running the file directly or targeting `fastapi_app` from external tooling). Prefer using
-`uvx synth-ai serve grpo-crafter` for local development and testing.
+`uvx synth-ai deploy --runtime uvicorn grpo-crafter` for local development and testing.
 """
 from __future__ import annotations

examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py CHANGED

@@ -197,6 +197,8 @@ class CrafterPolicy(Policy):
         if self.use_tools:
             payload["tools"] = TOOLS_SCHEMA
             payload["tool_choice"] = "required"
+            payload["function_call"] = {"name": "interact_many"}
+            payload["parallel_tool_calls"] = False
             # Ensure the inference server injects family-specific stop sequences
             # to terminate immediately after the first tool call for compliance.
             payload["stop_after_tool_calls"] = 1
@@ -207,13 +209,7 @@ class CrafterPolicy(Policy):
         response: dict[str, Any],
         use_tools: bool = True,
     ) -> list[dict[str, Any]]:
-        """Turn an inference response into environment tool calls.
-        - If tools were used, expect tool_calls-compatible output and forward as-is
-          in our simple JSON format: {"tool_name": str, "arguments": {...}}.
-        - If no tools, parse plain-text actions using CrafterReActAgent parser and
-          wrap them into a single interact_many tool call.
-        """
+        """Turn an inference response into environment tool calls."""
         # First check if we got actual tool calls
         choices = response.get("choices", [])
         tool_calls: list[dict[str, Any]] = []
@@ -272,24 +268,6 @@ class CrafterPolicy(Policy):
                     normalized.append(tc)
             return normalized
-        # Otherwise, parse plain text content for actions
-        text = ""
-        for choice in choices:
-            msg = choice.get("message", {})
-            content = msg.get("content", "")
-            if content:
-                text = content
-                break
-        if text:
-            # Try to parse actions from the text
-            from .shared import parse_actions
-            actions = parse_actions(text)
-            if actions:
-                # Wrap actions in interact_many tool call
-                return [{"tool_name": "interact_many", "arguments": {"actions": actions}}]
         # No actions found
         return []
@@ -542,7 +520,7 @@ class CrafterPolicy(Policy):
             "claude-3",         # All Claude 3 models support vision
             "gemini",           # Gemini models
             "qwen-vl",          # Qwen Vision-Language models
-            "qwen2-vl",         # Qwen2 VL
+            "qwen3-vl",         # Qwen3 VL
             "pixtral",          # Mistral's vision model
             "llava",            # LLaVA models
             "phi-3-vision",     # Microsoft Phi-3 Vision

examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py CHANGED

@@ -45,8 +45,7 @@ class CrafterReActAgent:
             "Action policy:\n"
             "- Always return a single tool call: interact_many({actions: [...]})\n"
             "- Use 2–5 actions per call; prefer long movement sequences to explore.\n"
-            "- Mix in 'do' only when it makes sense (tree, stone, animal, enemy nearby).\n"
-            "- Do not spam the same exact sequence twice in a row—explore in varied directions.\n\n"
+            "- Mix in 'do' only when it makes sense (tree, stone, animal, enemy nearby).\n\n"
             "Available actions: noop, move_up, move_down, move_left, move_right, do (interact), sleep, "
             "place_stone, place_table, place_furnace, place_plant, make_wood_pickaxe, make_stone_pickaxe, "
             "make_iron_pickaxe, make_wood_sword, make_stone_sword, make_iron_sword\n"

examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py CHANGED

@@ -50,20 +50,19 @@ class OpenAIClient:
         # Make a copy to avoid modifying the original
         fixed_request = request.copy()
-        # Determine if target is OpenAI-compatible (OpenAI, Azure OpenAI, Groq);
-        # strip fields those endpoints don't accept
+        # Determine if target is OpenAI-compatible (OpenAI, Azure OpenAI).
+        # Groq shares the API surface but we keep tool enforcement fields intact.
         is_openai = False
+        is_groq = False
         try:
             if isinstance(target_url, str):
                 low = target_url.lower()
-                is_openai = (
-                    ("openai.com" in low)
-                    or ("azure" in low and ".openai." in low)
-                    or ("groq.com" in low)
-                    or ("/openai" in low)
-                    or ("/proxy/groq" in low)
-                    or ("/proxy/openai" in low)
-                )
+                if "groq.com" in low or "/proxy/groq" in low:
+                    is_groq = True
+                elif ("openai.com" in low) or ("azure" in low and ".openai." in low) or (
+                    "/proxy/openai" in low
+                ):
+                    is_openai = True
         except Exception:
             is_openai = False
@@ -259,13 +258,13 @@ class OpenAIClient:
                                 content_len = len(str(content)) if content else 0
                                 logger.debug(f"🔊 [OPENAI_CLIENT] Message[{idx}] role={role}, content_type={type(content).__name__}, len={content_len}")
-        # Final hard-guard for OpenAI: ensure unsupported field is not present
+        # Final hard-guard for OpenAI/Groq: drop unsupported field
         try:
-            if "openai" in url.lower() and "stop_after_tool_calls" in processed_request:
+            low_url = url.lower()
+            if ("openai" in low_url or "groq.com" in low_url or "/proxy/groq" in low_url) and "stop_after_tool_calls" in processed_request:
                 processed_request.pop("stop_after_tool_calls", None)
-                logger.info("Removed stop_after_tool_calls for OpenAI request")
+                logger.info("Removed stop_after_tool_calls for %s request", "Groq/OpenAI")
             # Groq-specific requirement: when using JSON mode, one of the messages must contain the word 'json'
-            low_url = url.lower()
             if ("groq.com" in low_url or "/openai" in low_url) and isinstance(
                 processed_request, dict
             ):
@@ -546,47 +545,16 @@ class OpenAIClient:
                                     error_block.get("code") or error_block.get("type") or ""
                                 ).lower()
                             if error_code in {"tool_use_failed", "tool_call_failed"}:
-                                logger.warning(
+                                logger.error(
                                     {
                                         "tool_use_failed": True,
                                         "target": (base_url or self.base_url),
                                         "message": error_block.get("message") if isinstance(error_block, dict) else None,
                                     }
                                 )
-                                fallback_actions = ["move_right", "move_up", "do"]
-                                fallback_response = {
-                                    "id": f"fallback-{int(time.time() * 1000)}",
-                                    "object": "chat.completion",
-                                    "created": int(time.time()),
-                                    "model": processed_request.get("model"),
-                                    "choices": [
-                                        {
-                                            "index": 0,
-                                            "message": {
-                                                "role": "assistant",
-                                                "content": "",
-                                                "tool_calls": [
-                                                    {
-                                                        "id": f"call_fallback_{int(time.time() * 1000)}",
-                                                        "type": "function",
-                                                        "function": {
-                                                            "name": "interact_many",
-                                                            "arguments": json.dumps(
-                                                                {"actions": fallback_actions}
-                                                            ),
-                                                        },
-                                                    }
-                                                ],
-                                            },
-                                            "finish_reason": "tool_calls",
-                                        }
-                                    ],
-                                }
-                                if isinstance(response_data.get("usage"), dict):
-                                    fallback_response["usage"] = response_data["usage"]
-                                if isinstance(error_block, dict):
-                                    fallback_response["error"] = error_block
-                                return fallback_response
+                                raise RuntimeError(
+                                    f"Inference 400 response (tool call failed): {error_block.get('message') if isinstance(error_block, dict) else 'Tool call failed'}"
+                                ) from e
                             # This is a different type of 400 error, don't retry
                             try:
                                 redacted_headers = {}
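
Note on the hunk above: the synthesized `interact_many` fallback response is removed, and a failed tool call on a 400 now raises instead. A minimal sketch of that pattern follows, assuming a hypothetical helper name (`raise_tool_call_failure`) that does not appear in the package. It only mirrors the message format and the `from e` chaining visible in the added lines.

```python
def raise_tool_call_failure(error_block: object, cause: Exception) -> None:
    """Hypothetical helper (not in synth-ai): surface a 400 tool-call failure."""
    # Same conditional as the added lines: use the provider's message when the
    # error block is a dict, otherwise fall back to a fixed string.
    if isinstance(error_block, dict):
        detail = error_block.get("message")
    else:
        detail = "Tool call failed"
    # `from cause` keeps the original exception reachable via __cause__.
    raise RuntimeError(
        f"Inference 400 response (tool call failed): {detail}"
    ) from cause
```

Chaining with `from` preserves the underlying HTTP error in the traceback, so callers see both the summary and the original failure.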

examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py CHANGED

@@ -462,6 +462,8 @@ async def step_policy(
                 )
             # Emit full system/user prompts for observability (no secrets included)
+            system_prompt_records: list[dict[str, Any]] = []
+            user_prompt_records: list[dict[str, Any]] = []
             try:
                 def _as_text(content: object) -> str:
@@ -481,8 +483,6 @@ async def step_policy(
                         return "".join(parts)
                     return str(content)
-                system_prompt_records: list[dict[str, Any]] = []
-                user_prompt_records: list[dict[str, Any]] = []
                 for message in msgs:
                     role = message.get("role")
                     raw_content = message.get("content")
@@ -525,6 +525,11 @@ async def step_policy(
             if tracing_context is not None:
                 try:
+                    logger.info(
+                        "[TRACE_DEBUG] record_policy_prompts sys=%s user=%s",
+                        len(system_prompt_records),
+                        len(user_prompt_records),
+                    )
                     await tracing_context.record_policy_prompts(
                         system_prompt_records, user_prompt_records
                     )
@@ -780,9 +785,10 @@ async def step_policy(
                 "sokoban-react",
                 "crafter-react",
             ) and getattr(policy, "use_tools", True):
-                req_tools = meta["inference_request"]["tools"]
-                req_tool_choice = meta["inference_request"]["tool_choice"]
-                req_stop_after = meta["inference_request"]["stop_after_tool_calls"]
+                inf_req = meta.get("inference_request", {})
+                req_tools = inf_req.get("tools")
+                req_tool_choice = inf_req.get("tool_choice")
+                req_stop_after = inf_req.get("stop_after_tool_calls")
                 logger.info(
                     f"TOOLCALL_CONFIG: policy={policy_name} tools_present={bool(req_tools)} tool_choice={req_tool_choice} stop_after={req_stop_after}"
                 )
@@ -791,6 +797,8 @@ async def step_policy(
                         status_code=500,
                         detail=f"TOOLCALL_ASSERTION_FAIL: Missing tools or tool_choice!=required for policy {policy_name}",
                     )
+                if req_stop_after is None:
+                    inf_req["stop_after_tool_calls"] = 1
             # Call inference service with retries for Flash cold-start (503)
             import time as _t
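
Note on the tool-call hunks above: direct indexing of `meta["inference_request"]` is replaced by `.get(...)` lookups, and a missing `stop_after_tool_calls` is defaulted to 1. The sketch below reproduces that pattern in isolation. The function name `read_tool_config` and the returned dict shape are assumptions made for illustration, not part of the package.

```python
from typing import Any


def read_tool_config(meta: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: tolerant lookups plus a default, as in the hunk."""
    # .get with a {} fallback avoids a KeyError when the request is absent.
    inf_req = meta.get("inference_request", {})
    # Default the stop-after count only when it was not provided.
    if inf_req.get("stop_after_tool_calls") is None:
        inf_req["stop_after_tool_calls"] = 1
    return {
        "tools": inf_req.get("tools"),
        "tool_choice": inf_req.get("tool_choice"),
        "stop_after": inf_req["stop_after_tool_calls"],
    }
```

One property of this pattern worth keeping in mind: when `inference_request` is missing, the `{}` fallback is a fresh dict that is not attached to `meta`, so the default written onto it is not visible through `meta` afterwards. When the key is present, the default is written into the existing request dict.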

examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py CHANGED

@@ -491,6 +491,11 @@ class RolloutTracingContext:
             getattr(request.record, "trace_format", "compact") or "compact"
         ).lower()
         self.return_trace = bool(getattr(request.record, "return_trace", False))
+        logger.warning(
+            "[TRACE_DEBUG] RolloutTracingContext init: trace_format=%s return_trace=%s",
+            self.trace_format,
+            self.return_trace,
+        )
         self.sft_output_dir = getattr(fastapi_request.app.state, "sft_output_dir", None)
         self.session_trace = None
         self.metadata_updates: dict[str, Any] = {}
@@ -590,7 +595,7 @@ class RolloutTracingContext:
         # Debug: Check message count
         if self.tracer and self.tracer._current_trace:
             msg_count = len(self.tracer._current_trace.markov_blanket_message_history)
-            logger.info(f"[TRACE_DEBUG] After record_policy_prompts: {msg_count} messages in trace")
+            logger.warning("[TRACE_DEBUG] After record_policy_prompts: %s messages", msg_count)
     def _content_to_text(self, content: Any) -> str:
         if isinstance(content, str):
@@ -669,6 +674,11 @@ class RolloutTracingContext:
                     message_type="assistant",  # Map to standard assistant message type
                     metadata={**self._message_metadata(), "is_tool_call": True},
                 )
+                if self.tracer._current_trace:
+                    logger.warning(
+                        "[TRACE_DEBUG] After tool invocation: messages=%s",
+                        len(self.tracer._current_trace.markov_blanket_message_history),
+                    )
             except Exception as exc:
                 logger.debug("TRACING_TOOL_MSG_FAIL: %s", exc)
@@ -991,6 +1001,10 @@ class RolloutTracingContext:
         if self.trace_format in ("full", "structured"):
             payload = session_trace.to_dict()
             payload.setdefault("metadata", {}).update(self.metadata_updates)
+            logger.warning(
+                "[TRACE_DEBUG] build_trace_payload returning structured trace with messages=%s",
+                len(payload.get("markov_blanket_message_history") or []),
+            )
             return payload
         # For "compact" format, return only summary stats
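
Note on the `[TRACE_DEBUG]` hunks above: the new log calls use `logger.warning` with `%s` placeholders, and one existing `logger.info` f-string is converted to the same form. The sketch below shows the practical difference under one assumed configuration, a logger whose level is `WARNING`. The logger name and the `ListHandler` class are hypothetical, and the task app's real logging level is not shown in this diff.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() applies the %s arguments only when a record is emitted.
        self.messages.append(record.getMessage())


logger = logging.getLogger("trace_debug_demo")
logger.setLevel(logging.WARNING)  # assumed level, for illustration only
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

msg_count = 3
# Filtered out at WARNING level, so the arguments are never formatted.
logger.info("[TRACE_DEBUG] After record_policy_prompts: %s messages", msg_count)
# Emitted at WARNING level.
logger.warning("[TRACE_DEBUG] After record_policy_prompts: %s messages", msg_count)
```

Passing arguments separately defers string formatting until a handler actually emits the record, whereas an f-string is always built before the call.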

examples/task_apps/enron/task_app/grpo_enron_task_app.py CHANGED

@@ -2,7 +2,7 @@
 This mirrors the structure of the Crafter task app wrapper while delegating
 all configuration to the colocated `grpo_enron.py` module. Normal usage should
-prefer invoking `uvx synth-ai serve grpo-enron`, but this module remains for
+prefer invoking `uvx synth-ai deploy --runtime uvicorn grpo-enron`, but this module remains for
 direct execution or importing the FastAPI app object.
 """

examples/task_apps/math/README.md CHANGED

@@ -3,7 +3,7 @@
 This directory hosts the legacy entrypoint for the math single-step task app. Prefer starting the app via:
 ```bash
-uvx synth-ai serve math-single-step --env-file examples/rl/.env --port 8101
+uvx synth-ai deploy --runtime uvicorn math-single-step --env-file examples/rl/.env --port 8101
 ```
 If you need to run it directly (e.g., for Modal `modal deploy` compatibility), use:
@@ -19,4 +19,3 @@ Environment variables:
 - `MATH_DATASET_DEFAULT_SPLIT`, `MATH_DATASET_VALIDATION_SPLIT`, `MATH_DATASET_TEST_SPLIT`
 The task app enforces a single `math_submit` tool call per episode, enabling RL to reward correct final answers and penalise missing or malformed submissions.

examples/task_apps/pokemon_red/README.md CHANGED

@@ -17,7 +17,7 @@ A reinforcement learning environment for Pokémon Red using PyBoy emulation with
 ```bash
 # From synth-ai root
-uv run -m synth_ai task-app serve pokemon_red --port 8913
+uv run -m synth_ai task-app deploy --runtime uvicorn pokemon_red --port 8913
 ```
 ### 2. Run a Random Rollout
@@ -232,7 +232,7 @@ uv add pyboy
 lsof -ti :8913 | xargs -r kill -9
 # Or use a different port
-uv run -m synth_ai task-app serve pokemon_red --port 8914
+uv run -m synth_ai task-app deploy --runtime uvicorn pokemon_red --port 8914
 ```
 ## Examples
@@ -249,7 +249,7 @@ cd /Users/joshpurtell/Documents/GitHub/synth-ai
 echo "OPENAI_API_KEY=sk-..." >> .env
 # 2. Start the task app server (in background)
-nohup sh -c 'printf "n\n" | uv run -m synth_ai task-app serve pokemon_red --port 8913 --no-reload' > nohup_pokemon.log 2>&1 &
+nohup sh -c 'printf "n\n" | uv run -m synth_ai task-app deploy --runtime uvicorn pokemon_red --port 8913 --no-reload' > nohup_pokemon.log 2>&1 &
 # Wait for startup
 sleep 8
@@ -354,4 +354,3 @@ TOTAL REWARD: 705 points
 - **PyBoy**: Game Boy emulator - https://github.com/Baekalfen/PyBoy
 - **Pokémon Red Disassembly**: RAM map reference - https://github.com/pret/pokered
 - **Datacrystal.org**: Memory address documentation

examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml CHANGED

@@ -1,11 +1,12 @@
-# Evaluation config for Pokemon Red with image-only input
+# Evaluation config for Pokemon Red with image-only input and NEW REWARD SYSTEM
 # This config uses GPT-4o mini with only image data (no text observations)
+# Uses the comprehensive reward system with deterministic progress milestones
 [eval]
 app_id = "pokemon_red"
 model = "gpt-4o-mini-2024-07-18"
-seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
-max_turns = 10
+seeds = [0, 1, 2, 3, 4]  # Test with fewer seeds for quick results
+max_turns = 20  # Allow more turns to see progress
 concurrency = 1  # Keep low initially to avoid issues
 env_name = "pokemon_red"
 policy_name = "pokemon_red_policy"
@@ -13,7 +14,7 @@ trace_format = "full"
 return_trace = true
 [eval.env_config]
-max_steps_per_episode = 10
+max_steps_per_episode = 20
 [eval.policy_config]
 provider = "openai"
@@ -24,6 +25,6 @@ top_p = 0.95
 max_tokens = 512
 use_vision = true
 image_only_mode = true
-max_llm_calls = 10
+max_llm_calls = 20
