PyPI - synth-ai - Versions diffs - 0.2.12__py3-none-any.whl → 0.2.13.dev2__py3-none-any.whl - Mend

synth-ai 0.2.12__py3-none-any.whl → 0.2.13.dev2__py3-none-any.whl


Potentially problematic release.

This version of synth-ai might be problematic.

Files changed (229)

examples/multi_step/sse_metrics_streaming_notes.md ADDED

@@ -0,0 +1,357 @@
+# SSE metrics/event streaming design (RL + FT)
+## Goals
+- Near real-time push of job status, metrics, and logs during RL training, evaluation, and fine-tuning (FT)
+- Single streaming endpoint per job, resumable (Last-Event-ID), low overhead, widely compatible (HTTP/1.1)
+- Minimal client friction (CLI + Python helper), production-ready (auth, backpressure, rate limit)
+## Non-goals
+- Binary/frame multiplexing (use WebSocket if needed later)
+- Arbitrary high-frequency payloads (we will coalesce/limit ~2–4 Hz for metrics)
+---
+## Endpoint
+- Method: GET `/rl/jobs/{job_id}/stream`
+- Headers:
+  - Auth: `X-API-Key: <env key>` or `Authorization: Bearer <token>`
+  - Cache: `Cache-Control: no-cache`
+- Response:
+  - Content-Type: `text/event-stream; charset=utf-8`
+  - Transfer-Encoding: `chunked`
+  - Connection: `keep-alive`
+- Query params (optional):
+  - `since_id`: int; resume from a specific event id (inclusive)
+  - `types`: comma list `metric,status,log,artifact` (default: all)
+  - `heartbeat`: seconds between heartbeats (default 20)
+  - `split`: `train|eval` (filters metrics only)
+## Event framing (SSE)
+- Fields per message:
+  - `id: <int>` monotonically increasing per job
+  - `event: <status|metric|log|artifact|heartbeat>`
+  - `data: <JSON>` single-line JSON (compact)
+- Heartbeats: comment lines `: keep-alive` at configured interval
+- Flush: after each event write + heartbeat
+- Backpressure: if producer > consumer, coalesce metrics, keep status/logs, never buffer unbounded
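A minimal sketch (not part of the package) of the framing rules above. The function names `format_sse` and `format_heartbeat` are hypothetical. Each frame ends with a blank line, which is how the SSE wire format delimits events, and `json.dumps` with compact separators keeps the payload on a single line.

```python
import json


def format_sse(event_id: int, event: str, data: dict) -> str:
    """Serialize one event as an SSE frame: id, event, single-line JSON data."""
    # Compact separators; json.dumps escapes newlines, so the payload stays on one line.
    payload = json.dumps(data, separators=(",", ":"))
    # The trailing blank line terminates the event.
    return f"id: {event_id}\nevent: {event}\ndata: {payload}\n\n"


def format_heartbeat() -> str:
    """Comment line (leading colon); clients ignore it, but the connection stays active."""
    return ": keep-alive\n\n"


if __name__ == "__main__":
    print(format_sse(101, "status", {"state": "running", "step": 820}), end="")
    print(format_heartbeat(), end="")
```

A server would write each returned string to the response and flush immediately, matching the "Flush" rule above.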
+## Payload schemas
+- `status`
+  - `{ "state": "queued|running|succeeded|failed|canceled", "step": 123, "epoch": 3, "phase": "train|eval|ft", "message": "...", "ts": 173.12 }`
+- `metric`
+  - `{ "name": "avg_reward|loss|accuracy|success_rate|return", "value": 0.123, "step": 123, "epoch": 3, "split": "train|eval", "window": 100, "mean": 0.42, "std": 0.08, "ts": 173.12 }`
+  - Optional extras: `{ "tags": {"env": "crafter", "policy": "react"} }`
+- `log`
+  - `{ "level": "INFO|WARN|ERROR", "message": "...", "ts": 173.12 }`
+- `artifact`
+  - `{ "kind": "checkpoint|trace|plot|jsonl", "url": "/rl/jobs/{id}/artifacts/ckpt_0003.pt", "step": 123, "ts": 173.12 }`
+- `heartbeat`
+  - `{ "alive": true, "ts": 173.12 }`
+### Example stream (illustrative)
+```
+id: 101
+event: status
+data: {"state":"running","phase":"train","step":820,"epoch":4,"ts":173.12}
+id: 102
+event: metric
+data: {"name":"avg_reward","value":0.62,"step":820,"epoch":4,"split":"train","ts":173.13}
+id: 103
+event: metric
+data: {"name":"loss","value":1.84,"step":820,"epoch":4,"split":"train","window":100,"mean":1.90,"std":0.15,"ts":173.13}
+id: 104
+event: log
+data: {"level":"INFO","message":"checkpoint saved","ts":173.16}
+id: 105
+event: artifact
+data: {"kind":"checkpoint","url":"/rl/jobs/j_abc/artifacts/ckpt_0004.pt","step":820,"ts":173.16}
+: keep-alive
+```
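A client-side counterpart, sketched under the assumption that frames arrive as text lines and that events are separated by blank lines as the SSE format requires. `parse_sse` is a hypothetical name, and the parser handles only the three fields this design uses.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'id', 'event', 'data'} dicts from SSE text lines."""
    event: dict = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. the ": keep-alive" heartbeat
        if line == "":
            # A blank line dispatches the event collected so far.
            if "data" in event:
                yield event
            event = {}
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "id":
            event["id"] = int(value)
        elif field == "event":
            event["event"] = value
        elif field == "data":
            event["data"] = json.loads(value)  # single-line JSON per this design


if __name__ == "__main__":
    sample = [
        "id: 101", "event: status", 'data: {"state":"running"}', "",
        ": keep-alive", "",
        "id: 102", "event: metric", 'data: {"name":"loss","value":1.84}', "",
    ]
    for ev in parse_sse(sample):
        print(ev["id"], ev["event"], ev["data"])
```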
+---
+## Server architecture
+### Components
+- Event bus per `job_id` (async queue) where producers (RL, FT, evaluator) `emit(Event)`
+- Ring buffer per job for replay (configurable: last N events OR last T minutes)
+- SSE handler:
+  1) Authenticate, pick job, determine resume cursor (`since_id` or `Last-Event-ID`)
+  2) Replay from ring buffer >= cursor
+  3) Attach to live queue; stream new events
+  4) Emit heartbeats; close after terminal `status` + grace
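The replay step can be sketched with a bounded deque. This is an illustration only, not part of the package: `JobEventBuffer`, `Event`, and the `missed` flag are hypothetical names, and the inclusive `since_id` follows the query-parameter description above.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: int
    type: str
    data: dict


class JobEventBuffer:
    """Per-job ring buffer holding the last `max_events` events."""

    def __init__(self, max_events: int = 2000) -> None:
        self._events: deque[Event] = deque(maxlen=max_events)
        self._next_id = 1  # single writer assigns monotonically increasing ids

    def emit(self, type_: str, data: dict) -> Event:
        ev = Event(self._next_id, type_, data)
        self._next_id += 1
        self._events.append(ev)  # oldest entry is evicted once the buffer is full
        return ev

    def replay(self, since_id: int) -> tuple[list[Event], bool]:
        """Return events with id >= since_id, plus whether older ones were evicted."""
        oldest = self._events[0].id if self._events else self._next_id
        missed = max(since_id, 1) < oldest
        return [ev for ev in self._events if ev.id >= since_id], missed


if __name__ == "__main__":
    buf = JobEventBuffer(max_events=3)
    for step in range(5):
        buf.emit("metric", {"step": step})
    events, missed = buf.replay(since_id=1)
    print([ev.id for ev in events], missed)
```

When `missed` is true, a handler could send the summary `status` snapshot described under error handling before switching to live events.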
+### Concurrency & ordering
+- Single writer increments `event_id`
+- Replay preserves original order; live continues from last id
+- If consumer slow: drop/coalesce metrics (preserve last per metric name), always deliver status/log/artifact
+### Rate limiting & coalescing
+- Default target 2–4 Hz for metrics per split
+- Coalesce by metric name within a small interval (e.g., 250–500 ms)
+- Status events limited to phase changes or every 5–10s
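One way the coalescing rule could look, as a sketch rather than the package's implementation. `MetricCoalescer` is a hypothetical name, and keying by `(name, split)` is an assumption drawn from the "per split" target above. The caller passes the clock value, which keeps the example deterministic.

```python
from __future__ import annotations


class MetricCoalescer:
    """Keep only the latest value per (name, split) and release at most once per interval."""

    def __init__(self, interval_s: float = 0.25) -> None:
        self.interval_s = interval_s
        self._pending: dict[tuple, dict] = {}
        self._last_flush = float("-inf")  # so the first metric is released immediately

    def offer(self, metric: dict, now: float) -> list[dict]:
        """Record a metric; return the batch to send if the interval has elapsed."""
        key = (metric["name"], metric.get("split"))
        self._pending[key] = metric  # a newer value overwrites the older one
        if now - self._last_flush < self.interval_s:
            return []
        batch = list(self._pending.values())
        self._pending.clear()
        self._last_flush = now
        return batch


if __name__ == "__main__":
    c = MetricCoalescer(interval_s=0.25)
    print(c.offer({"name": "loss", "value": 1.0}, now=0.0))
    print(c.offer({"name": "loss", "value": 0.9}, now=0.1))
    print(c.offer({"name": "loss", "value": 0.8}, now=0.2))
    print(c.offer({"name": "avg_reward", "value": 0.5}, now=0.3))
```

This sketch only releases pending metrics when a new one arrives; a real server would also need a timer so the last values are not held indefinitely. Status, log, and artifact events would bypass the coalescer entirely.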
+### Auth & security
+- Accept `X-API-Key` or `Authorization: Bearer`
+- Validate job ownership/visibility
+- CORS: allow EventSource; set `Access-Control-Allow-Origin` appropriately
+- Timeouts: server idle timeout > heartbeat * 2; client reconnect on drop
+### Config knobs (env)
+- `SSE_HEARTBEAT_SECS` (default 20)
+- `SSE_RING_BUFFER_EVENTS` (e.g., 2000) OR `SSE_RING_BUFFER_WINDOW_SECS` (e.g., 600)
+- `SSE_MAX_METRIC_HZ` (e.g., 4)
+- `SSE_MAX_CLIENTS_PER_JOB` (protect from fan-out)
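A sketch of loading these knobs, assuming plain environment variables. `SSEConfig` and `load_sse_config` are hypothetical names; only the heartbeat default of 20 is stated in the notes, while 2000 and 4 are the "e.g." values used here as assumed defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SSEConfig:
    heartbeat_secs: float
    ring_buffer_events: int
    max_metric_hz: float

    @property
    def min_idle_timeout_secs(self) -> float:
        # The notes ask for a server idle timeout greater than heartbeat * 2.
        return self.heartbeat_secs * 2


def load_sse_config(env: Optional[Mapping[str, str]] = None) -> SSEConfig:
    """Read the SSE knobs from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return SSEConfig(
        heartbeat_secs=float(env.get("SSE_HEARTBEAT_SECS", "20")),
        ring_buffer_events=int(env.get("SSE_RING_BUFFER_EVENTS", "2000")),  # assumed default
        max_metric_hz=float(env.get("SSE_MAX_METRIC_HZ", "4")),  # assumed default
    )


if __name__ == "__main__":
    print(load_sse_config({"SSE_HEARTBEAT_SECS": "5"}))
```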
+---
+## Emit points
+### RL training/eval
+- On train step end: `metric` avg_reward/return/success_rate; `status` every N steps
+- On eval step end: `metric` eval_return/success_rate; `artifact` eval JSONL optional
+- On checkpoint: `artifact` + `log`
+- On phase transitions: `status` (train→eval, etc.)
+### Fine-tuning (FT)
+- On optimizer step: `metric` loss (and optional lr)
+- On validation: `metric` val_loss/accuracy; optional `artifact` (curves)
+- On checkpoint: `artifact` + `log`
+---
+## Client (synth-ai)
+### CLI
+- `synth-ai jobs stream <job_id> [--jsonl out.jsonl] [--types metric,status]`
+- Prints compact lines: `t=18:22:40 step=820 avg_reward=0.62 loss=1.84`
+- Writes raw events to JSONL if specified
+### Python helper
+```python
+from synth_ai.client import TaskAppClient
+with TaskAppClient(base_url, api_key) as c:
+    for ev in c.stream_job(job_id, types=["metric","status"], since_id=None):
+        handle(ev)
+```
+- Handles reconnect with `Last-Event-ID`
+- Dedupes by event id; optional local aggregation windows
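The reconnect and dedupe behavior can be sketched as a small cursor. `ResumeCursor` is a hypothetical helper, not the package's API, and it relies on the per-job monotonically increasing `id` described under event framing.

```python
from typing import Iterable, Iterator, Optional


class ResumeCursor:
    """Track the last delivered event id; drop replayed duplicates after a reconnect."""

    def __init__(self, last_id: Optional[int] = None) -> None:
        self.last_id = last_id

    def headers(self) -> dict:
        """Headers to send on (re)connect so the server can resume the stream."""
        return {} if self.last_id is None else {"Last-Event-ID": str(self.last_id)}

    def filter_new(self, events: Iterable[dict]) -> Iterator[dict]:
        for ev in events:
            if self.last_id is not None and ev["id"] <= self.last_id:
                continue  # already delivered before the connection dropped
            self.last_id = ev["id"]
            yield ev


if __name__ == "__main__":
    cursor = ResumeCursor()
    print(list(cursor.filter_new([{"id": 101}, {"id": 102}])))
    # After a reconnect the server replays 102 again; only 103 is new.
    print(list(cursor.filter_new([{"id": 102}, {"id": 103}])))
    print(cursor.headers())
```

Because `since_id` is inclusive in this design, a resume from the last seen id re-delivers that event, which is why the dedupe step matters.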
+---
+## Error handling & edge cases
+- 404 (job not found): immediate error, no stream
+- 401/403: immediate error, no stream
+- Producer stalls: heartbeats continue; client shows "connected: no new data"
+- Ring buffer miss (client too far behind): send a summary `status` snapshot, then live only
+- Buffer overflow: set `dropped_events: true` field in a `status` update
+---
+## Testing plan
+- Unit: SSE formatting, ring buffer indexing, resume semantics, coalescing
+- Integration: run toy RL + FT loops; assert ordered sequence and types
+- Load: simulate 100 Hz metric emissions; verify rate limiting and non-blocking behavior
+- Auth: invalid key, missing key, cross-job access rejection
+---
+## Rollout plan
+1) Implement event bus + ring buffer per job
+2) Add SSE route with auth, resume, heartbeat
+3) Emitters in RL/FT loops (train/eval steps, loss/reward, checkpoints)
+4) synth-ai client: CLI + Python helper; optional JSONL sink
+5) Docs and example snippets (curl, Python, browser EventSource)
+---
+## Open questions / decisions
+- Do we persist events beyond ring buffer? (optional artifact JSONL)
+- Multi-tenant limits per org/user? (SSE_MAX_CLIENTS_PER_ORG?)
+- Artifact URLs: signed vs. authenticated direct
+- Backward compatibility with current polling endpoints
+I'll scan the backend to find where RL jobs are created, status is served, and whether there's any existing streaming/event infrastructure to extend. Then I'll identify exact files/functions to change for adding an SSE endpoint and metric emitters.
+Here’s the exact backend surface you’ll need to touch to add a single SSE stream for status + metrics, with where to hook metric emitters.
+### Where jobs and events live
+- Job and event tables (legacy RL path):
+```1:20:monorepo/backend/app/db/models/jobs.py
+"""Job orchestration models (RL/SFT shared).
+"""
+```
+```31:58:monorepo/backend/app/db/models/jobs.py
+class Job(Base):
+    __tablename__ = "jobs"
+    job_id = Column(String, primary_key=True)
+    ...
+    status = Column(String, nullable=False, default="QUEUED")
+    ...
+    last_seq = Column(BigInteger, nullable=True, default=0)
+    events = relationship("JobEvent", back_populates="job", cascade="all, delete-orphan", passive_deletes=True)
+```
+```66:88:monorepo/backend/app/db/models/jobs.py
+class JobEvent(Base):
+    """Append-only job event log."""
+    __tablename__ = "job_events"
+    job_id = Column(String, ForeignKey("jobs.job_id", ondelete="CASCADE"), primary_key=True)
+    seq = Column(BigInteger, primary_key=True)
+    ts = Column(DateTime(timezone=True), server_default=func.now())
+    type = Column(String, nullable=False)
+    level = Column(String, nullable=False, default="info")
+    message = Column(Text, nullable=False)
+    data = Column(JSONB, nullable=True)
+```
+- DB repo (atomic seq increment + append):
+```108:146:monorepo/backend/app/orchestration/jobs/repository_db.py
+async def append_event(...):
+    res = await self.session.execute(
+        update(LearningJob)
+        .where(LearningJob.job_id == job_id)
+        .values(last_seq=func.coalesce(LearningJob.last_seq, 0) + 1, updated_at=func.now())
+        .returning(LearningJob.last_seq)
+    )
+    seq = res.scalar_one()
+    ev = JobEvent(job_id=job_id, seq=seq, type=type_, level=level, message=message, data=(data or {}))
+    ...
+```
+- RL jobs REST (create + emit first events via PostgREST emitter):
+```214:233:monorepo/backend/app/routes/clustered_training/core/routes.py
+@router.post("/jobs", ...)
+async def create_job(...):
+    ...
+```
+```784:799:monorepo/backend/app/routes/clustered_training/core/routes.py
+await get_postgrest_emitter().append_event(job_id=job_id, type_="rl.job.created", message="RL job created", data={"work": payload.work})
+```
+- Shared storage path showing how “update” and “append event” are broadcast via the PostgREST emitter:
+```242:271:monorepo/backend/app/routes/simple_training/services/job_service.py
+async def update_job(...): ...
+async def append_job_event(...): ...
+```
+### What to add/change (SSE + emitters)
+1) Add SSE endpoint under RL jobs
+- File: `monorepo/backend/app/routes/clustered_training/core/routes.py`
+- New route: `GET /api/rl/jobs/{job_id}/stream`
+  - Auth: `ValidatedAPIKey`
+  - Headers: honor `Last-Event-ID`; Query: `since_id`, `types`, `heartbeat`
+  - Flow:
+    - Validate job exists (use the DB-backed repo in production mode)
+    - Determine start seq (from Last-Event-ID or since_id, else 0)
+    - Replay: call repo `list_events(job_id, after=seq)` and stream as SSE (`id`, `event`, `data`)
+    - Live tail: loop with short sleep (e.g., 0.5–1.0s) fetching new events by `after=last_seq`
+    - Heartbeats as comments `: keep-alive` every N seconds
+  - Event mapping:
+    - `JobEvent.type` prefixes map to SSE `event`:
+      - `rl.job.*`, `job.updated` → `status`
+      - `rl.step.metric`, `ft.step.metric`, `eval.metric` → `metric`
+      - `system.log.*` → `log`
+      - `artifact.*` → `artifact`
+    - `Job.status` can be snapshotted once at connect (send a `status`)
+2) Ensure a consistent event source for reads
+- Prefer the DB repo (`JobsRepositoryDB`) in prod mode. If the current code path uses the PostgREST emitter for appends, verify that the repo’s `list_events` reads from the same canonical table (it does for `JobEvent`). If your RL path uses the “learning_shared” models instead, use the associated repository there (same pattern: list by job_id + seq).
+- If you must keep PostgREST for append-only, that’s fine; SSE can still read the DB rows inserted alongside (your outbox/emitter already supports both).
+3) Emitters in training/FT loops
+- File(s): `monorepo/backend/app/orchestration/hatchet/workflows.py` (RL workflow nodes), any FT job loops
+- After each meaningful step:
+  - Train: append `type="rl.step.metric"`, `data={"avg_reward":..., "return":..., "success_rate":..., "step":..., "epoch":..., "split":"train"}`.
+  - Eval: `type="eval.metric"` with eval metrics and split.
+  - FT: `type="ft.step.metric"`, `data={"loss":..., "lr":..., "step":..., "epoch":..., "split":"train"}`; validation as `split="eval"`.
+  - On phase changes/checkpoints: `type="job.updated"` or `artifact.checkpoint` with URLs.
+- Use the same helper used elsewhere:
+```236:276:monorepo/backend/app/routes/simple_training/services/job_service.py
+async def append_job_event(...): return await get_postgrest_emitter().append_event(...)
+```
+4) Optional shared service abstraction
+- File: `monorepo/backend/app/routes/simple_training/services/storage_shared.py`
+  - Add a small `stream_job_events(job_id, after)` helper that wraps `repo.list_events(...)` and normalizes schemas (legacy vs learning_shared). The SSE route can call this.
+5) Wire the router
+- File: `monorepo/backend/app/routes/main.py`
+  - Include the new GET route (under the RL router you already mount).
+- CORS: ensure EventSource allowed if frontend will use browser SSE.
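The event mapping from step 1 can be sketched as a pattern table. This is an illustration, not code from the monorepo: `sse_event_for` is a hypothetical name, and falling back to `log` for unrecognized types is an assumption the notes do not specify.

```python
from fnmatch import fnmatchcase

# JobEvent.type patterns mapped to SSE event names, as listed in the notes.
_TYPE_PATTERNS = [
    ("rl.job.*", "status"),
    ("job.updated", "status"),
    ("rl.step.metric", "metric"),
    ("ft.step.metric", "metric"),
    ("eval.metric", "metric"),
    ("system.log.*", "log"),
    ("artifact.*", "artifact"),
]


def sse_event_for(job_event_type: str, default: str = "log") -> str:
    """Return the SSE `event` name for a stored JobEvent.type."""
    for pattern, sse_event in _TYPE_PATTERNS:
        if fnmatchcase(job_event_type, pattern):
            return sse_event
    return default  # assumption: unknown types are surfaced as log events


if __name__ == "__main__":
    for t in ("rl.job.created", "eval.metric", "artifact.checkpoint", "unknown.type"):
        print(t, "->", sse_event_for(t))
```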
+### Indexes/Perf you already have
+- Jobs/events tables include per-job indexes and a `last_seq`; reads by `(job_id, seq)` are efficient:
+```84:88:monorepo/backend/app/db/models/jobs.py
+Index("idx_job_events_job_ts", "job_id", "ts"),
+Index("idx_job_events_data_gin", "data", postgresql_using="gin"),
+```
+- For learning_shared events (if used), there are sequence indexes too:
+```159:165:monorepo/backend/app/db/models/learning_shared.py
+sa.Index("idx_learning_job_events_job_seq_idx", "job_id", "seq"),
+```
+### Summary of minimal backend edits
+- Add SSE route:
+  - `monorepo/backend/app/routes/clustered_training/core/routes.py` (GET `/api/rl/jobs/{job_id}/stream`)
+- Consume events via repo:
+  - `monorepo/backend/app/orchestration/jobs/repository_db.py` (use `list_events`)
+- Emit metrics from loops:
+  - `monorepo/backend/app/orchestration/hatchet/workflows.py` (append_event at train/eval steps)
+- Optional shared helper:
+  - `monorepo/backend/app/routes/simple_training/services/storage_shared.py` (normalize event reads)
+- Wireup:
+  - `monorepo/backend/app/routes/main.py` (include SSE route)
+- No schema changes required; you already have `JobEvent`/indexes and event append plumbing.
+### SDK additions for great terminal polling (with SSE fallback)
+- RlJobsApi extensions (synth_ai/jobs/client.py)
+  - stream(job_id, since_id=None, types=None, heartbeat=None) -> async iterator of events (uses SSE; falls back to polling)
+  - events(job_id, after=None, limit=500) -> list[JobEvent] (poll)
+  - status(job_id) -> JobSummary (single snapshot)
+- Event models (synth_ai/jobs/types.py)
+  - JobEvent base: {id, type, level, message, data, ts}
+  - StatusEvent, MetricEvent, LogEvent, ArtifactEvent (typed helpers)
+- JobsWatcher helper (synth_ai/jobs/watcher.py)
+  - constructor(client, job_id, interval=2.0, prefer_sse=True, jsonl_path=None, types=None)
+  - run(on_event, stop_when=None) → handles SSE connect/reconnect, polling fallback, Last-Event-ID cursor, dedupe
+  - metrics_tracker: rolling windows per metric name (mean/std/min/max, last_value, last_step)
+  - backoff policy: jittered reconnect; rate limiter for render
+- Terminal renderer (synth_ai/jobs/render.py)
+  - RichRenderer (or minimal TTY): compact line updates: t=HH:MM:SS | step/E | key metrics (avg_reward, loss, val_loss, success_rate)
+  - modes: one-line ticker vs. per-event lines; quiet mode; color by level/state
+  - JSONL sink: raw event writes without printing prompts/payloads
+- CLI command (synth_ai/api/train/cli.py)
+  - synth-ai jobs watch <job_id> [--types metric,status] [--interval 2] [--jsonl out.jsonl] [--since-id N] [--no-sse]
+  - exit codes: 0 on succeeded, 1 on failed/canceled, 2 on timeout
+- Utilities (synth_ai/jobs/utils.py)
+  - BackoffPolicy(retry, max) with jitter
+  - EventCursor(last_id, update)
+  - MetricsFormatter(map by job_type: RL vs FT metric labels)
+  - Coalescer: compress frequent metrics to ≤4 Hz
+- Defaults/behavior
+  - Prefer SSE; if 404/405/close → fallback to polling events() every interval
+  - Heartbeat support; show “connected/no data” when only heartbeats
+  - Resume: honor --since-id or Last-Event-ID; persist cursor optionally
+- Minimal backend assumptions
+  - GET /api/rl/jobs/{job_id}/stream (SSE) or /api/rl/jobs/{job_id}/events?after=… (poll)
+  - Events include metric/status/log/artifact with seq ids and ts
+- Extensibility
+  - Plugin renderers per job_type (rl, sft/ft)
+  - Hooks: on_status_change, on_metric(name, value), on_artifact(url)
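The "jittered reconnect" backoff listed above could look like the following sketch. `JitteredBackoff` and its parameters are hypothetical and do not reproduce the `BackoffPolicy(retry, max)` signature named in the notes; the strategy shown is exponential backoff with full jitter, which is one common choice among several.

```python
import random
from typing import Optional


class JitteredBackoff:
    """Exponential backoff with full jitter for SSE reconnect attempts."""

    def __init__(self, base_s: float = 0.5, max_s: float = 30.0,
                 rng: Optional[random.Random] = None) -> None:
        self.base_s = base_s
        self.max_s = max_s
        self._rng = rng or random.Random()
        self._attempt = 0

    def next_delay(self) -> float:
        """Return a delay drawn uniformly from [0, min(max_s, base_s * 2**attempt)]."""
        exponent = min(self._attempt, 32)  # bound the exponent so the product stays finite
        cap = min(self.max_s, self.base_s * (2 ** exponent))
        self._attempt += 1
        return self._rng.uniform(0.0, cap)

    def reset(self) -> None:
        """Call after a successful connection so the next failure starts small."""
        self._attempt = 0


if __name__ == "__main__":
    policy = JitteredBackoff(rng=random.Random(0))
    print([round(policy.next_delay(), 3) for _ in range(5)])
```

Injecting the random generator keeps tests reproducible; a watcher would sleep for `next_delay()` before each reconnect and call `reset()` once events flow again.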

examples/multi_step/task_app_config_notes.md CHANGED

@@ -485,4 +485,10 @@ payload = {
 }
 ```
-Status: Notes committed on branch `friday-cleanup` and pushed.
+Status: Notes committed on branch `friday-cleanup` and pushed.
+### Operational guardrails
+- Treat avg_turns == 0 (or a high fraction of episodes with turns == 0) as a failure condition; exit non‑zero.
+- Fail fast when the first policy step returns a 4xx/5xx from the inference target; include the HTTP status and URL in the error message.
+- CI hint: a tiny smoke run (2 seeds × 1 rollout) should see turns > 0 in healthy setups.
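A sketch of the first guardrail, with assumptions labeled: `turns_failure` is a hypothetical helper, the notes do not define "a high fraction", so the 0.5 threshold is a placeholder, and treating an empty run as a failure is likewise an assumption.

```python
from typing import Optional, Sequence


def turns_failure(turns_per_episode: Sequence[int],
                  max_zero_fraction: float = 0.5) -> Optional[str]:
    """Return a failure message if the rollouts look dead, else None."""
    if not turns_per_episode:
        return "no episodes were recorded"  # assumption: an empty run is a failure
    n = len(turns_per_episode)
    zero = sum(1 for t in turns_per_episode if t == 0)
    avg_turns = sum(turns_per_episode) / n
    if avg_turns == 0:
        return "avg_turns == 0"
    if zero / n > max_zero_fraction:  # placeholder threshold, not from the notes
        return f"{zero}/{n} episodes had turns == 0"
    return None


if __name__ == "__main__":
    problem = turns_failure([3, 5, 2, 4])
    if problem:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(f"rollout guardrail failed: {problem}")
    print("turns look healthy")
```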

examples/swe/task_app/grpo_swe_mini.py CHANGED

@@ -60,34 +60,55 @@ try:
     HAS_HOSTED = True
 except Exception:
     try:  # pragma: no cover - optional dependency path
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.branching import (  # type: ignore
-            router as branching_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.branching import (  # type: ignore
+            BranchingEnvironmentConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.environment_routes import (  # type: ignore # noqa: E501
-            router as environment_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.environment_routes import (  # type: ignore # noqa: E501
+            CrafterEnvironmentRoutes,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.policy_routes import (  # type: ignore
-            router as policy_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.policy_routes import (  # type: ignore
+            PolicyRoutes,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (  # type: ignore
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (  # type: ignore
+            RolloutPayload,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            EnvironmentConfig,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            PolicyConfig,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RolloutRequest,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RolloutResponse,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RunSpec,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            ToolUse,
+        )
+        from examples.task_apps.crafter.task_app.hosted.rollout import (  # type: ignore
             RolloutEnvSpec as LegacyRolloutEnvSpec,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutPolicySpec as LegacyRolloutPolicySpec,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutRecordConfig as LegacyRolloutRecordConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutRequest as LegacyRolloutRequest,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutResponse as LegacyRolloutResponse,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutSafetyConfig as LegacyRolloutSafetyConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             execute_rollout as legacy_execute_rollout,
         )
         HAS_HOSTED = True
@@ -264,7 +285,7 @@ def build_dataset() -> tuple[TaskDatasetRegistry, MiniSweDataset]:
 def _base_task_info(dataset: MiniSweDataset) -> TaskInfo:
     return TaskInfo(
         task={"id": "swe_mini", "name": "mini-SWE Tasks", "version": "0.1.0"},
-        environments=["swe-mini"],
+        environment="swe-mini",
         action_space={
             "type": "tool",
             "tools": ["run_command", "submit_patch"],
@@ -292,11 +313,6 @@ def _base_task_info(dataset: MiniSweDataset) -> TaskInfo:
             },
             "tool": {"name": "run_command", "parallel_tool_calls": False},
         },
-        capabilities={
-            "supports_rollout": True,
-            "supports_env_lifecycle": True,
-            "requires_api_key_header": True,
-        },
         limits={"max_ops": 2000, "max_time_s": 7200},
     )
@@ -348,18 +364,31 @@ def provide_task_instances(
     dataset: MiniSweDataset, base_info: TaskInfo, seeds: Sequence[int]
 ) -> Iterable[TaskInfo]:
     infos: list[TaskInfo] = []
+    base_observation = getattr(base_info, "observation", None)
+    if hasattr(base_observation, "model_dump"):
+        base_observation_data = base_observation.model_dump()
+    elif isinstance(base_observation, dict):
+        base_observation_data = dict(base_observation)
+    else:
+        base_observation_data = {}
     for seed in seeds:
         instance = dataset.sample_by_index(int(seed))
         infos.append(
             TaskInfo(
                 task=base_info.task,
-                environments=base_info.environments,
+                environment=base_info.environment,
                 action_space=base_info.action_space,
-                observation={**base_info.observation, "instance_id": instance["instance_id"]},
-                dataset={**base_info.dataset, "instance_id": instance["instance_id"]},
+                observation={
+                    **base_observation_data,
+                    "instance_id": instance["instance_id"],
+                },
+                dataset={
+                    **base_info.dataset.model_dump(),
+                    "instance_id": instance["instance_id"],
+                },
                 rubric=base_info.rubric,
                 inference=base_info.inference,
-                capabilities=base_info.capabilities,
                 limits=base_info.limits,
             )
         )
@@ -397,10 +426,10 @@ def build_config() -> TaskAppConfig:
             HostedTaskAppCls = HostedTaskApp
         except Exception:
             try:
-                from examples.warming_up_to_rl.task_app.synth_envs_hosted.hosted_app import (  # type: ignore
-                    TaskApp as HostedTaskApp,
+                from examples.task_apps.crafter.task_app.synth_envs_hosted.hosted_app import (  # type: ignore
+                    create_app,
                 )
-                HostedTaskAppCls = HostedTaskApp
+                HostedTaskAppCls = create_app
             except Exception as exc:  # pragma: no cover - optional dependency path
                 logger.warning("Unable to import HostedTaskApp for swe-mini: %s", exc)
         if HostedTaskAppCls is not None:

examples/swe/task_app/hosted/rollout.py CHANGED

@@ -1238,6 +1238,15 @@ async def execute_rollout(
                         )
                     # Build partial trajectory and return HTTP 200
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1249,6 +1258,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1369,6 +1379,15 @@ async def execute_rollout(
                         },
                     )
                     trajectory_steps.append(term_step)
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1379,6 +1398,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1460,6 +1480,15 @@ async def execute_rollout(
                     )
                     trajectory_steps.append(term_step)
                     # Build partial response
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1471,6 +1500,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1688,12 +1718,22 @@ async def execute_rollout(
                     timing_final.setdefault("overhead_ms", 0.0)
         # Build trajectory
+        # Extract inference_url from policy meta
+        inference_url = None
+        if policy_handle is not None:
+            try:
+                policy_snapshot = policy_handle.snapshot()
+                inference_url = policy_snapshot.get("config", {}).get("inference_url")
+            except Exception:
+                pass
         trajectory = RolloutTrajectory(
             env_id=env_id,
             policy_id=policy_id,
             steps=trajectory_steps,
             final={"observation": _summarize_observation_for_storage(env_handle, current_obs)},
             length=len(trajectory_steps),
+            inference_url=inference_url,  # NEW: Required for trace correlation
             decision_samples=decision_samples if step_rewards_active else None,
         )

examples/swe/task_app/hosted/test_service.py CHANGED

@@ -1,15 +1,14 @@
 #!/usr/bin/env python3
-"""
-Simple test script for the GRPO Synth Envs Hosted Service.
-Run this after starting the service with:
-    python main.py
-"""
+"""Manual smoke script for the GRPO Synth Envs Hosted Service."""
 import asyncio
 import json
 import httpx
+import pytest
+pytestmark = pytest.mark.skip(reason="Requires running hosted service on localhost:8000")
 async def test_service():
