PyPI - synth-ai - Versions diffs - 0.2.13.dev1__py3-none-any.whl → 0.2.14__py3-none-any.whl - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of synth-ai might be problematic.

Files changed (291)

examples/multi_step/readme.md ADDED

@@ -0,0 +1,48 @@
+Crafter
+cd /Users/joshpurtell/Documents/GitHub/synth-ai && uvx synth-ai modal-serve grpo-crafter-task-app --name grpo-crafter-task-app --env-file /Users/joshpurtell/Documents/GitHub/monorepo/environments/crafter/.env
+cd /Users/joshpurtell/Documents/GitHub/monorepo && uv run modal deploy backend/app/routes/clustered_training/core/algorithms/gspo/app.py --env dev
+uvx synth-ai eval --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/crafter_eval_text_only_groq_qwen32b.toml
+uvx synth-ai train \
+  --type rl \
+  --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml \
+  --task-url https://synth-laboratories--grpo-crafter-task-app-fastapi-app-dev.modal.run \
+  --backend https://synth-backend-dev-docker.onrender.com/api \
+  --env-file /Users/joshpurtell/Documents/GitHub/monorepo/environments/crafter/.env
+---
+Verilog
+# 1. Deploy Verilog task app
+cd /Users/joshpurtell/Documents/GitHub/synth-ai && uvx synth-ai modal-serve grpo-verilog --name grpo-verilog-task-app --env-file /Users/joshpurtell/Documents/GitHub/monorepo/environments/verilog/.env
+# 2. Baseline eval using Synth backend (pre-training)
+uvx synth-ai eval --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/verilog_eval_synth_qwen4b.toml
+# 3. (Optional) External reference eval using Groq Qwen 32B
+uvx synth-ai eval --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/verilog_eval_groq_qwen32b.toml
+# 4. Deploy training backend
+cd /Users/joshpurtell/Documents/GitHub/monorepo && uv run modal deploy backend/app/routes/clustered_training/core/algorithms/gspo/app.py --env dev
+# 5. Run RL training
+uvx synth-ai train \
+  --type rl \
+  --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/verilog_rl_lora.toml \
+  --task-url https://synth-laboratories--grpo-verilog-task-app-fastapi-app-dev.modal.run \
+  --backend https://synth-backend-dev-docker.onrender.com/api \
+  --env-file /Users/joshpurtell/Documents/GitHub/monorepo/environments/verilog/.env
+# 6. Post-training eval (update job_id in config first!)
+# After training, note the job_id from logs (e.g., job_19a1823e56303de604f)
+# Update verilog_eval_synth_trained_qwen8b.toml with your job_id
+uvx synth-ai eval --config /Users/joshpurtell/Documents/GitHub/synth-ai/examples/multi_step/configs/verilog_eval_synth_trained_qwen8b.toml
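
The readme above hard-codes absolute paths from one developer's machine. As a minimal sketch, the same `uvx synth-ai train` invocation could be assembled from a configurable repo root. It uses only the flags that appear in the readme; the environment variable names `SYNTH_AI_REPO` and `CRAFTER_ENV_FILE` and the helper `build_train_command` are hypothetical and not part of synth-ai.

```python
from __future__ import annotations

import os
import shlex
from pathlib import Path


def build_train_command(
    repo_root: Path,
    config_rel: str,
    task_url: str,
    backend_url: str,
    env_file: Path,
) -> list[str]:
    """Assemble the argv for `uvx synth-ai train` using flags shown in the readme."""
    return [
        "uvx", "synth-ai", "train",
        "--type", "rl",
        "--config", str(repo_root / config_rel),
        "--task-url", task_url,
        "--backend", backend_url,
        "--env-file", str(env_file),
    ]


# Hypothetical environment variables; they fall back to the current directory.
repo_root = Path(os.environ.get("SYNTH_AI_REPO", "."))
env_file = Path(os.environ.get("CRAFTER_ENV_FILE", ".env"))

cmd = build_train_command(
    repo_root,
    "examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml",
    "https://synth-laboratories--grpo-crafter-task-app-fastapi-app-dev.modal.run",
    "https://synth-backend-dev-docker.onrender.com/api",
    env_file,
)

# Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems if a path contains spaces.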

examples/multi_step/verilog_rl_lora.md ADDED

@@ -0,0 +1,218 @@
+# Verilog RL with LoRA Analysis
+## Executive Summary
+**✅ YES, Verilog can absolutely do RL with LoRA just like Crafter!** The architecture is nearly identical, but there are important considerations around model size and task complexity.
+## Architecture Compatibility ✅
+### **Same Foundation** (No changes needed)
+- ✅ **Contracts**: Uses identical `RolloutRequest`/`RolloutResponse` as Crafter
+- ✅ **Task App Framework**: Same `synth_ai.task.apps` framework
+- ✅ **Environment Pattern**: Same `StatefulEnvironment` + tool-based architecture
+- ✅ **Rubrics System**: Same evaluation and reward system
+- ✅ **Trace Correlation**: Already implemented in `rollout_executor` (line 817 in `grpo_verilog.py`)
+- ✅ **Modal Deployment**: Same deployment pattern as Crafter
+### **Key Differences** (Considerations for LoRA)
+#### 1. **Model Size: 8x Larger** ⚠️
+```toml
+# Verilog (current)
+model = "qwen/qwen3-32b"  # 32B parameters
+# Crafter (working)
+model = "Qwen/Qwen3-4B"   # 4B parameters
+```
+**Impact**: Base-model weight memory is roughly 8x higher for LoRA training
+**Solution**: Use gradient checkpointing, smaller batch sizes, or distributed training
+#### 2. **Tool Set: Simpler but More Structured**
+```python
+# Verilog Tools (4 tools)
+TOOLS = ["write_file", "compile", "simulate", "submit"]
+# Crafter Tools (20+ tools)
+# craft, move, attack, gather, etc.
+```
+**Verilog Advantages**:
+- ✅ **Deterministic**: Write → Compile → Simulate → Submit workflow
+- ✅ **Clear Success Criteria**: Tests pass = high reward
+- ✅ **Sparse but Meaningful Rewards**: +10 for submit success, +1 for simulation pass
+**Verilog Challenges**:
+- ❌ **Sparser Rewards**: Fewer intermediate signals for learning
+- ❌ **Longer Sequences**: Multi-step compilation chains
+- ❌ **Error Recovery**: Must debug compilation failures
+#### 3. **State Representation**
+```python
+# Verilog State (file-based)
+{
+    "files": {"TopModule.v": "module TopModule(..."},
+    "compile_status": "Last compile: Success",
+    "simulate_status": "Last simulation: Passed",
+    "task_completed": False
+}
+# Crafter State (world-based)
+{
+    "inventory": {"wood": 5, "stone": 3},
+    "position": [x, y],
+    "nearby_entities": [...],
+    "achievement_unlocked": True
+}
+```
+## Configuration for LoRA RL
+### **Option 1: Qwen3-0.6B (Recommended for testing)** ⭐
+```toml
+[algorithm]
+type = "online"
+method = "policy_gradient"
+variety = "gspo"
+[model]
+base = "Qwen/Qwen3-0.6B"  # ✅ Same as existing SFT configs
+trainer_mode = "lora"
+[lora]
+r = 16
+alpha = 32
+dropout = 0.05
+target_modules = ["all-linear"]
+[rollout]
+env_name = "verilog"
+max_turns = 15
+policy_name = "verilog-designer"
+[training]
+batch_size = 4  # ✅ Same as Crafter
+gradient_accumulation_steps = 1
+```
+### **Option 2: Qwen3-32B (Production)** ⚠️
+```toml
+[algorithm]
+type = "online"
+method = "policy_gradient"
+variety = "gspo"
+[model]
+base = "qwen/qwen3-32b"  # ⚠️ 8x memory vs Crafter's 4B
+trainer_mode = "lora"
+[lora]
+r = 16
+alpha = 32
+dropout = 0.05
+target_modules = ["all-linear"]
+[rollout]
+env_name = "verilog"
+max_turns = 15
+policy_name = "verilog-designer"
+```
+### **Memory Optimization** (for 32B model)
+```toml
+[vllm]
+max_model_len = 4096  # Shorter than Crafter's 8192
+tensor_parallel_size = 2  # Distribute across GPUs
+[training]
+batch_size = 2  # Smaller than Crafter's 4
+gradient_accumulation_steps = 4
+```
+## Task App Changes Needed
+### **1. Mode Parameter Support** ✅ (Already implemented)
+The Verilog task app already handles `mode="rl"` correctly:
+```python
+# In grpo_verilog.py rollout_executor
+policy_config = dict(policy_config_raw)
+# ... mode parameter flows through naturally
+```
+### **2. Trace Correlation** ✅ (Already implemented)
+```python
+# Line 817 in grpo_verilog.py
+trajectory = RolloutTrajectory(
+    # ...
+    inference_url=agent.inference_url,  # ✅ Required for trace correlation
+    decision_samples=None,
+)
+```
+### **3. Rubric Integration** ✅ (Already configured)
+```python
+# In grpo_verilog.py
+rubrics=RubricBundle(
+    outcome=OUTCOME_RUBRIC,  # Tests pass reward
+    events=EVENTS_RUBRIC,    # Process efficiency reward
+)
+```
+## RL Training Feasibility
+### **✅ Works Great**
+1. **Clear Success Signal**: Submit passing tests = +10 reward
+2. **Guided Process**: Natural write→compile→simulate→submit progression
+3. **Error Learning**: Agent must learn to debug compilation failures
+4. **Hardware Design**: Real-world applicable skills
+### **⚠️ Challenges**
+1. **Model Size**: 32B vs 4B = 8x memory, slower training
+2. **Sparse Rewards**: Fewer learning signals than Crafter's dense rewards
+3. **Longer Episodes**: 15+ steps vs Crafter's 10 steps
+4. **Compilation Errors**: Must learn to interpret and fix syntax errors
+## Recommended Approach
+### **Phase 1: Start with Qwen3-0.6B** ⭐
+```toml
+# Perfect for testing - same model used in existing SFT configs
+model = "Qwen/Qwen3-0.6B"
+batch_size = 4  # Same as Crafter
+```
+- ✅ **Zero setup**: Already configured in `synth-ai/examples/sft/configs/crafter_lora_qwen0p6b.toml`
+- ✅ **Fast iteration**: 0.6B parameters = quick training cycles
+- ✅ **Memory efficient**: Fits on single GPU easily
+- ✅ **Proven baseline**: Same model used in RL demos and SFT examples
+### **Phase 2: Scale to Qwen3-8B** (if 0.6B works well)
+```toml
+model = "qwen/qwen3-8b"
+batch_size = 2
+gradient_accumulation_steps = 2
+```
+### **Phase 3: Production with Qwen3-32B**
+```toml
+model = "qwen/qwen3-32b"
+tensor_parallel_size = 2
+batch_size = 1
+gradient_accumulation_steps = 4
+```
+### **Phase 4: Optimize for Verilog Domain**
+Consider fine-tuning the base model on:
+- Verilog syntax and semantics
+- Hardware design patterns
+- Compilation error messages
+- Testbench writing
+## Conclusion
+**✅ Verilog RL with LoRA is absolutely feasible** and should work with the same pipeline as Crafter. The main differences are:
+1. **Larger model** (32B vs 4B) requires memory optimization
+2. **Sparser rewards** may need different reward shaping
+3. **More structured tasks** could actually make learning easier
+4. **Real hardware skills** make it more valuable than game tasks
+**Recommended next step**: Create a `verilog_rl_lora.toml` config starting with Qwen3-0.6B (Phase 1) and adapt the reward rubrics for the compilation workflow.
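
The "8x memory" claim and the batch settings in the analysis above can be checked with a short calculation. This is a rough sketch under stated assumptions: nominal parameter counts of 4e9 and 32e9, bf16 weights at 2 bytes per parameter, the standard LoRA convention of scaling the update by `alpha / r`, and `batch_size` treated as the per-step micro-batch with data parallelism ignored. The helper names are hypothetical.

```python
from __future__ import annotations


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the frozen base weights only (assumes bf16 by default)."""
    return num_params * bytes_per_param / 1e9


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step (single data-parallel rank)."""
    return batch_size * grad_accum_steps


# Nominal sizes from the document: Crafter uses a 4B model, Verilog a 32B model.
crafter_gb = weight_memory_gb(4e9)
verilog_gb = weight_memory_gb(32e9)
memory_ratio = verilog_gb / crafter_gb

# LoRA settings from the configs: r = 16, alpha = 32.
lora_scale = 32 / 16

# (batch_size, gradient_accumulation_steps) pairs quoted in the document.
schedules = {
    "Option 1 (Qwen3-0.6B)": (4, 1),
    "Memory optimization (32B)": (2, 4),
    "Phase 2 (Qwen3-8B)": (2, 2),
    "Phase 3 (Qwen3-32B)": (1, 4),
}
effective = {name: effective_batch_size(*pair) for name, pair in schedules.items()}

print(f"base weights: {crafter_gb:.0f} GB vs {verilog_gb:.0f} GB ({memory_ratio:.0f}x)")
print(f"LoRA scale alpha/r: {lora_scale}")
for name, size in effective.items():
    print(f"{name}: effective batch size {size}")
```

Under these assumptions the three phases keep an effective batch size of 4, while the "Memory Optimization" snippet yields 8. The estimate ignores activations, optimizer state for the adapters, and the vLLM KV cache, so it is a lower bound.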

examples/qwen_coder/configs/coder_lora_30b.toml CHANGED

@@ -3,7 +3,7 @@
 [algorithm]
 type = "offline"
 method = "sft"
-variety = "fft"
+variety = "lora"
 [job]
 model = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

examples/sft/evaluate.py CHANGED

@@ -44,6 +44,7 @@ def _ops(n: int) -> list[str]:
 def _request(seed: int, a: EvalArgs) -> RolloutRequest:
+    from synth_ai.task.contracts import RolloutMode
     return RolloutRequest(
         run_id=f"eval-{seed}",
         env=RolloutEnvSpec(env_name="crafter", seed=seed, config={}),
@@ -53,6 +54,7 @@ def _request(seed: int, a: EvalArgs) -> RolloutRequest:
         ),
         ops=_ops(a.max_llm_calls),
         record=RolloutRecordConfig(trajectories=True, return_trace=False, trace_format="compact"),
+        mode=RolloutMode.EVAL,
     )

examples/sft/generate_traces.py CHANGED

@@ -42,6 +42,7 @@ def _build_ops(max_llm_calls: int) -> list[str]:
 def _build_request(seed: int, run_id: str, model: str, inference_url: str, api_key: str, *, max_llm_calls: int, return_trace: bool) -> RolloutRequest:
+    from synth_ai.task.contracts import RolloutMode
     policy_cfg: dict[str, Any] = {
         "model": model,
         "inference_url": inference_url,
@@ -54,6 +55,7 @@ def _build_request(seed: int, run_id: str, model: str, inference_url: str, api_k
         policy=RolloutPolicySpec(policy_name="crafter-react", config=policy_cfg),
         ops=_build_ops(max_llm_calls),
         record=record,
+        mode=RolloutMode.EVAL,
     )
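
Both SFT example scripts now pass `mode=RolloutMode.EVAL`, and the hosted `rollout.py` in this diff declares `mode` as a required field. The effect of a field with no default can be illustrated with stand-in types. The classes below are simplified and hypothetical: the real `RolloutMode` lives in `synth_ai.task.contracts`, and its member values (`"rl"`, `"eval"`) are assumed here, not taken from the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RolloutMode(str, Enum):
    """Hypothetical stand-in for synth_ai.task.contracts.RolloutMode."""

    RL = "rl"      # assumed value
    EVAL = "eval"  # assumed value


@dataclass(frozen=True)
class RolloutRequest:
    """Simplified stand-in carrying only the fields needed for this illustration."""

    run_id: str
    mode: RolloutMode  # no default, so every caller must choose a mode explicitly


def build_eval_request(seed: int) -> RolloutRequest:
    """Mirror the example scripts: evaluation callers always pass EVAL."""
    return RolloutRequest(run_id=f"eval-{seed}", mode=RolloutMode.EVAL)


request = build_eval_request(7)

# Omitting the mode fails at construction time instead of silently defaulting.
try:
    RolloutRequest(run_id="eval-8")  # type: ignore[call-arg]
    missing_mode_rejected = False
except TypeError:
    missing_mode_rejected = True

print(request, missing_mode_rejected)
```

With the Pydantic model used in the actual code, a missing required field would raise a validation error instead of `TypeError`, but the consequence for callers is the same: requests that do not state RL or EVAL are rejected.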

examples/swe/task_app/grpo_swe_mini.py CHANGED

@@ -60,34 +60,55 @@ try:
     HAS_HOSTED = True
 except Exception:
     try:  # pragma: no cover - optional dependency path
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.branching import (  # type: ignore
-            router as branching_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.branching import (  # type: ignore
+            BranchingEnvironmentConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.environment_routes import (  # type: ignore # noqa: E501
-            router as environment_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.environment_routes import (  # type: ignore # noqa: E501
+            CrafterEnvironmentRoutes,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.policy_routes import (  # type: ignore
-            router as policy_router,
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.policy_routes import (  # type: ignore
+            PolicyRoutes,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (  # type: ignore
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (  # type: ignore
+            RolloutPayload,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            EnvironmentConfig,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            PolicyConfig,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RolloutRequest,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RolloutResponse,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            RunSpec,
+        )
+        from examples.task_apps.crafter.task_app.synth_envs_hosted.rollout import (
+            ToolUse,
+        )
+        from examples.task_apps.crafter.task_app.hosted.rollout import (  # type: ignore
             RolloutEnvSpec as LegacyRolloutEnvSpec,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutPolicySpec as LegacyRolloutPolicySpec,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutRecordConfig as LegacyRolloutRecordConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutRequest as LegacyRolloutRequest,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutResponse as LegacyRolloutResponse,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             RolloutSafetyConfig as LegacyRolloutSafetyConfig,
         )
-        from examples.warming_up_to_rl.task_app.synth_envs_hosted.rollout import (
+        from examples.task_apps.crafter.task_app.hosted.rollout import (
             execute_rollout as legacy_execute_rollout,
         )
         HAS_HOSTED = True
@@ -264,7 +285,7 @@ def build_dataset() -> tuple[TaskDatasetRegistry, MiniSweDataset]:
 def _base_task_info(dataset: MiniSweDataset) -> TaskInfo:
     return TaskInfo(
         task={"id": "swe_mini", "name": "mini-SWE Tasks", "version": "0.1.0"},
-        environments=["swe-mini"],
+        environment="swe-mini",
         action_space={
             "type": "tool",
             "tools": ["run_command", "submit_patch"],
@@ -292,11 +313,6 @@ def _base_task_info(dataset: MiniSweDataset) -> TaskInfo:
             },
             "tool": {"name": "run_command", "parallel_tool_calls": False},
         },
-        capabilities={
-            "supports_rollout": True,
-            "supports_env_lifecycle": True,
-            "requires_api_key_header": True,
-        },
         limits={"max_ops": 2000, "max_time_s": 7200},
     )
@@ -348,18 +364,31 @@ def provide_task_instances(
     dataset: MiniSweDataset, base_info: TaskInfo, seeds: Sequence[int]
 ) -> Iterable[TaskInfo]:
     infos: list[TaskInfo] = []
+    base_observation = getattr(base_info, "observation", None)
+    if hasattr(base_observation, "model_dump"):
+        base_observation_data = base_observation.model_dump()
+    elif isinstance(base_observation, dict):
+        base_observation_data = dict(base_observation)
+    else:
+        base_observation_data = {}
     for seed in seeds:
         instance = dataset.sample_by_index(int(seed))
         infos.append(
             TaskInfo(
                 task=base_info.task,
-                environments=base_info.environments,
+                environment=base_info.environment,
                 action_space=base_info.action_space,
-                observation={**base_info.observation, "instance_id": instance["instance_id"]},
-                dataset={**base_info.dataset, "instance_id": instance["instance_id"]},
+                observation={
+                    **base_observation_data,
+                    "instance_id": instance["instance_id"],
+                },
+                dataset={
+                    **base_info.dataset.model_dump(),
+                    "instance_id": instance["instance_id"],
+                },
                 rubric=base_info.rubric,
                 inference=base_info.inference,
-                capabilities=base_info.capabilities,
                 limits=base_info.limits,
             )
         )
@@ -397,10 +426,10 @@ def build_config() -> TaskAppConfig:
             HostedTaskAppCls = HostedTaskApp
         except Exception:
             try:
-                from examples.warming_up_to_rl.task_app.synth_envs_hosted.hosted_app import (  # type: ignore
-                    TaskApp as HostedTaskApp,
+                from examples.task_apps.crafter.task_app.synth_envs_hosted.hosted_app import (  # type: ignore
+                    create_app,
                 )
-                HostedTaskAppCls = HostedTaskApp
+                HostedTaskAppCls = create_app
             except Exception as exc:  # pragma: no cover - optional dependency path
                 logger.warning("Unable to import HostedTaskApp for swe-mini: %s", exc)
         if HostedTaskAppCls is not None:
@@ -455,6 +484,7 @@ def build_config() -> TaskAppConfig:
         legacy_request = LegacyRolloutRequest(
             run_id=request.run_id,
+            mode=request.mode,  # Preserve mode for nested requests
             env=LegacyRolloutEnvSpec(
                 env_id=request.env.env_id,
                 env_name=env_spec.env_name or "swe-mini",
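
The `provide_task_instances` hunk above normalizes `base_info.observation` to a plain dict before merging in `instance_id`. That pattern can be sketched as a small reusable helper. `to_plain_dict` and `FakeObservation` are hypothetical names used for illustration only; the object with `model_dump()` stands in for a Pydantic model.

```python
from __future__ import annotations

from typing import Any


def to_plain_dict(value: Any) -> dict[str, Any]:
    """Return a shallow dict copy of a model-like object, a dict, or {} otherwise."""
    if hasattr(value, "model_dump"):
        # Pydantic v2 style models expose model_dump(); any object with it works here.
        return dict(value.model_dump())
    if isinstance(value, dict):
        return dict(value)
    return {}


class FakeObservation:
    """Hypothetical stand-in for a Pydantic model exposing model_dump()."""

    def model_dump(self) -> dict[str, Any]:
        return {"summary": "repository state"}


from_model = {**to_plain_dict(FakeObservation()), "instance_id": "abc-1"}
from_dict = {**to_plain_dict({"summary": "x"}), "instance_id": "abc-2"}
from_none = {**to_plain_dict(None), "instance_id": "abc-3"}

print(from_model, from_dict, from_none)
```

In the hunk, the `dataset` field calls `base_info.dataset.model_dump()` directly without this fallback, so it assumes `dataset` is always a model; a helper like this one could cover both fields if that assumption does not hold.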

examples/swe/task_app/hosted/rollout.py CHANGED

@@ -12,6 +12,7 @@ from fastapi import APIRouter, HTTPException, Request, status
 from pydantic import BaseModel
 from synth_ai.lm.vendors.base import BaseLMResponse
 from synth_ai.task.tracing_utils import unique_sft_path
+from synth_ai.task.contracts import RolloutMode
 from synth_ai.tracing_v3.abstractions import EnvironmentEvent, LMCAISEvent, TimeRecord
 from synth_ai.tracing_v3.llm_call_record_helpers import create_llm_call_record_from_response
 from synth_ai.tracing_v3.session_tracer import SessionTracer
@@ -120,6 +121,7 @@ class RolloutRequest(BaseModel):
     # Optional run/session context
     training_session_id: str | None = None
     synth_base_url: str | None = None
+    mode: RolloutMode  # Required: explicit RL vs EVAL mode
 class RolloutStep(BaseModel):
@@ -1238,6 +1240,15 @@ async def execute_rollout(
                         )
                     # Build partial trajectory and return HTTP 200
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1249,6 +1260,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1369,6 +1381,15 @@ async def execute_rollout(
                         },
                     )
                     trajectory_steps.append(term_step)
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1379,6 +1400,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1460,6 +1482,15 @@ async def execute_rollout(
                     )
                     trajectory_steps.append(term_step)
                     # Build partial response
+                    # Extract inference_url from policy meta (best effort)
+                    inference_url = None
+                    if policy_handle is not None:
+                        try:
+                            policy_snapshot = policy_handle.snapshot()
+                            inference_url = policy_snapshot.get("config", {}).get("inference_url")
+                        except Exception:
+                            pass
                     trajectory = RolloutTrajectory(
                         env_id=env_id,
                         policy_id=policy_id,
@@ -1471,6 +1502,7 @@ async def execute_rollout(
                             "at_op": op,
                         },
                         length=len(trajectory_steps),
+                        inference_url=inference_url,  # NEW: Required for trace correlation
                         decision_samples=decision_samples if step_rewards_active else None,
                     )
                     metrics = RolloutMetrics(
@@ -1688,12 +1720,22 @@ async def execute_rollout(
                     timing_final.setdefault("overhead_ms", 0.0)
         # Build trajectory
+        # Extract inference_url from policy meta
+        inference_url = None
+        if policy_handle is not None:
+            try:
+                policy_snapshot = policy_handle.snapshot()
+                inference_url = policy_snapshot.get("config", {}).get("inference_url")
+            except Exception:
+                pass
         trajectory = RolloutTrajectory(
             env_id=env_id,
             policy_id=policy_id,
             steps=trajectory_steps,
             final={"observation": _summarize_observation_for_storage(env_handle, current_obs)},
             length=len(trajectory_steps),
+            inference_url=inference_url,  # NEW: Required for trace correlation
             decision_samples=decision_samples if step_rewards_active else None,
         )
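
The same best-effort `inference_url` extraction appears four times in the hunks above. A minimal sketch of how it could be factored into one helper follows. `extract_inference_url` and the two fake handle classes are hypothetical; the only behavior taken from the diff is calling `snapshot()` and reading `config.inference_url`, with any failure yielding `None`.

```python
from __future__ import annotations

from typing import Any, Optional


def extract_inference_url(policy_handle: Any) -> Optional[str]:
    """Best-effort lookup of config.inference_url from a policy snapshot."""
    if policy_handle is None:
        return None
    try:
        snapshot = policy_handle.snapshot()
        return snapshot.get("config", {}).get("inference_url")
    except Exception:
        # Mirror the diff: trace correlation is optional, so never fail the rollout.
        return None


class FakePolicyHandle:
    """Hypothetical handle whose snapshot carries an inference URL."""

    def snapshot(self) -> dict[str, Any]:
        return {"config": {"inference_url": "https://example.invalid/v1"}}


class BrokenPolicyHandle:
    """Hypothetical handle whose snapshot call fails."""

    def snapshot(self) -> dict[str, Any]:
        raise RuntimeError("snapshot unavailable")


url_ok = extract_inference_url(FakePolicyHandle())
url_broken = extract_inference_url(BrokenPolicyHandle())
url_missing = extract_inference_url(None)

print(url_ok, url_broken, url_missing)
```

Each `RolloutTrajectory(...)` call site could then pass `inference_url=extract_inference_url(policy_handle)`, which keeps the four code paths consistent if the snapshot layout changes.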

examples/swe/task_app/hosted/test_service.py CHANGED

@@ -1,15 +1,14 @@
 #!/usr/bin/env python3
-"""
-Simple test script for the GRPO Synth Envs Hosted Service.
-Run this after starting the service with:
-    python main.py
-"""
+"""Manual smoke script for the GRPO Synth Envs Hosted Service."""
 import asyncio
 import json
 import httpx
+import pytest
+pytestmark = pytest.mark.skip(reason="Requires running hosted service on localhost:8000")
 async def test_service():
