PyPI - synth-ai - Versions diffs - 0.2.17__py3-none-any.whl → 0.2.19__py3-none-any.whl - Mend

synth-ai 0.2.17py3-none-any.whl → 0.2.19py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of synth-ai might be problematic.

Files changed (169)

examples/baseline/banking77_baseline.py +204 -0
examples/baseline/crafter_baseline.py +407 -0
examples/baseline/pokemon_red_baseline.py +326 -0
examples/baseline/simple_baseline.py +56 -0
examples/baseline/warming_up_to_rl_baseline.py +239 -0
examples/blog_posts/gepa/README.md +355 -0
examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
examples/blog_posts/gepa/gepa_baseline.py +204 -0
examples/blog_posts/gepa/query_prompts_example.py +97 -0
examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
examples/blog_posts/gepa/task_apps.py +105 -0
examples/blog_posts/gepa/test_gepa_local.sh +67 -0
examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +12 -10
examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +1 -0
examples/blog_posts/pokemon_vl/extract_images.py +239 -0
examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +1 -1
examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +60 -10
examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +1 -1
examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
examples/multi_step/configs/crafter_rl_outcome.toml +1 -0
examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -0
examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -0
examples/rl/configs/rl_from_base_qwen17.toml +1 -0
examples/swe/task_app/hosted/inference/openai_client.py +0 -34
examples/swe/task_app/hosted/policy_routes.py +17 -0
examples/swe/task_app/hosted/rollout.py +4 -2
examples/task_apps/banking77/__init__.py +6 -0
examples/task_apps/banking77/banking77_task_app.py +841 -0
examples/task_apps/banking77/deploy_wrapper.py +46 -0
examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
examples/task_apps/crafter/task_app/grpo_crafter.py +24 -2
examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +355 -58
examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +68 -7
examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +78 -21
examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
examples/task_apps/gepa_benchmarks/__init__.py +7 -0
examples/task_apps/gepa_benchmarks/common.py +260 -0
examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
examples/task_apps/pokemon_red/task_app.py +254 -36
examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +1 -0
examples/warming_up_to_rl/task_app/grpo_crafter.py +53 -4
examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +152 -41
examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +31 -1
examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +1 -0
synth_ai/api/train/builders.py +90 -1
synth_ai/api/train/cli.py +396 -21
synth_ai/api/train/config_finder.py +13 -2
synth_ai/api/train/configs/__init__.py +15 -1
synth_ai/api/train/configs/prompt_learning.py +442 -0
synth_ai/api/train/configs/rl.py +29 -0
synth_ai/api/train/task_app.py +1 -1
synth_ai/api/train/validators.py +277 -0
synth_ai/baseline/__init__.py +25 -0
synth_ai/baseline/config.py +209 -0
synth_ai/baseline/discovery.py +214 -0
synth_ai/baseline/execution.py +146 -0
synth_ai/cli/__init__.py +85 -17
synth_ai/cli/__main__.py +0 -0
synth_ai/cli/claude.py +70 -0
synth_ai/cli/codex.py +84 -0
synth_ai/cli/commands/__init__.py +1 -0
synth_ai/cli/commands/baseline/__init__.py +12 -0
synth_ai/cli/commands/baseline/core.py +637 -0
synth_ai/cli/commands/baseline/list.py +93 -0
synth_ai/cli/commands/eval/core.py +13 -10
synth_ai/cli/commands/filter/core.py +53 -17
synth_ai/cli/commands/help/core.py +0 -1
synth_ai/cli/commands/smoke/__init__.py +7 -0
synth_ai/cli/commands/smoke/core.py +1436 -0
synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
synth_ai/cli/commands/status/subcommands/usage.py +203 -0
synth_ai/cli/commands/train/judge_schemas.py +1 -0
synth_ai/cli/commands/train/judge_validation.py +1 -0
synth_ai/cli/commands/train/validation.py +0 -57
synth_ai/cli/demo.py +35 -3
synth_ai/cli/deploy/__init__.py +40 -25
synth_ai/cli/deploy.py +162 -0
synth_ai/cli/legacy_root_backup.py +14 -8
synth_ai/cli/opencode.py +107 -0
synth_ai/cli/root.py +9 -5
synth_ai/cli/task_app_deploy.py +1 -1
synth_ai/cli/task_apps.py +53 -53
synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
synth_ai/judge_schemas.py +1 -0
synth_ai/learning/__init__.py +10 -0
synth_ai/learning/prompt_learning_client.py +276 -0
synth_ai/learning/prompt_learning_types.py +184 -0
synth_ai/pricing/__init__.py +2 -0
synth_ai/pricing/model_pricing.py +57 -0
synth_ai/streaming/handlers.py +53 -4
synth_ai/streaming/streamer.py +19 -0
synth_ai/task/apps/__init__.py +1 -0
synth_ai/task/config.py +2 -0
synth_ai/task/tracing_utils.py +25 -25
synth_ai/task/validators.py +44 -8
synth_ai/task_app_cfgs.py +21 -0
synth_ai/tracing_v3/config.py +162 -19
synth_ai/tracing_v3/constants.py +1 -1
synth_ai/tracing_v3/db_config.py +24 -38
synth_ai/tracing_v3/storage/config.py +47 -13
synth_ai/tracing_v3/storage/factory.py +3 -3
synth_ai/tracing_v3/turso/daemon.py +113 -11
synth_ai/tracing_v3/turso/native_manager.py +92 -16
synth_ai/types.py +8 -0
synth_ai/urls.py +11 -0
synth_ai/utils/__init__.py +30 -1
synth_ai/utils/agents.py +74 -0
synth_ai/utils/bin.py +39 -0
synth_ai/utils/cli.py +149 -5
synth_ai/utils/env.py +17 -17
synth_ai/utils/json.py +72 -0
synth_ai/utils/modal.py +283 -1
synth_ai/utils/paths.py +48 -0
synth_ai/utils/uvicorn.py +113 -0
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/METADATA +102 -4
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/RECORD +162 -88
synth_ai/cli/commands/deploy/__init__.py +0 -23
synth_ai/cli/commands/deploy/core.py +0 -614
synth_ai/cli/commands/deploy/errors.py +0 -72
synth_ai/cli/commands/deploy/validation.py +0 -11
synth_ai/cli/deploy/core.py +0 -5
synth_ai/cli/deploy/errors.py +0 -23
synth_ai/cli/deploy/validation.py +0 -5
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0

synth_ai/utils/uvicorn.py ADDED

@@ -0,0 +1,113 @@
+import importlib.util as import_util
+import os
+import sys
+from pathlib import Path
+from typing import Any
+from synth_ai.task_app_cfgs import LocalTaskAppConfig
+from synth_ai.utils.env import resolve_env_var
+REPO_ROOT = Path(__file__).resolve().parents[2]
+START_DIV = f"{'-' * 30} Uvicorn start {'-' * 30}"
+END_DIV = f"{'-' * 31} Uvicorn end {'-' * 31}"
+_ASGI_FACTORY_NAMES = (
+    "fastapi_app",
+    "create_app",
+    "build_app",
+    "configure_app",
+    "get_app",
+    "app_factory",
+)
+def _coerce_asgi_app(candidate: Any) -> Any | None:
+    if candidate is None:
+        return None
+    if callable(candidate):
+        return candidate
+    return None
+def deploy_uvicorn_app(cfg: LocalTaskAppConfig) -> None:
+    task_app_path = cfg.task_app_path.resolve()
+    env_key = resolve_env_var("ENVIRONMENT_API_KEY")
+    if not env_key:
+        raise RuntimeError("ENVIRONMENT_API_KEY is required to serve locally.")
+    if cfg.trace:
+        os.environ["TASKAPP_TRACING_ENABLED"] = "1"
+    else:
+        os.environ.pop("TASKAPP_TRACING_ENABLED", None)
+    task_app_dir = task_app_path.parent.resolve()
+    candidates: list[Path] = [task_app_dir]
+    if (task_app_dir / "__init__.py").exists():
+        candidates.append(task_app_dir.parent.resolve())
+    candidates.append(REPO_ROOT)
+    unique: list[str] = []
+    for candidate in candidates:
+        candidate_str = str(candidate)
+        if candidate_str and candidate_str not in unique:
+            unique.append(candidate_str)
+    existing = os.environ.get("PYTHONPATH")
+    if existing:
+        for segment in existing.split(os.pathsep):
+            if segment and segment not in unique:
+                unique.append(segment)
+    os.environ["PYTHONPATH"] = os.pathsep.join(unique)
+    for entry in reversed(unique):
+        if entry and entry not in sys.path:
+            sys.path.insert(0, entry)
+    module_name = f"_synth_local_task_app_{task_app_path.stem}"
+    spec = import_util.spec_from_file_location(module_name, str(task_app_path))
+    if spec is None or spec.loader is None:
+        raise RuntimeError(f"Unable to load task app at {task_app_path}")
+    module = import_util.module_from_spec(spec)
+    sys.modules[module_name] = module
+    try:
+        spec.loader.exec_module(module)  # type: ignore[call-arg]
+    except Exception as exc:
+        raise RuntimeError(f"Failed to import task app: {exc}") from exc
+    app = _coerce_asgi_app(getattr(module, "app", None))
+    if app is None:
+        for name in _ASGI_FACTORY_NAMES:
+            factory = getattr(module, name, None)
+            if callable(factory):
+                produced = factory()
+                coerced = _coerce_asgi_app(produced)
+                if coerced is not None:
+                    app = coerced
+                    break
+    if app is None:
+        raise RuntimeError("Task app must expose an ASGI application via `app = FastAPI(...)` or a callable factory.")
+    host = cfg.host
+    port = cfg.port
+    preview_host = "127.0.0.1" if host in {"0.0.0.0", "::"} else host
+    print(f"[uvicorn] Serving task app at http://{preview_host}:{port}")
+# Deploy
+    try:
+        import uvicorn  # type: ignore
+    except ImportError as exc:
+        raise RuntimeError(
+            "uvicorn is required to serve task apps locally. Install it with `pip install uvicorn`."
+        ) from exc
+    try:
+        print(START_DIV)
+        uvicorn.run(app, host=host, port=port, reload=False, log_level="info")
+    except KeyboardInterrupt:
+        print("\n[uvicorn] Stopped by user.")
+    except Exception as exc:
+        raise RuntimeError(f"uvicorn runtime failed: {exc}") from exc
+    finally:
+        print(END_DIV)
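
The import-and-resolve step in `deploy_uvicorn_app` above can be sketched in isolation. This is a minimal, stdlib-only illustration and not the package's code: `load_module`, `resolve_app`, `FACTORY_NAMES` (a shortened copy of the tuple in the diff), and the temporary `demo_task_app.py` file are all hypothetical names introduced here. As in the diff, the only check applied to a candidate is `callable(...)`, so any callable is accepted as an ASGI app.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Any

# Shortened, hypothetical copy of the factory-name tuple shown in the diff.
FACTORY_NAMES = ("fastapi_app", "create_app", "build_app")


def load_module(path: Path) -> Any:
    """Import a Python file by path under a private module name."""
    name = f"_sketch_task_app_{path.stem}"
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load task app at {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as the diff does
    spec.loader.exec_module(module)
    return module


def resolve_app(module: Any) -> Any:
    """Prefer a module-level `app`; otherwise call the first factory that yields a callable."""
    app = getattr(module, "app", None)
    if callable(app):
        return app
    for name in FACTORY_NAMES:
        factory = getattr(module, name, None)
        if callable(factory):
            produced = factory()
            if callable(produced):
                return produced
    raise RuntimeError("Task app must expose `app` or a callable factory.")


# Hypothetical task app that exposes only a factory, no module-level `app`.
SOURCE = '''
async def _asgi(scope, receive, send):
    pass

def create_app():
    return _asgi
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_task_app.py"
    path.write_text(SOURCE)
    module = load_module(path)
    resolved = resolve_app(module)
    resolved_name = resolved.__name__
    print(resolved_name)  # prints "_asgi"
```

Because the module has no `app` attribute, resolution falls through to `create_app`, and the object it returns is what would be handed to `uvicorn.run`.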

{synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: synth-ai
-Version: 0.2.17
+Version: 0.2.19
 Summary: RL as a service SDK - Core AI functionality and tracing
 Author-email: Synth AI <josh@usesynth.ai>
 License-Expression: MIT
@@ -23,8 +23,8 @@ Requires-Dist: rich>=13.9.0
 Requires-Dist: openai>=1.99.0
 Requires-Dist: anthropic>=0.42.0
 Requires-Dist: langfuse<3.0.0,>=2.53.9
-Requires-Dist: opentelemetry-api<1.27.0,>=1.26.0
-Requires-Dist: opentelemetry-sdk<1.27.0,>=1.26.0
+Requires-Dist: opentelemetry-api>=1.26.0
+Requires-Dist: opentelemetry-sdk>=1.26.0
 Requires-Dist: diskcache>=5.6.3
 Requires-Dist: groq>=0.30.0
 Requires-Dist: google-genai>=1.26.0
@@ -53,9 +53,10 @@ Requires-Dist: aiohttp>=3.8.0
 Requires-Dist: httpx>=0.28.1
 Requires-Dist: datasets>=4.0.0
 Requires-Dist: transformers>=4.56.1
-Requires-Dist: modal==1.1.4
+Requires-Dist: modal<2.0.0,>=1.1.4
 Requires-Dist: pyboy>=2.6.0
 Requires-Dist: setuptools>=80.9.0
+Requires-Dist: libsql-experimental>=0.0.55
 Provides-Extra: dev
 Requires-Dist: build>=1.2.2.post1; extra == "dev"
 Requires-Dist: twine>=4.0.0; extra == "dev"
@@ -118,6 +119,7 @@ uvx synth-ai setup
 uvx synth-ai demo
 uvx synth-ai deploy
 uvx synth-ai run
+uvx synth-ai baseline  # For coding agents: get baseline scores
 ```
 > Full quickstart: [https://docs.usesynth.ai/sdk/get-started](https://docs.usesynth.ai/sdk/get-started)
@@ -158,6 +160,102 @@ Synth-AI ships with a built-in RL example: training **Qwen3-0.6B** on math reaso
 ---
+## 🤖 For Coding Agents: Get Started with Baselines
+**Baselines** are the fastest way for coding agents to evaluate changes and measure improvement on Synth tasks.
+### Why Use Baselines?
+Baselines provide a **self-contained evaluation system** that:
+- ✅ **No infrastructure required** — runs locally, no deployed task app needed
+- ✅ **Quick feedback loop** — get task-by-task results in seconds
+- ✅ **Compare changes** — establish a baseline score before making modifications
+- ✅ **Auto-discoverable** — finds baseline files automatically in your codebase
+### Quick Start for Coding Agents
+```bash
+# 1. List available baselines
+uvx synth-ai baseline list
+# 2. Run a quick 3-task baseline to get started
+uvx synth-ai baseline banking77 --split train --seeds 0,1,2
+# 3. Get your baseline score (full train split)
+uvx synth-ai baseline banking77 --split train
+# 4. Make your changes to the code...
+# 5. Re-run to compare performance
+uvx synth-ai baseline banking77 --split train --output results_after.json
+```
+### Available Baselines
+```bash
+# Filter by task type
+uvx synth-ai baseline list --tag rl          # RL tasks
+uvx synth-ai baseline list --tag nlp         # NLP tasks
+uvx synth-ai baseline list --tag vision      # Vision tasks
+# Run specific baselines
+uvx synth-ai baseline warming_up_to_rl       # Crafter survival game
+uvx synth-ai baseline pokemon_vl             # Pokemon Red (vision)
+uvx synth-ai baseline gepa                   # Banking77 classification
+```
+### Baseline Results
+Each baseline run provides:
+- **Task-by-task results** — see exactly which seeds succeed/fail
+- **Aggregate metrics** — success rate, mean/std rewards, total tasks
+- **Serializable output** — save to JSON with `--output results.json`
+- **Model comparison** — test different models with `--model`
+Example output:
+```
+============================================================
+Baseline Evaluation: Banking77 Intent Classification
+============================================================
+Split(s): train
+Tasks: 10
+Success: 8/10
+Execution time: 12.34s
+Aggregate Metrics:
+  mean_outcome_reward: 0.8000
+  success_rate: 0.8000
+  total_tasks: 10
+```
+### Creating Custom Baselines
+Coding agents can create new baseline files to test custom tasks:
+```python
+# my_task_baseline.py
+from synth_ai.baseline import BaselineConfig, BaselineTaskRunner, DataSplit, TaskResult
+class MyTaskRunner(BaselineTaskRunner):
+    async def run_task(self, seed: int) -> TaskResult:
+        # Your task logic here
+        return TaskResult(...)
+my_baseline = BaselineConfig(
+    baseline_id="my_task",
+    name="My Custom Task",
+    description="Evaluate my custom task",
+    task_runner=MyTaskRunner,
+    splits={
+        "train": DataSplit(name="train", seeds=list(range(10))),
+    },
+)
+```
+Place this file in `examples/baseline/` or name it `*_baseline.py` for auto-discovery.
+---
 ## 🔐 SDK → Dashboard Pairing
 When you run `uvx synth-ai setup` (or legacy `uvx synth-ai rl_demo setup`):
