synth-ai 0.2.16__py3-none-any.whl → 0.2.19__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
- examples/analyze_semantic_words.sh +2 -2
- examples/baseline/banking77_baseline.py +204 -0
- examples/baseline/crafter_baseline.py +407 -0
- examples/baseline/pokemon_red_baseline.py +326 -0
- examples/baseline/simple_baseline.py +56 -0
- examples/baseline/warming_up_to_rl_baseline.py +239 -0
- examples/blog_posts/gepa/README.md +355 -0
- examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
- examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
- examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
- examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
- examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
- examples/blog_posts/gepa/gepa_baseline.py +204 -0
- examples/blog_posts/gepa/query_prompts_example.py +97 -0
- examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
- examples/blog_posts/gepa/task_apps.py +105 -0
- examples/blog_posts/gepa/test_gepa_local.sh +67 -0
- examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/pokemon_vl/extract_images.py +239 -0
- examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
- examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
- examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
- examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
- examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
- examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
- examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
- examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
- examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +2 -1
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +65 -107
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +2 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +2 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/verilog_rl_lora.toml +80 -123
- examples/qwen_coder/configs/coder_lora_30b.toml +1 -3
- examples/qwen_coder/configs/coder_lora_4b.toml +4 -1
- examples/qwen_coder/configs/coder_lora_small.toml +1 -3
- examples/qwen_vl/README.md +10 -12
- examples/qwen_vl/SETUP_COMPLETE.md +7 -8
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +2 -3
- examples/qwen_vl/collect_data_via_cli.md +76 -84
- examples/qwen_vl/collect_vision_traces.py +4 -4
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +40 -57
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +1 -2
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +20 -37
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +21 -40
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/{filter_qwen2vl_sft.toml → filter_qwen3vl_sft.toml} +4 -5
- examples/qwen_vl/configs/filter_vision_sft.toml +2 -3
- examples/qwen_vl/crafter_qwen_vl_agent.py +5 -5
- examples/qwen_vl/run_vision_comparison.sh +6 -7
- examples/rl/README.md +5 -5
- examples/rl/configs/rl_from_base_qwen.toml +26 -1
- examples/rl/configs/rl_from_base_qwen17.toml +6 -2
- examples/rl/task_app/README.md +1 -2
- examples/rl/task_app/math_single_step.py +2 -2
- examples/run_crafter_demo.sh +2 -2
- examples/sft/README.md +1 -1
- examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -1
- examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -1
- examples/swe/task_app/README.md +32 -2
- examples/swe/task_app/grpo_swe_mini.py +4 -0
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +37 -10
- examples/swe/task_app/hosted/inference/openai_client.py +4 -38
- examples/swe/task_app/hosted/policy_routes.py +17 -0
- examples/swe/task_app/hosted/rollout.py +4 -2
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/banking77/__init__.py +6 -0
- examples/task_apps/banking77/banking77_task_app.py +841 -0
- examples/task_apps/banking77/deploy_wrapper.py +46 -0
- examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
- examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
- examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
- examples/task_apps/crafter/task_app/README.md +1 -1
- examples/task_apps/crafter/task_app/grpo_crafter.py +90 -5
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +4 -26
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
- examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +372 -107
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +81 -12
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +82 -11
- examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
- examples/task_apps/gepa_benchmarks/__init__.py +7 -0
- examples/task_apps/gepa_benchmarks/common.py +260 -0
- examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
- examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
- examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
- examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
- examples/task_apps/math/README.md +1 -2
- examples/task_apps/pokemon_red/README.md +3 -4
- examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
- examples/task_apps/pokemon_red/task_app.py +288 -39
- examples/task_apps/sokoban/README.md +2 -3
- examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
- examples/vlm/configs/crafter_vlm_gpt4o.toml +4 -1
- examples/warming_up_to_rl/configs/crafter_fft.toml +4 -1
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +0 -2
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +3 -2
- examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
- examples/warming_up_to_rl/task_app/README.md +1 -1
- examples/warming_up_to_rl/task_app/grpo_crafter.py +185 -5
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +1 -1
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +3 -27
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -1
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +156 -45
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +37 -4
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +6 -0
- synth_ai/api/train/builders.py +99 -4
- synth_ai/api/train/cli.py +516 -26
- synth_ai/api/train/config_finder.py +13 -2
- synth_ai/api/train/configs/__init__.py +23 -2
- synth_ai/api/train/configs/prompt_learning.py +442 -0
- synth_ai/api/train/configs/rl.py +61 -7
- synth_ai/api/train/configs/sft.py +6 -2
- synth_ai/api/train/configs/shared.py +59 -2
- synth_ai/api/train/task_app.py +1 -1
- synth_ai/api/train/validators.py +277 -0
- synth_ai/auth/credentials.py +119 -0
- synth_ai/baseline/__init__.py +25 -0
- synth_ai/baseline/config.py +209 -0
- synth_ai/baseline/discovery.py +214 -0
- synth_ai/baseline/execution.py +146 -0
- synth_ai/cli/__init__.py +94 -18
- synth_ai/cli/__main__.py +0 -0
- synth_ai/cli/claude.py +70 -0
- synth_ai/cli/codex.py +84 -0
- synth_ai/cli/commands/__init__.py +18 -0
- synth_ai/cli/commands/baseline/__init__.py +12 -0
- synth_ai/cli/commands/baseline/core.py +637 -0
- synth_ai/cli/commands/baseline/list.py +93 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1112 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +424 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +177 -0
- synth_ai/cli/commands/help/core.py +72 -0
- synth_ai/cli/commands/smoke/__init__.py +7 -0
- synth_ai/cli/commands/smoke/core.py +1436 -0
- synth_ai/cli/commands/status/__init__.py +64 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/subcommands/usage.py +203 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +200 -0
- synth_ai/cli/commands/train/judge_validation.py +305 -0
- synth_ai/cli/commands/train/validation.py +386 -0
- synth_ai/cli/demo.py +30 -158
- synth_ai/cli/deploy/__init__.py +43 -0
- synth_ai/cli/deploy.py +162 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +14 -8
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/opencode.py +107 -0
- synth_ai/cli/root.py +9 -5
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +20 -265
- synth_ai/cli/status.py +7 -126
- synth_ai/cli/task_app_deploy.py +1 -10
- synth_ai/cli/task_app_modal_serve.py +4 -9
- synth_ai/cli/task_app_serve.py +4 -11
- synth_ai/cli/task_apps.py +51 -1480
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +1 -14
- synth_ai/demos/crafter/grpo_crafter_task_app.py +1 -1
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
- synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
- synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
- synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
- synth_ai/environments/examples/red/engine.py +33 -12
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/environment.py +26 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/http.py +12 -0
- synth_ai/judge_schemas.py +10 -10
- synth_ai/learning/__init__.py +10 -0
- synth_ai/learning/prompt_learning_client.py +276 -0
- synth_ai/learning/prompt_learning_types.py +184 -0
- synth_ai/learning/rl/client.py +3 -1
- synth_ai/pricing/__init__.py +2 -0
- synth_ai/pricing/model_pricing.py +57 -0
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +518 -0
- synth_ai/streaming/streamer.py +320 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/apps/__init__.py +1 -0
- synth_ai/task/config.py +2 -0
- synth_ai/task/tracing_utils.py +25 -25
- synth_ai/task/validators.py +45 -9
- synth_ai/task_app_cfgs.py +21 -0
- synth_ai/tracing_v3/config.py +162 -19
- synth_ai/tracing_v3/constants.py +1 -1
- synth_ai/tracing_v3/db_config.py +24 -38
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/storage/config.py +47 -13
- synth_ai/tracing_v3/storage/factory.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +113 -11
- synth_ai/tracing_v3/turso/native_manager.py +92 -16
- synth_ai/types.py +8 -0
- synth_ai/urls.py +11 -0
- synth_ai/utils/__init__.py +30 -1
- synth_ai/utils/agents.py +74 -0
- synth_ai/utils/bin.py +39 -0
- synth_ai/utils/cli.py +149 -5
- synth_ai/utils/env.py +40 -33
- synth_ai/utils/http.py +4 -1
- synth_ai/utils/json.py +72 -0
- synth_ai/utils/modal.py +285 -3
- synth_ai/utils/paths.py +48 -0
- synth_ai/utils/uvicorn.py +113 -0
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/METADATA +109 -6
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/RECORD +291 -142
- examples/qwen_vl/configs/eval_qwen2vl_vision.toml +0 -44
- synth_ai/cli/tui.py +0 -62
- synth_ai/tui/__init__.py +0 -5
- synth_ai/tui/__main__.py +0 -13
- synth_ai/tui/cli/__init__.py +0 -1
- synth_ai/tui/cli/query_experiments.py +0 -164
- synth_ai/tui/cli/query_experiments_v3.py +0 -164
- synth_ai/tui/dashboard.py +0 -911
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0
examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md
@@ -0,0 +1,132 @@
# ✅ Inference Success Report

**Date**: Oct 31, 2025
**Models Tested**: Latest SFT and RL models from training

## Working Solution

### Correct Endpoint
```
https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
```

### SFT/PEFT Models: ✅ WORKING

**Model ID**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Test Code**:
```python
import httpx
import os

SYNTH_API_KEY = os.getenv("SYNTH_API_KEY")
url = "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Hello, I am working!' and nothing else."},
    ],
    "temperature": 0.2,
    "max_tokens": 100,
}

with httpx.Client(timeout=300.0) as client:
    response = client.post(url, json=payload, headers=headers)
    print(response.json())
```

**Result**:
- ✅ Status: 200 OK
- ✅ Response generated successfully
- ✅ Token usage tracked: 31 prompt + 72 completion = 103 total
- ✅ Output: "Hello, I am working!" (with thinking tokens as expected)

### RL Models: ⚠️ NEEDS PROMOTION

**Model ID**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`

**Status**: 303 Redirect (empty response)

**Root Cause**:
Inspection of the monorepo backend code shows that RL checkpoints require a "promotion" step to be loaded onto Modal before they can be used for inference. The direct Modal endpoint returns a redirect for unpromoted RL models.

**Solution Options**:

#### Option 1: Use Backend Proxy (Recommended)
The backend automatically handles RL promotion:
```python
# Use backend proxy instead of direct Modal
url = "https://your-backend.example.com/api/chat/completions"
# Backend will auto-promote and route to vLLM
```

#### Option 2: Manual Promotion (Advanced)
1. Call the promotion endpoint first
2. Wait for the model to load onto Modal
3. Then call the inference endpoint
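The status codes observed above map naturally onto a small triage helper. This is an illustrative sketch based on this report's observations, not part of the synth-ai SDK; the message wording is paraphrased from the findings below.

```python
def diagnose_inference_status(status_code: int) -> str:
    """Map status codes from the Modal inference endpoint to next steps.

    The 200/303/404 meanings follow the observations in this report;
    anything else is surfaced unchanged.
    """
    if status_code == 200:
        return "ok: response generated"
    if status_code == 303:
        return "RL model not promoted: use the backend proxy or promote it first"
    if status_code == 404:
        return "model not found: check the ID with `synth-ai status models list`"
    return f"unexpected status {status_code}"
```

Calling this on each response keeps scripts from silently treating an empty 303 redirect as a successful completion.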

## Key Learnings

### What We Got Wrong Initially:
1. ❌ Wrong endpoint path: used `/v1/chat/completions` → should be `/chat/completions`
2. ❌ Wrong base URL: used the render.com URL → should be the Modal URL
3. ❌ Assumed RL = PEFT workflow → RL needs a promotion step

### What We Got Right:
1. ✅ Model ID format from `synth-ai status models list`
2. ✅ Using `SYNTH_API_KEY` for auth
3. ✅ Bearer token authorization header

## Recommendations for Library Improvement

### 1. Add Simple CLI Command
```bash
synth-ai inference \
  --model "peft:Qwen/Qwen3-0.6B:job_xxx" \
  --message "Hello" \
  --max-tokens 100
```

### 2. Document Endpoint in Model Status
```bash
$ synth-ai status models get "peft:..."
Model: peft:Qwen/Qwen3-0.6B:job_xxx
Status: succeeded
Inference Endpoint: https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
Ready: ✅ Yes (use directly)
```

### 3. Add Python SDK Example
```python
import os

from synth_ai import InferenceClient

client = InferenceClient(api_key=os.getenv("SYNTH_API_KEY"))
response = client.chat.completions.create(
    model="peft:Qwen/Qwen3-0.6B:job_xxx",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

### 4. Clear Error Messages
- 303 → "RL model needs promotion. Use backend proxy or call /promote endpoint first."
- 404 → "Model not found. Check model ID with: synth-ai status models list"

## Success Criteria Met

- ✅ Can get model ID from CLI
- ✅ Know correct endpoint
- ✅ Know correct auth (SYNTH_API_KEY)
- ✅ Can send test message
- ✅ Get response back
- ⚠️ RL models need extra step (documented)

**Status**: PEFT/SFT inference is fully working! RL needs a backend proxy.
examples/blog_posts/warming_up_to_rl/README.md
@@ -0,0 +1,158 @@
# Crafter: From Rollouts to RL with the Synth AI CLI

This playbook mirrors the original “Warming Up to RL” walkthrough, but swaps the bespoke scripts for the first-class `uvx synth-ai` helpers. Every step, from deploying the task app to filtering rollouts, fine-tuning, and bootstrapping RL, now uses the same CLI you’d reach for in production.

All commands assume you are inside the repository root and have `uv`/`uvx` available.

---

## 0. Prerequisites

1. Install dependencies and authenticate once:
   ```bash
   uv pip install -e .
   uvx synth-ai setup
   ```
   The setup wizard writes the required `SYNTH_API_KEY`, `ENVIRONMENT_API_KEY`, and local `.env` helpers.

2. Copy the example secrets if you need a starter file:
   ```bash
   cp examples/warming_up_to_rl/.env.example .env
   ```

3. Export the path we use for trace capture (optional but keeps things tidy):
   ```bash
   export CRAFTER_TRACE_DB=traces/v3/crafter_blog.db
   ```
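Before moving on, it can save a debugging round-trip to confirm the keys the wizard writes are actually visible in your shell. A minimal sketch; the key names come from this guide, the helper itself is illustrative:

```python
import os

# Key names taken from this guide; the helper itself is illustrative.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")

def missing_env(env=None):
    """Return the required keys that are unset or empty."""
    source = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not source.get(key)]

if missing_env():
    print("Missing keys (run `uvx synth-ai setup` or source .env):", missing_env())
```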

---

## 1. Ship the Crafter Task App

Deploy the hosted Crafter environment once. The Modal URL that prints at the end is reused by eval, SFT, and RL.

```bash
uvx synth-ai deploy grpo-crafter \
  --runtime modal \
  --modal-mode serve \
  --name crafter-blogpost \
  --env-file .env
```

For local testing you can run:

```bash
uvx synth-ai deploy grpo-crafter \
  --runtime uvicorn \
  --port 8001 \
  --trace traces/v3 \
  --env-file .env
```

Copy the Modal URL (e.g. `https://your-app.modal.run`) and replace the `task_app_url` placeholders inside every config under `examples/blog_posts/warming_up_to_rl/configs/`.

---

## 2. Collect High-Quality Rollouts

We lean on large teacher models to produce demonstrations. The configs in `configs/` already request full traces so we retain chain-of-thought.

Groq Qwen3-32B (text-only prompt):
```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

GPT-OSS-120B via Groq’s OpenAI-compatible endpoint (also text-only):
```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

Both configs disable image attachments and rely on the textual observation renderer (`format_observation`) so Groq stays within its supported modalities. If you want to try other models, keep `use_vision = false` unless the provider explicitly supports image inputs.

---

## 3. Filter Into an SFT Dataset

Once traces are stored in `CRAFTER_TRACE_DB`, trim them down to the crisp trajectories:

```bash
uvx synth-ai filter \
  --config examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml
```

The output JSONL lands in `ft_data/crafter_blog_high_reward.jsonl`, ready for supervised fine-tuning.
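A quick sanity check on the filtered file before training is cheap insurance. This sketch assumes one JSON object per line with an OpenAI-style `"messages"` list; adjust the key if the filter command emits a different schema:

```python
import json
from pathlib import Path

def summarize_sft_jsonl(path):
    """Count examples and messages in an SFT JSONL file.

    Assumes one JSON object per line with an OpenAI-style "messages" list;
    this schema is an assumption, not confirmed by the filter docs.
    """
    examples = messages = 0
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        examples += 1
        messages += len(record.get("messages", []))
    return {"examples": examples, "messages": messages}
```

Running it on `ft_data/crafter_blog_high_reward.jsonl` gives a rough sense of dataset size before you commit to a training job.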

---

## 4. Fine-Tune Qwen3-4B with `uvx synth-ai train`

Update the dataset path (and optionally the hyperparameters) in `train_sft_qwen4b.toml`, then launch:

```bash
uvx synth-ai train \
  --type sft \
  --config examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml \
  --env-file .env \
  --poll
```

Capture the returned job id (it looks like `fft:Qwen/Qwen3-4B:job_xxxxx`). We reuse that identifier in the evaluation and RL configs.
At any time you can list recently minted checkpoints with:

```bash
uvx synth-ai status models
```

The output table shows the canonical model name/ID alongside the source job.
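If you script against these identifiers, the colon-separated layout can be split apart. The format here is inferred from the examples in this guide (and the job id below is made up), so treat this as a sketch rather than an official parser:

```python
def parse_model_id(model_id: str) -> dict:
    """Split a model identifier such as "fft:Qwen/Qwen3-4B:job_12345"
    (a made-up job id) into its parts.

    Layout inferred from this guide: a kind prefix ("fft", "peft", "rl"),
    the base model, and the job id. RL ids may carry an extra
    ":checkpoint-..." suffix, which stays attached to the job part.
    """
    kind, base_model, job = model_id.split(":", 2)
    return {"kind": kind, "base_model": base_model, "job": job}
```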

---

## 5. Evaluate the Fine-Tuned Checkpoint

Replace both `REPLACE-WITH-SFT-JOB-ID` strings inside `eval_ft_qwen4b.toml`, then run:

```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

This provides a clean, CLI-native comparison between the teacher rollouts and the fine-tuned model.

---

## 6. Kick Off RL from the Fine-Tuned Model

Point `train_rl_from_sft.toml` at the same Modal task app and set `model.source` to your SFT job id:

```bash
uvx synth-ai train \
  --type rl \
  --config examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml \
  --env-file .env \
  --poll
```

The CLI streams rollout and judge metrics in real time. When the run finishes, you can reuse the Stage 5 config (substituting the RL job id) to quantify the uplift.
If you lose track of the produced RL label or want to confirm the latest status, run:

```bash
uvx synth-ai status jobs
uvx synth-ai status models
```

The first command shows job completion state; the second surfaces model IDs you can plug into new eval configs.

---

## 7. Where to Go Next

- The original `examples/warming_up_to_rl` folder still contains deeper experiments (auto-curricula, modal renderers, etc.).
- Add more `eval_*.toml` configs to compare alternative judges or reward-shaping strategies.
- Plug the filtered dataset into `uvx synth-ai files upload` if you want to share it with a teammate without copying JSONL around.

This directory now holds everything a blog post needs: configs, output locations, and the CLI entrypoints to reproduce the Crafter SFT → RL pipeline end-to-end.
examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md
@@ -0,0 +1,164 @@
# Smoke Testing Your Task App

This guide shows how to quickly test your task app using the `synth-ai smoke` command with auto-start features.

## Quick Start

The easiest way to smoke test is using the `[smoke]` section in your RL config:

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**That's it!** The smoke command will:
1. ✅ Auto-start the sqld server for tracing (if `sqld_auto_start = true`)
2. ✅ Auto-start your task app on port 8765 (if `task_app_name` is set)
3. ✅ Run 10 rollout steps with `gpt-5-nano` using synthetic mocking
4. ✅ Automatically stop all background services when done

**Expected output:**
```
[smoke] sqld ready
[smoke] Task app ready at http://localhost:8765 (status=400)
[mock-rl] server ready http://127.0.0.1:51798 backend=synthetic
>> POST /rollout run_id=smoke-... env=crafter policy=crafter-react
[mock-rl] ← request backend=synthetic model=gpt-5-nano messages=2
[mock-rl] → response tool_calls=1 backend=synthetic
rollout[0:0] episodes=1 steps=10 mean_return=1.0000
✓ Smoke rollouts complete
successes=1/1 total_steps=10 v3_traces=1/1 nonzero_returns=1/1
[smoke] Background services stopped
```

## Configuration

Add a `[smoke]` section to your RL config:

```toml
[smoke]
# Auto-start task app
task_app_name = "grpo-crafter"
task_app_port = 8765
task_app_env_file = ".env"
task_app_force = true

# Auto-start sqld
sqld_auto_start = true
sqld_db_path = "./traces/local.db"
sqld_hrana_port = 8080
sqld_http_port = 8081

# Test parameters
max_steps = 10
policy = "gpt-5-nano"
mock_backend = "synthetic" # or "openai" (requires valid OpenAI API key)
return_trace = true
```
|
|
58
|
+
|
|
59
|
+
## Testing Methods
|
|
60
|
+
|
|
61
|
+
### 1. Full Auto (Recommended)
|
|
62
|
+
Everything auto-starts from config:
|
|
63
|
+
```bash
|
|
64
|
+
uv run synth-ai smoke --config configs/smoke_test.toml
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### 2. Manual Task App + Auto sqld
|
|
68
|
+
Start task app manually, auto-start sqld:
|
|
69
|
+
```bash
|
|
70
|
+
# Config with sqld_auto_start=true but no task_app_name
|
|
71
|
+
uv run synth-ai smoke --config configs/my_config.toml --url http://localhost:8765
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
### 3. Override Config Settings
|
|
75
|
+
Override any config value via CLI:
|
|
76
|
+
```bash
|
|
77
|
+
uv run synth-ai smoke --config configs/smoke_test.toml --max-steps 5
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
### 4. No Config (Manual Everything)
```bash
# Start services manually in separate terminals:
# Terminal 1: sqld --db-path ./traces/local.db --hrana-listen-addr 127.0.0.1:8080 --http-listen-addr 127.0.0.1:8081
# Terminal 2: uv run synth-ai task-app serve grpo-crafter --port 8765 --env-file .env --force

# Terminal 3: Run smoke test
uv run synth-ai smoke --url http://localhost:8765 \
  --env-name crafter \
  --policy-name crafter-react \
  --max-steps 10 \
  --policy mock \
  --mock-backend openai
```

## Prerequisites

### Install sqld (for tracing)
```bash
brew install sqld
# or
curl -fsSL https://get.turso.com/sqld | bash
```

### Verify Installation
```bash
which sqld
# Should output: /opt/homebrew/bin/sqld or similar
```

## Common Issues

### sqld not found
If you see "sqld not found in PATH":
```bash
brew install sqld
```

### Port already in use
Use `task_app_force = true` in config, or:
```bash
# Kill processes on ports 8080, 8081, 8765
lsof -ti:8080,8081,8765 | xargs kill -9
```
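
If you would rather check before killing anything, attempting to bind the port is a reliable test. A small stand-alone sketch (not part of the CLI):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; if the bind succeeds, nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

for port in (8080, 8081, 8765):
    print(f"port {port}: {'free' if port_is_free(port) else 'in use'}")
```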

### Task app not starting
Check the error output; common causes are:
- a missing or incomplete `.env` file (required keys not set)
- a task app name that is not registered in your codebase

## Example Output

```
[smoke] Loaded configuration from configs/smoke_test.toml
[smoke] Config keys: task_app_name, task_app_port, sqld_auto_start, max_steps, policy
[smoke] Starting sqld server...
[smoke] DB path: /Users/you/project/traces/local.db
[smoke] Hrana port: 8080, HTTP port: 8081
[smoke] sqld ready
[smoke] Starting task app 'grpo-crafter' on port 8765...
[smoke] Task app ready at http://localhost:8765
[smoke] Task app started, will use URL: http://localhost:8765
[mock-rl] server ready http://127.0.0.1:52134 backend=openai
>> POST /rollout run_id=smoke-abc123...
rollout[0:0] episodes=1 steps=20 mean_return=1.2500
✓ Smoke rollouts complete
successes=1/1 total_steps=20 v3_traces=1/1 nonzero_returns=1/1
[smoke] Stopping sqld...
[smoke] Stopping task_app...
[smoke] Background services stopped
```

## Next Steps

Once smoke tests pass:
1. Train your model: `uv run synth-ai train --type rl --config configs/your_config.toml`
2. Check traces: look in the `./traces/` directory
3. Monitor training: use the Synth dashboard

## Full Config Reference

See [`configs/smoke_test.toml`](configs/smoke_test.toml) for a complete example.

See [CLI Smoke Documentation](https://docs.usesynth.ai/cli/smoke) for all options.

# Smoke Test Implementation - Complete

## Summary

The smoke test now provides **complete visibility into RL training rollouts**, including:

✅ **Auto-start background services** (sqld, task app)
✅ **Real OpenAI inference** with gpt-4o-mini
✅ **Tool call display** - see every action the policy takes
✅ **Trace validation** - verify v3 trace format
✅ **Clean output** - all diagnostic noise suppressed

## Quick Start

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**Output shows:**
- Service startup (sqld, task app)
- Real-time inference requests
- **All 10 tool calls with arguments** (e.g., `interact_many({"actions":["move_up","move_up"]})`)
- Rollout metrics (steps, returns, rewards)
- Success validation

## Documentation

All documentation has been updated for future agents:

### 1. User Documentation
- **`SMOKE_TESTING.md`** - How to run smoke tests, what to expect
- **`configs/smoke_test.toml`** - Well-commented example configuration
- **`monorepo/docs/cli/smoke.mdx`** - Mintlify CLI documentation

### 2. Developer Documentation
- **`ARCHITECTURE.md`** - Internal architecture, troubleshooting guide
- **`synth_ai/cli/commands/smoke/core.py`** - Extensive inline comments explaining tool call extraction

### 3. Code Comments

**Tool Call Extraction (core.py lines 946-997):**
```python
# Extract and display tool calls from v3 trace
#
# IMPORTANT: Tool calls are extracted from the structured v3 trace format.
# The trace must be requested with return_trace=True for this to work.
#
# Trace structure (nested):
# trace.event_history[]                  - list of events (policy calls, env steps)
# └─ event.call_records[]                - LLM calls made during this event
#    └─ call_record.output_tool_calls[]  - tool calls from the LLM response
#       ├─ tool_call.name                - function name (e.g., "interact_many")
#       └─ tool_call.arguments_json      - JSON string of arguments
```

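Walking that structure can be sketched as follows. The dataclasses are simplified stand-ins for the real trace objects (attribute names follow the comment above; this is not the actual synth-ai implementation):

```python
import json
from dataclasses import dataclass, field

@dataclass
class ToolCall:            # stand-in for a v3 tool call record
    name: str
    arguments_json: str

@dataclass
class CallRecord:          # one LLM call captured during an event
    output_tool_calls: list[ToolCall] = field(default_factory=list)

@dataclass
class Event:               # one entry in the trace's event history
    call_records: list[CallRecord] = field(default_factory=list)

@dataclass
class Trace:
    event_history: list[Event] = field(default_factory=list)

def format_tool_calls(trace: Trace) -> list[str]:
    """Flatten event_history -> call_records -> output_tool_calls into display lines."""
    lines: list[str] = []
    for event in trace.event_history:
        for record in event.call_records:
            for call in record.output_tool_calls:
                args = json.loads(call.arguments_json)  # arguments arrive as a JSON string
                lines.append(f"TOOL_CALL[{len(lines)}]: {call.name}({json.dumps(args)})")
    return lines

trace = Trace(event_history=[
    Event(call_records=[CallRecord(output_tool_calls=[
        ToolCall("interact_many", '{"actions": ["move_up", "move_up"]}')
    ])])
])
print("\n".join(format_tool_calls(trace)))
```
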
## Key Implementation Details

### Tool Call Display

**Requirements:**
1. `return_trace = true` in config (CRITICAL - without this, no tool calls)
2. v3 trace format (`trace_format="structured"`)
3. Mock proxy or real inference (direct API calls don't populate traces correctly)

**Data Flow:**
```
1. Rollout request with return_trace=True
      ↓
2. Task app makes LLM calls, captures responses
      ↓
3. LLM responses include tool_calls
      ↓
4. Task app stores call_records in event_history
      ↓
5. Smoke command extracts from trace.event_history[].call_records[].output_tool_calls[]
      ↓
6. Display: TOOL_CALL[N]: function_name({...args})
```

### Diagnostic Suppression

**Permanently disabled (commented out, not deleted):**
- `synth_ai/tracing_v3/config.py:21` - `[TRACING_V3_CONFIG_LOADED]`
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py` - All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py` - All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py` - All `[PATCH]` messages

**Why commented, not deleted?**
- Preserves context for debugging
- Shows what messages existed
- Easy to re-enable if needed

### Background Service Management

**Task App:**
- Runs from the synth-ai root (required for discovery)
- Uses `nohup` for detachment
- Output → `nohup_task_app.out`
- Health check accepts 200 or 400 (400 = server up, auth failing)
- 120s timeout with progress updates

**sqld:**
- Dual ports: 8080 (Hrana WebSocket), 8081 (HTTP)
- Health check: `GET http://127.0.0.1:8081/health`
- 30s timeout
- Auto-cleanup of existing processes

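The readiness checks above boil down to a polling loop. A stand-alone sketch (the function name and defaults are illustrative, not the actual `core.py` code):

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, ok_statuses=(200,), timeout_s=30.0, poll_s=0.5) -> bool:
    """Poll `url` until it answers with one of `ok_statuses` or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if resp.status in ok_statuses:
                    return True
        except urllib.error.HTTPError as exc:
            # A 400 still proves the server is up (auth failing, per the notes above).
            if exc.code in ok_statuses:
                return True
        except OSError:
            pass  # connection refused: server not listening yet
        time.sleep(poll_s)
    return False

# sqld: plain 200 on /health; task app: 200 or 400 both count as "up".
# wait_until_ready("http://127.0.0.1:8081/health", timeout_s=30)
# wait_until_ready("http://localhost:8765/health", ok_statuses=(200, 400), timeout_s=120)
```
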
## Configuration Reference

### Critical Settings

```toml
[smoke]
# Auto-start services
task_app_name = "grpo-crafter"  # Task app to serve
task_app_port = 8765
task_app_env_file = ".env"      # Required for this app
sqld_auto_start = true

# Inference - REAL OpenAI
model = "gpt-4o-mini"           # Actual model used
mock_backend = "openai"         # Route through OpenAI API
use_mock = true                 # Enable mock proxy

# CRITICAL for tool call display
return_trace = true             # Must be true!
```

### Optional Settings

All `[smoke]` parameters are optional; CLI args override TOML values:

```bash
# Override max steps
uv run synth-ai smoke --config configs/smoke_test.toml --max-steps 5

# Use a different model
uv run synth-ai smoke --config configs/smoke_test.toml --model gpt-4o

# Disable mock (use direct API - won't show tool calls properly)
uv run synth-ai smoke --config configs/smoke_test.toml --no-mock
```
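
The precedence rule (CLI flag beats TOML value, TOML beats built-in default) can be sketched like this; the helper is hypothetical, not the actual synth-ai merge code:

```python
def resolve(key: str, cli_args: dict, toml_cfg: dict, defaults: dict):
    """CLI flag > TOML value > built-in default; None in cli_args means 'not passed'."""
    if cli_args.get(key) is not None:
        return cli_args[key]
    if key in toml_cfg:
        return toml_cfg[key]
    return defaults[key]

defaults = {"max_steps": 10, "model": "gpt-4o-mini"}
toml_cfg = {"max_steps": 20}                # [smoke] sets max_steps only
cli_args = {"max_steps": 5, "model": None}  # user passed --max-steps 5 only

assert resolve("max_steps", cli_args, toml_cfg, defaults) == 5          # CLI wins
assert resolve("model", cli_args, toml_cfg, defaults) == "gpt-4o-mini"  # default wins
```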

## Troubleshooting

### No tool calls displayed

**Symptom:** `⚠ No tool calls found in trace`

**Solutions:**
1. Verify `return_trace = true` in config
2. Check `v3_traces=1/1` in the output (should match successes)
3. Ensure `use_mock = true` or that you are going through the mock proxy
4. Check the task app logs: `cat /path/to/synth-ai/nohup_task_app.out`

### Task app exits immediately

**Symptom:** `0 steps`, process not running

**Solutions:**
1. Verify the task app name: `synth-ai task-app list`
2. Check that the `.env` file exists at the `task_app_env_file` path
3. Ensure you are running from the correct directory
4. Manual test: `cd /synth-ai && uvx synth-ai task-app serve grpo-crafter --port 8765 --env-file /path/.env --force`

### Port conflicts

**Symptom:** `Address already in use`

**Solution:** Auto-cleanup should handle this, but for manual cleanup:
```bash
lsof -ti :8080 | xargs kill -9
lsof -ti :8081 | xargs kill -9
lsof -ti :8765 | xargs kill -9
```

## Testing

### Unit Tests

- `tests/unit/test_train_validation.py::test_rl_config_with_smoke_section` - Validates `[smoke]` section parsing
- `tests/unit/test_smoke_config.py` - Comprehensive Pydantic validation tests

### Integration Test

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**Expected result:**
- ✅ Services start successfully
- ✅ 10 tool calls displayed
- ✅ `v3_traces=1/1`
- ✅ `successes=1/1`
- ✅ `nonzero_returns=1/1`

## Files Modified

### Core Implementation
- `synth_ai/cli/commands/smoke/core.py` - Tool call extraction, auto-start logic
- `synth_ai/api/train/configs/rl.py` - `SmokeConfig` Pydantic model
- `synth_ai/api/train/builders.py` - Remove `[smoke]` before sending to trainer

### Diagnostic Suppression
- `synth_ai/tracing_v3/config.py` - Commented out `[TRACING_V3_CONFIG_LOADED]`
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py` - Commented out `[PATCH]`
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py` - Commented out `[PATCH]`
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py` - Commented out `[PATCH]`

### Documentation
- `examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md` - User guide
- `examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md` - Developer guide
- `examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml` - Example config
- `examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml` - Inline docs
- `monorepo/docs/cli/smoke.mdx` - Mintlify CLI reference

### Tests
- `tests/unit/test_train_validation.py` - Added smoke section test
- `tests/unit/test_smoke_config.py` - Comprehensive smoke config tests

## Future Improvements

Ideas for future agents:

1. **Streaming display** - Show tool calls as they happen, not just at the end
2. **Tool call validation** - Verify format matches environment expectations
3. **Performance metrics** - Track inference latency per call
4. **Cost tracking** - Display OpenAI API costs
5. **Parallel rollouts** - Support concurrent execution testing
6. **Vision support** - Save observations for vision-based tasks
7. **Interactive mode** - Step through a rollout one action at a time
8. **Replay mode** - Re-run saved traces for debugging

## Success Criteria Met

✅ **Tool calls visible** - All 10 calls displayed with arguments
✅ **Real inference** - OpenAI gpt-4o-mini executing actual tool calls
✅ **Clean output** - No diagnostic noise
✅ **Auto-start** - Background services managed automatically
✅ **Well documented** - Comprehensive docs for users and developers
✅ **Robust** - Error handling, health checks, timeouts
✅ **Tested** - Unit tests and a working integration test

## Contact

For questions or issues, see:
- Architecture details: `ARCHITECTURE.md`
- User guide: `SMOKE_TESTING.md`
- CLI reference: `monorepo/docs/cli/smoke.mdx`