synth-ai 0.2.9.dev0__py3-none-any.whl → 0.2.23.dev3__py3-none-any.whl
This diff compares publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between the versions as they appear in their respective public registries.
- examples/README.md +1 -0
- examples/__init__.py +16 -0
- examples/analyze_semantic_words.sh +17 -0
- examples/baseline/banking77_baseline.py +243 -0
- examples/baseline/banking77_pipeline_baseline.py +294 -0
- examples/baseline/crafter_baseline.py +407 -0
- examples/baseline/pokemon_red_baseline.py +326 -0
- examples/baseline/simple_baseline.py +56 -0
- examples/baseline/warming_up_to_rl_baseline.py +239 -0
- examples/blog_posts/gepa/README.md +355 -0
- examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
- examples/blog_posts/gepa/configs/banking77_gepa_test.toml +80 -0
- examples/blog_posts/gepa/configs/banking77_mipro_local.toml +50 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml +101 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_test.toml +96 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/hover_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hover_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/pupa_gepa_local.toml +58 -0
- examples/blog_posts/gepa/configs/pupa_mipro_local.toml +52 -0
- examples/blog_posts/gepa/deploy_banking77_task_app.sh +54 -0
- examples/blog_posts/gepa/gepa_baseline.py +204 -0
- examples/blog_posts/gepa/query_prompts_example.py +97 -0
- examples/blog_posts/gepa/run_gepa_banking77.sh +112 -0
- examples/blog_posts/gepa/run_gepa_banking77_pipeline.sh +163 -0
- examples/blog_posts/gepa/task_apps.py +105 -0
- examples/blog_posts/gepa/test_gepa_local.sh +67 -0
- examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
- examples/blog_posts/mipro/README.md +415 -0
- examples/blog_posts/mipro/configs/banking77_mipro_local.toml +91 -0
- examples/blog_posts/mipro/configs/banking77_mipro_test.toml +87 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gemini_flash_lite_local.toml +98 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gpt41mini_local.toml +96 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_local.toml +94 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_test.toml +170 -0
- examples/blog_posts/mipro/deploy_banking77_pipeline_task_app.sh +59 -0
- examples/blog_posts/mipro/deploy_banking77_task_app.sh +41 -0
- examples/blog_posts/mipro/multi_step.md +79 -0
- examples/blog_posts/mipro/run_mipro_banking77.sh +191 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline.sh +171 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gemini_flash_lite.sh +177 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gpt41mini.sh +173 -0
- examples/blog_posts/mipro/verify_banking77_setup.sh +117 -0
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/pokemon_vl/extract_images.py +239 -0
- examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
- examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
- examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
- examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
- examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
- examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
- examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
- examples/crafter_debug_render.py +186 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +45 -0
- examples/gepa/banking77_pipeline_gepa.toml +96 -0
- examples/gepa/multi_stage_gepa_example.toml +84 -0
- examples/gepa/run_gepa_banking77_pipeline.sh +157 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/README_verilog_rl.md +77 -0
- examples/multi_step/configs/VERILOG_REWARDS.md +103 -0
- examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +196 -0
- examples/multi_step/configs/crafter_eval_synth_qwen4b.toml +35 -0
- examples/multi_step/configs/crafter_eval_text_only_groq_qwen32b.toml +36 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +75 -0
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +145 -0
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +84 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +79 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/configs/crafter_synth_backend.md +40 -0
- examples/multi_step/configs/verilog_eval_groq_qwen32b.toml +31 -0
- examples/multi_step/configs/verilog_eval_synth_qwen8b.toml +33 -0
- examples/multi_step/configs/verilog_rl_lora.toml +147 -0
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/crafter_rl_lora.md +70 -0
- examples/multi_step/judges/crafter_backend_judge.py +220 -0
- examples/multi_step/judges/verilog_backend_judge.py +234 -0
- examples/multi_step/readme.md +48 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/multi_step/sse_metrics_streaming_notes.md +357 -0
- examples/multi_step/task_app_config_notes.md +494 -0
- examples/multi_step/verilog_rl_lora.md +218 -0
- examples/qwen_coder/README.md +102 -0
- examples/qwen_coder/_shared.py +113 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +60 -0
- examples/qwen_coder/configs/coder_lora_4b.toml +61 -0
- examples/qwen_coder/configs/coder_lora_small.toml +57 -0
- examples/qwen_coder/generate_dataset.py +98 -0
- examples/qwen_coder/infer_ft_smoke.py +65 -0
- examples/qwen_coder/infer_prod_proxy.py +73 -0
- examples/qwen_coder/infer_via_synth.py +87 -0
- examples/qwen_coder/scripts/infer_coder.sh +19 -0
- examples/qwen_coder/scripts/train_coder_30b.sh +22 -0
- examples/qwen_coder/sft_full_17b.py +103 -0
- examples/qwen_coder/sft_lora_30b.py +110 -0
- examples/qwen_coder/subset_jsonl.py +39 -0
- examples/qwen_coder/todos.md +38 -0
- examples/qwen_coder/validate_jsonl.py +60 -0
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +152 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +274 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +415 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +61 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +169 -0
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +62 -0
- examples/rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/download_dataset.py +80 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +21 -0
- {synth_ai/task/apps → examples/rl/task_app}/math_single_step.py +188 -50
- examples/rl/task_app/math_task_app.py +111 -0
- examples/run_crafter_demo.sh +10 -0
- examples/sdk_prompt_learning_example.py +55 -0
- examples/sft/README.md +139 -0
- examples/sft/configs/crafter_fft_qwen0p6b.toml +49 -0
- examples/sft/configs/crafter_lora_qwen0p6b.toml +49 -0
- examples/sft/evaluate.py +117 -0
- examples/sft/export_dataset.py +120 -0
- examples/sft/generate_traces.py +164 -0
- examples/swe/__init__.py +12 -0
- examples/swe/task_app/README.md +135 -0
- examples/swe/task_app/__init__.py +2 -0
- examples/swe/task_app/grpo_swe_mini.py +604 -0
- examples/swe/task_app/grpo_swe_mini_task_app.py +124 -0
- examples/swe/task_app/hosted/README.md +173 -0
- examples/swe/task_app/hosted/__init__.py +5 -0
- examples/swe/task_app/hosted/branching.py +143 -0
- examples/swe/task_app/hosted/environment_routes.py +1289 -0
- examples/swe/task_app/hosted/envs/__init__.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/__init__.py +6 -0
- examples/swe/task_app/hosted/envs/crafter/app.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/environment.py +522 -0
- examples/swe/task_app/hosted/envs/crafter/policy.py +478 -0
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +108 -0
- examples/swe/task_app/hosted/envs/crafter/shared.py +305 -0
- examples/swe/task_app/hosted/envs/crafter/tools.py +47 -0
- examples/swe/task_app/hosted/envs/mini_swe/__init__.py +8 -0
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +1191 -0
- examples/swe/task_app/hosted/envs/mini_swe/policy.py +355 -0
- examples/swe/task_app/hosted/envs/mini_swe/shared.py +83 -0
- examples/swe/task_app/hosted/envs/mini_swe/tools.py +96 -0
- examples/swe/task_app/hosted/hosted_app.py +204 -0
- examples/swe/task_app/hosted/inference/__init__.py +5 -0
- examples/swe/task_app/hosted/inference/openai_client.py +584 -0
- examples/swe/task_app/hosted/main.py +100 -0
- examples/swe/task_app/hosted/policy_routes.py +1094 -0
- examples/swe/task_app/hosted/registry.py +195 -0
- examples/swe/task_app/hosted/rollout.py +1905 -0
- examples/swe/task_app/hosted/storage/__init__.py +5 -0
- examples/swe/task_app/hosted/storage/volume.py +211 -0
- examples/swe/task_app/hosted/test_agents.py +161 -0
- examples/swe/task_app/hosted/test_service.py +136 -0
- examples/swe/task_app/hosted/utils.py +62 -0
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/IMAGE_ONLY_EVAL_QUICKSTART.md +258 -0
- examples/task_apps/TESTING.md +275 -0
- examples/task_apps/banking77/__init__.py +6 -0
- examples/task_apps/banking77/banking77_task_app.py +912 -0
- examples/task_apps/banking77/deploy_wrapper.py +46 -0
- examples/task_apps/banking77_pipeline/__init__.py +6 -0
- examples/task_apps/banking77_pipeline/banking77_pipeline_task_app.py +489 -0
- examples/task_apps/banking77_pipeline/deploy_wrapper.py +50 -0
- examples/task_apps/crafter/CREATE_SFT_DATASET.md +286 -0
- examples/task_apps/crafter/EVAL_IMAGE_ONLY_RESULTS.md +152 -0
- examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +187 -0
- examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +281 -0
- examples/task_apps/crafter/QUERY_EXAMPLES.md +203 -0
- examples/task_apps/crafter/README_IMAGE_ONLY_EVAL.md +316 -0
- examples/task_apps/crafter/eval_image_only_gpt4o.toml +28 -0
- examples/task_apps/crafter/eval_text_only_groq_llama.toml +36 -0
- examples/task_apps/crafter/filter_sft_dataset.toml +16 -0
- examples/task_apps/crafter/task_app/README.md +42 -0
- examples/task_apps/crafter/task_app/__init__.py +5 -0
- examples/task_apps/crafter/task_app/grpo_crafter.py +1055 -0
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +146 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/README.md +173 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/branching.py +143 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/environment.py +532 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +583 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +122 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +999 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/main.py +100 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +1252 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/registry.py +195 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +2233 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_service.py +136 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +411 -0
- examples/task_apps/dev/pokemon_emerald/__init__.py +2 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/README.md +811 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/__init__.py +120 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/action.py +160 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/memory.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/perception.py +69 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/planning.py +96 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/simple.py +1502 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/system_prompt.py +4 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/grab_map.py +68 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/manual.py +216 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/__init__.py +35 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emerald_utils.py +631 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emulator.py +1544 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/enums.py +1428 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/memory_reader.py +4848 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/types.py +41 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/utils.py +298 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pyproject.toml +95 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/run.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/app.py +2152 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/client.py +429 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/frame_server.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/README.md +78 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/run_tests.py +122 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_direct.py +76 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_prompts.py +413 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_battle_state_formatting.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection.py +133 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection_comprehensive.py +229 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_direct_agent_emulator.py +300 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_fps_adjustment_pytest.py +205 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_direct.py +200 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_transition.py +284 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_map_ground_truth_comparison.py +468 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_memory_map.py +575 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_server_map_validation.py +311 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_torchic_state.py +259 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/anticheat.py +372 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/checkpoint.py +296 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/error_handler.py +275 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/get_local_ip.py +22 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/helpers.py +44 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/llm_logger.py +514 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_formatter.py +415 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher.py +1763 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher_singleton.py +33 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_trimmer.py +106 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_visualizer.py +334 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/ocr_dialogue.py +1020 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/recording.py +188 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/state_formatter.py +1481 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/vlm.py +862 -0
- examples/task_apps/dev/pokemon_emerald/modal_app.py +114 -0
- examples/task_apps/dev/pokemon_emerald/task_app/README.md +81 -0
- examples/task_apps/dev/pokemon_emerald/task_app/__init__.py +6 -0
- examples/task_apps/dev/pokemon_emerald/task_app/pokemon_emerald.py +685 -0
- examples/task_apps/enron/__init__.py +2 -0
- examples/task_apps/enron/eval_groq_qwen32.toml +16 -0
- examples/task_apps/enron/filter_sft.toml +5 -0
- examples/task_apps/enron/task_app/README.md +14 -0
- examples/task_apps/enron/task_app/__init__.py +1 -0
- examples/task_apps/enron/task_app/grpo_enron.py +906 -0
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +146 -0
- examples/task_apps/enron/tests/__init__.py +4 -0
- examples/task_apps/enron/tests/conftest.py +115 -0
- examples/task_apps/enron/tests/integration/__init__.py +4 -0
- examples/task_apps/enron/tests/integration/test_enron_eval.py +179 -0
- examples/task_apps/enron/tests/integration/test_enron_rollout.py +135 -0
- examples/task_apps/enron/tests/unit/__init__.py +4 -0
- examples/task_apps/enron/tests/unit/test_enron_environment.py +126 -0
- examples/task_apps/gepa_benchmarks/__init__.py +7 -0
- examples/task_apps/gepa_benchmarks/common.py +260 -0
- examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
- examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
- examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
- examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
- examples/task_apps/math/README.md +21 -0
- examples/task_apps/math/math_single_step.py +1000 -0
- examples/task_apps/math/math_task_app.py +115 -0
- examples/task_apps/pokemon_battle/__init__.py +2 -0
- examples/task_apps/pokemon_battle/modal_app.py +104 -0
- examples/task_apps/pokemon_battle/task_app/README.md +68 -0
- examples/task_apps/pokemon_battle/task_app/__init__.py +6 -0
- examples/task_apps/pokemon_battle/task_app/pokemon_showdown.py +932 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_COMPLETE.md +283 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_STATUS.md +155 -0
- examples/task_apps/pokemon_red/README.md +356 -0
- examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +428 -0
- examples/task_apps/pokemon_red/__init__.py +3 -0
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +30 -0
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +224 -0
- examples/task_apps/pokemon_red/pallet_town_rl_config.toml +75 -0
- examples/task_apps/pokemon_red/task_app.py +1048 -0
- examples/task_apps/pokemon_red/test_pallet_town_rewards.py +193 -0
- examples/task_apps/sokoban/README.md +306 -0
- examples/task_apps/sokoban/__init__.py +3 -0
- examples/task_apps/sokoban/eval_groq_qwen32.toml +16 -0
- examples/task_apps/sokoban/eval_openai_gpt5.toml +16 -0
- examples/task_apps/sokoban/filter_sft.toml +5 -0
- examples/task_apps/sokoban/task_app.py +1058 -0
- examples/task_apps/sokoban/tests/__init__.py +4 -0
- examples/task_apps/sokoban/tests/conftest.py +113 -0
- examples/task_apps/sokoban/tests/integration/__init__.py +4 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_eval.py +57 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_rollout.py +198 -0
- examples/task_apps/sokoban/tests/unit/__init__.py +4 -0
- examples/task_apps/sokoban/tests/unit/test_sokoban_environment.py +114 -0
- examples/task_apps/verilog/__init__.py +1 -0
- examples/task_apps/verilog/eval_groq_qwen32b.toml +22 -0
- examples/task_apps/verilog/filter_sft.toml +5 -0
- examples/task_apps/verilog/task_app/README.md +12 -0
- examples/task_apps/verilog/task_app/__init__.py +1 -0
- examples/task_apps/verilog/task_app/grpo_verilog.py +1166 -0
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +145 -0
- examples/task_apps/verilog/tests/__init__.py +4 -0
- examples/task_apps/verilog/tests/conftest.py +115 -0
- examples/task_apps/verilog/tests/integration/__init__.py +4 -0
- examples/task_apps/verilog/tests/integration/test_verilog_eval.py +181 -0
- examples/task_apps/verilog/tests/integration/test_verilog_rollout.py +55 -0
- examples/task_apps/verilog/tests/unit/__init__.py +4 -0
- examples/task_apps/verilog/tests/unit/test_verilog_scoring.py +118 -0
- examples/tunnel_gepa_banking77/README.md +106 -0
- examples/tunnel_gepa_banking77/banking77_gepa_tunnel.toml +95 -0
- examples/tunnel_gepa_banking77/keep_tunnel_running.py +60 -0
- examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh +226 -0
- examples/vlm/PROPOSAL.md +53 -0
- examples/vlm/README.md +68 -0
- examples/vlm/configs/crafter_vlm_gpt4o.toml +49 -0
- examples/vlm/crafter_image_only_agent.py +207 -0
- examples/vlm/crafter_openai_vlm_agent.py +275 -0
- examples/vlm/filter_image_rows.py +63 -0
- examples/vlm/run_crafter_vlm_benchmark.py +316 -0
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +422 -0
- examples/warming_up_to_rl/configs/crafter_fft.toml +53 -0
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +54 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +22 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +15 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +24 -0
- examples/warming_up_to_rl/configs/eval_stepwise_complex.toml +35 -0
- examples/warming_up_to_rl/configs/eval_stepwise_consistent.toml +26 -0
- examples/warming_up_to_rl/configs/eval_stepwise_per_achievement.toml +36 -0
- examples/warming_up_to_rl/configs/eval_stepwise_simple.toml +32 -0
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +85 -0
- examples/warming_up_to_rl/configs/rl_from_ft.toml +58 -0
- examples/warming_up_to_rl/export_trace_sft.py +837 -0
- examples/warming_up_to_rl/groq_test.py +97 -0
- examples/warming_up_to_rl/manage_secrets.py +131 -0
- examples/warming_up_to_rl/old/event_rewards.md +234 -0
- examples/warming_up_to_rl/old/notes.md +73 -0
- examples/warming_up_to_rl/readme.md +110 -0
- examples/warming_up_to_rl/run_eval.py +736 -0
- examples/warming_up_to_rl/run_fft_and_save.py +380 -0
- examples/warming_up_to_rl/run_local_rollout.py +239 -0
- examples/warming_up_to_rl/run_local_rollout_modal.py +248 -0
- examples/warming_up_to_rl/run_local_rollout_parallel.py +405 -0
- examples/warming_up_to_rl/run_local_rollout_traced.py +477 -0
- examples/warming_up_to_rl/run_rl_and_save.py +124 -0
- examples/warming_up_to_rl/run_rollout_remote.py +156 -0
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +876 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +729 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1114 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1891 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +129 -0
- examples/workflows/math_rl/configs/eval_base_qwen.toml +15 -0
- examples/workflows/math_rl/configs/eval_rl_qwen.toml +11 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +62 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/workflows/math_rl/configs/rl_from_ft_qwen.toml +35 -0
- examples/workflows/math_rl/download_dataset.py +80 -0
- examples/workflows/math_rl/run_eval.py +436 -0
- examples/workflows/math_rl/run_rl_and_save.py +111 -0
- synth_ai/__init__.py +47 -23
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +514 -0
- synth_ai/api/train/__init__.py +60 -2
- synth_ai/api/train/builders.py +347 -39
- synth_ai/api/train/cli.py +895 -160
- synth_ai/api/train/config_finder.py +103 -25
- synth_ai/api/train/configs/__init__.py +65 -0
- synth_ai/api/train/configs/prompt_learning.py +496 -0
- synth_ai/api/train/configs/rl.py +188 -0
- synth_ai/api/train/configs/sft.py +99 -0
- synth_ai/api/train/configs/shared.py +81 -0
- synth_ai/api/train/env_resolver.py +70 -20
- synth_ai/api/train/pollers.py +29 -4
- synth_ai/api/train/prompt_learning.py +425 -0
- synth_ai/api/train/sft.py +390 -0
- synth_ai/api/train/supported_algos.py +147 -0
- synth_ai/api/train/task_app.py +6 -4
- synth_ai/api/train/utils.py +64 -52
- synth_ai/api/train/validators.py +1117 -0
- synth_ai/api/tunnel.py +49 -0
- synth_ai/auth/credentials.py +94 -0
- synth_ai/baseline/__init__.py +25 -0
- synth_ai/baseline/config.py +209 -0
- synth_ai/baseline/discovery.py +214 -0
- synth_ai/baseline/execution.py +146 -0
- synth_ai/cfgs.py +227 -0
- synth_ai/cli/__init__.py +85 -63
- synth_ai/cli/_modal_wrapper.py +31 -0
- synth_ai/cli/_storage.py +20 -0
- synth_ai/cli/_typer_patch.py +47 -0
- synth_ai/cli/_validate_task_app.py +29 -0
- synth_ai/cli/balance.py +16 -4
- synth_ai/cli/calc.py +36 -21
- synth_ai/cli/claude.py +70 -0
- synth_ai/cli/codex.py +267 -0
- synth_ai/cli/commands/__init__.py +18 -0
- synth_ai/cli/commands/baseline/__init__.py +12 -0
- synth_ai/cli/commands/baseline/core.py +637 -0
- synth_ai/cli/commands/baseline/list.py +93 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1112 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +424 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +185 -0
- synth_ai/cli/commands/help/core.py +72 -0
- synth_ai/cli/commands/smoke/__init__.py +7 -0
- synth_ai/cli/commands/smoke/core.py +1437 -0
- synth_ai/cli/commands/status/__init__.py +66 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/session.py +183 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/subcommands/usage.py +203 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +200 -0
- synth_ai/cli/commands/train/judge_validation.py +305 -0
- synth_ai/cli/commands/train/validation.py +386 -0
- synth_ai/cli/demo.py +32 -140
- synth_ai/cli/deploy.py +233 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +28 -22
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/mcp.py +34 -0
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/opencode.py +256 -0
- synth_ai/cli/recent.py +13 -7
- synth_ai/cli/rl_demo.py +156 -116
- synth_ai/cli/root.py +131 -132
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +49 -0
- synth_ai/cli/status.py +7 -125
- synth_ai/cli/task_app_deploy.py +7 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +11 -0
- synth_ai/cli/task_app_serve.py +11 -0
- synth_ai/cli/task_apps.py +2284 -257
- synth_ai/cli/traces.py +9 -5
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +5 -0
- synth_ai/cli/turso.py +73 -0
- synth_ai/cli/watch.py +13 -18
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/core/cli.py +579 -291
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/demo_task_apps/__init__.py +3 -3
- synth_ai/demos/demo_task_apps/core.py +64 -28
- synth_ai/demos/demo_task_apps/crafter/__init__.py +1 -0
- synth_ai/demos/demo_task_apps/crafter/configs/crafter_fft_4b.toml +53 -0
- synth_ai/demos/demo_task_apps/crafter/configs/rl_from_base_qwen4b.toml +73 -0
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +184 -0
- synth_ai/demos/demo_task_apps/math/_common.py +1 -2
- synth_ai/demos/demo_task_apps/math/app.py +2 -1
- synth_ai/demos/demo_task_apps/math/deploy_modal.py +3 -6
- synth_ai/demos/demo_task_apps/math/modal_task_app.py +185 -83
- synth_ai/demos/demo_task_apps/math/task_app_entry.py +0 -2
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +703 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +12 -5
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/bandit/taskset.py +4 -4
- synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
- synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
- synth_ai/environments/examples/crafter_classic/environment.py +93 -2
- synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
- synth_ai/environments/examples/enron/engine.py +7 -2
- synth_ai/environments/examples/enron/environment.py +68 -0
- synth_ai/environments/examples/red/engine.py +60 -12
- synth_ai/environments/examples/red/engine_helpers/memory_map.py +7 -0
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/engine_helpers/reward_library/pallet_town_progression.py +477 -0
- synth_ai/environments/examples/red/engine_helpers/state_extraction.py +32 -0
- synth_ai/environments/examples/red/environment.py +86 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/environments/examples/sokoban/taskset.py +116 -0
- synth_ai/environments/examples/verilog/engine.py +104 -12
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/environments/reproducibility/tree.py +5 -6
- synth_ai/environments/service/app.py +11 -12
- synth_ai/environments/service/core_routes.py +10 -9
- synth_ai/environments/stateful/engine.py +1 -1
- synth_ai/environments/tasks/core.py +1 -0
- synth_ai/environments/tasks/filters.py +5 -6
- synth_ai/environments/tasks/utils.py +4 -5
- synth_ai/evals/__init__.py +15 -0
- synth_ai/evals/base.py +14 -5
- synth_ai/evals/client.py +82 -0
- synth_ai/evals/types.py +42 -0
- synth_ai/http.py +8 -22
- synth_ai/http_client.py +45 -12
- synth_ai/inference/__init__.py +0 -2
- synth_ai/inference/client.py +21 -7
- synth_ai/jobs/client.py +129 -80
- synth_ai/judge_schemas.py +127 -0
- synth_ai/learning/__init__.py +51 -6
- synth_ai/learning/algorithms.py +14 -0
- synth_ai/learning/client.py +122 -30
- synth_ai/learning/config.py +2 -40
- synth_ai/learning/constants.py +0 -2
- synth_ai/learning/ft_client.py +4 -56
- synth_ai/learning/health.py +14 -8
- synth_ai/learning/jobs.py +43 -47
- synth_ai/learning/prompt_learning_client.py +276 -0
- synth_ai/learning/prompt_learning_types.py +185 -0
- synth_ai/{rl → learning/rl}/__init__.py +14 -5
- synth_ai/learning/rl/client.py +269 -0
- synth_ai/learning/rl/config.py +31 -0
- synth_ai/{rl → learning/rl}/contracts.py +5 -10
- synth_ai/{rl → learning/rl}/env_keys.py +45 -16
- synth_ai/learning/rl/secrets.py +13 -0
- synth_ai/learning/rl_client.py +2 -253
- synth_ai/learning/sft/__init__.py +29 -0
- synth_ai/learning/sft/client.py +68 -0
- synth_ai/learning/sft/config.py +270 -0
- synth_ai/learning/sft/data.py +698 -0
- synth_ai/learning/sse.py +25 -26
- synth_ai/learning/validators.py +29 -25
- synth_ai/mcp/__init__.py +5 -0
- synth_ai/mcp/__main__.py +8 -0
- synth_ai/mcp/main.py +254 -0
- synth_ai/mcp/setup.py +100 -0
- synth_ai/modal.py +257 -0
- synth_ai/pricing/__init__.py +3 -0
- synth_ai/pricing/model_pricing.py +64 -0
- synth_ai/session/__init__.py +75 -0
- synth_ai/session/client.py +383 -0
- synth_ai/session/constants.py +63 -0
- synth_ai/session/exceptions.py +105 -0
- synth_ai/session/manager.py +139 -0
- synth_ai/session/models.py +89 -0
- synth_ai/session/query.py +110 -0
- synth_ai/spec/__init__.py +46 -0
- synth_ai/spec/dataclasses.py +149 -0
- synth_ai/spec/loader.py +144 -0
- synth_ai/spec/serializer.py +199 -0
- synth_ai/spec/validation.py +250 -0
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +589 -0
- synth_ai/streaming/streamer.py +320 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/__init__.py +50 -30
- synth_ai/task/apps/__init__.py +63 -19
- synth_ai/task/auth.py +35 -23
- synth_ai/task/client.py +15 -13
- synth_ai/task/config.py +261 -0
- synth_ai/task/contracts.py +165 -64
- synth_ai/task/datasets.py +9 -6
- synth_ai/task/errors.py +11 -10
- synth_ai/task/health.py +17 -11
- synth_ai/task/inference_api.py +101 -0
- synth_ai/task/json.py +58 -24
- synth_ai/task/proxy.py +59 -66
- synth_ai/task/rubrics/__init__.py +55 -0
- synth_ai/task/rubrics/loaders.py +156 -0
- synth_ai/task/rubrics/models.py +57 -0
- synth_ai/task/rubrics/scoring.py +116 -0
- synth_ai/task/rubrics/strict.py +149 -0
- synth_ai/task/rubrics.py +22 -15
- synth_ai/task/server.py +65 -31
- synth_ai/task/trace_correlation_helpers.py +328 -0
- synth_ai/task/tracing_utils.py +44 -28
- synth_ai/task/validators.py +449 -6
- synth_ai/task/vendors.py +5 -7
- synth_ai/tracing_v3/__init__.py +4 -0
- synth_ai/tracing_v3/abstractions.py +21 -4
- synth_ai/tracing_v3/config.py +167 -22
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +42 -29
- synth_ai/tracing_v3/decorators.py +80 -45
- synth_ai/tracing_v3/examples/basic_usage.py +15 -9
- synth_ai/tracing_v3/hooks.py +6 -4
- synth_ai/tracing_v3/llm_call_record_helpers.py +161 -61
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/replica_sync.py +12 -7
- synth_ai/tracing_v3/serialization.py +130 -0
- synth_ai/tracing_v3/session_tracer.py +73 -16
- synth_ai/tracing_v3/storage/base.py +89 -1
- synth_ai/tracing_v3/storage/config.py +63 -16
- synth_ai/tracing_v3/storage/factory.py +11 -9
- synth_ai/tracing_v3/storage/utils.py +15 -11
- synth_ai/tracing_v3/trace_utils.py +317 -0
- synth_ai/tracing_v3/turso/__init__.py +8 -21
- synth_ai/tracing_v3/turso/daemon.py +123 -15
- synth_ai/tracing_v3/turso/models.py +5 -2
- synth_ai/tracing_v3/turso/native_manager.py +1293 -0
- synth_ai/tracing_v3/utils.py +5 -4
- synth_ai/tunnel.py +143 -0
- synth_ai/tunnel_deploy.py +278 -0
- synth_ai/types.py +8 -0
- synth_ai/urls.py +11 -0
- synth_ai/utils/__init__.py +166 -0
- synth_ai/utils/agents.py +74 -0
- synth_ai/utils/apps.py +152 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/bin.py +39 -0
- synth_ai/utils/claude.py +36 -0
- synth_ai/utils/cli.py +284 -0
- synth_ai/utils/config.py +81 -0
- synth_ai/utils/env.py +346 -0
- synth_ai/utils/errors.py +85 -0
- synth_ai/utils/http.py +172 -0
- synth_ai/utils/json.py +72 -0
- synth_ai/utils/log_filter.py +99 -0
- synth_ai/utils/logging.py +198 -0
- synth_ai/utils/modal.py +299 -0
- synth_ai/utils/paths.py +95 -0
- synth_ai/utils/process.py +233 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/ssl.py +25 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/tunnel/__init__.py +12 -0
- synth_ai/utils/tunnel/config.py +55 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/uvicorn.py +77 -0
- synth_ai-0.2.23.dev3.dist-info/METADATA +357 -0
- synth_ai-0.2.23.dev3.dist-info/RECORD +983 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/entry_points.txt +0 -1
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/top_level.txt +1 -0
- synth_ai/cli/man.py +0 -106
- synth_ai/core/experiment.py +0 -15
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -258
- synth_ai/environments/examples/sokoban/units/astar_common.py +0 -95
- synth_ai/experimental/synth_oss.py +0 -446
- synth_ai/handshake.py +0 -107
- synth_ai/install_sqld.sh +0 -40
- synth_ai/learning/offline/dpo.py +0 -0
- synth_ai/learning/offline/providers.py +0 -7
- synth_ai/learning/offline/sft.py +0 -0
- synth_ai/learning/offline/shared.py +0 -0
- synth_ai/learning/online/grpo.py +0 -0
- synth_ai/learning/online/irft.py +0 -0
- synth_ai/learning/prompts/banking77_injection_eval.py +0 -168
- synth_ai/learning/prompts/gepa.py +0 -0
- synth_ai/learning/prompts/hello_world_in_context_injection_ex.py +0 -213
- synth_ai/learning/prompts/mipro.py +0 -289
- synth_ai/learning/prompts/random_search.py +0 -246
- synth_ai/learning/prompts/run_mipro_banking77.py +0 -172
- synth_ai/learning/prompts/run_random_search_banking77.py +0 -324
- synth_ai/lm/__init__.py +0 -51
- synth_ai/lm/caching/constants.py +0 -6
- synth_ai/lm/caching/dbs.py +0 -0
- synth_ai/lm/caching/ephemeral.py +0 -102
- synth_ai/lm/caching/handler.py +0 -137
- synth_ai/lm/caching/initialize.py +0 -11
- synth_ai/lm/caching/persistent.py +0 -114
- synth_ai/lm/config.py +0 -110
- synth_ai/lm/constants.py +0 -32
- synth_ai/lm/core/__init__.py +0 -8
- synth_ai/lm/core/all.py +0 -73
- synth_ai/lm/core/exceptions.py +0 -7
- synth_ai/lm/core/main.py +0 -319
- synth_ai/lm/core/main_v3.py +0 -594
- synth_ai/lm/core/synth_models.py +0 -48
- synth_ai/lm/core/vendor_clients.py +0 -188
- synth_ai/lm/cost/monitor.py +0 -1
- synth_ai/lm/cost/statefulness.py +0 -1
- synth_ai/lm/injection.py +0 -80
- synth_ai/lm/overrides.py +0 -206
- synth_ai/lm/provider_support/__init__.py +0 -8
- synth_ai/lm/provider_support/anthropic.py +0 -972
- synth_ai/lm/provider_support/openai.py +0 -1139
- synth_ai/lm/provider_support/suppress_logging.py +0 -31
- synth_ai/lm/structured_outputs/handler.py +0 -440
- synth_ai/lm/structured_outputs/inject.py +0 -297
- synth_ai/lm/structured_outputs/rehabilitate.py +0 -185
- synth_ai/lm/tools/__init__.py +0 -3
- synth_ai/lm/tools/base.py +0 -172
- synth_ai/lm/unified_interface.py +0 -202
- synth_ai/lm/vendors/base.py +0 -81
- synth_ai/lm/vendors/core/anthropic_api.py +0 -387
- synth_ai/lm/vendors/core/gemini_api.py +0 -292
- synth_ai/lm/vendors/core/mistral_api.py +0 -322
- synth_ai/lm/vendors/core/openai_api.py +0 -225
- synth_ai/lm/vendors/core/synth_dev_api.py +0 -0
- synth_ai/lm/vendors/local/ollama.py +0 -0
- synth_ai/lm/vendors/openai_standard.py +0 -780
- synth_ai/lm/vendors/openai_standard_responses.py +0 -256
- synth_ai/lm/vendors/retries.py +0 -22
- synth_ai/lm/vendors/supported/custom_endpoint.py +0 -417
- synth_ai/lm/vendors/supported/deepseek.py +0 -69
- synth_ai/lm/vendors/supported/grok.py +0 -75
- synth_ai/lm/vendors/supported/groq.py +0 -16
- synth_ai/lm/vendors/supported/ollama.py +0 -15
- synth_ai/lm/vendors/supported/openrouter.py +0 -74
- synth_ai/lm/vendors/supported/together.py +0 -11
- synth_ai/lm/vendors/synth_client.py +0 -808
- synth_ai/lm/warmup.py +0 -186
- synth_ai/rl/secrets.py +0 -19
- synth_ai/scripts/verify_rewards.py +0 -100
- synth_ai/task/apps/grpo_crafter.py +0 -438
- synth_ai/tracing/__init__.py +0 -30
- synth_ai/tracing_v1/__init__.py +0 -33
- synth_ai/tracing_v3/turso/manager.py +0 -774
- synth_ai/v0/tracing/abstractions.py +0 -224
- synth_ai/v0/tracing/base_client.py +0 -91
- synth_ai/v0/tracing/client_manager.py +0 -131
- synth_ai/v0/tracing/config.py +0 -142
- synth_ai/v0/tracing/context.py +0 -146
- synth_ai/v0/tracing/decorators.py +0 -682
- synth_ai/v0/tracing/events/__init__.py +0 -0
- synth_ai/v0/tracing/events/manage.py +0 -147
- synth_ai/v0/tracing/events/scope.py +0 -86
- synth_ai/v0/tracing/events/store.py +0 -228
- synth_ai/v0/tracing/immediate_client.py +0 -151
- synth_ai/v0/tracing/local.py +0 -18
- synth_ai/v0/tracing/log_client_base.py +0 -73
- synth_ai/v0/tracing/retry_queue.py +0 -186
- synth_ai/v0/tracing/trackers.py +0 -515
- synth_ai/v0/tracing/upload.py +0 -512
- synth_ai/v0/tracing/utils.py +0 -9
- synth_ai/v0/tracing_v1/__init__.py +0 -16
- synth_ai/v0/tracing_v1/abstractions.py +0 -224
- synth_ai/v0/tracing_v1/base_client.py +0 -91
- synth_ai/v0/tracing_v1/client_manager.py +0 -131
- synth_ai/v0/tracing_v1/config.py +0 -142
- synth_ai/v0/tracing_v1/context.py +0 -146
- synth_ai/v0/tracing_v1/decorators.py +0 -703
- synth_ai/v0/tracing_v1/events/__init__.py +0 -0
- synth_ai/v0/tracing_v1/events/manage.py +0 -147
- synth_ai/v0/tracing_v1/events/scope.py +0 -86
- synth_ai/v0/tracing_v1/events/store.py +0 -228
- synth_ai/v0/tracing_v1/immediate_client.py +0 -151
- synth_ai/v0/tracing_v1/local.py +0 -18
- synth_ai/v0/tracing_v1/log_client_base.py +0 -73
- synth_ai/v0/tracing_v1/retry_queue.py +0 -186
- synth_ai/v0/tracing_v1/trackers.py +0 -515
- synth_ai/v0/tracing_v1/upload.py +0 -527
- synth_ai/v0/tracing_v1/utils.py +0 -9
- synth_ai/zyk/__init__.py +0 -30
- synth_ai-0.2.9.dev0.dist-info/METADATA +0 -131
- synth_ai-0.2.9.dev0.dist-info/RECORD +0 -444
- {synth_ai/lm/caching → examples/task_apps}/__init__.py +0 -0
- {synth_ai/lm/cost → examples/task_apps/crafter}/__init__.py +0 -0
- {synth_ai/lm/structured_outputs → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server}/__init__.py +0 -0
- {synth_ai/lm/vendors → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests}/__init__.py +0 -0
- {synth_ai/lm/vendors/core → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils}/__init__.py +0 -0
- {synth_ai/lm/vendors/local → examples/task_apps/math}/__init__.py +0 -0
- {synth_ai/lm/vendors/supported → examples/workflows}/__init__.py +0 -0
- {synth_ai/v0/tracing → examples/workflows/math_rl}/__init__.py +0 -0
- /synth_ai/{compound/cais.py → cli/__main__.py} +0 -0
- /synth_ai/{learning/filtering.py → py.typed} +0 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,195 @@
# Smoke Test Architecture

This document explains how the smoke test works internally, for future maintenance and debugging.

## Component Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                     synth-ai smoke command                      │
│              (synth_ai/cli/commands/smoke/core.py)              │
└────────────┬────────────────────────────────────────────────────┘
             │
             ├─► Auto-start sqld (optional)
             │   ├─ Kill existing process on ports 8080/8081
             │   ├─ Start: sqld --db-path ... --hrana-listen-addr ... --http-listen-addr ...
             │   └─ Health check: GET http://127.0.0.1:8081/health
             │
             ├─► Auto-start task app (optional)
             │   ├─ Kill existing process on port 8765
             │   ├─ Start: nohup uvx synth-ai task-app serve ... (from synth-ai root)
             │   ├─ Health check: GET http://localhost:8765/health (accepts 200 or 400)
             │   └─ Output: nohup_task_app.out
             │
             ├─► Start mock RL trainer (if use_mock=true)
             │   ├─ MockRLTrainer(port=0, backend="openai")
             │   ├─ Forwards requests to OpenAI API
             │   └─ Logs: [mock-rl] ← request / → response
             │
             └─► Execute rollout
                 ├─ POST /rollout to task app
                 ├─ Capture response with v3 trace
                 └─ Extract and display tool calls
```

## Key Implementation Details

### 1. Tool Call Extraction

**Location:** `synth_ai/cli/commands/smoke/core.py` lines ~946-1005

**How it works:**
1. Request rollout with `return_trace=True` and `trace_format="structured"`
2. Response includes `trace.event_history[]`, a list of policy and environment events
3. Policy events have `call_records[]` containing LLM call metadata
4. Each `call_record` has `output_tool_calls[]` with tool call details
5. Extract `name` and `arguments_json` from each tool call
6. Display formatted tool calls to user

**Data structure:**
```python
response.trace = {
    "event_history": [
        {
            "call_records": [  # Present in policy events
                {
                    "output_tool_calls": [
                        {
                            "name": "interact_many",
                            "arguments_json": '{"actions":["move_up","move_up"]}',
                            "call_id": "call_xyz",
                            "index": 0
                        }
                    ],
                    "model_name": "gpt-4o-mini",
                    "provider": "openai",
                    # ...
                }
            ],
            "metadata": {...},
            # ...
        },
        {
            # Environment step event (no call_records)
            "reward": 1.0,
            "terminated": False,
            # ...
        },
        # ...
    ],
    "session_id": "...",
    "markov_blanket_message_history": [...],
    # ...
}
```
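
The traversal described above can be sketched in a few lines. This is a minimal illustration that assumes the trace has already been decoded into plain dictionaries with the keys shown in the data structure; `extract_tool_calls` and `sample_trace` are hypothetical names, not the code in `core.py`.

```python
import json
from typing import Any


def extract_tool_calls(trace: dict[str, Any]) -> list[tuple[str, Any]]:
    """Collect (name, parsed arguments) pairs from a structured trace dict."""
    calls: list[tuple[str, Any]] = []
    for event in trace.get("event_history") or []:
        # Environment events carry no call_records, so the inner loops skip them.
        for record in event.get("call_records") or []:
            for tool_call in record.get("output_tool_calls") or []:
                raw_args = tool_call.get("arguments_json") or "{}"
                try:
                    args = json.loads(raw_args)
                except json.JSONDecodeError:
                    # Keep unparseable payloads visible instead of dropping them.
                    args = {"_raw": raw_args}
                calls.append((tool_call.get("name", "<unknown>"), args))
    return calls


# Hypothetical trace with one policy event and one environment event.
sample_trace = {
    "event_history": [
        {
            "call_records": [
                {
                    "output_tool_calls": [
                        {
                            "name": "interact_many",
                            "arguments_json": '{"actions":["move_up","move_up"]}',
                            "call_id": "call_xyz",
                            "index": 0,
                        }
                    ]
                }
            ]
        },
        {"reward": 1.0, "terminated": False},
    ]
}

for name, args in extract_tool_calls(sample_trace):
    print(f"{name}({args})")
```

Treating missing keys as empty lists keeps the walk tolerant of environment events and of traces returned without `event_history`.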

### 2. Background Service Management

**Task App Startup:**
- Must run from synth-ai root for task app discovery
- Uses `nohup` to detach process
- Redirects output to `nohup_task_app.out`
- Polls `/health` endpoint (accepts 200 or 400 status)
- Timeout: 120 seconds with progress updates every 5 seconds
- Propagates `SYNTH_QUIET=1` to suppress diagnostic messages

**sqld Startup:**
- Starts with Hrana WebSocket (8080) and HTTP (8081) ports
- Polls `/health` endpoint for readiness
- Timeout: 30 seconds

**Port Cleanup:**
- Uses `lsof -ti :PORT` to find PIDs
- Kills processes with `kill -9 PID`
- Waits 2 seconds for port release
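
The health polling described for both services follows a common deadline-based pattern. The sketch below is an assumption about the general shape, not the smoke command's code: `wait_for_health` and `http_status` are hypothetical helpers, and the probe, sleep, and clock are injectable so the loop can be exercised without a running service.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 400 still proves the server is up


def wait_for_health(
    probe: Callable[[], int],
    accepted: frozenset = frozenset({200, 400}),
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns an accepted status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe() in accepted:
                return True
        except OSError:
            pass  # connection refused: the service is not listening yet
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example wiring for the task app (not executed here):
# wait_for_health(lambda: http_status("http://localhost:8765/health"))
```

For sqld the same loop would use `accepted=frozenset({200})` and `timeout_s=30.0`, matching the values listed above.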

### 3. Mock RL Trainer

The mock trainer (`MockRLTrainer`) acts as a proxy:
- `backend="synthetic"`: Generates fake tool calls deterministically
- `backend="openai"`: Forwards to real OpenAI API
- Logs all requests/responses with `[mock-rl]` prefix
- Auto-assigns port if `port=0`
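
The `port=0` behavior relies on a standard operating-system feature: binding to port 0 asks the kernel for any free port. The sketch below shows that mechanism with the standard library only; `reserve_free_port` is a hypothetical helper, and how `MockRLTrainer` itself obtains its port is not shown in this document.

```python
import socket


def reserve_free_port(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Bind a listening TCP socket to an OS-assigned port and report that port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "pick any free port"
    sock.listen()
    return sock, sock.getsockname()[1]


server, port = reserve_free_port()
print(f"listening on port {port}")
server.close()
```

Keeping the socket open until the server takes it over avoids the race where another process grabs the port between discovery and use.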

### 4. Diagnostic Message Suppression

**Permanently disabled (commented out):**
- `synth_ai/tracing_v3/config.py`: `[TRACING_V3_CONFIG_LOADED]` message
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py`: All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py`: All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py`: All `[PATCH]` messages

**Reason:** These messages add noise to smoke test output. They're still in the code as comments for documentation.

## Troubleshooting Guide

### No tool calls displayed

**Symptom:** Output shows `⚠ No tool calls found in trace`

**Causes:**
1. `return_trace=false` in config - **FIX:** Set `return_trace = true`
2. Trace format mismatch - Check `response.trace.event_history` structure
3. No LLM calls made - Check for policy errors in task app logs

**Debug:**
```bash
# Check task app logs
cat /path/to/synth-ai/nohup_task_app.out

# Verify trace structure by adding debug output (Python) in core.py around line 978:
#   click.echo(f"DEBUG: trace keys: {list(tr.keys())}")
#   click.echo(f"DEBUG: event_history length: {len(event_history)}")
```

### Task app exits immediately

**Symptom:** `0 steps` in rollout, task app process not running

**Causes:**
1. Wrong task app name - **FIX:** Use `synth-ai task-app list` to find correct name
2. Missing .env file - **FIX:** Ensure `task_app_env_file` points to valid .env
3. Wrong working directory - **FIX:** Task app must be started from synth-ai root

**Debug:**
```bash
# Manual test
cd /path/to/synth-ai
uvx synth-ai task-app serve grpo-crafter --port 8765 --env-file /path/to/.env --force
```

### Port conflicts

**Symptom:** `Address already in use` errors

**Fix:** The smoke command auto-kills processes on ports 8080, 8081, and 8765. If manual cleanup is needed:
```bash
lsof -ti :8080 | xargs kill -9
lsof -ti :8081 | xargs kill -9
lsof -ti :8765 | xargs kill -9
```

## Future Improvements

Potential enhancements for future agents:

1. **Streaming tool call display**: Show tool calls as they happen, not just at the end
2. **Tool call validation**: Verify tool calls match expected format for the environment
3. **Performance metrics**: Track inference latency per tool call
4. **Cost tracking**: Display OpenAI API costs for the smoke test
5. **Parallel rollouts**: Support `--parallel N` to test concurrent execution
6. **Video/image capture**: For vision-based tasks, save observations
7. **Interactive mode**: Allow stepping through rollout one action at a time

## Related Files

- `synth_ai/cli/commands/smoke/core.py` - Main smoke command implementation
- `synth_ai/api/train/configs/rl.py` - `SmokeConfig` Pydantic model
- `synth_ai/api/train/builders.py` - Removes `[smoke]` section before sending to trainer
- `synth_ai/task/contracts.py` - `RolloutResponse` with trace field
- `examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md` - User-facing documentation
- `monorepo/docs/cli/smoke.mdx` - Mintlify documentation

@@ -0,0 +1,127 @@
# Final Inference Test Results

**Date**: Oct 31, 2025
**Endpoint**: `https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions`

## Summary

| Model Type | Status | Result |
|------------|--------|--------|
| Base Model (Qwen/Qwen3-4B) | ✅ WORKS | Inference successful |
| PEFT/SFT (Qwen3-0.6B) | ✅ WORKS | Inference successful |
| RL (Qwen3-4B) | ❌ **BROKEN** | Modal function crashes |

## Detailed Results

### ✅ Test 1: Base Model (No Fine-Tuning)

**Model**: `Qwen/Qwen3-4B`

**Result**: **SUCCESS** ✅
- **Status**: 200 OK
- **Tokens**: 31 prompt + 100 completion = 131 total
- **Response**: Generated successfully

**Notes**:
- First attempt returned 303 redirect (cold start)
- Retry succeeded immediately
- This confirms the endpoint and auth work correctly
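
The cold-start behavior noted here (a 303 on the first attempt, success on retry) suggests wrapping test requests in a small bounded retry. The sketch below is one way to do that under stated assumptions: `post_with_retry` is a hypothetical helper, the request itself is passed in as a callable returning a status code, and treating 303 as retryable is inferred only from this test run.

```python
import time
from typing import Callable


def post_with_retry(
    send: Callable[[], int],
    retry_on: frozenset = frozenset({303}),
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` up to `attempts` times while it returns a retryable status."""
    status = send()
    for _ in range(attempts - 1):
        if status not in retry_on:
            break
        sleep(delay_s)  # give the cold container time to finish starting
        status = send()
    return status


# Simulated cold start: the first call is redirected, the second succeeds.
statuses = iter([303, 200])
print(post_with_retry(lambda: next(statuses), sleep=lambda _: None))
```

The final status is returned unchanged, so a persistent 303 still surfaces to the caller rather than being hidden.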

---

### ✅ Test 2: PEFT/SFT Model

**Model**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Result**: **SUCCESS** ✅
- **Status**: 200 OK (consistent across retries)
- **Tokens**: 31 prompt + 100 completion = 131 total
- **Response**: "Hello, I am working!" (with thinking tokens)

**Notes**:
- Works reliably
- No cold start issues
- This is the expected behavior for all models

---

### ❌ Test 3: RL Model

**Model**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`

**Result**: **FAILURE** ❌ - Multiple error modes

#### First Attempt:
```
Status: 400 Bad Request
Error: "Device string must not be empty"
```

#### Retry:
```
Status: 500 Internal Server Error
Error: "modal-http: internal error: function was terminated by signal"
```

**This is a Modal function crash** - the inference function terminated unexpectedly.

#### Cold Start (from Modal logs):
```
RuntimeError: Cannot find any model weights with
'/models/rl/Qwen/Qwen3-4B/job_19a38041c38f96e638c/checkpoint-fixed'
```

**Root Cause**: RL checkpoint contains LoRA adapter files (`adapter_config.json`, `adapter_model.safetensors`), but vLLM expects full merged model weights.
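
A pre-flight check could distinguish an adapter-only checkpoint from one with full weights before handing the path to the loader. The sketch below is a heuristic under stated assumptions: `classify_checkpoint` is a hypothetical helper, the adapter file names come from the root cause above, and the full-weight patterns (`model*.safetensors`, `pytorch_model*.bin`) are common Hugging Face conventions assumed here rather than taken from this report.

```python
import tempfile
from pathlib import Path

# File names reported in the root cause for LoRA adapter checkpoints.
ADAPTER_FILES = {"adapter_config.json", "adapter_model.safetensors"}


def classify_checkpoint(path) -> str:
    """Return 'lora_adapter', 'full_weights', or 'unknown' for a checkpoint dir."""
    names = {p.name for p in Path(path).iterdir() if p.is_file()}
    has_adapter = ADAPTER_FILES <= names
    # Assumed naming conventions for full (merged) weights.
    has_full = any(
        (n.startswith("model") and n.endswith(".safetensors"))
        or (n.startswith("pytorch_model") and n.endswith(".bin"))
        for n in names
    )
    if has_full:
        return "full_weights"
    if has_adapter:
        return "lora_adapter"
    return "unknown"


# Demonstration on a throwaway directory that mimics the failing RL checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    for name in ADAPTER_FILES:
        (Path(tmp) / name).touch()
    print(classify_checkpoint(tmp))
```

A result of `lora_adapter` is the case where a clear "merge required" error could be raised instead of letting the weight loader fail.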

---

## Conclusion

### What Works ✅
- **Base models**: Standard HuggingFace models load and run inference correctly
- **PEFT/SFT models**: Fine-tuned models with merged weights work perfectly

### What's Broken ❌
- **RL models**: Crash during model loading because:
  1. RL checkpoints are stored as LoRA adapters
  2. vLLM weight loader expects full model weights
  3. Missing merge step causes vLLM to crash
  4. Modal function terminates with signal (crash)

### Impact
- **HIGH SEVERITY**: All RL-trained models cannot be used for inference
- Users can train RL models but cannot deploy them
- This blocks the core RL training → inference workflow

### Next Steps
See `monorepo/RL_INFERENCE_BUG.md` for:
- Detailed root cause analysis
- Reproduction script
- Suggested fix (merge LoRA adapters before vLLM loading)
- Code locations to modify

---

## Developer Experience Issues Identified

### Issue #1: Confusing Error Messages
- **400 "Device string must not be empty"** - Not helpful, doesn't indicate RL adapter issue
- **500 "function was terminated by signal"** - Generic crash, no context
- **Should be**: "RL checkpoint contains adapter files. Merge required for vLLM loading."

### Issue #2: Inconsistent Behavior
- Sometimes returns 303 redirect
- Sometimes returns 400
- Sometimes crashes with 500
- **Should be**: Consistent error message explaining the issue

### Issue #3: Not Obvious How to Test Models
- Had to try 3 different endpoint URLs before finding the right one
- No documentation on model ID formats
- **Should be**: `synth-ai inference --model "rl:..." --message "test"` CLI command
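
Since the model ID formats are undocumented, the three IDs used in this report can at least be summarized as a parser. The grammar below (`base`, `peft:<base>:<job>`, `rl:<base>:<job>[:<checkpoint>]`) is inferred only from those three examples and is an assumption; `ModelRef` and `parse_model_id` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    kind: str  # "base", "peft", or "rl"
    base_model: str
    job_id: Optional[str] = None
    checkpoint: Optional[str] = None


def parse_model_id(model_id: str) -> ModelRef:
    """Split a model ID into its parts, following the inferred grammar."""
    parts = model_id.split(":")
    if len(parts) == 1:
        return ModelRef("base", parts[0])
    kind, base, *rest = parts
    if kind not in {"peft", "rl"} or not 1 <= len(rest) <= 2:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    checkpoint = rest[1] if len(rest) == 2 else None
    return ModelRef(kind, base, rest[0], checkpoint)


for example in (
    "Qwen/Qwen3-4B",
    "peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9",
    "rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1",
):
    print(parse_model_id(example))
```

Rejecting unknown prefixes early gives a clearer failure than the 303/400/500 responses described above.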

---

**Status**: Bug documented and reproduction available.
**See**: `monorepo/RL_INFERENCE_BUG.md` for full details.

@@ -0,0 +1,132 @@
# ✅ Inference Success Report

**Date**: Oct 31, 2025
**Models Tested**: Latest SFT and RL models from training

## Working Solution

### Correct Endpoint
```
https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
```

### SFT/PEFT Models: ✅ WORKING

**Model ID**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Test Code**:
```python
import httpx
import os

SYNTH_API_KEY = os.getenv("SYNTH_API_KEY")
url = "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Hello, I am working!' and nothing else."}
    ],
    "temperature": 0.2,
    "max_tokens": 100,
}

with httpx.Client(timeout=300.0) as client:
    response = client.post(url, json=payload, headers=headers)
    print(response.json())
```
|
|
44
|
+
|
|
45
|
+
**Result**:
|
|
46
|
+
- ✅ Status: 200 OK
|
|
47
|
+
- ✅ Response generated successfully
|
|
48
|
+
- ✅ Token usage tracked: 31 prompt + 72 completion = 103 total
|
|
49
|
+
- ✅ Output: "Hello, I am working!" (with thinking tokens as expected)
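To check these results programmatically, the response body can be reduced to the answer text and token counts. This is a minimal sketch that assumes an OpenAI-compatible response shape (`choices[0].message.content` and a `usage` object) and assumes the thinking tokens arrive wrapped in `<think>...</think>` tags; neither detail is confirmed by this report, and `summarize_completion` is a hypothetical helper. The sample payload is illustrative, with token counts copied from the result above.

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def summarize_completion(body: dict) -> dict:
    """Extract the visible answer and token usage from a chat completion body."""
    content = body["choices"][0]["message"]["content"]
    usage = body.get("usage", {})
    return {
        "answer": THINK_BLOCK.sub("", content).strip(),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Illustrative payload only; not a captured response.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "<think>\nThe user wants a fixed greeting.\n</think>\n\nHello, I am working!",
            }
        }
    ],
    "usage": {"prompt_tokens": 31, "completion_tokens": 72, "total_tokens": 103},
}

summary = summarize_completion(sample)
print(summary["answer"], summary["total_tokens"])
```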
|
|
50
|
+
|
|
51
|
+
### RL Models: ⚠️ NEEDS PROMOTION
|
|
52
|
+
|
|
53
|
+
**Model ID**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`
|
|
54
|
+
|
|
55
|
+
**Status**: 303 Redirect (empty response)
|
|
56
|
+
|
|
57
|
+
**Root Cause**:
|
|
58
|
+
Inspection of the monorepo backend code shows that RL checkpoints require a "promotion" step that loads them onto Modal before they can be used for inference. The direct Modal endpoint returns a redirect for unpromoted RL models.
|
|
59
|
+
|
|
60
|
+
**Solution Options**:
|
|
61
|
+
|
|
62
|
+
#### Option 1: Use Backend Proxy (Recommended)
|
|
63
|
+
The backend automatically handles RL promotion:
|
|
64
|
+
```python
|
|
65
|
+
# Use backend proxy instead of direct Modal
|
|
66
|
+
url = "https://your-backend.example.com/api/chat/completions"
|
|
67
|
+
# Backend will auto-promote and route to vLLM
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
#### Option 2: Manual Promotion (Advanced)
|
|
71
|
+
1. Call promotion endpoint first
|
|
72
|
+
2. Wait for model to load onto Modal
|
|
73
|
+
3. Then call inference endpoint
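The three steps above can be sketched as a promote, poll, infer loop. Because this report does not give the promotion endpoint or a readiness check, the sketch takes all three operations as caller-supplied callables; `promote_then_infer` and its parameters are hypothetical, and the timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Any, Callable


def promote_then_infer(
    promote: Callable[[], Any],
    is_ready: Callable[[], bool],
    infer: Callable[[], Any],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Run promotion, wait until the model reports ready, then run inference."""
    promote()                        # step 1: request promotion
    deadline = clock() + timeout_s
    while not is_ready():            # step 2: wait for the model to load
        if clock() >= deadline:
            raise TimeoutError("model was not ready before the timeout")
        sleep(poll_interval_s)
    return infer()                   # step 3: call the inference endpoint
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in practice each callable would wrap an HTTP request to whichever endpoints the backend actually exposes.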
|
|
74
|
+
|
|
75
|
+
## Key Learnings
|
|
76
|
+
|
|
77
|
+
### What We Got Wrong Initially:
|
|
78
|
+
1. ❌ Wrong endpoint path: Used `/v1/chat/completions` → should be `/chat/completions`
|
|
79
|
+
2. ❌ Wrong base URL: Used render.com URL → should be Modal URL
|
|
80
|
+
3. ❌ Assumed RL = PEFT workflow → RL needs promotion step
|
|
81
|
+
|
|
82
|
+
### What We Got Right:
|
|
83
|
+
1. ✅ Model ID format from `synth-ai status models list`
|
|
84
|
+
2. ✅ Using SYNTH_API_KEY for auth
|
|
85
|
+
3. ✅ Bearer token authorization header
|
|
86
|
+
|
|
87
|
+
## Recommendations for Library Improvement
|
|
88
|
+
|
|
89
|
+
### 1. Add Simple CLI Command
|
|
90
|
+
```bash
|
|
91
|
+
synth-ai inference \
|
|
92
|
+
--model "peft:Qwen/Qwen3-0.6B:job_xxx" \
|
|
93
|
+
--message "Hello" \
|
|
94
|
+
--max-tokens 100
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
### 2. Document Endpoint in Model Status
|
|
98
|
+
```bash
|
|
99
|
+
$ synth-ai status models get "peft:..."
|
|
100
|
+
Model: peft:Qwen/Qwen3-0.6B:job_xxx
|
|
101
|
+
Status: succeeded
|
|
102
|
+
Inference Endpoint: https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
|
|
103
|
+
Ready: ✅ Yes (use directly)
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
### 3. Add Python SDK Example
|
|
107
|
+
```python
|
|
108
|
+
import os
from synth_ai import InferenceClient
|
|
109
|
+
|
|
110
|
+
client = InferenceClient(api_key=os.getenv("SYNTH_API_KEY"))
|
|
111
|
+
response = client.chat.completions.create(
|
|
112
|
+
model="peft:Qwen/Qwen3-0.6B:job_xxx",
|
|
113
|
+
messages=[{"role": "user", "content": "Hello"}]
|
|
114
|
+
)
|
|
115
|
+
print(response.choices[0].message.content)
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
### 4. Clear Error Messages
|
|
119
|
+
- 303 → "RL model needs promotion. Use backend proxy or call /promote endpoint first."
|
|
120
|
+
- 404 → "Model not found. Check model ID with: synth-ai status models list"
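Until the service returns messages like these itself, a client-side lookup can attach them to raw status codes. This is a minimal sketch: `STATUS_HINTS` and `explain_status` are hypothetical names, and the table covers only the two codes listed above.

```python
from typing import Optional

# Hint text copied from the recommendations above.
STATUS_HINTS = {
    303: "RL model needs promotion. Use backend proxy or call /promote endpoint first.",
    404: "Model not found. Check model ID with: synth-ai status models list",
}


def explain_status(status_code: int) -> Optional[str]:
    """Return a human-readable hint for a known status code, else None."""
    return STATUS_HINTS.get(status_code)
```

With the `httpx` test code shown earlier, this could be called as `explain_status(response.status_code)` before attempting to parse the body, since a 303 arrives with an empty response.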
|
|
121
|
+
|
|
122
|
+
## Success Criteria Met
|
|
123
|
+
|
|
124
|
+
- ✅ Can get model ID from CLI
|
|
125
|
+
- ✅ Know correct endpoint
|
|
126
|
+
- ✅ Know correct auth (SYNTH_API_KEY)
|
|
127
|
+
- ✅ Can send test message
|
|
128
|
+
- ✅ Get response back
|
|
129
|
+
- ⚠️ RL models need extra step (documented)
|
|
130
|
+
|
|
131
|
+
**Status**: PEFT/SFT inference is fully working! RL needs backend proxy.
|
|
132
|
+
|
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
# Crafter: From Rollouts to RL with the Synth AI CLI
|
|
2
|
+
|
|
3
|
+
This playbook mirrors the original “Warming Up to RL” walkthrough, but swaps the bespoke scripts for the first-class `uvx synth-ai` helpers. Every step, from deploying the task app to filtering rollouts, fine-tuning, and bootstrapping RL, now uses the same CLI you’d reach for in production.
|
|
4
|
+
|
|
5
|
+
All commands assume you are inside the repository root and have `uv`/`uvx` available.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## 0. Prerequisites
|
|
10
|
+
|
|
11
|
+
1. Install dependencies and authenticate once:
|
|
12
|
+
```bash
|
|
13
|
+
uv pip install -e .
|
|
14
|
+
uvx synth-ai setup
|
|
15
|
+
```
|
|
16
|
+
The setup wizard writes the required `SYNTH_API_KEY`, `ENVIRONMENT_API_KEY`, and local `.env` helpers.
|
|
17
|
+
|
|
18
|
+
2. Copy the example secrets if you need a starter file:
|
|
19
|
+
```bash
|
|
20
|
+
cp examples/warming_up_to_rl/.env.example .env
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
3. Export the path we use for trace capture (optional but keeps things tidy):
|
|
24
|
+
```bash
|
|
25
|
+
export CRAFTER_TRACE_DB=traces/v3/crafter_blog.db
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## 1. Ship the Crafter Task App
|
|
31
|
+
|
|
32
|
+
Deploy the hosted Crafter environment once. The Modal URL that prints at the end is reused by eval, SFT, and RL.
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
uvx synth-ai deploy grpo-crafter \
|
|
36
|
+
--runtime modal \
|
|
37
|
+
--modal-mode serve \
|
|
38
|
+
--name crafter-blogpost \
|
|
39
|
+
--env-file .env
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
For local testing you can run:
|
|
43
|
+
|
|
44
|
+
```bash
|
|
45
|
+
uvx synth-ai deploy grpo-crafter \
|
|
46
|
+
--runtime uvicorn \
|
|
47
|
+
--port 8001 \
|
|
48
|
+
--trace traces/v3 \
|
|
49
|
+
--env-file .env
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
Copy the Modal URL (e.g. `https://your-app.modal.run`) and replace the `task_app_url` placeholders inside every config under `examples/blog_posts/warming_up_to_rl/configs/`.
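Editing every config by hand is error-prone, so the replacement can be scripted. The sketch below assumes each config sets the value on a single line of the form `task_app_url = "..."`; that layout is a guess, not something this playbook confirms, and `set_task_app_url` is a hypothetical helper rather than a `synth-ai` command.

```python
import re
from pathlib import Path

# Matches a line like: task_app_url = "anything", keeping the key and spacing.
TASK_APP_URL_LINE = re.compile(r'^(\s*task_app_url\s*=\s*)"[^"]*"', re.MULTILINE)


def set_task_app_url(config_dir: str, url: str) -> list[Path]:
    """Rewrite task_app_url in every .toml file under config_dir.

    Returns the files whose contents actually changed.
    """
    changed = []
    for path in sorted(Path(config_dir).glob("*.toml")):
        text = path.read_text(encoding="utf-8")
        # A function replacement avoids backslash escapes in the URL being interpreted.
        new_text = TASK_APP_URL_LINE.sub(lambda m: f'{m.group(1)}"{url}"', text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `set_task_app_url("examples/blog_posts/warming_up_to_rl/configs", "https://your-app.modal.run")` would update the directory named above; reviewing the resulting diff before committing is still worthwhile.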
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## 2. Collect High-Quality Rollouts
|
|
57
|
+
|
|
58
|
+
We lean on large teacher models to produce demonstrations. The configs in `configs/` already request full traces so we retain chain-of-thought.
|
|
59
|
+
|
|
60
|
+
Groq Qwen3-32B (text-only prompt):
|
|
61
|
+
```bash
|
|
62
|
+
uvx synth-ai eval grpo-crafter \
|
|
63
|
+
--config examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
|
|
64
|
+
--trace-db "${CRAFTER_TRACE_DB}"
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
GPT-OSS-120B via Groq’s OpenAI-compatible endpoint (also text-only):
|
|
68
|
+
```bash
|
|
69
|
+
uvx synth-ai eval grpo-crafter \
|
|
70
|
+
--config examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml \
|
|
71
|
+
--trace-db "${CRAFTER_TRACE_DB}"
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
Both configs disable image attachments and rely on the textual observation renderer (`format_observation`) so Groq stays within its supported modalities. If you want to try other models, keep `use_vision = false` unless the provider explicitly supports image inputs.
|
|
75
|
+
|
|
76
|
+
---
|
|
77
|
+
|
|
78
|
+
## 3. Filter Into an SFT Dataset
|
|
79
|
+
|
|
80
|
+
Once traces are stored in `CRAFTER_TRACE_DB`, filter them down to the high-reward trajectories:
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
uvx synth-ai filter \
|
|
84
|
+
--config examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
The output JSONL lands in `ft_data/crafter_blog_high_reward.jsonl`, ready for supervised fine-tuning.
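Before launching a training job, it can help to confirm that the file is well-formed. The sketch below only checks that every non-empty line parses as JSON and counts the records; it makes no assumption about the record schema, and `count_jsonl_records` is a hypothetical helper.

```python
import json
from pathlib import Path


def count_jsonl_records(path: str) -> int:
    """Count JSON records in a JSONL file, raising on the first invalid line."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count
```

Calling `count_jsonl_records("ft_data/crafter_blog_high_reward.jsonl")` after the filter step gives a quick signal if the filter produced an empty or truncated dataset.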
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## 4. Fine-Tune Qwen3-4B with `uvx synth-ai train`
|
|
92
|
+
|
|
93
|
+
Update the dataset path (and optionally hyperparameters) in `train_sft_qwen4b.toml`, then launch:
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
uvx synth-ai train \
|
|
97
|
+
--type sft \
|
|
98
|
+
--config examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml \
|
|
99
|
+
--env-file .env \
|
|
100
|
+
--poll
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
Capture the returned job id (it looks like `fft:Qwen/Qwen3-4B:job_xxxxx`). We reuse that identifier in the evaluation and RL configs.
|
|
104
|
+
At any time you can list recently minted checkpoints with:
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
uvx synth-ai status models
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
The output table shows the canonical model name/ID alongside the source job.
|
|
111
|
+
|
|
112
|
+
---
|
|
113
|
+
|
|
114
|
+
## 5. Evaluate the Fine-Tuned Checkpoint
|
|
115
|
+
|
|
116
|
+
Replace both `REPLACE-WITH-SFT-JOB-ID` strings inside `eval_ft_qwen4b.toml`, then run:
|
|
117
|
+
|
|
118
|
+
```bash
|
|
119
|
+
uvx synth-ai eval grpo-crafter \
|
|
120
|
+
--config examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml \
|
|
121
|
+
--trace-db "${CRAFTER_TRACE_DB}"
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
This provides a clean, CLI-native comparison between the teacher rollouts and the fine-tuned model.
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## 6. Kick Off RL from the Fine-Tuned Model
|
|
129
|
+
|
|
130
|
+
Point `train_rl_from_sft.toml` at the same Modal task app and set `model.source` to your SFT job id:
|
|
131
|
+
|
|
132
|
+
```bash
|
|
133
|
+
uvx synth-ai train \
|
|
134
|
+
--type rl \
|
|
135
|
+
--config examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml \
|
|
136
|
+
--env-file .env \
|
|
137
|
+
--poll
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
The CLI streams rollout and judge metrics in real time. When the run finishes, you can reuse the eval config from step 5 (substituting the RL job id) to quantify the uplift.
|
|
141
|
+
If you lose track of the produced RL label or want to confirm the latest status, run:
|
|
142
|
+
|
|
143
|
+
```bash
|
|
144
|
+
uvx synth-ai status jobs
|
|
145
|
+
uvx synth-ai status models
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
The first command shows job completion state; the second surfaces model IDs you can plug into new eval configs.
|
|
149
|
+
|
|
150
|
+
---
|
|
151
|
+
|
|
152
|
+
## 7. Where to Go Next
|
|
153
|
+
|
|
154
|
+
- The original `examples/warming_up_to_rl` folder still contains deeper experiments (auto-curricula, modal renderers, etc.).
|
|
155
|
+
- Add more `eval_*.toml` configs to compare alternative judges or reward shaping strategies.
|
|
156
|
+
- Plug the filtered dataset into `uvx synth-ai files upload` if you want to share it with a teammate without copying JSONL around.
|
|
157
|
+
|
|
158
|
+
This directory now holds everything a blog post needs: configs, output locations, and the CLI entrypoints to reproduce the Crafter SFT → RL pipeline end-to-end.
|