synth-ai 0.2.9.dev0__py3-none-any.whl → 0.2.23.dev3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- examples/README.md +1 -0
- examples/__init__.py +16 -0
- examples/analyze_semantic_words.sh +17 -0
- examples/baseline/banking77_baseline.py +243 -0
- examples/baseline/banking77_pipeline_baseline.py +294 -0
- examples/baseline/crafter_baseline.py +407 -0
- examples/baseline/pokemon_red_baseline.py +326 -0
- examples/baseline/simple_baseline.py +56 -0
- examples/baseline/warming_up_to_rl_baseline.py +239 -0
- examples/blog_posts/gepa/README.md +355 -0
- examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
- examples/blog_posts/gepa/configs/banking77_gepa_test.toml +80 -0
- examples/blog_posts/gepa/configs/banking77_mipro_local.toml +50 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml +101 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_test.toml +96 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/hover_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hover_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/pupa_gepa_local.toml +58 -0
- examples/blog_posts/gepa/configs/pupa_mipro_local.toml +52 -0
- examples/blog_posts/gepa/deploy_banking77_task_app.sh +54 -0
- examples/blog_posts/gepa/gepa_baseline.py +204 -0
- examples/blog_posts/gepa/query_prompts_example.py +97 -0
- examples/blog_posts/gepa/run_gepa_banking77.sh +112 -0
- examples/blog_posts/gepa/run_gepa_banking77_pipeline.sh +163 -0
- examples/blog_posts/gepa/task_apps.py +105 -0
- examples/blog_posts/gepa/test_gepa_local.sh +67 -0
- examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
- examples/blog_posts/mipro/README.md +415 -0
- examples/blog_posts/mipro/configs/banking77_mipro_local.toml +91 -0
- examples/blog_posts/mipro/configs/banking77_mipro_test.toml +87 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gemini_flash_lite_local.toml +98 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gpt41mini_local.toml +96 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_local.toml +94 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_test.toml +170 -0
- examples/blog_posts/mipro/deploy_banking77_pipeline_task_app.sh +59 -0
- examples/blog_posts/mipro/deploy_banking77_task_app.sh +41 -0
- examples/blog_posts/mipro/multi_step.md +79 -0
- examples/blog_posts/mipro/run_mipro_banking77.sh +191 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline.sh +171 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gemini_flash_lite.sh +177 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gpt41mini.sh +173 -0
- examples/blog_posts/mipro/verify_banking77_setup.sh +117 -0
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/pokemon_vl/extract_images.py +239 -0
- examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
- examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
- examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
- examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
- examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
- examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
- examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
- examples/crafter_debug_render.py +186 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +45 -0
- examples/gepa/banking77_pipeline_gepa.toml +96 -0
- examples/gepa/multi_stage_gepa_example.toml +84 -0
- examples/gepa/run_gepa_banking77_pipeline.sh +157 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/README_verilog_rl.md +77 -0
- examples/multi_step/configs/VERILOG_REWARDS.md +103 -0
- examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +196 -0
- examples/multi_step/configs/crafter_eval_synth_qwen4b.toml +35 -0
- examples/multi_step/configs/crafter_eval_text_only_groq_qwen32b.toml +36 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +75 -0
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +145 -0
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +84 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +79 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/configs/crafter_synth_backend.md +40 -0
- examples/multi_step/configs/verilog_eval_groq_qwen32b.toml +31 -0
- examples/multi_step/configs/verilog_eval_synth_qwen8b.toml +33 -0
- examples/multi_step/configs/verilog_rl_lora.toml +147 -0
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/crafter_rl_lora.md +70 -0
- examples/multi_step/judges/crafter_backend_judge.py +220 -0
- examples/multi_step/judges/verilog_backend_judge.py +234 -0
- examples/multi_step/readme.md +48 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/multi_step/sse_metrics_streaming_notes.md +357 -0
- examples/multi_step/task_app_config_notes.md +494 -0
- examples/multi_step/verilog_rl_lora.md +218 -0
- examples/qwen_coder/README.md +102 -0
- examples/qwen_coder/_shared.py +113 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +60 -0
- examples/qwen_coder/configs/coder_lora_4b.toml +61 -0
- examples/qwen_coder/configs/coder_lora_small.toml +57 -0
- examples/qwen_coder/generate_dataset.py +98 -0
- examples/qwen_coder/infer_ft_smoke.py +65 -0
- examples/qwen_coder/infer_prod_proxy.py +73 -0
- examples/qwen_coder/infer_via_synth.py +87 -0
- examples/qwen_coder/scripts/infer_coder.sh +19 -0
- examples/qwen_coder/scripts/train_coder_30b.sh +22 -0
- examples/qwen_coder/sft_full_17b.py +103 -0
- examples/qwen_coder/sft_lora_30b.py +110 -0
- examples/qwen_coder/subset_jsonl.py +39 -0
- examples/qwen_coder/todos.md +38 -0
- examples/qwen_coder/validate_jsonl.py +60 -0
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +152 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +274 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +415 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +61 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +169 -0
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +62 -0
- examples/rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/download_dataset.py +80 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +21 -0
- {synth_ai/task/apps → examples/rl/task_app}/math_single_step.py +188 -50
- examples/rl/task_app/math_task_app.py +111 -0
- examples/run_crafter_demo.sh +10 -0
- examples/sdk_prompt_learning_example.py +55 -0
- examples/sft/README.md +139 -0
- examples/sft/configs/crafter_fft_qwen0p6b.toml +49 -0
- examples/sft/configs/crafter_lora_qwen0p6b.toml +49 -0
- examples/sft/evaluate.py +117 -0
- examples/sft/export_dataset.py +120 -0
- examples/sft/generate_traces.py +164 -0
- examples/swe/__init__.py +12 -0
- examples/swe/task_app/README.md +135 -0
- examples/swe/task_app/__init__.py +2 -0
- examples/swe/task_app/grpo_swe_mini.py +604 -0
- examples/swe/task_app/grpo_swe_mini_task_app.py +124 -0
- examples/swe/task_app/hosted/README.md +173 -0
- examples/swe/task_app/hosted/__init__.py +5 -0
- examples/swe/task_app/hosted/branching.py +143 -0
- examples/swe/task_app/hosted/environment_routes.py +1289 -0
- examples/swe/task_app/hosted/envs/__init__.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/__init__.py +6 -0
- examples/swe/task_app/hosted/envs/crafter/app.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/environment.py +522 -0
- examples/swe/task_app/hosted/envs/crafter/policy.py +478 -0
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +108 -0
- examples/swe/task_app/hosted/envs/crafter/shared.py +305 -0
- examples/swe/task_app/hosted/envs/crafter/tools.py +47 -0
- examples/swe/task_app/hosted/envs/mini_swe/__init__.py +8 -0
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +1191 -0
- examples/swe/task_app/hosted/envs/mini_swe/policy.py +355 -0
- examples/swe/task_app/hosted/envs/mini_swe/shared.py +83 -0
- examples/swe/task_app/hosted/envs/mini_swe/tools.py +96 -0
- examples/swe/task_app/hosted/hosted_app.py +204 -0
- examples/swe/task_app/hosted/inference/__init__.py +5 -0
- examples/swe/task_app/hosted/inference/openai_client.py +584 -0
- examples/swe/task_app/hosted/main.py +100 -0
- examples/swe/task_app/hosted/policy_routes.py +1094 -0
- examples/swe/task_app/hosted/registry.py +195 -0
- examples/swe/task_app/hosted/rollout.py +1905 -0
- examples/swe/task_app/hosted/storage/__init__.py +5 -0
- examples/swe/task_app/hosted/storage/volume.py +211 -0
- examples/swe/task_app/hosted/test_agents.py +161 -0
- examples/swe/task_app/hosted/test_service.py +136 -0
- examples/swe/task_app/hosted/utils.py +62 -0
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/IMAGE_ONLY_EVAL_QUICKSTART.md +258 -0
- examples/task_apps/TESTING.md +275 -0
- examples/task_apps/banking77/__init__.py +6 -0
- examples/task_apps/banking77/banking77_task_app.py +912 -0
- examples/task_apps/banking77/deploy_wrapper.py +46 -0
- examples/task_apps/banking77_pipeline/__init__.py +6 -0
- examples/task_apps/banking77_pipeline/banking77_pipeline_task_app.py +489 -0
- examples/task_apps/banking77_pipeline/deploy_wrapper.py +50 -0
- examples/task_apps/crafter/CREATE_SFT_DATASET.md +286 -0
- examples/task_apps/crafter/EVAL_IMAGE_ONLY_RESULTS.md +152 -0
- examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +187 -0
- examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +281 -0
- examples/task_apps/crafter/QUERY_EXAMPLES.md +203 -0
- examples/task_apps/crafter/README_IMAGE_ONLY_EVAL.md +316 -0
- examples/task_apps/crafter/eval_image_only_gpt4o.toml +28 -0
- examples/task_apps/crafter/eval_text_only_groq_llama.toml +36 -0
- examples/task_apps/crafter/filter_sft_dataset.toml +16 -0
- examples/task_apps/crafter/task_app/README.md +42 -0
- examples/task_apps/crafter/task_app/__init__.py +5 -0
- examples/task_apps/crafter/task_app/grpo_crafter.py +1055 -0
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +146 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/README.md +173 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/branching.py +143 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/environment.py +532 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +583 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +122 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +999 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/main.py +100 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +1252 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/registry.py +195 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +2233 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_service.py +136 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +411 -0
- examples/task_apps/dev/pokemon_emerald/__init__.py +2 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/README.md +811 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/__init__.py +120 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/action.py +160 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/memory.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/perception.py +69 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/planning.py +96 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/simple.py +1502 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/system_prompt.py +4 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/grab_map.py +68 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/manual.py +216 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/__init__.py +35 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emerald_utils.py +631 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emulator.py +1544 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/enums.py +1428 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/memory_reader.py +4848 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/types.py +41 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/utils.py +298 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pyproject.toml +95 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/run.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/app.py +2152 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/client.py +429 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/frame_server.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/README.md +78 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/run_tests.py +122 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_direct.py +76 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_prompts.py +413 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_battle_state_formatting.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection.py +133 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection_comprehensive.py +229 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_direct_agent_emulator.py +300 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_fps_adjustment_pytest.py +205 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_direct.py +200 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_transition.py +284 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_map_ground_truth_comparison.py +468 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_memory_map.py +575 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_server_map_validation.py +311 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_torchic_state.py +259 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/anticheat.py +372 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/checkpoint.py +296 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/error_handler.py +275 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/get_local_ip.py +22 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/helpers.py +44 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/llm_logger.py +514 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_formatter.py +415 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher.py +1763 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher_singleton.py +33 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_trimmer.py +106 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_visualizer.py +334 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/ocr_dialogue.py +1020 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/recording.py +188 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/state_formatter.py +1481 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/vlm.py +862 -0
- examples/task_apps/dev/pokemon_emerald/modal_app.py +114 -0
- examples/task_apps/dev/pokemon_emerald/task_app/README.md +81 -0
- examples/task_apps/dev/pokemon_emerald/task_app/__init__.py +6 -0
- examples/task_apps/dev/pokemon_emerald/task_app/pokemon_emerald.py +685 -0
- examples/task_apps/enron/__init__.py +2 -0
- examples/task_apps/enron/eval_groq_qwen32.toml +16 -0
- examples/task_apps/enron/filter_sft.toml +5 -0
- examples/task_apps/enron/task_app/README.md +14 -0
- examples/task_apps/enron/task_app/__init__.py +1 -0
- examples/task_apps/enron/task_app/grpo_enron.py +906 -0
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +146 -0
- examples/task_apps/enron/tests/__init__.py +4 -0
- examples/task_apps/enron/tests/conftest.py +115 -0
- examples/task_apps/enron/tests/integration/__init__.py +4 -0
- examples/task_apps/enron/tests/integration/test_enron_eval.py +179 -0
- examples/task_apps/enron/tests/integration/test_enron_rollout.py +135 -0
- examples/task_apps/enron/tests/unit/__init__.py +4 -0
- examples/task_apps/enron/tests/unit/test_enron_environment.py +126 -0
- examples/task_apps/gepa_benchmarks/__init__.py +7 -0
- examples/task_apps/gepa_benchmarks/common.py +260 -0
- examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
- examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
- examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
- examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
- examples/task_apps/math/README.md +21 -0
- examples/task_apps/math/math_single_step.py +1000 -0
- examples/task_apps/math/math_task_app.py +115 -0
- examples/task_apps/pokemon_battle/__init__.py +2 -0
- examples/task_apps/pokemon_battle/modal_app.py +104 -0
- examples/task_apps/pokemon_battle/task_app/README.md +68 -0
- examples/task_apps/pokemon_battle/task_app/__init__.py +6 -0
- examples/task_apps/pokemon_battle/task_app/pokemon_showdown.py +932 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_COMPLETE.md +283 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_STATUS.md +155 -0
- examples/task_apps/pokemon_red/README.md +356 -0
- examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +428 -0
- examples/task_apps/pokemon_red/__init__.py +3 -0
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +30 -0
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +224 -0
- examples/task_apps/pokemon_red/pallet_town_rl_config.toml +75 -0
- examples/task_apps/pokemon_red/task_app.py +1048 -0
- examples/task_apps/pokemon_red/test_pallet_town_rewards.py +193 -0
- examples/task_apps/sokoban/README.md +306 -0
- examples/task_apps/sokoban/__init__.py +3 -0
- examples/task_apps/sokoban/eval_groq_qwen32.toml +16 -0
- examples/task_apps/sokoban/eval_openai_gpt5.toml +16 -0
- examples/task_apps/sokoban/filter_sft.toml +5 -0
- examples/task_apps/sokoban/task_app.py +1058 -0
- examples/task_apps/sokoban/tests/__init__.py +4 -0
- examples/task_apps/sokoban/tests/conftest.py +113 -0
- examples/task_apps/sokoban/tests/integration/__init__.py +4 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_eval.py +57 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_rollout.py +198 -0
- examples/task_apps/sokoban/tests/unit/__init__.py +4 -0
- examples/task_apps/sokoban/tests/unit/test_sokoban_environment.py +114 -0
- examples/task_apps/verilog/__init__.py +1 -0
- examples/task_apps/verilog/eval_groq_qwen32b.toml +22 -0
- examples/task_apps/verilog/filter_sft.toml +5 -0
- examples/task_apps/verilog/task_app/README.md +12 -0
- examples/task_apps/verilog/task_app/__init__.py +1 -0
- examples/task_apps/verilog/task_app/grpo_verilog.py +1166 -0
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +145 -0
- examples/task_apps/verilog/tests/__init__.py +4 -0
- examples/task_apps/verilog/tests/conftest.py +115 -0
- examples/task_apps/verilog/tests/integration/__init__.py +4 -0
- examples/task_apps/verilog/tests/integration/test_verilog_eval.py +181 -0
- examples/task_apps/verilog/tests/integration/test_verilog_rollout.py +55 -0
- examples/task_apps/verilog/tests/unit/__init__.py +4 -0
- examples/task_apps/verilog/tests/unit/test_verilog_scoring.py +118 -0
- examples/tunnel_gepa_banking77/README.md +106 -0
- examples/tunnel_gepa_banking77/banking77_gepa_tunnel.toml +95 -0
- examples/tunnel_gepa_banking77/keep_tunnel_running.py +60 -0
- examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh +226 -0
- examples/vlm/PROPOSAL.md +53 -0
- examples/vlm/README.md +68 -0
- examples/vlm/configs/crafter_vlm_gpt4o.toml +49 -0
- examples/vlm/crafter_image_only_agent.py +207 -0
- examples/vlm/crafter_openai_vlm_agent.py +275 -0
- examples/vlm/filter_image_rows.py +63 -0
- examples/vlm/run_crafter_vlm_benchmark.py +316 -0
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +422 -0
- examples/warming_up_to_rl/configs/crafter_fft.toml +53 -0
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +54 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +22 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +15 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +24 -0
- examples/warming_up_to_rl/configs/eval_stepwise_complex.toml +35 -0
- examples/warming_up_to_rl/configs/eval_stepwise_consistent.toml +26 -0
- examples/warming_up_to_rl/configs/eval_stepwise_per_achievement.toml +36 -0
- examples/warming_up_to_rl/configs/eval_stepwise_simple.toml +32 -0
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +85 -0
- examples/warming_up_to_rl/configs/rl_from_ft.toml +58 -0
- examples/warming_up_to_rl/export_trace_sft.py +837 -0
- examples/warming_up_to_rl/groq_test.py +97 -0
- examples/warming_up_to_rl/manage_secrets.py +131 -0
- examples/warming_up_to_rl/old/event_rewards.md +234 -0
- examples/warming_up_to_rl/old/notes.md +73 -0
- examples/warming_up_to_rl/readme.md +110 -0
- examples/warming_up_to_rl/run_eval.py +736 -0
- examples/warming_up_to_rl/run_fft_and_save.py +380 -0
- examples/warming_up_to_rl/run_local_rollout.py +239 -0
- examples/warming_up_to_rl/run_local_rollout_modal.py +248 -0
- examples/warming_up_to_rl/run_local_rollout_parallel.py +405 -0
- examples/warming_up_to_rl/run_local_rollout_traced.py +477 -0
- examples/warming_up_to_rl/run_rl_and_save.py +124 -0
- examples/warming_up_to_rl/run_rollout_remote.py +156 -0
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +876 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +729 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1114 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1891 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +129 -0
- examples/workflows/math_rl/configs/eval_base_qwen.toml +15 -0
- examples/workflows/math_rl/configs/eval_rl_qwen.toml +11 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +62 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/workflows/math_rl/configs/rl_from_ft_qwen.toml +35 -0
- examples/workflows/math_rl/download_dataset.py +80 -0
- examples/workflows/math_rl/run_eval.py +436 -0
- examples/workflows/math_rl/run_rl_and_save.py +111 -0
- synth_ai/__init__.py +47 -23
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +514 -0
- synth_ai/api/train/__init__.py +60 -2
- synth_ai/api/train/builders.py +347 -39
- synth_ai/api/train/cli.py +895 -160
- synth_ai/api/train/config_finder.py +103 -25
- synth_ai/api/train/configs/__init__.py +65 -0
- synth_ai/api/train/configs/prompt_learning.py +496 -0
- synth_ai/api/train/configs/rl.py +188 -0
- synth_ai/api/train/configs/sft.py +99 -0
- synth_ai/api/train/configs/shared.py +81 -0
- synth_ai/api/train/env_resolver.py +70 -20
- synth_ai/api/train/pollers.py +29 -4
- synth_ai/api/train/prompt_learning.py +425 -0
- synth_ai/api/train/sft.py +390 -0
- synth_ai/api/train/supported_algos.py +147 -0
- synth_ai/api/train/task_app.py +6 -4
- synth_ai/api/train/utils.py +64 -52
- synth_ai/api/train/validators.py +1117 -0
- synth_ai/api/tunnel.py +49 -0
- synth_ai/auth/credentials.py +94 -0
- synth_ai/baseline/__init__.py +25 -0
- synth_ai/baseline/config.py +209 -0
- synth_ai/baseline/discovery.py +214 -0
- synth_ai/baseline/execution.py +146 -0
- synth_ai/cfgs.py +227 -0
- synth_ai/cli/__init__.py +85 -63
- synth_ai/cli/_modal_wrapper.py +31 -0
- synth_ai/cli/_storage.py +20 -0
- synth_ai/cli/_typer_patch.py +47 -0
- synth_ai/cli/_validate_task_app.py +29 -0
- synth_ai/cli/balance.py +16 -4
- synth_ai/cli/calc.py +36 -21
- synth_ai/cli/claude.py +70 -0
- synth_ai/cli/codex.py +267 -0
- synth_ai/cli/commands/__init__.py +18 -0
- synth_ai/cli/commands/baseline/__init__.py +12 -0
- synth_ai/cli/commands/baseline/core.py +637 -0
- synth_ai/cli/commands/baseline/list.py +93 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1112 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +424 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +185 -0
- synth_ai/cli/commands/help/core.py +72 -0
- synth_ai/cli/commands/smoke/__init__.py +7 -0
- synth_ai/cli/commands/smoke/core.py +1437 -0
- synth_ai/cli/commands/status/__init__.py +66 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/session.py +183 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/subcommands/usage.py +203 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +200 -0
- synth_ai/cli/commands/train/judge_validation.py +305 -0
- synth_ai/cli/commands/train/validation.py +386 -0
- synth_ai/cli/demo.py +32 -140
- synth_ai/cli/deploy.py +233 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +28 -22
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/mcp.py +34 -0
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/opencode.py +256 -0
- synth_ai/cli/recent.py +13 -7
- synth_ai/cli/rl_demo.py +156 -116
- synth_ai/cli/root.py +131 -132
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +49 -0
- synth_ai/cli/status.py +7 -125
- synth_ai/cli/task_app_deploy.py +7 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +11 -0
- synth_ai/cli/task_app_serve.py +11 -0
- synth_ai/cli/task_apps.py +2284 -257
- synth_ai/cli/traces.py +9 -5
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +5 -0
- synth_ai/cli/turso.py +73 -0
- synth_ai/cli/watch.py +13 -18
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/core/cli.py +579 -291
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/demo_task_apps/__init__.py +3 -3
- synth_ai/demos/demo_task_apps/core.py +64 -28
- synth_ai/demos/demo_task_apps/crafter/__init__.py +1 -0
- synth_ai/demos/demo_task_apps/crafter/configs/crafter_fft_4b.toml +53 -0
- synth_ai/demos/demo_task_apps/crafter/configs/rl_from_base_qwen4b.toml +73 -0
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +184 -0
- synth_ai/demos/demo_task_apps/math/_common.py +1 -2
- synth_ai/demos/demo_task_apps/math/app.py +2 -1
- synth_ai/demos/demo_task_apps/math/deploy_modal.py +3 -6
- synth_ai/demos/demo_task_apps/math/modal_task_app.py +185 -83
- synth_ai/demos/demo_task_apps/math/task_app_entry.py +0 -2
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +703 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +12 -5
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/bandit/taskset.py +4 -4
- synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
- synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
- synth_ai/environments/examples/crafter_classic/environment.py +93 -2
- synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
- synth_ai/environments/examples/enron/engine.py +7 -2
- synth_ai/environments/examples/enron/environment.py +68 -0
- synth_ai/environments/examples/red/engine.py +60 -12
- synth_ai/environments/examples/red/engine_helpers/memory_map.py +7 -0
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/engine_helpers/reward_library/pallet_town_progression.py +477 -0
- synth_ai/environments/examples/red/engine_helpers/state_extraction.py +32 -0
- synth_ai/environments/examples/red/environment.py +86 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/environments/examples/sokoban/taskset.py +116 -0
- synth_ai/environments/examples/verilog/engine.py +104 -12
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/environments/reproducibility/tree.py +5 -6
- synth_ai/environments/service/app.py +11 -12
- synth_ai/environments/service/core_routes.py +10 -9
- synth_ai/environments/stateful/engine.py +1 -1
- synth_ai/environments/tasks/core.py +1 -0
- synth_ai/environments/tasks/filters.py +5 -6
- synth_ai/environments/tasks/utils.py +4 -5
- synth_ai/evals/__init__.py +15 -0
- synth_ai/evals/base.py +14 -5
- synth_ai/evals/client.py +82 -0
- synth_ai/evals/types.py +42 -0
- synth_ai/http.py +8 -22
- synth_ai/http_client.py +45 -12
- synth_ai/inference/__init__.py +0 -2
- synth_ai/inference/client.py +21 -7
- synth_ai/jobs/client.py +129 -80
- synth_ai/judge_schemas.py +127 -0
- synth_ai/learning/__init__.py +51 -6
- synth_ai/learning/algorithms.py +14 -0
- synth_ai/learning/client.py +122 -30
- synth_ai/learning/config.py +2 -40
- synth_ai/learning/constants.py +0 -2
- synth_ai/learning/ft_client.py +4 -56
- synth_ai/learning/health.py +14 -8
- synth_ai/learning/jobs.py +43 -47
- synth_ai/learning/prompt_learning_client.py +276 -0
- synth_ai/learning/prompt_learning_types.py +185 -0
- synth_ai/{rl → learning/rl}/__init__.py +14 -5
- synth_ai/learning/rl/client.py +269 -0
- synth_ai/learning/rl/config.py +31 -0
- synth_ai/{rl → learning/rl}/contracts.py +5 -10
- synth_ai/{rl → learning/rl}/env_keys.py +45 -16
- synth_ai/learning/rl/secrets.py +13 -0
- synth_ai/learning/rl_client.py +2 -253
- synth_ai/learning/sft/__init__.py +29 -0
- synth_ai/learning/sft/client.py +68 -0
- synth_ai/learning/sft/config.py +270 -0
- synth_ai/learning/sft/data.py +698 -0
- synth_ai/learning/sse.py +25 -26
- synth_ai/learning/validators.py +29 -25
- synth_ai/mcp/__init__.py +5 -0
- synth_ai/mcp/__main__.py +8 -0
- synth_ai/mcp/main.py +254 -0
- synth_ai/mcp/setup.py +100 -0
- synth_ai/modal.py +257 -0
- synth_ai/pricing/__init__.py +3 -0
- synth_ai/pricing/model_pricing.py +64 -0
- synth_ai/session/__init__.py +75 -0
- synth_ai/session/client.py +383 -0
- synth_ai/session/constants.py +63 -0
- synth_ai/session/exceptions.py +105 -0
- synth_ai/session/manager.py +139 -0
- synth_ai/session/models.py +89 -0
- synth_ai/session/query.py +110 -0
- synth_ai/spec/__init__.py +46 -0
- synth_ai/spec/dataclasses.py +149 -0
- synth_ai/spec/loader.py +144 -0
- synth_ai/spec/serializer.py +199 -0
- synth_ai/spec/validation.py +250 -0
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +589 -0
- synth_ai/streaming/streamer.py +320 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/__init__.py +50 -30
- synth_ai/task/apps/__init__.py +63 -19
- synth_ai/task/auth.py +35 -23
- synth_ai/task/client.py +15 -13
- synth_ai/task/config.py +261 -0
- synth_ai/task/contracts.py +165 -64
- synth_ai/task/datasets.py +9 -6
- synth_ai/task/errors.py +11 -10
- synth_ai/task/health.py +17 -11
- synth_ai/task/inference_api.py +101 -0
- synth_ai/task/json.py +58 -24
- synth_ai/task/proxy.py +59 -66
- synth_ai/task/rubrics/__init__.py +55 -0
- synth_ai/task/rubrics/loaders.py +156 -0
- synth_ai/task/rubrics/models.py +57 -0
- synth_ai/task/rubrics/scoring.py +116 -0
- synth_ai/task/rubrics/strict.py +149 -0
- synth_ai/task/rubrics.py +22 -15
- synth_ai/task/server.py +65 -31
- synth_ai/task/trace_correlation_helpers.py +328 -0
- synth_ai/task/tracing_utils.py +44 -28
- synth_ai/task/validators.py +449 -6
- synth_ai/task/vendors.py +5 -7
- synth_ai/tracing_v3/__init__.py +4 -0
- synth_ai/tracing_v3/abstractions.py +21 -4
- synth_ai/tracing_v3/config.py +167 -22
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +42 -29
- synth_ai/tracing_v3/decorators.py +80 -45
- synth_ai/tracing_v3/examples/basic_usage.py +15 -9
- synth_ai/tracing_v3/hooks.py +6 -4
- synth_ai/tracing_v3/llm_call_record_helpers.py +161 -61
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/replica_sync.py +12 -7
- synth_ai/tracing_v3/serialization.py +130 -0
- synth_ai/tracing_v3/session_tracer.py +73 -16
- synth_ai/tracing_v3/storage/base.py +89 -1
- synth_ai/tracing_v3/storage/config.py +63 -16
- synth_ai/tracing_v3/storage/factory.py +11 -9
- synth_ai/tracing_v3/storage/utils.py +15 -11
- synth_ai/tracing_v3/trace_utils.py +317 -0
- synth_ai/tracing_v3/turso/__init__.py +8 -21
- synth_ai/tracing_v3/turso/daemon.py +123 -15
- synth_ai/tracing_v3/turso/models.py +5 -2
- synth_ai/tracing_v3/turso/native_manager.py +1293 -0
- synth_ai/tracing_v3/utils.py +5 -4
- synth_ai/tunnel.py +143 -0
- synth_ai/tunnel_deploy.py +278 -0
- synth_ai/types.py +8 -0
- synth_ai/urls.py +11 -0
- synth_ai/utils/__init__.py +166 -0
- synth_ai/utils/agents.py +74 -0
- synth_ai/utils/apps.py +152 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/bin.py +39 -0
- synth_ai/utils/claude.py +36 -0
- synth_ai/utils/cli.py +284 -0
- synth_ai/utils/config.py +81 -0
- synth_ai/utils/env.py +346 -0
- synth_ai/utils/errors.py +85 -0
- synth_ai/utils/http.py +172 -0
- synth_ai/utils/json.py +72 -0
- synth_ai/utils/log_filter.py +99 -0
- synth_ai/utils/logging.py +198 -0
- synth_ai/utils/modal.py +299 -0
- synth_ai/utils/paths.py +95 -0
- synth_ai/utils/process.py +233 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/ssl.py +25 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/tunnel/__init__.py +12 -0
- synth_ai/utils/tunnel/config.py +55 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/uvicorn.py +77 -0
- synth_ai-0.2.23.dev3.dist-info/METADATA +357 -0
- synth_ai-0.2.23.dev3.dist-info/RECORD +983 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/entry_points.txt +0 -1
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/top_level.txt +1 -0
- synth_ai/cli/man.py +0 -106
- synth_ai/core/experiment.py +0 -15
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -258
- synth_ai/environments/examples/sokoban/units/astar_common.py +0 -95
- synth_ai/experimental/synth_oss.py +0 -446
- synth_ai/handshake.py +0 -107
- synth_ai/install_sqld.sh +0 -40
- synth_ai/learning/offline/dpo.py +0 -0
- synth_ai/learning/offline/providers.py +0 -7
- synth_ai/learning/offline/sft.py +0 -0
- synth_ai/learning/offline/shared.py +0 -0
- synth_ai/learning/online/grpo.py +0 -0
- synth_ai/learning/online/irft.py +0 -0
- synth_ai/learning/prompts/banking77_injection_eval.py +0 -168
- synth_ai/learning/prompts/gepa.py +0 -0
- synth_ai/learning/prompts/hello_world_in_context_injection_ex.py +0 -213
- synth_ai/learning/prompts/mipro.py +0 -289
- synth_ai/learning/prompts/random_search.py +0 -246
- synth_ai/learning/prompts/run_mipro_banking77.py +0 -172
- synth_ai/learning/prompts/run_random_search_banking77.py +0 -324
- synth_ai/lm/__init__.py +0 -51
- synth_ai/lm/caching/constants.py +0 -6
- synth_ai/lm/caching/dbs.py +0 -0
- synth_ai/lm/caching/ephemeral.py +0 -102
- synth_ai/lm/caching/handler.py +0 -137
- synth_ai/lm/caching/initialize.py +0 -11
- synth_ai/lm/caching/persistent.py +0 -114
- synth_ai/lm/config.py +0 -110
- synth_ai/lm/constants.py +0 -32
- synth_ai/lm/core/__init__.py +0 -8
- synth_ai/lm/core/all.py +0 -73
- synth_ai/lm/core/exceptions.py +0 -7
- synth_ai/lm/core/main.py +0 -319
- synth_ai/lm/core/main_v3.py +0 -594
- synth_ai/lm/core/synth_models.py +0 -48
- synth_ai/lm/core/vendor_clients.py +0 -188
- synth_ai/lm/cost/monitor.py +0 -1
- synth_ai/lm/cost/statefulness.py +0 -1
- synth_ai/lm/injection.py +0 -80
- synth_ai/lm/overrides.py +0 -206
- synth_ai/lm/provider_support/__init__.py +0 -8
- synth_ai/lm/provider_support/anthropic.py +0 -972
- synth_ai/lm/provider_support/openai.py +0 -1139
- synth_ai/lm/provider_support/suppress_logging.py +0 -31
- synth_ai/lm/structured_outputs/handler.py +0 -440
- synth_ai/lm/structured_outputs/inject.py +0 -297
- synth_ai/lm/structured_outputs/rehabilitate.py +0 -185
- synth_ai/lm/tools/__init__.py +0 -3
- synth_ai/lm/tools/base.py +0 -172
- synth_ai/lm/unified_interface.py +0 -202
- synth_ai/lm/vendors/base.py +0 -81
- synth_ai/lm/vendors/core/anthropic_api.py +0 -387
- synth_ai/lm/vendors/core/gemini_api.py +0 -292
- synth_ai/lm/vendors/core/mistral_api.py +0 -322
- synth_ai/lm/vendors/core/openai_api.py +0 -225
- synth_ai/lm/vendors/core/synth_dev_api.py +0 -0
- synth_ai/lm/vendors/local/ollama.py +0 -0
- synth_ai/lm/vendors/openai_standard.py +0 -780
- synth_ai/lm/vendors/openai_standard_responses.py +0 -256
- synth_ai/lm/vendors/retries.py +0 -22
- synth_ai/lm/vendors/supported/custom_endpoint.py +0 -417
- synth_ai/lm/vendors/supported/deepseek.py +0 -69
- synth_ai/lm/vendors/supported/grok.py +0 -75
- synth_ai/lm/vendors/supported/groq.py +0 -16
- synth_ai/lm/vendors/supported/ollama.py +0 -15
- synth_ai/lm/vendors/supported/openrouter.py +0 -74
- synth_ai/lm/vendors/supported/together.py +0 -11
- synth_ai/lm/vendors/synth_client.py +0 -808
- synth_ai/lm/warmup.py +0 -186
- synth_ai/rl/secrets.py +0 -19
- synth_ai/scripts/verify_rewards.py +0 -100
- synth_ai/task/apps/grpo_crafter.py +0 -438
- synth_ai/tracing/__init__.py +0 -30
- synth_ai/tracing_v1/__init__.py +0 -33
- synth_ai/tracing_v3/turso/manager.py +0 -774
- synth_ai/v0/tracing/abstractions.py +0 -224
- synth_ai/v0/tracing/base_client.py +0 -91
- synth_ai/v0/tracing/client_manager.py +0 -131
- synth_ai/v0/tracing/config.py +0 -142
- synth_ai/v0/tracing/context.py +0 -146
- synth_ai/v0/tracing/decorators.py +0 -682
- synth_ai/v0/tracing/events/__init__.py +0 -0
- synth_ai/v0/tracing/events/manage.py +0 -147
- synth_ai/v0/tracing/events/scope.py +0 -86
- synth_ai/v0/tracing/events/store.py +0 -228
- synth_ai/v0/tracing/immediate_client.py +0 -151
- synth_ai/v0/tracing/local.py +0 -18
- synth_ai/v0/tracing/log_client_base.py +0 -73
- synth_ai/v0/tracing/retry_queue.py +0 -186
- synth_ai/v0/tracing/trackers.py +0 -515
- synth_ai/v0/tracing/upload.py +0 -512
- synth_ai/v0/tracing/utils.py +0 -9
- synth_ai/v0/tracing_v1/__init__.py +0 -16
- synth_ai/v0/tracing_v1/abstractions.py +0 -224
- synth_ai/v0/tracing_v1/base_client.py +0 -91
- synth_ai/v0/tracing_v1/client_manager.py +0 -131
- synth_ai/v0/tracing_v1/config.py +0 -142
- synth_ai/v0/tracing_v1/context.py +0 -146
- synth_ai/v0/tracing_v1/decorators.py +0 -703
- synth_ai/v0/tracing_v1/events/__init__.py +0 -0
- synth_ai/v0/tracing_v1/events/manage.py +0 -147
- synth_ai/v0/tracing_v1/events/scope.py +0 -86
- synth_ai/v0/tracing_v1/events/store.py +0 -228
- synth_ai/v0/tracing_v1/immediate_client.py +0 -151
- synth_ai/v0/tracing_v1/local.py +0 -18
- synth_ai/v0/tracing_v1/log_client_base.py +0 -73
- synth_ai/v0/tracing_v1/retry_queue.py +0 -186
- synth_ai/v0/tracing_v1/trackers.py +0 -515
- synth_ai/v0/tracing_v1/upload.py +0 -527
- synth_ai/v0/tracing_v1/utils.py +0 -9
- synth_ai/zyk/__init__.py +0 -30
- synth_ai-0.2.9.dev0.dist-info/METADATA +0 -131
- synth_ai-0.2.9.dev0.dist-info/RECORD +0 -444
- {synth_ai/lm/caching → examples/task_apps}/__init__.py +0 -0
- {synth_ai/lm/cost → examples/task_apps/crafter}/__init__.py +0 -0
- {synth_ai/lm/structured_outputs → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server}/__init__.py +0 -0
- {synth_ai/lm/vendors → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests}/__init__.py +0 -0
- {synth_ai/lm/vendors/core → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils}/__init__.py +0 -0
- {synth_ai/lm/vendors/local → examples/task_apps/math}/__init__.py +0 -0
- {synth_ai/lm/vendors/supported → examples/workflows}/__init__.py +0 -0
- {synth_ai/v0/tracing → examples/workflows/math_rl}/__init__.py +0 -0
- /synth_ai/{compound/cais.py → cli/__main__.py} +0 -0
- /synth_ai/{learning/filtering.py → py.typed} +0 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.9.dev0.dist-info → synth_ai-0.2.23.dev3.dist-info}/licenses/LICENSE +0 -0

@@ -0,0 +1,97 @@
"""Quick smoke test that drives a rollout through the Groq proxy-backed Crafter Task App."""

from __future__ import annotations

import argparse
import asyncio
import os
from typing import Any

from synth_ai.task import (
    INTERACT_TOOL_SCHEMA,
    RolloutEnvSpec,
    RolloutPolicySpec,
    RolloutRequest,
    TaskAppClient,
    to_jsonable,
)


def _build_policy_payload(seed: int, model: str) -> dict[str, Any]:
    return {
        "model": model,
        "tools": INTERACT_TOOL_SCHEMA,
        "messages": [
            {
                "role": "system",
                "content": "You control the Crafter agent. Think briefly, then call the interact tool with 3-5 actions to maximize achievements.",
            },
            {
                "role": "user",
                "content": (
                    f"Environment seed {seed}. Plan initial survival/crafting steps and then call interact with concrete actions."
                ),
            },
        ],
    }


async def run(args: argparse.Namespace) -> None:
    client = TaskAppClient(args.base_url, api_key=args.api_key, timeout=args.timeout)

    health = await client.health()
    print("/health →", to_jsonable(health))

    info = await client.info()
    print("/info →", to_jsonable(info))

    inference_url = args.inference_url or f"{args.base_url.rstrip('/')}/proxy/groq"

    from synth_ai.task.contracts import RolloutMode
    request = RolloutRequest(
        run_id=args.run_id,
        mode=RolloutMode.EVAL,
        env=RolloutEnvSpec(env_name="crafter", seed=args.seed, config={"seed": args.seed}),
        policy=RolloutPolicySpec(
            policy_name="groq-smoke",
            config={"model": args.model, "inference_url": inference_url.rstrip("/")},
        ),
        ops=[
            {"type": "policy", "payload": _build_policy_payload(args.seed, args.model)},
            {"type": "env"},
        ],
    )

    response = await client.rollout(request)
    print("rollout.metrics →", to_jsonable(response.metrics.model_dump()))
    for idx, step in enumerate(response.trajectories[0].steps, start=1):
        print(
            f"step[{idx}] tool_calls={step.tool_calls} reward={step.reward} info={to_jsonable(step.info)}"
        )


def _parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--base-url", default=os.getenv("TASK_APP_BASE_URL", "http://localhost:8000")
    )
    parser.add_argument(
        "--api-key",
        default=os.getenv("TASK_APP_API_KEY"),
        required=os.getenv("TASK_APP_API_KEY") is None,
    )
    parser.add_argument("--model", default=os.getenv("GROQ_MODEL", "groq/mixtral-8x7b"))
    parser.add_argument("--inference-url", default=os.getenv("TASK_APP_INFERENCE_URL"))
    parser.add_argument("--seed", type=int, default=int(os.getenv("CRAFTER_TEST_SEED", "42")))
    parser.add_argument("--run-id", default=os.getenv("TASK_APP_RUN_ID", "groq-test"))
    parser.add_argument("--timeout", type=float, default=float(os.getenv("TASK_APP_TIMEOUT", "60")))
    return parser.parse_args()


def main() -> None:
    args = _parse_args()
    asyncio.run(run(args))


if __name__ == "__main__":
    main()

@@ -0,0 +1,131 @@
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import os
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    env: dict[str, str] = {}
    if not path.exists():
        raise FileNotFoundError(f".env not found at {path}")
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        k, v = line.split("=", 1)
        env[k.strip()] = v.strip().strip("'").strip('"')
    return env


def write_temp_env(kv: dict[str, str]) -> Path:
    fd, p = tempfile.mkstemp(prefix="modal_secret_", suffix=".env")
    path = Path(p)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for k, v in kv.items():
            fh.write(f"{k}={v}\n")
    return path


def run(cmd: str) -> tuple[int, str]:
    proc = subprocess.run(
        cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def ensure_secret(secret_name: str, kv: dict[str, str]) -> None:
    if not kv:
        print(f"[skip] {secret_name}: no values provided")
        return
    # Prefer passing KEY=VALUE pairs to avoid Typer --env-file bug under some shells
    kv_args = " ".join([f"{shlex.quote(k)}={shlex.quote(v)}" for k, v in kv.items()])

    # Try plain modal first; fallback to uv run modal
    def _create() -> tuple[int, str]:
        return run(f"modal secret create {shlex.quote(secret_name)} {kv_args}")

    def _delete() -> tuple[int, str]:
        return run(f"printf 'y\n' | modal secret delete {shlex.quote(secret_name)}")

    rc, out = _create()
    if rc != 0:
        # Fallback: use uv run modal
        rc_uv, out_uv = run(f"uv run modal secret create {shlex.quote(secret_name)} {kv_args}")
        if rc_uv == 0:
            print(f"[ok] secret ready: {secret_name}")
            return
        # Try delete+create with both variants
        print(f"[info] create failed for {secret_name}, attempting delete+create…")
        _ = _delete()
        rc2, out2 = _create()
        if rc2 != 0:
            _ = run(f"printf 'y\n' | uv run modal secret delete {shlex.quote(secret_name)}")
            rc3, out3 = run(f"uv run modal secret create {shlex.quote(secret_name)} {kv_args}")
            if rc3 != 0:
                print(out3 or out2 or out_uv or out)
                raise RuntimeError(f"failed to create secret {secret_name}")
    print(f"[ok] secret ready: {secret_name}")


def main() -> None:
    ap = argparse.ArgumentParser(
        description="Sync .env keys into Modal secret bundles for the task app"
    )
    ap.add_argument(
        "--env-path", default=str(Path(__file__).parent / ".env"), help="Path to .env with keys"
    )
    args = ap.parse_args()

    env = load_env_file(Path(args.env_path))

    # Secrets used by the task app
    groq_secret = {
        k: v
        for k, v in {
            "GROQ_API_KEY": env.get("GROQ_API_KEY", ""),
            "dev_groq_api_key": env.get("GROQ_API_KEY", ""),
        }.items()
        if v
    }

    openai_secret = {
        k: v
        for k, v in {
            "OPENAI_API_KEY": env.get("OPENAI_API_KEY", ""),
            "dev_openai_api_key": env.get("OPENAI_API_KEY", ""),
        }.items()
        if v
    }

    # Optional: backend key (not mounted by task app today, but useful to keep consistent)
    synth_secret = (
        {"SYNTH_API_KEY": env.get("SYNTH_API_KEY", "")} if env.get("SYNTH_API_KEY") else {}
    )

    env_key = env.get("ENVIRONMENT_API_KEY", "")
    if env_key:
        print(
            "Skipping Modal secret 'crafter-environment-sdk'; the task app now expects "
            "ENVIRONMENT_API_KEY via --env-file so the CLI-minted value stays in sync."
        )
    ensure_secret("groq-api-key", groq_secret)
    ensure_secret("openai-api-key", openai_secret)
    if synth_secret:
        ensure_secret("synth-api-key", synth_secret)

    print("All requested secrets ensured. Redeploy the app if you updated any secrets:")
    print(" uv run modal deploy examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"[error] {type(e).__name__}: {e}")
        sys.exit(1)

@@ -0,0 +1,234 @@
# Crafter Event-Level Rewards (NOTES)

This note outlines how to support event-level reward layering for Crafter across the warming_up_to_rl task app and the monorepo clustered_training RL pipeline.

## Goals
- Attribute reward at decision/step level (per tool call) instead of only using a single trajectory outcome reward.
- Make this behavior controllable via TOML config flags (enable/disable and choose the source/kind of event reward).
- Keep compatibility with existing trajectory-outcome paths; when disabled, the system behaves exactly as before.

## Definitions
- "Decision": one LM tool call (e.g., `interact_many`) and the sequence of environment steps it triggers.
- "Absolute achievement delta" (AchΔ): count of achievements that became true during a decision.
- "Unique achievement delta" (UniqueΔ): count of achievements first unlocked in the episode by a decision.
- "Env sparse reward": the environment’s own per-step reward (e.g., `reward_last_step`).

## What to compute per decision
- From observation before and after the decision:
  - `turned_true = achievements_after − achievements_before`
  - `new_unique = episode_achievements_after − episode_achievements_before`
- Scalars:
  - `ach_delta = len(turned_true)`
  - `unique_delta = len(new_unique)`
- Optional: per-achievement markers for each `a ∈ new_unique` (reward 1.0) for fine-grained shaping.
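
The set differences above can be sketched in Python as follows. This is a minimal illustration, not code from the task app: it assumes each observation exposes achievements as a name-to-flag mapping (Crafter-style counts also work, because only truthiness is checked), and `DecisionDelta` and `compute_decision_delta` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionDelta:
    turned_true: frozenset[str]  # achievements that became true during this decision
    new_unique: frozenset[str]  # achievements first unlocked in the episode by this decision

    @property
    def ach_delta(self) -> int:
        return len(self.turned_true)

    @property
    def unique_delta(self) -> int:
        return len(self.new_unique)

    def markers(self) -> dict[str, float]:
        # Optional per-achievement markers: reward 1.0 for each a in new_unique.
        return {name: 1.0 for name in sorted(self.new_unique)}


def _true_names(flags: Mapping[str, bool]) -> set[str]:
    # Assumption: achievements arrive as a name -> bool (or count) map from the observation.
    return {name for name, unlocked in flags.items() if unlocked}


def compute_decision_delta(
    achievements_before: Mapping[str, bool],
    achievements_after: Mapping[str, bool],
    episode_achievements_before: Mapping[str, bool],
    episode_achievements_after: Mapping[str, bool],
) -> DecisionDelta:
    return DecisionDelta(
        turned_true=frozenset(_true_names(achievements_after) - _true_names(achievements_before)),
        new_unique=frozenset(
            _true_names(episode_achievements_after) - _true_names(episode_achievements_before)
        ),
    )
```

Keeping the two sets rather than only the counts leaves room for the optional per-achievement markers without recomputing anything.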

## Switches/Flags in TOML
Prefer reusing existing RL trainer flags in clustered_training (already present in code):

```
[training]
# Stepwise/event rewards
step_rewards_enabled = true # master switch
step_rewards_mode = "decision_stepwise" # "off" | "decision_stepwise" | "env_sparse"
step_rewards_beta = 0.0 # optional coefficient for time weighting
step_rewards_indicator_lambda = 0.0 # optional coefficient for indicator-based flips

# Crafter-specific selection (proposed extension, optional)
# event_rewards_kind = "unique" # "unique" | "absolute" (if omitted, default to "unique")
```

- `step_rewards_enabled`: enables all event-level aggregation.
- `step_rewards_mode`:
  - `off`: use only trajectory outcome reward (status quo).
  - `decision_stepwise`: use per-decision computed deltas (from policy app or collector), aggregate as returns.
  - `env_sparse`: use the environment’s `reward_last_step` per step.
- `event_rewards_kind` (optional): if present, selects `unique_delta` (default) vs `ach_delta` for `decision_stepwise`.

Warmup task TOML may place these under a `training` or `rollout` section; the launcher just forwards the full TOML blob to the backend, so the monorepo side should read the same keys.

## Warming_up_to_rl task app – producing decision rewards
- In the Crafter policy (or rollout coordinator), for each decision:
  - Compute `ach_delta` and `unique_delta` as above.
  - Attach a compact record to the step metadata, e.g.:
    ```json
    {
      "decision_rewards": {
        "turn": 5,
        "ach_delta": 1,
        "unique_delta": 1,
        "all": ["collect_wood"],
        "unique": ["collect_wood"]
      }
    }
    ```
- When `step_rewards_enabled=false`, omit this block.
- When `step_rewards_mode="env_sparse"`, rely on `reward_last_step` (no decision block required).

Notes:
- The app already records previous tool calls and environment results; this simply adds a small, structured payload per decision (turn).
- If per-step `reward_last_step` is unavailable, `decision_stepwise` remains effective as long as achievements maps are present.
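
One way the task app could emit this record is sketched below. The helper `decision_rewards_block` is hypothetical; it takes the already-computed `turned_true` and `new_unique` name sets and returns `None` whenever the block should be omitted. Skipping the block for `step_rewards_mode = "off"` is an assumption, since the note only specifies the disabled and `env_sparse` cases.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def decision_rewards_block(
    turn: int,
    turned_true: Iterable[str],
    new_unique: Iterable[str],
    *,
    step_rewards_enabled: bool,
    step_rewards_mode: str,
) -> dict[str, Any] | None:
    """Build the per-decision metadata record, or return None when it should be omitted."""
    # Disabled and env_sparse runs need no decision block; "off" is skipped too (assumption).
    if not step_rewards_enabled or step_rewards_mode != "decision_stepwise":
        return None
    # Sort names so the payload is deterministic across runs.
    all_names = sorted(set(turned_true))
    unique_names = sorted(set(new_unique))
    return {
        "decision_rewards": {
            "turn": turn,
            "ach_delta": len(all_names),
            "unique_delta": len(unique_names),
            "all": all_names,
            "unique": unique_names,
        }
    }
```

Called with `turn=5` and `{"collect_wood"}` for both sets, it returns the same structure as the JSON example in the list above.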
|
|
70
|
+
|
|
71
|
+
## Monorepo clustered_training – consuming event rewards
Integration points (based on existing config structure):
- `ClusteredTrainerConfig` already includes:
  - `step_rewards_enabled: bool`
  - `step_rewards_mode: str` (off | decision_stepwise)
  - `step_rewards_beta: float`
  - `step_rewards_indicator_lambda: float`

Collector changes (conceptual):
1. During trajectory collection, build a vector `r_t` of per-time-step rewards:
   - If `step_rewards_mode == "decision_stepwise"`:
     - For time step `t` corresponding to a decision, set:
       - `r_t = unique_delta` if `event_rewards_kind=="unique"` (default), else `r_t = ach_delta`.
     - For non-decision steps, `r_t = 0.0` (unless you prefer to spread rewards over sub-steps; keep simple attribution by default).
   - If `step_rewards_mode == "env_sparse"`:
     - For each environment step, set `r_t = reward_last_step`.
   - Else (`off`):
     - Use a single scalar outcome reward at the end (status quo).

2. Compute returns/advantages as usual, summing event rewards:
   - For GRPO/GRPO-Ludic, the typical group-based advantage calculation remains unchanged; only the reward signal changes from a single scalar to a sequence `[r_1, …, r_T]`.
   - Optional time weighting: `r_t ← r_t + beta * (T − t) * indicator_flip_t`, where `beta` is `step_rewards_beta` and `indicator_flip_t` is 1 if any unique achievement flipped at `t`, else 0. Separately, `step_rewards_indicator_lambda` adds a flat bonus at any decision with `unique_delta > 0` (as in the pseudo-code below).
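
The following sketch makes "the reward signal changes from a single scalar to a sequence" concrete. It turns `r` into returns-to-go and normalizes each trajectory's total return within its group (mean/std, GRPO-style). It is an assumption about what "as usual" looks like, not the backend's advantage code. `gamma`, `eps`, and both function names are illustrative.

```python
import statistics


def returns_to_go(r: list[float], gamma: float = 1.0) -> list[float]:
    """Per-step return G_t = r_t + gamma * G_{t+1}, accumulated right to left."""
    out = [0.0] * len(r)
    running = 0.0
    for t in range(len(r) - 1, -1, -1):
        running = r[t] + gamma * running
        out[t] = running
    return out


def group_advantages(group: list[list[float]], eps: float = 1e-8) -> list[float]:
    """Normalize each trajectory's total return by its group's mean and std."""
    totals = [sum(r) for r in group]
    mean = statistics.fmean(totals)
    std = statistics.pstdev(totals)  # population std; 0.0 if all totals match
    return [(g - mean) / (std + eps) for g in totals]
```

With `step_rewards_mode = "off"`, `r` has a single nonzero entry, so each total equals the old scalar outcome reward. Stepwise modes change what is summed into each total and give every step its own return-to-go.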

Pseudo-code (collector side):
```python
def build_step_rewards(cfg, T, decision_events, env_steps, trajectory_outcome_reward):
    """Return the per-time-step reward vector r (length T) for one trajectory."""
    r = [0.0] * T
    # Disabled flags fall back to the outcome-only path.
    mode = cfg.step_rewards_mode if cfg.step_rewards_enabled else "off"
    event_kind = getattr(cfg, "event_rewards_kind", "unique")  # default: unique deltas
    if mode == "decision_stepwise":
        for ev in decision_events:  # each with fields {turn, ach_delta, unique_delta}
            idx = ev["turn"] - 1  # turns are 1-based; r is 0-based
            base = ev["unique_delta"] if event_kind == "unique" else ev["ach_delta"]
            r[idx] += float(base)
            if cfg.step_rewards_indicator_lambda > 0 and ev["unique_delta"] > 0:
                r[idx] += float(cfg.step_rewards_indicator_lambda)
    elif mode == "env_sparse":
        for t, step in enumerate(env_steps):
            # `or 0.0` also covers an explicit None
            r[t] += float(step.get("reward_last_step") or 0.0)
    else:  # "off": single scalar outcome reward at the end (status quo)
        r[-1] += float(trajectory_outcome_reward)
    return r
```
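
The pseudo-code above leaves out the optional `step_rewards_beta` time weighting. Below is a minimal sketch of it as a separate pass over `r`. It assumes a per-step `unique_deltas` list (0 on non-decision steps) is available, and the helper name is hypothetical. It uses the 0-based form `r[t] += beta * (T - t) * indicator_flip_t` from the implementation checklist, so a flip on the final step still earns `beta * 1`.

```python
def apply_time_weighting(
    r: list[float], unique_deltas: list[int], beta: float
) -> list[float]:
    """Add beta * (T - t) at each step t where a unique achievement flipped."""
    T = len(r)
    shaped = list(r)  # leave the caller's vector untouched
    for t, delta in enumerate(unique_deltas):
        indicator_flip = 1 if delta > 0 else 0
        shaped[t] += beta * (T - t) * indicator_flip
    return shaped
```

Take `beta = 0.5`, `r = [0.0, 1.0, 0.0, 1.0]`, and unique flips at steps 1 and 3. The shaped vector is `[0.0, 2.5, 0.0, 1.5]`, so earlier first-time unlocks receive the larger bonus.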

## Respecting the TOML switch
- The warming_up_to_rl launcher (`run_rl_and_save.py`) forwards the entire TOML to the backend.
- clustered_training should read `[training].step_rewards_enabled` and `[training].step_rewards_mode` (and optionally `event_rewards_kind`) inside its config loader; the `step_rewards_*` fields already exist in `ClusteredTrainerConfig`.
- When disabled, the collector must not attempt to parse or rely on any per-decision metadata.

## Debugging & metrics
- Log per-trajectory aggregates: `ΣAchΔ`, `ΣUniqueΔ`, and a breakdown by decision turn (already added to the Groq rollout table in research). These can be mirrored in the backend logs for quick checks.
- Add simple counters to training logs:
  - number of decisions with `unique_delta>0`
  - sum of deltas per batch
  - share of batches with nonzero event rewards
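
The counters above could be computed per batch along these lines. This sketch assumes each trajectory contributes a list of `decision_rewards` payloads (`turn`, `ach_delta`, `unique_delta`). The function and key names are illustrative, not existing log fields.

```python
def batch_event_stats(batch: list[list[dict]]) -> dict:
    """Counters for one batch; each trajectory is a list of decision_rewards dicts."""
    records = [rec for traj in batch for rec in traj]
    return {
        "decisions_with_unique": sum(1 for rec in records if rec["unique_delta"] > 0),
        "sum_ach_delta": sum(rec["ach_delta"] for rec in records),
        "sum_unique_delta": sum(rec["unique_delta"] for rec in records),
    }


def share_nonzero_batches(all_stats: list[dict]) -> float:
    """Fraction of batches whose event rewards were not all zero."""
    if not all_stats:
        return 0.0
    nonzero = sum(1 for s in all_stats if s["sum_ach_delta"] or s["sum_unique_delta"])
    return nonzero / len(all_stats)
```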

## Backward compatibility
- When flags are off, the pipeline uses trajectory outcome rewards only.
- No schema migrations are required; event-level metadata is optional.

## Recommended defaults
- `step_rewards_enabled = true`
- `step_rewards_mode = "decision_stepwise"`
- Prefer `unique` deltas for better credit assignment: set `event_rewards_kind = "unique"` (if that key is adopted) or rely on the default, which already uses unique deltas.

The following file-by-file implementation checklist is scoped so that another engineer can implement the feature from it alone.

Warming_up_to_rl (task app) – record decision rewards and honor flags
- Config examples (ensure flags present and documented)
  - `examples/warming_up_to_rl/configs/*.toml`
    - Add under `[training]`:
      - `step_rewards_enabled = true|false`
      - `step_rewards_mode = "off" | "decision_stepwise" | "env_sparse"`
      - Optional: `event_rewards_kind = "unique" | "absolute"`
      - Optional shaping: `step_rewards_beta`, `step_rewards_indicator_lambda`

- Policy (compute ach/unique deltas per decision; emit into step metadata when enabled)
  - `examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py`
    - Before/after each tool call sequence, compute:
      - `ach_delta = len(achievements_after − achievements_before)`
      - `unique_delta = len((episode_achievements_after) − (episode_achievements_before))`
    - When `[training].step_rewards_enabled` is true and `step_rewards_mode == "decision_stepwise"`:
      - Attach to the step’s returned metadata:
        - `decision_rewards = { turn, ach_delta, unique_delta, all: [...], unique: [...] }`
    - If `step_rewards_mode == "env_sparse"`, do not emit `decision_rewards` (leave environment’s `reward_last_step` as the only per-step reward).
    - Respect clipping for long “Previous tool calls” context (already added; keep).

- Policy routes (surface flags to policy; store on policy instance or in request metadata)
  - `examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py`
    - Accept training flags from create/init endpoints (if provided via config).
    - Pass through/attach the flags into the policy or per-step metadata so `policy.step(...)` can read them.

- Rollout coordinator (guarantee metadata flows out with each step)
  - `examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py`
    - Ensure the step response returned to the caller includes `decision_rewards` when set by the policy.
    - No compute here; just propagate metadata.

- Environment adapter (ensure observation has fields needed by the deltas)
  - `examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py`
    - Confirm each step response includes `observation.achievements_status` and `observation.reward_last_step`.
    - No reward computation changes here; just guarantee the fields exist.
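
The rollout coordinator and environment adapter items can be sketched together as a step-response builder. It checks the two required observation fields and passes `decision_rewards` through untouched. The response shape (`observation` plus `metadata`) and the function name are assumptions, not the task app's actual schema.

```python
REQUIRED_OBS_FIELDS = ("achievements_status", "reward_last_step")


def build_step_response(observation: dict, policy_meta: dict) -> dict:
    """Assemble a step response; no reward computation happens here."""
    missing = [f for f in REQUIRED_OBS_FIELDS if f not in observation]
    if missing:
        raise KeyError(f"observation is missing fields: {missing}")
    response: dict = {"observation": observation, "metadata": {}}
    # Propagate the policy's payload only when it was emitted.
    if "decision_rewards" in policy_meta:
        response["metadata"]["decision_rewards"] = policy_meta["decision_rewards"]
    return response
```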

Monorepo (clustered training, GSPO/GRPO) – use decision/env-sparse rewards to build per-step returns
- Config loader (read flags; default behavior preserved)
  - `backend/app/routes/clustered_training/core/algorithms/gspo/training/clustered_trainer.py`
    - In `ClusteredTrainerConfig.from_dict(...)`:
      - Already present: `step_rewards_enabled`, `step_rewards_mode`, `step_rewards_beta`, `step_rewards_indicator_lambda`.
      - Add (optional) read: `event_rewards_kind` with default `"unique"` if not present.

- Collector/rollout trajectory builder (construct r_t per episode)
  - The module that converts environment/policy step records into trajectories (collector). If it’s split, cover the point where step arrays are built just before advantage computation.
  - New logic:
    - Initialize `r = [0.0] * T`.
    - If `step_rewards_enabled`:
      - If `step_rewards_mode == "decision_stepwise"`:
        - For each step metadata with `decision_rewards`:
          - `idx = turn - 1`
          - `base = unique_delta` if `event_rewards_kind == "unique"` else `ach_delta`
          - `r[idx] += float(base)`
          - If `step_rewards_indicator_lambda > 0` and `unique_delta > 0`, `r[idx] += step_rewards_indicator_lambda`
      - Else if `step_rewards_mode == "env_sparse"`:
        - For each step, `r[t] += float(observation.reward_last_step or 0.0)`
      - Else (`off`): `r[-1] += float(outcome_reward)`
    - Optional shaping: `r[t] += step_rewards_beta * (T - t) * indicator_flip_t` where `indicator_flip_t = 1` if the step had `unique_delta > 0`, else 0.
    - Ensure this path does not run when flags are off; old outcome-only behavior remains.

- Advantage/returns computation (no API change; just consume r)
  - The function/module that currently builds returns/advantages from rewards.
  - No interface changes; ensure it takes `r` from the collector path above instead of a single scalar outcome reward when event rewards are enabled.

- Logging/metrics (help ops confirm it’s working)
  - Add counters in the training loop logs:
    - Sum of `r` per batch (stepwise mode).
    - Count of decisions with `unique_delta > 0`.
    - Mode/flags echoed on startup.

- RL configs (dev example TOMLs with flags)
  - `backend/app/routes/clustered_training/dev/configs/crafter_online.toml`
    - Add the `[training]` keys above with comments showing choices.
  - Any job start scripts that inline TOML (e.g. `tests/applications/crafter/rl/start_qwen_full_clustered.py` if used)
    - Ensure they don’t strip the new keys; no code change needed if they pass through the TOML.
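
The collector's input side might look like the following sketch: it pulls `decision_rewards` payloads and `reward_last_step` values out of step records. The record layout (`metadata.decision_rewards`, `observation.reward_last_step`) is an assumption based on the fields named in this checklist. The helper names are hypothetical.

```python
from typing import Any


def extract_decision_events(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect `decision_rewards` payloads, skipping steps that have none."""
    events = []
    for step in steps:
        payload = (step.get("metadata") or {}).get("decision_rewards")
        if payload is not None:
            events.append(payload)
    return events


def extract_env_sparse_rewards(steps: list[dict[str, Any]]) -> list[float]:
    """Per-step env rewards; a missing or None `reward_last_step` counts as 0.0."""
    return [
        float((step.get("observation") or {}).get("reward_last_step") or 0.0)
        for step in steps
    ]
```

The outputs line up with the decision events and per-step rewards consumed by the collector logic above. The `or 0.0` mirrors the checklist's handling of a missing or null `reward_last_step`.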

Research (optional reference; not required for GSPO)
- Reference rollout script demonstrating decision-delta computation
  - `research/testing/crafter/eval_rollout_table_groq.py`
    - Already computes/prints per-decision deltas; use as validation aid (no further changes required for GSPO).

Docs/notes (keep implementers aligned)
- Warming up to RL notes
  - `examples/warming_up_to_rl/event_rewards.md`
    - Already describes flags and expectations; keep this in sync if any naming changes happen.

- Research spec
  - `research/testing/crafter/event_rewards.txt`
    - Already contains the full design and the “recording AND using stepwise rewards” plan.

Sanity checklist (engineer can validate with these)
- With `[training].step_rewards_enabled=false`: identical behavior to today (only outcome reward used).
- With `decision_stepwise`:
  - The task app emits `decision_rewards` per decision (check one trajectory).
  - The collector constructs `r_t` from `unique_delta` (or `ach_delta` if configured).
  - Training logs show nonzero stepwise batch reward sums.
- With `env_sparse`:
  - No decision payload; rewards come strictly from `reward_last_step`.
- Switching `event_rewards_kind` between `"unique"` and `"absolute"` changes which scalar lands in r at a decision turn.
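
The payload checks above can be automated against a single exported trajectory. The sketch below assumes step records shaped as `{"observation": {...}, "metadata": {...}}`, which is an assumption rather than a documented schema. `check_trajectory` is a hypothetical helper that returns a list of problems.

```python
def check_trajectory(steps: list[dict], mode: str) -> list[str]:
    """Return a list of problems; an empty list means the trajectory matches `mode`."""
    has_payload = ["decision_rewards" in (step.get("metadata") or {}) for step in steps]
    problems = []
    if mode == "decision_stepwise" and not any(has_payload):
        problems.append("no decision_rewards found in decision_stepwise mode")
    if mode in ("off", "env_sparse") and any(has_payload):
        problems.append(f"unexpected decision_rewards in {mode} mode")
    if mode == "env_sparse":
        # env_sparse relies entirely on the environment's per-step reward field.
        missing = [
            i for i, step in enumerate(steps)
            if "reward_last_step" not in (step.get("observation") or {})
        ]
        if missing:
            problems.append(f"reward_last_step missing at steps {missing}")
    return problems
```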
# Crafter Task App Ops Cheatsheet

## Discover available task apps
- `uvx synth-ai task-app list`
  - Lists the registered apps plus any aliases (e.g. `grpo-crafter`, `crafter`).

## Run locally with uvicorn
- Launch the FastAPI server:
  - `uvx synth-ai serve grpo-crafter --port 8010 --force`
    - `--force` frees the port if a previous run is still bound.
    - Add `--reload` while iterating on code.
- Enable tracing + SFT dumps while serving:
  - `uvx synth-ai serve grpo-crafter --port 8010 --force --trace ./traces --trace-db ./traces/v3/synth_ai.db`
    - `--trace` writes JSONL trajectories into the folder.
    - `--trace-db` points at the SQLite/Turso-compatible tracing DB (defaults to `traces/v3/synth_ai.db`).

## Modal hot-reload (`modal serve`)
- Run the hosted app locally inside Modal’s hot-reload loop:
  - `uvx synth-ai task-app modal-serve grpo-crafter --env-file .env`
    - The CLI prompts for a `.env` file if one isn't supplied; secrets are loaded via `Secret.from_dotenv`.
    - Keeps watching the repo for changes and streams logs in your terminal.

## Modal deploy (persistent endpoint)
- Build + deploy to the `modal deploy` target:
  - `uvx synth-ai task-app deploy grpo-crafter --env-file .env`
    - Use `--dry-run` first to inspect the generated `modal deploy …` command.
    - `--modal-cli` lets you point at a non-default Modal binary if needed.

## Collecting traces & rollouts
- Local rollouts against a running server with full trace payloads:
  - `uv run python examples/warming_up_to_rl/run_local_rollout_traced.py --api-key "$ENVIRONMENT_API_KEY" --base-url http://localhost:8010 --model gpt-4o-mini --trace-format full --trace-path ./trace_full.json`
    - This script prints a reward summary, dumps the trace JSON, and warns if episode returns don’t line up with event rewards.
- Remote rollouts against a deployed Modal endpoint:
  - `uv run python examples/warming_up_to_rl/run_rollout_remote.py --base-url https://<modal-app-url> --api-key "$ENVIRONMENT_API_KEY" --model gpt-4o-mini --max-llm-calls 10`

## Trace analytics
- Summarise model usage, reward breakdowns, and achievement histograms:
  - `uv run python examples/warming_up_to_rl/analyze_trace_db.py --db traces/v3/synth_ai.db`
    - Output includes per-model achievement tallies and episode reward stats.

## Exporting behavioural-cloning datasets
- Filter sessions via model, achievements, rewards, etc., then export JSONL:
  - `uv run python examples/warming_up_to_rl/export_trace_sft.py \`
    ` --db traces/v3/synth_ai.db \`
    ` --output traces/qwen32b_filtered.jsonl \`
    ` --model qwen/qwen3-32b \`
    ` --exclude-achievement collect_sapling \`
    ` --exclude-achievement collect_drink \`
    ` --min-unique 3 \`
    ` --event-reward unique_achievement_delta:1.0 \`
    ` --limit 100`
  - `--exclude-achievement` makes it easy to ignore easier unlocks when enforcing `--min-unique`.
  - Combine `--require-achievement`, `--min-outcome-reward`, or provider filters as needed.
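
The interaction between `--exclude-achievement`, `--min-unique`, and `--require-achievement` can be pictured with this Python sketch. It assumes excluded achievements simply stop counting toward the minimum and that required ones must all be present. It illustrates the described filter semantics only and is not the exporter's code; `passes_unique_filter` is hypothetical.

```python
def passes_unique_filter(
    unlocked: set[str],
    min_unique: int,
    exclude: frozenset[str] = frozenset(),
    require: frozenset[str] = frozenset(),
) -> bool:
    """Keep a session only if it has every required achievement and enough non-excluded ones."""
    if not require <= unlocked:
        return False
    counted = unlocked - exclude  # excluded unlocks don't count toward min_unique
    return len(counted) >= min_unique
```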

## Training jobs (RL + SFT)
- `uvx synth-ai train` is the consolidated entry point for RL or SFT launches.
  - Omit `--config` to let the CLI enumerate candidate TOMLs (RL + FFT) and pick interactively.
  - Omit `--env-file` to browse available `.env` files; the CLI never auto-selects.
  - Missing secrets trigger an interactive loop: enter manually, switch `.env`, or fetch from Modal (secrets/apps) before proceeding.
- RL run (local backend + local task app):
  - `uvx synth-ai train --type rl --config examples/warming_up_to_rl/configs/crafter_cluster.toml --backend http://localhost:8000/api --task-url http://localhost:8010`
    - Performs task-app health checks using the resolved `ENVIRONMENT_API_KEY` before posting to `/rl/jobs`.
    - Polls job status until terminal unless `--no-poll` is supplied.
- SFT run (FFT fine-tune):
  - `uvx synth-ai train --type sft --config examples/warming_up_to_rl/configs/fft_crafter.toml --dataset traces/crafter_sft.jsonl`
    - Uploads training/validation JSONL to `/learning/files` and starts the job.
    - Poll output mirrors the legacy `run_fft_and_save.py` script.
- Common flags:
  - `--dry-run` previews payloads/uploads without making requests.
  - `--idempotency` sets the `Idempotency-Key` header for RL submissions.
  - `--poll-timeout` / `--poll-interval` tune the backend polling cadence.

> Tip: all `uvx synth-ai …` subcommands accept `--help` if you need to inspect additional options on the fly.

# Warming Up to RL (Crafter)

This folder contains an end-to-end Crafter workflow: stand up the task app, collect Groq-powered rollouts, export tracing data for supervised fine-tuning, run FFT/RL jobs, and evaluate checkpoints. Commands assume the repository root as the working directory unless stated otherwise.

## 1. Prerequisites

- Python 3.11+
- [`uv`](https://docs.astral.sh/uv/) / `uvx` (or install `synth-ai` inside a virtualenv)
- Modal CLI (`modal token new`) if you plan to deploy the task app remotely
- API keys:
  - `SYNTH_API_KEY` and `ENVIRONMENT_API_KEY` are required for CLI flows
  - `GROQ_API_KEY` (used by the Groq policy) and optional `OPENAI_API_KEY`
- Run `uvx synth-ai setup` once to pair with the Synth dashboard and populate `~/.synth-ai/user_config.json`

## 2. Task App

### Local serve (FastAPI)

```bash
uvx synth-ai serve \
  --env-file examples/warming_up_to_rl/.env \
  --host 127.0.0.1 --port 8001 \
  --trace traces/v3
```

- `--trace` creates/uses `traces/v3/task_app_traces_<timestamp>.db` for the lifetime of the server. All rollouts append to this file.
- Add `--trace-db` to override the SQLite path (one DB per server instance).
- Pass `--reload` during development for auto-reload.

### Modal deploy / serve

```bash
uvx synth-ai deploy grpo-crafter --name grpo-crafter-task-app
uvx synth-ai modal-serve grpo-crafter --name grpo-crafter-task-app
```

Both commands reuse the same tracing defaults; the backend persists rollouts into the configured SQLite/Turso store.

## 3. Collect rollouts

Hit the running task app with the local helper to gather a traced rollout (Groq policy shown below):

```bash
python examples/warming_up_to_rl/run_local_rollout_traced.py \
  --base-url http://localhost:8001 \
  --api-key "$ENVIRONMENT_API_KEY" \
  --inference-api-key "$GROQ_API_KEY" \
  --model qwen/qwen3-32b \
  --inference-url https://api.groq.com/openai \
  --max-llm-calls 3 \
  --run-id local-trace
```

Artifacts produced per rollout:
- `traces/v3/task_app_traces_<timestamp>.db`: the task app’s append-only database (one per server lifetime; new rollouts append rows).
- `local-trace_trace.json`: single-run JSON snapshot for inspection.

## 4. Export SFT-ready data

```bash
python examples/warming_up_to_rl/export_trace_sft.py
```

- When run without `--in`, the script lists every `task_app_traces*.db` under the current directory (and subdirectories), sorted by recency, and prompts you to pick one (the newest is marked `← most recent`).
- The exporter validates the trace data, filters sessions, and writes JSONL to `ft_data/crafter_sft.jsonl` by default (override with `--out`).
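
A minimal sketch of the discovery step described above follows. It recursively globs for `task_app_traces*.db`, sorts newest first, and tags the first entry. Using file modification time as "recency" is an assumption; the exporter may order by the timestamp in the filename instead. Both function names are hypothetical.

```python
from pathlib import Path


def discover_trace_dbs(root: Path) -> list[Path]:
    """Find task_app_traces*.db files under `root`, newest first by modification time."""
    return sorted(
        root.rglob("task_app_traces*.db"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )


def format_choices(paths: list[Path]) -> list[str]:
    """Number the candidates and tag the newest one, as in the exporter's prompt."""
    return [
        f"{i}. {p}" + ("  ← most recent" if i == 1 else "")
        for i, p in enumerate(paths, start=1)
    ]
```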

## 5. FFT / SFT Training

Recommended via CLI:

```bash
uvx synth-ai train \
  --type sft \
  --config examples/warming_up_to_rl/configs/crafter_fft.toml \
  --dataset /absolute/path/to/crafter_sft.jsonl
```

The CLI uploads training data, submits the job to the Synth backend, and polls for completion. A legacy helper (`run_fft_and_save.py`) is still provided for ad-hoc usage.

## 6. Evaluate checkpoints

Update the relevant TOML with the model identifier (e.g., `model = "ft:<model_id>"`) and run:

```bash
uv run python examples/warming_up_to_rl/run_eval.py \
  --toml examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml \
  --use-rollout
```

`--use-rollout` exercises the `/rollout` endpoint so achievements/rewards are surfaced in traces.

## 7. RL Training

```bash
uvx synth-ai train \
  --type rl \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml
```

Start from `rl_from_ft.toml` if you want to bootstrap from a previously fine-tuned checkpoint.

---

### Notes on tracing

- **One SQLite DB per server:** every task app instance maintains a single `task_app_traces_<timestamp>.db` and appends each new rollout. If you want a fresh file, start another `synth-ai serve` with a different `--trace-db` path.
- **JSON snapshots per run:** `run_local_rollout_traced.py` writes `<run_id>_trace.json` so you can inspect or hand-edit individual runs.
- **Exporter discovery:** the SFT exporter recursively catalogs all `task_app_traces*.db` files beneath the task app directory, allowing you to select any historical snapshot when exporting training data.

These conventions keep tracing predictable: continuous history per server, easy selection of historical DBs, and one-off JSON exports for quick analysis.