synth-ai 0.2.8.dev4__py3-none-any.whl → 0.2.23.dev3__py3-none-any.whl
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- examples/README.md +1 -0
- examples/__init__.py +16 -0
- examples/analyze_semantic_words.sh +17 -0
- examples/baseline/banking77_baseline.py +243 -0
- examples/baseline/banking77_pipeline_baseline.py +294 -0
- examples/baseline/crafter_baseline.py +407 -0
- examples/baseline/pokemon_red_baseline.py +326 -0
- examples/baseline/simple_baseline.py +56 -0
- examples/baseline/warming_up_to_rl_baseline.py +239 -0
- examples/blog_posts/gepa/README.md +355 -0
- examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
- examples/blog_posts/gepa/configs/banking77_gepa_test.toml +80 -0
- examples/blog_posts/gepa/configs/banking77_mipro_local.toml +50 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml +101 -0
- examples/blog_posts/gepa/configs/banking77_pipeline_gepa_test.toml +96 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/hover_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/hover_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +57 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +35 -0
- examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +51 -0
- examples/blog_posts/gepa/configs/pupa_gepa_local.toml +58 -0
- examples/blog_posts/gepa/configs/pupa_mipro_local.toml +52 -0
- examples/blog_posts/gepa/deploy_banking77_task_app.sh +54 -0
- examples/blog_posts/gepa/gepa_baseline.py +204 -0
- examples/blog_posts/gepa/query_prompts_example.py +97 -0
- examples/blog_posts/gepa/run_gepa_banking77.sh +112 -0
- examples/blog_posts/gepa/run_gepa_banking77_pipeline.sh +163 -0
- examples/blog_posts/gepa/task_apps.py +105 -0
- examples/blog_posts/gepa/test_gepa_local.sh +67 -0
- examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
- examples/blog_posts/mipro/README.md +415 -0
- examples/blog_posts/mipro/configs/banking77_mipro_local.toml +91 -0
- examples/blog_posts/mipro/configs/banking77_mipro_test.toml +87 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gemini_flash_lite_local.toml +98 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_gpt41mini_local.toml +96 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_local.toml +94 -0
- examples/blog_posts/mipro/configs/banking77_pipeline_mipro_test.toml +170 -0
- examples/blog_posts/mipro/deploy_banking77_pipeline_task_app.sh +59 -0
- examples/blog_posts/mipro/deploy_banking77_task_app.sh +41 -0
- examples/blog_posts/mipro/multi_step.md +79 -0
- examples/blog_posts/mipro/run_mipro_banking77.sh +191 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline.sh +171 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gemini_flash_lite.sh +177 -0
- examples/blog_posts/mipro/run_mipro_banking77_pipeline_gpt41mini.sh +173 -0
- examples/blog_posts/mipro/verify_banking77_setup.sh +117 -0
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/pokemon_vl/extract_images.py +239 -0
- examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
- examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
- examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
- examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
- examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
- examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
- examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
- examples/crafter_debug_render.py +186 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +45 -0
- examples/gepa/banking77_pipeline_gepa.toml +96 -0
- examples/gepa/multi_stage_gepa_example.toml +84 -0
- examples/gepa/run_gepa_banking77_pipeline.sh +157 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/README_verilog_rl.md +77 -0
- examples/multi_step/configs/VERILOG_REWARDS.md +103 -0
- examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +196 -0
- examples/multi_step/configs/crafter_eval_synth_qwen4b.toml +35 -0
- examples/multi_step/configs/crafter_eval_text_only_groq_qwen32b.toml +36 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +75 -0
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +145 -0
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +84 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +79 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/configs/crafter_synth_backend.md +40 -0
- examples/multi_step/configs/verilog_eval_groq_qwen32b.toml +31 -0
- examples/multi_step/configs/verilog_eval_synth_qwen8b.toml +33 -0
- examples/multi_step/configs/verilog_rl_lora.toml +147 -0
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/crafter_rl_lora.md +70 -0
- examples/multi_step/judges/crafter_backend_judge.py +220 -0
- examples/multi_step/judges/verilog_backend_judge.py +234 -0
- examples/multi_step/readme.md +48 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/multi_step/sse_metrics_streaming_notes.md +357 -0
- examples/multi_step/task_app_config_notes.md +494 -0
- examples/multi_step/verilog_rl_lora.md +218 -0
- examples/qwen_coder/README.md +102 -0
- examples/qwen_coder/_shared.py +113 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +60 -0
- examples/qwen_coder/configs/coder_lora_4b.toml +61 -0
- examples/qwen_coder/configs/coder_lora_small.toml +57 -0
- examples/qwen_coder/generate_dataset.py +98 -0
- examples/qwen_coder/infer_ft_smoke.py +65 -0
- examples/qwen_coder/infer_prod_proxy.py +73 -0
- examples/qwen_coder/infer_via_synth.py +87 -0
- examples/qwen_coder/scripts/infer_coder.sh +19 -0
- examples/qwen_coder/scripts/train_coder_30b.sh +22 -0
- examples/qwen_coder/sft_full_17b.py +103 -0
- examples/qwen_coder/sft_lora_30b.py +110 -0
- examples/qwen_coder/subset_jsonl.py +39 -0
- examples/qwen_coder/todos.md +38 -0
- examples/qwen_coder/validate_jsonl.py +60 -0
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +152 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +274 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +415 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +61 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +169 -0
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +62 -0
- examples/rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/download_dataset.py +80 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +21 -0
- examples/rl/task_app/math_single_step.py +990 -0
- examples/rl/task_app/math_task_app.py +111 -0
- examples/run_crafter_demo.sh +10 -0
- examples/sdk_prompt_learning_example.py +55 -0
- examples/sft/README.md +139 -0
- examples/sft/configs/crafter_fft_qwen0p6b.toml +49 -0
- examples/sft/configs/crafter_lora_qwen0p6b.toml +49 -0
- examples/sft/evaluate.py +117 -0
- examples/sft/export_dataset.py +120 -0
- examples/sft/generate_traces.py +164 -0
- examples/swe/__init__.py +12 -0
- examples/swe/task_app/README.md +135 -0
- examples/swe/task_app/__init__.py +2 -0
- examples/swe/task_app/grpo_swe_mini.py +604 -0
- examples/swe/task_app/grpo_swe_mini_task_app.py +124 -0
- examples/swe/task_app/hosted/README.md +173 -0
- examples/swe/task_app/hosted/__init__.py +5 -0
- examples/swe/task_app/hosted/branching.py +143 -0
- examples/swe/task_app/hosted/environment_routes.py +1289 -0
- examples/swe/task_app/hosted/envs/__init__.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/__init__.py +6 -0
- examples/swe/task_app/hosted/envs/crafter/app.py +1 -0
- examples/swe/task_app/hosted/envs/crafter/environment.py +522 -0
- examples/swe/task_app/hosted/envs/crafter/policy.py +478 -0
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +108 -0
- examples/swe/task_app/hosted/envs/crafter/shared.py +305 -0
- examples/swe/task_app/hosted/envs/crafter/tools.py +47 -0
- examples/swe/task_app/hosted/envs/mini_swe/__init__.py +8 -0
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +1191 -0
- examples/swe/task_app/hosted/envs/mini_swe/policy.py +355 -0
- examples/swe/task_app/hosted/envs/mini_swe/shared.py +83 -0
- examples/swe/task_app/hosted/envs/mini_swe/tools.py +96 -0
- examples/swe/task_app/hosted/hosted_app.py +204 -0
- examples/swe/task_app/hosted/inference/__init__.py +5 -0
- examples/swe/task_app/hosted/inference/openai_client.py +584 -0
- examples/swe/task_app/hosted/main.py +100 -0
- examples/swe/task_app/hosted/policy_routes.py +1094 -0
- examples/swe/task_app/hosted/registry.py +195 -0
- examples/swe/task_app/hosted/rollout.py +1905 -0
- examples/swe/task_app/hosted/storage/__init__.py +5 -0
- examples/swe/task_app/hosted/storage/volume.py +211 -0
- examples/swe/task_app/hosted/test_agents.py +161 -0
- examples/swe/task_app/hosted/test_service.py +136 -0
- examples/swe/task_app/hosted/utils.py +62 -0
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/IMAGE_ONLY_EVAL_QUICKSTART.md +258 -0
- examples/task_apps/TESTING.md +275 -0
- examples/task_apps/banking77/__init__.py +6 -0
- examples/task_apps/banking77/banking77_task_app.py +912 -0
- examples/task_apps/banking77/deploy_wrapper.py +46 -0
- examples/task_apps/banking77_pipeline/__init__.py +6 -0
- examples/task_apps/banking77_pipeline/banking77_pipeline_task_app.py +489 -0
- examples/task_apps/banking77_pipeline/deploy_wrapper.py +50 -0
- examples/task_apps/crafter/CREATE_SFT_DATASET.md +286 -0
- examples/task_apps/crafter/EVAL_IMAGE_ONLY_RESULTS.md +152 -0
- examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +187 -0
- examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +281 -0
- examples/task_apps/crafter/QUERY_EXAMPLES.md +203 -0
- examples/task_apps/crafter/README_IMAGE_ONLY_EVAL.md +316 -0
- examples/task_apps/crafter/eval_image_only_gpt4o.toml +28 -0
- examples/task_apps/crafter/eval_text_only_groq_llama.toml +36 -0
- examples/task_apps/crafter/filter_sft_dataset.toml +16 -0
- examples/task_apps/crafter/task_app/README.md +42 -0
- examples/task_apps/crafter/task_app/__init__.py +5 -0
- examples/task_apps/crafter/task_app/grpo_crafter.py +1055 -0
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +146 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/README.md +173 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/branching.py +143 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/environment.py +532 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +583 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +122 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +999 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/main.py +100 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +1252 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/registry.py +195 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +2233 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/test_service.py +136 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +411 -0
- examples/task_apps/dev/pokemon_emerald/__init__.py +2 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/README.md +811 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/__init__.py +120 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/action.py +160 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/memory.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/perception.py +69 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/planning.py +96 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/simple.py +1502 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/agent/system_prompt.py +4 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/grab_map.py +68 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/manual.py +216 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/__init__.py +35 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emerald_utils.py +631 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/emulator.py +1544 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/enums.py +1428 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/memory_reader.py +4848 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/types.py +41 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pokemon_env/utils.py +298 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/pyproject.toml +95 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/run.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/app.py +2152 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/client.py +429 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server/frame_server.py +155 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/README.md +78 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/run_tests.py +122 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_direct.py +76 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_agent_prompts.py +413 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_battle_state_formatting.py +204 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection.py +133 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_dialogue_detection_comprehensive.py +229 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_direct_agent_emulator.py +300 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_fps_adjustment_pytest.py +205 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_direct.py +200 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_house_to_outside_transition.py +284 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_map_ground_truth_comparison.py +468 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_memory_map.py +575 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_server_map_validation.py +311 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests/test_torchic_state.py +259 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/anticheat.py +372 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/checkpoint.py +296 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/error_handler.py +275 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/get_local_ip.py +22 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/helpers.py +44 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/llm_logger.py +514 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_formatter.py +415 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher.py +1763 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_stitcher_singleton.py +33 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_trimmer.py +106 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/map_visualizer.py +334 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/ocr_dialogue.py +1020 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/recording.py +188 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/state_formatter.py +1481 -0
- examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils/vlm.py +862 -0
- examples/task_apps/dev/pokemon_emerald/modal_app.py +114 -0
- examples/task_apps/dev/pokemon_emerald/task_app/README.md +81 -0
- examples/task_apps/dev/pokemon_emerald/task_app/__init__.py +6 -0
- examples/task_apps/dev/pokemon_emerald/task_app/pokemon_emerald.py +685 -0
- examples/task_apps/enron/__init__.py +2 -0
- examples/task_apps/enron/eval_groq_qwen32.toml +16 -0
- examples/task_apps/enron/filter_sft.toml +5 -0
- examples/task_apps/enron/task_app/README.md +14 -0
- examples/task_apps/enron/task_app/__init__.py +1 -0
- examples/task_apps/enron/task_app/grpo_enron.py +906 -0
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +146 -0
- examples/task_apps/enron/tests/__init__.py +4 -0
- examples/task_apps/enron/tests/conftest.py +115 -0
- examples/task_apps/enron/tests/integration/__init__.py +4 -0
- examples/task_apps/enron/tests/integration/test_enron_eval.py +179 -0
- examples/task_apps/enron/tests/integration/test_enron_rollout.py +135 -0
- examples/task_apps/enron/tests/unit/__init__.py +4 -0
- examples/task_apps/enron/tests/unit/test_enron_environment.py +126 -0
- examples/task_apps/gepa_benchmarks/__init__.py +7 -0
- examples/task_apps/gepa_benchmarks/common.py +260 -0
- examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
- examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
- examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
- examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
- examples/task_apps/math/README.md +21 -0
- examples/task_apps/math/math_single_step.py +1000 -0
- examples/task_apps/math/math_task_app.py +115 -0
- examples/task_apps/pokemon_battle/__init__.py +2 -0
- examples/task_apps/pokemon_battle/modal_app.py +104 -0
- examples/task_apps/pokemon_battle/task_app/README.md +68 -0
- examples/task_apps/pokemon_battle/task_app/__init__.py +6 -0
- examples/task_apps/pokemon_battle/task_app/pokemon_showdown.py +932 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_COMPLETE.md +283 -0
- examples/task_apps/pokemon_red/EVAL_IMAGE_ONLY_STATUS.md +155 -0
- examples/task_apps/pokemon_red/README.md +356 -0
- examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +428 -0
- examples/task_apps/pokemon_red/__init__.py +3 -0
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +30 -0
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +224 -0
- examples/task_apps/pokemon_red/pallet_town_rl_config.toml +75 -0
- examples/task_apps/pokemon_red/task_app.py +1048 -0
- examples/task_apps/pokemon_red/test_pallet_town_rewards.py +193 -0
- examples/task_apps/sokoban/README.md +306 -0
- examples/task_apps/sokoban/__init__.py +3 -0
- examples/task_apps/sokoban/eval_groq_qwen32.toml +16 -0
- examples/task_apps/sokoban/eval_openai_gpt5.toml +16 -0
- examples/task_apps/sokoban/filter_sft.toml +5 -0
- examples/task_apps/sokoban/task_app.py +1058 -0
- examples/task_apps/sokoban/tests/__init__.py +4 -0
- examples/task_apps/sokoban/tests/conftest.py +113 -0
- examples/task_apps/sokoban/tests/integration/__init__.py +4 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_eval.py +57 -0
- examples/task_apps/sokoban/tests/integration/test_sokoban_rollout.py +198 -0
- examples/task_apps/sokoban/tests/unit/__init__.py +4 -0
- examples/task_apps/sokoban/tests/unit/test_sokoban_environment.py +114 -0
- examples/task_apps/verilog/__init__.py +1 -0
- examples/task_apps/verilog/eval_groq_qwen32b.toml +22 -0
- examples/task_apps/verilog/filter_sft.toml +5 -0
- examples/task_apps/verilog/task_app/README.md +12 -0
- examples/task_apps/verilog/task_app/__init__.py +1 -0
- examples/task_apps/verilog/task_app/grpo_verilog.py +1166 -0
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +145 -0
- examples/task_apps/verilog/tests/__init__.py +4 -0
- examples/task_apps/verilog/tests/conftest.py +115 -0
- examples/task_apps/verilog/tests/integration/__init__.py +4 -0
- examples/task_apps/verilog/tests/integration/test_verilog_eval.py +181 -0
- examples/task_apps/verilog/tests/integration/test_verilog_rollout.py +55 -0
- examples/task_apps/verilog/tests/unit/__init__.py +4 -0
- examples/task_apps/verilog/tests/unit/test_verilog_scoring.py +118 -0
- examples/tunnel_gepa_banking77/README.md +106 -0
- examples/tunnel_gepa_banking77/banking77_gepa_tunnel.toml +95 -0
- examples/tunnel_gepa_banking77/keep_tunnel_running.py +60 -0
- examples/tunnel_gepa_banking77/run_gepa_with_tunnel.sh +226 -0
- examples/vlm/PROPOSAL.md +53 -0
- examples/vlm/README.md +68 -0
- examples/vlm/configs/crafter_vlm_gpt4o.toml +49 -0
- examples/vlm/crafter_image_only_agent.py +207 -0
- examples/vlm/crafter_openai_vlm_agent.py +275 -0
- examples/vlm/filter_image_rows.py +63 -0
- examples/vlm/run_crafter_vlm_benchmark.py +316 -0
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +422 -0
- examples/warming_up_to_rl/configs/crafter_fft.toml +53 -0
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +54 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +22 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +15 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +24 -0
- examples/warming_up_to_rl/configs/eval_stepwise_complex.toml +35 -0
- examples/warming_up_to_rl/configs/eval_stepwise_consistent.toml +26 -0
- examples/warming_up_to_rl/configs/eval_stepwise_per_achievement.toml +36 -0
- examples/warming_up_to_rl/configs/eval_stepwise_simple.toml +32 -0
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +85 -0
- examples/warming_up_to_rl/configs/rl_from_ft.toml +58 -0
- examples/warming_up_to_rl/export_trace_sft.py +837 -0
- examples/warming_up_to_rl/groq_test.py +97 -0
- examples/warming_up_to_rl/manage_secrets.py +131 -0
- examples/warming_up_to_rl/old/event_rewards.md +234 -0
- examples/warming_up_to_rl/old/notes.md +73 -0
- examples/warming_up_to_rl/readme.md +110 -0
- examples/warming_up_to_rl/run_eval.py +736 -0
- examples/warming_up_to_rl/run_fft_and_save.py +380 -0
- examples/warming_up_to_rl/run_local_rollout.py +239 -0
- examples/warming_up_to_rl/run_local_rollout_modal.py +248 -0
- examples/warming_up_to_rl/run_local_rollout_parallel.py +405 -0
- examples/warming_up_to_rl/run_local_rollout_traced.py +477 -0
- examples/warming_up_to_rl/run_rl_and_save.py +124 -0
- examples/warming_up_to_rl/run_rollout_remote.py +156 -0
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +876 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +253 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +729 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1114 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1891 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +129 -0
- examples/workflows/math_rl/configs/eval_base_qwen.toml +15 -0
- examples/workflows/math_rl/configs/eval_rl_qwen.toml +11 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +62 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +80 -0
- examples/workflows/math_rl/configs/rl_from_ft_qwen.toml +35 -0
- examples/workflows/math_rl/download_dataset.py +80 -0
- examples/workflows/math_rl/run_eval.py +436 -0
- examples/workflows/math_rl/run_rl_and_save.py +111 -0
- synth_ai/__init__.py +47 -23
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +514 -0
- synth_ai/api/train/__init__.py +63 -0
- synth_ai/api/train/builders.py +473 -0
- synth_ai/api/train/cli.py +1185 -0
- synth_ai/api/train/config_finder.py +246 -0
- synth_ai/api/train/configs/__init__.py +65 -0
- synth_ai/api/train/configs/prompt_learning.py +496 -0
- synth_ai/api/train/configs/rl.py +188 -0
- synth_ai/api/train/configs/sft.py +99 -0
- synth_ai/api/train/configs/shared.py +81 -0
- synth_ai/api/train/env_resolver.py +352 -0
- synth_ai/api/train/pollers.py +91 -0
- synth_ai/api/train/prompt_learning.py +425 -0
- synth_ai/api/train/sft.py +390 -0
- synth_ai/api/train/supported_algos.py +147 -0
- synth_ai/api/train/task_app.py +195 -0
- synth_ai/api/train/utils.py +244 -0
- synth_ai/api/train/validators.py +1117 -0
- synth_ai/api/tunnel.py +49 -0
- synth_ai/auth/credentials.py +94 -0
- synth_ai/baseline/__init__.py +25 -0
- synth_ai/baseline/config.py +209 -0
- synth_ai/baseline/discovery.py +214 -0
- synth_ai/baseline/execution.py +146 -0
- synth_ai/cfgs.py +227 -0
- synth_ai/cli/__init__.py +90 -45
- synth_ai/cli/_modal_wrapper.py +31 -0
- synth_ai/cli/_storage.py +20 -0
- synth_ai/cli/_typer_patch.py +47 -0
- synth_ai/cli/_validate_task_app.py +29 -0
- synth_ai/cli/balance.py +16 -4
- synth_ai/cli/calc.py +36 -21
- synth_ai/cli/claude.py +70 -0
- synth_ai/cli/codex.py +267 -0
- synth_ai/cli/commands/__init__.py +18 -0
- synth_ai/cli/commands/baseline/__init__.py +12 -0
- synth_ai/cli/commands/baseline/core.py +637 -0
- synth_ai/cli/commands/baseline/list.py +93 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1112 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +424 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +185 -0
- synth_ai/cli/commands/help/core.py +72 -0
- synth_ai/cli/commands/smoke/__init__.py +7 -0
- synth_ai/cli/commands/smoke/core.py +1437 -0
- synth_ai/cli/commands/status/__init__.py +66 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/session.py +183 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/subcommands/usage.py +203 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +200 -0
- synth_ai/cli/commands/train/judge_validation.py +305 -0
- synth_ai/cli/commands/train/validation.py +386 -0
- synth_ai/cli/demo.py +32 -140
- synth_ai/cli/deploy.py +233 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +28 -22
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/mcp.py +34 -0
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/opencode.py +256 -0
- synth_ai/cli/recent.py +13 -7
- synth_ai/cli/rl_demo.py +166 -114
- synth_ai/cli/root.py +143 -112
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +49 -0
- synth_ai/cli/status.py +7 -125
- synth_ai/cli/task_app_deploy.py +7 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +11 -0
- synth_ai/cli/task_app_serve.py +11 -0
- synth_ai/cli/task_apps.py +3134 -0
- synth_ai/cli/traces.py +9 -5
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +5 -0
- synth_ai/cli/turso.py +73 -0
- synth_ai/cli/watch.py +13 -18
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/core/cli.py +745 -416
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/demo_task_apps/__init__.py +7 -1
- synth_ai/demos/demo_task_apps/core.py +75 -37
- synth_ai/demos/demo_task_apps/crafter/__init__.py +1 -0
- synth_ai/demos/demo_task_apps/crafter/configs/crafter_fft_4b.toml +53 -0
- synth_ai/demos/demo_task_apps/crafter/configs/rl_from_base_qwen4b.toml +73 -0
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +184 -0
- synth_ai/demos/demo_task_apps/math/_common.py +1 -2
- synth_ai/demos/demo_task_apps/math/app.py +2 -1
- synth_ai/demos/demo_task_apps/math/config.toml +55 -110
- synth_ai/demos/demo_task_apps/math/deploy_modal.py +3 -6
- synth_ai/demos/demo_task_apps/math/modal_task_app.py +491 -166
- synth_ai/demos/demo_task_apps/math/task_app_entry.py +37 -0
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +703 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +12 -5
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/bandit/taskset.py +4 -4
- synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
- synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
- synth_ai/environments/examples/crafter_classic/environment.py +93 -2
- synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
- synth_ai/environments/examples/enron/engine.py +7 -2
- synth_ai/environments/examples/enron/environment.py +68 -0
- synth_ai/environments/examples/red/engine.py +60 -12
- synth_ai/environments/examples/red/engine_helpers/memory_map.py +7 -0
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/engine_helpers/reward_library/pallet_town_progression.py +477 -0
- synth_ai/environments/examples/red/engine_helpers/state_extraction.py +32 -0
- synth_ai/environments/examples/red/environment.py +86 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/environments/examples/sokoban/taskset.py +116 -0
- synth_ai/environments/examples/verilog/engine.py +104 -12
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/environments/reproducibility/tree.py +5 -6
- synth_ai/environments/service/app.py +11 -12
- synth_ai/environments/service/core_routes.py +10 -9
- synth_ai/environments/stateful/engine.py +1 -1
- synth_ai/environments/tasks/core.py +1 -0
- synth_ai/environments/tasks/filters.py +5 -6
- synth_ai/environments/tasks/utils.py +4 -5
- synth_ai/evals/__init__.py +15 -0
- synth_ai/evals/base.py +14 -5
- synth_ai/evals/client.py +82 -0
- synth_ai/evals/types.py +42 -0
- synth_ai/http.py +8 -22
- synth_ai/http_client.py +45 -12
- synth_ai/inference/__init__.py +0 -2
- synth_ai/inference/client.py +21 -7
- synth_ai/jobs/client.py +129 -80
- synth_ai/judge_schemas.py +127 -0
- synth_ai/learning/__init__.py +51 -6
- synth_ai/learning/algorithms.py +14 -0
- synth_ai/learning/client.py +122 -30
- synth_ai/learning/config.py +2 -40
- synth_ai/learning/constants.py +0 -2
- synth_ai/learning/ft_client.py +4 -56
- synth_ai/learning/health.py +14 -8
- synth_ai/learning/jobs.py +43 -47
- synth_ai/learning/prompt_learning_client.py +276 -0
- synth_ai/learning/prompt_learning_types.py +185 -0
- synth_ai/{rl → learning/rl}/__init__.py +14 -5
- synth_ai/learning/rl/client.py +269 -0
- synth_ai/learning/rl/config.py +31 -0
- synth_ai/{rl → learning/rl}/contracts.py +5 -10
- synth_ai/{rl → learning/rl}/env_keys.py +45 -16
- synth_ai/learning/rl/secrets.py +13 -0
- synth_ai/learning/rl_client.py +2 -253
- synth_ai/learning/sft/__init__.py +29 -0
- synth_ai/learning/sft/client.py +68 -0
- synth_ai/learning/sft/config.py +270 -0
- synth_ai/learning/sft/data.py +698 -0
- synth_ai/learning/sse.py +25 -26
- synth_ai/learning/validators.py +29 -25
- synth_ai/mcp/__init__.py +5 -0
- synth_ai/mcp/__main__.py +8 -0
- synth_ai/mcp/main.py +254 -0
- synth_ai/mcp/setup.py +100 -0
- synth_ai/modal.py +257 -0
- synth_ai/pricing/__init__.py +3 -0
- synth_ai/pricing/model_pricing.py +64 -0
- synth_ai/session/__init__.py +75 -0
- synth_ai/session/client.py +383 -0
- synth_ai/session/constants.py +63 -0
- synth_ai/session/exceptions.py +105 -0
- synth_ai/session/manager.py +139 -0
- synth_ai/session/models.py +89 -0
- synth_ai/session/query.py +110 -0
- synth_ai/spec/__init__.py +46 -0
- synth_ai/spec/dataclasses.py +149 -0
- synth_ai/spec/loader.py +144 -0
- synth_ai/spec/serializer.py +199 -0
- synth_ai/spec/validation.py +250 -0
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +589 -0
- synth_ai/streaming/streamer.py +320 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/__init__.py +116 -3
- synth_ai/task/apps/__init__.py +132 -0
- synth_ai/task/auth.py +165 -0
- synth_ai/task/client.py +167 -0
- synth_ai/task/config.py +261 -0
- synth_ai/task/contracts.py +173 -57
- synth_ai/task/datasets.py +108 -0
- synth_ai/task/errors.py +50 -0
- synth_ai/task/health.py +17 -11
- synth_ai/task/inference_api.py +101 -0
- synth_ai/task/json.py +111 -0
- synth_ai/task/proxy.py +251 -0
- synth_ai/task/rubrics/__init__.py +55 -0
- synth_ai/task/rubrics/loaders.py +156 -0
- synth_ai/task/rubrics/models.py +57 -0
- synth_ai/task/rubrics/scoring.py +116 -0
- synth_ai/task/rubrics/strict.py +149 -0
- synth_ai/task/rubrics.py +219 -0
- synth_ai/task/server.py +432 -0
- synth_ai/task/trace_correlation_helpers.py +328 -0
- synth_ai/task/tracing_utils.py +95 -0
- synth_ai/task/validators.py +449 -6
- synth_ai/task/vendors.py +59 -0
- synth_ai/tracing_v3/__init__.py +4 -0
- synth_ai/tracing_v3/abstractions.py +21 -4
- synth_ai/tracing_v3/config.py +167 -22
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +42 -29
- synth_ai/tracing_v3/decorators.py +80 -45
- synth_ai/tracing_v3/examples/basic_usage.py +15 -9
- synth_ai/tracing_v3/hooks.py +6 -4
- synth_ai/tracing_v3/llm_call_record_helpers.py +161 -61
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/replica_sync.py +12 -7
- synth_ai/tracing_v3/serialization.py +130 -0
- synth_ai/tracing_v3/session_tracer.py +86 -21
- synth_ai/tracing_v3/storage/base.py +98 -12
- synth_ai/tracing_v3/storage/config.py +63 -16
- synth_ai/tracing_v3/storage/factory.py +11 -9
- synth_ai/tracing_v3/storage/utils.py +15 -11
- synth_ai/tracing_v3/trace_utils.py +317 -0
- synth_ai/tracing_v3/turso/__init__.py +8 -21
- synth_ai/tracing_v3/turso/daemon.py +123 -15
- synth_ai/tracing_v3/turso/models.py +5 -2
- synth_ai/tracing_v3/turso/native_manager.py +1293 -0
- synth_ai/tracing_v3/utils.py +5 -4
- synth_ai/tunnel.py +143 -0
- synth_ai/tunnel_deploy.py +278 -0
- synth_ai/types.py +8 -0
- synth_ai/urls.py +11 -0
- synth_ai/utils/__init__.py +166 -0
- synth_ai/utils/agents.py +74 -0
- synth_ai/utils/apps.py +152 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/bin.py +39 -0
- synth_ai/utils/claude.py +36 -0
- synth_ai/utils/cli.py +284 -0
- synth_ai/utils/config.py +81 -0
- synth_ai/utils/env.py +346 -0
- synth_ai/utils/errors.py +85 -0
- synth_ai/utils/http.py +172 -0
- synth_ai/utils/json.py +72 -0
- synth_ai/utils/log_filter.py +99 -0
- synth_ai/utils/logging.py +198 -0
- synth_ai/utils/modal.py +299 -0
- synth_ai/utils/paths.py +95 -0
- synth_ai/utils/process.py +233 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/ssl.py +25 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/tunnel/__init__.py +12 -0
- synth_ai/utils/tunnel/config.py +55 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/uvicorn.py +77 -0
- synth_ai-0.2.23.dev3.dist-info/METADATA +357 -0
- synth_ai-0.2.23.dev3.dist-info/RECORD +983 -0
- {synth_ai-0.2.8.dev4.dist-info → synth_ai-0.2.23.dev3.dist-info}/entry_points.txt +0 -1
- {synth_ai-0.2.8.dev4.dist-info → synth_ai-0.2.23.dev3.dist-info}/top_level.txt +1 -0
- synth_ai/cli/man.py +0 -106
- synth_ai/core/experiment.py +0 -15
- synth_ai/core/system.py +0 -15
- synth_ai/environments/examples/sokoban/units/astar_common.py +0 -95
- synth_ai/experimental/synth_oss.py +0 -446
- synth_ai/handshake.py +0 -63
- synth_ai/install_sqld.sh +0 -40
- synth_ai/learning/offline/dpo.py +0 -0
- synth_ai/learning/offline/providers.py +0 -7
- synth_ai/learning/offline/sft.py +0 -0
- synth_ai/learning/offline/shared.py +0 -0
- synth_ai/learning/online/grpo.py +0 -0
- synth_ai/learning/online/irft.py +0 -0
- synth_ai/learning/prompts/banking77_injection_eval.py +0 -168
- synth_ai/learning/prompts/gepa.py +0 -0
- synth_ai/learning/prompts/hello_world_in_context_injection_ex.py +0 -213
- synth_ai/learning/prompts/mipro.py +0 -289
- synth_ai/learning/prompts/random_search.py +0 -246
- synth_ai/learning/prompts/run_mipro_banking77.py +0 -172
- synth_ai/learning/prompts/run_random_search_banking77.py +0 -324
- synth_ai/lm/__init__.py +0 -51
- synth_ai/lm/caching/constants.py +0 -6
- synth_ai/lm/caching/dbs.py +0 -0
- synth_ai/lm/caching/ephemeral.py +0 -102
- synth_ai/lm/caching/handler.py +0 -137
- synth_ai/lm/caching/initialize.py +0 -11
- synth_ai/lm/caching/persistent.py +0 -114
- synth_ai/lm/config.py +0 -110
- synth_ai/lm/constants.py +0 -32
- synth_ai/lm/core/__init__.py +0 -8
- synth_ai/lm/core/all.py +0 -73
- synth_ai/lm/core/exceptions.py +0 -7
- synth_ai/lm/core/main.py +0 -319
- synth_ai/lm/core/main_v3.py +0 -594
- synth_ai/lm/core/synth_models.py +0 -48
- synth_ai/lm/core/vendor_clients.py +0 -188
- synth_ai/lm/cost/monitor.py +0 -1
- synth_ai/lm/cost/statefulness.py +0 -1
- synth_ai/lm/injection.py +0 -80
- synth_ai/lm/overrides.py +0 -206
- synth_ai/lm/provider_support/__init__.py +0 -8
- synth_ai/lm/provider_support/anthropic.py +0 -972
- synth_ai/lm/provider_support/openai.py +0 -1139
- synth_ai/lm/provider_support/suppress_logging.py +0 -31
- synth_ai/lm/structured_outputs/handler.py +0 -440
- synth_ai/lm/structured_outputs/inject.py +0 -297
- synth_ai/lm/structured_outputs/rehabilitate.py +0 -185
- synth_ai/lm/tools/__init__.py +0 -3
- synth_ai/lm/tools/base.py +0 -172
- synth_ai/lm/unified_interface.py +0 -202
- synth_ai/lm/vendors/base.py +0 -81
- synth_ai/lm/vendors/core/anthropic_api.py +0 -387
- synth_ai/lm/vendors/core/gemini_api.py +0 -292
- synth_ai/lm/vendors/core/mistral_api.py +0 -322
- synth_ai/lm/vendors/core/openai_api.py +0 -225
- synth_ai/lm/vendors/core/synth_dev_api.py +0 -0
- synth_ai/lm/vendors/local/ollama.py +0 -0
- synth_ai/lm/vendors/openai_standard.py +0 -780
- synth_ai/lm/vendors/openai_standard_responses.py +0 -256
- synth_ai/lm/vendors/retries.py +0 -22
- synth_ai/lm/vendors/supported/custom_endpoint.py +0 -417
- synth_ai/lm/vendors/supported/deepseek.py +0 -69
- synth_ai/lm/vendors/supported/grok.py +0 -75
- synth_ai/lm/vendors/supported/groq.py +0 -16
- synth_ai/lm/vendors/supported/ollama.py +0 -15
- synth_ai/lm/vendors/supported/openrouter.py +0 -74
- synth_ai/lm/vendors/supported/together.py +0 -11
- synth_ai/lm/vendors/synth_client.py +0 -808
- synth_ai/lm/warmup.py +0 -186
- synth_ai/rl/secrets.py +0 -19
- synth_ai/scripts/verify_rewards.py +0 -100
- synth_ai/tracing/__init__.py +0 -30
- synth_ai/tracing_v1/__init__.py +0 -33
- synth_ai/tracing_v3/turso/manager.py +0 -760
- synth_ai/v0/tracing/abstractions.py +0 -224
- synth_ai/v0/tracing/base_client.py +0 -91
- synth_ai/v0/tracing/client_manager.py +0 -131
- synth_ai/v0/tracing/config.py +0 -142
- synth_ai/v0/tracing/context.py +0 -146
- synth_ai/v0/tracing/decorators.py +0 -682
- synth_ai/v0/tracing/events/__init__.py +0 -0
- synth_ai/v0/tracing/events/manage.py +0 -147
- synth_ai/v0/tracing/events/scope.py +0 -86
- synth_ai/v0/tracing/events/store.py +0 -228
- synth_ai/v0/tracing/immediate_client.py +0 -151
- synth_ai/v0/tracing/local.py +0 -18
- synth_ai/v0/tracing/log_client_base.py +0 -73
- synth_ai/v0/tracing/retry_queue.py +0 -186
- synth_ai/v0/tracing/trackers.py +0 -515
- synth_ai/v0/tracing/upload.py +0 -512
- synth_ai/v0/tracing/utils.py +0 -9
- synth_ai/v0/tracing_v1/__init__.py +0 -16
- synth_ai/v0/tracing_v1/abstractions.py +0 -224
- synth_ai/v0/tracing_v1/base_client.py +0 -91
- synth_ai/v0/tracing_v1/client_manager.py +0 -131
- synth_ai/v0/tracing_v1/config.py +0 -142
- synth_ai/v0/tracing_v1/context.py +0 -146
- synth_ai/v0/tracing_v1/decorators.py +0 -703
- synth_ai/v0/tracing_v1/events/__init__.py +0 -0
- synth_ai/v0/tracing_v1/events/manage.py +0 -147
- synth_ai/v0/tracing_v1/events/scope.py +0 -86
- synth_ai/v0/tracing_v1/events/store.py +0 -228
- synth_ai/v0/tracing_v1/immediate_client.py +0 -151
- synth_ai/v0/tracing_v1/local.py +0 -18
- synth_ai/v0/tracing_v1/log_client_base.py +0 -73
- synth_ai/v0/tracing_v1/retry_queue.py +0 -186
- synth_ai/v0/tracing_v1/trackers.py +0 -515
- synth_ai/v0/tracing_v1/upload.py +0 -527
- synth_ai/v0/tracing_v1/utils.py +0 -9
- synth_ai/zyk/__init__.py +0 -30
- synth_ai-0.2.8.dev4.dist-info/METADATA +0 -129
- synth_ai-0.2.8.dev4.dist-info/RECORD +0 -420
- {synth_ai/lm/caching → examples/task_apps}/__init__.py +0 -0
- {synth_ai/lm/cost → examples/task_apps/crafter}/__init__.py +0 -0
- {synth_ai/lm/structured_outputs → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/server}/__init__.py +0 -0
- {synth_ai/lm/vendors → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/tests}/__init__.py +0 -0
- {synth_ai/lm/vendors/core → examples/task_apps/dev/pokemon_emerald/external/pokeagent-speedrun/utils}/__init__.py +0 -0
- {synth_ai/lm/vendors/local → examples/task_apps/math}/__init__.py +0 -0
- {synth_ai/lm/vendors/supported → examples/workflows}/__init__.py +0 -0
- {synth_ai/v0/tracing → examples/workflows/math_rl}/__init__.py +0 -0
- /synth_ai/{compound/cais.py → cli/__main__.py} +0 -0
- /synth_ai/{learning/filtering.py → py.typed} +0 -0
- {synth_ai-0.2.8.dev4.dist-info → synth_ai-0.2.23.dev3.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.8.dev4.dist-info → synth_ai-0.2.23.dev3.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,105 @@
+"""Metadata for GEPA blog task app coverage.
+
+This module centralises the set of task apps that the GEPA blog post
+references so that configuration files and documentation can import the
+same canonical definitions. Each entry mirrors a task app that is
+available via Synth's prompt-learning backend, making it easier to keep
+configs, docs, and evaluation notebooks in sync.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Iterable, Sequence
+
+
+@dataclass(frozen=True, slots=True)
+class TaskAppSupport:
+    """Describes a task app that the GEPA blog supports."""
+
+    app_id: str
+    display_name: str
+    dataset_id: str
+    description: str
+    default_port: int
+    tags: Sequence[str]
+    metrics: Sequence[str]
+    sources: Sequence[str]
+
+
+SUPPORTED_TASK_APPS: tuple[TaskAppSupport, ...] = (
+    TaskAppSupport(
+        app_id="banking77",
+        display_name="Banking77 Intent Classification",
+        dataset_id="PolyAI/banking77",
+        description="Classify banking customer support queries into 77 intents.",
+        default_port=8102,
+        tags=("classification", "intent", "nlp"),
+        metrics=("accuracy",),
+        sources=(
+            "GEPA blog quickstart",
+            "PolyAI Banking77 dataset card",
+        ),
+    ),
+    TaskAppSupport(
+        app_id="hotpotqa",
+        display_name="HotpotQA Multi-Hop QA",
+        dataset_id="hotpot_qa",
+        description="Answer multi-hop questions with supporting facts sourced from Wikipedia passages.",
+        default_port=8110,
+        tags=("qa", "multi-hop", "reasoning"),
+        metrics=("answer_em", "supporting_fact_f1"),
+        sources=(
+            "GEPA Table 1",
+            "HotpotQA (Yang et al., 2018)",
+        ),
+    ),
+    TaskAppSupport(
+        app_id="ifbench",
+        display_name="IFBench Instruction Following",
+        dataset_id="Muennighoff/IFBench",
+        description="Follow natural language instructions focusing on faithful adherence.",
+        default_port=8111,
+        tags=("instruction-following", "nlp"),
+        metrics=("compliance", "accuracy"),
+        sources=(
+            "GEPA Table 1",
+            "IFBench benchmark release",
+        ),
+    ),
+    TaskAppSupport(
+        app_id="hover",
+        display_name="HoVer Claim Verification",
+        dataset_id="hover",
+        description="Determine whether Wikipedia claims are supported, refuted, or not enough info given retrieved evidence.",
+        default_port=8112,
+        tags=("fact-checking", "classification"),
+        metrics=("label_accuracy", "evidence_f1"),
+        sources=(
+            "GEPA Table 1",
+            "HoVer benchmark (Jiang et al., 2020)",
+        ),
+    ),
+    TaskAppSupport(
+        app_id="pupa",
+        display_name="PUPA Privacy-Aware Delegation",
+        dataset_id="microsoft/PUPA",
+        description="Delegate actions while respecting privacy policies and extracting structured responses.",
+        default_port=8113,
+        tags=("delegation", "privacy", "structured-output"),
+        metrics=("privacy_compliance", "task_success"),
+        sources=(
+            "GEPA Table 1",
+            "PUPA benchmark release",
+        ),
+    ),
+)
+
+
+def list_supported_task_apps() -> Iterable[TaskAppSupport]:
+    """Return iterable over supported task apps for convenience."""
+
+    return SUPPORTED_TASK_APPS
+
+
+__all__ = ["TaskAppSupport", "SUPPORTED_TASK_APPS", "list_supported_task_apps"]
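As a hedged illustration of how a registry shaped like `SUPPORTED_TASK_APPS` might be consumed, the sketch below re-declares a trimmed-down `TaskAppSupport` with only two fields and adds two helpers. `get_task_app` and `find_port_collisions` are hypothetical names that do not appear in the package; the two entries reuse `app_id` and `default_port` values from the module, and `slots=True` is dropped only so the sketch also runs on Python versions older than 3.10.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAppSupport:
    """Trimmed-down stand-in for the dataclass in the diff (two fields only)."""

    app_id: str
    default_port: int


# Values copied from the diff; the real tuple has five entries and more fields.
SUPPORTED_TASK_APPS: tuple[TaskAppSupport, ...] = (
    TaskAppSupport(app_id="banking77", default_port=8102),
    TaskAppSupport(app_id="hotpotqa", default_port=8110),
)


def get_task_app(app_id: str) -> TaskAppSupport:
    """Hypothetical helper: look up one entry by its app_id."""
    for app in SUPPORTED_TASK_APPS:
        if app.app_id == app_id:
            return app
    raise KeyError(f"unknown task app: {app_id!r}")


def find_port_collisions(apps: tuple[TaskAppSupport, ...]) -> list[int]:
    """Hypothetical helper: return ports claimed by more than one entry."""
    counts = Counter(app.default_port for app in apps)
    return sorted(port for port, n in counts.items() if n > 1)
```

A check like `find_port_collisions` is one way to keep the `default_port` values distinct as entries are added; whether the package performs such a check is not shown in this diff.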
@@ -0,0 +1,67 @@
+#!/bin/bash
+# Quick test script for GEPA Banking77 prompt learning
+# Tests against local backend on port 8000
+
+set -e
+
+echo "🚀 Testing GEPA Prompt Learning for Banking77"
+echo "=============================================="
+
+# Check required environment variables
+if [ -z "$SYNTH_API_KEY" ]; then
+    echo "❌ ERROR: SYNTH_API_KEY not set"
+    exit 1
+fi
+
+if [ -z "$ENVIRONMENT_API_KEY" ]; then
+    echo "❌ ERROR: ENVIRONMENT_API_KEY not set"
+    exit 1
+fi
+
+# Set backend URL (default to localhost:8000)
+BACKEND_URL="${BACKEND_BASE_URL:-http://localhost:8000}"
+echo "📍 Backend URL: $BACKEND_URL"
+
+# Check backend is accessible
+echo "🔍 Checking backend health..."
+if curl -s -f "$BACKEND_URL/api/health" > /dev/null 2>&1; then
+    echo "✅ Backend is accessible"
+else
+    echo "❌ ERROR: Backend not accessible at $BACKEND_URL"
+    echo " Make sure backend is running on port 8000"
+    exit 1
+fi
+
+# Check task app is accessible
+TASK_APP_URL="${TASK_APP_URL:-http://127.0.0.1:8102}"
+echo "🔍 Checking task app health..."
+if curl -s -f -H "X-API-Key: $ENVIRONMENT_API_KEY" "$TASK_APP_URL/health" > /dev/null 2>&1; then
+    echo "✅ Task app is accessible"
+else
+    echo "⚠️ WARNING: Task app not accessible at $TASK_APP_URL"
+    echo " You may need to deploy it first:"
+    echo " uvx synth-ai deploy banking77 --runtime uvicorn --port 8102"
+fi
+
+# Run GEPA training
+echo ""
+echo "🎯 Starting GEPA prompt optimization..."
+echo ""
+
+CONFIG_FILE="examples/blog_posts/gepa/configs/banking77_gepa_local.toml"
+
+if [ ! -f "$CONFIG_FILE" ]; then
+    echo "❌ ERROR: Config file not found: $CONFIG_FILE"
+    exit 1
+fi
+
+uvx synth-ai train \
+    --type prompt_learning \
+    --config "$CONFIG_FILE" \
+    --backend "$BACKEND_URL" \
+    --poll \
+    --poll-timeout 3600
+
+echo ""
+echo "✅ GEPA training completed!"
+
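The final `uvx synth-ai train` call in the script can also be assembled programmatically. The sketch below only builds the argument list, using the flags that appear in the script (`--type`, `--config`, `--backend`, `--poll`, `--poll-timeout`). `build_train_command` is a hypothetical helper, not part of the package, and which other values the CLI accepts for these flags is not shown in this diff.

```python
from __future__ import annotations

import os
import shlex


def build_train_command(
    config_file: str,
    backend_url: str | None = None,
    poll_timeout: int = 3600,
) -> list[str]:
    """Hypothetical helper mirroring the script's `uvx synth-ai train` call."""
    # Same fallback as the script: BACKEND_BASE_URL if set and non-empty,
    # otherwise http://localhost:8000.
    backend = backend_url or os.environ.get("BACKEND_BASE_URL") or "http://localhost:8000"
    return [
        "uvx", "synth-ai", "train",
        "--type", "prompt_learning",
        "--config", config_file,
        "--backend", backend,
        "--poll",
        "--poll-timeout", str(poll_timeout),
    ]


if __name__ == "__main__":
    cmd = build_train_command("examples/blog_posts/gepa/configs/banking77_gepa_local.toml")
    # Print rather than execute, so the sketch has no side effects.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument intact if the command is later passed to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.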
@@ -0,0 +1,123 @@
+#!/bin/bash
+# Verify Banking77 setup is working
+
+set -e
+
+echo "🔍 Verifying Banking77 Setup"
+echo "============================="
+echo ""
+
+cd "$(dirname "$0")/../../.."
+
+echo "1️⃣ Checking Python import..."
+python3 -c "
+try:
+    from examples.task_apps.banking77.banking77_task_app import build_config
+    print(' ✅ Task app imports successfully')
+    config = build_config()
+    print(f' ✅ Config built: app_id={config.app_id}')
+    print(f' ✅ Task name: {config.name}')
+except ImportError as e:
+    print(f' ❌ Import error: {e}')
+    print(' 💡 Run: uv pip install -e .')
+    exit(1)
+except Exception as e:
+    print(f' ❌ Error: {e}')
+    exit(1)
+"
+
+echo ""
+echo "2️⃣ Checking CLI registration..."
+if uvx synth-ai task-app list 2>/dev/null | grep -q "banking77"; then
+    echo " ✅ Banking77 registered with CLI"
+else
+    echo " ⚠️ Banking77 not found in task-app list"
+    echo " 💡 This is OK if you haven't run 'uv pip install -e .' yet"
+fi
+
+echo ""
+echo "3️⃣ Checking helper scripts..."
+if [ -x "./examples/blog_posts/gepa/deploy_banking77_task_app.sh" ]; then
+    echo " ✅ deploy_banking77_task_app.sh is executable"
+else
+    echo " ❌ deploy_banking77_task_app.sh is not executable"
+    echo " 💡 Run: chmod +x ./examples/blog_posts/gepa/deploy_banking77_task_app.sh"
+fi
+
+if [ -x "./examples/blog_posts/gepa/run_gepa_banking77.sh" ]; then
+    echo " ✅ run_gepa_banking77.sh is executable"
+else
+    echo " ❌ run_gepa_banking77.sh is not executable"
+    echo " 💡 Run: chmod +x ./examples/blog_posts/gepa/run_gepa_banking77.sh"
+fi
+
+echo ""
+echo "4️⃣ Checking configuration files..."
+if [ -f "./examples/blog_posts/gepa/configs/banking77_gepa_local.toml" ]; then
+    echo " ✅ banking77_gepa_local.toml exists"
+else
+    echo " ❌ banking77_gepa_local.toml not found"
+fi
+
+echo ""
+echo "5️⃣ Checking environment variables..."
+if [ -n "$GROQ_API_KEY" ]; then
+    echo " ✅ GROQ_API_KEY is set (${GROQ_API_KEY:0:10}...)"
+else
+    echo " ⚠️ GROQ_API_KEY not set"
+    echo " 💡 Run: export GROQ_API_KEY='gsk_...'"
+fi
+
+if [ -n "$ENVIRONMENT_API_KEY" ]; then
+    echo " ✅ ENVIRONMENT_API_KEY is set (${ENVIRONMENT_API_KEY:0:10}...)"
+else
+    echo " ⚠️ ENVIRONMENT_API_KEY not set"
+    echo " 💡 Run: export ENVIRONMENT_API_KEY=\$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')"
+fi
+
+if [ -n "$SYNTH_API_KEY" ]; then
+    echo " ✅ SYNTH_API_KEY is set (${SYNTH_API_KEY:0:10}...)"
+else
+    echo " ⚠️ SYNTH_API_KEY not set"
+    echo " 💡 Get from backend admin or .env.dev file"
+fi
+
+echo ""
+echo "6️⃣ Checking services..."
+if curl -s -f http://localhost:8000/api/health > /dev/null 2>&1; then
+    echo " ✅ Backend is running on http://localhost:8000"
+else
+    echo " ⚠️ Backend not reachable at http://localhost:8000"
+    echo " 💡 Start the backend before running GEPA"
+fi
+
+if curl -s -f http://127.0.0.1:8102/health > /dev/null 2>&1; then
+    echo " ✅ Task app is running on http://127.0.0.1:8102"
+else
+    echo " ⚠️ Task app not running on http://127.0.0.1:8102"
+    echo " 💡 Run: ./examples/blog_posts/gepa/deploy_banking77_task_app.sh"
+fi
+
+echo ""
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Summary"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo ""
+echo "To run Banking77 GEPA:"
+echo ""
+echo " 1. Install dependencies:"
+echo " uv pip install -e ."
+echo ""
+echo " 2. Set environment variables:"
+echo " export GROQ_API_KEY='gsk_...'"
+echo " export SYNTH_API_KEY='your-backend-key'"
+echo " export ENVIRONMENT_API_KEY=\$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')"
+echo ""
+echo " 3. Start task app (Terminal 1):"
+echo " ./examples/blog_posts/gepa/deploy_banking77_task_app.sh"
+echo ""
+echo " 4. Run GEPA (Terminal 2):"
+echo " ./examples/blog_posts/gepa/run_gepa_banking77.sh"
+echo ""
+echo "✅ Setup verification complete!"
+
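The environment-variable section of the script prints only the first ten characters of each key (`${VAR:0:10}...`). A minimal Python sketch of the same check follows. `mask_secret` and `check_env` are hypothetical names introduced here for illustration; the list of variables is taken from the script.

```python
from __future__ import annotations

import os
from typing import Mapping

# Variable names as checked by the verification script.
REQUIRED_VARS = ("GROQ_API_KEY", "ENVIRONMENT_API_KEY", "SYNTH_API_KEY")


def mask_secret(value: str, visible: int = 10) -> str:
    """Show at most `visible` leading characters, like ${VAR:0:10}... in bash."""
    return value[:visible] + "..."


def check_env(env: Mapping[str, str] | None = None) -> dict[str, str | None]:
    """Map each required variable to its masked value, or None when unset or empty."""
    source = os.environ if env is None else env
    report: dict[str, str | None] = {}
    for name in REQUIRED_VARS:
        value = source.get(name, "")
        report[name] = mask_secret(value) if value else None
    return report
```

Passing an explicit mapping instead of reading `os.environ` directly makes the check easy to exercise without touching the real environment.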
@@ -0,0 +1,415 @@
+# MIPROv2: Multi-Objective Prompt Optimization
+
+This directory contains examples and configurations for using MIPROv2 (Multi-Objective Prompt Optimization) to optimize prompts for various classification and reasoning tasks.
+
+## Overview
+
+**MIPROv2** is a meta-learning algorithm that optimizes prompts using:
+- **Bootstrap Phase**: Collects few-shot examples from high-scoring seeds
+- **Meta-Model**: LLM that proposes prompt improvements based on demonstrations
+- **TPE Optimization**: Tree-structured Parzen Estimator for efficient hyperparameter search
+- **Mini-Batch Evaluation**: Efficient online evaluation on small seed pools
+
+MIPROv2 is particularly effective when:
+- You want faster convergence with fewer evaluations (~100 vs ~1000 for GEPA)
+- You have clear task structure (can bootstrap with examples)
+- You need efficient optimization (mini-batch evaluation)
+- You want meta-learning benefits (few-shot adaptation)
+
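To make the bootstrap phase described in the Overview more concrete, here is a minimal sketch of selecting few-shot demonstrations from high-scoring seeds. It is not the package's implementation: the `SeedResult` record, the `select_bootstrap_examples` function, the `max_examples` cap, and the default threshold of 0.85 are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SeedResult:
    """Hypothetical record of one baseline rollout on a given seed."""

    seed: int
    score: float


def select_bootstrap_examples(
    results: list[SeedResult],
    threshold: float = 0.85,
    max_examples: int = 4,
) -> list[SeedResult]:
    """Keep rollouts scoring at or above `threshold`, best first."""
    passing = [r for r in results if r.score >= threshold]
    # sort() is stable, so equal scores keep their original seed order.
    passing.sort(key=lambda r: r.score, reverse=True)
    return passing[:max_examples]
```

In this sketch the selected rollouts would then be handed to the meta-model as demonstrations; how the real optimizer chooses and formats them is not shown in this diff.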
19
|
+
## Supported Tasks
|
|
20
|
+
|
|
21
|
+
Configuration files live under `configs/`:
|
|
22
|
+
|
|
23
|
+
| Task | Description | Config Files |
|
|
24
|
+
|------|-------------|--------------|
|
|
25
|
+
| **Banking77** | Intent classification (single-step) | `banking77_mipro_local.toml`, `banking77_mipro_test.toml` |
|
|
26
|
+
| **Banking77 Pipeline** | Classifier ➞ calibrator multi-step pipeline | `banking77_pipeline_mipro_local.toml`, `banking77_pipeline_mipro_test.toml` |
|
|
27
|
+
|
|
28
|
+
*More task configs coming soon: HotpotQA, IFBench, HoVer, PUPA*
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## Quick Start (Banking77 Single-Step)
|
|
33
|
+
|
|
34
|
+
### Prerequisites
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
# 1. Install dependencies
|
|
38
|
+
uv pip install -e .
|
|
39
|
+
|
|
40
|
+
# 2. Set environment variables
|
|
41
|
+
export SYNTH_API_KEY="your-backend-api-key"
|
|
42
|
+
export GROQ_API_KEY="gsk_your_groq_key"
|
|
43
|
+
export OPENAI_API_KEY="sk-your-openai-key" # Required for meta-model
|
|
44
|
+
export ENVIRONMENT_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
**Where to get API keys:**
|
|
48
|
+
- **GROQ_API_KEY**: Get from https://console.groq.com/keys
|
|
49
|
+
- **OPENAI_API_KEY**: Get from https://platform.openai.com/api-keys (required for meta-model)
|
|
50
|
+
- **SYNTH_API_KEY**: Get from your backend admin or `.env.dev` file
|
|
51
|
+
- **ENVIRONMENT_API_KEY**: Generate a random secure token (command above)
|
|
52
|
+
|
|
53
|
+
### Step 1: Start the Backend
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
# Make sure your backend is running
|
|
57
|
+
curl http://localhost:8000/api/health
|
|
58
|
+
# Should return: {"status":"ok"}
|
|
59
|
+
```
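The same health check can be scripted, for example to wait for the backend before launching a run. This sketch uses only the standard library and assumes the response body is the JSON shown in the comment above (`{"status":"ok"}`); the function names are illustrative.

```python
import json
import urllib.request


def is_ok(body):
    """Return True when a health response body parses to {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def backend_healthy(base_url="http://localhost:8000", timeout=5.0):
    """Fetch /api/health and report whether the backend looks healthy."""
    try:
        with urllib.request.urlopen(base_url + "/api/health", timeout=timeout) as resp:
            return is_ok(resp.read())
    except OSError:  # connection refused, timeout, or HTTP error
        return False


if __name__ == "__main__":
    print("backend healthy:", backend_healthy())
```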
|
|
60
|
+
|
|
61
|
+
### Step 2: Deploy Task App
|
|
62
|
+
|
|
63
|
+
**Option A: Using helper script (recommended)**
|
|
64
|
+
```bash
|
|
65
|
+
# Terminal 1
|
|
66
|
+
./examples/blog_posts/mipro/deploy_banking77_task_app.sh
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
**Option B: Using CLI**
|
|
70
|
+
```bash
|
|
71
|
+
uvx synth-ai deploy banking77 --runtime uvicorn --port 8102
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
**Option C: Deploy to Modal**
|
|
75
|
+
```bash
|
|
76
|
+
uvx synth-ai deploy banking77 --runtime modal --name banking77-mipro --env-file .env
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
### Step 3: Run MIPROv2 Optimization
|
|
80
|
+
|
|
81
|
+
**Option A: Using helper script (recommended)**
|
|
82
|
+
```bash
|
|
83
|
+
# Terminal 2
|
|
84
|
+
./examples/blog_posts/mipro/run_mipro_banking77.sh
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
**Option B: Using CLI directly**
|
|
88
|
+
```bash
|
|
89
|
+
uvx synth-ai train \
|
|
90
|
+
--config examples/blog_posts/mipro/configs/banking77_mipro_local.toml \
|
|
91
|
+
--backend http://localhost:8000 \
|
|
92
|
+
--poll
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
### Step 4: Monitor Progress
|
|
96
|
+
|
|
97
|
+
You'll see real-time output like:
|
|
98
|
+
```
|
|
99
|
+
🔬 Running MIPROv2 on Banking77
|
|
100
|
+
=================================
|
|
101
|
+
✅ Backend URL: http://localhost:8000
|
|
102
|
+
✅ Task app is healthy
|
|
103
|
+
|
|
104
|
+
🚀 Starting MIPROv2 training...
|
|
105
|
+
|
|
106
|
+
Bootstrap Phase:
|
|
107
|
+
Evaluating baseline on seeds [0-4]...
|
|
108
|
+
Found 3 high-scoring examples (score >= 0.85)
|
|
109
|
+
Initializing meta-model with few-shot examples...
|
|
110
|
+
|
|
111
|
+
Iteration 1/16:
|
|
112
|
+
Meta-model proposing 6 prompt variants...
|
|
113
|
+
Evaluating on online pool [5-9]...
|
|
114
|
+
Best score: 0.78
|
|
115
|
+
|
|
116
|
+
Iteration 2/16:
|
|
117
|
+
...
|
|
118
|
+
|
|
119
|
+
✅ MIPROv2 training complete!
|
|
120
|
+
Best prompt accuracy: 0.87 (87%)
|
|
121
|
+
```
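If you capture this output to a file, the per-iteration scores can be pulled out with a short script. This is a sketch that assumes the log lines look exactly like the sample above (`Iteration N/M:` followed by `Best score: X`); the format is not a documented interface and may differ in practice.

```python
import re

ITERATION_LINE = re.compile(r"Iteration\s+(\d+)/(\d+):")
SCORE_LINE = re.compile(r"Best score:\s*([0-9]*\.?[0-9]+)")


def best_scores_by_iteration(log_text):
    """Map iteration number to the 'Best score' reported under it."""
    scores = {}
    current = None
    for line in log_text.splitlines():
        iteration = ITERATION_LINE.search(line)
        if iteration:
            current = int(iteration.group(1))
            continue
        score = SCORE_LINE.search(line)
        if score and current is not None:
            scores[current] = float(score.group(1))
    return scores


sample = """Iteration 1/16:
  Meta-model proposing 6 prompt variants...
  Evaluating on online pool [5-9]...
  Best score: 0.78
"""
print(best_scores_by_iteration(sample))
```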
|
|
122
|
+
|
|
123
|
+
Results are automatically saved and can be queried via the Python API or REST endpoints.
|
|
124
|
+
|
|
125
|
+
---
|
|
126
|
+
|
|
127
|
+
## Quick Start (Banking77 Multi-Step Pipeline)
|
|
128
|
+
|
|
129
|
+
The pipeline task app chains a classifier stage with a calibrator stage. Each evaluation executes both modules, so the pipeline configs use fewer iterations and variants (8 × 4 instead of 16 × 6) to keep latency down.
|
|
130
|
+
|
|
131
|
+
### Step 1: Deploy Pipeline Task App
|
|
132
|
+
|
|
133
|
+
```bash
|
|
134
|
+
# Terminal 1
|
|
135
|
+
./examples/blog_posts/mipro/deploy_banking77_pipeline_task_app.sh
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
This launches the `banking77-pipeline` task app on `http://127.0.0.1:8112`.
|
|
139
|
+
|
|
140
|
+
### Step 2: Run Pipeline Optimization
|
|
141
|
+
|
|
142
|
+
```bash
|
|
143
|
+
# Terminal 2
|
|
144
|
+
./examples/blog_posts/mipro/run_mipro_banking77_pipeline.sh
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
The helper script checks backend health, ensures the pipeline task app is online, and runs `banking77_pipeline_mipro_local.toml`.
|
|
148
|
+
|
|
149
|
+
### Step 3: Monitor Output
|
|
150
|
+
|
|
151
|
+
```
|
|
152
|
+
🔬 Running MIPROv2 on Banking77 Pipeline
|
|
153
|
+
========================================
|
|
154
|
+
✅ Pipeline task app is healthy
|
|
155
|
+
✅ Backend is healthy
|
|
156
|
+
|
|
157
|
+
🚀 Starting MIPROv2 training...
|
|
158
|
+
Multi-Step Flow:
|
|
159
|
+
1. Bootstrap: two-module pipeline on seeds [0-7]
|
|
160
|
+
2. Optimisation: 8 iterations × 4 variants
|
|
161
|
+
3. Held-out evaluation on seeds [16-23]
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
Expect slightly higher per-iteration latency because each candidate runs both pipeline modules. The script summarizes the results when optimization finishes.
|
|
165
|
+
|
|
166
|
+
## Configuration
|
|
167
|
+
|
|
168
|
+
### Example: Banking77 MIPROv2 Configuration (Single-Step)
|
|
169
|
+
|
|
170
|
+
```toml
|
|
171
|
+
[prompt_learning]
|
|
172
|
+
algorithm = "mipro"
|
|
173
|
+
task_app_url = "http://127.0.0.1:8102"
|
|
174
|
+
task_app_id = "banking77"
|
|
175
|
+
|
|
176
|
+
[prompt_learning.initial_prompt]
|
|
177
|
+
messages = [
|
|
178
|
+
{ role = "system", content = "You are an expert banking assistant..." },
|
|
179
|
+
{ role = "user", pattern = "Customer Query: {query}\n\nClassify..." }
|
|
180
|
+
]
|
|
181
|
+
|
|
182
|
+
[prompt_learning.mipro]
|
|
183
|
+
num_iterations = 16 # Optimization iterations
|
|
184
|
+
num_evaluations_per_iteration = 6 # Variants per iteration
|
|
185
|
+
batch_size = 6 # Concurrent evaluations
|
|
186
|
+
max_concurrent = 16 # Max parallel rollouts
|
|
187
|
+
meta_model = "gpt-4o-mini" # Meta-model for proposals
|
|
188
|
+
meta_model_provider = "openai"
|
|
189
|
+
few_shot_score_threshold = 0.85 # Bootstrap threshold
|
|
190
|
+
|
|
191
|
+
# Seed pools
|
|
192
|
+
bootstrap_train_seeds = [0, 1, 2, 3, 4] # Bootstrap phase seeds
|
|
193
|
+
online_pool = [5, 6, 7, 8, 9] # Online evaluation seeds
|
|
194
|
+
test_pool = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] # Final test seeds
|
|
195
|
+
```
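Because `test_pool` is meant to be held out, it is worth checking that the three seed pools do not overlap. The sketch below works on a plain dictionary with the same keys as the `[prompt_learning.mipro]` table; in a real script that dictionary could come from `tomllib.load` (Python 3.11+). Whether the backend itself enforces disjoint pools is not stated here, so treat this as a local sanity check. The helper name is hypothetical.

```python
from itertools import combinations

POOL_KEYS = ("bootstrap_train_seeds", "online_pool", "test_pool")


def overlapping_pools(mipro_cfg):
    """Return (pool_a, pool_b, shared_seeds) for every pair of pools that overlap."""
    overlaps = []
    for a, b in combinations(POOL_KEYS, 2):
        shared = sorted(set(mipro_cfg.get(a, [])) & set(mipro_cfg.get(b, [])))
        if shared:
            overlaps.append((a, b, shared))
    return overlaps


# Values copied from the single-step example config.
mipro_cfg = {
    "bootstrap_train_seeds": [0, 1, 2, 3, 4],
    "online_pool": [5, 6, 7, 8, 9],
    "test_pool": list(range(10, 20)),
}
print(overlapping_pools(mipro_cfg))  # prints []
```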
|
|
196
|
+
|
|
197
|
+
### Example: Banking77 Pipeline Configuration (Multi-Step)
|
|
198
|
+
|
|
199
|
+
```toml
|
|
200
|
+
[prompt_learning]
|
|
201
|
+
algorithm = "mipro"
|
|
202
|
+
task_app_url = "http://127.0.0.1:8112"
|
|
203
|
+
task_app_id = "banking77-pipeline"
|
|
204
|
+
|
|
205
|
+
[prompt_learning.initial_prompt.metadata]
|
|
206
|
+
pipeline_modules = [
|
|
207
|
+
{ name = "classifier", instruction_text = "...", few_shots = [] },
|
|
208
|
+
{ name = "calibrator", instruction_text = "...", few_shots = [] }
|
|
209
|
+
]
|
|
210
|
+
|
|
211
|
+
[prompt_learning.mipro]
|
|
212
|
+
num_iterations = 8
|
|
213
|
+
num_evaluations_per_iteration = 4
|
|
214
|
+
batch_size = 4
|
|
215
|
+
max_concurrent = 12
|
|
216
|
+
few_shot_score_threshold = 0.82
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
The `pipeline_modules` metadata tells the task app which instruction text and demonstrations belong to each stage.
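As an illustration of how that metadata might be consumed, the sketch below indexes the modules by name so each stage can look up its own instruction text and demonstrations. This is an assumption about usage, not the task app's actual code; the placeholder instruction strings are kept as `"..."` exactly as in the config above.

```python
def index_modules(pipeline_modules):
    """Index module entries by name, rejecting duplicate stage names."""
    by_name = {}
    for module in pipeline_modules:
        name = module["name"]
        if name in by_name:
            raise ValueError(f"duplicate pipeline module name: {name}")
        by_name[name] = module
    return by_name


# Mirrors the pipeline_modules array from the example config.
pipeline_modules = [
    {"name": "classifier", "instruction_text": "...", "few_shots": []},
    {"name": "calibrator", "instruction_text": "...", "few_shots": []},
]

modules = index_modules(pipeline_modules)
for stage in ("classifier", "calibrator"):
    print(stage, "has", len(modules[stage]["few_shots"]), "demonstrations")
```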
|
|
220
|
+
|
|
221
|
+
### Key Parameters
|
|
222
|
+
|
|
223
|
+
| Parameter | Description | Typical Range |
|
|
224
|
+
|-----------|-------------|---------------|
|
|
225
|
+
| `num_iterations` | Optimization iterations | 10-20 |
|
|
226
|
+
| `num_evaluations_per_iteration` | Variants per iteration | 4-8 |
|
|
227
|
+
| `batch_size` | Concurrent evaluations | 4-10 |
|
|
228
|
+
| `few_shot_score_threshold` | Bootstrap threshold | 0.75-0.90 |
|
|
229
|
+
| `bootstrap_train_seeds` | Bootstrap phase seeds | 3-10 seeds |
|
|
230
|
+
| `online_pool` | Online evaluation seeds | 5-20 seeds |
|
|
231
|
+
| `test_pool` | Final test seeds | 5-50 seeds |
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## How MIPROv2 Works
|
|
236
|
+
|
|
237
|
+
### Bootstrap Phase
|
|
238
|
+
|
|
239
|
+
1. **Evaluate Baseline**: Run initial prompt on `bootstrap_train_seeds`
|
|
240
|
+
2. **Collect Examples**: Filter seeds with score >= `few_shot_score_threshold`
|
|
241
|
+
3. **Generate Demonstrations**: Format high-scoring examples as few-shot demonstrations
|
|
242
|
+
4. **Initialize Meta-Model**: Provide the demonstrations to the meta-model as context
|
|
243
|
+
5. **Warm Up TPE**: Initialize Tree-structured Parzen Estimator with initial evaluations
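
Step 2 of the bootstrap phase amounts to a threshold filter over baseline scores. The following sketch shows that filter in isolation; the function name and the score values are made up for illustration, and the real implementation may select or format examples differently.

```python
def select_demonstrations(scores_by_seed, threshold=0.85):
    """Return the seeds whose baseline score meets the bootstrap threshold."""
    return sorted(seed for seed, score in scores_by_seed.items() if score >= threshold)


# Made-up baseline scores for bootstrap_train_seeds [0, 1, 2, 3, 4].
baseline_scores = {0: 0.90, 1: 0.60, 2: 0.85, 3: 0.88, 4: 0.70}

print(select_demonstrations(baseline_scores))        # prints [0, 2, 3]
print(select_demonstrations(baseline_scores, 0.95))  # prints []
```

The second call shows why a threshold that is too high leaves the meta-model without demonstrations: no seed clears it, so lowering `few_shot_score_threshold` is the natural remedy.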
|
|
244
|
+
|
|
245
|
+
### Optimization Loop
|
|
246
|
+
|
|
247
|
+
For each iteration (1 to `num_iterations`):
|
|
248
|
+
|
|
249
|
+
1. **Meta-Model Proposals**: Meta-model proposes `num_evaluations_per_iteration` prompt variants
|
|
250
|
+
2. **TPE Selection**: TPE selects hyperparameters (mutation locations, instruction additions)
|
|
251
|
+
3. **Mini-Batch Evaluation**: Evaluate variants on `online_pool` seeds (`batch_size` evaluations run concurrently)
|
|
252
|
+
4. **Update Meta-Model**: Learn from evaluation results
|
|
253
|
+
5. **Update TPE**: Refine hyperparameter distribution
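
The overall shape of this loop can be sketched as a propose, evaluate, keep-best cycle. The code below is deliberately simplified: a random stub stands in for the meta-model, there is no TPE, and the scoring function is a toy. It illustrates only the control flow and how `num_iterations`, the variants per iteration, and `online_pool` interact. All names are hypothetical.

```python
import random
from statistics import mean


def optimize(initial_prompt, propose, evaluate, online_pool,
             num_iterations=16, num_variants=6):
    """Propose-evaluate-keep-best loop; returns (best_prompt, best_score)."""
    best_prompt = initial_prompt
    best_score = mean(evaluate(initial_prompt, seed) for seed in online_pool)
    history = []  # (prompt, score) pairs a real meta-model or TPE would learn from
    for _ in range(num_iterations):
        for candidate in propose(best_prompt, history, num_variants):
            # Mini-batch evaluation: average the score over the online pool.
            score = mean(evaluate(candidate, seed) for seed in online_pool)
            history.append((candidate, score))
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins so the sketch runs end to end.
HINTS = ["Answer with one label.", "Consider the customer's intent.", "Be concise."]
_rng = random.Random(0)


def toy_propose(prompt, history, n):
    """Stub proposer: append a randomly chosen hint to the current best prompt."""
    return [prompt + " " + _rng.choice(HINTS) for _ in range(n)]


def toy_evaluate(prompt, seed):
    """Toy score: fraction of distinct hints present (the seed is ignored)."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


best_prompt, best_score = optimize(
    "Classify the query.", toy_propose, toy_evaluate, online_pool=[5, 6, 7, 8, 9]
)
print(f"best score: {best_score:.2f}")
```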
|
|
254
|
+
|
|
255
|
+
### Final Evaluation
|
|
256
|
+
|
|
257
|
+
- Evaluate the best prompts on `test_pool` (held-out seeds)
|
|
258
|
+
- Return optimized prompt with test score
|
|
259
|
+
|
|
260
|
+
---
|
|
261
|
+
|
|
262
|
+
## Querying Results
|
|
263
|
+
|
|
264
|
+
### Python API
|
|
265
|
+
|
|
266
|
+
```python
|
|
267
|
+
from synth_ai.learning import get_prompts, get_prompt_text, get_scoring_summary
|
|
268
|
+
|
|
269
|
+
# Get all results
|
|
270
|
+
results = get_prompts(
|
|
271
|
+
job_id="pl_abc123",
|
|
272
|
+
base_url="http://localhost:8000",
|
|
273
|
+
api_key="sk_..."
|
|
274
|
+
)
|
|
275
|
+
|
|
276
|
+
# Access best prompt
|
|
277
|
+
best_prompt = results["best_prompt"]
|
|
278
|
+
best_score = results["best_score"]
|
|
279
|
+
print(f"Best Score: {best_score:.3f}")
|
|
280
|
+
|
|
281
|
+
# Get prompt text
|
|
282
|
+
best_text = get_prompt_text(
|
|
283
|
+
job_id="pl_abc123",
|
|
284
|
+
base_url="http://localhost:8000",
|
|
285
|
+
api_key="sk_...",
|
|
286
|
+
rank=1
|
|
287
|
+
)
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
### REST API
|
|
291
|
+
|
|
292
|
+
```bash
|
|
293
|
+
# Get job status
|
|
294
|
+
curl -H "Authorization: Bearer $SYNTH_API_KEY" \
|
|
295
|
+
http://localhost:8000/api/prompt-learning/online/jobs/JOB_ID
|
|
296
|
+
|
|
297
|
+
# Stream events
|
|
298
|
+
curl -H "Authorization: Bearer $SYNTH_API_KEY" \
|
|
299
|
+
http://localhost:8000/api/prompt-learning/online/jobs/JOB_ID/events/stream
|
|
300
|
+
|
|
301
|
+
# Get metrics
|
|
302
|
+
curl -H "Authorization: Bearer $SYNTH_API_KEY" \
|
|
303
|
+
http://localhost:8000/api/prompt-learning/online/jobs/JOB_ID/metrics
|
|
304
|
+
```
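The same endpoints can be called from Python without extra dependencies. This sketch builds the request with the `Authorization: Bearer` header used in the `curl` commands above; the helper names are hypothetical, and it assumes the status and metrics endpoints return JSON, which is not documented here. The `events/stream` endpoint is a stream, so `fetch_json` is not suitable for it.

```python
import json
import urllib.request

JOBS_PATH = "/api/prompt-learning/online/jobs"


def job_request(base_url, job_id, api_key, suffix=""):
    """Build an authenticated GET request for a job endpoint."""
    url = f"{base_url.rstrip('/')}{JOBS_PATH}/{job_id}{suffix}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_json(request, timeout=10.0):
    """Send the request and decode a JSON response body (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    import os

    req = job_request("http://localhost:8000", "JOB_ID", os.environ["SYNTH_API_KEY"], "/metrics")
    print(fetch_json(req))
```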
|
|
305
|
+
|
|
306
|
+
---
|
|
307
|
+
|
|
308
|
+
## Expected Results
|
|
309
|
+
|
|
310
|
+
MIPROv2 typically achieves similar accuracy to GEPA with fewer evaluations:
|
|
311
|
+
|
|
312
|
+
| Phase | Typical Accuracy | Notes |
|
|
313
|
+
|-------|------------------|-------|
|
|
314
|
+
| Baseline | 60-75% | Initial prompt |
|
|
315
|
+
| After Bootstrap | 70-80% | Meta-model initialized with examples |
|
|
316
|
+
| After 10 iterations | 80-85% | Mid-optimization |
|
|
317
|
+
| After 16 iterations | 85-90%+ | Final optimized prompt |
|
|
318
|
+
|
|
319
|
+
**Total Evaluations**: ~96 rollouts (16 iterations × 6 variants) vs ~1000 for GEPA
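
The figure above is the product of two config values, so it is easy to recompute for other settings. The sketch below does that arithmetic. The second helper shows how the rollout count would scale if every variant were scored on every `online_pool` seed; that is an assumption, since this README does not say how many seeds each variant sees.

```python
def variant_evaluations(num_iterations, variants_per_iteration):
    """Number of prompt variants evaluated during the optimization loop."""
    return num_iterations * variants_per_iteration


def rollouts_if_full_pool(num_iterations, variants_per_iteration, pool_size):
    """Rollouts if each variant runs on every seed in the pool (assumption)."""
    return variant_evaluations(num_iterations, variants_per_iteration) * pool_size


print(variant_evaluations(16, 6))       # single-step config: prints 96
print(variant_evaluations(8, 4))        # pipeline config: prints 32
print(rollouts_if_full_pool(16, 6, 5))  # prints 480
```

Neither count includes the bootstrap rollouts or the final `test_pool` evaluation.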
|
|
320
|
+
|
|
321
|
+
---
|
|
322
|
+
|
|
323
|
+
## GEPA vs MIPROv2
|
|
324
|
+
|
|
325
|
+
| Aspect | GEPA | MIPROv2 |
|
|
326
|
+
|--------|------|---------|
|
|
327
|
+
| **Initialization** | Random population | Bootstrap phase (few-shot examples) |
|
|
328
|
+
| **Exploration** | Mutation + Crossover | Meta-model + TPE |
|
|
329
|
+
| **Evaluation** | Full (30 seeds) | Mini-batch (5 seeds per iteration) |
|
|
330
|
+
| **Learning** | Population evolution | Meta-learning |
|
|
331
|
+
| **Cost** | ~1000 rollouts | ~96 rollouts |
|
|
332
|
+
| **Convergence** | 3-10 generations | 10-20 iterations |
|
|
333
|
+
| **Best For** | Diverse solutions | Fast, efficient optimization |
|
|
334
|
+
|
|
335
|
+
**When to Use GEPA:**
|
|
336
|
+
- Need diverse prompt variants (Pareto front)
|
|
337
|
+
- Want to explore many approaches
|
|
338
|
+
- Have large evaluation budget
|
|
339
|
+
|
|
340
|
+
**When to Use MIPROv2:**
|
|
341
|
+
- Want faster convergence
|
|
342
|
+
- Have clear task structure
|
|
343
|
+
- Need efficient optimization
|
|
344
|
+
- Want meta-learning benefits
|
|
345
|
+
|
|
346
|
+
---
|
|
347
|
+
|
|
348
|
+
## Troubleshooting
|
|
349
|
+
|
|
350
|
+
### ❌ "MIPRO algorithm is not yet implemented"
|
|
351
|
+
|
|
352
|
+
**Solution:** MIPROv2 support is currently under development. Use GEPA for now:
|
|
353
|
+
```bash
|
|
354
|
+
uvx synth-ai train --config examples/blog_posts/gepa/configs/banking77_gepa_local.toml
|
|
355
|
+
```
|
|
356
|
+
|
|
357
|
+
### ❌ "OPENAI_API_KEY environment variable is required"
|
|
358
|
+
|
|
359
|
+
**Solution:** Export your OpenAI API key for the meta-model:
|
|
360
|
+
```bash
|
|
361
|
+
export OPENAI_API_KEY="sk-your-key-here"
|
|
362
|
+
```
|
|
363
|
+
|
|
364
|
+
### ❌ "Bootstrap phase found no high-scoring examples"
|
|
365
|
+
|
|
366
|
+
**Solution:** Lower the `few_shot_score_threshold` in your config:
|
|
367
|
+
```toml
|
|
368
|
+
[prompt_learning.mipro]
|
|
369
|
+
few_shot_score_threshold = 0.75 # Lower from 0.85
|
|
370
|
+
```
|
|
371
|
+
|
|
372
|
+
### ❌ "Banking77 task app is not running"
|
|
373
|
+
|
|
374
|
+
**Solution:** Start the task app first:
|
|
375
|
+
```bash
|
|
376
|
+
./examples/blog_posts/mipro/deploy_banking77_task_app.sh
|
|
377
|
+
```
|
|
378
|
+
|
|
379
|
+
---
|
|
380
|
+
|
|
381
|
+
## Files in This Directory
|
|
382
|
+
|
|
383
|
+
```
|
|
384
|
+
examples/blog_posts/mipro/
|
|
385
|
+
├── README.md # This file - comprehensive guide
|
|
386
|
+
├── configs/ # Configuration files
|
|
387
|
+
│ ├── banking77_mipro_local.toml # Banking77 MIPRO config (local)
|
|
388
|
+
│ └── banking77_mipro_test.toml # Banking77 MIPRO config (test)
|
|
389
|
+
├── deploy_banking77_task_app.sh # Helper: Start task app
|
|
390
|
+
└── run_mipro_banking77.sh # Helper: Run MIPROv2 optimization
|
|
391
|
+
```
|
|
392
|
+
|
|
393
|
+
---
|
|
394
|
+
|
|
395
|
+
## Next Steps
|
|
396
|
+
|
|
397
|
+
1. **Test the bootstrap phase**: Verify few-shot example collection works
|
|
398
|
+
2. **Run full optimization**: Complete 16 iterations and check results
|
|
399
|
+
3. **Compare with GEPA**: Run GEPA on same task and compare accuracy/cost
|
|
400
|
+
4. **Experiment with parameters**: Adjust bootstrap threshold, iteration count
|
|
401
|
+
5. **Try other tasks**: Adapt configs for HotpotQA, IFBench, etc.
|
|
402
|
+
|
|
403
|
+
---
|
|
404
|
+
|
|
405
|
+
## Support
|
|
406
|
+
|
|
407
|
+
For issues or questions:
|
|
408
|
+
|
|
409
|
+
1. Verify all API keys are set correctly (`SYNTH_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`, `ENVIRONMENT_API_KEY`)
|
|
410
|
+
2. Check task app: `curl -H "X-API-Key: $ENVIRONMENT_API_KEY" http://127.0.0.1:8102/health`
|
|
411
|
+
3. Check backend: `curl http://localhost:8000/api/health`
|
|
412
|
+
4. Review logs in both terminals for error messages
|
|
413
|
+
|
|
414
|
+
Happy optimizing! 🔬🚀
|
|
415
|
+
|