synth-ai 0.2.14__py3-none-any.whl → 0.2.17__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
- examples/README.md +1 -0
- examples/analyze_semantic_words.sh +2 -2
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +25 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +42 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +41 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +73 -115
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/configs/verilog_rl_lora.toml +80 -123
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +1 -2
- examples/qwen_coder/configs/coder_lora_4b.toml +5 -1
- examples/qwen_coder/configs/coder_lora_small.toml +1 -2
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +152 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +274 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +415 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +61 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +6 -6
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +62 -0
- examples/rl/configs/rl_from_base_qwen17.toml +79 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +21 -0
- examples/rl/task_app/math_single_step.py +990 -0
- examples/rl/task_app/math_task_app.py +111 -0
- examples/run_crafter_demo.sh +2 -2
- examples/sft/README.md +6 -6
- examples/sft/configs/crafter_fft_qwen0p6b.toml +7 -2
- examples/sft/configs/crafter_lora_qwen0p6b.toml +7 -3
- examples/sft/evaluate.py +2 -4
- examples/sft/export_dataset.py +7 -4
- examples/swe/task_app/README.md +33 -3
- examples/swe/task_app/grpo_swe_mini.py +4 -1
- examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +50 -23
- examples/swe/task_app/hosted/inference/openai_client.py +4 -4
- examples/swe/task_app/hosted/policy_routes.py +0 -2
- examples/swe/task_app/hosted/rollout.py +0 -8
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/crafter/task_app/README.md +1 -1
- examples/task_apps/crafter/task_app/grpo_crafter.py +70 -10
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +63 -27
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +48 -50
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +75 -36
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +31 -15
- examples/task_apps/enron/__init__.py +1 -0
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
- examples/task_apps/math/README.md +1 -2
- examples/task_apps/pokemon_red/README.md +3 -4
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
- examples/task_apps/pokemon_red/task_app.py +36 -5
- examples/task_apps/sokoban/README.md +2 -3
- examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
- examples/vlm/README.md +3 -3
- examples/vlm/configs/crafter_vlm_gpt4o.toml +5 -0
- examples/vlm/crafter_openai_vlm_agent.py +3 -5
- examples/vlm/filter_image_rows.py +1 -1
- examples/vlm/run_crafter_vlm_benchmark.py +2 -2
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +1 -1
- examples/warming_up_to_rl/configs/crafter_fft.toml +5 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
- examples/warming_up_to_rl/export_trace_sft.py +174 -60
- examples/warming_up_to_rl/readme.md +63 -132
- examples/warming_up_to_rl/run_fft_and_save.py +1 -1
- examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
- examples/warming_up_to_rl/run_rl_and_save.py +1 -1
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +827 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1084 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +5 -0
- synth_ai/__init__.py +44 -30
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +144 -7
- synth_ai/api/train/__init__.py +13 -1
- synth_ai/api/train/builders.py +9 -3
- synth_ai/api/train/cli.py +155 -17
- synth_ai/api/train/config_finder.py +18 -11
- synth_ai/api/train/configs/__init__.py +8 -1
- synth_ai/api/train/configs/rl.py +32 -7
- synth_ai/api/train/configs/sft.py +6 -2
- synth_ai/api/train/configs/shared.py +59 -2
- synth_ai/api/train/env_resolver.py +13 -10
- synth_ai/auth/credentials.py +119 -0
- synth_ai/cli/__init__.py +61 -69
- synth_ai/cli/_modal_wrapper.py +7 -5
- synth_ai/cli/_typer_patch.py +0 -2
- synth_ai/cli/_validate_task_app.py +22 -4
- synth_ai/cli/commands/__init__.py +17 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/deploy/__init__.py +23 -0
- synth_ai/cli/commands/deploy/core.py +614 -0
- synth_ai/cli/commands/deploy/errors.py +72 -0
- synth_ai/cli/commands/deploy/validation.py +11 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1109 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +388 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +177 -0
- synth_ai/cli/commands/help/core.py +73 -0
- synth_ai/cli/commands/status/__init__.py +64 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +199 -0
- synth_ai/cli/commands/train/judge_validation.py +304 -0
- synth_ai/cli/commands/train/validation.py +443 -0
- synth_ai/cli/demo.py +2 -162
- synth_ai/cli/deploy/__init__.py +28 -0
- synth_ai/cli/deploy/core.py +5 -0
- synth_ai/cli/deploy/errors.py +23 -0
- synth_ai/cli/deploy/validation.py +5 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +3 -1
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/recent.py +2 -1
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +21 -0
- synth_ai/cli/status.py +7 -126
- synth_ai/cli/task_app_deploy.py +7 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +11 -0
- synth_ai/cli/task_app_serve.py +11 -0
- synth_ai/cli/task_apps.py +110 -1499
- synth_ai/cli/traces.py +1 -1
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +5 -0
- synth_ai/cli/turso.py +1 -1
- synth_ai/cli/watch.py +1 -1
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +702 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +0 -1
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/red/engine.py +33 -12
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/environment.py +26 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/evals/base.py +16 -5
- synth_ai/evals/client.py +1 -1
- synth_ai/http.py +8 -22
- synth_ai/inference/client.py +1 -1
- synth_ai/judge_schemas.py +4 -5
- synth_ai/learning/client.py +1 -1
- synth_ai/learning/health.py +1 -1
- synth_ai/learning/jobs.py +1 -1
- synth_ai/learning/rl/client.py +4 -2
- synth_ai/learning/rl/env_keys.py +1 -1
- synth_ai/learning/rl/secrets.py +1 -1
- synth_ai/learning/sft/client.py +1 -1
- synth_ai/learning/sft/data.py +407 -4
- synth_ai/learning/validators.py +4 -1
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +469 -0
- synth_ai/streaming/streamer.py +301 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/apps/__init__.py +4 -2
- synth_ai/task/config.py +6 -4
- synth_ai/task/rubrics/__init__.py +1 -2
- synth_ai/task/rubrics/loaders.py +14 -10
- synth_ai/task/rubrics.py +219 -0
- synth_ai/task/trace_correlation_helpers.py +24 -11
- synth_ai/task/tracing_utils.py +14 -3
- synth_ai/task/validators.py +0 -1
- synth_ai/tracing_v3/abstractions.py +3 -3
- synth_ai/tracing_v3/config.py +15 -13
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +3 -1
- synth_ai/tracing_v3/decorators.py +10 -7
- synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/session_tracer.py +7 -7
- synth_ai/tracing_v3/storage/base.py +29 -29
- synth_ai/tracing_v3/storage/config.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +8 -9
- synth_ai/tracing_v3/turso/native_manager.py +80 -72
- synth_ai/tracing_v3/utils.py +2 -2
- synth_ai/utils/__init__.py +101 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/cli.py +131 -0
- synth_ai/utils/env.py +294 -0
- synth_ai/utils/http.py +172 -0
- synth_ai/utils/modal.py +308 -0
- synth_ai/utils/process.py +212 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/v0/config/__init__.py +1 -5
- synth_ai/v0/config/base_url.py +1 -7
- synth_ai/v0/tracing/config.py +1 -1
- synth_ai/v0/tracing/decorators.py +1 -1
- synth_ai/v0/tracing/upload.py +1 -1
- synth_ai/v0/tracing_v1/config.py +1 -1
- synth_ai/v0/tracing_v1/decorators.py +1 -1
- synth_ai/v0/tracing_v1/upload.py +1 -1
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/METADATA +91 -32
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/RECORD +341 -154
- synth_ai/cli/man.py +0 -106
- synth_ai/cli/tui.py +0 -57
- synth_ai/compound/cais.py +0 -0
- synth_ai/core/experiment.py +0 -13
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -295
- synth_ai/handshake.py +0 -109
- synth_ai/tui/__init__.py +0 -5
- synth_ai/tui/__main__.py +0 -13
- synth_ai/tui/cli/__init__.py +0 -1
- synth_ai/tui/cli/query_experiments.py +0 -164
- synth_ai/tui/cli/query_experiments_v3.py +0 -164
- synth_ai/tui/dashboard.py +0 -906
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/top_level.txt +0 -0
@@ -1,40 +1,33 @@
-# Verilog RL experiment – LoRA training on Qwen3-0.6B
-#
-# This configuration adapts the Crafter RL setup for Verilog spec-to-RTL tasks.
-# Uses the same proven pipeline but optimized for 0.6B model and Verilog domain.
-
 [algorithm]
 type = "online"
 method = "policy_gradient"
 variety = "gspo"
 
 [services]
-# Replace with the Modal URL printed by `uvx synth-ai modal-serve grpo-verilog`
 task_url = "https://synth-laboratories--grpo-verilog-task-app-fastapi-app-dev.modal.run"
-# Point at the Synth backend (or compatible service) that exposes /api/judge/v1/*
 judge_url = "https://synth-backend-dev-docker.onrender.com/api"
 
 [compute]
-gpu_type = "H200"
-gpu_count = 2
+gpu_type = "H200"
+gpu_count = 2
 nodes = 1
 
 [topology]
 type = "single_node_split"
-gpus_for_vllm = 1
-gpus_for_training = 1
+gpus_for_vllm = 1
+gpus_for_training = 1
 gpus_for_ref = 0
 tensor_parallel = 1
 
 [vllm]
 tensor_parallel_size = 1
-max_model_len = 24576
+max_model_len = 24576
 
 [reference]
 placement = "none"
 
 [model]
-base = "Qwen/Qwen3-8B"
+base = "Qwen/Qwen3-8B"
 trainer_mode = "lora"
 label = "verilog-rl-lora-qwen8b"
 
@@ -42,38 +35,21 @@ label = "verilog-rl-lora-qwen8b"
 r = 16
 alpha = 32
 dropout = 0.05
-target_modules = ["all-linear"]
+target_modules = [ "all-linear",]
 
 [rollout]
-env_name = "verilog"
-max_turns = 6
-episodes_per_batch = 4
+env_name = "verilog"
+max_turns = 6
+episodes_per_batch = 4
 policy_name = "verilog-designer"
 max_concurrent_rollouts = 8
 batches_per_step = 2
-ops = ["agent", "env"]
-
-[rollout.env_config]
-# Verilog-specific environment settings
-difficulty = "medium"  # Can be "easy", "medium", or "hard"
-
-[rollout.env_config.step_rewards]
-enabled = true
-mode = "decision_stepwise"
-strategy = "consistent"
-indicator_lambda = 0.5  # ✅ Reduced from Crafter (sparser rewards)
-step_beta = 0.0
-
-[rollout.policy_config]
-provider = "openai"
-model = "Qwen/Qwen3-8B"  # ✅ Use the model being trained (8B) for rollouts
-temperature = 0.2
-max_tokens = 4096  # ✅ Balanced for Verilog generation while leaving room for long input prompts (testbenches + history)
+ops = [ "agent", "env",]
 
 [evaluation]
 instances = 16
 every_n_iters = 10
-seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
+seeds = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,]
 
 [training]
 num_epochs = 1
@@ -81,110 +57,91 @@ iterations_per_epoch = 5
 gradient_accumulation_steps = 1
 max_accumulated_minibatch = 1
 max_turns = 15
-batch_size = 4
+batch_size = 4
 group_size = 4
-learning_rate = 5e-5
+learning_rate = 5e-5
 log_interval = 1
 weight_sync_interval = 1
 event_rewards_kind = "unique"
-async_semaphore_max = 20
-
-# Enable dense decision rewards in the trainer
+async_semaphore_max = 20
 step_rewards_enabled = true
 step_rewards_mode = "decision_stepwise"
-step_rewards_indicator_lambda = 0.5
+step_rewards_indicator_lambda = 0.5
 step_rewards_beta = 0.0
 step_rewards_strategy = "consistent"
 
+[judge]
+enabled = true
+
+[rollout.env_config]
+difficulty = "medium"
+
+[rollout.policy_config]
+provider = "openai"
+model = "Qwen/Qwen3-8B"
+temperature = 0.2
+max_tokens = 4096
+
 [training.weight_sync]
 enable = true
-targets = ["policy"]
+targets = [ "policy",]
 mode = "direct"
 direct = true
 verify_every_k = 0
 
-[
-
+[judge.reward_blend]
+env = 0.3
+event = 0.3
+outcome = 0.4
+
+[judge.options]
+event = true
+outcome = true
+provider = "openai"
 model = "openai/gpt-oss-120b"
-
-
+rubric_id = "verilog/bundle@v1"
+timeout_s = 45
 
-
-
-
-
+[rollout.env_config.step_rewards]
+enabled = true
+mode = "decision_stepwise"
+strategy = "consistent"
+indicator_lambda = 0.5
+step_beta = 0.0
+
+[judge.options.weights]
+process = 0.1
+reasoning = 0.2
+progress = 0.3
 outcome = 0.4
 
-[
-
-
-criteria
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+[judge.options.rubric_overrides.event]
+goal_text = " Evaluate each Verilog design decision for compilation success and process efficiency.\n High scores for successful compilation and strategic tool usage.\n Penalize unnecessary operations and compilation failures."
+aggregation = "weighted_sum"
+[[judge.options.rubric_overrides.event.criteria]]
+id = "process.compilation_success"
+weight = 0.7
+scale = "bounded"
+description = "Return 1.0 when compilation succeeds cleanly, 0.5 for warnings, 0.0 for errors"
+
+[[judge.options.rubric_overrides.event.criteria]]
+id = "process.design_iterations"
+weight = 0.3
+scale = "bounded"
+description = "Reward efficient write→compile→simulate workflow, penalize redundant operations"
+
+[judge.options.rubric_overrides.outcome]
+goal_text = " Evaluate the final Verilog implementation for correctness and quality.\n High scores for working designs that pass all tests with good code quality."
+aggregation = "weighted_sum"
+[[judge.options.rubric_overrides.outcome.criteria]]
+id = "outcome.tests_passed"
+weight = 0.8
+scale = "binary"
+description = "Full credit when all tests pass, partial credit for some tests passing"
+
+[[judge.options.rubric_overrides.outcome.criteria]]
+id = "outcome.design_quality"
+weight = 0.2
+scale = "bounded"
+description = "Code clarity, proper documentation, and efficient design patterns"
 
-[judge.options]
-event = true
-outcome = true
-provider = "openai"
-model = "openai/gpt-oss-120b"
-rubric_id = "verilog/bundle@v1"
-max_concurrency = 6
-tracks = ["process", "reasoning", "progress", "outcome"]
-
-[judge.options.rubric_overrides]
-
-[judge.options.rubric_overrides.event]
-goal_text = """
-Evaluate each Verilog design decision for compilation success and process efficiency.
-High scores for successful compilation and strategic tool usage.
-Penalize unnecessary operations and compilation failures."""
-aggregation = "weighted_sum"
-
-[[judge.options.rubric_overrides.event.criteria]]
-id = "process.compilation_success"
-weight = 0.7
-scale = "bounded"
-description = "Return 1.0 when compilation succeeds cleanly, 0.5 for warnings, 0.0 for errors"
-
-[[judge.options.rubric_overrides.event.criteria]]
-id = "process.design_iterations"
-weight = 0.3
-scale = "bounded"
-description = "Reward efficient write→compile→simulate workflow, penalize redundant operations"
-
-[judge.options.rubric_overrides.outcome]
-goal_text = """
-Evaluate the final Verilog implementation for correctness and quality.
-High scores for working designs that pass all tests with good code quality."""
-aggregation = "weighted_sum"
-
-[[judge.options.rubric_overrides.outcome.criteria]]
-id = "outcome.tests_passed"
-weight = 0.8
-scale = "binary"
-description = "Full credit when all tests pass, partial credit for some tests passing"
-
-[[judge.options.rubric_overrides.outcome.criteria]]
-id = "outcome.design_quality"
-weight = 0.2
-scale = "bounded"
-description = "Code clarity, proper documentation, and efficient design patterns"
-
-[judge.options.weights]
-process = 0.1
-reasoning = 0.2
-progress = 0.3
-outcome = 0.4
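The new `[judge.reward_blend]` table in the config above weights the environment, event, and outcome reward streams 0.3/0.3/0.4. A minimal sketch of how such a blend could be computed — the helper below is illustrative only, not synth-ai's actual implementation:

```python
# Weights taken from the [judge.reward_blend] table above.
REWARD_BLEND = {"env": 0.3, "event": 0.3, "outcome": 0.4}

def blend_rewards(env_r: float, event_r: float, outcome_r: float,
                  weights: dict = REWARD_BLEND) -> float:
    # Weighted sum of the three reward streams; weights sum to 1.0,
    # so the blended reward stays on the same scale as its inputs.
    return (weights["env"] * env_r
            + weights["event"] * event_r
            + weights["outcome"] * outcome_r)

print(blend_rewards(0.5, 1.0, 0.0))
```

With these weights, a fully successful rollout (all streams at 1.0) blends to 1.0, while an episode that only scores on the outcome track caps at 0.4.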
@@ -0,0 +1,84 @@
+#!/usr/bin/env python3
+"""Convert Crafter trace format to SFT format with messages[] structure."""
+
+import json
+import sys
+from pathlib import Path
+
+def convert_trace_to_sft(trace: dict) -> dict:
+    """Convert a single trace to SFT format."""
+    # Extract dialogue from trace
+    dialogue = trace.get("dialogue", [])
+    assistant = trace.get("assistant", {})
+
+    # Build messages list
+    messages = []
+
+    # Add dialogue history
+    for msg in dialogue:
+        messages.append({
+            "role": msg["role"],
+            "content": msg["content"]
+        })
+
+    # Add assistant response if present
+    if assistant:
+        content = assistant.get("content", "")
+        tool_calls = assistant.get("tool_calls", [])
+
+        # If there are tool calls, format them
+        if tool_calls:
+            # Convert tool calls to a simple text format for SFT
+            tool_text = "\n".join([
+                f"Tool: {tc['name']}\nArguments: {json.dumps(tc.get('arguments', {}))}"
+                for tc in tool_calls
+            ])
+            content = f"{content}\n\n{tool_text}".strip()
+
+        messages.append({
+            "role": "assistant",
+            "content": content
+        })
+
+    return {"messages": messages}
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python convert_traces_to_sft.py <input.jsonl> [output.jsonl]")
+        sys.exit(1)
+
+    input_path = Path(sys.argv[1])
+    output_path = Path(sys.argv[2]) if len(sys.argv) > 2 else input_path.with_name(f"{input_path.stem}_sft_format.jsonl")
+
+    if not input_path.exists():
+        print(f"Error: Input file not found: {input_path}")
+        sys.exit(1)
+
+    print(f"Converting {input_path} → {output_path}")
+
+    converted = 0
+    skipped = 0
+
+    with open(input_path) as f_in, open(output_path, "w") as f_out:
+        for line_no, line in enumerate(f_in, 1):
+            try:
+                trace = json.loads(line.strip())
+                sft_entry = convert_trace_to_sft(trace)
+
+                # Only write if we have messages
+                if sft_entry["messages"]:
+                    f_out.write(json.dumps(sft_entry) + "\n")
+                    converted += 1
+                else:
+                    skipped += 1
+
+            except Exception as e:
+                print(f"Warning: Skipping line {line_no}: {e}")
+                skipped += 1
+
+    print(f"✅ Converted {converted} entries, skipped {skipped}")
+    print(f"Output: {output_path}")
+
+if __name__ == "__main__":
+    main()
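The new conversion script above maps a trace record's `dialogue` history plus its final `assistant` turn into a `messages[]` list, flattening tool calls into plain text. A self-contained sketch of the same mapping on a sample record — the sample trace below is illustrative, not taken from the package:

```python
import json

def to_sft(trace: dict) -> dict:
    # Mirror of the script's convert_trace_to_sft: dialogue history first,
    # then the assistant turn with tool calls flattened into plain text.
    messages = [{"role": m["role"], "content": m["content"]}
                for m in trace.get("dialogue", [])]
    assistant = trace.get("assistant", {})
    if assistant:
        content = assistant.get("content", "")
        calls = assistant.get("tool_calls", [])
        if calls:
            tool_text = "\n".join(
                f"Tool: {c['name']}\nArguments: {json.dumps(c.get('arguments', {}))}"
                for c in calls
            )
            content = f"{content}\n\n{tool_text}".strip()
        messages.append({"role": "assistant", "content": content})
    return {"messages": messages}

# Hypothetical Crafter-style trace record for illustration.
trace = {
    "dialogue": [{"role": "user", "content": "You see a tree. What do you do?"}],
    "assistant": {"content": "",
                  "tool_calls": [{"name": "interact",
                                  "arguments": {"action": "collect_wood"}}]},
}
entry = to_sft(trace)
```

Each output line is then one JSON object with a `messages` key, the shape SFT training configs in this release expect.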
@@ -0,0 +1,45 @@
+#!/bin/bash
+# Run SFT for Qwen3-Coder-30B with LoRA on Crafter data
+
+# Usage:
+#   ./run_sft_qwen30b.sh <dataset_path> [env_file]
+#
+# Example:
+#   ./run_sft_qwen30b.sh examples/multi_step/ft_data/crafter_traces.jsonl
+#   ./run_sft_qwen30b.sh examples/multi_step/ft_data/crafter_traces.jsonl backend/.env.dev
+
+set -e
+
+DATASET_PATH="${1:-examples/sft/ft_data/crafter_traces.jsonl}"
+ENV_FILE="${2:-backend/.env.dev}"
+
+if [ ! -f "$DATASET_PATH" ]; then
+    echo "Error: Dataset not found at $DATASET_PATH"
+    echo "Usage: $0 <dataset_path> [env_file]"
+    exit 1
+fi
+
+if [ ! -f "$ENV_FILE" ]; then
+    echo "Error: Env file not found at $ENV_FILE"
+    echo "Usage: $0 <dataset_path> [env_file]"
+    exit 1
+fi
+
+echo "🚀 Starting SFT training for Qwen3-Coder-30B with LoRA"
+echo "   Model: Qwen/Qwen3-Coder-30B-A3B-Instruct"
+echo "   Dataset: $DATASET_PATH"
+echo "   Config: examples/multi_step/configs/crafter_sft_qwen30b_lora.toml"
+echo "   GPUs: 4x H200"
+echo "   LoRA: r=16, alpha=32, all-linear"
+echo ""
+
+uvx synth-ai train \
+    --type sft \
+    --config examples/multi_step/configs/crafter_sft_qwen30b_lora.toml \
+    --dataset "$DATASET_PATH" \
+    --env-file "$ENV_FILE"
+
+echo ""
+echo "✅ SFT training job submitted!"
+echo "   Monitor progress in your Synth dashboard"
@@ -3,7 +3,7 @@
 [algorithm]
 type = "offline"
 method = "sft"
-variety = "
+variety = "qlora"

 [job]
 # Smallest supported Qwen3 base; replace with the smallest Coder variant when available
@@ -55,4 +55,3 @@ alpha = 32
 dropout = 0.05
 target_modules = ["all-linear"]

-
@@ -0,0 +1,232 @@
# Vision SFT Pipeline - Bugs and Fixes

Complete log of issues encountered and resolved during vision data collection setup.

## ✅ Issue #1: Import Error - CrafterEnvironment

**Problem:**
```python
ImportError: cannot import name 'CrafterEnvironment' from 'examples.task_apps.crafter.task_app.synth_envs_hosted.envs.crafter.environment'
```

**Root Cause:**
The class is named `CrafterEnvironmentWrapper`, not `CrafterEnvironment`.

**Fix:**
Updated imports and usages in:
- `crafter_gpt5nano_agent.py`
- `crafter_qwen_vl_agent.py`
- `collect_vision_traces.py`

```python
# Before
from ...environment import CrafterEnvironment
wrapper = CrafterEnvironment(env, seed=seed)

# After
from ...environment import CrafterEnvironmentWrapper
wrapper = CrafterEnvironmentWrapper(env, seed=seed)
```

**Status:** FIXED ✓

---

## ✅ Issue #2: OpenAI API Parameter - max_tokens

**Problem:**
```
openai.BadRequestError: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead."}}
```

**Root Cause:**
gpt-5 models require the `max_completion_tokens` parameter instead of `max_tokens`.

**Fix:**
Updated the `_normalise_openai_request()` function to detect gpt-5 models:

```python
def _normalise_openai_request(payload, model, temperature):
    request = dict(payload)
    request["model"] = model

    # gpt-5 models use max_completion_tokens, not max_tokens
    if "gpt-5" in model.lower():
        request.setdefault("max_completion_tokens", 512)
        request.pop("max_tokens", None)  # Remove if present
    else:
        # Older models use max_tokens
        request.setdefault("max_tokens", 512)

    return request
```

**Files Updated:**
- `crafter_gpt5nano_agent.py`
- `collect_vision_traces.py`

**Status:** FIXED ✓

---

## ✅ Issue #3: OpenAI API Parameter - temperature

**Problem:**
```
openai.BadRequestError: Error code: 400 - {'error': {'message': "Unsupported value: 'temperature' does not support 0.6 with this model. Only the default (1) value is supported."}}
```

**Root Cause:**
gpt-5-nano only supports the default `temperature=1`; custom temperature values are rejected.

**Fix:**
Remove the temperature parameter for gpt-5 models:

```python
def _normalise_openai_request(payload, model, temperature):
    # ...

    if "gpt-5" in model.lower():
        # gpt-5-nano only supports temperature=1 (default)
        request.pop("temperature", None)  # Remove custom temperature
        request.setdefault("max_completion_tokens", 512)
        request.pop("max_tokens", None)
    else:
        # Older models support custom temperature
        request.setdefault("temperature", temperature)
        request.setdefault("max_tokens", 512)

    return request
```

**Files Updated:**
- `crafter_gpt5nano_agent.py`
- `collect_vision_traces.py`

**Status:** FIXED ✓

---

## ⚠️ Issue #4: gpt-5-nano Tool Calling Support

**Problem:**
```
Seed 0: no tool calls returned by model; ending episode early at step 0.
```

**Root Cause:**
gpt-5-nano does not appear to support function/tool calling yet, or requires a different prompt format for tool use.

**Testing Results:**
- API returned 200 OK (auth and network fine)
- Model processed vision inputs successfully
- Model did not return tool calls even with a tools schema provided
- Both episodes stopped immediately (step 0)

**Workaround:**
Switch to `gpt-4o-mini-2024-07-18` for data collection:
- Confirmed to support both vision AND tool calling
- Successfully completed 10 episodes with good quality
- Mean 2.6 achievements per episode
- 685 total tool calls across 10 episodes

**Status:** WORKAROUND APPLIED (use gpt-4o-mini) ✓

**Note:**
This is a model capability limitation, not a code bug. gpt-5-nano can be revisited when tool calling support is confirmed by OpenAI.
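A generic guard for this failure mode can be sketched as follows. This is not the repository's code; the response dicts mirror the OpenAI chat-completions payload shape, and the fallback choice simply encodes the workaround above:

```python
# Minimal sketch: detect a response with no tool calls and fall back to a
# model known to support vision + tool calling. Model names are illustrative.
FALLBACK_MODEL = "gpt-4o-mini-2024-07-18"

def extract_tool_calls(response: dict) -> list:
    # Chat-completions responses carry tool calls on the first choice's message.
    message = response["choices"][0]["message"]
    return message.get("tool_calls") or []

def pick_model(response: dict, current_model: str) -> str:
    # If the model ignored the tools schema entirely, retry on the fallback.
    if not extract_tool_calls(response):
        return FALLBACK_MODEL
    return current_model

empty = {"choices": [{"message": {"content": "I see a tree.", "tool_calls": None}}]}
ok = {"choices": [{"message": {"tool_calls": [{"function": {"name": "interact", "arguments": "{}"}}]}}]}

print(pick_model(empty, "gpt-5-nano"))  # falls back to gpt-4o-mini
print(pick_model(ok, "gpt-5-nano"))     # keeps the current model
```

A check like this would have turned the silent step-0 exit into an explicit model switch instead of a truncated episode.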
---

## 📊 Final Validation Results

### Test Run #5: 10-Episode Collection with gpt-4o-mini

**Command:**
```bash
uv run python examples/qwen_vl/crafter_gpt5nano_agent.py \
  --model gpt-4o-mini-2024-07-18 \
  --seeds 10 \
  --steps 50
```

**Results:**
```
✓ All 10 episodes completed (50 steps each)
✓ Mean achievements: 2.6 per episode
✓ Total tool calls: 685
✓ Vision processing: Working (64x64 PNG frames)
✓ Tool calling: Working (proper tool call format)
✓ Frame saving: Working (saved to output directory)
✓ Performance: ~5-6 minutes for 10 episodes
```

**Quality Metrics:**
- Episode 1: 4 achievements, 72 tool calls, reward: 97.3
- Episode 5: 3 achievements, 62 tool calls, reward: 120.0
- Episode 8: 1 achievement, 71 tool calls, reward: 12.9
- Good variety in performance (1-4 achievements)

---

## 🔧 Code Changes Summary

### Files Modified:
1. **crafter_gpt5nano_agent.py**
   - Import: `CrafterEnvironment` → `CrafterEnvironmentWrapper`
   - Function: `_normalise_openai_request()` - handle gpt-5 parameters

2. **crafter_qwen_vl_agent.py**
   - Import: `CrafterEnvironment` → `CrafterEnvironmentWrapper`

3. **collect_vision_traces.py**
   - Import: `CrafterEnvironment` → `CrafterEnvironmentWrapper`
   - Function: `_normalise_openai_request()` - handle gpt-5 parameters

### Key Learnings:
1. ✅ Always check actual class names in the source code
2. ✅ OpenAI's API evolves - newer models have different parameter requirements
3. ✅ Test with known-working models first (gpt-4o-mini) before trying cutting-edge ones
4. ✅ The vision + tool calling combo requires mature model support

---

## 🎯 Recommendations

### For Production:
- **Teacher model:** Use `gpt-4o-mini-2024-07-18` for data collection
  - Proven to work with vision + tools
  - Good quality (2-4 achievements per episode)
  - Reasonable cost

- **Monitor gpt-5-nano:** Revisit when tool calling support is confirmed

### For Configs:
- Update eval configs to use `gpt-4o-mini` by default:
```toml
[eval]
model = "gpt-4o-mini-2024-07-18"  # Not gpt-5-nano
```

---

## ✅ All Issues Resolved

**Infrastructure Status:** READY FOR PRODUCTION ✓

- Vision processing: Working
- Tool calling: Working
- Frame saving: Working
- OpenAI API integration: Working
- 10-episode test: Successful

**Next Steps:**
1. Scale to 100 episodes for the full dataset
2. Apply filters and export to SFT format
3. Train the VLM with LoRA
4. Fine-tune with RL
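
Step 2 (filter, then export) might look roughly like this; the `achievements` field and the threshold are assumptions for illustration, not the pipeline's actual filter:

```python
import json

MIN_ACHIEVEMENTS = 2  # assumed quality threshold; tune for your dataset

def keep_episode(trace: dict) -> bool:
    # Keep only episodes whose rollout earned enough achievements.
    return len(trace.get("achievements", [])) >= MIN_ACHIEVEMENTS

def filter_traces(jsonl_lines):
    # Parse each JSONL line and keep only high-quality episodes for SFT export.
    return [t for t in map(json.loads, jsonl_lines) if keep_episode(t)]

lines = [
    json.dumps({"achievements": ["collect_wood", "place_table"], "messages": []}),
    json.dumps({"achievements": [], "messages": []}),
]
kept = filter_traces(lines)
print(len(kept))
```

Anything that survives the filter would then go through the SFT-format converter before training.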

---

**Last Updated:** 2025-10-26
**Test Environment:** synth-ai dev, macOS, Python 3.11