synth-ai 0.2.14__py3-none-any.whl → 0.2.17__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of synth-ai might be problematic.
- examples/README.md +1 -0
- examples/analyze_semantic_words.sh +2 -2
- examples/blog_posts/pokemon_vl/README.md +98 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +25 -0
- examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
- examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +42 -0
- examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
- examples/blog_posts/warming_up_to_rl/README.md +158 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +41 -0
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
- examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +73 -115
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -1
- examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/configs/verilog_rl_lora.toml +80 -123
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +1 -2
- examples/qwen_coder/configs/coder_lora_4b.toml +5 -1
- examples/qwen_coder/configs/coder_lora_small.toml +1 -2
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +152 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +274 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +415 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
- examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
- examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +61 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +6 -6
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +62 -0
- examples/rl/configs/rl_from_base_qwen17.toml +79 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +21 -0
- examples/rl/task_app/math_single_step.py +990 -0
- examples/rl/task_app/math_task_app.py +111 -0
- examples/run_crafter_demo.sh +2 -2
- examples/sft/README.md +6 -6
- examples/sft/configs/crafter_fft_qwen0p6b.toml +7 -2
- examples/sft/configs/crafter_lora_qwen0p6b.toml +7 -3
- examples/sft/evaluate.py +2 -4
- examples/sft/export_dataset.py +7 -4
- examples/swe/task_app/README.md +33 -3
- examples/swe/task_app/grpo_swe_mini.py +4 -1
- examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
- examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +50 -23
- examples/swe/task_app/hosted/inference/openai_client.py +4 -4
- examples/swe/task_app/hosted/policy_routes.py +0 -2
- examples/swe/task_app/hosted/rollout.py +0 -8
- examples/swe/task_app/morph_backend.py +178 -0
- examples/task_apps/crafter/task_app/README.md +1 -1
- examples/task_apps/crafter/task_app/grpo_crafter.py +70 -10
- examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +63 -27
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +48 -50
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +75 -36
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +31 -15
- examples/task_apps/enron/__init__.py +1 -0
- examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
- examples/task_apps/math/README.md +1 -2
- examples/task_apps/pokemon_red/README.md +3 -4
- examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
- examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
- examples/task_apps/pokemon_red/task_app.py +36 -5
- examples/task_apps/sokoban/README.md +2 -3
- examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
- examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
- examples/vlm/README.md +3 -3
- examples/vlm/configs/crafter_vlm_gpt4o.toml +5 -0
- examples/vlm/crafter_openai_vlm_agent.py +3 -5
- examples/vlm/filter_image_rows.py +1 -1
- examples/vlm/run_crafter_vlm_benchmark.py +2 -2
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +1 -1
- examples/warming_up_to_rl/configs/crafter_fft.toml +5 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
- examples/warming_up_to_rl/export_trace_sft.py +174 -60
- examples/warming_up_to_rl/readme.md +63 -132
- examples/warming_up_to_rl/run_fft_and_save.py +1 -1
- examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
- examples/warming_up_to_rl/run_rl_and_save.py +1 -1
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +827 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1084 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +5 -0
- synth_ai/__init__.py +44 -30
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +144 -7
- synth_ai/api/train/__init__.py +13 -1
- synth_ai/api/train/builders.py +9 -3
- synth_ai/api/train/cli.py +155 -17
- synth_ai/api/train/config_finder.py +18 -11
- synth_ai/api/train/configs/__init__.py +8 -1
- synth_ai/api/train/configs/rl.py +32 -7
- synth_ai/api/train/configs/sft.py +6 -2
- synth_ai/api/train/configs/shared.py +59 -2
- synth_ai/api/train/env_resolver.py +13 -10
- synth_ai/auth/credentials.py +119 -0
- synth_ai/cli/__init__.py +61 -69
- synth_ai/cli/_modal_wrapper.py +7 -5
- synth_ai/cli/_typer_patch.py +0 -2
- synth_ai/cli/_validate_task_app.py +22 -4
- synth_ai/cli/commands/__init__.py +17 -0
- synth_ai/cli/commands/demo/__init__.py +6 -0
- synth_ai/cli/commands/demo/core.py +163 -0
- synth_ai/cli/commands/deploy/__init__.py +23 -0
- synth_ai/cli/commands/deploy/core.py +614 -0
- synth_ai/cli/commands/deploy/errors.py +72 -0
- synth_ai/cli/commands/deploy/validation.py +11 -0
- synth_ai/cli/commands/eval/__init__.py +19 -0
- synth_ai/cli/commands/eval/core.py +1109 -0
- synth_ai/cli/commands/eval/errors.py +81 -0
- synth_ai/cli/commands/eval/validation.py +133 -0
- synth_ai/cli/commands/filter/__init__.py +12 -0
- synth_ai/cli/commands/filter/core.py +388 -0
- synth_ai/cli/commands/filter/errors.py +55 -0
- synth_ai/cli/commands/filter/validation.py +77 -0
- synth_ai/cli/commands/help/__init__.py +177 -0
- synth_ai/cli/commands/help/core.py +73 -0
- synth_ai/cli/commands/status/__init__.py +64 -0
- synth_ai/cli/commands/status/client.py +192 -0
- synth_ai/cli/commands/status/config.py +92 -0
- synth_ai/cli/commands/status/errors.py +20 -0
- synth_ai/cli/commands/status/formatters.py +164 -0
- synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
- synth_ai/cli/commands/status/subcommands/files.py +79 -0
- synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
- synth_ai/cli/commands/status/subcommands/models.py +79 -0
- synth_ai/cli/commands/status/subcommands/runs.py +81 -0
- synth_ai/cli/commands/status/subcommands/summary.py +47 -0
- synth_ai/cli/commands/status/utils.py +114 -0
- synth_ai/cli/commands/train/__init__.py +53 -0
- synth_ai/cli/commands/train/core.py +21 -0
- synth_ai/cli/commands/train/errors.py +117 -0
- synth_ai/cli/commands/train/judge_schemas.py +199 -0
- synth_ai/cli/commands/train/judge_validation.py +304 -0
- synth_ai/cli/commands/train/validation.py +443 -0
- synth_ai/cli/demo.py +2 -162
- synth_ai/cli/deploy/__init__.py +28 -0
- synth_ai/cli/deploy/core.py +5 -0
- synth_ai/cli/deploy/errors.py +23 -0
- synth_ai/cli/deploy/validation.py +5 -0
- synth_ai/cli/eval/__init__.py +36 -0
- synth_ai/cli/eval/core.py +5 -0
- synth_ai/cli/eval/errors.py +31 -0
- synth_ai/cli/eval/validation.py +5 -0
- synth_ai/cli/filter/__init__.py +28 -0
- synth_ai/cli/filter/core.py +5 -0
- synth_ai/cli/filter/errors.py +23 -0
- synth_ai/cli/filter/validation.py +5 -0
- synth_ai/cli/legacy_root_backup.py +3 -1
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/modal_serve/__init__.py +12 -0
- synth_ai/cli/modal_serve/core.py +14 -0
- synth_ai/cli/modal_serve/errors.py +8 -0
- synth_ai/cli/modal_serve/validation.py +11 -0
- synth_ai/cli/recent.py +2 -1
- synth_ai/cli/serve/__init__.py +12 -0
- synth_ai/cli/serve/core.py +14 -0
- synth_ai/cli/serve/errors.py +8 -0
- synth_ai/cli/serve/validation.py +11 -0
- synth_ai/cli/setup.py +21 -0
- synth_ai/cli/status.py +7 -126
- synth_ai/cli/task_app_deploy.py +7 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +11 -0
- synth_ai/cli/task_app_serve.py +11 -0
- synth_ai/cli/task_apps.py +110 -1499
- synth_ai/cli/traces.py +1 -1
- synth_ai/cli/train/__init__.py +12 -0
- synth_ai/cli/train/core.py +21 -0
- synth_ai/cli/train/errors.py +8 -0
- synth_ai/cli/train/validation.py +24 -0
- synth_ai/cli/train.py +5 -0
- synth_ai/cli/turso.py +1 -1
- synth_ai/cli/watch.py +1 -1
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +702 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +0 -1
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/red/engine.py +33 -12
- synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
- synth_ai/environments/examples/red/environment.py +26 -0
- synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/evals/base.py +16 -5
- synth_ai/evals/client.py +1 -1
- synth_ai/http.py +8 -22
- synth_ai/inference/client.py +1 -1
- synth_ai/judge_schemas.py +4 -5
- synth_ai/learning/client.py +1 -1
- synth_ai/learning/health.py +1 -1
- synth_ai/learning/jobs.py +1 -1
- synth_ai/learning/rl/client.py +4 -2
- synth_ai/learning/rl/env_keys.py +1 -1
- synth_ai/learning/rl/secrets.py +1 -1
- synth_ai/learning/sft/client.py +1 -1
- synth_ai/learning/sft/data.py +407 -4
- synth_ai/learning/validators.py +4 -1
- synth_ai/streaming/__init__.py +29 -0
- synth_ai/streaming/config.py +94 -0
- synth_ai/streaming/handlers.py +469 -0
- synth_ai/streaming/streamer.py +301 -0
- synth_ai/streaming/types.py +95 -0
- synth_ai/task/apps/__init__.py +4 -2
- synth_ai/task/config.py +6 -4
- synth_ai/task/rubrics/__init__.py +1 -2
- synth_ai/task/rubrics/loaders.py +14 -10
- synth_ai/task/rubrics.py +219 -0
- synth_ai/task/trace_correlation_helpers.py +24 -11
- synth_ai/task/tracing_utils.py +14 -3
- synth_ai/task/validators.py +0 -1
- synth_ai/tracing_v3/abstractions.py +3 -3
- synth_ai/tracing_v3/config.py +15 -13
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +3 -1
- synth_ai/tracing_v3/decorators.py +10 -7
- synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
- synth_ai/tracing_v3/migration_helper.py +1 -2
- synth_ai/tracing_v3/session_tracer.py +7 -7
- synth_ai/tracing_v3/storage/base.py +29 -29
- synth_ai/tracing_v3/storage/config.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +8 -9
- synth_ai/tracing_v3/turso/native_manager.py +80 -72
- synth_ai/tracing_v3/utils.py +2 -2
- synth_ai/utils/__init__.py +101 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/cli.py +131 -0
- synth_ai/utils/env.py +294 -0
- synth_ai/utils/http.py +172 -0
- synth_ai/utils/modal.py +308 -0
- synth_ai/utils/process.py +212 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/v0/config/__init__.py +1 -5
- synth_ai/v0/config/base_url.py +1 -7
- synth_ai/v0/tracing/config.py +1 -1
- synth_ai/v0/tracing/decorators.py +1 -1
- synth_ai/v0/tracing/upload.py +1 -1
- synth_ai/v0/tracing_v1/config.py +1 -1
- synth_ai/v0/tracing_v1/decorators.py +1 -1
- synth_ai/v0/tracing_v1/upload.py +1 -1
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/METADATA +91 -32
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/RECORD +341 -154
- synth_ai/cli/man.py +0 -106
- synth_ai/cli/tui.py +0 -57
- synth_ai/compound/cais.py +0 -0
- synth_ai/core/experiment.py +0 -13
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -295
- synth_ai/handshake.py +0 -109
- synth_ai/tui/__init__.py +0 -5
- synth_ai/tui/__main__.py +0 -13
- synth_ai/tui/cli/__init__.py +0 -1
- synth_ai/tui/cli/query_experiments.py +0 -164
- synth_ai/tui/cli/query_experiments_v3.py +0 -164
- synth_ai/tui/dashboard.py +0 -906
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/top_level.txt +0 -0
|
@@ -0,0 +1,274 @@
|
|
|
1
|
+
# ✅ VLM Setup Complete!
|
|
2
|
+
|
|
3
|
+
Complete vision-language model (VLM) infrastructure for Crafter with image observations.
|
|
4
|
+
|
|
5
|
+
## 📦 What Was Created
|
|
6
|
+
|
|
7
|
+
### **Core Examples** (Python Scripts)
|
|
8
|
+
1. **`crafter_gpt5nano_agent.py`** - Demo agent using OpenAI gpt-5-nano
|
|
9
|
+
2. **`crafter_qwen_vl_agent.py`** - Demo agent using Qwen-VL via synth-ai
|
|
10
|
+
3. **`collect_vision_traces.py`** - Manual trace collection script
|
|
11
|
+
|
|
12
|
+
### **CLI-Based Pipeline** (Recommended)
|
|
13
|
+
4. **`run_vision_sft_pipeline.sh`** - Complete automated pipeline
|
|
14
|
+
5. **`run_vision_comparison.sh`** - Compare gpt-5-nano vs Qwen-VL
|
|
15
|
+
|
|
16
|
+
### **Configuration Files**
|
|
17
|
+
6. **`configs/eval_gpt5nano_vision.toml`** - Eval config for gpt-5-nano
|
|
18
|
+
7. **`configs/eval_qwen3vl_vision.toml`** - Eval config for Qwen3-VL
|
|
19
|
+
8. **`configs/eval_gpt4o_mini_vision.toml`** - Eval config for gpt-4o-mini (stronger teacher)
|
|
20
|
+
9. **`configs/filter_vision_sft.toml`** - Filter config for gpt-5-nano traces
|
|
21
|
+
10. **`configs/filter_qwen3vl_sft.toml`** - Filter config for Qwen3-VL traces
|
|
22
|
+
11. **`configs/crafter_vlm_sft_example.toml`** - Example SFT training config
|
|
23
|
+
|
|
24
|
+
### **Documentation**
|
|
25
|
+
12. **`README.md`** - Overview and quick start
|
|
26
|
+
13. **`QUICKSTART.md`** - Complete manual pipeline guide
|
|
27
|
+
14. **`collect_data_via_cli.md`** - **Detailed CLI guide** ⭐
|
|
28
|
+
15. **`SETUP_COMPLETE.md`** - This file
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## 🚀 Quick Start (3 Commands)
|
|
33
|
+
|
|
34
|
+
### Option 1: Automated Pipeline
|
|
35
|
+
```bash
|
|
36
|
+
cd /path/to/synth-ai  # your local clone of the synth-ai repo
|
|
37
|
+
export OPENAI_API_KEY="sk-..."
|
|
38
|
+
bash examples/qwen_vl/run_vision_sft_pipeline.sh
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
### Option 2: Step-by-Step CLI
|
|
42
|
+
```bash
|
|
43
|
+
# 1. Collect traces (30-60 min)
|
|
44
|
+
uvx synth-ai eval --config examples/qwen_vl/configs/eval_gpt5nano_vision.toml
|
|
45
|
+
|
|
46
|
+
# 2. Filter and export (< 1 min)
|
|
47
|
+
uvx synth-ai filter --config examples/qwen_vl/configs/filter_vision_sft.toml
|
|
48
|
+
|
|
49
|
+
# 3. Train SFT (2-4 hours)
|
|
50
|
+
cd /path/to/monorepo  # your local checkout of the monorepo
|
|
51
|
+
uvx synth-ai train --type sft --config configs/vision_sft/crafter_qwen3vl_8b_gpt5nano.toml
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
### Option 3: Quick Demo
|
|
55
|
+
```bash
|
|
56
|
+
# Test gpt-5-nano (5 episodes, 10 steps each)
|
|
57
|
+
export OPENAI_API_KEY="sk-..."
|
|
58
|
+
uv run python examples/qwen_vl/crafter_gpt5nano_agent.py --seeds 5 --steps 10
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## 📖 Documentation Index
|
|
64
|
+
|
|
65
|
+
| File | Purpose |
|
|
66
|
+
|------|---------|
|
|
67
|
+
| **`collect_data_via_cli.md`** ⭐ | **Main guide**: Complete CLI-based pipeline |
|
|
68
|
+
| `README.md` | Overview and quick reference |
|
|
69
|
+
| `QUICKSTART.md` | Manual Python script approach |
|
|
70
|
+
| `SETUP_COMPLETE.md` | This summary (you are here) |
|
|
71
|
+
|
|
72
|
+
**Start here:** 👉 `collect_data_via_cli.md`
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## 🎯 What Each Tool Does
|
|
77
|
+
|
|
78
|
+
### **synth-ai eval** (Data Collection)
|
|
79
|
+
- Runs rollouts with vision-enabled models
|
|
80
|
+
- Automatically detects vision capability from model name
|
|
81
|
+
- Stores traces to SQLite with base64-encoded images
|
|
82
|
+
- Supports parallel episodes for faster collection
|
|
83
|
+
|
|
84
|
+
**Config:** `eval_gpt5nano_vision.toml`, `eval_qwen3vl_vision.toml`, etc.
|
|
85
|
+
|
|
86
|
+
### **synth-ai filter** (Quality Filtering)
|
|
87
|
+
- Removes low-quality episodes (too short, errors, loops)
|
|
88
|
+
- Deduplicates state-action pairs
|
|
89
|
+
- Exports to SFT JSONL format (OpenAI-style messages)
|
|
90
|
+
- Splits into train/val sets
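
The filtering steps listed above can be sketched in a few lines of Python. This is only an illustration of deduplication, a train/val split, and JSONL export. The field names `state` and `action`, the helper names, and the 20% validation fraction are assumptions for the example, not the behavior of `synth-ai filter` itself.

```python
import json
import random


def dedupe_and_split(rows, val_fraction=0.1, seed=0):
    """Drop repeated (state, action) pairs, then split into train/val lists."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["state"], row["action"])  # hypothetical field names
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(unique)
    n_val = int(len(unique) * val_fraction)
    return unique[n_val:], unique[:n_val]


def to_jsonl(rows):
    """Serialize rows as one OpenAI-style `messages` object per line."""
    lines = []
    for row in rows:
        record = {
            "messages": [
                {"role": "user", "content": row["state"]},
                {"role": "assistant", "content": row["action"]},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# 20 rows, but only 10 distinct (state, action) pairs.
rows = [{"state": f"obs-{i % 10}", "action": "move_right"} for i in range(20)]
train, val = dedupe_and_split(rows, val_fraction=0.2)
print(len(train), len(val))
print(to_jsonl(val))
```

Fixing the shuffle seed keeps the split stable across runs, which makes it easier to compare training jobs on the same validation set.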
|
|
91
|
+
|
|
92
|
+
**Config:** `filter_vision_sft.toml`, `filter_qwen3vl_sft.toml`
|
|
93
|
+
|
|
94
|
+
### **synth-ai train** (Model Training)
|
|
95
|
+
- Trains VLM with LoRA on collected traces
|
|
96
|
+
- Supports Qwen-VL models (Qwen2-VL, Qwen3-VL)
|
|
97
|
+
- Uses 2x or 4x H200 GPUs
|
|
98
|
+
- Saves adapters to HF Hub or S3
|
|
99
|
+
|
|
100
|
+
**Config:** `crafter_vlm_sft_example.toml` (in synth-ai repo)
|
|
101
|
+
**Training config:** `monorepo/configs/vision_sft/crafter_qwen3vl_8b_gpt5nano.toml`
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## 🔍 Key Features
|
|
106
|
+
|
|
107
|
+
### **Automatic Vision Detection**
|
|
108
|
+
CrafterPolicy auto-detects vision from model names:
|
|
109
|
+
```python
|
|
110
|
+
# These automatically enable vision:
|
|
111
|
+
"gpt-5-nano" # ✅
|
|
112
|
+
"gpt-4o-mini" # ✅
|
|
113
|
+
"Qwen2-VL-7B-Instruct" # ✅
|
|
114
|
+
"Qwen3-VL-8B" # ✅
|
|
115
|
+
```
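
As a rough illustration of name-based detection, the sketch below flags a model as vision-capable when its name contains one of a few substrings. The function name and the substring list are assumptions chosen to match the four examples above. The actual check inside `CrafterPolicy` may differ.

```python
# Hypothetical heuristic; the real CrafterPolicy check may differ.
VISION_NAME_HINTS = ("gpt-5", "gpt-4o", "-vl")  # assumed substrings


def looks_like_vision_model(model_name: str) -> bool:
    """Return True when the lowercased name contains a vision hint."""
    name = model_name.lower()
    return any(hint in name for hint in VISION_NAME_HINTS)


for name in (
    "gpt-5-nano",
    "gpt-4o-mini",
    "Qwen2-VL-7B-Instruct",
    "Qwen3-VL-8B",
    "Qwen3-4B",  # text-only name, included as a negative case
):
    print(f"{name}: {looks_like_vision_model(name)}")
```

A substring match like this is easy to fool, which is why an explicit override in the config is a useful fallback.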
|
|
116
|
+
|
|
117
|
+
### **Multimodal Messages**
|
|
118
|
+
User messages include both text and images:
|
|
119
|
+
```json
|
|
120
|
+
{
|
|
121
|
+
"role": "user",
|
|
122
|
+
"content": [
|
|
123
|
+
{"type": "text", "text": "Observation: Health: 9/9, Hunger: 9/9..."},
|
|
124
|
+
{"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
|
|
125
|
+
]
|
|
126
|
+
}
|
|
127
|
+
```
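
A message in this shape can be assembled from raw PNG bytes with the standard library alone. The helper name `build_user_message` is made up for this sketch, and the 8-byte PNG signature stands in for a real rendered frame.

```python
import base64
import json


def build_user_message(observation_text: str, png_bytes: bytes) -> dict:
    """Wrap text and a PNG frame in an OpenAI-style multimodal user message."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": observation_text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            },
        ],
    }


# Placeholder payload: the PNG file signature, not a full image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
message = build_user_message("Observation: Health: 9/9, Hunger: 9/9...", PNG_SIGNATURE)
print(json.dumps(message, indent=2))
```

Every PNG starts with the same signature bytes, so the base64 payload of any PNG data URL begins with `iVBORw0KGgo`, as in the example message above.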
|
|
128
|
+
|
|
129
|
+
### **64x64 PNG Images**
|
|
130
|
+
Crafter frames are rendered at 64x64 and sent as base64-encoded PNGs:
|
|
131
|
+
- Efficient token usage (~85 tokens per image)
|
|
132
|
+
- High enough resolution for gameplay
|
|
133
|
+
- Standard OpenAI vision format
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
## 💰 Cost & Timeline
|
|
138
|
+
|
|
139
|
+
### Complete Pipeline (gpt-5-nano → SFT → RL)
|
|
140
|
+
|
|
141
|
+
| Step | Duration | Cost | Hardware |
|
|
142
|
+
|------|----------|------|----------|
|
|
143
|
+
| Data collection (100 episodes) | 30-60 min | ~$1-2 | OpenAI API |
|
|
144
|
+
| Filter & export | < 5 min | Free | Local |
|
|
145
|
+
| SFT training (2 epochs) | 2-4 hrs | ~$21 | 2x H200 |
|
|
146
|
+
| RL fine-tuning (20 iterations) | 6-10 hrs | ~$112 | 4x H200 |
|
|
147
|
+
| Evaluation (100 episodes × 4 models) | 2-3 hrs | ~$5 | 1x H200 |
|
|
148
|
+
|
|
149
|
+
**Total:** ~$140, 12-18 hours
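
To sanity-check the total, the per-step ranges from the table can be summed directly. The numbers below are copied from the rows above, with "< 5 min" treated as 0 to 5 minutes and "Free" as $0. The stated total reads as a rounded estimate rather than an exact sum.

```python
# (step, hours_low, hours_high, cost_low_usd, cost_high_usd), copied from the table
STEPS = [
    ("Data collection", 0.5, 1.0, 1, 2),
    ("Filter & export", 0.0, 5 / 60, 0, 0),
    ("SFT training", 2.0, 4.0, 21, 21),
    ("RL fine-tuning", 6.0, 10.0, 112, 112),
    ("Evaluation", 2.0, 3.0, 5, 5),
]

hours_low = sum(step[1] for step in STEPS)
hours_high = sum(step[2] for step in STEPS)
cost_low = sum(step[3] for step in STEPS)
cost_high = sum(step[4] for step in STEPS)

print(f"cost: ${cost_low}-{cost_high}, time: {hours_low:.1f}-{hours_high:.1f} h")
```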
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
## 🎉 Next Steps
|
|
154
|
+
|
|
155
|
+
1. **Run a quick demo** to verify vision inference works:
|
|
156
|
+
```bash
|
|
157
|
+
uv run python examples/qwen_vl/crafter_gpt5nano_agent.py --seeds 3 --steps 5
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
2. **Collect training data** (100 episodes):
|
|
161
|
+
```bash
|
|
162
|
+
uvx synth-ai eval --config examples/qwen_vl/configs/eval_gpt5nano_vision.toml
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
3. **Filter and export** to SFT format:
|
|
166
|
+
```bash
|
|
167
|
+
uvx synth-ai filter --config examples/qwen_vl/configs/filter_vision_sft.toml
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
4. **Train VLM** with LoRA:
|
|
171
|
+
```bash
|
|
172
|
+
cd /path/to/monorepo  # your local checkout of the monorepo
|
|
173
|
+
uvx synth-ai train --type sft --config configs/vision_sft/crafter_qwen3vl_8b_gpt5nano.toml
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
5. **Fine-tune with RL** (optional):
|
|
177
|
+
```bash
|
|
178
|
+
uvx synth-ai train --type rl --config configs/vision_rl/crafter_qwen3vl_8b_grpo.toml
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
6. **Benchmark** final model vs baselines
|
|
182
|
+
|
|
183
|
+
---
|
|
184
|
+
|
|
185
|
+
## 🔧 Customization
|
|
186
|
+
|
|
187
|
+
### Use a Different Teacher Model
|
|
188
|
+
Edit `configs/eval_gpt5nano_vision.toml`:
|
|
189
|
+
```toml
|
|
190
|
+
[eval]
|
|
191
|
+
model = "gpt-4o-mini-2024-07-18" # Stronger teacher
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
### Collect More Episodes
|
|
195
|
+
```toml
|
|
196
|
+
[eval]
|
|
197
|
+
seeds = "0-499" # Default: "0-99"
|
|
198
|
+
```
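
Assuming the `seeds` string is an inclusive range (which fits the default `"0-99"` corresponding to 100 episodes), the number of episodes for a given value can be worked out as below. The parser is a hypothetical helper for illustration. The CLI's own parsing may accept other forms.

```python
def expand_seed_range(spec: str) -> list[int]:
    """Expand "lo-hi" (inclusive) or a single integer into a list of seeds."""
    if "-" in spec:
        lo, hi = (int(part) for part in spec.split("-", 1))
        if hi < lo:
            raise ValueError(f"empty seed range: {spec!r}")
        return list(range(lo, hi + 1))
    return [int(spec)]


print(len(expand_seed_range("0-99")))   # episodes with the default setting
print(len(expand_seed_range("0-499")))  # episodes with the larger setting
```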
|
|
199
|
+
|
|
200
|
+
### Change Image Resolution
|
|
201
|
+
```toml
|
|
202
|
+
[eval.env_config]
|
|
203
|
+
env_params = {render_size = [128, 128]} # Default: [64, 64]
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
### Adjust Quality Filters
|
|
207
|
+
Edit `configs/filter_vision_sft.toml`:
|
|
208
|
+
```toml
|
|
209
|
+
[filter]
|
|
210
|
+
min_steps_per_episode = 10 # Stricter (default: 5)
|
|
211
|
+
min_achievements_per_episode = 2 # Require achievements (default: 0)
|
|
212
|
+
```
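
The effect of these two thresholds can be illustrated with a small sketch. The `Episode` record and `passes_filter` helper are hypothetical stand-ins. Only the threshold names and their values come from the config above.

```python
from dataclasses import dataclass


@dataclass
class Episode:  # hypothetical record, not the synth-ai trace schema
    seed: int
    num_steps: int
    num_achievements: int


def passes_filter(ep: Episode, min_steps: int = 5, min_achievements: int = 0) -> bool:
    """Keep an episode only if it meets both minimum thresholds."""
    return ep.num_steps >= min_steps and ep.num_achievements >= min_achievements


episodes = [Episode(0, 4, 0), Episode(1, 12, 1), Episode(2, 30, 3)]

# Defaults (5 steps, 0 achievements) versus the stricter values shown above.
default_kept = [ep.seed for ep in episodes if passes_filter(ep)]
strict_kept = [ep.seed for ep in episodes if passes_filter(ep, 10, 2)]
print(default_kept, strict_kept)
```

Stricter thresholds trade dataset size for quality, so it is worth checking how many episodes survive before starting a training run.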
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
## 📊 Expected Results
|
|
217
|
+
|
|
218
|
+
### Data Collection Quality
|
|
219
|
+
- **gpt-5-nano:** ~20-30% achievement rate
|
|
220
|
+
- **gpt-4o-mini:** ~35-45% achievement rate (better teacher)
|
|
221
|
+
- **Qwen2-VL-7B (base):** ~5-10% achievement rate
|
|
222
|
+
|
|
223
|
+
### SFT Performance (After Training)
|
|
224
|
+
- **Base Qwen-VL:** ~5-10% → **SFT:** ~20-30%
|
|
225
|
+
- **Improvement:** +15-20 percentage points (absolute) from distillation
|
|
226
|
+
|
|
227
|
+
### RL Performance (After 20 Iterations)
|
|
228
|
+
- **SFT:** ~20-30% → **SFT+RL:** ~40-50%
|
|
229
|
+
- **Improvement:** about +20 percentage points (absolute) from RL fine-tuning
|
|
230
|
+
|
|
231
|
+
---
|
|
232
|
+
|
|
233
|
+
## 🐛 Troubleshooting
|
|
234
|
+
|
|
235
|
+
### Vision not detected
|
|
236
|
+
```toml
|
|
237
|
+
# Add explicitly in eval config:
|
|
238
|
+
use_vision = true
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
### API key errors
|
|
242
|
+
```bash
|
|
243
|
+
# OpenAI
|
|
244
|
+
export OPENAI_API_KEY="sk-..."
|
|
245
|
+
|
|
246
|
+
# synth-ai
|
|
247
|
+
export SYNTH_API_KEY="sk_live_..."
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
### Task app connection failed
|
|
251
|
+
```bash
|
|
252
|
+
# Check task app is running
|
|
253
|
+
curl https://synth-laboratories--grpo-crafter-task-app.modal.run/health
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
### Filter removes all samples
|
|
257
|
+
```toml
|
|
258
|
+
# Lower quality thresholds in filter config
|
|
259
|
+
min_steps_per_episode = 3
|
|
260
|
+
min_achievements_per_episode = 0
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
## 📚 Related Resources
|
|
266
|
+
|
|
267
|
+
- **Main plan:** `/Users/joshpurtell/Documents/GitHub/monorepo/vision_sft_rl.txt` (Phase 9)
|
|
268
|
+
- **Crafter environment:** `examples/task_apps/crafter/README.md`
|
|
269
|
+
- **OpenAI VLM examples:** `examples/vlm/`
|
|
270
|
+
- **synth-ai CLI docs:** Run `uvx synth-ai --help`
|
|
271
|
+
|
|
272
|
+
---
|
|
273
|
+
|
|
274
|
+
**Infrastructure ready!** 🎉 Start collecting vision traces and training your VLM! 🚀
|