synth-ai 0.2.14__py3-none-any.whl → 0.2.17__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.
Files changed (354)
  1. examples/README.md +1 -0
  2. examples/analyze_semantic_words.sh +2 -2
  3. examples/blog_posts/pokemon_vl/README.md +98 -0
  4. examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +25 -0
  5. examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
  6. examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
  7. examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +42 -0
  8. examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
  9. examples/blog_posts/warming_up_to_rl/README.md +158 -0
  10. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
  11. examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
  12. examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
  13. examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
  14. examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +41 -0
  15. examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
  16. examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
  17. examples/multi_step/SFT_README.md +147 -0
  18. examples/multi_step/configs/crafter_rl_outcome.toml +1 -1
  19. examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +73 -115
  20. examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -1
  21. examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -1
  22. examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
  23. examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
  24. examples/multi_step/configs/verilog_rl_lora.toml +80 -123
  25. examples/multi_step/convert_traces_to_sft.py +84 -0
  26. examples/multi_step/run_sft_qwen30b.sh +45 -0
  27. examples/qwen_coder/configs/coder_lora_30b.toml +1 -2
  28. examples/qwen_coder/configs/coder_lora_4b.toml +5 -1
  29. examples/qwen_coder/configs/coder_lora_small.toml +1 -2
  30. examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
  31. examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
  32. examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
  33. examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
  34. examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
  35. examples/qwen_vl/QUICKSTART.md +327 -0
  36. examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
  37. examples/qwen_vl/README.md +152 -0
  38. examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
  39. examples/qwen_vl/RL_VISION_TESTING.md +333 -0
  40. examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
  41. examples/qwen_vl/SETUP_COMPLETE.md +274 -0
  42. examples/qwen_vl/VISION_TESTS_COMPLETE.md +489 -0
  43. examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
  44. examples/qwen_vl/__init__.py +2 -0
  45. examples/qwen_vl/collect_data_via_cli.md +415 -0
  46. examples/qwen_vl/collect_vision_traces.py +368 -0
  47. examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +110 -0
  48. examples/qwen_vl/configs/crafter_vlm_sft_example.toml +59 -0
  49. examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +26 -0
  50. examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
  51. examples/qwen_vl/configs/eval_gpt5nano_vision.toml +26 -0
  52. examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
  53. examples/qwen_vl/configs/filter_qwen3vl_sft.toml +49 -0
  54. examples/qwen_vl/configs/filter_vision_sft.toml +52 -0
  55. examples/qwen_vl/configs/filter_vision_test.toml +8 -0
  56. examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
  57. examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
  58. examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
  59. examples/qwen_vl/run_vision_comparison.sh +61 -0
  60. examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
  61. examples/qwen_vl/test_image_validation.py +201 -0
  62. examples/qwen_vl/test_sft_vision_data.py +110 -0
  63. examples/rl/README.md +6 -6
  64. examples/rl/configs/eval_base_qwen.toml +17 -0
  65. examples/rl/configs/eval_rl_qwen.toml +13 -0
  66. examples/rl/configs/rl_from_base_qwen.toml +62 -0
  67. examples/rl/configs/rl_from_base_qwen17.toml +79 -0
  68. examples/rl/configs/rl_from_ft_qwen.toml +37 -0
  69. examples/rl/run_eval.py +436 -0
  70. examples/rl/run_rl_and_save.py +111 -0
  71. examples/rl/task_app/README.md +21 -0
  72. examples/rl/task_app/math_single_step.py +990 -0
  73. examples/rl/task_app/math_task_app.py +111 -0
  74. examples/run_crafter_demo.sh +2 -2
  75. examples/sft/README.md +6 -6
  76. examples/sft/configs/crafter_fft_qwen0p6b.toml +7 -2
  77. examples/sft/configs/crafter_lora_qwen0p6b.toml +7 -3
  78. examples/sft/evaluate.py +2 -4
  79. examples/sft/export_dataset.py +7 -4
  80. examples/swe/task_app/README.md +33 -3
  81. examples/swe/task_app/grpo_swe_mini.py +4 -1
  82. examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
  83. examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
  84. examples/swe/task_app/hosted/envs/mini_swe/environment.py +50 -23
  85. examples/swe/task_app/hosted/inference/openai_client.py +4 -4
  86. examples/swe/task_app/hosted/policy_routes.py +0 -2
  87. examples/swe/task_app/hosted/rollout.py +0 -8
  88. examples/swe/task_app/morph_backend.py +178 -0
  89. examples/task_apps/crafter/task_app/README.md +1 -1
  90. examples/task_apps/crafter/task_app/grpo_crafter.py +70 -10
  91. examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
  92. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +63 -27
  93. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
  94. examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +48 -50
  95. examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +75 -36
  96. examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +31 -15
  97. examples/task_apps/enron/__init__.py +1 -0
  98. examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
  99. examples/task_apps/math/README.md +1 -2
  100. examples/task_apps/pokemon_red/README.md +3 -4
  101. examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
  102. examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
  103. examples/task_apps/pokemon_red/task_app.py +36 -5
  104. examples/task_apps/sokoban/README.md +2 -3
  105. examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
  106. examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
  107. examples/vlm/README.md +3 -3
  108. examples/vlm/configs/crafter_vlm_gpt4o.toml +5 -0
  109. examples/vlm/crafter_openai_vlm_agent.py +3 -5
  110. examples/vlm/filter_image_rows.py +1 -1
  111. examples/vlm/run_crafter_vlm_benchmark.py +2 -2
  112. examples/warming_up_to_rl/_utils.py +92 -0
  113. examples/warming_up_to_rl/analyze_trace_db.py +1 -1
  114. examples/warming_up_to_rl/configs/crafter_fft.toml +5 -0
  115. examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
  116. examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
  117. examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
  118. examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
  119. examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
  120. examples/warming_up_to_rl/export_trace_sft.py +174 -60
  121. examples/warming_up_to_rl/readme.md +63 -132
  122. examples/warming_up_to_rl/run_fft_and_save.py +1 -1
  123. examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
  124. examples/warming_up_to_rl/run_rl_and_save.py +1 -1
  125. examples/warming_up_to_rl/task_app/README.md +42 -0
  126. examples/warming_up_to_rl/task_app/grpo_crafter.py +827 -0
  127. examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
  128. examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
  129. examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
  130. examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
  131. examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
  132. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
  133. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
  134. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
  135. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
  136. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +454 -0
  137. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
  138. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
  139. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
  140. examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
  141. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
  142. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
  143. examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
  144. examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1084 -0
  145. examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
  146. examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
  147. examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
  148. examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
  149. examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
  150. examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
  151. examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
  152. examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
  153. examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +5 -0
  154. synth_ai/__init__.py +44 -30
  155. synth_ai/_utils/__init__.py +47 -0
  156. synth_ai/_utils/base_url.py +10 -0
  157. synth_ai/_utils/http.py +10 -0
  158. synth_ai/_utils/prompts.py +10 -0
  159. synth_ai/_utils/task_app_state.py +12 -0
  160. synth_ai/_utils/user_config.py +10 -0
  161. synth_ai/api/models/supported.py +144 -7
  162. synth_ai/api/train/__init__.py +13 -1
  163. synth_ai/api/train/builders.py +9 -3
  164. synth_ai/api/train/cli.py +155 -17
  165. synth_ai/api/train/config_finder.py +18 -11
  166. synth_ai/api/train/configs/__init__.py +8 -1
  167. synth_ai/api/train/configs/rl.py +32 -7
  168. synth_ai/api/train/configs/sft.py +6 -2
  169. synth_ai/api/train/configs/shared.py +59 -2
  170. synth_ai/api/train/env_resolver.py +13 -10
  171. synth_ai/auth/credentials.py +119 -0
  172. synth_ai/cli/__init__.py +61 -69
  173. synth_ai/cli/_modal_wrapper.py +7 -5
  174. synth_ai/cli/_typer_patch.py +0 -2
  175. synth_ai/cli/_validate_task_app.py +22 -4
  176. synth_ai/cli/commands/__init__.py +17 -0
  177. synth_ai/cli/commands/demo/__init__.py +6 -0
  178. synth_ai/cli/commands/demo/core.py +163 -0
  179. synth_ai/cli/commands/deploy/__init__.py +23 -0
  180. synth_ai/cli/commands/deploy/core.py +614 -0
  181. synth_ai/cli/commands/deploy/errors.py +72 -0
  182. synth_ai/cli/commands/deploy/validation.py +11 -0
  183. synth_ai/cli/commands/eval/__init__.py +19 -0
  184. synth_ai/cli/commands/eval/core.py +1109 -0
  185. synth_ai/cli/commands/eval/errors.py +81 -0
  186. synth_ai/cli/commands/eval/validation.py +133 -0
  187. synth_ai/cli/commands/filter/__init__.py +12 -0
  188. synth_ai/cli/commands/filter/core.py +388 -0
  189. synth_ai/cli/commands/filter/errors.py +55 -0
  190. synth_ai/cli/commands/filter/validation.py +77 -0
  191. synth_ai/cli/commands/help/__init__.py +177 -0
  192. synth_ai/cli/commands/help/core.py +73 -0
  193. synth_ai/cli/commands/status/__init__.py +64 -0
  194. synth_ai/cli/commands/status/client.py +192 -0
  195. synth_ai/cli/commands/status/config.py +92 -0
  196. synth_ai/cli/commands/status/errors.py +20 -0
  197. synth_ai/cli/commands/status/formatters.py +164 -0
  198. synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
  199. synth_ai/cli/commands/status/subcommands/files.py +79 -0
  200. synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
  201. synth_ai/cli/commands/status/subcommands/models.py +79 -0
  202. synth_ai/cli/commands/status/subcommands/runs.py +81 -0
  203. synth_ai/cli/commands/status/subcommands/summary.py +47 -0
  204. synth_ai/cli/commands/status/utils.py +114 -0
  205. synth_ai/cli/commands/train/__init__.py +53 -0
  206. synth_ai/cli/commands/train/core.py +21 -0
  207. synth_ai/cli/commands/train/errors.py +117 -0
  208. synth_ai/cli/commands/train/judge_schemas.py +199 -0
  209. synth_ai/cli/commands/train/judge_validation.py +304 -0
  210. synth_ai/cli/commands/train/validation.py +443 -0
  211. synth_ai/cli/demo.py +2 -162
  212. synth_ai/cli/deploy/__init__.py +28 -0
  213. synth_ai/cli/deploy/core.py +5 -0
  214. synth_ai/cli/deploy/errors.py +23 -0
  215. synth_ai/cli/deploy/validation.py +5 -0
  216. synth_ai/cli/eval/__init__.py +36 -0
  217. synth_ai/cli/eval/core.py +5 -0
  218. synth_ai/cli/eval/errors.py +31 -0
  219. synth_ai/cli/eval/validation.py +5 -0
  220. synth_ai/cli/filter/__init__.py +28 -0
  221. synth_ai/cli/filter/core.py +5 -0
  222. synth_ai/cli/filter/errors.py +23 -0
  223. synth_ai/cli/filter/validation.py +5 -0
  224. synth_ai/cli/legacy_root_backup.py +3 -1
  225. synth_ai/cli/lib/__init__.py +10 -0
  226. synth_ai/cli/lib/task_app_discovery.py +7 -0
  227. synth_ai/cli/lib/task_app_env.py +518 -0
  228. synth_ai/cli/modal_serve/__init__.py +12 -0
  229. synth_ai/cli/modal_serve/core.py +14 -0
  230. synth_ai/cli/modal_serve/errors.py +8 -0
  231. synth_ai/cli/modal_serve/validation.py +11 -0
  232. synth_ai/cli/recent.py +2 -1
  233. synth_ai/cli/serve/__init__.py +12 -0
  234. synth_ai/cli/serve/core.py +14 -0
  235. synth_ai/cli/serve/errors.py +8 -0
  236. synth_ai/cli/serve/validation.py +11 -0
  237. synth_ai/cli/setup.py +21 -0
  238. synth_ai/cli/status.py +7 -126
  239. synth_ai/cli/task_app_deploy.py +7 -0
  240. synth_ai/cli/task_app_list.py +25 -0
  241. synth_ai/cli/task_app_modal_serve.py +11 -0
  242. synth_ai/cli/task_app_serve.py +11 -0
  243. synth_ai/cli/task_apps.py +110 -1499
  244. synth_ai/cli/traces.py +1 -1
  245. synth_ai/cli/train/__init__.py +12 -0
  246. synth_ai/cli/train/core.py +21 -0
  247. synth_ai/cli/train/errors.py +8 -0
  248. synth_ai/cli/train/validation.py +24 -0
  249. synth_ai/cli/train.py +5 -0
  250. synth_ai/cli/turso.py +1 -1
  251. synth_ai/cli/watch.py +1 -1
  252. synth_ai/demos/__init__.py +10 -0
  253. synth_ai/demos/core/__init__.py +28 -1
  254. synth_ai/demos/crafter/__init__.py +1 -0
  255. synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
  256. synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
  257. synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
  258. synth_ai/demos/demo_registry.py +176 -0
  259. synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
  260. synth_ai/demos/math/__init__.py +1 -0
  261. synth_ai/demos/math/_common.py +16 -0
  262. synth_ai/demos/math/app.py +38 -0
  263. synth_ai/demos/math/config.toml +76 -0
  264. synth_ai/demos/math/deploy_modal.py +54 -0
  265. synth_ai/demos/math/modal_task_app.py +702 -0
  266. synth_ai/demos/math/task_app_entry.py +51 -0
  267. synth_ai/environments/environment/core.py +7 -1
  268. synth_ai/environments/examples/bandit/engine.py +0 -1
  269. synth_ai/environments/examples/bandit/environment.py +0 -1
  270. synth_ai/environments/examples/red/engine.py +33 -12
  271. synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
  272. synth_ai/environments/examples/red/environment.py +26 -0
  273. synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
  274. synth_ai/environments/examples/wordle/environment.py +0 -1
  275. synth_ai/evals/base.py +16 -5
  276. synth_ai/evals/client.py +1 -1
  277. synth_ai/http.py +8 -22
  278. synth_ai/inference/client.py +1 -1
  279. synth_ai/judge_schemas.py +4 -5
  280. synth_ai/learning/client.py +1 -1
  281. synth_ai/learning/health.py +1 -1
  282. synth_ai/learning/jobs.py +1 -1
  283. synth_ai/learning/rl/client.py +4 -2
  284. synth_ai/learning/rl/env_keys.py +1 -1
  285. synth_ai/learning/rl/secrets.py +1 -1
  286. synth_ai/learning/sft/client.py +1 -1
  287. synth_ai/learning/sft/data.py +407 -4
  288. synth_ai/learning/validators.py +4 -1
  289. synth_ai/streaming/__init__.py +29 -0
  290. synth_ai/streaming/config.py +94 -0
  291. synth_ai/streaming/handlers.py +469 -0
  292. synth_ai/streaming/streamer.py +301 -0
  293. synth_ai/streaming/types.py +95 -0
  294. synth_ai/task/apps/__init__.py +4 -2
  295. synth_ai/task/config.py +6 -4
  296. synth_ai/task/rubrics/__init__.py +1 -2
  297. synth_ai/task/rubrics/loaders.py +14 -10
  298. synth_ai/task/rubrics.py +219 -0
  299. synth_ai/task/trace_correlation_helpers.py +24 -11
  300. synth_ai/task/tracing_utils.py +14 -3
  301. synth_ai/task/validators.py +0 -1
  302. synth_ai/tracing_v3/abstractions.py +3 -3
  303. synth_ai/tracing_v3/config.py +15 -13
  304. synth_ai/tracing_v3/constants.py +21 -0
  305. synth_ai/tracing_v3/db_config.py +3 -1
  306. synth_ai/tracing_v3/decorators.py +10 -7
  307. synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
  308. synth_ai/tracing_v3/migration_helper.py +1 -2
  309. synth_ai/tracing_v3/session_tracer.py +7 -7
  310. synth_ai/tracing_v3/storage/base.py +29 -29
  311. synth_ai/tracing_v3/storage/config.py +3 -3
  312. synth_ai/tracing_v3/turso/daemon.py +8 -9
  313. synth_ai/tracing_v3/turso/native_manager.py +80 -72
  314. synth_ai/tracing_v3/utils.py +2 -2
  315. synth_ai/utils/__init__.py +101 -0
  316. synth_ai/utils/base_url.py +94 -0
  317. synth_ai/utils/cli.py +131 -0
  318. synth_ai/utils/env.py +294 -0
  319. synth_ai/utils/http.py +172 -0
  320. synth_ai/utils/modal.py +308 -0
  321. synth_ai/utils/process.py +212 -0
  322. synth_ai/utils/prompts.py +39 -0
  323. synth_ai/utils/sqld.py +122 -0
  324. synth_ai/utils/task_app_discovery.py +882 -0
  325. synth_ai/utils/task_app_env.py +186 -0
  326. synth_ai/utils/task_app_state.py +318 -0
  327. synth_ai/utils/user_config.py +137 -0
  328. synth_ai/v0/config/__init__.py +1 -5
  329. synth_ai/v0/config/base_url.py +1 -7
  330. synth_ai/v0/tracing/config.py +1 -1
  331. synth_ai/v0/tracing/decorators.py +1 -1
  332. synth_ai/v0/tracing/upload.py +1 -1
  333. synth_ai/v0/tracing_v1/config.py +1 -1
  334. synth_ai/v0/tracing_v1/decorators.py +1 -1
  335. synth_ai/v0/tracing_v1/upload.py +1 -1
  336. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/METADATA +91 -32
  337. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/RECORD +341 -154
  338. synth_ai/cli/man.py +0 -106
  339. synth_ai/cli/tui.py +0 -57
  340. synth_ai/compound/cais.py +0 -0
  341. synth_ai/core/experiment.py +0 -13
  342. synth_ai/core/system.py +0 -15
  343. synth_ai/demo_registry.py +0 -295
  344. synth_ai/handshake.py +0 -109
  345. synth_ai/tui/__init__.py +0 -5
  346. synth_ai/tui/__main__.py +0 -13
  347. synth_ai/tui/cli/__init__.py +0 -1
  348. synth_ai/tui/cli/query_experiments.py +0 -164
  349. synth_ai/tui/cli/query_experiments_v3.py +0 -164
  350. synth_ai/tui/dashboard.py +0 -906
  351. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/WHEEL +0 -0
  352. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/entry_points.txt +0 -0
  353. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/licenses/LICENSE +0 -0
  354. {synth_ai-0.2.14.dist-info → synth_ai-0.2.17.dist-info}/top_level.txt +0 -0
examples/qwen_vl/RL_VISION_TESTING.md
@@ -0,0 +1,333 @@
# Vision RL Integration Testing

Complete integration tests for reinforcement learning with vision-language models using the Crafter task app.

## Overview

These tests verify the full vision RL pipeline:
1. **Task App**: Same Crafter task app used for SFT data collection (generates image observations)
2. **Model**: Qwen3-VL-4B (smaller and faster for testing)
3. **Policy**: Uses `image_only_mode=true` - the agent sees only images, no text observations
4. **Training**: Full RL (GRPO/GSPO) with a vision-capable model

## Files

### Configs
- `configs/crafter_rl_vision_qwen3vl4b.toml` - Full RL config for Qwen3-VL-4B with vision

### Tests
- `../../tests/integration/cli/test_cli_train_rl_vision.py` - Integration tests:
  - `test_cli_train_rl_vision_qwen3vl4b` - Full RL training test
  - `test_task_app_vision_support` - Task app vision capability test

## Quick Start

### 1. Prerequisites

```bash
# Required environment variables
export SYNTH_API_KEY="your-api-key"
export BACKEND_BASE_URL="https://agent-learning.onrender.com/api"  # or your backend
export ENVIRONMENT_API_KEY="your-modal-key"  # For Modal deployment

# Optional: for faster testing
export TASK_APP_WARMUP_TIMEOUT=300  # 5 min for vision models
export SYNTH_TRAIN_TEST_POLL_TIMEOUT=180
```
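Before launching anything, a quick preflight check catches missing variables early. A minimal sketch (illustrative helper, not part of the test suite):

```python
import os

# Fail fast if any required variable above is unset (sketch only).
required = ["SYNTH_API_KEY", "BACKEND_BASE_URL", "ENVIRONMENT_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```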
### 2. Run Tests

```bash
cd /Users/joshpurtell/Documents/GitHub/synth-ai

# Run all vision RL tests
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py -v -s

# Run specific test
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py::test_cli_train_rl_vision_qwen3vl4b -v -s

# Run with marks
uv run pytest -m "vision and slow" -v -s
```

### 3. Manual RL Training (without pytest)

```bash
# 1. Deploy task app (if not already deployed)
uvx synth-ai task-app deploy grpo-crafter --name grpo-crafter-task-app

# 2. Get task app URL (from deploy output)
export TASK_APP_URL="https://your-app.modal.run"

# 3. Run RL training
uvx synth-ai train \
  --type rl \
  --config examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml \
  --backend $BACKEND_BASE_URL \
  --task-url $TASK_APP_URL
```

## Configuration Details

### Model: Qwen3-VL-4B
```toml
[model]
base = "Qwen/Qwen3-VL-4B-Instruct"
trainer_mode = "lora"
supports_vision = true  # Enable vision support
```

### Vision-Specific Settings
```toml
[vllm]
limit_mm_per_prompt = { "image": 1 }  # Max 1 image per prompt

[rollout.policy_config]
use_vision = true  # Enable vision input
image_only_mode = true  # Use only images, no text observations
temperature = 0.6
max_tokens = 512

[training]
batch_size = 2  # Smaller for vision models (memory)
max_images_per_message = 1
supports_vision = true
```
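With `use_vision = true` and `image_only_mode = true`, each policy turn sends the current frame as an OpenAI-style multimodal message with no text observation, and `limit_mm_per_prompt` caps it at one image. A sketch of the shape (illustrative only; the task app constructs the real payload):

```python
# Hypothetical single-turn payload under image_only_mode (sketch only).
user_message = {
    "role": "user",
    "content": [
        {
            "type": "image_url",
            "image_url": {"url": "data:image/png;base64,<frame-bytes>"},
        }
    ],
}
```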
### GPU Allocation (2x H200)
```toml
[topology]
gpus_for_vllm = 1  # Inference
gpus_for_training = 1  # Training
tensor_parallel = 1
```

## Test Details

### Test 1: Full RL Training
**Function:** `test_cli_train_rl_vision_qwen3vl4b`

**What it tests:**
1. Task app deployment
2. Task app warmup (health check)
3. RL job submission with vision config
4. Job creation confirmation

**Expected output:**
```
✅ Vision RL job created: job-abc123
   Model: Qwen3-VL-4B
   Task App: https://your-app.modal.run
   Image Mode: image_only
```

**Runtime:** ~5-10 minutes (deploy + warmup + job submit)

### Test 2: Task App Vision Support
**Function:** `test_task_app_vision_support`

**What it tests:**
1. Task app can be deployed
2. Task app health endpoint responds
3. Task app accepts vision policy config
4. A rollout request with `use_vision=true` and `image_only_mode=true` succeeds

**Expected output:**
```
✅ Task app supports vision config
   Response keys: ['trajectory', 'metadata', ...]
```

**Runtime:** ~2-3 minutes (deploy + warmup + single rollout)

## Task App Details

The Crafter task app (`grpo-crafter-task-app`) provides:

### Environment
- **Crafter game** with visual observations
- Generates RGB images (64x64 or configurable)
- Text observations also available (but ignored in `image_only_mode`)

### Policy (crafter-react)
- **Vision Detection:** Auto-detects vision models from the name (e.g., "Qwen3-VL", "gpt-4o-mini"); see the sketch below
- **Image Formatting:** Converts observations to OpenAI-style multimodal messages
- **Tool Calling:** Supports a structured action space via tools
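A sketch of that kind of name-based detection (illustrative; the actual rule lives in the task app's policy implementation):

```python
def looks_like_vision_model(model_name: str) -> bool:
    """Heuristic name check (sketch; not the task app's exact logic)."""
    name = model_name.lower()
    return any(marker in name for marker in ("-vl", "vision", "gpt-4o"))
```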
### Trace Format
- **Structured traces** with multimodal messages
- Images stored as base64 in the trace DB (see the decoding sketch below)
- Compatible with `synth-ai filter` for SFT export
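Since frames are stored as data URLs, recovering the raw image bytes from a trace needs only the standard library. A sketch assuming the `data:image/png;base64,...` format:

```python
import base64

def decode_frame(data_url: str) -> bytes:
    """Extract raw image bytes from a data URL (illustrative helper)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:image/"):
        raise ValueError("not an image data URL")
    return base64.b64decode(payload)
```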
## Integration with SFT Pipeline

This RL setup uses the **same task app** as the SFT data collection:

### SFT Data Collection
```bash
# Collect episodes with gpt-4o-mini teacher
uvx synth-ai eval --config configs/eval_gpt4o_vision_proper.toml

# Export to SFT dataset
uvx synth-ai filter --config configs/filter_vision_sft.toml
```

### RL Training
```bash
# Train student model (Qwen3-VL-4B) with RL
uvx synth-ai train \
  --type rl \
  --config configs/crafter_rl_vision_qwen3vl4b.toml
```

**Benefits:**
1. **Consistency:** Same environment, same observations
2. **Curriculum:** SFT → RL progression
3. **Debugging:** Compare SFT and RL traces in the same format

## Troubleshooting

### Task App Deployment Fails
```bash
# Check Modal auth
modal token set --token-id <id> --token-secret <secret>

# Check environment variables
echo $SYNTH_API_KEY
echo $ENVIRONMENT_API_KEY

# Try manual deploy
uvx synth-ai task-app deploy grpo-crafter --name grpo-crafter-task-app
```

### Task App Won't Warm Up
```bash
# Increase timeout
export TASK_APP_WARMUP_TIMEOUT=600  # 10 minutes

# Check task app logs in the Modal dashboard:
# https://modal.com/apps

# Try health check manually
curl https://your-app.modal.run/health
```

### RL Job Submission Fails
```bash
# Check backend connectivity
curl $BACKEND_BASE_URL/health

# Verify API key
curl -H "Authorization: Bearer $SYNTH_API_KEY" $BACKEND_BASE_URL/api/health

# Check task app URL format
echo $TASK_APP_URL  # Should be https://...modal.run
```

### Vision Model OOM (Out of Memory)
```toml
# Reduce batch size in config
[training]
batch_size = 1  # Down from 2
gradient_accumulation_steps = 4  # Up from 2

# Reduce concurrent rollouts
[rollout]
max_concurrent_rollouts = 2  # Down from 4
```

### Images Not Appearing in Training
```bash
# Verify vision support is enabled
grep -A 5 "\[model\]" configs/crafter_rl_vision_qwen3vl4b.toml
# Should show: supports_vision = true

# Check policy config
grep -A 10 "\[rollout.policy_config\]" configs/crafter_rl_vision_qwen3vl4b.toml
# Should show: use_vision = true, image_only_mode = true

# Verify vLLM config
grep -A 3 "\[vllm\]" configs/crafter_rl_vision_qwen3vl4b.toml
# Should show: limit_mm_per_prompt = { "image": 1 }
```

## Performance Expectations

### Qwen3-VL-4B (2x H200)
- **Throughput:** ~2-4 episodes/min (with TP=1)
- **Memory:** ~40-60 GB GPU (model + images + gradients)
- **Iteration Time:** ~10-15 min (with 4 episodes, 10 steps each)

### Training Time Estimates
- **3 iterations (test):** ~30-45 minutes
- **10 iterations (short run):** ~2-3 hours
- **50 iterations (full run):** ~12-20 hours

## Next Steps

### 1. Baseline Evaluation
```bash
# Evaluate untrained model
uvx synth-ai eval \
  --model Qwen/Qwen3-VL-4B-Instruct \
  --env crafter \
  --seeds 0,1,2,3,4 \
  --policy-config '{"use_vision": true, "image_only_mode": true}'
```

### 2. SFT Initialization (Optional)
```bash
# Train on teacher demonstrations first
uvx synth-ai train \
  --type sft \
  --model Qwen/Qwen3-VL-4B-Instruct \
  --data traces/gpt4o_vision/sft/train.jsonl
```

### 3. RL Fine-Tuning
```bash
# Run full RL training
uvx synth-ai train \
  --type rl \
  --config configs/crafter_rl_vision_qwen3vl4b.toml \
  --iterations 50
```

### 4. Eval Comparison
```bash
# Compare pre-trained vs post-RL
uvx synth-ai eval --model <rl-checkpoint> --seeds 0-9
```

## References

- **VLM SFT Pipeline:** `examples/qwen_vl/PIPELINE_RUN_LOG.txt`
- **Image Validation:** `examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md`
- **Task App Source:** `examples/task_apps/crafter/task_app/`
- **Policy Implementation:** `examples/task_apps/crafter/task_app/synth_envs_hosted/policy.py`

## CI Integration

### Pytest Marks
```python
@pytest.mark.slow         # Takes >5 minutes
@pytest.mark.vision       # Requires vision model support
@pytest.mark.integration  # Full pipeline test
```
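Combined, a vision RL test is typically declared with the full decorator stack (a sketch; the body is elided, see "Test 1: Full RL Training" above for the actual steps):

```python
import pytest

@pytest.mark.slow
@pytest.mark.vision
@pytest.mark.integration
def test_cli_train_rl_vision_qwen3vl4b():
    # Deploy the task app, wait for warmup, submit the RL job,
    # then assert the job was created.
    ...
```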
### Run in CI
```bash
# Run all integration tests including vision
pytest tests/integration/cli/ -m integration -v

# Run only vision tests
pytest -m vision -v

# Skip slow tests for PR checks
pytest -m "not slow" -v
```

---

**Status:** ✅ Integration tests ready. Task app and RL config validated for Qwen3-VL-4B with image-only observations.
examples/qwen_vl/SDK_VISION_INTEGRATION.md
@@ -0,0 +1,328 @@
# SDK Vision Support Integration

**Status**: ✅ Complete

## Overview

Added comprehensive vision/multimodal support to the synth-ai SDK's SFT data module and integrated it with the monorepo backend for consistent multimodal data handling across both codebases.

## Changes Made

### 1. **SDK Enhancement** (`synth-ai/synth_ai/learning/sft/data.py`)

Added vision-specific utilities to the SDK:

#### New Functions

1. **`has_image_content(content: SFTMessageContent) -> bool`**
   - Detects if message content contains images
   - Supports the OpenAI multimodal format
   - Handles both `{"type": "image_url"}` and `{"type": "image"}` formats

2. **`message_has_image(message: SFTMessage) -> bool`**
   - Checks if an SFTMessage contains image content
   - Convenience wrapper around `has_image_content`

3. **`example_has_image(example: SFTExample) -> bool`**
   - Checks if any message in an SFTExample contains images
   - Used for filtering vision datasets

4. **`count_images_in_content(content: SFTMessageContent) -> int`**
   - Counts the number of image segments in message content
   - Useful for statistics and validation

5. **`extract_image_urls(content: SFTMessageContent) -> list[str]`**
   - Extracts all image URLs from message content
   - Supports http(s):// URLs and data:image/... base64
   - Returns a list of URL strings

6. **`validate_vision_example(example: SFTExample, *, require_images: bool = True) -> tuple[bool, str | None]`**
   - Comprehensive validation of vision SFT examples
   - Checks for image presence and URL validity
   - Returns an `(is_valid, error_message)` tuple
   - Logs warnings for suspicious URLs

7. **`iter_vision_examples(...) -> Iterator[SFTExample]`**
   - Specialized iterator for vision examples
   - Includes vision-specific validation
   - Option to require images or skip invalid examples
   - Useful for processing large JSONL files

#### Example Usage

```python
from synth_ai.learning.sft.data import (
    load_jsonl,
    example_has_image,
    validate_vision_example,
    extract_image_urls,
)

# Load and filter vision examples
examples = load_jsonl("vision_data.jsonl")
vision_examples = [ex for ex in examples if example_has_image(ex)]

# Validate each example
for ex in vision_examples:
    is_valid, error = validate_vision_example(ex)
    if not is_valid:
        print(f"Invalid: {error}")

    # Extract image URLs for inspection
    for msg in ex.messages:
        urls = extract_image_urls(msg.content)
        print(f"Images: {urls}")
```
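The per-message helpers compose the same way. Continuing the snippet above (a short sketch using the functions listed under New Functions):

```python
from synth_ai.learning.sft.data import count_images_in_content, message_has_image

# Dataset-level statistics built from the helpers above (sketch).
total_images = sum(
    count_images_in_content(msg.content)
    for ex in vision_examples
    for msg in ex.messages
)
multimodal_turns = [
    msg for ex in vision_examples for msg in ex.messages if message_has_image(msg)
]
print(f"{total_images} images across {len(multimodal_turns)} multimodal messages")
```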
### 2. **Backend Integration** (`monorepo/backend/.../training/sft/data.py`)

Updated the monorepo backend to use the SDK utilities:

#### Changes

1. **Added SDK imports with a fallback**:
   ```python
   try:
       from synth_ai.learning.sft.data import (
           has_image_content as sdk_has_image_content,
           example_has_image as sdk_example_has_image,
           validate_vision_example as sdk_validate_vision_example,
           # ... more imports
       )
       SDK_VISION_AVAILABLE = True
   except ImportError:
       SDK_VISION_AVAILABLE = False
       logger.warning("synth_ai SDK not available - vision support will be limited")
   ```

2. **Updated the `SFTDataProcessor` docstring**:
   - Documents integration with the SDK
   - Shows an OpenAI multimodal format example
   - Explains the fallback behavior

3. **Enhanced the `_vision_message_has_image()` method**:
   - Uses the SDK's `has_image_content()` when available
   - Falls back to a local implementation if the SDK is unavailable
   - Ensures consistency between SDK and backend

4. **Enhanced the `_validate_vision_examples()` method**:
   - Uses the SDK's `coerce_example()` and `validate_vision_example()` for the messages format
   - Provides comprehensive validation with detailed error messages
   - Falls back gracefully if SDK validation fails
   - Maintains backward compatibility with non-messages formats

## Supported Data Formats

### OpenAI Multimodal Format (Recommended)

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
      ]
    },
    {
      "role": "assistant",
      "content": "I see a cat sitting on a couch."
    }
  ],
  "metadata": {
    "session_id": "ep001",
    "has_image": true
  }
}
```

### Alternative Formats (Also Supported)

**Legacy image field**:
```json
{
  "messages": [...],
  "images": ["/path/to/image.jpg"],
  "metadata": {}
}
```

**Single image field**:
```json
{
  "messages": [...],
  "image": "https://example.com/image.jpg",
  "metadata": {}
}
```

## Image URL Formats

Supported image URL formats:

1. **HTTP(S) URLs**: `https://example.com/image.jpg`
2. **Data URLs (base64)**: `data:image/png;base64,iVBORw0KGgo...`
3. **Local file paths**: `/path/to/image.jpg` (for local training only; see the encoding sketch below)
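For the local-path case, one way to turn a file into a data URL is a small standard-library helper (a sketch, not part of the SDK; batch base64 helpers are listed under Future Enhancements below):

```python
import base64
from pathlib import Path

def to_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data URL (illustrative helper)."""
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```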
## Validation Rules

The SDK validates the following (a filtering sketch follows the list):

1. **Image presence**: At least one message must contain an image (when `require_images=True`)
2. **URL format**: All image URLs must be non-empty strings
3. **URL scheme**: URLs should start with `http://`, `https://`, or `data:image/`
   - Warnings are logged for non-standard formats
4. **Message structure**: Messages must follow the OpenAI format
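Applied over a whole dataset, these rules support a simple keep/reject pass (a sketch using the `(is_valid, error_message)` tuple described above):

```python
from synth_ai.learning.sft.data import validate_vision_example

# Partition examples by the validation rules above (sketch).
kept, rejected = [], []
for ex in vision_examples:
    ok, err = validate_vision_example(ex)
    if ok:
        kept.append(ex)
    else:
        rejected.append((ex, err))
print(f"kept {len(kept)}, rejected {len(rejected)}")
```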
## Benefits

### 1. **Consistency**
- Single source of truth for vision data validation
- Both SDK and backend use the same logic
- Reduces bugs and maintenance burden

### 2. **Type Safety**
- Strong typing with dataclasses
- Clear SFTMessage and SFTExample structures
- IDE autocomplete and type checking

### 3. **Error Handling**
- Comprehensive validation with detailed error messages
- Graceful fallbacks if the SDK is unavailable
- Helpful warnings for edge cases

### 4. **OpenAI Compatibility**
- Matches OpenAI's fine-tuning format exactly
- Data can be used with OpenAI or local models
- Easy migration between platforms

### 5. **Tool Call Support**
- The SDK already handles tool calls and tool definitions
- Ready for complex agentic workflows
- Supports reasoning blocks (`<think>` tags) if needed

## Testing

### Quick SDK Test

```python
# Test in the synth-ai repo
from synth_ai.learning.sft.data import has_image_content, validate_vision_example, coerce_example

# Test multimodal message detection
content = [
    {"type": "text", "text": "What's this?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,abc123"}}
]
assert has_image_content(content)

# Test validation
example_data = {
    "messages": [
        {"role": "user", "content": content},
        {"role": "assistant", "content": "A test image"}
    ]
}
example = coerce_example(example_data)
is_valid, error = validate_vision_example(example)
assert is_valid, error
print("✓ SDK vision utilities working correctly!")
```

### Integration Test

```python
# Test in the monorepo backend
from backend.app.routes.simple_training.training.sft.data import SFTDataProcessor

processor = SFTDataProcessor()
test_data = [{
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]},
        {"role": "assistant", "content": "Description"}
    ]
}]

validated = processor._validate_vision_examples(test_data)
assert len(validated) == 1
print("✓ Backend SDK integration working!")
```

## Future Enhancements

### Potential Additions

1. **Image preprocessing utilities**
   - Resize images to model requirements
   - Validate image dimensions
   - Convert between formats (JPEG ↔ PNG)

2. **Base64 encoding helpers**
   - Convert file paths to data URLs
   - Batch encode images for JSONL
   - Memory-efficient streaming

3. **Statistics and analytics**
   - Count images per example
   - Measure average image sizes
   - Detect corrupted or invalid images

4. **Dataset transformation**
   - Convert between formats
   - Augment with additional images
   - Filter by image properties

## Migration Guide

### For Existing Backend Code

If you have existing vision validation code:

```python
# Before (manual validation)
def has_images(messages):
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            for part in content:
                if part.get("type") == "image_url":
                    return True
    return False

# After (use the SDK)
from synth_ai.learning.sft.data import has_image_content

def has_images(messages):
    return any(has_image_content(msg.get("content")) for msg in messages)
```

### For Existing SDK Code

No changes needed! The SDK already handles OpenAI message formats correctly. The vision utilities are additive and don't break existing functionality.

## Documentation

- **SDK docs**: See the `synth_ai/learning/sft/data.py` docstrings
- **Backend docs**: See the `backend/app/routes/simple_training/training/sft/data.py` class docstring
- **Examples**: See `synth-ai/examples/qwen_vl/` for vision-specific examples

## Related Files

- SDK: `synth-ai/synth_ai/learning/sft/data.py`
- Backend: `monorepo/backend/app/routes/simple_training/training/sft/data.py`
- Examples: `synth-ai/examples/qwen_vl/`
- Pipeline guide: `synth-ai/examples/qwen_vl/NEXT_STEPS_2B.md`

---

✅ **SDK vision support is now production-ready for both synth-ai and the monorepo!**