PyPI - synth-ai - Versions diffs - 0.2.14__py3-none-any.whl → 0.2.16__py3-none-any.whl - Mend

synth-ai 0.2.14py3-none-any.whl → 0.2.16py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of synth-ai might be problematic. Click here for more details.

Files changed (236) hide show

examples/README.md +1 -0
examples/multi_step/SFT_README.md +147 -0
examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +9 -9
examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
examples/multi_step/convert_traces_to_sft.py +84 -0
examples/multi_step/run_sft_qwen30b.sh +45 -0
examples/qwen_coder/configs/coder_lora_30b.toml +2 -1
examples/qwen_coder/configs/coder_lora_4b.toml +2 -1
examples/qwen_coder/configs/coder_lora_small.toml +2 -1
examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
examples/qwen_vl/QUICKSTART.md +327 -0
examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
examples/qwen_vl/README.md +154 -0
examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
examples/qwen_vl/RL_VISION_TESTING.md +333 -0
examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
examples/qwen_vl/SETUP_COMPLETE.md +275 -0
examples/qwen_vl/VISION_TESTS_COMPLETE.md +490 -0
examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
examples/qwen_vl/__init__.py +2 -0
examples/qwen_vl/collect_data_via_cli.md +423 -0
examples/qwen_vl/collect_vision_traces.py +368 -0
examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +127 -0
examples/qwen_vl/configs/crafter_vlm_sft_example.toml +60 -0
examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +43 -0
examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
examples/qwen_vl/configs/eval_gpt5nano_vision.toml +45 -0
examples/qwen_vl/configs/eval_qwen2vl_vision.toml +44 -0
examples/qwen_vl/configs/filter_qwen2vl_sft.toml +50 -0
examples/qwen_vl/configs/filter_vision_sft.toml +53 -0
examples/qwen_vl/configs/filter_vision_test.toml +8 -0
examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
examples/qwen_vl/run_vision_comparison.sh +62 -0
examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
examples/qwen_vl/test_image_validation.py +201 -0
examples/qwen_vl/test_sft_vision_data.py +110 -0
examples/rl/README.md +1 -1
examples/rl/configs/eval_base_qwen.toml +17 -0
examples/rl/configs/eval_rl_qwen.toml +13 -0
examples/rl/configs/rl_from_base_qwen.toml +37 -0
examples/rl/configs/rl_from_base_qwen17.toml +76 -0
examples/rl/configs/rl_from_ft_qwen.toml +37 -0
examples/rl/run_eval.py +436 -0
examples/rl/run_rl_and_save.py +111 -0
examples/rl/task_app/README.md +22 -0
examples/rl/task_app/math_single_step.py +990 -0
examples/rl/task_app/math_task_app.py +111 -0
examples/sft/README.md +5 -5
examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -2
examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -3
examples/sft/evaluate.py +2 -4
examples/sft/export_dataset.py +7 -4
examples/swe/task_app/README.md +1 -1
examples/swe/task_app/grpo_swe_mini.py +0 -1
examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
examples/swe/task_app/hosted/envs/mini_swe/environment.py +13 -13
examples/swe/task_app/hosted/policy_routes.py +0 -2
examples/swe/task_app/hosted/rollout.py +0 -8
examples/task_apps/crafter/task_app/grpo_crafter.py +4 -7
examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +59 -1
examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +30 -0
examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +62 -31
examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +16 -14
examples/task_apps/enron/__init__.py +1 -0
examples/vlm/README.md +3 -3
examples/vlm/configs/crafter_vlm_gpt4o.toml +2 -0
examples/vlm/crafter_openai_vlm_agent.py +3 -5
examples/vlm/filter_image_rows.py +1 -1
examples/vlm/run_crafter_vlm_benchmark.py +2 -2
examples/warming_up_to_rl/_utils.py +92 -0
examples/warming_up_to_rl/analyze_trace_db.py +1 -1
examples/warming_up_to_rl/configs/crafter_fft.toml +2 -0
examples/warming_up_to_rl/configs/crafter_fft_4b.toml +2 -0
examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
examples/warming_up_to_rl/export_trace_sft.py +174 -60
examples/warming_up_to_rl/readme.md +63 -132
examples/warming_up_to_rl/run_fft_and_save.py +1 -1
examples/warming_up_to_rl/run_rl_and_save.py +1 -1
examples/warming_up_to_rl/task_app/README.md +42 -0
examples/warming_up_to_rl/task_app/grpo_crafter.py +696 -0
examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +478 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1081 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
synth_ai/__init__.py +44 -30
synth_ai/_utils/__init__.py +47 -0
synth_ai/_utils/base_url.py +10 -0
synth_ai/_utils/http.py +10 -0
synth_ai/_utils/prompts.py +10 -0
synth_ai/_utils/task_app_state.py +12 -0
synth_ai/_utils/user_config.py +10 -0
synth_ai/api/models/supported.py +144 -7
synth_ai/api/train/__init__.py +13 -1
synth_ai/api/train/cli.py +30 -7
synth_ai/api/train/config_finder.py +18 -11
synth_ai/api/train/env_resolver.py +13 -10
synth_ai/cli/__init__.py +62 -78
synth_ai/cli/_modal_wrapper.py +7 -5
synth_ai/cli/_typer_patch.py +0 -2
synth_ai/cli/_validate_task_app.py +22 -4
synth_ai/cli/legacy_root_backup.py +3 -1
synth_ai/cli/lib/__init__.py +10 -0
synth_ai/cli/lib/task_app_discovery.py +7 -0
synth_ai/cli/lib/task_app_env.py +518 -0
synth_ai/cli/recent.py +2 -1
synth_ai/cli/setup.py +266 -0
synth_ai/cli/status.py +1 -1
synth_ai/cli/task_app_deploy.py +16 -0
synth_ai/cli/task_app_list.py +25 -0
synth_ai/cli/task_app_modal_serve.py +16 -0
synth_ai/cli/task_app_serve.py +18 -0
synth_ai/cli/task_apps.py +71 -31
synth_ai/cli/traces.py +1 -1
synth_ai/cli/train.py +18 -0
synth_ai/cli/tui.py +7 -2
synth_ai/cli/turso.py +1 -1
synth_ai/cli/watch.py +1 -1
synth_ai/demos/__init__.py +10 -0
synth_ai/demos/core/__init__.py +28 -1
synth_ai/demos/crafter/__init__.py +1 -0
synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
synth_ai/demos/demo_registry.py +176 -0
synth_ai/demos/math/__init__.py +1 -0
synth_ai/demos/math/_common.py +16 -0
synth_ai/demos/math/app.py +38 -0
synth_ai/demos/math/config.toml +76 -0
synth_ai/demos/math/deploy_modal.py +54 -0
synth_ai/demos/math/modal_task_app.py +702 -0
synth_ai/demos/math/task_app_entry.py +51 -0
synth_ai/environments/environment/core.py +7 -1
synth_ai/environments/examples/bandit/engine.py +0 -1
synth_ai/environments/examples/bandit/environment.py +0 -1
synth_ai/environments/examples/wordle/environment.py +0 -1
synth_ai/evals/base.py +16 -5
synth_ai/evals/client.py +1 -1
synth_ai/inference/client.py +1 -1
synth_ai/judge_schemas.py +8 -8
synth_ai/learning/client.py +1 -1
synth_ai/learning/health.py +1 -1
synth_ai/learning/jobs.py +1 -1
synth_ai/learning/rl/client.py +1 -1
synth_ai/learning/rl/env_keys.py +1 -1
synth_ai/learning/rl/secrets.py +1 -1
synth_ai/learning/sft/client.py +1 -1
synth_ai/learning/sft/data.py +407 -4
synth_ai/learning/validators.py +4 -1
synth_ai/task/apps/__init__.py +4 -2
synth_ai/task/config.py +6 -4
synth_ai/task/rubrics/__init__.py +1 -2
synth_ai/task/rubrics/loaders.py +14 -10
synth_ai/task/rubrics.py +219 -0
synth_ai/task/trace_correlation_helpers.py +24 -11
synth_ai/task/tracing_utils.py +14 -3
synth_ai/task/validators.py +2 -3
synth_ai/tracing_v3/abstractions.py +3 -3
synth_ai/tracing_v3/config.py +15 -13
synth_ai/tracing_v3/constants.py +21 -0
synth_ai/tracing_v3/db_config.py +3 -1
synth_ai/tracing_v3/decorators.py +10 -7
synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
synth_ai/tracing_v3/session_tracer.py +7 -7
synth_ai/tracing_v3/storage/base.py +29 -29
synth_ai/tracing_v3/storage/config.py +3 -3
synth_ai/tracing_v3/turso/daemon.py +8 -9
synth_ai/tracing_v3/turso/native_manager.py +80 -72
synth_ai/tracing_v3/utils.py +2 -2
synth_ai/tui/cli/query_experiments.py +4 -4
synth_ai/tui/cli/query_experiments_v3.py +4 -4
synth_ai/tui/dashboard.py +14 -9
synth_ai/utils/__init__.py +101 -0
synth_ai/utils/base_url.py +94 -0
synth_ai/utils/cli.py +131 -0
synth_ai/utils/env.py +287 -0
synth_ai/utils/http.py +169 -0
synth_ai/utils/modal.py +308 -0
synth_ai/utils/process.py +212 -0
synth_ai/utils/prompts.py +39 -0
synth_ai/utils/sqld.py +122 -0
synth_ai/utils/task_app_discovery.py +882 -0
synth_ai/utils/task_app_env.py +186 -0
synth_ai/utils/task_app_state.py +318 -0
synth_ai/utils/user_config.py +137 -0
synth_ai/v0/config/__init__.py +1 -5
synth_ai/v0/config/base_url.py +1 -7
synth_ai/v0/tracing/config.py +1 -1
synth_ai/v0/tracing/decorators.py +1 -1
synth_ai/v0/tracing/upload.py +1 -1
synth_ai/v0/tracing_v1/config.py +1 -1
synth_ai/v0/tracing_v1/decorators.py +1 -1
synth_ai/v0/tracing_v1/upload.py +1 -1
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/METADATA +85 -31
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/RECORD +229 -117
synth_ai/cli/man.py +0 -106
synth_ai/compound/cais.py +0 -0
synth_ai/core/experiment.py +0 -13
synth_ai/core/system.py +0 -15
synth_ai/demo_registry.py +0 -295
synth_ai/handshake.py +0 -109
synth_ai/http.py +0 -26
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/WHEEL +0 -0
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/entry_points.txt +0 -0
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/licenses/LICENSE +0 -0
{synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/top_level.txt +0 -0

examples/qwen_vl/NEXT_STEPS_2B.md ADDED Viewed

@@ -0,0 +1,325 @@
+# Next Steps: Qwen3-VL-2B SFT & RL Training
+**Status:** Data collection complete ✅ | Ready for SFT training 🚀
+---
+## 📋 Current Status
+### ✅ Completed
+1. **VLM Data Collection Pipeline** - WORKING END-TO-END
+   - Fixed task app tracing to return full session traces
+   - Fixed CLI to handle multimodal content preservation
+   - Successfully collected traces with base64 PNG images
+   - Database: `traces/gpt4o_vision_test/rollouts.db`
+   - Exported: `traces/gpt4o_vision_test/sft/train.jsonl` (50 examples validated)
+2. **Infrastructure Validated**
+   - ✅ `synth-ai eval` stores traces with images
+   - ✅ `synth-ai filter` exports SFT JSONL with preserved images
+   - ✅ Multimodal messages follow OpenAI format
+   - ✅ Images embedded as base64 PNG (~1306 chars per 64x64 image)
+3. **Documentation**
+   - `VLM_PIPELINE_COMPLETE.md` - Full pipeline guide
+   - `PIPELINE_RUN_LOG.txt` - Execution log with all fixes
+   - `BUGS_AND_FIXES.md` - Detailed bug reports
+   - `SETUP_COMPLETE.md` - Summary of setup
+---
+## 🎯 Next Steps: Train Qwen3-VL-2B
+### Step 1: Scale Up Data Collection (Optional)
+We have 50 working examples. For production training, collect more:
+```bash
+cd /Users/joshpurtell/Documents/GitHub/synth-ai
+# Collect 100 episodes (will create ~5000 samples)
+export TASKAPP_TRACING_ENABLED=1
+uvx synth-ai eval \
+  --config examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml \
+  --seeds 0-99 \
+  --trace-db traces/gpt4o_vision_100/rollouts.db \
+  --env-file /path/to/.env
+# Filter and export
+uvx synth-ai filter \
+  --config examples/qwen_vl/configs/filter_vision_test.toml
+```
+**Output:** ~4500 SFT examples with images
+---
+### Step 2: Create SFT Config for Qwen3-VL-2B
+File: `/Users/joshpurtell/Documents/GitHub/monorepo/configs/vision_sft/crafter_qwen3vl_2b_gpt4o.toml`
+```toml
+# Crafter Vision SFT: Qwen3-VL-2B trained on gpt-4o-mini traces
+# Using 2B model for faster iteration and lower GPU requirements
+[algorithm]
+type = "offline"
+method = "sft"
+variety = "lora"
+[job]
+model = "Qwen/Qwen3-VL-2B-Instruct"
+data = "traces/gpt4o_vision_100/sft/train.jsonl"
+[compute]
+gpu_type = "H200"
+gpu_count = 2
+nodes = 1
+[training]
+mode = "lora"
+use_qlora = true
+[hyperparameters]
+n_epochs = 3
+per_device_batch = 1
+gradient_accumulation_steps = 16
+sequence_length = 2048
+learning_rate = 5e-05
+warmup_ratio = 0.03
+train_kind = "peft"
+# LoRA config
+lora_rank = 16
+lora_alpha = 32
+lora_dropout = 0.05
+lora_target_modules = ["all-linear"]
+# Training optimizations
+[hyperparameters.parallelism]
+use_deepspeed = true
+deepspeed_stage = 2
+bf16 = true
+activation_checkpointing = true
+# Evaluation
+evaluation_strategy = "steps"
+eval_steps = 50
+save_best_model_at_end = true
+metric_for_best_model = "val.loss"
+[tags]
+task = "crafter"
+modality = "vision"
+data_source = "openai_gpt4o_mini"
+model_family = "qwen3_vl"
+model_size = "2b"
+```
+---
+### Step 3: Run SFT Training
+```bash
+cd /Users/joshpurtell/Documents/GitHub/monorepo
+# Copy data to monorepo (if not already there)
+cp -r /Users/joshpurtell/Documents/GitHub/synth-ai/traces/gpt4o_vision_100/sft/ \
+   backend/data/vision_sft/
+# Submit SFT job
+export BACKEND_BASE_URL="https://synth-backend-dev-docker.onrender.com/api"
+uvx synth-ai train \
+  --type sft \
+  --config configs/vision_sft/crafter_qwen3vl_2b_gpt4o.toml \
+  --env-file backend/.env.dev
+```
+**Expected:**
+- Training time: 1-2 hours
+- Cost: ~$10.50 (2x H200)
+- Output: LoRA adapter at `lora_adapters/qwen3vl_2b_crafter_gpt4o/`
+---
+### Step 4: Create RL Config for Qwen3-VL-2B
+File: `/Users/joshpurtell/Documents/GitHub/synth-ai/examples/qwen_vl/configs/crafter_rl_qwen3vl_2b.toml`
+```toml
+# Crafter Vision RL: Qwen3-VL-2B with GRPO
+# Uses SFT-initialized model for RL fine-tuning
+[algorithm]
+type = "online"
+method = "grpo"
+variety = "default"
+[model]
+base = "Qwen/Qwen3-VL-2B-Instruct"
+adapter = "lora_adapters/qwen3vl_2b_crafter_gpt4o"  # From SFT step
+[job]
+rollout_count = 50
+n_iterations = 20
+max_steps_per_rollout = 50
+[compute]
+gpu_type = "H200"
+gpu_count = 4
+nodes = 1
+[topology]
+type = "single_node_split"
+gpus_for_vllm = 2
+gpus_for_training = 2
+gpus_for_ref = 0
+[vllm]
+tensor_parallel_size = 1  # 2B fits on 1 GPU
+enable_prefix_caching = false
+use_cudagraph = true
+gpu_memory_utilization = 0.85
+max_model_len = 2048
+[training]
+mode = "lora"
+use_qlora = true
+[hyperparameters]
+per_device_batch = 2
+gradient_accumulation_steps = 8
+sequence_length = 2048
+learning_rate = 2e-06
+warmup_ratio = 0.1
+train_kind = "peft"
+lora_rank = 16
+lora_alpha = 32
+lora_dropout = 0.05
+lora_target_modules = ["all-linear"]
+[grpo]
+kl_coeff = 0.1
+clip_range = 0.2
+value_clip_range = 0.2
+normalize_rewards = true
+[judge]
+type = "remote"
+provider = "openai"
+model = "gpt-4o-mini"
+[tags]
+task = "crafter"
+modality = "vision"
+algorithm = "grpo"
+model_family = "qwen3_vl"
+model_size = "2b"
+```
+---
+### Step 5: Run RL Training
+```bash
+cd /Users/joshpurtell/Documents/GitHub/synth-ai
+# Submit RL job
+uvx synth-ai train \
+  --type rl \
+  --config examples/qwen_vl/configs/crafter_rl_qwen3vl_2b.toml \
+  --env-file /path/to/.env
+```
+**Expected:**
+- Training time: 4-6 hours
+- Cost: ~$70 (4x H200)
+- Output: RL-tuned adapter at `lora_adapters/qwen3vl_2b_crafter_rl_iter20/`
+---
+### Step 6: Evaluate Results
+```bash
+# Run benchmark
+python examples/qwen_vl/benchmark_vision_agents.py
+```
+**Expected Performance:**
+- Base Qwen3-VL-2B: ~6.5% achievement rate
+- After SFT: ~20% achievement rate (+13.5%)
+- After RL: ~38% achievement rate (+18% more)
+- Teacher (gpt-4o-mini): ~45% achievement rate
+---
+## 💰 Cost & Timeline Summary
+### Qwen3-VL-2B Pipeline
+| Step | Description | Cost | Time |
+|------|-------------|------|------|
+| 1 | Data collection (100 episodes) | ~$1-2 | 30-60 min |
+| 2 | Dataset assembly | $0 | < 5 min |
+| 3 | Vision SFT (3 epochs) | ~$10.50 | 1-2 hrs |
+| 4 | Vision RL (20 iterations) | ~$70 | 4-6 hrs |
+| 5 | Evaluation | ~$5 | 2-3 hrs |
+**Total:** ~$87, 8-12 hours
+### Cost Comparison: 2B vs 8B
+| Model | SFT Cost | RL Cost | Total | Training Time |
+|-------|----------|---------|-------|---------------|
+| 2B    | $10.50   | $70     | $87   | 8-12 hrs      |
+| 8B    | $21      | $112    | $140  | 12-18 hrs     |
+**Savings with 2B:** 40% cost reduction, 30% faster
+---
+## 🎯 Key Advantages of 2B Model
+1. **Faster Iteration**
+   - SFT: 1-2 hours vs 2-4 hours for 8B
+   - RL: 4-6 hours vs 6-10 hours for 8B
+   - Enables rapid experimentation
+2. **Lower GPU Requirements**
+   - Fits on 1 GPU for inference (use 2 for safety)
+   - Can use batch_size=2 vs 1 for 8B
+   - More efficient GPU utilization
+3. **Cost Effective**
+   - ~$87 total vs $140 for 8B
+   - Better for initial prototyping
+   - Scale to 8B later if needed
+4. **Competitive Performance**
+   - ~38% achievement rate after RL
+   - vs ~42% for 8B (only 4% difference)
+   - Good enough for validation and testing
+---
+## 📝 Notes
+- All configs use LoRA for memory efficiency
+- Vision models require batch_size=1-2 (images are memory-intensive)
+- Use DeepSpeed Stage 2 for training optimization
+- Disable prefix caching (unstable with LoRA + vision)
+- 2B model is perfect for initial testing and rapid iteration
+---
+## 🚀 Ready to Start!
+The infrastructure is ready. Just need to:
+1. Create the two TOML configs above
+2. Run SFT training
+3. Run RL training
+4. Evaluate and compare
+All the hard work (data collection, tracing fixes, filtering) is done! 🎉

examples/qwen_vl/QUICKSTART.md ADDED Viewed

@@ -0,0 +1,327 @@
+# Qwen VL Quickstart Guide
+Complete guide to running vision-language models on Crafter with image observations.
+## 🚀 Quick Demo
+### Option 1: Run gpt-5-nano (OpenAI)
+```bash
+export OPENAI_API_KEY="sk-..."
+uv run python examples/qwen_vl/crafter_gpt5nano_agent.py --seeds 5 --steps 10
+```
+### Option 2: Run Qwen-VL (synth-ai)
+```bash
+export SYNTH_API_KEY="sk_live_..."
+uv run python examples/qwen_vl/crafter_qwen_vl_agent.py \
+  --model Qwen/Qwen2-VL-7B-Instruct --seeds 5 --steps 10
+```
+### Option 3: Compare Both
+```bash
+export OPENAI_API_KEY="sk-..."
+export SYNTH_API_KEY="sk_live_..."
+bash examples/qwen_vl/run_vision_comparison.sh
+```
+---
+## 📊 Expected Output
+```
+Running 10 Crafter episodes with model=gpt-5-nano
+Using OpenAI API
+Seed 00: steps=10, achievements=2, tool_calls=10, reward≈1.250
+Seed 01: steps=10, achievements=1, tool_calls=10, reward≈0.750
+Seed 02: steps=10, achievements=3, tool_calls=10, reward≈1.500
+...
+Summary
+-------
+{
+  "model": "gpt-5-nano",
+  "provider": "openai",
+  "episodes": 10,
+  "mean_steps": 9.8,
+  "mean_achievements": 2.1,
+  "total_tool_calls": 98,
+  "output_dir": "examples/qwen_vl/temp/gpt5nano_frames"
+}
+Frames saved in: examples/qwen_vl/temp/gpt5nano_frames/
+```
+Each episode saves PNG frames (64x64) showing what the VLM saw:
+```
+examples/qwen_vl/temp/gpt5nano_frames/
+  seed_0000/
+    step_000.png
+    step_001.png
+    step_002.png
+    ...
+  seed_0001/
+    ...
+```
+---
+## 🎯 Full Pipeline: Data Collection → SFT → RL
+### Step 1: Collect Vision Traces
+Collect 100 episodes with gpt-5-nano (for teacher distillation):
+```bash
+export OPENAI_API_KEY="sk-..."
+uv run python examples/qwen_vl/collect_vision_traces.py \
+  --model gpt-5-nano \
+  --provider openai \
+  --episodes 100 \
+  --max-steps 50 \
+  --output-dir traces/gpt5nano_vision
+```
+**Output:**
+- SQLite DB: `traces/gpt5nano_vision/rollouts.db`
+- Contains multimodal traces with images
+- ~5000 samples (100 episodes × ~50 steps)
+**Timeline:** 30-60 minutes
+**Cost:** ~$1-2 (OpenAI gpt-5-nano)
+---
+### Step 2: Export to SFT JSONL
+Convert SQLite traces to SFT training format:
+```bash
+uv run python examples/qwen_vl/export_traces_to_sft.py \
+  --db-path traces/gpt5nano_vision/rollouts.db \
+  --output traces/gpt5nano_vision/sft_dataset.jsonl \
+  --min-steps 5
+```
+**Output:**
+- JSONL file with OpenAI-format messages
+- Each line: `{"messages": [...], "metadata": {...}}`
+- Messages include base64-encoded images
+---
+### Step 3: Split Train/Val
+```bash
+uv run python examples/qwen_vl/split_sft_data.py \
+  --input traces/gpt5nano_vision/sft_dataset.jsonl \
+  --train-output traces/gpt5nano_vision/train.jsonl \
+  --val-output traces/gpt5nano_vision/val.jsonl \
+  --val-fraction 0.1
+```
+**Output:**
+- `train.jsonl`: ~4400 samples
+- `val.jsonl`: ~500 samples
+---
+### Step 4: Train Vision SFT
+Use the example config or create your own:
+```bash
+cd /path/to/monorepo
+export BACKEND_BASE_URL="https://synth-backend-dev-docker.onrender.com/api"
+uvx synth-ai train \
+  --type sft \
+  --config examples/qwen_vl/configs/crafter_vlm_sft_example.toml \
+  --env-file backend/.env.dev
+```
+**Hardware:** 2x H200 (or 4x H100)
+**Time:** 2-4 hours (2 epochs)
+**Cost:** ~$21 (Modal GPU pricing)
+**Output:**
+- LoRA adapter saved to HF Hub or S3
+- Wandb logs with training curves
+---
+### Step 5: Run Vision RL (Optional)
+After SFT, fine-tune with GRPO for better performance:
+```toml
+# example RL config
+[algorithm]
+type = "online"
+method = "grpo"
+[model]
+base = "Qwen/Qwen2-VL-7B-Instruct"
+adapter = "s3://my-bucket/qwen2vl_crafter_sft"  # From SFT
+[compute]
+gpu_count = 4  # 2 inference + 2 training
+```
+**Time:** 6-10 hours (20 iterations)
+**Cost:** ~$112
+---
+## 📁 File Structure
+```
+synth-ai/examples/qwen_vl/
+├── README.md                       # Overview
+├── QUICKSTART.md                   # This file
+├── __init__.py
+│
+├── crafter_gpt5nano_agent.py       # OpenAI gpt-5-nano demo
+├── crafter_qwen_vl_agent.py        # Qwen-VL (synth-ai) demo
+├── collect_vision_traces.py        # Trace collection for SFT
+├── run_vision_comparison.sh        # Compare both models
+│
+├── configs/
+│   └── crafter_vlm_sft_example.toml  # Example SFT config
+│
+└── temp/                           # Output frames and summaries
+    ├── gpt5nano_frames/
+    ├── qwen_vl_frames/
+    └── comparison/
+```
+---
+## 🔍 How Vision Detection Works
+CrafterPolicy automatically detects vision capability:
+```python
+# From examples/task_apps/crafter/.../policy.py
+@staticmethod
+def _is_vision_model(model_name: str) -> bool:
+    """Check if model supports vision from its name."""
+    model_lower = model_name.lower()
+    vision_patterns = [
+        "gpt-5",           # ✅ gpt-5-nano, gpt-5-turbo, etc.
+        "gpt-4o",          # ✅ gpt-4o-mini, gpt-4o
+        "qwen-vl",         # ✅ Qwen-VL-Chat
+        "qwen2-vl",        # ✅ Qwen2-VL-7B-Instruct
+        "qwen3-vl",        # ✅ Qwen3-VL-8B
+        # ... more patterns
+    ]
+    return any(pattern in model_lower for pattern in vision_patterns)
+```
+If detected:
+- Policy includes base64 image in user message
+- Images are 64x64 PNG frames from Crafter
+- Format: `{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}`
+---
+## 🎛️ Advanced Configuration
+### Custom Image Resolution
+Edit Crafter task instance config:
+```python
+instance.config = {
+    "seed": seed,
+    "length": 256,
+    "area": [128, 128],  # Higher resolution (default: 64x64)
+}
+```
+**Note:** Higher resolution = more tokens = higher cost
+### Image-Only Mode
+Disable text observations, use only images:
+```python
+await policy.initialize({
+    "use_tools": True,
+    "model": model,
+    "image_only_mode": True,  # No text, only images
+})
+```
+### Multiple Images per Step
+For temporal context (not yet implemented):
+```python
+# Future: Include last N frames
+image_parts = [
+    {"type": "image_url", "image_url": {"url": frame_t}},
+    {"type": "image_url", "image_url": {"url": frame_t_minus_1}},
+    {"type": "image_url", "image_url": {"url": frame_t_minus_2}},
+]
+```
+---
+## 🐛 Troubleshooting
+### Error: `OPENAI_API_KEY not set`
+```bash
+export OPENAI_API_KEY="sk-..."
+```
+### Error: `SYNTH_API_KEY not set`
+```bash
+export SYNTH_API_KEY="sk_live_..."
+```
+### Error: `TracingStore not available`
+Traces require synth-ai tracing module:
+```bash
+uv sync  # Ensure all dependencies are installed
+```
+### Vision not detected
+Manually enable:
+```python
+await policy.initialize({"use_vision": True})
+```
+---
+## 📚 Related Documentation
+- **SFT Pipeline:** See `/Users/joshpurtell/Documents/GitHub/monorepo/vision_sft_rl.txt` (Phase 9)
+- **Crafter Environment:** `examples/task_apps/crafter/README.md`
+- **OpenAI VLM Examples:** `examples/vlm/crafter_openai_vlm_agent.py`
+- **Image-Only Eval:** `examples/task_apps/IMAGE_ONLY_EVAL_QUICKSTART.md`
+---
+## 🎉 Next Steps
+1. ✅ Run demos to verify vision inference works
+2. 🎯 Collect training traces (100-1000 episodes)
+3. 📦 Export and split into train/val
+4. 🚀 Train VLM with LoRA (see `crafter_vlm_sft_example.toml`)
+5. 🏆 Fine-tune with RL/GRPO for better achievement rates
+6. 📊 Benchmark: base model vs SFT vs SFT+RL
+**Expected Performance:**
+- Base Qwen-VL: ~5-10% achievement rate
+- After SFT (gpt-5-nano distillation): ~20-30%
+- After RL (20 iterations): ~40-50%
+---
+Happy vision-language model training! 🚀✨

synth-ai 0.2.14__py3-none-any.whl → 0.2.16__py3-none-any.whl

Potentially problematic release.

synth-ai 0.2.14py3-none-any.whl → 0.2.16py3-none-any.whl