synth-ai 0.2.14__py3-none-any.whl → 0.2.16__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of synth-ai might be problematic.
- examples/README.md +1 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +9 -9
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +2 -1
- examples/qwen_coder/configs/coder_lora_4b.toml +2 -1
- examples/qwen_coder/configs/coder_lora_small.toml +2 -1
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +154 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +275 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +490 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +423 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +127 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +60 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +43 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +45 -0
- examples/qwen_vl/configs/eval_qwen2vl_vision.toml +44 -0
- examples/qwen_vl/configs/filter_qwen2vl_sft.toml +50 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +53 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +62 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +1 -1
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +37 -0
- examples/rl/configs/rl_from_base_qwen17.toml +76 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +22 -0
- examples/rl/task_app/math_single_step.py +990 -0
- examples/rl/task_app/math_task_app.py +111 -0
- examples/sft/README.md +5 -5
- examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -2
- examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -3
- examples/sft/evaluate.py +2 -4
- examples/sft/export_dataset.py +7 -4
- examples/swe/task_app/README.md +1 -1
- examples/swe/task_app/grpo_swe_mini.py +0 -1
- examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +13 -13
- examples/swe/task_app/hosted/policy_routes.py +0 -2
- examples/swe/task_app/hosted/rollout.py +0 -8
- examples/task_apps/crafter/task_app/grpo_crafter.py +4 -7
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +59 -1
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +30 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +62 -31
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +16 -14
- examples/task_apps/enron/__init__.py +1 -0
- examples/vlm/README.md +3 -3
- examples/vlm/configs/crafter_vlm_gpt4o.toml +2 -0
- examples/vlm/crafter_openai_vlm_agent.py +3 -5
- examples/vlm/filter_image_rows.py +1 -1
- examples/vlm/run_crafter_vlm_benchmark.py +2 -2
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +1 -1
- examples/warming_up_to_rl/configs/crafter_fft.toml +2 -0
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
- examples/warming_up_to_rl/export_trace_sft.py +174 -60
- examples/warming_up_to_rl/readme.md +63 -132
- examples/warming_up_to_rl/run_fft_and_save.py +1 -1
- examples/warming_up_to_rl/run_rl_and_save.py +1 -1
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +696 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +478 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1081 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
- synth_ai/__init__.py +44 -30
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +144 -7
- synth_ai/api/train/__init__.py +13 -1
- synth_ai/api/train/cli.py +30 -7
- synth_ai/api/train/config_finder.py +18 -11
- synth_ai/api/train/env_resolver.py +13 -10
- synth_ai/cli/__init__.py +62 -78
- synth_ai/cli/_modal_wrapper.py +7 -5
- synth_ai/cli/_typer_patch.py +0 -2
- synth_ai/cli/_validate_task_app.py +22 -4
- synth_ai/cli/legacy_root_backup.py +3 -1
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/recent.py +2 -1
- synth_ai/cli/setup.py +266 -0
- synth_ai/cli/status.py +1 -1
- synth_ai/cli/task_app_deploy.py +16 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +16 -0
- synth_ai/cli/task_app_serve.py +18 -0
- synth_ai/cli/task_apps.py +71 -31
- synth_ai/cli/traces.py +1 -1
- synth_ai/cli/train.py +18 -0
- synth_ai/cli/tui.py +7 -2
- synth_ai/cli/turso.py +1 -1
- synth_ai/cli/watch.py +1 -1
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +702 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +0 -1
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/evals/base.py +16 -5
- synth_ai/evals/client.py +1 -1
- synth_ai/inference/client.py +1 -1
- synth_ai/judge_schemas.py +8 -8
- synth_ai/learning/client.py +1 -1
- synth_ai/learning/health.py +1 -1
- synth_ai/learning/jobs.py +1 -1
- synth_ai/learning/rl/client.py +1 -1
- synth_ai/learning/rl/env_keys.py +1 -1
- synth_ai/learning/rl/secrets.py +1 -1
- synth_ai/learning/sft/client.py +1 -1
- synth_ai/learning/sft/data.py +407 -4
- synth_ai/learning/validators.py +4 -1
- synth_ai/task/apps/__init__.py +4 -2
- synth_ai/task/config.py +6 -4
- synth_ai/task/rubrics/__init__.py +1 -2
- synth_ai/task/rubrics/loaders.py +14 -10
- synth_ai/task/rubrics.py +219 -0
- synth_ai/task/trace_correlation_helpers.py +24 -11
- synth_ai/task/tracing_utils.py +14 -3
- synth_ai/task/validators.py +2 -3
- synth_ai/tracing_v3/abstractions.py +3 -3
- synth_ai/tracing_v3/config.py +15 -13
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +3 -1
- synth_ai/tracing_v3/decorators.py +10 -7
- synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
- synth_ai/tracing_v3/session_tracer.py +7 -7
- synth_ai/tracing_v3/storage/base.py +29 -29
- synth_ai/tracing_v3/storage/config.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +8 -9
- synth_ai/tracing_v3/turso/native_manager.py +80 -72
- synth_ai/tracing_v3/utils.py +2 -2
- synth_ai/tui/cli/query_experiments.py +4 -4
- synth_ai/tui/cli/query_experiments_v3.py +4 -4
- synth_ai/tui/dashboard.py +14 -9
- synth_ai/utils/__init__.py +101 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/cli.py +131 -0
- synth_ai/utils/env.py +287 -0
- synth_ai/utils/http.py +169 -0
- synth_ai/utils/modal.py +308 -0
- synth_ai/utils/process.py +212 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/v0/config/__init__.py +1 -5
- synth_ai/v0/config/base_url.py +1 -7
- synth_ai/v0/tracing/config.py +1 -1
- synth_ai/v0/tracing/decorators.py +1 -1
- synth_ai/v0/tracing/upload.py +1 -1
- synth_ai/v0/tracing_v1/config.py +1 -1
- synth_ai/v0/tracing_v1/decorators.py +1 -1
- synth_ai/v0/tracing_v1/upload.py +1 -1
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/METADATA +85 -31
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/RECORD +229 -117
- synth_ai/cli/man.py +0 -106
- synth_ai/compound/cais.py +0 -0
- synth_ai/core/experiment.py +0 -13
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -295
- synth_ai/handshake.py +0 -109
- synth_ai/http.py +0 -26
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/top_level.txt +0 -0
examples/qwen_vl/QUICKSTART_RL_VISION.md
@@ -0,0 +1,110 @@

# Vision RL - Quick Start 🚀

Complete RL training with vision models in 3 commands.

## Prerequisites

```bash
export SYNTH_API_KEY="your-key"
export BACKEND_BASE_URL="https://agent-learning.onrender.com/api"
export ENVIRONMENT_API_KEY="your-modal-key"
```
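Before running anything, you can confirm the variables above are actually set in the current environment. A minimal sketch using only the Python standard library (the variable names come from the export block above; nothing here is a synth-ai API):

```python
import os
import sys

# Variable names taken from the export block above.
required = ["SYNTH_API_KEY", "BACKEND_BASE_URL", "ENVIRONMENT_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]

if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("All required environment variables are set.")
```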
## Option 1: Run Tests (Validate Pipeline)

```bash
cd /Users/joshpurtell/Documents/GitHub/synth-ai

# Fast test (~3-5 min)
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py::test_cli_train_rl_vision_small_config -v -s

# Full test (~5-10 min)
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py::test_cli_train_rl_vision_qwen3vl4b -v -s

# All vision tests
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py -v -s
```

## Option 2: Manual Training

```bash
# 1. Deploy task app
uvx synth-ai task-app deploy grpo-crafter --name grpo-crafter-task-app

# 2. Get URL (from deploy output)
export TASK_APP_URL="https://your-app.modal.run"

# 3. Run RL training
uvx synth-ai train \
  --type rl \
  --config examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml \
  --backend $BACKEND_BASE_URL \
  --task-url $TASK_APP_URL
```

## What It Does

1. ✅ Deploys the Crafter task app (generates image observations)
2. ✅ Runs Qwen3-VL-4B with image-only input
3. ✅ Runs RL training with GRPO/GSPO
4. ✅ Uses the same task app as SFT data collection

## Configs

### Fast CI Test
**Config:** `tests/artifacts/configs/rl.vision.small.toml`
- 1 iteration, 3 steps, 1 episode
- Runtime: ~5 minutes

### Full Training
**Config:** `examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml`
- 3 iterations per epoch, 10 steps, 2 episodes
- Runtime: ~30-45 minutes per epoch

## Expected Output

```
✅ Vision RL job created: job-abc123
Model: Qwen3-VL-4B
Task App: https://your-app.modal.run
Image Mode: image_only
```

## Troubleshooting

### Task app timeout?
```bash
export TASK_APP_WARMUP_TIMEOUT=600  # 10 minutes
```

### OOM?
```toml
# Edit the config: reduce batch_size to 1
[training]
batch_size = 1
```

### Not seeing images?
```bash
# Verify the config
grep "supports_vision = true" <config.toml>
grep "use_vision = true" <config.toml>
```
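If you prefer a programmatic check over grep, the same flags can be located with Python 3.11+'s built-in `tomllib`. This is a minimal sketch: the section names that hold `supports_vision` and `use_vision` are not assumed, so it simply walks every table in the config.

```python
import sys
import tomllib


def find_flags(table: dict, keys: set, found: dict | None = None, path: str = "") -> dict:
    """Recursively collect the named flags anywhere in the parsed TOML."""
    found = {} if found is None else found
    for key, value in table.items():
        if isinstance(value, dict):
            find_flags(value, keys, found, f"{path}{key}.")
        elif key in keys:
            found[f"{path}{key}"] = value
    return found


with open(sys.argv[1], "rb") as fh:
    config = tomllib.load(fh)

flags = find_flags(config, {"supports_vision", "use_vision"})
print(flags or "No vision flags found in this config.")
```

Usage: `python check_vision_flags.py examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml` (the script name is hypothetical).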
## Full Documentation

- 📘 **Complete Guide:** `RL_VISION_COMPLETE.md`
- 🧪 **Testing Details:** `RL_VISION_TESTING.md`
- 📊 **SFT Pipeline:** `VLM_PIPELINE_COMPLETE.md`

## One-Liner Test

```bash
cd /Users/joshpurtell/Documents/GitHub/synth-ai && \
uv run pytest tests/integration/cli/test_cli_train_rl_vision.py::test_cli_train_rl_vision_small_config -v -s
```

---

**Ready?** Run the tests to validate your vision RL pipeline! 🎯
examples/qwen_vl/README.md
@@ -0,0 +1,154 @@

# Qwen VL Examples for Crafter

**📖 MASTER GUIDE**: See `../../VLM_COMPLETE_GUIDE.md` for full documentation

Vision-language model examples for Crafter agents with image observations.

**Status**: ✅ Production Ready (October 27, 2025)

## Documentation

| Document | Purpose |
|----------|---------|
| `../../VLM_COMPLETE_GUIDE.md` | Complete VLM documentation |
| `VLM_PIPELINE_COMPLETE.md` | Pipeline success summary |
| `QUICKSTART.md` | Quick start guide |
| `collect_data_via_cli.md` | CLI-based data collection |
| `BUGS_AND_FIXES.md` | Historical issues and fixes |
| `monorepo/VISION_SFT_COLLATOR_REFERENCE.md` | Collator technical details |

## 🚀 Quick Start (Recommended)

**Use the synth-ai CLI for the complete pipeline:**

```bash
# Run the complete pipeline: collect → filter → train
bash examples/qwen_vl/run_vision_sft_pipeline.sh
```

This will:
1. Collect 100 episodes with gpt-5-nano (vision enabled)
2. Filter traces and export to SFT JSONL format
3. Optionally start SFT training

**Or step-by-step:**

```bash
# 1. Collect traces
uvx synth-ai eval --config examples/qwen_vl/configs/eval_gpt5nano_vision.toml

# 2. Filter and export
uvx synth-ai filter --config examples/qwen_vl/configs/filter_vision_sft.toml

# 3. Train SFT
cd /path/to/monorepo
uvx synth-ai train --type sft --config configs/vision_sft/crafter_qwen3vl_8b_gpt5nano.toml
```

📖 **Full guide:** See `collect_data_via_cli.md` for detailed CLI usage.

---

## Examples (Direct Python Scripts)

### 1. **crafter_qwen_vl_agent.py**
Run a Crafter agent using Qwen-VL models via synth-ai's hosted inference.

**Models supported:**
- `Qwen/Qwen2-VL-7B-Instruct`
- `Qwen/Qwen2-VL-2B-Instruct`
- `Qwen/Qwen3-VL-8B` (or any Qwen VL variant)

**Usage:**
```bash
# Run with Qwen2-VL-7B
uv run python examples/qwen_vl/crafter_qwen_vl_agent.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --seeds 10 \
  --steps 20

# Run with Qwen3-VL-8B
uv run python examples/qwen_vl/crafter_qwen_vl_agent.py \
  --model Qwen/Qwen3-VL-8B \
  --seeds 10 \
  --steps 20
```

**Requires:** Synth-AI API key (`SYNTH_API_KEY` environment variable)

---

### 2. **crafter_gpt5nano_agent.py**
Run a Crafter agent using OpenAI's gpt-5-nano vision model.

**Usage:**
```bash
# Run with gpt-5-nano
uv run python examples/qwen_vl/crafter_gpt5nano_agent.py \
  --model gpt-5-nano \
  --seeds 10 \
  --steps 20

# Run with gpt-4o-mini for comparison
uv run python examples/qwen_vl/crafter_gpt5nano_agent.py \
  --model gpt-4o-mini-2024-07-18 \
  --seeds 10 \
  --steps 20
```

**Requires:** OpenAI API key (`OPENAI_API_KEY` environment variable)

---

### 3. **collect_vision_traces.py**
Collect vision traces for SFT dataset creation. Supports both Qwen-VL (synth) and OpenAI models.

**Usage:**
```bash
# Collect traces with gpt-5-nano
uv run python examples/qwen_vl/collect_vision_traces.py \
  --model gpt-5-nano \
  --provider openai \
  --episodes 100 \
  --max-steps 50 \
  --output-dir traces/gpt5nano_vision

# Collect traces with Qwen2-VL via synth
uv run python examples/qwen_vl/collect_vision_traces.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --provider synth \
  --episodes 100 \
  --max-steps 50 \
  --output-dir traces/qwen2vl_vision
```

**Output:** SQLite database with multimodal traces ready for SFT export.
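The trace database schema comes from synth-ai's tracing layer and is not documented in this README, so it can help to peek at what was written before exporting. A minimal sketch using the standard-library `sqlite3` module; the database filename under `--output-dir` depends on the run, and no table names are assumed:

```python
import sqlite3
import sys

# Point this at the SQLite file written under --output-dir.
conn = sqlite3.connect(sys.argv[1])

# List tables and row counts without assuming a particular schema.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
for table in tables:
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    print(f"{table}: {count} rows")

conn.close()
```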
---

## Vision Detection

CrafterPolicy automatically detects vision capability from model names:
- ✅ `gpt-5*` → Vision enabled
- ✅ `gpt-4o*` → Vision enabled
- ✅ `*qwen-vl*` → Vision enabled
- ✅ `*qwen2-vl*` → Vision enabled
- ✅ `qwen3-vl*` → Vision enabled

Or set explicitly: `policy.use_vision = True`
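The actual matching logic lives in CrafterPolicy and is not reproduced here; the sketch below only illustrates the kind of substring check the patterns above describe (the function name and marker list are illustrative, not synth-ai APIs):

```python
# Illustrative only: mirrors the wildcard patterns listed above,
# not CrafterPolicy's actual implementation.
VISION_NAME_MARKERS = ("gpt-5", "gpt-4o", "qwen-vl", "qwen2-vl", "qwen3-vl")


def looks_like_vision_model(model_name: str) -> bool:
    """Return True if the model name matches one of the vision-capable families above."""
    name = model_name.lower()
    return any(marker in name for marker in VISION_NAME_MARKERS)


assert looks_like_vision_model("Qwen/Qwen2-VL-7B-Instruct")
assert looks_like_vision_model("gpt-4o-mini-2024-07-18")
assert not looks_like_vision_model("Qwen/Qwen3-4B-Instruct")
```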
## Image Format

The Crafter environment provides observations as:
- **64x64 PNG images**
- **Base64-encoded data URLs**
- Format: `"data:image/png;base64,iVBORw0KGgo..."` (see the decoding sketch below)
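For local inspection, an observation in that format can be decoded with the standard library alone. A minimal sketch (the header string is the one shown above; the payload in the docs is truncated, so pass a full data URL from an actual observation):

```python
import base64


def save_observation(data_url: str, out_path: str = "obs.png") -> None:
    """Split a data:image/png;base64,... URL and write the raw PNG bytes to disk."""
    header, encoded = data_url.split(",", 1)
    assert header == "data:image/png;base64", f"unexpected header: {header}"
    with open(out_path, "wb") as fh:
        fh.write(base64.b64decode(encoded))
```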
## Next Steps

1. Run demo agents to verify vision inference works
2. Collect training traces with `collect_vision_traces.py`
3. Export to SFT JSONL format (see `vision_sft_rl.txt`)
4. Train VLM with LoRA (see monorepo SFT configs)
5. Fine-tune with RL/GRPO