synth-ai 0.2.14__py3-none-any.whl → 0.2.16__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.
Files changed (236)
  1. examples/README.md +1 -0
  2. examples/multi_step/SFT_README.md +147 -0
  3. examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +9 -9
  4. examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
  5. examples/multi_step/convert_traces_to_sft.py +84 -0
  6. examples/multi_step/run_sft_qwen30b.sh +45 -0
  7. examples/qwen_coder/configs/coder_lora_30b.toml +2 -1
  8. examples/qwen_coder/configs/coder_lora_4b.toml +2 -1
  9. examples/qwen_coder/configs/coder_lora_small.toml +2 -1
  10. examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
  11. examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
  12. examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
  13. examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
  14. examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
  15. examples/qwen_vl/QUICKSTART.md +327 -0
  16. examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
  17. examples/qwen_vl/README.md +154 -0
  18. examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
  19. examples/qwen_vl/RL_VISION_TESTING.md +333 -0
  20. examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
  21. examples/qwen_vl/SETUP_COMPLETE.md +275 -0
  22. examples/qwen_vl/VISION_TESTS_COMPLETE.md +490 -0
  23. examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
  24. examples/qwen_vl/__init__.py +2 -0
  25. examples/qwen_vl/collect_data_via_cli.md +423 -0
  26. examples/qwen_vl/collect_vision_traces.py +368 -0
  27. examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +127 -0
  28. examples/qwen_vl/configs/crafter_vlm_sft_example.toml +60 -0
  29. examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +43 -0
  30. examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
  31. examples/qwen_vl/configs/eval_gpt5nano_vision.toml +45 -0
  32. examples/qwen_vl/configs/eval_qwen2vl_vision.toml +44 -0
  33. examples/qwen_vl/configs/filter_qwen2vl_sft.toml +50 -0
  34. examples/qwen_vl/configs/filter_vision_sft.toml +53 -0
  35. examples/qwen_vl/configs/filter_vision_test.toml +8 -0
  36. examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
  37. examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
  38. examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
  39. examples/qwen_vl/run_vision_comparison.sh +62 -0
  40. examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
  41. examples/qwen_vl/test_image_validation.py +201 -0
  42. examples/qwen_vl/test_sft_vision_data.py +110 -0
  43. examples/rl/README.md +1 -1
  44. examples/rl/configs/eval_base_qwen.toml +17 -0
  45. examples/rl/configs/eval_rl_qwen.toml +13 -0
  46. examples/rl/configs/rl_from_base_qwen.toml +37 -0
  47. examples/rl/configs/rl_from_base_qwen17.toml +76 -0
  48. examples/rl/configs/rl_from_ft_qwen.toml +37 -0
  49. examples/rl/run_eval.py +436 -0
  50. examples/rl/run_rl_and_save.py +111 -0
  51. examples/rl/task_app/README.md +22 -0
  52. examples/rl/task_app/math_single_step.py +990 -0
  53. examples/rl/task_app/math_task_app.py +111 -0
  54. examples/sft/README.md +5 -5
  55. examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -2
  56. examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -3
  57. examples/sft/evaluate.py +2 -4
  58. examples/sft/export_dataset.py +7 -4
  59. examples/swe/task_app/README.md +1 -1
  60. examples/swe/task_app/grpo_swe_mini.py +0 -1
  61. examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
  62. examples/swe/task_app/hosted/envs/mini_swe/environment.py +13 -13
  63. examples/swe/task_app/hosted/policy_routes.py +0 -2
  64. examples/swe/task_app/hosted/rollout.py +0 -8
  65. examples/task_apps/crafter/task_app/grpo_crafter.py +4 -7
  66. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +59 -1
  67. examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +30 -0
  68. examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +62 -31
  69. examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +16 -14
  70. examples/task_apps/enron/__init__.py +1 -0
  71. examples/vlm/README.md +3 -3
  72. examples/vlm/configs/crafter_vlm_gpt4o.toml +2 -0
  73. examples/vlm/crafter_openai_vlm_agent.py +3 -5
  74. examples/vlm/filter_image_rows.py +1 -1
  75. examples/vlm/run_crafter_vlm_benchmark.py +2 -2
  76. examples/warming_up_to_rl/_utils.py +92 -0
  77. examples/warming_up_to_rl/analyze_trace_db.py +1 -1
  78. examples/warming_up_to_rl/configs/crafter_fft.toml +2 -0
  79. examples/warming_up_to_rl/configs/crafter_fft_4b.toml +2 -0
  80. examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
  81. examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
  82. examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
  83. examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
  84. examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
  85. examples/warming_up_to_rl/export_trace_sft.py +174 -60
  86. examples/warming_up_to_rl/readme.md +63 -132
  87. examples/warming_up_to_rl/run_fft_and_save.py +1 -1
  88. examples/warming_up_to_rl/run_rl_and_save.py +1 -1
  89. examples/warming_up_to_rl/task_app/README.md +42 -0
  90. examples/warming_up_to_rl/task_app/grpo_crafter.py +696 -0
  91. examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
  92. examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
  93. examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
  94. examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
  95. examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
  96. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
  97. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
  98. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
  99. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
  100. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +478 -0
  101. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
  102. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
  103. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
  104. examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
  105. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
  106. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
  107. examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
  108. examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1081 -0
  109. examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
  110. examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
  111. examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
  112. examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
  113. examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
  114. examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
  115. examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
  116. synth_ai/__init__.py +44 -30
  117. synth_ai/_utils/__init__.py +47 -0
  118. synth_ai/_utils/base_url.py +10 -0
  119. synth_ai/_utils/http.py +10 -0
  120. synth_ai/_utils/prompts.py +10 -0
  121. synth_ai/_utils/task_app_state.py +12 -0
  122. synth_ai/_utils/user_config.py +10 -0
  123. synth_ai/api/models/supported.py +144 -7
  124. synth_ai/api/train/__init__.py +13 -1
  125. synth_ai/api/train/cli.py +30 -7
  126. synth_ai/api/train/config_finder.py +18 -11
  127. synth_ai/api/train/env_resolver.py +13 -10
  128. synth_ai/cli/__init__.py +62 -78
  129. synth_ai/cli/_modal_wrapper.py +7 -5
  130. synth_ai/cli/_typer_patch.py +0 -2
  131. synth_ai/cli/_validate_task_app.py +22 -4
  132. synth_ai/cli/legacy_root_backup.py +3 -1
  133. synth_ai/cli/lib/__init__.py +10 -0
  134. synth_ai/cli/lib/task_app_discovery.py +7 -0
  135. synth_ai/cli/lib/task_app_env.py +518 -0
  136. synth_ai/cli/recent.py +2 -1
  137. synth_ai/cli/setup.py +266 -0
  138. synth_ai/cli/status.py +1 -1
  139. synth_ai/cli/task_app_deploy.py +16 -0
  140. synth_ai/cli/task_app_list.py +25 -0
  141. synth_ai/cli/task_app_modal_serve.py +16 -0
  142. synth_ai/cli/task_app_serve.py +18 -0
  143. synth_ai/cli/task_apps.py +71 -31
  144. synth_ai/cli/traces.py +1 -1
  145. synth_ai/cli/train.py +18 -0
  146. synth_ai/cli/tui.py +7 -2
  147. synth_ai/cli/turso.py +1 -1
  148. synth_ai/cli/watch.py +1 -1
  149. synth_ai/demos/__init__.py +10 -0
  150. synth_ai/demos/core/__init__.py +28 -1
  151. synth_ai/demos/crafter/__init__.py +1 -0
  152. synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
  153. synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
  154. synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
  155. synth_ai/demos/demo_registry.py +176 -0
  156. synth_ai/demos/math/__init__.py +1 -0
  157. synth_ai/demos/math/_common.py +16 -0
  158. synth_ai/demos/math/app.py +38 -0
  159. synth_ai/demos/math/config.toml +76 -0
  160. synth_ai/demos/math/deploy_modal.py +54 -0
  161. synth_ai/demos/math/modal_task_app.py +702 -0
  162. synth_ai/demos/math/task_app_entry.py +51 -0
  163. synth_ai/environments/environment/core.py +7 -1
  164. synth_ai/environments/examples/bandit/engine.py +0 -1
  165. synth_ai/environments/examples/bandit/environment.py +0 -1
  166. synth_ai/environments/examples/wordle/environment.py +0 -1
  167. synth_ai/evals/base.py +16 -5
  168. synth_ai/evals/client.py +1 -1
  169. synth_ai/inference/client.py +1 -1
  170. synth_ai/judge_schemas.py +8 -8
  171. synth_ai/learning/client.py +1 -1
  172. synth_ai/learning/health.py +1 -1
  173. synth_ai/learning/jobs.py +1 -1
  174. synth_ai/learning/rl/client.py +1 -1
  175. synth_ai/learning/rl/env_keys.py +1 -1
  176. synth_ai/learning/rl/secrets.py +1 -1
  177. synth_ai/learning/sft/client.py +1 -1
  178. synth_ai/learning/sft/data.py +407 -4
  179. synth_ai/learning/validators.py +4 -1
  180. synth_ai/task/apps/__init__.py +4 -2
  181. synth_ai/task/config.py +6 -4
  182. synth_ai/task/rubrics/__init__.py +1 -2
  183. synth_ai/task/rubrics/loaders.py +14 -10
  184. synth_ai/task/rubrics.py +219 -0
  185. synth_ai/task/trace_correlation_helpers.py +24 -11
  186. synth_ai/task/tracing_utils.py +14 -3
  187. synth_ai/task/validators.py +2 -3
  188. synth_ai/tracing_v3/abstractions.py +3 -3
  189. synth_ai/tracing_v3/config.py +15 -13
  190. synth_ai/tracing_v3/constants.py +21 -0
  191. synth_ai/tracing_v3/db_config.py +3 -1
  192. synth_ai/tracing_v3/decorators.py +10 -7
  193. synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
  194. synth_ai/tracing_v3/session_tracer.py +7 -7
  195. synth_ai/tracing_v3/storage/base.py +29 -29
  196. synth_ai/tracing_v3/storage/config.py +3 -3
  197. synth_ai/tracing_v3/turso/daemon.py +8 -9
  198. synth_ai/tracing_v3/turso/native_manager.py +80 -72
  199. synth_ai/tracing_v3/utils.py +2 -2
  200. synth_ai/tui/cli/query_experiments.py +4 -4
  201. synth_ai/tui/cli/query_experiments_v3.py +4 -4
  202. synth_ai/tui/dashboard.py +14 -9
  203. synth_ai/utils/__init__.py +101 -0
  204. synth_ai/utils/base_url.py +94 -0
  205. synth_ai/utils/cli.py +131 -0
  206. synth_ai/utils/env.py +287 -0
  207. synth_ai/utils/http.py +169 -0
  208. synth_ai/utils/modal.py +308 -0
  209. synth_ai/utils/process.py +212 -0
  210. synth_ai/utils/prompts.py +39 -0
  211. synth_ai/utils/sqld.py +122 -0
  212. synth_ai/utils/task_app_discovery.py +882 -0
  213. synth_ai/utils/task_app_env.py +186 -0
  214. synth_ai/utils/task_app_state.py +318 -0
  215. synth_ai/utils/user_config.py +137 -0
  216. synth_ai/v0/config/__init__.py +1 -5
  217. synth_ai/v0/config/base_url.py +1 -7
  218. synth_ai/v0/tracing/config.py +1 -1
  219. synth_ai/v0/tracing/decorators.py +1 -1
  220. synth_ai/v0/tracing/upload.py +1 -1
  221. synth_ai/v0/tracing_v1/config.py +1 -1
  222. synth_ai/v0/tracing_v1/decorators.py +1 -1
  223. synth_ai/v0/tracing_v1/upload.py +1 -1
  224. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/METADATA +85 -31
  225. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/RECORD +229 -117
  226. synth_ai/cli/man.py +0 -106
  227. synth_ai/compound/cais.py +0 -0
  228. synth_ai/core/experiment.py +0 -13
  229. synth_ai/core/system.py +0 -15
  230. synth_ai/demo_registry.py +0 -295
  231. synth_ai/handshake.py +0 -109
  232. synth_ai/http.py +0 -26
  233. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/WHEEL +0 -0
  234. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/entry_points.txt +0 -0
  235. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/licenses/LICENSE +0 -0
  236. {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/top_level.txt +0 -0
examples/qwen_vl/INFERENCE_SFT_TESTS.md (new file) @@ -0,0 +1,412 @@
# Vision Inference & SFT Integration Tests

Complete integration tests for vision inference and SFT training with multimodal data.

## Overview

Two new test suites validate the full vision ML pipeline:

1. **Inference Tests** - vision model inference with multimodal requests
2. **SFT Tests** - supervised fine-tuning with vision data

## Test Files

### 1. Vision Inference Tests
**File:** `tests/integration/cli/test_cli_inference_vision.py`

**Tests:**
- `test_vision_inference_with_image` - basic vision inference with image + text
- `test_vision_inference_validation` - rejection of invalid images (empty URLs, etc.)
- `test_vision_inference_multiple_images` - multiple images in one request

**What They Test:**
- ✅ Backend accepts multimodal messages
- ✅ Vision models process image + text input
- ✅ Image validation catches invalid data before inference
- ✅ Multiple-image handling
- ✅ Response format validation

### 2. Vision SFT Tests
**File:** `tests/integration/cli/test_cli_train_sft_vision.py`

**Tests:**
- `test_cli_train_sft_vision_qwen2vl` - full SFT training job submission
- `test_vision_sft_dataset_validation` - dataset validation with mixed valid/invalid examples
- `test_cli_train_sft_vision_small_config` - fast CI test with an artifact config

**What They Test:**
- ✅ Vision SFT dataset creation with images
- ✅ Job submission for vision SFT training
- ✅ Backend accepts vision training config
- ✅ Dataset validation filters invalid examples
- ✅ LoRA training configuration for vision models

## Quick Start

### Prerequisites
```bash
export SYNTH_API_KEY="your-api-key"
export BACKEND_BASE_URL="https://agent-learning.onrender.com/api"
```

### Run Inference Tests
```bash
cd /path/to/synth-ai

# All inference tests
uv run pytest tests/integration/cli/test_cli_inference_vision.py -v -s

# Single test
uv run pytest tests/integration/cli/test_cli_inference_vision.py::test_vision_inference_with_image -v

# Filter by marks
uv run pytest -m "vision and slow" tests/integration/cli/test_cli_inference_vision.py
```

### Run SFT Tests
```bash
# All SFT tests
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py -v -s

# Dataset validation only (fast)
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py::test_vision_sft_dataset_validation -v

# Small config test (job submission)
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py::test_cli_train_sft_vision_small_config -v
```

### Run All Vision Tests
```bash
# All vision tests (inference + SFT + RL)
uv run pytest -m vision -v -s

# Vision tests without slow ones
uv run pytest -m "vision and not slow" -v
```

## Test Details

### Inference Test 1: Basic Vision Inference
**Function:** `test_vision_inference_with_image`

**Creates:**
- Simple 64x64 red image (base64 encoded)
- Multimodal request with text + image
- POST to `/v1/chat/completions`

**Validates:**
- Response has a `choices` array
- Choice has a `message` with `content`
- Content is a non-empty string

**Expected Output:**
```
✅ Vision inference successful
Model: Qwen/Qwen2-VL-2B-Instruct
Response: This image is red...
```

**Runtime:** ~10-20 seconds (depends on model loading)

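The request this test builds can be sketched with the standard library alone. The content-part shape (`text` plus `image_url` entries) follows the dataset format shown later in this document; the image bytes below are a stand-in, not a real PNG encoding, since the actual test renders a 64x64 red image.

```python
import base64
import json

# Stand-in pixel data; the real test base64-encodes actual PNG bytes.
fake_png_bytes = b"\x89PNG-stand-in"
data_url = "data:image/png;base64," + base64.b64encode(fake_png_bytes).decode("ascii")

payload = {
    "model": "Qwen/Qwen2-VL-2B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What color is this image?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
}

# The test POSTs this body to /v1/chat/completions and then asserts on
# response["choices"][0]["message"]["content"].
print(json.dumps(payload)[:60])
```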
### Inference Test 2: Validation
**Function:** `test_vision_inference_validation`

**Tests Invalid Requests:**
1. Empty image URL: `{"url": ""}`
2. Missing URL field: `{"image_url": {}}`
3. Whitespace URL: `{"url": " "}`

**Validates:**
- Backend returns 4xx error (validation failure)
- Error message indicates the problem
- No wasted inference on invalid data

**Expected Output:**
```
✅ Correctly rejected: Empty image URL
Error code: 400
Error message: Image URL cannot be empty...
```

### Inference Test 3: Multiple Images
**Function:** `test_vision_inference_multiple_images`

**Creates:**
- Red and blue test images
- Single message with 2 images

**Validates:**
- Backend handles multiple images
- Model processes both images
- Response mentions both colors (if model supports)

**Note:** May skip if model doesn't support multiple images per message.

### SFT Test 1: Full Training Job
**Function:** `test_cli_train_sft_vision_qwen2vl`

**Creates:**
- 3-example vision SFT dataset (JSONL)
- Each example has 1 image (base64 in data URL)
- Minimal training config (1 epoch, LoRA)

**Submits:**
- SFT training job via CLI
- Model: Qwen2-VL-2B-Instruct
- Config includes `supports_vision = true`

**Validates:**
- Job created successfully
- Job ID returned
- Config accepted by backend

**Expected Output:**
```
✅ Vision SFT job created: job-abc123
Model: Qwen2-VL-2B-Instruct
Dataset: /tmp/.../vision_sft_test.jsonl
Examples: 3 (with images)
```

**Runtime:** ~30-60 seconds (job submission only, not training)

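The dataset-creation step above can be sketched with the standard library. This writes three examples in the shape documented under Dataset Format; the file name matches the test's output, but the image bytes are stand-ins rather than real PNGs.

```python
import base64
import json
import tempfile
from pathlib import Path

def fake_data_url(tag: bytes) -> str:
    # Stand-in image payload; the real test encodes actual PNG bytes.
    return "data:image/png;base64," + base64.b64encode(tag).decode("ascii")

def make_example(i: int) -> dict:
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What color is this?"},
                    {"type": "image_url", "image_url": {"url": fake_data_url(b"img-%d" % i)}},
                ],
            },
            {"role": "assistant", "content": "This image is red."},
        ],
        "metadata": {"example_id": i},
    }

# One JSON object per line, as JSONL requires.
out = Path(tempfile.mkdtemp()) / "vision_sft_test.jsonl"
with out.open("w") as f:
    for i in range(3):
        f.write(json.dumps(make_example(i)) + "\n")

print(out, out.read_text().count("\n"))  # 3 lines
```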
### SFT Test 2: Dataset Validation
**Function:** `test_vision_sft_dataset_validation`

**Creates:**
- 4-example dataset (2 valid, 2 invalid)
- Invalid examples have empty/missing URLs

**Validates:**
- SDK validation correctly identifies valid examples
- Invalid examples are flagged with specific errors
- No false positives or negatives

**Expected Output:**
```
✅ Example 0: Valid
❌ Example 1: Invalid - Has 1 image_url entries but only 0 valid URLs
❌ Example 2: Invalid - Has 1 image_url entries but only 0 valid URLs
✅ Example 3: Valid

✅ Dataset validation working correctly
Total examples: 4
Valid: 2
Invalid: 2
```

**Runtime:** ~1-2 seconds (pure validation, no network)

### SFT Test 3: Fast CI Test
**Function:** `test_cli_train_sft_vision_small_config`

**Uses:**
- Artifact config (`tests/artifacts/configs/sft.vision.small.toml`)
- Minimal settings for fast validation

**Validates:**
- Same as Test 1 but faster
- Config artifact is correct

**Runtime:** ~20-40 seconds

## Dataset Format

### Vision SFT Example
```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What color is this?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KG..."
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "This image is red."
    }
  ],
  "metadata": {"example_id": 1}
}
```

### Supported Image Formats
- **Data URLs:** `data:image/png;base64,<base64-data>`
- **HTTP URLs:** `https://example.com/image.jpg`
- **Local paths:** `/path/to/image.png` (converted to PIL Image)

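How the SDK handles local paths is its own concern; purely to illustrate the data-URL form above, a local image file can be inlined like this (the helper name is ours, not part of synth-ai):

```python
import base64
import mimetypes
import tempfile
from pathlib import Path

def path_to_data_url(path: str) -> str:
    """Inline a local image file as a data URL (illustrative helper only)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not an image path: {path}")
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Demo with a stand-in .png file (the bytes are not a real PNG).
p = Path(tempfile.mkdtemp()) / "pixel.png"
p.write_bytes(b"\x89PNG-stand-in")
url = path_to_data_url(str(p))
print(url[:22])  # the "data:image/png;base64," prefix
```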
### Validation Rules
✅ **Valid:**
- Non-empty URL string
- Valid scheme (`http://`, `https://`, `data:image/`)
- Properly formatted base64 (if data URL)

❌ **Invalid:**
- Empty string: `""`
- Whitespace only: `" "`
- Null value: `None` or `null`
- Missing URL field
- Non-string URL

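A minimal rule-based checker for the URL rules above might look like the following. This is our own sketch, not the SDK's `validate_vision_example`; error strings and the tally format are illustrative.

```python
import base64
from typing import Optional

def image_url_error(url: object) -> Optional[str]:
    """Return an error string for an invalid image URL, or None if valid."""
    if url is None:
        return "URL is null"
    if not isinstance(url, str):
        return "URL is not a string"
    if not url.strip():
        return "URL is empty or whitespace"
    if url.startswith(("http://", "https://")):
        return None
    if url.startswith("data:image/"):
        # Check the base64 payload of a data URL.
        _, _, b64 = url.partition(";base64,")
        try:
            base64.b64decode(b64, validate=True)
        except Exception:
            return "data URL payload is not valid base64"
        return None
    return "unsupported URL scheme"

# Tally a mixed batch, as in the dataset-validation test above.
urls = ["https://example.com/image.jpg", "", "   ", "data:image/png;base64,aGVsbG8="]
errors = [image_url_error(u) for u in urls]
print(sum(e is None for e in errors), "valid,",
      sum(e is not None for e in errors), "invalid")  # → 2 valid, 2 invalid
```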
## Integration with Other Tests

### Combined with RL Vision Tests
```bash
# All vision tests (inference + SFT + RL)
uv run pytest -m vision tests/integration/cli/ -v

# Specific pipeline
uv run pytest \
  tests/integration/cli/test_cli_inference_vision.py \
  tests/integration/cli/test_cli_train_sft_vision.py \
  tests/integration/cli/test_cli_train_rl_vision.py \
  -v -s
```

### Test Matrix

| Test Suite | Model | Data | Runtime | Purpose |
|------------|-------|------|---------|---------|
| Inference | Qwen2-VL-2B | Generated | ~20s | API validation |
| SFT | Qwen2-VL-2B | Generated | ~30s | Training job |
| RL | Qwen3-VL-4B | Task app | ~5-10min | Full pipeline |

## Troubleshooting

### Inference Test Fails
```bash
# Check backend connectivity
curl $BACKEND_BASE_URL/health

# Check API key
echo $SYNTH_API_KEY

# Verify model is available
curl -H "Authorization: Bearer $SYNTH_API_KEY" \
  $BACKEND_BASE_URL/v1/models
```

### SFT Test Fails
```bash
# Check dataset was created
cat /tmp/test_sft_vision/vision_sft_test.jsonl

# Validate dataset manually
python -c "
from synth_ai.learning.sft.data import load_jsonl, validate_vision_example
examples = load_jsonl('path/to/dataset.jsonl', min_messages=1)
for ex in examples:
    is_valid, error = validate_vision_example(ex, require_images=True)
    print(f'Valid: {is_valid}, Error: {error}')
"
```

### PIL Not Available
```bash
# Install Pillow
uv pip install Pillow

# Or use conda
conda install pillow
```

### Image Too Large
```python
# Reduce image size in the test
img = Image.new('RGB', (32, 32), color='red')  # 32x32 instead of 64x64
```

## CI Integration

### Pytest Marks
```python
@pytest.mark.slow         # Takes >5 seconds
@pytest.mark.vision       # Requires vision support
@pytest.mark.integration  # Full integration test
```

### Run in CI
```yaml
# .github/workflows/test.yml
- name: Run vision integration tests
  run: |
    pytest -m "vision and integration" \
      tests/integration/cli/test_cli_inference_vision.py \
      tests/integration/cli/test_cli_train_sft_vision.py \
      -v --tb=short
  env:
    SYNTH_API_KEY: ${{ secrets.SYNTH_API_KEY }}
    BACKEND_BASE_URL: ${{ secrets.BACKEND_URL }}
```

### Skip in Fast CI
```bash
# Skip slow tests for PR checks
pytest -m "not slow" tests/

# Include vision but skip slow
pytest -m "vision and not slow" tests/
```

## Performance Expectations

### Inference Tests
- **test_vision_inference_with_image:** 10-20s
- **test_vision_inference_validation:** 5-10s (3 requests)
- **test_vision_inference_multiple_images:** 15-25s

**Total:** ~30-55 seconds

### SFT Tests
- **test_vision_sft_dataset_validation:** 1-2s (local only)
- **test_cli_train_sft_vision_small_config:** 20-40s
- **test_cli_train_sft_vision_qwen2vl:** 30-60s

**Total:** ~50-100 seconds

### All Vision Tests (Inference + SFT + RL)
- **Total runtime:** ~6-12 minutes
- **Network calls:** ~10-15
- **GPU time:** 0 (job submission only, not actual training)

## Related Documentation

- **RL Vision Tests:** `RL_VISION_TESTING.md`
- **Image Validation:** `IMAGE_VALIDATION_COMPLETE.md`
- **VLM Pipeline:** `VLM_PIPELINE_COMPLETE.md`
- **Quick Start:** `QUICKSTART_RL_VISION.md`

## Summary

✅ **Complete test coverage for the vision ML pipeline:**
- Inference API with multimodal messages
- Image validation before inference
- SFT dataset creation and validation
- SFT training job submission
- Integration with existing RL vision tests

**Test Count:**
- Inference: 3 tests
- SFT: 3 tests
- RL: 3 tests (from previous work)
- **Total: 9 vision integration tests**

**Coverage:**
- ✅ End-to-end inference
- ✅ Request validation
- ✅ Dataset creation
- ✅ Dataset validation
- ✅ SFT job submission
- ✅ RL job submission
- ✅ Task app vision support

---

**Status:** Production-ready. Run `pytest -m vision -v` to validate the full vision ML pipeline, from inference through RL training! 🎉