synth-ai 0.2.14__py3-none-any.whl → 0.2.16__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- examples/README.md +1 -0
- examples/multi_step/SFT_README.md +147 -0
- examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +9 -9
- examples/multi_step/configs/crafter_sft_qwen30b_lora.toml +62 -0
- examples/multi_step/convert_traces_to_sft.py +84 -0
- examples/multi_step/run_sft_qwen30b.sh +45 -0
- examples/qwen_coder/configs/coder_lora_30b.toml +2 -1
- examples/qwen_coder/configs/coder_lora_4b.toml +2 -1
- examples/qwen_coder/configs/coder_lora_small.toml +2 -1
- examples/qwen_vl/BUGS_AND_FIXES.md +232 -0
- examples/qwen_vl/IMAGE_VALIDATION_COMPLETE.md +271 -0
- examples/qwen_vl/IMAGE_VALIDATION_SUMMARY.md +260 -0
- examples/qwen_vl/INFERENCE_SFT_TESTS.md +412 -0
- examples/qwen_vl/NEXT_STEPS_2B.md +325 -0
- examples/qwen_vl/QUICKSTART.md +327 -0
- examples/qwen_vl/QUICKSTART_RL_VISION.md +110 -0
- examples/qwen_vl/README.md +154 -0
- examples/qwen_vl/RL_VISION_COMPLETE.md +475 -0
- examples/qwen_vl/RL_VISION_TESTING.md +333 -0
- examples/qwen_vl/SDK_VISION_INTEGRATION.md +328 -0
- examples/qwen_vl/SETUP_COMPLETE.md +275 -0
- examples/qwen_vl/VISION_TESTS_COMPLETE.md +490 -0
- examples/qwen_vl/VLM_PIPELINE_COMPLETE.md +242 -0
- examples/qwen_vl/__init__.py +2 -0
- examples/qwen_vl/collect_data_via_cli.md +423 -0
- examples/qwen_vl/collect_vision_traces.py +368 -0
- examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +127 -0
- examples/qwen_vl/configs/crafter_vlm_sft_example.toml +60 -0
- examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +43 -0
- examples/qwen_vl/configs/eval_gpt4o_vision_proper.toml +29 -0
- examples/qwen_vl/configs/eval_gpt5nano_vision.toml +45 -0
- examples/qwen_vl/configs/eval_qwen2vl_vision.toml +44 -0
- examples/qwen_vl/configs/filter_qwen2vl_sft.toml +50 -0
- examples/qwen_vl/configs/filter_vision_sft.toml +53 -0
- examples/qwen_vl/configs/filter_vision_test.toml +8 -0
- examples/qwen_vl/configs/sft_qwen3_vl_2b_test.toml +54 -0
- examples/qwen_vl/crafter_gpt5nano_agent.py +308 -0
- examples/qwen_vl/crafter_qwen_vl_agent.py +300 -0
- examples/qwen_vl/run_vision_comparison.sh +62 -0
- examples/qwen_vl/run_vision_sft_pipeline.sh +175 -0
- examples/qwen_vl/test_image_validation.py +201 -0
- examples/qwen_vl/test_sft_vision_data.py +110 -0
- examples/rl/README.md +1 -1
- examples/rl/configs/eval_base_qwen.toml +17 -0
- examples/rl/configs/eval_rl_qwen.toml +13 -0
- examples/rl/configs/rl_from_base_qwen.toml +37 -0
- examples/rl/configs/rl_from_base_qwen17.toml +76 -0
- examples/rl/configs/rl_from_ft_qwen.toml +37 -0
- examples/rl/run_eval.py +436 -0
- examples/rl/run_rl_and_save.py +111 -0
- examples/rl/task_app/README.md +22 -0
- examples/rl/task_app/math_single_step.py +990 -0
- examples/rl/task_app/math_task_app.py +111 -0
- examples/sft/README.md +5 -5
- examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -2
- examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -3
- examples/sft/evaluate.py +2 -4
- examples/sft/export_dataset.py +7 -4
- examples/swe/task_app/README.md +1 -1
- examples/swe/task_app/grpo_swe_mini.py +0 -1
- examples/swe/task_app/grpo_swe_mini_task_app.py +0 -12
- examples/swe/task_app/hosted/envs/mini_swe/environment.py +13 -13
- examples/swe/task_app/hosted/policy_routes.py +0 -2
- examples/swe/task_app/hosted/rollout.py +0 -8
- examples/task_apps/crafter/task_app/grpo_crafter.py +4 -7
- examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +59 -1
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +30 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +62 -31
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +16 -14
- examples/task_apps/enron/__init__.py +1 -0
- examples/vlm/README.md +3 -3
- examples/vlm/configs/crafter_vlm_gpt4o.toml +2 -0
- examples/vlm/crafter_openai_vlm_agent.py +3 -5
- examples/vlm/filter_image_rows.py +1 -1
- examples/vlm/run_crafter_vlm_benchmark.py +2 -2
- examples/warming_up_to_rl/_utils.py +92 -0
- examples/warming_up_to_rl/analyze_trace_db.py +1 -1
- examples/warming_up_to_rl/configs/crafter_fft.toml +2 -0
- examples/warming_up_to_rl/configs/crafter_fft_4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_fft_qwen4b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml +2 -0
- examples/warming_up_to_rl/configs/eval_modal_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +2 -1
- examples/warming_up_to_rl/configs/rl_from_ft.toml +2 -0
- examples/warming_up_to_rl/export_trace_sft.py +174 -60
- examples/warming_up_to_rl/readme.md +63 -132
- examples/warming_up_to_rl/run_fft_and_save.py +1 -1
- examples/warming_up_to_rl/run_rl_and_save.py +1 -1
- examples/warming_up_to_rl/task_app/README.md +42 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +696 -0
- examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +135 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/README.md +173 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/branching.py +143 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/environment_routes.py +1226 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/__init__.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/__init__.py +6 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/app.py +1 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/environment.py +522 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +478 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +108 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/shared.py +305 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/tools.py +47 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +204 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +618 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/main.py +100 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +1081 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/registry.py +195 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +1861 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/__init__.py +5 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/storage/volume.py +211 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_agents.py +161 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/test_service.py +137 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +62 -0
- synth_ai/__init__.py +44 -30
- synth_ai/_utils/__init__.py +47 -0
- synth_ai/_utils/base_url.py +10 -0
- synth_ai/_utils/http.py +10 -0
- synth_ai/_utils/prompts.py +10 -0
- synth_ai/_utils/task_app_state.py +12 -0
- synth_ai/_utils/user_config.py +10 -0
- synth_ai/api/models/supported.py +144 -7
- synth_ai/api/train/__init__.py +13 -1
- synth_ai/api/train/cli.py +30 -7
- synth_ai/api/train/config_finder.py +18 -11
- synth_ai/api/train/env_resolver.py +13 -10
- synth_ai/cli/__init__.py +62 -78
- synth_ai/cli/_modal_wrapper.py +7 -5
- synth_ai/cli/_typer_patch.py +0 -2
- synth_ai/cli/_validate_task_app.py +22 -4
- synth_ai/cli/legacy_root_backup.py +3 -1
- synth_ai/cli/lib/__init__.py +10 -0
- synth_ai/cli/lib/task_app_discovery.py +7 -0
- synth_ai/cli/lib/task_app_env.py +518 -0
- synth_ai/cli/recent.py +2 -1
- synth_ai/cli/setup.py +266 -0
- synth_ai/cli/status.py +1 -1
- synth_ai/cli/task_app_deploy.py +16 -0
- synth_ai/cli/task_app_list.py +25 -0
- synth_ai/cli/task_app_modal_serve.py +16 -0
- synth_ai/cli/task_app_serve.py +18 -0
- synth_ai/cli/task_apps.py +71 -31
- synth_ai/cli/traces.py +1 -1
- synth_ai/cli/train.py +18 -0
- synth_ai/cli/tui.py +7 -2
- synth_ai/cli/turso.py +1 -1
- synth_ai/cli/watch.py +1 -1
- synth_ai/demos/__init__.py +10 -0
- synth_ai/demos/core/__init__.py +28 -1
- synth_ai/demos/crafter/__init__.py +1 -0
- synth_ai/demos/crafter/crafter_fft_4b.toml +55 -0
- synth_ai/demos/crafter/grpo_crafter_task_app.py +185 -0
- synth_ai/demos/crafter/rl_from_base_qwen4b.toml +74 -0
- synth_ai/demos/demo_registry.py +176 -0
- synth_ai/demos/math/__init__.py +1 -0
- synth_ai/demos/math/_common.py +16 -0
- synth_ai/demos/math/app.py +38 -0
- synth_ai/demos/math/config.toml +76 -0
- synth_ai/demos/math/deploy_modal.py +54 -0
- synth_ai/demos/math/modal_task_app.py +702 -0
- synth_ai/demos/math/task_app_entry.py +51 -0
- synth_ai/environments/environment/core.py +7 -1
- synth_ai/environments/examples/bandit/engine.py +0 -1
- synth_ai/environments/examples/bandit/environment.py +0 -1
- synth_ai/environments/examples/wordle/environment.py +0 -1
- synth_ai/evals/base.py +16 -5
- synth_ai/evals/client.py +1 -1
- synth_ai/inference/client.py +1 -1
- synth_ai/judge_schemas.py +8 -8
- synth_ai/learning/client.py +1 -1
- synth_ai/learning/health.py +1 -1
- synth_ai/learning/jobs.py +1 -1
- synth_ai/learning/rl/client.py +1 -1
- synth_ai/learning/rl/env_keys.py +1 -1
- synth_ai/learning/rl/secrets.py +1 -1
- synth_ai/learning/sft/client.py +1 -1
- synth_ai/learning/sft/data.py +407 -4
- synth_ai/learning/validators.py +4 -1
- synth_ai/task/apps/__init__.py +4 -2
- synth_ai/task/config.py +6 -4
- synth_ai/task/rubrics/__init__.py +1 -2
- synth_ai/task/rubrics/loaders.py +14 -10
- synth_ai/task/rubrics.py +219 -0
- synth_ai/task/trace_correlation_helpers.py +24 -11
- synth_ai/task/tracing_utils.py +14 -3
- synth_ai/task/validators.py +2 -3
- synth_ai/tracing_v3/abstractions.py +3 -3
- synth_ai/tracing_v3/config.py +15 -13
- synth_ai/tracing_v3/constants.py +21 -0
- synth_ai/tracing_v3/db_config.py +3 -1
- synth_ai/tracing_v3/decorators.py +10 -7
- synth_ai/tracing_v3/llm_call_record_helpers.py +5 -5
- synth_ai/tracing_v3/session_tracer.py +7 -7
- synth_ai/tracing_v3/storage/base.py +29 -29
- synth_ai/tracing_v3/storage/config.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +8 -9
- synth_ai/tracing_v3/turso/native_manager.py +80 -72
- synth_ai/tracing_v3/utils.py +2 -2
- synth_ai/tui/cli/query_experiments.py +4 -4
- synth_ai/tui/cli/query_experiments_v3.py +4 -4
- synth_ai/tui/dashboard.py +14 -9
- synth_ai/utils/__init__.py +101 -0
- synth_ai/utils/base_url.py +94 -0
- synth_ai/utils/cli.py +131 -0
- synth_ai/utils/env.py +287 -0
- synth_ai/utils/http.py +169 -0
- synth_ai/utils/modal.py +308 -0
- synth_ai/utils/process.py +212 -0
- synth_ai/utils/prompts.py +39 -0
- synth_ai/utils/sqld.py +122 -0
- synth_ai/utils/task_app_discovery.py +882 -0
- synth_ai/utils/task_app_env.py +186 -0
- synth_ai/utils/task_app_state.py +318 -0
- synth_ai/utils/user_config.py +137 -0
- synth_ai/v0/config/__init__.py +1 -5
- synth_ai/v0/config/base_url.py +1 -7
- synth_ai/v0/tracing/config.py +1 -1
- synth_ai/v0/tracing/decorators.py +1 -1
- synth_ai/v0/tracing/upload.py +1 -1
- synth_ai/v0/tracing_v1/config.py +1 -1
- synth_ai/v0/tracing_v1/decorators.py +1 -1
- synth_ai/v0/tracing_v1/upload.py +1 -1
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/METADATA +85 -31
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/RECORD +229 -117
- synth_ai/cli/man.py +0 -106
- synth_ai/compound/cais.py +0 -0
- synth_ai/core/experiment.py +0 -13
- synth_ai/core/system.py +0 -15
- synth_ai/demo_registry.py +0 -295
- synth_ai/handshake.py +0 -109
- synth_ai/http.py +0 -26
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.14.dist-info → synth_ai-0.2.16.dist-info}/top_level.txt +0 -0
# Vision Inference & SFT Integration Tests

Complete integration tests for vision inference and SFT training with multimodal data.

## Overview

Two new test suites validate the full vision ML pipeline:

1. **Inference Tests** - Vision model inference with multimodal requests
2. **SFT Tests** - Supervised fine-tuning with vision data

## Test Files

### 1. Vision Inference Tests
**File:** `tests/integration/cli/test_cli_inference_vision.py`

**Tests:**
- `test_vision_inference_with_image` - Basic vision inference with image + text
- `test_vision_inference_validation` - Invalid image validation (empty URLs, etc.)
- `test_vision_inference_multiple_images` - Multiple images in one request

**What They Test:**
- ✅ Backend accepts multimodal messages
- ✅ Vision models process image + text input
- ✅ Image validation catches invalid data before inference
- ✅ Multiple image handling
- ✅ Response format validation

### 2. Vision SFT Tests
**File:** `tests/integration/cli/test_cli_train_sft_vision.py`

**Tests:**
- `test_cli_train_sft_vision_qwen2vl` - Full SFT training job submission
- `test_vision_sft_dataset_validation` - Dataset validation with mixed valid/invalid examples
- `test_cli_train_sft_vision_small_config` - Fast CI test with artifact config

**What They Test:**
- ✅ Vision SFT dataset creation with images
- ✅ Job submission for vision SFT training
- ✅ Backend accepts vision training config
- ✅ Dataset validation filters invalid examples
- ✅ LoRA training configuration for vision models

## Quick Start

### Prerequisites
```bash
export SYNTH_API_KEY="your-api-key"
export BACKEND_BASE_URL="https://agent-learning.onrender.com/api"
```

### Run Inference Tests
```bash
cd /path/to/synth-ai  # repo root

# All inference tests
uv run pytest tests/integration/cli/test_cli_inference_vision.py -v -s

# Single test
uv run pytest tests/integration/cli/test_cli_inference_vision.py::test_vision_inference_with_image -v

# With marks
uv run pytest -m "vision and slow" tests/integration/cli/test_cli_inference_vision.py
```

### Run SFT Tests
```bash
# All SFT tests
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py -v -s

# Dataset validation only (fast)
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py::test_vision_sft_dataset_validation -v

# Small config test (job submission)
uv run pytest tests/integration/cli/test_cli_train_sft_vision.py::test_cli_train_sft_vision_small_config -v
```

### Run All Vision Tests
```bash
# All vision tests (inference + SFT + RL)
uv run pytest -m vision -v -s

# Vision tests without slow ones
uv run pytest -m "vision and not slow" -v
```

## Test Details

### Inference Test 1: Basic Vision Inference
**Function:** `test_vision_inference_with_image`

**Creates:**
- Simple 64x64 red image (base64 encoded)
- Multimodal request with text + image
- POST to `/v1/chat/completions`

**Validates:**
- Response has `choices` array
- Choice has `message` with `content`
- Content is a non-empty string

**Expected Output:**
```
✅ Vision inference successful
   Model: Qwen/Qwen2-VL-2B-Instruct
   Response: This image is red...
```

**Runtime:** ~10-20 seconds (depends on model loading)
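The request body the test sends can be sketched as below. This is an illustrative payload only; the model name and prompt come from this doc, while `max_tokens` and the placeholder image bytes are assumptions — the real test renders an actual PNG.

```python
import base64
import json

# Placeholder bytes: the real test base64-encodes a rendered 64x64 PNG.
image_b64 = base64.b64encode(b"<png-bytes-here>").decode("ascii")

payload = {
    "model": "Qwen/Qwen2-VL-2B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What color is this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 64,  # illustrative; not specified in this doc
}

# The test POSTs this JSON to $BACKEND_BASE_URL/v1/chat/completions
# with an "Authorization: Bearer $SYNTH_API_KEY" header.
body = json.dumps(payload)
```

The `content` array mixing `text` and `image_url` parts is what "multimodal messages" refers to throughout this doc.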

### Inference Test 2: Validation
**Function:** `test_vision_inference_validation`

**Tests Invalid Requests:**
1. Empty image URL: `{"url": ""}`
2. Missing URL field: `{"image_url": {}}`
3. Whitespace URL: `{"url": " "}`

**Validates:**
- Backend returns a 4xx error (validation failure)
- Error message indicates the problem
- No inference is wasted on invalid data

**Expected Output:**
```
✅ Correctly rejected: Empty image URL
   Error code: 400
   Error message: Image URL cannot be empty...
```

### Inference Test 3: Multiple Images
**Function:** `test_vision_inference_multiple_images`

**Creates:**
- Red and blue test images
- A single message with 2 images

**Validates:**
- Backend handles multiple images
- Model processes both images
- Response mentions both colors (if the model supports it)

**Note:** May skip if the model doesn't support multiple images per message.

### SFT Test 1: Full Training Job
**Function:** `test_cli_train_sft_vision_qwen2vl`

**Creates:**
- 3-example vision SFT dataset (JSONL)
- Each example has 1 image (base64 in a data URL)
- Minimal training config (1 epoch, LoRA)

**Submits:**
- SFT training job via CLI
- Model: Qwen2-VL-2B-Instruct
- Config includes `supports_vision = true`

**Validates:**
- Job created successfully
- Job ID returned
- Config accepted by backend

**Expected Output:**
```
✅ Vision SFT job created: job-abc123
   Model: Qwen2-VL-2B-Instruct
   Dataset: /tmp/.../vision_sft_test.jsonl
   Examples: 3 (with images)
```

**Runtime:** ~30-60 seconds (job submission only, not training)
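A config along these lines is what the test submits. This sketch is illustrative only — the section and field names are assumptions modeled on the example configs under `examples/qwen_vl/configs/`; the artifact configs there are the authoritative schema.

```toml
# Illustrative sketch; see examples/qwen_vl/configs/ for real field names.
[job]
model = "Qwen/Qwen2-VL-2B-Instruct"
data = "vision_sft_test.jsonl"

[training]
mode = "lora"
supports_vision = true

[hyperparameters]
n_epochs = 1
```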

### SFT Test 2: Dataset Validation
**Function:** `test_vision_sft_dataset_validation`

**Creates:**
- 4-example dataset (2 valid, 2 invalid)
- Invalid examples have empty/missing URLs

**Validates:**
- SDK validation correctly identifies valid examples
- Invalid examples are flagged with specific errors
- No false positives or negatives

**Expected Output:**
```
✅ Example 0: Valid
❌ Example 1: Invalid - Has 1 image_url entries but only 0 valid URLs
❌ Example 2: Invalid - Has 1 image_url entries but only 0 valid URLs
✅ Example 3: Valid

✅ Dataset validation working correctly
   Total examples: 4
   Valid: 2
   Invalid: 2
```

**Runtime:** ~1-2 seconds (pure validation, no network)

### SFT Test 3: Fast CI Test
**Function:** `test_cli_train_sft_vision_small_config`

**Uses:**
- Artifact config (`tests/artifacts/configs/sft.vision.small.toml`)
- Minimal settings for fast validation

**Validates:**
- Same as Test 1, but faster
- Config artifact is correct

**Runtime:** ~20-40 seconds

## Dataset Format

### Vision SFT Example
```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What color is this?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,iVBORw0KG..."
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "This image is red."
    }
  ],
  "metadata": {"example_id": 1}
}
```
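Producing a dataset in this format is plain JSONL — one example object per line. A minimal sketch (the helper name, prompt text, and output path are arbitrary choices, not part of the SDK):

```python
import json
import tempfile
from pathlib import Path

def make_example(image_b64: str, question: str, answer: str, example_id: int) -> dict:
    """Build one vision SFT example in the JSONL schema shown above."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            },
            {"role": "assistant", "content": answer},
        ],
        "metadata": {"example_id": example_id},
    }

examples = [
    make_example("iVBORw0KG...", "What color is this?", "This image is red.", i)
    for i in range(3)
]
path = Path(tempfile.mkdtemp()) / "vision_sft_test.jsonl"
path.write_text("\n".join(json.dumps(ex) for ex in examples) + "\n")
```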

### Supported Image Formats
- **Data URLs:** `data:image/png;base64,<base64-data>`
- **HTTP URLs:** `https://example.com/image.jpg`
- **Local paths:** `/path/to/image.png` (converted to a PIL Image)
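For tests it can be handy to build a tiny valid data URL without depending on PIL. The sketch below (a hypothetical helper, not part of the SDK) hand-assembles a 1x1 red PNG with only the standard library:

```python
import base64
import struct
import zlib

def _chunk(tag: bytes, data: bytes) -> bytes:
    # PNG chunk: 4-byte big-endian length, tag, data, CRC32 over tag+data.
    return struct.pack(">I", len(data)) + tag + data + struct.pack(
        ">I", zlib.crc32(tag + data)
    )

def red_pixel_data_url() -> str:
    """Return a data URL for a valid 1x1 red PNG, stdlib only."""
    # IHDR: width=1, height=1, bit depth=8, color type=2 (truecolor).
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    raw = b"\x00" + b"\xff\x00\x00"  # filter byte + one RGB pixel
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")

url = red_pixel_data_url()
```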

### Validation Rules
✅ **Valid:**
- Non-empty URL string
- Valid scheme (`http://`, `https://`, `data:image/`)
- Properly formatted base64 (if a data URL)

❌ **Invalid:**
- Empty string: `""`
- Whitespace only: `" "`
- Null value: `None` or `null`
- Missing URL field
- Non-string URL

## Integration with Other Tests

### Combined with RL Vision Tests
```bash
# All vision tests (inference + SFT + RL)
uv run pytest -m vision tests/integration/cli/ -v

# Specific pipeline
uv run pytest \
  tests/integration/cli/test_cli_inference_vision.py \
  tests/integration/cli/test_cli_train_sft_vision.py \
  tests/integration/cli/test_cli_train_rl_vision.py \
  -v -s
```

### Test Matrix

| Test Suite | Model | Data | Runtime | Purpose |
|------------|-------|------|---------|---------|
| Inference | Qwen2-VL-2B | Generated | ~20s | API validation |
| SFT | Qwen2-VL-2B | Generated | ~30s | Training job |
| RL | Qwen3-VL-4B | Task app | ~5-10min | Full pipeline |

## Troubleshooting

### Inference Test Fails
```bash
# Check backend connectivity
curl $BACKEND_BASE_URL/health

# Check API key
echo $SYNTH_API_KEY

# Verify the model is available
curl -H "Authorization: Bearer $SYNTH_API_KEY" \
  $BACKEND_BASE_URL/v1/models
```

### SFT Test Fails
```bash
# Check the dataset was created
cat /tmp/test_sft_vision/vision_sft_test.jsonl

# Validate the dataset manually
python -c "
from synth_ai.learning.sft.data import load_jsonl, validate_vision_example
examples = load_jsonl('path/to/dataset.jsonl', min_messages=1)
for ex in examples:
    is_valid, error = validate_vision_example(ex, require_images=True)
    print(f'Valid: {is_valid}, Error: {error}')
"
```

### PIL Not Available
```bash
# Install Pillow
uv pip install Pillow

# Or use conda
conda install pillow
```

### Image Too Large
```python
# Reduce the image size in the test
img = Image.new('RGB', (32, 32), color='red')  # 32x32 instead of 64x64
```

## CI Integration

### Pytest Marks
```python
@pytest.mark.slow         # Takes >5 seconds
@pytest.mark.vision       # Requires vision support
@pytest.mark.integration  # Full integration test
```
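Unregistered marks trigger `PytestUnknownMarkWarning` and fail under `--strict-markers`, so they should be declared wherever this repo configures pytest. A sketch for a `pyproject.toml`-based setup (placement is an assumption; the repo may use `pytest.ini` instead):

```toml
[tool.pytest.ini_options]
markers = [
    "slow: takes >5 seconds",
    "vision: requires vision support",
    "integration: full integration test",
]
```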

### Run in CI
```yaml
# .github/workflows/test.yml
- name: Run vision integration tests
  run: |
    pytest -m "vision and integration" \
      tests/integration/cli/test_cli_inference_vision.py \
      tests/integration/cli/test_cli_train_sft_vision.py \
      -v --tb=short
  env:
    SYNTH_API_KEY: ${{ secrets.SYNTH_API_KEY }}
    BACKEND_BASE_URL: ${{ secrets.BACKEND_URL }}
```

### Skip in Fast CI
```bash
# Skip slow tests for PR checks
pytest -m "not slow" tests/

# Include vision but skip slow tests
pytest -m "vision and not slow" tests/
```

## Performance Expectations

### Inference Tests
- **test_vision_inference_with_image:** 10-20s
- **test_vision_inference_validation:** 5-10s (3 requests)
- **test_vision_inference_multiple_images:** 15-25s

**Total:** ~30-55 seconds

### SFT Tests
- **test_vision_sft_dataset_validation:** 1-2s (local only)
- **test_cli_train_sft_vision_small_config:** 20-40s
- **test_cli_train_sft_vision_qwen2vl:** 30-60s

**Total:** ~50-100 seconds

### All Vision Tests (Inference + SFT + RL)
- **Total runtime:** ~6-12 minutes
- **Network calls:** ~10-15
- **GPU time:** 0 (job submission only, not actual training)

## Related Documentation

- **RL Vision Tests:** `RL_VISION_TESTING.md`
- **Image Validation:** `IMAGE_VALIDATION_COMPLETE.md`
- **VLM Pipeline:** `VLM_PIPELINE_COMPLETE.md`
- **Quick Start:** `QUICKSTART_RL_VISION.md`

## Summary

✅ **Complete test coverage for the vision ML pipeline:**
- Inference API with multimodal messages
- Image validation before inference
- SFT dataset creation and validation
- SFT training job submission
- Integration with existing RL vision tests

**Test Count:**
- Inference: 3 tests
- SFT: 3 tests
- RL: 3 tests (from previous work)
- **Total: 9 vision integration tests**

**Coverage:**
- ✅ End-to-end inference
- ✅ Request validation
- ✅ Dataset creation
- ✅ Dataset validation
- ✅ SFT job submission
- ✅ RL job submission
- ✅ Task app vision support

---

**Status:** Production-ready! Run `pytest -m vision -v` to validate the full vision ML pipeline from inference to RL training! 🎉