synth-ai 0.2.17__py3-none-any.whl → 0.2.19__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of synth-ai might be problematic.
- examples/baseline/banking77_baseline.py +204 -0
- examples/baseline/crafter_baseline.py +407 -0
- examples/baseline/pokemon_red_baseline.py +326 -0
- examples/baseline/simple_baseline.py +56 -0
- examples/baseline/warming_up_to_rl_baseline.py +239 -0
- examples/blog_posts/gepa/README.md +355 -0
- examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
- examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
- examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
- examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
- examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
- examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
- examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
- examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
- examples/blog_posts/gepa/gepa_baseline.py +204 -0
- examples/blog_posts/gepa/query_prompts_example.py +97 -0
- examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
- examples/blog_posts/gepa/task_apps.py +105 -0
- examples/blog_posts/gepa/test_gepa_local.sh +67 -0
- examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
- examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
- examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +12 -10
- examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +1 -0
- examples/blog_posts/pokemon_vl/extract_images.py +239 -0
- examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
- examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
- examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
- examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
- examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
- examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
- examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
- examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
- examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
- examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +1 -1
- examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
- examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +60 -10
- examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +1 -1
- examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
- examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
- examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
- examples/multi_step/configs/crafter_rl_outcome.toml +1 -0
- examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +1 -0
- examples/multi_step/configs/crafter_rl_stepwise_simple.toml +1 -0
- examples/rl/configs/rl_from_base_qwen17.toml +1 -0
- examples/swe/task_app/hosted/inference/openai_client.py +0 -34
- examples/swe/task_app/hosted/policy_routes.py +17 -0
- examples/swe/task_app/hosted/rollout.py +4 -2
- examples/task_apps/banking77/__init__.py +6 -0
- examples/task_apps/banking77/banking77_task_app.py +841 -0
- examples/task_apps/banking77/deploy_wrapper.py +46 -0
- examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
- examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
- examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
- examples/task_apps/crafter/task_app/grpo_crafter.py +24 -2
- examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
- examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +355 -58
- examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +68 -7
- examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +78 -21
- examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
- examples/task_apps/gepa_benchmarks/__init__.py +7 -0
- examples/task_apps/gepa_benchmarks/common.py +260 -0
- examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
- examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
- examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
- examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
- examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
- examples/task_apps/pokemon_red/task_app.py +254 -36
- examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +1 -0
- examples/warming_up_to_rl/task_app/grpo_crafter.py +53 -4
- examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
- examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +152 -41
- examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +31 -1
- examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
- examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
- examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +1 -0
- synth_ai/api/train/builders.py +90 -1
- synth_ai/api/train/cli.py +396 -21
- synth_ai/api/train/config_finder.py +13 -2
- synth_ai/api/train/configs/__init__.py +15 -1
- synth_ai/api/train/configs/prompt_learning.py +442 -0
- synth_ai/api/train/configs/rl.py +29 -0
- synth_ai/api/train/task_app.py +1 -1
- synth_ai/api/train/validators.py +277 -0
- synth_ai/baseline/__init__.py +25 -0
- synth_ai/baseline/config.py +209 -0
- synth_ai/baseline/discovery.py +214 -0
- synth_ai/baseline/execution.py +146 -0
- synth_ai/cli/__init__.py +85 -17
- synth_ai/cli/__main__.py +0 -0
- synth_ai/cli/claude.py +70 -0
- synth_ai/cli/codex.py +84 -0
- synth_ai/cli/commands/__init__.py +1 -0
- synth_ai/cli/commands/baseline/__init__.py +12 -0
- synth_ai/cli/commands/baseline/core.py +637 -0
- synth_ai/cli/commands/baseline/list.py +93 -0
- synth_ai/cli/commands/eval/core.py +13 -10
- synth_ai/cli/commands/filter/core.py +53 -17
- synth_ai/cli/commands/help/core.py +0 -1
- synth_ai/cli/commands/smoke/__init__.py +7 -0
- synth_ai/cli/commands/smoke/core.py +1436 -0
- synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
- synth_ai/cli/commands/status/subcommands/usage.py +203 -0
- synth_ai/cli/commands/train/judge_schemas.py +1 -0
- synth_ai/cli/commands/train/judge_validation.py +1 -0
- synth_ai/cli/commands/train/validation.py +0 -57
- synth_ai/cli/demo.py +35 -3
- synth_ai/cli/deploy/__init__.py +40 -25
- synth_ai/cli/deploy.py +162 -0
- synth_ai/cli/legacy_root_backup.py +14 -8
- synth_ai/cli/opencode.py +107 -0
- synth_ai/cli/root.py +9 -5
- synth_ai/cli/task_app_deploy.py +1 -1
- synth_ai/cli/task_apps.py +53 -53
- synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
- synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
- synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
- synth_ai/judge_schemas.py +1 -0
- synth_ai/learning/__init__.py +10 -0
- synth_ai/learning/prompt_learning_client.py +276 -0
- synth_ai/learning/prompt_learning_types.py +184 -0
- synth_ai/pricing/__init__.py +2 -0
- synth_ai/pricing/model_pricing.py +57 -0
- synth_ai/streaming/handlers.py +53 -4
- synth_ai/streaming/streamer.py +19 -0
- synth_ai/task/apps/__init__.py +1 -0
- synth_ai/task/config.py +2 -0
- synth_ai/task/tracing_utils.py +25 -25
- synth_ai/task/validators.py +44 -8
- synth_ai/task_app_cfgs.py +21 -0
- synth_ai/tracing_v3/config.py +162 -19
- synth_ai/tracing_v3/constants.py +1 -1
- synth_ai/tracing_v3/db_config.py +24 -38
- synth_ai/tracing_v3/storage/config.py +47 -13
- synth_ai/tracing_v3/storage/factory.py +3 -3
- synth_ai/tracing_v3/turso/daemon.py +113 -11
- synth_ai/tracing_v3/turso/native_manager.py +92 -16
- synth_ai/types.py +8 -0
- synth_ai/urls.py +11 -0
- synth_ai/utils/__init__.py +30 -1
- synth_ai/utils/agents.py +74 -0
- synth_ai/utils/bin.py +39 -0
- synth_ai/utils/cli.py +149 -5
- synth_ai/utils/env.py +17 -17
- synth_ai/utils/json.py +72 -0
- synth_ai/utils/modal.py +283 -1
- synth_ai/utils/paths.py +48 -0
- synth_ai/utils/uvicorn.py +113 -0
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/METADATA +102 -4
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/RECORD +162 -88
- synth_ai/cli/commands/deploy/__init__.py +0 -23
- synth_ai/cli/commands/deploy/core.py +0 -614
- synth_ai/cli/commands/deploy/errors.py +0 -72
- synth_ai/cli/commands/deploy/validation.py +0 -11
- synth_ai/cli/deploy/core.py +0 -5
- synth_ai/cli/deploy/errors.py +0 -23
- synth_ai/cli/deploy/validation.py +0 -5
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
- {synth_ai-0.2.17.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0
examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md
@@ -0,0 +1,195 @@

# Smoke Test Architecture

This document explains how the smoke test works internally, for future maintenance and debugging.

## Component Overview

```
┌─────────────────────────────────────────────────────────────────┐
│ synth-ai smoke command                                          │
│ (synth_ai/cli/commands/smoke/core.py)                           │
└────────────┬────────────────────────────────────────────────────┘
             │
             ├─► Auto-start sqld (optional)
             │     ├─ Kill existing process on ports 8080/8081
             │     ├─ Start: sqld --db-path ... --hrana-listen-addr ... --http-listen-addr ...
             │     └─ Health check: GET http://127.0.0.1:8081/health
             │
             ├─► Auto-start task app (optional)
             │     ├─ Kill existing process on port 8765
             │     ├─ Start: nohup uvx synth-ai task-app serve ... (from synth-ai root)
             │     ├─ Health check: GET http://localhost:8765/health (accepts 200 or 400)
             │     └─ Output: nohup_task_app.out
             │
             ├─► Start mock RL trainer (if use_mock=true)
             │     ├─ MockRLTrainer(port=0, backend="openai")
             │     ├─ Forwards requests to OpenAI API
             │     └─ Logs: [mock-rl] ← request / → response
             │
             └─► Execute rollout
                   ├─ POST /rollout to task app
                   ├─ Capture response with v3 trace
                   └─ Extract and display tool calls
```

## Key Implementation Details

### 1. Tool Call Extraction

**Location:** `synth_ai/cli/commands/smoke/core.py` lines ~946-1005

**How it works:**
1. Request rollout with `return_trace=True` and `trace_format="structured"`
2. Response includes `trace.event_history[]` - a list of policy and environment events
3. Policy events have `call_records[]` containing LLM call metadata
4. Each `call_record` has `output_tool_calls[]` with tool call details
5. Extract `name` and `arguments_json` from each tool call
6. Display formatted tool calls to the user (a traversal sketch follows the data structure below)

**Data structure:**
```python
response.trace = {
    "event_history": [
        {
            "call_records": [  # Present in policy events
                {
                    "output_tool_calls": [
                        {
                            "name": "interact_many",
                            "arguments_json": '{"actions":["move_up","move_up"]}',
                            "call_id": "call_xyz",
                            "index": 0
                        }
                    ],
                    "model_name": "gpt-4o-mini",
                    "provider": "openai",
                    ...
                }
            ],
            "metadata": {...},
            ...
        },
        {
            # Environment step event (no call_records)
            "reward": 1.0,
            "terminated": False,
            ...
        },
        ...
    ],
    "session_id": "...",
    "markov_blanket_message_history": [...],
    ...
}
```
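
As a reading aid, here is a minimal sketch of that traversal. It assumes `response` is the parsed JSON body of the `/rollout` response with the schema shown above; it is not the actual `core.py` implementation:

```python
import json

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Walk trace.event_history and collect (tool_name, parsed_args) pairs."""
    calls: list[tuple[str, dict]] = []
    for event in response.get("trace", {}).get("event_history", []):
        # Environment step events carry no call_records; `or []` skips them.
        for record in event.get("call_records") or []:
            for tc in record.get("output_tool_calls") or []:
                name = tc.get("name", "<unknown>")
                args = json.loads(tc.get("arguments_json") or "{}")
                calls.append((name, args))
    return calls

# e.g. [("interact_many", {"actions": ["move_up", "move_up"]})]
```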

### 2. Background Service Management

**Task App Startup:**
- Must run from the synth-ai root for task app discovery
- Uses `nohup` to detach the process
- Redirects output to `nohup_task_app.out`
- Polls the `/health` endpoint (accepts 200 or 400 status)
- Timeout: 120 seconds, with progress updates every 5 seconds (see the polling sketch below)
- Propagates `SYNTH_QUIET=1` to suppress diagnostic messages
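
The readiness check amounts to a timed polling loop. A minimal sketch under the parameters above, assuming `httpx` is available (illustrative, not the actual `core.py` code):

```python
import time
import httpx

def wait_for_health(url: str = "http://localhost:8765/health",
                    timeout: float = 120.0) -> bool:
    """Poll until the task app answers; a 400 also counts as 'up'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if httpx.get(url, timeout=2.0).status_code in (200, 400):
                return True
        except httpx.HTTPError:
            pass  # not listening yet; keep polling
        time.sleep(5.0)  # matches the 5-second progress cadence
    return False
```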

**sqld Startup:**
- Starts with Hrana WebSocket (8080) and HTTP (8081) ports
- Polls the `/health` endpoint for readiness
- Timeout: 30 seconds

**Port Cleanup:**
- Uses `lsof -ti :PORT` to find PIDs
- Kills processes with `kill -9 PID`
- Waits 2 seconds for port release (shell equivalent below)
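
For reference, the per-port sequence corresponds roughly to this shell sketch (the real implementation is Python, not a shell script):

```bash
lsof -ti :8765 | xargs kill -9   # find PIDs bound to the port and kill them
sleep 2                          # give the OS time to release the port
```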

### 3. Mock RL Trainer

The mock trainer (`MockRLTrainer`) acts as a proxy:
- `backend="synthetic"`: Generates fake tool calls deterministically (illustrated below)
- `backend="openai"`: Forwards to the real OpenAI API
- Logs all requests/responses with the `[mock-rl]` prefix
- Auto-assigns a port if `port=0`
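
To make "fake tool calls" concrete: a synthetic backend only needs to return an OpenAI-style chat completion whose message carries a deterministic tool call. The payload builder below is illustrative, not the actual `MockRLTrainer` internals:

```python
import json

def synthetic_completion(model: str = "gpt-5-nano") -> dict:
    """Deterministic OpenAI-style chat completion with one tool call."""
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [{
            "index": 0,
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": "call_mock_0",
                    "type": "function",
                    "function": {
                        "name": "interact_many",
                        "arguments": json.dumps({"actions": ["move_up"]}),
                    },
                }],
            },
        }],
    }
```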

### 4. Diagnostic Message Suppression

**Permanently disabled (commented out):**
- `synth_ai/tracing_v3/config.py`: `[TRACING_V3_CONFIG_LOADED]` message
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py`: All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py`: All `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py`: All `[PATCH]` messages

**Reason:** These messages add noise to smoke test output. They remain in the code as comments for documentation.

## Troubleshooting Guide

### No tool calls displayed

**Symptom:** Output shows `⚠ No tool calls found in trace`

**Causes:**
1. `return_trace=false` in config - **FIX:** Set `return_trace = true`
2. Trace format mismatch - check the `response.trace.event_history` structure
3. No LLM calls made - check for policy errors in the task app logs

**Debug:**
```bash
# Check task app logs
cat /path/to/synth-ai/nohup_task_app.out
```

To verify the trace structure, add debug output in `core.py` around line 978:
```python
click.echo(f"DEBUG: trace keys: {list(tr.keys())}")
click.echo(f"DEBUG: event_history length: {len(event_history)}")
```

### Task app exits immediately

**Symptom:** `0 steps` in rollout, task app process not running

**Causes:**
1. Wrong task app name - **FIX:** Use `synth-ai task-app list` to find the correct name
2. Missing .env file - **FIX:** Ensure `task_app_env_file` points to a valid .env
3. Wrong working directory - **FIX:** The task app must be started from the synth-ai root

**Debug:**
```bash
# Manual test
cd /path/to/synth-ai
uvx synth-ai task-app serve grpo-crafter --port 8765 --env-file /path/to/.env --force
```

### Port conflicts

**Symptom:** `Address already in use` errors

**Fix:** The smoke command auto-kills processes on ports 8080, 8081, and 8765. If manual cleanup is needed:
```bash
lsof -ti :8080 | xargs kill -9
lsof -ti :8081 | xargs kill -9
lsof -ti :8765 | xargs kill -9
```

## Future Improvements

Potential enhancements for future agents:

1. **Streaming tool call display**: Show tool calls as they happen, not just at the end
2. **Tool call validation**: Verify tool calls match the expected format for the environment
3. **Performance metrics**: Track inference latency per tool call
4. **Cost tracking**: Display OpenAI API costs for the smoke test
5. **Parallel rollouts**: Support `--parallel N` to test concurrent execution
6. **Video/image capture**: For vision-based tasks, save observations
7. **Interactive mode**: Allow stepping through a rollout one action at a time

## Related Files

- `synth_ai/cli/commands/smoke/core.py` - Main smoke command implementation
- `synth_ai/api/train/configs/rl.py` - `SmokeConfig` Pydantic model
- `synth_ai/api/train/builders.py` - Removes the `[smoke]` section before sending the config to the trainer
- `synth_ai/task/contracts.py` - `RolloutResponse` with the trace field
- `examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md` - User-facing documentation
- `monorepo/docs/cli/smoke.mdx` - Mintlify documentation

examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md
@@ -0,0 +1,127 @@

# Final Inference Test Results

**Date**: Oct 31, 2025
**Endpoint**: `https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions`

## Summary

| Model Type | Status | Result |
|------------|--------|--------|
| Base Model (Qwen/Qwen3-4B) | ✅ WORKS | Inference successful |
| PEFT/SFT (Qwen3-0.6B) | ✅ WORKS | Inference successful |
| RL (Qwen3-4B) | ❌ **BROKEN** | Modal function crashes |

## Detailed Results

### ✅ Test 1: Base Model (No Fine-Tuning)

**Model**: `Qwen/Qwen3-4B`

**Result**: **SUCCESS** ✅
- **Status**: 200 OK
- **Tokens**: 31 prompt + 100 completion = 131 total
- **Response**: Generated successfully

**Notes**:
- First attempt returned a 303 redirect (cold start)
- The retry succeeded immediately (a retry sketch follows)
- This confirms the endpoint and auth work correctly
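
A minimal retry wrapper for that cold-start case, assuming `httpx` and treating 303 as "warm up and try again" (illustrative; not part of the test harness):

```python
import time
import httpx

def post_with_cold_start_retry(url: str, payload: dict, headers: dict,
                               retries: int = 3, backoff: float = 5.0) -> httpx.Response:
    """Retry on 303, which this endpoint returns while a cold replica boots."""
    with httpx.Client(timeout=300.0, follow_redirects=False) as client:
        resp = client.post(url, json=payload, headers=headers)
        for attempt in range(retries):
            if resp.status_code != 303:
                break
            time.sleep(backoff * (attempt + 1))  # give the container time to start
            resp = client.post(url, json=payload, headers=headers)
        return resp
```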

---

### ✅ Test 2: PEFT/SFT Model

**Model**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Result**: **SUCCESS** ✅
- **Status**: 200 OK (consistent across retries)
- **Tokens**: 31 prompt + 100 completion = 131 total
- **Response**: "Hello, I am working!" (with thinking tokens)

**Notes**:
- Works reliably
- No cold start issues
- This is the expected behavior for all models

---

### ❌ Test 3: RL Model

**Model**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`

**Result**: **FAILURE** ❌ - Multiple error modes

#### First Attempt:
```
Status: 400 Bad Request
Error: "Device string must not be empty"
```

#### Retry:
```
Status: 500 Internal Server Error
Error: "modal-http: internal error: function was terminated by signal"
```

**This is a Modal function crash** - the inference function terminated unexpectedly.

#### Cold Start (from Modal logs):
```
RuntimeError: Cannot find any model weights with
'/models/rl/Qwen/Qwen3-4B/job_19a38041c38f96e638c/checkpoint-fixed'
```

**Root Cause**: The RL checkpoint contains LoRA adapter files (`adapter_config.json`, `adapter_model.safetensors`), but vLLM expects full merged model weights.

---

## Conclusion

### What Works ✅
- **Base models**: Standard HuggingFace models load and run inference correctly
- **PEFT/SFT models**: Fine-tuned models with merged weights work perfectly

### What's Broken ❌
- **RL models**: Crash during model loading because:
  1. RL checkpoints are stored as LoRA adapters
  2. The vLLM weight loader expects full model weights
  3. The missing merge step causes vLLM to crash
  4. The Modal function terminates with a signal (crash)

### Impact
- **HIGH SEVERITY**: No RL-trained model can be used for inference
- Users can train RL models but cannot deploy them
- This blocks the core RL training → inference workflow

### Next Steps
See `monorepo/RL_INFERENCE_BUG.md` for:
- Detailed root cause analysis
- Reproduction script
- Suggested fix (merge LoRA adapters before vLLM loading; sketched below)
- Code locations to modify
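
For orientation, the suggested fix amounts to a standard PEFT merge before handing the checkpoint directory to vLLM. A sketch assuming a typical adapter checkpoint layout (paths are illustrative; this mirrors the usual `peft` workflow, not the exact backend fix):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "/models/rl/.../checkpoint-epoch-1")  # adapter dir
merged = model.merge_and_unload()  # bake the LoRA deltas into the base weights

merged.save_pretrained("/models/rl/.../checkpoint-epoch-1-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-4B").save_pretrained(
    "/models/rl/.../checkpoint-epoch-1-merged"
)
# vLLM can then load the merged directory as ordinary full weights.
```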

---

## Developer Experience Issues Identified

### Issue #1: Confusing Error Messages
- **400 "Device string must not be empty"** - Not helpful, doesn't indicate the RL adapter issue
- **500 "function was terminated by signal"** - Generic crash, no context
- **Should be**: "RL checkpoint contains adapter files. Merge required for vLLM loading."

### Issue #2: Inconsistent Behavior
- Sometimes returns a 303 redirect
- Sometimes returns a 400
- Sometimes crashes with a 500
- **Should be**: A consistent error message explaining the issue

### Issue #3: Not Obvious How to Test Models
- Had to try 3 different endpoint URLs before finding the right one
- No documentation on model ID formats
- **Should be**: A `synth-ai inference --model "rl:..." --message "test"` CLI command

---

**Status**: Bug documented and reproduction available.
**See**: `monorepo/RL_INFERENCE_BUG.md` for full details.
examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md
@@ -0,0 +1,132 @@

# ✅ Inference Success Report

**Date**: Oct 31, 2025
**Models Tested**: Latest SFT and RL models from training

## Working Solution

### Correct Endpoint
```
https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
```

### SFT/PEFT Models: ✅ WORKING

**Model ID**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Test Code**:
```python
import os

import httpx

SYNTH_API_KEY = os.getenv("SYNTH_API_KEY")
url = "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Hello, I am working!' and nothing else."}
    ],
    "temperature": 0.2,
    "max_tokens": 100,
}

with httpx.Client(timeout=300.0) as client:
    response = client.post(url, json=payload, headers=headers)
    print(response.json())
```

**Result**:
- ✅ Status: 200 OK
- ✅ Response generated successfully
- ✅ Token usage tracked: 31 prompt + 72 completion = 103 total
- ✅ Output: "Hello, I am working!" (with thinking tokens as expected)

### RL Models: ⚠️ NEEDS PROMOTION

**Model ID**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`

**Status**: 303 Redirect (empty response)

**Root Cause**:
From inspection of the monorepo backend code, RL checkpoints require a "promotion" step to be loaded onto Modal before they can be used for inference. The direct Modal endpoint returns a redirect for unpromoted RL models.

**Solution Options**:

#### Option 1: Use Backend Proxy (Recommended)
The backend automatically handles RL promotion:
```python
# Use the backend proxy instead of the direct Modal endpoint
url = "https://your-backend.example.com/api/chat/completions"
# The backend will auto-promote and route to vLLM
```

#### Option 2: Manual Promotion (Advanced)
1. Call the promotion endpoint first
2. Wait for the model to load onto Modal
3. Then call the inference endpoint

## Key Learnings

### What We Got Wrong Initially:
1. ❌ Wrong endpoint path: Used `/v1/chat/completions` → should be `/chat/completions`
2. ❌ Wrong base URL: Used the render.com URL → should be the Modal URL
3. ❌ Assumed RL = PEFT workflow → RL needs a promotion step

### What We Got Right:
1. ✅ Model ID format from `synth-ai status models list`
2. ✅ Using SYNTH_API_KEY for auth
3. ✅ Bearer token authorization header

## Recommendations for Library Improvement

### 1. Add Simple CLI Command
```bash
synth-ai inference \
  --model "peft:Qwen/Qwen3-0.6B:job_xxx" \
  --message "Hello" \
  --max-tokens 100
```

### 2. Document Endpoint in Model Status
```bash
$ synth-ai status models get "peft:..."
Model: peft:Qwen/Qwen3-0.6B:job_xxx
Status: succeeded
Inference Endpoint: https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
Ready: ✅ Yes (use directly)
```

### 3. Add Python SDK Example
```python
import os

from synth_ai import InferenceClient

client = InferenceClient(api_key=os.getenv("SYNTH_API_KEY"))
response = client.chat.completions.create(
    model="peft:Qwen/Qwen3-0.6B:job_xxx",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

### 4. Clear Error Messages
- 303 → "RL model needs promotion. Use backend proxy or call /promote endpoint first."
- 404 → "Model not found. Check model ID with: synth-ai status models list"

## Success Criteria Met

- ✅ Can get model ID from the CLI
- ✅ Know the correct endpoint
- ✅ Know the correct auth (SYNTH_API_KEY)
- ✅ Can send a test message
- ✅ Get a response back
- ⚠️ RL models need an extra step (documented)

**Status**: PEFT/SFT inference is fully working! RL needs the backend proxy.
examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md
@@ -0,0 +1,164 @@

# Smoke Testing Your Task App

This guide shows how to quickly test your task app using the `synth-ai smoke` command with auto-start features.

## Quick Start

The easiest way to smoke test is using the `[smoke]` section in your RL config:

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**That's it!** The smoke command will:
1. ✅ Auto-start the sqld server for tracing (if `sqld_auto_start = true`)
2. ✅ Auto-start your task app on port 8765 (if `task_app_name` is set)
3. ✅ Run 10 rollout steps with `gpt-5-nano` using synthetic mocking
4. ✅ Automatically stop all background services when done

**Expected output:**
```
[smoke] sqld ready
[smoke] Task app ready at http://localhost:8765 (status=400)
[mock-rl] server ready http://127.0.0.1:51798 backend=synthetic
>> POST /rollout run_id=smoke-... env=crafter policy=crafter-react
[mock-rl] ← request backend=synthetic model=gpt-5-nano messages=2
[mock-rl] → response tool_calls=1 backend=synthetic
rollout[0:0] episodes=1 steps=10 mean_return=1.0000
✓ Smoke rollouts complete
successes=1/1 total_steps=10 v3_traces=1/1 nonzero_returns=1/1
[smoke] Background services stopped
```

## Configuration

Add a `[smoke]` section to your RL config:

```toml
[smoke]
# Auto-start task app
task_app_name = "grpo-crafter"
task_app_port = 8765
task_app_env_file = ".env"
task_app_force = true

# Auto-start sqld
sqld_auto_start = true
sqld_db_path = "./traces/local.db"
sqld_hrana_port = 8080
sqld_http_port = 8081

# Test parameters
max_steps = 10
policy = "gpt-5-nano"
mock_backend = "synthetic"  # or "openai" (requires valid OpenAI API key)
return_trace = true
```

## Testing Methods

### 1. Full Auto (Recommended)
Everything auto-starts from the config:
```bash
uv run synth-ai smoke --config configs/smoke_test.toml
```

### 2. Manual Task App + Auto sqld
Start the task app manually, auto-start sqld:
```bash
# Config with sqld_auto_start=true but no task_app_name
uv run synth-ai smoke --config configs/my_config.toml --url http://localhost:8765
```

### 3. Override Config Settings
Override any config value via the CLI:
```bash
uv run synth-ai smoke --config configs/smoke_test.toml --max-steps 5
```

### 4. No Config (Manual Everything)
```bash
# Start services manually in separate terminals:
# Terminal 1: sqld --db-path ./traces/local.db --hrana-listen-addr 127.0.0.1:8080 --http-listen-addr 127.0.0.1:8081
# Terminal 2: uv run synth-ai task-app serve grpo-crafter --port 8765 --env-file .env --force

# Terminal 3: Run smoke test
uv run synth-ai smoke --url http://localhost:8765 \
  --env-name crafter \
  --policy-name crafter-react \
  --max-steps 10 \
  --policy mock \
  --mock-backend openai
```

## Prerequisites

### Install sqld (for tracing)
```bash
brew install sqld
# or
curl -fsSL https://get.turso.com/sqld | bash
```

### Verify Installation
```bash
which sqld
# Should output: /opt/homebrew/bin/sqld or similar
```

## Common Issues

### sqld not found
If you see "sqld not found in PATH":
```bash
brew install sqld
```

### Port already in use
Use `task_app_force = true` in the config, or:
```bash
# Kill processes on ports 8080, 8081, 8765
lsof -ti:8080,8081,8765 | xargs kill -9
```

### Task app not starting
Check the error output - you may need:
- A valid `.env` file with the required keys
- The correct task app name registered in your codebase

## Example Output

```
[smoke] Loaded configuration from configs/smoke_test.toml
[smoke] Config keys: task_app_name, task_app_port, sqld_auto_start, max_steps, policy
[smoke] Starting sqld server...
[smoke] DB path: /Users/you/project/traces/local.db
[smoke] Hrana port: 8080, HTTP port: 8081
[smoke] sqld ready
[smoke] Starting task app 'grpo-crafter' on port 8765...
[smoke] Task app ready at http://localhost:8765
[smoke] Task app started, will use URL: http://localhost:8765
[mock-rl] server ready http://127.0.0.1:52134 backend=openai
>> POST /rollout run_id=smoke-abc123...
rollout[0:0] episodes=1 steps=20 mean_return=1.2500
✓ Smoke rollouts complete
successes=1/1 total_steps=20 v3_traces=1/1 nonzero_returns=1/1
[smoke] Stopping sqld...
[smoke] Stopping task_app...
[smoke] Background services stopped
```

## Next Steps

Once smoke tests pass:
1. Train your model: `uv run synth-ai train --type rl --config configs/your_config.toml`
2. Check traces: Look in the `./traces/` directory
3. Monitor training: Use the Synth dashboard

## Full Config Reference

See [`configs/smoke_test.toml`](configs/smoke_test.toml) for a complete example.

See the [CLI Smoke Documentation](https://docs.usesynth.ai/cli/smoke) for all options.