synth-ai 0.2.16__py3-none-any.whl → 0.2.19__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of synth-ai might be problematic.

Files changed (299)
  1. examples/analyze_semantic_words.sh +2 -2
  2. examples/baseline/banking77_baseline.py +204 -0
  3. examples/baseline/crafter_baseline.py +407 -0
  4. examples/baseline/pokemon_red_baseline.py +326 -0
  5. examples/baseline/simple_baseline.py +56 -0
  6. examples/baseline/warming_up_to_rl_baseline.py +239 -0
  7. examples/blog_posts/gepa/README.md +355 -0
  8. examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
  9. examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
  10. examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
  11. examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
  12. examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
  13. examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
  14. examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
  15. examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
  16. examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
  17. examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
  18. examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
  19. examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
  20. examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
  21. examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
  22. examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
  23. examples/blog_posts/gepa/gepa_baseline.py +204 -0
  24. examples/blog_posts/gepa/query_prompts_example.py +97 -0
  25. examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
  26. examples/blog_posts/gepa/task_apps.py +105 -0
  27. examples/blog_posts/gepa/test_gepa_local.sh +67 -0
  28. examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
  29. examples/blog_posts/pokemon_vl/README.md +98 -0
  30. examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
  31. examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
  32. examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
  33. examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
  34. examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
  35. examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
  36. examples/blog_posts/pokemon_vl/extract_images.py +239 -0
  37. examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
  38. examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
  39. examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
  40. examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
  41. examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
  42. examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
  43. examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
  44. examples/blog_posts/warming_up_to_rl/README.md +158 -0
  45. examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
  46. examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
  47. examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
  48. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
  49. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
  50. examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
  51. examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
  52. examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
  53. examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
  54. examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
  55. examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
  56. examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
  57. examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
  58. examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
  59. examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
  60. examples/multi_step/configs/crafter_rl_outcome.toml +2 -1
  61. examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +65 -107
  62. examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +2 -1
  63. examples/multi_step/configs/crafter_rl_stepwise_simple.toml +2 -1
  64. examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
  65. examples/multi_step/configs/verilog_rl_lora.toml +80 -123
  66. examples/qwen_coder/configs/coder_lora_30b.toml +1 -3
  67. examples/qwen_coder/configs/coder_lora_4b.toml +4 -1
  68. examples/qwen_coder/configs/coder_lora_small.toml +1 -3
  69. examples/qwen_vl/README.md +10 -12
  70. examples/qwen_vl/SETUP_COMPLETE.md +7 -8
  71. examples/qwen_vl/VISION_TESTS_COMPLETE.md +2 -3
  72. examples/qwen_vl/collect_data_via_cli.md +76 -84
  73. examples/qwen_vl/collect_vision_traces.py +4 -4
  74. examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +40 -57
  75. examples/qwen_vl/configs/crafter_vlm_sft_example.toml +1 -2
  76. examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +20 -37
  77. examples/qwen_vl/configs/eval_gpt5nano_vision.toml +21 -40
  78. examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
  79. examples/qwen_vl/configs/{filter_qwen2vl_sft.toml → filter_qwen3vl_sft.toml} +4 -5
  80. examples/qwen_vl/configs/filter_vision_sft.toml +2 -3
  81. examples/qwen_vl/crafter_qwen_vl_agent.py +5 -5
  82. examples/qwen_vl/run_vision_comparison.sh +6 -7
  83. examples/rl/README.md +5 -5
  84. examples/rl/configs/rl_from_base_qwen.toml +26 -1
  85. examples/rl/configs/rl_from_base_qwen17.toml +6 -2
  86. examples/rl/task_app/README.md +1 -2
  87. examples/rl/task_app/math_single_step.py +2 -2
  88. examples/run_crafter_demo.sh +2 -2
  89. examples/sft/README.md +1 -1
  90. examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -1
  91. examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -1
  92. examples/swe/task_app/README.md +32 -2
  93. examples/swe/task_app/grpo_swe_mini.py +4 -0
  94. examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
  95. examples/swe/task_app/hosted/envs/mini_swe/environment.py +37 -10
  96. examples/swe/task_app/hosted/inference/openai_client.py +4 -38
  97. examples/swe/task_app/hosted/policy_routes.py +17 -0
  98. examples/swe/task_app/hosted/rollout.py +4 -2
  99. examples/swe/task_app/morph_backend.py +178 -0
  100. examples/task_apps/banking77/__init__.py +6 -0
  101. examples/task_apps/banking77/banking77_task_app.py +841 -0
  102. examples/task_apps/banking77/deploy_wrapper.py +46 -0
  103. examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
  104. examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
  105. examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
  106. examples/task_apps/crafter/task_app/README.md +1 -1
  107. examples/task_apps/crafter/task_app/grpo_crafter.py +90 -5
  108. examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
  109. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +4 -26
  110. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
  111. examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
  112. examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +372 -107
  113. examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +81 -12
  114. examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +82 -11
  115. examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
  116. examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
  117. examples/task_apps/gepa_benchmarks/__init__.py +7 -0
  118. examples/task_apps/gepa_benchmarks/common.py +260 -0
  119. examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
  120. examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
  121. examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
  122. examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
  123. examples/task_apps/math/README.md +1 -2
  124. examples/task_apps/pokemon_red/README.md +3 -4
  125. examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
  126. examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
  127. examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
  128. examples/task_apps/pokemon_red/task_app.py +288 -39
  129. examples/task_apps/sokoban/README.md +2 -3
  130. examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
  131. examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
  132. examples/vlm/configs/crafter_vlm_gpt4o.toml +4 -1
  133. examples/warming_up_to_rl/configs/crafter_fft.toml +4 -1
  134. examples/warming_up_to_rl/configs/crafter_fft_4b.toml +0 -2
  135. examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +3 -2
  136. examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
  137. examples/warming_up_to_rl/task_app/README.md +1 -1
  138. examples/warming_up_to_rl/task_app/grpo_crafter.py +185 -5
  139. examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +1 -1
  140. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +3 -27
  141. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -1
  142. examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
  143. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +156 -45
  144. examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +37 -4
  145. examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
  146. examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
  147. examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
  148. examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +6 -0
  149. synth_ai/api/train/builders.py +99 -4
  150. synth_ai/api/train/cli.py +516 -26
  151. synth_ai/api/train/config_finder.py +13 -2
  152. synth_ai/api/train/configs/__init__.py +23 -2
  153. synth_ai/api/train/configs/prompt_learning.py +442 -0
  154. synth_ai/api/train/configs/rl.py +61 -7
  155. synth_ai/api/train/configs/sft.py +6 -2
  156. synth_ai/api/train/configs/shared.py +59 -2
  157. synth_ai/api/train/task_app.py +1 -1
  158. synth_ai/api/train/validators.py +277 -0
  159. synth_ai/auth/credentials.py +119 -0
  160. synth_ai/baseline/__init__.py +25 -0
  161. synth_ai/baseline/config.py +209 -0
  162. synth_ai/baseline/discovery.py +214 -0
  163. synth_ai/baseline/execution.py +146 -0
  164. synth_ai/cli/__init__.py +94 -18
  165. synth_ai/cli/__main__.py +0 -0
  166. synth_ai/cli/claude.py +70 -0
  167. synth_ai/cli/codex.py +84 -0
  168. synth_ai/cli/commands/__init__.py +18 -0
  169. synth_ai/cli/commands/baseline/__init__.py +12 -0
  170. synth_ai/cli/commands/baseline/core.py +637 -0
  171. synth_ai/cli/commands/baseline/list.py +93 -0
  172. synth_ai/cli/commands/demo/__init__.py +6 -0
  173. synth_ai/cli/commands/demo/core.py +163 -0
  174. synth_ai/cli/commands/eval/__init__.py +19 -0
  175. synth_ai/cli/commands/eval/core.py +1112 -0
  176. synth_ai/cli/commands/eval/errors.py +81 -0
  177. synth_ai/cli/commands/eval/validation.py +133 -0
  178. synth_ai/cli/commands/filter/__init__.py +12 -0
  179. synth_ai/cli/commands/filter/core.py +424 -0
  180. synth_ai/cli/commands/filter/errors.py +55 -0
  181. synth_ai/cli/commands/filter/validation.py +77 -0
  182. synth_ai/cli/commands/help/__init__.py +177 -0
  183. synth_ai/cli/commands/help/core.py +72 -0
  184. synth_ai/cli/commands/smoke/__init__.py +7 -0
  185. synth_ai/cli/commands/smoke/core.py +1436 -0
  186. synth_ai/cli/commands/status/__init__.py +64 -0
  187. synth_ai/cli/commands/status/client.py +192 -0
  188. synth_ai/cli/commands/status/config.py +92 -0
  189. synth_ai/cli/commands/status/errors.py +20 -0
  190. synth_ai/cli/commands/status/formatters.py +164 -0
  191. synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
  192. synth_ai/cli/commands/status/subcommands/files.py +79 -0
  193. synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
  194. synth_ai/cli/commands/status/subcommands/models.py +79 -0
  195. synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
  196. synth_ai/cli/commands/status/subcommands/runs.py +81 -0
  197. synth_ai/cli/commands/status/subcommands/summary.py +47 -0
  198. synth_ai/cli/commands/status/subcommands/usage.py +203 -0
  199. synth_ai/cli/commands/status/utils.py +114 -0
  200. synth_ai/cli/commands/train/__init__.py +53 -0
  201. synth_ai/cli/commands/train/core.py +21 -0
  202. synth_ai/cli/commands/train/errors.py +117 -0
  203. synth_ai/cli/commands/train/judge_schemas.py +200 -0
  204. synth_ai/cli/commands/train/judge_validation.py +305 -0
  205. synth_ai/cli/commands/train/validation.py +386 -0
  206. synth_ai/cli/demo.py +30 -158
  207. synth_ai/cli/deploy/__init__.py +43 -0
  208. synth_ai/cli/deploy.py +162 -0
  209. synth_ai/cli/eval/__init__.py +36 -0
  210. synth_ai/cli/eval/core.py +5 -0
  211. synth_ai/cli/eval/errors.py +31 -0
  212. synth_ai/cli/eval/validation.py +5 -0
  213. synth_ai/cli/filter/__init__.py +28 -0
  214. synth_ai/cli/filter/core.py +5 -0
  215. synth_ai/cli/filter/errors.py +23 -0
  216. synth_ai/cli/filter/validation.py +5 -0
  217. synth_ai/cli/legacy_root_backup.py +14 -8
  218. synth_ai/cli/modal_serve/__init__.py +12 -0
  219. synth_ai/cli/modal_serve/core.py +14 -0
  220. synth_ai/cli/modal_serve/errors.py +8 -0
  221. synth_ai/cli/modal_serve/validation.py +11 -0
  222. synth_ai/cli/opencode.py +107 -0
  223. synth_ai/cli/root.py +9 -5
  224. synth_ai/cli/serve/__init__.py +12 -0
  225. synth_ai/cli/serve/core.py +14 -0
  226. synth_ai/cli/serve/errors.py +8 -0
  227. synth_ai/cli/serve/validation.py +11 -0
  228. synth_ai/cli/setup.py +20 -265
  229. synth_ai/cli/status.py +7 -126
  230. synth_ai/cli/task_app_deploy.py +1 -10
  231. synth_ai/cli/task_app_modal_serve.py +4 -9
  232. synth_ai/cli/task_app_serve.py +4 -11
  233. synth_ai/cli/task_apps.py +51 -1480
  234. synth_ai/cli/train/__init__.py +12 -0
  235. synth_ai/cli/train/core.py +21 -0
  236. synth_ai/cli/train/errors.py +8 -0
  237. synth_ai/cli/train/validation.py +24 -0
  238. synth_ai/cli/train.py +1 -14
  239. synth_ai/demos/crafter/grpo_crafter_task_app.py +1 -1
  240. synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
  241. synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
  242. synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
  243. synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
  244. synth_ai/environments/examples/red/engine.py +33 -12
  245. synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
  246. synth_ai/environments/examples/red/environment.py +26 -0
  247. synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
  248. synth_ai/http.py +12 -0
  249. synth_ai/judge_schemas.py +10 -10
  250. synth_ai/learning/__init__.py +10 -0
  251. synth_ai/learning/prompt_learning_client.py +276 -0
  252. synth_ai/learning/prompt_learning_types.py +184 -0
  253. synth_ai/learning/rl/client.py +3 -1
  254. synth_ai/pricing/__init__.py +2 -0
  255. synth_ai/pricing/model_pricing.py +57 -0
  256. synth_ai/streaming/__init__.py +29 -0
  257. synth_ai/streaming/config.py +94 -0
  258. synth_ai/streaming/handlers.py +518 -0
  259. synth_ai/streaming/streamer.py +320 -0
  260. synth_ai/streaming/types.py +95 -0
  261. synth_ai/task/apps/__init__.py +1 -0
  262. synth_ai/task/config.py +2 -0
  263. synth_ai/task/tracing_utils.py +25 -25
  264. synth_ai/task/validators.py +45 -9
  265. synth_ai/task_app_cfgs.py +21 -0
  266. synth_ai/tracing_v3/config.py +162 -19
  267. synth_ai/tracing_v3/constants.py +1 -1
  268. synth_ai/tracing_v3/db_config.py +24 -38
  269. synth_ai/tracing_v3/migration_helper.py +1 -2
  270. synth_ai/tracing_v3/storage/config.py +47 -13
  271. synth_ai/tracing_v3/storage/factory.py +3 -3
  272. synth_ai/tracing_v3/turso/daemon.py +113 -11
  273. synth_ai/tracing_v3/turso/native_manager.py +92 -16
  274. synth_ai/types.py +8 -0
  275. synth_ai/urls.py +11 -0
  276. synth_ai/utils/__init__.py +30 -1
  277. synth_ai/utils/agents.py +74 -0
  278. synth_ai/utils/bin.py +39 -0
  279. synth_ai/utils/cli.py +149 -5
  280. synth_ai/utils/env.py +40 -33
  281. synth_ai/utils/http.py +4 -1
  282. synth_ai/utils/json.py +72 -0
  283. synth_ai/utils/modal.py +285 -3
  284. synth_ai/utils/paths.py +48 -0
  285. synth_ai/utils/uvicorn.py +113 -0
  286. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/METADATA +109 -6
  287. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/RECORD +291 -142
  288. examples/qwen_vl/configs/eval_qwen2vl_vision.toml +0 -44
  289. synth_ai/cli/tui.py +0 -62
  290. synth_ai/tui/__init__.py +0 -5
  291. synth_ai/tui/__main__.py +0 -13
  292. synth_ai/tui/cli/__init__.py +0 -1
  293. synth_ai/tui/cli/query_experiments.py +0 -164
  294. synth_ai/tui/cli/query_experiments_v3.py +0 -164
  295. synth_ai/tui/dashboard.py +0 -911
  296. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
  297. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
  298. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
  299. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,212 @@
+ #!/usr/bin/env python3
+ """Run pokemon_vl eval with Qwen3-VL and extract images from trajectory response.
+
+ This script runs a qwen eval and extracts images directly from the trajectory steps
+ in the rollout response, similar to run_eval_extract_images.py but for Qwen models.
+ """
+
+ import argparse
+ import asyncio
+ import base64
+ import json
+ import os
+ from pathlib import Path
+
+ import httpx
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+
+ async def run_qwen_eval_and_extract_images(
+     task_app_url: str,
+     output_dir: Path,
+     seed: int = 10,
+     max_turns: int = 10,
+     model: str = "Qwen/Qwen3-VL-30B-A3B-Thinking",
+ ):
+     """Run qwen eval and extract images from trajectory."""
+     output_dir.mkdir(parents=True, exist_ok=True)
+
+     async with httpx.AsyncClient(timeout=600.0) as client:  # Longer timeout for qwen
+         # Build rollout request matching eval_qwen3_vl.toml config
+         rollout_request = {
+             "run_id": f"qwen_eval_seed_{seed}",
+             "env": {
+                 "env_name": "pokemon_red",
+                 "seed": seed,
+                 "config": {
+                     "split": "train",
+                     "index": seed,
+                     "env_params": {"max_steps_per_episode": 100},
+                 },
+             },
+             "policy": {
+                 "policy_name": "pokemon_vl_qwen3_vl",
+                 "config": {
+                     "model": model,
+                     "provider": "synth",
+                     "inference_url": "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions",
+                     "temperature": 1.0,
+                     "top_p": 0.95,
+                     "max_tokens": 2048,
+                     "use_vision": True,
+                     "image_only_mode": False,
+                     "max_llm_calls": max_turns,
+                     "thinking_mode": "think",
+                     "thinking_budget": 3072,
+                 },
+             },
+             "ops": ["policy"] * max_turns,
+             "mode": "eval",
+             "record": {
+                 "return_trace": True,
+                 "trace_format": "full",
+             },
+         }
+
+         print(f"Running eval with {model} (seed={seed})...")
+         print(f"This may take a while as Qwen models load...")
+         response = await client.post(f"{task_app_url}/rollout", json=rollout_request)
+         response.raise_for_status()
+         result = response.json()
+
+         # Extract trajectory
+         trajectories = result.get("trajectories", [])
+         if not trajectories:
+             print("Error: No trajectories in response")
+             return
+
+         trajectory = trajectories[0]
+         steps = trajectory.get("steps", [])
+
+         print(f"✓ Received {len(steps)} steps")
+         print(f"Extracting images (filtering intermediate text box frames)...")
+
+         # First pass: collect all images with their state
+         image_data = []
+         for idx, step in enumerate(steps):
+             obs = step.get("obs", {})
+             img_b64 = obs.get("observation_image_base64")
+
+             if not img_b64:
+                 continue
+
+             try:
+                 img_data = base64.b64decode(img_b64)
+                 map_id = obs.get("map_id", "?")
+                 player_x = obs.get("player_x", "?")
+                 player_y = obs.get("player_y", "?")
+                 text_box_active = obs.get("text_box_active", False)
+
+                 image_data.append({
+                     "idx": idx,
+                     "img_data": img_data,
+                     "map_id": map_id,
+                     "player_x": player_x,
+                     "player_y": player_y,
+                     "text_box_active": text_box_active,
+                 })
+             except Exception as e:
+                 print(f" Error decoding step {idx}: {e}")
+                 continue
+
+         # Second pass: filter out intermediate text box frames
+         # Keep: text_box_active=False OR the last frame of a text box sequence
+         filtered_images = []
+         for i, img_info in enumerate(image_data):
+             text_box_active = img_info["text_box_active"]
+             prev_text_box_active = image_data[i - 1]["text_box_active"] if i > 0 else False
+             next_text_box_active = image_data[i + 1]["text_box_active"] if i + 1 < len(image_data) else False
+
+             # Keep if:
+             # 1. Not in a text box (text_box_active=False)
+             # 2. Last frame of text box sequence (text_box_active=True and next is False)
+             # 3. Last frame overall and in text box (no next frame)
+             if not text_box_active:
+                 # Always keep non-text-box frames
+                 filtered_images.append(img_info)
+             elif text_box_active and (not next_text_box_active or i + 1 >= len(image_data)):
+                 # Keep final frame of text box sequence (transition out or end of trajectory)
+                 filtered_images.append(img_info)
+             # Otherwise skip intermediate text box loading frames
+
+         # Save filtered images
+         image_count = 0
+         for img_info in filtered_images:
+             try:
+                 map_id = img_info["map_id"]
+                 player_x = img_info["player_x"]
+                 player_y = img_info["player_y"]
+                 text_box_active = img_info["text_box_active"]
+                 idx = img_info["idx"]
+
+                 pos_str = f"Map{map_id}_{player_x},{player_y}"
+                 textbox_str = "True" if text_box_active else "False"
+                 filename = f"step_{idx:03d}_pos_{pos_str}_textbox_{textbox_str}_seed{seed}.png"
+
+                 filepath = output_dir / filename
+                 filepath.write_bytes(img_info["img_data"])
+
+                 print(f" Saved: {filename}")
+                 image_count += 1
+             except Exception as e:
+                 print(f" Error saving step {img_info['idx']}: {e}")
+                 continue
+
+         print(f"\n Filtered: {len(image_data)} -> {len(filtered_images)} images (removed {len(image_data) - len(filtered_images)} intermediate text box frames)")
+
+         print(f"\n✓ Extracted {image_count} images to {output_dir}/")
+
+         # Also save metrics
+         metrics = result.get("metrics", {})
+         if metrics:
+             metrics_file = output_dir / "metrics.json"
+             with open(metrics_file, "w") as f:
+                 json.dump(metrics, f, indent=2)
+             print(f"✓ Saved metrics to {metrics_file}")
+
+
+ async def main():
+     parser = argparse.ArgumentParser(description=__doc__)
+     parser.add_argument(
+         "--task-app-url",
+         default="http://127.0.0.1:8914",
+         help="Task app URL",
+     )
+     parser.add_argument(
+         "--output-dir",
+         default="examples/blog_posts/pokemon_vl/images_qwen",
+         help="Output directory for images",
+     )
+     parser.add_argument(
+         "--seed",
+         type=int,
+         default=10,
+         help="Random seed (default matches eval_qwen3_vl.toml)",
+     )
+     parser.add_argument(
+         "--max-turns",
+         type=int,
+         default=10,
+         help="Maximum turns",
+     )
+     parser.add_argument(
+         "--model",
+         default="Qwen/Qwen3-VL-30B-A3B-Thinking",
+         help="Qwen model name",
+     )
+     args = parser.parse_args()
+
+     await run_qwen_eval_and_extract_images(
+         args.task_app_url,
+         Path(args.output_dir),
+         args.seed,
+         args.max_turns,
+         args.model,
+     )
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
+
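The two-pass frame filter in the script above can be isolated as a small pure function, which makes the keep/skip rule easy to check without running a rollout. This is a minimal sketch and not part of the package: the name `filter_text_box_frames` and the sample frames are hypothetical, and it assumes each frame is a dict carrying a `text_box_active` flag, as in the script.

```python
from typing import Any


def filter_text_box_frames(frames: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep non-text-box frames plus the last frame of each text box run."""
    kept = []
    for i, frame in enumerate(frames):
        has_next = i + 1 < len(frames)
        next_active = frames[i + 1]["text_box_active"] if has_next else False
        # A text box frame survives only when the run ends at this frame
        # (the next frame is outside a text box, or there is no next frame).
        if not frame["text_box_active"] or not next_active:
            kept.append(frame)
    return kept


# Hypothetical sample: two text box frames, one free frame, one trailing text box frame.
frames = [
    {"idx": 0, "text_box_active": True},
    {"idx": 1, "text_box_active": True},
    {"idx": 2, "text_box_active": False},
    {"idx": 3, "text_box_active": True},
]
kept_indices = [f["idx"] for f in filter_text_box_frames(frames)]
print(kept_indices)
```

Keeping the rule in one function also shows that the end-of-trajectory case needs no separate branch: a missing next frame is treated as "not in a text box".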
@@ -0,0 +1,106 @@
+ # Pokemon Red Text Box Issue Analysis
+
+ ## Problem Summary
+ The model is getting stuck in text boxes during evaluation, particularly at the starting position `Map26:(3,6)`.
+
+ ## Key Findings
+
+ ### Statistics
+ - **42 out of 76 states (55%)** have `text_box_active=True`
+ - **Position Map26:(3,6) is stuck 18 times** - this is the starting bedroom position
+ - The model does eventually escape text boxes, but it takes many steps (50+ steps)
+
+ ### Visual Issue: Gray Block
+ - **Reported**: There's a weird gray block visible in the captured images
+ - **Possible causes**:
+   1. PyBoy screen rendering artifact
+   2. Text box background overlay (normal Game Boy behavior)
+   3. Screen capture timing issue (captured during screen transition)
+   4. RGBA→RGB conversion issue in `environment.py` line 295-296
+
+ **Investigation needed**: Check if gray block appears in:
+ - All images vs only text_box_active=True images
+ - Specific screen regions (bottom half = text box area?)
+ - Consistent across all steps or only certain states
+
+ ### State Progression
+ ```
+ Step  0: pos=Map26:(3,6) text_box=True  reward= 0.00 map=38
+ Step 10: pos=Map26:(3,6) text_box=True  reward= 0.02 map=38
+ Step 16: pos=Map26:(3,6) text_box=True  reward= 0.02 map=38
+ ...
+ Step 33: pos=Map26:(4,6) text_box=True  reward= 0.04 map=38
+ Step 43: pos=Map26:(5,7) text_box=True  reward= 0.10 map=38
+ Step 52: pos=Map26:(5,7) text_box=False reward= 0.10 map=38  ← Finally escaped
+ ```
+
+ ### Observations
+
+ 1. **Text box persists across multiple steps** - Even when the model presses B then A (as instructed), the text box doesn't advance immediately
+ 2. **Position doesn't change when stuck** - The model is stuck at the same position (3,6) for many steps
+ 3. **Reward stays low** - The model gets minimal reward (0.02-0.04) while stuck
+ 4. **Eventually breaks free** - After ~50 steps, the model does escape and starts exploring
+
+ ## Possible Causes
+
+ ### 1. Game Environment Issue
+ - The text box might require a specific button sequence that the model isn't using
+ - There might be a timing issue - the model needs to wait longer between button presses
+ - The text box might be part of a multi-screen dialogue that requires multiple A presses
+
+ ### 2. Model Behavior Issue
+ - The model might not be pressing buttons correctly (wrong duration/frames)
+ - The model might be pressing B too quickly after A, canceling the action
+ - The model might need to see the text box advance before understanding it worked
+
+ ### 3. Reward Function Issue
+ - No reward for advancing text boxes means the model doesn't learn this is progress
+ - The model might not realize escaping the text box is beneficial
+
+ ## Recommendations
+
+ ### Immediate Fixes
+
+ 1. **Add explicit reward for text box advancement**
+    - Give small reward (+1-2 points) when `text_box_active` transitions from True to False
+    - This signals to the model that escaping text boxes is progress
+
+ 2. **Improve system prompt**
+    - Be more explicit: "When text_box_active=True, you MUST press A multiple times (5-10 times) to advance through all dialogue screens"
+    - Add: "Each dialogue screen requires pressing A. Continue pressing A until text_box_active becomes False"
+
+ 3. **Increase button press duration**
+    - Current: `{"button": "A", "frames": 10}` or `{"button": "A", "frames": 30}`
+    - Try: `{"button": "A", "frames": 60}` to ensure the press registers
+
+ 4. **Add loop detection**
+    - If stuck at same position with text_box_active=True for 3+ turns, force a sequence of 10 A presses
+
+ ### Longer-term Solutions
+
+ 1. **Investigate game emulator behavior**
+    - Check if the Pokemon Red emulator handles button presses correctly
+    - Verify text box advancement logic
+
+ 2. **Add visual feedback**
+    - Show the model screenshots before/after text box advancement
+    - Help it understand the visual change
+
+ 3. **Pre-training on text box handling**
+    - Create a simple reward for pressing A when text_box_active=True
+    - Let the model learn this basic skill first
+
+ ## Current Performance
+
+ - **Mean outcome score**: 0.010 (very low)
+ - **Official mean**: 0.500 (one seed succeeded, one failed)
+ - **Total reward**: 0.42-0.50 (milestones give 20-150 points each)
+ - **Steps taken**: 105-115 steps (but most spent stuck in text boxes)
+
+ ## Next Steps
+
+ 1. Add reward for text box advancement
+ 2. Update system prompt to be more explicit about text box handling
+ 3. Test with longer A button press durations
+ 4. Consider adding loop detection to break out of stuck states
+
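Two of the recommendations in the analysis above, a reward when the text box closes and loop detection, can be sketched as small helpers. This is a hypothetical illustration, not code from synth-ai: the names `text_box_exit_bonus` and `is_stuck`, the bonus value of 1.0, and the observation dict layout are assumptions based only on the fields the analysis mentions (`map_id`, `player_x`, `player_y`, `text_box_active`).

```python
from typing import Any

Obs = dict[str, Any]  # assumed observation shape, not the package's own type


def text_box_exit_bonus(prev: Obs, curr: Obs, bonus: float = 1.0) -> float:
    """Return `bonus` when text_box_active flips from True to False."""
    left_box = prev.get("text_box_active", False) and not curr.get("text_box_active", False)
    return bonus if left_box else 0.0


def is_stuck(history: list[Obs], window: int = 3) -> bool:
    """True if the last `window` observations share one position inside a text box."""
    if len(history) < window:
        return False
    recent = history[-window:]
    positions = {(o.get("map_id"), o.get("player_x"), o.get("player_y")) for o in recent}
    return len(positions) == 1 and all(o.get("text_box_active", False) for o in recent)


# Hypothetical observations modeled on the state progression in the analysis.
stuck_obs = {"map_id": 38, "player_x": 3, "player_y": 6, "text_box_active": True}
free_obs = {"map_id": 38, "player_x": 5, "player_y": 7, "text_box_active": False}
history = [stuck_obs, stuck_obs, stuck_obs]

print(is_stuck(history))
print(text_box_exit_bonus(stuck_obs, free_obs))
```

A caller that sees `is_stuck(...)` return true could then queue the forced sequence of A presses the analysis proposes; how that is wired into the policy loop is left open here.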
@@ -0,0 +1,195 @@
1
+ # Smoke Test Architecture
2
+
3
+ This document explains how the smoke test works internally, for future maintenance and debugging.
4
+
5
+ ## Component Overview
6
+
7
+ ```
8
+ ┌─────────────────────────────────────────────────────────────────┐
9
+ │ synth-ai smoke command │
10
+ │ (synth_ai/cli/commands/smoke/core.py) │
11
+ └────────────┬────────────────────────────────────────────────────┘
12
+
13
+ ├─► Auto-start sqld (optional)
14
+ │ ├─ Kill existing process on ports 8080/8081
15
+ │ ├─ Start: sqld --db-path ... --hrana-listen-addr ... --http-listen-addr ...
16
+ │ └─ Health check: GET http://127.0.0.1:8081/health
17
+
18
+ ├─► Auto-start task app (optional)
19
+ │ ├─ Kill existing process on port 8765
20
+ │ ├─ Start: nohup uvx synth-ai task-app serve ... (from synth-ai root)
21
+ │ ├─ Health check: GET http://localhost:8765/health (accepts 200 or 400)
22
+ │ └─ Output: nohup_task_app.out
23
+
24
+ ├─► Start mock RL trainer (if use_mock=true)
25
+ │ ├─ MockRLTrainer(port=0, backend="openai")
26
+ │ ├─ Forwards requests to OpenAI API
27
+ │ └─ Logs: [mock-rl] ← request / → response
28
+
29
+ └─► Execute rollout
30
+ ├─ POST /rollout to task app
31
+ ├─ Capture response with v3 trace
32
+ └─ Extract and display tool calls
33
+
34
+ ```
35
+
36
+ ## Key Implementation Details
37
+
38
+ ### 1. Tool Call Extraction
39
+
40
+ **Location:** `synth_ai/cli/commands/smoke/core.py` lines ~946-1005
41
+
42
+ **How it works:**
43
+ 1. Request rollout with `return_trace=True` and `trace_format="structured"`
44
+ 2. Response includes `trace.event_history[]` - list of policy and environment events
45
+ 3. Policy events have `call_records[]` containing LLM call metadata
46
+ 4. Each `call_record` has `output_tool_calls[]` with tool call details
47
+ 5. Extract `name` and `arguments_json` from each tool call
48
+ 6. Display formatted tool calls to user
49
+
50
+ **Data structure:**
51
+ ```python
52
+ response.trace = {
53
+ "event_history": [
54
+ {
55
+ "call_records": [ # Present in policy events
56
+ {
57
+ "output_tool_calls": [
58
+ {
59
+ "name": "interact_many",
60
+ "arguments_json": '{"actions":["move_up","move_up"]}',
61
+ "call_id": "call_xyz",
62
+ "index": 0
63
+ }
64
+ ],
65
+ "model_name": "gpt-4o-mini",
66
+ "provider": "openai",
67
+ ...
68
+ }
69
+ ],
70
+ "metadata": {...},
71
+ ...
72
+ },
73
+ {
74
+ # Environment step event (no call_records)
75
+ "reward": 1.0,
76
+ "terminated": False,
77
+ ...
78
+ },
79
+ ...
80
+ ],
81
+ "session_id": "...",
82
+ "markov_blanket_message_history": [...],
83
+ ...
84
+ }
85
+ ```
86
+
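The extraction steps above can be sketched roughly as follows. This is a minimal illustration, assuming the trace arrives as a plain dict shaped like the structure shown; `iter_tool_calls` and `example_trace` are hypothetical names for this sketch, not code taken from `core.py`.

```python
import json
from typing import Any, Iterator


def iter_tool_calls(trace: dict[str, Any]) -> Iterator[tuple[str, Any]]:
    """Yield (name, parsed_arguments) for every tool call found in a trace dict."""
    for event in trace.get("event_history") or []:
        # Environment step events have no call_records, so they contribute nothing.
        for record in event.get("call_records") or []:
            for call in record.get("output_tool_calls") or []:
                raw = call.get("arguments_json") or "{}"
                try:
                    args = json.loads(raw)
                except json.JSONDecodeError:
                    args = raw  # keep the raw string if it is not valid JSON
                yield call.get("name", "<unknown>"), args


# Hypothetical trace, trimmed to the fields used above.
example_trace = {
    "event_history": [
        {
            "call_records": [
                {
                    "output_tool_calls": [
                        {
                            "name": "interact_many",
                            "arguments_json": '{"actions":["move_up","move_up"]}',
                            "call_id": "call_xyz",
                            "index": 0,
                        }
                    ],
                    "model_name": "gpt-4o-mini",
                }
            ]
        },
        {"reward": 1.0, "terminated": False},  # environment step event
    ]
}

calls = list(iter_tool_calls(example_trace))
for name, args in calls:
    print(f"{name}: {args}")
```

An empty result from a helper like this corresponds to the "No tool calls found in trace" case covered in the troubleshooting guide.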
87
+ ### 2. Background Service Management
88
+
89
+ **Task App Startup:**
90
+ - Must run from synth-ai root for task app discovery
91
+ - Uses `nohup` to detach process
92
+ - Redirects output to `nohup_task_app.out`
93
+ - Polls `/health` endpoint (accepts 200 or 400 status)
94
+ - Timeout: 120 seconds with progress updates every 5 seconds
95
+ - Propagates `SYNTH_QUIET=1` to suppress diagnostic messages
96
+
97
+ **sqld Startup:**
98
+ - Starts with Hrana WebSocket (8080) and HTTP (8081) ports
99
+ - Polls `/health` endpoint for readiness
100
+ - Timeout: 30 seconds
101
+
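The readiness polling described for the task app and sqld could look something like the sketch below. It is an illustration only: `wait_until_healthy` and `http_status` are hypothetical helpers, the 0.5 second poll interval is an assumption, and the real logic in `core.py` may differ. The timeouts and accepted status codes come from the lists above.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout_s: float = 2.0) -> Optional[int]:
    """Return the HTTP status for a GET on `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 400: the server answered, so it is up
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


def wait_until_healthy(
    probe: Callable[[], Optional[int]],
    timeout_s: float,
    accepted: frozenset = frozenset({200}),
    interval_s: float = 0.5,          # assumed poll interval
    progress_every_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns an accepted status or `timeout_s` elapses."""
    start = clock()
    next_report = progress_every_s
    while True:
        if probe() in accepted:
            return True
        elapsed = clock() - start
        if elapsed >= timeout_s:
            return False
        if elapsed >= next_report:
            print(f"still waiting ({elapsed:.0f}s of {timeout_s:.0f}s)")
            next_report += progress_every_s
        sleep(interval_s)


# Offline demo: a scripted probe and a fake clock stand in for a real server.
fake_now = [0.0]
scripted = iter([None, None, 400])
ready = wait_until_healthy(
    probe=lambda: next(scripted),
    timeout_s=120.0,
    accepted=frozenset({200, 400}),
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
print(ready)
```

Against live services, the task app check would pass `probe=lambda: http_status("http://localhost:8765/health")` with `timeout_s=120` and `accepted=frozenset({200, 400})`, while the sqld check would use `http://127.0.0.1:8081/health` with `timeout_s=30`.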
102
+ **Port Cleanup:**
103
+ - Uses `lsof -ti :PORT` to find PIDs
104
+ - Kills processes with `kill -9 PID`
105
+ - Waits 2 seconds for port release
106
+
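As a rough Python equivalent of the port cleanup steps above, the following sketch shells out to `lsof` and sends `SIGKILL` to whatever it reports. The function names are hypothetical, and the sketch assumes a Unix-like system with `lsof` on the `PATH`; it is not the implementation used by the smoke command.

```python
import os
import signal
import subprocess
import time


def parse_pids(lsof_output: str) -> list[int]:
    """Turn the output of `lsof -ti :PORT` (one PID per line) into sorted unique ints."""
    return sorted({int(tok) for tok in lsof_output.split() if tok.isdigit()})


def pids_on_port(port: int) -> list[int]:
    """Return the PIDs bound to `port`, or [] if there are none or lsof is missing."""
    try:
        result = subprocess.run(
            ["lsof", "-ti", f":{port}"],
            capture_output=True,
            text=True,
            check=False,  # lsof exits non-zero when nothing matches
        )
    except FileNotFoundError:
        return []
    return parse_pids(result.stdout)


def free_port(port: int, wait_s: float = 2.0) -> list[int]:
    """Kill every process on `port` (like `kill -9`), then wait for the port to be released."""
    pids = pids_on_port(port)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between lookup and kill
    if pids:
        time.sleep(wait_s)
    return pids
```

Calling `free_port(8765)` would mirror the cleanup performed before the task app starts. Note that `SIGKILL` gives the target process no chance to shut down cleanly.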
107
+ ### 3. Mock RL Trainer
108
+
109
+ The mock trainer (`MockRLTrainer`) acts as a proxy:
110
+ - `backend="synthetic"`: Generates fake tool calls deterministically
111
+ - `backend="openai"`: Forwards to real OpenAI API
112
+ - Logs all requests/responses with `[mock-rl]` prefix
113
+ - Auto-assigns port if `port=0`
114
+
115
+ ### 4. Diagnostic Message Suppression
116
+
117
+ **Permanently disabled (commented out):**
118
+ - `synth_ai/tracing_v3/config.py`: `[TRACING_V3_CONFIG_LOADED]` message
119
+ - `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py`: All `[PATCH]` messages
120
+ - `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py`: All `[PATCH]` messages
121
+ - `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py`: All `[PATCH]` messages
122
+
123
+ **Reason:** These messages add noise to smoke test output. They're still in the code as comments for documentation.
124
+
125
+ ## Troubleshooting Guide
126
+
127
+ ### No tool calls displayed
128
+
129
+ **Symptom:** Output shows `⚠ No tool calls found in trace`
130
+
131
+ **Causes:**
132
+ 1. `return_trace=false` in config - **FIX:** Set `return_trace = true`
133
+ 2. Trace format mismatch - Check `response.trace.event_history` structure
134
+ 3. No LLM calls made - Check for policy errors in task app logs
135
+
136
+ **Debug:**
137
+ ```bash
138
+ # Check task app logs
139
+ cat /path/to/synth-ai/nohup_task_app.out
140
+
141
+ # Verify trace structure
142
+ # Add debug output (Python, not shell) in core.py around line 978:
143
+ #   click.echo(f"DEBUG: trace keys: {list(tr.keys())}")
144
+ #   click.echo(f"DEBUG: event_history length: {len(event_history)}")
145
+ ```
146
+
147
+ ### Task app exits immediately
148
+
149
+ **Symptom:** `0 steps` in rollout, task app process not running
150
+
151
+ **Causes:**
152
+ 1. Wrong task app name - **FIX:** Use `synth-ai task-app list` to find correct name
153
+ 2. Missing .env file - **FIX:** Ensure `task_app_env_file` points to valid .env
154
+ 3. Wrong working directory - **FIX:** Task app must be started from synth-ai root
155
+
156
+ **Debug:**
157
+ ```bash
158
+ # Manual test
159
+ cd /path/to/synth-ai
160
+ uvx synth-ai task-app serve grpo-crafter --port 8765 --env-file /path/to/.env --force
161
+ ```
162
+
163
+ ### Port conflicts
164
+
165
+ **Symptom:** `Address already in use` errors
166
+
167
+ **Fix:** The smoke command auto-kills processes on ports 8080, 8081, and 8765. If manual cleanup is needed:
168
+ ```bash
169
+ lsof -ti :8080 | xargs kill -9
170
+ lsof -ti :8081 | xargs kill -9
171
+ lsof -ti :8765 | xargs kill -9
172
+ ```
173
+
174
+ ## Future Improvements
175
+
176
+ Potential enhancements for future agents:
177
+
178
+ 1. **Streaming tool call display**: Show tool calls as they happen, not just at the end
179
+ 2. **Tool call validation**: Verify tool calls match expected format for the environment
180
+ 3. **Performance metrics**: Track inference latency per tool call
181
+ 4. **Cost tracking**: Display OpenAI API costs for the smoke test
182
+ 5. **Parallel rollouts**: Support `--parallel N` to test concurrent execution
183
+ 6. **Video/image capture**: For vision-based tasks, save observations
184
+ 7. **Interactive mode**: Allow stepping through rollout one action at a time
185
+
186
+ ## Related Files
187
+
188
+ - `synth_ai/cli/commands/smoke/core.py` - Main smoke command implementation
189
+ - `synth_ai/api/train/configs/rl.py` - `SmokeConfig` Pydantic model
190
+ - `synth_ai/api/train/builders.py` - Removes `[smoke]` section before sending to trainer
191
+ - `synth_ai/task/contracts.py` - `RolloutResponse` with trace field
192
+ - `examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md` - User-facing documentation
193
+ - `monorepo/docs/cli/smoke.mdx` - Mintlify documentation
194
+
195
+
@@ -0,0 +1,127 @@
1
+ # Final Inference Test Results
2
+
3
+ **Date**: Oct 31, 2025
4
+ **Endpoint**: `https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions`
5
+
6
+ ## Summary
7
+
8
+ | Model Type | Status | Result |
9
+ |------------|--------|--------|
10
+ | Base Model (Qwen/Qwen3-4B) | ✅ WORKS | Inference successful |
11
+ | PEFT/SFT (Qwen3-0.6B) | ✅ WORKS | Inference successful |
12
+ | RL (Qwen3-4B) | ❌ **BROKEN** | Modal function crashes |
13
+
14
+ ## Detailed Results
15
+
16
+ ### ✅ Test 1: Base Model (No Fine-Tuning)
17
+
18
+ **Model**: `Qwen/Qwen3-4B`
19
+
20
+ **Result**: **SUCCESS** ✅
21
+ - **Status**: 200 OK
22
+ - **Tokens**: 31 prompt + 100 completion = 131 total
23
+ - **Response**: Generated successfully
24
+
25
+ **Notes**:
26
+ - First attempt returned 303 redirect (cold start)
27
+ - Retry succeeded immediately
28
+ - This confirms the endpoint and auth work correctly
29
+
30
+ ---
31
+
32
+ ### ✅ Test 2: PEFT/SFT Model
33
+
34
+ **Model**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`
35
+
36
+ **Result**: **SUCCESS** ✅
37
+ - **Status**: 200 OK (consistent across retries)
38
+ - **Tokens**: 31 prompt + 100 completion = 131 total
39
+ - **Response**: "Hello, I am working!" (with thinking tokens)
40
+
41
+ **Notes**:
42
+ - Works reliably
43
+ - No cold start issues
44
+ - This is the expected behavior for all models
45
+
46
+ ---
47
+
48
+ ### ❌ Test 3: RL Model
49
+
50
+ **Model**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`
51
+
52
+ **Result**: **FAILURE** ❌ - Multiple error modes
53
+
54
+ #### First Attempt:
55
+ ```
56
+ Status: 400 Bad Request
57
+ Error: "Device string must not be empty"
58
+ ```
59
+
60
+ #### Retry:
61
+ ```
62
+ Status: 500 Internal Server Error
63
+ Error: "modal-http: internal error: function was terminated by signal"
64
+ ```
65
+
66
+ **This is a Modal function crash** - the inference function terminated unexpectedly.
67
+
68
+ #### Cold Start (from Modal logs):
69
+ ```
70
+ RuntimeError: Cannot find any model weights with
71
+ '/models/rl/Qwen/Qwen3-4B/job_19a38041c38f96e638c/checkpoint-fixed'
72
+ ```
73
+
74
+ **Root Cause**: RL checkpoint contains LoRA adapter files (`adapter_config.json`, `adapter_model.safetensors`), but vLLM expects full merged model weights.
75
+
76
+ ---
77
+
78
+ ## Conclusion
79
+
80
+ ### What Works ✅
81
+ - **Base models**: Standard HuggingFace models load and run inference correctly
82
+ - **PEFT/SFT models**: Fine-tuned models with merged weights work perfectly
83
+
84
+ ### What's Broken ❌
85
+ - **RL models**: Crash during model loading because:
86
+ 1. RL checkpoints are stored as LoRA adapters
87
+ 2. vLLM weight loader expects full model weights
88
+ 3. Missing merge step causes vLLM to crash
89
+ 4. Modal function terminates with signal (crash)
90
+
91
+ ### Impact
92
+ - **HIGH SEVERITY**: No RL-trained model can currently be used for inference
93
+ - Users can train RL models but cannot deploy them
94
+ - This blocks the core RL training → inference workflow
95
+
96
+ ### Next Steps
97
+ See `monorepo/RL_INFERENCE_BUG.md` for:
98
+ - Detailed root cause analysis
99
+ - Reproduction script
100
+ - Suggested fix (merge LoRA adapters before vLLM loading)
101
+ - Code locations to modify
102
+
103
+ ---
104
+
105
+ ## Developer Experience Issues Identified
106
+
107
+ ### Issue #1: Confusing Error Messages
108
+ - **400 "Device string must not be empty"** - Not helpful, doesn't indicate RL adapter issue
109
+ - **500 "function was terminated by signal"** - Generic crash, no context
110
+ - **Should be**: "RL checkpoint contains adapter files. Merge required for vLLM loading."
111
+
112
+ ### Issue #2: Inconsistent Behavior
113
+ - Sometimes returns 303 redirect
114
+ - Sometimes returns 400
115
+ - Sometimes crashes with 500
116
+ - **Should be**: Consistent error message explaining the issue
117
+
118
+ ### Issue #3: Not Obvious How to Test Models
119
+ - Had to try 3 different endpoint URLs before finding the right one
120
+ - No documentation on model ID formats
121
+ - **Should be**: `synth-ai inference --model "rl:..." --message "test"` CLI command
122
+
123
+ ---
124
+
125
+ **Status**: Bug documented and reproduction available.
126
+ **See**: `monorepo/RL_INFERENCE_BUG.md` for full details.
127
+