synth-ai 0.2.16__py3-none-any.whl → 0.2.19__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of synth-ai might be problematic.

Files changed (299)
  1. examples/analyze_semantic_words.sh +2 -2
  2. examples/baseline/banking77_baseline.py +204 -0
  3. examples/baseline/crafter_baseline.py +407 -0
  4. examples/baseline/pokemon_red_baseline.py +326 -0
  5. examples/baseline/simple_baseline.py +56 -0
  6. examples/baseline/warming_up_to_rl_baseline.py +239 -0
  7. examples/blog_posts/gepa/README.md +355 -0
  8. examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
  9. examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
  10. examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
  11. examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
  12. examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
  13. examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
  14. examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
  15. examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
  16. examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
  17. examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
  18. examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
  19. examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
  20. examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
  21. examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
  22. examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
  23. examples/blog_posts/gepa/gepa_baseline.py +204 -0
  24. examples/blog_posts/gepa/query_prompts_example.py +97 -0
  25. examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
  26. examples/blog_posts/gepa/task_apps.py +105 -0
  27. examples/blog_posts/gepa/test_gepa_local.sh +67 -0
  28. examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
  29. examples/blog_posts/pokemon_vl/README.md +98 -0
  30. examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
  31. examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
  32. examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
  33. examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
  34. examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
  35. examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
  36. examples/blog_posts/pokemon_vl/extract_images.py +239 -0
  37. examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
  38. examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
  39. examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
  40. examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
  41. examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
  42. examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
  43. examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
  44. examples/blog_posts/warming_up_to_rl/README.md +158 -0
  45. examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
  46. examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
  47. examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
  48. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
  49. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
  50. examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
  51. examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
  52. examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
  53. examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
  54. examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
  55. examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
  56. examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
  57. examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
  58. examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
  59. examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
  60. examples/multi_step/configs/crafter_rl_outcome.toml +2 -1
  61. examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +65 -107
  62. examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +2 -1
  63. examples/multi_step/configs/crafter_rl_stepwise_simple.toml +2 -1
  64. examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
  65. examples/multi_step/configs/verilog_rl_lora.toml +80 -123
  66. examples/qwen_coder/configs/coder_lora_30b.toml +1 -3
  67. examples/qwen_coder/configs/coder_lora_4b.toml +4 -1
  68. examples/qwen_coder/configs/coder_lora_small.toml +1 -3
  69. examples/qwen_vl/README.md +10 -12
  70. examples/qwen_vl/SETUP_COMPLETE.md +7 -8
  71. examples/qwen_vl/VISION_TESTS_COMPLETE.md +2 -3
  72. examples/qwen_vl/collect_data_via_cli.md +76 -84
  73. examples/qwen_vl/collect_vision_traces.py +4 -4
  74. examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +40 -57
  75. examples/qwen_vl/configs/crafter_vlm_sft_example.toml +1 -2
  76. examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +20 -37
  77. examples/qwen_vl/configs/eval_gpt5nano_vision.toml +21 -40
  78. examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
  79. examples/qwen_vl/configs/{filter_qwen2vl_sft.toml → filter_qwen3vl_sft.toml} +4 -5
  80. examples/qwen_vl/configs/filter_vision_sft.toml +2 -3
  81. examples/qwen_vl/crafter_qwen_vl_agent.py +5 -5
  82. examples/qwen_vl/run_vision_comparison.sh +6 -7
  83. examples/rl/README.md +5 -5
  84. examples/rl/configs/rl_from_base_qwen.toml +26 -1
  85. examples/rl/configs/rl_from_base_qwen17.toml +6 -2
  86. examples/rl/task_app/README.md +1 -2
  87. examples/rl/task_app/math_single_step.py +2 -2
  88. examples/run_crafter_demo.sh +2 -2
  89. examples/sft/README.md +1 -1
  90. examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -1
  91. examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -1
  92. examples/swe/task_app/README.md +32 -2
  93. examples/swe/task_app/grpo_swe_mini.py +4 -0
  94. examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
  95. examples/swe/task_app/hosted/envs/mini_swe/environment.py +37 -10
  96. examples/swe/task_app/hosted/inference/openai_client.py +4 -38
  97. examples/swe/task_app/hosted/policy_routes.py +17 -0
  98. examples/swe/task_app/hosted/rollout.py +4 -2
  99. examples/swe/task_app/morph_backend.py +178 -0
  100. examples/task_apps/banking77/__init__.py +6 -0
  101. examples/task_apps/banking77/banking77_task_app.py +841 -0
  102. examples/task_apps/banking77/deploy_wrapper.py +46 -0
  103. examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
  104. examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
  105. examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
  106. examples/task_apps/crafter/task_app/README.md +1 -1
  107. examples/task_apps/crafter/task_app/grpo_crafter.py +90 -5
  108. examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
  109. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +4 -26
  110. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
  111. examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
  112. examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +372 -107
  113. examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +81 -12
  114. examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +82 -11
  115. examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
  116. examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
  117. examples/task_apps/gepa_benchmarks/__init__.py +7 -0
  118. examples/task_apps/gepa_benchmarks/common.py +260 -0
  119. examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
  120. examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
  121. examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
  122. examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
  123. examples/task_apps/math/README.md +1 -2
  124. examples/task_apps/pokemon_red/README.md +3 -4
  125. examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
  126. examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
  127. examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
  128. examples/task_apps/pokemon_red/task_app.py +288 -39
  129. examples/task_apps/sokoban/README.md +2 -3
  130. examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
  131. examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
  132. examples/vlm/configs/crafter_vlm_gpt4o.toml +4 -1
  133. examples/warming_up_to_rl/configs/crafter_fft.toml +4 -1
  134. examples/warming_up_to_rl/configs/crafter_fft_4b.toml +0 -2
  135. examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +3 -2
  136. examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
  137. examples/warming_up_to_rl/task_app/README.md +1 -1
  138. examples/warming_up_to_rl/task_app/grpo_crafter.py +185 -5
  139. examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +1 -1
  140. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +3 -27
  141. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -1
  142. examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
  143. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +156 -45
  144. examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +37 -4
  145. examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
  146. examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
  147. examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
  148. examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +6 -0
  149. synth_ai/api/train/builders.py +99 -4
  150. synth_ai/api/train/cli.py +516 -26
  151. synth_ai/api/train/config_finder.py +13 -2
  152. synth_ai/api/train/configs/__init__.py +23 -2
  153. synth_ai/api/train/configs/prompt_learning.py +442 -0
  154. synth_ai/api/train/configs/rl.py +61 -7
  155. synth_ai/api/train/configs/sft.py +6 -2
  156. synth_ai/api/train/configs/shared.py +59 -2
  157. synth_ai/api/train/task_app.py +1 -1
  158. synth_ai/api/train/validators.py +277 -0
  159. synth_ai/auth/credentials.py +119 -0
  160. synth_ai/baseline/__init__.py +25 -0
  161. synth_ai/baseline/config.py +209 -0
  162. synth_ai/baseline/discovery.py +214 -0
  163. synth_ai/baseline/execution.py +146 -0
  164. synth_ai/cli/__init__.py +94 -18
  165. synth_ai/cli/__main__.py +0 -0
  166. synth_ai/cli/claude.py +70 -0
  167. synth_ai/cli/codex.py +84 -0
  168. synth_ai/cli/commands/__init__.py +18 -0
  169. synth_ai/cli/commands/baseline/__init__.py +12 -0
  170. synth_ai/cli/commands/baseline/core.py +637 -0
  171. synth_ai/cli/commands/baseline/list.py +93 -0
  172. synth_ai/cli/commands/demo/__init__.py +6 -0
  173. synth_ai/cli/commands/demo/core.py +163 -0
  174. synth_ai/cli/commands/eval/__init__.py +19 -0
  175. synth_ai/cli/commands/eval/core.py +1112 -0
  176. synth_ai/cli/commands/eval/errors.py +81 -0
  177. synth_ai/cli/commands/eval/validation.py +133 -0
  178. synth_ai/cli/commands/filter/__init__.py +12 -0
  179. synth_ai/cli/commands/filter/core.py +424 -0
  180. synth_ai/cli/commands/filter/errors.py +55 -0
  181. synth_ai/cli/commands/filter/validation.py +77 -0
  182. synth_ai/cli/commands/help/__init__.py +177 -0
  183. synth_ai/cli/commands/help/core.py +72 -0
  184. synth_ai/cli/commands/smoke/__init__.py +7 -0
  185. synth_ai/cli/commands/smoke/core.py +1436 -0
  186. synth_ai/cli/commands/status/__init__.py +64 -0
  187. synth_ai/cli/commands/status/client.py +192 -0
  188. synth_ai/cli/commands/status/config.py +92 -0
  189. synth_ai/cli/commands/status/errors.py +20 -0
  190. synth_ai/cli/commands/status/formatters.py +164 -0
  191. synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
  192. synth_ai/cli/commands/status/subcommands/files.py +79 -0
  193. synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
  194. synth_ai/cli/commands/status/subcommands/models.py +79 -0
  195. synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
  196. synth_ai/cli/commands/status/subcommands/runs.py +81 -0
  197. synth_ai/cli/commands/status/subcommands/summary.py +47 -0
  198. synth_ai/cli/commands/status/subcommands/usage.py +203 -0
  199. synth_ai/cli/commands/status/utils.py +114 -0
  200. synth_ai/cli/commands/train/__init__.py +53 -0
  201. synth_ai/cli/commands/train/core.py +21 -0
  202. synth_ai/cli/commands/train/errors.py +117 -0
  203. synth_ai/cli/commands/train/judge_schemas.py +200 -0
  204. synth_ai/cli/commands/train/judge_validation.py +305 -0
  205. synth_ai/cli/commands/train/validation.py +386 -0
  206. synth_ai/cli/demo.py +30 -158
  207. synth_ai/cli/deploy/__init__.py +43 -0
  208. synth_ai/cli/deploy.py +162 -0
  209. synth_ai/cli/eval/__init__.py +36 -0
  210. synth_ai/cli/eval/core.py +5 -0
  211. synth_ai/cli/eval/errors.py +31 -0
  212. synth_ai/cli/eval/validation.py +5 -0
  213. synth_ai/cli/filter/__init__.py +28 -0
  214. synth_ai/cli/filter/core.py +5 -0
  215. synth_ai/cli/filter/errors.py +23 -0
  216. synth_ai/cli/filter/validation.py +5 -0
  217. synth_ai/cli/legacy_root_backup.py +14 -8
  218. synth_ai/cli/modal_serve/__init__.py +12 -0
  219. synth_ai/cli/modal_serve/core.py +14 -0
  220. synth_ai/cli/modal_serve/errors.py +8 -0
  221. synth_ai/cli/modal_serve/validation.py +11 -0
  222. synth_ai/cli/opencode.py +107 -0
  223. synth_ai/cli/root.py +9 -5
  224. synth_ai/cli/serve/__init__.py +12 -0
  225. synth_ai/cli/serve/core.py +14 -0
  226. synth_ai/cli/serve/errors.py +8 -0
  227. synth_ai/cli/serve/validation.py +11 -0
  228. synth_ai/cli/setup.py +20 -265
  229. synth_ai/cli/status.py +7 -126
  230. synth_ai/cli/task_app_deploy.py +1 -10
  231. synth_ai/cli/task_app_modal_serve.py +4 -9
  232. synth_ai/cli/task_app_serve.py +4 -11
  233. synth_ai/cli/task_apps.py +51 -1480
  234. synth_ai/cli/train/__init__.py +12 -0
  235. synth_ai/cli/train/core.py +21 -0
  236. synth_ai/cli/train/errors.py +8 -0
  237. synth_ai/cli/train/validation.py +24 -0
  238. synth_ai/cli/train.py +1 -14
  239. synth_ai/demos/crafter/grpo_crafter_task_app.py +1 -1
  240. synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
  241. synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
  242. synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
  243. synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
  244. synth_ai/environments/examples/red/engine.py +33 -12
  245. synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
  246. synth_ai/environments/examples/red/environment.py +26 -0
  247. synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
  248. synth_ai/http.py +12 -0
  249. synth_ai/judge_schemas.py +10 -10
  250. synth_ai/learning/__init__.py +10 -0
  251. synth_ai/learning/prompt_learning_client.py +276 -0
  252. synth_ai/learning/prompt_learning_types.py +184 -0
  253. synth_ai/learning/rl/client.py +3 -1
  254. synth_ai/pricing/__init__.py +2 -0
  255. synth_ai/pricing/model_pricing.py +57 -0
  256. synth_ai/streaming/__init__.py +29 -0
  257. synth_ai/streaming/config.py +94 -0
  258. synth_ai/streaming/handlers.py +518 -0
  259. synth_ai/streaming/streamer.py +320 -0
  260. synth_ai/streaming/types.py +95 -0
  261. synth_ai/task/apps/__init__.py +1 -0
  262. synth_ai/task/config.py +2 -0
  263. synth_ai/task/tracing_utils.py +25 -25
  264. synth_ai/task/validators.py +45 -9
  265. synth_ai/task_app_cfgs.py +21 -0
  266. synth_ai/tracing_v3/config.py +162 -19
  267. synth_ai/tracing_v3/constants.py +1 -1
  268. synth_ai/tracing_v3/db_config.py +24 -38
  269. synth_ai/tracing_v3/migration_helper.py +1 -2
  270. synth_ai/tracing_v3/storage/config.py +47 -13
  271. synth_ai/tracing_v3/storage/factory.py +3 -3
  272. synth_ai/tracing_v3/turso/daemon.py +113 -11
  273. synth_ai/tracing_v3/turso/native_manager.py +92 -16
  274. synth_ai/types.py +8 -0
  275. synth_ai/urls.py +11 -0
  276. synth_ai/utils/__init__.py +30 -1
  277. synth_ai/utils/agents.py +74 -0
  278. synth_ai/utils/bin.py +39 -0
  279. synth_ai/utils/cli.py +149 -5
  280. synth_ai/utils/env.py +40 -33
  281. synth_ai/utils/http.py +4 -1
  282. synth_ai/utils/json.py +72 -0
  283. synth_ai/utils/modal.py +285 -3
  284. synth_ai/utils/paths.py +48 -0
  285. synth_ai/utils/uvicorn.py +113 -0
  286. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/METADATA +109 -6
  287. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/RECORD +291 -142
  288. examples/qwen_vl/configs/eval_qwen2vl_vision.toml +0 -44
  289. synth_ai/cli/tui.py +0 -62
  290. synth_ai/tui/__init__.py +0 -5
  291. synth_ai/tui/__main__.py +0 -13
  292. synth_ai/tui/cli/__init__.py +0 -1
  293. synth_ai/tui/cli/query_experiments.py +0 -164
  294. synth_ai/tui/cli/query_experiments_v3.py +0 -164
  295. synth_ai/tui/dashboard.py +0 -911
  296. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
  297. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
  298. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
  299. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,132 @@ (new file: examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md)
+ # ✅ Inference Success Report

**Date**: Oct 31, 2025
**Models Tested**: Latest SFT and RL models from training

## Working Solution

### Correct Endpoint
```
https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
```

### SFT/PEFT Models: ✅ WORKING

**Model ID**: `peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9`

**Test Code**:
```python
import httpx
import os

SYNTH_API_KEY = os.getenv("SYNTH_API_KEY")
url = "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "peft:Qwen/Qwen3-0.6B:job_24faa0fdfdf648b9",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Hello, I am working!' and nothing else."},
    ],
    "temperature": 0.2,
    "max_tokens": 100,
}

with httpx.Client(timeout=300.0) as client:
    response = client.post(url, json=payload, headers=headers)
    print(response.json())
```

**Result**:
- ✅ Status: 200 OK
- ✅ Response generated successfully
- ✅ Token usage tracked: 31 prompt + 72 completion = 103 total
- ✅ Output: "Hello, I am working!" (with thinking tokens as expected)

### RL Models: ⚠️ NEEDS PROMOTION

**Model ID**: `rl:Qwen/Qwen3-4B:job_19a38041c38f96e638c:checkpoint-epoch-1`

**Status**: 303 Redirect (empty response)

**Root Cause**:
From inspection of the monorepo backend code, RL checkpoints require a "promotion" step that loads them onto Modal before they can be used for inference. The direct Modal endpoint returns a redirect for unpromoted RL models.

**Solution Options**:

#### Option 1: Use Backend Proxy (Recommended)
The backend automatically handles RL promotion:
```python
# Use the backend proxy instead of calling Modal directly
url = "https://your-backend.example.com/api/chat/completions"
# The backend will auto-promote the checkpoint and route to vLLM
```

#### Option 2: Manual Promotion (Advanced)
1. Call the promotion endpoint first
2. Wait for the model to load onto Modal
3. Then call the inference endpoint

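The two-step flow can be sketched in a few lines. This is a sketch under assumptions: the `/promote` route, its JSON payload, and the polling cadence are hypothetical, so check the backend for the real promotion API before relying on it.

```python
import time

BASE = "https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run"


def infer_with_promotion(client, headers, payload, base=BASE, max_wait=600, poll=15):
    """Try inference once; on a 303 redirect, request promotion and retry.

    `client` is an httpx.Client or anything with a compatible .post() method.
    """
    resp = client.post(f"{base}/chat/completions", json=payload, headers=headers)
    if resp.status_code == 303:
        # Hypothetical promotion endpoint; the real route and payload may differ.
        client.post(f"{base}/promote", json={"model": payload["model"]}, headers=headers)
        deadline = time.time() + max_wait
        while time.time() < deadline:
            resp = client.post(f"{base}/chat/completions", json=payload, headers=headers)
            if resp.status_code == 200:
                break
            time.sleep(poll)
    return resp
```

Passing the client in makes the retry logic easy to exercise against a stub before pointing it at the live endpoint.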
## Key Learnings

### What We Got Wrong Initially:
1. ❌ Wrong endpoint path: used `/v1/chat/completions` → should be `/chat/completions`
2. ❌ Wrong base URL: used the render.com URL → should be the Modal URL
3. ❌ Assumed RL = PEFT workflow → RL needs a promotion step

### What We Got Right:
1. ✅ Model ID format from `synth-ai status models list`
2. ✅ Using SYNTH_API_KEY for auth
3. ✅ Bearer token authorization header

## Recommendations for Library Improvement

### 1. Add Simple CLI Command
```bash
synth-ai inference \
  --model "peft:Qwen/Qwen3-0.6B:job_xxx" \
  --message "Hello" \
  --max-tokens 100
```

### 2. Document Endpoint in Model Status
```bash
$ synth-ai status models get "peft:..."
Model: peft:Qwen/Qwen3-0.6B:job_xxx
Status: succeeded
Inference Endpoint: https://synth-laboratories-dev--learning-v2-service-fastapi-app.modal.run/chat/completions
Ready: ✅ Yes (use directly)
```

### 3. Add Python SDK Example
```python
import os

from synth_ai import InferenceClient

client = InferenceClient(api_key=os.getenv("SYNTH_API_KEY"))
response = client.chat.completions.create(
    model="peft:Qwen/Qwen3-0.6B:job_xxx",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

### 4. Clear Error Messages
- 303 → "RL model needs promotion. Use backend proxy or call /promote endpoint first."
- 404 → "Model not found. Check model ID with: synth-ai status models list"

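The status-to-message mapping suggested here is small enough to sketch directly; nothing like this helper exists in the CLI today, so treat the names as placeholders:

```python
# Hypothetical helper mirroring the suggested error messages.
FRIENDLY_ERRORS = {
    303: "RL model needs promotion. Use the backend proxy or call the promotion endpoint first.",
    404: "Model not found. Check the model ID with: synth-ai status models list",
}


def explain_status(status_code: int) -> str:
    """Translate an inference HTTP status into an actionable message."""
    return FRIENDLY_ERRORS.get(status_code, f"Unexpected status {status_code}; see server logs.")
```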
## Success Criteria Met

- ✅ Can get model ID from CLI
- ✅ Know correct endpoint
- ✅ Know correct auth (SYNTH_API_KEY)
- ✅ Can send test message
- ✅ Get response back
- ⚠️ RL models need extra step (documented)

**Status**: PEFT/SFT inference is fully working! RL needs the backend proxy.

@@ -0,0 +1,158 @@ (new file: examples/blog_posts/warming_up_to_rl/README.md)
# Crafter: From Rollouts to RL with the Synth AI CLI

This playbook mirrors the original “Warming Up to RL” walkthrough, but swaps the bespoke scripts for the first-class `uvx synth-ai` helpers. Every step, from deploying the task app to filtering rollouts, fine-tuning, and bootstrapping RL, now uses the same CLI you’d reach for in production.

All commands assume you are inside the repository root and have `uv`/`uvx` available.

---

## 0. Prerequisites

1. Install dependencies and authenticate once:
   ```bash
   uv pip install -e .
   uvx synth-ai setup
   ```
   The setup wizard writes the required `SYNTH_API_KEY`, `ENVIRONMENT_API_KEY`, and local `.env` helpers.

2. Copy the example secrets if you need a starter file:
   ```bash
   cp examples/warming_up_to_rl/.env.example .env
   ```

3. Export the path we use for trace capture (optional, but it keeps things tidy):
   ```bash
   export CRAFTER_TRACE_DB=traces/v3/crafter_blog.db
   ```

---

## 1. Ship the Crafter Task App

Deploy the hosted Crafter environment once. The Modal URL that prints at the end is reused by eval, SFT, and RL.

```bash
uvx synth-ai deploy grpo-crafter \
  --runtime modal \
  --modal-mode serve \
  --name crafter-blogpost \
  --env-file .env
```

For local testing you can run:

```bash
uvx synth-ai deploy grpo-crafter \
  --runtime uvicorn \
  --port 8001 \
  --trace traces/v3 \
  --env-file .env
```

Copy the Modal URL (e.g. `https://your-app.modal.run`) and replace the `task_app_url` placeholders inside every config under `examples/blog_posts/warming_up_to_rl/configs/`.

---

## 2. Collect High-Quality Rollouts

We lean on large teacher models to produce demonstrations. The configs in `configs/` already request full traces so we retain chain-of-thought.

Groq Qwen3-32B (text-only prompt):
```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

GPT-OSS-120B via Groq’s OpenAI-compatible endpoint (also text-only):
```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

Both configs disable image attachments and rely on the textual observation renderer (`format_observation`) so Groq stays within its supported modalities. If you want to try other models, keep `use_vision = false` unless the provider explicitly supports image inputs.

---

## 3. Filter Into an SFT Dataset

Once traces are stored in `CRAFTER_TRACE_DB`, filter down to the high-reward trajectories:

```bash
uvx synth-ai filter \
  --config examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml
```

The output JSONL lands in `ft_data/crafter_blog_high_reward.jsonl`, ready for supervised fine-tuning.

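Before launching fine-tuning, it is worth a quick sanity check of the exported file. The sketch below assumes each JSONL record carries an OpenAI-style `messages` list, which is the usual layout for SFT datasets; adjust the key if your export differs.

```python
import json


def summarize_sft_jsonl(path: str) -> tuple[int, float]:
    """Return (example_count, mean_turns_per_example) for a chat-format JSONL file."""
    examples = 0
    turns = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate trailing blank lines
            record = json.loads(line)
            examples += 1
            turns += len(record.get("messages", []))
    return examples, (turns / examples) if examples else 0.0
```

For example, `summarize_sft_jsonl("ft_data/crafter_blog_high_reward.jsonl")` returns counts you can eyeball before spending GPU time.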
---

## 4. Fine-Tune Qwen3-4B with `uvx synth-ai train`

Update the dataset path (and optionally the hyperparameters) in `train_sft_qwen4b.toml`, then launch:

```bash
uvx synth-ai train \
  --type sft \
  --config examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml \
  --env-file .env \
  --poll
```

Capture the returned job id (it looks like `fft:Qwen/Qwen3-4B:job_xxxxx`). We reuse that identifier in the evaluation and RL configs.
At any time you can list recently minted checkpoints with:

```bash
uvx synth-ai status models
```

The output table shows the canonical model name/ID alongside the source job.

---

## 5. Evaluate the Fine-Tuned Checkpoint

Replace both `REPLACE-WITH-SFT-JOB-ID` strings inside `eval_ft_qwen4b.toml`, then run:

```bash
uvx synth-ai eval grpo-crafter \
  --config examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml \
  --trace-db "${CRAFTER_TRACE_DB}"
```

This provides a clean, CLI-native comparison between the teacher rollouts and the fine-tuned model.

---

## 6. Kick Off RL from the Fine-Tuned Model

Point `train_rl_from_sft.toml` at the same Modal task app and set `model.source` to your SFT job id:

```bash
uvx synth-ai train \
  --type rl \
  --config examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml \
  --env-file .env \
  --poll
```

The CLI streams rollout and judge metrics in real time. When the run finishes, you can re-use the step 5 config (substituting the RL job id) to quantify the uplift.
If you lose track of the produced RL label or want to confirm the latest status, run:

```bash
uvx synth-ai status jobs
uvx synth-ai status models
```

The first command shows job completion state; the second surfaces model IDs you can plug into new eval configs.

---

## 7. Where to Go Next

- The original `examples/warming_up_to_rl` folder still contains deeper experiments (auto-curricula, modal renderers, etc.).
- Add more `eval_*.toml` configs to compare alternative judges or reward shaping strategies.
- Plug the filtered dataset into `uvx synth-ai files upload` if you want to share it with a teammate without copying JSONL around.

This directory now holds everything a blog post needs: configs, output locations, and the CLI entrypoints to reproduce the Crafter SFT → RL pipeline end-to-end.
@@ -0,0 +1,164 @@ (new file: examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md)
# Smoke Testing Your Task App

This guide shows how to quickly test your task app using the `synth-ai smoke` command with auto-start features.

## Quick Start

The easiest way to smoke test is using the `[smoke]` section in your RL config:

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**That's it!** The smoke command will:
1. ✅ Auto-start a sqld server for tracing (if `sqld_auto_start = true`)
2. ✅ Auto-start your task app on port 8765 (if `task_app_name` is set)
3. ✅ Run 10 rollout steps with `gpt-5-nano` using synthetic mocking
4. ✅ Automatically stop all background services when done

**Expected output:**
```
[smoke] sqld ready
[smoke] Task app ready at http://localhost:8765 (status=400)
[mock-rl] server ready http://127.0.0.1:51798 backend=synthetic
>> POST /rollout run_id=smoke-... env=crafter policy=crafter-react
[mock-rl] ← request backend=synthetic model=gpt-5-nano messages=2
[mock-rl] → response tool_calls=1 backend=synthetic
rollout[0:0] episodes=1 steps=10 mean_return=1.0000
✓ Smoke rollouts complete
successes=1/1 total_steps=10 v3_traces=1/1 nonzero_returns=1/1
[smoke] Background services stopped
```

## Configuration

Add a `[smoke]` section to your RL config:

```toml
[smoke]
# Auto-start task app
task_app_name = "grpo-crafter"
task_app_port = 8765
task_app_env_file = ".env"
task_app_force = true

# Auto-start sqld
sqld_auto_start = true
sqld_db_path = "./traces/local.db"
sqld_hrana_port = 8080
sqld_http_port = 8081

# Test parameters
max_steps = 10
policy = "gpt-5-nano"
mock_backend = "synthetic"  # or "openai" (requires a valid OpenAI API key)
return_trace = true
```

## Testing Methods

### 1. Full Auto (Recommended)
Everything auto-starts from the config:
```bash
uv run synth-ai smoke --config configs/smoke_test.toml
```

### 2. Manual Task App + Auto sqld
Start the task app manually, auto-start sqld:
```bash
# Config with sqld_auto_start=true but no task_app_name
uv run synth-ai smoke --config configs/my_config.toml --url http://localhost:8765
```

### 3. Override Config Settings
Override any config value via the CLI:
```bash
uv run synth-ai smoke --config configs/smoke_test.toml --max-steps 5
```

### 4. No Config (Manual Everything)
```bash
# Start services manually in separate terminals:
# Terminal 1: sqld --db-path ./traces/local.db --hrana-listen-addr 127.0.0.1:8080 --http-listen-addr 127.0.0.1:8081
# Terminal 2: uv run synth-ai task-app serve grpo-crafter --port 8765 --env-file .env --force

# Terminal 3: Run smoke test
uv run synth-ai smoke --url http://localhost:8765 \
  --env-name crafter \
  --policy-name crafter-react \
  --max-steps 10 \
  --policy mock \
  --mock-backend openai
```

## Prerequisites

### Install sqld (for tracing)
```bash
brew install sqld
# or
curl -fsSL https://get.turso.com/sqld | bash
```

### Verify Installation
```bash
which sqld
# Should output: /opt/homebrew/bin/sqld or similar
```

## Common Issues

### sqld not found
If you see "sqld not found in PATH":
```bash
brew install sqld
```

### Port already in use
Use `task_app_force = true` in the config, or:
```bash
# Kill processes on ports 8080, 8081, 8765
lsof -ti:8080,8081,8765 | xargs kill -9
```

### Task app not starting
Check the error output; you may need:
- A valid `.env` file with the required keys
- The correct task app name registered in your codebase

## Example Output

```
[smoke] Loaded configuration from configs/smoke_test.toml
[smoke] Config keys: task_app_name, task_app_port, sqld_auto_start, max_steps, policy
[smoke] Starting sqld server...
[smoke] DB path: /Users/you/project/traces/local.db
[smoke] Hrana port: 8080, HTTP port: 8081
[smoke] sqld ready
[smoke] Starting task app 'grpo-crafter' on port 8765...
[smoke] Task app ready at http://localhost:8765
[smoke] Task app started, will use URL: http://localhost:8765
[mock-rl] server ready http://127.0.0.1:52134 backend=openai
>> POST /rollout run_id=smoke-abc123...
rollout[0:0] episodes=1 steps=20 mean_return=1.2500
✓ Smoke rollouts complete
successes=1/1 total_steps=20 v3_traces=1/1 nonzero_returns=1/1
[smoke] Stopping sqld...
[smoke] Stopping task_app...
[smoke] Background services stopped
```

## Next Steps

Once the smoke tests pass:
1. Train your model: `uv run synth-ai train --type rl --config configs/your_config.toml`
2. Check traces: look in the `./traces/` directory
3. Monitor training: use the Synth dashboard

## Full Config Reference

See [`configs/smoke_test.toml`](configs/smoke_test.toml) for a complete example.

See the [CLI Smoke Documentation](https://docs.usesynth.ai/cli/smoke) for all options.

@@ -0,0 +1,253 @@

# Smoke Test Implementation - Complete

## Summary

The smoke test now provides **complete visibility into RL training rollouts**, including:

✅ **Auto-start background services** (sqld, task app)
✅ **Real OpenAI inference** with gpt-4o-mini
✅ **Tool call display** - see every action the policy takes
✅ **Trace validation** - verify the v3 trace format
✅ **Clean output** - all diagnostic noise suppressed

## Quick Start

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**Output shows:**
- Service startup (sqld, task app)
- Real-time inference requests
- **All 10 tool calls with arguments** (e.g., `interact_many({"actions":["move_up","move_up"]})`)
- Rollout metrics (steps, returns, rewards)
- Success validation

## Documentation

All documentation has been updated for future agents:

### 1. User Documentation
- **`SMOKE_TESTING.md`** - How to run smoke tests and what to expect
- **`configs/smoke_test.toml`** - Well-commented example configuration
- **`monorepo/docs/cli/smoke.mdx`** - Mintlify CLI documentation

### 2. Developer Documentation
- **`ARCHITECTURE.md`** - Internal architecture and troubleshooting guide
- **`synth_ai/cli/commands/smoke/core.py`** - Extensive inline comments explaining tool call extraction

### 3. Code Comments

**Tool Call Extraction (core.py lines 946-997):**
```python
# Extract and display tool calls from the v3 trace
#
# IMPORTANT: Tool calls are extracted from the structured v3 trace format.
# The trace must be requested with return_trace=True for this to work.
#
# Trace structure:
#   trace.event_history[]                     - list of events (policy calls, env steps)
#     └─ event.call_records[]                 - LLM calls made during this event
#        └─ call_record.output_tool_calls[]   - tool calls from the LLM response
#           ├─ tool_call.name                 - function name (e.g., "interact_many")
#           └─ tool_call.arguments_json       - JSON string of arguments
```
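The traversal described in these comments can be sketched as a small walk over the trace. The dict shape below is an assumption for illustration; the real v3 trace objects live in synth_ai's tracing code:

```python
import json

def extract_tool_calls(trace: dict) -> list[str]:
    """Walk a v3-style trace and format every tool call for display.

    Assumes the nesting described above:
    event_history -> call_records -> output_tool_calls.
    """
    lines = []
    for event in trace.get("event_history", []):
        for record in event.get("call_records", []):
            for call in record.get("output_tool_calls", []):
                name = call.get("name", "<unknown>")
                raw = call.get("arguments_json", "{}")
                # Compact the JSON arguments for one-line display
                args = json.dumps(json.loads(raw), separators=(",", ":"))
                lines.append(f"{name}({args})")
    return lines

# Minimal example trace with one policy event and one tool call
trace = {
    "event_history": [
        {"call_records": [
            {"output_tool_calls": [
                {"name": "interact_many",
                 "arguments_json": '{"actions": ["move_up", "move_up"]}'}
            ]}
        ]}
    ]
}
print(extract_tool_calls(trace))
# ['interact_many({"actions":["move_up","move_up"]})']
```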

## Key Implementation Details

### Tool Call Display

**Requirements:**
1. `return_trace = true` in the config (CRITICAL - without this, no tool calls)
2. v3 trace format (`trace_format="structured"`)
3. Mock proxy or real inference (direct API calls don't populate traces correctly)

**Data Flow:**
```
1. Rollout request with return_trace=True
   ↓
2. Task app makes LLM calls, captures responses
   ↓
3. LLM responses include tool_calls
   ↓
4. Task app stores call_records in event_history
   ↓
5. Smoke command extracts from trace.event_history[].call_records[].output_tool_calls[]
   ↓
6. Display: TOOL_CALL[N]: function_name({...args})
```

### Diagnostic Suppression

**Permanently disabled (commented out, not deleted):**
- `synth_ai/tracing_v3/config.py:21` - `[TRACING_V3_CONFIG_LOADED]`
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py` - all `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py` - all `[PATCH]` messages
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py` - all `[PATCH]` messages

**Why commented out rather than deleted?**
- Preserves context for debugging
- Shows what messages existed
- Easy to re-enable if needed

### Background Service Management

**Task App:**
- Runs from the synth-ai root (required for discovery)
- Uses `nohup` for detachment
- Output → `nohup_task_app.out`
- Health check accepts 200 or 400 (400 means the server is up but auth is failing)
- 120s timeout with progress updates

**sqld:**
- Dual ports: 8080 (Hrana WebSocket), 8081 (HTTP)
- Health check: `GET http://127.0.0.1:8081/health`
- 30s timeout
- Auto-cleanup of existing processes

## Configuration Reference

### Critical Settings

```toml
[smoke]
# Auto-start services
task_app_name = "grpo-crafter"   # Task app to serve
task_app_port = 8765
task_app_env_file = ".env"       # Required for this app
sqld_auto_start = true

# Inference - real OpenAI
model = "gpt-4o-mini"            # Actual model used
mock_backend = "openai"          # Route through the OpenAI API
use_mock = true                  # Enable the mock proxy

# CRITICAL for tool call display
return_trace = true              # Must be true!
```

### Optional Settings

All `[smoke]` parameters are optional; CLI args override TOML values:

```bash
# Override max steps
uv run synth-ai smoke --config configs/smoke_test.toml --max-steps 5

# Use a different model
uv run synth-ai smoke --config configs/smoke_test.toml --model gpt-4o

# Disable the mock proxy (direct API calls won't show tool calls properly)
uv run synth-ai smoke --config configs/smoke_test.toml --no-mock
```
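The precedence rule (CLI args override TOML values) amounts to a merge in which only explicitly-provided CLI options win. A minimal sketch, with option names assumed for illustration:

```python
def merge_settings(toml_values: dict, cli_values: dict) -> dict:
    """CLI values that were actually provided (not None) override TOML values."""
    merged = dict(toml_values)
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged

toml_values = {"max_steps": 10, "model": "gpt-4o-mini", "return_trace": True}
cli_values = {"max_steps": 5, "model": None}  # only --max-steps was passed

print(merge_settings(toml_values, cli_values))
# {'max_steps': 5, 'model': 'gpt-4o-mini', 'return_trace': True}
```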

## Troubleshooting

### No tool calls displayed

**Symptom:** `⚠ No tool calls found in trace`

**Solutions:**
1. Verify `return_trace = true` in the config
2. Check for `v3_traces=1/1` in the output (should match successes)
3. Ensure `use_mock = true` or that you are routing through the mock proxy
4. Check the task app logs: `cat /path/to/synth-ai/nohup_task_app.out`

### Task app exits immediately

**Symptom:** `0 steps`, process not running

**Solutions:**
1. Verify the task app name: `synth-ai task-app list`
2. Check that the `.env` file exists at the `task_app_env_file` path
3. Ensure you are running from the correct directory
4. Test manually: `cd /synth-ai && uvx synth-ai task-app serve grpo-crafter --port 8765 --env-file /path/.env --force`

### Port conflicts

**Symptom:** `Address already in use`

**Solution:** Auto-cleanup should handle this, but you can clean up manually:
```bash
lsof -ti :8080 | xargs kill -9
lsof -ti :8081 | xargs kill -9
lsof -ti :8765 | xargs kill -9
```

## Testing

### Unit Tests

- `tests/unit/test_train_validation.py::test_rl_config_with_smoke_section` - validates `[smoke]` section parsing
- `tests/unit/test_smoke_config.py` - comprehensive Pydantic validation tests

### Integration Test

```bash
cd examples/blog_posts/warming_up_to_rl
uv run synth-ai smoke --config configs/smoke_test.toml
```

**Expected result:**
- ✅ Services start successfully
- ✅ 10 tool calls displayed
- ✅ `v3_traces=1/1`
- ✅ `successes=1/1`
- ✅ `nonzero_returns=1/1`

## Files Modified

### Core Implementation
- `synth_ai/cli/commands/smoke/core.py` - tool call extraction, auto-start logic
- `synth_ai/api/train/configs/rl.py` - `SmokeConfig` Pydantic model
- `synth_ai/api/train/builders.py` - removes `[smoke]` before sending the config to the trainer
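
Stripping `[smoke]` before the config reaches the trainer is conceptually a one-key removal from the parsed TOML. A sketch of the idea (the real logic lives in `builders.py` and may differ):

```python
def strip_smoke_section(config: dict) -> dict:
    """Return a copy of the parsed config without the [smoke] table.

    The trainer backend does not know about [smoke]; it is consumed
    entirely by the local `synth-ai smoke` command.
    """
    return {key: value for key, value in config.items() if key != "smoke"}

config = {
    "algorithm": {"type": "online"},
    "smoke": {"max_steps": 10, "return_trace": True},
}
print(sorted(strip_smoke_section(config)))
# ['algorithm']
```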

### Diagnostic Suppression
- `synth_ai/tracing_v3/config.py` - commented out `[TRACING_V3_CONFIG_LOADED]`
- `synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py` - commented out `[PATCH]`
- `synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py` - commented out `[PATCH]`
- `synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py` - commented out `[PATCH]`

### Documentation
- `examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md` - user guide
- `examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md` - developer guide
- `examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml` - example config
- `examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml` - inline docs
- `monorepo/docs/cli/smoke.mdx` - Mintlify CLI reference

### Tests
- `tests/unit/test_train_validation.py` - added smoke section test
- `tests/unit/test_smoke_config.py` - comprehensive smoke config tests

## Future Improvements

Ideas for future agents:

1. **Streaming display** - show tool calls as they happen, not just at the end
2. **Tool call validation** - verify the format matches environment expectations
3. **Performance metrics** - track inference latency per call
4. **Cost tracking** - display OpenAI API costs
5. **Parallel rollouts** - support concurrent execution testing
6. **Vision support** - save observations for vision-based tasks
7. **Interactive mode** - step through a rollout one action at a time
8. **Replay mode** - re-run saved traces for debugging

## Success Criteria Met

✅ **Tool calls visible** - all 10 calls displayed with arguments
✅ **Real inference** - OpenAI gpt-4o-mini executing actual tool calls
✅ **Clean output** - no diagnostic noise
✅ **Auto-start** - background services managed automatically
✅ **Well documented** - comprehensive docs for users and developers
✅ **Robust** - error handling, health checks, timeouts
✅ **Tested** - unit tests and a working integration test

## Contact

For questions or issues, see:
- Architecture details: `ARCHITECTURE.md`
- User guide: `SMOKE_TESTING.md`
- CLI reference: `monorepo/docs/cli/smoke.mdx`