synth-ai 0.2.16__py3-none-any.whl → 0.2.19__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of synth-ai might be problematic.

Files changed (299)
  1. examples/analyze_semantic_words.sh +2 -2
  2. examples/baseline/banking77_baseline.py +204 -0
  3. examples/baseline/crafter_baseline.py +407 -0
  4. examples/baseline/pokemon_red_baseline.py +326 -0
  5. examples/baseline/simple_baseline.py +56 -0
  6. examples/baseline/warming_up_to_rl_baseline.py +239 -0
  7. examples/blog_posts/gepa/README.md +355 -0
  8. examples/blog_posts/gepa/configs/banking77_gepa_local.toml +95 -0
  9. examples/blog_posts/gepa/configs/banking77_gepa_test.toml +82 -0
  10. examples/blog_posts/gepa/configs/banking77_mipro_local.toml +52 -0
  11. examples/blog_posts/gepa/configs/hotpotqa_gepa_local.toml +59 -0
  12. examples/blog_posts/gepa/configs/hotpotqa_gepa_qwen.toml +36 -0
  13. examples/blog_posts/gepa/configs/hotpotqa_mipro_local.toml +53 -0
  14. examples/blog_posts/gepa/configs/hover_gepa_local.toml +59 -0
  15. examples/blog_posts/gepa/configs/hover_gepa_qwen.toml +36 -0
  16. examples/blog_posts/gepa/configs/hover_mipro_local.toml +53 -0
  17. examples/blog_posts/gepa/configs/ifbench_gepa_local.toml +59 -0
  18. examples/blog_posts/gepa/configs/ifbench_gepa_qwen.toml +36 -0
  19. examples/blog_posts/gepa/configs/ifbench_mipro_local.toml +53 -0
  20. examples/blog_posts/gepa/configs/pupa_gepa_local.toml +60 -0
  21. examples/blog_posts/gepa/configs/pupa_mipro_local.toml +54 -0
  22. examples/blog_posts/gepa/deploy_banking77_task_app.sh +41 -0
  23. examples/blog_posts/gepa/gepa_baseline.py +204 -0
  24. examples/blog_posts/gepa/query_prompts_example.py +97 -0
  25. examples/blog_posts/gepa/run_gepa_banking77.sh +87 -0
  26. examples/blog_posts/gepa/task_apps.py +105 -0
  27. examples/blog_posts/gepa/test_gepa_local.sh +67 -0
  28. examples/blog_posts/gepa/verify_banking77_setup.sh +123 -0
  29. examples/blog_posts/pokemon_vl/README.md +98 -0
  30. examples/blog_posts/pokemon_vl/configs/eval_gpt5nano.toml +26 -0
  31. examples/blog_posts/pokemon_vl/configs/eval_qwen3_vl.toml +27 -0
  32. examples/blog_posts/pokemon_vl/configs/eval_rl_final.toml +24 -0
  33. examples/blog_posts/pokemon_vl/configs/filter_high_reward.toml +10 -0
  34. examples/blog_posts/pokemon_vl/configs/train_rl_from_sft.toml +43 -0
  35. examples/blog_posts/pokemon_vl/configs/train_sft_qwen4b_vl.toml +40 -0
  36. examples/blog_posts/pokemon_vl/extract_images.py +239 -0
  37. examples/blog_posts/pokemon_vl/pokemon_vl_baseline.py +326 -0
  38. examples/blog_posts/pokemon_vl/run_eval_extract_images.py +209 -0
  39. examples/blog_posts/pokemon_vl/run_qwen_eval_extract_images.py +212 -0
  40. examples/blog_posts/pokemon_vl/text_box_analysis.md +106 -0
  41. examples/blog_posts/warming_up_to_rl/ARCHITECTURE.md +195 -0
  42. examples/blog_posts/warming_up_to_rl/FINAL_TEST_RESULTS.md +127 -0
  43. examples/blog_posts/warming_up_to_rl/INFERENCE_SUCCESS.md +132 -0
  44. examples/blog_posts/warming_up_to_rl/README.md +158 -0
  45. examples/blog_posts/warming_up_to_rl/SMOKE_TESTING.md +164 -0
  46. examples/blog_posts/warming_up_to_rl/SMOKE_TEST_COMPLETE.md +253 -0
  47. examples/blog_posts/warming_up_to_rl/configs/eval_baseline_qwen32b_10x20.toml +25 -0
  48. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b.toml +25 -0
  49. examples/blog_posts/warming_up_to_rl/configs/eval_ft_qwen4b_10x20.toml +26 -0
  50. examples/blog_posts/warming_up_to_rl/configs/eval_groq_qwen32b.toml +25 -0
  51. examples/blog_posts/warming_up_to_rl/configs/eval_openai_gpt_oss_120b.toml +29 -0
  52. examples/blog_posts/warming_up_to_rl/configs/filter_high_reward_dataset.toml +10 -0
  53. examples/blog_posts/warming_up_to_rl/configs/smoke_test.toml +75 -0
  54. examples/blog_posts/warming_up_to_rl/configs/train_rl_from_sft.toml +91 -0
  55. examples/blog_posts/warming_up_to_rl/configs/train_sft_qwen4b.toml +40 -0
  56. examples/blog_posts/warming_up_to_rl/warming_up_to_rl_baseline.py +187 -0
  57. examples/dev/qwen3_32b_qlora_4xh100.toml +5 -0
  58. examples/multi_step/configs/VERILOG_REWARDS.md +4 -0
  59. examples/multi_step/configs/VERILOG_RL_CHECKLIST.md +4 -0
  60. examples/multi_step/configs/crafter_rl_outcome.toml +2 -1
  61. examples/multi_step/configs/crafter_rl_stepwise_hosted_judge.toml +65 -107
  62. examples/multi_step/configs/crafter_rl_stepwise_shaped.toml +2 -1
  63. examples/multi_step/configs/crafter_rl_stepwise_simple.toml +2 -1
  64. examples/multi_step/configs/crafter_rl_stepwise_simple_NEW_FORMAT.toml +105 -0
  65. examples/multi_step/configs/verilog_rl_lora.toml +80 -123
  66. examples/qwen_coder/configs/coder_lora_30b.toml +1 -3
  67. examples/qwen_coder/configs/coder_lora_4b.toml +4 -1
  68. examples/qwen_coder/configs/coder_lora_small.toml +1 -3
  69. examples/qwen_vl/README.md +10 -12
  70. examples/qwen_vl/SETUP_COMPLETE.md +7 -8
  71. examples/qwen_vl/VISION_TESTS_COMPLETE.md +2 -3
  72. examples/qwen_vl/collect_data_via_cli.md +76 -84
  73. examples/qwen_vl/collect_vision_traces.py +4 -4
  74. examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml +40 -57
  75. examples/qwen_vl/configs/crafter_vlm_sft_example.toml +1 -2
  76. examples/qwen_vl/configs/eval_gpt4o_mini_vision.toml +20 -37
  77. examples/qwen_vl/configs/eval_gpt5nano_vision.toml +21 -40
  78. examples/qwen_vl/configs/eval_qwen3vl_vision.toml +26 -0
  79. examples/qwen_vl/configs/{filter_qwen2vl_sft.toml → filter_qwen3vl_sft.toml} +4 -5
  80. examples/qwen_vl/configs/filter_vision_sft.toml +2 -3
  81. examples/qwen_vl/crafter_qwen_vl_agent.py +5 -5
  82. examples/qwen_vl/run_vision_comparison.sh +6 -7
  83. examples/rl/README.md +5 -5
  84. examples/rl/configs/rl_from_base_qwen.toml +26 -1
  85. examples/rl/configs/rl_from_base_qwen17.toml +6 -2
  86. examples/rl/task_app/README.md +1 -2
  87. examples/rl/task_app/math_single_step.py +2 -2
  88. examples/run_crafter_demo.sh +2 -2
  89. examples/sft/README.md +1 -1
  90. examples/sft/configs/crafter_fft_qwen0p6b.toml +4 -1
  91. examples/sft/configs/crafter_lora_qwen0p6b.toml +4 -1
  92. examples/swe/task_app/README.md +32 -2
  93. examples/swe/task_app/grpo_swe_mini.py +4 -0
  94. examples/swe/task_app/hosted/envs/crafter/react_agent.py +1 -1
  95. examples/swe/task_app/hosted/envs/mini_swe/environment.py +37 -10
  96. examples/swe/task_app/hosted/inference/openai_client.py +4 -38
  97. examples/swe/task_app/hosted/policy_routes.py +17 -0
  98. examples/swe/task_app/hosted/rollout.py +4 -2
  99. examples/swe/task_app/morph_backend.py +178 -0
  100. examples/task_apps/banking77/__init__.py +6 -0
  101. examples/task_apps/banking77/banking77_task_app.py +841 -0
  102. examples/task_apps/banking77/deploy_wrapper.py +46 -0
  103. examples/task_apps/crafter/CREATE_SFT_DATASET.md +4 -0
  104. examples/task_apps/crafter/FILTER_COMMAND_STATUS.md +4 -0
  105. examples/task_apps/crafter/FILTER_COMMAND_SUCCESS.md +4 -0
  106. examples/task_apps/crafter/task_app/README.md +1 -1
  107. examples/task_apps/crafter/task_app/grpo_crafter.py +90 -5
  108. examples/task_apps/crafter/task_app/grpo_crafter_task_app.py +1 -1
  109. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py +4 -26
  110. examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -2
  111. examples/task_apps/crafter/task_app/synth_envs_hosted/hosted_app.py +49 -0
  112. examples/task_apps/crafter/task_app/synth_envs_hosted/inference/openai_client.py +372 -107
  113. examples/task_apps/crafter/task_app/synth_envs_hosted/policy_routes.py +81 -12
  114. examples/task_apps/crafter/task_app/synth_envs_hosted/rollout.py +82 -11
  115. examples/task_apps/crafter/task_app/synth_envs_hosted/utils.py +194 -1
  116. examples/task_apps/enron/task_app/grpo_enron_task_app.py +1 -1
  117. examples/task_apps/gepa_benchmarks/__init__.py +7 -0
  118. examples/task_apps/gepa_benchmarks/common.py +260 -0
  119. examples/task_apps/gepa_benchmarks/hotpotqa_task_app.py +507 -0
  120. examples/task_apps/gepa_benchmarks/hover_task_app.py +436 -0
  121. examples/task_apps/gepa_benchmarks/ifbench_task_app.py +563 -0
  122. examples/task_apps/gepa_benchmarks/pupa_task_app.py +460 -0
  123. examples/task_apps/math/README.md +1 -2
  124. examples/task_apps/pokemon_red/README.md +3 -4
  125. examples/task_apps/pokemon_red/README_IMAGE_ONLY_EVAL.md +4 -0
  126. examples/task_apps/pokemon_red/eval_image_only_gpt4o.toml +6 -5
  127. examples/task_apps/pokemon_red/eval_pokemon_red_policy.py +1 -2
  128. examples/task_apps/pokemon_red/task_app.py +288 -39
  129. examples/task_apps/sokoban/README.md +2 -3
  130. examples/task_apps/verilog/eval_groq_qwen32b.toml +12 -14
  131. examples/task_apps/verilog/task_app/grpo_verilog_task_app.py +1 -1
  132. examples/vlm/configs/crafter_vlm_gpt4o.toml +4 -1
  133. examples/warming_up_to_rl/configs/crafter_fft.toml +4 -1
  134. examples/warming_up_to_rl/configs/crafter_fft_4b.toml +0 -2
  135. examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml +3 -2
  136. examples/warming_up_to_rl/run_local_rollout_traced.py +1 -1
  137. examples/warming_up_to_rl/task_app/README.md +1 -1
  138. examples/warming_up_to_rl/task_app/grpo_crafter.py +185 -5
  139. examples/warming_up_to_rl/task_app/grpo_crafter_task_app.py +1 -1
  140. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/policy.py +3 -27
  141. examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/react_agent.py +1 -1
  142. examples/warming_up_to_rl/task_app/synth_envs_hosted/hosted_app.py +49 -0
  143. examples/warming_up_to_rl/task_app/synth_envs_hosted/inference/openai_client.py +156 -45
  144. examples/warming_up_to_rl/task_app/synth_envs_hosted/policy_routes.py +37 -4
  145. examples/warming_up_to_rl/task_app/synth_envs_hosted/rollout.py +33 -3
  146. examples/warming_up_to_rl/task_app/synth_envs_hosted/utils.py +67 -0
  147. examples/workflows/math_rl/configs/rl_from_base_qwen.toml +27 -0
  148. examples/workflows/math_rl/configs/rl_from_base_qwen17.toml +6 -0
  149. synth_ai/api/train/builders.py +99 -4
  150. synth_ai/api/train/cli.py +516 -26
  151. synth_ai/api/train/config_finder.py +13 -2
  152. synth_ai/api/train/configs/__init__.py +23 -2
  153. synth_ai/api/train/configs/prompt_learning.py +442 -0
  154. synth_ai/api/train/configs/rl.py +61 -7
  155. synth_ai/api/train/configs/sft.py +6 -2
  156. synth_ai/api/train/configs/shared.py +59 -2
  157. synth_ai/api/train/task_app.py +1 -1
  158. synth_ai/api/train/validators.py +277 -0
  159. synth_ai/auth/credentials.py +119 -0
  160. synth_ai/baseline/__init__.py +25 -0
  161. synth_ai/baseline/config.py +209 -0
  162. synth_ai/baseline/discovery.py +214 -0
  163. synth_ai/baseline/execution.py +146 -0
  164. synth_ai/cli/__init__.py +94 -18
  165. synth_ai/cli/__main__.py +0 -0
  166. synth_ai/cli/claude.py +70 -0
  167. synth_ai/cli/codex.py +84 -0
  168. synth_ai/cli/commands/__init__.py +18 -0
  169. synth_ai/cli/commands/baseline/__init__.py +12 -0
  170. synth_ai/cli/commands/baseline/core.py +637 -0
  171. synth_ai/cli/commands/baseline/list.py +93 -0
  172. synth_ai/cli/commands/demo/__init__.py +6 -0
  173. synth_ai/cli/commands/demo/core.py +163 -0
  174. synth_ai/cli/commands/eval/__init__.py +19 -0
  175. synth_ai/cli/commands/eval/core.py +1112 -0
  176. synth_ai/cli/commands/eval/errors.py +81 -0
  177. synth_ai/cli/commands/eval/validation.py +133 -0
  178. synth_ai/cli/commands/filter/__init__.py +12 -0
  179. synth_ai/cli/commands/filter/core.py +424 -0
  180. synth_ai/cli/commands/filter/errors.py +55 -0
  181. synth_ai/cli/commands/filter/validation.py +77 -0
  182. synth_ai/cli/commands/help/__init__.py +177 -0
  183. synth_ai/cli/commands/help/core.py +72 -0
  184. synth_ai/cli/commands/smoke/__init__.py +7 -0
  185. synth_ai/cli/commands/smoke/core.py +1436 -0
  186. synth_ai/cli/commands/status/__init__.py +64 -0
  187. synth_ai/cli/commands/status/client.py +192 -0
  188. synth_ai/cli/commands/status/config.py +92 -0
  189. synth_ai/cli/commands/status/errors.py +20 -0
  190. synth_ai/cli/commands/status/formatters.py +164 -0
  191. synth_ai/cli/commands/status/subcommands/__init__.py +9 -0
  192. synth_ai/cli/commands/status/subcommands/files.py +79 -0
  193. synth_ai/cli/commands/status/subcommands/jobs.py +334 -0
  194. synth_ai/cli/commands/status/subcommands/models.py +79 -0
  195. synth_ai/cli/commands/status/subcommands/pricing.py +22 -0
  196. synth_ai/cli/commands/status/subcommands/runs.py +81 -0
  197. synth_ai/cli/commands/status/subcommands/summary.py +47 -0
  198. synth_ai/cli/commands/status/subcommands/usage.py +203 -0
  199. synth_ai/cli/commands/status/utils.py +114 -0
  200. synth_ai/cli/commands/train/__init__.py +53 -0
  201. synth_ai/cli/commands/train/core.py +21 -0
  202. synth_ai/cli/commands/train/errors.py +117 -0
  203. synth_ai/cli/commands/train/judge_schemas.py +200 -0
  204. synth_ai/cli/commands/train/judge_validation.py +305 -0
  205. synth_ai/cli/commands/train/validation.py +386 -0
  206. synth_ai/cli/demo.py +30 -158
  207. synth_ai/cli/deploy/__init__.py +43 -0
  208. synth_ai/cli/deploy.py +162 -0
  209. synth_ai/cli/eval/__init__.py +36 -0
  210. synth_ai/cli/eval/core.py +5 -0
  211. synth_ai/cli/eval/errors.py +31 -0
  212. synth_ai/cli/eval/validation.py +5 -0
  213. synth_ai/cli/filter/__init__.py +28 -0
  214. synth_ai/cli/filter/core.py +5 -0
  215. synth_ai/cli/filter/errors.py +23 -0
  216. synth_ai/cli/filter/validation.py +5 -0
  217. synth_ai/cli/legacy_root_backup.py +14 -8
  218. synth_ai/cli/modal_serve/__init__.py +12 -0
  219. synth_ai/cli/modal_serve/core.py +14 -0
  220. synth_ai/cli/modal_serve/errors.py +8 -0
  221. synth_ai/cli/modal_serve/validation.py +11 -0
  222. synth_ai/cli/opencode.py +107 -0
  223. synth_ai/cli/root.py +9 -5
  224. synth_ai/cli/serve/__init__.py +12 -0
  225. synth_ai/cli/serve/core.py +14 -0
  226. synth_ai/cli/serve/errors.py +8 -0
  227. synth_ai/cli/serve/validation.py +11 -0
  228. synth_ai/cli/setup.py +20 -265
  229. synth_ai/cli/status.py +7 -126
  230. synth_ai/cli/task_app_deploy.py +1 -10
  231. synth_ai/cli/task_app_modal_serve.py +4 -9
  232. synth_ai/cli/task_app_serve.py +4 -11
  233. synth_ai/cli/task_apps.py +51 -1480
  234. synth_ai/cli/train/__init__.py +12 -0
  235. synth_ai/cli/train/core.py +21 -0
  236. synth_ai/cli/train/errors.py +8 -0
  237. synth_ai/cli/train/validation.py +24 -0
  238. synth_ai/cli/train.py +1 -14
  239. synth_ai/demos/crafter/grpo_crafter_task_app.py +1 -1
  240. synth_ai/demos/demo_task_apps/crafter/grpo_crafter_task_app.py +1 -1
  241. synth_ai/environments/examples/crafter_classic/engine_deterministic_patch.py +7 -4
  242. synth_ai/environments/examples/crafter_classic/engine_serialization_patch_v3.py +9 -5
  243. synth_ai/environments/examples/crafter_classic/world_config_patch_simple.py +4 -3
  244. synth_ai/environments/examples/red/engine.py +33 -12
  245. synth_ai/environments/examples/red/engine_helpers/reward_components.py +151 -179
  246. synth_ai/environments/examples/red/environment.py +26 -0
  247. synth_ai/environments/examples/red/trace_hooks_v3.py +168 -0
  248. synth_ai/http.py +12 -0
  249. synth_ai/judge_schemas.py +10 -10
  250. synth_ai/learning/__init__.py +10 -0
  251. synth_ai/learning/prompt_learning_client.py +276 -0
  252. synth_ai/learning/prompt_learning_types.py +184 -0
  253. synth_ai/learning/rl/client.py +3 -1
  254. synth_ai/pricing/__init__.py +2 -0
  255. synth_ai/pricing/model_pricing.py +57 -0
  256. synth_ai/streaming/__init__.py +29 -0
  257. synth_ai/streaming/config.py +94 -0
  258. synth_ai/streaming/handlers.py +518 -0
  259. synth_ai/streaming/streamer.py +320 -0
  260. synth_ai/streaming/types.py +95 -0
  261. synth_ai/task/apps/__init__.py +1 -0
  262. synth_ai/task/config.py +2 -0
  263. synth_ai/task/tracing_utils.py +25 -25
  264. synth_ai/task/validators.py +45 -9
  265. synth_ai/task_app_cfgs.py +21 -0
  266. synth_ai/tracing_v3/config.py +162 -19
  267. synth_ai/tracing_v3/constants.py +1 -1
  268. synth_ai/tracing_v3/db_config.py +24 -38
  269. synth_ai/tracing_v3/migration_helper.py +1 -2
  270. synth_ai/tracing_v3/storage/config.py +47 -13
  271. synth_ai/tracing_v3/storage/factory.py +3 -3
  272. synth_ai/tracing_v3/turso/daemon.py +113 -11
  273. synth_ai/tracing_v3/turso/native_manager.py +92 -16
  274. synth_ai/types.py +8 -0
  275. synth_ai/urls.py +11 -0
  276. synth_ai/utils/__init__.py +30 -1
  277. synth_ai/utils/agents.py +74 -0
  278. synth_ai/utils/bin.py +39 -0
  279. synth_ai/utils/cli.py +149 -5
  280. synth_ai/utils/env.py +40 -33
  281. synth_ai/utils/http.py +4 -1
  282. synth_ai/utils/json.py +72 -0
  283. synth_ai/utils/modal.py +285 -3
  284. synth_ai/utils/paths.py +48 -0
  285. synth_ai/utils/uvicorn.py +113 -0
  286. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/METADATA +109 -6
  287. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/RECORD +291 -142
  288. examples/qwen_vl/configs/eval_qwen2vl_vision.toml +0 -44
  289. synth_ai/cli/tui.py +0 -62
  290. synth_ai/tui/__init__.py +0 -5
  291. synth_ai/tui/__main__.py +0 -13
  292. synth_ai/tui/cli/__init__.py +0 -1
  293. synth_ai/tui/cli/query_experiments.py +0 -164
  294. synth_ai/tui/cli/query_experiments_v3.py +0 -164
  295. synth_ai/tui/dashboard.py +0 -911
  296. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/WHEEL +0 -0
  297. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/entry_points.txt +0 -0
  298. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/licenses/LICENSE +0 -0
  299. {synth_ai-0.2.16.dist-info → synth_ai-0.2.19.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,113 @@
+ import importlib.util as import_util
+ import os
+ import sys
+ from pathlib import Path
+ from typing import Any
+
+ from synth_ai.task_app_cfgs import LocalTaskAppConfig
+ from synth_ai.utils.env import resolve_env_var
+
+ REPO_ROOT = Path(__file__).resolve().parents[2]
+ START_DIV = f"{'-' * 30} Uvicorn start {'-' * 30}"
+ END_DIV = f"{'-' * 31} Uvicorn end {'-' * 31}"
+
+ _ASGI_FACTORY_NAMES = (
+     "fastapi_app",
+     "create_app",
+     "build_app",
+     "configure_app",
+     "get_app",
+     "app_factory",
+ )
+
+
+ def _coerce_asgi_app(candidate: Any) -> Any | None:
+     if candidate is None:
+         return None
+     if callable(candidate):
+         return candidate
+     return None
+
+
+ def deploy_uvicorn_app(cfg: LocalTaskAppConfig) -> None:
+     task_app_path = cfg.task_app_path.resolve()
+
+     env_key = resolve_env_var("ENVIRONMENT_API_KEY")
+     if not env_key:
+         raise RuntimeError("ENVIRONMENT_API_KEY is required to serve locally.")
+
+     if cfg.trace:
+         os.environ["TASKAPP_TRACING_ENABLED"] = "1"
+     else:
+         os.environ.pop("TASKAPP_TRACING_ENABLED", None)
+
+     task_app_dir = task_app_path.parent.resolve()
+     candidates: list[Path] = [task_app_dir]
+     if (task_app_dir / "__init__.py").exists():
+         candidates.append(task_app_dir.parent.resolve())
+     candidates.append(REPO_ROOT)
+
+     unique: list[str] = []
+     for candidate in candidates:
+         candidate_str = str(candidate)
+         if candidate_str and candidate_str not in unique:
+             unique.append(candidate_str)
+
+     existing = os.environ.get("PYTHONPATH")
+     if existing:
+         for segment in existing.split(os.pathsep):
+             if segment and segment not in unique:
+                 unique.append(segment)
+
+     os.environ["PYTHONPATH"] = os.pathsep.join(unique)
+     for entry in reversed(unique):
+         if entry and entry not in sys.path:
+             sys.path.insert(0, entry)
+
+     module_name = f"_synth_local_task_app_{task_app_path.stem}"
+     spec = import_util.spec_from_file_location(module_name, str(task_app_path))
+     if spec is None or spec.loader is None:
+         raise RuntimeError(f"Unable to load task app at {task_app_path}")
+     module = import_util.module_from_spec(spec)
+     sys.modules[module_name] = module
+     try:
+         spec.loader.exec_module(module) # type: ignore[call-arg]
+     except Exception as exc:
+         raise RuntimeError(f"Failed to import task app: {exc}") from exc
+
+     app = _coerce_asgi_app(getattr(module, "app", None))
+     if app is None:
+         for name in _ASGI_FACTORY_NAMES:
+             factory = getattr(module, name, None)
+             if callable(factory):
+                 produced = factory()
+                 coerced = _coerce_asgi_app(produced)
+                 if coerced is not None:
+                     app = coerced
+                     break
+     if app is None:
+         raise RuntimeError("Task app must expose an ASGI application via `app = FastAPI(...)` or a callable factory.")
+
+     host = cfg.host
+     port = cfg.port
+     preview_host = "127.0.0.1" if host in {"0.0.0.0", "::"} else host
+     print(f"[uvicorn] Serving task app at http://{preview_host}:{port}")
+
+
+     # Deploy
+     try:
+         import uvicorn # type: ignore
+     except ImportError as exc:
+         raise RuntimeError(
+             "uvicorn is required to serve task apps locally. Install it with `pip install uvicorn`."
+         ) from exc
+
+     try:
+         print(START_DIV)
+         uvicorn.run(app, host=host, port=port, reload=False, log_level="info")
+     except KeyboardInterrupt:
+         print("\n[uvicorn] Stopped by user.")
+     except Exception as exc:
+         raise RuntimeError(f"uvicorn runtime failed: {exc}") from exc
+     finally:
+         print(END_DIV)
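The hunk above loads a task app from a file path and then looks for an ASGI application, first as a module-level `app` and then via a list of factory names. The following is a minimal standalone sketch of that lookup pattern, not the package's code: `load_module_from_path`, `find_asgi_app`, the shortened `FACTORY_NAMES` tuple, and the throwaway `demo_task_app.py` module are all hypothetical names introduced for illustration. It uses only the standard library, so no FastAPI or uvicorn install is assumed.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Any, Optional

# Hypothetical, shortened subset of the factory names listed in the diff.
FACTORY_NAMES = ("create_app", "build_app", "get_app")


def load_module_from_path(path: Path) -> Any:
    """Import a Python file by path under a private module name."""
    name = f"_example_task_app_{path.stem}"
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found during its own import.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def find_asgi_app(module: Any) -> Optional[Any]:
    """Return module.app if callable, else the first callable factory result."""
    app = getattr(module, "app", None)
    if callable(app):
        return app
    for name in FACTORY_NAMES:
        factory = getattr(module, name, None)
        if callable(factory):
            produced = factory()
            if callable(produced):
                return produced
    return None


# Demo: a throwaway module that exposes only a factory, no module-level `app`.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_task_app.py"
    demo_path.write_text(
        "def create_app():\n"
        "    async def app(scope, receive, send):\n"
        "        pass\n"
        "    return app\n"
    )
    demo_module = load_module_from_path(demo_path)
    discovered = find_asgi_app(demo_module)
    print(callable(discovered))
```

As in the hunk, the only check applied is `callable(...)`, so any callable passes as an "app"; a stricter check would be a design choice beyond what the diff shows.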
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: synth-ai
- Version: 0.2.16
+ Version: 0.2.19
  Summary: RL as a service SDK - Core AI functionality and tracing
  Author-email: Synth AI <josh@usesynth.ai>
  License-Expression: MIT
@@ -19,11 +19,12 @@ Requires-Dist: tqdm>=4.66.4
  Requires-Dist: jsonschema>=4.23.0
  Requires-Dist: backoff>=2.0.0
  Requires-Dist: typing_extensions>=4.0.0
+ Requires-Dist: rich>=13.9.0
  Requires-Dist: openai>=1.99.0
  Requires-Dist: anthropic>=0.42.0
  Requires-Dist: langfuse<3.0.0,>=2.53.9
- Requires-Dist: opentelemetry-api<1.27.0,>=1.26.0
- Requires-Dist: opentelemetry-sdk<1.27.0,>=1.26.0
+ Requires-Dist: opentelemetry-api>=1.26.0
+ Requires-Dist: opentelemetry-sdk>=1.26.0
  Requires-Dist: diskcache>=5.6.3
  Requires-Dist: groq>=0.30.0
  Requires-Dist: google-genai>=1.26.0
@@ -46,16 +47,16 @@ Requires-Dist: google-api-core>=2.25.1
  Requires-Dist: google-generativeai>=0.8.5
  Requires-Dist: crafter>=1.8.3
  Requires-Dist: click<8.2,>=8.1.7
- Requires-Dist: textual>=1.1.0
  Requires-Dist: openai-harmony>=0.0.1
  Requires-Dist: asyncpg>=0.30.0
  Requires-Dist: aiohttp>=3.8.0
  Requires-Dist: httpx>=0.28.1
  Requires-Dist: datasets>=4.0.0
  Requires-Dist: transformers>=4.56.1
- Requires-Dist: modal==1.1.4
+ Requires-Dist: modal<2.0.0,>=1.1.4
  Requires-Dist: pyboy>=2.6.0
  Requires-Dist: setuptools>=80.9.0
+ Requires-Dist: libsql-experimental>=0.0.55
  Provides-Extra: dev
  Requires-Dist: build>=1.2.2.post1; extra == "dev"
  Requires-Dist: twine>=4.0.0; extra == "dev"
@@ -71,9 +72,14 @@ Requires-Dist: ruff>=0.1.0; extra == "dev"
  Provides-Extra: research
  Requires-Dist: crafter>=1.8.3; extra == "research"
  Requires-Dist: datasets>=4.0.0; extra == "research"
+ Provides-Extra: swe
+ Requires-Dist: morphcloud>=0.1.3; extra == "swe"
+ Requires-Dist: swebench>=2.3.0; extra == "swe"
  Provides-Extra: all
  Requires-Dist: crafter>=1.8.3; extra == "all"
  Requires-Dist: datasets>=4.0.0; extra == "all"
+ Requires-Dist: morphcloud>=0.1.3; extra == "all"
+ Requires-Dist: swebench>=2.3.0; extra == "all"
  Provides-Extra: analytics
  Requires-Dist: pandas>=2.2.3; extra == "analytics"
  Dynamic: license-file
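The METADATA hunks above change several `Requires-Dist` entries. Because wheel METADATA uses an email-style header format, the standard library's `email.parser` can read it, which gives a quick way to list added and removed requirement strings between two versions. The sketch below is illustrative only: the two METADATA snippets are abbreviated excerpts assembled from lines visible in the hunks, and `requirements` is a hypothetical helper name.

```python
from email.parser import Parser

# Abbreviated excerpts built from lines shown in the diff, not the full files.
OLD_METADATA = """\
Metadata-Version: 2.4
Name: synth-ai
Version: 0.2.16
Requires-Dist: opentelemetry-api<1.27.0,>=1.26.0
Requires-Dist: textual>=1.1.0
Requires-Dist: modal==1.1.4
"""

NEW_METADATA = """\
Metadata-Version: 2.4
Name: synth-ai
Version: 0.2.19
Requires-Dist: rich>=13.9.0
Requires-Dist: opentelemetry-api>=1.26.0
Requires-Dist: modal<2.0.0,>=1.1.4
Requires-Dist: morphcloud>=0.1.3; extra == "swe"
"""


def requirements(metadata_text: str) -> set:
    """Return the set of raw Requires-Dist strings in a METADATA document."""
    message = Parser().parsestr(metadata_text)
    return set(message.get_all("Requires-Dist") or [])


old_reqs = requirements(OLD_METADATA)
new_reqs = requirements(NEW_METADATA)

# Raw string comparison: a changed specifier appears as one removal plus one addition.
added = sorted(new_reqs - old_reqs)
removed = sorted(old_reqs - new_reqs)

for line in removed:
    print(f"- {line}")
for line in added:
    print(f"+ {line}")
```

Comparing raw strings keeps the sketch dependency-free; grouping entries by package name to report "specifier changed" would need a requirement parser such as the third-party `packaging` library.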
@@ -92,7 +98,7 @@ Dynamic: license-file

  ---

- ## 🚀 Install
+ ## 🚀 Install version 0.2.16

  ```bash
  pip install synth-ai
@@ -113,6 +119,7 @@ uvx synth-ai setup
  uvx synth-ai demo
  uvx synth-ai deploy
  uvx synth-ai run
+ uvx synth-ai baseline # For coding agents: get baseline scores
  ```

  > Full quickstart: [https://docs.usesynth.ai/sdk/get-started](https://docs.usesynth.ai/sdk/get-started)
@@ -153,6 +160,102 @@ Synth-AI ships with a built-in RL example: training **Qwen3-0.6B** on math reaso

  ---

+ ## 🤖 For Coding Agents: Get Started with Baselines
+
+ **Baselines** are the fastest way for coding agents to evaluate changes and measure improvement on Synth tasks.
+
+ ### Why Use Baselines?
+
+ Baselines provide a **self-contained evaluation system** that:
+ - ✅ **No infrastructure required** — runs locally, no deployed task app needed
+ - ✅ **Quick feedback loop** — get task-by-task results in seconds
+ - ✅ **Compare changes** — establish a baseline score before making modifications
+ - ✅ **Auto-discoverable** — finds baseline files automatically in your codebase
+
+ ### Quick Start for Coding Agents
+
+ ```bash
+ # 1. List available baselines
+ uvx synth-ai baseline list
+
+ # 2. Run a quick 3-task baseline to get started
+ uvx synth-ai baseline banking77 --split train --seeds 0,1,2
+
+ # 3. Get your baseline score (full train split)
+ uvx synth-ai baseline banking77 --split train
+
+ # 4. Make your changes to the code...
+
+ # 5. Re-run to compare performance
+ uvx synth-ai baseline banking77 --split train --output results_after.json
+ ```
+
+ ### Available Baselines
+
+ ```bash
+ # Filter by task type
+ uvx synth-ai baseline list --tag rl # RL tasks
+ uvx synth-ai baseline list --tag nlp # NLP tasks
+ uvx synth-ai baseline list --tag vision # Vision tasks
+
+ # Run specific baselines
+ uvx synth-ai baseline warming_up_to_rl # Crafter survival game
+ uvx synth-ai baseline pokemon_vl # Pokemon Red (vision)
+ uvx synth-ai baseline gepa # Banking77 classification
+ ```
+
+ ### Baseline Results
+
+ Each baseline run provides:
+ - **Task-by-task results** — see exactly which seeds succeed/fail
+ - **Aggregate metrics** — success rate, mean/std rewards, total tasks
+ - **Serializable output** — save to JSON with `--output results.json`
+ - **Model comparison** — test different models with `--model`
+
+ Example output:
+ ```
+ ============================================================
+ Baseline Evaluation: Banking77 Intent Classification
+ ============================================================
+ Split(s): train
+ Tasks: 10
+ Success: 8/10
+ Execution time: 12.34s
+
+ Aggregate Metrics:
+ mean_outcome_reward: 0.8000
+ success_rate: 0.8000
+ total_tasks: 10
+ ```
+
+ ### Creating Custom Baselines
+
+ Coding agents can create new baseline files to test custom tasks:
+
+ ```python
+ # my_task_baseline.py
+ from synth_ai.baseline import BaselineConfig, BaselineTaskRunner, DataSplit, TaskResult
+
+ class MyTaskRunner(BaselineTaskRunner):
+     async def run_task(self, seed: int) -> TaskResult:
+         # Your task logic here
+         return TaskResult(...)
+
+ my_baseline = BaselineConfig(
+     baseline_id="my_task",
+     name="My Custom Task",
+     description="Evaluate my custom task",
+     task_runner=MyTaskRunner,
+     splits={
+         "train": DataSplit(name="train", seeds=list(range(10))),
+     },
+ )
+ ```
+
+ Place this file in `examples/baseline/` or name it `*_baseline.py` for auto-discovery.
+
+ ---
+
  ## 🔐 SDK → Dashboard Pairing

  When you run `uvx synth-ai setup` (or legacy `uvx synth-ai rl_demo setup`):
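The README hunk above describes a before/after workflow: record a baseline score, change the code, re-run, and compare. The diff does not show the layout of the file written by `--output`, so the sketch below sidesteps it and compares two in-memory dictionaries whose keys are taken from the "Aggregate Metrics" example output. The `after` values are made-up illustration data, and `compare_metrics` is a hypothetical helper, not part of synth-ai.

```python
from typing import Dict


def compare_metrics(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return after - before for every metric present in both runs."""
    shared = sorted(before.keys() & after.keys())
    return {name: after[name] - before[name] for name in shared}


# Keys mirror the example output in the README hunk; the `after` numbers are invented.
before = {"mean_outcome_reward": 0.8, "success_rate": 0.8, "total_tasks": 10}
after = {"mean_outcome_reward": 0.9, "success_rate": 0.9, "total_tasks": 10}

deltas = compare_metrics(before, after)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

To use this with real runs, the dictionaries would have to be loaded from the saved JSON files once their actual structure is known.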