logits-cookbook 0.1.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (276)
  1. logits_cookbook-0.1.0/.github/workflows/ci.yml +49 -0
  2. logits_cookbook-0.1.0/.github/workflows/publish.yml +87 -0
  3. logits_cookbook-0.1.0/.gitignore +8 -0
  4. logits_cookbook-0.1.0/AGENTS.md +135 -0
  5. logits_cookbook-0.1.0/CHANGELOG.md +5 -0
  6. logits_cookbook-0.1.0/CLAUDE.md +135 -0
  7. logits_cookbook-0.1.0/CONTRIBUTING.md +77 -0
  8. logits_cookbook-0.1.0/LICENSE +202 -0
  9. logits_cookbook-0.1.0/PKG-INFO +162 -0
  10. logits_cookbook-0.1.0/README.md +93 -0
  11. logits_cookbook-0.1.0/docs/api-reference/apifuture.md +151 -0
  12. logits_cookbook-0.1.0/docs/api-reference/exceptions.md +130 -0
  13. logits_cookbook-0.1.0/docs/api-reference/restclient.md +512 -0
  14. logits_cookbook-0.1.0/docs/api-reference/samplingclient.md +112 -0
  15. logits_cookbook-0.1.0/docs/api-reference/serviceclient.md +267 -0
  16. logits_cookbook-0.1.0/docs/api-reference/trainingclient.md +462 -0
  17. logits_cookbook-0.1.0/docs/api-reference/types.md +899 -0
  18. logits_cookbook-0.1.0/docs/async.mdx +58 -0
  19. logits_cookbook-0.1.0/docs/compatible-apis/openai.mdx +78 -0
  20. logits_cookbook-0.1.0/docs/completers.mdx +46 -0
  21. logits_cookbook-0.1.0/docs/dev-tips.mdx +5 -0
  22. logits_cookbook-0.1.0/docs/docs-outline.mdx +25 -0
  23. logits_cookbook-0.1.0/docs/download-weights.mdx +33 -0
  24. logits_cookbook-0.1.0/docs/evals.mdx +234 -0
  25. logits_cookbook-0.1.0/docs/index.mdx +37 -0
  26. logits_cookbook-0.1.0/docs/install.mdx +34 -0
  27. logits_cookbook-0.1.0/docs/lora-primer.mdx +55 -0
  28. logits_cookbook-0.1.0/docs/losses.mdx +308 -0
  29. logits_cookbook-0.1.0/docs/model-lineup.mdx +66 -0
  30. logits_cookbook-0.1.0/docs/overview-building.mdx +16 -0
  31. logits_cookbook-0.1.0/docs/preferences/dpo-guide.mdx +112 -0
  32. logits_cookbook-0.1.0/docs/preferences/rlhf-example.mdx +22 -0
  33. logits_cookbook-0.1.0/docs/preferences.mdx +15 -0
  34. logits_cookbook-0.1.0/docs/publish-weights.mdx +48 -0
  35. logits_cookbook-0.1.0/docs/rendering.mdx +241 -0
  36. logits_cookbook-0.1.0/docs/rl/images/rl_loop_reward.png +0 -0
  37. logits_cookbook-0.1.0/docs/rl/rl-basic.mdx +27 -0
  38. logits_cookbook-0.1.0/docs/rl/rl-envs.mdx +63 -0
  39. logits_cookbook-0.1.0/docs/rl/rl-hyperparams.mdx +83 -0
  40. logits_cookbook-0.1.0/docs/rl/rl-logging.mdx +98 -0
  41. logits_cookbook-0.1.0/docs/rl/rl-loops.mdx +25 -0
  42. logits_cookbook-0.1.0/docs/rl/sequence-extension.mdx +127 -0
  43. logits_cookbook-0.1.0/docs/rl.mdx +21 -0
  44. logits_cookbook-0.1.0/docs/save-load.mdx +56 -0
  45. logits_cookbook-0.1.0/docs/supervised-learning/images/lr_sweep.png +0 -0
  46. logits_cookbook-0.1.0/docs/supervised-learning/images/train_test_loss.png +0 -0
  47. logits_cookbook-0.1.0/docs/supervised-learning/prompt-distillation.mdx +88 -0
  48. logits_cookbook-0.1.0/docs/supervised-learning/sl-basic.mdx +45 -0
  49. logits_cookbook-0.1.0/docs/supervised-learning/sl-hyperparams.mdx +44 -0
  50. logits_cookbook-0.1.0/docs/supervised-learning/sl-loop.mdx +5 -0
  51. logits_cookbook-0.1.0/docs/supervised-learning/sweep-case-study.mdx +112 -0
  52. logits_cookbook-0.1.0/docs/supervised-learning.mdx +16 -0
  53. logits_cookbook-0.1.0/docs/training-sampling.mdx +309 -0
  54. logits_cookbook-0.1.0/docs/under-the-hood.mdx +68 -0
  55. logits_cookbook-0.1.0/logits_cookbook/__init__.py +1 -0
  56. logits_cookbook-0.1.0/logits_cookbook/chat_app/README.md +46 -0
  57. logits_cookbook-0.1.0/logits_cookbook/chat_app/logits_chat_cli.py +184 -0
  58. logits_cookbook-0.1.0/logits_cookbook/checkpoint_utils.py +362 -0
  59. logits_cookbook-0.1.0/logits_cookbook/cli_utils.py +60 -0
  60. logits_cookbook-0.1.0/logits_cookbook/client_utils.py +29 -0
  61. logits_cookbook-0.1.0/logits_cookbook/completers.py +123 -0
  62. logits_cookbook-0.1.0/logits_cookbook/display.py +46 -0
  63. logits_cookbook-0.1.0/logits_cookbook/distillation/__init__.py +0 -0
  64. logits_cookbook-0.1.0/logits_cookbook/distillation/datasets.py +278 -0
  65. logits_cookbook-0.1.0/logits_cookbook/distillation/train_on_policy.py +505 -0
  66. logits_cookbook-0.1.0/logits_cookbook/eval/README.md +13 -0
  67. logits_cookbook-0.1.0/logits_cookbook/eval/__init__.py +0 -0
  68. logits_cookbook-0.1.0/logits_cookbook/eval/custom_evaluators.py +103 -0
  69. logits_cookbook-0.1.0/logits_cookbook/eval/custom_inspect_task.py +69 -0
  70. logits_cookbook-0.1.0/logits_cookbook/eval/evaluators.py +30 -0
  71. logits_cookbook-0.1.0/logits_cookbook/eval/inspect_evaluators.py +137 -0
  72. logits_cookbook-0.1.0/logits_cookbook/eval/inspect_utils.py +161 -0
  73. logits_cookbook-0.1.0/logits_cookbook/eval/run_inspect_evals.py +71 -0
  74. logits_cookbook-0.1.0/logits_cookbook/example_data/conversations.jsonl +128 -0
  75. logits_cookbook-0.1.0/logits_cookbook/example_data/multilingual.txt +2100 -0
  76. logits_cookbook-0.1.0/logits_cookbook/hyperparam_utils.py +190 -0
  77. logits_cookbook-0.1.0/logits_cookbook/image_processing_utils.py +63 -0
  78. logits_cookbook-0.1.0/logits_cookbook/image_processing_utils_test.py +54 -0
  79. logits_cookbook-0.1.0/logits_cookbook/model_info.py +140 -0
  80. logits_cookbook-0.1.0/logits_cookbook/preference/__init__.py +0 -0
  81. logits_cookbook-0.1.0/logits_cookbook/preference/comparison_policy_evaluator.py +67 -0
  82. logits_cookbook-0.1.0/logits_cookbook/preference/dpo_datasets.py +77 -0
  83. logits_cookbook-0.1.0/logits_cookbook/preference/preference_datasets.py +172 -0
  84. logits_cookbook-0.1.0/logits_cookbook/preference/train_dpo.py +433 -0
  85. logits_cookbook-0.1.0/logits_cookbook/preference/types.py +157 -0
  86. logits_cookbook-0.1.0/logits_cookbook/recipes/README.md +32 -0
  87. logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/README.md +41 -0
  88. logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/chat_datasets.py +77 -0
  89. logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/train.py +173 -0
  90. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/README.md +70 -0
  91. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/code_env.py +290 -0
  92. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/code_grading.py +183 -0
  93. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/deepcoder_tool.py +135 -0
  94. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/lcb_utils.py +821 -0
  95. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/sandbox_config/local.yaml +122 -0
  96. logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/train.py +125 -0
  97. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/README.md +154 -0
  98. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/harbor_multiturn.py +35 -0
  99. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/harbor_multiturn_test.py +49 -0
  100. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/off_policy_reasoning.py +201 -0
  101. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_distillation.py +182 -0
  102. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_distillation_harbor_multi_turn.py +187 -0
  103. logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_multi_teacher.py +195 -0
  104. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/README.md +94 -0
  105. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_env.py +211 -0
  106. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_tools.py +118 -0
  107. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_tools_test.py +250 -0
  108. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/launch_terminal_bench.py +47 -0
  109. logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/train.py +121 -0
  110. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/README.md +85 -0
  111. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/arithmetic_env.py +105 -0
  112. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_env.py +449 -0
  113. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_env_test.py +32 -0
  114. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_grading.py +548 -0
  115. logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/train.py +169 -0
  116. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/README.md +14 -0
  117. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/README.md +82 -0
  118. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/env.py +169 -0
  119. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/train.py +76 -0
  120. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/README.md +65 -0
  121. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/env.py +298 -0
  122. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/train.py +78 -0
  123. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/README.md +73 -0
  124. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/common_english_nouns.txt +171 -0
  125. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/env.py +273 -0
  126. logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/train.py +78 -0
  127. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/README.md +12 -0
  128. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/datasets.py +315 -0
  129. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/dpo/README.md +34 -0
  130. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/dpo/train.py +134 -0
  131. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/rlhf/README.md +28 -0
  132. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/rlhf/rlhf_pipeline.py +299 -0
  133. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/README.md +29 -0
  134. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/env.py +61 -0
  135. logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/train.py +78 -0
  136. logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/README.md +76 -0
  137. logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/create_data.py +178 -0
  138. logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/train.py +120 -0
  139. logits_cookbook-0.1.0/logits_cookbook/recipes/rl_basic.py +42 -0
  140. logits_cookbook-0.1.0/logits_cookbook/recipes/rl_loop.py +256 -0
  141. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/README.md +94 -0
  142. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/data.py +201 -0
  143. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/debug_env.py +78 -0
  144. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/env.py +265 -0
  145. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/generate_data.py +46 -0
  146. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/prometheus_experimental.py +147 -0
  147. logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/train.py +158 -0
  148. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/README.md +68 -0
  149. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/chroma_pickle_test.py +37 -0
  150. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/embedding.py +133 -0
  151. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/offline_eval.py +213 -0
  152. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/search_env.py +246 -0
  153. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/tools.py +272 -0
  154. logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/train.py +150 -0
  155. logits_cookbook-0.1.0/logits_cookbook/recipes/sl_basic.py +52 -0
  156. logits_cookbook-0.1.0/logits_cookbook/recipes/sl_loop.py +169 -0
  157. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/README.md +35 -0
  158. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/evaluate.py +166 -0
  159. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/logits_openai.py +261 -0
  160. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/train.py +152 -0
  161. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/verifiers_env.py +206 -0
  162. logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/verifiers_pickle_test.py +33 -0
  163. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/README.md +42 -0
  164. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/data.py +529 -0
  165. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/eval.py +503 -0
  166. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/eval_sweep.py +275 -0
  167. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/sweep.py +230 -0
  168. logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/train.py +161 -0
  169. logits_cookbook-0.1.0/logits_cookbook/renderers/README.md +15 -0
  170. logits_cookbook-0.1.0/logits_cookbook/renderers/__init__.py +252 -0
  171. logits_cookbook-0.1.0/logits_cookbook/renderers/base.py +1576 -0
  172. logits_cookbook-0.1.0/logits_cookbook/renderers/deepseek_v3.py +501 -0
  173. logits_cookbook-0.1.0/logits_cookbook/renderers/deepseek_v3_test.py +183 -0
  174. logits_cookbook-0.1.0/logits_cookbook/renderers/gpt_oss.py +661 -0
  175. logits_cookbook-0.1.0/logits_cookbook/renderers/gpt_oss_test.py +186 -0
  176. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2.py +555 -0
  177. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k25.py +160 -0
  178. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k25_test.py +814 -0
  179. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_5_tool_declaration_ts.py +481 -0
  180. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_test.py +768 -0
  181. logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_tool_declaration_test.py +261 -0
  182. logits_cookbook-0.1.0/logits_cookbook/renderers/llama3.py +70 -0
  183. logits_cookbook-0.1.0/logits_cookbook/renderers/parsing_test.py +417 -0
  184. logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3.py +558 -0
  185. logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_5.py +290 -0
  186. logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_test.py +247 -0
  187. logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_tool_declaration_test.py +348 -0
  188. logits_cookbook-0.1.0/logits_cookbook/renderers/renderer_pickle_test.py +142 -0
  189. logits_cookbook-0.1.0/logits_cookbook/renderers/renderers_test.py +1439 -0
  190. logits_cookbook-0.1.0/logits_cookbook/renderers/role_colon.py +97 -0
  191. logits_cookbook-0.1.0/logits_cookbook/renderers/tool_calling_test.py +309 -0
  192. logits_cookbook-0.1.0/logits_cookbook/rl/__init__.py +0 -0
  193. logits_cookbook-0.1.0/logits_cookbook/rl/builder_pickle_test.py +104 -0
  194. logits_cookbook-0.1.0/logits_cookbook/rl/data_processing.py +207 -0
  195. logits_cookbook-0.1.0/logits_cookbook/rl/message_env.py +120 -0
  196. logits_cookbook-0.1.0/logits_cookbook/rl/message_env_test.py +318 -0
  197. logits_cookbook-0.1.0/logits_cookbook/rl/metric_util.py +250 -0
  198. logits_cookbook-0.1.0/logits_cookbook/rl/metrics.py +169 -0
  199. logits_cookbook-0.1.0/logits_cookbook/rl/multiturn_weight_assignment_test.py +284 -0
  200. logits_cookbook-0.1.0/logits_cookbook/rl/play_w_env.py +112 -0
  201. logits_cookbook-0.1.0/logits_cookbook/rl/preference_envs.py +283 -0
  202. logits_cookbook-0.1.0/logits_cookbook/rl/problem_env.py +114 -0
  203. logits_cookbook-0.1.0/logits_cookbook/rl/rollout_logging.py +125 -0
  204. logits_cookbook-0.1.0/logits_cookbook/rl/rollout_logging_test.py +46 -0
  205. logits_cookbook-0.1.0/logits_cookbook/rl/rollouts.py +237 -0
  206. logits_cookbook-0.1.0/logits_cookbook/rl/train.py +1436 -0
  207. logits_cookbook-0.1.0/logits_cookbook/rl/types.py +186 -0
  208. logits_cookbook-0.1.0/logits_cookbook/sandbox/README.md +67 -0
  209. logits_cookbook-0.1.0/logits_cookbook/sandbox/__init__.py +30 -0
  210. logits_cookbook-0.1.0/logits_cookbook/sandbox/modal_sandbox.py +334 -0
  211. logits_cookbook-0.1.0/logits_cookbook/sandbox/sandbox_interface.py +105 -0
  212. logits_cookbook-0.1.0/logits_cookbook/sandbox/sandboxfusion.py +122 -0
  213. logits_cookbook-0.1.0/logits_cookbook/scripts/merge_logits_adapter_to_hf_model.py +182 -0
  214. logits_cookbook-0.1.0/logits_cookbook/scripts/test_tool_calling_e2e.py +228 -0
  215. logits_cookbook-0.1.0/logits_cookbook/supervised/__init__.py +0 -0
  216. logits_cookbook-0.1.0/logits_cookbook/supervised/common.py +138 -0
  217. logits_cookbook-0.1.0/logits_cookbook/supervised/data.py +183 -0
  218. logits_cookbook-0.1.0/logits_cookbook/supervised/nll_evaluator.py +26 -0
  219. logits_cookbook-0.1.0/logits_cookbook/supervised/resume_test.py +156 -0
  220. logits_cookbook-0.1.0/logits_cookbook/supervised/train.py +404 -0
  221. logits_cookbook-0.1.0/logits_cookbook/supervised/types.py +87 -0
  222. logits_cookbook-0.1.0/logits_cookbook/supervised/viz_sft_dataset.py +53 -0
  223. logits_cookbook-0.1.0/logits_cookbook/third_party/__init__.py +0 -0
  224. logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/README.md +181 -0
  225. logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/__init__.py +5 -0
  226. logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/provider.py +533 -0
  227. logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/provider_test.py +564 -0
  228. logits_cookbook-0.1.0/logits_cookbook/tokenizer_utils.py +105 -0
  229. logits_cookbook-0.1.0/logits_cookbook/tokenizer_utils_test.py +83 -0
  230. logits_cookbook-0.1.0/logits_cookbook/tool_use/README.md +99 -0
  231. logits_cookbook-0.1.0/logits_cookbook/tool_use/__init__.py +33 -0
  232. logits_cookbook-0.1.0/logits_cookbook/tool_use/agent_tool_message_env.py +148 -0
  233. logits_cookbook-0.1.0/logits_cookbook/tool_use/tools.py +321 -0
  234. logits_cookbook-0.1.0/logits_cookbook/tool_use/types.py +58 -0
  235. logits_cookbook-0.1.0/logits_cookbook/utils/__init__.py +1 -0
  236. logits_cookbook-0.1.0/logits_cookbook/utils/code_state.py +130 -0
  237. logits_cookbook-0.1.0/logits_cookbook/utils/file_utils.py +6 -0
  238. logits_cookbook-0.1.0/logits_cookbook/utils/format_colorized.py +50 -0
  239. logits_cookbook-0.1.0/logits_cookbook/utils/logtree.py +1125 -0
  240. logits_cookbook-0.1.0/logits_cookbook/utils/logtree_formatters.py +250 -0
  241. logits_cookbook-0.1.0/logits_cookbook/utils/logtree_test.py +722 -0
  242. logits_cookbook-0.1.0/logits_cookbook/utils/lr_scheduling.py +23 -0
  243. logits_cookbook-0.1.0/logits_cookbook/utils/misc_utils.py +94 -0
  244. logits_cookbook-0.1.0/logits_cookbook/utils/ml_log.py +531 -0
  245. logits_cookbook-0.1.0/logits_cookbook/utils/ml_log_test.py +42 -0
  246. logits_cookbook-0.1.0/logits_cookbook/utils/trace.py +443 -0
  247. logits_cookbook-0.1.0/logits_cookbook/utils/trace_test.py +127 -0
  248. logits_cookbook-0.1.0/logits_cookbook/xmux/README.md +93 -0
  249. logits_cookbook-0.1.0/logits_cookbook/xmux/__init__.py +6 -0
  250. logits_cookbook-0.1.0/logits_cookbook/xmux/control.py +509 -0
  251. logits_cookbook-0.1.0/logits_cookbook/xmux/core.py +658 -0
  252. logits_cookbook-0.1.0/logits_cookbook/xmux/examples/async_rl_sweep.py +94 -0
  253. logits_cookbook-0.1.0/logits_cookbook/xmux/examples/fake_train.py +75 -0
  254. logits_cookbook-0.1.0/logits_cookbook/xmux/examples/ml_sweep.py +278 -0
  255. logits_cookbook-0.1.0/logits_cookbook/xmux/run_job.py +50 -0
  256. logits_cookbook-0.1.0/logits_cookbook/xmux/utils.py +253 -0
  257. logits_cookbook-0.1.0/pyproject.toml +113 -0
  258. logits_cookbook-0.1.0/tests/__init__.py +0 -0
  259. logits_cookbook-0.1.0/tests/compare_sampling_training_logprobs.py +159 -0
  260. logits_cookbook-0.1.0/tests/conftest.py +32 -0
  261. logits_cookbook-0.1.0/tests/helpers.py +95 -0
  262. logits_cookbook-0.1.0/tests/smoke_tests.py +143 -0
  263. logits_cookbook-0.1.0/tests/test_recipe_chat_sl.py +33 -0
  264. logits_cookbook-0.1.0/tests/test_recipe_dpo.py +11 -0
  265. logits_cookbook-0.1.0/tests/test_recipe_guess_number.py +14 -0
  266. logits_cookbook-0.1.0/tests/test_recipe_off_policy_reasoning.py +16 -0
  267. logits_cookbook-0.1.0/tests/test_recipe_on_policy_distillation.py +14 -0
  268. logits_cookbook-0.1.0/tests/test_recipe_on_policy_multi_teacher.py +15 -0
  269. logits_cookbook-0.1.0/tests/test_recipe_rlhf_pipeline.py +11 -0
  270. logits_cookbook-0.1.0/tests/test_recipe_shorter.py +11 -0
  271. logits_cookbook-0.1.0/tests/test_recipe_text_arena.py +14 -0
  272. logits_cookbook-0.1.0/tests/test_recipe_twenty_questions.py +15 -0
  273. logits_cookbook-0.1.0/tests/test_recipe_vlm_classifier.py +18 -0
  274. logits_cookbook-0.1.0/tests/third_party/__init__.py +0 -0
  275. logits_cookbook-0.1.0/tests/third_party/test_litellm.py +169 -0
  276. logits_cookbook-0.1.0/tests/validate_temperature_logprobs.py +369 -0
logits_cookbook-0.1.0/.github/workflows/ci.yml
@@ -0,0 +1,49 @@
+ name: CI
+
+ on:
+   push:
+   pull_request:
+   workflow_dispatch:
+
+ jobs:
+   lint:
+     name: Lint (ruff)
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Check out repository
+         uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+
+       - name: Install ruff
+         run: python -m pip install --upgrade pip ruff
+
+       - name: ruff check
+         run: ruff check .
+
+       - name: ruff format --check
+         run: ruff format --check .
+
+   package:
+     name: Build package
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Check out repository
+         uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+
+       - name: Build and check distributions
+         run: |
+           python -m pip install --upgrade pip
+           python -m pip install build twine
+           python -m build
+           python -m twine check dist/*
logits_cookbook-0.1.0/.github/workflows/publish.yml
@@ -0,0 +1,87 @@
+ name: Publish
+
+ on:
+   release:
+     types: [published]
+   workflow_dispatch:
+     inputs:
+       repository:
+         description: "Repository to publish to"
+         required: true
+         default: "testpypi"
+         type: choice
+         options:
+           - testpypi
+           - pypi
+
+ permissions:
+   contents: read
+
+ jobs:
+   build:
+     name: Build distributions
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Check out repository
+         uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+
+       - name: Build package
+         run: |
+           python -m pip install --upgrade pip
+           python -m pip install build twine
+           python -m build
+           python -m twine check dist/*
+
+       - name: Upload distributions
+         uses: actions/upload-artifact@v4
+         with:
+           name: python-package-distributions
+           path: dist/
+
+   publish-testpypi:
+     name: Publish to TestPyPI
+     needs: build
+     if: github.event_name == 'workflow_dispatch' && inputs.repository == 'testpypi'
+     runs-on: ubuntu-latest
+     environment: testpypi
+     permissions:
+       id-token: write
+       contents: read
+
+     steps:
+       - name: Download distributions
+         uses: actions/download-artifact@v4
+         with:
+           name: python-package-distributions
+           path: dist/
+
+       - name: Publish distributions to TestPyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
+         with:
+           repository-url: https://test.pypi.org/legacy/
+
+   publish-pypi:
+     name: Publish to PyPI
+     needs: build
+     if: github.event_name == 'release' || (github.event_name == 'workflow_dispatch' && inputs.repository == 'pypi')
+     runs-on: ubuntu-latest
+     environment: pypi
+     permissions:
+       id-token: write
+       contents: read
+
+     steps:
+       - name: Download distributions
+         uses: actions/download-artifact@v4
+         with:
+           name: python-package-distributions
+           path: dist/
+
+       - name: Publish distributions to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
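In the Publish workflow above, `build` runs on every trigger, and the two publish jobs are gated by their `if:` expressions. The Python function below is a hypothetical mirror of those two conditions (it is not part of the package), handy for checking which job fires for each trigger. It assumes `build` succeeded, since both publish jobs declare `needs: build`, and it ignores the `types: [published]` filter, which already decides whether a release event starts the workflow at all.

```python
from typing import List, Optional


def publish_targets(event_name: str, repository_input: Optional[str] = None) -> List[str]:
    """Return the publish jobs whose `if:` condition would evaluate to true.

    Hypothetical mirror of the `if:` expressions on publish-testpypi and
    publish-pypi; `repository_input` stands in for `inputs.repository`.
    """
    targets: List[str] = []
    # publish-testpypi: only a manual dispatch that chose "testpypi"
    if event_name == "workflow_dispatch" and repository_input == "testpypi":
        targets.append("publish-testpypi")
    # publish-pypi: any (published) release, or a manual dispatch that chose "pypi"
    if event_name == "release" or (
        event_name == "workflow_dispatch" and repository_input == "pypi"
    ):
        targets.append("publish-pypi")
    return targets


if __name__ == "__main__":
    cases = [("release", None), ("workflow_dispatch", "testpypi"), ("workflow_dispatch", "pypi")]
    for event, repo in cases:
        print(event, repo, "->", publish_targets(event, repo))
```

Because the dispatch input defaults to `testpypi`, a manual run publishes to TestPyPI unless `pypi` is chosen explicitly; only a published GitHub release goes straight to PyPI.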
logits_cookbook-0.1.0/.gitignore
@@ -0,0 +1,8 @@
+ **/__pycache__
+ .DS_Store
+ .env
+ .env.*
+ .venv
+ uv.lock
+ sdk
+ logits-sdk
logits_cookbook-0.1.0/AGENTS.md
@@ -0,0 +1,135 @@
+ # Logits Cookbook Agent Guide
+
+ Quick reference for agents working on `logits_cookbook`. Full documentation is in `docs/`.
+
+ `logits_cookbook` is a client library with training and eval code built on the Tinker service (hosted by Thinking Machines Lab) and the Tinker SDK (a separate repo with just the API). You author training/eval loops that run on a CPU machine; Tinker executes the heavy GPU work.
+
+ **Start here:** `docs/training-sampling.mdx` - Complete walkthrough of training and sampling basics.
+
+ ## Documentation Map (`docs/`)
+
+ **API Fundamentals:**
+ - `index.mdx` - Tinker overview, division of responsibilities
+ - `install.mdx` - Installation, API key setup
+ - `training-sampling.mdx` - **Starter guide**: data prep, forward_backward, sampling, vision inputs
+ - `losses.mdx` - Loss functions (cross_entropy, importance_sampling, ppo, cispo, dro, forward_backward_custom)
+ - `save-load.mdx` - Checkpointing (save_weights_for_sampler vs save_state)
+ - `async.mdx` - Sync/async APIs, futures, overlapping requests
+ - `model-lineup.mdx` - Available models
+ - `under-the-hood.mdx` - Clock cycles, worker pools
+
+ **API Reference (`api-reference/`):**
+ - `types.md` - **All API types** (Datum, ModelInput, TensorData, SamplingParams, etc.)
+ - `trainingclient.md`, `samplingclient.md`, `serviceclient.md`, `restclient.md` - Client APIs
+
+ **Supervised Learning (`supervised-learning/`):**
+ - `../supervised-learning.mdx` - SL overview
+ - `sl-basic.mdx` - First SL run
+ - `sl-hyperparams.mdx` - LR formula, batch size
+ - `sl-loop.mdx` - Minimal training loop
+ - `prompt-distillation.mdx` - Distilling prompts
+ - `sweep-case-study.mdx` - Hyperparameter sweeps
+
+ **Reinforcement Learning (`rl/`):**
+ - `../rl.mdx` - RL overview (RLVR, RLHF)
+ - `rl-basic.mdx` - First RL run
+ - `rl-envs.mdx` - Custom Env, EnvGroupBuilder, RLDataset
+ - `rl-loops.mdx` - Minimal RL loop
+ - `rl-hyperparams.mdx` - batch_size vs group_size, async training
+ - `sequence-extension.mdx` - Multi-turn RL, KV-cache
+
+ **Preferences (`preferences/`):**
+ - `../preferences.mdx` - DPO vs RLHF overview
+ - `dpo-guide.mdx` - DPO training
+ - `rlhf-example.mdx` - RLHF pipeline
+
+ **Other:**
+ - `rendering.mdx` - Renderers (bridge between chat-style data and token sequences), vision inputs, TrainOnWhat
+ - `completers.mdx` - TokenCompleter vs MessageCompleter
+ - `evals.mdx` - Inline evals, Inspect AI, custom evaluators
+ - `lora-primer.mdx` - LoRA background
+ - `download-weights.mdx` / `publish-weights.mdx` - Weight export
+
+ ---
+
+ ## Composing Types
+
+ Agents often struggle with the nested type hierarchy. Key resources:
+
+ **Reference:** `docs/api-reference/types.md` documents all API types.
+
+ **Core types:**
+ - `Datum` = `model_input` (ModelInput) + `loss_fn_inputs` (dict of TensorData)
+ - `ModelInput` = list of chunks (EncodedTextChunk, ImageChunk)
+ - `TensorData` = wrapper for numpy/torch arrays with shape info
+
+ **Helper functions** (use these instead of manual construction):
+ - `datum_from_model_input_weights(model_input, weights, max_length)` - SL datum creation (`supervised/common.py`)
+ - `conversation_to_datum(messages, renderer, max_length, train_on_what)` - Full pipeline (`supervised/data.py`)
+ - `renderer.build_supervised_example(messages)` - Returns (ModelInput, weights)
+ - `ModelInput.from_ints(tokens)` - Create from token list
+ - `TensorData.from_numpy(arr)` / `TensorData.from_torch(tensor)` - Wrap arrays
+
+ ---
+
+ ## Architecture
+
+ **Builder pattern:** Config objects are `chz` dataclasses (SupervisedDatasetBuilder, RLDatasetBuilder, EnvGroupBuilder). They expose `.build()`/`__call__()` returning runtime objects.
+
+ **Key code locations:**
+ - SL: `logits_cookbook/supervised/train.py`
+ - RL: `logits_cookbook/rl/train.py`
+ - DPO: `logits_cookbook/preference/train_dpo.py`
+ - Renderers: `logits_cookbook/renderers/`
+ - Completers: `logits_cookbook/completers.py`
+ - RL types: `logits_cookbook/rl/types.py`
+ - Logging: `logits_cookbook/utils/logtree.py`, `logits_cookbook/rl/rollouts.py`
+ - Recipes: `logits_cookbook/recipes/`
+
+ **Training outputs:** RL and SL training write human-readable HTML reports and machine-readable JSON files (metrics, rollout transcripts, per-trajectory summaries) to `log_path`. Point agents at a `log_path` directory to analyze training runs — `metrics.jsonl` for scalar metrics, `*_rollout_summaries.jsonl` for per-trajectory data, and `*_logtree.json` for full rollout transcripts including model responses. See `docs/rl/rl-logging.mdx` for the complete file reference and parsing examples.
+
+ ---
+
+ ## Conventions
+
+ **Subscript suffixes** for tensor names: `_P` (problems), `_G` (groups), `_T` (tokens), `_D` (datums). Example: `tokens_P_G_T[p][g][t]`
+
+ **Code style:**
+ - Explicit typing; avoid `Any` / `type: ignore`
+ - Use `safezip`, `timed`, `scope` helpers
+ - `@chz.chz` decorator for config serialization
+ - `ml_log.log_metrics` for metrics; `logtree` for transcripts
+
+ **Env lifecycle:** `Env` objects are single-use (no reset). Create via `EnvGroupBuilder`.
+
+ ---
+
+ ## Common Pitfalls
+
+ 1. **LoRA LR:** Use `hyperparam_utils.get_lr(model_name)` - LoRA needs ~10x higher LR than full fine-tuning.
+
+ 2. **Renderer mismatch:** Match `renderer_name` to model family (`llama3`, `qwen3`, `role_colon`).
+
+ 3. **Async gaps:** Submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting.
+
+ 4. **Sampler desync:** Create a **new** sampling client after saving weights.
+
+ 5. **Type construction:** Use helper functions, not manual dict construction. See `supervised/data.py` and `supervised/common.py`.
+
+ 6. **Group semantics:** RL advantages are centered within each group.
+
+ 7. **DPO:** Start with `dpo_beta=0.1`, LR~1e-5.
+
+ ---
+
+ ## Testing
+
+ ```bash
+ # Unit tests (no API needed, colocated *_test.py files)
+ pytest logits_cookbook/
+
+ # Smoke tests (requires TINKER_API_KEY + network)
+ pytest tests/
+ ```
+
+ For debugging, shrink workloads via `n_batches`, `batch_size`, `group_size` in dataset builders.
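Pitfall 6 in the agent guide says RL advantages are centered within each group, and the Conventions section names tensors by their index suffixes. A minimal pure-Python sketch combining both, assuming plain mean-centering (this is not the cookbook's implementation): `rewards_P_G[p][g]` is the reward of rollout `g` for problem `p`.

```python
from statistics import fmean
from typing import List


def center_advantages_P_G(rewards_P_G: List[List[float]]) -> List[List[float]]:
    """Subtract each group's mean reward from every rollout in that group.

    _P indexes problems; _G indexes the group of rollouts sampled for one problem.
    """
    advantages_P_G: List[List[float]] = []
    for rewards_G in rewards_P_G:
        group_mean = fmean(rewards_G)  # baseline is computed per group, not per batch
        advantages_P_G.append([reward - group_mean for reward in rewards_G])
    return advantages_P_G


if __name__ == "__main__":
    rewards_P_G = [[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    print(center_advantages_P_G(rewards_P_G))
```

One consequence of per-group centering: a group in which every rollout earns the same reward (the second problem above) gets all-zero advantages and so contributes no learning signal, which is one reason `group_size` matters.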
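The agent guide points readers at `metrics.jsonl` inside `log_path` for scalar metrics. A minimal stdlib reader, assuming one JSON object per line; the metric names are whatever your run logged, so inspect the file for the actual keys rather than relying on any particular name.

```python
import json
from pathlib import Path
from typing import List, Union


def load_metric_series(log_path: Union[str, Path], key: str) -> List[float]:
    """Collect one scalar metric from <log_path>/metrics.jsonl in file order.

    Assumes each non-blank line is a JSON object; records without `key` are skipped.
    """
    series: List[float] = []
    metrics_file = Path(log_path) / "metrics.jsonl"
    with metrics_file.open() as f:
        for raw_line in f:
            line = raw_line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if key in record:
                series.append(float(record[key]))
    return series
```

Skipping records that lack the key matters when different metrics are logged at different cadences (for example, eval metrics only every N steps).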
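Pitfall 7 in the agent guide recommends starting DPO at `dpo_beta=0.1`. For intuition about what `beta` scales, here is the standard DPO loss for a single preference pair, computed from sequence log-probabilities under the policy and a frozen reference model. This is a sketch of the textbook objective, not the code in `preference/train_dpo.py`.

```python
import math


def _softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == softplus(-m)
    return _softplus(-margin)
```

When the policy still equals the reference, the margin is zero and the loss is `log 2`; a larger `beta` makes the loss react more sharply to the same log-ratio gap, pulling harder away from the reference.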
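Pitfall 3 in the agent guide says to submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting either result. The sketch below shows only that ordering, using a hypothetical `FakeTrainingClient` built on `asyncio`: the real Tinker client methods take arguments (data, loss function, optimizer parameters) and return their own future types, all of which are omitted here.

```python
import asyncio
from typing import List, Tuple


class FakeTrainingClient:
    """Hypothetical stand-in for a remote training client; records call order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def _remote(self, name: str) -> str:
        await asyncio.sleep(0.01)  # pretend network round trip + GPU work
        self.events.append(f"done:{name}")
        return name

    async def forward_backward_async(self) -> "asyncio.Task[str]":
        self.events.append("submit:forward_backward")
        return asyncio.create_task(self._remote("forward_backward"))

    async def optim_step_async(self) -> "asyncio.Task[str]":
        self.events.append("submit:optim_step")
        return asyncio.create_task(self._remote("optim_step"))


async def train_step(client: FakeTrainingClient) -> Tuple[str, str]:
    # Submit both requests first, so the server can queue them together
    # instead of idling while the client waits on the first result.
    fwd_bwd_future = await client.forward_backward_async()
    optim_future = await client.optim_step_async()
    # Only now wait for the results.
    return await fwd_bwd_future, await optim_future


if __name__ == "__main__":
    client = FakeTrainingClient()
    print(asyncio.run(train_step(client)), client.events)
```

The anti-pattern is awaiting the forward/backward result before submitting the optimizer step, which leaves a full round trip of idle time between the two requests.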
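The agent guide describes config objects as builders that return runtime objects, and `Env` objects as single-use, created through `EnvGroupBuilder`. A sketch of that lifecycle with plain `dataclasses` instead of `chz`; `GuessEnv` and `GuessEnvGroupBuilder` are hypothetical, and the cookbook's real `Env`/`EnvGroupBuilder` interfaces in `rl/types.py` differ.

```python
from dataclasses import dataclass
from typing import List


class GuessEnv:
    """Hypothetical single-use environment: one episode, no reset()."""

    def __init__(self, target: int) -> None:
        self.target = target
        self._used = False

    def step(self, guess: int) -> float:
        if self._used:
            raise RuntimeError("Env is single-use; build a fresh one instead of reusing it")
        self._used = True
        return 1.0 if guess == self.target else 0.0


@dataclass(frozen=True)
class GuessEnvGroupBuilder:
    """Hypothetical config object: holds settings and builds fresh envs on demand."""

    target: int
    group_size: int

    def __call__(self) -> List[GuessEnv]:
        # Every call returns brand-new envs, one per rollout in the group.
        return [GuessEnv(self.target) for _ in range(self.group_size)]
```

Keeping the builder as an immutable config and pushing all mutable state into freshly built envs is what makes "no reset" safe: to run another group, call the builder again.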
logits_cookbook-0.1.0/CHANGELOG.md
@@ -0,0 +1,5 @@
+ ### Fork from tinker cookbook
+ **Date:** 2026-04-09
+ **Type:** new
+ **Tags:** all
+
@@ -0,0 +1,135 @@
+ # Logits Cookbook Agent Guide
+
+ Quick reference for agents working on `logits_cookbook`. Full documentation is in `docs/`.
+
+ `logits_cookbook` is a client library with training and eval code built on the Tinker service (hosted by Thinking Machines Lab) and the Tinker SDK (a separate repo with just the API). You author training/eval loops that run on a CPU machine; Tinker executes the heavy GPU work.
+
+ **Start here:** `docs/training-sampling.mdx` - Complete walkthrough of training and sampling basics.
+
+ ## Documentation Map (`docs/`)
+
+ **API Fundamentals:**
+ - `index.mdx` - Tinker overview, division of responsibilities
+ - `install.mdx` - Installation, API key setup
+ - `training-sampling.mdx` - **Starter guide**: data prep, forward_backward, sampling, vision inputs
+ - `losses.mdx` - Loss functions (cross_entropy, importance_sampling, ppo, cispo, dro, forward_backward_custom)
+ - `save-load.mdx` - Checkpointing (save_weights_for_sampler vs save_state)
+ - `async.mdx` - Sync/async APIs, futures, overlapping requests
+ - `model-lineup.mdx` - Available models
+ - `under-the-hood.mdx` - Clock cycles, worker pools
+
+ **API Reference (`api-reference/`):**
+ - `types.md` - **All API types** (Datum, ModelInput, TensorData, SamplingParams, etc.)
+ - `trainingclient.md`, `samplingclient.md`, `serviceclient.md`, `restclient.md` - Client APIs
+
+ **Supervised Learning (`supervised-learning/`):**
+ - `../supervised-learning.mdx` - SL overview
+ - `sl-basic.mdx` - First SL run
+ - `sl-hyperparams.mdx` - LR formula, batch size
+ - `sl-loop.mdx` - Minimal training loop
+ - `prompt-distillation.mdx` - Distilling prompts
+ - `sweep-case-study.mdx` - Hyperparameter sweeps
+
+ **Reinforcement Learning (`rl/`):**
+ - `../rl.mdx` - RL overview (RLVR, RLHF)
+ - `rl-basic.mdx` - First RL run
+ - `rl-envs.mdx` - Custom Env, EnvGroupBuilder, RLDataset
+ - `rl-loops.mdx` - Minimal RL loop
+ - `rl-hyperparams.mdx` - batch_size vs group_size, async training
+ - `sequence-extension.mdx` - Multi-turn RL, KV-cache
+
+ **Preferences (`preferences/`):**
+ - `../preferences.mdx` - DPO vs RLHF overview
+ - `dpo-guide.mdx` - DPO training
+ - `rlhf-example.mdx` - RLHF pipeline
+
+ **Other:**
+ - `rendering.mdx` - Renderers (bridge between chat-style data and token sequences), vision inputs, TrainOnWhat
+ - `completers.mdx` - TokenCompleter vs MessageCompleter
+ - `evals.mdx` - Inline evals, Inspect AI, custom evaluators
+ - `lora-primer.mdx` - LoRA background
+ - `download-weights.mdx` / `publish-weights.mdx` - Weight export
+
+ ---
+
+ ## Composing Types
+
+ Agents often struggle with the nested type hierarchy. Key resources:
+
+ **Reference:** `docs/api-reference/types.md` documents all API types.
+
+ **Core types:**
+ - `Datum` = `model_input` (ModelInput) + `loss_fn_inputs` (dict of TensorData)
+ - `ModelInput` = list of chunks (EncodedTextChunk, ImageChunk)
+ - `TensorData` = wrapper for numpy/torch arrays with shape info
+
+ **Helper functions** (use these instead of manual construction):
+ - `datum_from_model_input_weights(model_input, weights, max_length)` - SL datum creation (`supervised/common.py`)
+ - `conversation_to_datum(messages, renderer, max_length, train_on_what)` - Full pipeline (`supervised/data.py`)
+ - `renderer.build_supervised_example(messages)` - Returns (ModelInput, weights)
+ - `ModelInput.from_ints(tokens)` - Create from token list
+ - `TensorData.from_numpy(arr)` / `TensorData.from_torch(tensor)` - Wrap arrays
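As a rough mental model of what an SL datum carries, the plain-Python sketch below builds a shifted input/target/weight triple for next-token prediction. It is an assumption, not the library's code: `ToyDatum` and `toy_datum_from_tokens` are hypothetical stand-ins for `Datum` and `datum_from_model_input_weights`, and both the shift-by-one layout and the truncation rule are assumed. Check `supervised/common.py` for the actual behavior.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


@dataclass
class ToyDatum:
    """Hypothetical stand-in for `Datum` (not the real type)."""
    input_tokens: list[int]                        # plays the role of ModelInput
    loss_fn_inputs: Mapping[str, Sequence[float]]  # plays the role of the dict of TensorData


def toy_datum_from_tokens(tokens: list[int], weights: list[float], max_length: int) -> ToyDatum:
    """Assumed next-token layout: position t is trained to predict tokens[t + 1]."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Keep max_length + 1 tokens so the shifted input has at most max_length tokens.
    tokens = tokens[: max_length + 1]
    weights = weights[: max_length + 1]
    return ToyDatum(
        input_tokens=tokens[:-1],
        loss_fn_inputs={
            "target_tokens": tokens[1:],  # the token each position should predict
            "weights": weights[1:],       # per-target loss weight (0 = not trained on)
        },
    )
```

For example, tokens `[10, 11, 12, 13]` with weights `[0, 0, 1, 1]` give input `[10, 11, 12]`, targets `[11, 12, 13]`, and weights `[0, 1, 1]`, so only the last two targets contribute to the loss.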
72
+
73
+ ---
74
+
75
+ ## Architecture
76
+
77
+ **Builder pattern:** Config objects are `chz` dataclasses (SupervisedDatasetBuilder, RLDatasetBuilder, EnvGroupBuilder). They expose `.build()`/`__call__()` returning runtime objects.
78
+
79
+ **Key code locations:**
80
+ - SL: `logits_cookbook/supervised/train.py`
81
+ - RL: `logits_cookbook/rl/train.py`
82
+ - DPO: `logits_cookbook/preference/train_dpo.py`
83
+ - Renderers: `logits_cookbook/renderers/`
84
+ - Completers: `logits_cookbook/completers.py`
85
+ - RL types: `logits_cookbook/rl/types.py`
86
+ - Logging: `logits_cookbook/utils/logtree.py`, `logits_cookbook/rl/rollouts.py`
87
+ - Recipes: `logits_cookbook/recipes/`
88
+
89
+ **Training outputs:** RL and SL training write human-readable HTML reports and machine-readable JSON files (metrics, rollout transcripts, per-trajectory summaries) to `log_path`. Point agents at a `log_path` directory to analyze training runs — `metrics.jsonl` for scalar metrics, `*_rollout_summaries.jsonl` for per-trajectory data, and `*_logtree.json` for full rollout transcripts including model responses. See `docs/rl/rl-logging.mdx` for the complete file reference and parsing examples.
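A minimal sketch for loading `metrics.jsonl` from a `log_path`, assuming each non-empty line is one JSON object of scalar metrics. The function names are hypothetical, and any metric keys you pass in (such as a loss or learning-rate key) are placeholders; see `docs/rl/rl-logging.mdx` for the real field names.

```python
import json
from collections.abc import Iterable
from pathlib import Path


def parse_metrics_lines(lines: Iterable[str]) -> list[dict[str, float]]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_metrics(log_path: str) -> list[dict[str, float]]:
    """Read <log_path>/metrics.jsonl into a list of per-step dicts."""
    with (Path(log_path) / "metrics.jsonl").open() as f:
        return parse_metrics_lines(f)


def metric_series(rows: list[dict[str, float]], key: str) -> list[float]:
    """Collect one metric across rows, skipping rows that did not log it."""
    return [row[key] for row in rows if key in row]
```

Usage: `metric_series(load_metrics("/path/to/log_path"), "<metric name>")` returns that metric's values in logged order, ready for plotting or comparison across runs.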
+
+ ---
+
+ ## Conventions
+
+ **Subscript suffixes** for tensor names: `_P` (problems), `_G` (groups), `_T` (tokens), `_D` (datums). Example: `tokens_P_G_T[p][g][t]`
+
+ **Code style:**
+ - Explicit typing; avoid `Any` / `type: ignore`
+ - Use `safezip`, `timed`, `scope` helpers
+ - `@chz.chz` decorator for config serialization
+ - `ml_log.log_metrics` for metrics; `logtree` for transcripts
+
+ **Env lifecycle:** `Env` objects are single-use (no reset). Create via `EnvGroupBuilder`.
+
+ ---
+
+ ## Common Pitfalls
+
+ 1. **LoRA LR:** Use `hyperparam_utils.get_lr(model_name)` - LoRA needs ~10x higher LR than full fine-tuning.
+
+ 2. **Renderer mismatch:** Match `renderer_name` to model family (`llama3`, `qwen3`, `role_colon`).
+
+ 3. **Async gaps:** Submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting.
+
+ 4. **Sampler desync:** Create a **new** sampling client after saving weights.
+
+ 5. **Type construction:** Use helper functions, not manual dict construction. See `supervised/data.py` and `supervised/common.py`.
+
+ 6. **Group semantics:** RL advantages are centered within each group.
+
+ 7. **DPO:** Start with `dpo_beta=0.1`, LR~1e-5.
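To illustrate pitfall 3, the sketch below uses a hypothetical in-memory `FakeTrainingClient` whose `*_async` methods return a task (standing in for a future) as soon as the request is queued. It is not the Tinker SDK: the class, its `events` log, and the latencies are assumptions that only mimic the "submit now, await the result later" shape, and the fake does not model the dependency of `optim_step` on the computed gradients. It shows that awaiting the first result before submitting the second request holds the second request back.

```python
import asyncio


class FakeTrainingClient:
    """Hypothetical stand-in for a training client (not the Tinker SDK)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def _work(self, name: str, latency_s: float) -> str:
        await asyncio.sleep(latency_s)  # simulated server-side work
        self.events.append(f"done:{name}")
        return name

    async def forward_backward_async(self) -> asyncio.Task[str]:
        # Returns immediately with a handle; the work runs in the background.
        self.events.append("submit:forward_backward")
        return asyncio.create_task(self._work("forward_backward", 0.01))

    async def optim_step_async(self) -> asyncio.Task[str]:
        self.events.append("submit:optim_step")
        return asyncio.create_task(self._work("optim_step", 0.02))


async def train_step(client: FakeTrainingClient) -> list[str]:
    # Recommended: submit both requests back-to-back, then await the results.
    fwd_bwd_future = await client.forward_backward_async()
    optim_future = await client.optim_step_async()
    return [await fwd_bwd_future, await optim_future]


async def train_step_serial(client: FakeTrainingClient) -> list[str]:
    # Pitfall: waiting for forward_backward before even submitting optim_step.
    fwd_bwd_result = await (await client.forward_backward_async())
    optim_result = await (await client.optim_step_async())
    return [fwd_bwd_result, optim_result]
```

In `train_step` both submissions are logged before either completion; in `train_step_serial` the `optim_step` submission only happens after `forward_backward` has finished.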
122
+
123
+ ---
124
+
125
+ ## Testing
126
+
127
+ ```bash
128
+ # Unit tests (no API needed, colocated *_test.py files)
129
+ pytest logits_cookbook/
130
+
131
+ # Smoke tests (requires TINKER_API_KEY + network)
132
+ pytest tests/
133
+ ```
134
+
135
+ For debugging, shrink workloads via `n_batches`, `batch_size`, `group_size` in dataset builders.
@@ -0,0 +1,77 @@
+ # Development
+
+ This project is built in the spirit of open science and collaborative development. We believe that the best tools emerge through community involvement and shared learning.
+
+ We welcome PR contributions after our private beta is over. If you have any feedback, please email us at tinker@thinkingmachines.ai.
+
+ ## Organization of training scripts
+
+ We're designing the codebase with the following goals:
+
+ 1. Low barrier to entry: it should be dead simple to run something and see numbers go up.
+ 2. Extensible: it should be possible to pass in custom datasets and evals and control all the hyperparameters.
+ 3. Science-friendly: it should be easy to run sweeps and analyze the results.
+
+ To achieve this, we'll use the following structure around training scripts:
+
+ - There's a main training function, such as [rl/train.py](logits_cookbook/rl/train.py) or [supervised/train.py](logits_cookbook/supervised/train.py), which contains the main loop.
+ - This function takes a detailed config object (`Config`), which isn't constructible from the command line.
+ - The config contains members that specify things like datasets and evals. These members should not be the built objects themselves but chz configs (with a `.build` method that constructs the actual object) or callables (we recommend using `functools.partial`). This way, the config is serializable, which is useful for sweeps.
+ - There are launch scripts that assemble training configs (e.g., [recipes/math_rl/train.py](logits_cookbook/recipes/math_rl/train.py)), which construct a smaller config object (`CLIConfig`) from the command line.
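A minimal sketch of this layering, using plain dataclasses in place of `chz`. `CLIConfig`, `Config`, `make_toy_dataset`, `to_config`, and `train` here are hypothetical illustrations, not the classes in `supervised/train.py`. The dataset field holds a `functools.partial` rather than a built dataset, so the config stays a small, inspectable description that a sweep can copy and vary.

```python
import functools
from collections.abc import Callable
from dataclasses import dataclass


def make_toy_dataset(n_examples: int, seed: int) -> list[int]:
    """Hypothetical heavyweight constructor (imagine loading and tokenizing data)."""
    return [(seed * 31 + i) % 100 for i in range(n_examples)]


@dataclass(frozen=True)
class Config:
    """Detailed config consumed by the main training function."""
    learning_rate: float
    dataset_fn: Callable[[], list[int]]  # a partial, not the built dataset


@dataclass(frozen=True)
class CLIConfig:
    """Small config that a launch script would fill in from the command line."""
    learning_rate: float = 1e-4
    n_examples: int = 8
    seed: int = 0

    def to_config(self) -> Config:
        return Config(
            learning_rate=self.learning_rate,
            dataset_fn=functools.partial(make_toy_dataset, n_examples=self.n_examples, seed=self.seed),
        )


def train(config: Config) -> int:
    """Stand-in main loop: the dataset is only built once training starts."""
    dataset = config.dataset_fn()
    return len(dataset)
```

Because the partial records its target function and keyword arguments, a sweep can inspect or vary them without ever constructing the heavyweight dataset.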
+
+ ## Async
+
+ Async is very useful for RL, where it lets us make many queries in parallel (e.g., sampling calls). For all of the interfaces used in RL (such as the `Env` class), all methods that take a nontrivial amount of time should be async. For some of the other code, such as [recipes/sl_loop.py](logits_cookbook/recipes/sl_loop.py), we've chosen not to use async methods, to keep it beginner-friendly, as many Python programmers are not familiar with async.
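A minimal sketch of why async helps in RL: with `asyncio.gather`, many slow calls wait concurrently instead of one after another. `fake_sample` is hypothetical and simulates latency with `asyncio.sleep`; it is not the sampling client's API.

```python
import asyncio


async def fake_sample(prompt: str, latency_s: float = 0.05) -> str:
    """Hypothetical sampling call; the sleep stands in for network and GPU time."""
    await asyncio.sleep(latency_s)
    return prompt.upper()


async def sample_all(prompts: list[str]) -> list[str]:
    # gather runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_sample(p) for p in prompts))
```

With eight prompts, `asyncio.run(sample_all(prompts))` takes roughly one latency period rather than eight, because the simulated waits overlap.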
+
+ ## Typing
+
+ Please use typing wherever possible; avoid `Any` and `type: ignore`; prefer casting. However, avoid using convoluted generics or writing code that's much more verbose just to satisfy the type checker. Prefer using single types over union types.
+
+ ## Classes
+
+ There are a lot of different classes, which might make the code feel less approachable. However, they follow *the builder pattern*, and the code should be less confusing when you know the pattern.
+
+ We can illustrate the pattern with the two main examples:
+
+ - A `SupervisedDatasetBuilder` is a configuration object which builds a `SupervisedDataset`.
+ - An `RLDatasetBuilder` is a configuration object which builds an `RLDataset`, which generates batches of `EnvGroupBuilder` objects, which each generate a group of `Env` objects.
+
+ Here, the `SupervisedDatasetBuilder`, `RLDatasetBuilder`, and `EnvGroupBuilder` are all configuration objects, which have a `__call__` method that builds another object. You can see these objects in [supervised/types.py](logits_cookbook/supervised/types.py) and [rl/types.py](logits_cookbook/rl/types.py).
+
+ In general, we use a lot of configuration objects, with a `__call__` method that returns a heavyweight object (like a dataset). We use `chz` for the configuration objects -- it's similar to a dataclass but with some extra features that are nice for configs. We use either dataclasses or regular Python classes for the heavyweight objects.
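The sketch below mirrors the RL chain with hypothetical toy classes, using frozen dataclasses where the cookbook uses `chz`: a `ToyRLDatasetBuilder` config builds a `ToyRLDataset`, whose batches are lists of `ToyEnvGroupBuilder` configs, each of which builds a fresh group of single-use `ToyEnv` objects. Names such as `get_batch` and `step` are assumptions for illustration; the real interfaces in `rl/types.py` are async and richer.

```python
from dataclasses import dataclass


class ToyEnv:
    """Heavyweight runtime object: one rollout, then discard (no reset)."""

    def __init__(self, question: str, answer: int) -> None:
        self.question = question
        self.answer = answer
        self.done = False

    def step(self, action: int) -> float:
        if self.done:
            raise RuntimeError("ToyEnv is single-use; build a new one")
        self.done = True
        return 1.0 if action == self.answer else 0.0


@dataclass(frozen=True)
class ToyEnvGroupBuilder:
    """Config object: calling it builds a group of envs for the same problem."""
    a: int
    b: int
    group_size: int

    def __call__(self) -> list[ToyEnv]:
        return [ToyEnv(f"{self.a}+{self.b}", self.a + self.b) for _ in range(self.group_size)]


class ToyRLDataset:
    """Heavyweight object producing batches of EnvGroupBuilders, one per problem."""

    def __init__(self, problems: list[tuple[int, int]], batch_size: int, group_size: int) -> None:
        self.problems = problems
        self.batch_size = batch_size
        self.group_size = group_size

    def get_batch(self, index: int) -> list[ToyEnvGroupBuilder]:
        start = index * self.batch_size
        chunk = self.problems[start : start + self.batch_size]
        return [ToyEnvGroupBuilder(a, b, self.group_size) for a, b in chunk]


@dataclass(frozen=True)
class ToyRLDatasetBuilder:
    """Config object: cheap to construct and serialize; calling it builds the dataset."""
    n_problems: int
    batch_size: int
    group_size: int

    def __call__(self) -> ToyRLDataset:
        problems = [(i, i + 1) for i in range(self.n_problems)]
        return ToyRLDataset(problems, self.batch_size, self.group_size)
```

Only the builders need to be serializable; the dataset and envs are created at runtime, and each env is thrown away after its single rollout.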
+
+ ## Envs
+
+ An `Env` is an RL environment. For those with an RL background, it roughly corresponds to an MDP or a POMDP; however, we also use it in more general cases (such as multi-agent settings) that don't strictly correspond to the MDP/POMDP formalism. It's roughly analogous to the concept of an Env in OpenAI Gym, but unlike OpenAI Gym, we don't have a `reset` method; rather, the env should be discarded after a rollout. Any shared resources should be maintained by whatever object is creating the envs.
+
+ The `Env`s are created by `EnvGroupBuilder`s. The envs in the group returned by an `EnvGroupBuilder` have something in common: either they correspond to the same task (in which case we can use this information for variance reduction, as in GRPO, which centers rewards per group), or the group defines a multi-agent environment.
+
+ - One common multi-agent environment is where we use a pairwise preference model to compare pairs of completions.
+ - We can also use the group to define a two-player game. Some two-player games, such as tic-tac-toe, are currently supported through the [text_arena](logits_cookbook/recipes/multiplayer_rl/text_arena/env.py) environments.
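A minimal sketch of the per-group variance reduction mentioned above: rewards from envs in the same group are centered by subtracting the group mean, GRPO-style. It shows only the centering step; any additional scaling the cookbook may apply is not shown, and `center_advantages` is a hypothetical helper name.

```python
def center_advantages(rewards_P_G: list[list[float]]) -> list[list[float]]:
    """Subtract each group's mean reward from every reward in that group."""
    advantages_P_G: list[list[float]] = []
    for rewards_G in rewards_P_G:
        mean = sum(rewards_G) / len(rewards_G)
        advantages_P_G.append([r - mean for r in rewards_G])
    return advantages_P_G
```

For rewards `[[1, 0, 0, 1], [1, 1, 1, 1]]` this gives `[[0.5, -0.5, -0.5, 0.5], [0, 0, 0, 0]]`. The second group shows why grouping matters: when every attempt at a problem earns the same reward, all of its advantages are zero, so that problem contributes no learning signal.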
+
+ ## Notation
+
+ We'll use subscripts to indicate the shapes of objects. For example, `tokens_P_G_T` indicates a three-dimensional array of tokens, with `P` problems, `G` attempts per problem (the group), and `T` tokens per attempt, so `tokens_P_G_T[p][g][t]` refers to a single token. In many cases, the arrays will be ragged; e.g., the `T` axis will have different lengths for different `(p, g)`. Sometimes, a given dimension will be flattened from two dimensions. If we write `tokens_PG_T`, that means that we have a two-dimensional array, where the 0th dimension is flattened from the `P` and `G` dimensions.
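A small illustration of the naming convention with ragged data: `tokens_P_G_T` is a nested list whose innermost lengths differ, and `tokens_PG_T` flattens the first two axes. The helper `flatten_PG` is hypothetical, and the sketch assumes row-major flattening with the same group size `G` for every problem, so row `p * G + g` holds what used to be `tokens_P_G_T[p][g]`.

```python
def flatten_PG(tokens_P_G_T: list[list[list[int]]]) -> list[list[int]]:
    """Merge the problem and group axes; the ragged T axis is left untouched."""
    return [tokens_T for tokens_G_T in tokens_P_G_T for tokens_T in tokens_G_T]


# P = 2 problems, G = 2 attempts per problem, ragged T.
G = 2
tokens_P_G_T = [
    [[1, 2, 3], [4]],
    [[5, 6], [7, 8, 9, 10]],
]
tokens_PG_T = flatten_PG(tokens_P_G_T)
```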
+
+ ### Common Dimension Names
+
+ Here are the standard dimension subscripts used throughout the codebase:
+
+ - `_D`: Data/Datum dimension (for training data items)
+ - `_G`: Group dimension (for multiple attempts/rollouts of the same problem)
+ - `_P`: Problem dimension (for different problems/prompts)
+ - `_T`: Token/Time dimension (for sequences)
+
+ The relationship between dimensions in RL:
+ - A batch contains multiple problems (`_P`)
+ - Each problem spawns multiple attempts/environments (`_G`), forming a group
+ - Each attempt produces one trajectory
+ - Advantages are normalized within each group (across the `_G` dimension)
+
+ Examples:
+ - `env_group_builders_P`: A list of environment builders, one per problem
+ - `trajectories_G`: Multiple trajectories from attempts at the same problem
+ - `rewards_G`: Rewards for each attempt within a group
+ - `tokens_P_G_T`: Tokens with problem, group, and time dimensions
+ - `data_D`: A list of training data items