logits-cookbook 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- logits_cookbook-0.1.0/.github/workflows/ci.yml +49 -0
- logits_cookbook-0.1.0/.github/workflows/publish.yml +87 -0
- logits_cookbook-0.1.0/.gitignore +8 -0
- logits_cookbook-0.1.0/AGENTS.md +135 -0
- logits_cookbook-0.1.0/CHANGELOG.md +5 -0
- logits_cookbook-0.1.0/CLAUDE.md +135 -0
- logits_cookbook-0.1.0/CONTRIBUTING.md +77 -0
- logits_cookbook-0.1.0/LICENSE +202 -0
- logits_cookbook-0.1.0/PKG-INFO +162 -0
- logits_cookbook-0.1.0/README.md +93 -0
- logits_cookbook-0.1.0/docs/api-reference/apifuture.md +151 -0
- logits_cookbook-0.1.0/docs/api-reference/exceptions.md +130 -0
- logits_cookbook-0.1.0/docs/api-reference/restclient.md +512 -0
- logits_cookbook-0.1.0/docs/api-reference/samplingclient.md +112 -0
- logits_cookbook-0.1.0/docs/api-reference/serviceclient.md +267 -0
- logits_cookbook-0.1.0/docs/api-reference/trainingclient.md +462 -0
- logits_cookbook-0.1.0/docs/api-reference/types.md +899 -0
- logits_cookbook-0.1.0/docs/async.mdx +58 -0
- logits_cookbook-0.1.0/docs/compatible-apis/openai.mdx +78 -0
- logits_cookbook-0.1.0/docs/completers.mdx +46 -0
- logits_cookbook-0.1.0/docs/dev-tips.mdx +5 -0
- logits_cookbook-0.1.0/docs/docs-outline.mdx +25 -0
- logits_cookbook-0.1.0/docs/download-weights.mdx +33 -0
- logits_cookbook-0.1.0/docs/evals.mdx +234 -0
- logits_cookbook-0.1.0/docs/index.mdx +37 -0
- logits_cookbook-0.1.0/docs/install.mdx +34 -0
- logits_cookbook-0.1.0/docs/lora-primer.mdx +55 -0
- logits_cookbook-0.1.0/docs/losses.mdx +308 -0
- logits_cookbook-0.1.0/docs/model-lineup.mdx +66 -0
- logits_cookbook-0.1.0/docs/overview-building.mdx +16 -0
- logits_cookbook-0.1.0/docs/preferences/dpo-guide.mdx +112 -0
- logits_cookbook-0.1.0/docs/preferences/rlhf-example.mdx +22 -0
- logits_cookbook-0.1.0/docs/preferences.mdx +15 -0
- logits_cookbook-0.1.0/docs/publish-weights.mdx +48 -0
- logits_cookbook-0.1.0/docs/rendering.mdx +241 -0
- logits_cookbook-0.1.0/docs/rl/images/rl_loop_reward.png +0 -0
- logits_cookbook-0.1.0/docs/rl/rl-basic.mdx +27 -0
- logits_cookbook-0.1.0/docs/rl/rl-envs.mdx +63 -0
- logits_cookbook-0.1.0/docs/rl/rl-hyperparams.mdx +83 -0
- logits_cookbook-0.1.0/docs/rl/rl-logging.mdx +98 -0
- logits_cookbook-0.1.0/docs/rl/rl-loops.mdx +25 -0
- logits_cookbook-0.1.0/docs/rl/sequence-extension.mdx +127 -0
- logits_cookbook-0.1.0/docs/rl.mdx +21 -0
- logits_cookbook-0.1.0/docs/save-load.mdx +56 -0
- logits_cookbook-0.1.0/docs/supervised-learning/images/lr_sweep.png +0 -0
- logits_cookbook-0.1.0/docs/supervised-learning/images/train_test_loss.png +0 -0
- logits_cookbook-0.1.0/docs/supervised-learning/prompt-distillation.mdx +88 -0
- logits_cookbook-0.1.0/docs/supervised-learning/sl-basic.mdx +45 -0
- logits_cookbook-0.1.0/docs/supervised-learning/sl-hyperparams.mdx +44 -0
- logits_cookbook-0.1.0/docs/supervised-learning/sl-loop.mdx +5 -0
- logits_cookbook-0.1.0/docs/supervised-learning/sweep-case-study.mdx +112 -0
- logits_cookbook-0.1.0/docs/supervised-learning.mdx +16 -0
- logits_cookbook-0.1.0/docs/training-sampling.mdx +309 -0
- logits_cookbook-0.1.0/docs/under-the-hood.mdx +68 -0
- logits_cookbook-0.1.0/logits_cookbook/__init__.py +1 -0
- logits_cookbook-0.1.0/logits_cookbook/chat_app/README.md +46 -0
- logits_cookbook-0.1.0/logits_cookbook/chat_app/logits_chat_cli.py +184 -0
- logits_cookbook-0.1.0/logits_cookbook/checkpoint_utils.py +362 -0
- logits_cookbook-0.1.0/logits_cookbook/cli_utils.py +60 -0
- logits_cookbook-0.1.0/logits_cookbook/client_utils.py +29 -0
- logits_cookbook-0.1.0/logits_cookbook/completers.py +123 -0
- logits_cookbook-0.1.0/logits_cookbook/display.py +46 -0
- logits_cookbook-0.1.0/logits_cookbook/distillation/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/distillation/datasets.py +278 -0
- logits_cookbook-0.1.0/logits_cookbook/distillation/train_on_policy.py +505 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/README.md +13 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/custom_evaluators.py +103 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/custom_inspect_task.py +69 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/evaluators.py +30 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/inspect_evaluators.py +137 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/inspect_utils.py +161 -0
- logits_cookbook-0.1.0/logits_cookbook/eval/run_inspect_evals.py +71 -0
- logits_cookbook-0.1.0/logits_cookbook/example_data/conversations.jsonl +128 -0
- logits_cookbook-0.1.0/logits_cookbook/example_data/multilingual.txt +2100 -0
- logits_cookbook-0.1.0/logits_cookbook/hyperparam_utils.py +190 -0
- logits_cookbook-0.1.0/logits_cookbook/image_processing_utils.py +63 -0
- logits_cookbook-0.1.0/logits_cookbook/image_processing_utils_test.py +54 -0
- logits_cookbook-0.1.0/logits_cookbook/model_info.py +140 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/comparison_policy_evaluator.py +67 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/dpo_datasets.py +77 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/preference_datasets.py +172 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/train_dpo.py +433 -0
- logits_cookbook-0.1.0/logits_cookbook/preference/types.py +157 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/README.md +32 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/README.md +41 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/chat_datasets.py +77 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/chat_sl/train.py +173 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/README.md +70 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/code_env.py +290 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/code_grading.py +183 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/deepcoder_tool.py +135 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/lcb_utils.py +821 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/sandbox_config/local.yaml +122 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/code_rl/train.py +125 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/README.md +154 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/harbor_multiturn.py +35 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/harbor_multiturn_test.py +49 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/off_policy_reasoning.py +201 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_distillation.py +182 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_distillation_harbor_multi_turn.py +187 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/distillation/on_policy_multi_teacher.py +195 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/README.md +94 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_env.py +211 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_tools.py +118 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/harbor_tools_test.py +250 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/launch_terminal_bench.py +47 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/harbor_rl/train.py +121 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/README.md +85 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/arithmetic_env.py +105 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_env.py +449 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_env_test.py +32 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/math_grading.py +548 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/math_rl/train.py +169 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/README.md +14 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/README.md +82 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/env.py +169 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/guess_number/train.py +76 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/README.md +65 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/env.py +298 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/text_arena/train.py +78 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/README.md +73 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/common_english_nouns.txt +171 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/env.py +273 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/multiplayer_rl/twenty_questions/train.py +78 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/README.md +12 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/datasets.py +315 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/dpo/README.md +34 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/dpo/train.py +134 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/rlhf/README.md +28 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/rlhf/rlhf_pipeline.py +299 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/README.md +29 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/env.py +61 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/preference/shorter/train.py +78 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/README.md +76 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/create_data.py +178 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/prompt_distillation/train.py +120 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rl_basic.py +42 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rl_loop.py +256 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/README.md +94 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/data.py +201 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/debug_env.py +78 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/env.py +265 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/generate_data.py +46 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/prometheus_experimental.py +147 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/rubric/train.py +158 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/README.md +68 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/chroma_pickle_test.py +37 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/embedding.py +133 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/offline_eval.py +213 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/search_env.py +246 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/tools.py +272 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/search_tool/train.py +150 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/sl_basic.py +52 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/sl_loop.py +169 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/README.md +35 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/evaluate.py +166 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/logits_openai.py +261 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/train.py +152 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/verifiers_env.py +206 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/verifiers_rl/verifiers_pickle_test.py +33 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/README.md +42 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/data.py +529 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/eval.py +503 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/eval_sweep.py +275 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/sweep.py +230 -0
- logits_cookbook-0.1.0/logits_cookbook/recipes/vlm_classifier/train.py +161 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/README.md +15 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/__init__.py +252 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/base.py +1576 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/deepseek_v3.py +501 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/deepseek_v3_test.py +183 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/gpt_oss.py +661 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/gpt_oss_test.py +186 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2.py +555 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k25.py +160 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k25_test.py +814 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_5_tool_declaration_ts.py +481 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_test.py +768 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/kimi_k2_tool_declaration_test.py +261 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/llama3.py +70 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/parsing_test.py +417 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3.py +558 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_5.py +290 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_test.py +247 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/qwen3_tool_declaration_test.py +348 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/renderer_pickle_test.py +142 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/renderers_test.py +1439 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/role_colon.py +97 -0
- logits_cookbook-0.1.0/logits_cookbook/renderers/tool_calling_test.py +309 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/builder_pickle_test.py +104 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/data_processing.py +207 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/message_env.py +120 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/message_env_test.py +318 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/metric_util.py +250 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/metrics.py +169 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/multiturn_weight_assignment_test.py +284 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/play_w_env.py +112 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/preference_envs.py +283 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/problem_env.py +114 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/rollout_logging.py +125 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/rollout_logging_test.py +46 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/rollouts.py +237 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/train.py +1436 -0
- logits_cookbook-0.1.0/logits_cookbook/rl/types.py +186 -0
- logits_cookbook-0.1.0/logits_cookbook/sandbox/README.md +67 -0
- logits_cookbook-0.1.0/logits_cookbook/sandbox/__init__.py +30 -0
- logits_cookbook-0.1.0/logits_cookbook/sandbox/modal_sandbox.py +334 -0
- logits_cookbook-0.1.0/logits_cookbook/sandbox/sandbox_interface.py +105 -0
- logits_cookbook-0.1.0/logits_cookbook/sandbox/sandboxfusion.py +122 -0
- logits_cookbook-0.1.0/logits_cookbook/scripts/merge_logits_adapter_to_hf_model.py +182 -0
- logits_cookbook-0.1.0/logits_cookbook/scripts/test_tool_calling_e2e.py +228 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/common.py +138 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/data.py +183 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/nll_evaluator.py +26 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/resume_test.py +156 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/train.py +404 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/types.py +87 -0
- logits_cookbook-0.1.0/logits_cookbook/supervised/viz_sft_dataset.py +53 -0
- logits_cookbook-0.1.0/logits_cookbook/third_party/__init__.py +0 -0
- logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/README.md +181 -0
- logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/__init__.py +5 -0
- logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/provider.py +533 -0
- logits_cookbook-0.1.0/logits_cookbook/third_party/litellm/provider_test.py +564 -0
- logits_cookbook-0.1.0/logits_cookbook/tokenizer_utils.py +105 -0
- logits_cookbook-0.1.0/logits_cookbook/tokenizer_utils_test.py +83 -0
- logits_cookbook-0.1.0/logits_cookbook/tool_use/README.md +99 -0
- logits_cookbook-0.1.0/logits_cookbook/tool_use/__init__.py +33 -0
- logits_cookbook-0.1.0/logits_cookbook/tool_use/agent_tool_message_env.py +148 -0
- logits_cookbook-0.1.0/logits_cookbook/tool_use/tools.py +321 -0
- logits_cookbook-0.1.0/logits_cookbook/tool_use/types.py +58 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/__init__.py +1 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/code_state.py +130 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/file_utils.py +6 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/format_colorized.py +50 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/logtree.py +1125 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/logtree_formatters.py +250 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/logtree_test.py +722 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/lr_scheduling.py +23 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/misc_utils.py +94 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/ml_log.py +531 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/ml_log_test.py +42 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/trace.py +443 -0
- logits_cookbook-0.1.0/logits_cookbook/utils/trace_test.py +127 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/README.md +93 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/__init__.py +6 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/control.py +509 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/core.py +658 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/examples/async_rl_sweep.py +94 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/examples/fake_train.py +75 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/examples/ml_sweep.py +278 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/run_job.py +50 -0
- logits_cookbook-0.1.0/logits_cookbook/xmux/utils.py +253 -0
- logits_cookbook-0.1.0/pyproject.toml +113 -0
- logits_cookbook-0.1.0/tests/__init__.py +0 -0
- logits_cookbook-0.1.0/tests/compare_sampling_training_logprobs.py +159 -0
- logits_cookbook-0.1.0/tests/conftest.py +32 -0
- logits_cookbook-0.1.0/tests/helpers.py +95 -0
- logits_cookbook-0.1.0/tests/smoke_tests.py +143 -0
- logits_cookbook-0.1.0/tests/test_recipe_chat_sl.py +33 -0
- logits_cookbook-0.1.0/tests/test_recipe_dpo.py +11 -0
- logits_cookbook-0.1.0/tests/test_recipe_guess_number.py +14 -0
- logits_cookbook-0.1.0/tests/test_recipe_off_policy_reasoning.py +16 -0
- logits_cookbook-0.1.0/tests/test_recipe_on_policy_distillation.py +14 -0
- logits_cookbook-0.1.0/tests/test_recipe_on_policy_multi_teacher.py +15 -0
- logits_cookbook-0.1.0/tests/test_recipe_rlhf_pipeline.py +11 -0
- logits_cookbook-0.1.0/tests/test_recipe_shorter.py +11 -0
- logits_cookbook-0.1.0/tests/test_recipe_text_arena.py +14 -0
- logits_cookbook-0.1.0/tests/test_recipe_twenty_questions.py +15 -0
- logits_cookbook-0.1.0/tests/test_recipe_vlm_classifier.py +18 -0
- logits_cookbook-0.1.0/tests/third_party/__init__.py +0 -0
- logits_cookbook-0.1.0/tests/third_party/test_litellm.py +169 -0
- logits_cookbook-0.1.0/tests/validate_temperature_logprobs.py +369 -0

--- /dev/null
+++ logits_cookbook-0.1.0/.github/workflows/ci.yml
@@ -0,0 +1,49 @@
+name: CI
+
+on:
+  push:
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  lint:
+    name: Lint (ruff)
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install ruff
+        run: python -m pip install --upgrade pip ruff
+
+      - name: ruff check
+        run: ruff check .
+
+      - name: ruff format --check
+        run: ruff format --check .
+
+  package:
+    name: Build package
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Build and check distributions
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install build twine
+          python -m build
+          python -m twine check dist/*
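The `ci.yml` hunk above runs two independent jobs: `lint` (`ruff check` plus `ruff format --check`) and `package` (build, then `twine check`). As a rough sketch for reproducing those checks locally before pushing, the hypothetical helper below replays the same commands in order. It is not part of the package, it skips the `pip install` steps (ruff, build and twine are assumed to be installed already), and it expands `dist/*` itself because it does not go through a shell.

```python
import glob
import shlex
import subprocess

# Hypothetical local mirror of the commands in ci.yml (pip install steps omitted;
# ruff, build and twine are assumed to be installed in the current environment).
CI_STEPS: dict[str, list[str]] = {
    "lint": ["ruff check .", "ruff format --check ."],
    "package": ["python -m build", "python -m twine check dist/*"],
}


def expand(command: str) -> list[str]:
    """Split a command like a POSIX shell would and expand glob patterns such as dist/*."""
    argv: list[str] = []
    for token in shlex.split(command):
        has_pattern = any(ch in token for ch in "*?[")
        matches = sorted(glob.glob(token)) if has_pattern else []
        argv.extend(matches or [token])  # an unmatched pattern is kept literally, as in bash
    return argv


def run_ci(job: str, dry_run: bool = True) -> list[list[str]]:
    """Return the argv lists for one CI job; when dry_run is False, also execute them."""
    executed: list[list[str]] = []
    for command in CI_STEPS[job]:
        argv = expand(command)  # expanded lazily so dist/* sees files built by earlier steps
        executed.append(argv)
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on the first failure
    return executed
```

Calling `run_ci("lint", dry_run=False)` and then `run_ci("package", dry_run=False)` stops at the first failing command, which roughly matches how a failing step fails its job in Actions.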

--- /dev/null
+++ logits_cookbook-0.1.0/.github/workflows/publish.yml
@@ -0,0 +1,87 @@
+name: Publish
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch:
+    inputs:
+      repository:
+        description: "Repository to publish to"
+        required: true
+        default: "testpypi"
+        type: choice
+        options:
+          - testpypi
+          - pypi
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    name: Build distributions
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Build package
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install build twine
+          python -m build
+          python -m twine check dist/*
+
+      - name: Upload distributions
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+  publish-testpypi:
+    name: Publish to TestPyPI
+    needs: build
+    if: github.event_name == 'workflow_dispatch' && inputs.repository == 'testpypi'
+    runs-on: ubuntu-latest
+    environment: testpypi
+    permissions:
+      id-token: write
+      contents: read
+
+    steps:
+      - name: Download distributions
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Publish distributions to TestPyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          repository-url: https://test.pypi.org/legacy/
+
+  publish-pypi:
+    name: Publish to PyPI
+    needs: build
+    if: github.event_name == 'release' || (github.event_name == 'workflow_dispatch' && inputs.repository == 'pypi')
+    runs-on: ubuntu-latest
+    environment: pypi
+    permissions:
+      id-token: write
+      contents: read
+
+    steps:
+      - name: Download distributions
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Publish distributions to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
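In the `publish.yml` hunk above, `build` always runs and both upload jobs depend on it through `needs: build`; which upload job runs is decided by their `if:` expressions. The illustrative function below (`publish_targets` is a made-up name, not a GitHub API) restates those conditions so the outcome for each trigger is easy to check.

```python
from typing import Optional


def publish_targets(event_name: str, repository: Optional[str] = None) -> set[str]:
    """Return which upload jobs in publish.yml would run for a given trigger.

    Mirrors the jobs' `if:` expressions. `repository` plays the role of
    `inputs.repository`; on a manual dispatch it falls back to "testpypi",
    the input's declared default. "release" stands for a published release,
    the only release type that triggers the workflow.
    """
    if event_name == "workflow_dispatch" and repository is None:
        repository = "testpypi"
    is_dispatch = event_name == "workflow_dispatch"
    targets: set[str] = set()
    # publish-testpypi: manual dispatch that selected testpypi.
    if is_dispatch and repository == "testpypi":
        targets.add("publish-testpypi")
    # publish-pypi: any published release, or a manual dispatch that selected pypi.
    if event_name == "release" or (is_dispatch and repository == "pypi"):
        targets.add("publish-pypi")
    return targets
```

Under these conditions a published release only ever goes to PyPI, and a manual run goes to exactly one index, TestPyPI unless `pypi` is chosen. Both upload jobs request `id-token: write` and pass no password to `pypa/gh-action-pypi-publish`, which is the setup that action uses for trusted publishing.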

--- /dev/null
+++ logits_cookbook-0.1.0/AGENTS.md
@@ -0,0 +1,135 @@
+# Logits Cookbook Agent Guide
+
+Quick reference for agents working on `logits_cookbook`. Full documentation is in `docs/`.
+
+`logits_cookbook` is a client library with training and eval code built on the Tinker service (hosted by Thinking Machines Lab) and the Tinker SDK (a separate repo with just the API). You author training/eval loops that run on a CPU machine; Tinker executes the heavy GPU work.
+
+**Start here:** `docs/training-sampling.mdx` - Complete walkthrough of training and sampling basics.
+
+## Documentation Map (`docs/`)
+
+**API Fundamentals:**
+- `index.mdx` - Tinker overview, division of responsibilities
+- `install.mdx` - Installation, API key setup
+- `training-sampling.mdx` - **Starter guide**: data prep, forward_backward, sampling, vision inputs
+- `losses.mdx` - Loss functions (cross_entropy, importance_sampling, ppo, cispo, dro, forward_backward_custom)
+- `save-load.mdx` - Checkpointing (save_weights_for_sampler vs save_state)
+- `async.mdx` - Sync/async APIs, futures, overlapping requests
+- `model-lineup.mdx` - Available models
+- `under-the-hood.mdx` - Clock cycles, worker pools
+
+**API Reference (`api-reference/`):**
+- `types.md` - **All API types** (Datum, ModelInput, TensorData, SamplingParams, etc.)
+- `trainingclient.md`, `samplingclient.md`, `serviceclient.md`, `restclient.md` - Client APIs
+
+**Supervised Learning (`supervised-learning/`):**
+- `../supervised-learning.mdx` - SL overview
+- `sl-basic.mdx` - First SL run
+- `sl-hyperparams.mdx` - LR formula, batch size
+- `sl-loop.mdx` - Minimal training loop
+- `prompt-distillation.mdx` - Distilling prompts
+- `sweep-case-study.mdx` - Hyperparameter sweeps
+
+**Reinforcement Learning (`rl/`):**
+- `../rl.mdx` - RL overview (RLVR, RLHF)
+- `rl-basic.mdx` - First RL run
+- `rl-envs.mdx` - Custom Env, EnvGroupBuilder, RLDataset
+- `rl-loops.mdx` - Minimal RL loop
+- `rl-hyperparams.mdx` - batch_size vs group_size, async training
+- `sequence-extension.mdx` - Multi-turn RL, KV-cache
+
+**Preferences (`preferences/`):**
+- `../preferences.mdx` - DPO vs RLHF overview
+- `dpo-guide.mdx` - DPO training
+- `rlhf-example.mdx` - RLHF pipeline
+
+**Other:**
+- `rendering.mdx` - Renderers (bridge between chat-style data and token sequences), vision inputs, TrainOnWhat
+- `completers.mdx` - TokenCompleter vs MessageCompleter
+- `evals.mdx` - Inline evals, Inspect AI, custom evaluators
+- `lora-primer.mdx` - LoRA background
+- `download-weights.mdx` / `publish-weights.mdx` - Weight export
+
+---
+
+## Composing Types
+
+Agents often struggle with the nested type hierarchy. Key resources:
+
+**Reference:** `docs/api-reference/types.md` documents all API types.
+
+**Core types:**
+- `Datum` = `model_input` (ModelInput) + `loss_fn_inputs` (dict of TensorData)
+- `ModelInput` = list of chunks (EncodedTextChunk, ImageChunk)
+- `TensorData` = wrapper for numpy/torch arrays with shape info
+
+**Helper functions** (use these instead of manual construction):
+- `datum_from_model_input_weights(model_input, weights, max_length)` - SL datum creation (`supervised/common.py`)
+- `conversation_to_datum(messages, renderer, max_length, train_on_what)` - Full pipeline (`supervised/data.py`)
+- `renderer.build_supervised_example(messages)` - Returns (ModelInput, weights)
+- `ModelInput.from_ints(tokens)` - Create from token list
+- `TensorData.from_numpy(arr)` / `TensorData.from_torch(tensor)` - Wrap arrays
+
+---
+
+## Architecture
+
+**Builder pattern:** Config objects are `chz` dataclasses (SupervisedDatasetBuilder, RLDatasetBuilder, EnvGroupBuilder). They expose `.build()`/`__call__()` returning runtime objects.
+
+**Key code locations:**
+- SL: `logits_cookbook/supervised/train.py`
+- RL: `logits_cookbook/rl/train.py`
+- DPO: `logits_cookbook/preference/train_dpo.py`
+- Renderers: `logits_cookbook/renderers/`
+- Completers: `logits_cookbook/completers.py`
+- RL types: `logits_cookbook/rl/types.py`
+- Logging: `logits_cookbook/utils/logtree.py`, `logits_cookbook/rl/rollouts.py`
+- Recipes: `logits_cookbook/recipes/`
+
+**Training outputs:** RL and SL training write human-readable HTML reports and machine-readable JSON files (metrics, rollout transcripts, per-trajectory summaries) to `log_path`. Point agents at a `log_path` directory to analyze training runs — `metrics.jsonl` for scalar metrics, `*_rollout_summaries.jsonl` for per-trajectory data, and `*_logtree.json` for full rollout transcripts including model responses. See `docs/rl/rl-logging.mdx` for the complete file reference and parsing examples.
+
+---
+
+## Conventions
+
+**Subscript suffixes** for tensor names: `_P` (problems), `_G` (groups), `_T` (tokens), `_D` (datums). Example: `tokens_P_G_T[p][g][t]`
+
+**Code style:**
+- Explicit typing; avoid `Any` / `type: ignore`
+- Use `safezip`, `timed`, `scope` helpers
+- `@chz.chz` decorator for config serialization
+- `ml_log.log_metrics` for metrics; `logtree` for transcripts
+
+**Env lifecycle:** `Env` objects are single-use (no reset). Create via `EnvGroupBuilder`.
+
+---
+
+## Common Pitfalls
+
+1. **LoRA LR:** Use `hyperparam_utils.get_lr(model_name)` - LoRA needs ~10x higher LR than full fine-tuning.
+
+2. **Renderer mismatch:** Match `renderer_name` to model family (`llama3`, `qwen3`, `role_colon`).
+
+3. **Async gaps:** Submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting.
+
+4. **Sampler desync:** Create a **new** sampling client after saving weights.
+
+5. **Type construction:** Use helper functions, not manual dict construction. See `supervised/data.py` and `supervised/common.py`.
+
+6. **Group semantics:** RL advantages are centered within each group.
+
+7. **DPO:** Start with `dpo_beta=0.1`, LR~1e-5.
+
+---
+
+## Testing
+
+```bash
+# Unit tests (no API needed, colocated *_test.py files)
+pytest logits_cookbook/
+
+# Smoke tests (requires TINKER_API_KEY + network)
+pytest tests/
+```
+
+For debugging, shrink workloads via `n_batches`, `batch_size`, `group_size` in dataset builders.
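The **Training outputs** paragraph in the AGENTS.md hunk points agents at `metrics.jsonl` under `log_path`. A minimal reader, assuming one JSON object per line, is sketched below. The metric name `train/loss` and the `step` field are placeholders, not names confirmed by the package (`docs/rl/rl-logging.mdx` is the stated reference for the real schema), and records without a step fall back to their line index.

```python
import json
from pathlib import Path
from typing import Union


def load_metric(
    log_path: Union[str, Path], key: str, step_key: str = "step"
) -> list[tuple[int, float]]:
    """Return (step, value) pairs for `key` from <log_path>/metrics.jsonl.

    Assumes one JSON object per line. Records missing `key` are skipped;
    records missing `step_key` (a hypothetical field name) use their line index.
    """
    series: list[tuple[int, float]] = []
    with open(Path(log_path) / "metrics.jsonl", encoding="utf-8") as f:
        for line_index, line in enumerate(f):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if key in record:
                step = int(record.get(step_key, line_index))
                series.append((step, float(record[key])))
    return series
```

A call such as `load_metric("/path/to/log_path", "train/loss")` yields a list that can be handed straight to a plotting library or summarized by an agent.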
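Two items in the AGENTS.md hunk fit together: the `_P`/`_G` subscript convention and pitfall 6, "RL advantages are centered within each group." The sketch below assumes plain mean-centering with no standard-deviation scaling, which may differ from what `logits_cookbook/rl/` actually computes; `center_advantages` is a hypothetical name.

```python
def center_advantages(rewards_P_G: list[list[float]]) -> list[list[float]]:
    """Center rewards within each group: advantage = reward - mean(group rewards).

    rewards_P_G[p][g] is the reward of sample g in the group for problem p,
    following the _P (problems) / _G (groups) suffix convention. Mean-centering
    only; any extra normalization the cookbook may apply is not reproduced.
    """
    advantages_P_G: list[list[float]] = []
    for rewards_G in rewards_P_G:
        mean_reward = sum(rewards_G) / len(rewards_G)
        advantages_P_G.append([reward - mean_reward for reward in rewards_G])
    return advantages_P_G
```

For example, `center_advantages([[1.0, 0.0, 0.0, 1.0], [2.0, 2.0]])` gives `[[0.5, -0.5, -0.5, 0.5], [0.0, 0.0]]`. One consequence of centering shows in the second group: when every sample in a group gets the same reward, all of its advantages are zero and the group contributes no learning signal.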
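Pitfall 3 in the AGENTS.md hunk ("Submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting") is about ordering. The sketch below uses a hypothetical in-process `FakeTrainingClient` instead of the real SDK, whose actual return types are covered in `docs/async.mdx` and `docs/api-reference/trainingclient.md`. Here each `*_async` call enqueues work and immediately returns an awaitable handle, and the fake server applies requests one at a time in submission order.

```python
import asyncio
from typing import Optional


class FakeTrainingClient:
    """Hypothetical stand-in for a remote training client (not the real SDK)."""

    def __init__(self, latency: float = 0.01) -> None:
        self.latency = latency
        self.log: list[str] = []
        self._last: Optional[asyncio.Task] = None

    def _enqueue(self, name: str) -> asyncio.Task:
        previous = self._last

        async def run() -> str:
            if previous is not None:
                await previous  # requests are applied in submission order
            await asyncio.sleep(self.latency)  # simulated server-side work
            self.log.append(f"done {name}")
            return name

        self._last = asyncio.create_task(run())
        self.log.append(f"submit {name}")
        return self._last

    async def forward_backward_async(self, batch: list[int]) -> asyncio.Task:
        return self._enqueue("forward_backward")

    async def optim_step_async(self, learning_rate: float) -> asyncio.Task:
        return self._enqueue("optim_step")


async def train_step(client: FakeTrainingClient, batch: list[int], lr: float) -> tuple[str, str]:
    # Submit both requests back-to-back, before awaiting either result,
    # so the optimizer step is already queued when forward_backward finishes.
    fb_handle = await client.forward_backward_async(batch)
    opt_handle = await client.optim_step_async(lr)
    # Only now block on the results.
    return await fb_handle, await opt_handle
```

Running `asyncio.run(train_step(FakeTrainingClient(), [1, 2, 3], 1e-4))` logs both submissions before either completion. Awaiting `fb_handle` before submitting the optimizer step would leave the queue empty while the client waits, which is the gap the pitfall warns about.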
@@ -0,0 +1,135 @@
|
|
|
1
|
+
# Logits Cookbook Agent Guide
|
|
2
|
+
|
|
3
|
+
Quick reference for agents working on `logits_cookbook`. Full documentation is in `docs/`.
|
|
4
|
+
|
|
5
|
+
`logits_cookbook` is a client library with training and eval code built on the Tinker service (hosted by Thinking Machines Lab) and the Tinker SDK (a separate repo with just the API). You author training/eval loops that run on a CPU machine; Tinker executes the heavy GPU work.
|
|
6
|
+
|
|
7
|
+
**Start here:** `docs/training-sampling.mdx` - Complete walkthrough of training and sampling basics.

## Documentation Map (`docs/`)

**API Fundamentals:**
- `index.mdx` - Tinker overview, division of responsibilities
- `install.mdx` - Installation, API key setup
- `training-sampling.mdx` - **Starter guide**: data prep, forward_backward, sampling, vision inputs
- `losses.mdx` - Loss functions (cross_entropy, importance_sampling, ppo, cispo, dro, forward_backward_custom)
- `save-load.mdx` - Checkpointing (save_weights_for_sampler vs save_state)
- `async.mdx` - Sync/async APIs, futures, overlapping requests
- `model-lineup.mdx` - Available models
- `under-the-hood.mdx` - Clock cycles, worker pools

**API Reference (`api-reference/`):**
- `types.md` - **All API types** (Datum, ModelInput, TensorData, SamplingParams, etc.)
- `trainingclient.md`, `samplingclient.md`, `serviceclient.md`, `restclient.md` - Client APIs

**Supervised Learning (`supervised-learning/`):**
- `../supervised-learning.mdx` - SL overview
- `sl-basic.mdx` - First SL run
- `sl-hyperparams.mdx` - LR formula, batch size
- `sl-loop.mdx` - Minimal training loop
- `prompt-distillation.mdx` - Distilling prompts
- `sweep-case-study.mdx` - Hyperparameter sweeps

**Reinforcement Learning (`rl/`):**
- `../rl.mdx` - RL overview (RLVR, RLHF)
- `rl-basic.mdx` - First RL run
- `rl-envs.mdx` - Custom Env, EnvGroupBuilder, RLDataset
- `rl-loops.mdx` - Minimal RL loop
- `rl-hyperparams.mdx` - batch_size vs group_size, async training
- `sequence-extension.mdx` - Multi-turn RL, KV-cache

**Preferences (`preferences/`):**
- `../preferences.mdx` - DPO vs RLHF overview
- `dpo-guide.mdx` - DPO training
- `rlhf-example.mdx` - RLHF pipeline

**Other:**
- `rendering.mdx` - Renderers (bridge between chat-style data and token sequences), vision inputs, TrainOnWhat
- `completers.mdx` - TokenCompleter vs MessageCompleter
- `evals.mdx` - Inline evals, Inspect AI, custom evaluators
- `lora-primer.mdx` - LoRA background
- `download-weights.mdx` / `publish-weights.mdx` - Weight export

---

## Composing Types

Agents often struggle with the nested type hierarchy. Key resources:

**Reference:** `docs/api-reference/types.md` documents all API types.

**Core types:**
- `Datum` = `model_input` (ModelInput) + `loss_fn_inputs` (dict of TensorData)
- `ModelInput` = list of chunks (EncodedTextChunk, ImageChunk)
- `TensorData` = wrapper for numpy/torch arrays with shape info

**Helper functions** (use these instead of manual construction):
- `datum_from_model_input_weights(model_input, weights, max_length)` - SL datum creation (`supervised/common.py`)
- `conversation_to_datum(messages, renderer, max_length, train_on_what)` - Full pipeline (`supervised/data.py`)
- `renderer.build_supervised_example(messages)` - Returns (ModelInput, weights)
- `ModelInput.from_ints(tokens)` - Create from token list
- `TensorData.from_numpy(arr)` / `TensorData.from_torch(tensor)` - Wrap arrays

---

## Architecture

**Builder pattern:** Config objects are `chz` dataclasses (SupervisedDatasetBuilder, RLDatasetBuilder, EnvGroupBuilder). They expose `.build()`/`__call__()` returning runtime objects.

**Key code locations:**
- SL: `logits_cookbook/supervised/train.py`
- RL: `logits_cookbook/rl/train.py`
- DPO: `logits_cookbook/preference/train_dpo.py`
- Renderers: `logits_cookbook/renderers/`
- Completers: `logits_cookbook/completers.py`
- RL types: `logits_cookbook/rl/types.py`
- Logging: `logits_cookbook/utils/logtree.py`, `logits_cookbook/rl/rollouts.py`
- Recipes: `logits_cookbook/recipes/`

**Training outputs:** RL and SL training write human-readable HTML reports and machine-readable JSON files (metrics, rollout transcripts, per-trajectory summaries) to `log_path`. Point agents at a `log_path` directory to analyze training runs: `metrics.jsonl` for scalar metrics, `*_rollout_summaries.jsonl` for per-trajectory data, and `*_logtree.json` for full rollout transcripts including model responses. See `docs/rl/rl-logging.mdx` for the complete file reference and parsing examples.
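To start analyzing a run, the sketch below reads one scalar series out of `metrics.jsonl`. It assumes only that the file holds one JSON object per line. The metric name in the usage (`"loss"`) and the helper `load_metric` are hypothetical, so check `docs/rl/rl-logging.mdx` for the real field names.

```python
import json
from pathlib import Path


def load_metric(log_path: str, key: str) -> list[tuple[int, float]]:
    """Return (line_index, value) pairs for `key` from <log_path>/metrics.jsonl.

    Assumes one JSON object per line; rows that lack `key` are skipped.
    """
    series: list[tuple[int, float]] = []
    metrics_file = Path(log_path) / "metrics.jsonl"
    with metrics_file.open() as f:
        for i, line in enumerate(f):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            if key in row:
                series.append((i, float(row[key])))
    return series
```

Usage might look like `load_metric("/path/to/log_path", "loss")`, substituting whichever metric name your run actually logs.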

---

## Conventions

**Subscript suffixes** for tensor names: `_P` (problems), `_G` (groups), `_T` (tokens), `_D` (datums). Example: `tokens_P_G_T[p][g][t]`

**Code style:**
- Explicit typing; avoid `Any` / `type: ignore`
- Use `safezip`, `timed`, `scope` helpers
- `@chz.chz` decorator for config serialization
- `ml_log.log_metrics` for metrics; `logtree` for transcripts

**Env lifecycle:** `Env` objects are single-use (no reset). Create via `EnvGroupBuilder`.

---

## Common Pitfalls

1. **LoRA LR:** Use `hyperparam_utils.get_lr(model_name)` - LoRA needs ~10x higher LR than full fine-tuning.

2. **Renderer mismatch:** Match `renderer_name` to model family (`llama3`, `qwen3`, `role_colon`).

3. **Async gaps:** Submit `forward_backward_async` and `optim_step_async` back-to-back before awaiting.

4. **Sampler desync:** Create a **new** sampling client after saving weights.

5. **Type construction:** Use helper functions, not manual dict construction. See `supervised/data.py` and `supervised/common.py`.

6. **Group semantics:** RL advantages are centered within each group.

7. **DPO:** Start with `dpo_beta=0.1`, LR ~1e-5.

---

## Testing

```bash
# Unit tests (no API needed, colocated *_test.py files)
pytest logits_cookbook/

# Smoke tests (requires TINKER_API_KEY + network)
pytest tests/
```

For debugging, shrink workloads via `n_batches`, `batch_size`, `group_size` in dataset builders.

@@ -0,0 +1,77 @@
# Development

This project is built in the spirit of open science and collaborative development. We believe that the best tools emerge through community involvement and shared learning.

We welcome PR contributions after our private beta is over. If you have any feedback, please email us at tinker@thinkingmachines.ai.

## Organization of training scripts

We're designing the codebase with the following goals:

1. Low barrier to entry: it should be dead simple to run something and see numbers go up.
2. Extensible: it should be possible to pass in custom datasets and evals and control all the hyperparameters.
3. Science-friendly: it should be easy to run sweeps and analyze the results.

To achieve this, we'll use the following structure around training scripts:

- There's a main training function, such as [rl/train.py](logits_cookbook/rl/train.py) or [supervised/train.py](logits_cookbook/supervised/train.py), which contains the main loop.
- This function takes a detailed config object (`Config`), which isn't constructible from the command line.
- The config contains members that specify things like datasets and evals. These members should be chz configs (with a `.build` method that constructs the actual object) or callables (we recommend `functools.partial`), rather than the constructed objects themselves. This keeps the config serializable, which is useful for sweeps.
- Launch scripts (e.g., [recipes/math_rl/train.py](logits_cookbook/recipes/math_rl/train.py)) construct a smaller config object (`CLIConfig`) from the command line and use it to assemble the full training config.
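The following hypothetical config makes the serializability point concrete. A frozen `dataclass` stands in for a `chz` config, and the evaluator is stored as a `functools.partial` of a module-level factory rather than as the built evaluator. `DatasetBuilder`, `Config`, and `make_threshold_evaluator` are illustrative names, not cookbook APIs.

```python
import functools
import pickle
from dataclasses import dataclass
from typing import Callable


def make_threshold_evaluator(threshold: float) -> Callable[[float], bool]:
    """Hypothetical evaluator factory; the evaluator it returns is a lambda."""
    return lambda score: score >= threshold


@dataclass(frozen=True)
class DatasetBuilder:
    """Stand-in for a chz config: plain fields plus a build() method."""

    name: str
    batch_size: int

    def build(self) -> list[int]:
        # Pretend to construct the heavyweight dataset object.
        return list(range(self.batch_size))


@dataclass(frozen=True)
class Config:
    dataset_builder: DatasetBuilder
    # Zero-argument callable that builds the evaluator when the run starts.
    evaluator_builder: Callable[[], Callable[[float], bool]]


config = Config(
    dataset_builder=DatasetBuilder(name="toy", batch_size=4),
    # A partial over a module-level function pickles cleanly; storing the
    # built evaluator (a lambda) would not.
    evaluator_builder=functools.partial(make_threshold_evaluator, threshold=0.5),
)

# The whole config round-trips through pickle, e.g. to ship it to sweep workers.
restored: Config = pickle.loads(pickle.dumps(config))
dataset = restored.dataset_builder.build()
evaluator = restored.evaluator_builder()
print(dataset, evaluator(0.7))  # [0, 1, 2, 3] True
```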

## Async

Async is very useful for RL, where it lets us make many queries in parallel (e.g., sampling calls). For all interfaces used in RL (such as the `Env` class), any method that takes a nontrivial amount of time should be async. For some of the other code, such as [recipes/sl_loop.py](logits_cookbook/recipes/sl_loop.py), we've chosen not to use async methods to keep it beginner-friendly, since many Python programmers are not familiar with async.
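The sketch below shows why async pays off in RL. `asyncio.gather` launches many slow requests at once and waits for all of them. `sample` is a hypothetical stand-in that just sleeps; it is not a real sampling client.

```python
import asyncio
import time


async def sample(prompt: str) -> str:
    """Hypothetical stand-in for a slow sampling request (not a real API)."""
    await asyncio.sleep(0.1)  # simulate network + generation latency
    return prompt.upper()


async def sample_all(prompts: list[str]) -> list[str]:
    # gather() runs all requests concurrently and returns results in input order.
    return await asyncio.gather(*(sample(p) for p in prompts))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(sample_all(["a", "b", "c", "d"]))
    elapsed = time.perf_counter() - start
    # Roughly 0.1 s in total, versus ~0.4 s if each request were awaited in turn.
    print(results, f"{elapsed:.2f}s")
```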

## Typing

Please use type annotations wherever possible. Avoid `Any` and `type: ignore`; prefer casting. However, avoid convoluted generics or code that's much more verbose just to satisfy the type checker. Prefer single types over union types.
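Here is a small illustration of preferring a cast over `type: ignore`, using only the standard library. `load_counts` is a hypothetical helper. `typing.cast` is purely a hint to the type checker and does no validation at runtime.

```python
import json
from typing import cast


def load_counts(text: str) -> dict[str, int]:
    # json.loads is typed as returning Any; cast() records the shape we
    # expect instead of silencing the checker with `# type: ignore`.
    # Note: cast() does NOT check anything at runtime.
    return cast(dict[str, int], json.loads(text))


counts = load_counts('{"correct": 3, "total": 4}')
print(counts["correct"] / counts["total"])  # 0.75
```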

## Classes

There are a lot of different classes, which might make the code feel less approachable. However, they follow *the builder pattern*, and the code is much less confusing once you know the pattern.

We can illustrate the pattern with the two main examples:

- A `SupervisedDatasetBuilder` is a configuration object that builds a `SupervisedDataset`.
- An `RLDatasetBuilder` is a configuration object that builds an `RLDataset`, which generates batches of `EnvGroupBuilder` objects, each of which generates a group of `Env` objects.

Here, the `SupervisedDatasetBuilder`, `RLDatasetBuilder`, and `EnvGroupBuilder` are all configuration objects with a `__call__` method that builds another object. You can see these objects in [supervised/types.py](logits_cookbook/supervised/types.py) and [rl/types.py](logits_cookbook/rl/types.py).

In general, we use many configuration objects whose `__call__` method returns a heavyweight object (like a dataset). We use `chz` for the configuration objects; it's similar to a dataclass but has some extra features that are useful for configs. We use either dataclasses or regular Python classes for the heavyweight objects.
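The pattern fits in a few lines. In this sketch a frozen `dataclass` stands in for a `chz` class. `ToyDatasetBuilder` and `ToySupervisedDataset` are hypothetical names, not the interfaces in `supervised/types.py`. The builder is cheap and holds only plain fields; calling it does the expensive construction.

```python
from dataclasses import dataclass


class ToySupervisedDataset:
    """Heavyweight runtime object (a regular class): holds the loaded examples."""

    def __init__(self, examples: list[str], batch_size: int) -> None:
        self.examples = examples
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch.
        return -(-len(self.examples) // self.batch_size)

    def get_batch(self, index: int) -> list[str]:
        start = index * self.batch_size
        return self.examples[start : start + self.batch_size]


@dataclass(frozen=True)
class ToyDatasetBuilder:
    """Lightweight config (stand-in for a chz class): cheap to create and serialize."""

    n_examples: int
    batch_size: int

    def __call__(self) -> ToySupervisedDataset:
        # The expensive work (downloading, tokenizing, ...) would happen here.
        examples = [f"example-{i}" for i in range(self.n_examples)]
        return ToySupervisedDataset(examples, self.batch_size)


builder = ToyDatasetBuilder(n_examples=5, batch_size=2)
dataset = builder()
print(len(dataset), dataset.get_batch(2))  # 3 ['example-4']
```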

## Envs

An `Env` is an RL environment. For readers with an RL background, it roughly corresponds to an MDP or a POMDP, but we also use it in more general cases (such as multi-agent settings) that don't strictly fit the MDP/POMDP formalism. It's roughly analogous to the concept of an Env in OpenAI Gym, but unlike in Gym, there is no `reset` method; instead, the env should be discarded after a rollout. Any shared resources should be maintained by whatever object creates the envs.
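Here is a hypothetical sketch of the single-use lifecycle. `GuessNumberEnv` refuses a second `step`. `GuessNumberGroupBuilder` makes a fresh group of envs for every rollout and holds the shared state itself. The class names and the `make_envs` method are assumptions for illustration; see `rl/types.py` for the actual `Env` and `EnvGroupBuilder` interfaces.

```python
import asyncio
from dataclasses import dataclass


class GuessNumberEnv:
    """Hypothetical single-turn env: one step, then it must be discarded."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.done = False

    async def step(self, action: int) -> float:
        if self.done:
            # There is deliberately no reset(); reuse is a bug.
            raise RuntimeError("Env is single-use; build a new one instead of reusing it.")
        self.done = True
        return 1.0 if action == self.target else 0.0


@dataclass(frozen=True)
class GuessNumberGroupBuilder:
    """Hypothetical group builder: every env in the group shares one task.

    Shared state (here only `target`; in practice, e.g., a grader) lives on
    the builder, not on the envs.
    """

    target: int
    group_size: int

    async def make_envs(self) -> list[GuessNumberEnv]:
        return [GuessNumberEnv(self.target) for _ in range(self.group_size)]


async def rollout_group(builder: GuessNumberGroupBuilder, actions: list[int]) -> list[float]:
    envs = await builder.make_envs()  # fresh envs for this rollout only
    if len(envs) != len(actions):
        raise ValueError("need exactly one action per env")
    return [await env.step(action) for env, action in zip(envs, actions)]


print(asyncio.run(rollout_group(GuessNumberGroupBuilder(target=3, group_size=3), [3, 1, 3])))
# [1.0, 0.0, 1.0]
```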

`Env`s are created by `EnvGroupBuilder`s. The envs in the group returned by an `EnvGroupBuilder` have something in common: either they correspond to the same task (in which case we can use this for variance reduction, as in GRPO, which centers rewards within each group), or the group defines a multi-agent environment.

- One common multi-agent setup uses a pairwise preference model to compare pairs of completions.
- We can also use the group to define a two-player game. Some two-player games, such as tic-tac-toe, are currently supported through the [text_arena](logits_cookbook/recipes/multiplayer_rl/text_arena/env.py) environments.

## Notation

We use subscripts to indicate the shapes of objects. For example, `tokens_P_G_T` indicates a three-dimensional array of tokens with `P` problems, `G` rollouts per problem (the group), and `T` tokens per rollout, so `tokens_P_G_T[p][g][t]` refers to a single token. In many cases the arrays are ragged; for example, the `T` axis can have different lengths for different `(p, g)`. Sometimes a dimension is flattened from two dimensions: `tokens_PG_T` is a two-dimensional array whose 0th dimension is flattened from the `P` and `G` dimensions.
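Here is a sketch of the notation using plain ragged Python lists. It assumes row-major flattening (`P` outer, `G` inner) and equal group sizes, so that `tokens_PG_T[p * G + g]` is `tokens_P_G_T[p][g]`:

```python
# Ragged nested lists: 2 problems (P) x 2 rollouts each (G) x variable tokens (T).
tokens_P_G_T: list[list[list[int]]] = [
    [[1, 2, 3], [1, 4]],  # problem 0: two rollouts of different lengths
    [[5], [5, 6, 7, 8]],  # problem 1
]

# Flatten P and G into one leading axis (P outer, G inner).
tokens_PG_T: list[list[int]] = [seq for group in tokens_P_G_T for seq in group]

# Recover the P x G structure, assuming every group has the same size G.
G = len(tokens_P_G_T[0])
regrouped_P_G_T = [tokens_PG_T[i : i + G] for i in range(0, len(tokens_PG_T), G)]

print(tokens_PG_T[1 * G + 1])  # [5, 6, 7, 8]  (p=1, g=1)
```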

### Common Dimension Names

Here are the standard dimension subscripts used throughout the codebase:

- `_D`: Data/Datum dimension (for training data items)
- `_G`: Group dimension (for multiple attempts/rollouts of the same problem)
- `_P`: Problem dimension (for different problems/prompts)
- `_T`: Token/Time dimension (for sequences)

The relationship between dimensions in RL:
- A batch contains multiple problems (`_P`)
- Each problem spawns multiple attempts/environments (`_G`), forming a group
- Each attempt produces one trajectory
- Advantages are normalized within each group (across the `_G` dimension)
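Here is a minimal sketch of per-group centering with hypothetical rewards. It only subtracts each group's mean reward. Whether a given training loop also rescales (for example, by the group's standard deviation) is not specified here. The second group shows why centering matters: when every rollout gets the same reward, all advantages are zero and the group contributes no learning signal.

```python
def center_advantages(rewards_G: list[float]) -> list[float]:
    """Advantage of each rollout = its reward minus the mean reward of its group."""
    mean = sum(rewards_G) / len(rewards_G)
    return [r - mean for r in rewards_G]


# Two problems (P), each with a group of 4 rollouts (G).
rewards_P_G = [
    [1.0, 0.0, 0.0, 1.0],  # mean 0.5
    [1.0, 1.0, 1.0, 1.0],  # all equal -> zero advantages
]
advantages_P_G = [center_advantages(rewards_G) for rewards_G in rewards_P_G]
print(advantages_P_G)  # [[0.5, -0.5, -0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]
```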

Examples:
- `env_group_builders_P`: A list of environment builders, one per problem
- `trajectories_G`: Multiple trajectories from attempts at the same problem
- `rewards_G`: Rewards for each attempt within a group
- `tokens_P_G_T`: Tokens with problem, group, and time dimensions
- `data_D`: A list of training data items