EvoScientist 0.0.1.dev1__py3-none-any.whl

Files changed (107)
  1. EvoScientist/EvoScientist.py +157 -0
  2. EvoScientist/__init__.py +24 -0
  3. EvoScientist/__main__.py +4 -0
  4. EvoScientist/backends.py +392 -0
  5. EvoScientist/cli.py +1553 -0
  6. EvoScientist/middleware.py +35 -0
  7. EvoScientist/prompts.py +277 -0
  8. EvoScientist/skills/accelerate/SKILL.md +332 -0
  9. EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
  10. EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
  11. EvoScientist/skills/accelerate/references/performance.md +525 -0
  12. EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
  13. EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
  14. EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
  15. EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
  16. EvoScientist/skills/find-skills/SKILL.md +133 -0
  17. EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
  18. EvoScientist/skills/flash-attention/SKILL.md +367 -0
  19. EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
  20. EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
  21. EvoScientist/skills/llama-cpp/SKILL.md +258 -0
  22. EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
  23. EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
  24. EvoScientist/skills/llama-cpp/references/server.md +125 -0
  25. EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
  26. EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
  27. EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
  28. EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
  29. EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
  30. EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
  31. EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
  32. EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
  33. EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
  34. EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
  35. EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
  36. EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
  37. EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
  38. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
  39. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
  40. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
  41. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
  42. EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
  43. EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
  44. EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
  45. EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
  46. EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
  47. EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
  48. EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
  49. EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
  50. EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
  51. EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
  52. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
  53. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
  54. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
  55. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
  56. EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
  57. EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
  58. EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
  59. EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
  60. EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
  61. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
  62. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
  63. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
  64. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
  65. EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
  66. EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
  67. EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
  68. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
  69. EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
  70. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
  71. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
  72. EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
  73. EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
  74. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
  75. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
  76. EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
  77. EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
  78. EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
  79. EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
  80. EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
  81. EvoScientist/skills/peft/SKILL.md +431 -0
  82. EvoScientist/skills/peft/references/advanced-usage.md +514 -0
  83. EvoScientist/skills/peft/references/troubleshooting.md +480 -0
  84. EvoScientist/skills/ray-data/SKILL.md +326 -0
  85. EvoScientist/skills/ray-data/references/integration.md +82 -0
  86. EvoScientist/skills/ray-data/references/transformations.md +83 -0
  87. EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
  88. EvoScientist/skills/skill-creator/SKILL.md +356 -0
  89. EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
  90. EvoScientist/skills/skill-creator/references/workflows.md +28 -0
  91. EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
  92. EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
  93. EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
  94. EvoScientist/stream/__init__.py +53 -0
  95. EvoScientist/stream/emitter.py +94 -0
  96. EvoScientist/stream/formatter.py +168 -0
  97. EvoScientist/stream/tracker.py +115 -0
  98. EvoScientist/stream/utils.py +255 -0
  99. EvoScientist/subagent.yaml +147 -0
  100. EvoScientist/tools.py +135 -0
  101. EvoScientist/utils.py +207 -0
  102. evoscientist-0.0.1.dev1.dist-info/METADATA +222 -0
  103. evoscientist-0.0.1.dev1.dist-info/RECORD +107 -0
  104. evoscientist-0.0.1.dev1.dist-info/WHEEL +5 -0
  105. evoscientist-0.0.1.dev1.dist-info/entry_points.txt +2 -0
  106. evoscientist-0.0.1.dev1.dist-info/licenses/LICENSE +21 -0
  107. evoscientist-0.0.1.dev1.dist-info/top_level.txt +1 -0
EvoScientist/middleware.py
@@ -0,0 +1,35 @@
+ """Middleware configuration for the EvoScientist agent."""
+ 
+ from pathlib import Path
+ 
+ from deepagents.middleware.skills import SkillsMiddleware
+ 
+ from .backends import MergedReadOnlyBackend
+ 
+ _DEFAULT_SKILLS_DIR = str(Path(__file__).parent / "skills")
+ 
+ 
+ def create_skills_middleware(
+     skills_dir: str = _DEFAULT_SKILLS_DIR,
+     workspace_dir: str = "./workspace/",
+ ) -> SkillsMiddleware:
+     """Create a SkillsMiddleware that loads skills.
+ 
+     Merges user-installed skills (workspace/skills/) with system skills
+     (package built-in). User skills take priority on name conflicts.
+ 
+     Args:
+         skills_dir: Path to the system skills directory (package built-in).
+         workspace_dir: Path to the workspace root (user skills live under workspace/skills/).
+ 
+     Returns:
+         Configured SkillsMiddleware instance.
+     """
+     merged = MergedReadOnlyBackend(
+         primary_dir=str(Path(workspace_dir) / "skills"),
+         secondary_dir=skills_dir,
+     )
+     return SkillsMiddleware(
+         backend=merged,
+         sources=["/"],
+     )
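The user-over-system merge described in the docstring can be sketched roughly as follows; `resolve_skill` is a hypothetical, simplified stand-in for the lookup that `MergedReadOnlyBackend` (defined in `EvoScientist/backends.py`) performs:

```python
from pathlib import Path
from typing import Optional


def resolve_skill(name: str, primary_dir: str, secondary_dir: str) -> Optional[Path]:
    """Return the SKILL.md path for `name`, preferring the primary (user) dir.

    The first directory that contains `<name>/SKILL.md` wins, so entries in
    primary_dir shadow same-named entries in secondary_dir.
    """
    for root in (primary_dir, secondary_dir):
        candidate = Path(root) / name / "SKILL.md"
        if candidate.is_file():
            return candidate
    return None
```

Under this rule, a `peft` skill installed in `workspace/skills/` would shadow the package's built-in `peft` skill, while skills present only in the package remain visible.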
EvoScientist/prompts.py
@@ -0,0 +1,277 @@
+ """Prompt templates for the EvoScientist experimental agent."""
+ 
+ # =============================================================================
+ # Main agent workflow
+ # =============================================================================
+ 
+ EXPERIMENT_WORKFLOW = """# Experiment Workflow
+ 
+ You are the main experimental agent. Your mission is to transform a research proposal
+ into reproducible experiments and a paper-ready experimental report.
+ 
+ ## Core Principles
+ - Baseline first, then iterate (ablation-friendly).
+ - Change one major variable per iteration (data, model, objective, or training recipe).
+ - Never invent results. If you cannot run something, say so and propose the smallest next step.
+ - Delegate aggressively using the `task` tool. Prefer the research sub-agent for web search.
+ - Use local skills via `load_skill` when they match the task. Skills provide proven workflows and checklists.
+   All skills are available under `/skills/` (read-only).
+   When calling `load_skill`, use the skill id from the SKILL.md frontmatter (`name:`), not the folder name.
+ 
+ ## Scientific Rigor Checklist
+ - Validate data and run quick EDA; document anomalies and data-leakage risks.
+ - Separate exploratory from confirmatory analyses; define primary metrics up front.
+ - Report effect sizes with uncertainty (confidence intervals/error bars) where possible.
+ - Apply multiple-testing correction when comparing many conditions.
+ - State limitations, negative results, and sensitivity to key parameters.
+ - Track reproducibility (seeds, versions, configs, and exact commands).
+ 
+ ## Step 1: Intake & Scope
+ - Read the proposal and extract goals, datasets, constraints, and evaluation metrics.
+ - Capture key assumptions and open questions.
+ - Save the original proposal to `/research_request.md`.
+ 
+ ## Step 2: Plan (Recommended Structure)
+ - Create experiment stages with success signals (flexible, not rigid).
+ - Identify resource/data dependencies and baseline requirements.
+ - Use `write_todos` to track the execution plan and updates.
+ - If delegating planning to planner-agent, start your message with: `MODE: PLAN`
+ - If a stage matches an existing skill, note the skill name in the plan and load it before implementation.
+   Use the skill id from SKILL.md frontmatter (`name:`).
+ - Save the plan to `/todos.md` (recommended). Include per-stage:
+   - objective and success signals
+   - what to run (commands/scripts)
+   - expected artifacts (tables/plots/logs)
+ - Optionally save:
+   - `/plan.md` for stages
+   - `/success_criteria.md` for success signals
+ 
+ ## Step 3: Execute & Debug
+ - Delegate tasks to sub-agents using the `task` tool:
+   - Planning/structuring → planner-agent
+   - Methods/baselines/datasets → research-agent
+   - Implementation → code-agent
+   - Debugging → debug-agent
+   - Analysis/visualization → data-analysis-agent
+   - Report drafting → writing-agent
+ - Prefer the research-agent for web search; avoid searching directly.
+ - Use `execute` for shell commands when running experiments.
+ - When a task matches an existing skill, `load_skill` it and follow it rather than reinventing the workflow.
+ - Keep outputs organized under `/artifacts/` (recommended).
+ - Optionally log runs to `/experiment_log.md` (params, seeds, env, outputs).
+ 
+ ## Step 4: Evaluate & Iterate
+ - Compare results against success signals.
+ - If results are weak or ambiguous, iterate:
+   - identify gaps
+   - propose new methods/data
+   - re-run and re-evaluate
+ - Prefer evidence-driven iteration: error analysis, sanity checks, and minimal ablations.
+ - Update `/todos.md` to reflect new iterations.
+ - Stop iterating when evidence is sufficient or diminishing returns appear.
+ 
+ ### Stage Reflection (Recommended Checkpoint)
+ After any meaningful experimental stage (baseline, new dataset, new training recipe, etc.),
+ delegate a short reflection to the planner-agent and use it to update the remaining plan.
+ 
+ Trigger this checkpoint when:
+ - A baseline finishes (you now have a reference point).
+ - You introduce a new dataset/model/training recipe (risk of confounding changes).
+ - Two iterations in a row fail to improve the primary metric.
+ - Results look suspicious (metric mismatch, unstable training, unexpected regressions).
+ 
+ When calling the planner-agent in reflection mode, provide:
+ - Start your message with: `MODE: REFLECTION`
+ - Stage name/index and intent
+ - Commands run + key parameters (model, dataset, seeds, batch size, lr, epochs, hardware)
+ - Key metrics vs baseline (a small table is ideal)
+ - Artifact paths (logs, plots, checkpoints)
+ - Which success signals were met/unmet
+ - If proposing skills, use skill ids from SKILL.md frontmatter (`name:`)
+ 
+ Ask the planner-agent to output a **Plan Update JSON** with this schema:
+ ```json
+ {
+   "completed": ["..."],
+   "unmet_success_signals": ["..."],
+   "skill_suggestions": ["..."],
+   "stage_modifications": [
+     {"stage": "Stage name or index", "change": "What to adjust and why"}
+   ],
+   "new_stages": [
+     {
+       "title": "...",
+       "goal": "...",
+       "success_signals": ["..."],
+       "what_to_run": ["..."],
+       "expected_artifacts": ["..."]
+     }
+   ],
+   "todo_updates": ["..."]
+ }
+ ```
+ Empty arrays are valid. If no changes are needed, return the JSON with empty arrays.
+ Then revise `/todos.md` accordingly.
+ 
+ ## Step 5: Write Report
+ - Write the final report to `/final_report.md` (Markdown).
+ - Include:
+   - Problem summary
+   - Experiment plan (stages + success signals)
+   - Experimental setup and configurations
+   - Results and visualizations (reference artifacts)
+   - Analysis, limitations, and next steps
+ - If web research was used, include a Sources section with real URLs (no fabricated citations).
+ - When applicable, include effect sizes, uncertainty, and notes on statistical corrections.
+ - Be precise, technical, and concise.
+ 
+ ## Step 6: Verify
+ - Re-read `/research_request.md` to ensure coverage.
+ - Confirm the report answers the proposal and documents key settings/results.
+ 
+ ## Experiment Report Template (Recommended)
+ 1. Summary & goals
+ 2. Experiment plan (stages + success signals)
+ 3. Setup (data, model, environment, parameters)
+ 4. Baselines and comparisons
+ 5. Results (tables/figures + references to artifacts)
+ 6. Analysis, limitations, and next steps
+ 
+ ## Writing Guidelines
+ - Use bullets for configs, stage lists, and key results; use short paragraphs for reasoning.
+ - Avoid first-person singular ("I ..."). Prefer neutral phrasing ("This experiment...") or "we" style.
+ - Professional, objective tone.
+ 
+ ## Shell Execution Guidelines
+ When using the `execute` tool for shell commands:
+ 
+ **Short commands** (< 30 seconds): Run directly
+ ```bash
+ python script.py
+ pip install pandas
+ ```
+ 
+ **Long-running commands** (> 30 seconds): Run in background, then check results
+ ```bash
+ # Step 1: Start in background, redirect output to log
+ python long_task.py > /output.log 2>&1 &
+ 
+ # Step 2: Check if still running
+ ps aux | grep long_task
+ 
+ # Step 3: Read results when done
+ cat /output.log
+ ```
+ 
+ This prevents blocking the conversation during long operations.
+ """
+ 
+ # =============================================================================
+ # Sub-agent delegation strategy
+ # =============================================================================
+ 
+ DELEGATION_STRATEGY = """# Sub-Agent Delegation
+ 
+ ## Default: Use 1 Sub-Agent
+ For most tasks, a single sub-agent is sufficient:
+ - "Plan experimental stages" → planner-agent
+ - "Reflect and update the plan after a stage" → planner-agent
+ - "Find related methods/baselines/datasets" → research-agent
+ - "Implement baseline or training loop" → code-agent
+ - "Debug runtime failures" → debug-agent
+ - "Analyze metrics and plot figures" → data-analysis-agent
+ - "Draft report sections" → writing-agent
+ 
+ ## Task Granularity
+ - One sub-agent task = one topic / one experiment / one artifact bundle.
+ - Provide concrete file paths, commands, and success signals in each task
+   so the sub-agent can respond precisely.
+ 
+ ## Parallelize Only When Necessary
+ Use multiple sub-agents ONLY for:
+ 
+ **Explicit comparisons** (1 per method/baseline):
+ - "Compare A vs B vs C" → 3 parallel sub-agents
+ 
+ **Distinct experiments** with separate datasets or setups:
+ - "Run baselines on X and Y" → 2 parallel sub-agents
+ 
+ ## Limits
+ - Maximum {max_concurrent} parallel sub-agents per round
+ - Maximum {max_iterations} delegation rounds total
+ - Stop when evidence is sufficient
+ 
+ ## Key Principles
+ - Bias towards a single sub-agent (token-efficient)
+ - Avoid premature decomposition
+ - Each sub-agent returns focused, self-contained findings
+ """
+ 
+ # =============================================================================
+ # Sub-agent research instructions
+ # =============================================================================
+ 
+ RESEARCHER_INSTRUCTIONS = """You are a research assistant. Today's date is {date}.
+ 
+ ## Task
+ Use tools to gather information on the assigned topic (methods, baselines,
+ datasets, or prior results) to support experimental planning or iteration.
+ Prefer actionable details: datasets, metrics, code availability, and common pitfalls.
+ Do not fabricate citations or URLs.
+ Capture evaluation protocols (splits, metrics, calibration) and known failure modes.
+ 
+ ## Available Tools
+ 1. **tavily_search** - Web search for information
+ 2. **think_tool** - Reflect on findings and plan next steps
+ 
+ **CRITICAL: Use think_tool after each search**
+ 
+ ## Research Strategy
+ 1. Read the question carefully
+ 2. Start with broad searches
+ 3. After each search, reflect: Do I have enough? What's missing?
+ 4. Narrow searches to fill gaps
+ 5. Stop when you can answer confidently
+ 
+ ## Hard Limits
+ - Simple queries: 2-3 searches maximum
+ - Complex queries: up to 5 searches maximum
+ - Stop after 5 searches regardless
+ 
+ ## Stop When
+ - You can answer comprehensively
+ - You have 3+ relevant sources
+ - The last 2 searches returned similar information
+ 
+ ## Response Format
+ Structure findings with clear headings and cite sources inline:
+ 
+ ```
+ ## Key Findings
+ 
+ Finding one with context [1]. Another insight [2].
+ 
+ ## Recommended Next Experiments
+ - One actionable experiment suggestion with motivation and expected outcome.
+ 
+ ### Sources
+ [1] Title: URL
+ [2] Title: URL
+ ```
+ """
+ 
+ # =============================================================================
+ # Combined exports
+ # =============================================================================
+ 
+ def get_system_prompt(max_concurrent: int = 3, max_iterations: int = 3) -> str:
+     """Generate the complete system prompt with configured limits."""
+     delegation = DELEGATION_STRATEGY.format(
+         max_concurrent=max_concurrent,
+         max_iterations=max_iterations,
+     )
+     return EXPERIMENT_WORKFLOW + "\n" + delegation
+ 
+ 
+ # Default export (backward compatible)
+ SYSTEM_PROMPT = get_system_prompt()
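The Plan Update JSON contract defined in `EXPERIMENT_WORKFLOW` lends itself to a mechanical check before `/todos.md` is revised. A minimal validation sketch (the key set mirrors the schema above; `parse_plan_update` is an illustrative helper, not part of the package):

```python
import json

# Top-level keys of the Plan Update JSON schema from EXPERIMENT_WORKFLOW.
REQUIRED_KEYS = {
    "completed", "unmet_success_signals", "skill_suggestions",
    "stage_modifications", "new_stages", "todo_updates",
}


def parse_plan_update(raw: str) -> dict:
    """Parse a planner reply and verify the Plan Update JSON shape.

    Empty arrays are valid; missing keys default to empty lists.
    """
    update = json.loads(raw)
    unknown = set(update) - REQUIRED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    for key in REQUIRED_KEYS:
        value = update.setdefault(key, [])
        if not isinstance(value, list):
            raise ValueError(f"{key} must be a list")
    return update
```

Since "empty arrays are valid," a reply of `{}` normalizes to all-empty lists, which the agent can treat as "no plan changes."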
EvoScientist/skills/accelerate/SKILL.md
@@ -0,0 +1,332 @@
+ ---
+ name: accelerate
+ description: Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.
+ version: 1.0.0
+ author: Orchestra Research
+ license: MIT
+ tags: [Distributed Training, HuggingFace, Accelerate, DeepSpeed, FSDP, Mixed Precision, PyTorch, DDP, Unified API, Simple]
+ dependencies: [accelerate, torch, transformers]
+ ---
+ 
+ # HuggingFace Accelerate - Unified Distributed Training
+ 
+ ## Quick start
+ 
+ Accelerate simplifies distributed training to 4 lines of code.
+ 
+ **Installation**:
+ ```bash
+ pip install accelerate
+ ```
+ 
+ **Convert a PyTorch script** (4 changed lines, shown as a diff):
+ ```diff
+  import torch
+ +from accelerate import Accelerator
+ 
+ +accelerator = Accelerator()
+ 
+  model = torch.nn.Transformer()
+  optimizer = torch.optim.Adam(model.parameters())
+  dataloader = torch.utils.data.DataLoader(dataset)
+ 
+ +model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
+ 
+  for batch in dataloader:
+      optimizer.zero_grad()
+      loss = model(batch)
+ -    loss.backward()
+ +    accelerator.backward(loss)
+      optimizer.step()
+ ```
+ 
+ **Run** (single command):
+ ```bash
+ accelerate launch train.py
+ ```
+ 
+ ## Common workflows
+ 
+ ### Workflow 1: From single GPU to multi-GPU
+ 
+ **Original script**:
+ ```python
+ # train.py
+ import torch
+ 
+ model = torch.nn.Linear(10, 2).to('cuda')
+ optimizer = torch.optim.Adam(model.parameters())
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
+ 
+ for epoch in range(10):
+     for batch in dataloader:
+         batch = batch.to('cuda')
+         optimizer.zero_grad()
+         loss = model(batch).mean()
+         loss.backward()
+         optimizer.step()
+ ```
+ 
+ **With Accelerate** (4 lines added):
+ ```python
+ # train.py
+ import torch
+ from accelerate import Accelerator  # +1
+ 
+ accelerator = Accelerator()  # +2
+ 
+ model = torch.nn.Linear(10, 2)
+ optimizer = torch.optim.Adam(model.parameters())
+ dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
+ 
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)  # +3
+ 
+ for epoch in range(10):
+     for batch in dataloader:
+         # No .to('cuda') needed - automatic!
+         optimizer.zero_grad()
+         loss = model(batch).mean()
+         accelerator.backward(loss)  # +4
+         optimizer.step()
+ ```
+ 
+ **Configure** (interactive):
+ ```bash
+ accelerate config
+ ```
+ 
+ **Questions**:
+ - Which machine? (single/multi GPU/TPU/CPU)
+ - How many machines? (1)
+ - Mixed precision? (no/fp16/bf16/fp8)
+ - DeepSpeed? (no/yes)
+ 
+ **Launch** (works on any setup):
+ ```bash
+ # Single GPU
+ accelerate launch train.py
+ 
+ # Multi-GPU (8 GPUs)
+ accelerate launch --multi_gpu --num_processes 8 train.py
+ 
+ # Multi-node
+ accelerate launch --multi_gpu --num_processes 16 \
+     --num_machines 2 --machine_rank 0 \
+     --main_process_ip $MASTER_ADDR \
+     train.py
+ ```
+ 
+ ### Workflow 2: Mixed precision training
+ 
+ **Enable FP16/BF16**:
+ ```python
+ from accelerate import Accelerator
+ 
+ # FP16 (with gradient scaling)
+ accelerator = Accelerator(mixed_precision='fp16')
+ 
+ # BF16 (no scaling, more stable)
+ accelerator = Accelerator(mixed_precision='bf16')
+ 
+ # FP8 (H100+)
+ accelerator = Accelerator(mixed_precision='fp8')
+ 
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
+ 
+ # Everything else is automatic!
+ for batch in dataloader:
+     with accelerator.autocast():  # Optional, done automatically
+         loss = model(batch)
+     accelerator.backward(loss)
+ ```
+ 
+ ### Workflow 3: DeepSpeed ZeRO integration
+ 
+ **Enable DeepSpeed ZeRO-2**:
+ ```python
+ from accelerate import Accelerator, DeepSpeedPlugin
+ 
+ accelerator = Accelerator(
+     mixed_precision='bf16',
+     deepspeed_plugin=DeepSpeedPlugin(
+         zero_stage=2,                     # ZeRO-2
+         offload_optimizer_device="none",  # no optimizer offload
+         gradient_accumulation_steps=4,
+     ),
+ )
+ 
+ # Same code as before!
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
+ ```
+ 
+ **Or via config**:
+ ```bash
+ accelerate config
+ # Select: DeepSpeed → ZeRO-2
+ ```
+ 
+ **deepspeed_config.json**:
+ ```json
+ {
+   "fp16": {"enabled": false},
+   "bf16": {"enabled": true},
+   "zero_optimization": {
+     "stage": 2,
+     "offload_optimizer": {"device": "cpu"},
+     "allgather_bucket_size": 5e8,
+     "reduce_bucket_size": 5e8
+   }
+ }
+ ```
+ 
+ **Launch**:
+ ```bash
+ accelerate launch --use_deepspeed --deepspeed_config_file deepspeed_config.json train.py
+ ```
+ 
+ ### Workflow 4: FSDP (Fully Sharded Data Parallel)
+ 
+ **Enable FSDP**:
+ ```python
+ from accelerate import Accelerator, FullyShardedDataParallelPlugin
+ 
+ fsdp_plugin = FullyShardedDataParallelPlugin(
+     sharding_strategy="FULL_SHARD",  # ZeRO-3 equivalent
+     auto_wrap_policy="transformer_based_wrap",
+     cpu_offload=False,
+ )
+ 
+ accelerator = Accelerator(
+     mixed_precision='bf16',
+     fsdp_plugin=fsdp_plugin,
+ )
+ 
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
+ ```
+ 
+ **Or via config**:
+ ```bash
+ accelerate config
+ # Select: FSDP → Full Shard → No CPU Offload
+ ```
+ 
+ ### Workflow 5: Gradient accumulation
+ 
+ **Accumulate gradients**:
+ ```python
+ from accelerate import Accelerator
+ 
+ accelerator = Accelerator(gradient_accumulation_steps=4)
+ 
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
+ 
+ for batch in dataloader:
+     with accelerator.accumulate(model):  # Handles accumulation
+         optimizer.zero_grad()
+         loss = model(batch)
+         accelerator.backward(loss)
+         optimizer.step()
+ ```
+ 
+ **Effective batch size**: `batch_size * num_gpus * gradient_accumulation_steps`
+ 
+ ## When to use vs alternatives
+ 
+ **Use Accelerate when**:
+ - You want the simplest distributed training setup
+ - You need a single script that runs on any hardware
+ - You use the HuggingFace ecosystem
+ - You want flexibility (DDP/DeepSpeed/FSDP/Megatron)
+ - You need quick prototyping
+ 
+ **Key advantages**:
+ - **4 lines**: Minimal code changes
+ - **Unified API**: Same code for DDP, DeepSpeed, FSDP, Megatron
+ - **Automatic**: Device placement, mixed precision, sharding
+ - **Interactive config**: No manual launcher setup
+ - **Single launch**: Works everywhere
+ 
+ **Use alternatives instead**:
+ - **PyTorch Lightning**: Need callbacks and high-level abstractions
+ - **Ray Train**: Multi-node orchestration, hyperparameter tuning
+ - **DeepSpeed**: Direct API control, advanced features
+ - **Raw DDP**: Maximum control, minimal abstraction
+ 
+ ## Common issues
+ 
+ **Issue: Wrong device placement**
+ 
+ Don't manually move tensors to a device:
+ ```python
+ # WRONG
+ batch = batch.to('cuda')
+ 
+ # CORRECT
+ # Accelerate handles it automatically after prepare()
+ ```
+ 
+ **Issue: Gradient accumulation not working**
+ 
+ Use the context manager:
+ ```python
+ # CORRECT
+ with accelerator.accumulate(model):
+     optimizer.zero_grad()
+     accelerator.backward(loss)
+     optimizer.step()
+ ```
+ 
+ **Issue: Checkpointing in distributed runs**
+ 
+ Use accelerator methods; `save_state` must run on every process so sharded
+ optimizer/model state is written correctly:
+ ```python
+ # Save on all processes; Accelerate coordinates the writes
+ accelerator.save_state('checkpoint/')
+ 
+ # Load on all processes
+ accelerator.load_state('checkpoint/')
+ ```
+ 
+ **Issue: Different results with FSDP**
+ 
+ Ensure the same random seed on every process:
+ ```python
+ from accelerate.utils import set_seed
+ set_seed(42)
+ ```
+ 
+ ## Advanced topics
+ 
+ **Megatron integration**: See [references/megatron-integration.md](references/megatron-integration.md) for tensor parallelism, pipeline parallelism, and sequence parallelism setup.
+ 
+ **Custom plugins**: See [references/custom-plugins.md](references/custom-plugins.md) for creating custom distributed plugins and advanced configuration.
+ 
+ **Performance tuning**: See [references/performance.md](references/performance.md) for profiling, memory optimization, and best practices.
+ 
+ ## Hardware requirements
+ 
+ - **CPU**: Works (slow)
+ - **Single GPU**: Works
+ - **Multi-GPU**: DDP (default), DeepSpeed, or FSDP
+ - **Multi-node**: DDP, DeepSpeed, FSDP, Megatron
+ - **TPU**: Supported
+ - **Apple MPS**: Supported
+ 
+ **Launcher requirements**:
+ - **DDP**: `torch.distributed.run` (built-in)
+ - **DeepSpeed**: `deepspeed` (pip install deepspeed)
+ - **FSDP**: PyTorch 1.12+ (built-in)
+ - **Megatron**: Custom setup
+ 
+ ## Resources
+ 
+ - Docs: https://huggingface.co/docs/accelerate
+ - GitHub: https://github.com/huggingface/accelerate
+ - Version: 1.11.0+
+ - Tutorial: "Accelerate your scripts"
+ - Examples: https://github.com/huggingface/accelerate/tree/main/examples
+ - Used by: HuggingFace Transformers, TRL, PEFT, all HF libraries
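The effective-batch-size rule from Workflow 5 of the accelerate skill (`batch_size * num_gpus * gradient_accumulation_steps`) is easy to sanity-check numerically; a quick sketch with illustrative values:

```python
def effective_batch_size(per_device_batch: int, num_gpus: int, grad_accum_steps: int) -> int:
    # Each optimizer step consumes per_device_batch samples on every GPU,
    # accumulated over grad_accum_steps micro-batches.
    return per_device_batch * num_gpus * grad_accum_steps


# batch_size=32 on 8 GPUs with gradient_accumulation_steps=4
print(effective_batch_size(32, 8, 4))  # → 1024
```

This is why enabling `Accelerator(gradient_accumulation_steps=4)` quadruples the effective batch size without changing per-device memory use.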