EvoScientist 0.0.1.dev2__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- EvoScientist/EvoScientist.py +157 -0
- EvoScientist/__init__.py +24 -0
- EvoScientist/__main__.py +4 -0
- EvoScientist/backends.py +392 -0
- EvoScientist/cli.py +1553 -0
- EvoScientist/middleware.py +35 -0
- EvoScientist/prompts.py +277 -0
- EvoScientist/skills/accelerate/SKILL.md +332 -0
- EvoScientist/skills/accelerate/references/custom-plugins.md +453 -0
- EvoScientist/skills/accelerate/references/megatron-integration.md +489 -0
- EvoScientist/skills/accelerate/references/performance.md +525 -0
- EvoScientist/skills/bitsandbytes/SKILL.md +411 -0
- EvoScientist/skills/bitsandbytes/references/memory-optimization.md +521 -0
- EvoScientist/skills/bitsandbytes/references/qlora-training.md +521 -0
- EvoScientist/skills/bitsandbytes/references/quantization-formats.md +447 -0
- EvoScientist/skills/find-skills/SKILL.md +133 -0
- EvoScientist/skills/find-skills/scripts/install_skill.py +211 -0
- EvoScientist/skills/flash-attention/SKILL.md +367 -0
- EvoScientist/skills/flash-attention/references/benchmarks.md +215 -0
- EvoScientist/skills/flash-attention/references/transformers-integration.md +293 -0
- EvoScientist/skills/llama-cpp/SKILL.md +258 -0
- EvoScientist/skills/llama-cpp/references/optimization.md +89 -0
- EvoScientist/skills/llama-cpp/references/quantization.md +213 -0
- EvoScientist/skills/llama-cpp/references/server.md +125 -0
- EvoScientist/skills/lm-evaluation-harness/SKILL.md +490 -0
- EvoScientist/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- EvoScientist/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- EvoScientist/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- EvoScientist/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- EvoScientist/skills/ml-paper-writing/SKILL.md +937 -0
- EvoScientist/skills/ml-paper-writing/references/checklists.md +361 -0
- EvoScientist/skills/ml-paper-writing/references/citation-workflow.md +562 -0
- EvoScientist/skills/ml-paper-writing/references/reviewer-guidelines.md +367 -0
- EvoScientist/skills/ml-paper-writing/references/sources.md +159 -0
- EvoScientist/skills/ml-paper-writing/references/writing-guide.md +476 -0
- EvoScientist/skills/ml-paper-writing/templates/README.md +251 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/README.md +534 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-supp.tex +144 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026-unified-template.tex +952 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bib +111 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.bst +1493 -0
- EvoScientist/skills/ml-paper-writing/templates/aaai2026/aaai2026.sty +315 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/README.md +50 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/acl.sty +312 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_latex.tex +377 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_lualatex.tex +101 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/acl_natbib.bst +1940 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/anthology.bib.txt +26 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/custom.bib +70 -0
- EvoScientist/skills/ml-paper-writing/templates/acl/formatting.md +326 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/README.md +3 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bib +11 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.bst +1440 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.sty +218 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/colm2025_conference.tex +305 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/fancyhdr.sty +485 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/math_commands.tex +508 -0
- EvoScientist/skills/ml-paper-writing/templates/colm2025/natbib.sty +1246 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/fancyhdr.sty +485 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bib +24 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.bst +1440 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.sty +246 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/iclr2026_conference.tex +414 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/math_commands.tex +508 -0
- EvoScientist/skills/ml-paper-writing/templates/iclr2026/natbib.sty +1246 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithm.sty +79 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/algorithmic.sty +201 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.bib +75 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/example_paper.tex +662 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/fancyhdr.sty +864 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.bst +1443 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml2026.sty +767 -0
- EvoScientist/skills/ml-paper-writing/templates/icml2026/icml_numpapers.pdf +0 -0
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/Makefile +36 -0
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/extra_pkgs.tex +53 -0
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/main.tex +38 -0
- EvoScientist/skills/ml-paper-writing/templates/neurips2025/neurips.sty +382 -0
- EvoScientist/skills/peft/SKILL.md +431 -0
- EvoScientist/skills/peft/references/advanced-usage.md +514 -0
- EvoScientist/skills/peft/references/troubleshooting.md +480 -0
- EvoScientist/skills/ray-data/SKILL.md +326 -0
- EvoScientist/skills/ray-data/references/integration.md +82 -0
- EvoScientist/skills/ray-data/references/transformations.md +83 -0
- EvoScientist/skills/skill-creator/LICENSE.txt +202 -0
- EvoScientist/skills/skill-creator/SKILL.md +356 -0
- EvoScientist/skills/skill-creator/references/output-patterns.md +82 -0
- EvoScientist/skills/skill-creator/references/workflows.md +28 -0
- EvoScientist/skills/skill-creator/scripts/init_skill.py +303 -0
- EvoScientist/skills/skill-creator/scripts/package_skill.py +110 -0
- EvoScientist/skills/skill-creator/scripts/quick_validate.py +95 -0
- EvoScientist/stream/__init__.py +53 -0
- EvoScientist/stream/emitter.py +94 -0
- EvoScientist/stream/formatter.py +168 -0
- EvoScientist/stream/tracker.py +115 -0
- EvoScientist/stream/utils.py +255 -0
- EvoScientist/subagent.yaml +147 -0
- EvoScientist/tools.py +135 -0
- EvoScientist/utils.py +207 -0
- evoscientist-0.0.1.dev2.dist-info/METADATA +227 -0
- evoscientist-0.0.1.dev2.dist-info/RECORD +107 -0
- evoscientist-0.0.1.dev2.dist-info/WHEEL +5 -0
- evoscientist-0.0.1.dev2.dist-info/entry_points.txt +5 -0
- evoscientist-0.0.1.dev2.dist-info/licenses/LICENSE +21 -0
- evoscientist-0.0.1.dev2.dist-info/top_level.txt +1 -0
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
"""Middleware configuration for the EvoScientist agent."""
|
|
2
|
+
|
|
3
|
+
from pathlib import Path
|
|
4
|
+
|
|
5
|
+
from deepagents.middleware.skills import SkillsMiddleware
|
|
6
|
+
|
|
7
|
+
from .backends import MergedReadOnlyBackend
|
|
8
|
+
|
|
9
|
+
_DEFAULT_SKILLS_DIR = str(Path(__file__).parent / "skills")
|
|
10
|
+
|
|
11
|
+
|
|
12
|
+
def create_skills_middleware(
    skills_dir: str = _DEFAULT_SKILLS_DIR,
    workspace_dir: str = "./workspace/",
) -> SkillsMiddleware:
    """Build a SkillsMiddleware backed by a merged read-only view of skills.

    User-installed skills (under ``<workspace_dir>/skills``) are layered on
    top of the package's built-in skills; on a name conflict the user skill
    wins because it is the primary directory of the merged backend.

    Args:
        skills_dir: System skills directory (the package's built-in skills).
        workspace_dir: Workspace root; user skills live in its ``skills/``
            subdirectory.

    Returns:
        Configured SkillsMiddleware instance.
    """
    # Inline the merged backend: user skills first, package skills second.
    return SkillsMiddleware(
        backend=MergedReadOnlyBackend(
            primary_dir=str(Path(workspace_dir) / "skills"),
            secondary_dir=skills_dir,
        ),
        sources=["/"],
    )
|
EvoScientist/prompts.py
ADDED
|
@@ -0,0 +1,277 @@
|
|
|
1
|
+
"""Prompt templates for the EvoScientist experimental agent."""
|
|
2
|
+
|
|
3
|
+
# =============================================================================
|
|
4
|
+
# Main agent workflow
|
|
5
|
+
# =============================================================================
|
|
6
|
+
|
|
7
|
+
EXPERIMENT_WORKFLOW = """# Experiment Workflow
|
|
8
|
+
|
|
9
|
+
You are the main experimental agent. Your mission is to transform a research proposal
|
|
10
|
+
into reproducible experiments and a paper-ready experimental report.
|
|
11
|
+
|
|
12
|
+
## Core Principles
|
|
13
|
+
- Baseline first, then iterate (ablation-friendly).
|
|
14
|
+
- Change one major variable per iteration (data, model, objective, or training recipe).
|
|
15
|
+
- Never invent results. If you cannot run something, say so and propose the smallest next step.
|
|
16
|
+
- Delegate aggressively using the `task` tool. Prefer the research sub-agent for web search.
|
|
17
|
+
- Use local skills via `load_skill` when they match the task. Skills provide proven workflows and checklists.
|
|
18
|
+
All skills are available under `/skills/` (read-only).
|
|
19
|
+
When calling `load_skill`, use the skill id from the SKILL.md frontmatter (`name:`), not the folder name.
|
|
20
|
+
|
|
21
|
+
## Scientific Rigor Checklist
|
|
22
|
+
- Validate data and run quick EDA; document anomalies or data leakage risks.
|
|
23
|
+
- Separate exploratory vs confirmatory analyses; define primary metrics up front.
|
|
24
|
+
- Report effect sizes with uncertainty (confidence intervals/error bars) where possible.
|
|
25
|
+
- Apply multiple-testing correction when comparing many conditions.
|
|
26
|
+
- State limitations, negative results, and sensitivity to key parameters.
|
|
27
|
+
- Track reproducibility (seeds, versions, configs, and exact commands).
|
|
28
|
+
|
|
29
|
+
## Step 1: Intake & Scope
|
|
30
|
+
- Read the proposal and extract goals, datasets, constraints, and evaluation metrics
|
|
31
|
+
- Capture key assumptions and open questions
|
|
32
|
+
- Save the original proposal to `/research_request.md`
|
|
33
|
+
|
|
34
|
+
## Step 2: Plan (Recommended Structure)
|
|
35
|
+
- Create experiment stages with success signals (flexible, not rigid)
|
|
36
|
+
- Identify resource/data dependencies and baseline requirements
|
|
37
|
+
- Use `write_todos` to track the execution plan and updates
|
|
38
|
+
- If delegating planning to planner-agent, start your message with: `MODE: PLAN`
|
|
39
|
+
- If a stage matches an existing skill, note the skill name in the plan and load it before implementation.
|
|
40
|
+
Use the skill id from SKILL.md frontmatter (`name:`).
|
|
41
|
+
- Save the plan to `/todos.md` (recommended). Include per-stage:
|
|
42
|
+
- objective and success signals
|
|
43
|
+
- what to run (commands/scripts)
|
|
44
|
+
- expected artifacts (tables/plots/logs)
|
|
45
|
+
- Optionally save:
|
|
46
|
+
- `/plan.md` for stages
|
|
47
|
+
- `/success_criteria.md` for success signals
|
|
48
|
+
|
|
49
|
+
## Step 3: Execute & Debug
|
|
50
|
+
- Delegate tasks to sub-agents using the `task` tool:
|
|
51
|
+
- Planning/structuring → planner-agent
|
|
52
|
+
- Methods/baselines/datasets → research-agent
|
|
53
|
+
- Implementation → code-agent
|
|
54
|
+
- Debugging → debug-agent
|
|
55
|
+
- Analysis/visualization → data-analysis-agent
|
|
56
|
+
- Report drafting → writing-agent
|
|
57
|
+
- Prefer the research-agent for web search; avoid searching directly
|
|
58
|
+
- Use `execute` for shell commands when running experiments
|
|
59
|
+
- When a task matches an existing skill, `load_skill` it and follow it rather than reinventing the workflow.
|
|
60
|
+
- Keep outputs organized under `/artifacts/` (recommended)
|
|
61
|
+
- Optionally log runs to `/experiment_log.md` (params, seeds, env, outputs)
|
|
62
|
+
|
|
63
|
+
## Step 4: Evaluate & Iterate
|
|
64
|
+
- Compare results against success signals
|
|
65
|
+
- If results are weak or ambiguous, iterate:
|
|
66
|
+
- identify gaps
|
|
67
|
+
- propose new methods/data
|
|
68
|
+
- re-run and re-evaluate
|
|
69
|
+
- Prefer evidence-driven iteration: error analysis, sanity checks, and minimal ablations
|
|
70
|
+
- Update `/todos.md` to reflect new iterations
|
|
71
|
+
- Stop iterating when evidence is sufficient or diminishing returns appear
|
|
72
|
+
|
|
73
|
+
### Stage Reflection (Recommended Checkpoint)
|
|
74
|
+
After any meaningful experimental stage (baseline, new dataset, new training recipe, etc.),
|
|
75
|
+
delegate a short reflection to the planner-agent and use it to update the remaining plan.
|
|
76
|
+
|
|
77
|
+
Trigger this checkpoint when:
|
|
78
|
+
- A baseline finishes (you now have a reference point).
|
|
79
|
+
- You introduce a new dataset/model/training recipe (risk of confounding changes).
|
|
80
|
+
- Two iterations in a row fail to improve the primary metric.
|
|
81
|
+
- Results look suspicious (metric mismatch, unstable training, unexpected regressions).
|
|
82
|
+
|
|
83
|
+
When calling the planner-agent in reflection mode, provide:
|
|
84
|
+
- Start your message with: `MODE: REFLECTION`
|
|
85
|
+
- Stage name/index and intent
|
|
86
|
+
- Commands run + key parameters (model, dataset, seeds, batch size, lr, epochs, hardware)
|
|
87
|
+
- Key metrics vs baseline (a small table is ideal)
|
|
88
|
+
- Artifact paths (logs, plots, checkpoints)
|
|
89
|
+
- Which success signals were met/unmet
|
|
90
|
+
- If proposing skills, use skill ids from SKILL.md frontmatter (`name:`).
|
|
91
|
+
|
|
92
|
+
Ask the planner-agent to output a **Plan Update JSON** with this schema:
|
|
93
|
+
```json
|
|
94
|
+
{
|
|
95
|
+
"completed": ["..."],
|
|
96
|
+
"unmet_success_signals": ["..."],
|
|
97
|
+
"skill_suggestions": ["..."],
|
|
98
|
+
"stage_modifications": [
|
|
99
|
+
{"stage": "Stage name or index", "change": "What to adjust and why"}
|
|
100
|
+
],
|
|
101
|
+
"new_stages": [
|
|
102
|
+
{
|
|
103
|
+
"title": "...",
|
|
104
|
+
"goal": "...",
|
|
105
|
+
"success_signals": ["..."],
|
|
106
|
+
"what_to_run": ["..."],
|
|
107
|
+
"expected_artifacts": ["..."]
|
|
108
|
+
}
|
|
109
|
+
],
|
|
110
|
+
"todo_updates": ["..."]
|
|
111
|
+
}
|
|
112
|
+
```
|
|
113
|
+
Empty arrays are valid. If no changes are needed, return the JSON with empty arrays.
|
|
114
|
+
Then revise `/todos.md` accordingly.
|
|
115
|
+
|
|
116
|
+
## Step 5: Write Report
|
|
117
|
+
- Write the final report to `/final_report.md` (Markdown)
|
|
118
|
+
- Include:
|
|
119
|
+
- Problem summary
|
|
120
|
+
- Experiment plan (stages + success signals)
|
|
121
|
+
- Experimental setup and configurations
|
|
122
|
+
- Results and visualizations (reference artifacts)
|
|
123
|
+
- Analysis, limitations, and next steps
|
|
124
|
+
- If web research was used, include a Sources section with real URLs (no fabricated citations)
|
|
125
|
+
- When applicable, include effect sizes, uncertainty, and notes on statistical corrections.
|
|
126
|
+
- Be precise, technical, and concise
|
|
127
|
+
|
|
128
|
+
## Step 6: Verify
|
|
129
|
+
- Re-read `/research_request.md` to ensure coverage
|
|
130
|
+
- Confirm the report answers the proposal and documents key settings/results
|
|
131
|
+
|
|
132
|
+
## Experiment Report Template (Recommended)
|
|
133
|
+
1. Summary & goals
|
|
134
|
+
2. Experiment plan (stages + success signals)
|
|
135
|
+
3. Setup (data, model, environment, parameters)
|
|
136
|
+
4. Baselines and comparisons
|
|
137
|
+
5. Results (tables/figures + references to artifacts)
|
|
138
|
+
6. Analysis, limitations, and next steps
|
|
139
|
+
|
|
140
|
+
## Writing Guidelines
|
|
141
|
+
- Use bullets for configs, stage lists, and key results; use short paragraphs for reasoning
|
|
142
|
+
- Avoid first-person singular ("I ..."). Prefer neutral phrasing ("This experiment...") or "we" style.
|
|
143
|
+
- Professional, objective tone
|
|
144
|
+
|
|
145
|
+
## Shell Execution Guidelines
|
|
146
|
+
When using the `execute` tool for shell commands:
|
|
147
|
+
|
|
148
|
+
**Short commands** (< 30 seconds): Run directly
|
|
149
|
+
```bash
|
|
150
|
+
python script.py
|
|
151
|
+
pip install pandas
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
**Long-running commands** (> 30 seconds): Run in background, then check results
|
|
155
|
+
```bash
|
|
156
|
+
# Step 1: Start in background, redirect output to log
|
|
157
|
+
python long_task.py > /output.log 2>&1 &
|
|
158
|
+
|
|
159
|
+
# Step 2: Check if still running
|
|
160
|
+
ps aux | grep long_task
|
|
161
|
+
|
|
162
|
+
# Step 3: Read results when done
|
|
163
|
+
cat /output.log
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
This prevents blocking the conversation during long operations.
|
|
167
|
+
"""
|
|
168
|
+
|
|
169
|
+
# =============================================================================
|
|
170
|
+
# Sub-agent delegation strategy
|
|
171
|
+
# =============================================================================
|
|
172
|
+
|
|
173
|
+
DELEGATION_STRATEGY = """# Sub-Agent Delegation
|
|
174
|
+
|
|
175
|
+
## Default: Use 1 Sub-Agent
|
|
176
|
+
For most tasks, a single sub-agent is sufficient:
|
|
177
|
+
- "Plan experimental stages" → planner-agent
|
|
178
|
+
- "Reflect and update the plan after a stage" → planner-agent
|
|
179
|
+
- "Find related methods/baselines/datasets" → research-agent
|
|
180
|
+
- "Implement baseline or training loop" → code-agent
|
|
181
|
+
- "Debug runtime failures" → debug-agent
|
|
182
|
+
- "Analyze metrics and plot figures" → data-analysis-agent
|
|
183
|
+
- "Draft report sections" → writing-agent
|
|
184
|
+
|
|
185
|
+
## Task Granularity
|
|
186
|
+
- One sub-agent task = one topic / one experiment / one artifact bundle
|
|
187
|
+
- Provide concrete file paths, commands, and success signals in each task
|
|
188
|
+
so the sub-agent can respond precisely
|
|
189
|
+
|
|
190
|
+
## Parallelize Only When Necessary
|
|
191
|
+
Use multiple sub-agents ONLY for:
|
|
192
|
+
|
|
193
|
+
**Explicit comparisons** (1 per method/baseline):
|
|
194
|
+
- "Compare A vs B vs C" → 3 parallel sub-agents
|
|
195
|
+
|
|
196
|
+
**Distinct experiments** with separate datasets or setups:
|
|
197
|
+
- "Run baselines on X and Y" → 2 parallel sub-agents
|
|
198
|
+
|
|
199
|
+
## Limits
|
|
200
|
+
- Maximum {max_concurrent} parallel sub-agents per round
|
|
201
|
+
- Maximum {max_iterations} delegation rounds total
|
|
202
|
+
- Stop when evidence is sufficient
|
|
203
|
+
|
|
204
|
+
## Key Principles
|
|
205
|
+
- Bias towards a single sub-agent (token-efficient)
|
|
206
|
+
- Avoid premature decomposition
|
|
207
|
+
- Each sub-agent returns focused, self-contained findings
|
|
208
|
+
"""
|
|
209
|
+
|
|
210
|
+
# =============================================================================
|
|
211
|
+
# Sub-agent research instructions
|
|
212
|
+
# =============================================================================
|
|
213
|
+
|
|
214
|
+
RESEARCHER_INSTRUCTIONS = """You are a research assistant. Today's date is {date}.
|
|
215
|
+
|
|
216
|
+
## Task
|
|
217
|
+
Use tools to gather information on the assigned topic (methods, baselines,
|
|
218
|
+
datasets, or prior results) to support experimental planning or iteration.
|
|
219
|
+
Prefer actionable details: datasets, metrics, code availability, and common pitfalls.
|
|
220
|
+
Do not fabricate citations or URLs.
|
|
221
|
+
Capture evaluation protocols (splits, metrics, calibration) and known failure modes.
|
|
222
|
+
|
|
223
|
+
## Available Tools
|
|
224
|
+
1. **tavily_search** - Web search for information
|
|
225
|
+
2. **think_tool** - Reflect on findings and plan next steps
|
|
226
|
+
|
|
227
|
+
**CRITICAL: Use think_tool after each search**
|
|
228
|
+
|
|
229
|
+
## Research Strategy
|
|
230
|
+
1. Read the question carefully
|
|
231
|
+
2. Start with broad searches
|
|
232
|
+
3. After each search, reflect: Do I have enough? What's missing?
|
|
233
|
+
4. Narrow searches to fill gaps
|
|
234
|
+
5. Stop when you can answer confidently
|
|
235
|
+
|
|
236
|
+
## Hard Limits
|
|
237
|
+
- Simple queries: 2-3 searches maximum
|
|
238
|
+
- Complex queries: up to 5 searches maximum
|
|
239
|
+
- Stop after 5 searches regardless
|
|
240
|
+
|
|
241
|
+
## Stop When
|
|
242
|
+
- You can answer comprehensively
|
|
243
|
+
- You have 3+ relevant sources
|
|
244
|
+
- Last 2 searches returned similar information
|
|
245
|
+
|
|
246
|
+
## Response Format
|
|
247
|
+
Structure findings with clear headings and cite sources inline:
|
|
248
|
+
|
|
249
|
+
```
|
|
250
|
+
## Key Findings
|
|
251
|
+
|
|
252
|
+
Finding one with context [1]. Another insight [2].
|
|
253
|
+
|
|
254
|
+
## Recommended Next Experiments
|
|
255
|
+
- One actionable experiment suggestion with motivation and expected outcome.
|
|
256
|
+
|
|
257
|
+
### Sources
|
|
258
|
+
[1] Title: URL
|
|
259
|
+
[2] Title: URL
|
|
260
|
+
```
|
|
261
|
+
"""
|
|
262
|
+
|
|
263
|
+
# =============================================================================
|
|
264
|
+
# Combined exports
|
|
265
|
+
# =============================================================================
|
|
266
|
+
|
|
267
|
+
def get_system_prompt(max_concurrent: int = 3, max_iterations: int = 3) -> str:
    """Generate the complete system prompt with configured limits.

    Args:
        max_concurrent: Maximum parallel sub-agents allowed per round.
        max_iterations: Maximum delegation rounds allowed in total.

    Returns:
        The experiment workflow prompt joined with the delegation strategy,
        with the concurrency limits substituted into the latter.
    """
    # Only the delegation section is templated; the workflow text is static.
    delegation_section = DELEGATION_STRATEGY.format(
        max_concurrent=max_concurrent,
        max_iterations=max_iterations,
    )
    return "\n".join([EXPERIMENT_WORKFLOW, delegation_section])
|
|
274
|
+
|
|
275
|
+
|
|
276
|
+
# Default export (backward compatible)
|
|
277
|
+
SYSTEM_PROMPT = get_system_prompt()
|
|
@@ -0,0 +1,332 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: accelerate
|
|
3
|
+
description: Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
author: Orchestra Research
|
|
6
|
+
license: MIT
|
|
7
|
+
tags: [Distributed Training, HuggingFace, Accelerate, DeepSpeed, FSDP, Mixed Precision, PyTorch, DDP, Unified API, Simple]
|
|
8
|
+
dependencies: [accelerate, torch, transformers]
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# HuggingFace Accelerate - Unified Distributed Training
|
|
12
|
+
|
|
13
|
+
## Quick start
|
|
14
|
+
|
|
15
|
+
Accelerate simplifies distributed training to 4 lines of code.
|
|
16
|
+
|
|
17
|
+
**Installation**:
|
|
18
|
+
```bash
|
|
19
|
+
pip install accelerate
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
**Convert PyTorch script** (4 lines):
|
|
23
|
+
```python
|
|
24
|
+
import torch
|
|
25
|
+
+ from accelerate import Accelerator
|
|
26
|
+
|
|
27
|
+
+ accelerator = Accelerator()
|
|
28
|
+
|
|
29
|
+
model = torch.nn.Transformer()
|
|
30
|
+
optimizer = torch.optim.Adam(model.parameters())
|
|
31
|
+
dataloader = torch.utils.data.DataLoader(dataset)
|
|
32
|
+
|
|
33
|
+
+ model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
|
34
|
+
|
|
35
|
+
for batch in dataloader:
|
|
36
|
+
optimizer.zero_grad()
|
|
37
|
+
loss = model(batch)
|
|
38
|
+
- loss.backward()
|
|
39
|
+
+ accelerator.backward(loss)
|
|
40
|
+
optimizer.step()
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
**Run** (single command):
|
|
44
|
+
```bash
|
|
45
|
+
accelerate launch train.py
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
## Common workflows
|
|
49
|
+
|
|
50
|
+
### Workflow 1: From single GPU to multi-GPU
|
|
51
|
+
|
|
52
|
+
**Original script**:
|
|
53
|
+
```python
|
|
54
|
+
# train.py
|
|
55
|
+
import torch
|
|
56
|
+
|
|
57
|
+
model = torch.nn.Linear(10, 2).to('cuda')
|
|
58
|
+
optimizer = torch.optim.Adam(model.parameters())
|
|
59
|
+
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
|
|
60
|
+
|
|
61
|
+
for epoch in range(10):
|
|
62
|
+
for batch in dataloader:
|
|
63
|
+
batch = batch.to('cuda')
|
|
64
|
+
optimizer.zero_grad()
|
|
65
|
+
loss = model(batch).mean()
|
|
66
|
+
loss.backward()
|
|
67
|
+
optimizer.step()
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
**With Accelerate** (4 lines added):
|
|
71
|
+
```python
|
|
72
|
+
# train.py
|
|
73
|
+
import torch
|
|
74
|
+
from accelerate import Accelerator # +1
|
|
75
|
+
|
|
76
|
+
accelerator = Accelerator() # +2
|
|
77
|
+
|
|
78
|
+
model = torch.nn.Linear(10, 2)
|
|
79
|
+
optimizer = torch.optim.Adam(model.parameters())
|
|
80
|
+
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
|
|
81
|
+
|
|
82
|
+
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader) # +3
|
|
83
|
+
|
|
84
|
+
for epoch in range(10):
|
|
85
|
+
for batch in dataloader:
|
|
86
|
+
# No .to('cuda') needed - automatic!
|
|
87
|
+
optimizer.zero_grad()
|
|
88
|
+
loss = model(batch).mean()
|
|
89
|
+
accelerator.backward(loss) # +4
|
|
90
|
+
optimizer.step()
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
**Configure** (interactive):
|
|
94
|
+
```bash
|
|
95
|
+
accelerate config
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
**Questions**:
|
|
99
|
+
- Which machine? (single/multi GPU/TPU/CPU)
|
|
100
|
+
- How many machines? (1)
|
|
101
|
+
- Mixed precision? (no/fp16/bf16/fp8)
|
|
102
|
+
- DeepSpeed? (no/yes)
|
|
103
|
+
|
|
104
|
+
**Launch** (works on any setup):
|
|
105
|
+
```bash
|
|
106
|
+
# Single GPU
|
|
107
|
+
accelerate launch train.py
|
|
108
|
+
|
|
109
|
+
# Multi-GPU (8 GPUs)
|
|
110
|
+
accelerate launch --multi_gpu --num_processes 8 train.py
|
|
111
|
+
|
|
112
|
+
# Multi-node
|
|
113
|
+
accelerate launch --multi_gpu --num_processes 16 \
|
|
114
|
+
--num_machines 2 --machine_rank 0 \
|
|
115
|
+
--main_process_ip $MASTER_ADDR \
|
|
116
|
+
train.py
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
### Workflow 2: Mixed precision training
|
|
120
|
+
|
|
121
|
+
**Enable FP16/BF16**:
|
|
122
|
+
```python
|
|
123
|
+
from accelerate import Accelerator
|
|
124
|
+
|
|
125
|
+
# FP16 (with gradient scaling)
|
|
126
|
+
accelerator = Accelerator(mixed_precision='fp16')
|
|
127
|
+
|
|
128
|
+
# BF16 (no scaling, more stable)
|
|
129
|
+
accelerator = Accelerator(mixed_precision='bf16')
|
|
130
|
+
|
|
131
|
+
# FP8 (H100+)
|
|
132
|
+
accelerator = Accelerator(mixed_precision='fp8')
|
|
133
|
+
|
|
134
|
+
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
|
135
|
+
|
|
136
|
+
# Everything else is automatic!
|
|
137
|
+
for batch in dataloader:
|
|
138
|
+
with accelerator.autocast(): # Optional, done automatically
|
|
139
|
+
loss = model(batch)
|
|
140
|
+
accelerator.backward(loss)
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
### Workflow 3: DeepSpeed ZeRO integration
|
|
144
|
+
|
|
145
|
+
**Enable DeepSpeed ZeRO-2**:
|
|
146
|
+
```python
|
|
147
|
+
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

accelerator = Accelerator(
    mixed_precision='bf16',
    deepspeed_plugin=DeepSpeedPlugin(
        zero_stage=2,  # ZeRO-2
        offload_optimizer_device="none",
        gradient_accumulation_steps=4,
    ),
)
|
|
157
|
+
|
|
158
|
+
# Same code as before!
|
|
159
|
+
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
**Or via config**:
|
|
163
|
+
```bash
|
|
164
|
+
accelerate config
|
|
165
|
+
# Select: DeepSpeed → ZeRO-2
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
**deepspeed_config.json**:
|
|
169
|
+
```json
|
|
170
|
+
{
|
|
171
|
+
"fp16": {"enabled": false},
|
|
172
|
+
"bf16": {"enabled": true},
|
|
173
|
+
"zero_optimization": {
|
|
174
|
+
"stage": 2,
|
|
175
|
+
"offload_optimizer": {"device": "cpu"},
|
|
176
|
+
"allgather_bucket_size": 5e8,
|
|
177
|
+
"reduce_bucket_size": 5e8
|
|
178
|
+
}
|
|
179
|
+
}
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
**Launch**:
|
|
183
|
+
```bash
|
|
184
|
+
accelerate launch --use_deepspeed --deepspeed_config_file deepspeed_config.json train.py
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
### Workflow 4: FSDP (Fully Sharded Data Parallel)
|
|
188
|
+
|
|
189
|
+
**Enable FSDP**:
|
|
190
|
+
```python
|
|
191
|
+
from accelerate import Accelerator, FullyShardedDataParallelPlugin
|
|
192
|
+
|
|
193
|
+
fsdp_plugin = FullyShardedDataParallelPlugin(
|
|
194
|
+
sharding_strategy="FULL_SHARD", # ZeRO-3 equivalent
|
|
195
|
+
auto_wrap_policy="TRANSFORMER_AUTO_WRAP",
|
|
196
|
+
cpu_offload=False
|
|
197
|
+
)
|
|
198
|
+
|
|
199
|
+
accelerator = Accelerator(
|
|
200
|
+
mixed_precision='bf16',
|
|
201
|
+
fsdp_plugin=fsdp_plugin
|
|
202
|
+
)
|
|
203
|
+
|
|
204
|
+
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
**Or via config**:
|
|
208
|
+
```bash
|
|
209
|
+
accelerate config
|
|
210
|
+
# Select: FSDP → Full Shard → No CPU Offload
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### Workflow 5: Gradient accumulation
|
|
214
|
+
|
|
215
|
+
**Accumulate gradients**:
|
|
216
|
+
```python
|
|
217
|
+
from accelerate import Accelerator
|
|
218
|
+
|
|
219
|
+
accelerator = Accelerator(gradient_accumulation_steps=4)
|
|
220
|
+
|
|
221
|
+
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
|
|
222
|
+
|
|
223
|
+
for batch in dataloader:
|
|
224
|
+
with accelerator.accumulate(model): # Handles accumulation
|
|
225
|
+
optimizer.zero_grad()
|
|
226
|
+
loss = model(batch)
|
|
227
|
+
accelerator.backward(loss)
|
|
228
|
+
optimizer.step()
|
|
229
|
+
```
|
|
230
|
+
|
|
231
|
+
**Effective batch size**: `batch_size * num_gpus * gradient_accumulation_steps`
|
|
232
|
+
|
|
233
|
+
## When to use vs alternatives
|
|
234
|
+
|
|
235
|
+
**Use Accelerate when**:
|
|
236
|
+
- Want simplest distributed training
|
|
237
|
+
- Need single script for any hardware
|
|
238
|
+
- Use HuggingFace ecosystem
|
|
239
|
+
- Want flexibility (DDP/DeepSpeed/FSDP/Megatron)
|
|
240
|
+
- Need quick prototyping
|
|
241
|
+
|
|
242
|
+
**Key advantages**:
|
|
243
|
+
- **4 lines**: Minimal code changes
|
|
244
|
+
- **Unified API**: Same code for DDP, DeepSpeed, FSDP, Megatron
|
|
245
|
+
- **Automatic**: Device placement, mixed precision, sharding
|
|
246
|
+
- **Interactive config**: No manual launcher setup
|
|
247
|
+
- **Single launch**: Works everywhere
|
|
248
|
+
|
|
249
|
+
**Use alternatives instead**:
|
|
250
|
+
- **PyTorch Lightning**: Need callbacks, high-level abstractions
|
|
251
|
+
- **Ray Train**: Multi-node orchestration, hyperparameter tuning
|
|
252
|
+
- **DeepSpeed**: Direct API control, advanced features
|
|
253
|
+
- **Raw DDP**: Maximum control, minimal abstraction
|
|
254
|
+
|
|
255
|
+
## Common issues
|
|
256
|
+
|
|
257
|
+
**Issue: Wrong device placement**
|
|
258
|
+
|
|
259
|
+
Don't manually move to device:
|
|
260
|
+
```python
|
|
261
|
+
# WRONG
|
|
262
|
+
batch = batch.to('cuda')
|
|
263
|
+
|
|
264
|
+
# CORRECT
|
|
265
|
+
# Accelerate handles it automatically after prepare()
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
**Issue: Gradient accumulation not working**
|
|
269
|
+
|
|
270
|
+
Use context manager:
|
|
271
|
+
```python
|
|
272
|
+
# CORRECT
|
|
273
|
+
with accelerator.accumulate(model):
|
|
274
|
+
optimizer.zero_grad()
|
|
275
|
+
accelerator.backward(loss)
|
|
276
|
+
optimizer.step()
|
|
277
|
+
```
|
|
278
|
+
|
|
279
|
+
**Issue: Checkpointing in distributed**
|
|
280
|
+
|
|
281
|
+
Use accelerator methods:
|
|
282
|
+
```python
|
|
283
|
+
# Call save_state on ALL processes — it coordinates internally.
# (Guarding with is_main_process can deadlock sharded FSDP/DeepSpeed runs.)
accelerator.save_state('checkpoint/')
|
|
286
|
+
|
|
287
|
+
# Load on all processes
|
|
288
|
+
accelerator.load_state('checkpoint/')
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
**Issue: Different results with FSDP**
|
|
292
|
+
|
|
293
|
+
Ensure same random seed:
|
|
294
|
+
```python
|
|
295
|
+
from accelerate.utils import set_seed
|
|
296
|
+
set_seed(42)
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
## Advanced topics
|
|
300
|
+
|
|
301
|
+
**Megatron integration**: See [references/megatron-integration.md](references/megatron-integration.md) for tensor parallelism, pipeline parallelism, and sequence parallelism setup.
|
|
302
|
+
|
|
303
|
+
**Custom plugins**: See [references/custom-plugins.md](references/custom-plugins.md) for creating custom distributed plugins and advanced configuration.
|
|
304
|
+
|
|
305
|
+
**Performance tuning**: See [references/performance.md](references/performance.md) for profiling, memory optimization, and best practices.
|
|
306
|
+
|
|
307
|
+
## Hardware requirements
|
|
308
|
+
|
|
309
|
+
- **CPU**: Works (slow)
|
|
310
|
+
- **Single GPU**: Works
|
|
311
|
+
- **Multi-GPU**: DDP (default), DeepSpeed, or FSDP
|
|
312
|
+
- **Multi-node**: DDP, DeepSpeed, FSDP, Megatron
|
|
313
|
+
- **TPU**: Supported
|
|
314
|
+
- **Apple MPS**: Supported
|
|
315
|
+
|
|
316
|
+
**Launcher requirements**:
|
|
317
|
+
- **DDP**: `torch.distributed.run` (built-in)
|
|
318
|
+
- **DeepSpeed**: `deepspeed` (pip install deepspeed)
|
|
319
|
+
- **FSDP**: PyTorch 1.12+ (built-in)
|
|
320
|
+
- **Megatron**: Custom setup
|
|
321
|
+
|
|
322
|
+
## Resources
|
|
323
|
+
|
|
324
|
+
- Docs: https://huggingface.co/docs/accelerate
|
|
325
|
+
- GitHub: https://github.com/huggingface/accelerate
|
|
326
|
+
- Version: 1.11.0+
|
|
327
|
+
- Tutorial: "Accelerate your scripts"
|
|
328
|
+
- Examples: https://github.com/huggingface/accelerate/tree/main/examples
|
|
329
|
+
- Used by: HuggingFace Transformers, TRL, PEFT, all HF libraries
|
|
330
|
+
|
|
331
|
+
|
|
332
|
+
|