npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.2.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.2.0


Files changed (415)

package/skills/research/automation/ai-scientist-v2-guide/SKILL.md ADDED

@@ -0,0 +1,284 @@
+---
+name: ai-scientist-v2-guide
+description: "Automated scientific discovery via agentic tree search by Sakana AI"
+metadata:
+  openclaw:
+    emoji: "🧪"
+    category: "research"
+    subcategory: "automation"
+    keywords: ["scientific-discovery", "automation", "tree-search", "paper-generation", "experiment-design", "sakana-ai"]
+    source: "https://github.com/SakanaAI/AI-Scientist-v2"
+---
+# AI Scientist v2 Guide
+## Overview
+AI-Scientist-v2 is an open-source system developed by Sakana AI with over 2,000 GitHub stars that automates the full scientific research pipeline -- from idea generation through experimentation to paper writing. Building on the original AI Scientist, version 2 introduces an agentic tree search approach that systematically explores the space of research ideas, designs and runs experiments, analyzes results, and produces workshop-level scientific papers with minimal human intervention.
+The key innovation in v2 is the tree search mechanism. Rather than pursuing a single research direction linearly, the system maintains a tree of possible research trajectories. At each node, the agent can branch into multiple experimental variations, evaluate the results, and prune unpromising directions while doubling down on successful ones. This mirrors how experienced researchers navigate the research landscape -- exploring broadly at first, then focusing resources on the most promising leads.
+AI-Scientist-v2 has demonstrated the ability to generate novel, valid research papers in machine learning subfields including diffusion models, language model training, and optimization. The generated papers are currently at workshop acceptance level, so the system is best treated as a step toward autonomous scientific discovery and as a way for researchers to automate the more mechanical parts of their workflow, not as a replacement for expert judgment.
+## Installation and Setup
+```bash
+# Clone the repository
+git clone https://github.com/SakanaAI/AI-Scientist-v2.git
+cd AI-Scientist-v2
+# Create a conda environment
+conda create -n ai-scientist python=3.11
+conda activate ai-scientist
+# Install dependencies
+pip install -r requirements.txt
+```
+### Prerequisites
+AI-Scientist-v2 requires several components:
+```bash
+# LLM API access (required for ideation, analysis, and writing)
+export OPENAI_API_KEY="your-openai-api-key"
+# Or Anthropic
+export ANTHROPIC_API_KEY="your-anthropic-api-key"
+# GPU access for running ML experiments
+# Recommended: at least one NVIDIA GPU with 24GB+ VRAM
+# LaTeX installation for paper compilation
+# Ubuntu/Debian
+sudo apt-get install texlive-full
+# macOS
+brew install --cask mactex
+```
+### Configuration
+Set up your research configuration:
+```yaml
+# config.yaml
+llm:
+  provider: "openai"
+  model: "gpt-4o"
+  temperature: 0.7
+search:
+  max_depth: 5          # Maximum tree depth
+  branching_factor: 3   # Number of branches per node
+  pruning_threshold: 0.3  # Prune branches below this score
+experiment:
+  gpu_ids: [0, 1]       # Available GPUs
+  timeout_hours: 2      # Max time per experiment
+  num_seeds: 3          # Random seeds per experiment
+paper:
+  template: "icml"      # Paper template (icml, neurips, iclr)
+  max_pages: 8          # Maximum paper length
+```
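As a rough way to budget compute for the search settings above, the sketch below bounds the number of training runs a fully expanded tree would need. It assumes every node spawns exactly `branching_factor` children down to `max_depth` and that nothing is pruned, which is a worst case, not the expected behavior. The function names are illustrative and are not part of AI-Scientist-v2.

```python
def max_tree_nodes(branching_factor: int, max_depth: int) -> int:
    """Number of nodes in a full tree with levels 0..max_depth (root at level 0)."""
    return sum(branching_factor ** depth for depth in range(max_depth + 1))


def worst_case_gpu_hours(branching_factor: int, max_depth: int,
                         num_seeds: int, timeout_hours: float) -> float:
    """Upper bound: every node runs once per seed and uses its whole timeout."""
    runs = max_tree_nodes(branching_factor, max_depth) * num_seeds
    return runs * timeout_hours


if __name__ == "__main__":
    # Values taken from the example configuration above.
    nodes = max_tree_nodes(branching_factor=3, max_depth=5)
    hours = worst_case_gpu_hours(3, 5, num_seeds=3, timeout_hours=2)
    print(f"nodes={nodes}, worst-case GPU-hours={hours}")
```

With the example values this bound is 364 nodes and 2,184 GPU-hours, which is why the pruning threshold and per-experiment timeout matter in practice.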
+## Core Research Pipeline
+### Phase 1: Idea Generation
+The system generates research ideas by analyzing existing literature and identifying gaps or extensions:
+```python
+from ai_scientist import IdeaGenerator
+generator = IdeaGenerator(
+    research_area="efficient_transformers",
+    seed_papers=[
+        "path/to/related_paper_1.pdf",
+        "path/to/related_paper_2.pdf",
+    ],
+    num_ideas=10,
+)
+ideas = generator.generate()
+for idea in ideas:
+    print(f"Title: {idea.title}")
+    print(f"Hypothesis: {idea.hypothesis}")
+    print(f"Novelty score: {idea.novelty_score}")
+    print(f"Feasibility score: {idea.feasibility_score}")
+```
+### Phase 2: Agentic Tree Search
+The tree search mechanism explores the research space systematically:
+```python
+from ai_scientist import TreeSearchResearcher
+researcher = TreeSearchResearcher(
+    idea=ideas[0],  # Start with the top-ranked idea
+    base_code="templates/efficient_transformer/",
+    config="config.yaml",
+)
+# Run the tree search
+result = researcher.run()
+# The search tree tracks all explorations
+print(f"Tree depth reached: {result.max_depth}")
+print(f"Total experiments run: {result.total_experiments}")
+print(f"Best result: {result.best_node.metrics}")
+```
+The tree search works as follows:
+1. **Root node**: The initial research idea and baseline implementation
+2. **Expansion**: At each node, the agent proposes 2-4 modifications (hyperparameter changes, architectural tweaks, new training strategies)
+3. **Evaluation**: Each modification is implemented and evaluated experimentally
+4. **Selection**: Promising branches are selected for further exploration using UCB (Upper Confidence Bound) or similar strategies
+5. **Pruning**: Branches that underperform the baseline or show diminishing returns are pruned
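The selection step can be illustrated with the textbook UCB1 rule, which scores a branch by its average reward plus a bonus that shrinks as the branch is visited more often. This is a minimal sketch of that general technique, assuming each branch tracks a visit count and a summed score; `BranchStats`, `ucb1`, and `select_branch` are hypothetical names, and the exact strategy used inside AI-Scientist-v2 may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Running statistics for one branch of the search tree (illustrative)."""
    name: str
    visits: int = 0
    total_score: float = 0.0

    @property
    def mean_score(self) -> float:
        return self.total_score / self.visits if self.visits else 0.0


def ucb1(branch: BranchStats, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: average reward plus an exploration bonus."""
    if branch.visits == 0:
        return math.inf  # always try an unexplored branch first
    bonus = c * math.sqrt(math.log(parent_visits) / branch.visits)
    return branch.mean_score + bonus


def select_branch(branches: list[BranchStats]) -> BranchStats:
    """Pick the branch with the highest UCB1 score."""
    parent_visits = sum(b.visits for b in branches)
    return max(branches, key=lambda b: ucb1(b, parent_visits))


if __name__ == "__main__":
    well_explored = BranchStats("a", visits=10, total_score=6.0)
    under_explored = BranchStats("b", visits=2, total_score=1.0)
    print(select_branch([well_explored, under_explored]).name)
```

In the usage example the less-visited branch wins despite its lower average, because its exploration bonus is larger; setting `c` closer to zero makes the search greedier.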
+### Phase 3: Experiment Execution
+Experiments are executed in isolated environments with proper controls:
+```python
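+from dataclasses import dataclass, field
+# Each experiment node contains:
+@dataclass
+class ExperimentNode:
+    hypothesis: str                                      # What we're testing
+    code_changes: list = field(default_factory=list)     # Specific code modifications
+    config_changes: dict = field(default_factory=dict)   # Hyperparameter changes
+    results: dict = field(default_factory=dict)          # Experimental results
+    analysis: str = ""                                   # LLM-generated analysis
+    children: list = field(default_factory=list)         # Branch experiments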
+```
+The system automatically handles experiment boilerplate including random seed management, metric logging, checkpoint saving, and result visualization. Each experiment is run with multiple seeds so that run-to-run variance can be estimated and a single lucky seed is not mistaken for a real improvement.
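Running several seeds only helps if the per-seed results are aggregated before branches are compared. The sketch below shows one simple way to do that with the standard library: it reports the mean, sample standard deviation, and standard error, and applies a crude "gap larger than k combined standard errors" check. The function names and the threshold `k = 2.0` are assumptions made for illustration, not behavior documented for AI-Scientist-v2, and the check is not a substitute for a proper statistical test.

```python
import math
import statistics


def summarize_seeds(scores: list[float]) -> dict:
    """Mean, sample standard deviation, and standard error across seeds."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds to estimate variance")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    return {"n": n, "mean": mean, "std": std, "stderr": std / math.sqrt(n)}


def clearly_better(candidate: dict, baseline: dict, k: float = 2.0) -> bool:
    """Crude check: the gap in means exceeds k combined standard errors."""
    gap = candidate["mean"] - baseline["mean"]
    combined = math.sqrt(candidate["stderr"] ** 2 + baseline["stderr"] ** 2)
    return gap > k * combined


if __name__ == "__main__":
    # Made-up accuracies for three seeds each, purely for demonstration.
    baseline = summarize_seeds([0.70, 0.71, 0.72])
    candidate = summarize_seeds([0.80, 0.82, 0.84])
    print(candidate, clearly_better(candidate, baseline))
```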
+### Phase 4: Paper Generation
+After the tree search completes, the system generates a scientific paper:
+```python
+from ai_scientist import PaperWriter
+writer = PaperWriter(
+    research_result=result,
+    template="neurips",
+    sections=[
+        "introduction",
+        "related_work",
+        "method",
+        "experiments",
+        "analysis",
+        "conclusion",
+    ],
+)
+# Generate the paper
+paper = writer.write()
+# Compile to PDF
+paper.compile_latex("output/paper.pdf")
+# The paper includes:
+# - Abstract summarizing key findings
+# - Introduction with motivation and contributions
+# - Related work section with citations
+# - Method description with equations
+# - Experiment section with tables and figures
+# - Analysis of results with ablation studies
+# - Conclusion with future work directions
+```
+## Research Templates
+AI-Scientist-v2 includes several research templates that define the experimental domain:
+### NanoGPT Template
+Train and evaluate small language models with various architectural modifications:
+```bash
+python run_scientist.py \
+  --template nanoGPT \
+  --idea "Investigate the effect of rotary position embeddings on small-scale language model training" \
+  --max_experiments 20
+```
+### Diffusion Model Template
+Experiment with diffusion model architectures and training strategies:
+```bash
+python run_scientist.py \
+  --template diffusion \
+  --idea "Compare noise schedules for conditional image generation"
+```
+### Creating Custom Templates
+Define your own research template for your specific domain:
+```python
+# templates/my_domain/template.py
+class MyDomainTemplate:
+    name = "my_research_domain"
+    base_metrics = ["accuracy", "f1_score", "inference_time"]
+    def setup_baseline(self):
+        """Set up the baseline experiment."""
+        pass
+    def evaluate(self, model, data):
+        """Evaluate a model configuration."""
+        pass
+    def get_modification_space(self):
+        """Define the space of possible modifications."""
+        return {
+            "architecture": ["transformer", "lstm", "mamba"],
+            "learning_rate": [1e-4, 3e-4, 1e-3],
+            "batch_size": [32, 64, 128],
+        }
+```
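To get a feel for how large a modification space like the one above is, it can be expanded into explicit configurations with `itertools.product`. This is a minimal sketch under the assumption that the space is a flat mapping from option name to a list of choices, as in `get_modification_space()`; `enumerate_configs` is a hypothetical helper, not something the template interface is known to provide.

```python
from itertools import product

# Same structure as the dictionary returned by get_modification_space() above.
modification_space = {
    "architecture": ["transformer", "lstm", "mamba"],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
}


def enumerate_configs(space: dict) -> list[dict]:
    """Expand a {name: [options]} mapping into every combination of options."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


configs = enumerate_configs(modification_space)
print(len(configs))  # 3 * 3 * 3 combinations
print(configs[0])
```

Even this small space yields 27 combinations, so an exhaustive sweep grows quickly as options are added; that is the motivation for searching the space selectively.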
+## Automated Paper Review
+AI-Scientist-v2 includes an automated reviewer that evaluates generated papers using criteria from top ML venues:
+```python
+from ai_scientist import PaperReviewer
+reviewer = PaperReviewer(
+    venue="neurips",
+    review_criteria=[
+        "novelty",
+        "significance",
+        "clarity",
+        "correctness",
+        "reproducibility",
+    ],
+)
+review = reviewer.review("output/paper.pdf")
+print(f"Overall score: {review.overall_score}/10")
+print(f"Strengths: {review.strengths}")
+print(f"Weaknesses: {review.weaknesses}")
+print(f"Questions: {review.questions}")
+```
+## Ethical Considerations and Limitations
+When using AI-Scientist-v2, keep these considerations in mind:
+- **Human oversight**: Always review generated papers for correctness before submission. The system can produce plausible-sounding but incorrect analyses.
+- **Attribution**: If using AI-Scientist-v2 outputs in publications, disclose the use of automated research tools per venue guidelines.
+- **Scope**: The system works best for incremental research within well-defined experimental frameworks. Breakthrough conceptual contributions still require human creativity.
+- **Compute cost**: Tree search with multiple seeds per experiment can require substantial GPU time. Set appropriate budgets and timeouts.
+- **Reproducibility**: All experiments are logged with seeds, configurations, and code versions for full reproducibility.
+## References
+- Repository: https://github.com/SakanaAI/AI-Scientist-v2
+- Original AI Scientist paper: https://arxiv.org/abs/2408.06292
+- Sakana AI: https://sakana.ai/
+- AI Scientist v1: https://github.com/SakanaAI/AI-Scientist

package/skills/research/automation/aim-experiment-guide/SKILL.md ADDED

@@ -0,0 +1,234 @@
+---
+name: aim-experiment-guide
+description: "Track and compare research experiments with Aim experiment tracker"
+metadata:
+  openclaw:
+    emoji: "🎯"
+    category: "research"
+    subcategory: "automation"
+    keywords: ["experiment-tracking", "visualization", "mlops", "reproducibility", "metrics", "hyperparameters"]
+    source: "https://github.com/aimhubio/aim"
+---
+# Aim Experiment Tracker Guide
+## Overview
+Aim is an open-source experiment tracking platform designed for researchers and ML engineers who need to log, compare, and analyze large numbers of experiments. Unlike cloud-based tracking services that require sending data to external servers, Aim runs entirely on your own infrastructure, making it suitable for research environments with data privacy requirements or institutional restrictions on external services.
+The core problem Aim solves is experiment management at scale. A typical research project involves hundreds or thousands of training runs with different hyperparameters, data splits, model architectures, and random seeds. Without systematic tracking, researchers lose track of which configurations produced which results, leading to wasted computation and unreproducible findings. Aim provides a high-performance storage backend and a rich web UI for logging, querying, and visualizing experiment metadata and metrics.
+With over 6,000 GitHub stars, Aim has established itself as a compelling self-hosted alternative to tools like Weights and Biases and MLflow. Its Python-native API integrates with minimal friction into existing training loops, and the query language enables sophisticated filtering across thousands of runs.
+## Installation and Setup
+Install Aim via pip:
+```bash
+pip install aim
+```
+Initialize an Aim repository in your project directory:
+```bash
+cd /path/to/research-project
+aim init
+```
+This creates a `.aim` directory that stores all experiment data locally. Launch the web UI:
+```bash
+aim up
+```
+The dashboard becomes available at `http://localhost:43800`, providing interactive visualizations of all tracked experiments.
+For remote server deployment:
+```bash
+aim up --host 0.0.0.0 --port 43800
+```
+## Core Features
+**Experiment Logging**: Integrate Aim tracking into your training scripts with minimal code changes:
+```python
+from aim import Run
+# Initialize a tracked run
+run = Run(experiment="protein_folding_v2")
+# Log hyperparameters
+run["hparams"] = {
+    "learning_rate": 0.001,
+    "batch_size": 64,
+    "model": "transformer",
+    "num_layers": 6,
+    "hidden_dim": 256,
+    "dropout": 0.1,
+    "optimizer": "adamw",
+    "weight_decay": 0.01,
+    "seed": 42,
+}
+# Log dataset information
+run["dataset"] = {
+    "name": "protein_benchmark_v3",
+    "train_size": 50000,
+    "val_size": 5000,
+    "test_size": 5000,
+}
+# Track metrics during training
+for epoch in range(num_epochs):
+    train_loss = train_one_epoch(model, train_loader)
+    val_loss, val_accuracy = evaluate(model, val_loader)
+    run.track(train_loss, name="loss", context={"subset": "train"})
+    run.track(val_loss, name="loss", context={"subset": "val"})
+    run.track(val_accuracy, name="accuracy", context={"subset": "val"})
+```
+**Framework Integrations**: Aim provides built-in callbacks for popular training frameworks:
+```python
+# PyTorch Lightning integration
+import pytorch_lightning as pl
+from aim.pytorch_lightning import AimLogger
+aim_logger = AimLogger(experiment="lightning_exp")
+trainer = pl.Trainer(logger=aim_logger, max_epochs=100)
+# Hugging Face Transformers integration
+# (model and training_args are assumed to be defined earlier in your script)
+from transformers import Trainer
+from aim.hugging_face import AimCallback
+aim_callback = AimCallback(experiment="hf_training")
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    callbacks=[aim_callback],
+)
+# Keras integration
+from aim.keras import AimCallback as KerasAimCallback
+model.fit(
+    x_train, y_train,
+    callbacks=[KerasAimCallback(experiment="keras_exp")],
+    epochs=50,
+)
+```
+**Powerful Query Language**: Filter and retrieve experiments programmatically:
+```python
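+from aim import Repo
+from aim.storage.context import Context
+repo = Repo("/path/to/research-project")
+# Query runs matching specific criteria
+query = """
+run.experiment == "protein_folding_v2"
+and run.hparams.learning_rate < 0.01
+and run.hparams.model == "transformer"
+"""
+# iter_runs() yields one collection per matching run; the Run itself is on .run
+for run_collection in repo.query_runs(query).iter_runs():
+    run = run_collection.run
+    print(f"Run: {run.hash}")
+    print(f"  LR: {run['hparams']['learning_rate']}")
+    # "loss" is a tracked metric, not a run parameter, so look it up by name and context
+    val_loss = run.get_metric("loss", Context({"subset": "val"}))
+    if val_loss is not None:
+        _, values = val_loss.values.sparse_numpy()
+        print(f"  Final val loss: {values[-1]}")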
+```
+**Rich Visualizations**: The web UI provides interactive charts for comparing experiments:
+- Line charts for metric trajectories across epochs
+- Parallel coordinates plots for hyperparameter exploration
+- Scatter plots correlating hyperparameters with final metrics
+- Distribution plots for metric analysis across run groups
+- Image and audio tracking for multimedia experiments
+## Research Workflow Integration
+**Hyperparameter Search Analysis**: After running grid search or random search experiments, use Aim to identify the best configurations:
+```python
+from aim import Repo
+repo = Repo(".")
+# Find the best run by validation accuracy
+best_run = None
+best_acc = 0.0
+for run_metrics in repo.query_metrics(
+    "metric.name == 'accuracy' and metric.context.subset == 'val'"
+).iter_runs():
+    for metric in run_metrics:
+        final_val = list(metric.values.values())[-1]
+        if final_val > best_acc:
+            best_acc = final_val
+            best_run = metric.run.hash
+print(f"Best run: {best_run} with accuracy {best_acc:.4f}")
+```
+**Reproducibility Documentation**: Every tracked run captures the full hyperparameter configuration, making it straightforward to include exact experimental details in paper methods sections and supplementary materials.
+**Ablation Studies**: Tag runs with ablation group identifiers and use the comparison UI to visualize the impact of each component:
+```python
+run = Run(experiment="ablation_study")
+run["hparams"] = config
+run["ablation"] = {
+    "group": "attention_mechanism",
+    "variant": "multi_head",
+    "description": "Standard multi-head attention vs. linear attention",
+}
+```
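Once ablation runs are tagged by variant, the comparison usually comes down to the change in a metric relative to a baseline variant. The sketch below works on plain dictionaries so it can run without an Aim repository; the record layout (`variant`, `final_accuracy`) and the helper name `ablation_deltas` are assumptions for illustration, and the numbers in the usage example are made up.

```python
from collections import defaultdict
from statistics import mean


def ablation_deltas(records: list[dict], baseline_variant: str) -> dict:
    """Mean metric per variant, reported as a difference from the baseline."""
    by_variant = defaultdict(list)
    for record in records:
        by_variant[record["variant"]].append(record["final_accuracy"])
    if baseline_variant not in by_variant:
        raise KeyError(f"no runs found for baseline {baseline_variant!r}")
    base = mean(by_variant[baseline_variant])
    return {variant: mean(vals) - base for variant, vals in by_variant.items()}


if __name__ == "__main__":
    # Hypothetical results: two seeds per attention variant.
    records = [
        {"variant": "multi_head", "final_accuracy": 0.90},
        {"variant": "multi_head", "final_accuracy": 0.92},
        {"variant": "linear", "final_accuracy": 0.88},
        {"variant": "linear", "final_accuracy": 0.86},
    ]
    for variant, delta in ablation_deltas(records, "multi_head").items():
        print(f"{variant}: {delta:+.3f}")
```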
+**Lab Notebook Integration**: Export experiment summaries for inclusion in electronic lab notebooks. The query API enables automated report generation:
+```python
+import pandas as pd
+from aim import Repo
+repo = Repo(".")
+records = []
+for run_metrics in repo.query_metrics(
+    "metric.name == 'accuracy'"
+).iter_runs():
+    run = run_metrics.run
+    for metric in run_metrics:
+        values = list(metric.values.values())
+        records.append({
+            "run_hash": run.hash[:8],
+            "model": run["hparams"].get("model"),
+            "lr": run["hparams"].get("learning_rate"),
+            "final_accuracy": values[-1] if values else None,
+        })
+df = pd.DataFrame(records)
+df.to_csv("experiment_summary.csv", index=False)
+```
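For lab notebooks that accept Markdown, the same kind of records can be rendered as a table instead of a CSV file. This is a small standard-library sketch; `to_markdown_table` is a hypothetical helper, the column names mirror the record keys used above, and the two rows in the usage example are invented.

```python
def format_cell(value) -> str:
    """Show missing values as '-' so the table stays aligned."""
    return "-" if value is None else str(value)


def to_markdown_table(records: list[dict], columns: list[str]) -> str:
    """Render a list of dicts as a Markdown table with the given column order."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(format_cell(record.get(col)) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    # Invented rows with the same keys as the records built above.
    records = [
        {"run_hash": "a1b2c3d4", "model": "transformer", "lr": 0.001, "final_accuracy": 0.91},
        {"run_hash": "e5f6a7b8", "model": "lstm", "lr": 0.0003, "final_accuracy": None},
    ]
    print(to_markdown_table(records, ["run_hash", "model", "lr", "final_accuracy"]))
```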
+## Storage and Performance
+Aim uses a custom high-performance storage engine optimized for time-series metrics data. The storage scales to millions of tracked values across thousands of runs without significant degradation in query performance.
+Data is stored locally in the `.aim` directory. Back up this directory to preserve your experiment history. For team settings, the Aim server can be deployed as a shared service accessible to multiple researchers.
+```bash
+# Check storage usage
+du -sh .aim/
+# Archive the repository directory for backup
+tar -czf aim-backup.tar.gz .aim/
+```
+## References
+- Aim repository: https://github.com/aimhubio/aim
+- Aim documentation: https://aimstack.readthedocs.io/
+- Aim UI demo and screenshots in the repository wiki
+- Comparison with MLflow and Weights and Biases in the documentation