claude-turing 4.6.0 → 4.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +2 -2
- package/README.md +1 -1
- package/commands/ablate.md +0 -1
- package/commands/annotate.md +0 -1
- package/commands/archive.md +0 -1
- package/commands/audit.md +0 -1
- package/commands/baseline.md +0 -1
- package/commands/brief.md +0 -1
- package/commands/budget.md +0 -1
- package/commands/calibrate.md +0 -1
- package/commands/card.md +0 -1
- package/commands/changelog.md +0 -1
- package/commands/checkpoint.md +0 -1
- package/commands/cite.md +0 -1
- package/commands/compare.md +0 -1
- package/commands/counterfactual.md +0 -1
- package/commands/curriculum.md +0 -1
- package/commands/design.md +0 -1
- package/commands/diagnose.md +0 -1
- package/commands/diff.md +0 -1
- package/commands/distill.md +0 -1
- package/commands/doctor.md +0 -1
- package/commands/ensemble.md +0 -1
- package/commands/explore.md +0 -1
- package/commands/export.md +0 -1
- package/commands/feature.md +0 -1
- package/commands/flashback.md +0 -1
- package/commands/fork.md +0 -1
- package/commands/frontier.md +0 -1
- package/commands/init.md +0 -1
- package/commands/leak.md +0 -1
- package/commands/lit.md +0 -1
- package/commands/logbook.md +0 -1
- package/commands/merge.md +0 -1
- package/commands/mode.md +0 -1
- package/commands/onboard.md +0 -1
- package/commands/paper.md +0 -1
- package/commands/plan.md +0 -1
- package/commands/poster.md +0 -1
- package/commands/postmortem.md +0 -1
- package/commands/preflight.md +0 -1
- package/commands/present.md +0 -1
- package/commands/profile.md +0 -1
- package/commands/prune.md +0 -1
- package/commands/quantize.md +0 -1
- package/commands/queue.md +0 -1
- package/commands/registry.md +0 -1
- package/commands/regress.md +0 -1
- package/commands/replay.md +0 -1
- package/commands/report.md +0 -1
- package/commands/reproduce.md +0 -1
- package/commands/retry.md +0 -1
- package/commands/review.md +0 -1
- package/commands/sanity.md +0 -1
- package/commands/scale.md +0 -1
- package/commands/search.md +0 -1
- package/commands/seed.md +0 -1
- package/commands/sensitivity.md +0 -1
- package/commands/share.md +0 -1
- package/commands/simulate.md +0 -1
- package/commands/status.md +0 -1
- package/commands/stitch.md +0 -1
- package/commands/suggest.md +0 -1
- package/commands/surgery.md +0 -1
- package/commands/sweep.md +0 -1
- package/commands/template.md +0 -1
- package/commands/train.md +0 -1
- package/commands/transfer.md +0 -1
- package/commands/trend.md +0 -1
- package/commands/try.md +0 -1
- package/commands/turing.md +3 -3
- package/commands/update.md +0 -1
- package/commands/validate.md +0 -1
- package/commands/warm.md +0 -1
- package/commands/watch.md +0 -1
- package/commands/whatif.md +0 -1
- package/commands/xray.md +0 -1
- package/config/commands.yaml +74 -74
- package/package.json +10 -3
- package/skills/turing/SKILL.md +180 -0
- package/skills/turing/ablate/SKILL.md +46 -0
- package/skills/turing/annotate/SKILL.md +22 -0
- package/skills/turing/archive/SKILL.md +22 -0
- package/skills/turing/audit/SKILL.md +55 -0
- package/skills/turing/baseline/SKILL.md +44 -0
- package/skills/turing/brief/SKILL.md +94 -0
- package/skills/turing/budget/SKILL.md +51 -0
- package/skills/turing/calibrate/SKILL.md +46 -0
- package/skills/turing/card/SKILL.md +35 -0
- package/skills/turing/changelog/SKILL.md +21 -0
- package/skills/turing/checkpoint/SKILL.md +46 -0
- package/skills/turing/cite/SKILL.md +22 -0
- package/skills/turing/compare/SKILL.md +23 -0
- package/skills/turing/counterfactual/SKILL.md +26 -0
- package/skills/turing/curriculum/SKILL.md +42 -0
- package/skills/turing/design/SKILL.md +96 -0
- package/skills/turing/diagnose/SKILL.md +51 -0
- package/skills/turing/diff/SKILL.md +47 -0
- package/skills/turing/distill/SKILL.md +55 -0
- package/skills/turing/doctor/SKILL.md +30 -0
- package/skills/turing/ensemble/SKILL.md +53 -0
- package/skills/turing/explore/SKILL.md +106 -0
- package/skills/turing/export/SKILL.md +47 -0
- package/skills/turing/feature/SKILL.md +41 -0
- package/skills/turing/flashback/SKILL.md +21 -0
- package/skills/turing/fork/SKILL.md +39 -0
- package/skills/turing/frontier/SKILL.md +44 -0
- package/skills/turing/init/SKILL.md +153 -0
- package/skills/turing/leak/SKILL.md +46 -0
- package/skills/turing/lit/SKILL.md +46 -0
- package/skills/turing/logbook/SKILL.md +50 -0
- package/skills/turing/merge/SKILL.md +23 -0
- package/skills/turing/mode/SKILL.md +42 -0
- package/skills/turing/onboard/SKILL.md +19 -0
- package/skills/turing/paper/SKILL.md +43 -0
- package/skills/turing/plan/SKILL.md +26 -0
- package/skills/turing/poster/SKILL.md +88 -0
- package/skills/turing/postmortem/SKILL.md +27 -0
- package/skills/turing/preflight/SKILL.md +74 -0
- package/skills/turing/present/SKILL.md +22 -0
- package/skills/turing/profile/SKILL.md +42 -0
- package/skills/turing/prune/SKILL.md +25 -0
- package/skills/turing/quantize/SKILL.md +23 -0
- package/skills/turing/queue/SKILL.md +47 -0
- package/skills/turing/registry/SKILL.md +30 -0
- package/skills/turing/regress/SKILL.md +52 -0
- package/skills/turing/replay/SKILL.md +22 -0
- package/skills/turing/report/SKILL.md +96 -0
- package/skills/turing/reproduce/SKILL.md +47 -0
- package/skills/turing/retry/SKILL.md +40 -0
- package/skills/turing/review/SKILL.md +19 -0
- package/skills/turing/rules/loop-protocol.md +91 -0
- package/skills/turing/sanity/SKILL.md +47 -0
- package/skills/turing/scale/SKILL.md +54 -0
- package/skills/turing/search/SKILL.md +21 -0
- package/skills/turing/seed/SKILL.md +46 -0
- package/skills/turing/sensitivity/SKILL.md +40 -0
- package/skills/turing/share/SKILL.md +19 -0
- package/skills/turing/simulate/SKILL.md +27 -0
- package/skills/turing/status/SKILL.md +23 -0
- package/skills/turing/stitch/SKILL.md +48 -0
- package/skills/turing/suggest/SKILL.md +158 -0
- package/skills/turing/surgery/SKILL.md +26 -0
- package/skills/turing/sweep/SKILL.md +44 -0
- package/skills/turing/template/SKILL.md +21 -0
- package/skills/turing/train/SKILL.md +74 -0
- package/skills/turing/transfer/SKILL.md +53 -0
- package/skills/turing/trend/SKILL.md +20 -0
- package/skills/turing/try/SKILL.md +62 -0
- package/skills/turing/update/SKILL.md +26 -0
- package/skills/turing/validate/SKILL.md +33 -0
- package/skills/turing/warm/SKILL.md +52 -0
- package/skills/turing/watch/SKILL.md +59 -0
- package/skills/turing/whatif/SKILL.md +30 -0
- package/skills/turing/xray/SKILL.md +42 -0
- package/src/command-registry.js +21 -0
- package/src/install.js +4 -3
- package/src/sync-commands-layout.js +149 -0
- package/src/sync-skills-layout.js +20 -0
- package/templates/__pycache__/evaluate.cpython-312.pyc +0 -0
- package/templates/__pycache__/evaluate.cpython-314.pyc +0 -0
- package/templates/__pycache__/prepare.cpython-312.pyc +0 -0
- package/templates/__pycache__/prepare.cpython-314.pyc +0 -0
- package/templates/features/__pycache__/__init__.cpython-312.pyc +0 -0
- package/templates/features/__pycache__/__init__.cpython-314.pyc +0 -0
- package/templates/features/__pycache__/featurizers.cpython-312.pyc +0 -0
- package/templates/features/__pycache__/featurizers.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/__init__.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/ablation_study.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/ablation_study.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/architecture_surgery.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/architecture_surgery.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/budget_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/budget_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/build_ensemble.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/build_ensemble.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/calibration.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/calibration.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/check_convergence.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/check_convergence.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/checkpoint_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/checkpoint_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/citation_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/citation_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/cost_frontier.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/cost_frontier.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/counterfactual_explanation.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/counterfactual_explanation.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/critique_hypothesis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/critique_hypothesis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/curriculum_optimizer.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/curriculum_optimizer.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/diagnose_errors.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/diagnose_errors.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/draft_paper_sections.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/draft_paper_sections.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/equivalence_checker.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/equivalence_checker.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_annotations.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_annotations.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_archive.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_archive.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_diff.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_diff.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_index.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_index.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_queue.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_queue.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_replay.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_replay.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_search.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_search.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_simulator.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_simulator.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_templates.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/experiment_templates.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/export_card.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/export_card.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/export_formats.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/export_formats.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/failure_postmortem.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/feature_intelligence.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/feature_intelligence.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/fork_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/fork_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_baselines.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_baselines.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_brief.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_changelog.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_changelog.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_figures.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_figures.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_logbook.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_logbook.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_model_card.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_model_card.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/generate_onboarding.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/generate_onboarding.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/harness_doctor.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/incremental_update.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/incremental_update.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/knowledge_transfer.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/knowledge_transfer.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/latency_benchmark.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/latency_benchmark.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/leakage_detector.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/leakage_detector.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/literature_search.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/literature_search.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/log_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/log_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/manage_hypotheses.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/manage_hypotheses.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/methodology_audit.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/methodology_audit.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_distiller.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_distiller.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_lifecycle.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_lifecycle.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_merger.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_merger.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_pruning.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_pruning.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_quantization.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_quantization.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/model_xray.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/model_xray.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/novelty_guard.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/novelty_guard.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/package_experiments.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/package_experiments.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/pareto_frontier.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/pareto_frontier.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/parse_metrics.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/parse_metrics.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/pipeline_manager.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/pipeline_manager.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/profile_training.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/profile_training.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/regression_gate.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/regression_gate.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/reproduce_experiment.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/reproduce_experiment.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/research_planner.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sanity_checks.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sanity_checks.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/scaffold.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/scaling_estimator.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/scaling_estimator.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/seed_runner.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/seed_runner.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sensitivity_analysis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sensitivity_analysis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/session_flashback.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/session_flashback.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/show_experiment_tree.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/show_experiment_tree.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/show_families.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/show_families.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/simulate_review.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/simulate_review.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/smart_retry.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/smart_retry.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/statistical_compare.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/statistical_compare.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/suggest_next.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/suggest_next.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/sweep.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/sweep.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/synthesize_decision.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/synthesize_decision.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/training_monitor.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/training_monitor.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/treequest_suggest.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/treequest_suggest.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/trend_analysis.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/trend_analysis.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/turing_io.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/turing_io.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/update_state.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/update_state.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/verify_placeholders.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/verify_placeholders.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/warm_start.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/warm_start.cpython-314.pyc +0 -0
- package/templates/scripts/__pycache__/whatif_engine.cpython-312.pyc +0 -0
- package/templates/scripts/__pycache__/whatif_engine.cpython-314.pyc +0 -0
package/skills/turing/feature/SKILL.md
@@ -0,0 +1,41 @@
+---
+name: feature
+description: Automated feature selection — multi-method importance consensus, redundancy detection, and interaction feature generation.
+argument-hint: "[--method all|importance] [--top-k 20]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Systematically evaluate which features matter and which are noise.
+
+## Steps
+
+1. **Activate environment:**
+   ```bash
+   source .venv/bin/activate
+   ```
+
+2. **Parse arguments from `$ARGUMENTS`:**
+   - `--method all|importance|selection|generation` — analysis type (default: all)
+   - `--top-k 20` — number of top features to consider
+   - `--json` — raw JSON output
+
+3. **Run feature analysis:**
+   ```bash
+   python scripts/feature_intelligence.py $ARGUMENTS
+   ```
+
+4. **Report includes:**
+   - Consensus ranking: features ranked by number of methods placing them in top-K
+   - Per-method ranks: mutual information, L1, tree-based
+   - Redundant pairs: features with |r| > 0.95
+   - Candidate interaction features from top consensus set
+   - Drop recommendation for zero-consensus features
+
+5. **Saved output:** report in `experiments/features/features-*.yaml`
+
+## Examples
+
+```
+/turing:feature               # Full analysis
+/turing:feature --top-k 10    # Top-10 consensus
+```
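For context, the consensus ranking and redundancy detection this skill describes can be sketched in a few lines. This is a rough illustration with hypothetical data and rankings, not the package's actual `feature_intelligence.py`:

```python
import numpy as np

# Hypothetical per-method rankings (best first) from three importance methods.
rankings = {
    "mutual_info": ["f1", "f3", "f2", "f4"],
    "l1":          ["f3", "f1", "f4", "f2"],
    "tree":        ["f1", "f3", "f4", "f2"],
}
top_k = 2

# Consensus: count how many methods place each feature in their top-K.
consensus = {}
for ranks in rankings.values():
    for feat in ranks[:top_k]:
        consensus[feat] = consensus.get(feat, 0) + 1
ranked = sorted(consensus, key=consensus.get, reverse=True)

# Redundancy: flag feature pairs whose |Pearson r| exceeds 0.95.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)  # column 1 near-duplicates column 0
corr = np.corrcoef(X, rowvar=False)
redundant = [(i, j) for i in range(4) for j in range(i + 1, 4)
             if abs(corr[i, j]) > 0.95]
print(ranked, redundant)
```

A real implementation would derive the rankings from fitted models (mutual information scores, L1 coefficients, tree importances) rather than hard-coding them.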
package/skills/turing/flashback/SKILL.md
@@ -0,0 +1,21 @@
+---
+name: flashback
+description: Session context restoration — "where was I?" summary after days away. Current best, pending hypotheses, last session, annotations.
+argument-hint: "[--days 7] [--last 10]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Come back to a project after a week and start working in 10 seconds instead of 30 minutes.
+
+## Steps
+1. **Activate environment:** `source .venv/bin/activate`
+2. **Run:** `python scripts/session_flashback.py $ARGUMENTS`
+3. **Report:** current best, last session experiments, pending hypotheses, annotations, budget, suggested next action
+4. **Saved output:** `experiments/flashbacks/flashback-*.yaml`
+
+## Examples
+```
+/turing:flashback             # Default: last 7 days
+/turing:flashback --days 14   # 2-week lookback
+/turing:flashback --last 5    # Last 5 experiments
+```
package/skills/turing/fork/SKILL.md
@@ -0,0 +1,39 @@
+---
+name: fork
+description: Branch an experiment into parallel tracks — run both A and B, report the winner.
+argument-hint: "<exp-id> --branches \"approach A\" \"approach B\" [--auto-promote]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Fork an experiment into parallel branches and compare results.
+
+## Steps
+
+1. **Activate environment:**
+   ```bash
+   source .venv/bin/activate
+   ```
+
+2. **Parse arguments from `$ARGUMENTS`:**
+   - First argument is the parent experiment ID
+   - `--branches "A" "B" "C"` — branch descriptions (2+ required)
+   - `--auto-promote` — automatically keep the winning branch
+
+3. **Run fork:**
+   ```bash
+   python scripts/fork_experiment.py $ARGUMENTS
+   ```
+
+4. **Report results:**
+   - Comparison tree showing each branch's metric
+   - Winner identified and marked
+   - Recommendation: promote winner, abandon rest
+
+5. **Saved output:** report written to `experiments/forks/exp-NNN-fork.yaml`
+
+## Examples
+
+```
+/turing:fork exp-042 --branches "LightGBM with dart" "XGBoost deeper trees"
+/turing:fork exp-042 --branches "A" "B" "C" --auto-promote
+```
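The `--auto-promote` step this skill describes reduces to a one-liner once the harness's metric direction is known. A minimal sketch with hypothetical branch records (not the package's `fork_experiment.py`):

```python
# Hypothetical results for two forked branches of exp-042.
branches = [
    {"id": "exp-042-a", "desc": "LightGBM with dart", "metric": 0.871},
    {"id": "exp-042-b", "desc": "XGBoost deeper trees", "metric": 0.864},
]

def pick_winner(branches, direction="higher"):
    """Return the branch to promote; the rest would be abandoned."""
    best = max if direction == "higher" else min
    return best(branches, key=lambda b: b["metric"])

print(pick_winner(branches)["id"])  # exp-042-a (0.871 > 0.864)
```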
package/skills/turing/frontier/SKILL.md
@@ -0,0 +1,44 @@
+---
+name: frontier
+description: Visualize Pareto frontier across multiple objectives — answers "which model is actually best?" when there are tradeoffs.
+argument-hint: "[--metrics \"accuracy,train_seconds,n_params\"] [--ascii]"
+allowed-tools: Read, Bash(*), Grep, Glob
+---
+
+Visualize the Pareto frontier across multiple objectives from experiment history.
+
+## Steps
+
+1. **Activate environment:**
+   ```bash
+   source .venv/bin/activate
+   ```
+
+2. **Parse arguments from `$ARGUMENTS`:**
+   - `--metrics "accuracy,train_seconds,n_params"` specifies metrics to analyze
+   - Without `--metrics`, uses primary metric + train_seconds from config
+   - `--ascii` generates an ASCII scatter plot (2D projection)
+
+3. **Run Pareto analysis:**
+   ```bash
+   python scripts/pareto_frontier.py $ARGUMENTS
+   ```
+
+4. **Report results:**
+   - **Pareto-optimal experiments:** table with all metrics and what each is best at
+   - **Dominated experiments:** with their nearest Pareto neighbor
+   - **ASCII scatter plot** (if `--ascii`): 2D projection with * for Pareto, · for dominated
+   - Summary: "N Pareto-optimal of M experiments across K metrics"
+
+5. **Saved output:** results written to `experiments/frontiers/frontier-YYYY-MM-DD.yaml`
+
+6. **If no experiments have all requested metrics:** suggest which metrics are available.
+
+## Examples
+
+```
+/turing:frontier                                              # Default: metric vs time
+/turing:frontier --metrics "accuracy,train_seconds"           # 2D frontier
+/turing:frontier --metrics "accuracy,train_seconds,n_params"  # 3D frontier
+/turing:frontier --ascii                                      # With scatter plot
+```
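Pareto dominance, which this skill relies on, is easy to state precisely: experiment A dominates B if A is no worse on every metric and strictly better on at least one. A minimal sketch with hypothetical records (metrics oriented so lower is better, e.g. error rate instead of accuracy), not the package's `pareto_frontier.py`:

```python
# Hypothetical (error, train_seconds) pairs; lower is better on both axes.
experiments = {
    "exp-040": (0.13, 42.0),
    "exp-041": (0.11, 95.0),
    "exp-042": (0.15, 30.0),
    "exp-043": (0.14, 120.0),  # worse than exp-040 on both axes
}

def dominates(a, b):
    """True if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

pareto = [name for name, vals in experiments.items()
          if not any(dominates(other, vals)
                     for o, other in experiments.items() if o != name)]
print(pareto)  # exp-043 is dominated; the other three are Pareto-optimal
```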
@@ -0,0 +1,153 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: init
|
|
3
|
+
description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a Python virtual environment. Use --plan to generate a research plan.
|
|
4
|
+
argument-hint: "[project_name] [--plan]"
|
|
5
|
+
allowed-tools: Read, Write, Edit, Bash(*), Grep, Glob, WebSearch, WebFetch
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
Scaffold a new ML project with the Turing autoresearch harness. This creates the separation between the measurement apparatus (READ-ONLY) and the hypothesis space (AGENT-EDITABLE) that makes autonomous experimentation trustworthy.
|
|
9
|
+
|
|
10
|
+
## Interactive Setup
|
|
11
|
+
|
|
12
|
+
Ask the user for the following (or accept from `$ARGUMENTS` if provided as JSON):
|
|
13
|
+
|
|
14
|
+
1. **Project name** (`{{PROJECT_NAME}}`): Name of the ML project (e.g., "sentiment", "churn", "fraud-detection")
|
|
15
|
+
2. **Target metric** (`{{TARGET_METRIC}}`): Primary metric to optimize (e.g., "accuracy", "f1", "mae", "mse", "auc")
|
|
16
|
+
3. **Metric direction**: Is lower better (mae, mse, loss) or higher better (accuracy, f1, auc)?
|
|
17
|
+
4. **Task description** (`{{TASK_DESCRIPTION}}`): What the model does (e.g., "Predict customer churn from usage data")
|
|
18
|
+
5. **ML directory** (`{{ML_DIR}}`): Where ML files go relative to project root (e.g., "ml/sentiment")
|
|
19
|
+
6. **Data source** (`{{DATA_SOURCE}}`): Where training data comes from (e.g., "data/reviews.csv")
|
|
20
|
+
|
|
21
|
+
## Scaffolding
|
|
22
|
+
|
|
23
|
+
Once you have all 6 values, delegate to the unified scaffolding script:
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
python3 <templates_dir>/scripts/scaffold.py \
|
|
27
|
+
--project-name "<project_name>" \
|
|
28
|
+
--target-metric "<target_metric>" \
|
|
29
|
+
--metric-direction "<metric_direction>" \
|
|
30
|
+
--task-description "<task_description>" \
|
|
31
|
+
--ml-dir "<ml_dir>" \
|
|
32
|
+
--data-source "<data_source>" \
|
|
33
|
+
--templates-dir "<templates_dir>"
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
The scaffold script handles everything in a single atomic operation:
|
|
37
|
+
- Copies all template files with placeholder substitution
|
|
38
|
+
- Creates data/, experiments/, models/ directories
|
|
39
|
+
- Sets up agent memory at `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
|
|
40
|
+
- Configures Claude Code hooks in `.claude/settings.local.json`
|
|
41
|
+
- Creates Python virtual environment and installs requirements
|
|
42
|
+
- Verifies all placeholders were replaced (fails loudly if any remain)
|
|
43
|
+
|
|
44
|
+
## Locating Templates
|
|
45
|
+
|
|
46
|
+
Use the installed command-pack templates directory first:
|
|
47
|
+
```
|
|
48
|
+
.claude/commands/turing/templates/
|
|
49
|
+
~/.claude/commands/turing/templates/
|
|
50
|
+
```
|
|
51
|
+
Then fall back to plugin or npm locations:
|
|
52
|
+
```
|
|
53
|
+
~/.claude/plugins/*/templates/
|
|
54
|
+
node_modules/claude-turing/templates/
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
Example command:
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
|
|
61
|
+
--project-name "<project_name>" \
|
|
62
|
+
--target-metric "<target_metric>" \
|
|
63
|
+
--metric-direction "<metric_direction>" \
|
|
64
|
+
--task-description "<task_description>" \
|
|
65
|
+
--ml-dir "<ml_dir>" \
|
|
66
|
+
--data-source "<data_source>" \
|
|
67
|
+
--templates-dir ~/.claude/commands/turing/templates
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## After Scaffolding
|
|
71
|
+
|
|
72
|
+
Report what was created:
|
|
73
|
+
- The separation: READ-ONLY (`prepare.py`, `evaluate.py`) vs AGENT-EDITABLE (`train.py`)
|
|
74
|
+
- Next steps: add data to the configured data source path, run `python prepare.py`, then `/turing:train`
|
|
75
|
+
- The taste-leverage loop: `/turing:try` to inject hypotheses, `/turing:brief` for intelligence reports
|
|
76
|
+
|
|
77
|
+
## Research Plan Generation (--plan flag)
|
|
78
|
+
|
|
79
|
+
If `$ARGUMENTS` contains `--plan`, generate a research plan AFTER scaffolding. This gives the agent strategic direction for its first 5-10 experiments rather than ad-hoc exploration.
|
|
80
|
+
|
|
81
|
+
### Steps:
|
|
82
|
+
|
|
83
|
+
1. **Read the task context** from the just-created `config.yaml`: task description, model type, target metric, data source.
|
|
84
|
+
|
|
85
|
+
2. **Search literature** with `WebSearch` for the task domain:
|
|
86
|
+
- "state of the art <task description> machine learning 2024 2025"
|
|
87
|
+
- "best model <target metric> <data type> benchmark"
|
|
88
|
+
- "<task description> common approaches survey"
|
|
89
|
+
|
|
90
|
+
Use `WebFetch` on top 2-3 results to extract: dominant model families, typical metric ranges, known challenges.
|
|
91
|
+
|
|
92
|
+
3. **Generate `RESEARCH_PLAN.md`** in the ML project directory with this structure:
|
|
93
|
+
|
|
94
|
+
```markdown
# Research Plan: <task description>

Generated: <date>

## Task Summary
<one paragraph describing the task, data, and success criteria>

## Model Families to Explore
Ordered by expected relevance based on literature:
1. **<family 1>** — <why, with citation>
2. **<family 2>** — <why, with citation>
3. **<family 3>** — <why, with citation>

## Evaluation Strategy
- Primary metric: <metric> (<higher/lower> is better)
- Multi-run recommendation: <yes/no, based on expected variance>
- Baseline target: <realistic first-pass metric from literature>

## Search Budget
- <N> experiments per model family before moving on
- Total budget: <N> experiments before first convergence check

## Success Criteria
- Target metric: <value from literature benchmarks>
- Convergence: <patience> consecutive non-improvements

## Known Challenges
- <challenge 1 from literature, e.g., "class imbalance common in this domain">
- <challenge 2>

## Sources
- <citation 1>
- <citation 2>
```

4. **Self-critique the plan** (one round):
   - Are the model families ordered by evidence strength?
   - Is the budget realistic?
   - Are the success criteria grounded in benchmark data?

   Revise if any section is vague or unsupported.

5. **Report:** "Research plan generated at `<ml_dir>/RESEARCH_PLAN.md`. The agent will read this during `/turing:train` for strategic direction."

### Integration

The agent's `program.md` OBSERVE step reads `RESEARCH_PLAN.md` (if it exists) for strategic direction. The plan is advisory — the agent can deviate but should note why in `experiment_state.yaml`.

## Multiple Projects

You can scaffold multiple ML projects in the same repository:

```bash
/turing:init    # First project: prompts for ml_dir (e.g., ml/sentiment)
/turing:init    # Second project: prompts for ml_dir (e.g., ml/churn)
```

Each project gets its own directory with independent config, data, experiments, and models. `/turing:train ml/sentiment` or `/turing:train ml/churn` targets a specific project. If you `cd ml/sentiment` first, `/turing:train` auto-detects the project from the working directory.

Agent memory is scoped per project: `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
@@ -0,0 +1,46 @@
---
name: leak
description: Targeted leakage detection — probe for data leakage with single-feature tests, correlation checks, and train/test overlap detection.
argument-hint: "[--deep] [--features feature_1,feature_2] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Actively probe for data leakage, the #1 cause of "too good to be true" results.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - `--deep` — run full single-feature analysis (slow but thorough)
   - `--features "feat_1,feat_2"` — check specific features
   - `--json` — raw JSON output

3. **Run leakage scan:**
   ```bash
   python scripts/leakage_detector.py $ARGUMENTS
   ```

4. **Checks performed:**
   - **Feature-target correlation:** flag features with >0.95 correlation to the target
   - **Single-feature predictiveness (`--deep`):** train on each feature alone, flag any that achieves >80% of full-model performance
   - **Train/test overlap:** hash-based deduplication across splits

5. **Verdicts:**
   - **CLEAN** — no leakage detected
   - **SUSPICIOUS** — warnings to review
   - **LEAKAGE DETECTED** — critical flags found

6. **Integration:** satisfies the "data leakage" check in `/turing:audit`

7. **Saved output:** report in `experiments/leakage/leak-*.yaml`

## Examples

```
/turing:leak          # Standard correlation + overlap checks
/turing:leak --deep   # Full single-feature analysis
```
@@ -0,0 +1,46 @@
---
name: lit
description: Literature search scoped to the current experiment domain — find papers, SOTA baselines, and related work without leaving the terminal.
argument-hint: "<query> | --baseline | --related <exp-id>"
allowed-tools: Read, Bash(*), Grep, Glob, WebSearch
---

Search the literature for papers, baselines, and related work.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - **Free query:** `"gradient boosting for tabular data"` — searches Semantic Scholar
   - **Baseline:** `--baseline` — finds SOTA results for the current task, compares against your best
   - **Related:** `--related exp-042` — finds papers using similar methods to a specific experiment
   - `--auto-queue` — auto-queues hypotheses from literature with `source: "literature"`
   - `--limit 10` — max number of results

3. **Run literature search:**
   ```bash
   python scripts/literature_search.py $ARGUMENTS
   ```

4. **Report results:**
   - **Papers:** title, authors, year, venue, citations, abstract snippet, URL
   - **Baseline mode:** SOTA comparison with gap analysis against the current best
   - **Related mode:** methodological differences worth investigating
   - **Hypotheses:** if `--auto-queue`, shows experiments queued from the findings

5. **Saved output:** results written to `experiments/literature/query-YYYY-MM-DD-HHMMSS.md`

6. **If the API is unavailable:** report the error and suggest a manual search.
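
The free-query mode presumably queries the Semantic Scholar Graph API. A minimal sketch of building such a request follows; the endpoint and field names are from the public Graph API, but `build_search_url` and `search_papers` are hypothetical helpers, not the actual `literature_search.py`:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,year,venue,citationCount,abstract,url"

def build_search_url(query, limit=10):
    """Compose a paper-search request against the Semantic Scholar Graph API."""
    return f"{S2_SEARCH}?{urlencode({'query': query, 'limit': limit, 'fields': FIELDS})}"

def search_papers(query, limit=10):
    """Fetch and decode results (requires network; light use needs no API key)."""
    with urlopen(build_search_url(query, limit)) as resp:
        return json.load(resp).get("data", [])
```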

## Examples

```
/turing:lit "gradient boosting missing values"   # Free query
/turing:lit --baseline                           # SOTA comparison
/turing:lit --related exp-042                    # Related work
/turing:lit --auto-queue "ensemble methods"      # Queue hypotheses
```
@@ -0,0 +1,50 @@
---
name: logbook
description: Generate a research logbook showing the full experiment narrative — hypotheses proposed, experiments run, decisions made, and progress over time. Outputs HTML (with interactive chart) or markdown.
argument-hint: "[--since YYYY-MM-DD] [--format html|markdown] [--output path]"
allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
---

Generate a research logbook that captures the full narrative of the experiment campaign.

## Steps

1. **Generate the logbook:**
   ```bash
   source .venv/bin/activate && python scripts/generate_logbook.py
   ```

   **With options from `$ARGUMENTS`:**
   - `--since 2026-03-15` — only include events after this date
   - `--format markdown` — output as markdown instead of HTML
   - `--output logbook.html` — write to a file instead of stdout

   **Common usage:**
   ```bash
   # HTML logbook with interactive trajectory chart
   source .venv/bin/activate && python scripts/generate_logbook.py --output logbook.html

   # Markdown for embedding in docs or READMEs
   source .venv/bin/activate && python scripts/generate_logbook.py --format markdown --output logbook.md

   # Last week's activity
   source .venv/bin/activate && python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
   ```

2. **Present the result:**
   - If HTML: tell the user to open the file in their browser. The logbook includes an interactive Chart.js trajectory visualization.
   - If markdown: display inline or note the output file location.

## What the Logbook Contains

- **Campaign summary:** total experiments, keep rate, best metric, hypothesis count
- **Improvement trajectory:** interactive line chart showing metric progression and best-so-far envelope
- **Experiment log:** every experiment with ID, description, metric value, status (kept/discarded), date
- **Hypothesis queue:** every hypothesis with source (human/agent/literature), status, priority
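
The best-so-far envelope in that chart is just a running optimum over the metric series. A minimal sketch (the function name is hypothetical, not taken from `generate_logbook.py`):

```python
from itertools import accumulate

def best_so_far(metrics, higher_is_better=True):
    """Running best of a metric series: the envelope drawn over the raw trajectory."""
    return list(accumulate(metrics, max if higher_is_better else min))
```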

## When to Use

- To share progress with collaborators
- Before and after meetings to show what was tried
- To archive a completed research campaign
- To track progress over a specific time period
@@ -0,0 +1,23 @@
---
name: merge
description: Model merging — average weights from multiple checkpoints into a single model (soups, TIES, DARE). Free accuracy, zero latency cost.
argument-hint: "<exp-ids...> [--method uniform|greedy|ties|dare]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Combine model weights (not predictions) into a single, better model with no latency overhead.

## Steps

1. **Activate environment:** `source .venv/bin/activate`
2. **Run:** `python scripts/model_merger.py $ARGUMENTS`
3. **Methods:** uniform soup (simple average), greedy soup (include a checkpoint only if it improves the held-out score), TIES (trim + elect signs + merge), DARE (drop + rescale)
4. **Report:** compatibility check, per-model metrics, method comparison, improvement delta
5. **Saved output:** `experiments/merges/merge-*.yaml`
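
For intuition, the uniform and greedy soup methods from step 3 reduce to a few lines when a checkpoint is just a dict of weight lists. This is a toy sketch, not `scripts/model_merger.py`; the real greedy soup also sorts candidates by individual validation score first:

```python
def uniform_soup(state_dicts):
    """Elementwise average of checkpoint weights (all checkpoints share one architecture)."""
    n = len(state_dicts)
    return {key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
            for key in state_dicts[0]}

def greedy_soup(state_dicts, evaluate):
    """Grow the soup one checkpoint at a time, keeping each only if the averaged
    model's held-out score improves. `evaluate` maps a state dict to a
    validation score (higher is better)."""
    ingredients = [state_dicts[0]]
    best = evaluate(uniform_soup(ingredients))
    for sd in state_dicts[1:]:
        score = evaluate(uniform_soup(ingredients + [sd]))
        if score > best:
            ingredients.append(sd)
            best = score
    return uniform_soup(ingredients), best
```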

## Examples

```
/turing:merge exp-042 exp-053 exp-067         # All methods
/turing:merge exp-042 exp-053 --method greedy # Greedy soup only
```
@@ -0,0 +1,42 @@
---
name: mode
description: Set the research strategy mode — explore (try new things), exploit (refine what works), or replicate (verify results). Drives novelty guard policy and agent behavior.
argument-hint: "<explore|exploit|replicate>"
---

Set the research mode for the current project. The mode determines how the novelty guard filters proposed experiments and how the agent prioritizes its work.

## Modes

| Mode | Novelty Guard Policy | Agent Behavior |
|------|---------------------|----------------|
| **explore** | Allow novel ideas, block repeats and follow-ups | Try fundamentally different approaches |
| **exploit** | Allow follow-ups and known successes, block repeats | Refine the current best configuration |
| **replicate** | Allow duplicate runs, block novel ideas | Re-run best experiments with different seeds |

## Steps

1. **Parse mode** from `$ARGUMENTS`. Must be one of: `explore`, `exploit`, `replicate`. If it is anything else, stop and report the valid options.

2. **Update experiment state:**

```bash
source .venv/bin/activate
python -c "
import yaml
from pathlib import Path
path = Path('experiment_state.yaml')
state = yaml.safe_load(path.read_text()) if path.exists() else {}
state = state or {}  # safe_load returns None for an empty file
state['research_mode'] = '$ARGUMENTS'
path.write_text(yaml.dump(state, default_flow_style=False))
print('Research mode set to: $ARGUMENTS')
"
```

3. **Confirm** with guidance:
   - `explore`: "The agent will prioritize novel ideas and avoid follow-ups. Best when the current approach feels exhausted."
   - `exploit`: "The agent will refine the current best. Best when you have a promising direction."
   - `replicate`: "The agent will re-run experiments for statistical verification. Best before declaring a winner."

## Default

The default mode is `exploit` (refine what works). Switch to `explore` when progress plateaus, and to `replicate` before final decisions.
@@ -0,0 +1,19 @@
---
name: onboard
description: Project onboarding — generate a walkthrough for new collaborators. Task, history, decisions, next steps.
argument-hint: "[--audience researcher|engineer|stakeholder] [--depth brief|full]"
allowed-tools: Read, Bash(*), Grep, Glob
---

A 5-minute read that replaces a 1-hour onboarding meeting.

## Steps

1. `source .venv/bin/activate`
2. `python scripts/generate_onboarding.py $ARGUMENTS`
3. **Saved:** `ONBOARDING.md`

## Examples

```
/turing:onboard
/turing:onboard --audience engineer --depth brief
```
@@ -0,0 +1,43 @@
---
name: paper
description: Draft mechanical paper sections (setup, results, ablation, hyperparameters) from experiment logs. LaTeX and markdown output.
argument-hint: "[--sections setup,results,ablation] [--format latex|markdown]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Draft paper sections directly from experiment data.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - `--sections setup,results,ablation,hyperparameters` — which sections to draft (default: all)
   - `--format latex|markdown` — output format (default: latex)

3. **Run paper drafting:**
   ```bash
   python scripts/draft_paper_sections.py $ARGUMENTS
   ```

4. **Report results:**
   - **setup:** experimental setup prose (dataset, metrics, split, seed methodology)
   - **results:** comparison table with all model types, best bolded, seed study stats
   - **ablation:** ablation table from `/turing:ablate` results
   - **hyperparameters:** appendix-style parameter table per model

5. **Output:** each section saved to `paper/sections/` as `.tex` or `.md`

6. **Numbers are pulled directly from experiment logs** — no manual transcription needed.
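
The comparison table in step 4 is mechanical to produce. A sketch of the markdown variant with the best score bolded (illustrative only; `results_table` is a hypothetical helper, not `draft_paper_sections.py`):

```python
def results_table(rows, metric="F1", higher_is_better=True):
    """Render (model, score) pairs as a markdown table, bolding the best score."""
    best = (max if higher_is_better else min)(score for _, score in rows)
    lines = [f"| Model | {metric} |", "| --- | --- |"]
    for name, score in rows:
        cell = f"**{score:.3f}**" if score == best else f"{score:.3f}"
        lines.append(f"| {name} | {cell} |")
    return "\n".join(lines)
```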

## Examples

```
/turing:paper                                    # All sections, LaTeX
/turing:paper --format markdown                  # All sections, markdown
/turing:paper --sections setup,results           # Just setup + results
/turing:paper --sections ablation --format latex # Just ablation table
```
@@ -0,0 +1,26 @@
---
name: plan
description: Research planning assistant — design a strategic experiment campaign with budget-aware ROI allocation.
argument-hint: "[--budget 20] [--goal \"maximize F1 for production\"] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.

## Steps

1. `source .venv/bin/activate`
2. `python scripts/research_planner.py $ARGUMENTS`
3. **Saved:** `experiments/plans/`

## How it works

- Analyzes experiment history to compute per-family ROI
- Adjusts strategy priorities based on project state and goal
- Allocates budget across: feature engineering, model search, ensemble, calibration, verification
- Generates a phased plan with specific experiment descriptions
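
The ROI step above reduces to a simple aggregate. A toy sketch of per-family ROI and proportional budget allocation (illustrative only; the function names are hypothetical, not taken from `research_planner.py`):

```python
def family_roi(history):
    """Mean metric improvement per experiment, per model family.
    history: list of (family, metric_delta) pairs."""
    totals, counts = {}, {}
    for family, delta in history:
        totals[family] = totals.get(family, 0.0) + delta
        counts[family] = counts.get(family, 0) + 1
    return {family: totals[family] / counts[family] for family in totals}

def allocate_budget(roi, budget):
    """Split an experiment budget across families in proportion to positive ROI;
    if nothing has paid off yet, spread the budget evenly."""
    positive = {f: r for f, r in roi.items() if r > 0}
    if not positive:
        return {f: budget // len(roi) for f in roi}
    total = sum(positive.values())
    return {f: round(budget * r / total) for f, r in positive.items()}
```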

## Examples

```
/turing:plan --budget 20
/turing:plan --budget 10 --goal "maximize F1 for production deployment"
/turing:plan --budget 30 --json
```