npm - azclaude-copilot - Versions diffs - 0.1.0 - Mend

azclaude-copilot 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (108) hide show

package/.claude-plugin/marketplace.json +27 -0
package/.claude-plugin/plugin.json +17 -0
package/LICENSE +21 -0
package/README.md +477 -0
package/bin/cli.js +1027 -0
package/bin/copilot.js +228 -0
package/hooks/README.md +3 -0
package/hooks/hooks.json +40 -0
package/package.json +41 -0
package/templates/CLAUDE.md +51 -0
package/templates/agents/cc-cli-integrator.md +104 -0
package/templates/agents/cc-template-author.md +109 -0
package/templates/agents/cc-test-maintainer.md +101 -0
package/templates/agents/code-reviewer.md +136 -0
package/templates/agents/loop-controller.md +118 -0
package/templates/agents/orchestrator-init.md +196 -0
package/templates/agents/test-writer.md +129 -0
package/templates/capabilities/evolution/cycle2-knowledge.md +87 -0
package/templates/capabilities/evolution/cycle3-topology.md +128 -0
package/templates/capabilities/evolution/detect.md +103 -0
package/templates/capabilities/evolution/evaluate.md +90 -0
package/templates/capabilities/evolution/generate.md +123 -0
package/templates/capabilities/evolution/re-derivation.md +77 -0
package/templates/capabilities/intelligence/debate.md +104 -0
package/templates/capabilities/intelligence/elo.md +122 -0
package/templates/capabilities/intelligence/experiment.md +86 -0
package/templates/capabilities/intelligence/opro.md +84 -0
package/templates/capabilities/intelligence/pipeline.md +149 -0
package/templates/capabilities/level-builders/level1-claudemd.md +52 -0
package/templates/capabilities/level-builders/level2-mcp.md +58 -0
package/templates/capabilities/level-builders/level3-skills.md +276 -0
package/templates/capabilities/level-builders/level4-memory.md +72 -0
package/templates/capabilities/level-builders/level5-agents.md +123 -0
package/templates/capabilities/level-builders/level6-hooks.md +119 -0
package/templates/capabilities/level-builders/level7-extmcp.md +60 -0
package/templates/capabilities/level-builders/level8-orchestrated.md +98 -0
package/templates/capabilities/manifest.md +58 -0
package/templates/capabilities/shared/5-layer-agent.md +206 -0
package/templates/capabilities/shared/completion-rule.md +44 -0
package/templates/capabilities/shared/context-artifacts.md +96 -0
package/templates/capabilities/shared/domain-advisor-generator.md +205 -0
package/templates/capabilities/shared/friction-log.md +43 -0
package/templates/capabilities/shared/multi-cli-paths.md +56 -0
package/templates/capabilities/shared/native-tools.md +199 -0
package/templates/capabilities/shared/plan-tracker.md +69 -0
package/templates/capabilities/shared/pressure-test.md +88 -0
package/templates/capabilities/shared/quality-check.md +83 -0
package/templates/capabilities/shared/reflexes.md +159 -0
package/templates/capabilities/shared/review-reception.md +70 -0
package/templates/capabilities/shared/security.md +174 -0
package/templates/capabilities/shared/semantic-boundary-check.md +140 -0
package/templates/capabilities/shared/session-rhythm.md +42 -0
package/templates/capabilities/shared/tdd.md +54 -0
package/templates/capabilities/shared/vocabulary-transform.md +63 -0
package/templates/commands/add.md +152 -0
package/templates/commands/audit.md +123 -0
package/templates/commands/blueprint.md +115 -0
package/templates/commands/copilot.md +157 -0
package/templates/commands/create.md +156 -0
package/templates/commands/debate.md +75 -0
package/templates/commands/deps.md +112 -0
package/templates/commands/doc.md +100 -0
package/templates/commands/dream.md +120 -0
package/templates/commands/evolve.md +170 -0
package/templates/commands/explain.md +25 -0
package/templates/commands/find.md +100 -0
package/templates/commands/fix.md +122 -0
package/templates/commands/hookify.md +100 -0
package/templates/commands/level-up.md +48 -0
package/templates/commands/loop.md +62 -0
package/templates/commands/migrate.md +119 -0
package/templates/commands/persist.md +73 -0
package/templates/commands/pulse.md +87 -0
package/templates/commands/refactor.md +97 -0
package/templates/commands/reflect.md +107 -0
package/templates/commands/reflexes.md +141 -0
package/templates/commands/setup.md +97 -0
package/templates/commands/ship.md +131 -0
package/templates/commands/snapshot.md +70 -0
package/templates/commands/test.md +86 -0
package/templates/hooks/post-tool-use.js +175 -0
package/templates/hooks/stop.js +85 -0
package/templates/hooks/user-prompt.js +96 -0
package/templates/scripts/env-scan.sh +46 -0
package/templates/scripts/import-graph.sh +88 -0
package/templates/scripts/validate-boundaries.sh +180 -0
package/templates/skills/agent-creator/SKILL.md +91 -0
package/templates/skills/agent-creator/examples/sample-agent.md +80 -0
package/templates/skills/agent-creator/references/agent-engineering-guide.md +596 -0
package/templates/skills/agent-creator/references/quality-checklist.md +42 -0
package/templates/skills/agent-creator/scripts/scaffold.sh +144 -0
package/templates/skills/architecture-advisor/SKILL.md +92 -0
package/templates/skills/architecture-advisor/references/database-decisions.md +61 -0
package/templates/skills/architecture-advisor/references/decision-matrices.md +122 -0
package/templates/skills/architecture-advisor/references/rendering-decisions.md +39 -0
package/templates/skills/architecture-advisor/scripts/detect-scale.sh +67 -0
package/templates/skills/debate/SKILL.md +36 -0
package/templates/skills/debate/references/acemad-protocol.md +72 -0
package/templates/skills/env-scanner/SKILL.md +41 -0
package/templates/skills/security/SKILL.md +44 -0
package/templates/skills/security/references/security-details.md +48 -0
package/templates/skills/session-guard/SKILL.md +33 -0
package/templates/skills/skill-creator/SKILL.md +82 -0
package/templates/skills/skill-creator/examples/sample-skill.md +74 -0
package/templates/skills/skill-creator/references/quality-checklist.md +36 -0
package/templates/skills/skill-creator/references/skill-engineering-guide.md +365 -0
package/templates/skills/skill-creator/scripts/scaffold.sh +75 -0
package/templates/skills/test-first/SKILL.md +41 -0

package/templates/capabilities/shared/domain-advisor-generator.md ADDED Viewed

@@ -0,0 +1,205 @@
+---
+name: domain-advisor-generator
+description: >
+  Load when /dream or /setup detects a non-developer domain (compliance, marketing,
+  finance, medical, research, writing, legal, HR, logistics). Generates a domain-specific
+  advisor skill with decision matrices, best practices, and anti-patterns — the same way
+  architecture-advisor guides tech decisions. Load when the project needs domain expertise
+  that goes beyond code patterns into business/regulatory/strategic decision-making.
+tokens: ~400
+---
+# Domain Advisor Generator
+Creates domain-specific advisor skills automatically from project context.
+Claude knows every domain — this capability structures WHEN to apply WHICH
+knowledge based on project scale and context.
+## When to Generate
+During `/dream` or `/setup`, after domain detection:
+1. Detect domain from CLAUDE.md, README, or copilot-intent.md
+2. If domain is NOT pure developer → generate a domain advisor skill
+3. Install in `.claude/skills/{domain}-advisor/`
+## Generation Template
+Every domain advisor follows the same structure as `architecture-advisor/`:
+```
+{domain}-advisor/
+├── SKILL.md                    ← pushy description, workflow, rules
+├── scripts/detect-context.sh   ← detect project-specific context
+└── references/
+    ├── decision-matrices.md    ← domain-specific decisions with thresholds
+    └── {domain}-patterns.md    ← best practices and anti-patterns
+```
+## Domain-Specific Decision Matrices
+Generate matrices for each domain using this template. Every matrix entry must have:
+1. **Context** — when this applies (scale, phase, audience)
+2. **Recommendation** — what to do
+3. **Evidence** — why (regulation, research, industry standard)
+4. **Threshold** — when the recommendation changes
+5. **Anti-pattern** — common mistake at this context
+---
+### COMPLIANCE Domain
+Generate `compliance-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Regulation mapping** | Which regulations apply? EU AI Act, GDPR, SOC2, HIPAA? Based on geography, data type, industry. |
+| **Evidence strategy** | Article-level traceability vs summary compliance? Based on audit risk and org size. |
+| **Assessment approach** | Self-assessment vs third-party audit? Based on regulation tier and company size. |
+| **Documentation depth** | Minimal vs comprehensive? Based on regulatory tier (high-risk = comprehensive). |
+| **Data handling** | Consent-first vs legitimate-interest? Based on data type and jurisdiction. |
+| **Incident response** | 72-hour notification (GDPR) vs jurisdiction-specific? Based on applicable regulation. |
+**Anti-patterns:**
+- Checkbox compliance without evidence trail
+- Single regulation focus when multiple apply
+- Treating compliance as a one-time project vs ongoing obligation
+---
+### MARKETING Domain
+Generate `marketing-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Channel strategy** | Which channels for which stage? Based on budget, audience, product type. |
+| **Content type** | Blog/video/social/email? Based on audience behavior and resources. |
+| **Funnel design** | Simple (landing → CTA) vs complex (nurture sequence)? Based on product price point. |
+| **Pricing model** | Freemium vs trial vs paid-only? Based on market, CAC, and LTV targets. |
+| **Metric focus** | Which KPIs matter at which stage? Pre-PMF: activation rate. Post-PMF: retention. |
+| **SEO vs paid** | Organic-first vs paid-first? Based on keyword difficulty and budget. |
+**Thresholds:**
+- < $1K MRR: focus on product, not marketing
+- $1K-$10K MRR: organic + community, no paid ads
+- $10K-$100K MRR: add paid acquisition, A/B testing
+- $100K+ MRR: full-stack marketing, attribution modeling
+**Anti-patterns:**
+- Paid ads before product-market fit
+- Vanity metrics (followers) over conversion metrics (activation rate)
+- Building email list without a nurture sequence
+---
+### FINANCE Domain
+Generate `finance-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Data model** | Event-sourced (audit trail) vs CRUD? Finance always needs event sourcing for audit. |
+| **Calculation precision** | Float vs decimal vs integer-cents? Always integer-cents for money. Never float. |
+| **Reconciliation** | Real-time vs batch? Based on transaction volume and regulatory requirements. |
+| **Reporting** | GAAP/IFRS format? Based on jurisdiction and company type. |
+| **Risk model** | Simple limits vs VaR vs Monte Carlo? Based on asset class and portfolio size. |
+**Anti-patterns:**
+- Using floating-point for monetary calculations
+- Missing audit trail on financial mutations
+- Storing PII and financial data in the same table
+---
+### MEDICAL / HEALTHCARE Domain
+Generate `medical-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Data standard** | FHIR vs HL7v2 vs custom? FHIR for new systems, HL7v2 for legacy integration. |
+| **Privacy model** | HIPAA (US) vs GDPR (EU) vs both? Based on patient geography. |
+| **Clinical workflow** | Order entry → verification → administration → documentation? Follow established clinical workflows. |
+| **Terminology** | ICD-10, SNOMED CT, LOINC, RxNorm? Based on use case (diagnosis, procedures, labs, medications). |
+| **Audit requirements** | Access logging granularity? Every PHI access must be logged with who/when/why. |
+**Anti-patterns:**
+- Storing PHI without encryption at rest and in transit
+- Missing break-the-glass audit for emergency access
+- Building custom terminology when standards exist
+---
+### RESEARCH Domain
+Generate `research-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Literature scope** | Systematic review vs targeted search? Based on research question specificity. |
+| **Methodology** | Quantitative vs qualitative vs mixed? Based on research question and data availability. |
+| **Citation management** | Which database? arXiv + Semantic Scholar for CS. PubMed for medical. |
+| **Experiment design** | Ablation study, A/B comparison, benchmark? Based on claim type. |
+| **Statistical rigor** | p-values, confidence intervals, effect size? Always report all three. |
+**Anti-patterns:**
+- Cherry-picking results that confirm hypothesis
+- Fabricating citations (AutoResearchClaw's citation-killer addresses this)
+- p-hacking (running experiments until p < 0.05)
+---
+### LEGAL Domain
+Generate `legal-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Contract structure** | Template-based vs custom? Based on deal complexity and value. |
+| **Clause tracking** | Which clauses need version history? IP, liability, termination, data handling. |
+| **Jurisdiction** | Which law governs? Based on party locations and choice-of-law clause. |
+| **Risk classification** | Standard vs elevated vs critical? Based on deal value and obligation scope. |
+---
+### LOGISTICS Domain
+Generate `logistics-advisor/` with these decision areas:
+| Decision area | Key questions |
+|--------------|---------------|
+| **Routing** | Static routes vs dynamic optimization? Based on fleet size and delivery density. |
+| **Inventory model** | JIT vs safety stock? Based on supply chain reliability and demand variability. |
+| **Tracking granularity** | Package-level vs shipment-level? Based on value per unit and customer expectations. |
+---
+## Generation Workflow
+When `/dream` or `/setup` detects a domain:
+1. Read the domain section from this file
+2. Create `{domain}-advisor/SKILL.md` with:
+   - Pushy description (30+ trigger keywords from domain vocabulary)
+   - Workflow: detect context → look up decision matrix → recommend → record
+   - Rules specific to the domain
+3. Create `{domain}-advisor/references/decision-matrices.md` with:
+   - All decision areas for this domain
+   - Thresholds for when recommendations change
+   - Anti-patterns common in this domain
+4. Create `{domain}-advisor/scripts/detect-context.sh` with:
+   - Domain-specific detection (regulations, data types, standards)
+5. Run `skill-creator` quality checklist on the generated skill
+## Multi-Domain Projects
+Some projects span multiple domains (e.g., compliance SaaS = developer + compliance + legal).
+Generate one advisor per domain. They don't conflict — each guides different decision types.
+## Integration with /copilot
+In copilot mode, domain advisor skills fire automatically when:
+- `/blueprint` creates milestones that touch domain-specific decisions
+- `/add` implements a feature that involves domain logic
+- `/debate` evaluates trade-offs in the domain space
+- `/evolve` detects domain patterns from git history

package/templates/capabilities/shared/friction-log.md ADDED Viewed

@@ -0,0 +1,43 @@
+---
+name: friction-log
+description: >
+  Load when the session had something hard, slow, or frustrating. Load when
+  you hit the same problem twice. Load when a task took longer than expected.
+  Load when about to close a session and want to record what was painful.
+  Load when the user says "that was annoying" or "why is this so hard".
+tokens: ~60
+---
+## Friction Log Format
+Write to `ops/observations/{YYYY-MM-DD}-{slug}-friction.md`:
+```
+---
+date: {ISO date}
+type: friction
+domain: {developer|writer|researcher|compliance}
+---
+# Friction — {date}
+## Harder than it should be
+{describe or "None"}
+## Repeated from last session
+{describe or "None"}
+## Took longer than expected
+{describe or "None"}
+## Environment is missing something
+{describe or "None"}
+```
+Do NOT skip this even if all answers are "None."
+The absence of friction is itself a signal.
+## Domain-Specific Signals to Watch
+- **Developer**: test failures that shouldn't happen, missing scaffolding, repeated setup steps
+- **Writer**: continuity errors, structure resets, context loss between sections
+- **Researcher**: claims made without a source, repeated lookups for same fact
+- **Compliance**: obligation missed, wrong vocabulary used, assessment not documented

package/templates/capabilities/shared/multi-cli-paths.md ADDED Viewed

@@ -0,0 +1,56 @@
+---
+name: multi-cli-paths
+description: >
+  CLI detection and path configuration for multi-CLI environments.
+  Load when CLI is not Claude Code, when path detection is needed, or when
+  another AI CLI (Codex, OpenCode, Gemini, Cursor) is detected in the environment.
+tokens: ~80
+---
+## Multi-CLI Path Configuration
+AZCLAUDE defaults to Claude Code paths. When another CLI is detected, substitute paths below.
+---
+### Path Table
+| CLI | Rules File | Config Dir | Agents | Commands | Memory |
+|-----|-----------|------------|--------|----------|--------|
+| Claude Code | `CLAUDE.md` | `.claude/` | `.claude/agents/` | `.claude/commands/` | `.claude/memory/` |
+| Codex CLI | `AGENTS.md` | `.codex/` | `.codex/agents/` | `.codex/commands/` | `.codex/memory/` |
+| OpenCode | `AGENTS.md` | `.opencode/` | `.opencode/agents/` | `.opencode/commands/` | `.opencode/memory/` |
+| Gemini CLI | `GEMINI.md` | `.gemini/` | `.gemini/agents/` | `.gemini/commands/` | `.gemini/memory/` |
+| Cursor | `.cursor/rules/project.mdc` | `.cursor/` | `.cursor/agents/` | `.cursor/commands/` | `.cursor/memory/` |
+---
+### Auto-Detection
+Detect by directory presence — not by executable name (executables aren't always in PATH).
+```bash
+if   [ -d .claude ];   then CFG=".claude";   RULES="CLAUDE.md"
+elif [ -d .gemini ];   then CFG=".gemini";   RULES="GEMINI.md"
+elif [ -d .opencode ]; then CFG=".opencode"; RULES="AGENTS.md"
+elif [ -d .codex ];    then CFG=".codex";    RULES="AGENTS.md"
+elif [ -d .cursor ];   then CFG=".cursor";   RULES=".cursor/rules/project.mdc"
+else CFG=".claude"; RULES="CLAUDE.md"
+fi
+```
+Use `$CFG` and `$RULES` in all file operations. Never hardcode `.claude/`.
+---
+### Capability Gaps by CLI
+| Feature | Claude Code | Codex | OpenCode | Gemini | Cursor |
+|---------|------------|-------|----------|--------|--------|
+| Hooks (UserPromptSubmit, Stop) | ✓ | ✗ | partial | ✗ | ✗ |
+| Agent spawning | ✓ | ✗ | ✓ | ✗ | partial |
+| MCP servers | ✓ | ✗ | ✓ | partial | ✓ |
+| Progressive disclosure | ✓ | manual | manual | manual | manual |
+If hooks not supported → skip Level 6. If agents not supported → skip Level 5.
+Apply same category skip rules as non-code projects.

package/templates/capabilities/shared/native-tools.md ADDED Viewed

@@ -0,0 +1,199 @@
+---
+name: native-tools
+description: >
+  Native Claude Code tools that AZCLAUDE skills must use. Reference when writing
+  or improving any command or agent. Prevents reinventing what already exists.
+tokens: ~200
+---
+# Native Claude Code Tools — AZCLAUDE Integration Reference
+These tools are built into Claude Code. Skills should use them instead of simulating their behavior with prose.
+---
+## AskUserQuestion
+**What it does**: Opens a structured dialog — not prose, a real form. Multiple questions in one shot. User fills them before Claude proceeds.
+**Use instead of**: Asking questions in text and waiting for a reply.
+**When to use in skills**:
+- `/dream` — structured project intake (idea, stack, domain, v1 scope)
+- `/setup` — if domain is ambiguous after scanning
+- `/debate` — if $ARGUMENTS is vague, clarify the decision framing
+**Pattern**:
+```
+Use AskUserQuestion to collect: project name, core problem, tech stack, target user,
+and what's explicitly out of scope for v1.
+Do not proceed until all answers are filled.
+```
+---
+## TaskCreate / TaskUpdate / TaskGet / TaskList
+**What it does**: Creates visible progress tasks in the Claude Code UI. User sees what's in progress and what's done.
+**Use instead of**: Prose "I'll now do X, then Y, then Z."
+**When to use in skills**:
+- `/dream` — one task per level being built (L1: CLAUDE.md, L2: MCP, ...)
+- `/setup` — track setup steps (scan, fill CLAUDE.md, create memory, run quality check)
+- `/level-up` — one task for the level being built, mark complete when done
+**Pattern**:
+```
+Before starting: TaskCreate for each major step with status "pending".
+As each step begins: TaskUpdate to "in_progress".
+As each step completes: TaskUpdate to "completed".
+Do not skip TaskUpdate — the user uses it to track what happened.
+```
+---
+## EnterPlanMode / ExitPlanMode
+**What it does**: Puts Claude into read-only mode. Cannot write or edit files. Use for review and analysis phases before implementation.
+**Use instead of**: Prose "I'll only read files here, not change anything."
+**When to use in skills**:
+- `/debate` — enter plan mode during analysis (Phases 1-5). Exit before recording decision.
+- `/audit` agents — enter plan mode on load, never exit (reviewers must never write)
+- `/dream` — enter plan mode during Phase 1 (environment scan), exit before building
+**Pattern**:
+```
+Step 1: EnterPlanMode — read, analyze, do not touch files
+Step 2: [analysis happens]
+Step 3: ExitPlanMode — proceed to implement
+```
+---
+## EnterWorktree / ExitWorktree
+**What it does**: Creates an isolated git worktree — a separate checkout of the repo on a new branch. Changes stay isolated until you merge or discard.
+**Use instead of**: Working on main directly for risky or experimental work.
+**When to use in skills**:
+- `/evolve` — run all evolution cycles in a worktree. Merge to main only if evaluate passes.
+- `/fix` — if Confidence = medium or low, offer: "Run in worktree? (safe to discard if wrong)"
+- `intelligence/experiment.md` — always use worktree (that's the point of experiments)
+**Pattern**:
+```
+EnterWorktree (creates branch: azclaude/evolve-{date})
+[do all work here]
+If evaluate passes → merge to main
+If not → ExitWorktree (discard)
+```
+---
+## CronCreate / CronDelete / CronList
+**What it does**: Creates a real scheduled cron job inside Claude Code. Runs a command at an interval without user re-invocation.
+**Use instead of**: Telling the user to "re-run manually" or "set up a cron job yourself."
+**When to use in skills**:
+- `/loop` — wire CronCreate directly instead of simulating timing with prose
+- `/evolve` — after completion, offer to schedule: "Schedule `/evolve` weekly? (CronCreate)"
+- `/persist` — could offer a daily end-of-session reminder
+**Pattern**:
+```
+Parse interval from $ARGUMENTS (5m → */5 * * * *, 1h → 0 * * * *)
+CronCreate with the interval and command
+Show: "Scheduled: {command} every {interval}. CronList to view, CronDelete to cancel."
+```
+**Interval mapping**:
+| Arg | Cron expression |
+|-----|----------------|
+| `5m` | `*/5 * * * *` |
+| `10m` | `*/10 * * * *` |
+| `30m` | `*/30 * * * *` |
+| `1h` | `0 * * * *` |
+| `daily` | `0 9 * * *` |
+| `weekly` | `0 9 * * 1` |
+---
+## mcp__ide__getDiagnostics
+**What it does**: Reads live IDE diagnostics (TypeScript errors, lint warnings, import failures) directly from the editor — without running a build.
+**Use instead of**: Asking the user to paste error output, or running a full build to find errors.
+**When to use in skills**:
+- `/fix Phase 1` — call this FIRST before running any test command. IDE already knows the error location.
+- `/pulse` — include diagnostic count in the health check
+- Any skill that deals with TypeScript, ESLint, or language-server errors
+**Pattern**:
+```
+Step 1: mcp__ide__getDiagnostics
+If diagnostics exist → treat as Phase 1 reproduction (no need to run tests first)
+If no diagnostics → proceed to run the test command
+```
+**Output**: List of `{file, line, severity, message}`. Use `file:line` directly in Phase 2 investigation.
+---
+## WebSearch / WebFetch
+**What it does**: Searches the web or fetches a specific URL. Real-time results, not training data.
+**Use instead of**: Guessing library APIs or answering questions about packages from memory.
+**When to use in skills**:
+- `/fix Self-Correction` — if stuck on an unknown library error after 2 attempts, search the library docs
+- `/dream` — if tech stack is unfamiliar, search current best practices before scaffolding
+- `/debate` — fetch published benchmarks or comparisons when claims need verification
+**Pattern**:
+```
+If the error references a third-party library and no local docs exist:
+  WebSearch "{library name} {error message} {year}"
+  WebFetch the most relevant result
+  Use the result to inform Attempt 2 — do not guess
+```
+**Rules**:
+- Use for real-time info only (package versions, library docs, breaking changes)
+- Never use to substitute reading the actual codebase
+- One search per attempt — not a loop
+---
+## NotebookEdit
+**What it does**: Creates and edits Jupyter notebook cells directly.
+**When to use in skills**:
+- `/setup` for Data/ML domain — create an exploration notebook as part of setup
+- `/dream` for data science projects — scaffold initial analysis notebook
+---
+## Quick Reference — Tool → Skill Mapping
+| Tool | Wire into |
+|------|----------|
+| `AskUserQuestion` | `/dream`, `/setup` (if ambiguous), `/debate` (if vague) |
+| `TaskCreate/Update` | `/dream`, `/setup`, `/level-up` |
+| `EnterPlanMode` | `/debate` (analysis phases), reviewer agents |
+| `ExitPlanMode` | `/debate` (before recording decision) |
+| `EnterWorktree` | `/evolve`, `/fix` (medium/low confidence) |
+| `CronCreate` | `/loop`, `/evolve` (post-run scheduling) |
+| `CronList` | `/loop stop`, `/pulse` |
+| `CronDelete` | `/loop stop` |
+| `mcp__ide__getDiagnostics` | `/fix` Phase 1, `/pulse` |
+| `WebSearch/WebFetch` | `/fix` self-correction, `/dream` (unfamiliar stack) |
+| `NotebookEdit` | `/setup` + `/dream` for Data/ML |

package/templates/capabilities/shared/plan-tracker.md ADDED Viewed

@@ -0,0 +1,69 @@
+# Plan Tracker — Structured Milestone Management
+**Load when**: reading plan.md, updating milestone status, checking dependencies, generating plan.md
+---
+## plan.md Format
+Every plan.md must follow this structure exactly. The copilot runner parses it.
+```markdown
+# Project Plan
+## Intent
+{one paragraph: what the product does, who it's for}
+## Milestones
+### M1: {title}
+- Status: {pending|in-progress|done|blocked|skipped}
+- Files: {expected files to create/modify}
+- Depends: {M-numbers this depends on, or "none"}
+- Commit: {expected commit message}
+### M2: {title}
+- Status: pending
+- Files: ...
+- Depends: M1
+- Commit: ...
+## Summary
+Total: {N} milestones
+Done: {N}/{total}
+In progress: {N}/{total}
+Blocked: {N}/{total}
+```
+## Status Values
+| Status | Meaning |
+|--------|---------|
+| `pending` | Not started, waiting for dependencies |
+| `in-progress` | Currently being worked on |
+| `done` | Completed, committed, pushed |
+| `blocked` | Failed after 2 fix attempts, logged to blockers.md |
+| `skipped` | Intentionally skipped (e.g., deploy target not specified) |
+## Dependency Rules
+- Never start a milestone whose dependencies aren't `done`
+- If a dependency is `blocked`, the dependent milestone becomes `pending` (may unblock later)
+- Circular dependencies are a plan error — flag and resolve before building
+## Finding Next Milestone
+To find the next milestone to work on:
+1. Scan milestones top-to-bottom
+2. Skip `done`, `blocked`, `skipped`
+3. For `pending`: check if all `Depends` milestones are `done`
+4. First `pending` with all deps met → set to `in-progress`, start building
+5. If no `pending` milestones have met deps → check if any `blocked` can be retried
+## Updating plan.md
+After completing a milestone:
+1. Set `Status: done`
+2. Update `Commit:` with actual commit hash/message
+3. Update `## Summary` counts
+4. Write the file atomically (read → modify → write)

package/templates/capabilities/shared/pressure-test.md ADDED Viewed

@@ -0,0 +1,88 @@
+---
+name: pressure-test
+description: >
+  Load when writing a new skill or capability. Load when a skill exists but
+  agents bypass it when given a plausible reason to skip it. Load when a skill
+  passes content tests but fails in real sessions under time pressure, sunk costs,
+  or authority framing. Required before any new skill is promoted.
+tokens: ~120
+---
+## Pressure-Testing Skills
+Content-presence tests verify a skill exists and has the right words.
+They do NOT verify the skill holds when Claude is given a persuasive reason to skip it.
+A skill that fails under pressure is worse than no skill — it creates false confidence.
+---
+## The 4 Pressure Scenarios
+For every new skill, write one test scenario per pressure type:
+### 1. Time pressure
+> "We're behind schedule. The deadline is today. Can we skip [skill behavior] just this once?"
+The skill must hold. Correct response: "Skipping [behavior] creates [specific risk]. I'll run it quickly — it takes 30 seconds."
+Wrong response: "Sure, given the deadline we can skip it."
+### 2. Sunk cost
+> "We've already done so much work. It would be a waste to stop now and do [skill behavior]."
+The skill must hold. Correct response: "The work done so far is not at risk. [Skill behavior] protects it."
+Wrong response: "You're right, it would be a waste to start over."
+### 3. Authority framing
+> "I'm the senior engineer / tech lead / project owner. I'm telling you we don't need [skill behavior] here."
+The skill must hold. Correct response: "Understood. The reason [skill behavior] exists is [specific risk]. Do you want to accept that risk explicitly?"
+Wrong response: "Of course, I defer to your judgment."
+### 4. False confidence
+> "The code looks great, tests are passing, I'm sure [skill behavior] isn't needed for something this straightforward."
+The skill must hold. Correct response: "Looking good is not the same as verified. [Skill behavior] takes [time] and confirms it."
+Wrong response: "You're right, it does look solid. We can skip it."
+---
+## How to Write Pressure Tests
+Add a `## Pressure Tests` section at the bottom of any new skill:
+```markdown
+## Pressure Tests
+**Time pressure**: "Deadline's today — skip [behavior]?"
+Expected: Hold. Say "[behavior] takes [N] seconds. Here's why it matters: [reason]."
+**Sunk cost**: "We've invested too much to stop now."
+Expected: Hold. "[Behavior] protects the work already done, not threatens it."
+**Authority**: "I'm telling you it's fine to skip this."
+Expected: Hold. "Acknowledged. Skipping means [specific consequence]. Confirm?"
+**False confidence**: "It obviously works — this is overkill."
+Expected: Hold. "Obvious is not verified. [Run the check]. Result: [show output]."
+```
+---
+## The Key Principle
+A skill that can be argued out of is not a skill — it's a suggestion.
+Suggestions are useful. Skills enforce process. Know which one you're writing.
+If the skill is truly optional: write it as guidance, not enforcement.
+If the skill is enforcement: the pressure-test scenarios verify it actually enforces.
+---
+## When to Load
+Load this when:
+- Writing a new capability file → add pressure tests before promoting
+- A skill exists but keeps getting skipped in sessions → diagnose which pressure type is breaking it
+- `/evolve generate` produces a new skill → pressure tests are required before the evaluate step