npm - claude-dev-env - Versions diffs - 1.4.0 → 1.8.0 - Mend

claude-dev-env 1.4.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25) hide show

package/agents/deep-research.md +170 -0
package/bin/install.mjs +98 -26
package/hooks/HOOK_SPECS_PROMPT_WORKFLOW.md +68 -0
package/hooks/blocking/agent-execution-intent-gate.py +83 -0
package/hooks/blocking/prompt-workflow-stop-guard.py +131 -0
package/hooks/blocking/prompt_workflow_gate_core.py +161 -0
package/hooks/blocking/test_agent_execution_intent_gate.py +106 -0
package/hooks/blocking/test_context_control_policy_files.py +27 -0
package/hooks/blocking/test_prompt_workflow_gate_core.py +68 -0
package/hooks/blocking/test_prompt_workflow_stop_guard.py +144 -0
package/hooks/hooks.json +10 -20
package/package.json +3 -2
package/rules/prompt-workflow-context-controls.md +48 -0
package/skills/agent-prompt/SKILL.md +107 -9
package/skills/deep-research/SKILL.md +80 -0
package/skills/dream/SKILL.md +118 -0
package/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md +174 -0
package/skills/prompt-generator/SKILL.md +191 -12
package/skills/research-mode/SKILL.md +53 -0
package/skills/session-log/SKILL.md +237 -0
package/skills/session-tidy/SKILL.md +181 -0
package/skills/skill-writer/REFERENCE.md +160 -122
package/skills/skill-writer/SKILL.md +131 -197
package/LICENSE +0 -21
package/README.md +0 -247

package/skills/session-tidy/SKILL.md ADDED Viewed

@@ -0,0 +1,181 @@
+---
+name: session-tidy
+description: Audit, clean, and consolidate session logs in the Obsidian vault. Fixes format drift, resolves orphaned next-steps, updates stale statuses, and generates project rollup summaries. Use when session logs feel cluttered or before starting a new project milestone. Triggers on '/session-tidy', 'tidy sessions', 'clean up session logs', 'session audit'.
+disable-model-invocation: true
+---
+# Session Tidy: Session Log Consolidation
+## Overview
+Audit and consolidate session logs in the Obsidian vault (`sessions/[Project]/`) by enforcing the format contract, categorizing uncategorized files into project subfolders, resolving orphaned next-steps, updating stale statuses, and generating project rollup summaries.
+**Announce at start:** "Running session log consolidation."
+**Context:** Standalone maintenance utility. Run periodically, after completing a project milestone, or when starting fresh on a project. Companion to `/dream` (memory consolidation) and `/session-log` (session creation).
+## The Format Contract
+Source: `journal:session-log` skill definition. Read that skill's SKILL.md if you need to verify the authoritative format.
+**Directory structure:** `sessions/[Project]/[N]. [Title].md`
+- Each project gets its own subfolder under `sessions/`
+- Session files are named `[N]. [Title].md` where Title is a 2-5 word summary of the primary outcome (e.g., `3. Amazon Auth Migration.md`)
+- Summary files are named `Summary.md` in the same project folder
+- No parenthetical suffixes like `(Wrap-Up)`
+- Files missing a title (e.g., `Session 1.md` or `1.md`) should be flagged for renaming -- read the content to derive a title
+**Required frontmatter fields:**
+```yaml
+---
+type: session-report
+project: "[name]"
+session: [N]
+date: "[YYYY-MM-DD]"
+status: "completed" | "in-progress" | "blocked"
+blocked: true | false
+tags: ["session", "[project-tag]"]
+---
+```
+**Content structure:**
+- Section headers (`###`) use one emoji + outcome-oriented title
+- Valid emojis: ✅ (done), 🚫 (blocked), ⚠️ (note), 🔧 (in-progress), 📋 (queued)
+- Explanatory paragraphs under each header, not just bullets
+- Tables for 3+ rows of structured data
+- No play-by-play or process narration
+## The Process
+### Phase 0: Preflight
+Determine which storage backend is available. Try in this order and use the first that succeeds:
+1. **Headless vault** -- run `ob --version` via Bash to verify the obsidian-headless CLI is installed. Then check `OBSIDIAN_VAULT_PATH` environment variable or `~/.claude/vault/` for a vault directory. If the CLI exists and a vault directory resolves, set `backend = "headless"`.
+2. **Obsidian MCP** -- call `mcp__obsidian__list_directory` with `path="sessions"`. If it succeeds, set `backend = "obsidian"`.
+3. **Local vault** -- use `~/.claude/vault/` as a local vault directory. Create it via `mkdir -p ~/.claude/vault/sessions` if it does not exist. Set `backend = "local"`.
+**Headless and local capability notes:** When running without Obsidian MCP, the following operations use alternative methods:
+- **Move/rename:** `mv` via Bash
+- **Frontmatter update:** Read + modify YAML + Write
+- **Search:** Bash `ls` + Grep
+### Phase 1: Audit
+List project subdirectories in `sessions/` via `mcp__obsidian__list_directory`. For each project folder, read its files via `mcp__obsidian__read_note`. Also check for any files sitting directly in `sessions/` (uncategorized). For each file, check:
+1. **Naming convention?** Must match `[N]. [Title].md` where N is the session number and Title is a 2-5 word outcome summary. Flag bracket prefixes, date-only names, parenthetical suffixes, or legacy `Session [N].md` patterns missing a title.
+2. **Frontmatter complete?** All required fields present: `type`, `project`, `session`, `date`, `status`, `blocked`, `tags`.
+3. **Type correct?** Must be `session-report`. Flag other types like `rule-audit`.
+4. **Status coherent?** `status: "completed"` with `blocked: true` is contradictory. `status: "in-progress"` or `status: "blocked"` on sessions older than 7 days is likely stale.
+5. **Orphaned next-steps?** Scan content for sections containing "Next", "Queued", "Session N+1", "TODO", or clipboard emoji sections. Cross-reference against subsequent sessions for the same project to determine if the items were addressed.
+6. **Categorized?** Files sitting directly in `sessions/` (not in a project subfolder) are uncategorized. Infer the project name from the filename pattern (e.g., `BudgetBridge Session 3.md` belongs in `sessions/BudgetBridge/`) or from the frontmatter `project` field. Flag for move into the correct subfolder.
+### Phase 2: Propose Changes
+Present a structured report:
+**Format violations** -- files with wrong naming, missing/wrong frontmatter fields, wrong type
+**Stale statuses** -- in-progress or blocked sessions older than 7 days, contradictory status+blocked combos
+**Orphaned next-steps** -- queued items from session N with no evidence of resolution in session N+1 or later. For each item, state whether it was:
+  - **Resolved:** found evidence in a later session
+  - **Orphaned:** no later session addresses it
+  - **Unknown:** later sessions exist but don't clearly address or skip the item
+**Uncategorized files** -- files in `sessions/` root that should be moved into a project subfolder
+**Rollup candidates** -- projects with 3+ sessions that have no rollup/summary session
+**Proposed actions** -- numbered list of specific changes
+Do NOT execute any changes yet. Wait for user approval.
+### Phase 3: Execute
+After user approves (all or selected items):
+1. **Rename** non-conforming files via `mcp__obsidian__move_note`.
+2. **Fix frontmatter** via `mcp__obsidian__update_frontmatter` -- set correct type, add missing fields.
+3. **Update stale statuses** -- change old in-progress/blocked to completed (or ask user for correct status).
+4. **Clean orphaned next-steps** -- for confirmed orphaned items, either:
+   - Remove the section if all items are orphaned and the session is old
+   - Add a strikethrough or "(not pursued)" annotation if some items remain relevant
+   - Leave unchanged if user declines
+5. **Generate project rollups** -- for each qualifying project, write a summary note via `mcp__obsidian__write_note`:
+   - **Path:** `sessions/[Project]/Summary.md`
+   - **Frontmatter:** `{"type": "project-summary", "project": "[name]", "sessions": [count], "date_range": "[first] to [last]", "tags": ["summary", "[project-tag]"]}`
+   - **Content:** Use this template:
+```markdown
+## [Project] -- Project Summary
+### Timeline
+| # | Title | Date | Status |
+|---|-------|------|--------|
+| 1 | [title from filename] | [date] | ✅/🔧/🚫 |
+### Key Outcomes
+- **Session [N]:** [one-line summary of primary outcome]
+### Current Status
+[One paragraph: where the project stands now, what's active, what's done]
+### Carried Forward
+- [ ] [unresolved item from latest session]
+```
+6. **Categorize uncategorized files** -- move files from `sessions/` root into the correct project subfolder via `mcp__obsidian__move_note`. When moving, rename to match the `[N]. [Title].md` convention (e.g., `sessions/BudgetBridge Session 3.md` becomes `sessions/BudgetBridge/3. [Title derived from content].md`).
+### Phase 4: Verify
+After execution, re-read the vault and confirm:
+- Every session file lives in a project subfolder under `sessions/[Project]/`
+- Every file has valid frontmatter with all required fields
+- Every filename matches the naming convention (`[N]. [Title].md` or `Summary.md`)
+- No contradictory status/blocked combos
+- All orphaned next-steps resolved or annotated
+- Rollup summaries exist for qualifying projects
+Report the results: files renamed, frontmatter fixed, statuses updated, next-steps resolved, rollups generated.
+If Phase 1 finds zero issues across all checks, skip Phases 2-4 and report: "All session logs are tidy. Nothing to do."
+## Output Format
+Phase 2 report structure:
+```
+## Session Tidy Report
+### Format Violations (X found)
+- [file] -- [issue]
+### Stale Statuses (X found)
+- [file] -- [status] since [date] ([age] days ago)
+### Uncategorized Files (X found)
+- [file] -- move to sessions/[Project]/[new filename]
+### Orphaned Next-Steps (X found)
+- [file] -- "[item text]" -- [resolved/orphaned/unknown]
+### Rollup Candidates (X projects)
+- [project] -- [N] sessions, [date range], no summary exists
+### Proposed Actions
+1. [action] -- [file] -- [reason]
+2. ...
+Approve all, select by number, or cancel?
+```
+## After Completion
+Report summary: files renamed, frontmatter fixes, status updates, next-steps cleaned, rollups generated.
+## Best Practices
+- Run after finishing a multi-session project
+- Run before starting a new milestone on an existing project (clears stale context)
+- Cross-reference next-steps manually when the skill flags "unknown" -- it cannot determine intent from content alone
+- The 7-day staleness threshold is a heuristic -- adjust based on how frequently you session-log
+- Rollup summaries are not a replacement for individual session logs -- they are a navigational aid

package/skills/skill-writer/REFERENCE.md CHANGED Viewed

@@ -1,26 +1,45 @@
-# Skill Writer Reference
+# Skill writer -- reference
-## Table of Contents
+## Canonical resources
-1. [Complete Frontmatter Fields](#complete-frontmatter-fields)
-2. [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
-3. [Content Templates by Degree of Freedom](#content-templates)
-4. [Validation Checklist](#validation-checklist)
+When authoring or refining skills, ground decisions in these sources. If guidance conflicts, defer to the higher tier.
+### Tier 1: Anthropic (primary authority for Claude)
+- https://platform.claude.com/docs/en/claude-code/skills -- official skill structure, frontmatter fields, progressive disclosure, string substitutions, directory layout.
+- https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices -- the single living reference for Claude's latest models. Covers XML tags, tool use, thinking, agentic systems, overeagerness, anti-hallucination.
+- https://transformer-circuits.pub/2026/emotions/index.html -- emotion concepts research (April 2026): 171 internal activation patterns that causally influence behavior. Key skill-writing takeaways: clear criteria and escape routes improve output quality, collaborative framing activates engagement, positive task framing correlates with better results.
+- https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking -- adaptive thinking reference; replaces manual budget_tokens with effort-based control.
+### Tier 2: Major labs (strong secondary, often transfers across models)
+- https://platform.openai.com/docs/guides/prompt-engineering -- six strategies: clear instructions, reference text, split complex tasks, give models time to think, use external tools, test systematically.
+- https://deepmind.google/research/ -- chain-of-thought and structured reasoning research.
+### Tier 3: Community and individuals (supplementary)
+- https://simonwillison.net/ -- practical LLM experiments, skill patterns, and automation insights.
+- https://www.deeplearning.ai/short-courses/ -- foundational prompt engineering courses.
+- https://www.latent.space/ -- AI engineering perspective.
+### Conflict resolution rule
+If sources disagree on a technique, apply in order: Anthropic documentation first (it describes the actual model behavior), then OpenAI/Google/Microsoft (large-scale research with cross-model relevance), then community sources (patterns and intuition, not authoritative on model internals). When Tier 3 contradicts Tier 1, Tier 1 wins without exception.
 ---
-## Complete Frontmatter Fields
+## Complete frontmatter fields
 Source: [Claude Code Skills](https://platform.claude.com/docs/en/claude-code/skills)
-### Required Fields
+### Required fields
 | Field | Type | Constraints | Description |
 |-------|------|-------------|-------------|
 | `name` | string | Lowercase, hyphens, numbers. Max 64 chars. No `anthropic` or `claude`. | Must match directory name. Prefer gerund form. |
 | `description` | string | Max 1024 chars. Third person. No XML tags. | What it does + when to use it + trigger phrases. |
-### Optional Fields
+### Optional fields
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
@@ -33,10 +52,10 @@ Source: [Claude Code Skills](https://platform.claude.com/docs/en/claude-code/ski
 | `disable-model-invocation` | boolean | `false` | Set `true` for manual-only via `/name` |
 | `paths` | string/list | (all files) | Glob patterns limiting activation. E.g., `"*.py"` or `["*.ts", "*.tsx"]` |
 | `argument-hint` | string | (none) | Autocomplete hint. E.g., `[filename] [format]` |
-| `shell` | string | `bash` | Shell for `!`command`` blocks. `bash` or `powershell` |
+| `shell` | string | `bash` | Shell for `` !`command` `` blocks. `bash` or `powershell` |
 | `hooks` | object | (none) | Hooks scoped to this skill's lifecycle |
-### String Substitutions (available in SKILL.md body)
+### String substitutions (available in SKILL.md body)
 | Variable | Description |
 |----------|-------------|
@@ -45,9 +64,9 @@ Source: [Claude Code Skills](https://platform.claude.com/docs/en/claude-code/ski
 | `$N` | Shorthand for `$ARGUMENTS[N]` (e.g., `$0`, `$1`) |
 | `${CLAUDE_SESSION_ID}` | Current session ID |
 | `${CLAUDE_SKILL_DIR}` | Directory containing SKILL.md |
-| `` !`command` `` | Dynamic context injection - shell command runs before Claude sees content |
+| `` !`command` `` | Dynamic context injection -- shell command runs before Claude sees content |
-### Permission Syntax
+### Permission syntax
 ```
 Skill(name)        # Allow exact skill
@@ -56,191 +75,210 @@ Skill(name *)      # Allow skill with any arguments
 ---
-## Progressive Disclosure Architecture
+## Progressive disclosure architecture
-Source: [Agent Skills Overview](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview)
+Source: [Claude Code Skills](https://platform.claude.com/docs/en/claude-code/skills)
 Skills load in three levels to minimize context usage:
-| Level | What Loads | Token Cost | When |
+| Level | What loads | Token cost | When |
 |-------|-----------|------------|------|
 | **1. Metadata** | `name` and `description` from frontmatter | ~100 tokens | Always (system prompt) |
 | **2. Instructions** | SKILL.md body | <5k tokens | When triggered by matching request |
 | **3. Resources** | Additional files, scripts, schemas | Unlimited | When referenced from instructions |
-### Key implications:
-- You can install many skills with minimal context cost (~100 tokens each)
-- SKILL.md body should stay under 500 lines
-- Scripts execute via bash - their code never enters context, only output does
-- Reference files load only when Claude reads them
+### Implications for skill authors
+- Many skills can coexist with minimal context cost (~100 tokens each at Level 1)
+- SKILL.md body should stay under 500 lines to fit the <5k token budget
+- Scripts execute via bash -- their code never enters context, only output does
+- Reference files load only when Claude reads them via `@` reference or explicit Read
+- Heavy lookup tables, field mappings, and long examples belong in REFERENCE.md or separate files
 ---
-## Content Templates
+## Content templates by degree of freedom
-### High Freedom (Advisory)
+### High freedom (advisory)
-Best for guidance where multiple approaches are valid.
+For guidance where multiple approaches are valid. States goals and acceptance criteria without prescribing exact steps.
 ```markdown
 ---
 name: analyzing-data
-description: "Analyzes datasets and generates statistical summaries. Use when working with CSV or Excel data requiring descriptive statistics, correlations, or visualizations."
+description: >-
+  Analyzes datasets and generates statistical summaries. Use when working
+  with CSV or Excel data requiring descriptive statistics, correlations,
+  or visualizations. Triggers: 'analyze data', 'statistical summary'.
 ---
-# Analyzing Data
+# Analyzing data
-## Overview
+**Core principle:** Let the data structure guide the analysis approach.
-Performs statistical analysis on tabular datasets.
+## When this skill applies
-**Announce at start:** "I'm using the analyzing-data skill."
+Trigger when the user has tabular data and wants statistical insight,
+trend identification, or visualization.
-## Instructions
+## Goals
-1. Load and inspect the data structure
-2. Generate descriptive statistics
-3. Identify correlations and patterns
+1. Inspect the data structure and report shape, types, missing values
+2. Generate descriptive statistics appropriate to the data types
+3. Identify correlations and patterns worth highlighting
 4. Produce visualizations if requested
-## Examples
+## Output format
-**Input:** "Analyze sales_2024.csv for trends"
-**Output:** Summary statistics, monthly trend chart, top performers
+Summary with key findings first, then supporting tables and charts.
+Explain statistical choices so the user can evaluate appropriateness.
-## Best Practices
+## Examples
-- Always show data shape and types before analysis
-- Handle missing values explicitly
-- Use appropriate chart types for the data
+**Input:** "Analyze sales_2024.csv for trends"
+**Output:** Monthly trend summary, top performers, seasonal patterns,
+with charts for the clearest visual signals.
 ```
-### Medium Freedom (Structured Workflow)
+### Medium freedom (structured workflow)
-Best for preferred patterns with some variation allowed.
+For preferred patterns with room for adaptation. States a recommended sequence but allows deviation when justified.
 ```markdown
 ---
 name: reviewing-plans
-description: "Validates implementation plans against code standards, TDD compliance, and right-sized engineering. Use after writing plans and before executing them. Triggers: 'review plan', 'validate plan', 'check plan'."
+description: >-
+  Validates implementation plans against code standards and TDD compliance.
+  Use after writing plans and before executing them.
+  Triggers: 'review plan', 'validate plan', 'check plan'.
 ---
-# Reviewing Plans
-## Overview
+# Reviewing plans
-**Core principle:** Bad plans produce bad code. Review before you execute.
+**Core principle:** Bad plans produce bad code. Validate before executing.
-**Announce at start:** "I'm using the reviewing-plans skill to validate this plan."
+## When this skill applies
-**Context:** Use after write-plan and before plan-executor. Quality gate between planning and implementation.
+Use after a plan has been written and before execution begins.
+This is the quality gate between planning and implementation.
-## The Process
+## The process
-### Step 1: Identify Plan Files
-Locate plan files in `.planning/phases/` or `docs/plans/`.
+### Step 1: Locate plan files
+Find plan files in `.planning/phases/` or `docs/plans/`.
-### Step 2: Review Dimensions
-Check structure, TDD compliance, code quality, right-sized engineering, task granularity.
+### Step 2: Review dimensions
+Check each dimension: structure completeness, TDD compliance,
+code quality standards, right-sized engineering, task granularity.
-### Step 3: Report Verdict
-READY or NEEDS REVISION with specific issues.
+### Step 3: Report verdict
+Deliver READY or NEEDS REVISION with specific issues and locations.
-## Output Format
+## Output format
-| Dimension | Status |
-|-----------|--------|
-| Structure | PASS/FAIL |
-| TDD | PASS/FAIL |
-| Code quality | PASS/FAIL |
-## Red Flags - STOP
-- Any placeholder text ("implement later")
-- Missing TDD steps for production code
-- Magic values in code blocks
-## Rationalization Prevention
-| Excuse | Reality |
-|--------|---------|
-| "The plan is high-level" | Plans without complete code produce inconsistent implementations |
-| "TDD makes it too long" | TDD in plan prevents skipping TDD during execution |
+| Dimension | Status | Notes |
+|-----------|--------|-------|
+| Structure | PASS/FAIL | [specific finding] |
+| TDD | PASS/FAIL | [specific finding] |
+| Code quality | PASS/FAIL | [specific finding] |
 ```
-### Low Freedom (Critical/Exact)
+### Low freedom (exact sequence)
-Best for fragile operations where consistency is critical.
+For fragile operations where deviation causes silent failures. States precisely what to do, in what order, with validation at each step.
 ```markdown
 ---
 name: filling-pdf-forms
-description: "Fills PDF forms using pdf-lib JavaScript library with exact field mapping. Use when populating PDF forms programmatically. Triggers: 'fill PDF form', 'populate form', 'form filling'."
+description: >-
+  Fills PDF forms using pdf-lib with exact field mapping and validation.
+  Use when populating PDF forms programmatically.
+  Triggers: 'fill PDF form', 'populate form', 'form filling'.
 allowed-tools: Bash(node *), Read
 ---
-# Filling PDF Forms
+# Filling PDF forms
-## MANDATORY PROTOCOL
+**Core principle:** Extract field names first, then map, then fill.
+Guessing field names produces silently empty forms.
-Before filling ANY form:
-1. [ ] Read `${CLAUDE_SKILL_DIR}/FORMS.md` for field mapping reference
-2. [ ] Extract field names: `node ${CLAUDE_SKILL_DIR}/scripts/extract_fields.js input.pdf`
-3. [ ] Match extracted fields against the mapping
+## Before filling any form
+1. Read `${CLAUDE_SKILL_DIR}/FORMS.md` for the field mapping reference
+2. Extract field names: `node ${CLAUDE_SKILL_DIR}/scripts/extract_fields.js input.pdf`
+3. Match extracted fields against the mapping before writing any fill code
 ## Workflow
-### Step 1: Extract Fields
+### Step 1: Extract fields
 ```bash
 node ${CLAUDE_SKILL_DIR}/scripts/extract_fields.js "$0"
 ```
-### Step 2: Generate Fill Script
-Use the field mapping from FORMS.md. Every field must be explicitly set.
+### Step 2: Generate fill script
+Use the field mapping from FORMS.md. Set every field explicitly.
-### Step 3: Execute and Validate
+### Step 3: Execute and validate
 ```bash
 node fill_script.js && node ${CLAUDE_SKILL_DIR}/scripts/validate_fill.js output.pdf
 ```
-### Feedback Loop
-If validation fails -> read error output -> fix field mapping -> re-execute -> re-validate.
-## Critical Rules
-**NEVER guess field names.** Always extract first.
-**WHY:** Wrong field names silently produce empty forms.
+### If validation fails
+Read error output, fix the field mapping, re-execute, re-validate.
+Do not proceed with a partially filled form.
 ```
 ---
-## Validation Checklist
+## Validation checklist
-Source: [Best Practices Checklist](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)
+Source: [Claude Code Skills](https://platform.claude.com/docs/en/claude-code/skills)
-### Core Quality
-- [ ] Description is specific, includes key terms, and is in **third person**
-- [ ] Description includes both what the Skill does and when to use it
-- [ ] SKILL.md body is **under 500 lines**
-- [ ] Additional details are in separate files (if needed)
-- [ ] No time-sensitive information (or clearly marked as legacy)
+### Structure
+- [ ] SKILL.md in correct location (matches `name` directory)
+- [ ] Valid YAML frontmatter with required fields
+- [ ] Body under 500 lines
+- [ ] Heavy content in separate reference files
+### Description quality
+- [ ] Third person ("Analyzes..." not "Analyze...")
+- [ ] Includes what it does and when to use it
+- [ ] Contains trigger phrases matching natural user language
+- [ ] Under 1024 characters
+- [ ] No XML tags in description
+### Content quality
+- [ ] Core principle stated (one sentence)
+- [ ] Steps are sequential and numbered
+- [ ] States desired behavior in positive terms
+- [ ] Motivation provided for constraints (why, not just what)
+- [ ] Examples are concrete with specific inputs and outputs
+- [ ] No time-sensitive claims (or clearly dated)
 - [ ] Consistent terminology throughout
-- [ ] Examples are concrete, not abstract
-- [ ] File references are **one level deep** from SKILL.md
-- [ ] Files >100 lines have a **table of contents**
-- [ ] Workflows have clear steps
-- [ ] All file paths use **forward slashes**
-### Code and Scripts
-- [ ] Scripts solve problems rather than punt to Claude
-- [ ] Error handling is explicit and helpful
-- [ ] No "voodoo constants" (all values justified)
-- [ ] Required packages listed and verified as available
-- [ ] MCP tools use **fully qualified names** (`ServerName:tool_name`)
-- [ ] Validation/verification steps for critical operations
-- [ ] Feedback loops included for quality-critical tasks
-### Testing
-- [ ] At least 3 evaluation scenarios created
-- [ ] Tested with representative real tasks
-- [ ] If multi-model: tested with Haiku, Sonnet, and Opus
+- [ ] File references are one level deep from SKILL.md
+- [ ] Files over 100 lines have a table of contents
+### Technical correctness
+- [ ] `allowed-tools` specified if skill needs specific tools
+- [ ] `argument-hint` included if skill accepts arguments
+- [ ] `paths` set if skill applies only to certain file types
+- [ ] String substitutions use correct syntax
+- [ ] Script references use `${CLAUDE_SKILL_DIR}` for portability
+- [ ] MCP tools use fully qualified names (`ServerName:tool_name`)
+### Testing readiness
+- [ ] At least 3 evaluation scenarios identified
+- [ ] Covers a typical case, an edge case, and a decline/clarify case
+- [ ] Golden rule: a colleague could follow this skill without extra context
+---
+## Evaluation loop
+For skill drafts that need iteration:
+1. Trigger the skill with 2-3 representative user requests.
+2. Note failure modes (missed triggers, wrong output format, overstepped scope).
+3. Tighten trigger phrases, add examples, or adjust degree of freedom for the failure class only.
+Anthropic's **self-correction chaining** pattern extends this: generate a draft, have Claude review it against the validation checklist, then refine based on the review.