claude-dev-env 1.4.0 → 1.8.0 (npm)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/agents/deep-research.md +170 -0
package/bin/install.mjs +98 -26
package/hooks/HOOK_SPECS_PROMPT_WORKFLOW.md +68 -0
package/hooks/blocking/agent-execution-intent-gate.py +83 -0
package/hooks/blocking/prompt-workflow-stop-guard.py +131 -0
package/hooks/blocking/prompt_workflow_gate_core.py +161 -0
package/hooks/blocking/test_agent_execution_intent_gate.py +106 -0
package/hooks/blocking/test_context_control_policy_files.py +27 -0
package/hooks/blocking/test_prompt_workflow_gate_core.py +68 -0
package/hooks/blocking/test_prompt_workflow_stop_guard.py +144 -0
package/hooks/hooks.json +10 -20
package/package.json +3 -2
package/rules/prompt-workflow-context-controls.md +48 -0
package/skills/agent-prompt/SKILL.md +107 -9
package/skills/deep-research/SKILL.md +80 -0
package/skills/dream/SKILL.md +118 -0
package/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md +174 -0
package/skills/prompt-generator/SKILL.md +191 -12
package/skills/research-mode/SKILL.md +53 -0
package/skills/session-log/SKILL.md +237 -0
package/skills/session-tidy/SKILL.md +181 -0
package/skills/skill-writer/REFERENCE.md +160 -122
package/skills/skill-writer/SKILL.md +131 -197
package/LICENSE +0 -21
package/README.md +0 -247

package/skills/agent-prompt/SKILL.md CHANGED

@@ -9,8 +9,8 @@ description: >-
   agent delegation with prompt quality.
 ---
-@~/.claude/skills/prompt-generator/SKILL.md
-@~/.claude/skills/prompt-generator/REFERENCE.md
+@packages/claude-dev-env/skills/prompt-generator/SKILL.md
+@packages/claude-dev-env/skills/prompt-generator/REFERENCE.md
 # Agent Prompt
@@ -20,7 +20,9 @@ The prompt-generator skill above defines the prompt-crafting workflow. This skil
 ## When this skill applies
-Trigger when the user wants to delegate a task to an agent. The difference from /prompt-generator: this skill **executes**.
+Trigger only when the user explicitly wants to delegate or execute a task with an agent.
+`/prompt-generator` is the default owner for prompt authoring and refinement. This skill starts after explicit execution intent.
 When invoked with arguments (e.g. `/agent-prompt fix the auth bug via TDD`), treat the arguments as the task to build a prompt for and execute.
@@ -30,13 +32,13 @@ When invoked with arguments (e.g. `/agent-prompt fix the auth bug via TDD`), tre
 Follow the prompt-generator workflow steps 1 through 8 exactly as written. Classify the prompt type, set degree of freedom, collect missing facts, build the prompt with XML tags and role, control format and style, add examples if needed, and self-check against the rubric.
-Skip step 9 (Deliver). Continue below instead.
+After steps 1-8, continue directly to step 9 for context gathering; deliverables are handled through the orchestration flow below.
 ### Step 9: Gather context before crafting
 The agent starts with zero conversation history. Before building the prompt, use Read, Glob, Grep, and other research tools to gather the concrete values the agent will need -- file paths, function signatures, existing patterns, branch names. Embed these directly in the prompt instead of telling the agent to "find" them.
-The agent-spawn-protocol rule requires this: if any context question has the answer "I don't know", investigate first. Do not delegate the context-gathering.
+The agent-spawn-protocol rule requires this: if any context question has the answer "I don't know", investigate first, then delegate with complete context.
 Proactive context gathering enables agents to plan effectively from the start. Anthropic's emotion concepts research (2026) found that agents produce higher-quality output when they understand constraints, available tools, and system boundaries upfront — they incorporate these into their approach naturally, leading to better first attempts and more accurate results.
@@ -55,9 +57,46 @@ Always set `run_in_background: true`.
 Generate a descriptive `name` (3-5 words, kebab-case) so the user can track progress and send follow-up messages via `SendMessage({to: name})`.
-### Step 11: Present for approval
+### Step 10A: Section-refinement orchestration mode (default for execution tasks)
+Execution behavior: run this deterministic orchestration for delegated prompt work after explicit launch intent.
+Prompt authoring and prompt refinement ownership remain in `/prompt-generator`.
+Use simplified mode when either condition is true:
+- The user explicitly requests single-agent execution
+- The task is genuinely too small for orchestration (for example, one quick read/search)
+This mode is triggered when execution input includes `pipeline_mode: internal_section_refinement_with_final_audit` or equivalent execution-ready orchestration metadata.
+If present, carry forward the scope block (`target_local_roots`, `target_canonical_roots`, `target_file_globs`, `comparison_basis`, `completion_boundary`) so execution remains artifact-bound.
+Execution launch payload must include `execution_intent: explicit`.
+1. Spawn exactly 6 refinement agents, one per section in fixed order:
+   - `role`
+   - `context`
+   - `instructions`
+   - `constraints`
+   - `output_format`
+   - `examples`
+2. Enforce section-only scope in each sub-prompt:
+   - "Edit `<SECTION_NAME>` and preserve all other sections unchanged."
+3. Require section output contract from each agent:
+   - `improved_block`
+   - `rationale`
+   - `concise_diff`
+4. Merge outputs into one canonical prompt after all 6 refiners finish.
+5. Run one final audit agent against the merged prompt and checklist.
+6. If audit fails, apply targeted fixes and re-run audit with capped retries (`max_retries: 2` unless user overrides).
-Use AskUserQuestion with one question. The question text should summarize the agent config (type, mode, name). Each option should use the `preview` field to show the full crafted prompt.
+Run all stages in this exact order.
+### Step 11: Present for approval (must reflect default orchestration)
+Use AskUserQuestion with one question. The question text must summarize:
+- agent config (type, mode, name)
+- orchestration mode (`section_refinement_with_final_audit` by default)
+- retry cap for audit loop
+Each option should use the `preview` field to show the full crafted prompt.
 Options:
 1. "Launch it" (recommended) -- preview shows the crafted prompt
@@ -77,10 +116,12 @@ On **"Cancel"**: acknowledge and stop.
 When building the prompt in step 4, these adjustments ensure the agent can work independently:
 **Context completeness** -- include file paths, line numbers, function names, branch state, and anything you learned during step 9. The agent cannot see this conversation.
+Bind execution steps to the scope block artifacts passed from refinement output whenever available.
+Keep runtime context compact: include only actionable facts required for execution.
 **Acceptance criteria** -- state what "done" looks like. For code: include the test command. For research: specify the output format and save location.
-**Scope boundary** -- include "Only make changes directly requested; do not refactor surrounding code" or equivalent. Agents without scope constraints tend to over-engineer.
+**Scope boundary** -- include "Make requested changes and keep surrounding code stable" or equivalent. Agents with explicit scope constraints stay aligned to task intent.
 **Constraints from this project** -- if the project has CODE_RULES.md, TDD requirements, or naming conventions, include the relevant subset in the prompt so the agent follows them.
@@ -92,11 +133,68 @@ When building the prompt in step 4, these adjustments ensure the agent can work
 **Temp file cleanup** -- If the agent may create scratch files during iteration, include cleanup instructions. Anthropic: "If you create any temporary new files, scripts, or helper files for iteration, clean up these files by removing them at the end of the task."
+## Final audit-agent stage requirements (for default section-refinement mode)
+After merge, run one dedicated audit agent that validates the full prompt against:
+- Prompt-generator rubric requirements (`packages/claude-dev-env/skills/prompt-generator/SKILL.md`)
+- The deterministic checklist from the handoff artifact
+- Embedded research-mode evidence constraints below
+Required audit output shape:
+```json
+{
+  "overall_status": "pass|fail",
+  "checklist_results": [
+    {
+      "check_id": "structured_scoped_instructions",
+      "status": "pass|fail",
+      "evidence_quote": "word-for-word quote",
+      "source_ref": "path-or-url",
+      "fix_if_fail": "targeted correction"
+    }
+  ],
+  "corrective_edits": ["..."],
+  "retry_count": 0
+}
+```
+### Embedded research-mode policy text (audit behavior)
+The audit agent must enforce these constraints as policy text in the audit prompt (do not rely on a global mode switch):
+- "Every recommendation, claim, or piece of advice must cite a specific source."
+- "Ground your response in word-for-word quotes, not paraphrased summaries."
+- "If you don't have a credible source for a claim, say 'I don't know'."
+- Source priority:
+  1. Official vendor/creator docs for external tools
+  2. Local project files for local behavior
+  3. Academic or named expert sources
+  4. Reputable external sources with URLs
+  5. Blogs/community posts (lowest)
+Policy source: `packages/claude-dev-env/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md`
+## Section-refinement acceptance criteria
+Section-refinement orchestration is done only when all are true:
+- All 6 section agents ran, each scoped to exactly one section
+- Merge produced one canonical prompt containing all six sections
+- Final audit returned `overall_status: pass`
+- Any non-pass audit was resolved through targeted revisions within retry cap
+- AskUserQuestion approval gate was honored before launch
+- Final user artifact includes one complete pasteable prompt block
 ## Constraints
-- Always present for approval via AskUserQuestion -- never auto-spawn
+- Present every launch for approval via AskUserQuestion before spawning
 - Always run agents in background
 - Gather context before crafting -- do not send an agent in blind
+- Start only after explicit user execution intent; keep prompt authoring/refinement in `/prompt-generator`
+- Default to `section_refinement_with_final_audit` orchestration for execution tasks unless user requests simplified mode
+- Include `execution_intent: explicit` in Task/Agent launch prompts so runtime hooks can enforce deterministic gating
 - If the task is too small for an agent (single file read, quick grep), say so and just do it directly
 - Include obstacle handling: "When encountering obstacles, do not use destructive actions as a shortcut (e.g. --no-verify, discarding unfamiliar files)" -- agents without this guidance may take irreversible shortcuts
 - Frame agent tasks with collaborative language and include permission to express uncertainty — agents produce higher-quality output with collaborative briefing (Anthropic emotion concepts research, 2026)
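
Reviewer's note (not part of the package): the capped audit loop described in Step 10A, item 6, can be sketched roughly as below. This is a minimal illustration, assuming the audit agent and the fix step are exposed as plain callables; `run_audit_loop`, `audit`, and `apply_fixes` are hypothetical names, and only the `overall_status`, `corrective_edits`, and `retry_count` fields and the `max_retries: 2` default come from the diff.

```python
from typing import Callable

# Hypothetical alias: the diff only specifies the JSON shape of the report.
AuditReport = dict


def run_audit_loop(
    prompt: str,
    audit: Callable[[str], AuditReport],
    apply_fixes: Callable[[str, list], str],
    max_retries: int = 2,
) -> tuple:
    """Audit the merged prompt; apply targeted fixes until pass or retry cap."""
    report = audit(prompt)
    retry_count = 0
    while report["overall_status"] != "pass" and retry_count < max_retries:
        # Apply only the edits the auditor listed, then re-run the audit.
        prompt = apply_fixes(prompt, report["corrective_edits"])
        retry_count += 1
        report = audit(prompt)
    # Record how many re-audits were needed (0 means first audit passed).
    report["retry_count"] = retry_count
    return prompt, report
```

If the report is still failing when the cap is reached, the caller would presumably surface the unresolved checklist rows rather than launch, which matches the acceptance criteria above.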

package/skills/deep-research/SKILL.md ADDED

@@ -0,0 +1,80 @@
+---
+name: deep-research
+description: "Deep Research mode — iterative multi-source research producing comprehensive Obsidian reports with citations. Official-docs-first methodology. Triggers: '/deep-research [topic]'"
+argument-hint: "TOPIC or RESEARCH QUESTION"
+---
+# Deep Research
+You orchestrate a two-phase deep research pipeline. Phase 1 happens here (main thread). Phase 2 is delegated to the `deep-research` agent.
+## Phase 1: Build the Research Prompt (Interactive Q&A)
+The user's raw topic is: `$ARGUMENTS`
+Your job is to turn this raw topic into a precise, well-scoped research brief using prompt-generator methodology. Follow these steps:
+### Step 1: Classify and assess
+Silently determine:
+- **Complexity**: Is this a narrow factual question or a broad landscape survey?
+- **Ambiguity**: Can you research this as-is, or does it need scoping?
+- **Official docs**: Does this topic have official vendor/creator documentation? If yes, that is the primary source and must be consulted first.
+### Step 2: Ask clarifying questions
+Use AskUserQuestion to ask 1-3 questions. Choose from these dimensions based on what's genuinely unclear — skip any that are obvious from context:
+- **Audience**: "Who is this research for?" (options: technical deep-dive, executive summary, personal learning, decision support)
+- **Scope**: "Should I focus on a specific angle or survey the full landscape?" (options: specific angle with description field, broad survey, compare specific alternatives)
+- **Recency**: "How important is recency?" (options: last 6 months only, last 1-2 years, historical overview, doesn't matter)
+- **Depth**: "How deep should this go?" (options: quick overview 5-10 sources, standard 15-20 sources, exhaustive 25+ sources)
+Skip clarification entirely only if the topic is already narrow, unambiguous, and the audience is obvious.
+### Step 3: Construct the research brief
+From the user's answers, write a structured research brief:
+```
+<research_brief>
+  <topic>The original topic, cleaned up</topic>
+  <official_docs>Known official documentation sources, or "none identified" if the topic lacks vendor docs</official_docs>
+  <scope>Exactly what to research — boundaries, inclusions, exclusions</scope>
+  <audience>Who this is for and what they need</audience>
+  <depth>Target source count and iteration budget</depth>
+  <output>What the final deliverable looks like</output>
+  <key_questions>
+    1. Specific question the research must answer
+    2. Another specific question
+    3. ...
+  </key_questions>
+</research_brief>
+```
+Show the brief to the user. Ask: "Does this capture what you need, or should I adjust the scope?"
+### Step 4: Set iteration budget
+Map the user's depth preference to iteration count:
+- Quick overview: 8 iterations
+- Standard (default): 15 iterations
+- Exhaustive: 25 iterations
+## Phase 2: Launch the Research Agent
+Once the brief is confirmed, spawn the `deep-research` agent using the Agent tool with:
+- **subagent_type**: `deep-research`
+- **prompt**: The full `<research_brief>` XML block from Step 3, plus the iteration budget
+- **mode**: `bypassPermissions` (research agent needs unrestricted tool access for web searches)
+- **description**: "Deep research: [short topic summary]"
+The agent handles everything from here: iteration, state tracking, and Obsidian output.
+When the agent returns, summarize:
+1. Where the report was saved (Obsidian path)
+2. How many sources were consulted (official vs secondary)
+3. Any gaps or limitations noted
+Then clean up temporary files: `.deep-research-state.md`
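
Reviewer's note (not part of the package): Step 4 of the skill above maps a depth preference to an iteration count. A minimal sketch of that mapping follows, assuming the depth arrives as a free-text label; `iteration_budget` is a hypothetical helper, and falling back to the standard budget for unrecognized labels is an assumption rather than something the skill text states.

```python
from typing import Optional

# Values taken from Step 4 of the skill text.
ITERATION_BUDGET = {
    "quick overview": 8,
    "standard": 15,
    "exhaustive": 25,
}
DEFAULT_DEPTH = "standard"  # the skill marks Standard as the default


def iteration_budget(depth: Optional[str] = None) -> int:
    """Return the iteration count for a depth label (case-insensitive)."""
    key = (depth or DEFAULT_DEPTH).strip().lower()
    # Assumption: unknown labels fall back to the standard budget.
    return ITERATION_BUDGET.get(key, ITERATION_BUDGET[DEFAULT_DEPTH])
```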

package/skills/dream/SKILL.md ADDED

@@ -0,0 +1,118 @@
+---
+name: dream
+description: Consolidate, prune, and reorganize auto memory files. Simulates Auto Dream -- fixes format drift, deduplicates facts, enforces index structure. Use when memory feels stale or cluttered. Triggers on '/dream', 'consolidate memory', 'clean up memory', 'dream'.
+disable-model-invocation: true
+---
+# Dream: Memory Consolidation
+## Overview
+Consolidate auto memory by enforcing the format contract, pruning stale content, deduplicating facts, and rebuilding MEMORY.md as a clean index.
+**Announce at start:** "Running memory consolidation (dream)."
+**Context:** Standalone maintenance utility. Run periodically or when memory feels cluttered. Simulates the Auto Dream feature that is in gradual rollout.
+## The Format Contract
+Source: Claude Code client system prompt + [official docs](https://code.claude.com/docs/en/memory).
+**MEMORY.md is an index, not a memory.** It should contain only one-line pointers to topic files:
+- Format: `- [Title](file.md) -- one-line hook`
+- Target: under ~150 characters per entry
+- Hard limit: 200 lines / 25KB (only this much loads at session start)
+- No content, tables, or multi-line facts directly in MEMORY.md
+**Topic files require frontmatter:**
+```yaml
+---
+name: {{topic name}}
+description: {{one-line description}}
+type: {{user | feedback | project | reference}}
+---
+```
+**Organization:** Semantic by topic, not chronological by session.
+## The Process
+### Phase 1: Audit
+Read MEMORY.md and every file in the memory directory. For each file, check:
+1. **Frontmatter present?** Must have name, description, type fields.
+2. **Type correct?** Must be one of: user, feedback, project, reference.
+3. **Named by topic?** Files named `session-YYYY-MM-DD-*` should be renamed to their actual topic.
+For MEMORY.md, check:
+1. **Index entries only?** Flag any line that is NOT a `- [Title](file.md)` link or a `##` section header.
+2. **Content leaking into index?** Flag inline facts, tables, multi-line bullet points.
+3. **Under 200 lines?** Flag if approaching the limit.
+### Phase 2: Propose Changes
+Present a structured report to the user with these sections:
+**Format violations** -- files missing frontmatter, content in MEMORY.md
+**Stale content** -- items older than 14 days with forward-looking TODOs or "Next/Pending" sections that may be completed
+**Duplicates** -- facts that appear in both topic files and MEMORY.md inline, or across multiple topic files
+**Rename candidates** -- session-dated files that should be topic-named
+**Proposed actions** -- numbered list of specific changes (extract, merge, prune, rename, add frontmatter)
+Do NOT execute any changes yet. Wait for user approval.
+### Phase 3: Execute
+After user approves (all or selected items):
+1. **Extract** inline MEMORY.md content into new or existing topic files with proper frontmatter.
+2. **Add frontmatter** to files that lack it.
+3. **Rename** session-dated files to topic names.
+4. **Deduplicate** by removing redundant copies (keep the most complete version).
+5. **Prune** stale forward-looking content (TODOs, "Next" sections) from old files.
+6. **Rebuild MEMORY.md** as a clean index -- one-line entries only, grouped by `##` section headers.
+### Phase 4: Verify
+After execution, read the rebuilt MEMORY.md and confirm:
+- Every entry is a one-line link
+- Every referenced file exists and has valid frontmatter
+- No orphaned files (files in directory but not in index)
+- Total line count under 200
+Report the results: files changed, lines saved, violations fixed.
+## Output Format
+Phase 2 report structure:
+```
+## Dream Report
+### Format Violations (X found)
+- [file] -- [issue]
+### Stale Content (X flagged)
+- [file] -- [what's stale] -- [age]
+### Duplicates (X found)
+- [fact] -- appears in [file1] and [file2]
+### Proposed Actions
+1. [action] -- [file] -- [reason]
+2. ...
+Approve all, select by number, or cancel?
+```
+## After Completion
+Report summary: files modified, created, renamed, deleted. Line count before/after.
+## Best Practices
+- Run after long sessions or when starting fresh on a project
+- Check stale flags manually -- dream cannot verify if TODOs were completed without reading the actual codebase
+- The 14-day staleness threshold is a heuristic, not a hard rule
+- When in doubt about whether to prune, flag it for the user rather than proposing deletion
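
Reviewer's note (not part of the package): the Phase 1 checks on MEMORY.md (index entries only, under 200 lines) are mechanical enough to sketch in code. The function below is an illustration under assumptions: `audit_memory_index` is a hypothetical name, the regular expressions are one reading of the `- [Title](file.md) -- one-line hook` format, blank lines are treated as harmless, and the line-count check only fires once the limit is exceeded, whereas the skill asks to flag files approaching it.

```python
import re

# One-line pointer: "- [Title](file.md)", optionally followed by " -- hook".
INDEX_ENTRY = re.compile(r"^- \[[^\]]+\]\([^)]+\.md\)( -- .+)?$")
SECTION_HEADER = re.compile(r"^## \S")
MAX_LINES = 200  # hard limit stated in the format contract


def audit_memory_index(text: str) -> list:
    """Return human-readable violations found in MEMORY.md content."""
    violations = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are not violations
        if INDEX_ENTRY.match(line) or SECTION_HEADER.match(line):
            continue
        violations.append(f"line {number}: not an index entry or section header")
    if len(lines) > MAX_LINES:
        violations.append(f"{len(lines)} lines exceeds the {MAX_LINES}-line limit")
    return violations
```

A result like this would feed the "Format violations" section of the Phase 2 report; it does not cover the frontmatter or staleness checks.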

package/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md ADDED

@@ -0,0 +1,174 @@
+# Prompt Refinement Pipeline Runbook
+## Purpose
+Validate deterministic behavior for:
+1. Base prompt generation (`/prompt-generator`)
+2. Six section refiners (owned by `/prompt-generator`)
+3. Merge + final audit with citation-grounded checks
+4. Targeted fix + capped re-audit loop
+## Sample Input
+Use this command:
+```text
+/prompt-generator Create a trusted final system prompt for a coding agent that edits files safely, follows user scope, and returns concise status updates.
+```
+## Expected Stage Artifacts
+1. **Base stage**
+   - Scope block is present and explicit:
+     - `target_local_roots`
+     - `target_canonical_roots` (if applicable)
+     - `target_file_globs`
+     - `comparison_basis`
+     - `completion_boundary`
+   - XML scaffold includes all sections:
+     - `<role>`
+     - `<context>`
+     - `<instructions>`
+     - `<constraints>`
+     - `<output_format>`
+     - `<examples>`
+   - Includes internal refinement object with:
+     - `pipeline_mode: internal_section_refinement_with_final_audit`
+     - `required_sections` list with all six sections
+     - section/merge/audit output contracts
+2. **Section refinement stage**
+   - Exactly 6 agent runs, one per section.
+   - Each section output includes:
+     - `improved_block`
+     - `rationale`
+     - `concise_diff`
+   - No section agent edits another section.
+3. **Merge stage**
+   - One canonical merged prompt with all six sections.
+4. **Audit stage**
+   - Output includes:
+     - `overall_status`
+     - `checklist_results`
+     - `corrective_edits`
+     - `retry_count`
+   - Every checklist item includes:
+     - `status`
+     - `evidence_quote` (direct quote)
+     - `source_ref`
+     - `fix_if_fail`
+5. **Final output**
+   - One complete prompt block that is copy-pasteable.
+   - Internal refinement object is not shown unless debug output was requested.
+   - Default output must not leak the raw internal refinement object fields.
+## Deterministic Checklist Coverage
+Audit report must include all check IDs:
+- `structured_scoped_instructions`
+- `sequential_steps_present`
+- `positive_framing`
+- `acceptance_criteria_defined`
+- `safety_reversibility_language`
+- `no_destructive_shortcuts_guidance`
+- `concrete_output_contract`
+- `scope_boundary_present`
+- `explicit_scope_anchors_present`
+- `all_instructions_artifact_bound`
+- `no_ambiguous_scope_terms`
+- `completion_boundary_measurable`
+- `citation_grounding_policy_present`
+- `source_priority_rules_present`
+## Citation and Grounding Validation
+For each factual compliance claim in the audit:
+- Include a source citation
+- Include a word-for-word quote
+- If unsupported, explicitly return "I don't know"
+Source priority must be applied in this order:
+1. Official vendor docs (external behavior)
+2. Local project files (local behavior)
+3. Academic / named experts
+4. Reputable external URLs
+5. Blog/community content
+## Non-pass Loop Validation
+If `overall_status` is `fail`:
+1. Apply only targeted edits listed in `corrective_edits`
+2. Re-run audit
+3. Stop after retry cap (`max_retries: 2` unless explicitly overridden)
+4. Return unresolved failures with evidence if still failing at cap
+## Ownership and Execution-Intent Validation
+- Prompt refinement remains inside `/prompt-generator`.
+- `/agent-prompt` is used only after explicit execution/delegation intent.
+- Execution launch metadata includes `execution_intent: explicit`.
+- Final refined prompt content is treated as artifact text during refinement and audit.
+- Execution steps (when requested) are bound to scope block artifacts.
+## Scope-Phrasing Validation
+- Reject ambiguous scope wording such as "this session", "current files", "here", "above", or "as needed" when used as scope boundaries.
+- Require artifact-bound replacements using explicit roots, globs, comparison basis, and measurable completion boundary.
+## Runtime Hook Gate Validation
+Validate fail-closed runtime gates:
+1. **Execution-intent gate (PreToolUse Task/Agent)**
+   - Deny execution when `execution_intent: explicit` marker is missing.
+   - Deny execution when required scope anchors are missing from launch payload.
+2. **Stop leakage/scope/checklist gate**
+   - Block responses that leak raw internal refinement object fields unless debug intent is explicit.
+   - Block responses missing deterministic checklist rows when audit output is present.
+   - Block responses using ambiguous scope phrasing in scope-bound sections.
+## Context-Footprint Controls
+- Keep baseline prompt-workflow policy minimal by default.
+- Store stable enforcement text in hooks/rules; avoid repeating full policy blocks in prompt artifacts.
+- Load heavy skills on demand based on explicit task intent.
+- Prefer canonical references and compact outputs over repeated long policy text.
+## Deterministic vs Semantic Boundary
+- **Deterministic (fail-closed):**
+  - Missing execution intent marker
+  - Missing required scope anchors
+  - Raw internal object leakage without debug intent
+  - Missing required checklist rows in audit output
+  - Ambiguous scope terms in scope-bound text
+- **Semantic-only (auditor layer):**
+  - Overall quality/readability of scope wording beyond banned-term checks
+  - Whether instruction binding quality is "good enough" beyond explicit anchor presence
+  - Whether context compaction is optimal for a specific task
+## Doc Alignment Validation
+Each major workflow requirement added in skills text must map to at least one principle:
+- Structured/scoped instructions
+- Clear sequential process
+- Positive framing
+- Explicit acceptance criteria
+- Concrete output format contract
+- Reversibility/safety constraints
+## Traceability Validation
+Each major requirement in skill text should point to:
+- Anthropic best-practice URL, and/or
+- Local source file path used as authority
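
Reviewer's note (not part of the package): the runbook's "Deterministic Checklist Coverage" and "Audit stage" requirements amount to a structural check on the audit report. The sketch below shows one way to express it, assuming the report is already parsed JSON in the shape given in the agent-prompt diff; `checklist_problems` is a hypothetical helper and is not the logic of the shipped hook files, which this diff view does not show.

```python
# Check IDs copied from the runbook's "Deterministic Checklist Coverage" list.
REQUIRED_CHECK_IDS = [
    "structured_scoped_instructions",
    "sequential_steps_present",
    "positive_framing",
    "acceptance_criteria_defined",
    "safety_reversibility_language",
    "no_destructive_shortcuts_guidance",
    "concrete_output_contract",
    "scope_boundary_present",
    "explicit_scope_anchors_present",
    "all_instructions_artifact_bound",
    "no_ambiguous_scope_terms",
    "completion_boundary_measurable",
    "citation_grounding_policy_present",
    "source_priority_rules_present",
]
# Fields the runbook requires on every checklist item.
REQUIRED_ROW_FIELDS = ("status", "evidence_quote", "source_ref", "fix_if_fail")


def checklist_problems(report: dict) -> list:
    """List missing check IDs and missing per-row fields in an audit report."""
    problems = []
    rows = {row.get("check_id"): row for row in report.get("checklist_results", [])}
    for check_id in REQUIRED_CHECK_IDS:
        row = rows.get(check_id)
        if row is None:
            problems.append(f"missing check: {check_id}")
            continue
        for field in REQUIRED_ROW_FIELDS:
            if field not in row:
                problems.append(f"{check_id}: missing field {field}")
    return problems
```

An empty result means only that the report is structurally complete; whether each `evidence_quote` actually supports its row remains a semantic judgment for the auditor layer.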