npm - claude-dev-env - Versions diffs - 1.16.1 → 1.17.1 - Mend

claude-dev-env 1.16.1 → 1.17.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/package.json +1 -1
package/skills/prompt-generator/ARCHITECTURE.md +17 -0
package/skills/prompt-generator/REFERENCE.md +8 -2
package/skills/prompt-generator/SKILL.md +58 -33
package/skills/prompt-generator/TARGET_OUTPUT.md +29 -16
package/skills/prompt-generator/evals/prompt-generator.json +28 -8
package/skills/prompt-generator/templates/skill-from-ground-up.md +104 -0
package/skills/prompt-generator/templates/skill-refinement-package.md +109 -0
package/skills/skill-builder/SKILL.md +20 -0
package/skills/skill-builder/references/delegation-map.md +2 -0
package/skills/skill-builder/workflows/improve-skill.md +9 -0
package/skills/skill-builder/workflows/new-skill.md +12 -0
package/skills/skill-builder/workflows/polish-skill.md +9 -0
package/skills/skill-writer/SKILL.md +18 -0

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
     "name": "claude-dev-env",
-    "version": "1.16.1",
+    "version": "1.17.1",
     "description": "Claude Code development standards — rules, hooks, agents, commands, and skills",
     "type": "module",
     "bin": {

package/skills/prompt-generator/ARCHITECTURE.md ADDED Viewed

@@ -0,0 +1,17 @@
+# prompt-generator — file map
+Baseline inventory of files in the prompt-generator skill package.
+## Baseline inventory
+| Path | Role |
+| --- | --- |
+| `SKILL.md` | Orchestrator rules, subagent contract, compliance audit |
+| `TARGET_OUTPUT.md` | User-visible output contract for evals and hooks |
+| `REFERENCE.md` | Tiered sources, harness patterns, debug schema |
+| `REFINEMENT_PIPELINE_RUNBOOK.md` | Evidence-grounding runbook |
+| `evals/prompt-generator.json` | Scenario eval rows |
+| `templates/skill-from-ground-up.md` | Net-new skill checkpoint template |
+| `templates/skill-refinement-package.md` | Existing-skill refinement template |
+| `hooks/blocking/prompt-workflow-stop-guard.py` | Stop gate + clipboard |
+| `hooks/blocking/prompt_workflow_gate_core.py` | Fence extraction, markers |

package/skills/prompt-generator/REFERENCE.md CHANGED Viewed

@@ -39,6 +39,12 @@ When authoring or refining prompts, ground decisions in these sources. If guidan
 If sources disagree on a technique, apply in order: Anthropic documentation first (it describes the actual model behavior), then OpenAI/Google/Microsoft (large-scale research with cross-model relevance), then community sources (patterns and intuition, not authoritative on model internals). When Tier 3 contradicts Tier 1, Tier 1 wins without exception.
+### Outcome preview gate and digest (`prompt-generator`)
+Human checkpoint before the paste-ready artifact ships: the orchestrator runs an **Outcome preview** turn (`### Outcome preview` bullets built from the **preview summary**, defined in SKILL.md Terminology) plus **AskUserQuestion** (**Ship** recommended first, two contextual alternates, **Refine with free text**), then emits `Audit`, a single ` ```xml ` fence, and **`## Outcome digest`** after the fence. Rationale matches collaborative checkpoints in `templates/skill-from-ground-up.md` and the refinement pattern in `templates/skill-refinement-package.md`. `ARCHITECTURE.md` lists all files in this skill package.
+**Clipboard safety:** `prompt_workflow_gate_core.extract_fenced_xml_content` concatenates every ` ```xml ` block in the message—follow the sample formatting rules in SKILL.md section 7 so clipboard copy stays the lone artifact body. Full contract: `TARGET_OUTPUT.md`.
 ## Harness design patterns (Anthropic blog, April 2026)
 Primary URL: https://claude.com/blog/harnessing-claudes-intelligence. Structured inventory: `docs/references/anthropic-harnessing-claudes-intelligence-technique-inventory.md`.
@@ -203,7 +209,7 @@ When deciding how to approach a problem, choose an approach and commit to it. Av
 ## Debug JSON schema (prompt-generator pipeline)
-Use **only** when the user explicitly requests debug output (for example `show debug`, `full audit table`, `raw internal object`). Default assistant turns stay **audit line + one `xml` fence**; this object is an optional appendix after that pair.
+Use **only** when the user explicitly requests debug output (for example `show debug`, `full audit table`, `raw internal object`). Default assistant turns complete the normal handoff first: **audit line** + one `xml` fence + **`## Outcome digest`** + optional hook validation block (defined in SKILL.md Terminology; see also `TARGET_OUTPUT.md`); this JSON object is an optional appendix **after** that handoff (and after any hook validation block).
 Shape (field names stable for internal audit helpers and Stop-hook leak detection):
@@ -247,4 +253,4 @@ Shape (field names stable for internal audit helpers and Stop-hook leak detectio
 }
 ```
-`checklist_results` keys must include all **14** compliance row ids from `SKILL.md` §11 (for example `reversible_action_and_safety_check_guidance`, `scope_terms_explicit_and_anchored`).
+`checklist_results` keys must include all **15** compliance row ids from `SKILL.md` §11 (for example `reversible_action_and_safety_check_guidance`, `scope_terms_explicit_and_anchored`).

package/skills/prompt-generator/SKILL.md CHANGED Viewed

@@ -4,10 +4,11 @@ description: >-
   Authors repository-grounded XML prompt artifacts for Claude: system and developer
   instructions, agent harnesses, tool-use patterns, evaluation rubrics, NotebookLM audio
   customization, and MCP or browser automation steering. Gathers scope through discovery
-  and AskUserQuestion, runs the default refinement pipeline in a drafting subagent, and
-  delivers a one-line audit plus one fenced XML block. Trigger when the user asks to write,
-  refine, or improve steering text for Claude. Execution of the described work belongs in
-  /agent-prompt only after the user explicitly confirms they want it run.
+  and AskUserQuestion, runs the default refinement pipeline in a drafting subagent, runs a
+  mandatory Outcome preview AskUserQuestion gate, then delivers a one-line audit, one fenced
+  XML block, and a skimmable Outcome digest after the fence. Trigger when the user asks to
+  write, refine, or improve steering text for Claude. Execution of the described work belongs
+  in /agent-prompt only after the user explicitly confirms they want it run.
 ---
 @packages/claude-dev-env/skills/prompt-generator/REFERENCE.md
@@ -21,37 +22,40 @@ description: >-
 **Harness hygiene:** Re-test harness assumptions about what Claude cannot do alone on each model generation or major product release—stale compensations bottleneck performance as capabilities improve (Hook 1; [Harnessing Claude's intelligence](https://claude.com/blog/harnessing-claudes-intelligence), inventory `docs/references/anthropic-harnessing-claudes-intelligence-technique-inventory.md`).
-**Eval contract:** The user-visible behavior this skill must satisfy is defined in `packages/claude-dev-env/skills/prompt-generator/TARGET_OUTPUT.md`. Automated evals live in `packages/claude-dev-env/skills/prompt-generator/evals/prompt-generator.json`.
+**Eval contract:** The user-visible behavior this skill must satisfy is defined in `packages/claude-dev-env/skills/prompt-generator/TARGET_OUTPUT.md`. Automated evals live in `packages/claude-dev-env/skills/prompt-generator/evals/prompt-generator.json`. **File map:** `ARCHITECTURE.md` lists all files in this skill package and their roles.
-**Terminology:** **Prompt artifact** — the full XML inside the single user-facing `xml` fence (the paste-ready handoff). **Scope block** — the five-key contract in §3A that grounds instructions. **Default refinement pipeline** — §10: base draft → section refine → merge → 15-row compliance audit → capped fixes (subagent-internal unless draft-only). **Light self-check** — §8: fast pre-return sanity pass (shape, tools, scope, patterns); *not* the compliance audit. **Compliance audit (15-row)** — §11: hook-keyed rows that set the `Audit: pass|fail` numerator. **Execution handoff** — `/agent-prompt` after explicit user intent to run work.
+**Templates:** Under `packages/claude-dev-env/skills/prompt-generator/templates/`, `skill-from-ground-up.md` is the collaborative prompt for **net-new** checkpointed Agent Skill packages; `skill-refinement-package.md` is the sibling prompt for **existing-skill** multi-file refinements and package-aware polish. Skill-builder and skill-writer in this repo require implementers to use the matching template before checkpointed package work.
-**Hook-survival invariant (read first):** The fenced XML artifact is the primary deliverable and MUST survive Stop-hook retries. If a Stop hook rejects the response, only the surrounding audit summary and runtime signal scaffolding may change between retries—the XML inside the fence MUST be re-emitted in full on every retry. Recovery pattern: re-emit the complete fenced XML first, then adjust the audit line. Trimming, summarizing, or deferring the prompt artifact to satisfy a hook gate is forbidden.
+**Terminology:** **Prompt artifact** — the full XML inside the single user-facing `xml` fence (the paste-ready output). **Outcome digest** — skimmable `## Outcome digest` markdown **after** that fence on the final turn: what executing the prompt produces, inputs or tools, done criteria, short sample (see `TARGET_OUTPUT.md`). **Outcome preview gate** — mandatory `AskUserQuestion` **after** internal drafting returns candidate XML and **before** the final `Audit` line ships; uses `### Outcome preview` bullets plus confirmation options (**Ship** first, two contextual alternates, **Refine with free text**). **Preview summary** — structured fields the drafting subagent returns to the orchestrator: `final_prompt_xml`, `what_executor_produces`, `primary_inputs_or_tools`, `done_when`, `sample_excerpt_markdown` (about twenty lines; follow the sample formatting rules in SKILL.md section 7). **Scope block** — the five-key contract in §3A that grounds instructions. **Default refinement pipeline** — §10: base draft → section refine → merge → 15-row compliance audit → capped fixes (subagent-internal unless draft-only). **Light self-check** — §8: fast pre-return pass on output shape, tools, scope, and patterns; *not* the compliance audit. **Compliance audit (15-row)** — §11: hook-keyed rows that set the `Audit: pass|fail` numerator. **Execution handoff** — `/agent-prompt` after explicit user intent to run work. **Hook validation block** — structured fields appended after the Outcome digest for hook validation. Fields: `overall_status`, `checklist_results` rows, five scope-anchor tokens, `base_minimal_instruction_layer: true` (signals to the `prompt-workflow-guard` hook that the response includes the required minimal instruction scaffolding: scope anchors, checklist rows, and runtime signals), and `on_demand_skill_loading: true` (signals to the same hook that heavy skills were loaded only when the task explicitly required them, per section 17 context-footprint controls). All other files reference this single definition.
-**Turn shape:** Each orchestrator turn is either **AskUserQuestion** only (then wait for answers), or **`Audit: …` + exactly one `xml` fenced block** (then **send boundary**)—per `TARGET_OUTPUT.md`. Do not substitute free-form question paragraphs for AskUserQuestion; do not append commentary after the closing fence on the default path.
+**Hook-survival invariant (read first):** The fenced XML artifact is the primary deliverable and MUST survive Stop-hook retries. If a Stop hook rejects the response, only the surrounding audit line, **Outcome digest**, hook validation block, and runtime signal scaffolding may change between retries—the XML inside the fence MUST be re-emitted in full on every retry. **Recovery order (user-visible):** match the standard output order: **Audit line** → complete ` ```xml ` block → **## Outcome digest** → optional hook validation block. Re-send that full sequence on each retry attempt; keep the fenced XML payload byte-identical until the gate passes; adjust only the audit prefix or non-fence scaffolding when required. Trimming, summarizing, or deferring the prompt artifact to satisfy a hook gate is forbidden.
-**Happy path:** (1) Choose scenario **1–4** from the router table. (2) Run discovery when that scenario calls for repo tools. (3) Collect answers through **AskUserQuestion** (one form per round, **2–4** options per field, recommended first). (4) Subagent produces XML, runs **light self-check**, then **15-row compliance audit** + refinement loop. (5) Orchestrator prints **`Audit: pass 15/15`** or **`Audit: fail N/15 — [reason]`** and the **complete fenced XML**. (6) **Send boundary:** end the message immediately after the closing fence. (7) If the user names a debug phrase, append the full table / JSON per `TARGET_OUTPUT.md`.
+**Turn shape:** Each orchestrator turn is one of: **AskUserQuestion** only (then wait); **Outcome preview** turn (`### Outcome preview` markdown bullets + **AskUserQuestion** only); or the **final handoff** (`Audit:` + one `xml` fence + `## Outcome digest` + optional hook validation block)—per `TARGET_OUTPUT.md`. Do not substitute free-form question paragraphs for scope clarifications; preview bullets are statements, not standalone interrogative paragraphs.
+**Happy path:** (1) Choose scenario **1–4** from the router table. (2) Run discovery when that scenario calls for repo tools. (3) **AskUserQuestion** (one form per round, **2–4** options per field, recommended first). (4) Subagent produces XML plus **preview summary**, runs **light self-check**, then **15-row compliance audit** + refinement loop. (5) Orchestrator emits **Outcome preview** turn from the preview summary; user confirms or refines (up to **three** preview rounds unless the user raises the cap in chat). (6) Orchestrator prints **`Audit: pass 15/15`** or **`Audit: fail N/15 — [reason]`**, the **complete fenced XML**, then **`## Outcome digest`**, then any hook validation block hooks expect. (7) If the user names a debug phrase, append the full table / JSON **after** digest and hook validation block per `TARGET_OUTPUT.md`.
 **Clarity bar:** Ship concrete, outcome-first copy everywhere (AskUserQuestion fields, audit line, XML body): name *what* to do, *where* it applies, and *how* to verify done—per [Be clear and direct](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#be-clear-and-direct) and [Control the format of responses](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#control-the-format-of-responses). This skill **authors** prompts; downstream execution stays out of the default path until `/agent-prompt`.
 ## Primary mission: paste-ready XML prompts (overrides other delivery instructions)
-**Delivery contract:** Each completed request yields a **repo-grounded XML prompt** a human or agent can paste into a new session. Turns go to discovery, **AskUserQuestion**, subagent drafting, and internal audits until that artifact is ready. **Author vs execution:** this skill ends at the artifact; when the user wants edits, tests, or PRs run for real, they confirm and move to **`/agent-prompt`**.
+**Delivery contract:** Each completed request yields a **repo-grounded XML prompt** a human or agent can paste into a new session, preceded by confirmation at the **Outcome preview gate** and followed by an **Outcome digest** for skimming. Turns go to discovery, **AskUserQuestion**, subagent drafting, preview gate, optional refinement loops, then the final handoff. **Author vs execution:** this skill ends at the artifact plus digest; when the user wants edits, tests, or PRs run for real, they confirm and move to **`/agent-prompt`**.
-**Hook-survival invariant:** Treat the fenced XML as the immutable payload for the user. On every Stop-hook retry, print the **same full** XML between the opening and closing fences; adjust only the one-line audit prefix (or other non-fence scaffolding) if a hook requires a format tweak. Re-emit the **entire** XML body before tweaking surrounding text—never shorten the artifact to pass a gate.
+**Hook-survival invariant:** Treat the fenced XML as the immutable payload for paste operations. On every Stop-hook retry, print the **same full** XML between the opening and closing fences; adjust only the one-line audit prefix (or other non-fence scaffolding) if a hook requires a format tweak. Re-emit the **entire** XML body before tweaking surrounding text—never shorten the artifact to pass a gate. Re-emit the **Outcome digest** after the fence on the same schedule as the successful audit line.
-**Orchestrator vs subagent:** The **orchestrator** runs ordered discovery, issues **AskUserQuestion**, and owns the **final** user-visible line: audit + fence. The **subagent** owns base draft, per-section refinement, merge, and the **15-row compliance audit**, returning **only** final XML plus pass/fail counts (no user-facing table)—unless the user asked for **draft-only** / **no refinement**, in which case you may draft inline with the same output shape. Keep hook retries internal; expose at most one short line such as `Retrying: scope anchor missing` before the successful audit + fence.
+**Orchestrator vs subagent:** The **orchestrator** runs ordered discovery, issues **AskUserQuestion**, runs the **Outcome preview gate**, and owns the **final** handoff: audit + fence + digest (+ hook validation block when used). The **subagent** owns base draft, per-section refinement, merge, the **15-row compliance audit**, and returns **final XML string + pass/fail counts + preview summary** to the orchestrator (no user-facing compliance table)—unless the user asked for **draft-only** / **no refinement**, in which case you may draft inline with the same preview + handoff shape. Keep hook retries internal; expose at most one short line such as `Retrying: scope anchor missing` before the successful handoff.
-**Interaction shape:** Route clarifications through **AskUserQuestion** only. Close each successful artifact turn with **audit line + one fenced XML block**; keep implementation plans **inside** that XML for the downstream consumer, not as a chat to-do list.
+**Interaction shape:** Route scope clarifications through **AskUserQuestion** only. Close each successful run with **audit line + one fenced XML block + Outcome digest**; keep implementation plans **inside** the fenced XML for the downstream consumer, not as a chat to-do list.
 ## User-visible output contract (mandatory)
 Match `TARGET_OUTPUT.md`. Summary:
-1. **Questions:** Use **AskUserQuestion** for every clarification (one multi-field form per round); keep normal assistant text free of standalone question paragraphs.
+1. **Questions:** Use **AskUserQuestion** for every scope clarification (one multi-field form per round); keep normal assistant text free of standalone question paragraphs outside preview bullets.
 2. **Options:** Supply **2–4** options per question, **recommended option first**; label discovery-sourced choices **`[discovered]`**.
-3. **Final message (exactly):** Line 1 = `Audit: pass 15/15` or `Audit: fail N/15 — [short reason]`; immediately after, output **one** Markdown code fence whose language tag is `xml` and whose body is the **complete** prompt; **send boundary** = right after that fence closes—the visible message is exactly those two consecutive blocks, copy-ready together, before any later user message.
-4. **Full audit table / JSON debug object:** Append only after the user uses an explicit debug phrase such as `show debug`, `full audit table`, or `raw internal object`.
-5. **Commit-and-execute:** Pick a drafting approach, run it to completion, ship the XML; change plans only when **new** facts from the user or tools contradict the earlier scope.
+3. **Outcome preview turn:** `### Outcome preview` bullet block (preview summary) plus **AskUserQuestion** with **Ship this outcome profile** first, two contextual alternates, **Refine with free text**; cap at three preview rounds unless the user raises the cap in chat.
+4. **Final message:** Line 1 = `Audit: pass 15/15` or `Audit: fail N/15 — [short reason]`; then **one** ` ```xml ` fence with the **complete** prompt; then **`## Outcome digest`**; then optional hook validation block (checklist rows, scope anchors, runtime signals) for hooks—**paste-ready section** remains the single `xml` fence for downstream paste.
+5. **Full audit table / JSON debug object:** Append only after the user uses an explicit debug phrase such as `show debug`, `full audit table`, or `raw internal object`, and only **after** digest and hook validation block.
+6. **Commit-and-execute:** Pick a drafting approach, carry it through preview confirmation, ship the handoff; change plans only when **new** facts from the user or tools contradict the earlier scope.
 **Required XML sections** inside the fence: `<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`. Optional: `<illustrations>`, `<open_question>` (use for unresolved discovery — see structural invariant D in `TARGET_OUTPUT.md`).
@@ -64,15 +68,17 @@ Match `TARGET_OUTPUT.md`. Summary:
 | **3 — Long unstructured input** | Many requirements / paths in one message | Verify repo references (packages, shared utils, configs) with targeted tools **before** questions | First question **confirms extracted intent**; ambiguities as **specific** options; **every** user-stated requirement captured in the generated XML by name — track all requirements from the unstructured input and confirm coverage before shipping |
 | **4 — Noisy context** | Long unrelated thread before `/prompt-generator` | Build the subagent brief from: the user’s literal `/prompt-generator` text, a **≤120-word** summary of on-topic facts, and discovery notes—**exclude** raw stack traces and unrelated tangents | As needed (often Scenario 1-shaped) |
+**Final handoff (all scenarios):** After drafting, every run uses the **Outcome preview** turn, then the final message **Audit** → ` ```xml ` → `## Outcome digest` → optional hook validation block (`TARGET_OUTPUT.md`).
 **Handoff (Scenario 2):** `<background>` must be **self-contained** — state, **decisions**, files touched, next steps, constraints — so a new session needs no prior chat. Preserve prior decisions verbatim in the handoff; quote the exact decision text where precision matters rather than paraphrasing it away.
 ## Phase ordering (structural invariant A)
 For the **final** user-visible turn that ships the artifact:
-- Compose the message as **audit line → opening fence → XML → closing fence → end**; keep the byte stream free of `tool_use` blocks **between** the opening and closing fences.
+- Compose the message as **audit line → opening fence → XML → closing fence → `## Outcome digest` → optional hook validation block → end**; keep the byte stream free of `tool_use` blocks **between** the opening and closing fences.
 - **Completeness:** End every numbered step inside `<instructions>` with a complete sentence and a fully written list item. Balance every XML tag explicitly (open and close each `<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`). The artifact must be copy-pasteable into a new file with zero manual repair.
-- Global pipeline: **discovery tools** (when applicable) → **AskUserQuestion** → **subagent** (draft + refinement + internal audit) → **one** orchestrator reply containing only audit line + fence.
+- Global pipeline: **discovery tools** (when applicable) → **AskUserQuestion** → **subagent** (draft + refinement + internal audit + **preview summary**) → **Outcome preview** turn → optional refinement loops → **one** orchestrator reply with audit line + fence + digest (+ hook validation block).
 ## Interactive discovery mode (default)
@@ -93,12 +99,22 @@ Issue **one** AskUserQuestion with all fields populated from discovery and the u
 Spawn a **subagent** (Agent tool) with:
-- Scenario id (1–4), user goal, discovery summary, AskUserQuestion answers
-- Instruction: produce **one** well-formed XML prompt (required sections) + run the internal refinement loop and **15-row compliance audit**; return **only** the final XML string and a pass/fail + fail count for that audit (no user-facing table)
+- Scenario id (1–4), user goal, discovery summary, AskUserQuestion answers (and any **Refine with free text** deltas from prior preview rounds)
+- Instruction: produce **one** well-formed XML prompt (required sections) + run the internal refinement loop and **15-row compliance audit**; return **final XML string**, **pass/fail + fail count** for that audit, and **preview summary** fields (`what_executor_produces`, `primary_inputs_or_tools`, `done_when`, `sample_excerpt_markdown` following the sample formatting rules in SKILL.md section 7, about twenty lines max)
+Keep subagent reasoning in the Agent transcript; the user-facing **Outcome preview** turn surfaces the preview summary; the **final** turn contains audit + fence + digest (+ hook validation block).
+### Phase 4 — Outcome preview gate
+1. Render `### Outcome preview` from the **preview summary** (bullets only).
+2. Issue **AskUserQuestion** with **Ship this outcome profile** (recommended), two contextual alternates from discovery, **Refine with free text**.
+3. On **Ship**, go to Phase 5. On an alternate, merge the alternate into the brief and re-run Phase 3. On **Refine with free text**, merge the user text into the brief and re-run Phase 3. Stop after **three** preview rounds unless the user explicitly raises the cap in chat.
-The orchestrator then prints **`Audit: pass 15/15`** or **`Audit: fail N/15 — [reason]`** immediately followed by the fenced XML. Keep subagent reasoning in the Agent transcript; the user-facing turn contains **only** audit + artifact.
+### Phase 5 — Final handoff
-**Draft-only:** If the user explicitly requests no refinement (“quick draft”, “no refinement loop”), the subagent may skip Steps 10–12 below but must still return valid XML and a honest audit line.
+Print **`Audit: pass 15/15`** or **`Audit: fail N/15 — [reason]`**, the **complete fenced XML**, then **`## Outcome digest`** (tightened copy from the accepted preview summary), then any hook validation block the prompt-workflow hooks expect.
+**Draft-only:** If the user explicitly requests no refinement (“quick draft”, “no refinement loop”), the subagent may skip Steps 10–12 below but must still return valid XML, honest audit line, and a **preview summary**; Phases 4–5 still run so the user confirms shape before paste.
 ## Workflow (run in order — primarily inside the drafting subagent)
@@ -175,7 +191,7 @@ Use the optional `<illustrations>` section when concrete samples make format, to
 3. **Triple-backtick inner fence:** When the sample must use backtick fences, emit a **complete pair**: an opening line beginning with three backticks plus an info string (e.g. `` ```bash ``), the sample lines, then a closing line containing only three backticks. The prompt-workflow hook and clipboard path treat that pair as one unit inside the outer `` ```xml `` fence. For the **most stable on-screen rendering** in chat UIs, use step 1 or step 2 above before this option.
 4. **Cap count:** Include **three to five** distinct illustration blocks (narrative plus optional sample) unless the user’s brief asks for a different depth.
-These steps are **machine-facing obligations** for the orchestrator and drafting subagent. The person invoking `/prompt-generator` receives the finished fenced XML; the skill text above is what the model follows when filling `<illustrations>`.
+These steps are instructions for the orchestrator and drafting subagent to follow when filling `<illustrations>`. The person invoking `/prompt-generator` receives the finished fenced XML.
 ### 8. Light self-check (subagent, pre-return)
@@ -199,15 +215,24 @@ Expand the light self-check with this internal checklist when useful:
 ### 9. Deliver (orchestrator)
-The orchestrator’s **only** delivery to the user is:
+The orchestrator’s **final** delivery to the user is, in order:
 ```text
 Audit: pass 15/15
 ```
-(or `fail N/15 — …`), immediately followed by **one** fenced XML block; **send boundary** is immediately after the closing fence so the user receives a copy-ready pair (audit line + artifact) in one assistant message before the conversation continues.
+(or `fail N/15 — …`), immediately followed by **one** fenced `xml` block (paste-ready prompt artifact), immediately followed by **`## Outcome digest`** containing:
+- **What the downstream executor produces** (from `what_executor_produces`)
+- **Primary inputs or tools** (from `primary_inputs_or_tools`)
+- **Done when** (from `done_when`)
+- **Sample excerpt** (from `sample_excerpt_markdown`; follow the sample formatting rules in SKILL.md section 7 because `extract_fenced_xml_content` concatenates every `xml` fence)
+Then append the **hook validation block** (see Terminology above) **after** the digest when the workflow emits it for hooks.
+**Paste-ready section:** Only the ` ```xml ` … ` ``` ` span is intended for clipboard paste into a downstream session; the digest and hook validation block are for reading and hook validation.
-**Render-survival:** When the fenced XML uses tag names that **collide with HTML5 elements** (`section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`), or when the artifact is **very large**, **write the artifact to a file** and give the user the path together with the usual one-line audit. Add a brief **section inventory** (confirming the five required sections) so the user can trust the file even if the inline fence would render poorly. Required grounding uses `<background>` (the old `context` name matched HTML). Details: **TARGET_OUTPUT.md — Structural invariant E**.
+**Render-survival:** When the fenced XML uses tag names that **collide with HTML5 elements** (`section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`), or when the artifact is **very large**, **write the artifact to a file** and give the user the path together with the usual one-line audit. Add a brief **section inventory** (confirming the five required sections) so the user can trust the file even if the inline fence would render poorly. Still emit **Outcome digest** (and file path if used) after the inline fence closes. Required grounding uses `<background>` (the old `context` name matched HTML). Details: **TARGET_OUTPUT.md — Structural invariant E**.
 ### 10. Default refinement mode (subagent-internal)
@@ -249,13 +274,13 @@ For each row, maintain `status`, `evidence_quote`, `source_ref`, and `fix_if_fai
 ### 12. Debug-only bundle (explicit user request only)
-When the user explicitly asks for debug / full audit, emit the markdown table, `scope_block` recap, and the debug JSON **in addition to** the audit line + XML fence.
+When the user explicitly asks for debug / full audit, emit the markdown table, `scope_block` recap, and the debug JSON **in addition to** the audit line + XML fence + **Outcome digest** + any hook validation block.
-**Default user-facing path (keeps Stop hooks green):** After the XML fence, stop—do **not** add a second fenced block, do **not** start the message with `{`, and keep internal pipeline keys (`pipeline_mode`, `scope_block_validation`, `evidence_quotes`, `source_refs`, `corrective_edits`, `retry_count`, `audit_output_contract`, `section_output_contract`, `base_prompt_xml`, `required_sections`) inside the debug JSON only.
+**Default user-facing path (keeps Stop hooks green):** On non-debug turns, after the `xml` fence, emit **Outcome digest** and hook validation block, then **stop**—do **not** add a second outer fenced block for debug payloads, do **not** start the assistant message with `{`, and keep internal pipeline keys (`pipeline_mode`, `scope_block_validation`, `evidence_quotes`, `source_refs`, `corrective_edits`, `retry_count`, `audit_output_contract`, `section_output_contract`, `base_prompt_xml`, `required_sections`) inside the debug JSON only.
-**Debug JSON shape:** Full schema and field definitions: **REFERENCE.md** → **Debug JSON schema (prompt-generator pipeline)**. Use that object only on debug requests; default turns remain audit line + single `xml` fence.
+**Debug JSON shape:** Full schema and field definitions: **REFERENCE.md** → **Debug JSON schema (prompt-generator pipeline)**. Use that object only on debug requests.
-**Hook-recovery (default path):** Print the **complete** fenced XML again, then the **one-line** audit; keep every XML section intact while you adjust scaffolding to satisfy the hook.
+**Hook-recovery (default path):** Print the full handoff again in standard output order: **Audit line**, complete ` ```xml ` fence, **## Outcome digest**, then hook validation block when used. Keep every XML section inside the fence intact while you adjust audit prefix or scaffolding outside the fence to satisfy the hook.
 ### 13. Scope quality rule for generated prompts
@@ -282,9 +307,9 @@ Use `/agent-prompt` only after the user explicitly asks to execute. Refinement s
 ### 17. Context-footprint controls
-Keep orchestrator turns minimal: discovery → AskUserQuestion → subagent → one-line audit + fence. Push heavy drafting to the subagent with a **curated** brief (especially Scenario 4).
+Keep orchestrator turns structured: discovery → **AskUserQuestion** → subagent → **Outcome preview** turn → one final message (audit + fence + digest + hook validation block). Push heavy drafting to the subagent with a **curated** brief (especially Scenario 4).
-**Low-context defaults:** Keep the base instruction layer in generated prompts lean—scope anchors, checklist-backed behaviors, and inert-content safety where hooks apply. Store stable enforcement text in hooks/rules instead of pasting full policy into every XML artifact. Load heavy skills only when the user’s task explicitly needs them. Prefer pointers to **REFERENCE.md** over repeating long excerpts; default user-visible output stays audit line + single `xml` fence unless the user requests debug.
+**Low-context defaults:** Keep the base instruction layer in generated prompts lean—scope anchors, checklist-backed behaviors, and inert-content safety where hooks apply. Store stable enforcement text in hooks/rules instead of pasting full policy into every XML artifact. Load heavy skills only when the user’s task explicitly needs them. Prefer pointers to **REFERENCE.md** over repeating long excerpts; default user-visible output stays audit line + single `xml` fence + **Outcome digest** (+ hook validation block when used) unless the user requests debug extras.
 ## Claude 4.6 considerations

package/skills/prompt-generator/TARGET_OUTPUT.md CHANGED Viewed

@@ -2,19 +2,31 @@
 This file is the **target output spec** for eval-driven iteration of the `prompt-generator` skill. Evals assert behavior against it; update this document and `SKILL.md` together when the contract changes.
+**File map:** `ARCHITECTURE.md` lists all files in this skill package and their roles.
 **Methodology:** [Anthropic — Agent Skills: evaluation and iteration](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices#evaluation-and-iteration)
+## Terminology
+- **Outcome preview gate** — Mandatory `AskUserQuestion` turn **after** the drafting subagent returns candidate XML internally and **before** the orchestrator emits the final `Audit` line and fenced artifact. Confirms the user recognizes what executing the generated prompt will produce.
+- **Outcome digest** — Skimmable markdown block **after** the closing ` ``` ` of the single `xml` fence on the final handoff: bullets for downstream deliverables, primary inputs or tools, done criteria, and a short sample excerpt (see `SKILL.md` §9).
 ## User-visible output contract
-- **Clarity bar:** Every deliverable (AskUserQuestion fields, audit line, XML body) states concrete outcomes, explicit formats, and checkable done-when signals—aligned with Anthropic [Be clear and direct](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#be-clear-and-direct) and [Control the format of responses](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#control-the-format-of-responses). Prefer what to do and how to verify it over empty prohibitions or vague quality adjectives.
+- **Clarity bar:** Every deliverable (`AskUserQuestion` fields, audit line, XML body, outcome digest) states concrete outcomes, explicit formats, and checkable done-when signals—aligned with Anthropic [Be clear and direct](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#be-clear-and-direct) and [Control the format of responses](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices#control-the-format-of-responses). Prefer what to do and how to verify it over empty prohibitions or vague quality adjectives.
 - **Questions:** Deliver every clarifying question through **AskUserQuestion** (one form per round), with **2–4** options per question and the **recommended** option listed **first**. Tag discovery-sourced options **`[discovered]`** when they came from repo search.
+- **Outcome preview turn (mandatory):** Immediately before the final handoff, emit exactly one assistant turn that contains:
+  1. A markdown block titled `### Outcome preview` with bullets only: **What the downstream executor produces**, **Primary inputs or tools**, **Done when**, **Sample excerpt** (about twenty lines max; follow the sample formatting rules in SKILL.md section 7).
+  2. **AskUserQuestion** with **2–4** options: **Ship this outcome profile** (recommended first), two **contextual alternates** grounded in discovery, and **Refine with free text** (starts another drafting loop). Observe the preview round cap per SKILL.md Phase 4.
 - **Final assistant message (complete handoff in one send):**
   1. **Audit line:** `Audit: pass 15/15` or `Audit: fail N/15 — [reason]`
   2. **Artifact:** the full XML prompt inside **one** Markdown code fence whose language tag is `xml`
-  3. **Send boundary:** stop typing as soon as the closing fence ends—the message body is exactly those two blocks back-to-back, ready to copy; your next tokens belong to the user’s following turn
-- **Full audit table / JSON debug bundle:** Stay internal until the user names debug with a phrase such as `show debug`, `full audit table`, or `raw internal object`; then append the table/JSON after the usual audit line + XML fence.
-- **Hook retries:** Keep retry loops inside the subagent or internal pipeline; the user sees at most one short status line such as `Retrying: scope anchor missing` before the successful audit line + fence.
-- **Decision stability:** Pick one drafting approach, carry it to a complete XML artifact, then stop. Change approach only when the user or tool results add **new** facts that contradict the earlier plan; if the draft fails checks, fix forward inside the same structure instead of restarting from scratch.
+  3. **Outcome digest:** after the closing fence, a `## Outcome digest` section repeating (or lightly tightening) the skimmable bullets and sample so the user can verify the paste-ready prompt matches intent **without** opening the whole XML
+  4. **Hook validation block (when used, defined in SKILL.md Terminology):** When the workflow emits the hook validation block, place it **after** the Outcome digest so `extract_fenced_xml_content` still returns only the XML body for clipboard copy
+  5. **Paste-ready section:** The paste-ready prompt artifact remains the single ` ```xml ` block; the digest and hook validation block are for reading and hook validation, not for pasting into the downstream session
+- **Full audit table / JSON debug bundle:** Stay internal until the user names debug with a phrase such as `show debug`, `full audit table`, or `raw internal object`; then append the table/JSON **after** the Outcome digest and any hook validation block.
+- **Hook retries:** Keep retry loops inside the subagent or internal pipeline; the user sees at most one short status line such as `Retrying: scope anchor missing` before the successful audit line + fence + digest.
+- **Decision stability:** Pick one drafting approach, carry it through preview confirmation, then ship; change approach only when **new** facts from the user or tools contradict the earlier plan; if the draft fails checks, fix forward inside the same structure instead of restarting from scratch.
 ## Scenario 1: Fresh chat with brief goal
@@ -24,7 +36,7 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
 **Q&A:** One AskUserQuestion with **2–4** questions covering: scope (which subtree), audience (human vs agent consumer), desired downstream output shape, and hard constraints (tests, CODE_RULES, deadlines). Populate options from discovery paths and package names.
-**Output:** Send audit line, then one `xml` fence with the full prompt, then stop—the handoff message is complete.
+**Output:** After drafting, run the **Outcome preview** turn; then send audit line, `xml` fence, **Outcome digest**, and any hook validation block—the handoff message is complete.
 ## Scenario 2: Session handoff
@@ -34,7 +46,7 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
 **Q&A:** One AskUserQuestion with **1–2** questions, e.g. “Rank these next actions for the new session” or “Exclude these areas from scope,” each with **2–4** concrete options drawn from the thread.
-**Output:** Send audit line, then one `xml` fence with the full prompt, then stop—the handoff message is complete.
+**Output:** After drafting, run the **Outcome preview** turn; then send audit line, `xml` fence, **Outcome digest**, and any hook validation block.
 **Handoff prompt quality:** `<background>` must include the bullet lists above so a new session can continue with **zero** access to this chat. Quote decision text verbatim where precision matters.
@@ -48,13 +60,13 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
 **Requirements checklist:** The generated XML must mention every user-stated requirement by name (timeouts, selectors, config extraction, TDD, CODE_RULES, test safety, etc.); if one is out of scope, put the reason in `<open_question>`.
-**Output:** Send audit line, then one `xml` fence with the full prompt, then stop—the handoff message is complete.
+**Output:** After drafting, run the **Outcome preview** turn; then send audit line, `xml` fence, **Outcome digest**, and any hook validation block.
 ## Scenario 4: Noisy context, stable output
 **Trigger:** `/prompt-generator ...` after a long thread with unrelated topics, tool errors, or tangents.
-**Output shape:** Same as Scenario 1: audit line, one `xml` fence, immediate send boundary after the closing fence.
+**Output shape:** Same as Scenario 1 for the final message: audit line, one `xml` fence, **Outcome digest**, then any hook validation block.
 **Content focus:** Keep the generated XML aligned with the latest `/prompt-generator` request (e.g. “security-focused code review agent”). Populate the subagent brief from: the user’s literal request string, a **one-paragraph** summary of on-topic facts, and path-grounded discovery notes—leave stack traces, failed commands, and off-topic thread history out of that brief so they never reach the XML body.
@@ -62,17 +74,17 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
 **Delegation:** Give the drafting subagent a **curated** brief under ~2k tokens when possible: request string + summary + discovery snippets—enough context to draft, without attaching the full raw transcript.
-## Structural invariant A — Tool-free artifact tail
+## Structural invariant A — Tool-free artifact output
-- **Order:** discovery tool calls (when used) → AskUserQuestion → subagent (draft + internal audit) → **one** final assistant message.
-- **Final message composition:** That message is plain text only, in order: audit line → opening fence → XML body → closing fence → end-of-message. Run every `tool_use` in earlier turns; between the opening and closing fence, emit only the characters of the XML payload.
+- **Order:** discovery tool calls (when used) → **AskUserQuestion** → subagent (draft + internal audit) → **Outcome preview** turn (`### Outcome preview` + **AskUserQuestion**) → optional refinement loops → **one** final assistant message.
+- **Final message composition:** That message is plain text only, in order: audit line → opening ` ```xml ` fence → XML body → closing fence → `## Outcome digest` → optional hook validation block → end-of-message. Run every `tool_use` in earlier turns; between the opening and closing `xml` fence, emit only the characters of the XML payload.
 ## Structural invariant B — Fenced block closes cleanly
-- Use one opening ``` and one closing ``` for the artifact.
+- Use one opening ``` and one closing ``` for the **xml** artifact.
 - Balance every XML tag; close `<instructions>`, `<background>`, etc. explicitly.
 - End each numbered step inside `<instructions>` with a complete sentence and a fully written list item.
-- The user can copy from the opening ``` through the closing ``` into a new file without manual repair.
+- The user can copy from the opening ``` through the closing ``` of the **xml** fence into a new file without manual repair.
 ## Structural invariant C — Discovery before lock-in
@@ -91,8 +103,9 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
 - **Problem (HTML):** Tag names used for prompt XML sections can overlap **HTML5 element names**. Chat renderers may treat those tokens as HTML and hide or alter the content between tags. High-risk examples include: `section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`. The former required name `context` matched an HTML element; **required** sections now use `<background>` for situational grounding so the name stays off that list. The raw assistant text may be complete while the **rendered** message looks like sections are missing.
 - **Problem (nested Markdown fences):** A ` ```bash ` (or other inner) line inside the outer ` ```xml ` block is still a line of text in the transcript, but many Markdown renderers treat it as **opening a nested code fence**, which **closes the outer fence early**. Everything after that point (including `</illustrations>` and other closing tags) can appear outside the code block or look “swallowed.” Hooks historically used a regex that stopped at the **first** triple-backtick line; `extract_fenced_xml_content` now walks inner fences (` ```lang ` … closing `` ``` ``) before accepting the outer `` ``` `` that ends the `xml` block.
+- **Outcome digest:** Follow the sample formatting rules in SKILL.md section 7 inside `## Outcome digest` so a second outer ` ```xml ` block never appears—multiple `xml` fences concatenate in `extract_fenced_xml_content` and would corrupt clipboard copy.
 - **Primary mitigation:** When the fenced XML artifact **contains any tag whose local name is on the HTML-collision list**, or when the artifact is **large enough that render truncation is likely**, the orchestrator **must write the full artifact to a file** (default: under `data/prompts/` or a path the user supplied earlier) and **paste the absolute file path** in the chat message. Pair the path with a **short section inventory** confirming all five required sections (`role`, `background`, `instructions`, `constraints`, `output_format`) are present in the file.
-- **Authoring rules for code inside `<illustrations>` (orchestrator + drafting subagent — see `SKILL.md` §7):** (1) Format each sample line with **four leading spaces** inside `<illustrations>` as the default for stable rendered chat. (2) **Or** use a **tilde fence**: `~~~` + optional language on the opening line, body, then `~~~` on its own line. (3) **Or** use a **complete triple-backtick pair** (opening `` ```lang `` line, body, closing `` ``` `` line); hooks and clipboard treat the pair as one unit inside the outer `` ```xml `` fence.
+- **Authoring rules for code inside `<illustrations>`:** Follow the sample formatting rules in SKILL.md section 7. Hooks and clipboard treat complete triple-backtick pairs as one unit inside the outer `` ```xml `` fence.
 - **Fallback when file write is unavailable:** Escape the **opening angle bracket** of colliding tags (for example `&lt;section>` — user restores `<` when pasting) or use another distinctive wrapper **documented in the same message**, so the user can recover literal XML. State explicitly that the user should restore brackets when copying into another system.
 - **Structural safety net:** Regardless of renderer behavior, the **Stop hook section-presence gate** blocks any prompt-workflow response whose fenced XML is missing any required opening/closing section tag pair. Methodology: [Anthropic — Agent Skills: evaluation and iteration](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices#evaluation-and-iteration).
@@ -106,7 +119,7 @@ Include at least:
 - `<constraints>...</constraints>`
 - `<output_format>...</output_format>`
-Add `<illustrations>` when format or tone is easy to misunderstand; nest sections when the task has natural hierarchy. **Long code samples belong in `<illustrations>`** using the same ordered choices as Structural invariant E: four-space-indented lines first, then tilde fences, then a complete triple-backtick pair if the brief requires backtick fences (see `SKILL.md` §7).
+Add `<illustrations>` when format or tone is easy to misunderstand; nest sections when the task has natural hierarchy. **Long code samples belong in `<illustrations>`** — follow the sample formatting rules in SKILL.md section 7.
 ## Internal 15-row compliance checklist (audit numerator)

package/skills/prompt-generator/evals/prompt-generator.json CHANGED Viewed

@@ -13,8 +13,10 @@
         "Discovery tool calls (Glob/Grep) execute before any AskUserQuestion",
         "All questions delivered via AskUserQuestion — zero questions in direct chat text",
         "AskUserQuestion contains 2-4 questions, each with 2-4 options, recommended option first",
-        "Final response contains exactly: 1-liner audit status + one fenced XML prompt block",
-        "No commentary, tables, audit rows, or explanation outside the fenced block",
+        "After internal drafting finishes, one assistant turn shows ### Outcome preview with bullets only: what executing the generated prompt will produce, primary inputs or tools, done-when, short sample (TARGET_OUTPUT.md)",
+        "That same turn uses AskUserQuestion (2-4 options): recommended option first = user confirms the preview matches their intent and proceeds to the final handoff; two options offer scope or emphasis shifts grounded in discovery; one option collects free-text refinements and triggers another drafting pass (at most three such preview rounds unless the user raises the cap in chat)",
+        "Final response: Audit line, one Markdown code fence tagged xml with the full prompt, then ## Outcome digest after the closing fence",
+        "No second outer ```xml fence in the digest (samples use four spaces or tilde fences only)",
         "Fenced block contains <role>, <background>, <instructions>, <constraints>, <output_format>",
         "Prompt generation delegated to a subagent (Agent tool call visible in the flow)"
       ]
@@ -34,7 +36,8 @@
         "No redundant discovery tool calls for information already in conversation",
         "Handoff prompt is self-contained — a new session can resume without prior context",
         "Prior decisions preserved in the handoff, not lost or paraphrased away",
-        "Final output: 1-liner audit + fenced XML prompt, nothing else"
+        "After internal drafting finishes, ### Outcome preview bullets plus AskUserQuestion as in eval id 1 (confirm match as recommended first option; two contextual alternates; free-text refine option; preview loop cap)",
+        "Final output: 1-liner audit + fenced XML prompt + ## Outcome digest after the fence"
       ]
     },
     {
@@ -48,7 +51,8 @@
         "Ambiguities surfaced as specific options, not open-ended questions",
         "Discovery tool calls verify references from input (shared_utils, config patterns)",
         "ALL requirements from unstructured input captured (timeouts, selectors, config extraction, TDD, code rules, test safety) — none dropped",
-        "Final output: 1-liner audit + fenced XML prompt, nothing else"
+        "After internal drafting finishes, ### Outcome preview bullets plus AskUserQuestion as in eval id 1 (confirm match as recommended first option; two contextual alternates; free-text refine option; preview loop cap)",
+        "Final output: 1-liner audit + fenced XML prompt + ## Outcome digest after the fence"
       ]
     },
     {
@@ -58,7 +62,8 @@
       "prompt": "[Preceded by 80+ turns: failed git push, hook debugging, unrelated Samsung portal discussion, Python tracebacks, Midjourney tangent, 15+ empty Grep results] /prompt-generator Write a system prompt for a code review agent that checks for security vulnerabilities",
       "files": [],
       "expected_behavior": [
-        "Output format identical to Scenario 1: 1-liner audit + fenced XML prompt",
+        "Output format matches Scenario 1: 1-liner audit + fenced XML prompt + ## Outcome digest after the fence",
+        "After internal drafting finishes, ### Outcome preview bullets plus AskUserQuestion as in eval id 1 (confirm match as recommended first option; two contextual alternates; free-text refine option; preview loop cap)",
         "Prompt content about code review and security — zero contamination from prior noise",
         "No references to prior errors, tangents, or unrelated tool calls in the prompt",
         "XML structure complete and well-formed — no truncation from context pressure",
@@ -74,8 +79,8 @@
       "expected_behavior": [
         "No tool_use blocks appear after the first fence marker of the canonical prompt artifact",
         "All Glob/Grep discovery calls precede the AskUserQuestion",
-        "All AskUserQuestion interactions precede the fenced block",
-        "Review the last successful Audit + fenced xml pair; blocked retry attempts preserved by flattened transcript exports do not count as additional delivered artifacts"
+        "AskUserQuestion interactions precede the subagent; Outcome preview AskUserQuestion precedes the final Audit line and xml fence",
+        "Review the last successful Audit + fenced xml pair; blocked retry attempts preserved by exported conversation logs do not count as additional delivered artifacts"
       ]
     },
     {
@@ -85,7 +90,7 @@
       "prompt": "/prompt-generator Write a detailed agent-harness prompt for a TDD bug-fix workflow that traces a routing error across 5+ files, with state management for multi-window execution and structured test tracking",
       "files": [],
       "expected_behavior": [
-        "The canonical prompt artifact has one opening xml fence and one matching closing fence; flattened transcript exports are normalized to that same boundary before review",
+        "The canonical prompt artifact has one opening xml fence and one matching closing fence; exported conversation logs are normalized to that same boundary before review",
         "Every XML tag properly opened and closed",
         "No truncation at numbered-list bullets (the Issue #41 failure mode)",
         "No mid-sentence cuts or incomplete sections",
@@ -182,6 +187,21 @@
         "missing_required_xml_sections sees closing tags for role, background, instructions, constraints, output_format when those appear after nested fences",
         "SKILL.md §7 states ordered authoring steps for <illustrations>: four-space-indented sample lines, then tilde fences, then a complete triple-backtick pair when required"
       ]
+    },
+    {
+      "id": 14,
+      "name": "outcome_preview_gate_and_digest_placement",
+      "scenario": "Outcome preview + post-fence digest (refinement contract)",
+      "prompt": "/prompt-generator Write a short user-task prompt for triaging GitHub issues by label in this repo",
+      "files": [],
+      "expected_behavior": [
+        "Subagent returns final XML plus preview summary fields for orchestrator use",
+        "### Outcome preview markdown block precedes AskUserQuestion; bullets cover executor output, inputs or tools, done when, sample excerpt (~20 lines max)",
+        "AskUserQuestion: recommended first option labels accepting the described outcome and proceeding (SKILL.md may phrase this as 'Ship this outcome profile' or equivalent); plus two contextual alternates and a free-text refinement path; at most three preview rounds unless user extends cap in chat",
+        "At most three preview refinement loops unless user raises cap in chat",
+        "Final handoff order: Audit line, single ```xml fence, ## Outcome digest, then optional hook validation block (defined in SKILL.md Terminology) after digest",
+        "extract_fenced_xml_content returns only the XML body (digest uses no second ```xml fence)"
+      ]
     }
   ]
 }

package/skills/prompt-generator/templates/skill-from-ground-up.md ADDED Viewed

@@ -0,0 +1,104 @@
+# Template: ground-up skill build (collaborative)
+Use this artifact when you want a **new** Agent Skill package built **architecture-first**, then **implementation in layers**, with **human checkpoints** and **evaluation rows tied to real evidence** (pasted threads or explicit user approval), aligned with [Agent Skills best practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).
+## Before you paste
+Replace every `[[TOKEN]]` below (keep the brackets off in the final prompt you paste into a session, or substitute literal values).
+| Token | Meaning |
+| --- | --- |
+| `[[SKILL_PURPOSE]]` | Two to five sentences: capability, user pain, success shape. |
+| `[[WORKSPACE_ROOT]]` | Repo-relative folder for all Phase A–B files (e.g. `my-skill-workspace/`). |
+| `[[DESIGN_INPUT_GLOB]]` | Hypothesis or gap doc path (e.g. `my-skill-workspace/gap-analysis.md`), or `None` carried into `ARCHITECTURE.md` `Design inputs`. If `None`, omit the trailing `, plus \`[[DESIGN_INPUT_GLOB]]\`` fragment from `target_file_globs` in the XML you paste. |
+| `[[OPTIONAL_PACKAGE_TARGET]]` | Future install path (e.g. `packages/claude-dev-env/skills/my-skill/`) or `deferred until the user requests packaging`. |
+| `[[SKILL_SLUG]]` | Short kebab-case id for eval filename and metadata (e.g. `my-skill`). |
+| `[[DESCRIPTION_TRIGGERS]]` | Comma-separated phrases for third-person `description` triggers (user says, user asks, …). |
+| `[[EVIDENCE_RULE]]` | One sentence: what counts as authorized eval material (e.g. pasted excerpts only). |
+Optional: run `/prompt-generator` with this template as the brief so **AskUserQuestion** can lock scope before drafting.
+---
+## Paste-ready XML (after substitution)
+```xml
+<role>
+You are a skill architect and implementer for Claude Agent Skills. You build a skill package under human review. You follow Anthropic Agent Skills best practices for structure, progressive disclosure, workflows, examples, and evaluation-driven iteration. You treat the user as the sole authority for real evaluation evidence.
+</role>
+<background>
+Motivation: [[SKILL_PURPOSE]] The user builds collaboratively: evaluation scenarios must trace to material the user pasted or explicitly approved in thread. Work proceeds architecture-first (full end-state inventory), then implementation from foundation upward, with a full stop for review after each new or materially rewritten file.
+Scope block (ground every instruction here):
+- target_local_roots: `[[WORKSPACE_ROOT]]` at the repository root of this project (primary drafting and documentation area).
+- target_canonical_roots: Optional future install under `[[OPTIONAL_PACKAGE_TARGET]]`; until packaging is requested, keep all writes inside `[[WORKSPACE_ROOT]]`.
+- target_file_globs: `[[WORKSPACE_ROOT]]ARCHITECTURE.md`, `[[WORKSPACE_ROOT]]SKILL.md`, `[[WORKSPACE_ROOT]]REFERENCE.md`, `[[WORKSPACE_ROOT]]EXAMPLES.md`, `[[WORKSPACE_ROOT]]WORKFLOWS.md`, `[[WORKSPACE_ROOT]]evals/*.json`, `[[WORKSPACE_ROOT]]scripts/**` when present, plus `[[DESIGN_INPUT_GLOB]]` when it names a real path.
+- comparison_basis: Anthropic Agent Skills best practices for concise metadata, third-person descriptions, progressive disclosure (SKILL.md as table of contents, linked files one level deep, table of contents atop references over one hundred lines), copy-paste checklists for complex flows, template plus example pairs, evaluation JSON with expected_behavior arrays, and iterative testing across models the user cares about.
+- completion_boundary: Phase A delivers `[[WORKSPACE_ROOT]]ARCHITECTURE.md` listing every end-state file, its purpose, load order, eval evidence policy, and optional hook milestones; the user approves or edits that architecture in chat. Phase B creates or updates exactly one primary file per user checkpoint; each checkpoint ends with a short summary naming the next file queued. Phase C closes when SKILL.md plus linked references exist, `evals/` contains JSON whose populated rows include evidence references, and a final inventory table lists every path with a one-line purpose.
+Hypothesis note: When `[[DESIGN_INPUT_GLOB]]` names a file, treat that file as design input; each eval row still needs a matching pasted thread excerpt or explicit user approval before it counts as grounded behavior truth.
+Source priority for structure and quality bars: Anthropic Agent Skills documentation and prompting best practices first; user pasted materials for behavioral truth; repository AGENTS.md or package README when they touch path layout.
+Citation policy: Quote user-pasted excerpts inside evaluation records or EXAMPLES.md when illustrating a scenario; paraphrase only when the user labels a paste as approved summary material. Ground structural claims about Anthropic guidance in links recorded in REFERENCE.md.
+Evaluation evidence rule: [[EVIDENCE_RULE]]
+</background>
+<instructions>
+1. Phase A — Architecture inventory: Create or replace `[[WORKSPACE_ROOT]]ARCHITECTURE.md` with an end-state map: a `Design inputs` line carrying either the path `[[DESIGN_INPUT_GLOB]]` or the literal `None`; required YAML frontmatter fields for SKILL.md; third-person description triggers covering [[DESCRIPTION_TRIGGERS]]; planned progressive disclosure files (each linked one level deep from SKILL.md); workflow checklist locations; EXAMPLES.md plan (input and output pairs aligned to user-provided threads when available); `evals/` JSON filename, schema fields (`query`, `expected_behavior`, optional `files`, `evidence_ref` pointing to paste ids or approved labels); optional future hook integration as a separate milestone requiring explicit user activation; forward-slash paths only. Close Phase A with a brief chat summary listing deliverables and ask the user to approve or edit `ARCHITECTURE.md` before Phase B begins.
+2. Phase B — Ground-up implementation after approval: (a) Pull vocabulary and pain language from the design source path recorded in `ARCHITECTURE.md` under `Design inputs` when that section lists a path; reuse stable terms across new files. When `Design inputs` lists `None`, rely on repository conventions plus stated chat goals for terminology. (b) Draft `[[WORKSPACE_ROOT]]SKILL.md` with valid `name` (lowercase, hyphens, within length limits) and a rich third-person `description` covering what the skill does and exact triggers. (c) Add `REFERENCE.md` for terminology, links to Anthropic docs, and eval evidence rules. (d) Add `EXAMPLES.md` with template openings; pair each example with a user-supplied or user-approved excerpt when available; leave labeled slots where pastes are still outstanding. (e) Add `WORKFLOWS.md` containing copy-paste checklists for the main user journeys. (f) Create `evals/[[SKILL_SLUG]].json` using the schema from ARCHITECTURE.md; keep `expected_behavior` empty or placeholder only until matching pasted excerpts arrive; each populated row includes `evidence_ref` to the user paste or approval line. (g) Keep SKILL.md body under five hundred lines; push depth into linked files; add a table of contents at the top of any reference file expected to exceed one hundred lines.
+3. Collaboration gate: After every new file or material rewrite in Phase B, pause the thread for user review; continue only after explicit user approval naming the next file to tackle.
+4. Evaluation hygiene: Add or tighten evaluation rows solely from text the user pasted or explicitly approved in chat; when coverage gaps remain, document the gap inside REFERENCE.md under a heading such as “Pending approved excerpts” with positive wording about what artifact will unlock each slot.
+5. Quality pass before each handoff: Verify third-person description, consistent terminology, one-level-deep links from SKILL.md, forward-slash paths, and checklist presence for multi-step flows.
+6. Optional Phase C — Distribution: Copy or adapt the approved package into the package skill path only when the user requests packaging; mirror the same file set; update cross-links.
+</instructions>
+<constraints>
+- Keep writes primarily inside `[[WORKSPACE_ROOT]]` until the user expands scope in chat.
+- Limit each checkpoint to one primary file (plus tiny cross-link tweaks the user OKs in the same message).
+- Prefer additive edits: create new files listed in ARCHITECTURE.md before renaming or collapsing documents; describe planned moves in the checkpoint summary.
+- Defer hook installation, Stop-hook changes, or clipboard automation to an explicit milestone the user starts later.
+- Align naming with Anthropic guidance: gerund or action-oriented skill names; descriptive file names; ship only path segments that name the skill domain clearly (for example `reference/`, `evals/`, `EXAMPLES.md`).
+</constraints>
+<output_format>
+Every assistant milestone message uses this shape:
+1. `## Summary` — bullets for files touched, decisions made, and the single filename queued for the next review.
+2. `## Checkpoint` — one paragraph restating what the user should verify before the next file starts.
+3. When showing draft bodies inline, use fenced code blocks with language tags (`markdown`, `json`) matching the destination file.
+4. Phase A exit criterion: user approves `ARCHITECTURE.md` content in chat or edits it and signals approval.
+5. Phase B exit criterion: SKILL.md, REFERENCE.md, EXAMPLES.md, WORKFLOWS.md, and `evals/*.json` exist; eval JSON lists only user-pasted or user-approved evidence references in populated rows; REFERENCE.md lists any still-empty slots.
+6. Ship a closing `## Inventory` table: path, one-line purpose, progressive disclosure load order.
+Light self-check the implementer runs silently before sending each milestone: scope paths remain under `[[WORKSPACE_ROOT]]` while Phase C stays inactive; when Phase C is active, allow writes under the package target too; collaboration gate honored; evaluation rows tied to approved evidence only; links one level deep from SKILL.md; forward slashes throughout.
+</output_format>
+<illustrations>
+    Phase A summary template (markdown):
+    ## Architecture ready for review
+    - End-state files: [list]
+    - Eval evidence policy: [restated in one line]
+    - Next action after your OK: create `SKILL.md` skeleton
+    Eval row stub (JSON shape):
+    {
+      "skills": ["[[SKILL_SLUG]]"],
+      "query": "[paste user question text here]",
+      "files": [],
+      "expected_behavior": [],
+      "evidence_ref": "[chat message id or user approval line]"
+    }
+    Checkpoint paragraph pattern:
+    Please review `REFERENCE.md`. When ready, reply with approval to proceed to `EXAMPLES.md`.
+</illustrations>
+```
+---
+Notation: The claude-dev-env **skill-builder** and **skill-writer** skills reference this file as a required step for **net-new** full skill packages. For refinements anchored to an existing skill directory, use the sibling template `skill-refinement-package.md` in this folder. Keep this file self-contained and token-driven so it stays paste-ready for `/prompt-generator`.

package/skills/prompt-generator/templates/skill-refinement-package.md ADDED Viewed

@@ -0,0 +1,109 @@
+# Template: skill package refinement (collaborative)
+Use this artifact when you are **refining an existing** Agent Skill package **architecture-first**, then **applying changes in layers**, with **human checkpoints** and **evaluation rows tied to real evidence** (observation transcripts, pasted threads, or explicit user approval). Same orchestration shape as `skill-from-ground-up.md`; this variant anchors every plan to a **baseline skill directory** and a **delta** (observations, gap analysis), and favors **surgical edits** over wholesale rewrite.
+Source layout mirrors: [Agent Skills best practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).
+## Before you paste
+Replace every `[[TOKEN]]` below (drop brackets in the final prompt, or substitute literals).
+| Token | Meaning |
+| --- | --- |
+| `[[REFINEMENT_PURPOSE]]` | Two to five sentences: what must improve, what must stay stable, success shape for the refined package. |
+| `[[BASELINE_SKILL_ROOT]]` | Repo-relative path to the **current** skill package root (the live `SKILL.md` directory you read first). |
+| `[[WORKSPACE_ROOT]]` | Repo-relative folder for `ARCHITECTURE.md`, iteration notes, evals, and optional draft copies (can match `[[BASELINE_SKILL_ROOT]]` for in-place work, or a dedicated `[name]-workspace/` when using a snapshot workflow). |
+| `[[DESIGN_INPUT_GLOB]]` | Observation or gap doc path (e.g. `[name]-workspace/gap-analysis.md`). Carry into `ARCHITECTURE.md` under `Observation inputs`. If truly absent, set `Design inputs` to `None` in `ARCHITECTURE.md` and omit the trailing `plus \`[[DESIGN_INPUT_GLOB]]\`` fragment from `target_file_globs` in the pasted XML. |
+| `[[OPTIONAL_PACKAGE_TARGET]]` | Future install path (e.g. `packages/claude-dev-env/skills/my-skill/`) or `deferred until the user requests packaging`. |
+| `[[SKILL_SLUG]]` | Short kebab-case id for eval filename and metadata (usually matches existing skill `name`). |
+| `[[DESCRIPTION_TRIGGERS]]` | Comma-separated phrases for third-person `description` triggers after refinement (retain strong triggers; add any new ones). |
+| `[[EVIDENCE_RULE]]` | One sentence: authorized eval material (e.g. observation transcripts plus pasted or explicitly approved thread excerpts only). |
+Optional: run `/prompt-generator` with this template as the brief so **AskUserQuestion** can lock scope before drafting.
+---
+## Paste-ready XML (after substitution)
+```xml
+<role>
+You are a skill architect and implementer for Claude Agent Skills. You refine an existing skill package under human review. You follow Anthropic Agent Skills best practices for structure, progressive disclosure, workflows, examples, and evaluation-driven iteration. You treat the user and observation artifacts as the sole authority for behavioral truth; you treat the baseline skill tree as the anchor for what already works.
+</role>
+<background>
+Motivation: [[REFINEMENT_PURPOSE]] The user refines collaboratively: evaluation scenarios must trace to observation transcripts, pasted material, or explicitly approved thread lines. Work proceeds architecture-first (inventory baseline plus planned delta), then implementation in controlled layers, with a full stop for review after each materially touched file.
+Scope block (ground every instruction here):
+- target_local_roots: `[[WORKSPACE_ROOT]]` at the repository root of this project (planning, evals, optional drafts). Baseline read source: `[[BASELINE_SKILL_ROOT]]` (always list both in ARCHITECTURE.md).
+- target_canonical_roots: Optional future install under `[[OPTIONAL_PACKAGE_TARGET]]`; until packaging is requested, keep planning and agreed writes scoped per ARCHITECTURE.md checkpoints.
+- target_file_globs: `[[WORKSPACE_ROOT]]ARCHITECTURE.md`, files under `[[BASELINE_SKILL_ROOT]]**` that ARCHITECTURE.md lists as in scope, `[[WORKSPACE_ROOT]]evals/*.json`, `[[WORKSPACE_ROOT]]scripts/**` when present, plus `[[DESIGN_INPUT_GLOB]]` when it names a real path.
+- comparison_basis: Anthropic Agent Skills best practices for concise metadata, third-person descriptions, progressive disclosure (SKILL.md as table of contents, linked files one level deep, table of contents atop references over one hundred lines), copy-paste checklists, template plus example pairs, evaluation JSON with expected_behavior arrays, and iterative testing across models the user cares about; plus **delta discipline**: preserve sections the observation record marks as still effective; change only what observations or approved gaps demand.
+- completion_boundary: Phase A delivers or updates `[[WORKSPACE_ROOT]]ARCHITECTURE.md` with sections `Baseline inventory` (paths under `[[BASELINE_SKILL_ROOT]]`), `Observation inputs` (link to `[[DESIGN_INPUT_GLOB]]` or `None`), `Planned deltas` (bullet list tied to observations), `Files unchanged this pass` (explicit list), eval evidence policy, optional hook milestones; the user approves or edits in chat. Phase B updates exactly one primary file per user checkpoint (baseline path or workspace draft per ARCHITECTURE.md); each checkpoint ends naming the next file. Phase C closes when planned deltas are implemented, eval JSON populated rows carry evidence references per [[EVIDENCE_RULE]], and a final `## Inventory` table lists every touched path with one-line purpose.
+Hypothesis note: `[[DESIGN_INPUT_GLOB]]` when present carries observation-grounded gaps; each new or tightened eval row still needs a matching transcript excerpt, paste, or explicit user approval before it counts as grounded behavior truth.
+Source priority for structure and quality bars: Anthropic Agent Skills documentation and prompting best practices first; observation transcripts and user pasted materials for behavioral truth; repository AGENTS.md or package README when they touch path layout.
+Citation policy: Quote observation snippets and user-pasted excerpts inside evaluation records or EXAMPLES.md when illustrating a scenario; ground structural claims about Anthropic guidance in links recorded in REFERENCE.md.
+Evaluation evidence rule: [[EVIDENCE_RULE]]
+</background>
+<instructions>
+1. Phase A — Refinement architecture: Create or replace `[[WORKSPACE_ROOT]]ARCHITECTURE.md` with: `Baseline inventory` listing every file under `[[BASELINE_SKILL_ROOT]]` that stays in scope; `Observation inputs` with path `[[DESIGN_INPUT_GLOB]]` or `None`; `Planned deltas` as bullets each referencing an observation or approved gap; `Files unchanged this pass` listing files explicitly left alone; required YAML frontmatter fields for SKILL.md after refinement; third-person description triggers covering [[DESCRIPTION_TRIGGERS]]; progressive disclosure plan (links one level deep from SKILL.md); workflow checklist touchpoints; EXAMPLES.md delta plan; `evals/` JSON filename and schema fields (`query`, `expected_behavior`, optional `files`, `evidence_ref`); optional hook milestone flagged for explicit user activation; forward-slash paths only. Close Phase A with a brief chat summary and obtain user approval on `ARCHITECTURE.md` before Phase B.
+2. Phase B — Layered implementation after approval: (a) Before editing any file, read the current version from `[[BASELINE_SKILL_ROOT]]` (or from the approved snapshot path recorded in ARCHITECTURE.md when using a copy workflow). (b) Apply only the deltas listed in `Planned deltas`; retain prose and structure the observation record marks as still working. (c) Update `[[BASELINE_SKILL_ROOT]]SKILL.md` or workspace draft per ARCHITECTURE.md with valid `name` and refined third-person `description`. (d) Update REFERENCE.md, EXAMPLES.md, WORKFLOWS.md only when ARCHITECTURE.md lists them in scope for this pass. (e) Update or create `evals/[[SKILL_SLUG]].json` per ARCHITECTURE.md; new or tightened rows include `evidence_ref` to observation transcript ids, pastes, or approval lines. (f) Keep SKILL.md body under five hundred lines; push new depth into linked files; add a table of contents at the top of any reference file expected to exceed one hundred lines.
+3. Collaboration gate: After every materially touched file in Phase B, pause for user review; continue only after explicit user approval naming the next file.
+4. Evaluation hygiene: Add or tighten evaluation rows solely from observation artifacts or text the user pasted or explicitly approved; document remaining coverage needs under REFERENCE.md in a “Pending approved excerpts” style section with positive wording about what artifact unlocks each slot.
+5. Quality pass before each handoff: Verify third-person description, consistent terminology, one-level-deep links from SKILL.md, forward-slash paths, checklist presence for multi-step flows, and that unchanged files match the `Files unchanged this pass` list; when the user approves a scope change, update that list in `ARCHITECTURE.md` in the same checkpoint before touching additional files.
+6. Optional Phase C — Distribution: Copy or adapt the approved package into the package skill path only when the user requests packaging; mirror the agreed file set; update cross-links.
+</instructions>
+<constraints>
+- Read baseline files from `[[BASELINE_SKILL_ROOT]]` before drafting replacements; prefer minimal diffs aligned to ARCHITECTURE.md.
+- Limit each checkpoint to one primary file (plus tiny cross-link tweaks the user OKs in the same message).
+- Prefer additive edits and explicit rename plans recorded in the checkpoint summary before broad moves.
+- Defer hook installation, Stop-hook changes, or clipboard automation to an explicit milestone the user starts later.
+- Align naming with Anthropic guidance; ship only path segments that name the skill domain clearly (for example `reference/`, `evals/`, `EXAMPLES.md`).
+</constraints>
+<output_format>
+Every assistant milestone message uses this shape:
+1. `## Summary` — bullets for files touched, decisions made, baseline subsection preserved or changed, single filename queued for next review.
+2. `## Checkpoint` — one paragraph restating what the user should verify before the next file starts.
+3. When showing draft bodies inline, use fenced code blocks with language tags (`markdown`, `json`) matching the destination file.
+4. Phase A exit criterion: user approves `ARCHITECTURE.md` in chat or edits it and signals approval.
+5. Phase B exit criterion: planned deltas from ARCHITECTURE.md are implemented; eval JSON populated rows reference evidence per [[EVIDENCE_RULE]]; REFERENCE.md lists any still-empty eval slots.
+6. Ship a closing `## Inventory` table: path, one-line purpose, progressive disclosure load order, note whether each path was baseline-updated or newly added.
+Light self-check before each milestone: scope matches ARCHITECTURE.md; collaboration gate honored; evaluation rows tied to approved evidence only; links one level deep from SKILL.md; forward slashes throughout; baseline sections marked stable stay intact until the user approves an explicit change to them.
+</output_format>
+<illustrations>
+    Phase A summary template (markdown):
+    ## Refinement architecture ready for review
+    - Baseline root: [[BASELINE_SKILL_ROOT]]
+    - Observation inputs: [[DESIGN_INPUT_GLOB]]
+    - Planned deltas: [bullets tied to observations]
+    - Unchanged this pass: [file list]
+    - Next action after your OK: edit [single file name]
+    Eval row stub (JSON shape):
+    {
+      "skills": ["[[SKILL_SLUG]]"],
+      "query": "[task that failed under old skill]",
+      "files": [],
+      "expected_behavior": [],
+      "evidence_ref": "[observation transcript id or user approval line]"
+    }
+    Checkpoint paragraph pattern:
+    Please review the updated `SKILL.md` delta against baseline. When ready, reply with approval to proceed to `REFERENCE.md`.
+</illustrations>
+```
+---
+Notation: Derived from `skill-from-ground-up.md` in this folder; use **that** template for **net-new** packages, **this** template when a baseline skill directory exists. The claude-dev-env **skill-builder** workflows (`improve-skill`, `polish-skill`) and **skill-writer** reference this file for checkpointed, multi-file refinements.

package/skills/skill-builder/SKILL.md CHANGED Viewed

@@ -58,6 +58,24 @@ The feedback loop: observe Claude B's behavior, bring insights back, refine the
 | 5. Iterate | Review results, refine, re-test | This skill + `/skill-writer` + Phase 4 |
 | 6. Polish | Description optimization, trigger eval, final check | `skill-creator` description optimizer |
+## Ground-up skill package (required reference)
+When building a **new** skill as a **full package** (architecture inventory, progressive disclosure files, `evals/`, **human checkpoint after each file**), treat the following as **mandatory** before implementation:
+- **Read:** `prompt-generator/templates/skill-from-ground-up.md` inside the claude-dev-env `skills/` directory (repository path: `packages/claude-dev-env/skills/prompt-generator/templates/skill-from-ground-up.md`).
+- **Do:** Run `/prompt-generator` with that file’s token table filled so the downstream session follows architecture-first sequencing, per-file review gates, and eval rows tied only to user-pasted or explicitly approved evidence.
+Use this **together with** gap-analysis and eval-scenario templates in this package; the ground-up template supplies the **orchestration contract** for the multi-file layout Anthropic recommends.
+## Refinement skill package (required reference)
+When **improving** an existing skill as a **multi-file** or **checkpointed** package (baseline directory plus planned deltas, observation-grounded evals), treat the following as **mandatory** before Phase 2–6 file work in `improve-skill.md` or package-aware steps in `polish-skill.md`:
+- **Read:** `prompt-generator/templates/skill-refinement-package.md` (repository path: `packages/claude-dev-env/skills/prompt-generator/templates/skill-refinement-package.md`).
+- **Do:** Run `/prompt-generator` with that file’s token table filled (`[[BASELINE_SKILL_ROOT]]`, `[[WORKSPACE_ROOT]]`, observation gap path, evidence rule) so rollout stays architecture-first, delta-focused, and tied to real observation or approved excerpts.
+Net-new packages without a baseline skill directory use `skill-from-ground-up.md` instead.
 ## Principles (apply across all phases)
 1. **Evals before documentation.** Never write extensive skill content without evaluation scenarios to validate it.
@@ -85,3 +103,5 @@ See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for exact invocation patt
 | `references/delegation-map.md` | Integration map for skill-writer and skill-creator |
 | `templates/gap-analysis.md` | Template for Phase 1 gap documentation |
 | `templates/eval-scenario.json` | Eval template matching skill-creator schema |
+| `../prompt-generator/templates/skill-from-ground-up.md` | Required orchestration template for **net-new** full skill packages (architecture-first, checkpoints, evidence-backed evals) |
+| `../prompt-generator/templates/skill-refinement-package.md` | Required orchestration template for **existing-skill** multi-file refinements and package-aware polish (baseline + delta, checkpoints, evidence-backed evals) |

package/skills/skill-builder/references/delegation-map.md CHANGED Viewed

@@ -18,6 +18,8 @@ No external delegation. The orchestrator helps the user transform gaps into eval
 ## Phase 3: Write Skill -- Delegate to `/skill-writer`
+**Full package path:** If the user approved a package plan from `prompt-generator/templates/skill-from-ground-up.md` (net-new) or `prompt-generator/templates/skill-refinement-package.md` (existing baseline), paste the approved architecture summary, baseline inventory, planned deltas, and checkpoint list into the skill-writer handoff so file order and scope stay aligned.
 Invoke `/skill-writer` with the following context in your prompt:
 ```

package/skills/skill-builder/workflows/improve-skill.md CHANGED Viewed

@@ -51,6 +51,15 @@ Observation-first flow for iterating on an existing skill.
 From here, follow the same phases as `${CLAUDE_SKILL_DIR}/workflows/new-skill.md`, starting at Phase 2 (Build Evals).
+### Collaborative package orchestration (Phases 2–6)
+Whenever Phases 2–6 will touch **multiple files**, **progressive disclosure layout**, or use **checkpointed file-by-file rollout**, treat this as **required** before expanding or rewriting the tree:
+1. Read `prompt-generator/templates/skill-refinement-package.md` from the claude-dev-env `skills/` tree (repository path: `packages/claude-dev-env/skills/prompt-generator/templates/skill-refinement-package.md`).
+2. Run `/prompt-generator` with that template’s token table filled: set `[[BASELINE_SKILL_ROOT]]` to the existing skill directory, `[[WORKSPACE_ROOT]]` to your iteration workspace (in-place or snapshot per user preference), and `[[DESIGN_INPUT_GLOB]]` to this workflow’s observation-based `gap-analysis.md` when it exists.
+Use `skill-from-ground-up.md` **only** for **greenfield** packages where no baseline skill directory exists yet; use `skill-refinement-package.md` for every refinement anchored to an existing skill.
 Key differences from the new-skill flow:
 - **Phase 2 (Build Evals):** Evals should test the specific issues observed in Phase 1, not hypothetical gaps.

package/skills/skill-builder/workflows/new-skill.md CHANGED Viewed

@@ -7,6 +7,18 @@ Full evaluation-driven lifecycle for building a new skill from scratch.
 - The user has a task or domain they want to capture as a skill
 - No existing skill for this capability (or intentionally starting fresh)
+### Ground-up package layout (required before multi-file implementation)
+When the outcome includes **ARCHITECTURE.md**, **REFERENCE / EXAMPLES / WORKFLOWS**, and **`evals/*.json`** under a workspace (Anthropic-style progressive disclosure plus checkpointed rollout):
+1. Read `prompt-generator/templates/skill-from-ground-up.md` from the claude-dev-env `skills/` tree (repository path: `packages/claude-dev-env/skills/prompt-generator/templates/skill-from-ground-up.md`).
+2. Run `/prompt-generator` using that template (substitute tokens per its table) **before** Phase 3 expands the repo; align the XML scope block with this workflow’s workspace and evidence rules.
+3. Keep Phase 1–2 artifacts honest: eval prompts and expectations stay grounded in **real** user scenarios; the template reinforces eval rows that reference pasted or explicitly approved evidence only.
+Skip this block only when the user explicitly wants a **single-file** SKILL.md with no staged package plan.
+Refinements to an **existing** skill package use `prompt-generator/templates/skill-refinement-package.md` instead (see `improve-skill.md`).
 ---
 ## Phase 1: Identify Gaps

package/skills/skill-builder/workflows/polish-skill.md CHANGED Viewed

@@ -8,6 +8,15 @@ Final optimization pass for a skill that is functionally complete.
 - The user is satisfied with output quality
 - This is the final step before the skill is considered done
+### Package-aware polish (recommended)
+When the polish pass will touch **more than frontmatter alone** (for example `REFERENCE.md`, `EXAMPLES.md`, `WORKFLOWS.md`, link structure, or eval JSON), or the user wants **checkpointed** multi-file updates alongside description work:
+1. Read `prompt-generator/templates/skill-refinement-package.md` (repository path: `packages/claude-dev-env/skills/prompt-generator/templates/skill-refinement-package.md`).
+2. Run `/prompt-generator` with tokens filled so `ARCHITECTURE.md` records baseline inventory, planned deltas for polish, and evidence rules for any new trigger or behavior evals.
+Purely **single-field** `description` edits with no structural package changes can skip this block.
 ---
 ## Step 1: Description Optimization

package/skills/skill-writer/SKILL.md CHANGED Viewed

@@ -32,6 +32,24 @@ Use this skill when the user needs a structured skill artifact; for quick answer
 When invoked with arguments (e.g. `/skill-writer improve this: [paste]`), treat `$ARGUMENTS` as the skill content to refine.
+### Ground-up multi-file packages (required)
+When the user is creating a **new** skill as a **package** (workspace with `ARCHITECTURE.md`, `REFERENCE.md`, `EXAMPLES.md`, `WORKFLOWS.md`, `evals/*.json`, per-file human review), **before** drafting `SKILL.md`:
+1. Read `packages/claude-dev-env/skills/prompt-generator/templates/skill-from-ground-up.md` (installed layout: sibling folder `prompt-generator/templates/skill-from-ground-up.md` under the same `skills/` parent).
+2. Ensure `/prompt-generator` has run with that template filled so architecture-first steps, checkpoint gates, and eval evidence rules are already agreed.
+If the task is **only** editing an existing `SKILL.md` or a small single-file tweak, this subsection does not apply.
+### Refinement multi-file packages (required)
+When the user is **refining** an existing skill as a **package** (baseline skill directory, `ARCHITECTURE.md` with planned deltas, checkpointed updates to REFERENCE / EXAMPLES / WORKFLOWS / `evals/`), **before** rewriting multiple files:
+1. Read `packages/claude-dev-env/skills/prompt-generator/templates/skill-refinement-package.md` (installed layout: `prompt-generator/templates/skill-refinement-package.md` under the same `skills/` parent).
+2. Ensure `/prompt-generator` has run with that template filled so baseline root, workspace root, observation inputs, and evidence rules are fixed before edits proceed.
+If the change set is a **small single-file** tweak, this subsection does not apply.
 ## Workflow (run in order)
 ### 1. Classify the skill type