claude-dev-env 1.15.0 → 1.16.1

@@ -53,7 +53,7 @@ Match `TARGET_OUTPUT.md`. Summary:
53
53
  4. **Full audit table / JSON debug object:** Append only after the user uses an explicit debug phrase such as `show debug`, `full audit table`, or `raw internal object`.
54
54
  5. **Commit-and-execute:** Pick a drafting approach, run it to completion, ship the XML; change plans only when **new** facts from the user or tools contradict the earlier scope.
55
55
 
56
- **Required XML sections** inside the fence: `<role>`, `<context>`, `<instructions>`, `<constraints>`, `<output_format>`. Optional: `<examples>`, `<open_question>` (use for unresolved discovery — see structural invariant D in `TARGET_OUTPUT.md`).
56
+ **Required XML sections** inside the fence: `<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`. Optional: `<illustrations>`, `<open_question>` (use for unresolved discovery — see structural invariant D in `TARGET_OUTPUT.md`).
57
57
 
58
58
  ## Scenario router
59
59
 
@@ -64,14 +64,14 @@ Match `TARGET_OUTPUT.md`. Summary:
64
64
  | **3 — Long unstructured input** | Many requirements / paths in one message | Verify repo references (packages, shared utils, configs) with targeted tools **before** questions | First question **confirms extracted intent**; ambiguities as **specific** options; **every** user-stated requirement captured in the generated XML by name — track all requirements from the unstructured input and confirm coverage before shipping |
65
65
  | **4 — Noisy context** | Long unrelated thread before `/prompt-generator` | Build the subagent brief from: the user’s literal `/prompt-generator` text, a **≤120-word** summary of on-topic facts, and discovery notes—**exclude** raw stack traces and unrelated tangents | As needed (often Scenario 1-shaped) |
66
66
 
67
- **Handoff (Scenario 2):** `<context>` must be **self-contained** — state, **decisions**, files touched, next steps, constraints — so a new session needs no prior chat. Preserve prior decisions verbatim in the handoff; quote the exact decision text where precision matters rather than paraphrasing it away.
67
+ **Handoff (Scenario 2):** `<background>` must be **self-contained** — state, **decisions**, files touched, next steps, constraints — so a new session needs no prior chat. Preserve prior decisions verbatim in the handoff; quote the exact decision text where precision matters rather than paraphrasing it away.
68
68
 
69
69
  ## Phase ordering (structural invariant A)
70
70
 
71
71
  For the **final** user-visible turn that ships the artifact:
72
72
 
73
73
  - Compose the message as **audit line → opening fence → XML → closing fence → end**; keep the byte stream free of `tool_use` blocks **between** the opening and closing fences.
74
- - **Completeness:** End every numbered step inside `<instructions>` with a complete sentence and a fully written list item. Balance every XML tag explicitly (open and close each `<role>`, `<context>`, `<instructions>`, `<constraints>`, `<output_format>`). The artifact must be copy-pasteable into a new file with zero manual repair.
74
+ - **Completeness:** End every numbered step inside `<instructions>` with a complete sentence and a fully written list item. Balance every XML tag explicitly (open and close each `<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`). The artifact must be copy-pasteable into a new file with zero manual repair.
75
75
  - Global pipeline: **discovery tools** (when applicable) → **AskUserQuestion** → **subagent** (draft + refinement + internal audit) → **one** orchestrator reply containing only audit line + fence.
76
76
 
77
77
  ## Interactive discovery mode (default)
@@ -116,7 +116,7 @@ Match specificity to task fragility:
116
116
 
117
117
  ### 3. Collect required missing facts
118
118
 
119
- If AskUserQuestion did not cover something essential, the drafting agent either (a) inserts `<open_question>` in `<context>` with the missing fact spelled out, or (b) signals the orchestrator to run **another** AskUserQuestion round **before** emitting the fence—avoid free-form clarification paragraphs in the orchestrator chat.
119
+ If AskUserQuestion did not cover something essential, the drafting agent either (a) inserts `<open_question>` in `<background>` with the missing fact spelled out, or (b) signals the orchestrator to run **another** AskUserQuestion round **before** emitting the fence—avoid free-form clarification paragraphs in the orchestrator chat.
120
120
 
121
121
  ### 3A. Anchor scope to concrete artifacts (required)
122
122
 
@@ -132,15 +132,15 @@ Use this scope block as the grounding contract for all generated instructions. E
132
132
 
133
133
  ### 4. Build the prompt
134
134
 
135
- Apply principles from Anthropic’s prompting guide (see REFERENCE.md): XML sections, role, motivation in `<context>`, positive framing, emotion-informed collaborative tone where appropriate, **commit-and-execute** for multi-step agent prompts.
135
+ Apply principles from Anthropic’s prompting guide (see REFERENCE.md): XML sections, role, motivation in `<background>`, positive framing, emotion-informed collaborative tone where appropriate, **commit-and-execute** for multi-step agent prompts.
136
136
 
137
137
  **Structural invariant D:** Write `<instructions>` / `<constraints>` as direct imperatives (“Open `path/to/file.ts` and …”). Park unresolved items in `<open_question>` tags—one distinct question per tag with the exact decision you need. Inside the fenced XML artifact, use only confident, definitive language: replace hedging phrases (“let me also check”, “actually”, “one more consideration”) and tentative qualifiers (“might be”, “possibly”, “I think”, “could be”) with direct assertions or move genuine uncertainty into `<open_question>` tags.
138
138
 
139
139
  **Set a role** in the system prompt. Anthropic: "Setting a role in the system prompt focuses Claude's behavior and tone for your use case. Even a single sentence makes a difference."
140
140
 
141
- **Add motivation behind constraints** in `<context>`. Anthropic: "Providing context or motivation behind your instructions... can help Claude better understand your goals and deliver more targeted responses." Claude generalizes from the explanation.
141
+ **Add motivation behind constraints** in `<background>`. Anthropic: "Providing context or motivation behind your instructions... can help Claude better understand your goals and deliver more targeted responses." Claude generalizes from the explanation.
142
142
 
143
- **Frame positively (zero-negative-keyword rule).** Anthropic: state the desired outcome directly. "Your response should be composed of smoothly flowing prose paragraphs" provides clearer guidance than a prohibition-only instruction. Apply this rule absolutely inside the fenced XML artifact across all sections (`<role>`, `<context>`, `<instructions>`, `<constraints>`, `<output_format>`): every instruction states what to do, what to produce, what to enforce. Use affirmative directives exclusively: "only X", "always X", "ensure X", "require X." Banned keywords inside generated XML: "no", "not", "don't", "do not", "never", "avoid", "without", "refrain", "stop", "prevent", "exclude", "prohibit", "forbid", "reject", "cannot", "unless." Also banned: indirect negative patterns such as "instead of X", "rather than X", "as opposed to." Example pass: "Ensure all functions have explicit return types." Example fail: "Do not leave return types implicit." When a boundary is needed, phrase it as what is permitted: "only run commands within the scoped paths" rather than a prohibition.
143
+ **Frame positively (zero-negative-keyword rule).** Anthropic: state the desired outcome directly. "Your response should be composed of smoothly flowing prose paragraphs" provides clearer guidance than a prohibition-only instruction. Apply this rule absolutely inside the fenced XML artifact across all sections (`<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`): every instruction states what to do, what to produce, what to enforce. Use affirmative directives exclusively: "only X", "always X", "ensure X", "require X." Banned keywords inside generated XML: "no", "not", "don't", "do not", "never", "avoid", "without", "refrain", "stop", "prevent", "exclude", "prohibit", "forbid", "reject", "cannot", "unless." Also banned: indirect negative patterns such as "instead of X", "rather than X", "as opposed to." Example pass: "Ensure all functions have explicit return types." Example fail: "Do not leave return types implicit." When a boundary is needed, phrase it as what is permitted: "only run commands within the scoped paths" rather than a prohibition.
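The zero-negative-keyword rule above lends itself to a mechanical pre-check. The following is a minimal sketch, not the package's own audit code: the function name `find_negative_phrases` is hypothetical, and the matching is literal whole-word matching, so inflected forms such as "avoiding" pass through and would need separate handling.

```python
import re

# Keywords and indirect patterns copied from the rule text above.
BANNED_KEYWORDS = [
    "no", "not", "don't", "do not", "never", "avoid", "without", "refrain",
    "stop", "prevent", "exclude", "prohibit", "forbid", "reject", "cannot",
    "unless",
]
BANNED_PATTERNS = ["instead of", "rather than", "as opposed to"]


def find_negative_phrases(xml_text: str) -> list[str]:
    """Return the banned phrases present in xml_text, in list order.

    Hypothetical helper: whole-word, case-insensitive, literal matching only.
    """
    # Normalize curly apostrophes so "don’t" is caught as "don't".
    lowered = xml_text.lower().replace("\u2019", "'")
    found = []
    for phrase in BANNED_KEYWORDS + BANNED_PATTERNS:
        # The lookarounds keep "no" from matching inside "not" or "notation".
        pattern = r"(?<![\w'])" + re.escape(phrase) + r"(?![\w'])"
        if re.search(pattern, lowered):
            found.append(phrase)
    return found
```

Run against the rule's own examples, the pass sentence ("Ensure all functions have explicit return types.") yields an empty list, while the fail sentence ("Do not leave return types implicit.") reports both `not` and `do not`.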
144
144
 
145
145
  **Emotion-informed framing.** Anthropic's emotion concepts research (2026) shows that internal activation patterns causally influence output quality. Apply: explicit success criteria with "say so if you're unsure" as an accepted answer; collaborative language ("help figure out", "work on this together"); framing tasks as interesting problems rather than chores; constructive, forward-looking tone. Cross-model caveat: studied on Sonnet 4.5; the patterns align with Anthropic's prompting best practices independently. Full pattern catalog and citations: `packages/claude-dev-env/docs/emotion-informed-prompt-design.md`.
146
146
 
@@ -164,9 +164,18 @@ State desired outcomes explicitly; use XML inside the generated prompt when mixi
164
164
 
165
165
  Tune verbosity in the **generated** prompt: summaries after tool use vs direct answers — as appropriate to the user’s AskUserQuestion answers.
166
166
 
167
- ### 7. Add examples
167
+ ### 7. Add illustrations (`<illustrations>`)
168
168
 
169
- For format- or tone-sensitive **generated** prompts, include 3–5 `<example>` blocks where helpful.
169
+ Use the optional `<illustrations>` section when concrete samples make format, tone, or structure obvious to the downstream reader.
170
+
171
+ **Code and command samples inside `<illustrations>` (drafting subagent — follow in order):**
172
+
173
+ 1. **Indented block (default for chat-stable rendering):** Put each line of sample shell, Python, JSON, or config text at **four spaces** of indentation from the left margin of the XML text so the sample reads as a single monospaced block inside `<illustrations>` using **only** leading spaces on each sample line (plain text inside the XML).
174
+ 2. **Tilde fence:** When the sample needs explicit fence delimiters, use a **tilde** fence only: an opening line `~~~` plus an optional info word (e.g. `~~~bash`), the sample lines, then a closing line `~~~` alone on its own line.
175
+ 3. **Triple-backtick inner fence:** When the sample must use backtick fences, emit a **complete pair**: an opening line beginning with three backticks plus an info string (e.g. `` ```bash ``), the sample lines, then a closing line containing only three backticks. The prompt-workflow hook and clipboard path treat that pair as one unit inside the outer `` ```xml `` fence. For the **most stable on-screen rendering** in chat UIs, use step 1 or step 2 above before this option.
176
+ 4. **Cap count:** Include **three to five** distinct illustration blocks (narrative plus optional sample) unless the user’s brief asks for a different depth.
177
+
178
+ These steps are **machine-facing obligations** for the orchestrator and drafting subagent. The person invoking `/prompt-generator` receives the finished fenced XML; the skill text above is what the model follows when filling `<illustrations>`.
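The three sample formats in steps 1 to 3 can be sketched as one small formatter. This is an illustrative sketch only: `format_sample` and its `style` values are hypothetical names, not part of the skill or its hooks, and the fence strings are built at runtime so the listing itself contains no inner fence line.

```python
FENCE_TILDE = "~" * 3
FENCE_BACKTICK = "`" * 3  # built at runtime to keep a literal fence out of this listing


def format_sample(sample: str, style: str = "indent", info: str = "") -> str:
    """Format a code sample for <illustrations> (hypothetical helper).

    style="indent"   -> step 1: four leading spaces per line (default)
    style="tilde"    -> step 2: tilde fence with optional info word
    style="backtick" -> step 3: complete triple-backtick pair
    """
    lines = sample.strip("\n").split("\n")
    if style == "indent":
        # Leave blank lines empty instead of emitting trailing spaces.
        return "\n".join("    " + line if line else "" for line in lines)
    if style in ("tilde", "backtick"):
        fence = FENCE_TILDE if style == "tilde" else FENCE_BACKTICK
        # Opening fence carries the info word; closing fence stands alone.
        return "\n".join([fence + info, *lines, fence])
    raise ValueError(f"unknown style: {style!r}")
```

Defaulting to `indent` mirrors the priority order in the steps: the fenced styles are opt-in, and the backtick style always emits its closing line so the pair stays complete.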
170
179
 
171
180
  ### 8. Light self-check (subagent, pre-return)
172
181
 
@@ -185,7 +194,8 @@ Expand the light self-check with this internal checklist when useful:
185
194
  - [ ] Emotion-informed framing is present: collaborative language, explicit success criteria, and explicit permission to express uncertainty ("say so if unsure")
186
195
  - [ ] Constraints are surfaced upfront (proactive constraint awareness) so the model can incorporate them into its plan, and each non-obvious constraint carries its motivation
187
196
  - [ ] Self-correction chaining is considered when the prompt must hold up over time (generate → review → refine)
188
- - [ ] All five required XML sections (`<role>`, `<context>`, `<instructions>`, `<constraints>`, `<output_format>`) are present with both opening and closing tags in the fenced artifact
197
+ - [ ] All five required XML sections (`<role>`, `<background>`, `<instructions>`, `<constraints>`, `<output_format>`) are present with both opening and closing tags in the fenced artifact
198
+ - [ ] If `<illustrations>` is present, code or command samples inside it follow §7 (indented block, or tilde fence, or complete triple-backtick pair in that priority order)
189
199
 
190
200
  ### 9. Deliver (orchestrator)
191
201
 
@@ -197,25 +207,25 @@ Audit: pass 15/15
197
207
 
198
208
  (or `fail N/15 — …`), immediately followed by **one** fenced XML block; **send boundary** is immediately after the closing fence so the user receives a copy-ready pair (audit line + artifact) in one assistant message before the conversation continues.
199
209
 
200
- **Render-survival:** When the fenced XML uses tag names that **collide with HTML5 elements** (`context`, `section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`), or when the artifact is **very large**, **write the artifact to a file** and give the user the path together with the usual one-line audit. Add a brief **section inventory** (confirming the five required sections) so the user can trust the file even if the inline fence would render poorly. Details: **TARGET_OUTPUT.md — Structural invariant E**.
210
+ **Render-survival:** When the fenced XML uses tag names that **collide with HTML5 elements** (`section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`), or when the artifact is **very large**, **write the artifact to a file** and give the user the path together with the usual one-line audit. Add a brief **section inventory** (confirming the five required sections) so the user can trust the file even if the inline fence would render poorly. Required grounding uses `<background>` (the old `context` name matched HTML). Details: **TARGET_OUTPUT.md — Structural invariant E**.
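The render-survival decision above reduces to two checks: a tag-name collision or an oversized artifact. Here is a minimal sketch under stated assumptions: the function names are hypothetical, and `max_chars=20000` is a placeholder, since the source says only "very large" and gives no threshold.

```python
import re

# Collision list copied from the render-survival paragraph above.
HTML_COLLISION_NAMES = {
    "section", "summary", "details", "header", "footer",
    "main", "aside", "article", "nav", "figure",
}

# Captures the name of any opening or closing tag.
TAG_NAME = re.compile(r"</?([A-Za-z_][\w.-]*)")


def colliding_tags(xml_text: str) -> list[str]:
    """Return the sorted tag names in xml_text that are on the collision list."""
    names = {match.group(1).lower() for match in TAG_NAME.finditer(xml_text)}
    return sorted(names & HTML_COLLISION_NAMES)


def should_write_to_file(xml_text: str, max_chars: int = 20000) -> bool:
    """Decide whether to ship the artifact as a file (threshold is an assumption)."""
    return bool(colliding_tags(xml_text)) or len(xml_text) > max_chars
```

An artifact that uses only the five required sections, including `<background>`, returns an empty collision list, so only the size check can route it to a file.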
201
211
 
202
212
  ### 10. Default refinement mode (subagent-internal)
203
213
 
204
214
  For non-trivial requests, run inside the drafting subagent (use **draft-only** when the user explicitly asks for a quick draft / no refinement loop):
205
215
 
206
216
  1. Base draft
207
- 2. Section refinement in order: `role`, `context`, `instructions`, `constraints`, `output_format`, `examples` (examples optional if unused)
217
+ 2. Section refinement in order: `role`, `background`, `instructions`, `constraints`, `output_format`, `illustrations` (illustrations optional if unused)
208
218
  3. Merge to one canonical XML prompt
209
219
  4. Final **15-row compliance audit** pass/fail with evidence (internal)
210
220
  5. If fail: targeted fixes + capped re-audit rounds
211
221
 
212
- Required section list is immutable for this pipeline: `role`, `context`, `instructions`, `constraints`, `output_format`, `examples`.
222
+ Section list is immutable for this pipeline: `role`, `background`, `instructions`, `constraints`, `output_format` (required), plus `illustrations` (optional).
213
223
 
214
224
  ### 11. Compliance audit — 15-row checklist (internal, audit numerator)
215
225
 
216
226
  **Two-tier validation — tier 2:** The `15` in `Audit: pass 15/15` counts these **compliance** rows (stable ids for hooks). Tier 1 is the **light self-check** in §8—keep the steps separate so models do not merge them.
217
227
 
218
- **Runtime Stop hook:** In addition to the 15-row internal audit, the `prompt-workflow-stop-guard` Stop hook enforces **section presence** on prompt-workflow responses: any fenced Markdown XML block must include opening and closing tags for `role`, `context`, `instructions`, `constraints`, and `output_format`. Missing tags trigger a retry before the user sees a passing turn. Pair this with **Structural invariant E** in `TARGET_OUTPUT.md` so users still receive intact XML when chat renderers strip HTML-named tags.
228
+ **Runtime Stop hook:** In addition to the 15-row internal audit, the `prompt-workflow-stop-guard` Stop hook enforces **section presence** on prompt-workflow responses: any fenced Markdown XML block must include opening and closing tags for `role`, `background`, `instructions`, `constraints`, and `output_format`. Missing tags trigger a retry before the user sees a passing turn. Pair this with **Structural invariant E** in `TARGET_OUTPUT.md` so users still receive intact XML when chat renderers strip HTML-named tags. `prompt_workflow_gate_core.extract_fenced_xml_content` scans each inner Markdown fence (` ```lang ` through its closing `` ``` `` line) as a unit so hooks and clipboard copy see the **full** XML body, including everything after inner fences inside `<illustrations>`.
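The section-presence rule the Stop hook enforces can be illustrated with a short check. This is a sketch of the idea only, not the implementation inside `prompt-workflow-stop-guard`; the name `find_missing_sections` is hypothetical.

```python
import re

# The five required sections named in the paragraph above.
REQUIRED_SECTIONS = ("role", "background", "instructions", "constraints", "output_format")


def find_missing_sections(xml_body: str) -> list[str]:
    """Return required section names lacking an opening or a closing tag."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Accept an opening tag with or without attributes.
        has_open = re.search(rf"<{name}(\s[^>]*)?>", xml_body) is not None
        has_close = f"</{name}>" in xml_body
        if not (has_open and has_close):
            missing.append(name)
    return missing
```

An empty result corresponds to a passing gate; a non-empty result names the sections a retry would need to supply.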
219
229
 
220
230
  | # | Row name |
221
231
  |---|----------|
@@ -36,7 +36,7 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
36
36
 
37
37
  **Output:** Send audit line, then one `xml` fence with the full prompt, then stop—the handoff message is complete.
38
38
 
39
- **Handoff prompt quality:** `<context>` must include the bullet lists above so a new session can continue with **zero** access to this chat. Quote decision text verbatim where precision matters.
39
+ **Handoff prompt quality:** `<background>` must include the bullet lists above so a new session can continue with **zero** access to this chat. Quote decision text verbatim where precision matters.
40
40
 
41
41
  ## Scenario 3: Long unstructured input
42
42
 
@@ -70,15 +70,15 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
70
70
  ## Structural invariant B — Fenced block closes cleanly
71
71
 
72
72
  - Use one opening ``` and one closing ``` for the artifact.
73
- - Balance every XML tag; close `<instructions>`, `<context>`, etc. explicitly.
73
+ - Balance every XML tag; close `<instructions>`, `<background>`, etc. explicitly.
74
74
  - End each numbered step inside `<instructions>` with a complete sentence and a fully written list item.
75
75
  - The user can copy from the opening ``` through the closing ``` into a new file without manual repair.
76
76
 
77
77
  ## Structural invariant C — Discovery before lock-in
78
78
 
79
- - When the user is unsure where logic lives, run discovery **before** you freeze the XML; record findings in `<context>` with paths from Glob/Grep.
79
+ - When the user is unsure where logic lives, run discovery **before** you freeze the XML; record findings in `<background>` with paths from Glob/Grep.
80
80
  - If discovery finds the owner file(s), reference them with repo-relative paths in `<instructions>`.
81
- - If discovery is inconclusive, add `<open_question>` in `<context>` naming what you searched and what remains unknown.
81
+ - If discovery is inconclusive, add `<open_question>` in `<background>` naming what you searched and what remains unknown.
82
82
  - After the opening fence of the artifact, treat the XML as frozen: finish editing inside that fence; route any new repo searches to a later user turn if needed.
83
83
 
84
84
  ## Structural invariant D — Certainty in instructions, questions in tags
@@ -89,9 +89,11 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
89
89
 
90
90
  ## Structural invariant E — Render-survival for XML sections
91
91
 
92
- - **Problem:** Tag names used for prompt XML sections can overlap **HTML5 element names**. Chat renderers may treat those tokens as HTML and hide or alter the content between tags. High-risk examples include: `context`, `section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`. The raw assistant text may be complete while the **rendered** message looks like sections are missing (notably `<context>`).
93
- - **Primary mitigation:** When the fenced XML artifact **contains any tag whose local name is on that HTML-collision list**, or when the artifact is **large enough that render truncation is likely**, the orchestrator **must write the full artifact to a file** (default: under `data/prompts/` or a path the user supplied earlier) and **paste the absolute file path** in the chat message. Pair the path with a **short section inventory** confirming all five required sections (`role`, `context`, `instructions`, `constraints`, `output_format`) are present in the file.
94
- - **Fallback when file write is unavailable:** Escape the **opening angle bracket** of colliding tags (for example `&lt;context>` user restores `<` when pasting) or use another distinctive wrapper **documented in the same message**, so the user can recover literal XML. State explicitly that the user should restore brackets when copying into another system.
92
+ - **Problem (HTML):** Tag names used for prompt XML sections can overlap **HTML5 element names**. Chat renderers may treat those tokens as HTML and hide or alter the content between tags. High-risk examples include: `section`, `summary`, `details`, `header`, `footer`, `main`, `aside`, `article`, `nav`, `figure`. The former required name `context` matched an HTML element; **required** sections now use `<background>` for situational grounding so the name stays off that list. The raw assistant text may be complete while the **rendered** message looks like sections are missing.
93
+ - **Problem (nested Markdown fences):** A ` ```bash ` (or other inner) line inside the outer ` ```xml ` block is still a line of text in the transcript, but many Markdown renderers do not nest backtick fences: the inner block's bare closing `` ``` `` line **closes the outer fence early**. Everything after that point (including `</illustrations>` and other closing tags) can appear outside the code block or look “swallowed.” Hooks historically used a regex that stopped at the **first** triple-backtick line; `extract_fenced_xml_content` now walks each inner fence (` ```lang ` through its closing `` ``` `` line) before accepting the outer `` ``` `` that ends the `xml` block.
94
+ - **Primary mitigation:** When the fenced XML artifact **contains any tag whose local name is on the HTML-collision list**, or when the artifact is **large enough that render truncation is likely**, the orchestrator **must write the full artifact to a file** (default: under `data/prompts/` or a path the user supplied earlier) and **paste the absolute file path** in the chat message. Pair the path with a **short section inventory** confirming all five required sections (`role`, `background`, `instructions`, `constraints`, `output_format`) are present in the file.
95
+ - **Authoring rules for code inside `<illustrations>` (orchestrator + drafting subagent — see `SKILL.md` §7):** (1) Format each sample line with **four leading spaces** inside `<illustrations>` as the default for stable rendered chat. (2) **Or** use a **tilde fence**: `~~~` + optional language on the opening line, body, then `~~~` on its own line. (3) **Or** use a **complete triple-backtick pair** (opening `` ```lang `` line, body, closing `` ``` `` line); hooks and clipboard treat the pair as one unit inside the outer `` ```xml `` fence.
96
+ - **Fallback when file write is unavailable:** Escape the **opening angle bracket** of colliding tags (for example `&lt;section>` — user restores `<` when pasting) or use another distinctive wrapper **documented in the same message**, so the user can recover literal XML. State explicitly that the user should restore brackets when copying into another system.
95
97
  - **Structural safety net:** Regardless of renderer behavior, the **Stop hook section-presence gate** blocks any prompt-workflow response whose fenced XML is missing any required opening/closing section tag pair. Methodology: [Anthropic — Agent Skills: evaluation and iteration](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices#evaluation-and-iteration).
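The nested-fence walk described under invariant E can be sketched as a small line scanner. This is a hedged illustration, not the code of `extract_fenced_xml_content`: the name `extract_xml_fence_body` is hypothetical, and the sketch assumes every inner fence opens with an info string, as the authoring rules require, so a bare triple-backtick line outside an inner fence always ends the outer block.

```python
from typing import Optional

FENCE = "`" * 3  # built at runtime to keep a literal fence out of this listing


def extract_xml_fence_body(message: str) -> Optional[str]:
    """Return the text inside the outer xml fence, inner fences included.

    Returns None when no xml fence opens or the outer fence never closes.
    """
    body: list[str] = []
    in_outer = False
    in_inner = False
    for line in message.split("\n"):
        stripped = line.strip()
        if not in_outer:
            # Skip everything (e.g. the audit line) until the xml fence opens.
            if stripped == FENCE + "xml":
                in_outer = True
            continue
        if in_inner:
            body.append(line)
            if stripped == FENCE:
                in_inner = False  # bare fence closes the inner block only
            continue
        if stripped.startswith(FENCE) and len(stripped) > len(FENCE):
            in_inner = True  # fence with an info string opens an inner block
            body.append(line)
        elif stripped == FENCE:
            return "\n".join(body)  # bare fence at outer level ends the artifact
        else:
            body.append(line)
    return None
```

A regex that stops at the first triple-backtick line would return only the text before the inner fence; the scanner keeps everything through the closing tags that follow it.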
96
98
 
97
99
  ## XML artifact (minimum sections)
@@ -99,12 +101,12 @@ This file is the **target output spec** for eval-driven iteration of the `prompt
99
101
  Include at least:
100
102
 
101
103
  - `<role>...</role>`
102
- - `<context>...</context>`
104
+ - `<background>...</background>`
103
105
  - `<instructions>...</instructions>`
104
106
  - `<constraints>...</constraints>`
105
107
  - `<output_format>...</output_format>`
106
108
 
107
- Add `<examples>` when format or tone is easy to misunderstand; nest sections when the task has natural hierarchy.
109
+ Add `<illustrations>` when format or tone is easy to misunderstand; nest sections when the task has natural hierarchy. **Long code samples belong in `<illustrations>`** using the same ordered choices as Structural invariant E: four-space-indented lines first, then tilde fences, then a complete triple-backtick pair if the brief requires backtick fences (see `SKILL.md` §7).
108
110
 
109
111
  ## Internal 15-row compliance checklist (audit numerator)
110
112
 
@@ -15,7 +15,7 @@
15
15
  "AskUserQuestion contains 2-4 questions, each with 2-4 options, recommended option first",
16
16
  "Final response contains exactly: 1-liner audit status + one fenced XML prompt block",
17
17
  "No commentary, tables, audit rows, or explanation outside the fenced block",
18
- "Fenced block contains <role>, <context>, <instructions>, <constraints>, <output_format>",
18
+ "Fenced block contains <role>, <background>, <instructions>, <constraints>, <output_format>",
19
19
  "Prompt generation delegated to a subagent (Agent tool call visible in the flow)"
20
20
  ]
21
21
  },
@@ -30,7 +30,7 @@
30
30
  ],
31
31
  "expected_behavior": [
32
32
  "AskUserQuestion has 1-2 questions — lighter than Scenario 1",
33
- "Generated prompt <context> includes: session state, decisions, files modified, next steps",
33
+ "Generated prompt <background> includes: session state, decisions, files modified, next steps",
34
34
  "No redundant discovery tool calls for information already in conversation",
35
35
  "Handoff prompt is self-contained — a new session can resume without prior context",
36
36
  "Prior decisions preserved in the handoff, not lost or paraphrased away",
@@ -72,10 +72,10 @@
72
72
  "prompt": "/prompt-generator Create a prompt for an agent that traces a routing bug across shared_utils/export_handler.py, orchestrator.py, and download_manager.py — find where extract_apk is called and whether it handles APK signature check failures",
73
73
  "files": ["packages/samsung-automation/shared_utils/export_handler.py"],
74
74
  "expected_behavior": [
75
- "No tool_use blocks appear after the first fence marker of the prompt artifact",
75
+ "No tool_use blocks appear after the first fence marker of the canonical prompt artifact",
76
76
  "All Glob/Grep discovery calls precede the AskUserQuestion",
77
77
  "All AskUserQuestion interactions precede the fenced block",
78
- "Prompt artifact emits in a single uninterrupted response"
78
+ "Review the last successful Audit + fenced xml pair; blocked retry attempts preserved by flattened transcript exports do not count as additional delivered artifacts"
79
79
  ]
80
80
  },
81
81
  {
@@ -85,7 +85,7 @@
85
85
  "prompt": "/prompt-generator Write a detailed agent-harness prompt for a TDD bug-fix workflow that traces a routing error across 5+ files, with state management for multi-window execution and structured test tracking",
86
86
  "files": [],
87
87
  "expected_behavior": [
88
- "Opening fence has a matching closing fence",
88
+ "The canonical prompt artifact has one opening xml fence and one matching closing fence; flattened transcript exports are normalized to that same boundary before review",
89
89
  "Every XML tag properly opened and closed",
90
90
  "No truncation at numbered-list bullets (the Issue #41 failure mode)",
91
91
  "No mid-sentence cuts or incomplete sections",
@@ -101,9 +101,9 @@
101
101
  "expected_behavior": [
102
102
  "Discovery tool calls attempt to locate scoring logic before prompt generation",
103
103
  "If resolved: prompt references concrete file paths from discovery",
104
- "If unresolved: prompt contains <open_question> in <context> for downstream agent",
105
- "No re-entry to discovery after fenced block starts",
106
- "AskUserQuestion may surface the uncertainty if discovery was inconclusive"
104
+ "If unresolved: prompt contains <open_question> in <background> for downstream agent",
105
+ "No re-entry to discovery after the canonical artifact fence starts",
106
+ "AskUserQuestion may surface the uncertainty if discovery was inconclusive; when discovery resolves concrete paths before the artifact, absence of <open_question> is expected"
107
107
  ]
108
108
  },
109
109
  {
@@ -131,7 +131,7 @@
131
131
  "Every instruction phrased as a positive directive: what TO do, what TO produce, what TO enforce",
132
132
  "Constraints section uses affirmative boundaries: 'only X', 'always X', 'ensure X', 'require X' — positive framing throughout",
133
133
  "Example: 'Ensure all functions have explicit return types' passes; 'Do not leave return types implicit' fails; 'Avoid missing return types' fails",
134
- "Applies to all sections inside the fenced block: <role>, <context>, <instructions>, <constraints>, <output_format>"
134
+ "Applies to all sections inside the fenced block: <role>, <background>, <instructions>, <constraints>, <output_format>"
135
135
  ]
136
136
  },
137
137
  {
@@ -141,7 +141,7 @@
141
141
  "prompt": "/prompt-generator Write a system prompt for a Python linting agent that auto-fixes code style issues in this repo",
142
142
  "files": [],
143
143
  "expected_behavior": [
144
- "Fenced XML block contains opening and closing tags for all five required sections: role, context, instructions, constraints, output_format",
144
+ "Fenced XML block contains opening and closing tags for all five required sections: role, background, instructions, constraints, output_format",
145
145
  "Each required section contains substantive content (minimum one sentence each)",
146
146
  "The Stop hook section-presence check passes for this output (no missing section tags)",
147
147
  "Sections appear in order: role first, output_format last among the five required sections"
@@ -151,10 +151,10 @@
151
151
  "id": 11,
152
152
  "name": "section_missing_triggers_hook_block",
153
153
  "scenario": "Section completeness gate — failure path",
154
- "prompt": "Synthetic eval: assistant final message is prompt-workflow shaped (overall_status, checklist, scope anchors, runtime signals) with a fenced Markdown XML block whose body omits the entire context section (no context opening/closing tags); observer asserts Stop hook behavior and successful retry.",
154
+ "prompt": "Synthetic eval: assistant final message is prompt-workflow shaped (overall_status, checklist, scope anchors, runtime signals) with a fenced Markdown XML block whose body omits the entire background section (no background opening/closing tags); observer asserts Stop hook behavior and successful retry.",
155
155
  "files": [],
156
156
  "expected_behavior": [
157
- "The Stop hook runs _check_required_xml_sections and returns a block decision naming context as a missing section",
157
+ "The Stop hook runs _check_required_xml_sections and returns a block decision naming background as a missing section",
158
158
  "The model retry includes all five required sections with both opening and closing tags",
159
159
  "The retry output passes the section-presence gate (empty missing list from missing_required_xml_sections)"
160
160
  ]
@@ -166,10 +166,22 @@
166
166
  "prompt": "/prompt-generator Write a comprehensive agent prompt for migrating a large Prisma schema and all related API routes, with step-by-step rollout, rollback, and verification — artifact sized like the migration prompt that triggered chat render stripping.",
167
167
  "files": [],
168
168
  "expected_behavior": [
169
- "When the artifact exceeds a size threshold or contains XML section tag names that collide with HTML5 elements (context, section, summary, details, header, footer, main, aside, article, nav, figure), the orchestrator writes the full artifact to a file under data/prompts/ or a user-specified path",
169
+ "When the artifact exceeds a size threshold or contains XML section tag names that collide with HTML5 elements (section, summary, details, header, footer, main, aside, article, nav, figure), the orchestrator writes the full artifact to a file under data/prompts/ or a user-specified path",
170
170
  "The file contains the complete XML with all tags preserved as literal text",
171
171
  "The user-facing message states the file path and briefly inventories which required sections the artifact contains"
172
172
  ]
173
+ },
174
+ {
175
+ "id": 13,
176
+ "name": "nested_inner_fence_does_not_truncate_xml_for_hooks",
177
+ "scenario": "Structural invariant E — nested Markdown fences inside ```xml",
178
+ "prompt": "/prompt-generator Include <illustrations> with a bash snippet using triple-backtick fences inside the XML, mirroring real prompts that previously hid </illustrations> in chat and broke hook extraction.",
179
+ "files": [],
180
+ "expected_behavior": [
181
+ "prompt_workflow_gate_core.extract_fenced_xml_content includes text after inner ```bash ... ``` lines up to the final closing ``` of the xml fence",
182
+ "missing_required_xml_sections sees closing tags for role, background, instructions, constraints, output_format when those appear after nested fences",
183
+ "SKILL.md §7 states ordered authoring steps for <illustrations>: four-space-indented sample lines, then tilde fences, then a complete triple-backtick pair when required"
184
+ ]
173
185
  }
174
186
  ]
175
187
  }
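The eval expectations above (ids 10, 11, and 13) describe two behaviors: extracting the XML body up to the final closing fence even when inner fences are nested, and reporting which required sections lack an opening or closing tag. The following is a minimal sketch of that idea, not the package's implementation. The function names `extract_outer_xml_fence` and `find_missing_sections` are hypothetical stand-ins for `extract_fenced_xml_content` and `missing_required_xml_sections`, whose real signatures are not shown in this diff. It also assumes the xml fence is the last fenced block in the message.

```python
import re

# Required section names as listed in the updated eval expectations.
REQUIRED_SECTIONS = ("role", "background", "instructions", "constraints", "output_format")

# Built at runtime so this example contains no literal triple-backtick line.
FENCE = "`" * 3


def extract_outer_xml_fence(message: str) -> str:
    """Return the text between the opening xml fence and the LAST bare closing fence.

    Hypothetical helper: scanning from the end keeps nested inner fences
    (for example a bash snippet) inside the extracted body.
    """
    lines = message.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(FENCE + "xml")),
        None,
    )
    if start is None:
        return ""
    for end in range(len(lines) - 1, start, -1):
        if lines[end].strip() == FENCE:
            return "\n".join(lines[start + 1:end])
    return ""


def find_missing_sections(xml_text: str, required=REQUIRED_SECTIONS) -> list:
    """List required sections that lack an opening tag or a closing tag."""
    missing = []
    for name in required:
        has_open = re.search(rf"<{name}(\s[^>]*)?>", xml_text) is not None
        has_close = f"</{name}>" in xml_text
        if not (has_open and has_close):
            missing.append(name)
    return missing
```

An empty list from `find_missing_sections` corresponds to the "empty missing list" condition in eval 11. A message with further fenced blocks after the xml fence would need a stricter end-of-fence rule than this sketch uses.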
@@ -0,0 +1,87 @@
1
+ ---
2
+ name: skill-builder
3
+ description: >-
4
+ Orchestrates the complete skill-building lifecycle using evaluation-driven
5
+ development. Routes through gap analysis, eval creation, skill writing (via
6
+ skill-writer), subagent testing (via skill-creator infrastructure), and
7
+ iterative refinement. Use when creating new skills, improving existing skills,
8
+ or optimizing skill descriptions. Triggers: 'build a skill', 'new skill
9
+ workflow', 'improve this skill', 'optimize skill description', 'skill
10
+ development lifecycle'.
11
+ ---
12
+
13
+ @${CLAUDE_SKILL_DIR}/references/eval-driven-flow.md
14
+
15
+ # Skill Builder
16
+
17
+ **Core principle:** Evaluation-driven development. Build evals BEFORE writing extensive documentation. This ensures skills solve real problems rather than documenting imagined ones.
18
+
19
+ Source: [Anthropic Skill Best Practices - Evaluation and Iteration](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices#evaluation-and-iteration)
20
+
21
+ ## When this skill applies
22
+
23
+ Trigger for requests to **build**, **improve**, or **polish** a skill through the full evaluation-driven lifecycle. This skill orchestrates the process -- it delegates writing to `/skill-writer` and evaluation infrastructure to the `skill-creator` plugin.
24
+
25
+ For quick skill syntax questions or one-off SKILL.md edits, use `/skill-writer` directly instead.
26
+
27
+ ## Routing
28
+
29
+ Assess the user's intent from conversation context and existing artifacts. Route directly:
30
+
31
+ **Creating a new skill?**
32
+ Read `${CLAUDE_SKILL_DIR}/workflows/new-skill.md` and follow it.
33
+
34
+ **Improving an existing skill?**
35
+ Read `${CLAUDE_SKILL_DIR}/workflows/improve-skill.md` and follow it.
36
+
37
+ **Final polish only (description optimization, trigger eval)?**
38
+ Read `${CLAUDE_SKILL_DIR}/workflows/polish-skill.md` and follow it.
39
+
40
+ **Ambiguous?** Ask: "Are you creating a new skill, improving an existing one, or doing a final polish pass?"
41
+
42
+ ## The Claude A / Claude B Pattern
43
+
44
+ You and the user are **Claude A** -- the expert who designs and refines the skill. Subagents running the built skill on eval tasks are **Claude B** -- the agent using the skill to perform real work.
45
+
46
+ > "Work with one instance of Claude ('Claude A') to create a Skill that is used by other instances ('Claude B'). Claude A helps you design and refine instructions, while Claude B tests them in real tasks."
47
+
48
+ The feedback loop: observe Claude B's behavior, bring insights back, refine the skill, test again.
49
+
50
+ ## Phase Overview
51
+
52
+ | Phase | Purpose | Delegated To |
53
+ |-------|---------|-------------|
54
+ | 1. Identify gaps | Document what fails without the skill | This skill (guided conversation) |
55
+ | 2. Build evals | Create 3+ scenarios testing the gaps | This skill (templates + user input) |
56
+ | 3. Write skill | Minimal instructions addressing gaps | `/skill-writer` |
57
+ | 4. Test | Subagent runs with/without skill, grade, benchmark | `skill-creator` eval infrastructure |
58
+ | 5. Iterate | Review results, refine, re-test | This skill + `/skill-writer` + Phase 4 |
59
+ | 6. Polish | Description optimization, trigger eval, final check | `skill-creator` description optimizer |
60
+
61
+ ## Principles (apply across all phases)
62
+
63
+ 1. **Evals before documentation.** Never write extensive skill content without evaluation scenarios to validate it.
64
+
65
+ 2. **Minimal instructions first.** Write just enough to pass evaluations. Resist the urge to over-document.
66
+
67
+ 3. **Generalize from feedback.** The skill will be used across many prompts. Do not overfit to test cases.
68
+
69
+ 4. **Explain the why.** Theory of mind beats rigid rules. Help the model understand reasoning, not just constraints.
70
+
71
+ 5. **Observe, do not assume.** Iterate based on what Claude B actually does, not what you think it should do.
72
+
73
+ ## Delegation Details
74
+
75
+ See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for exact invocation patterns and integration points between this orchestrator, `/skill-writer`, and `skill-creator`.
76
+
77
+ ## File Index
78
+
79
+ | File | Purpose |
80
+ |------|---------|
81
+ | `workflows/new-skill.md` | Full lifecycle for new skills (6 phases) |
82
+ | `workflows/improve-skill.md` | Observation-first flow for existing skills |
83
+ | `workflows/polish-skill.md` | Description optimization and final validation |
84
+ | `references/eval-driven-flow.md` | Official Anthropic methodology with citations |
85
+ | `references/delegation-map.md` | Integration map for skill-writer and skill-creator |
86
+ | `templates/gap-analysis.md` | Template for Phase 1 gap documentation |
87
+ | `templates/eval-scenario.json` | Eval template matching skill-creator schema |
@@ -0,0 +1,151 @@
1
+ # Delegation Map
2
+
3
+ How the skill-builder orchestrator integrates with external skills at each phase.
4
+
5
+ ## Phase 1: Identify Gaps -- This Orchestrator
6
+
7
+ No external delegation. The orchestrator guides a conversation with the user to document what fails without a skill.
8
+
9
+ **Output:** `[skill-name]-workspace/gap-analysis.md` using the template at `templates/gap-analysis.md`.
10
+
11
+ ## Phase 2: Build Evals -- This Orchestrator
12
+
13
+ No external delegation. The orchestrator helps the user transform gaps into eval scenarios.
14
+
15
+ **Output:** `[skill-name]-workspace/evals/evals.json` using the template at `templates/eval-scenario.json`.
16
+
17
+ **Baseline runs:** Spawn subagents WITHOUT any skill for each eval scenario. These run as background Agent tasks.
18
+
19
+ ## Phase 3: Write Skill -- Delegate to `/skill-writer`
20
+
21
+ Invoke `/skill-writer` with the following context in your prompt:
22
+
23
+ ```
24
+ Create a skill based on this gap analysis and eval scenarios.
25
+
26
+ Gap analysis: [paste or reference gap-analysis.md]
27
+ Eval scenarios: [paste or reference evals.json expected_output fields]
28
+ Baseline failures: [summarize what Claude got wrong without the skill]
29
+
30
+ Constraint: Write the minimum instructions needed to address these specific gaps.
31
+ Do not over-document. Every line must serve a documented gap.
32
+ ```
33
+
34
+ `/skill-writer` handles: type classification, degree of freedom, frontmatter, body structure, progressive disclosure, and self-check.
35
+
36
+ **Output:** The skill's SKILL.md (and optional REFERENCE.md, scripts, etc.)
37
+
38
+ ## Phase 4: Test -- Delegate to skill-creator Infrastructure
39
+
40
+ The skill-creator plugin provides the eval infrastructure. Reference its components directly:
41
+
42
+ ### Spawning test runs
43
+
44
+ For each eval, spawn TWO subagents in the SAME turn (parallel):
45
+
46
+ **With-skill subagent:**
47
+ ```
48
+ Execute this task:
49
+ - Read the skill at [path-to-skill]/SKILL.md and follow its instructions
50
+ - Task: [eval prompt from evals.json]
51
+ - Input files: [eval files if any]
52
+ - Save all output files to: [workspace]/iteration-N/eval-[name]/with_skill/outputs/
53
+ - Save a transcript of your complete work to: [workspace]/iteration-N/eval-[name]/with_skill/transcript.md
54
+ - At the end, write a metrics.json with tool call counts and file list
55
+ ```
56
+
57
+ **Without-skill subagent** (baseline):
58
+ For iteration-1, reuse the baseline results from Phase 2 (iteration-0). Later iterations keep comparing against that same original baseline.
59
+
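The with-skill prompt above fixes a directory convention for each run. A small sketch of that layout follows, assuming the "run directory" that later receives `grading.json` is the `with_skill/` folder. The helper name `with_skill_run_paths` is hypothetical and not part of skill-creator.

```python
from pathlib import Path


def with_skill_run_paths(workspace: str, iteration: int, eval_name: str) -> dict:
    """Build the per-run paths named in the with-skill subagent prompt.

    Assumption: grading.json lives directly in the with_skill run directory.
    """
    run_dir = Path(workspace) / f"iteration-{iteration}" / f"eval-{eval_name}" / "with_skill"
    return {
        "run_dir": run_dir,
        "outputs": run_dir / "outputs",
        "transcript": run_dir / "transcript.md",
        "grading": run_dir / "grading.json",
    }
```

Computing the paths in one place keeps the subagent prompt, the grader, and the benchmark step pointed at the same folders.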
60
+ ### Grading
61
+
62
+ Read the grading agent instructions from the skill-creator plugin:
63
+ `[skill-creator-plugin-path]/agents/grader.md`
64
+
65
+ Spawn a grader subagent for each run with:
66
+ - The expectations from evals.json
67
+ - The transcript path
68
+ - The outputs directory
69
+
70
+ **Output:** `grading.json` in each run directory.
71
+
72
+ ### Benchmarking
73
+
74
+ Run the aggregation script from the skill-creator plugin directory:
75
+ ```bash
76
+ cd [skill-creator-plugin-path] && python -m scripts.aggregate_benchmark [workspace]/iteration-N --skill-name [name]
77
+ ```
78
+
79
+ **Output:** `benchmark.json` and `benchmark.md` in the iteration directory.
80
+
81
+ ### Eval Viewer
82
+
83
+ Launch the viewer from the skill-creator plugin:
84
+ ```bash
85
+ python [skill-creator-plugin-path]/eval-viewer/generate_review.py \
86
+ [workspace]/iteration-N \
87
+ --skill-name "[name]" \
88
+ --benchmark [workspace]/iteration-N/benchmark.json
89
+ ```
90
+
91
+ For iteration 2+, add: `--previous-workspace [workspace]/iteration-[N-1]`
92
+
93
+ If no browser or display is available, add: `--static [workspace]/iteration-N/review.html`
94
+
95
+ **Output:** Browser-based reviewer where the user inspects outputs and leaves feedback.
96
+
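The viewer invocation and its two conditional flags can be assembled programmatically. The sketch below only builds the argument list from the flags shown above. `build_review_command` is a hypothetical helper, and the script's behavior beyond those flags is not assumed.

```python
from pathlib import Path
from typing import List


def build_review_command(
    plugin_path: str,
    workspace: str,
    iteration: int,
    skill_name: str,
    static_html: bool = False,
) -> List[str]:
    """Assemble the generate_review.py argument list described above."""
    iteration_dir = Path(workspace) / f"iteration-{iteration}"
    command = [
        "python",
        str(Path(plugin_path) / "eval-viewer" / "generate_review.py"),
        str(iteration_dir),
        "--skill-name", skill_name,
        "--benchmark", str(iteration_dir / "benchmark.json"),
    ]
    # Iteration 2+ compares against the previous iteration's workspace.
    if iteration >= 2:
        command += ["--previous-workspace", str(Path(workspace) / f"iteration-{iteration - 1}")]
    # Headless environments write a static HTML review instead of opening a browser.
    if static_html:
        command += ["--static", str(iteration_dir / "review.html")]
    return command
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with skill names that contain spaces.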
97
+ ### Finding the skill-creator plugin path
98
+
99
+ The skill-creator plugin is installed at a path like:
100
+ `~/.claude/plugins/marketplaces/claude-plugins-official/plugins/skill-creator/skills/skill-creator/`
101
+
102
+ To find it dynamically, search for the skill-creator SKILL.md:
103
+ ```bash
104
+ find ~/.claude/plugins -name "SKILL.md" -path "*/skill-creator/*" 2>/dev/null | head -1
105
+ ```
106
+
107
+ Then derive the plugin root from that path.
108
+
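The same lookup can be sketched in Python. This version assumes the plugin path is the directory that contains the matching `SKILL.md`, which is consistent with the example install path above but is not confirmed by the source. `find_skill_creator_root` is a hypothetical name, and it returns the first match in sorted order rather than in `find` traversal order.

```python
from pathlib import Path
from typing import Optional


def find_skill_creator_root(plugins_dir: Path) -> Optional[Path]:
    """Return the directory holding skill-creator's SKILL.md, or None if absent.

    Assumption: the plugin path used by the commands in this document is
    the parent directory of that SKILL.md file.
    """
    if not plugins_dir.is_dir():
        return None
    for candidate in sorted(plugins_dir.rglob("SKILL.md")):
        # Mirror the find filter: some directory component is named skill-creator.
        if "skill-creator" in candidate.parent.parts:
            return candidate.parent
    return None
```

A typical call would be `find_skill_creator_root(Path.home() / ".claude" / "plugins")`, with a `None` result treated as "plugin not installed".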
109
+ ## Phase 5: Iterate -- This Orchestrator + `/skill-writer`
110
+
111
+ The orchestrator reads feedback.json and transcripts, synthesizes observations, then delegates refinement to `/skill-writer`:
112
+
113
+ ```
114
+ Refine this existing skill based on these observations from testing.
115
+
116
+ Current SKILL.md: [paste or reference]
117
+ User feedback: [from feedback.json]
118
+ Behavioral observations: [from transcript analysis]
119
+
120
+ Specific issues to address:
121
+ 1. [Issue from feedback]
122
+ 2. [Issue from observation]
123
+
124
+ Constraint: Only change what the feedback demands. Do not reorganize working content.
125
+ ```
126
+
127
+ Then return to Phase 4 with the refined skill.
128
+
129
+ ## Phase 6: Polish -- Delegate to skill-creator Description Optimizer
130
+
131
+ The skill-creator plugin includes a description optimization loop:
132
+
133
+ ### Trigger eval generation
134
+
135
+ Generate 20 realistic eval queries (10 should-trigger, 10 should-not-trigger). Use the HTML review template from:
136
+ `[skill-creator-plugin-path]/assets/eval_review.html`
137
+
138
+ ### Optimization loop
139
+
140
+ ```bash
141
+ cd [skill-creator-plugin-path] && python -m scripts.run_loop \
142
+ --eval-set [path-to-trigger-eval.json] \
143
+ --skill-path [path-to-skill] \
144
+ --model [current-model-id] \
145
+ --max-iterations 5 \
146
+ --verbose
147
+ ```
148
+
149
+ ### Final validation
150
+
151
+ Run the skill-writer self-check rubric (from skill-writer's Step 9) against the finished skill. All items must pass.
@@ -0,0 +1,41 @@
1
+ # Gap Analysis: [Skill Name]
2
+
3
+ ## Task Description
4
+
5
+ [What the user is trying to accomplish -- the capability this skill should provide]
6
+
7
+ ## Gaps Identified
8
+
9
+ ### Gap 1: [Descriptive Name]
10
+
11
+ - **What happened:** [Description of the failure or missing context when working without a skill]
12
+ - **What was needed:** [The specific context, instruction, or knowledge that would fix it]
13
+ - **Frequency:** [How often this comes up in real usage]
14
+ - **Example task:** [A concrete task that exposes this gap]
15
+
16
+ ### Gap 2: [Descriptive Name]
17
+
18
+ - **What happened:** [Description]
19
+ - **What was needed:** [Context/instruction needed]
20
+ - **Frequency:** [Frequency]
21
+ - **Example task:** [Concrete example]
22
+
23
+ ### Gap 3: [Descriptive Name]
24
+
25
+ - **What happened:** [Description]
26
+ - **What was needed:** [Context/instruction needed]
27
+ - **Frequency:** [Frequency]
28
+ - **Example task:** [Concrete example]
29
+
30
+ ## Patterns
31
+
32
+ - [Recurring themes across gaps -- e.g., "Claude consistently lacks knowledge about X"]
33
+ - [Common failure modes -- e.g., "Without guidance, Claude chooses library A when library B is required"]
34
+ - [Context that was repeatedly provided manually]
35
+
36
+ ## Candidate Eval Scenarios
37
+
38
+ - [Task that would expose Gap 1 -- becomes the seed for an eval]
39
+ - [Task that would expose Gap 2]
40
+ - [Task that would expose multiple gaps simultaneously]
41
+ - [Edge case that tests boundary behavior]