claude-dev-env 1.17.2 → 1.19.0
- package/bin/install.mjs +145 -63
- package/hooks/blocking/content-search-to-zoekt-redirector.py +55 -0
- package/hooks/blocking/content_search_zoekt_bash_block_reason.py +25 -0
- package/hooks/blocking/content_search_zoekt_block_payload.py +17 -0
- package/hooks/blocking/content_search_zoekt_indexed_paths.py +24 -0
- package/hooks/blocking/content_search_zoekt_indexed_roots_config.py +131 -0
- package/hooks/blocking/content_search_zoekt_redirect_guidance.py +19 -0
- package/hooks/blocking/destructive-command-blocker.py +53 -4
- package/hooks/blocking/test_content_search_to_zoekt_redirector_integration.py +54 -0
- package/hooks/blocking/test_content_search_to_zoekt_redirector_unit.py +51 -0
- package/hooks/blocking/test_content_search_zoekt_indexed_roots_config.py +102 -0
- package/hooks/blocking/test_destructive_command_blocker.py +108 -0
- package/package.json +4 -1
- package/skills/rule-audit/SKILL.md +2 -2
- package/hooks/HOOK_SPECS_PROMPT_WORKFLOW.md +0 -64
- package/hooks/blocking/prompt_workflow_clipboard.py +0 -63
- package/hooks/blocking/prompt_workflow_gate_config.py +0 -113
- package/hooks/blocking/prompt_workflow_gate_core.py +0 -289
- package/hooks/blocking/prompt_workflow_validate.py +0 -218
- package/hooks/blocking/test_prompt_workflow_clipboard.py +0 -54
- package/hooks/blocking/test_prompt_workflow_gate_core.py +0 -195
- package/hooks/blocking/test_prompt_workflow_validate.py +0 -339
- package/rules/prompt-workflow-context-controls.md +0 -48
- package/skills/agent-prompt/SKILL.md +0 -199
- package/skills/prompt-generator/ARCHITECTURE.md +0 -18
- package/skills/prompt-generator/REFERENCE.md +0 -254
- package/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md +0 -177
- package/skills/prompt-generator/SKILL.md +0 -354
- package/skills/prompt-generator/TARGET_OUTPUT.md +0 -133
- package/skills/prompt-generator/evals/prompt-generator.json +0 -207
- package/skills/prompt-generator/templates/skill-from-ground-up.md +0 -104
- package/skills/prompt-generator/templates/skill-refinement-package.md +0 -109
@@ -1,199 +0,0 @@
----
-name: agent-prompt
-description: >-
-  Craft a structured prompt using prompt-generator's workflow, then spawn a
-  background agent to execute it after user approval. Use instead of
-  /prompt-generator when the user wants execution, not just the prompt.
-  Triggers on /agent-prompt, "launch an agent for this", "spawn agent to do X",
-  "delegate this", "run this in background", or any task that benefits from
-  agent delegation with prompt quality.
----
-
-@packages/claude-dev-env/skills/prompt-generator/SKILL.md
-@packages/claude-dev-env/skills/prompt-generator/REFERENCE.md
-
-# Agent Prompt
-
-Craft a structured agent prompt, get approval, spawn a background agent.
-
-The prompt-generator skill above defines the prompt-crafting workflow. This skill extends it: instead of delivering the prompt as a fenced block, it presents the prompt for approval and spawns a background agent.
-
-## When this skill applies
-
-Trigger only when the user explicitly wants to delegate or execute a task with an agent.
-
-`/prompt-generator` is the default owner for prompt authoring and refinement. This skill starts after explicit execution intent.
-
-When invoked with arguments (e.g. `/agent-prompt fix the auth bug via TDD`), treat the arguments as the task to build a prompt for and execute.
-
-## Workflow
-
-### Steps 1-8: Craft the prompt
-
-Follow the prompt-generator workflow steps 1 through 8 exactly as written. Classify the prompt type, set degree of freedom, collect missing facts, build the prompt with XML tags and role, control format and style, add examples if needed, and self-check against the rubric.
-
-After steps 1-8, continue directly to step 9 for context gathering; deliverables are handled through the orchestration flow below.
-
-### Step 9: Gather context before crafting
-
-The agent starts with zero conversation history. Before building the prompt, use Read, Glob, Grep, and other research tools to gather the concrete values the agent will need -- file paths, function signatures, existing patterns, branch names. Embed these directly in the prompt instead of telling the agent to "find" them.
-
-The agent-spawn-protocol rule requires this: if any context question has the answer "I don't know", investigate first, then delegate with complete context.
-
-Proactive context gathering enables agents to plan effectively from the start. Anthropic's emotion concepts research (2026) found that agents produce higher-quality output when they understand constraints, available tools, and system boundaries upfront — they incorporate these into their approach naturally, leading to better first attempts and more accurate results.
-
-### Step 10: Determine agent configuration
-
-Map the task to agent parameters:
-
-| Task type | subagent_type | mode |
-|---|---|---|
-| Codebase exploration, search, research | Explore | default |
-| Code implementation, bug fix, refactoring | general-purpose | auto |
-| Read-only audit, analysis, review | general-purpose | default |
-| Architecture, multi-step planning | Plan | plan |
-
-Always set `run_in_background: true`.
-
-Generate a descriptive `name` (3-5 words, kebab-case) so the user can track progress and send follow-up messages via `SendMessage({to: name})`.
-
-### Step 10A: Section-refinement orchestration mode (default for execution tasks)
-
-Execution behavior: run this deterministic orchestration for delegated prompt work after explicit launch intent.
-Prompt authoring and prompt refinement ownership remain in `/prompt-generator`.
-
-Use simplified mode when either condition is true:
-- The user explicitly requests single-agent execution
-- The task is genuinely too small for orchestration (for example, one quick read/search)
-
-This mode is triggered when execution input includes `pipeline_mode: internal_section_refinement_with_final_audit` or equivalent execution-ready orchestration metadata.
-If present, carry forward the scope block (`target_local_roots`, `target_canonical_roots`, `target_file_globs`, `comparison_basis`, `completion_boundary`) so execution remains artifact-bound.
-
-1. Spawn exactly 6 refinement agents, one per section in fixed order:
-   - `role`
-   - `context`
-   - `instructions`
-   - `constraints`
-   - `output_format`
-   - `examples`
-2. Enforce section-only scope in each sub-prompt:
-   - "Edit `<SECTION_NAME>` and preserve all other sections unchanged."
-3. Require section output contract from each agent:
-   - `improved_block`
-   - `rationale`
-   - `concise_diff`
-4. Merge outputs into one canonical prompt after all 6 refiners finish.
-5. Run one final audit agent against the merged prompt and checklist.
-6. If audit fails, apply targeted fixes and re-run audit with capped retries (`max_retries: 2` unless user overrides).
-
-Run all stages in this exact order.
-
-### Step 11: Present for approval (must reflect default orchestration)
-
-Use AskUserQuestion with one question. The question text must summarize:
-- agent config (type, mode, name)
-- orchestration mode (`section_refinement_with_final_audit` by default)
-- retry cap for audit loop
-
-Each option should use the `preview` field to show the full crafted prompt.
-
-Options:
-1. "Launch it" (recommended) -- preview shows the crafted prompt
-2. "Edit first" -- preview shows the prompt with a note that user can provide changes
-3. "Cancel" -- no preview
-
-### Step 12: Spawn
-
-On **"Launch it"**: spawn the Agent tool with the crafted prompt and configuration. Report the agent name so the user knows what's running.
-
-On **"Edit first"**: present the prompt in conversation text. After the user provides changes, return to step 11 with the updated prompt.
-
-On **"Cancel"**: acknowledge and stop.
-
-## Prompt adjustments for agent execution
-
-When building the prompt in step 4, these adjustments ensure the agent can work independently:
-
-**Context completeness** -- include file paths, line numbers, function names, branch state, and anything you learned during step 9. The agent cannot see this conversation.
-Bind execution steps to the scope block artifacts passed from refinement output whenever available.
-Keep runtime context compact: include only actionable facts required for execution.
-
-**Acceptance criteria** -- state what "done" looks like. For code: include the test command. For research: specify the output format and save location.
-
-**Scope boundary** -- include "Make requested changes and keep surrounding code stable" or equivalent. Agents with explicit scope constraints stay aligned to task intent.
-
-**Constraints from this project** -- if the project has CODE_RULES.md, TDD requirements, or naming conventions, include the relevant subset in the prompt so the agent follows them.
-
-**Emotion-informed briefing** -- Anthropic's emotion concepts research (2026) found that briefing style causally affects output quality. Frame tasks collaboratively ("work on this together", "help figure out"). Include permission to express uncertainty ("flag anything you're unsure about", "use [PLACEHOLDER] for unverified specifics"). Provide motivation behind constraints ("this ordering ensures tests define behavior before implementation exists"). Share system context proactively (what hooks enforce, what tools are available, what the fallback is) so the agent can incorporate constraints into its plan from the start.
-
-**Anti-test-fixation** -- For code tasks, include guidance against test-specific solutions. Anthropic: "Implement a solution that works correctly for all valid inputs, not just the test cases. Tests are there to verify correctness, not to define the solution. If the task is unreasonable or infeasible, or if any of the tests are incorrect, please inform me rather than working around them."
-
-**Commit-and-execute** -- For multi-step agent work, include decision commitment guidance. Anthropic: "When deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning."
-
-**Temp file cleanup** -- If the agent may create scratch files during iteration, include cleanup instructions. Anthropic: "If you create any temporary new files, scripts, or helper files for iteration, clean up these files by removing them at the end of the task."
-
-## Final audit-agent stage requirements (for default section-refinement mode)
-
-After merge, run one dedicated audit agent that validates the full prompt against:
-
-- Prompt-generator rubric requirements (`packages/claude-dev-env/skills/prompt-generator/SKILL.md`)
-- The deterministic checklist from the handoff artifact
-- Embedded research-mode evidence constraints below
-
-Required audit output shape:
-
-```json
-{
-  "overall_status": "pass|fail",
-  "checklist_results": [
-    {
-      "check_id": "structured_scoped_instructions",
-      "status": "pass|fail",
-      "evidence_quote": "word-for-word quote",
-      "source_ref": "path-or-url",
-      "fix_if_fail": "targeted correction"
-    }
-  ],
-  "corrective_edits": ["..."],
-  "retry_count": 0
-}
-```
-
-### Embedded research-mode policy text (audit behavior)
-
-The audit agent must enforce these constraints as policy text in the audit prompt (do not rely on a global mode switch):
-
-- "Every recommendation, claim, or piece of advice must cite a specific source."
-- "Ground your response in word-for-word quotes, not paraphrased summaries."
-- "If you don't have a credible source for a claim, say 'I don't know'."
-- Source priority:
-  1. Official vendor/creator docs for external tools
-  2. Local project files for local behavior
-  3. Academic or named expert sources
-  4. Reputable external sources with URLs
-  5. Blogs/community posts (lowest)
-
-Policy source: `packages/claude-dev-env/skills/prompt-generator/REFINEMENT_PIPELINE_RUNBOOK.md`
-
-## Section-refinement acceptance criteria
-
-Section-refinement orchestration is done only when all are true:
-
-- All 6 section agents ran, each scoped to exactly one section
-- Merge produced one canonical prompt containing all six sections
-- Final audit returned `overall_status: pass`
-- Any non-pass audit was resolved through targeted revisions within retry cap
-- AskUserQuestion approval gate was honored before launch
-- Final user artifact includes one complete pasteable prompt block
-
-## Constraints
-
-- Present every launch for approval via AskUserQuestion before spawning
-- Always run agents in background
-- Gather context before crafting -- do not send an agent in blind
-- Start only after explicit user execution intent; keep prompt authoring/refinement in `/prompt-generator`
-- Default to `section_refinement_with_final_audit` orchestration for execution tasks unless user requests simplified mode
-- Carry scope-block context into execution prompts; native Agent/Task tools have no custom intent metadata
-- If the task is too small for an agent (single file read, quick grep), say so and just do it directly
-- Include obstacle handling: "When encountering obstacles, do not use destructive actions as a shortcut (e.g. --no-verify, discarding unfamiliar files)" -- agents without this guidance may take irreversible shortcuts
-- Frame agent tasks with collaborative language and include permission to express uncertainty — agents produce higher-quality output with collaborative briefing (Anthropic emotion concepts research, 2026)
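Reviewer note on the hunk above: the removed skill's audit stage (step 6 of Step 10A, together with the "Required audit output shape") amounts to a capped fix-and-re-audit loop. The following is a minimal Python sketch of that loop, not code from the package. `run_audit` and `apply_fixes` are hypothetical callables standing in for the audit agent and the targeted-fix step; only the field names (`overall_status`, `checklist_results`, `status`, `fix_if_fail`, `retry_count`) and the default cap of 2 are taken from the removed file.

```python
from typing import Callable

# A dict shaped like the "Required audit output shape" JSON in the removed skill.
AuditResult = dict


def audit_with_retries(
    prompt: str,
    run_audit: Callable[[str], AuditResult],       # hypothetical: invokes the audit agent
    apply_fixes: Callable[[str, list], str],       # hypothetical: applies targeted fixes
    max_retries: int = 2,                          # default cap stated in the removed skill
) -> tuple:
    """Audit the merged prompt; on failure, fix and re-audit up to max_retries times."""
    result = run_audit(prompt)
    retry_count = 0
    while result["overall_status"] != "pass" and retry_count < max_retries:
        # Collect only the corrections attached to failing checklist entries.
        fixes = [
            check["fix_if_fail"]
            for check in result["checklist_results"]
            if check["status"] == "fail"
        ]
        prompt = apply_fixes(prompt, fixes)
        retry_count += 1
        result = run_audit(prompt)
    # Return a copy so the caller's audit object is not mutated.
    return prompt, {**result, "retry_count": retry_count}
```

If the audit still fails once the cap is reached, the sketch returns the last failing result unchanged, so the caller decides whether to stop or escalate; the removed file does not specify that behaviour.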
@@ -1,18 +0,0 @@
-# prompt-generator — file map
-
-Baseline inventory of files in the prompt-generator skill package.
-
-## Baseline inventory
-
-| Path | Role |
-| --- | --- |
-| `SKILL.md` | Orchestrator rules, subagent contract, compliance audit |
-| `TARGET_OUTPUT.md` | User-visible output contract for evals and hooks |
-| `REFERENCE.md` | Tiered sources, harness patterns, debug schema |
-| `REFINEMENT_PIPELINE_RUNBOOK.md` | Evidence-grounding runbook |
-| `evals/prompt-generator.json` | Scenario eval rows |
-| `templates/skill-from-ground-up.md` | Net-new skill checkpoint template |
-| `templates/skill-refinement-package.md` | Existing-skill refinement template |
-| `hooks/blocking/prompt_workflow_validate.py` | Validator CLI (file-based loop) |
-| `hooks/blocking/prompt_workflow_gate_core.py` | Fence extraction, markers |
-| `hooks/blocking/prompt_workflow_clipboard.py` | Clipboard copy for artifacts |
@@ -1,254 +0,0 @@
-# Prompt generator -- reference
-
-## Canonical resources
-
-When authoring or refining prompts, ground decisions in these sources. If guidance conflicts, defer to the higher tier.
-
-### Tier 1: Anthropic (primary authority for Claude)
-
-- https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview -- overview, links to all sub-guides
-- https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices -- the single living reference for Claude's latest models.
-- https://transformer-circuits.pub/2026/emotions/index.html -- emotion concepts research (April 2026). Key takeaways: clear criteria and escape routes, collaborative framing, positive task framing, inviting transparency. Full catalog: `packages/claude-dev-env/docs/emotion-informed-prompt-design.md`.
-- https://www.anthropic.com/research/emotion-concepts-function -- blog summary of the above paper.
-- https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking -- adaptive thinking reference; replaces manual budget_tokens with effort-based control.
-- https://claude.com/blog/harnessing-claudes-intelligence -- harness evolution: primitives Claude already knows, what to stop doing in the harness, deliberate boundaries (context economics, caching, typed tools). Local inventory: `docs/references/anthropic-harnessing-claudes-intelligence-technique-inventory.md`.
-- https://github.com/anthropics/skills/tree/main/skills/claude-api -- Anthropic `claude-api` Agent Skill for hands-on API/tool patterns from that post (Hook 10). Platform entry: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/claude-api-skill
-
-### Tier 2: Major labs (strong secondary, often transfers across models)
-
-- https://platform.openai.com/docs/guides/prompt-engineering -- six strategies: write clear instructions, provide reference text, split complex tasks, give models time to think, use external tools, test systematically.
-- https://deepmind.google/research/ -- learning resources and chain-of-thought research.
-- https://www.microsoft.com/en-us/research/blog/ -- publications and applied research.
-
-### Tier 3: Courses, communities, individuals (supplementary)
-
-**Courses:**
-
-- https://www.deeplearning.ai/short-courses/ -- Andrew Ng's courses. "ChatGPT Prompt Engineering for Developers" (with OpenAI) is the foundational one.
-- https://course.fast.ai/ -- Jeremy Howard's top-down teaching style.
-- https://www.elementsofai.com/ -- University of Helsinki introductory course.
-- https://ocw.mit.edu/search/?t=Artificial%20Intelligence -- MIT OpenCourseWare AI curriculum.
-
-**Communities and individuals:**
-
-- https://discuss.huggingface.co/ -- open-source model community.
-- https://www.latent.space/ -- AI engineering perspective (Latent Space Podcast & Newsletter).
-- https://simonwillison.net/ -- practical LLM experiments. His "LLM" tag is especially valuable.
-
-### Conflict resolution rule
-
-If sources disagree, apply tier order: Anthropic first, then OpenAI/Google/Microsoft, then community. Tier 1 wins when conflicting with lower tiers.
-
-### Outcome preview gate and digest (`prompt-generator`)
-
-See SKILL.md §§107-115 (Phases 4-5) and `TARGET_OUTPUT.md` for the full contract. **Clipboard safety:** `extract_fenced_xml_content` concatenates every ` ```xml ` block—follow §7 sample formatting so clipboard copy stays the lone artifact body.
-
-### Outcome preview gate and digest (`prompt-generator`)
-
-Human checkpoint before the paste-ready artifact ships: the orchestrator runs an **Outcome preview** turn (`### Outcome preview` bullets built from the **preview summary**, defined in SKILL.md Terminology) plus **AskUserQuestion** (**Ship** recommended first, two contextual alternates, **Refine with free text**), then emits `Audit`, a single ` ```xml ` fence, and **`## Outcome digest`** after the fence. Rationale matches collaborative checkpoints in `templates/skill-from-ground-up.md` and the refinement pattern in `templates/skill-refinement-package.md`. `ARCHITECTURE.md` lists all files in this skill package.
-
-**Clipboard safety:** `prompt_workflow_gate_core.extract_fenced_xml_content` concatenates every ` ```xml ` block in the message—follow the sample formatting rules in SKILL.md section 7 so clipboard copy stays the lone artifact body. Full contract: `TARGET_OUTPUT.md`.
-
-## Harness design patterns (Anthropic blog, April 2026)
-
-Primary URL: https://claude.com/blog/harnessing-claudes-intelligence. Structured inventory: `docs/references/anthropic-harnessing-claudes-intelligence-technique-inventory.md`.
-
-### Mechanism doc map (Hook 11)
-
-Jump from concept to the platform specs the post names:
-
-- [Bash tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/bash-tool) / [Text editor tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/text-editor-tool)
-- [Code execution tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool) / [Programmatic tool calling](https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling)
-- [Memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)
-- [Agent Skills overview](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview)
-- [Context windows](https://platform.claude.com/docs/en/build-with-claude/context-windows) / [Context editing](https://platform.claude.com/docs/en/build-with-claude/context-editing) / [Compaction](https://platform.claude.com/docs/en/build-with-claude/compaction)
-- [Subagents](https://code.claude.com/docs/en/sub-agents)
-- [System prompts](https://platform.claude.com/docs/en/release-notes/system-prompts) / [Working with the Messages API](https://platform.claude.com/docs/en/build-with-claude/working-with-messages) / [Prompt caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
-- [Model migration guide — hard-coded filters](https://platform.claude.com/docs/en/about-claude/models/migration-guide#additional-recommended-changes)
-- [Harness design for long-running applications](https://www.anthropic.com/engineering/harness-design-long-running-apps)
-- [Claude Code auto-mode](https://www.anthropic.com/engineering/claude-code-auto-mode)
-- [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
-
-### Context stack (Hook 5)
-
-- **Context editing:** Remove stale tool results and thinking blocks selectively ([Context editing](https://platform.claude.com/docs/en/build-with-claude/context-editing)).
-- **Subagents:** Fork fresh windows for isolated subtasks; post cites **+2.8%** BrowseComp vs best single-agent for Opus 4.6 ([Subagents](https://code.claude.com/docs/en/sub-agents)).
-- **Compaction:** Summarize prior context for long horizons ([Compaction](https://platform.claude.com/docs/en/build-with-claude/compaction)); effectiveness varies by model generation (see Hook 9 table).
-- **Memory folder:** Persist agent-chosen state via the memory tool / files ([Memory tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)).
-
-### Prompt caching (Hook 6)
-
-The Messages API is stateless. Maximize [prompt caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching): **stable prefix first, dynamic tail last**; **append** via messages; **avoid mid-session model switches** (use a subagent for cheaper models); **treat tool list as cached prefix**; use **tool search** to append without invalidation; **advance breakpoints** toward the latest message. Cached tokens cost **10% of base input**.
-
-### Typed tools vs bash strings (Hook 7)
-
-Promote actions to **dedicated tools** with typed arguments when the harness must intercept, gate, render (e.g., **modals**), or audit—**hard-to-reverse** steps (e.g., external API calls) for user confirmation; **write/edit** paths with **staleness checks** so concurrent edits are not blindly overwritten ([Harnessing Claude's intelligence](https://claude.com/blog/harnessing-claudes-intelligence)).
-
-### Standing review: dedicated tools vs general bash + policy (Hook 8)
-
-Re-evaluate promotions as models improve—e.g., Claude Code **auto-mode** (secondary reviewer over bash strings) can **reduce** bespoke tools **only** where users accept that trust profile; **high-stakes** actions still warrant dedicated tools ([Claude Code auto-mode](https://www.anthropic.com/engineering/claude-code-auto-mode)).
-
-### Benchmark vignettes — motivation only, not guarantees (Hook 9)
-
-| Vignette | Outcome stated in the post |
-|----------|----------------------------|
-| SWE-bench Verified | Claude 3.5 Sonnet **49%** with bash + editor only (then SOTA framing) |
-| BrowseComp + output filtering | Opus 4.6 **45.3% → 61.6%** |
-| BrowseComp + subagents | Opus 4.6 **+2.8%** vs best single-agent |
-| BrowseComp + compaction | Sonnet 4.5 **43%** flat; Opus 4.5 **68%**; Opus 4.6 **84%** (same setup) |
-| BrowseComp-Plus + memory folder | Sonnet 4.5 **60.4% → 67.2%** |
-| Prompt caching | Cached tokens **10%** the cost of base input tokens |
-
-## NotebookLM Audio Overview customization (example)
-
-Adapt `[FOCUS AREA]` per notebook. Pair with Deep Dive + Longer in the product UI when that matches the user's plan.
-
-```text
-Target audience: [Expert-level listener profile -- skip beginner padding.]
-
-Focus: [FOCUS AREA -- single notebook-specific paragraph.]
-
-Style: [Technical depth, anti-patterns, implications for builders.]
-
-Prioritize: [Technical depth and specific findings over marketing tone or generic summaries.]
-```
-
-## Agent checklist pattern
-
-For long tasks, optional checklist the model can mirror:
-
-```text
-Copy this checklist and mark items as you go:
-
-Progress:
-- [ ] ...
-- [ ] ...
-```
-
-## Agentic state management
-
-For `agent-harness` prompts that span multiple context windows, include state persistence and multi-window patterns. Based on Anthropic's guidance:
-
-### Context awareness
-
-Claude 4.6 tracks its remaining context window. Include harness capabilities so Claude can plan accordingly:
-
-```text
-<context_management>
-Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely from where you left off. Do not stop tasks early due to token budget concerns. As you approach the limit, save current progress and state before the context window refreshes. Always be as persistent and autonomous as possible and complete tasks fully.
-</context_management>
-```
-
-### Multi-window workflow
-
-Anthropic recommends differentiating the first context window from subsequent ones:
-
-**First window:** Set up the framework -- write tests, create setup scripts, establish the todo-list.
-
-**Subsequent windows:** Iterate on the todo-list, using state files to resume.
-
-Key patterns from Anthropic:
-- Have the model write tests in a **structured format** (e.g. `tests.json` with `{id, name, status}`) before starting work. Remind: "It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality."
-- Encourage **setup scripts** (e.g. `init.sh`) to start servers, run test suites, and linters. This prevents repeated work across windows.
-- When starting fresh, be **prescriptive about resumption**: "Review progress.txt, tests.json, and the git logs."
-- Provide **verification tools** (Playwright, computer use) for autonomous UI testing.
-
-### State tracking
-
-```text
-<state_management>
-Track progress in structured + freeform files:
-- tests.json: structured test status {id, name, status}
-- progress.txt: freeform session notes and next steps
-- Use git commits as checkpoints for rollback
-
-When approaching context limits, save current state before the window refreshes.
-Do not stop tasks early due to token budget concerns.
-</state_management>
-```
-
-### Encouraging complete context usage
-
-```text
-This is a very long task, so it may be beneficial to plan out your work clearly. It's encouraged to spend your entire output context working on the task - just make sure you don't run out of context with significant uncommitted work. Continue working systematically until you have completed this task.
-```
-
-## Research prompt pattern
-
-For `research` prompt types, include structured investigation with hypothesis tracking:
-
-```text
-<research_approach>
-Search for this information in a structured way. As you gather data, develop several competing hypotheses. Track your confidence levels in your progress notes to improve calibration. Regularly self-critique your approach and plan. Update a hypothesis tree or research notes file to persist information and provide transparency. Break down this complex research task systematically.
-</research_approach>
-```
-
-## Evaluation loop
-
-For prompt drafts that must hold up over time:
-
-1. Run the draft on 2-3 representative user utterances.
-2. Note failure modes (skipped steps, wrong format, over-refusal).
-3. Tighten **constraints** or add **examples** for the failure class only.
-
-Anthropic's **self-correction chaining** pattern extends this: generate a draft, have Claude review it against criteria, then have Claude refine based on the review. Each step can be a separate API call for inspection and branching.
-
-## Anti-test-fixation pattern
-
-```text
-Write general-purpose solutions using the standard tools available. Implement logic that works correctly for all valid inputs, not just the test cases. Tests verify correctness -- they do not define the solution. If a test seems incorrect or the task is unreasonable, flag it rather than working around it.
-```
-
-## Commit-and-execute pattern
-
-```text
-When deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning. If you are weighing two approaches, pick one and see it through. You can always course-correct later if the chosen approach fails.
-```
-
|
|
208
|
-
## Debug JSON schema (prompt-generator pipeline)
|
|
209
|
-
|
|
210
|
-
Use **only** when the user explicitly requests debug output (for example `show debug`, `full audit table`, `raw internal object`). Default assistant turns complete the normal handoff first: one `xml` fence + **`## Outcome digest`** (see also `TARGET_OUTPUT.md`); this JSON object is an optional appendix **after** that handoff.
|
|
211
|
-
|
|
212
|
-
Shape (field names stable for internal audit helpers and Stop-hook leak detection):
|
|
213
|
-
|
|
214
|
-
```json
|
|
215
|
-
{
|
|
216
|
-
"pipeline_mode": "internal_section_refinement_with_final_audit",
|
|
217
|
-
"scope_block": {
|
|
218
|
-
"target_local_roots": ["..."],
|
|
219
|
-
"target_canonical_roots": ["..."],
|
|
220
|
-
"target_file_globs": ["..."],
|
|
221
|
-
"comparison_basis": "...",
|
|
222
|
-
"completion_boundary": "..."
|
|
223
|
-
},
|
|
224
|
-
"required_sections": ["role", "background", "instructions", "constraints", "output_format", "illustrations"],
|
|
225
|
-
"base_prompt_xml": "<role>...</role><background>...</background><instructions>...</instructions><constraints>...</constraints><illustrations>...</illustrations><output_format>...</output_format>",
|
|
226
|
-
"section_scope_rule": "Each refiner edits exactly one section and returns sibling sections unchanged.",
|
|
227
|
-
"section_output_contract": {
|
|
228
|
-
"required_fields": ["improved_block", "rationale", "concise_diff"]
|
|
229
|
-
},
|
|
230
|
-
"merge_output_contract": {
|
|
231
|
-
"required_fields": ["canonical_prompt_xml"]
|
|
232
|
-
},
|
|
233
|
-
"audit_output_contract": {
|
|
234
|
-
"required_fields": [
|
|
235
|
-
"overall_status",
|
|
236
|
-
"checklist_results",
|
|
237
|
-
"evidence_quotes",
|
|
238
|
-
"source_refs",
|
|
239
|
-
"corrective_edits",
|
|
240
|
-
"retry_count"
|
|
241
|
-
]
|
|
242
|
-
},
|
|
243
|
-
"checklist_results": {
|
|
244
|
-
"<row_name>": {
|
|
245
|
-
"status": "pass|fail",
|
|
246
|
-
"evidence_quote": "exact quote used for verification",
|
|
247
|
-
"source_ref": "URL or local path",
|
|
248
|
-
"fix_if_fail": "concrete edit text (empty only if pass)"
|
|
249
|
-
}
|
|
250
|
-
}
|
|
251
|
-
}
|
|
252
|
-
```
|
|
253
|
-
|
|
254
|
-
`checklist_results` keys must include all **15** compliance row ids from `SKILL.md` §11 (for example `reversible_action_and_safety_check_guidance`, `scope_terms_explicit_and_anchored`).
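
The per-row shape above can be checked mechanically. This is an illustrative sketch only: `row_problems` is a hypothetical helper, not the package's validator, and it assumes the four field names shown in the JSON shape plus the rule that `fix_if_fail` may be empty only on a passing row.

```python
ROW_FIELDS = ("status", "evidence_quote", "source_ref", "fix_if_fail")


def row_problems(row_name: str, row: dict) -> list[str]:
    """Return human-readable problems for one checklist_results row."""
    problems = [
        f"{row_name}: missing field {field}" for field in ROW_FIELDS if field not in row
    ]
    status = row.get("status")
    if status not in ("pass", "fail"):
        problems.append(f"{row_name}: status must be 'pass' or 'fail'")
    # A failing row must carry concrete edit text.
    if status == "fail" and not row.get("fix_if_fail"):
        problems.append(f"{row_name}: fix_if_fail is empty on a failing row")
    return problems
```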
|
|
@@ -1,177 +0,0 @@
|
|
|
1
|
-
# Prompt Refinement Pipeline Runbook
|
|
2
|
-
|
|
3
|
-
## Purpose
|
|
4
|
-
|
|
5
|
-
Validate deterministic behavior for:
|
|
6
|
-
|
|
7
|
-
1. Base prompt generation (`/prompt-generator`)
|
|
8
|
-
2. Six section refiners (owned by `/prompt-generator`)
|
|
9
|
-
3. Merge + final audit with citation-grounded checks
|
|
10
|
-
4. Targeted fix + capped re-audit loop
|
|
11
|
-
|
|
12
|
-
## Sample Input
|
|
13
|
-
|
|
14
|
-
Use this command:
|
|
15
|
-
|
|
16
|
-
```text
|
|
17
|
-
/prompt-generator Create a trusted final system prompt for a coding agent that edits files safely, follows user scope, and returns concise status updates.
|
|
18
|
-
```
|
|
19
|
-
|
|
20
|
-
## Expected Stage Artifacts
|
|
21
|
-
|
|
22
|
-
1. **Base stage**
|
|
23
|
-
- Scope block is present and explicit:
|
|
24
|
-
- `target_local_roots`
|
|
25
|
-
- `target_canonical_roots` (if applicable)
|
|
26
|
-
- `target_file_globs`
|
|
27
|
-
- `comparison_basis`
|
|
28
|
-
- `completion_boundary`
|
|
29
|
-
- XML scaffold includes all sections — verified by the Stop hook at runtime; each required section tag must have both an opening and a closing tag:
|
|
30
|
-
- `<role>`
|
|
31
|
-
- `<background>`
|
|
32
|
-
- `<instructions>`
|
|
33
|
-
- `<constraints>`
|
|
34
|
-
- `<output_format>`
|
|
35
|
-
- `<illustrations>`
|
|
36
|
-
- Includes internal refinement object with:
|
|
37
|
-
- `pipeline_mode: internal_section_refinement_with_final_audit`
|
|
38
|
-
- `required_sections` list with all six sections
|
|
39
|
-
- section/merge/audit output contracts
|
|
40
|
-
|
|
41
|
-
2. **Section refinement stage**
|
|
42
|
-
- Exactly 6 agent runs, one per section.
|
|
43
|
-
- Each section output includes:
|
|
44
|
-
- `improved_block`
|
|
45
|
-
- `rationale`
|
|
46
|
-
- `concise_diff`
|
|
47
|
-
- No section agent edits another section.
|
|
48
|
-
|
|
49
|
-
3. **Merge stage**
|
|
50
|
-
- One canonical merged prompt with all six sections.
|
|
51
|
-
|
|
52
|
-
4. **Audit stage**
|
|
53
|
-
- Output includes:
|
|
54
|
-
- `overall_status`
|
|
55
|
-
- `checklist_results`
|
|
56
|
-
- `corrective_edits`
|
|
57
|
-
- `retry_count`
|
|
58
|
-
- Every checklist item includes:
|
|
59
|
-
- `status`
|
|
60
|
-
- `evidence_quote` (direct quote)
|
|
61
|
-
- `source_ref`
|
|
62
|
-
- `fix_if_fail`
|
|
63
|
-
|
|
64
|
-
5. **Final output**
|
|
65
|
-
- One complete prompt block that is copy-pasteable.
|
|
66
|
-
- Internal refinement object is not shown unless debug output was requested.
|
|
67
|
-
- Default output must not leak the raw internal refinement object fields.
|
|
68
|
-
|
|
69
|
-
## Deterministic Checklist Coverage
|
|
70
|
-
|
|
71
|
-
Audit report must include all check IDs:
|
|
72
|
-
|
|
73
|
-
- `structured_scoped_instructions`
|
|
74
|
-
- `sequential_steps_present`
|
|
75
|
-
- `positive_framing`
|
|
76
|
-
- `acceptance_criteria_defined`
|
|
77
|
-
- `safety_reversibility_language`
|
|
78
|
-
- `reversible_action_and_safety_check_guidance`
|
|
79
|
-
- `concrete_output_contract`
|
|
80
|
-
- `scope_boundary_present`
|
|
81
|
-
- `explicit_scope_anchors_present`
|
|
82
|
-
- `all_instructions_artifact_bound`
|
|
83
|
-
- `scope_terms_explicit_and_anchored`
|
|
84
|
-
- `completion_boundary_measurable`
|
|
85
|
-
- `citation_grounding_policy_present`
|
|
86
|
-
- `source_priority_rules_present`
|
|
87
|
-
- `artifact_language_confidence`
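
Coverage of the check IDs listed above can be verified with a set difference. The sketch below is an assumption about how such a check could look, not the package's implementation. `missing_check_ids` is a hypothetical name; the IDs are copied from the list.

```python
REQUIRED_CHECK_IDS = frozenset({
    "structured_scoped_instructions",
    "sequential_steps_present",
    "positive_framing",
    "acceptance_criteria_defined",
    "safety_reversibility_language",
    "reversible_action_and_safety_check_guidance",
    "concrete_output_contract",
    "scope_boundary_present",
    "explicit_scope_anchors_present",
    "all_instructions_artifact_bound",
    "scope_terms_explicit_and_anchored",
    "completion_boundary_measurable",
    "citation_grounding_policy_present",
    "source_priority_rules_present",
    "artifact_language_confidence",
})


def missing_check_ids(checklist_results: dict) -> list[str]:
    """Return required check IDs absent from the audit report, sorted by name."""
    # Iterating a dict yields its keys, so difference() compares against the row names.
    return sorted(REQUIRED_CHECK_IDS.difference(checklist_results))
```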
|
|
88
|
-
|
|
89
|
-
## Citation and Grounding Validation
|
|
90
|
-
|
|
91
|
-
For each factual compliance claim in the audit:
|
|
92
|
-
|
|
93
|
-
- Include a source citation
|
|
94
|
-
- Include a word-for-word quote
|
|
95
|
-
- If unsupported, explicitly return "I don't know"
|
|
96
|
-
|
|
97
|
-
Source priority must be applied in this order:
|
|
98
|
-
|
|
99
|
-
1. Official vendor docs (external behavior)
|
|
100
|
-
2. Local project files (local behavior)
|
|
101
|
-
3. Academic / named experts
|
|
102
|
-
4. Reputable external URLs
|
|
103
|
-
5. Blog/community content
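
As a rough illustration of applying this order, the sketch below ranks candidate sources by category. The category labels and the `kind` key are hypothetical names introduced for this example. It also simplifies the list above by ignoring the external/local behavior distinction noted in items 1 and 2.

```python
# Hypothetical category labels, listed from highest to lowest priority.
SOURCE_PRIORITY = (
    "official_vendor_docs",
    "local_project_files",
    "academic_or_named_expert",
    "reputable_external_url",
    "blog_or_community",
)


def rank_sources(sources: list[dict]) -> list[dict]:
    """Order sources by priority; sorted() is stable, so ties keep input order."""
    return sorted(sources, key=lambda source: SOURCE_PRIORITY.index(source["kind"]))
```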
|
|
104
|
-
|
|
105
|
-
## Non-pass Loop Validation
|
|
106
|
-
|
|
107
|
-
If `overall_status` is `fail`:
|
|
108
|
-
|
|
109
|
-
1. Apply only targeted edits listed in `corrective_edits`
|
|
110
|
-
2. Re-run audit
|
|
111
|
-
3. Stop once the retry cap is reached (`max_retries: 2` unless explicitly overridden)
|
|
112
|
-
4. Return unresolved failures with evidence if still failing at cap
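
The loop above can be sketched as follows, assuming two caller-supplied functions: `run_audit` returns a report dict with `overall_status` and `corrective_edits`, and `apply_edits` applies those edits to the prompt. Both callables and the name `audit_with_retries` are hypothetical. Only the field names and the default cap of 2 come from this runbook.

```python
from typing import Callable


def audit_with_retries(
    prompt: str,
    run_audit: Callable[[str], dict],
    apply_edits: Callable[[str, list], str],
    max_retries: int = 2,
) -> tuple[str, dict]:
    """Audit, apply targeted edits on failure, and re-audit up to max_retries times."""
    report = run_audit(prompt)
    retry_count = 0
    while report["overall_status"] == "fail" and retry_count < max_retries:
        # Apply only the edits the audit listed, then audit again.
        prompt = apply_edits(prompt, report["corrective_edits"])
        retry_count += 1
        report = run_audit(prompt)
    # If still failing at the cap, the last report carries the unresolved failures.
    return prompt, {**report, "retry_count": retry_count}
```

With the default cap, a prompt that never passes is audited three times in total: once initially and once after each of the two retries.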
|
|
113
|
-
|
|
114
|
-
## Ownership and Execution-Intent Validation
|
|
115
|
-
|
|
116
|
-
- Prompt refinement remains inside `/prompt-generator`.
|
|
117
|
-
- `/agent-prompt` is used only after explicit execution/delegation intent.
|
|
118
|
-
- Execution handoffs that go through `/agent-prompt` carry scope-block context in the execution prompt as needed.
|
|
119
|
-
- Final refined prompt content is treated as artifact text during refinement and audit.
|
|
120
|
-
- Execution steps (when requested) are bound to scope block artifacts.
|
|
121
|
-
|
|
122
|
-
## Scope-Phrasing Validation
|
|
123
|
-
|
|
124
|
-
- Reject ambiguous scope wording such as "this session", "current files", "here", "above", or "as needed" when used as scope boundaries.
|
|
125
|
-
- Require artifact-bound replacements using explicit roots, globs, comparison basis, and measurable completion boundary.
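
A minimal sketch of the banned-term check, assuming whole-word, case-insensitive matching over the five phrases quoted above. `ambiguous_scope_terms` is a hypothetical helper. The removed hook may have used a different term list or matching rule.

```python
import re

AMBIGUOUS_SCOPE_TERMS = ("this session", "current files", "here", "above", "as needed")

# Word boundaries keep "here" from matching inside words such as "there".
_AMBIGUOUS = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in AMBIGUOUS_SCOPE_TERMS) + r")\b",
    re.IGNORECASE,
)


def ambiguous_scope_terms(text: str) -> list[str]:
    """Return each ambiguous scope phrase found in text, in order of appearance."""
    return [match.group(1).lower() for match in _AMBIGUOUS.finditer(text)]
```

A caller would run this only on scope-bound text, since words like "here" are harmless elsewhere.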
|
|
126
|
-
|
|
127
|
-
## Runtime Hook Gate Validation
|
|
128
|
-
|
|
129
|
-
Validate fail-closed runtime gates:
|
|
130
|
-
|
|
131
|
-
1. **Stop leakage/scope/checklist gate**
|
|
132
|
-
- **Section-presence gate (Stop)** — Block responses where the fenced XML artifact is missing any of the five required section tag pairs: `role`, `background`, `instructions`, `constraints`, `output_format`.
|
|
133
|
-
- Block responses that leak raw internal refinement object fields unless debug intent is explicit.
|
|
134
|
-
- Block responses missing deterministic checklist rows when audit output is present.
|
|
135
|
-
- Block responses using ambiguous scope phrasing in scope-bound sections.
|
|
136
|
-
- Block responses containing negative keywords (no, not, don't, never, avoid, etc.) inside fenced XML artifacts.
|
|
137
|
-
- Block responses containing hedging language (might be, possibly, I think, etc.) inside fenced XML artifacts.
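
The section-presence gate can be illustrated with a short check. This is a sketch under stated assumptions, not the removed hook's code: it inspects only the first fence tagged `xml`, and `missing_sections` is a hypothetical name. The five section names come from the gate description above.

```python
import re

REQUIRED_SECTIONS = ("role", "background", "instructions", "constraints", "output_format")

FENCE = "`" * 3  # built at runtime so this example contains no literal fence marker
_XML_FENCE = re.compile(FENCE + r"xml\s*\n(.*?)" + FENCE, re.DOTALL)


def missing_sections(response: str) -> list[str]:
    """Return required sections lacking an opening or closing tag in the xml fence."""
    match = _XML_FENCE.search(response)
    # With no xml fence at all, every required section counts as missing.
    artifact = match.group(1) if match else ""
    return [
        name
        for name in REQUIRED_SECTIONS
        if f"<{name}>" not in artifact or f"</{name}>" not in artifact
    ]
```

A fail-closed gate would block the response whenever the returned list is non-empty.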
|
|
138
|
-
|
|
139
|
-
## Context-Footprint Controls
|
|
140
|
-
|
|
141
|
-
- Keep baseline prompt-workflow policy minimal by default.
|
|
142
|
-
- Store stable enforcement text in hooks/rules; avoid repeating full policy blocks in prompt artifacts.
|
|
143
|
-
- Load heavy skills on demand based on explicit task intent.
|
|
144
|
-
- Prefer canonical references and compact outputs over repeated long policy text.
|
|
145
|
-
|
|
146
|
-
## Deterministic vs Semantic Boundary
|
|
147
|
-
|
|
148
|
-
- **Deterministic (fail-closed):**
|
|
149
|
-
- Missing required scope anchors (when Stop guard applies)
|
|
150
|
-
- Raw internal object leakage without debug intent
|
|
151
|
-
- Missing required checklist rows in audit output
|
|
152
|
-
- Missing required XML sections (`role`, `background`, `instructions`, `constraints`, `output_format`) in the fenced artifact (opening and closing tags)
|
|
153
|
-
- Ambiguous scope terms in scope-bound text
|
|
154
|
-
- Negative keywords inside fenced XML artifacts
|
|
155
|
-
- Hedging language inside fenced XML artifacts
|
|
156
|
-
- **Semantic-only (auditor layer):**
|
|
157
|
-
- Overall quality/readability of scope wording beyond banned-term checks
|
|
158
|
-
- Whether instruction binding quality is "good enough" beyond explicit anchor presence
|
|
159
|
-
- Whether context compaction is optimal for a specific task
|
|
160
|
-
|
|
161
|
-
## Doc Alignment Validation
|
|
162
|
-
|
|
163
|
-
Each major workflow requirement added to the skill text must map to at least one of these principles:
|
|
164
|
-
|
|
165
|
-
- Structured/scoped instructions
|
|
166
|
-
- Clear sequential process
|
|
167
|
-
- Positive framing
|
|
168
|
-
- Explicit acceptance criteria
|
|
169
|
-
- Concrete output format contract
|
|
170
|
-
- Reversibility/safety constraints
|
|
171
|
-
|
|
172
|
-
## Traceability Validation
|
|
173
|
-
|
|
174
|
-
Each major requirement in skill text should point to:
|
|
175
|
-
|
|
176
|
-
- Anthropic best-practice URL, and/or
|
|
177
|
-
- Local source file path used as authority
|