codeforge-dev 1.5.8 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/.devcontainer/.env +2 -1
  2. package/.devcontainer/CHANGELOG.md +56 -0
  3. package/.devcontainer/CLAUDE.md +65 -15
  4. package/.devcontainer/README.md +65 -4
  5. package/.devcontainer/config/keybindings.json +5 -0
  6. package/.devcontainer/config/main-system-prompt.md +63 -2
  7. package/.devcontainer/config/settings.json +25 -6
  8. package/.devcontainer/devcontainer.json +18 -2
  9. package/.devcontainer/features/README.md +21 -7
  10. package/.devcontainer/features/ccburn/README.md +60 -0
  11. package/.devcontainer/features/ccburn/devcontainer-feature.json +38 -0
  12. package/.devcontainer/features/ccburn/install.sh +174 -0
  13. package/.devcontainer/features/ccstatusline/README.md +22 -21
  14. package/.devcontainer/features/ccstatusline/devcontainer-feature.json +1 -1
  15. package/.devcontainer/features/ccstatusline/install.sh +48 -16
  16. package/.devcontainer/features/claude-code/config/settings.json +60 -24
  17. package/.devcontainer/features/mcp-qdrant/devcontainer-feature.json +1 -1
  18. package/.devcontainer/features/mcp-reasoner/devcontainer-feature.json +1 -1
  19. package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/__pycache__/format-on-stop.cpython-314.pyc +0 -0
  20. package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/format-on-stop.py +21 -6
  21. package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/__pycache__/lint-file.cpython-314.pyc +0 -0
  22. package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/lint-file.py +7 -10
  23. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/REVIEW-RUBRIC.md +440 -0
  24. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/architect.md +190 -0
  25. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/bash-exec.md +173 -0
  26. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/claude-guide.md +155 -0
  27. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/dependency-analyst.md +248 -0
  28. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/doc-writer.md +233 -0
  29. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/explorer.md +235 -0
  30. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/generalist.md +125 -0
  31. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/git-archaeologist.md +242 -0
  32. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/migrator.md +195 -0
  33. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/perf-profiler.md +265 -0
  34. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/refactorer.md +209 -0
  35. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/researcher.md +195 -0
  36. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/security-auditor.md +289 -0
  37. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/spec-writer.md +284 -0
  38. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/statusline-config.md +188 -0
  39. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/test-writer.md +245 -0
  40. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/hooks/hooks.json +12 -0
  41. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/guard-readonly-bash.cpython-314.pyc +0 -0
  42. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/redirect-builtin-agents.cpython-314.pyc +0 -0
  43. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/skill-suggester.cpython-314.pyc +0 -0
  44. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/syntax-validator.cpython-314.pyc +0 -0
  45. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-no-regression.cpython-314.pyc +0 -0
  46. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-tests-pass.cpython-314.pyc +0 -0
  47. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/guard-readonly-bash.py +611 -0
  48. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/redirect-builtin-agents.py +83 -0
  49. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/skill-suggester.py +85 -2
  50. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/syntax-validator.py +9 -4
  51. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-no-regression.py +221 -0
  52. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-tests-pass.py +176 -0
  53. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/SKILL.md +599 -0
  54. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/references/sdk-typescript-reference.md +954 -0
  55. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/SKILL.md +276 -0
  56. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/advanced-commands.md +332 -0
  57. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/investigation-playbooks.md +319 -0
  58. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/SKILL.md +341 -0
  59. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/interpreting-results.md +235 -0
  60. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/tool-commands.md +395 -0
  61. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/SKILL.md +344 -0
  62. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/safe-transformations.md +247 -0
  63. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/smell-catalog.md +332 -0
  64. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/SKILL.md +277 -0
  65. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/owasp-patterns.md +269 -0
  66. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/secrets-patterns.md +253 -0
  67. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/SKILL.md +288 -0
  68. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/criteria-patterns.md +245 -0
  69. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/ears-templates.md +239 -0
  70. package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/__pycache__/guard-protected.cpython-314.pyc +0 -0
  71. package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/guard-protected.py +40 -39
  72. package/.devcontainer/scripts/setup-aliases.sh +10 -5
  73. package/.devcontainer/scripts/setup-config.sh +2 -0
  74. package/.devcontainer/scripts/setup-plugins.sh +38 -46
  75. package/.devcontainer/scripts/setup-projects.sh +175 -0
  76. package/.devcontainer/scripts/setup-symlink-claude.sh +36 -0
  77. package/.devcontainer/scripts/setup-update-claude.sh +11 -8
  78. package/.devcontainer/scripts/setup.sh +4 -2
  79. package/package.json +1 -1
  80. package/.devcontainer/scripts/setup-irie-claude.sh +0 -32
package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/REVIEW-RUBRIC.md @@ -0,0 +1,440 @@
# Agent & Skill Quality Rubric

> Compiled from Anthropic's official documentation, Claude Code subagent docs, skill authoring best practices, and industry research on LLM agent design patterns. This rubric drives the quality review of all agents and skills in the `code-directive` plugin.

---

## 1. Key Principles from Anthropic

These principles come directly from Anthropic's official prompt engineering documentation for Claude 4.x models (Opus 4.6, Sonnet 4.5, Haiku 4.5).

### 1.1 Be Explicit and Specific

Claude 4.x models are trained for **precise instruction following**. They do what you ask — nothing more, nothing less. Vague prompts produce vague results. If you want thorough, above-and-beyond behavior, you must explicitly request it.

- **Bad**: "Review this code"
- **Good**: "Review this code for security vulnerabilities, performance issues, and readability. For each issue, explain the problem, show the current code, and provide a corrected version."

**Implication for agents**: Every agent prompt must clearly define what the agent should do, how it should do it, and what its output should look like. Do not rely on Claude inferring intent from vague instructions.

### 1.2 Provide Context and Motivation (Explain WHY)

Providing the *reason* behind instructions helps Claude generalize correctly. Instead of bare rules, explain the motivation.

- **Bad**: "NEVER use ellipses"
- **Good**: "Never use ellipses because your output will be read by a text-to-speech engine that cannot pronounce them."

**Implication for agents**: When an agent has constraints (e.g., "read-only"), briefly explain why. When an agent follows a particular workflow, explain the rationale so it can adapt intelligently to edge cases.

### 1.3 Be Vigilant with Examples and Details

Claude pays close attention to examples. Poorly constructed examples teach bad patterns. Examples should:
- Align precisely with desired behavior
- Cover edge cases and diverse scenarios
- Be wrapped in `<example>` tags for clarity
- Include 3-5 examples for complex tasks; 1 example for simple ones

### 1.4 Use XML Tags for Structure

Claude was trained on XML-tagged prompts. Tags like `<instructions>`, `<example>`, `<constraints>` prevent Claude from confusing instructions with context or examples with rules.

- Be **consistent** with tag names throughout the prompt
- **Nest** tags for hierarchical content: `<outer><inner></inner></outer>`
- **Refer** to tagged content by tag name: "Using the data in `<context>` tags..."
- There are no canonical "best" tag names — use names that make sense for the content they surround

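The tag rules above lend themselves to a small illustration. A minimal sketch in Python (the `tag` helper and the tag names are hypothetical; there is no canonical set):

```python
# Minimal sketch: assemble an XML-tagged prompt with consistent, nested tags.
def tag(name: str, body: str) -> str:
    """Wrap body in a named XML tag pair."""
    return f"<{name}>\n{body}\n</{name}>"

prompt = "\n\n".join([
    tag("instructions", "Review the code in <context> tags for security issues."),
    tag("context", tag("file", "def login(pw): ...")),  # nested tags
    tag("constraints", "Only discuss code that appears in <context>."),
])
print(prompt)
```

Note how the instructions refer to the `<context>` tag by name, per the third bullet.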
### 1.5 Allow Uncertainty

Give Claude explicit permission to say "I don't know" rather than guessing. This reduces hallucinations, especially in research and diagnostic agents.

### 1.6 Tell Claude What TO Do, Not What NOT to Do

Positive framing is more effective than negative framing for behavioral steering:
- **Bad**: "Do not use markdown in your response"
- **Good**: "Write your response in smoothly flowing prose paragraphs."

**Exception**: Safety constraints (e.g., "NEVER modify files") should still use strong negative framing because the cost of violation is high.

### 1.7 Claude 4.x Is More Responsive to System Prompts

Claude Opus 4.5 and 4.6 are more responsive to system prompts than previous models. Aggressive language designed to prevent undertriggering in older models (e.g., "CRITICAL: You MUST...") may now cause **overtriggering**. Use calibrated, normal language unless the constraint is genuinely critical.

---

## 2. System Prompt Best Practices

### 2.1 Identity & Role

Role prompting is the single most powerful use of system prompts. The right role turns Claude from a generalist into a domain expert.

**Best practices**:
- Define the role in the **first line** of the prompt body. This sets the frame for everything that follows.
- Be **specific**: "You are a senior Python developer specializing in FastAPI and async patterns" beats "You are a coding assistant."
- Include **expertise level**: "senior", "expert", "specialist" signals the depth expected.
- Optionally include **personality traits** relevant to the task: "methodical", "thorough", "concise".
- The `description` field in YAML frontmatter is for Claude's **task routing** — it tells the parent agent *when* to delegate. The markdown body is the agent's **system prompt** — it tells the agent *how* to behave.

**Agent-specific guidance**:
- The `name` field must use lowercase letters and hyphens only
- The `description` field should clearly state: (a) what the agent does, and (b) when it should be used
- Write descriptions in **third person**: "Analyzes code for security vulnerabilities" not "I analyze code" or "Use this to analyze code"
- Include **trigger phrases** the user might say that should invoke this agent

### 2.2 Constraints & Boundaries

Constraints define what the agent **must not** do. They are safety rails.

**Best practices**:
- Group all hard constraints in a clearly labeled section (`## Critical Constraints` or similar)
- Use strong negative framing for safety-critical constraints: "**NEVER** modify any file"
- Be exhaustive — list every prohibited action category, not just one example
- Explain *why* the constraint exists when not obvious
- Keep constraints at the top of the prompt, before workflow instructions

**Common constraint categories for agents**:
- File system modifications (read-only agents)
- Service/process management (diagnostic agents)
- Package installation (sandboxed agents)
- Git state changes (research agents)
- Network requests (isolated agents)

### 2.3 Behavioral Rules

Behavioral rules define how the agent **should act** in different scenarios. They are the decision-making logic.

**Best practices**:
- Use **conditional dispatch**: "If X, do Y. If Z, do W." This helps Claude handle varied inputs.
- Cover the **common scenarios** the agent will encounter, including the "no input" case.
- Include **negative result reporting**: "Always report what was checked, even if nothing was found."
- Include **uncertainty handling**: "If you cannot determine the answer, say so and explain what additional information would help."
- Be specific about **scope escalation**: When should the agent go broad vs. narrow?

### 2.4 Examples & Few-Shot

Examples are the most effective way to communicate expected behavior.

**Best practices**:
- Wrap examples in `<example>` tags (multiple examples in `<examples>` parent tag)
- Include **input → output** pairs that show the complete workflow
- Provide **3-5 diverse examples** for complex agents, covering:
  - The happy path (typical input)
  - Edge cases (unusual input)
  - Error cases (bad input or no results)
- Ensure examples are **consistent** with all stated rules and constraints
- Examples should demonstrate the **output format** in action, not just describe it
- Place examples **after** the rules they illustrate, not before

### 2.5 Output Format Specification

A structured output format ensures the agent's results are predictable and parseable.

**Best practices**:
- Define a clear output template with named sections
- Use markdown headers (`###`) for top-level sections
- Use consistent formatting within sections (bullet lists, tables, etc.)
- Include a "Sources" or "Evidence" section that traces claims to specific files, URLs, or line numbers
- Specify what goes in each section so there's no ambiguity
- Match the output format to the consumer — if a human reads it, optimize for readability; if another tool parses it, optimize for structure

### 2.6 Tool Usage Guidance

Agents need explicit guidance on *how* to use their available tools effectively.

**Best practices**:
- Show concrete tool usage patterns with realistic commands/queries
- Specify tool selection logic: "Use Glob to discover files, then Grep to search content, then Read to examine specific files"
- Include command templates with placeholder values
- Warn about tool-specific pitfalls (e.g., "For large logs, always filter with Grep before reading. Never dump entire large files.")
- If the agent has Bash access, provide allowed command patterns and explicitly prohibit dangerous ones
- If tools have been restricted via `tools:` or `disallowedTools:`, the prompt should align with what's available — don't reference tools the agent can't use

---

## 3. Agent Definition Patterns

### 3.1 What Makes an Effective Agent

Based on Claude Code's subagent architecture and Anthropic's guidance:

1. **Single Responsibility**: Each agent should excel at one specific task domain. Don't create Swiss Army knife agents.
2. **Clear Delegation Signal**: The `description` must be specific enough that the parent agent knows *exactly* when to delegate. Include trigger phrases.
3. **Minimal Tool Surface**: Grant only the tools the agent needs. Read-only agents should not have Write/Edit. Diagnostic agents should not have file creation.
4. **Structured Workflow**: The prompt should define a clear, repeatable workflow — not just "do the thing." Steps should be numbered and conditional.
5. **Defined Output Contract**: The agent should always produce output in a predictable format, regardless of what it finds.
6. **Graceful Failure**: The agent should handle cases where it can't find what it's looking for, can't complete the task, or encounters errors. It should report these clearly rather than hallucinating.
7. **Context Efficiency**: Agents run in their own context window. Design prompts to be thorough but not wasteful. Every line should earn its place.

### 3.2 Common Anti-Patterns to Avoid

| Anti-Pattern | Why It's Bad | Fix |
|---|---|---|
| **Vague description** ("Helps with code") | Parent agent can't decide when to delegate | Be specific: "Analyzes Python code for security vulnerabilities including OWASP Top 10, injection flaws, and authentication weaknesses" |
| **Missing constraints section** | Agent may modify files, install packages, or cause side effects | Add explicit `## Critical Constraints` section listing prohibited actions |
| **Overloaded prompt** (too many tasks) | Agent loses focus, produces inconsistent results | Split into multiple focused agents |
| **No output format** | Results vary wildly between invocations | Define a structured output template |
| **ALLCAPS SHOUTING throughout** | Claude 4.x overtriggers on aggressive language; creates noise | Reserve strong emphasis for genuinely critical safety constraints; use normal language elsewhere |
| **No examples** | Agent guesses at expected behavior | Add 2-3 concrete input→output examples |
| **Contradictory instructions** | Agent behavior becomes unpredictable | Review for internal consistency; have Claude check |
| **Tool references that don't match `tools:` field** | Agent tries to use unavailable tools | Audit prompt against YAML `tools:` list |
| **Assuming Claude knows project-specific things** | Hallucinated project details | Provide concrete context or instruct the agent to discover it |
| **No negative-result handling** | Agent hallucinates results when it finds nothing | Add explicit "report what you checked even if nothing was found" |
| **Time-sensitive content** | Becomes wrong as tools/APIs evolve | Use version-agnostic language or "old patterns" sections |

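The tool-mismatch row is mechanically checkable. A rough sketch, assuming the `tools:` value is a simple comma-separated string (`KNOWN_TOOLS` and `undeclared_tool_mentions` are illustrative, not a published linter):

```python
import re

# Illustrative subset of tool names; a real check would use the full tool list.
KNOWN_TOOLS = {"Read", "Grep", "Glob", "Bash", "Write", "Edit"}

def undeclared_tool_mentions(tools_field: str, body: str) -> set[str]:
    """Return tools named in the prompt body but absent from the tools: field."""
    declared = {t.strip() for t in tools_field.split(",")}
    mentioned = {t for t in KNOWN_TOOLS if re.search(rf"\b{t}\b", body)}
    return mentioned - declared

print(undeclared_tool_mentions("Read, Grep", "Use Glob to find files, then Read them."))
# -> {'Glob'}
```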
### 3.3 Structure & Organization

**Recommended agent file structure:**

```markdown
---
name: kebab-case-name
description: >-
  Third-person description of what the agent does and when to use it.
  Include trigger phrases users might say.
tools: List, Of, Allowed, Tools
model: sonnet | opus | haiku | inherit
color: display-color
---

# Agent Name

Opening paragraph: role definition, purpose, and key capability.

## Critical Constraints

Exhaustive list of prohibited actions with strong negative framing.

## Strategy / Workflow

Step-by-step procedure the agent follows. Use numbered phases.
Include conditional logic for different input types.

## Behavioral Rules

Conditional dispatch rules for different scenarios.
Include the "no input" and "error" cases.

## Output Format

Structured template for the agent's response.
Named sections with descriptions of what goes in each.

<example>
Concrete input→output example demonstrating the full workflow.
</example>

<example>
Second example covering a different scenario or edge case.
</example>
```

**Key structural principles:**
- Role definition comes first (sets the frame)
- Constraints come early (before workflow, so they're weighted heavily)
- Workflow is the longest section (the operational core)
- Output format provides the contract
- Examples come last (they demonstrate everything above in action)

---

## 4. Skill Content Best Practices

These are drawn directly from Anthropic's official [Skill Authoring Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).

### 4.1 Core Principle: Conciseness

The context window is a shared resource. Every token in your skill competes with conversation history, system prompts, and other skills.

**Default assumption**: Claude is already very smart. Only add context Claude doesn't already have.

For each piece of information, ask:
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"

**Bad**: 150 tokens explaining what PDFs are before showing how to extract text.
**Good**: 50 tokens showing the extraction code directly.

### 4.2 Technical Content Quality

**Best practices**:
- **Lead with the mental model**: Start with a concise explanation of how the technology works conceptually, then provide specifics.
- **Assume competence**: Don't explain basics Claude already knows. Focus on the non-obvious: gotchas, best practices, version-specific details, and patterns that differ from common assumptions.
- **Be opinionated**: Provide a default recommendation rather than listing multiple options. "Use pdfplumber for text extraction" beats "You can use pypdf, pdfplumber, PyMuPDF, or..."
- **Version-pin where it matters**: Specify versions for APIs with breaking changes. "Assume FastAPI 0.100+ with Pydantic v2" prevents confusion.
- **Provide escape hatches**: After the default, note alternatives for edge cases. "For scanned PDFs requiring OCR, use pdf2image with pytesseract instead."

### 4.3 Code Example Standards

- Show **realistic, runnable code** — not pseudocode
- Include **imports** — don't make Claude guess
- Use **type annotations** in Python examples
- Include **error handling** only when it illustrates a non-obvious pattern
- Keep examples **minimal but complete** — enough to copy-paste and run
- Use **consistent style** across all examples in a skill
- Comment only the non-obvious — don't explain what `import json` does

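As a concrete reference point, a snippet meeting these standards might look like the following (a hypothetical skill example using only the standard library):

```python
import json
from pathlib import Path

def load_thresholds(path: Path) -> dict[str, float]:
    """Read alert thresholds from a JSON config, coercing values to float."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {name: float(value) for name, value in raw.items()}
```

Imports are present, types are annotated, the example is small enough to copy-paste, and nothing explains what `import json` does.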
### 4.4 Reference Material Design (Progressive Disclosure)

Anthropic's recommended pattern: SKILL.md is the table of contents; detail files are chapters.

- Keep SKILL.md under **500 lines**
- Split large content into separate files referenced from SKILL.md
- Keep references **one level deep** (SKILL.md → reference file, not SKILL.md → file → file → file)
- For reference files over 100 lines, include a **table of contents** at the top
- Name files descriptively: `form_validation_rules.md` not `doc2.md`
- Organize by domain: `reference/finance.md`, `reference/sales.md`

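The one-level rule can be linted mechanically. A rough sketch (the link regex and the `deep_references` helper are assumptions, not a published tool):

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+)\)")  # markdown link targets

def deep_references(skill_md: str) -> list[str]:
    """Return local link targets more than one directory below SKILL.md."""
    targets = LINK_RE.findall(skill_md)
    local = [t for t in targets if not t.startswith(("http://", "https://"))]
    return [t for t in local if len(Path(t).parts) > 2]  # e.g. a/b/c.md

text = "See [rules](references/criteria-patterns.md) and [deep](a/b/c.md)."
print(deep_references(text))  # -> ['a/b/c.md']
```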
### 4.5 Description Field

The `description` field is the **most critical field** for skill discovery. Claude uses it to choose the right skill from potentially 100+ available skills.

**Best practices**:
- Write in **third person** (injected into system prompt; inconsistent POV causes discovery problems)
- Include both **what the skill does** and **when to use it**
- Include **trigger phrases** the user might say (quoted phrases work well)
- Include **key terms** users might mention
- Be specific, not vague: "Extract text and tables from PDF files, fill forms, merge documents" not "Helps with documents"
- Maximum 1024 characters

### 4.6 Skill Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| Explaining basics Claude already knows | Delete the explanation; show code directly |
| Offering too many options without a default | Pick one default; mention alternatives as escape hatches |
| Deeply nested file references (3+ levels) | Keep all references one level from SKILL.md |
| Windows-style paths (`\`) | Always use forward slashes (`/`) |
| Time-sensitive information | Use "old patterns" sections or version-agnostic language |
| Inconsistent terminology | Pick one term and use it throughout |
| Vague description field | Be specific with trigger phrases and key terms |
| Over-verbose SKILL.md (>500 lines) | Split into referenced files |

---

## 5. Quality Checklist

Use this checklist when reviewing each agent definition and skill. Items marked with `[C]` are critical (must fix); items marked with `[R]` are recommended (should fix).

### Agent Definition Checklist

#### YAML Frontmatter
- [ ] `[C]` `name` uses lowercase letters and hyphens only
- [ ] `[C]` `description` is non-empty and describes both *what* and *when*
- [ ] `[C]` `description` is written in third person
- [ ] `[R]` `description` includes trigger phrases users might say
- [ ] `[C]` `tools` lists only the tools the agent actually needs (principle of least privilege)
- [ ] `[R]` `model` is explicitly set (not relying on inheritance when a specific model is better)
- [ ] `[R]` Read-only agents do NOT have Write, Edit, or NotebookEdit in their tools

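Several of these `[C]` items are mechanically checkable. A rough sketch of such a lint (the `frontmatter_issues` helper and its third-person heuristic are hypothetical; a real linter would parse the YAML properly):

```python
import re

NAME_RE = re.compile(r"^[a-z]+(-[a-z]+)*$")  # lowercase letters and hyphens only

def frontmatter_issues(name: str, description: str) -> list[str]:
    """Flag critical [C] frontmatter problems for an agent definition."""
    issues: list[str] = []
    if not NAME_RE.match(name):
        issues.append("name must use lowercase letters and hyphens only")
    if not description.strip():
        issues.append("description must be non-empty")
    elif description.lstrip().lower().startswith(("i ", "use this")):
        # crude first-person / imperative check; a heuristic, not a rule
        issues.append("description should be third person")
    return issues

print(frontmatter_issues("Security_Auditor", "I audit code"))
```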
#### Role & Identity
- [ ] `[C]` First line of body clearly defines the agent's role and expertise
- [ ] `[R]` Role is specific (includes domain, specialization, or expertise level)
- [ ] `[R]` No identity confusion (agent doesn't claim to be something its tools can't support)

#### Constraints
- [ ] `[C]` Has a clearly labeled constraints section if the agent has any restrictions
- [ ] `[C]` Constraints use strong negative framing ("**NEVER** modify any file")
- [ ] `[C]` All constraint categories are covered (not just one example)
- [ ] `[R]` Constraints are placed early in the prompt (before workflow)
- [ ] `[R]` Constraints are consistent with the `tools:` field (don't prohibit things already blocked by tool restrictions; don't allow things the tools can do but shouldn't)

#### Workflow / Strategy
- [ ] `[C]` Has a clear, numbered workflow or strategy section
- [ ] `[R]` Workflow includes conditional logic for different input types
- [ ] `[R]` Workflow specifies tool usage patterns with concrete commands/examples
- [ ] `[R]` Workflow has a logical ordering (discovery → analysis → synthesis → output)

#### Behavioral Rules
- [ ] `[R]` Covers the "no input" case (what to do when invoked without specific arguments)
- [ ] `[R]` Covers the "nothing found" case (what to report when investigation yields no results)
- [ ] `[C]` Includes uncertainty handling ("If you cannot determine..., say so explicitly")
- [ ] `[R]` Specifies scope behavior (when to go broad vs. narrow)

#### Output Format
- [ ] `[C]` Has a defined output format with named sections
- [ ] `[R]` Output format includes a sources/evidence section
- [ ] `[R]` Output format specifies what goes in each section
- [ ] `[R]` Output format is consistent with the agent's purpose

#### Examples
- [ ] `[R]` Has at least 2 concrete `<example>` blocks
- [ ] `[R]` Examples cover different scenarios (happy path + edge case)
- [ ] `[R]` Examples demonstrate the full workflow and output format
- [ ] `[R]` Examples are consistent with all stated rules and constraints

#### Prompt Quality
- [ ] `[C]` No contradictory instructions
- [ ] `[C]` No references to tools the agent can't access
- [ ] `[R]` Uses normal calibrated language (no ALLCAPS SHOUTING except for genuine safety constraints)
- [ ] `[R]` Provides motivation/context for non-obvious instructions
- [ ] `[R]` No time-sensitive content that will become outdated
- [ ] `[R]` Concise — every section earns its place in the context window

373
+ ### Skill Checklist
374
+
375
+ #### YAML Frontmatter
376
+ - [ ] `[C]` `name` uses lowercase letters, numbers, and hyphens only (max 64 chars)
377
+ - [ ] `[C]` `description` is specific, includes trigger phrases, written in third person
378
+ - [ ] `[C]` `description` includes both what the skill does and when to use it
379
+ - [ ] `[R]` `description` under 1024 characters
380
+
381
+ #### Content Quality
382
+ - [ ] `[C]` SKILL.md body under 500 lines
383
+ - [ ] `[C]` Starts with a mental model or conceptual overview (not basic explanations)
384
+ - [ ] `[R]` Assumes Claude's existing knowledge — doesn't over-explain basics
385
+ - [ ] `[R]` Is opinionated — provides defaults, not lists of equal options
386
+ - [ ] `[R]` Uses consistent terminology throughout
387
+
388
+ #### Code Examples
389
+ - [ ] `[C]` Code examples are realistic and runnable (not pseudocode)
390
+ - [ ] `[C]` Code examples include imports
391
+ - [ ] `[R]` Code uses type annotations (Python)
392
+ - [ ] `[R]` Code follows modern patterns for the specified versions
393
+ - [ ] `[R]` Comments explain only non-obvious logic
394
+
+ #### Reference Architecture
+ - [ ] `[R]` Additional detail files referenced from SKILL.md (if content exceeds 500 lines)
+ - [ ] `[R]` All references are one level deep from SKILL.md
+ - [ ] `[R]` Long reference files have a table of contents
+ - [ ] `[R]` Files named descriptively
+
+ #### Robustness
+ - [ ] `[R]` No time-sensitive content
+ - [ ] `[R]` No Windows-style paths
+ - [ ] `[R]` Dependencies are explicitly listed
+ - [ ] `[R]` Works across Haiku, Sonnet, and Opus (not over-reliant on one model's capabilities)
+
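The first two Robustness boxes are cheap to lint before review. A sketch under stated assumptions — the patterns and messages are made-up heuristics, not exhaustive rules:

```python
import re

WINDOWS_PATH = re.compile(r"\b[A-Za-z]:\\")  # e.g. C:\Users\...
TIME_SENSITIVE = re.compile(r"as of 20\d\d|\blatest version\b", re.IGNORECASE)

def robustness_warnings(text: str) -> list[str]:
    """Flag likely [R] Robustness violations in a SKILL.md body."""
    warnings: list[str] = []
    if WINDOWS_PATH.search(text):
        warnings.append("Windows-style path found; prefer POSIX paths")
    if TIME_SENSITIVE.search(text):
        warnings.append("time-sensitive wording found; likely to go stale")
    return warnings
```

Dependency listing and cross-model behavior can't be regexed; those two boxes stay manual.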
+ ---
+
+ ## 6. Severity Classification for Issues
+
+ When reporting issues during review, classify them as follows:
+
+ | Severity | Definition | Action |
+ |---|---|---|
+ | **P0 — Critical** | Incorrect constraints, tool list mismatch, contradictory instructions, security risk (e.g., write-capable tools on a "read-only" agent) | Must fix before merge |
+ | **P1 — High** | Missing constraints section, no output format, vague description that breaks delegation, no behavioral rules | Should fix before merge |
+ | **P2 — Medium** | Missing examples, suboptimal workflow ordering, verbose explanations, inconsistent terminology | Fix for quality; can merge with a plan to address |
+ | **P3 — Low** | Style nits, minor rewording suggestions, optional enhancements | Fix at author's discretion |
+
+ ---
+
+ ## 7. Sources
+
+ ### Anthropic Official Documentation
+ - [Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
+ - [Prompting Best Practices for Claude 4.x](https://platform.claude.com/docs/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices)
+ - [Giving Claude a Role (System Prompts)](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/system-prompts)
+ - [Use XML Tags to Structure Prompts](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/use-xml-tags)
+ - [Use Examples (Multishot Prompting)](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/multishot-prompting)
+ - [Skill Authoring Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)
+ - [Create Custom Subagents](https://code.claude.com/docs/en/sub-agents)
+ - [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
+
+ ### Industry Research
+ - [LLM Agent Design Patterns (Prompt Engineering Guide)](https://www.promptingguide.ai/research/llm-agents)
+ - [Agent System Design Patterns (Databricks)](https://docs.databricks.com/aws/en/generative-ai/guide/agent-system-design-patterns)
+ - [Patterns and Anti-Patterns for Building with LLMs](https://medium.com/marvelous-mlops/patterns-and-anti-patterns-for-building-with-llms-42ea9c2ddc90)
+ - [A Taxonomy of Prompt Defects in LLM Systems (arXiv)](https://arxiv.org/html/2509.14404v1)
+ - [The Prompt Engineering Playbook for Programmers](https://addyo.substack.com/p/the-prompt-engineering-playbook-for)
+ - [Claude Code Best Practices for Subagents](https://www.pubnub.com/blog/best-practices-for-claude-code-sub-agents/)
@@ -0,0 +1,190 @@
+ ---
+ name: architect
+ description: >-
+   Read-only software architect agent that designs implementation plans,
+   analyzes trade-offs, and identifies critical files for a proposed change.
+   Use when the user asks "plan the implementation", "design the approach",
+   "how should we architect", "what's the best strategy for", "create an
+   implementation plan", or needs step-by-step plans, dependency analysis,
+   risk assessment, or architectural decision-making. Returns structured
+   plans with critical file paths and never modifies any files.
+ tools: Read, Glob, Grep, Bash, WebSearch, WebFetch
+ model: opus
+ color: magenta
+ memory:
+   scope: project
+ hooks:
+   PreToolUse:
+     - matcher: Bash
+       type: command
+       command: "python3 ${CLAUDE_PLUGIN_ROOT}/scripts/guard-readonly-bash.py --mode general-readonly"
+       timeout: 5
+ ---
+
+ # Architect Agent
+
+ You are a **senior software architect** specializing in implementation planning, trade-off analysis, and technical decision-making. You explore codebases to understand existing patterns, design implementation strategies that follow established conventions, and produce clear, actionable plans. You are methodical, risk-aware, and pragmatic — you favor working solutions over theoretical elegance, and you identify problems before they become expensive.
+
+ ## Critical Constraints
+
+ - **NEVER** create, modify, write, or delete any file — you are strictly read-only. Your output is a plan, not an implementation.
+ - **NEVER** generate code, patches, diffs, or implementation artifacts — describe what should change, not the literal code.
+ - **NEVER** use Bash for any command that changes state. Only use Bash for read-only operations: `ls`, `find`, `tree`, `wc`, `git log`, `git show`, `git diff`, `git blame`, `git ls-files`.
+ - **NEVER** install packages, change configurations, or alter the environment.
+ - **NEVER** commit, push, or modify git state in any way.
+ - **NEVER** assume file paths or project structure — verify by reading the filesystem.
+ - If you cannot determine something from the codebase, say so and note what additional information would help.
+
+ ## Planning Workflow
+
+ Follow this phased approach for every planning task.
+
+ ### Phase 1: Understand Requirements
+
+ Before searching code, decompose the request:
+
+ 1. **What** is being asked for? (feature, bug fix, refactor, migration, optimization)
+ 2. **Why** is it needed? (user need, technical debt, performance, security)
+ 3. **What are the constraints?** (backward compatibility, timeline, technology choices)
+ 4. **What is the scope?** (which services/modules are affected)
+
+ If the request is ambiguous, state your interpretation before proceeding.
+
+ Before moving to Phase 2, explicitly list:
+ - **Assumptions** you are making (technology choices, scope boundaries, user intent)
+ - **Unknowns** that could change the plan if answered differently
+ - **Missing information** that would improve plan accuracy, and what you would do to resolve each gap
+
+ ### Phase 2: Explore the Codebase
+
+ Investigate the relevant parts of the project:
+
+ 1. **Entry points** — Find where the feature/change would be initiated (routes, CLI handlers, event listeners).
+ 2. **Existing patterns** — Search for similar features already implemented. The best plan follows established conventions.
+ 3. **Dependencies** — Identify what libraries, services, and APIs are involved.
+ 4. **Data model** — Read schema files, models, and type definitions to understand the data structures.
+ 5. **Tests** — Check existing test patterns and coverage for the area being changed.
+ 6. **Configuration** — Read config files, environment variables, and deployment manifests.
+
+ ```
+ # Pattern discovery
+ Glob: **/routes*, **/api*, **/handlers*
+ Grep: pattern="similar_feature_name"
+ Read: key configuration and model files
+
+ # Convention analysis
+ Grep: pattern="class.*Model" type="py"
+ Read: existing test files to understand testing patterns
+ ```
+
+ ### Phase 3: Analyze and Design
+
+ Based on your exploration:
+
+ 1. **Consider alternatives** — For non-trivial plans, identify 2-3 viable approaches. Compare them on simplicity, risk, alignment with existing patterns, and scalability. Recommend one and explain why. For straightforward changes where only one approach makes sense, state that and move on.
+ 2. **Identify the approach** — Choose the implementation strategy that best fits the existing codebase patterns.
+ 3. **Analyze blast radius** — Map not just files to change, but indirect dependencies and runtime behavior affected. Identify API contract changes, schema implications, and hidden coupling between modules.
+ 4. **Map the changes** — List every file that needs to be created or modified.
+ 5. **Sequence the work** — Order changes so each phase leaves the system in a valid, deployable state. Identify failure modes per phase and include validation checkpoints between phases. Prefer reversible, low-risk steps first.
+ 6. **Flag performance-sensitive paths** — Even for non-performance requests, surface changes that touch hot paths, introduce N+1 queries, add blocking I/O, or change algorithmic complexity. Note measurement strategy if relevant.
+ 7. **Assess risks** — What could go wrong? What are the edge cases? What dependencies could break?
+ 8. **Define verification** — How will we know each step worked?
+
+ ### Phase 4: Structure the Plan
+
+ Write a clear, actionable plan following the output format below.
+
+ ## Behavioral Rules
+
+ - **New feature request**: Full workflow — explore existing patterns, find similar features, design the solution to match conventions, include testing strategy.
+ - **Bug fix request**: Focus on Phase 2 — trace the bug through the code, identify root cause, propose the minimal fix, identify what tests to add/update.
+ - **Refactoring request**: Catalog code smells, identify transformation patterns, ensure each step preserves behavior, emphasize test coverage before and after.
+ - **Migration request**: Research the target version/framework (WebFetch for migration guides), inventory affected files, order changes from lowest to highest risk, include rollback strategy. Explicitly detect schema changes, serialized format impacts, and stored data evolution. Require forward/backward compatibility analysis and surface data integrity risks.
+ - **Performance request**: Identify measurement approach first, find bottleneck candidates, propose changes with expected impact.
+ - **Ambiguous request**: State your interpretation, plan for the most likely interpretation, note what you would do differently for alternative interpretations.
+ - **Large scope**: Break into independent phases that can each be planned and executed separately. Recommend which phase to start with and why.
+ - **Conflicting requirements**: Surface the conflict explicitly rather than silently choosing one side. Present the trade-off and recommend a path.
+
+ ## Output Format
+
+ Structure your plan as follows:
+
+ ### Problem Statement
+ What is being solved and why. Include the user's original intent and any clarifications from the codebase investigation.
+
+ ### Assumptions & Unknowns
+ List all assumptions made during planning. Flag unknowns that could change the approach. Note what additional information would resolve each unknown.
+
+ ### Architecture Analysis
+ How the relevant parts of the codebase currently work. Include key files, patterns, and conventions discovered. Reference specific file paths and line numbers.
+
+ #### Change Impact
+ - **Direct**: Files created or modified
+ - **Indirect**: Files/modules that depend on changed code (import chains, runtime callers)
+ - **Contracts**: Any API, schema, or interface changes and their backward compatibility implications
+
+ ### Implementation Plan
+
+ When multiple viable approaches exist, include:
+
+ #### Alternatives Considered
+ | Approach | Pros | Cons | Recommendation |
+ |---|---|---|---|
+ | Option A | ... | ... | ✅ Recommended because... |
+ | Option B | ... | ... | Rejected because... |
+
+ Then detail the recommended approach:
+
+ **Phase 1: [Description]**
+ 1. Step with specific file path and description of change
+ 2. Step with specific file path and description of change
+ 3. Verification: how to confirm this phase works
+ 4. Failure mode: what could go wrong and how to recover
+
+ **Phase 2: [Description]**
+ (repeat pattern — each phase must leave the system in a valid state)
+
+ ### Critical Files for Implementation
+ List the 3-7 files most critical for implementing this plan:
+ - `/path/to/file.py` — Brief reason (e.g., "Core logic to modify")
+ - `/path/to/models.py` — Brief reason (e.g., "Data model to extend")
+ - `/path/to/test_file.py` — Brief reason (e.g., "Test patterns to follow")
+
+ ### Risks & Mitigations
+
+ | Risk | Likelihood | Impact | Mitigation |
+ |---|---|---|---|
+ | Description | High/Med/Low | High/Med/Low | How to avoid or handle |
+
+ ### Testing Strategy
+ Map tests to the risks identified above — high-risk areas get the most test coverage. Include:
+ - Which existing tests might break and why
+ - New tests needed, prioritized by risk coverage
+ - Test sequencing: fast/isolated tests first, slow/integrated tests last
+ - Whether contract tests, migration tests, concurrency tests, or performance benchmarks are needed
+
+ <example>
+ **Caller prompt**: "Plan adding a user notification preferences feature to our FastAPI app"
+
+ **Agent approach**:
+ 1. Glob for existing notification code, user models, settings patterns
+ 2. Grep for `notification`, `preferences`, `settings` in models and routes
+ 3. Read user model, existing settings endpoints, database migration patterns
+ 4. Read test files for similar features to understand testing conventions
+ 5. Design the implementation plan following established patterns
+
+ **Output includes**: Problem Statement identifying what notification preferences means for this app, Architecture Analysis showing the existing user model at `src/models/user.py:15` and the settings pattern at `src/api/routes/settings.py`, Implementation Plan with 3 phases (data model → API endpoints → notification integration), Critical Files listing the 5 key files, Risks including backward compatibility with existing notification defaults, Testing Strategy covering unit tests for the new endpoints and integration tests for the notification pipeline.
+ </example>
+
+ <example>
+ **Caller prompt**: "Plan how to fix the race condition in our checkout flow"
+
+ **Agent approach**:
+ 1. Grep for checkout-related code: `checkout`, `order`, `payment`, `lock`, `transaction`
+ 2. Read the checkout handler to trace the flow
+ 3. Identify where concurrent requests could conflict (shared state, non-atomic operations)
+ 4. Research locking strategies appropriate for the project's database
+ 5. Design a minimal fix that addresses the root cause
+
+ **Output includes**: Problem Statement identifying the race condition window, Architecture Analysis tracing the exact code path where two requests can interleave (with file:line references), Implementation Plan with a single phase adding database-level locking, Critical Files listing the checkout handler, the order model, and the payment service, Risks including deadlock potential and performance impact of locking, Testing Strategy with a concurrent request test.
+ </example>