codeforge-dev 1.5.7 → 1.7.0

Files changed (80)
  1. package/.devcontainer/.env +2 -1
  2. package/.devcontainer/CHANGELOG.md +55 -9
  3. package/.devcontainer/CLAUDE.md +65 -15
  4. package/.devcontainer/README.md +67 -6
  5. package/.devcontainer/config/keybindings.json +5 -0
  6. package/.devcontainer/config/main-system-prompt.md +63 -2
  7. package/.devcontainer/config/settings.json +25 -6
  8. package/.devcontainer/devcontainer.json +23 -7
  9. package/.devcontainer/features/README.md +21 -7
  10. package/.devcontainer/features/ccburn/README.md +60 -0
  11. package/.devcontainer/features/ccburn/devcontainer-feature.json +38 -0
  12. package/.devcontainer/features/ccburn/install.sh +174 -0
  13. package/.devcontainer/features/ccstatusline/README.md +22 -21
  14. package/.devcontainer/features/ccstatusline/devcontainer-feature.json +1 -1
  15. package/.devcontainer/features/ccstatusline/install.sh +48 -16
  16. package/.devcontainer/features/claude-code/config/settings.json +60 -24
  17. package/.devcontainer/features/mcp-qdrant/devcontainer-feature.json +1 -1
  18. package/.devcontainer/features/mcp-reasoner/devcontainer-feature.json +1 -1
  19. package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/__pycache__/format-on-stop.cpython-314.pyc +0 -0
  20. package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/format-on-stop.py +21 -6
  21. package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/__pycache__/lint-file.cpython-314.pyc +0 -0
  22. package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/lint-file.py +7 -10
  23. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/REVIEW-RUBRIC.md +440 -0
  24. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/architect.md +190 -0
  25. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/bash-exec.md +173 -0
  26. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/claude-guide.md +155 -0
  27. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/dependency-analyst.md +248 -0
  28. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/doc-writer.md +233 -0
  29. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/explorer.md +235 -0
  30. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/generalist.md +125 -0
  31. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/git-archaeologist.md +242 -0
  32. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/migrator.md +195 -0
  33. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/perf-profiler.md +265 -0
  34. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/refactorer.md +209 -0
  35. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/researcher.md +195 -0
  36. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/security-auditor.md +289 -0
  37. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/spec-writer.md +284 -0
  38. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/statusline-config.md +188 -0
  39. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/test-writer.md +245 -0
  40. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/hooks/hooks.json +12 -0
  41. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/guard-readonly-bash.cpython-314.pyc +0 -0
  42. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/redirect-builtin-agents.cpython-314.pyc +0 -0
  43. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/skill-suggester.cpython-314.pyc +0 -0
  44. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/syntax-validator.cpython-314.pyc +0 -0
  45. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-no-regression.cpython-314.pyc +0 -0
  46. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-tests-pass.cpython-314.pyc +0 -0
  47. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/guard-readonly-bash.py +611 -0
  48. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/redirect-builtin-agents.py +83 -0
  49. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/skill-suggester.py +85 -2
  50. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/syntax-validator.py +9 -4
  51. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-no-regression.py +221 -0
  52. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-tests-pass.py +176 -0
  53. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/SKILL.md +599 -0
  54. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/references/sdk-typescript-reference.md +954 -0
  55. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/SKILL.md +276 -0
  56. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/advanced-commands.md +332 -0
  57. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/investigation-playbooks.md +319 -0
  58. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/SKILL.md +341 -0
  59. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/interpreting-results.md +235 -0
  60. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/tool-commands.md +395 -0
  61. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/SKILL.md +344 -0
  62. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/safe-transformations.md +247 -0
  63. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/smell-catalog.md +332 -0
  64. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/SKILL.md +277 -0
  65. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/owasp-patterns.md +269 -0
  66. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/secrets-patterns.md +253 -0
  67. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/SKILL.md +288 -0
  68. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/criteria-patterns.md +245 -0
  69. package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/ears-templates.md +239 -0
  70. package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/__pycache__/guard-protected.cpython-314.pyc +0 -0
  71. package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/guard-protected.py +40 -39
  72. package/.devcontainer/scripts/setup-aliases.sh +10 -20
  73. package/.devcontainer/scripts/setup-config.sh +2 -0
  74. package/.devcontainer/scripts/setup-plugins.sh +38 -46
  75. package/.devcontainer/scripts/setup-projects.sh +175 -0
  76. package/.devcontainer/scripts/setup-symlink-claude.sh +36 -0
  77. package/.devcontainer/scripts/setup-update-claude.sh +11 -8
  78. package/.devcontainer/scripts/setup.sh +4 -2
  79. package/package.json +1 -1
  80. package/.devcontainer/scripts/setup-irie-claude.sh +0 -32
@@ -28,11 +28,7 @@ def lint_python(file_path: str) -> tuple[bool, str]:

      # Check if pyright is available
      try:
-         subprocess.run(
-             ["which", pyright_cmd],
-             capture_output=True,
-             check=True
-         )
+         subprocess.run(["which", pyright_cmd], capture_output=True, check=True)
      except subprocess.CalledProcessError:
          return True, ""  # Pyright not available

@@ -41,7 +37,7 @@ def lint_python(file_path: str) -> tuple[bool, str]:
              [pyright_cmd, "--outputjson", file_path],
              capture_output=True,
              text=True,
-             timeout=55
+             timeout=10,
          )

          # Parse pyright JSON output
@@ -79,7 +75,10 @@ def lint_python(file_path: str) -> tuple[bool, str]:
          except json.JSONDecodeError:
              # Pyright output not JSON, might be an error
              if result.stderr:
-                 return True, f"[Auto-linter] Pyright error: {result.stderr.strip()[:100]}"
+                 return (
+                     True,
+                     f"[Auto-linter] Pyright error: {result.stderr.strip()[:100]}",
+                 )
              return True, ""

      except subprocess.TimeoutExpired:
@@ -120,9 +119,7 @@ def main():

      if message:
          # Output context for Claude
-         print(json.dumps({
-             "additionalContext": message
-         }))
+         print(json.dumps({"additionalContext": message}))

      sys.exit(0)

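The hunks above keep the hook's overall shape: confirm the tool exists, run it under a timeout (now 10 seconds rather than 55), and treat output that is not JSON as a soft failure instead of raising. A minimal standalone sketch of that pattern follows. It is not the plugin's code: `run_json_tool` and the `[sketch]` message prefix are hypothetical, and it uses `shutil.which` where the plugin spawns `which` in a subprocess.

```python
import json
import shutil
import subprocess
import sys


def run_json_tool(cmd: list[str], timeout: float = 10.0) -> tuple[bool, str]:
    """Run a CLI expected to print JSON; always return (ok, message), never raise."""
    # Hypothetical helper: a missing tool is skipped quietly, like the hook does.
    if shutil.which(cmd[0]) is None:
        return True, ""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return True, f"[sketch] {cmd[0]} timed out after {timeout:g}s"
    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError:
        # Output was not JSON; surface a truncated stderr if there is one.
        if result.stderr:
            return True, f"[sketch] error: {result.stderr.strip()[:100]}"
        return True, ""
    return True, json.dumps(payload)


if __name__ == "__main__":
    ok, message = run_json_tool([sys.executable, "-c", "print('{\"n\": 1}')"])
    if message:
        # Same envelope the hook prints for Claude.
        print(json.dumps({"additionalContext": message}))
```

Returning a tuple in every branch means the caller never needs its own `try`/`except`, which matters for a hook that must exit cleanly.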
@@ -0,0 +1,440 @@
+ # Agent & Skill Quality Rubric
+
+ > Compiled from Anthropic's official documentation, Claude Code subagent docs, skill authoring best practices, and industry research on LLM agent design patterns. This rubric drives the quality review of all agents and skills in the `code-directive` plugin.
+
+ ---
+
+ ## 1. Key Principles from Anthropic
+
+ These principles come directly from Anthropic's official prompt engineering documentation for Claude 4.x models (Opus 4.6, Sonnet 4.5, Haiku 4.5).
+
+ ### 1.1 Be Explicit and Specific
+
+ Claude 4.x models are trained for **precise instruction following**. They do what you ask — nothing more, nothing less. Vague prompts produce vague results. If you want thorough, above-and-beyond behavior, you must explicitly request it.
+
+ - **Bad**: "Review this code"
+ - **Good**: "Review this code for security vulnerabilities, performance issues, and readability. For each issue, explain the problem, show the current code, and provide a corrected version."
+
+ **Implication for agents**: Every agent prompt must clearly define what the agent should do, how it should do it, and what its output should look like. Do not rely on Claude inferring intent from vague instructions.
+
+ ### 1.2 Provide Context and Motivation (Explain WHY)
+
+ Providing the *reason* behind instructions helps Claude generalize correctly. Instead of bare rules, explain the motivation.
+
+ - **Bad**: "NEVER use ellipses"
+ - **Good**: "Never use ellipses because your output will be read by a text-to-speech engine that cannot pronounce them."
+
+ **Implication for agents**: When an agent has constraints (e.g., "read-only"), briefly explain why. When an agent follows a particular workflow, explain the rationale so it can adapt intelligently to edge cases.
+
+ ### 1.3 Be Vigilant with Examples and Details
+
+ Claude pays close attention to examples. Poorly constructed examples teach bad patterns. Examples should:
+ - Align precisely with desired behavior
+ - Cover edge cases and diverse scenarios
+ - Be wrapped in `<example>` tags for clarity
+ - Include 3-5 examples for complex tasks; 1 example for simple ones
+
+ ### 1.4 Use XML Tags for Structure
+
+ Claude was trained on XML-tagged prompts. Tags like `<instructions>`, `<example>`, `<constraints>` prevent Claude from confusing instructions with context or examples with rules.
+
+ - Be **consistent** with tag names throughout the prompt
+ - **Nest** tags for hierarchical content: `<outer><inner></inner></outer>`
+ - **Refer** to tagged content by tag name: "Using the data in `<context>` tags..."
+ - There are no canonical "best" tag names — use names that make sense for the content they surround
+
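The tagging advice in section 1.4 can be sketched in a few lines of Python. This is an illustration only: `tag` and `build_prompt` are hypothetical helpers, not part of the package, and the tag names are just the ones the rubric mentions.

```python
def tag(name: str, body: str) -> str:
    """Wrap body in a matching <name>...</name> pair."""
    return f"<{name}>\n{body}\n</{name}>"


def build_prompt(instructions: str, context: str, examples: list[str]) -> str:
    """Assemble a prompt with one consistent tag per kind of content."""
    # Each example gets its own <example>, nested inside a single <examples>.
    example_block = tag("examples", "\n".join(tag("example", e) for e in examples))
    return "\n\n".join(
        [
            tag("instructions", instructions),
            tag("context", context),
            example_block,
        ]
    )


if __name__ == "__main__":
    print(
        build_prompt(
            "Summarize the report in <context> in three bullets.",
            "Q3 revenue rose while churn stayed flat.",
            ["input: long report\noutput: three short bullets"],
        )
    )
```

Generating the tags from one helper keeps names consistent across the prompt, so the instructions can safely refer to content by tag name.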
+ ### 1.5 Allow Uncertainty
+
+ Give Claude explicit permission to say "I don't know" rather than guessing. This reduces hallucinations, especially in research and diagnostic agents.
+
+ ### 1.6 Tell Claude What TO Do, Not What NOT to Do
+
+ Positive framing is more effective than negative framing for behavioral steering:
+ - **Bad**: "Do not use markdown in your response"
+ - **Good**: "Write your response in smoothly flowing prose paragraphs."
+
+ **Exception**: Safety constraints (e.g., "NEVER modify files") should still use strong negative framing because the cost of violation is high.
+
+ ### 1.7 Claude 4.x Is More Responsive to System Prompts
+
+ Claude Opus 4.5 and 4.6 are more responsive to system prompts than previous models. Aggressive language designed to prevent undertriggering in older models (e.g., "CRITICAL: You MUST...") may now cause **overtriggering**. Use calibrated, normal language unless the constraint is genuinely critical.
+
+ ---
+
+ ## 2. System Prompt Best Practices
+
+ ### 2.1 Identity & Role
+
+ Role prompting is the single most powerful use of system prompts. The right role turns Claude from a generalist into a domain expert.
+
+ **Best practices**:
+ - Define the role in the **first line** of the prompt body. This sets the frame for everything that follows.
+ - Be **specific**: "You are a senior Python developer specializing in FastAPI and async patterns" beats "You are a coding assistant."
+ - Include **expertise level**: "senior", "expert", "specialist" signals the depth expected.
+ - Optionally include **personality traits** relevant to the task: "methodical", "thorough", "concise".
+ - The `description` field in YAML frontmatter is for Claude's **task routing** — it tells the parent agent *when* to delegate. The markdown body is the agent's **system prompt** — it tells the agent *how* to behave.
+
+ **Agent-specific guidance**:
+ - The `name` field must use lowercase letters and hyphens only
+ - The `description` field should clearly state: (a) what the agent does, and (b) when it should be used
+ - Write descriptions in **third person**: "Analyzes code for security vulnerabilities" not "I analyze code" or "Use this to analyze code"
+ - Include **trigger phrases** the user might say that should invoke this agent
+
+ ### 2.2 Constraints & Boundaries
+
+ Constraints define what the agent **must not** do. They are safety rails.
+
+ **Best practices**:
+ - Group all hard constraints in a clearly labeled section (`## Critical Constraints` or similar)
+ - Use strong negative framing for safety-critical constraints: "**NEVER** modify any file"
+ - Be exhaustive — list every prohibited action category, not just one example
+ - Explain *why* the constraint exists when not obvious
+ - Keep constraints at the top of the prompt, before workflow instructions
+
+ **Common constraint categories for agents**:
+ - File system modifications (read-only agents)
+ - Service/process management (diagnostic agents)
+ - Package installation (sandboxed agents)
+ - Git state changes (research agents)
+ - Network requests (isolated agents)
+
+ ### 2.3 Behavioral Rules
+
+ Behavioral rules define how the agent **should act** in different scenarios. They are the decision-making logic.
+
+ **Best practices**:
+ - Use **conditional dispatch**: "If X, do Y. If Z, do W." This helps Claude handle varied inputs.
+ - Cover the **common scenarios** the agent will encounter, including the "no input" case.
+ - Include **negative result reporting**: "Always report what was checked, even if nothing was found."
+ - Include **uncertainty handling**: "If you cannot determine the answer, say so and explain what additional information would help."
+ - Be specific about **scope escalation**: When should the agent go broad vs. narrow?
+
+ ### 2.4 Examples & Few-Shot
+
+ Examples are the most effective way to communicate expected behavior.
+
+ **Best practices**:
+ - Wrap examples in `<example>` tags (multiple examples in `<examples>` parent tag)
+ - Include **input → output** pairs that show the complete workflow
+ - Provide **3-5 diverse examples** for complex agents, covering:
+   - The happy path (typical input)
+   - Edge cases (unusual input)
+   - Error cases (bad input or no results)
+ - Ensure examples are **consistent** with all stated rules and constraints
+ - Examples should demonstrate the **output format** in action, not just describe it
+ - Place examples **after** the rules they illustrate, not before
+
+ ### 2.5 Output Format Specification
+
+ A structured output format ensures the agent's results are predictable and parseable.
+
+ **Best practices**:
+ - Define a clear output template with named sections
+ - Use markdown headers (`###`) for top-level sections
+ - Use consistent formatting within sections (bullet lists, tables, etc.)
+ - Include a "Sources" or "Evidence" section that traces claims to specific files, URLs, or line numbers
+ - Specify what goes in each section so there's no ambiguity
+ - Match the output format to the consumer — if a human reads it, optimize for readability; if another tool parses it, optimize for structure
+
+ ### 2.6 Tool Usage Guidance
+
+ Agents need explicit guidance on *how* to use their available tools effectively.
+
+ **Best practices**:
+ - Show concrete tool usage patterns with realistic commands/queries
+ - Specify tool selection logic: "Use Glob to discover files, then Grep to search content, then Read to examine specific files"
+ - Include command templates with placeholder values
+ - Warn about tool-specific pitfalls (e.g., "For large logs, always filter with Grep before reading. Never dump entire large files.")
+ - If the agent has Bash access, provide allowed command patterns and explicitly prohibit dangerous ones
+ - If tools have been restricted via `tools:` or `disallowedTools:`, the prompt should align with what's available — don't reference tools the agent can't use
+
+ ---
+
+ ## 3. Agent Definition Patterns
+
+ ### 3.1 What Makes an Effective Agent
+
+ Based on Claude Code's subagent architecture and Anthropic's guidance:
+
+ 1. **Single Responsibility**: Each agent should excel at one specific task domain. Don't create Swiss Army knife agents.
+ 2. **Clear Delegation Signal**: The `description` must be specific enough that the parent agent knows *exactly* when to delegate. Include trigger phrases.
+ 3. **Minimal Tool Surface**: Grant only the tools the agent needs. Read-only agents should not have Write/Edit. Diagnostic agents should not have file creation.
+ 4. **Structured Workflow**: The prompt should define a clear, repeatable workflow — not just "do the thing." Steps should be numbered and conditional.
+ 5. **Defined Output Contract**: The agent should always produce output in a predictable format, regardless of what it finds.
+ 6. **Graceful Failure**: The agent should handle cases where it can't find what it's looking for, can't complete the task, or encounters errors. It should report these clearly rather than hallucinating.
+ 7. **Context Efficiency**: Agents run in their own context window. Design prompts to be thorough but not wasteful. Every line should earn its place.
+
+ ### 3.2 Common Anti-Patterns to Avoid
+
+ | Anti-Pattern | Why It's Bad | Fix |
+ |---|---|---|
+ | **Vague description** ("Helps with code") | Parent agent can't decide when to delegate | Be specific: "Analyzes Python code for security vulnerabilities including OWASP Top 10, injection flaws, and authentication weaknesses" |
+ | **Missing constraints section** | Agent may modify files, install packages, or cause side effects | Add explicit `## Critical Constraints` section listing prohibited actions |
+ | **Overloaded prompt** (too many tasks) | Agent loses focus, produces inconsistent results | Split into multiple focused agents |
+ | **No output format** | Results vary wildly between invocations | Define a structured output template |
+ | **ALLCAPS SHOUTING throughout** | Claude 4.x overtriggers on aggressive language; creates noise | Reserve strong emphasis for genuinely critical safety constraints; use normal language elsewhere |
+ | **No examples** | Agent guesses at expected behavior | Add 2-3 concrete input→output examples |
+ | **Contradictory instructions** | Agent behavior becomes unpredictable | Review for internal consistency; have Claude check |
+ | **Tool references that don't match `tools:` field** | Agent tries to use unavailable tools | Audit prompt against YAML `tools:` list |
+ | **Assuming Claude knows project-specific things** | Hallucinated project details | Provide concrete context or instruct the agent to discover it |
+ | **No negative-result handling** | Agent hallucinates results when it finds nothing | Add explicit "report what you checked even if nothing was found" |
+ | **Time-sensitive content** | Becomes wrong as tools/APIs evolve | Use version-agnostic language or "old patterns" sections |
+
+ ### 3.3 Structure & Organization
+
+ **Recommended agent file structure:**
+
+ ```markdown
+ ---
+ name: kebab-case-name
+ description: >-
+   Third-person description of what the agent does and when to use it.
+   Include trigger phrases users might say.
+ tools: List, Of, Allowed, Tools
+ model: sonnet | opus | haiku | inherit
+ color: display-color
+ ---
+
+ # Agent Name
+
+ Opening paragraph: role definition, purpose, and key capability.
+
+ ## Critical Constraints
+
+ Exhaustive list of prohibited actions with strong negative framing.
+
+ ## Strategy / Workflow
+
+ Step-by-step procedure the agent follows. Use numbered phases.
+ Include conditional logic for different input types.
+
+ ## Behavioral Rules
+
+ Conditional dispatch rules for different scenarios.
+ Include the "no input" and "error" cases.
+
+ ## Output Format
+
+ Structured template for the agent's response.
+ Named sections with descriptions of what goes in each.
+
+ <example>
+ Concrete input→output example demonstrating the full workflow.
+ </example>
+
+ <example>
+ Second example covering a different scenario or edge case.
+ </example>
+ ```
+
+ **Key structural principles:**
+ - Role definition comes first (sets the frame)
+ - Constraints come early (before workflow, so they're weighted heavily)
+ - Workflow is the longest section (the operational core)
+ - Output format provides the contract
+ - Examples come last (they demonstrate everything above in action)
+
+ ---
+
+ ## 4. Skill Content Best Practices
+
+ These are drawn directly from Anthropic's official [Skill Authoring Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).
+
+ ### 4.1 Core Principle: Conciseness
+
+ The context window is a shared resource. Every token in your skill competes with conversation history, system prompts, and other skills.
+
+ **Default assumption**: Claude is already very smart. Only add context Claude doesn't already have.
+
+ For each piece of information, ask:
+ - "Does Claude really need this explanation?"
+ - "Can I assume Claude knows this?"
+ - "Does this paragraph justify its token cost?"
+
+ **Bad**: 150 tokens explaining what PDFs are before showing how to extract text.
+ **Good**: 50 tokens showing the extraction code directly.
+
+ ### 4.2 Technical Content Quality
+
+ **Best practices**:
+ - **Lead with the mental model**: Start with a concise explanation of how the technology works conceptually, then provide specifics.
+ - **Assume competence**: Don't explain basics Claude already knows. Focus on the non-obvious: gotchas, best practices, version-specific details, and patterns that differ from common assumptions.
+ - **Be opinionated**: Provide a default recommendation rather than listing multiple options. "Use pdfplumber for text extraction" beats "You can use pypdf, pdfplumber, PyMuPDF, or..."
+ - **Version-pin where it matters**: Specify versions for APIs with breaking changes. "Assume FastAPI 0.100+ with Pydantic v2" prevents confusion.
+ - **Provide escape hatches**: After the default, note alternatives for edge cases. "For scanned PDFs requiring OCR, use pdf2image with pytesseract instead."
+
+ ### 4.3 Code Example Standards
+
+ - Show **realistic, runnable code** — not pseudocode
+ - Include **imports** — don't make Claude guess
+ - Use **type annotations** in Python examples
+ - Include **error handling** only when it illustrates a non-obvious pattern
+ - Keep examples **minimal but complete** — enough to copy-paste and run
+ - Use **consistent style** across all examples in a skill
+ - Comment only the non-obvious — don't explain what `import json` does
+
+ ### 4.4 Reference Material Design (Progressive Disclosure)
+
+ Anthropic's recommended pattern: SKILL.md is the table of contents; detail files are chapters.
+
+ - Keep SKILL.md under **500 lines**
+ - Split large content into separate files referenced from SKILL.md
+ - Keep references **one level deep** (SKILL.md → reference file, not SKILL.md → file → file → file)
+ - For reference files over 100 lines, include a **table of contents** at the top
+ - Name files descriptively: `form_validation_rules.md` not `doc2.md`
+ - Organize by domain: `reference/finance.md`, `reference/sales.md`
+
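The size limits in section 4.4 lend themselves to a mechanical check. The sketch below assumes a skill directory holding `SKILL.md` plus Markdown reference files; `check_skill_dir` and its message wording are hypothetical. It only flags long reference files for a manual look, since it makes no attempt to detect whether a table of contents is actually present.

```python
from pathlib import Path

SKILL_MAX_LINES = 500  # rubric: keep SKILL.md under 500 lines
TOC_THRESHOLD = 100    # rubric: reference files over 100 lines get a table of contents


def line_count(path: Path) -> int:
    return len(path.read_text(encoding="utf-8").splitlines())


def check_skill_dir(skill_dir: Path) -> list[str]:
    """Return findings for one skill directory; an empty list means nothing flagged."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]

    findings: list[str] = []
    n = line_count(skill_md)
    if n >= SKILL_MAX_LINES:
        findings.append(f"SKILL.md has {n} lines (limit: under {SKILL_MAX_LINES})")

    # Every other Markdown file is treated as a reference file.
    for ref in sorted(skill_dir.rglob("*.md")):
        if ref == skill_md:
            continue
        n = line_count(ref)
        if n > TOC_THRESHOLD:
            rel = ref.relative_to(skill_dir).as_posix()
            findings.append(f"{rel} has {n} lines; confirm it opens with a table of contents")
    return findings


if __name__ == "__main__":
    for finding in check_skill_dir(Path(".")):
        print(finding)
```

A check like this covers only the countable rules; reference depth and file naming still need a human reviewer.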
+ ### 4.5 Description Field
+
+ The `description` field is the **most critical field** for skill discovery. Claude uses it to choose the right skill from potentially 100+ available skills.
+
+ **Best practices**:
+ - Write in **third person** (injected into system prompt; inconsistent POV causes discovery problems)
+ - Include both **what the skill does** and **when to use it**
+ - Include **trigger phrases** the user might say (quoted phrases work well)
+ - Include **key terms** users might mention
+ - Be specific, not vague: "Extract text and tables from PDF files, fill forms, merge documents" not "Helps with documents"
+ - Maximum 1024 characters
+
+ ### 4.6 Skill Anti-Patterns
+
+ | Anti-Pattern | Fix |
+ |---|---|
+ | Explaining basics Claude already knows | Delete the explanation; show code directly |
+ | Offering too many options without a default | Pick one default; mention alternatives as escape hatches |
+ | Deeply nested file references (3+ levels) | Keep all references one level from SKILL.md |
+ | Windows-style paths (`\`) | Always use forward slashes (`/`) |
+ | Time-sensitive information | Use "old patterns" sections or version-agnostic language |
+ | Inconsistent terminology | Pick one term and use it throughout |
+ | Vague description field | Be specific with trigger phrases and key terms |
+ | Over-verbose SKILL.md (>500 lines) | Split into referenced files |
+
+ ---
+
+ ## 5. Quality Checklist
+
+ Use this checklist when reviewing each agent definition and skill. Items marked with `[C]` are critical (must fix); items marked with `[R]` are recommended (should fix).
+
+ ### Agent Definition Checklist
+
+ #### YAML Frontmatter
+ - [ ] `[C]` `name` uses lowercase letters and hyphens only
+ - [ ] `[C]` `description` is non-empty and describes both *what* and *when*
+ - [ ] `[C]` `description` is written in third person
+ - [ ] `[R]` `description` includes trigger phrases users might say
+ - [ ] `[C]` `tools` lists only the tools the agent actually needs (principle of least privilege)
+ - [ ] `[R]` `model` is explicitly set (not relying on inheritance when a specific model is better)
+ - [ ] `[R]` Read-only agents do NOT have Write, Edit, or NotebookEdit in their tools
+
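Several of the agent frontmatter items above are mechanical enough to check in code. The following is a rough sketch that assumes the frontmatter has already been parsed into strings and that `tools` is the comma-separated form shown in section 3.3. `check_agent` and its `read_only` flag are hypothetical; the rubric defines no such field, so the caller has to decide which agents count as read-only.

```python
import re

WRITE_TOOLS = {"Write", "Edit", "NotebookEdit"}  # tools named in the checklist
AGENT_NAME_RE = re.compile(r"[a-z-]+")           # lowercase letters and hyphens only


def parse_tools(tools_field: str) -> list[str]:
    """Split a comma-separated `tools:` value into trimmed tool names."""
    return [t.strip() for t in tools_field.split(",") if t.strip()]


def check_agent(
    name: str, description: str, tools_field: str, read_only: bool
) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    if not AGENT_NAME_RE.fullmatch(name):
        problems.append(f"name {name!r} must use lowercase letters and hyphens only")
    if not description.strip():
        problems.append("description is empty")
    if read_only:
        leaked = sorted(WRITE_TOOLS.intersection(parse_tools(tools_field)))
        if leaked:
            problems.append(f"read-only agent lists write tools: {', '.join(leaked)}")
    return problems


if __name__ == "__main__":
    print(check_agent("security-auditor", "Audits code.", "Read, Grep, Edit", True))
```

Items such as third-person phrasing or trigger phrases are judgment calls and stay with the reviewer.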
329
+ #### Role & Identity
330
+ - [ ] `[C]` First line of body clearly defines the agent's role and expertise
331
+ - [ ] `[R]` Role is specific (includes domain, specialization, or expertise level)
332
+ - [ ] `[R]` No identity confusion (agent doesn't claim to be something its tools can't support)
333
+
334
+ #### Constraints
335
+ - [ ] `[C]` Has a clearly labeled constraints section if the agent has any restrictions
336
+ - [ ] `[C]` Constraints use strong negative framing ("**NEVER** modify any file")
337
+ - [ ] `[C]` All constraint categories are covered (not just one example)
338
+ - [ ] `[R]` Constraints are placed early in the prompt (before workflow)
339
+ - [ ] `[R]` Constraints are consistent with the `tools:` field (don't prohibit things already blocked by tool restrictions; don't allow things the tools can do but shouldn't)
340
+
341
+ #### Workflow / Strategy
342
+ - [ ] `[C]` Has a clear, numbered workflow or strategy section
343
+ - [ ] `[R]` Workflow includes conditional logic for different input types
344
+ - [ ] `[R]` Workflow specifies tool usage patterns with concrete commands/examples
345
+ - [ ] `[R]` Workflow has a logical ordering (discovery → analysis → synthesis → output)
346
+
347
+ #### Behavioral Rules
348
+ - [ ] `[R]` Covers the "no input" case (what to do when invoked without specific arguments)
349
+ - [ ] `[R]` Covers the "nothing found" case (what to report when investigation yields no results)
350
+ - [ ] `[C]` Includes uncertainty handling ("If you cannot determine..., say so explicitly")
351
+ - [ ] `[R]` Specifies scope behavior (when to go broad vs. narrow)
352
+
353
+ #### Output Format
354
+ - [ ] `[C]` Has a defined output format with named sections
355
+ - [ ] `[R]` Output format includes a sources/evidence section
356
+ - [ ] `[R]` Output format specifies what goes in each section
357
+ - [ ] `[R]` Output format is consistent with the agent's purpose
358
+
359
+ #### Examples
360
+ - [ ] `[R]` Has at least 2 concrete `<example>` blocks
361
+ - [ ] `[R]` Examples cover different scenarios (happy path + edge case)
362
+ - [ ] `[R]` Examples demonstrate the full workflow and output format
363
+ - [ ] `[R]` Examples are consistent with all stated rules and constraints
364
+
365
+ #### Prompt Quality
366
+ - [ ] `[C]` No contradictory instructions
367
+ - [ ] `[C]` No references to tools the agent can't access
368
+ - [ ] `[R]` Uses normal calibrated language (no ALLCAPS SHOUTING except for genuine safety constraints)
369
+ - [ ] `[R]` Provides motivation/context for non-obvious instructions
370
+ - [ ] `[R]` No time-sensitive content that will become outdated
371
+ - [ ] `[R]` Concise — every section earns its place in the context window
372
+
373
+ ### Skill Checklist
374
+
375
+ #### YAML Frontmatter
376
+ - [ ] `[C]` `name` uses lowercase letters, numbers, and hyphens only (max 64 chars)
+ - [ ] `[C]` `description` is specific, includes trigger phrases, and is written in third person
+ - [ ] `[C]` `description` includes both what the skill does and when to use it
+ - [ ] `[R]` `description` under 1024 characters
+
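The mechanical parts of the frontmatter checks above (character set and length limits) could be automated roughly as follows. This is a minimal sketch, not part of the rubric: `check_frontmatter` and the constant names are hypothetical, and the qualitative items (specificity, trigger phrases, third person) still need a human reviewer.

```python
import re

# Limits come from the checklist; the names below are illustrative assumptions.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the mechanical checks pass."""
    problems: list[str] = []
    # fullmatch also rejects the empty string, because the pattern needs one character.
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, numbers, and hyphens only")
    if len(name) > MAX_NAME_CHARS:
        problems.append(f"name exceeds {MAX_NAME_CHARS} characters")
    # "under 1024 characters" is read here as strictly fewer than 1024.
    if len(description) >= MAX_DESCRIPTION_CHARS:
        problems.append(f"description is not under {MAX_DESCRIPTION_CHARS} characters")
    return problems


if __name__ == "__main__":
    for problem in check_frontmatter("PDF_Tools", "Extracts text from PDF files."):
        print(problem)
```

Returning a list of messages instead of raising on the first failure lets a reviewer see every frontmatter problem in one pass.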
+ #### Content Quality
+ - [ ] `[C]` SKILL.md body under 500 lines
+ - [ ] `[C]` Starts with a mental model or conceptual overview (not basic explanations)
+ - [ ] `[R]` Assumes Claude's existing knowledge — doesn't over-explain basics
+ - [ ] `[R]` Is opinionated — provides defaults, not lists of equal options
+ - [ ] `[R]` Uses consistent terminology throughout
+
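The 500-line limit on the SKILL.md body can be checked with a few lines of code. The sketch below assumes the frontmatter is a leading block delimited by `---` lines and that only the lines after it count toward the limit; `body_line_count` is a hypothetical helper, not something the rubric defines.

```python
MAX_BODY_LINES = 500  # limit taken from the checklist


def body_line_count(skill_md: str) -> int:
    """Count the lines of a SKILL.md document that follow its frontmatter block."""
    lines = skill_md.splitlines()
    if lines and lines[0].strip() == "---":
        # Look for the closing delimiter of the frontmatter block.
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                return len(lines) - (index + 1)
    # No frontmatter, or an unterminated one: count every line.
    return len(lines)


if __name__ == "__main__":
    sample = "---\nname: pdf-tools\ndescription: Extracts text.\n---\n# PDF tools\nBody text\n"
    count = body_line_count(sample)
    print(count, "lines; within limit:", count < MAX_BODY_LINES)
```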
+ #### Code Examples
+ - [ ] `[C]` Code examples are realistic and runnable (not pseudocode)
+ - [ ] `[C]` Code examples include imports
+ - [ ] `[R]` Python code uses type annotations
+ - [ ] `[R]` Code follows modern patterns for the specified versions
+ - [ ] `[R]` Comments explain only non-obvious logic
+
+ #### Reference Architecture
+ - [ ] `[R]` Additional detail files are referenced from SKILL.md (if content exceeds 500 lines)
+ - [ ] `[R]` All references are one level deep from SKILL.md
+ - [ ] `[R]` Long reference files have a table of contents
+ - [ ] `[R]` Files are named descriptively
+
+ #### Robustness
+ - [ ] `[R]` No time-sensitive content
+ - [ ] `[R]` No Windows-style paths
+ - [ ] `[R]` Dependencies are explicitly listed
+ - [ ] `[R]` Works across Haiku, Sonnet, and Opus (not over-reliant on one model's capabilities)
+
+ ---
+
+ ## 6. Severity Classification for Issues
+
+ When reporting issues during review, classify them as follows:
+
+ | Severity | Definition | Action |
+ |---|---|---|
+ | **P0 — Critical** | Incorrect constraints, tool list mismatch, contradictory instructions, security risk (e.g., write-capable tools on a "read-only" agent) | Must fix before merge |
+ | **P1 — High** | Missing constraints section, no output format, vague description that breaks delegation, no behavioral rules | Should fix before merge |
+ | **P2 — Medium** | Missing examples, suboptimal workflow ordering, verbose explanations, inconsistent terminology | Fix for quality; can merge with plan to address |
+ | **P3 — Low** | Style nits, minor rewording suggestions, optional enhancements | Fix at author's discretion |
+
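A review tool could encode the severity table above along these lines. This is a hedged sketch: the `Severity` enum, `ACTION` mapping, and `blocks_merge` helper are hypothetical names, and treating only P0 as a hard merge block is an assumption based on the table saying P0 "must" and P1 "should" be fixed.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table; a lower value is more severe."""
    P0_CRITICAL = 0
    P1_HIGH = 1
    P2_MEDIUM = 2
    P3_LOW = 3


# Action text copied from the table's "Action" column.
ACTION: dict[Severity, str] = {
    Severity.P0_CRITICAL: "Must fix before merge",
    Severity.P1_HIGH: "Should fix before merge",
    Severity.P2_MEDIUM: "Fix for quality; can merge with plan to address",
    Severity.P3_LOW: "Fix at author's discretion",
}


def blocks_merge(severities: list[Severity]) -> bool:
    """Assumption: only P0 findings are a hard block on merging."""
    return any(severity == Severity.P0_CRITICAL for severity in severities)


if __name__ == "__main__":
    findings = [Severity.P2_MEDIUM, Severity.P0_CRITICAL]
    print("blocks merge:", blocks_merge(findings))
    print("most severe action:", ACTION[min(findings)])
```

Using `IntEnum` keeps the levels ordered, so `min()` returns the most severe finding in a review.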
+ ---
+
+ ## 7. Sources
+
+ ### Anthropic Official Documentation
+ - [Prompt Engineering Overview](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
+ - [Prompting Best Practices for Claude 4.x](https://platform.claude.com/docs/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices)
+ - [Giving Claude a Role (System Prompts)](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/system-prompts)
+ - [Use XML Tags to Structure Prompts](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/use-xml-tags)
+ - [Use Examples (Multishot Prompting)](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/multishot-prompting)
+ - [Skill Authoring Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)
+ - [Create Custom Subagents](https://code.claude.com/docs/en/sub-agents)
+ - [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
+
+ ### Industry Research
+ - [LLM Agent Design Patterns (Prompt Engineering Guide)](https://www.promptingguide.ai/research/llm-agents)
+ - [Agent System Design Patterns (Databricks)](https://docs.databricks.com/aws/en/generative-ai/guide/agent-system-design-patterns)
+ - [Patterns and Anti-Patterns for Building with LLMs](https://medium.com/marvelous-mlops/patterns-and-anti-patterns-for-building-with-llms-42ea9c2ddc90)
+ - [A Taxonomy of Prompt Defects in LLM Systems (arXiv)](https://arxiv.org/html/2509.14404v1)
+ - [The Prompt Engineering Playbook for Programmers](https://addyo.substack.com/p/the-prompt-engineering-playbook-for)
+ - [Claude Code Best Practices for Subagents](https://www.pubnub.com/blog/best-practices-for-claude-code-sub-agents/)