@qball-inc/the-bulwark 1.0.0

Files changed (175)
  1. package/.claude-plugin/plugin.json +43 -0
  2. package/agents/bulwark-fix-validator.md +633 -0
  3. package/agents/bulwark-implementer.md +391 -0
  4. package/agents/bulwark-issue-analyzer.md +308 -0
  5. package/agents/bulwark-standards-reviewer.md +221 -0
  6. package/agents/plan-creation-architect.md +323 -0
  7. package/agents/plan-creation-eng-lead.md +352 -0
  8. package/agents/plan-creation-po.md +300 -0
  9. package/agents/plan-creation-qa-critic.md +334 -0
  10. package/agents/product-ideation-competitive-analyzer.md +298 -0
  11. package/agents/product-ideation-idea-validator.md +268 -0
  12. package/agents/product-ideation-market-researcher.md +292 -0
  13. package/agents/product-ideation-pattern-documenter.md +308 -0
  14. package/agents/product-ideation-segment-analyzer.md +303 -0
  15. package/agents/product-ideation-strategist.md +259 -0
  16. package/agents/statusline-setup.md +97 -0
  17. package/hooks/hooks.json +59 -0
  18. package/package.json +45 -0
  19. package/scripts/hooks/cleanup-stale.sh +13 -0
  20. package/scripts/hooks/enforce-quality.sh +166 -0
  21. package/scripts/hooks/implementer-quality.sh +256 -0
  22. package/scripts/hooks/inject-protocol.sh +52 -0
  23. package/scripts/hooks/suggest-pipeline.sh +175 -0
  24. package/scripts/hooks/track-pipeline-start.sh +37 -0
  25. package/scripts/hooks/track-pipeline-stop.sh +52 -0
  26. package/scripts/init-rules.sh +35 -0
  27. package/scripts/init.sh +151 -0
  28. package/skills/anthropic-validator/SKILL.md +607 -0
  29. package/skills/anthropic-validator/references/agents-checklist.md +131 -0
  30. package/skills/anthropic-validator/references/commands-checklist.md +102 -0
  31. package/skills/anthropic-validator/references/hooks-checklist.md +151 -0
  32. package/skills/anthropic-validator/references/mcp-checklist.md +136 -0
  33. package/skills/anthropic-validator/references/plugins-checklist.md +148 -0
  34. package/skills/anthropic-validator/references/skills-checklist.md +85 -0
  35. package/skills/assertion-patterns/SKILL.md +296 -0
  36. package/skills/bug-magnet-data/SKILL.md +284 -0
  37. package/skills/bug-magnet-data/context/cli-args.md +91 -0
  38. package/skills/bug-magnet-data/context/db-query.md +104 -0
  39. package/skills/bug-magnet-data/context/file-contents.md +103 -0
  40. package/skills/bug-magnet-data/context/http-body.md +91 -0
  41. package/skills/bug-magnet-data/context/process-spawn.md +123 -0
  42. package/skills/bug-magnet-data/data/booleans/boundaries.yaml +143 -0
  43. package/skills/bug-magnet-data/data/collections/arrays.yaml +114 -0
  44. package/skills/bug-magnet-data/data/collections/objects.yaml +123 -0
  45. package/skills/bug-magnet-data/data/concurrency/race-conditions.yaml +118 -0
  46. package/skills/bug-magnet-data/data/concurrency/state-machines.yaml +115 -0
  47. package/skills/bug-magnet-data/data/dates/boundaries.yaml +137 -0
  48. package/skills/bug-magnet-data/data/dates/invalid.yaml +132 -0
  49. package/skills/bug-magnet-data/data/dates/timezone.yaml +118 -0
  50. package/skills/bug-magnet-data/data/encoding/charset.yaml +79 -0
  51. package/skills/bug-magnet-data/data/encoding/normalization.yaml +105 -0
  52. package/skills/bug-magnet-data/data/formats/email.yaml +154 -0
  53. package/skills/bug-magnet-data/data/formats/json.yaml +187 -0
  54. package/skills/bug-magnet-data/data/formats/url.yaml +165 -0
  55. package/skills/bug-magnet-data/data/language-specific/javascript.yaml +182 -0
  56. package/skills/bug-magnet-data/data/language-specific/python.yaml +174 -0
  57. package/skills/bug-magnet-data/data/language-specific/rust.yaml +148 -0
  58. package/skills/bug-magnet-data/data/numbers/boundaries.yaml +161 -0
  59. package/skills/bug-magnet-data/data/numbers/precision.yaml +89 -0
  60. package/skills/bug-magnet-data/data/numbers/special.yaml +69 -0
  61. package/skills/bug-magnet-data/data/strings/boundaries.yaml +109 -0
  62. package/skills/bug-magnet-data/data/strings/injection.yaml +208 -0
  63. package/skills/bug-magnet-data/data/strings/special-chars.yaml +190 -0
  64. package/skills/bug-magnet-data/data/strings/unicode.yaml +139 -0
  65. package/skills/bug-magnet-data/references/external-lists.md +115 -0
  66. package/skills/bulwark-brainstorm/SKILL.md +563 -0
  67. package/skills/bulwark-brainstorm/references/at-teammate-prompts.md +60 -0
  68. package/skills/bulwark-brainstorm/references/role-critical-analyst.md +78 -0
  69. package/skills/bulwark-brainstorm/references/role-development-lead.md +66 -0
  70. package/skills/bulwark-brainstorm/references/role-product-delivery-lead.md +79 -0
  71. package/skills/bulwark-brainstorm/references/role-product-manager.md +62 -0
  72. package/skills/bulwark-brainstorm/references/role-project-sme.md +59 -0
  73. package/skills/bulwark-brainstorm/references/role-technical-architect.md +66 -0
  74. package/skills/bulwark-research/SKILL.md +298 -0
  75. package/skills/bulwark-research/references/viewpoint-contrarian.md +63 -0
  76. package/skills/bulwark-research/references/viewpoint-direct-investigation.md +62 -0
  77. package/skills/bulwark-research/references/viewpoint-first-principles.md +65 -0
  78. package/skills/bulwark-research/references/viewpoint-practitioner.md +62 -0
  79. package/skills/bulwark-research/references/viewpoint-prior-art.md +66 -0
  80. package/skills/bulwark-scaffold/SKILL.md +330 -0
  81. package/skills/bulwark-statusline/SKILL.md +161 -0
  82. package/skills/bulwark-statusline/scripts/statusline.sh +144 -0
  83. package/skills/bulwark-verify/SKILL.md +519 -0
  84. package/skills/code-review/SKILL.md +428 -0
  85. package/skills/code-review/examples/anti-patterns/linting.ts +181 -0
  86. package/skills/code-review/examples/anti-patterns/security.ts +91 -0
  87. package/skills/code-review/examples/anti-patterns/standards.ts +195 -0
  88. package/skills/code-review/examples/anti-patterns/type-safety.ts +108 -0
  89. package/skills/code-review/examples/recommended/linting.ts +195 -0
  90. package/skills/code-review/examples/recommended/security.ts +154 -0
  91. package/skills/code-review/examples/recommended/standards.ts +231 -0
  92. package/skills/code-review/examples/recommended/type-safety.ts +181 -0
  93. package/skills/code-review/frameworks/angular.md +218 -0
  94. package/skills/code-review/frameworks/django.md +235 -0
  95. package/skills/code-review/frameworks/express.md +207 -0
  96. package/skills/code-review/frameworks/flask.md +298 -0
  97. package/skills/code-review/frameworks/generic.md +146 -0
  98. package/skills/code-review/frameworks/react.md +152 -0
  99. package/skills/code-review/frameworks/vue.md +244 -0
  100. package/skills/code-review/references/linting-patterns.md +221 -0
  101. package/skills/code-review/references/security-patterns.md +125 -0
  102. package/skills/code-review/references/standards-patterns.md +246 -0
  103. package/skills/code-review/references/type-safety-patterns.md +130 -0
  104. package/skills/component-patterns/SKILL.md +131 -0
  105. package/skills/component-patterns/references/pattern-cli-command.md +118 -0
  106. package/skills/component-patterns/references/pattern-database.md +166 -0
  107. package/skills/component-patterns/references/pattern-external-api.md +139 -0
  108. package/skills/component-patterns/references/pattern-file-parser.md +168 -0
  109. package/skills/component-patterns/references/pattern-http-server.md +162 -0
  110. package/skills/component-patterns/references/pattern-process-spawner.md +133 -0
  111. package/skills/continuous-feedback/SKILL.md +327 -0
  112. package/skills/continuous-feedback/references/collect-instructions.md +81 -0
  113. package/skills/continuous-feedback/references/specialize-code-review.md +82 -0
  114. package/skills/continuous-feedback/references/specialize-general.md +98 -0
  115. package/skills/continuous-feedback/references/specialize-test-audit.md +81 -0
  116. package/skills/create-skill/SKILL.md +359 -0
  117. package/skills/create-skill/references/agent-conventions.md +194 -0
  118. package/skills/create-skill/references/agent-template.md +195 -0
  119. package/skills/create-skill/references/content-guidance.md +291 -0
  120. package/skills/create-skill/references/decision-framework.md +124 -0
  121. package/skills/create-skill/references/template-pipeline.md +217 -0
  122. package/skills/create-skill/references/template-reference-heavy.md +111 -0
  123. package/skills/create-skill/references/template-research.md +210 -0
  124. package/skills/create-skill/references/template-script-driven.md +172 -0
  125. package/skills/create-skill/references/template-simple.md +80 -0
  126. package/skills/create-subagent/SKILL.md +353 -0
  127. package/skills/create-subagent/references/agent-conventions.md +268 -0
  128. package/skills/create-subagent/references/content-guidance.md +232 -0
  129. package/skills/create-subagent/references/decision-framework.md +134 -0
  130. package/skills/create-subagent/references/template-single-agent.md +192 -0
  131. package/skills/fix-bug/SKILL.md +241 -0
  132. package/skills/governance-protocol/SKILL.md +116 -0
  133. package/skills/init/SKILL.md +341 -0
  134. package/skills/issue-debugging/SKILL.md +385 -0
  135. package/skills/issue-debugging/references/anti-patterns.md +245 -0
  136. package/skills/issue-debugging/references/debug-report-schema.md +227 -0
  137. package/skills/mock-detection/SKILL.md +511 -0
  138. package/skills/mock-detection/references/false-positive-prevention.md +402 -0
  139. package/skills/mock-detection/references/stub-patterns.md +236 -0
  140. package/skills/pipeline-templates/SKILL.md +215 -0
  141. package/skills/pipeline-templates/references/code-change-workflow.md +277 -0
  142. package/skills/pipeline-templates/references/code-review.md +336 -0
  143. package/skills/pipeline-templates/references/fix-validation.md +421 -0
  144. package/skills/pipeline-templates/references/new-feature.md +335 -0
  145. package/skills/pipeline-templates/references/research-brainstorm.md +161 -0
  146. package/skills/pipeline-templates/references/research-planning.md +257 -0
  147. package/skills/pipeline-templates/references/test-audit.md +389 -0
  148. package/skills/pipeline-templates/references/test-execution-fix.md +238 -0
  149. package/skills/plan-creation/SKILL.md +497 -0
  150. package/skills/product-ideation/SKILL.md +372 -0
  151. package/skills/product-ideation/references/analysis-frameworks.md +161 -0
  152. package/skills/session-handoff/SKILL.md +139 -0
  153. package/skills/session-handoff/references/examples.md +223 -0
  154. package/skills/setup-lsp/SKILL.md +312 -0
  155. package/skills/setup-lsp/references/server-registry.md +85 -0
  156. package/skills/setup-lsp/references/troubleshooting.md +135 -0
  157. package/skills/subagent-output-templating/SKILL.md +415 -0
  158. package/skills/subagent-output-templating/references/examples.md +440 -0
  159. package/skills/subagent-prompting/SKILL.md +364 -0
  160. package/skills/subagent-prompting/references/examples.md +342 -0
  161. package/skills/test-audit/SKILL.md +531 -0
  162. package/skills/test-audit/references/known-limitations.md +41 -0
  163. package/skills/test-audit/references/priority-classification.md +30 -0
  164. package/skills/test-audit/references/prompts/deep-mode-detection.md +83 -0
  165. package/skills/test-audit/references/prompts/synthesis.md +57 -0
  166. package/skills/test-audit/references/rewrite-instructions.md +46 -0
  167. package/skills/test-audit/references/schemas/audit-output.yaml +100 -0
  168. package/skills/test-audit/references/schemas/diagnostic-output.yaml +49 -0
  169. package/skills/test-audit/scripts/data-flow-analyzer.ts +509 -0
  170. package/skills/test-audit/scripts/integration-mock-detector.ts +462 -0
  171. package/skills/test-audit/scripts/package.json +20 -0
  172. package/skills/test-audit/scripts/skip-detector.ts +211 -0
  173. package/skills/test-audit/scripts/verification-counter.ts +295 -0
  174. package/skills/test-classification/SKILL.md +310 -0
  175. package/skills/test-fixture-creation/SKILL.md +295 -0
package/skills/continuous-feedback/references/specialize-general.md
@@ -0,0 +1,98 @@
1
+ # Specialization: General
2
+
3
+ This reference guides the general Analyzer on what improvement patterns to look for in collected learnings. The general Analyzer is the catch-all — it processes learnings that don't fit neatly into test-audit or code-review, plus any items tagged with "general" skill_relevance.
4
+
5
+ ## Target Scope
6
+
7
+ The general Analyzer examines improvements for ANY skill or agent in the project, including:
8
+
9
+ - Skill authoring patterns (frontmatter, structure, instructions)
10
+ - Sub-agent behavior and prompt engineering
11
+ - Pipeline orchestration patterns
12
+ - Hook system configuration and behavior
13
+ - Session workflow and handoff patterns
14
+ - Token management and context window optimization
15
+ - Template and reference document conventions
16
+
17
+ ## What to Look For
18
+
19
+ ### Instruction Hardening (DEF-P4-005 Pattern)
20
+
21
+ Learnings about LLM compliance with skill/agent instructions:
22
+
23
+ - Cases where the LLM ignored or reinterpreted instructions
24
+ - Missing BINDING language (MUST/MUST NOT/MANDATORY/REQUIRED)
25
+ - Pre-Flight Gate gaps that allowed invalid inputs through
26
+ - SC1-SC3 compliance issues (skill instructions treated as advisory)
27
+
28
+ **Action**: Propose instruction strengthening for affected skills with specific MUST/MUST NOT language. Reference DEF-P4-005 as the canonical example.
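As a rough illustration of the "missing BINDING language" check, the sketch below scans instruction text for the four keywords listed above. The names `findBindingTerms` and `needsHardening` are hypothetical and not part of the package; the real Analyzer may reason about this quite differently.

```typescript
// Hypothetical helper: reports which binding keywords appear in a block of
// skill or agent instructions. The keyword list comes from the bullet above.
const BINDING_TERMS = ["MUST NOT", "MUST", "MANDATORY", "REQUIRED"] as const;

type BindingTerm = (typeof BINDING_TERMS)[number];

function findBindingTerms(instructions: string): BindingTerm[] {
  const found: BindingTerm[] = [];
  for (const term of BINDING_TERMS) {
    // Word boundaries avoid matches inside longer tokens. The negative
    // lookahead stops a bare "MUST" from also counting every "MUST NOT".
    const pattern =
      term === "MUST"
        ? /\bMUST\b(?!\s+NOT\b)/
        : new RegExp(`\\b${term.replace(" ", "\\s+")}\\b`);
    if (pattern.test(instructions)) {
      found.push(term);
    }
  }
  return found;
}

// Assumption: instructions with no binding term at all are hardening candidates.
function needsHardening(instructions: string): boolean {
  return findBindingTerms(instructions).length === 0;
}
```

The match is deliberately case-sensitive, on the assumption that only upper-case keywords are meant to be binding.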
29
+
30
+ ### Workflow Improvements
31
+
32
+ Learnings about process efficiency:
33
+
34
+ - Pipeline stage ordering improvements
35
+ - Parallel vs sequential execution discoveries
36
+ - Token budget management techniques
37
+ - Error handling and retry patterns that worked well
38
+ - Pre-flight alignment patterns that reduced post-synthesis iterations
39
+
40
+ **Action**: Propose workflow updates to affected skill SKILL.md files or pipeline templates.
41
+
42
+ ### Sub-Agent Behavior Patterns
43
+
44
+ Learnings about how sub-agents behave:
45
+
46
+ - Prompt patterns that produce better/worse agent output
47
+ - Model selection insights (when Haiku/Sonnet/Opus is appropriate)
48
+ - Agent output quality patterns (verbosity, hallucination, instruction compliance)
49
+ - Context window management for sub-agents
50
+
51
+ **Action**: Propose sub-agent prompt improvements or model selection updates.
52
+
53
+ ### Configuration and Convention Updates
54
+
55
+ Learnings about project configuration:
56
+
57
+ - Frontmatter field discoveries (what works, what silently breaks)
58
+ - File naming conventions that improve or degrade discoverability
59
+ - Hook configuration patterns
60
+ - Sync and portability requirements
61
+
62
+ **Action**: Propose configuration or convention updates to affected files.
63
+
64
+ ### Template and Reference Improvements
65
+
66
+ Learnings about document templates:
67
+
68
+ - Output template fields that are missing or unused
69
+ - Reference document gaps (missing guidance for common scenarios)
70
+ - Diagnostic output improvements
71
+ - Cross-skill reference patterns
72
+
73
+ **Action**: Propose template or reference updates with specific field additions or removals.
74
+
75
+ ### Tool and Platform Behaviors
76
+
77
+ Learnings about Claude Code platform behaviors:
78
+
79
+ - Framework observations (FW-OBS-NNN patterns)
80
+ - Tool quirks and workarounds
81
+ - Platform limitations that affect skill design
82
+ - New platform features that enable improvements
83
+
84
+ **Action**: Propose skill updates that account for discovered platform behaviors.
85
+
86
+ ## Analysis Output Structure
87
+
88
+ For each improvement identified, produce:
89
+
90
+ 1. **What was learned** — the specific learning item(s) driving this
91
+ 2. **What it affects** — which file(s) and section(s) in the target project
92
+ 3. **Proposed improvement** — specific enough for the Proposer to create a copy-paste-ready change
93
+ 4. **Priority** — High (causes real failures or blocks workflows), Medium (improves quality), Low (nice to have)
94
+ 5. **Evidence** — reference the source learning item IDs (L-NNN)
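One possible way to represent the five fields above is sketched here. The interface and function names are illustrative assumptions, and the evidence check assumes that `L-NNN` means "L-" followed by digits.

```typescript
// Hypothetical shape for one Analyzer finding; field names are illustrative.
type Priority = "High" | "Medium" | "Low";

interface Improvement {
  learned: string;    // 1. What was learned
  affects: string[];  // 2. File(s) and section(s) affected
  proposal: string;   // 3. Proposed improvement
  priority: Priority; // 4. High / Medium / Low
  evidence: string[]; // 5. Source learning item IDs, e.g. "L-012"
}

// Returns a list of problems; an empty list means the record looks complete.
function validateImprovement(item: Improvement): string[] {
  const problems: string[] = [];
  if (item.learned.trim() === "") problems.push("missing 'what was learned'");
  if (item.affects.length === 0) problems.push("no affected files listed");
  if (item.proposal.trim() === "") problems.push("missing proposal");
  if (item.evidence.length === 0) problems.push("no evidence IDs");
  for (const id of item.evidence) {
    if (!/^L-\d+$/.test(id)) problems.push(`malformed evidence ID: ${id}`);
  }
  return problems;
}

// Sorts High before Medium before Low without mutating the input array.
function byPriority(items: Improvement[]): Improvement[] {
  const rank: Record<Priority, number> = { High: 0, Medium: 1, Low: 2 };
  return [...items].sort((a, b) => rank[a.priority] - rank[b.priority]);
}
```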
95
+
96
+ ## Catch-All Responsibility
97
+
98
+ The general Analyzer MUST process any learning items that were not fully covered by specialized Analyzers. If an item has `skill_relevance: ["test-audit", "general"]`, the test-audit Analyzer handles the test-audit angle, but the general Analyzer should still examine it for broader implications (e.g., instruction hardening patterns that apply across all skills).
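The catch-all rule can be sketched as a small routing function. Only the `skill_relevance` field is taken from the text; the `LearningItem` shape, the `SPECIALIZED` list, and the rule that untagged items fall through to the general Analyzer are assumptions made for illustration.

```typescript
// Hypothetical learning item; only `skill_relevance` is taken from the text.
interface LearningItem {
  id: string;
  skill_relevance: string[];
}

// Assumed set of specialized Analyzers (the two named in this reference).
const SPECIALIZED = ["test-audit", "code-review"];

// Returns every Analyzer that should examine the item. The general Analyzer
// is included when the item is tagged "general" or when no specialized
// Analyzer claims it.
function analyzersFor(item: LearningItem): string[] {
  const specialized = SPECIALIZED.filter(
    (skill) => item.skill_relevance.indexOf(skill) !== -1
  );
  const taggedGeneral = item.skill_relevance.indexOf("general") !== -1;
  if (taggedGeneral || specialized.length === 0) {
    return [...specialized, "general"];
  }
  return specialized;
}
```

Under these assumptions, an item tagged `["test-audit", "general"]` is examined twice, once from each angle, which matches the example in the paragraph above.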
package/skills/continuous-feedback/references/specialize-test-audit.md
@@ -0,0 +1,81 @@
1
+ # Specialization: Test Audit
2
+
3
+ This reference guides the test-audit Analyzer on what improvement patterns to look for in collected learnings.
4
+
5
+ ## Target Skill Structure
6
+
7
+ The test-audit skill (`skills/test-audit/` or `.claude/skills/test-audit/`) typically contains:
8
+
9
+ | Component | Purpose |
10
+ |-----------|---------|
11
+ | `SKILL.md` | Main skill document with pipeline stages and instructions |
12
+ | `references/mock-detection-patterns.md` | Mock detection heuristics and violation examples |
13
+ | `references/deep-mode-detection.md` | Deep mode analysis prompt for LLM-based classification |
14
+ | `references/test-classification.md` | Test type classification rules (unit/integration/e2e) |
15
+ | `references/assertion-patterns.md` | Real behavior verification patterns |
16
+ | AST scripts (`scripts/`) | TypeScript AST analysis for mock detection |
17
+
18
+ ## What to Look For
19
+
20
+ ### Mock Detection Gaps
21
+
22
+ Learnings that reveal mock patterns the current detection misses:
23
+
24
+ - New violation patterns not covered in `mock-detection-patterns.md`
25
+ - Property-access chains (e.g., `mockOrder.id` used in new objects)
26
+ - Framework-specific mock patterns (e.g., new testing libraries)
27
+ - Edge cases where AST scripts produce false positives or false negatives
28
+ - Scale vs Deep mode disagreements that reveal detection blind spots
29
+
30
+ **Action**: Propose additions to `references/mock-detection-patterns.md` with specific violation examples.
31
+
32
+ ### Assertion Pattern Additions
33
+
34
+ Learnings about how real behavior should be verified:
35
+
36
+ - New component types that need verification approaches
37
+ - Patterns where `toHaveBeenCalled` should be replaced with output checks
38
+ - File system, network, or process verification patterns discovered during debugging
39
+
40
+ **Action**: Propose additions to assertion patterns references with concrete before/after examples.
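A before/after pair of the kind this action asks for might look like the sketch below. It uses a hand-rolled counter in place of a mocking library so that it runs on its own; `exportOrder` and both check functions are hypothetical and do not come from the package.

```typescript
// Hypothetical unit under test: formats an order and hands it to a writer.
type Writer = (line: string) => void;

function exportOrder(id: number, total: number, write: Writer): void {
  write(`order=${id};total=${total.toFixed(2)}`);
}

// Before: call verification, the hand-rolled equivalent of toHaveBeenCalled.
// It only proves the writer was invoked, so it would still pass if the
// formatted line were wrong.
function callOnlyCheck(): boolean {
  let calls = 0;
  exportOrder(7, 19.5, () => {
    calls += 1;
  });
  return calls === 1;
}

// After: output verification. The test captures what was actually written
// and compares it with the expected content.
function outputCheck(): boolean {
  const lines: string[] = [];
  exportOrder(7, 19.5, (line) => {
    lines.push(line);
  });
  return lines.length === 1 && lines[0] === "order=7;total=19.50";
}
```

Both checks pass against the current `exportOrder`, but only `outputCheck` would fail if the format string regressed.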
41
+
42
+ ### AST Script Coverage
43
+
44
+ Learnings about AST analysis accuracy:
45
+
46
+ - Cases where AST scripts miss violations that LLM deep mode catches
47
+ - Cases where AST scripts flag false positives
48
+ - New TypeScript/JavaScript patterns that need AST support
49
+ - Performance observations (files where AST analysis is slow or times out)
50
+
51
+ **Action**: Propose specific AST pattern additions or corrections. Include the code pattern that should be detected.
52
+
53
+ ### Classification Improvements
54
+
55
+ Learnings about test type classification:
56
+
57
+ - Tests misclassified as unit when they're integration (or vice versa)
58
+ - New heuristics for distinguishing test types
59
+ - Section-boundary detection improvements (where one file has multiple test types)
60
+
61
+ **Action**: Propose classification rule updates with examples of correct vs incorrect classification.
62
+
63
+ ### Instruction Hardening
64
+
65
+ Learnings about LLM compliance with test-audit instructions:
66
+
67
+ - Cases where the LLM re-classified AST findings (DEF-P4-005 pattern)
68
+ - Missing BINDING language that allowed instruction drift
69
+ - Pre-Flight Gate gaps or missing threshold checks
70
+
71
+ **Action**: Propose instruction strengthening with specific MUST/MUST NOT language.
72
+
73
+ ## Analysis Output Structure
74
+
75
+ For each improvement identified, produce:
76
+
77
+ 1. **What was learned** — the specific learning item(s) driving this
78
+ 2. **What it affects** — which test-audit component (reference file, AST script, SKILL.md section)
79
+ 3. **Proposed improvement** — specific enough for the Proposer to create a copy-paste-ready change
80
+ 4. **Priority** — High (current misses cause real failures), Medium (improves coverage), Low (nice to have)
81
+ 5. **Evidence** — reference the source learning item IDs (L-NNN)
package/skills/create-skill/SKILL.md
@@ -0,0 +1,359 @@
1
+ ---
2
+ name: create-skill
3
+ description: Generates Claude Code skills from requirements using adaptive interview, complexity classification, and iterative validation. Use when creating new skills, scaffolding skill structure, or generating skills with sub-agent orchestration.
4
+ disable-model-invocation: true
5
+ argument-hint: "<description-or-name> [--doc <requirements-path>]"
6
+ skills:
7
+ - subagent-prompting
8
+ ---
9
+
10
+ # Create Skill
11
+
12
+ Generates a complete Claude Code skill from a description or requirements document. Conducts an adaptive interview to understand the skill's purpose, classifies it into one of 5 structural types, spawns a Sonnet sub-agent to generate the files, validates with anthropic-validator, and presents the scaffold with architectural decisions.
13
+
14
+ ---
15
+
16
+ ## When to Use This Skill
17
+
18
+ **Load this skill when the user request matches ANY of these patterns:**
19
+
20
+ | Trigger Pattern | Example User Request |
21
+ |-----------------|---------------------|
22
+ | Skill creation | "Create a new skill", "Make a skill for X" |
23
+ | Scaffolding | "Scaffold a skill", "Set up a new skill" |
24
+ | Generation | "Generate a skill that does X" |
25
+ | Skill design | "Design a skill for X", "I need a skill that does X" |
26
+
27
+ **DO NOT use for:**
28
+ - Editing existing skills (edit directly)
29
+ - Creating standalone sub-agents (use `create-subagent`)
30
+ - Debugging skill issues (use `issue-debugging`)
31
+ - Validating existing skills (use `anthropic-validator`)
32
+
33
+ ---
34
+
35
+ ## Dependencies
36
+
37
+ | Category | Files | Requirement | When to Load |
38
+ |----------|-------|-------------|--------------|
39
+ | **Decision framework** | `references/decision-framework.md` | **REQUIRED** | Load at Stage 0 for interview + classification |
40
+ | **Content guidance** | `references/content-guidance.md` | **REQUIRED** | Include in Stage 2 generator prompt |
41
+ | **Skill templates** | `references/template-*.md` | **REQUIRED** | Load the matching template at Stage 2 |
42
+ | **Agent template** | `references/agent-template.md` | **REQUIRED** | Include in Stage 2 prompt when template = pipeline |
43
+ | **Agent conventions** | `references/agent-conventions.md` | **REQUIRED** | Include in Stage 2 prompt when template = pipeline |
44
+ | **Diagnostic template** | `templates/diagnostic-output.yaml` | **REQUIRED** | Use at Stage 6 |
45
+ | **Subagent prompting** | `subagent-prompting` skill | **REQUIRED** | Load at Stage 0 for 4-part prompt template |
46
+
47
+ **Fallback behavior:**
48
+ - If a template file is missing: Use the closest available template, note mismatch in diagnostics
49
+ - If content-guidance is missing: Proceed without it, note in diagnostics (output quality will be lower)
50
+
51
+ ---
52
+
53
+ ## Usage
54
+
55
+ ```
56
+ /create-skill <description-or-name>
57
+ /create-skill --doc <requirements-document>
58
+ ```
59
+
60
+ **Arguments:**
61
+ - `<description-or-name>` — Free-text description of the desired skill, or a skill name to start from
62
+ - `--doc <path>` — Path to a requirements document. Extracts interview answers from it instead of asking fresh.
63
+
64
+ **Examples:**
65
+ - `/create-skill a skill that audits dependency versions` — Start from description
66
+ - `/create-skill --doc plans/task-briefs/P5.4-create-skill.md` — Start from requirements doc
67
+ - `/create-skill changelog-generator` — Start from a name
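The argument forms above could be parsed roughly as follows. This is a sketch under the assumption that arguments arrive already split on whitespace; `parseCreateSkillArgs` and `CreateSkillArgs` are hypothetical names, and the skill itself relies on the orchestrator to interpret its arguments.

```typescript
// Hypothetical parsed form of the /create-skill arguments.
interface CreateSkillArgs {
  description?: string; // free-text description or skill name
  docPath?: string;     // value following --doc
}

function parseCreateSkillArgs(argv: string[]): CreateSkillArgs {
  const result: CreateSkillArgs = {};
  const words: string[] = [];
  let expectPath = false;
  for (const arg of argv) {
    if (expectPath) {
      result.docPath = arg; // the token right after --doc is the path
      expectPath = false;
    } else if (arg === "--doc") {
      expectPath = true;
    } else {
      words.push(arg); // everything else is part of the description or name
    }
  }
  if (expectPath) {
    throw new Error("--doc requires a path");
  }
  if (words.length > 0) {
    result.description = words.join(" ");
  }
  if (result.description === undefined && result.docPath === undefined) {
    throw new Error("expected a description, a name, or --doc <path>");
  }
  return result;
}
```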
68
+
69
+ ---
70
+
71
+ ## Mandatory Execution Checklist (BINDING)
72
+
73
+ **Every item below is mandatory. No deviations. No substitutions. No skipping.**
74
+
75
+ This skill uses a 7-stage pipeline (Stages 0-6). You are the orchestrator. Follow every item in order. Do NOT return to the user until all applicable items are checked.
76
+
77
+ - [ ] **Stage 0 — Pre-Flight**: Arguments parsed (description, name, or --doc)
78
+ - [ ] **Stage 0 — Pre-Flight**: Decision framework and content guidance loaded
79
+ - [ ] **Stage 0 — Pre-Flight**: Adaptive interview conducted (1-2 rounds via AskUserQuestion)
80
+ - [ ] **Stage 1 — Classify**: Three decisions made (context mode, sub-agent pattern, supporting files)
81
+ - [ ] **Stage 1 — Classify**: Classification presented to user and confirmed via AskUserQuestion
82
+ - [ ] **Stage 2 — Generate**: Sonnet sub-agent spawned via Task tool (you do NOT generate the files yourself)
83
+ - [ ] **Stage 2 — Generate**: Generated files verified to exist in working directory
84
+ - [ ] **Stage 2 — Generate**: If pipeline template — sub-agent files generated in {working-directory}/agents/
85
+ - [ ] **Stage 3 — Validate**: `/anthropic-validator` invoked via Skill tool (manual review is NOT a substitute)
86
+ - [ ] **Stage 3 — Validate**: If pipeline template — `/anthropic-validator` invoked on each sub-agent file
87
+ - [ ] **Stage 3 — Validate**: Validator output read and findings counted
88
+ - [ ] **Stage 3 — Validate**: Manual checks completed (single-line description, no unnecessary files)
89
+ - [ ] **Stage 4 — Refine**: If validation found critical/high issues, Sonnet sub-agent spawned to fix (max 2 retries)
90
+ - [ ] **Stage 5 — Deploy**: Skill files deployed from working directory to target directory
91
+ - [ ] **Stage 5 — Deploy**: If pipeline template — sub-agent files deployed to `.claude/agents/`
92
+ - [ ] **Stage 5 — Deploy**: Working directory cleaned up
93
+ - [ ] **Stage 5 — Present**: Post-generation summary presented with architectural decisions
94
+ - [ ] **Stage 5 — Present**: If pipeline template — sub-agent permissions communicated
95
+ - [ ] **Stage 5 — Present**: Next steps communicated (this is a scaffold, not production-ready output)
96
+ - [ ] **Stage 6 — Diagnostics**: Diagnostic YAML written to `$PROJECT_DIR/logs/diagnostics/`
97
+
98
+ ---
99
+
100
+ ## Pipeline
101
+
102
+ ```fsharp
103
+ // create-skill pipeline
104
+ PreFlight(args) // Stage 0: Orchestrator — parse input, adaptive interview
105
+ |> Classify(interview_answers) // Stage 1: Orchestrator — three independent decisions
106
+ |> Generate(classification, template, examples) // Stage 2: Sonnet sub-agent — produce skill files
107
+ |> Validate(generated_output) // Stage 3: Orchestrator — run anthropic-validator
108
+ |> Refine(validator_findings) // Stage 4: Sonnet sub-agent (conditional, max 2 retries)
109
+ |> DeployAndPresent(working_dir, target_dir) // Stage 5: Orchestrator — deploy to target + post-generation summary
110
+ |> Diagnostics() // Stage 6: Orchestrator — write YAML
111
+ ```
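The conditional Stage 4 loop is the only non-linear part of the pipeline, so a sketch of the Generate, Validate, Refine portion may help. The hook signatures and severity names are assumptions, and so is the choice to re-validate after each refinement; the sub-agent and validator calls are replaced by plain functions.

```typescript
// Hypothetical finding severities; "critical" and "high" trigger refinement.
type Severity = "critical" | "high" | "medium" | "low";

interface PipelineHooks {
  generate: () => string;                                   // Stage 2 stand-in
  validate: (output: string) => Severity[];                 // Stage 3 stand-in
  refine: (output: string, findings: Severity[]) => string; // Stage 4 stand-in
}

const MAX_REFINE_RETRIES = 2;

function hasBlocking(findings: Severity[]): boolean {
  return findings.some((s) => s === "critical" || s === "high");
}

// Runs Stages 2-4: generate once, then refine while blocking findings remain,
// stopping after MAX_REFINE_RETRIES attempts even if issues are left.
function runGenerateValidateRefine(
  hooks: PipelineHooks
): { output: string; retries: number; clean: boolean } {
  let output = hooks.generate();
  let findings = hooks.validate(output);
  let retries = 0;
  while (hasBlocking(findings) && retries < MAX_REFINE_RETRIES) {
    output = hooks.refine(output, findings);
    findings = hooks.validate(output); // assumption: re-validate after each fix
    retries += 1;
  }
  return { output, retries, clean: !hasBlocking(findings) };
}
```

A result with `clean: false` after two retries would presumably be surfaced in the Stage 5 summary and the Stage 6 diagnostics rather than silently deployed.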
112
+
113
+ ---
114
+
115
+ ## Stage Definitions
116
+
117
+ ### Stage 0: Pre-Flight (Orchestrator)
118
+
119
+ ```
120
+ Stage 0: Pre-Flight
121
+ ├── Parse arguments (description, name, or --doc path)
122
+ ├── Load references/decision-framework.md
123
+ ├── Load references/content-guidance.md
124
+ ├── Load subagent-prompting skill
125
+ ├── If --doc provided:
126
+ │ ├── Read the requirements document
127
+ │ ├── Extract answers to Q1-Q5 from the document
128
+ │ └── Present extracted answers to user for confirmation via AskUserQuestion
129
+ ├── If no --doc:
130
+ │ └── AskUserQuestion: Present all 5 core questions from decision-framework.md
131
+ │ ├── Q1: What does this skill do? (concrete invocation examples)
132
+ │ ├── Q2: Needs conversation history, or can run in isolation?
133
+ │ ├── Q3: Orchestrates multiple distinct operations?
134
+ │ ├── Q4: How much domain-specific reference content? (None/Some/Extensive)
135
+ │ └── Q5: Produces structured output matching a specific format?
136
+ ├── If complexity detected in answers:
137
+ │ └── AskUserQuestion: Follow-up questions per decision-framework.md
138
+ │ ├── Q6: Do operations depend on each other's output?
139
+ │ ├── Q7: Do workers need direct communication?
140
+ │ ├── Q8: Error handling between stages?
141
+ │ └── Q9-Q10: Context-specific follow-ups
142
+ ├── Determine target directory for generated skill
143
+ │ └── Default: skills/{skill-name}/ (or user-specified path)
144
+ ├── Set working directory: tmp/create-skill/{skill-name}/
145
+ │ └── All generation and refinement happens here to avoid .claude/ edit approval storms
146
+ │ Files are deployed to the target directory only after validation passes (Stage 5)
147
+ └── Token budget check (warn if >30% consumed)
148
+ ```
149
+
150
+ **Interview behavior**: Maximum 2 AskUserQuestion rounds. Present Q1-Q5 together in round 1. Follow-ups (if needed) in round 2. Do NOT ask questions one at a time.
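The bookkeeping in Stage 0 (directories, interview rounds, token budget) can be summarized in a small planning helper. The paths and the 30% threshold are taken from the tree above; `planPreFlight`, its parameters, and the way token usage is measured are hypothetical.

```typescript
// Hypothetical Stage 0 planning helper; names are illustrative only.
interface PreFlightPlan {
  targetDir: string;
  workingDir: string;
  rounds: string[][];     // question IDs per AskUserQuestion round (max 2)
  budgetWarning: boolean; // true when more than 30% of tokens are consumed
}

function planPreFlight(
  skillName: string,
  followUps: string[],
  tokensUsed: number,
  tokenBudget: number,
  userTargetDir?: string
): PreFlightPlan {
  // Round 1 always presents Q1-Q5 together.
  const rounds: string[][] = [["Q1", "Q2", "Q3", "Q4", "Q5"]];
  if (followUps.length > 0) {
    rounds.push(followUps.slice()); // all follow-ups share one second round
  }
  return {
    targetDir: userTargetDir !== undefined ? userTargetDir : `skills/${skillName}/`,
    workingDir: `tmp/create-skill/${skillName}/`,
    rounds: rounds,
    budgetWarning: tokensUsed / tokenBudget > 0.3,
  };
}
```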
151
+
152
+ ### Stage 1: Classify (Orchestrator)
153
+
154
+ Apply the three-decision classification from `references/decision-framework.md`:
155
+
156
+ ```
157
+ Stage 1: Classify
158
+ ├── Decision A: Context Mode
159
+ │ ├── Needs conversation history → inline (no fork)
160
+ │ ├── Isolated multi-step work → context: fork
161
+ │ └── Simple guideline/knowledge → inline (warn if fork requested)
162
+ ├── Decision B: Sub-Agent Pattern
163
+ │ ├── Single operation → no sub-agents
164
+ │ ├── Multiple dependent operations → sequential Task tool
165
+ │ ├── Multiple independent operations → parallel Task tool
166
+ │ └── Direct worker communication → Agent Teams (experimental warning)
167
+ ├── Decision C: Supporting Files
168
+ │ ├── No references, no templates → vanilla SKILL.md
169
+ │ ├── Domain references needed → add references/
170
+ │ ├── Structured output format → add templates/
171
+ │ └── Deterministic code needed → add scripts/
172
+ ├── Map decisions → template (1 of 5 from decision-framework.md)
173
+ └── Present classification to user via AskUserQuestion:
174
+ ├── "Context: {inline/fork} — {reason}"
175
+ ├── "Sub-agents: {none/sequential/parallel/AT} — {reason}"
176
+ ├── "Supporting files: {list} — {reason}"
177
+ ├── "Template: {template name}"
178
+ └── "Proceed with generation? [Yes / Adjust]"
179
+ ```
180
+
181
+ **MANDATORY**: Wait for user confirmation before proceeding to Stage 2. If user selects "Adjust", re-classify with their feedback.
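The three decisions can be read as independent functions of the interview answers. The sketch below encodes the tree above with hypothetical field names; the precedence inside Decision B (direct communication checked before dependency) is an assumption, and the final mapping to one of the five templates is omitted because it lives in `references/decision-framework.md`.

```typescript
type ContextMode = "inline" | "fork";
type SubAgentPattern = "none" | "sequential" | "parallel" | "agent-teams";

// Hypothetical condensed interview answers; field names are illustrative.
interface InterviewAnswers {
  needsConversationHistory: boolean;
  isolatedMultiStep: boolean;
  multipleOperations: boolean;
  operationsDependOnEachOther: boolean;
  workersCommunicateDirectly: boolean;
  needsReferences: boolean;
  structuredOutput: boolean;
  needsScripts: boolean;
}

interface Classification {
  context: ContextMode;
  subAgents: SubAgentPattern;
  supportingFiles: string[]; // empty means a vanilla SKILL.md
}

function classify(a: InterviewAnswers): Classification {
  // Decision A: fork only for isolated multi-step work without history needs.
  const context: ContextMode =
    !a.needsConversationHistory && a.isolatedMultiStep ? "fork" : "inline";

  // Decision B: pick the sub-agent pattern.
  let subAgents: SubAgentPattern = "none";
  if (a.multipleOperations) {
    if (a.workersCommunicateDirectly) subAgents = "agent-teams";
    else if (a.operationsDependOnEachOther) subAgents = "sequential";
    else subAgents = "parallel";
  }

  // Decision C: collect supporting directories.
  const supportingFiles: string[] = [];
  if (a.needsReferences) supportingFiles.push("references/");
  if (a.structuredOutput) supportingFiles.push("templates/");
  if (a.needsScripts) supportingFiles.push("scripts/");

  return { context, subAgents, supportingFiles };
}
```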
182
+
183
+ ### Stage 2: Generate (Sonnet sub-agent)
184
+
185
+ ```
186
+ Stage 2: Generate
187
+ ├── Read the selected template from references/template-{type}.md
188
+ ├── Construct prompt using 4-part template (GOAL/CONSTRAINTS/CONTEXT/OUTPUT):
189
+ │ ├── GOAL: Generate a complete, structurally correct skill matching the
190
+ │ │ classification. The skill must activate reliably and instruct clearly.
191
+ │ ├── CONSTRAINTS:
192
+ │ │ ├── Follow the template structure exactly
193
+ │ │ ├── Description MUST be a single line (multi-line breaks discovery)
194
+ │ │ ├── Description MUST use "Use when..." trigger framing
195
+ │ │ ├── Include "When to Use" table with ≥3 trigger patterns
196
+ │ │ ├── Include "DO NOT use for" section with ≥2 anti-triggers
197
+ │ │ ├── If skill has sub-agents: include Pre-Flight Gate with MUST/MUST NOT
198
+ │ │ ├── If skill has sub-agents: include subagent-prompting in skills: dependency
199
+ │ │ ├── Do NOT add unnecessary files (no README, CHANGELOG, LICENSE)
200
+ │ │ ├── Do NOT use emojis in generated content
201
+ │ │ └── Keep total SKILL.md under target line count for the type
202
+ │ │ (simple: 150, reference-heavy: 200, pipeline: 400, script: 400, research: 400)
203
+ │ ├── CONTEXT:
204
+ │ │ ├── Classification from Stage 1 (all three decisions + template)
205
+ │ │ ├── User's interview answers (concrete examples from Q1)
206
+ │ │ ├── Selected template: references/template-{type}.md
207
+ │ │ ├── Content guidance: references/content-guidance.md
208
+ │ │ ├── If pipeline template: references/agent-template.md (sub-agent file structure)
209
+ │ │ ├── If pipeline template: references/agent-conventions.md (system-prompt register, frontmatter)
210
+ │ │ ├── Instruction: "Read 1-2 existing skills of the same type from the
211
+ │ │ │ codebase for structural reference (use Glob to find skills/*/SKILL.md)"
212
+ │ │ ├── If pipeline template: "Read 1-2 existing agents from .claude/agents/*.md
213
+ │ │ │ for sub-agent structural reference"
214
+ │ │ ├── Target output directory (final deployment location)
215
+ │ │ └── Working directory: tmp/create-skill/{skill-name}/
216
+ │ └── OUTPUT:
217
+ │ ├── Write SKILL.md to {working-directory}/SKILL.md
218
+ │ ├── Write reference files to {working-directory}/references/ (if applicable)
219
+ │ ├── Write template files to {working-directory}/templates/ (if applicable)
220
+ │ ├── Write script files to {working-directory}/scripts/ (if applicable)
221
+ │ ├── If pipeline template: Write sub-agent files to {working-directory}/agents/
222
+ │ │ ├── One .md file per pipeline stage: {skill-name}-{stage-name}.md
223
+ │ │ ├── Each sub-agent follows agent-template.md structure
224
+ │ │ ├── Each sub-agent uses system-prompt register (agent-conventions.md)
225
+ │ │ └── Orchestrating SKILL.md references sub-agents by Task(subagent_type="{name}")
226
+ │ └── Return summary: list of files created with line counts
227
+ ├── Spawn: Task(description="Generate skill files", subagent_type="general-purpose",
228
+ │ model="sonnet", prompt=...)
229
+ ├── Read generator output (file list + summary)
230
+ └── Verify files were created (Glob for {working-directory}/**)
231
+ ```
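Two of the Stage 2 constraints (single-line description with "Use when..." framing, and the per-type line targets) are mechanical enough to check in code. The following is a minimal sketch, not part of the package: the function name `check_skill_md` is hypothetical, and it assumes the description lives in a top-level `description:` frontmatter key.

```python
# Target line counts per skill type, copied from the Stage 2 constraints.
TARGET_LINES = {
    "simple": 150,
    "reference-heavy": 200,
    "pipeline": 400,
    "script": 400,
    "research": 400,
}


def check_skill_md(text: str, skill_type: str) -> list:
    """Return a list of constraint violations for a generated SKILL.md."""
    problems = []
    lines = text.splitlines()
    limit = TARGET_LINES[skill_type]
    if len(lines) > limit:
        problems.append(f"too long: {len(lines)} > {limit} lines")

    # Assumption: the description is a top-level `description:` key.
    desc = next((l for l in lines if l.startswith("description:")), None)
    if desc is None:
        problems.append("missing description")
    else:
        value = desc[len("description:"):].strip()
        # An empty value or a YAML block scalar (| or >) means the
        # description continues on following lines.
        if value == "" or value[0] in "|>":
            problems.append("description is not single-line")
        elif "use when" not in value.lower():
            problems.append('description lacks "Use when..." framing')
    return problems
```

A check like this would only supplement the validator in Stage 3; it does not replace it.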
232
+
233
+ ### Stage 3: Validate (Orchestrator)
234
+
235
+ ```
236
+ Stage 3: Validate
237
+ ├── FIRST: Invoke /anthropic-validator (this is the PRIMARY validation — NOT optional)
238
+ │ ├── Use the Skill tool: Skill(skill="anthropic-validator", args="{working-directory}/")
239
+ │ ├── Do NOT substitute manual review for this step
240
+ │ └── Do NOT proceed past this node until the Skill tool has been invoked
241
+ ├── If pipeline template: Also validate each sub-agent file in {working-directory}/agents/
242
+ │ └── Run /anthropic-validator on each {skill-name}-{stage-name}.md
243
+ ├── Read validator output
244
+ ├── Check for critical/high findings:
245
+ │ ├── 0 critical AND 0 high → proceed to Stage 5 (skip Stage 4)
246
+ │ └── Any critical or high → proceed to Stage 4 (refine)
247
+ ├── THEN: Manual checks (these supplement the validator, they do NOT replace it)
248
+ │ ├── Check description is single-line (read SKILL.md, verify no multiline description)
249
+ │ ├── If pipeline template: Check each sub-agent uses system-prompt register
250
+ │ └── Check no unnecessary files (no README.md, CHANGELOG.md, etc.)
251
+ └── Stage 3 exit gate:
252
+ ├── [ ] /anthropic-validator was invoked via the Skill tool (not manual review)
253
+ ├── [ ] Validator output was read and findings counted
254
+ └── If either is unchecked, Stage 3 is NOT complete — go back and invoke the validator
255
+ ```
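The branching rule in Stage 3 (no critical and no high findings goes to Stage 5, anything else goes to Stage 4) can be sketched as below. The shape of the findings is an assumption: the validator's real output format is not shown here, so each finding is modeled as a dict with a hypothetical `severity` key.

```python
from collections import Counter


def next_stage(findings: list) -> int:
    """Return 5 when there are no critical/high findings, otherwise 4."""
    # Assumption: each finding is a dict like {"severity": "high", ...}.
    counts = Counter(f["severity"] for f in findings)
    blocking = counts["critical"] + counts["high"]
    return 5 if blocking == 0 else 4
```

Lower-severity findings do not block deployment under this rule; they would simply be carried into the Stage 5 summary.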
256
+
257
+ ### Stage 4: Refine (Sonnet sub-agent, conditional, max 2 retries)
258
+
259
+ This stage only runs if Stage 3 found critical or high issues.
260
+
261
+ ```
262
+ Stage 4: Refine (attempt {N} of 2)
263
+ ├── Construct prompt using 4-part template:
264
+ │ ├── GOAL: Fix all critical and high findings from anthropic-validator
265
+ │ ├── CONSTRAINTS:
266
+ │ │ ├── Only fix the specific issues identified — do not restructure
267
+ │ │ ├── Preserve the existing skill content and structure
268
+ │ │ └── Description must remain single-line
269
+ │ ├── CONTEXT:
270
+ │ │ ├── Validator findings (critical and high items with descriptions)
271
+ │ │ ├── Current generated files (read from working directory)
272
+ │ │ └── Content guidance: references/content-guidance.md
273
+ │ └── OUTPUT: Edit files in {working-directory}/ to fix findings
274
+ ├── Spawn: Task(description="Fix validator findings", subagent_type="general-purpose",
275
+ │ model="sonnet", prompt=...)
276
+ ├── Re-run Stage 3 (validate)
277
+ ├── If still failing after 2 retries:
278
+ │ └── Proceed to Stage 5 with caveats noted
279
+ └── Token budget check (thresholds in Token Budget Management)
280
+ ```
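The refine/re-validate loop with its cap of two retries might look like the sketch below. The callables `validate` and `refine` are hypothetical stand-ins for re-running Stage 3 and spawning the Stage 4 sub-agent; neither name comes from the package.

```python
from typing import Callable, Tuple


def refine_until_clean(
    validate: Callable[[], int],
    refine: Callable[[int], None],
    max_retries: int = 2,
) -> Tuple[bool, int]:
    """Alternate refine and validate until clean or retries run out.

    `validate` returns the number of critical + high findings;
    `refine` receives the 1-based attempt number.
    Returns (clean, retries_used).
    """
    blocking = validate()
    retries = 0
    while blocking > 0 and retries < max_retries:
        retries += 1
        refine(retries)
        blocking = validate()  # every refinement is followed by Stage 3
    return blocking == 0, retries
```

When the first element of the result is `False`, the pipeline would continue to Stage 5 and report the remaining findings as caveats.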
281
+
282
+ ### Stage 5: Deploy & Present (Orchestrator)
283
+
284
+ ```
285
+ Stage 5: Deploy & Present
286
+ ├── Deploy skill: Copy skill files from {working-directory}/ to {target-directory}/
287
+ │ ├── Copy directory tree preserving structure (SKILL.md, references/, templates/, scripts/)
288
+ │ └── This is the ONLY point where skill files are written to the final location
289
+ ├── If pipeline template: Deploy sub-agents
290
+ │ ├── Copy {working-directory}/agents/*.md to .claude/agents/
291
+ │ └── Each sub-agent file: .claude/agents/{skill-name}-{stage-name}.md
292
+ ├── Clean up: Remove {working-directory}/ after successful copy
293
+ ├── Read all generated files for summary
294
+ ├── Present to user:
295
+ │ ├── "Generated skill at: {target-directory}/"
296
+ │ ├── If pipeline: "Generated sub-agents at: .claude/agents/"
297
+ │ ├── "Files created:"
298
+ │ │ └── List each file with line count (skill files + sub-agent files)
299
+ │ ├── "Architectural decisions:"
300
+ │ │ ├── "Context: {inline/fork} — {reason}"
301
+ │ │ ├── "Sub-agents: {none/sequential/parallel/AT} — {reason}"
302
+ │ │ └── "Supporting files: {list} — {reason}"
303
+ │ ├── "Skill type: {template used}"
304
+ │ ├── "Validation: {pass/fail with details}"
305
+ │ ├── If caveats: "Unresolved issues: {list}"
306
+ │ ├── If pipeline: "Sub-agent permissions to configure:"
307
+ │ │ └── {List tool permissions for each sub-agent that must be added to settings.json}
308
+ │ └── "Next steps:"
309
+ │ ├── "1. Review and customize the generated instructions"
310
+ │ ├── "2. Test activation by asking Claude to invoke it"
311
+ │ ├── "3. Iterate on trigger patterns until activation is reliable"
312
+ │ ├── "4. Add domain-specific content to reference files"
313
+ │ └── If pipeline: "5. Configure tool permissions for sub-agents in .claude/settings.json"
314
+ └── Note: This is a scaffold, not production-ready output (generate-and-customize contract)
315
+ ```
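The deploy step (copy the tree, route sub-agent files to the agents directory, then remove the working directory) can be sketched with the standard library. This is an illustration under assumptions: the function name `deploy` and its parameters are hypothetical, and the orchestrator may well perform these steps with its own tools instead.

```python
import shutil
from pathlib import Path


def deploy(working: Path, target: Path, agents_dir: Path) -> list:
    """Copy generated files to their final locations, then clean up."""
    deployed = []
    for src in sorted(working.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(working)
        # Files under agents/ go to the agents directory; everything
        # else keeps its relative path under the skill directory.
        if rel.parts[0] == "agents":
            dest = agents_dir / src.name
        else:
            dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        deployed.append(dest)
    # Remove the working directory only after every copy succeeded.
    shutil.rmtree(working)
    return deployed
```

Copying first and deleting last means a failed copy raises before cleanup, leaving the working directory in place for inspection.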
316
+
317
+ ### Stage 6: Diagnostics (REQUIRED)
318
+
319
+ **MANDATORY**: Write diagnostic output after every invocation. This cannot be skipped.
320
+
321
+ ```
322
+ Stage 6: Diagnostics
323
+ ├── Write to: $PROJECT_DIR/logs/diagnostics/create-skill-{YYYYMMDD-HHMMSS}.yaml
324
+ │ └── Use templates/diagnostic-output.yaml schema
325
+ └── Include:
326
+ ├── Input: description/name/doc path
327
+ ├── Interview: questions asked, rounds completed
328
+ ├── Classification: all three decisions + template selected
329
+ ├── Generation: files created, line counts, model used
330
+ ├── Validation: pass/fail, findings count, retry count
331
+ └── Outcome: success/partial/failure
332
+ ```
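The diagnostic file name follows the `create-skill-{YYYYMMDD-HHMMSS}.yaml` pattern. A small sketch of building that path, assuming the project directory is passed in as a string (the helper name `diagnostics_path` is hypothetical):

```python
from datetime import datetime
from pathlib import Path


def diagnostics_path(project_dir: str, now: datetime) -> Path:
    """Build the diagnostics file path for one invocation."""
    stamp = now.strftime("%Y%m%d-%H%M%S")  # YYYYMMDD-HHMMSS
    return Path(project_dir) / "logs" / "diagnostics" / f"create-skill-{stamp}.yaml"
```

Passing the timestamp in, instead of calling `datetime.now()` inside the function, keeps the helper deterministic and easy to test.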
333
+
334
+ ---
335
+
336
+ ## Error Handling
337
+
338
+ | Scenario | Action |
339
+ |----------|--------|
340
+ | Generator sub-agent returns empty output | Re-spawn once with reinforced instructions. If still empty, STOP: "Generation failed. Please try with a more detailed description." |
341
+ | anthropic-validator finds critical or high issues | Stage 4 retry (max 2). After 2 retries, present with caveats. |
342
+ | anthropic-validator unavailable | Skip validation, note in diagnostics, warn user: "Validation skipped — run /anthropic-validator manually." |
343
+ | Interview answers are ambiguous | Ask 1-2 follow-up questions (max 2 AskUserQuestion rounds total). |
344
+ | User requests Agent Teams | Include experimental warning: "Agent Teams requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. This is an experimental feature." |
345
+ | Token budget exceeded | Stop at current stage, present partial output with explanation. |
346
+ | Target directory already exists | AskUserQuestion: "Directory {path} already exists. Overwrite / Choose different name / Cancel?" |
347
+ | Working directory already exists | Silently remove and recreate tmp/create-skill/{skill-name}/ (working dirs are ephemeral) |
348
+ | User rejects classification | Re-classify with user's feedback. Max 2 classification rounds. |
349
+
350
+ ---
351
+
352
+ ## Token Budget Management
353
+
354
+ | Checkpoint | Threshold | Action |
355
+ |------------|-----------|--------|
356
+ | After Pre-Flight | >30% consumed | Warn: "Pipeline agents will consume significant context." |
357
+ | After Generate | >55% consumed | Warn: "Approaching budget. Validation + refinement may be limited." |
358
+ | After Validate | >65% consumed | Skip refinement if needed, present as-is with caveats. |
359
+
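The checkpoint table can be read as a simple lookup: at each checkpoint, compare the consumed fraction against that checkpoint's threshold. The sketch below assumes consumption is available as a fraction between 0 and 1; how the orchestrator actually measures it is not specified here, and the names are hypothetical.

```python
from typing import Optional

# Checkpoint -> (threshold as fraction consumed, action), from the table.
CHECKPOINTS = {
    "pre-flight": (0.30, "warn: pipeline agents will consume significant context"),
    "generate": (0.55, "warn: validation + refinement may be limited"),
    "validate": (0.65, "skip refinement if needed, present as-is with caveats"),
}


def budget_action(checkpoint: str, consumed: float) -> Optional[str]:
    """Return the action for a checkpoint, or None if under the threshold."""
    threshold, action = CHECKPOINTS[checkpoint]
    # The table uses strict ">" comparisons, so equality does not trigger.
    return action if consumed > threshold else None
```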