@qball-inc/the-bulwark 1.0.1 → 1.2.0

Files changed (232)
  1. package/.claude-plugin/plugin.json +2 -3
  2. package/.gitattributes +48 -0
  3. package/CHANGELOG.md +121 -0
  4. package/LICENSE +21 -0
  5. package/README.md +426 -368
  6. package/agents/bulwark-fix-validator.md +643 -633
  7. package/agents/bulwark-implementer.md +407 -391
  8. package/agents/bulwark-issue-analyzer.md +310 -308
  9. package/agents/bulwark-standards-reviewer.md +305 -221
  10. package/agents/plan-creation-architect.md +325 -323
  11. package/agents/plan-creation-eng-lead.md +354 -352
  12. package/agents/plan-creation-po.md +302 -300
  13. package/agents/plan-creation-qa-critic.md +336 -334
  14. package/agents/product-ideation-competitive-analyzer.md +2 -0
  15. package/agents/product-ideation-idea-validator.md +2 -0
  16. package/agents/product-ideation-market-researcher.md +2 -0
  17. package/agents/product-ideation-pattern-documenter.md +2 -0
  18. package/agents/product-ideation-segment-analyzer.md +2 -0
  19. package/agents/product-ideation-strategist.md +2 -0
  20. package/agents/statusline-setup.md +99 -97
  21. package/hooks/hooks.json +30 -1
  22. package/package.json +6 -5
  23. package/scripts/apply-section.sh +243 -0
  24. package/scripts/hooks/check-template-drift.sh +191 -0
  25. package/scripts/hooks/cleanup-review-registry.sh +106 -0
  26. package/scripts/hooks/cleanup-stale.sh +19 -2
  27. package/scripts/hooks/enforce-quality.sh +72 -23
  28. package/scripts/hooks/lib/coverage_check.py +513 -0
  29. package/scripts/hooks/suggest-pipeline-stop.sh +234 -0
  30. package/scripts/hooks/suggest-pipeline.sh +12 -0
  31. package/scripts/init.sh +64 -0
  32. package/scripts/install-bun.sh +327 -0
  33. package/scripts/install-just.sh +404 -0
  34. package/scripts/toolchain-smoke-run.sh +219 -0
  35. package/scripts/update.sh +342 -0
  36. package/skills/anthropic-validator/SKILL.md +497 -607
  37. package/skills/anthropic-validator/references/agents-checklist.md +144 -131
  38. package/skills/anthropic-validator/references/agents-validation.md +90 -0
  39. package/skills/anthropic-validator/references/commands-checklist.md +102 -102
  40. package/skills/anthropic-validator/references/commands-validation.md +42 -0
  41. package/skills/anthropic-validator/references/hooks-checklist.md +160 -151
  42. package/skills/anthropic-validator/references/hooks-validation.md +82 -0
  43. package/skills/anthropic-validator/references/mcp-checklist.md +136 -136
  44. package/skills/anthropic-validator/references/mcp-validation.md +39 -0
  45. package/skills/anthropic-validator/references/plugins-checklist.md +154 -148
  46. package/skills/anthropic-validator/references/plugins-validation.md +68 -0
  47. package/skills/anthropic-validator/references/skills-checklist.md +105 -85
  48. package/skills/anthropic-validator/references/skills-validation.md +79 -0
  49. package/skills/assertion-patterns/SKILL.md +298 -296
  50. package/skills/bug-magnet-data/SKILL.md +286 -284
  51. package/skills/bug-magnet-data/context/cli-args.md +91 -91
  52. package/skills/bug-magnet-data/context/db-query.md +104 -104
  53. package/skills/bug-magnet-data/context/file-contents.md +103 -103
  54. package/skills/bug-magnet-data/context/http-body.md +91 -91
  55. package/skills/bug-magnet-data/context/process-spawn.md +123 -123
  56. package/skills/bug-magnet-data/data/booleans/boundaries.yaml +143 -143
  57. package/skills/bug-magnet-data/data/collections/arrays.yaml +114 -114
  58. package/skills/bug-magnet-data/data/collections/objects.yaml +123 -123
  59. package/skills/bug-magnet-data/data/concurrency/race-conditions.yaml +118 -118
  60. package/skills/bug-magnet-data/data/concurrency/state-machines.yaml +115 -115
  61. package/skills/bug-magnet-data/data/dates/boundaries.yaml +137 -137
  62. package/skills/bug-magnet-data/data/dates/invalid.yaml +132 -132
  63. package/skills/bug-magnet-data/data/dates/timezone.yaml +118 -118
  64. package/skills/bug-magnet-data/data/encoding/charset.yaml +79 -79
  65. package/skills/bug-magnet-data/data/encoding/normalization.yaml +105 -105
  66. package/skills/bug-magnet-data/data/formats/email.yaml +154 -154
  67. package/skills/bug-magnet-data/data/formats/json.yaml +187 -187
  68. package/skills/bug-magnet-data/data/formats/url.yaml +165 -165
  69. package/skills/bug-magnet-data/data/language-specific/javascript.yaml +182 -182
  70. package/skills/bug-magnet-data/data/language-specific/python.yaml +174 -174
  71. package/skills/bug-magnet-data/data/language-specific/rust.yaml +148 -148
  72. package/skills/bug-magnet-data/data/numbers/boundaries.yaml +161 -161
  73. package/skills/bug-magnet-data/data/numbers/precision.yaml +89 -89
  74. package/skills/bug-magnet-data/data/numbers/special.yaml +69 -69
  75. package/skills/bug-magnet-data/data/strings/boundaries.yaml +109 -109
  76. package/skills/bug-magnet-data/data/strings/injection.yaml +208 -208
  77. package/skills/bug-magnet-data/data/strings/special-chars.yaml +190 -190
  78. package/skills/bug-magnet-data/data/strings/unicode.yaml +139 -139
  79. package/skills/bug-magnet-data/references/external-lists.md +115 -115
  80. package/skills/bulwark-brainstorm/SKILL.md +566 -563
  81. package/skills/bulwark-brainstorm/references/at-teammate-prompts.md +95 -60
  82. package/skills/bulwark-brainstorm/references/role-critical-analyst.md +78 -78
  83. package/skills/bulwark-brainstorm/references/role-development-lead.md +66 -66
  84. package/skills/bulwark-brainstorm/references/role-product-delivery-lead.md +79 -79
  85. package/skills/bulwark-brainstorm/references/role-product-manager.md +62 -62
  86. package/skills/bulwark-brainstorm/references/role-project-sme.md +59 -59
  87. package/skills/bulwark-brainstorm/references/role-technical-architect.md +66 -66
  88. package/skills/bulwark-research/SKILL.md +300 -298
  89. package/skills/bulwark-research/references/viewpoint-contrarian.md +63 -63
  90. package/skills/bulwark-research/references/viewpoint-direct-investigation.md +62 -62
  91. package/skills/bulwark-research/references/viewpoint-first-principles.md +65 -65
  92. package/skills/bulwark-research/references/viewpoint-practitioner.md +62 -62
  93. package/skills/bulwark-research/references/viewpoint-prior-art.md +66 -66
  94. package/skills/bulwark-scaffold/SKILL.md +483 -330
  95. package/skills/bulwark-statusline/SKILL.md +166 -161
  96. package/skills/bulwark-statusline/scripts/statusline.sh +1 -1
  97. package/skills/bulwark-verify/SKILL.md +532 -519
  98. package/skills/code-review/SKILL.md +488 -428
  99. package/skills/code-review/examples/anti-patterns/linting.ts +181 -181
  100. package/skills/code-review/examples/anti-patterns/security.ts +91 -91
  101. package/skills/code-review/examples/anti-patterns/standards.ts +195 -195
  102. package/skills/code-review/examples/anti-patterns/type-safety.ts +108 -108
  103. package/skills/code-review/examples/recommended/linting.ts +195 -195
  104. package/skills/code-review/examples/recommended/security.ts +154 -154
  105. package/skills/code-review/examples/recommended/standards.ts +231 -231
  106. package/skills/code-review/examples/recommended/type-safety.ts +181 -181
  107. package/skills/code-review/frameworks/angular.md +218 -218
  108. package/skills/code-review/frameworks/django.md +235 -235
  109. package/skills/code-review/frameworks/express.md +207 -207
  110. package/skills/code-review/frameworks/fastapi.md +326 -0
  111. package/skills/code-review/frameworks/flask.md +298 -298
  112. package/skills/code-review/frameworks/generic.md +146 -146
  113. package/skills/code-review/frameworks/react.md +152 -152
  114. package/skills/code-review/frameworks/vue.md +244 -244
  115. package/skills/code-review/references/linting-patterns.md +221 -221
  116. package/skills/code-review/references/security-patterns.md +125 -125
  117. package/skills/code-review/references/standards-patterns.md +246 -246
  118. package/skills/code-review/references/type-safety-patterns.md +130 -130
  119. package/skills/component-patterns/SKILL.md +133 -131
  120. package/skills/component-patterns/references/pattern-cli-command.md +118 -118
  121. package/skills/component-patterns/references/pattern-database.md +166 -166
  122. package/skills/component-patterns/references/pattern-external-api.md +139 -139
  123. package/skills/component-patterns/references/pattern-file-parser.md +168 -168
  124. package/skills/component-patterns/references/pattern-http-server.md +162 -162
  125. package/skills/component-patterns/references/pattern-process-spawner.md +133 -133
  126. package/skills/continuous-feedback/SKILL.md +329 -327
  127. package/skills/continuous-feedback/references/collect-instructions.md +81 -81
  128. package/skills/continuous-feedback/references/specialize-code-review.md +82 -82
  129. package/skills/continuous-feedback/references/specialize-general.md +98 -98
  130. package/skills/continuous-feedback/references/specialize-test-audit.md +81 -81
  131. package/skills/create-skill/SKILL.md +550 -359
  132. package/skills/create-skill/agents/skill-eval-comparator.md +158 -0
  133. package/skills/create-skill/agents/skill-eval-grader.md +168 -0
  134. package/skills/create-skill/references/agent-conventions.md +194 -194
  135. package/skills/create-skill/references/agent-template.md +195 -195
  136. package/skills/create-skill/references/content-guidance.md +541 -291
  137. package/skills/create-skill/references/decision-framework.md +232 -124
  138. package/skills/create-skill/references/eval-scaffolding.md +468 -0
  139. package/skills/create-skill/references/eval-shape.md +383 -0
  140. package/skills/create-skill/references/scripts-conventions.md +142 -0
  141. package/skills/create-skill/references/template-generator.md +183 -0
  142. package/skills/create-skill/references/template-inversion.md +269 -0
  143. package/skills/create-skill/references/template-pipeline.md +248 -217
  144. package/skills/create-skill/references/template-research.md +234 -210
  145. package/skills/create-skill/references/template-reviewer.md +231 -0
  146. package/skills/create-skill/references/template-script-driven.md +185 -172
  147. package/skills/create-skill/references/template-tool-wrapper.md +199 -0
  148. package/skills/create-skill/scripts/check-description.ts +238 -0
  149. package/skills/create-skill/scripts/check-skill-size.ts +201 -0
  150. package/skills/create-skill/scripts/grade.ts +855 -0
  151. package/skills/create-skill/scripts/run-loop.ts +297 -0
  152. package/skills/create-subagent/SKILL.md +355 -353
  153. package/skills/create-subagent/references/agent-conventions.md +268 -268
  154. package/skills/create-subagent/references/content-guidance.md +232 -232
  155. package/skills/create-subagent/references/decision-framework.md +134 -134
  156. package/skills/create-subagent/references/template-single-agent.md +194 -192
  157. package/skills/fix-bug/SKILL.md +243 -241
  158. package/skills/governance-protocol/SKILL.md +118 -116
  159. package/skills/init/SKILL.md +519 -341
  160. package/skills/init/references/update-askuser-prompts.md +198 -0
  161. package/skills/init/references/update-mode.md +305 -0
  162. package/skills/init/references/update-section-anchor-diff.md +163 -0
  163. package/skills/issue-debugging/SKILL.md +387 -385
  164. package/skills/issue-debugging/references/anti-patterns.md +245 -245
  165. package/skills/issue-debugging/references/debug-report-schema.md +227 -227
  166. package/skills/mock-detection/SKILL.md +528 -511
  167. package/skills/mock-detection/references/false-positive-prevention.md +402 -402
  168. package/skills/mock-detection/references/stub-patterns.md +236 -236
  169. package/skills/pipeline-templates/SKILL.md +262 -215
  170. package/skills/pipeline-templates/references/code-change-workflow.md +277 -277
  171. package/skills/pipeline-templates/references/code-review.md +348 -336
  172. package/skills/pipeline-templates/references/fix-validation.md +421 -421
  173. package/skills/pipeline-templates/references/new-feature.md +335 -335
  174. package/skills/pipeline-templates/references/research-brainstorm.md +161 -161
  175. package/skills/pipeline-templates/references/research-planning.md +257 -257
  176. package/skills/pipeline-templates/references/test-audit.md +389 -389
  177. package/skills/pipeline-templates/references/test-execution-fix.md +238 -238
  178. package/skills/plan-creation/SKILL.md +531 -497
  179. package/skills/plan-to-tasks/SKILL.md +151 -0
  180. package/skills/plan-to-tasks/references/askuserquestion-prompts.md +75 -0
  181. package/skills/plan-to-tasks/references/transform.md +253 -0
  182. package/skills/product-ideation/SKILL.md +2 -0
  183. package/skills/session-handoff/SKILL.md +167 -139
  184. package/skills/session-handoff/references/examples.md +223 -223
  185. package/skills/setup-lsp/SKILL.md +314 -312
  186. package/skills/setup-lsp/references/server-registry.md +85 -85
  187. package/skills/setup-lsp/references/troubleshooting.md +135 -135
  188. package/skills/spec-drift-check/SKILL.md +287 -0
  189. package/skills/spec-drift-check/evals/evals.json +33 -0
  190. package/skills/spec-drift-check/evals/triggers.json +19 -0
  191. package/skills/spec-drift-check/examples/clean-spec.md +52 -0
  192. package/skills/spec-drift-check/examples/expected-output-clean.yaml +96 -0
  193. package/skills/spec-drift-check/examples/expected-output-high-drift.yaml +78 -0
  194. package/skills/spec-drift-check/examples/expected-output-low-drift.yaml +67 -0
  195. package/skills/spec-drift-check/examples/high-drift-spec.md +49 -0
  196. package/skills/spec-drift-check/examples/low-drift-spec.md +39 -0
  197. package/skills/spec-drift-check/references/anti-patterns.md +65 -0
  198. package/skills/spec-drift-check/references/output-template.md +142 -0
  199. package/skills/spec-drift-check/references/step-1-claim-extraction.md +147 -0
  200. package/skills/spec-drift-check/references/step-2-verification-methods.md +203 -0
  201. package/skills/spec-drift-check/references/step-3-categorization.md +105 -0
  202. package/skills/spec-drift-check/references/step-4-plan-adjustment.md +122 -0
  203. package/skills/spec-drift-check/references/step-5-log-template.md +220 -0
  204. package/skills/spec-drift-check/references/step-6-decision-matrix.md +136 -0
  205. package/skills/subagent-output-templating/SKILL.md +417 -415
  206. package/skills/subagent-output-templating/references/examples.md +440 -440
  207. package/skills/subagent-prompting/SKILL.md +366 -364
  208. package/skills/subagent-prompting/references/examples.md +342 -342
  209. package/skills/test-audit/SKILL.md +545 -531
  210. package/skills/test-audit/references/known-limitations.md +41 -41
  211. package/skills/test-audit/references/priority-classification.md +30 -30
  212. package/skills/test-audit/references/prompts/deep-mode-detection.md +83 -83
  213. package/skills/test-audit/references/prompts/synthesis.md +58 -57
  214. package/skills/test-audit/references/rewrite-instructions.md +46 -46
  215. package/skills/test-audit/references/schemas/audit-output.yaml +131 -100
  216. package/skills/test-audit/references/schemas/diagnostic-output.yaml +56 -49
  217. package/skills/test-audit/references/two-gate-logic.md +43 -0
  218. package/skills/test-audit/scripts/data-flow-analyzer.ts +508 -509
  219. package/skills/test-audit/scripts/integration-mock-detector.ts +462 -462
  220. package/skills/test-audit/scripts/skip-detector.ts +211 -211
  221. package/skills/test-audit/scripts/verification-counter.ts +295 -295
  222. package/skills/test-classification/SKILL.md +326 -310
  223. package/skills/test-fixture-creation/SKILL.md +297 -295
  224. package/Infographics/01_product-ideation.png +0 -0
  225. package/Infographics/02_feature-research.png +0 -0
  226. package/Infographics/03_brainstorm.png +0 -0
  227. package/Infographics/04_plan-creation.png +0 -0
  228. package/Infographics/05_code-review.png +0 -0
  229. package/Infographics/06_test-audit.png +0 -0
  230. package/Infographics/07_fix-bug.png +0 -0
  231. package/skills/create-skill/references/template-reference-heavy.md +0 -111
  232. package/skills/create-skill/references/template-simple.md +0 -80
@@ -1,60 +1,95 @@
1
- # AT Teammate Prompt Structure (--exploratory mode)
2
-
3
- This reference defines the mandatory prompt sections for Agent Teams teammates in `--exploratory` mode. Load this file at Stage 3B only.
4
-
5
- ---
6
-
7
- ## Prompt Sections
8
-
9
- Each teammate prompt MUST include these sections:
10
-
11
- **1. Role instructions** — from the corresponding `references/role-*.md` file
12
-
13
- **2. Input context** — problem statement, research synthesis (if available), SME output
14
-
15
- **3. Dual-Output Contract (SA2 — MANDATORY in every teammate prompt):**
16
-
17
- > You MUST produce TWO outputs:
18
- >
19
- > **Output 1 — Full analysis (SA2 artifact):** Write your complete analysis to `$PROJECT_DIR/logs/brainstorm/{topic-slug}/{NN}-{role-slug}.md` using the output template provided. This is the permanent record.
20
- >
21
- > **Output 2 — Coordination summary (mailbox):** After writing your full analysis, send a 3-5 sentence summary to other teammates via sendMessage. Include: your recommendation (proceed/modify/defer/kill), your top finding, and your strongest concern.
22
-
23
- **4. Peer Debate Directives:**
24
-
25
- > **Selective challenge protocol:** After receiving summaries from other teammates:
26
- > - Read each teammate's summary
27
- > - If you DISAGREE with a position, send a targeted challenge via sendMessage explaining WHY you disagree with evidence
28
- > - If you AGREE, do NOT send a message (avoid noise)
29
- > - You may update your log file after the debate if your position changed — append a "## Post-Debate Update" section
30
-
31
- **5. AT Mitigation Patterns (ALL 3 MANDATORY in every teammate prompt):**
32
-
33
- > **CC-to-lead:** After any peer message exchange, also send a 1-line summary to the lead so the lead can track debate progress.
34
- >
35
- > **Task list coordination:** Update your task status to mark progress. Set to completed when your full analysis is written AND you have reviewed all peer summaries.
36
- >
37
- > **Completion signal:** When you have finished all work (analysis written, peer summaries reviewed, challenges sent if any), send a final message to the lead: "WORK COMPLETE — [role name]"
38
-
39
- **6. Critical Analyst special AT directive (in addition to standard Critic prompt):**
40
-
41
- > **Deferred verdict:** You are active from the start of the debate, not a sequential gatekeeper. Challenge early findings from other teammates as they arrive. However, do NOT form your final verdict until all teammates have shared their summaries. Your formal verdict belongs in your log artifact, not in peer messages. In your log file, include a "## Debate Influence" section documenting which peer positions you challenged and how the debate shaped your final verdict.
42
-
43
- ---
44
-
45
- ## AT Configuration (Hardcoded)
46
-
47
- | Setting | Value | Rationale |
48
- |---------|-------|-----------|
49
- | Display mode | In-process | WSL2 safe default |
50
- | Lead mode | Delegate | Coordination only — lead does not do analysis |
51
- | Communication | Selective challenge | Broadcast summary once, respond only to disagreements |
52
- | Teammate count | 3 | Fixed for v1 |
53
-
54
- ---
55
-
56
- ## AT Failure Recovery
57
-
58
- - **Teammate fails mid-debate**: Fall back to Stage 3A for the failed role only. Partial AT output from successful teammates feeds into fallback as additional context.
59
- - **All teammates fail**: Fall back to full Stage 3A (--scoped pipeline).
60
- - **Lead context compaction**: Known platform limitation. Structural mitigation: SME runs before AT (reduces lead context pressure). Document in diagnostics if observed.
1
+ # AT Teammate Prompt Structure (--exploratory mode)
2
+
3
+ This reference defines the mandatory prompt sections for Agent Teams teammates in `--exploratory` mode. Load this file at Stage 3B only.
4
+
5
+ ---
6
+
7
+ ## Prompt Sections
8
+
9
+ Each teammate prompt MUST include these sections:
10
+
11
+ **1. Role instructions** — from the corresponding `references/role-*.md` file
12
+
13
+ **2. Input context** — problem statement, research synthesis (if available), SME output
14
+
15
+ **3. Dual-Output Contract (SA2 — MANDATORY in every teammate prompt):**
16
+
17
+ > You MUST produce TWO outputs:
18
+ >
19
+ > **Output 1 — Full analysis (SA2 artifact):** Write your complete analysis to `$PROJECT_DIR/logs/brainstorm/{topic-slug}/{NN}-{role-slug}.md` using the output template provided. This is the permanent record.
20
+ >
21
+ > **Output 2 — Coordination summary (mailbox):** After writing your full analysis, send a 3-5 sentence summary to other teammates via sendMessage. Include: your recommendation (proceed/modify/defer/kill), your top finding, and your strongest concern.
22
+
23
+ **4. Peer Debate Directives:**
24
+
25
+ > **Selective challenge protocol:** After receiving summaries from other teammates:
26
+ > - Read each teammate's summary
27
+ > - If you DISAGREE with a position, send a targeted challenge via sendMessage explaining WHY you disagree with evidence
28
+ > - If you AGREE, do NOT send a message (avoid noise)
29
+ > - You may update your log file after the debate if your position changed — append a "## Post-Debate Update" section
30
+
31
+ **5. AT Mitigation Patterns (ALL 4 MANDATORY in every teammate prompt):**
32
+
33
+ > **CC-ALL:** When sending peer DMs with challenges, findings, or coordination signals, you MUST CC every other teammate (including the lead). Peer DMs without full-team CC are invisible to non-recipients and will be treated as stalled work. Format: include `CC: <Teammate-A>, <Teammate-B>, Lead` at the top of the message. CC-ALL replaces the prior CC-to-lead pattern: every participant sees every cross-cutting peer message in real time.
34
+ >
35
+ > **Task list coordination:** Update your task status to mark progress. Set to completed when your full analysis is written AND you have reviewed all peer summaries.
36
+ >
37
+ > **Completion signal:** When you have finished all work (analysis written, peer summaries reviewed, challenges sent if any), send a final message to the lead: "WORK COMPLETE — [role name]"
38
+ >
39
+ > **Confirmation handshake:** After you send WORK COMPLETE, the lead will reply with a confirmation request asking whether you have incorporated ALL inbound debate feedback. Reply `YES` only if you are fully complete; reply `NO` if still iterating. Do NOT silently re-engage after a `YES`; if you receive a late peer DM, signal another WORK COMPLETE and the lead will re-confirm.
40
+
41
+ **6. Critical Analyst special AT directive (in addition to standard Critic prompt):**
42
+
43
+ > **Deferred verdict:** You are active from the start of the debate, not a sequential gatekeeper. Challenge early findings from other teammates as they arrive. However, do NOT form your final verdict until all teammates have shared their summaries. Your formal verdict belongs in your log artifact, not in peer messages. In your log file, include a "## Debate Influence" section documenting which peer positions you challenged and how the debate shaped your final verdict.
44
+
45
+ ---
46
+
47
+ ## AT Configuration (Hardcoded)
48
+
49
+ | Setting | Value | Rationale |
50
+ |---------|-------|-----------|
51
+ | Display mode | In-process | WSL2 safe default |
52
+ | Lead mode | Delegate | Coordination only; lead does not do analysis |
53
+ | Communication | Selective challenge | Broadcast summary once, respond only to disagreements |
54
+ | Teammate count | 3 | Fixed for v1 |
55
+
56
+ ---
57
+
58
+ ## Lead Coordination Gates (Lead MUST enforce)
59
+
60
+ The lead enforces these gates BEFORE beginning synthesis. Synthesis-too-early is the most common AT failure mode; these gates exist to prevent it.
61
+
62
+ ### Work-Complete Confirmation Gate (MANDATORY)
63
+
64
+ When a teammate sends `WORK COMPLETE`, do NOT mark them terminal. Instead:
65
+
66
+ 1. Send the following DM to that teammate, CC all other teammates:
67
+ > "Confirm: have you incorporated ALL inbound debate feedback from this round? Reply YES when fully complete, NO if still iterating."
68
+ 2. Await explicit `YES` response.
69
+ 3. Only after receiving `YES` mark the teammate as terminal.
70
+ 4. If the teammate replies `NO` or does not respond within the timeout, treat them as active — do NOT begin synthesis.
71
+
72
+ **Reason**: teammates often send `WORK COMPLETE` at initial draft, then iterate on peer feedback. Without this gate, synthesis excludes post-debate outcomes.
73
+
74
+ ### Re-Entry Gate (MANDATORY)
75
+
76
+ If a teammate who previously confirmed `YES` sends any new peer DM, new WORK COMPLETE signal, or new content, mark them as **re-active** and require a fresh confirmation handshake (repeat the Confirmation Gate). The previous confirmation is **invalidated**.
77
+
78
+ **Reason**: a teammate may confirm done, then re-engage after receiving a late peer DM. Without re-entry handling, the late iteration is missed.
79
+
80
+ ### Rendezvous Gate (synthesis precondition)
81
+
82
+ The lead MUST NOT begin synthesis until ALL of the following are true:
83
+
84
+ 1. WORK COMPLETE + explicit `YES` confirmation received from ALL teammates (per Confirmation Gate)
85
+ 2. All shared task list tasks in terminal state
86
+ 3. All teammate log files exist and are non-empty
87
+ 4. **Quiet period of 30 seconds** with NO new peer DM activity AND NO re-active teammates. If any new activity lands during the quiet period, reset the 30s timer and re-evaluate the Confirmation Gate for any re-active teammate.
88
+
89
+ ---
90
+
91
+ ## AT Failure Recovery
92
+
93
+ - **Teammate fails mid-debate**: Fall back to Stage 3A for the failed role only. Partial AT output from successful teammates feeds into fallback as additional context.
94
+ - **All teammates fail**: Fall back to full Stage 3A (--scoped pipeline).
95
+ - **Lead context compaction**: Known platform limitation. Structural mitigation: SME runs before AT (reduces lead context pressure). Document in diagnostics if observed.
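Reviewer note: the three Lead Coordination Gates added in this hunk (Confirmation, Re-Entry, Rendezvous) amount to a small state machine on the lead side. The following is a minimal Python sketch of that reading, not code from the package. The names `LeadGates`, `State`, `on_message`, and `may_synthesize` are hypothetical, timestamps are passed in explicitly, and only teammate messages are assumed to count as activity for the 30-second quiet period.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ACTIVE = "active"              # still working, or re-active after new content
    AWAITING_CONFIRM = "awaiting"  # sent WORK COMPLETE; lead has asked for YES/NO
    TERMINAL = "terminal"          # replied YES and has sent nothing since


QUIET_PERIOD_S = 30.0  # quiet period named in the Rendezvous Gate


@dataclass
class LeadGates:
    """Hypothetical lead-side tracker for the three gates (illustrative only)."""

    teammates: list[str]
    states: dict[str, State] = field(default_factory=dict)
    last_activity: float = 0.0

    def __post_init__(self) -> None:
        self.states = {name: State.ACTIVE for name in self.teammates}

    def on_message(self, sender: str, text: str, now: float) -> None:
        """Apply the Confirmation and Re-Entry gates to one teammate message."""
        self.last_activity = now  # any activity restarts the quiet-period timer
        body = text.strip()
        if body.startswith("WORK COMPLETE"):
            # WORK COMPLETE alone never makes a teammate terminal.
            self.states[sender] = State.AWAITING_CONFIRM
        elif self.states[sender] is State.AWAITING_CONFIRM and body == "YES":
            self.states[sender] = State.TERMINAL
        else:
            # NO, a late peer DM, or new content invalidates any earlier YES.
            self.states[sender] = State.ACTIVE

    def may_synthesize(self, tasks_terminal: bool, logs_ok: bool, now: float) -> bool:
        """Rendezvous Gate: every precondition must hold at the same time."""
        all_confirmed = all(s is State.TERMINAL for s in self.states.values())
        quiet = (now - self.last_activity) >= QUIET_PERIOD_S
        return all_confirmed and tasks_terminal and logs_ok and quiet
```

In this sketch a teammate who never answers the confirmation request simply stays in `AWAITING_CONFIRM`, which blocks synthesis in the same way as an explicit `NO`. The hunk does not specify the timeout value, so none is modeled here.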
@@ -1,78 +1,78 @@
1
- # Role: Critical Analyst
2
-
3
- **Execution**:
4
- - `--scoped`: Sequential — LAST (solo, after all other roles complete). Receives ALL prior outputs.
5
- - `--exploratory`: AT teammate — active from start. Challenges in real time via peer debate. Deferred verdict.
6
-
7
- ## Purpose
8
-
9
- Perform cost-benefit analysis, challenge assumptions, validate the problem itself, and poke holes. Provides the final verdict.
10
-
11
- ## Focus Areas
12
-
13
- - Problem validation — should this problem be solved at all? Is the premise valid? What evidence suggests this is worth investing in?
14
- - Cost-benefit analysis — is the investment justified?
15
- - Assumption challenges — what are we assuming that might be wrong?
16
- - Gaps in the proposals — what has been overlooked?
17
- - Simpler alternatives — could a less ambitious approach work?
18
- - Kill criteria — under what conditions should this be abandoned?
19
- - Final verdict: proceed / modify / defer / kill (with conditions)
20
-
21
- ## Prompt Template
22
-
23
- ```
24
- GOAL: You are a critical analyst reviewing proposals for adopting [{topic}]. You
25
- have the original research, the SME analysis, and three role-based evaluations
26
- (PM, Architect, Dev Lead). Challenge everything: Is the investment justified?
27
- What assumptions might be wrong? What has been overlooked? Is there a simpler
28
- alternative? Provide a clear verdict.
29
-
30
- CONSTRAINTS:
31
- - You MUST read and reference ALL prior outputs (SME + role agents)
32
- - Start with Problem Validation: "Should this problem be solved at all? Is the
33
- premise valid? What evidence suggests this is worth investing in?" This is
34
- distinct from assumption challenges — it challenges the TOPIC itself.
35
- - Be genuinely critical, not performatively contrarian — ground challenges in evidence
36
- - Propose specific conditions under which your verdict would change
37
- - Be prescriptive: "Do X" not "Consider X or Y"
38
- - Target 1200-1800 words
39
-
40
- REASONING DEPTH — Highest-Risk Assumption Focus:
41
- You MUST follow this reasoning process (do not skip to writing the final output):
42
-
43
- 1. CATALOG: List every assumption made across ALL 4 prior outputs (SME, PM,
44
- Architect, Dev Lead). Be exhaustive — assumptions hide in scope boundaries,
45
- effort estimates, integration points, and "obvious" claims.
46
- 2. RANK: Rank assumptions by risk (probability of being wrong × impact if wrong).
47
- Identify the SINGLE highest-risk assumption across all proposals.
48
- 3. STRESS-TEST: For the top 3 highest-risk assumptions, reason through:
49
- - What evidence supports this assumption?
50
- - What evidence contradicts it?
51
- - What would happen to the entire proposal if this assumption is wrong?
52
- - What would it cost to validate this assumption before proceeding?
53
- 4. FOCAL POINT: In your output, explicitly call out:
54
- > **Highest-Risk Assumption**: {assumption}
55
- > **If wrong**: {consequence}
56
- > **To validate**: {what would need to be checked}
57
-
58
- This gives the synthesis a clear focal point for the post-synthesis evaluation gate.
59
-
60
- Only after completing all 4 steps, write your final output using the template below.
61
-
62
- CONTEXT:
63
- {topic_description}
64
- {research_synthesis_if_available}
65
- {sme_output}
66
- {product_manager_output}
67
- {technical_architect_output}
68
- {development_lead_output}
69
-
70
- OUTPUT:
71
- Write findings to: {output_path}
72
- Use the critic output template provided below for document structure.
73
- Use YAML header with: role, topic, verdict (proceed/modify/defer/kill),
74
- verdict_confidence (high/medium/low), conditions, key_challenges (3-5 bullets)
75
- Follow with detailed analysis organized by the focus areas above.
76
-
77
- {critic_output_template}
78
- ```
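Reviewer note: the RANK step in the prompt template scores each assumption as probability of being wrong × impact if wrong. A small worked sketch of that arithmetic follows. The `Assumption` type, the 1 to 10 impact scale, and the sample claims are all hypothetical and exist only to illustrate the ranking; the template itself does not prescribe a scale.

```python
from typing import NamedTuple


class Assumption(NamedTuple):
    claim: str
    p_wrong: float  # estimated probability the assumption is wrong, 0.0 to 1.0
    impact: float   # estimated impact if wrong (assumed scale: 1 minor, 10 fatal)

    @property
    def risk(self) -> float:
        # Risk as defined in the RANK step: probability of being wrong x impact.
        return self.p_wrong * self.impact


def rank(assumptions: list[Assumption]) -> list[Assumption]:
    """Return assumptions ordered from highest to lowest risk."""
    return sorted(assumptions, key=lambda a: a.risk, reverse=True)


# Illustrative catalog; these claims are made up for the example.
catalog = [
    Assumption("Existing API stays stable", 0.1, 4),
    Assumption("Integration effort fits one sprint", 0.5, 6),
    Assumption("Users want this feature", 0.2, 10),
]

ranked = rank(catalog)
highest = ranked[0]      # the SINGLE highest-risk assumption (FOCAL POINT)
stress_test = ranked[:3]  # the top 3 that go through STRESS-TEST
```

With these sample numbers the scores are 0.5 × 6 = 3.0, 0.2 × 10 = 2.0, and 0.1 × 4 = 0.4, so the sprint-effort assumption becomes the focal point even though the user-demand assumption has the larger impact on its own.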
+ # Role: Critical Analyst
+
+ **Execution**:
+ - `--scoped`: Sequential — LAST (solo, after all other roles complete). Receives ALL prior outputs.
+ - `--exploratory`: AT teammate — active from start. Challenges in real time via peer debate. Deferred verdict.
+
+ ## Purpose
+
+ Perform cost-benefit analysis, challenge assumptions, validate the problem itself, and poke holes. Provides the final verdict.
+
+ ## Focus Areas
+
+ - Problem validation — should this problem be solved at all? Is the premise valid? What evidence suggests this is worth investing in?
+ - Cost-benefit analysis — is the investment justified?
+ - Assumption challenges — what are we assuming that might be wrong?
+ - Gaps in the proposals — what has been overlooked?
+ - Simpler alternatives — could a less ambitious approach work?
+ - Kill criteria — under what conditions should this be abandoned?
+ - Final verdict: proceed / modify / defer / kill (with conditions)
+
+ ## Prompt Template
+
+ ```
+ GOAL: You are a critical analyst reviewing proposals for adopting [{topic}]. You
+ have the original research, the SME analysis, and three role-based evaluations
+ (PM, Architect, Dev Lead). Challenge everything: Is the investment justified?
+ What assumptions might be wrong? What has been overlooked? Is there a simpler
+ alternative? Provide a clear verdict.
+
+ CONSTRAINTS:
+ - You MUST read and reference ALL prior outputs (SME + role agents)
+ - Start with Problem Validation: "Should this problem be solved at all? Is the
+ premise valid? What evidence suggests this is worth investing in?" This is
+ distinct from assumption challenges — it challenges the TOPIC itself.
+ - Be genuinely critical, not performatively contrarian — ground challenges in evidence
+ - Propose specific conditions under which your verdict would change
+ - Be prescriptive: "Do X" not "Consider X or Y"
+ - Target 1200-1800 words
+
+ REASONING DEPTH — Highest-Risk Assumption Focus:
+ You MUST follow this reasoning process (do not skip to writing the final output):
+
+ 1. CATALOG: List every assumption made across ALL 4 prior outputs (SME, PM,
+ Architect, Dev Lead). Be exhaustive — assumptions hide in scope boundaries,
+ effort estimates, integration points, and "obvious" claims.
+ 2. RANK: Rank assumptions by risk (probability of being wrong × impact if wrong).
+ Identify the SINGLE highest-risk assumption across all proposals.
+ 3. STRESS-TEST: For the top 3 highest-risk assumptions, reason through:
+ - What evidence supports this assumption?
+ - What evidence contradicts it?
+ - What would happen to the entire proposal if this assumption is wrong?
+ - What would it cost to validate this assumption before proceeding?
+ 4. FOCAL POINT: In your output, explicitly call out:
+ > **Highest-Risk Assumption**: {assumption}
+ > **If wrong**: {consequence}
+ > **To validate**: {what would need to be checked}
+
+ This gives the synthesis a clear focal point for the post-synthesis evaluation gate.
+
+ Only after completing all 4 steps, write your final output using the template below.
+
+ CONTEXT:
+ {topic_description}
+ {research_synthesis_if_available}
+ {sme_output}
+ {product_manager_output}
+ {technical_architect_output}
+ {development_lead_output}
+
+ OUTPUT:
+ Write findings to: {output_path}
+ Use the critic output template provided below for document structure.
+ Use YAML header with: role, topic, verdict (proceed/modify/defer/kill),
+ verdict_confidence (high/medium/low), conditions, key_challenges (3-5 bullets)
+ Follow with detailed analysis organized by the focus areas above.
+
+ {critic_output_template}
+ ```
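The RANK and FOCAL POINT steps of the critic template above amount to a small scoring exercise: risk is the probability of being wrong multiplied by the impact if wrong. The following is a minimal Python sketch of that arithmetic, not code from the package. The `Assumption` class, the 1-10 impact scale, and the sample catalog entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    source: str      # prior output it came from: SME, PM, Architect, Dev Lead
    statement: str   # the assumption itself
    p_wrong: float   # estimated probability of being wrong, 0.0-1.0
    impact: float    # estimated impact if wrong (hypothetical 1-10 scale)

    @property
    def risk(self) -> float:
        # RANK step: probability of being wrong x impact if wrong
        return self.p_wrong * self.impact


def rank_assumptions(assumptions):
    """Return assumptions ordered from highest to lowest risk."""
    return sorted(assumptions, key=lambda a: a.risk, reverse=True)


def focal_point(a: Assumption, consequence: str, validation: str) -> str:
    """Format the FOCAL POINT call-out in the blockquote shape the template asks for."""
    return "\n".join([
        f"> **Highest-Risk Assumption**: {a.statement}",
        f"> **If wrong**: {consequence}",
        f"> **To validate**: {validation}",
    ])


# Hypothetical catalog; a real one would come from the four prior outputs.
catalog = [
    Assumption("SME", "Existing hooks can be reused unchanged", 0.3, 8),
    Assumption("PM", "Users want this in v1", 0.5, 6),
    Assumption("Architect", "No schema migration is needed", 0.2, 9),
    Assumption("Dev Lead", "Effort fits in two sessions", 0.4, 5),
]

ranked = rank_assumptions(catalog)
highest = ranked[0]        # the single highest-risk assumption
stress_test = ranked[:3]   # the top 3 that go through STRESS-TEST

print(focal_point(highest, "v1 scope is built for the wrong audience",
                  "interview a handful of current users"))
```

The numeric scores only make the ordering explicit; the template itself asks the agent to reason about evidence for and against each of the top three, which no score replaces.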
@@ -1,66 +1,66 @@
- # Role: Senior Development Lead
-
- **Execution Order**: Parallel — SECOND (runs alongside Product Manager and Technical Architect)
-
- ## Purpose
-
- Assess implementation feasibility, effort, and practical risks. Receives the SME's project context analysis as input.
-
- ## Focus Areas
-
- - Implementation feasibility — can this be built with available tools?
- - Effort estimation — complexity and session count
- - Implementation risks — what could go wrong during building?
- - Testing strategy — how do we verify this works?
- - Dependencies and ordering — what must be built first?
-
- ## Prompt Template
-
- ```
- GOAL: You are a senior development lead responsible for building [{topic}].
- Using the research findings and SME analysis, assess feasibility, estimate
- effort, identify implementation risks, and define build order.
-
- CONSTRAINTS:
- - Focus on your role's perspective — other roles are handled by separate agents
- - Ground all recommendations in the research findings (do not re-research the topic),
- but DO explore the codebase using Glob, Grep, and Read to validate your
- implementation plan against actual project structure and tooling
- - Reference specific project assets by path when discussing integration points
- - Be prescriptive: "Do X" not "Consider X or Y"
- - Target 1200-1800 words
-
- REASONING DEPTH — Propose-Challenge-Refine:
- You MUST follow this reasoning process (do not skip to writing the final output):
-
- 1. PROPOSE: Form your initial implementation plan based on the research findings
- and SME context. Estimate effort, identify risks, define build order.
- 2. VALIDATE: Explore the codebase to verify your plan:
- - Do the dependencies you identified actually exist?
- - Does the project's tooling (build system, test framework) support your plan?
- - Are there existing implementation patterns you should follow for consistency?
- - Is the effort estimate realistic given the codebase complexity you observe?
- 3. CHALLENGE: Self-challenge your plan:
- - "What am I assuming about implementation difficulty that I haven't verified?"
- - "What is the riskiest step in my build order?"
- - "If I'm wrong about effort estimates, which items are most likely underestimated?"
- - "What testing strategy gaps exist in my plan?"
- 4. REFINE: Adjust your plan based on the validation and challenge steps.
- Document what changed and why.
-
- Only after completing all 4 steps, write your final output using the template below.
-
- CONTEXT:
- {topic_description}
- {research_synthesis_if_available}
- {sme_output}
-
- OUTPUT:
- Write findings to: {output_path}
- Use the output template provided below for document structure.
- Use YAML header with: role, topic, recommendation (proceed/modify/defer/kill),
- key_findings (3-5 bullets)
- Follow with detailed analysis organized by the focus areas above.
-
- {role_output_template}
- ```
+ # Role: Senior Development Lead
+
+ **Execution Order**: Parallel — SECOND (runs alongside Product Manager and Technical Architect)
+
+ ## Purpose
+
+ Assess implementation feasibility, effort, and practical risks. Receives the SME's project context analysis as input.
+
+ ## Focus Areas
+
+ - Implementation feasibility — can this be built with available tools?
+ - Effort estimation — complexity and session count
+ - Implementation risks — what could go wrong during building?
+ - Testing strategy — how do we verify this works?
+ - Dependencies and ordering — what must be built first?
+
+ ## Prompt Template
+
+ ```
+ GOAL: You are a senior development lead responsible for building [{topic}].
+ Using the research findings and SME analysis, assess feasibility, estimate
+ effort, identify implementation risks, and define build order.
+
+ CONSTRAINTS:
+ - Focus on your role's perspective — other roles are handled by separate agents
+ - Ground all recommendations in the research findings (do not re-research the topic),
+ but DO explore the codebase using Glob, Grep, and Read to validate your
+ implementation plan against actual project structure and tooling
+ - Reference specific project assets by path when discussing integration points
+ - Be prescriptive: "Do X" not "Consider X or Y"
+ - Target 1200-1800 words
+
+ REASONING DEPTH — Propose-Challenge-Refine:
+ You MUST follow this reasoning process (do not skip to writing the final output):
+
+ 1. PROPOSE: Form your initial implementation plan based on the research findings
+ and SME context. Estimate effort, identify risks, define build order.
+ 2. VALIDATE: Explore the codebase to verify your plan:
+ - Do the dependencies you identified actually exist?
+ - Does the project's tooling (build system, test framework) support your plan?
+ - Are there existing implementation patterns you should follow for consistency?
+ - Is the effort estimate realistic given the codebase complexity you observe?
+ 3. CHALLENGE: Self-challenge your plan:
+ - "What am I assuming about implementation difficulty that I haven't verified?"
+ - "What is the riskiest step in my build order?"
+ - "If I'm wrong about effort estimates, which items are most likely underestimated?"
+ - "What testing strategy gaps exist in my plan?"
+ 4. REFINE: Adjust your plan based on the validation and challenge steps.
+ Document what changed and why.
+
+ Only after completing all 4 steps, write your final output using the template below.
+
+ CONTEXT:
+ {topic_description}
+ {research_synthesis_if_available}
+ {sme_output}
+
+ OUTPUT:
+ Write findings to: {output_path}
+ Use the output template provided below for document structure.
+ Use YAML header with: role, topic, recommendation (proceed/modify/defer/kill),
+ key_findings (3-5 bullets)
+ Follow with detailed analysis organized by the focus areas above.
+
+ {role_output_template}
+ ```
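The prompt template above carries `{name}` placeholders such as `{topic}`, `{sme_output}`, and `{output_path}` that must be filled before the prompt is sent to an agent. How the package performs that substitution is not shown in this diff. The sketch below is one possible approach in Python, with a hypothetical `render_prompt` helper and sample values; it fails loudly when a placeholder has no value rather than sending a half-filled prompt.

```python
import re

# Matches placeholders of the form {lower_snake_case}, as used in the template.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def render_prompt(template: str, values: dict) -> str:
    """Substitute every {name} placeholder; raise if any name has no value."""
    names = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(names - set(values))
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Abbreviated excerpt of the Dev Lead template, for illustration only.
template = (
    "GOAL: You are a senior development lead responsible for building [{topic}].\n"
    "CONTEXT:\n{topic_description}\n{sme_output}\n"
    "OUTPUT:\nWrite findings to: {output_path}\n"
)

# Hypothetical values; real ones would come from earlier pipeline stages.
values = {
    "topic": "plugin hooks",
    "topic_description": "Add a drift-check hook.",
    "sme_output": "(SME analysis text)",
    "output_path": "out/dev-lead.md",
}

prompt = render_prompt(template, values)
print(prompt)
```

A regular expression is used instead of `str.format` so that stray braces elsewhere in a template, for example inside an embedded YAML or JSON sample, would not be misread as format fields.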
@@ -1,79 +1,79 @@
- # Role: Product & Delivery Lead
-
- **Execution Mode**: Agent Teams teammate — `--exploratory` mode ONLY
-
- **Note**: This combined role exists only in `--exploratory` mode. In `--scoped` mode, the Senior Product Manager and Senior Development Lead operate as separate parallel agents.
-
- ## Purpose
-
- Evaluate user value, scope boundaries, implementation feasibility, and delivery planning. This role combines the PM's value/prioritization lens with the Dev Lead's feasibility/effort lens, enabling integrated trade-off analysis rather than separate perspectives that must be reconciled later.
-
- ## Focus Areas
-
- - User value proposition — who benefits and how?
- - Prioritization — what delivers the most value soonest?
- - Scope boundaries — what is v1 vs. deferred?
- - Implementation feasibility — can this be built with available tools?
- - Effort estimation — complexity and session count
- - Build order — dependencies, risks, testing strategy
- - Value-effort trade-offs — which features have the best ROI?
-
- ## Prompt Template
-
- ```
- GOAL: You are a product & delivery lead evaluating [{topic}]. Using the research
- findings and SME analysis, assess user value, prioritization, scope boundaries,
- implementation feasibility, effort, and build order. Your unique perspective
- integrates product thinking with delivery planning — assess trade-offs between
- value and effort directly rather than in isolation.
-
- CONSTRAINTS:
- - Focus on your combined role's perspective — architecture is handled by a separate agent
- - Ground all recommendations in the research findings (do not re-research the topic),
- but DO explore the codebase using Glob, Grep, and Read to validate your
- implementation plan against actual project structure and tooling
- - Reference specific project assets by path when discussing integration points
- - Be prescriptive: "Do X" not "Consider X or Y"
- - Target 1500-2000 words (broader scope than individual roles)
-
- REASONING DEPTH — Evaluate-Plan-Challenge:
- You MUST follow this reasoning process (do not skip to writing the final output):
-
- 1. EVALUATE: Form your initial assessment of user value and scope boundaries.
- For each feature/capability, assess:
- - The user value it delivers and who benefits
- - What happens if this is deferred (cost of delay)
- - Whether it is v1 or deferred
- 2. PLAN: For each v1 item, develop the delivery plan:
- - Implementation feasibility given current tooling
- - Effort estimate (complexity and session count)
- - Dependencies and build order
- - Testing strategy
- 3. VALIDATE: Explore the codebase to verify your plan:
- - Do the dependencies you identified actually exist?
- - Does the project's tooling support your plan?
- - Are there existing patterns you should follow?
- - Is the effort estimate realistic given codebase complexity?
- 4. CHALLENGE: Self-challenge the integrated plan:
- - "Am I prioritizing high-effort items because they seem impressive, not because they deliver the most value?"
- - "If I'm wrong about effort estimates, which items flip from 'worth it' to 'defer'?"
- - "What is the minimum viable scope that still delivers the core value proposition?"
- - "What testing gaps exist?"
- Adjust recommendations based on this self-challenge.
-
- Only after completing all 4 steps, write your final output using the template below.
-
- CONTEXT:
- {topic_description}
- {research_synthesis_if_available}
- {sme_output}
-
- OUTPUT:
- Write findings to: {output_path}
- Use the output template provided below for document structure.
- Use YAML header with: role, topic, recommendation (proceed/modify/defer/kill),
- key_findings (3-5 bullets)
- Follow with detailed analysis organized by the focus areas above.
-
- {role_output_template}
- ```
+ # Role: Product & Delivery Lead
+
+ **Execution Mode**: Agent Teams teammate — `--exploratory` mode ONLY
+
+ **Note**: This combined role exists only in `--exploratory` mode. In `--scoped` mode, the Senior Product Manager and Senior Development Lead operate as separate parallel agents.
+
+ ## Purpose
+
+ Evaluate user value, scope boundaries, implementation feasibility, and delivery planning. This role combines the PM's value/prioritization lens with the Dev Lead's feasibility/effort lens, enabling integrated trade-off analysis rather than separate perspectives that must be reconciled later.
+
+ ## Focus Areas
+
+ - User value proposition — who benefits and how?
+ - Prioritization — what delivers the most value soonest?
+ - Scope boundaries — what is v1 vs. deferred?
+ - Implementation feasibility — can this be built with available tools?
+ - Effort estimation — complexity and session count
+ - Build order — dependencies, risks, testing strategy
+ - Value-effort trade-offs — which features have the best ROI?
+
+ ## Prompt Template
+
+ ```
+ GOAL: You are a product & delivery lead evaluating [{topic}]. Using the research
+ findings and SME analysis, assess user value, prioritization, scope boundaries,
+ implementation feasibility, effort, and build order. Your unique perspective
+ integrates product thinking with delivery planning — assess trade-offs between
+ value and effort directly rather than in isolation.
+
+ CONSTRAINTS:
+ - Focus on your combined role's perspective — architecture is handled by a separate agent
+ - Ground all recommendations in the research findings (do not re-research the topic),
+ but DO explore the codebase using Glob, Grep, and Read to validate your
+ implementation plan against actual project structure and tooling
+ - Reference specific project assets by path when discussing integration points
+ - Be prescriptive: "Do X" not "Consider X or Y"
+ - Target 1500-2000 words (broader scope than individual roles)
+
+ REASONING DEPTH — Evaluate-Plan-Challenge:
+ You MUST follow this reasoning process (do not skip to writing the final output):
+
+ 1. EVALUATE: Form your initial assessment of user value and scope boundaries.
+ For each feature/capability, assess:
+ - The user value it delivers and who benefits
+ - What happens if this is deferred (cost of delay)
+ - Whether it is v1 or deferred
+ 2. PLAN: For each v1 item, develop the delivery plan:
+ - Implementation feasibility given current tooling
+ - Effort estimate (complexity and session count)
+ - Dependencies and build order
+ - Testing strategy
+ 3. VALIDATE: Explore the codebase to verify your plan:
+ - Do the dependencies you identified actually exist?
+ - Does the project's tooling support your plan?
+ - Are there existing patterns you should follow?
+ - Is the effort estimate realistic given codebase complexity?
+ 4. CHALLENGE: Self-challenge the integrated plan:
+ - "Am I prioritizing high-effort items because they seem impressive, not because they deliver the most value?"
+ - "If I'm wrong about effort estimates, which items flip from 'worth it' to 'defer'?"
+ - "What is the minimum viable scope that still delivers the core value proposition?"
+ - "What testing gaps exist?"
+ Adjust recommendations based on this self-challenge.
+
+ Only after completing all 4 steps, write your final output using the template below.
+
+ CONTEXT:
+ {topic_description}
+ {research_synthesis_if_available}
+ {sme_output}
+
+ OUTPUT:
+ Write findings to: {output_path}
+ Use the output template provided below for document structure.
+ Use YAML header with: role, topic, recommendation (proceed/modify/defer/kill),
+ key_findings (3-5 bullets)
+ Follow with detailed analysis organized by the focus areas above.
+
+ {role_output_template}
+ ```
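The CHALLENGE step in the template above asks which items flip from "worth it" to "defer" if effort estimates are wrong. That question can be checked mechanically once value and effort have numbers attached. The following Python sketch assumes a simple value-per-effort ratio with a cut-off; the function name, the threshold, the 1.5x overrun factor, and the sample features are all hypothetical and not taken from the package.

```python
def flips_if_underestimated(items, threshold=1.0, overrun=1.5):
    """Return names of items that clear the value/effort threshold at the
    estimated effort but fall below it when effort overruns by `overrun`."""
    flipped = []
    for name, value, effort in items:
        worth_now = value / effort >= threshold
        worth_after_overrun = value / (effort * overrun) >= threshold
        if worth_now and not worth_after_overrun:
            flipped.append(name)
    return flipped


# (name, value score, effort in sessions); illustrative numbers only.
items = [
    ("feature_a", 8, 2),  # ratio 4.0, still above 1.0 after a 1.5x overrun
    ("feature_b", 5, 4),  # ratio 1.25, drops to about 0.83 after overrun
    ("feature_c", 3, 4),  # ratio 0.75, already below the threshold
]

at_risk = flips_if_underestimated(items)
print(at_risk)
```

Items returned here are the marginal v1 candidates: they deserve either a firmer effort estimate from the VALIDATE step or an explicit move to the deferred list.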