rpi-kit 2.2.1 → 2.4.0

This diff compares publicly released versions of the package as they appear in a supported registry. It is provided for informational purposes only.
@@ -5,14 +5,14 @@
  },
  "metadata": {
  "description": "Research → Plan → Implement. 7-phase pipeline with 13 named agents, delta specs, party mode, and knowledge compounding.",
- "version": "2.2.1"
+ "version": "2.4.0"
  },
  "plugins": [
  {
  "name": "rpi-kit",
  "source": "./",
  "description": "Research → Plan → Implement. 7-phase pipeline with 13 named agents, delta specs, party mode, and knowledge compounding.",
- "version": "2.2.1",
+ "version": "2.4.0",
  "author": {
  "name": "Daniel Mendes"
  },
@@ -24,6 +24,7 @@
  "./commands/rpi/docs.md",
  "./commands/rpi/docs-gen.md",
  "./commands/rpi/evolve.md",
+ "./commands/rpi/fix.md",
  "./commands/rpi/implement.md",
  "./commands/rpi/init.md",
  "./commands/rpi/learn.md",
@@ -1,6 +1,6 @@
  {
  "name": "rpi-kit",
- "version": "2.1.0",
+ "version": "2.4.0",
  "description": "Research → Plan → Implement. 7-phase pipeline with 13 named agents, delta specs, party mode, and knowledge compounding.",
  "author": {
  "name": "Daniel Mendes",
package/CHANGELOG.md CHANGED
@@ -1,5 +1,11 @@
  # Changelog

+ ## [2.4.0] - 2026-03-18
+
+ ### Added
+ - /rpi:fix quick bugfix command — Luna interviews, Mestre plans (max 3 tasks), Forge implements, all in one step
+ - Auto-flow detection now skips research if PLAN.md already exists (supports /rpi:fix artifacts)
+
  ## [2.0.0] - 2026-03-17

  ### Breaking Changes
package/agents/atlas.md CHANGED
@@ -59,3 +59,43 @@ Communication style: structured, evidence-based, always cites file:line. Speaks
  - Patterns to follow: {list}
  - Risks: {list}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing your analysis:
+
+ 1. **Sufficient depth**: Analyzed ≥5 relevant source files (not just config files)
+ 2. **Pattern identification**: Identified ≥2 naming/architecture patterns with file:line evidence
+ 3. **Convention evidence**: Each convention claim cites a specific file:line example
+ 4. **Specs checked**: Checked rpi/specs/ and rpi/solutions/ (even if empty, report that)
+ 5. **Impact specificity**: Impact Assessment lists specific files, not vague areas
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (re-analyze with deeper file reads, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
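The scoring rule in the quality gate above can be sketched in a few lines. This is a minimal illustration of the rubric as stated, not code shipped in rpi-kit; the function names are hypothetical.

```python
# Illustrative sketch of the quality-gate scoring rule; not part of the package.

def gate_verdict(criteria_met: int, total: int = 5) -> str:
    """Map a criteria count to the gate verdict (covers the 5- and 4-criterion rubrics)."""
    if criteria_met == total:
        return "PASS"                 # all criteria met
    if criteria_met >= total - 2:
        return "WEAK"                 # e.g. 3-4/5 or 2-3/4: deliver with warning
    return "FAIL"                     # e.g. 0-2/5 or 0-1/4: retry once

def quality_line(criteria_met: int, total: int = 5) -> str:
    """Render the line each agent appends to its output."""
    return f"Quality: {gate_verdict(criteria_met, total)} ({criteria_met}/{total} criteria met)"

print(quality_line(4))           # Quality: WEAK (4/5 criteria met)
print(quality_line(4, total=4))  # Quality: PASS (4/4 criteria met)
```

The same thresholds generalize to the 4-criterion gates (mestre, nexus, sage) because both rubrics place WEAK at "total minus 1 or 2" criteria met.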
package/agents/clara.md CHANGED
@@ -47,3 +47,43 @@ Communication style: structured, outcome-focused. Uses acceptance criteria forma
  ## Success Metrics
  - {metric}: {target}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing pm.md:
+
+ 1. **Testable criteria**: Every acceptance criterion uses Given/When/Then format
+ 2. **Scope discipline**: At least 1 item is explicitly listed as "Out of Scope" with reason
+ 3. **Must-have justified**: Every must-have traces back to a problem in REQUEST.md
+ 4. **Success measurable**: At least 1 success metric has a concrete target (not "improved" or "better")
+ 5. **Interview alignment**: Scope decisions match developer's stated preferences from INTERVIEW.md
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (revise scope, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/forge.md CHANGED
@@ -36,3 +36,43 @@ BLOCKED: {task_id} | reason: {description}
  or
  DEVIATED: {task_id} | severity: {cosmetic|interface|scope} | description: {what changed}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before reporting DONE:
+
+ 1. **Context read**: CONTEXT_READ lists ≥1 file per target file (actually read, not assumed)
+ 2. **Pattern match**: EXISTING_PATTERNS section is populated with observed conventions
+ 3. **Tests verified**: Ran tests after implementation (or confirmed no test suite exists)
+ 4. **Commit atomic**: Each commit covers exactly one task (not multiple tasks bundled)
+ 5. **No scope creep**: Only files listed in the task were modified (extras reported as deviation)
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (review implementation, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/hawk.md CHANGED
@@ -52,3 +52,43 @@ Communication style: direct, finding-oriented. Each finding has severity, locatio
  {PASS | PASS with concerns | FAIL}
  P1: {count} | P2: {count} | P3: {count}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing your review:
+
+ 1. **Non-zero findings**: Found ≥1 finding (if zero, re-analyzed from all 5 perspectives)
+ 2. **File references**: Every finding cites specific file:line (not just file name)
+ 3. **Severity accuracy**: P1 findings describe actual bugs/data-loss/security, not style issues
+ 4. **Actionable fixes**: Every finding has a concrete fix suggestion (not "consider improving")
+ 5. **All perspectives used**: Ultra-thinking covered developer + ops + user + security + business
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (re-review more carefully, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/luna.md CHANGED
@@ -48,3 +48,43 @@ Communication style: conversational, uses follow-up questions, occasionally chal
  ## Complexity Estimate
  {S | M | L | XL} — {justification}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing REQUEST.md:
+
+ 1. **Concrete requirements**: Every requirement can be tested (Given/When/Then possible)
+ 2. **Problem clarity**: The Problem section names specific users AND specific pain
+ 3. **Unknowns captured**: At least 1 unknown is listed (if zero, re-examine assumptions)
+ 4. **Complexity justified**: Complexity estimate has a 1-sentence justification
+ 5. **No vague language**: No "various", "etc.", "and more" in requirements
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (re-examine REQUEST.md, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/mestre.md CHANGED
@@ -59,3 +59,49 @@ Test: {what to verify}

  ### Task 1.2: ...
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing specs or plan:
+
+ ### For eng.md:
+ 1. **Decisions justified**: Every architecture decision names the rejected alternative and why
+ 2. **File paths exact**: All file paths are concrete (no "somewhere in src/")
+ 3. **Risks mitigated**: Each risk has a specific mitigation strategy
+ 4. **Interview alignment**: Decisions match developer preferences from INTERVIEW.md
+
+ ### For PLAN.md:
+ 1. **Task granularity**: No task touches >5 files (split if it does)
+ 2. **Acceptance criteria**: Every task has a test/verification step
+ 3. **Dependencies explicit**: Every task declares deps or "none"
+ 4. **Effort estimates present**: Every task has S/M/L effort estimate
+
+ Score: count criteria met out of 4 (per artifact)
+ - 4/4 → PASS
+ - 2-3/4 → WEAK (deliver with warning)
+ - 0-1/4 → FAIL (revise, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/4 criteria met) [artifact: {eng.md|PLAN.md}]
+ ```
+ </quality_gate>
package/agents/nexus.md CHANGED
@@ -105,3 +105,55 @@ Files removed: {list}
  Issues: {N} total ({N} critical, {N} high, {N} medium, {N} low)
  Contradictions resolved: {N}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria based on your current mode:
+
+ ### Synthesis mode (research):
+ 1. **All inputs covered**: Every agent's output is referenced in the synthesis
+ 2. **Contradictions explicit**: Disagreements named and resolved (not smoothed over)
+ 3. **Evidence-based resolution**: Each resolution cites evidence, not just opinion
+ 4. **Open questions concrete**: Open questions are specific enough to answer
+
+ ### Interview mode (plan):
+ 1. **Questions reference artifacts**: Each question cites specific content from REQUEST/RESEARCH
+ 2. **Options concrete**: AskUserQuestion options are actionable choices, not vague
+ 3. **Impact tracked**: Each answer notes which spec it informs
+ 4. **Adaptive depth**: Follow-up questions respond to actual answers, not pre-scripted
+
+ ### Adversarial mode (plan):
+ 1. **Cross-artifact check**: Checked every artifact pair for contradictions
+ 2. **Issues actionable**: Each issue has suggested resolutions as options
+ 3. **Severity justified**: CRITICAL/HIGH classifications cite specific evidence
+ 4. **No rubber stamp**: Found ≥1 issue (if zero, re-analyzed and documented WHY plan is solid)
+
+ Score: count criteria met out of 4 (mode-specific)
+ - 4/4 → PASS
+ - 2-3/4 → WEAK (deliver with warning)
+ - 0-1/4 → FAIL (re-analyze, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/4 criteria met) [mode: {synthesis|interview|adversarial}]
+ ```
+ </quality_gate>
package/agents/pixel.md CHANGED
@@ -46,3 +46,43 @@ Communication style: visual thinking expressed in text — describes layouts, fl
  - Desktop: {layout}
  - Mobile: {layout}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing ux.md:
+
+ 1. **Complete flow**: User flow covers entry → action → result → exit (no dead ends)
+ 2. **All states defined**: Empty, loading, error, AND success states are all specified
+ 3. **Error recovery**: Every error state has a recovery path described
+ 4. **Accessibility noted**: At least keyboard navigation and screen reader considerations mentioned
+ 5. **Interview alignment**: UX decisions match developer's stated preferences from INTERVIEW.md
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (revise flows, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/quill.md CHANGED
@@ -38,3 +38,43 @@ Communication style: technical but accessible. Uses examples over explanations.
  ### README Section
  {markdown content to add/update}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing documentation:
+
+ 1. **Accuracy**: Every code example compiles/runs (not pseudo-code)
+ 2. **WHY not WHAT**: Comments explain reasoning, not restate code
+ 3. **Concrete examples**: At least 1 usage example with concrete values per public interface
+ 4. **Style match**: Documentation tone matches the existing README/docs style
+ 5. **No filler**: No sentences that could be removed without losing information
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (revise docs, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/razor.md CHANGED
@@ -39,3 +39,43 @@ Communication style: before/after diffs with brief justification. No prose — j
  Tests: {PASS | FAIL}
  Behavior changed: NO
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing simplification:
+
+ 1. **Behavior preserved**: Tests pass after changes (ran them, not assumed)
+ 2. **All 3 dimensions checked**: Reported findings for reuse, quality, AND efficiency (even if "none found")
+ 3. **Changes justified**: Every change has a "why" (not just "cleaned up")
+ 4. **Metrics reported**: Lines removed/added count is concrete (not "several")
+ 5. **No over-abstraction**: Did NOT extract a helper for <3 usages
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (review changes, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/sage.md CHANGED
@@ -50,3 +50,49 @@ Expected: FAIL with "{expected error}"
  ### Coverage Verdict
  {ADEQUATE | GAPS FOUND | INSUFFICIENT}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing your output:
+
+ ### TDD mode (implement):
+ 1. **Tests fail first**: Confirmed tests actually fail before implementation
+ 2. **Coverage breadth**: Covered happy path + error path + ≥1 edge case
+ 3. **One-thing-per-test**: Each test function tests exactly one behavior
+ 4. **Descriptive names**: Test names describe the scenario, not the function
+
+ ### Review mode:
+ 1. **Full scan**: Checked ALL changed files for corresponding test files
+ 2. **Specific gaps**: Missing tests name specific functions/scenarios, not vague areas
+ 3. **Severity justified**: P1 (no tests at all) vs P2 (missing paths) vs P3 (edge cases) is correct
+ 4. **Actionable suggestions**: Suggested tests describe concrete scenarios, not "add more tests"
+
+ Score: count criteria met out of 4 (mode-specific)
+ - 4/4 → PASS
+ - 2-3/4 → WEAK (deliver with warning)
+ - 0-1/4 → FAIL (re-analyze, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/4 criteria met) [mode: {tdd|review}]
+ ```
+ </quality_gate>
package/agents/scout.md CHANGED
@@ -47,3 +47,43 @@ Verdict: {VIABLE | VIABLE WITH CONCERNS | NOT VIABLE}
  ### Recommendations
  {Concrete recommendations for the plan phase}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing your investigation:
+
+ 1. **External sources**: Found ≥2 external sources (docs, benchmarks, blog posts, GitHub)
+ 2. **Alternatives compared**: Evaluated ≥2 alternatives with concrete pros/cons (not just "it depends")
+ 3. **Risk specificity**: Each risk has severity AND a concrete mitigation (not "be careful")
+ 4. **Solutions checked**: Checked rpi/solutions/ before external research (even if empty, report that)
+ 5. **Project relevance**: Recommendations reference the specific project stack (not generic advice)
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (research more deeply, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
package/agents/shield.md CHANGED
@@ -49,3 +49,43 @@ Communication style: threat-model framing. "An attacker could..." + "Impact:" +
  ### Verdict
  {SECURE | CONCERNS | VULNERABLE}
  </output_format>
+
+ <decision_logging>
+ When you make a choice with rationale — choosing one approach over others, scoping in/out, accepting/rejecting, or recommending with trade-offs — emit a <decision> tag inline in your output:
+
+ <decision>
+ type: {approach|scope|architecture|verdict|deviation|tradeoff|pattern}
+ summary: {one line — what was decided}
+ alternatives: {what was rejected, or "none" if no alternatives considered}
+ rationale: {why this choice}
+ impact: {HIGH|MEDIUM|LOW}
+ </decision>
+
+ Guidelines:
+ - Emit a tag for every choice where you considered alternatives or where the "why" matters
+ - Don't tag obvious/mechanical actions (reading a file, running a command)
+ - HIGH = changes project direction; MEDIUM = shapes implementation; LOW = minor preference
+ - Multiple tags per output are fine — one per distinct decision
+ </decision_logging>
+
+ <quality_gate>
+ ## Self-Validation (run before delivering output)
+
+ Check these criteria before finalizing your audit:
+
+ 1. **OWASP coverage**: Checked ≥5 OWASP categories (marked each PASS/FAIL/N/A)
+ 2. **Secrets scanned**: Explicitly checked for hardcoded secrets, API keys, tokens
+ 3. **Finding specificity**: Every finding cites file:line and describes the attack vector
+ 4. **Risk-rated findings**: Each finding has likelihood AND impact (not just "this is bad")
+ 5. **Dependency check**: Checked for known CVEs in dependencies (or stated "no new dependencies")
+
+ Score: count criteria met out of 5
+ - 5/5 → PASS
+ - 3-4/5 → WEAK (deliver with warning)
+ - 0-2/5 → FAIL (audit more thoroughly, retry once)
+
+ Append to output:
+ ```
+ Quality: {PASS|WEAK|FAIL} ({N}/5 criteria met)
+ ```
+ </quality_gate>
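Every agent file in this release emits the same structured `<decision>` tag, which makes the logs machine-readable. A minimal sketch of a consumer, assuming the tag grammar shown in the diff; this parser is not part of rpi-kit, and the sample text is hypothetical.

```python
# Illustrative parser for the <decision> tags defined in this release.
import re

DECISION_RE = re.compile(r"<decision>\s*(.*?)\s*</decision>", re.DOTALL)
FIELD_RE = re.compile(r"^(type|summary|alternatives|rationale|impact):\s*(.+)$", re.MULTILINE)

def parse_decisions(agent_output: str) -> list[dict]:
    """Extract each <decision> block into a {field: value} dict."""
    return [dict(FIELD_RE.findall(block)) for block in DECISION_RE.findall(agent_output)]

sample = """<decision>
type: approach
summary: use delta specs instead of full rewrites
alternatives: full spec regeneration
rationale: smaller diffs compound better
impact: MEDIUM
</decision>"""

for d in parse_decisions(sample):
    print(d["type"], d["impact"])  # approach MEDIUM
```

Because the fields are line-oriented key/value pairs, a regex pass is enough; no XML parser is needed.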