npm - universal-dev-standards - Versions diffs - 5.0.0-rc.8 → 5.0.0 - Mend

universal-dev-standards 5.0.0-rc.8 → 5.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (355)

package/bundled/core/pipeline-integration-standards.md ADDED

@@ -0,0 +1,230 @@
+# Pipeline Integration Standards
+> **Language**: English | [繁體中文](../locales/zh-TW/core/pipeline-integration-standards.md)
+**Applicability**: All software projects using automated development pipelines
+**Scope**: universal
+---
+## Overview
+Pipeline Integration Standards define how automated development pipelines should read project configuration, execute development stages, and adapt behavior based on project context. This standard provides a language-agnostic, framework-agnostic model for AI-assisted and CI/CD-driven development workflows.
+## References
+| Standard/Source | Content |
+|----------------|---------|
+| ISO/IEC 12207 | Software Lifecycle Processes |
+| ISO/IEC 15504 (SPICE) | Process Assessment |
+| Continuous Delivery (Jez Humble, David Farley) | Pipeline design principles |
+| DORA Metrics | Deployment frequency, lead time, MTTR, change failure rate |
+---
+## Configuration Contract
+### UDS Configuration Block
+Projects using automated pipelines MUST declare their pipeline preferences in a standard configuration block. The configuration block is typically placed in the project's manifest file (e.g., `manifest.json`, `uds.config.json`, or equivalent).
+### Standard Toggle Names
+| Toggle | Type | Default | Description |
+|--------|------|---------|-------------|
+| `autoSpecGeneration` | boolean | false | Automatically generate SDD specs from PRD/user stories |
+| `autoDerive` | boolean | false | Automatically derive BDD/TDD/ATDD from approved specs |
+| `autoTDD` | boolean | false | Automatically enter TDD RED phase after derivation |
+| `autoCheckin` | boolean | false | Automatically commit when all quality gates pass |
+| `autoBatch` | boolean | false | Automatically batch pending changes before commit |
+### Toggle Semantics
+Each toggle controls a specific pipeline behavior:
+| Toggle | When ON | When OFF |
+|--------|---------|----------|
+| `autoSpecGeneration` | Pipeline generates spec draft from input, submits for review | Manual spec creation required |
+| `autoDerive` | Pipeline runs derivation (BDD/TDD/ATDD) after spec approval | Manual derivation via commands |
+| `autoTDD` | Pipeline sets RED state and creates test skeleton after derivation | Developer manually enters TDD |
+| `autoCheckin` | Pipeline commits after all gates pass (tests, lint, coverage) | Developer manually commits |
+| `autoBatch` | Pipeline accumulates changes and merges at threshold | Each change committed individually |
+### Configuration Example
+```json
+{
+  "pipeline": {
+    "autoSpecGeneration": true,
+    "autoDerive": true,
+    "autoTDD": true,
+    "autoCheckin": false,
+    "autoBatch": false,
+    "context": "greenfield"
+  }
+}
+```
+### Configuration Reading Rules
+1. **Fail-safe defaults**: All toggles default to OFF (manual mode)
+2. **Explicit declaration**: Pipeline MUST NOT assume toggle state without reading configuration
+3. **Runtime override**: CLI flags or environment variables MAY override file-based configuration
+4. **Validation**: Pipeline MUST validate configuration values before execution
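A minimal TypeScript sketch of the four reading rules above, not part of the published file. The toggle names come from the standard; the function names (`defaultToggles`, `resolveToggles`) and the shape of the override argument are assumptions made for illustration.

```typescript
// Toggle names are taken from the standard; everything else here is hypothetical.
type ToggleName =
  | "autoSpecGeneration"
  | "autoDerive"
  | "autoTDD"
  | "autoCheckin"
  | "autoBatch";

type Toggles = Record<ToggleName, boolean>;

const TOGGLE_NAMES: ToggleName[] = [
  "autoSpecGeneration",
  "autoDerive",
  "autoTDD",
  "autoCheckin",
  "autoBatch",
];

// Rule 1: every toggle starts OFF (manual mode).
function defaultToggles(): Toggles {
  return {
    autoSpecGeneration: false,
    autoDerive: false,
    autoTDD: false,
    autoCheckin: false,
    autoBatch: false,
  };
}

// Rules 2-4: read the file config, validate each value, then apply overrides.
function resolveToggles(
  fileConfig: Record<string, unknown>,
  overrides: Partial<Toggles> = {},
): Toggles {
  const resolved = defaultToggles();
  for (const name of TOGGLE_NAMES) {
    const value = fileConfig[name];
    if (value !== undefined) {
      if (typeof value !== "boolean") {
        throw new Error(`Toggle "${name}" must be boolean, got ${typeof value}`);
      }
      resolved[name] = value;
    }
    // Runtime overrides (CLI flags, environment variables) win over the file.
    const override = overrides[name];
    if (override !== undefined) {
      resolved[name] = override;
    }
  }
  return resolved;
}
```

Calling `resolveToggles({})` yields manual mode for every toggle, which is the behavior the "Default handling" check in the implementor checklist asks for.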
+---
+## Pipeline Stage Model
+### Standard 6-Stage Pipeline
+```
+┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
+│  1.PLAN  │───▶│  2.SPEC  │───▶│ 3.DERIVE │───▶│ 4.BUILD  │───▶│ 5.REVIEW │───▶│6.CHECKIN │
+│ Analysis │    │ Writing  │    │  Tests   │    │   Code   │    │  Verify  │    │  Commit  │
+└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
+```
+### Stage Definitions
+| Stage | Input | Output | Quality Gate |
+|-------|-------|--------|-------------|
+| **Plan** | PRD, user stories, requirements | Structured requirements document | Requirements reviewed |
+| **Spec** | Requirements | SDD specification with AC | Spec approved |
+| **Derive** | Approved spec | BDD scenarios, TDD skeletons, ATDD tables | 1:1 AC mapping verified |
+| **Build** | Test skeletons + spec | Implementation code | Tests pass (RED→GREEN) |
+| **Review** | Implementation + tests | Review feedback | Review approved |
+| **Checkin** | Approved changes | Committed code | All quality gates pass |
+### Stage Dependencies
+- Each stage's output is the next stage's input
+- Stages MUST NOT be skipped without explicit configuration
+- Failed quality gates MUST block progression to next stage
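The dependency rules above can be sketched as a small sequential runner, not part of the published file. The stage names follow the standard; the `Stage` shape, the `runPipeline` function, and the extra `blocked` status for stages that never ran are assumptions.

```typescript
type StageName = "plan" | "spec" | "derive" | "build" | "review" | "checkin";
// "blocked" is a hypothetical status for stages that were never reached.
type StageStatus = "executed" | "skipped" | "failed" | "blocked";

interface Stage {
  name: StageName;
  skip?: boolean; // set only from explicit configuration
  run: () => boolean; // returns true when the stage's quality gate passes
}

function runPipeline(stages: Stage[]): Partial<Record<StageName, StageStatus>> {
  const report: Partial<Record<StageName, StageStatus>> = {};
  let gateFailed = false;
  for (const stage of stages) {
    if (gateFailed) {
      // A failed gate blocks every later stage.
      report[stage.name] = "blocked";
      continue;
    }
    if (stage.skip === true) {
      // Skips are recorded, never silent.
      report[stage.name] = "skipped";
      continue;
    }
    const passed = stage.run();
    report[stage.name] = passed ? "executed" : "failed";
    if (!passed) {
      gateFailed = true;
    }
  }
  return report;
}
```

Returning a per-stage report rather than a single boolean keeps the skip and failure decisions visible to whoever reads the pipeline output.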
+---
+## Development Context Classification
+### Context Types
+| Context | Description | Typical Scenario |
+|---------|-------------|------------------|
+| **Greenfield** | New project or feature with no existing code | Starting a new module, new service, new product |
+| **Brownfield** | Existing codebase requiring modification | Adding features to legacy code, refactoring |
+| **Adhoc** | Small, isolated changes | Bug fixes, configuration changes, hotfixes |
+### Context Strategy Matrix
+| Stage | Greenfield | Brownfield | Adhoc |
+|-------|-----------|------------|-------|
+| **Plan** | Full requirements | Impact analysis first | Quick assessment |
+| **Spec** | Complete SDD | Delta SDD (changes only) | Optional (for significant changes) |
+| **Derive** | Full derivation | Targeted derivation | Skip (unless complex) |
+| **Build** | TDD from scratch | Modify existing + new tests | Direct fix |
+| **Review** | Full review | Focused review on changes | Quick review |
+| **Checkin** | Standard checkin | Standard checkin | Standard checkin |
+### Context Detection Heuristics
+Pipelines SHOULD auto-detect context using these signals:
+| Signal | Greenfield Indicator | Brownfield Indicator | Adhoc Indicator |
+|--------|---------------------|---------------------|-----------------|
+| File count | 0 or minimal files | Established codebase | N/A |
+| Change scope | New directory/module | Modifications to existing files | 1-3 files changed |
+| Test coverage | No existing tests | Existing test suite | Existing tests cover area |
+| Spec existence | No specs | Existing specs | May or may not have specs |
+### Context Override
+Developers can explicitly set context in configuration:
+```json
+{
+  "pipeline": {
+    "context": "brownfield"
+  }
+}
+```
+Or via CLI flag:
+```bash
+pipeline run --context=greenfield
+```
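One possible reading of the detection heuristics and the explicit override, sketched in TypeScript and not part of the published file. The signal names, the `MINIMAL_FILE_COUNT` threshold, and the order in which signals are checked are all assumptions; the standard only lists the signals, not how to weigh them.

```typescript
type PipelineContext = "greenfield" | "brownfield" | "adhoc";

// Hypothetical signal shape, loosely following the heuristics table.
interface ContextSignals {
  existingFileCount: number; // files already in the repository
  filesChanged: number; // files touched by this change
  modifiesExistingFiles: boolean;
  hasExistingTests: boolean;
}

const MINIMAL_FILE_COUNT = 5; // assumption: what counts as "minimal files"
const ADHOC_MAX_FILES = 3; // from the "1-3 files changed" signal

function detectContext(
  signals: ContextSignals,
  override?: PipelineContext,
): PipelineContext {
  // An explicit setting (config file or CLI flag) always wins over detection.
  if (override !== undefined) {
    return override;
  }
  const noTests = !signals.hasExistingTests;
  if (noTests && signals.existingFileCount <= MINIMAL_FILE_COUNT) {
    return "greenfield";
  }
  if (noTests && !signals.modifiesExistingFiles) {
    return "greenfield"; // new directory or module in an untested area
  }
  const smallChange =
    signals.filesChanged >= 1 && signals.filesChanged <= ADHOC_MAX_FILES;
  if (smallChange && signals.hasExistingTests) {
    return "adhoc";
  }
  return "brownfield";
}
```

Falling back to `brownfield` when no rule matches is a deliberate choice in this sketch: it is the context that applies impact analysis and a focused review, so a wrong guess errs toward more checking.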
+---
+## Integration Verification
+### Pipeline Implementor Checklist
+Implementors integrating with this standard MUST verify:
+| Check | Requirement | Verification Method |
+|-------|-------------|-------------------|
+| Config reading | Pipeline reads all toggles from configuration | Unit test: mock config → verify behavior |
+| Default handling | Unset toggles default to OFF | Unit test: empty config → manual mode |
+| Stage execution | All 6 stages execute in order | Integration test: full pipeline run |
+| Gate enforcement | Failed gates block next stage | Integration test: inject failure → verify block |
+| Context awareness | Pipeline adapts to context type | Integration test: each context → verify stages |
+| Override support | CLI flags override file config | Unit test: file + flag → flag wins |
+### Validation Rules
+1. **Configuration schema**: Validate against known toggle names; warn on unknown keys
+2. **Toggle type safety**: All toggles MUST be boolean; reject non-boolean values
+3. **Context enum**: Context MUST be one of: `greenfield`, `brownfield`, `adhoc`
+4. **Stage completeness**: Pipeline MUST report which stages were executed and which were skipped
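Validation rules 1 to 3 can be sketched as a single pass over the `pipeline` block, not part of the published file. The key names and the context values come from the standard; `validatePipelineConfig` and the `ValidationResult` shape are hypothetical.

```typescript
interface ValidationResult {
  errors: string[]; // violations that must stop the pipeline
  warnings: string[]; // unknown keys are reported, not rejected
}

const KNOWN_TOGGLES: string[] = [
  "autoSpecGeneration",
  "autoDerive",
  "autoTDD",
  "autoCheckin",
  "autoBatch",
];
const KNOWN_CONTEXTS: string[] = ["greenfield", "brownfield", "adhoc"];

function validatePipelineConfig(
  pipeline: Record<string, unknown>,
): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const key of Object.keys(pipeline)) {
    const value = pipeline[key];
    if (KNOWN_TOGGLES.includes(key)) {
      // Rule 2: toggles must be boolean.
      if (typeof value !== "boolean") {
        errors.push(`"${key}" must be boolean`);
      }
    } else if (key === "context") {
      // Rule 3: context must be one of the three known values.
      if (typeof value !== "string" || !KNOWN_CONTEXTS.includes(value)) {
        errors.push(`"context" must be one of: ${KNOWN_CONTEXTS.join(", ")}`);
      }
    } else {
      // Rule 1: warn on unknown keys instead of failing.
      warnings.push(`Unknown key "${key}"`);
    }
  }
  return { errors, warnings };
}
```

Separating errors from warnings lets a caller stop on type violations while still surfacing typos such as a misspelled toggle name.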
+---
+## Anti-Patterns
+| Anti-Pattern | Impact | Correct Approach |
+|--------------|--------|------------------|
+| Hardcoding pipeline behavior | Cannot adapt to project needs | Read configuration at runtime |
+| Ignoring context type | Wrong stages executed | Detect or read context setting |
+| Skipping quality gates | Broken code enters codebase | Enforce gates at each stage |
+| All-or-nothing automation | Users avoid pipeline entirely | Allow per-toggle granular control |
+| Silent stage skipping | Lost traceability | Log and report all skip decisions |
+---
+## Best Practices
+### Do's
+- ✅ Read configuration before executing any stage
+- ✅ Default all toggles to OFF (safe defaults)
+- ✅ Log which toggles are active at pipeline start
+- ✅ Report stage execution status (executed/skipped/failed)
+- ✅ Allow granular toggle control per stage
+- ✅ Validate configuration schema before use
+- ✅ Support configuration override via CLI
+### Don'ts
+- ❌ Assume toggle state without reading configuration
+- ❌ Skip stages silently without logging
+- ❌ Ignore quality gate failures
+- ❌ Hardcode pipeline behavior
+- ❌ Mix context strategies (e.g., greenfield spec + adhoc build)
+---
+## Related Standards
+- [Spec-Driven Development](spec-driven-development.md) — Spec stage workflow
+- [Forward Derivation Standards](forward-derivation-standards.md) — Derive stage implementation
+- [Check-in Standards](checkin-standards.md) — Checkin stage quality gates
+- [Change Batching Standards](change-batching-standards.md) — Batch merging before checkin
+- [Acceptance Criteria Traceability](acceptance-criteria-traceability.md) — AC tracking across stages
+---
+## Version History
+| Version | Date | Changes |
+|---------|------|---------|
+| 1.0.0 | 2026-03-18 | Initial version — configuration contract, 6-stage pipeline model, context classification |

package/bundled/core/systematic-debugging.md ADDED

@@ -0,0 +1,156 @@
+# Systematic Debugging Workflow
+> **Language**: English | [繁體中文](../locales/zh-TW/core/systematic-debugging.md)
+**Version**: 1.0.0
+**Last Updated**: 2026-03-20
+**Applicability**: All software projects using AI-assisted development
+**Scope**: universal
+**Inspired by**: [Superpowers](https://github.com/obra/superpowers) — systematic-debugging (MIT)
+---
+## Purpose
+Define a structured, four-phase debugging workflow that prevents the common anti-pattern of "guess and fix" cycles. This standard enforces root cause analysis before any fix attempt and includes the **3-Strike Rule** to catch architectural issues early.
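+This standard defines a structured four-phase debugging workflow that prevents the common "guess-and-fix" anti-pattern. It requires root cause analysis before any fix is attempted and includes the **3-Strike Rule** to surface architectural problems early.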
+---
+## Glossary
+| Term | Definition |
+|------|-----------|
+| Root Cause | The fundamental reason a defect exists, not merely the symptom |
+| 3-Strike Rule | After 3 consecutive failed fix attempts, suspect an architectural issue |
+| Backward Tracing | Tracing from error symptom back to the originating source |
+| Component Boundary | The interface between two modules or subsystems |
+---
+## Core Principle — The Iron Rule
+> **Never skip root cause analysis to jump directly to a fix.**
+Skipping root cause analysis and jumping straight to a fix is prohibited.
+If you catch yourself saying "quick fix", "just try", or "should work now" — **stop** and return to Phase 1.
+---
+## The Four Phases
+### Phase 1: Root Cause Investigation
+Analyze error messages and track recent changes to form an initial hypothesis.
+1. **Read the error carefully** — identify the exact location and type of failure
+2. **Track recent changes** — use `git log`, `git diff`, `git blame` to find what changed
+3. **Add diagnostics at component boundaries** — insert logging / breakpoints at interfaces between modules
+4. **Record observations** — document what you see and form an initial hypothesis
+In short: analyze the error message, trace recent changes, and form an initial hypothesis.
+### Phase 2: Pattern Analysis
+Compare the failing code against similar working implementations.
+1. **Search for similar successful implementations** in the codebase
+2. **Identify differences** between the failing and working code
+3. **Check for missing preconditions** — initialization, configuration, ordering
+In short: compare the failing code with similar working implementations and identify the differences.
+### Phase 3: Hypothesis Testing
+Validate your hypothesis with minimal, isolated changes.
+1. **Make the smallest possible change** to test your hypothesis
+2. **Change only one variable at a time** — never make multiple changes simultaneously
+3. **Record each attempt** — document the hypothesis, the change, and the result
+In short: validate the hypothesis with minimal changes, altering only one variable at a time.
+### Phase 4: Fix Implementation
+Only implement the fix after confirming the root cause.
+1. **Confirm the root cause** is validated by your hypothesis testing
+2. **Implement the fix** with proper error handling
+3. **Run the full test suite** — ensure no regressions
+4. **Verify the fix doesn't introduce new issues**
+In short: implement the fix only after the root cause is confirmed, then run the full test suite to verify it.
+---
+## The 3-Strike Rule
+> After **3 consecutive failed fix attempts**, stop guessing and question the architectural design.
+After three consecutive failed fixes, you must stop guessing and question the architectural design.
+When the 3-Strike Rule triggers:
+1. Step back from the specific bug
+2. Review the overall module/component design
+3. Consider whether the architecture supports the intended behavior
+4. Look for design flaws that make the bug class possible
+5. Consider refactoring before attempting another fix
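The trigger condition for the 3-Strike Rule amounts to a counter of consecutive failures that resets on success. A minimal sketch, not part of the published file; `createStrikeTracker` and the verdict labels are hypothetical names.

```typescript
// Hypothetical verdict labels; the standard only defines the trigger itself.
type Verdict = "resolved" | "keep-debugging" | "question-architecture";

function createStrikeTracker(limit: number = 3) {
  let consecutiveFailures = 0;
  return {
    // Record the outcome of one fix attempt and say what to do next.
    record(fixWorked: boolean): Verdict {
      if (fixWorked) {
        consecutiveFailures = 0; // only consecutive failures count
        return "resolved";
      }
      consecutiveFailures += 1;
      return consecutiveFailures >= limit
        ? "question-architecture"
        : "keep-debugging";
    },
  };
}
```

Usage: call `record(false)` after each failed fix; the third consecutive call returns `"question-architecture"`, the point at which the steps listed above apply.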
+---
+## Rules
+| ID | Trigger | Action | Priority |
+|----|---------|--------|----------|
+| SD-001 | 3 consecutive failed fixes | Stop guessing, question architectural design | Critical |
+| SD-002 | Phrases like "quick fix", "just try", "should work now" | Return to Phase 1 root cause analysis | High |
+| SD-003 | Multi-component interaction error | Add diagnostic observations at every component boundary | Medium |
+---
+## Anti-Patterns to Avoid
+| Anti-Pattern | Why It's Harmful |
+|-------------|-----------------|
+| Shotgun debugging | Random changes waste time and may introduce new bugs |
+| Copy-paste fix from Stack Overflow | Without understanding, the fix may not address the actual root cause |
+| Suppressing the error | Hiding symptoms doesn't fix the problem |
+| "It works on my machine" | Environment differences are clues, not dismissals |
+---
+## Examples
+### Good: Structured Debugging
+```
+Phase 1: TypeError at line 42 in parser.ts — reading property of undefined
+         git log shows parser.ts was modified 2 commits ago
+         Added console.log at module boundary → input is null when empty array passed
+Phase 2: Similar parser in legacy/ handles empty arrays with early return
+Phase 3: Added early return for empty array → test passes
+         Changed only one thing, recorded result
+Phase 4: Implemented fix, ran full test suite (47/47 pass), no regressions
+```
+### Bad: Guess-and-Fix
+```
+"Hmm, let me try adding a null check... nope.
+ Maybe wrapping in try-catch... still broken.
+ Let me just change the type to any... 🤦"
+```
+---
+## References
+- **Superpowers**: [systematic-debugging](https://github.com/obra/superpowers) (MIT)
+- **The Pragmatic Programmer**: Chapter on Debugging
+- **Debugging by Thinking**: A Multidisciplinary Approach (Robert Charles Metzger)

package/bundled/core/testing-standards.md CHANGED

@@ -2,8 +2,8 @@
 > **Language**: English | [繁體中文](../locales/zh-TW/core/testing-standards.md)
-**Version**: 3.0.0
-**Last Updated**: 2026-01-29
+**Version**: 3.1.0
+**Last Updated**: 2026-03-24
 **Applicability**: All software projects
 **Scope**: universal
 **Industry Standards**: ISTQB CTFL v4.0, ISO/IEC/IEEE 29119
@@ -38,40 +38,62 @@ This standard defines actionable testing rules and conventions for AI agents and
 ---
+## Coverage Targets (Primary Metric)
+> **Coverage is the primary metric for test quality.** Higher coverage means more code is protected by tests.
+> **Coverage rate is the main indicator of test quality.** A higher rate means more of the code is protected by tests.
+| Metric | Minimum | Standard | Ideal |
+|--------|---------|----------|-------|
+| **Line Coverage** | 80% | 90% | 95%+ |
+| **Branch Coverage** | 70% | 85% | 90%+ |
+| **Function Coverage** | 85% | 95% | 100% |
+| **Mutation Score** | — | 80% | 90%+ (critical code) |
+**Level definitions:**
+- **Minimum**: Baseline for all projects — below this is a quality risk
+- **Standard**: Target for most projects — achievable with disciplined testing
+- **Ideal**: Target for critical systems and core business logic — strive for 100% where practical
+> **Practical guidance**: 100% coverage is the ideal goal. In practice, diminishing returns appear around 95%+ for line coverage. Focus the last 5% on critical paths (authentication, payment, data integrity) rather than generated code or trivial getters/setters.
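A short sketch of how a measured percentage could be mapped onto the Minimum / Standard / Ideal levels from the table above, not part of the published file. The thresholds are copied from the table; `classifyCoverage` and the `"below minimum"` label are assumptions, and mutation score is left out because it has no Minimum value.

```typescript
type CoverageMetric = "line" | "branch" | "function";
type CoverageLevel = "below minimum" | "minimum" | "standard" | "ideal";

interface Thresholds {
  minimum: number;
  standard: number;
  ideal: number;
}

// Percentages from the Coverage Targets table.
const COVERAGE_TARGETS: Record<CoverageMetric, Thresholds> = {
  line: { minimum: 80, standard: 90, ideal: 95 },
  branch: { minimum: 70, standard: 85, ideal: 90 },
  function: { minimum: 85, standard: 95, ideal: 100 },
};

function classifyCoverage(
  metric: CoverageMetric,
  percent: number,
): CoverageLevel {
  const target = COVERAGE_TARGETS[metric];
  if (percent >= target.ideal) return "ideal";
  if (percent >= target.standard) return "standard";
  if (percent >= target.minimum) return "minimum";
  return "below minimum"; // flagged as a quality risk by the standard
}
```

For example, 92% line coverage lands in the Standard band, while 69.9% branch coverage falls below the Minimum.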
+---
+## Coverage vs Ratio — Key Distinction
+> **AI agents and developers: do NOT confuse these two concepts.**
+| Concept | Meaning | Importance |
+|---------|---------|------------|
+| **Coverage** | Percentage of code executed by tests | **Primary metric**: measures protection |
+| **Ratio** | Distribution of test count across levels | Reference only; affects execution time |
+**Coverage** answers: "How much of my code is tested?"
+**Ratio** answers: "What proportion of my tests are unit vs integration vs E2E?"
+---
 ## Testing Framework Selection
 | Framework | Levels | Best For |
 |-----------|--------|----------|
 | **ISTQB** | UT → IT/SIT → ST → AT/UAT | Enterprise, compliance, formal QA |
-| **Industry Pyramid** | UT (70%) → IT (20%) → ST (7%) → E2E (3%) | Agile, DevOps, CI/CD |
+| **Industry Pyramid** | UT → IT → ST → E2E | Agile, DevOps, CI/CD |
 ---
-## Testing Pyramid (Default Ratios)
+## Testing Pyramid (Test Count Ratio — Reference Only)
-```
-                ┌───────┐
-                │  E2E  │  ←  3% (Slow, expensive)
-               ─┴───────┴─
-              ┌───────────┐
-              │    ST     │  ←  7% (System Testing)
-             ─┴───────────┴─
-            ┌─────────────┐
-            │     IT      │  ← 20% (Integration Testing)
-           ─┴─────────────┴─
-          ┌─────────────────┐
-          │       UT        │  ← 70% (Unit Testing - Foundation)
-          └─────────────────┘
-```
+> **Note**: These are test **count** ratios (how many tests at each level), NOT coverage targets. See [Coverage Targets](#coverage-targets-primary-metric) above for coverage requirements.
-> **Note**: The 70/20/7/3 ratio is an empirical recommendation (Mike Cohn), not a mandatory standard.
+| Level | Test Count Ratio | Execution Time Target |
+|-------|-----------------|----------------------|
+| Unit Testing (UT) | ~70% of tests | < 10 min total |
+| Integration Testing (IT) | ~20% of tests | < 30 min total |
+| System Testing (ST) | ~7% of tests | < 1 hour total |
+| E2E Testing | ~3% of tests | < 2 hours total |
-| Level | Percentage | Execution Time Target |
-|-------|------------|----------------------|
-| Unit Testing (UT) | 70% | < 10 min total |
-| Integration Testing (IT) | 20% | < 30 min total |
-| System Testing (ST) | 7% | < 1 hour total |
-| E2E Testing | 3% | < 2 hours total |
+> The 70/20/7/3 ratio is an empirical recommendation (Mike Cohn). It optimizes for fast feedback — most tests run quickly (UT), fewer tests run slowly (E2E).
 ---
@@ -110,11 +132,7 @@ This standard defines actionable testing rules and conventions for AI agents and
 #### Coverage Thresholds
-| Metric | Minimum | Recommended |
-|--------|---------|-------------|
-| Line Coverage | 70% | 85% |
-| Branch Coverage | 60% | 80% |
-| Function Coverage | 80% | 90% |
+> See [Coverage Targets](#coverage-targets-primary-metric) at the top of this document for the authoritative coverage requirements.
 ---
@@ -429,15 +447,6 @@ tests/
 ---
-## Coverage Targets Summary
-| Metric | Minimum | Recommended |
-|--------|---------|-------------|
-| Line | 70% | 85% |
-| Branch | 60% | 80% |
-| Function | 80% | 90% |
-| Mutation Score | - | >= 80% (critical code) |
 ---
 ## Related Standards
@@ -456,6 +465,7 @@ tests/
 | Version | Date | Changes |
 |---------|------|---------|
+| 3.1.0 | 2026-03-24 | **Coverage-first restructure**: Elevated Coverage Targets to primary position, raised thresholds (Line 80/90/95+, Branch 70/85/90+, Function 85/95/100), added Coverage vs Ratio distinction, demoted Testing Pyramid to reference-only |
 | 3.0.0 | 2026-01-29 | **Major refactor**: Split into Rules (this file) and Theory (testing-theory.md). Reduced from 141KB/3185 lines to ~12KB/350 lines. All educational content moved to skills/testing-guide/testing-theory.md. Rules-only format optimized for AI agent consumption. |
 | 2.2.0 | 2026-01-20 | Added Test Documentation Structure section |
 | 2.1.0 | 2026-01-05 | Added SWEBOK v4.0 reference, Testing Fundamentals, Test-Related Measures |

package/bundled/core/verification-evidence.md ADDED

@@ -0,0 +1,172 @@
+# Verification Evidence Standard
+> **Language**: English | [繁體中文](../locales/zh-TW/core/verification-evidence.md)
+**Version**: 1.0.0
+**Last Updated**: 2026-03-20
+**Applicability**: All AI-assisted development workflows
+**Scope**: universal
+**Inspired by**: [Superpowers](https://github.com/obra/superpowers) — verification-before-completion (MIT)
+---
+## Purpose
+Establish an "Iron Law" that no task can be claimed as complete without verification evidence. This standard prevents AI agents from hallucinating success and ensures every completion claim is backed by executable proof.
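+The "Iron Law" established here: completion cannot be claimed without verification evidence. This prevents AI agents from fabricating successful results and ensures that every completion claim is backed by executable evidence.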
+---
+## Glossary
+| Term | Definition |
+|------|-----------|
+| Verification Evidence | A structured record of a verification command's execution and result |
+| Iron Law | The absolute rule: no evidence = no completion claim |
+| RED-GREEN Cycle | Proving a bug fix by showing the test fails before and passes after the fix |
+| Exit Code | The numeric return value of a command (0 = success, non-zero = failure) |
+---
+## The Iron Law
+> **No verification evidence = no completion claim.**
+Without verification evidence, completion cannot be claimed.
+An agent saying "it's done" is not evidence. The verification must be independently executable and produce observable output.
+An agent claiming "done" is not evidence. Verification must be independently executable and must produce observable output.
+---
+## Evidence Format
+Every verification must produce a structured evidence record:
+```json
+{
+  "command": "pnpm test -- --filter core",
+  "exit_code": 0,
+  "output": "Tests: 47 passed, 0 failed\nDuration: 3.2s",
+  "timestamp": "2026-03-20T14:30:00Z"
+}
+```
+### Required Fields
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `command` | string | Yes | The actual verification command executed |
+| `exit_code` | number | Yes | Command exit code (0 = success) |
+| `output` | string | Yes | Command output (truncated to 2000 chars, preserving key info) |
+| `timestamp` | string | Yes | Execution time in ISO 8601 format |
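The required fields can be expressed as a TypeScript interface with a runtime guard. This is a sketch, not part of the published file: the field names come from the table above, while `isVerificationEvidence` and the simplified ISO 8601 pattern (date, time, and an explicit `Z` or numeric offset) are assumptions.

```typescript
interface VerificationEvidence {
  command: string;
  exit_code: number;
  output: string;
  timestamp: string;
}

// Simplified ISO 8601 shape: 2026-03-20T14:30:00Z or with a +08:00 style offset.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isVerificationEvidence(value: unknown): value is VerificationEvidence {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  const command = record["command"];
  const exitCode = record["exit_code"];
  const output = record["output"];
  const timestamp = record["timestamp"];
  return (
    typeof command === "string" &&
    command.length > 0 &&
    typeof exitCode === "number" &&
    Number.isInteger(exitCode) &&
    typeof output === "string" &&
    typeof timestamp === "string" &&
    ISO_8601.test(timestamp)
  );
}
```

The guard checks shape only; whether the command was the right one and whether it passed are separate questions.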
+---
+## RED-GREEN Cycle
+For bug fixes and regression tests, verification requires showing both the failure and the fix:
+### Step 1: RED — Prove the Bug Exists
+Run the test **before** the fix to confirm it fails:
+```json
+{
+  "command": "pnpm test -- parser.test.ts",
+  "exit_code": 1,
+  "output": "FAIL: expected null to equal { name: 'test' }",
+  "timestamp": "2026-03-20T14:25:00Z"
+}
+```
+### Step 2: Apply the Fix
+Make the code change.
+### Step 3: GREEN — Prove the Fix Works
+Run the test **after** the fix to confirm it passes:
+```json
+{
+  "command": "pnpm test -- parser.test.ts",
+  "exit_code": 0,
+  "output": "PASS: 12 tests passed",
+  "timestamp": "2026-03-20T14:28:00Z"
+}
+```
+### Step 4: Record Both
+The evidence record must include both RED and GREEN phases.
+Regression tests must demonstrate the RED → GREEN cycle, and the evidence from both phases must be recorded.
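A sketch of how a RED/GREEN pair might be checked, not part of the published file. It assumes that both records must come from the same command and that RED must precede GREEN in time; the standard states neither requirement explicitly, and `isRedGreenProven` is a hypothetical name.

```typescript
interface Evidence {
  command: string;
  exit_code: number;
  output: string;
  timestamp: string; // ISO 8601
}

function isRedGreenProven(red: Evidence, green: Evidence): boolean {
  const sameCommand = red.command === green.command; // assumption
  const failedBefore = red.exit_code !== 0;
  const passedAfter = green.exit_code === 0;
  // Assumption: the failing run must have happened before the passing run.
  const inOrder = Date.parse(red.timestamp) < Date.parse(green.timestamp);
  return sameCommand && failedBefore && passedAfter && inOrder;
}
```

With the two sample records from Steps 1 and 3 the check succeeds; swapping them fails it, because the first record would then be a passing run.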
+---
+## Trust Rules
+| Rule | Description |
+|------|-------------|
+| Agent says "done" but no `verification_evidence` | Mark as **unverified** |
+| `verification_evidence` exists but `exit_code ≠ 0` | Mark as **verification failed** |
+| Multiple verification steps | **All** steps must pass |
+| Agent provides evidence for wrong command | Mark as **unverified** |
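The trust rules above can be read as a small decision function. A sketch, not part of the published file: the `"unverified"` and `"verification failed"` labels come from the table, while `"verified"`, the `expectedCommands` argument, and the rule ordering are assumptions.

```typescript
interface Evidence {
  command: string;
  exit_code: number;
  output: string;
  timestamp: string;
}

// "verified" is a hypothetical label for the case where every rule passes.
type TrustStatus = "verified" | "unverified" | "verification failed";

function classifyClaim(
  evidence: Evidence[] | undefined,
  expectedCommands: string[],
): TrustStatus {
  // No evidence at all: the claim is unverified.
  if (evidence === undefined || evidence.length === 0) {
    return "unverified";
  }
  // Evidence for the wrong command does not verify the expected one.
  for (const expected of expectedCommands) {
    if (!evidence.some((e) => e.command === expected)) {
      return "unverified";
    }
  }
  // All steps must pass; a single non-zero exit code fails verification.
  if (evidence.some((e) => e.exit_code !== 0)) {
    return "verification failed";
  }
  return "verified";
}
```

Checking for missing commands before exit codes is a choice made here so that a claim with no relevant evidence is reported as unverified rather than failed.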
+---
+## Rules
+| ID | Trigger | Action | Priority |
+|----|---------|--------|----------|
+| VE-001 | Agent reports success without verification_evidence | Downgrade to `done_with_concerns` | Critical |
+| VE-002 | `exit_code ≠ 0` in evidence | Mark verification failed, trigger fix loop | High |
+| VE-003 | Bug fix without RED-GREEN cycle | Request both RED and GREEN evidence | High |
+| VE-004 | Output exceeds 2000 chars | Truncate but preserve error messages and summary lines | Medium |
+---
+## Output Truncation Guidelines
+When verification output exceeds 2000 characters:
+1. **Keep**: Error messages, failure summaries, test counts, final status line
+2. **Remove**: Verbose progress output, stack traces for passing tests, duplicate lines
+3. **Mark truncation**: Add `[... truncated ...]` where content was removed
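One way to apply these guidelines is to keep only lines that look like key information and mark each removed run. A sketch, not part of the published file: the `KEY_LINE` pattern is a guess at what counts as "key info", duplicate-line removal is not implemented, and a hard cut is used as a last resort when the kept lines are still too long.

```typescript
const MAX_CHARS = 2000;
const MARKER = "[... truncated ...]";
// Hypothetical notion of "key info": errors, failures, and test counts.
const KEY_LINE = /error|fail|passed|tests?:/i;

function truncateOutput(output: string, maxChars: number = MAX_CHARS): string {
  if (output.length <= maxChars) {
    return output; // nothing to do
  }
  const lines = output.split("\n");
  const kept: string[] = [];
  let inRemovedRun = false;
  lines.forEach((line, index) => {
    const isFinalLine = index === lines.length - 1; // final status line
    if (isFinalLine || KEY_LINE.test(line)) {
      if (inRemovedRun) {
        kept.push(MARKER); // mark where content was removed
      }
      kept.push(line);
      inRemovedRun = false;
    } else {
      inRemovedRun = true;
    }
  });
  const result = kept.join("\n");
  if (result.length <= maxChars) {
    return result;
  }
  // Last resort: hard cut, still flagged as truncated.
  return result.slice(0, maxChars - MARKER.length) + MARKER;
}
```

A line-based filter keeps error messages readable; cutting at a fixed character offset alone could split a failure message in half.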
+---
+## Examples
+### Good: Complete Evidence
+```yaml
+verification_evidence:
+  - command: "pnpm test"
+    exit_code: 0
+    output: "Test Suites: 12 passed\nTests: 147 passed\nTime: 8.3s"
+    timestamp: "2026-03-20T14:30:00Z"
+  - command: "pnpm lint"
+    exit_code: 0
+    output: "No issues found"
+    timestamp: "2026-03-20T14:30:05Z"
+```
+### Bad: No Evidence
+```yaml
+status: success
+message: "I've completed the task and everything should work now."
+# ❌ No verification_evidence — violates Iron Law
+```
+---
+## References
+- **Superpowers**: [verification-before-completion](https://github.com/obra/superpowers) (MIT)
+- **Test-Driven Development**: RED-GREEN-REFACTOR cycle
+- **Anti-Hallucination**: Complementary standard for preventing fabricated claims