npm - universal-dev-standards - Versions diffs - 5.4.0 → 5.6.0 - Mend



Files changed (138)

package/bundled/core/mock-boundary.md ADDED

@@ -0,0 +1,100 @@
+# Mock Boundary Standards
+**Version**: 1.0.0
+**Last Updated**: 2026-05-04
+**Applicability**: All software projects with unit and integration tests
+**Scope**: universal
+**Industry Standards**: ISTQB Foundation (Test Doubles), xUnit Test Patterns (Gerard Meszaros)
+**References**: "Working Effectively with Legacy Code" (Feathers), "Growing Object-Oriented Software, Guided by Tests" (Freeman & Pryce)
+[English](.) | [繁體中文](../locales/zh-TW/core/mock-boundary.md)
+---
+## Purpose
+This document defines rules for what can and cannot be mocked in tests. Its goal is to prevent **hollow tests** — tests that always pass but fail to detect real bugs because they replace the system's logic with stubs.
+---
+## The Hollow Test Problem
+A hollow test mocks so much of the system that the test becomes a specification of mock wiring rather than system behavior. The classic symptom: you can delete the implementation file and the test still passes.
+**Real example (VibeOps SPEC-002.test.ts)**:
+```typescript
+vi.mock('../../src/runner/agent-runner.js')      // Core logic replaced
+vi.mock('../../src/runner/guardian-hooks.js')     // Core logic replaced
+vi.mock('../../src/runner/prototyper.js')         // Core logic replaced
+vi.mock('../../src/runner/iteration-report.js')   // Core logic replaced
+vi.mock('../../src/memory/memory-store.js')       // Core logic replaced
+vi.mock('node:fs/promises', ...)                  // I/O replaced
+// All assertions verify mock call counts — not actual outputs.
+// runPipeline() touches zero real code.
+```
+---
+## What You CAN Mock
+| Category | Examples | Reason |
+|----------|----------|--------|
+| External HTTP services | LLM APIs, payment gateways, email services | Prevents flaky tests; controls response scenarios |
+| Time functions | `Date.now()`, `new Date()`, `setTimeout` | Makes tests deterministic |
+| Environment variables | `process.env.NODE_ENV`, `process.env.LICENSE_KEY` | Enables config variation |
+| File system (unit tests only) | `fs.readFile`, `fs.writeFile` | Avoids I/O in fast unit tests |
+| Cross-module boundaries (with IT counterpart) | Other modules' public APIs | Isolates unit under test |
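A minimal sketch of the allowed side of the boundary, assuming the unit under test receives its external dependencies as parameters (dependency injection rather than `vi.mock`; the same boundary applies either way). `summarizeTicket`, `LlmClient`, and `Clock` are hypothetical names. Only the LLM service and the clock are faked; the module's own trimming and truncation logic runs for real, and the check is on the returned value, not on call counts.

```typescript
// Hypothetical external boundary: an LLM HTTP client (allowed to fake).
interface LlmClient {
  complete(prompt: string): Promise<string>
}

// Hypothetical time boundary (allowed to fake, for determinism).
interface Clock {
  now(): Date
}

interface TicketSummary {
  title: string
  summary: string
  createdAt: string
  truncated: boolean
}

// Own core logic under test: this is what must NOT be mocked.
async function summarizeTicket(
  title: string,
  body: string,
  llm: LlmClient,
  clock: Clock,
  maxChars = 80,
): Promise<TicketSummary> {
  const raw = (await llm.complete(`Summarize: ${body}`)).trim()
  const truncated = raw.length > maxChars
  return {
    title: title.trim(),
    // Keep the result at exactly maxChars characters when truncating.
    summary: truncated ? raw.slice(0, maxChars - 1) + "…" : raw,
    createdAt: clock.now().toISOString(),
    truncated,
  }
}

// Test-side fakes replace only the external service and the clock.
const fakeLlm: LlmClient = {
  complete: async () => "  Login fails after password reset.  ",
}
const fixedClock: Clock = { now: () => new Date("2026-05-04T00:00:00Z") }

// Output-value check against the real logic, not a call-count check.
summarizeTicket(" Bug 42 ", "ticket body", fakeLlm, fixedClock).then((result) => {
  console.log(result.summary) // prints: Login fails after password reset.
})
```

Deleting `summarizeTicket`'s body would make this check fail, which is exactly the property a hollow test lacks.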
+---
+## What You CANNOT Mock
+| Category | Example Violation | Why Forbidden |
+|----------|-------------------|---------------|
+| Own module's core logic | `vi.mock('./pipeline-runner.js')` in pipeline-runner tests | Makes the test a no-op |
+| Database in IT/flow/E2E tests | `vi.mock('./db/client.js')` in integration tests | Hides query bugs, schema issues |
+| HTTP framework internals | `vi.mock('express')` | Real routing may be broken |
+| Security controls | Always-pass auth middleware stub | Security regressions invisible |
+---
+## Hollow Test Detection
+Before submitting a test file, check:
+1. **Mock count ≥ import count** → Review: at least one assertion must verify actual output
+2. **All assertions are `.toHaveBeenCalled()` variants** → Add output-value assertions
+3. **Mock path matches test subject directory** → Self-referential mock; remove it
+4. **More mock setup lines than assertion lines** → Likely hollow
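The four checks can be approximated with a text heuristic run over a test file before review. This is a rough sketch, not an existing tool: `detectHollowSignals` and `hollowWarnings` are hypothetical names, the code uses regular expressions rather than a TypeScript parser, and it simplifies check 3 to "a mocked path has the same module name as the test subject".

```typescript
interface HollowSignals {
  mockCount: number
  importCount: number
  onlyCallAssertions: boolean
  selfMock: boolean
  setupHeavy: boolean
}

// "../../src/runner/agent-runner.js" -> "agent-runner"; "src/cart.test.ts" -> "cart"
function moduleName(path: string): string {
  const base = path.split(/[\\/]/).pop() ?? path
  return base.replace(/(\.(test|spec))?\.[cm]?[jt]sx?$/, "")
}

function detectHollowSignals(source: string, subjectPath: string): HollowSignals {
  const lines = source.split("\n")
  const mockPaths = Array.from(source.matchAll(/vi\.mock\(\s*['"]([^'"]+)['"]/g), (m) => m[1] ?? "")
  // Matcher names such as toBe, toEqual, toHaveBeenCalledTimes.
  const matchers = Array.from(source.matchAll(/\.(to[A-Z]\w*)\(/g), (m) => m[1] ?? "")
  const setupLines = lines.filter((l) => /vi\.(mock|fn|spyOn)\(|\.mock\w*Value\w*\(/.test(l)).length
  const assertionLines = lines.filter((l) => l.includes("expect(")).length
  const subject = moduleName(subjectPath)
  return {
    mockCount: mockPaths.length,
    importCount: lines.filter((l) => /^\s*import\s/.test(l)).length,
    // Check 2: every matcher is a call-count variant (toHaveBeenCalled*, toBeCalled*, ...).
    onlyCallAssertions: matchers.length > 0 && matchers.every((m) => m.includes("Called")),
    // Check 3 (simplified): a mocked path names the same module as the test subject.
    selfMock: mockPaths.some((p) => moduleName(p) === subject),
    // Check 4: more mock setup lines than assertion lines.
    setupHeavy: setupLines > assertionLines,
  }
}

function hollowWarnings(s: HollowSignals): string[] {
  const warnings: string[] = []
  if (s.mockCount > 0 && s.mockCount >= s.importCount) warnings.push("mock count >= import count: add an output-value assertion")
  if (s.onlyCallAssertions) warnings.push("all assertions are call-count variants")
  if (s.selfMock) warnings.push("self-referential mock of the test subject")
  if (s.setupHeavy) warnings.push("more mock setup lines than assertion lines")
  return warnings
}

// Usage: modeled on the hollow example in this document, plus a self-mock.
const hollow = [
  "import { expect, it, vi } from 'vitest'",
  "import { runPipeline } from '../../src/runner/pipeline-runner.js'",
  "vi.mock('../../src/runner/agent-runner.js')",
  "vi.mock('../../src/runner/pipeline-runner.js')",
  "vi.mock('../../src/memory/memory-store.js')",
  "it('runs', async () => {",
  "  await runPipeline()",
  "  expect(runAgent).toHaveBeenCalledTimes(1)",
  "})",
].join("\n")
console.log(hollowWarnings(detectHollowSignals(hollow, "src/runner/pipeline-runner.ts")).length) // 4
```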
+---
+## Anti-Patterns
+- **Total Mock Isolation**: Every import mocked; only mock interactions asserted
+- **Mock the World**: External + internal + DB + FS all mocked in one test
+- **Orphan Mock**: Cross-module mock with no integration test counterpart
+- **Security Bypass Mock**: Auth/permission logic replaced with pass-through stub
+- **Database Mock Cascade**: DB returns hardcoded data, hiding real query errors
+---
+## Rules Summary
+| Rule | Trigger | Action |
+|------|---------|--------|
+| No self-mock | Test file mocks its own module | Remove mock; let real code run |
+| Real DB in IT/flow | Writing IT or flow test | Use in-memory SQLite or test schema |
+| IT counterpart | Mocking cross-module boundary | Ensure corresponding IT exists |
+| No security mock | Test involves auth/permissions | Use real test user + real token |
+| Hollow review | Mock count ≥ import count | Add output-value assertion |
+---
+## Relationship to Other Standards
+- **testing**: Mock boundary rules apply to all test levels in the testing pyramid
+- **test-completeness-dimensions**: Dimension 8 (AI Test Quality) references these rules
+- **flow-based-testing**: Flow tests must follow mock boundary rules

package/bundled/core/mutation-testing.md ADDED

@@ -0,0 +1,97 @@
+# Mutation Testing Standards
+**Version**: 1.0.0
+**Last Updated**: 2026-05-04
+**Applicability**: All software projects with unit/integration tests
+**Scope**: universal
+**Industry Standards**: ISTQB Foundation Syllabus (test effectiveness metrics)
+**References**: "Introduction to Software Testing" (Ammann & Offutt), Stryker Mutator docs
+[English](.) | [繁體中文](../locales/zh-TW/core/mutation-testing.md)
+---
+## Purpose
+Mutation testing evaluates test suite effectiveness by injecting artificial bugs and checking whether tests detect them. It answers the question that line coverage cannot: **"Do my tests actually verify correct behavior?"**
+---
+## Key Concept: Mutation Score
+```
+Mutation Score = Killed Mutants / (Killed + Survived) × 100%
+```
+- **Killed**: Test suite detected the artificial bug (test failed) ✅
+- **Survived**: Test suite missed the bug (tests still pass) ❌
+A test with `expect(x).toBeDefined()` can achieve 100% line coverage yet let many mutants survive, because `x` being `null`, `0`, or `"wrong"` still satisfies `.toBeDefined()` (it rejects only `undefined`).
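The definitions can be demonstrated with a tiny hand-rolled mutation run. This is a toy sketch, not how Stryker works internally: the four mutants are written by hand to mimic typical operator and return-value mutations, and each "suite" is a plain function that reports whether all of its checks pass.

```typescript
type Impl = (a: number, b: number) => number

// Original code under test.
const add: Impl = (a, b) => a + b

// Hand-written mutants mimicking common operator and return-value mutations.
const mutants: Record<string, Impl> = {
  "a + b -> a - b": (a, b) => a - b,
  "a + b -> a * b": (a, b) => a * b,
  "return a": (a) => a,
  "return 0": () => 0,
}

// A suite returns true when every check passes against the given implementation.
type Suite = (impl: Impl) => boolean

// Weak suite: the equivalent of expect(add(2, 3)).toBeDefined().
const weakSuite: Suite = (impl) => impl(2, 3) !== undefined

// Strong suite: asserts exact output values.
const strongSuite: Suite = (impl) => impl(2, 3) === 5 && impl(0, 7) === 7

function mutationScore(suite: Suite) {
  if (!suite(add)) throw new Error("the suite must pass on the original code")
  let killed = 0
  let survived = 0
  for (const mutant of Object.values(mutants)) {
    if (suite(mutant)) survived++ // tests still pass: the bug went unnoticed
    else killed++ // at least one check failed: the mutant is killed
  }
  return { killed, survived, score: (killed / (killed + survived)) * 100 }
}

console.log(mutationScore(weakSuite)) // { killed: 0, survived: 4, score: 0 }
console.log(mutationScore(strongSuite)) // { killed: 4, survived: 0, score: 100 }
```

Both suites execute every line of `add`, so line coverage is identical; only the mutation score separates them.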
+---
+## Tools
+| Language | Tool | Command |
+|----------|------|---------|
+| TypeScript/JS | Stryker Mutator | `npx stryker run` |
+| Python | mutmut | `mutmut run` |
+| Java | PIT (Pitest) | `mvn pitest:mutationCoverage` |
+---
+## Thresholds
+| Module Type | Minimum Score | Enforcement |
+|-------------|--------------|-------------|
+| Auth/License/Payment/Security | 80% | Block release |
+| Standard business logic | 70% | Warning; resolve before next release |
+| AI-generated tests | 50% | Required; reject if below |
+| Overall project | 60% | Track trend; alert on regression |
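One way to encode the table as a release-gate check, as a sketch: the `ModuleType` keys and verdict strings are hypothetical, and reading "alert on regression" as "the overall score dropped versus the previous run" is an interpretation of the Overall row.

```typescript
type ModuleType = "critical" | "business" | "ai-generated-tests" | "overall"
type Verdict = "pass" | "block-release" | "warn" | "reject" | "alert"

// Minimum scores and enforcement actions from the Thresholds table.
const THRESHOLDS: Record<ModuleType, { min: number; onFail: Verdict }> = {
  critical: { min: 80, onFail: "block-release" }, // Auth/License/Payment/Security
  business: { min: 70, onFail: "warn" }, // resolve before next release
  "ai-generated-tests": { min: 50, onFail: "reject" },
  overall: { min: 60, onFail: "alert" },
}

function mutationGate(type: ModuleType, score: number, previousScore?: number): Verdict {
  const { min, onFail } = THRESHOLDS[type]
  if (score < min) return onFail
  // The overall score is tracked as a trend: any drop from the last run raises an alert.
  if (type === "overall" && previousScore !== undefined && score < previousScore) return "alert"
  return "pass"
}

console.log(mutationGate("critical", 78.5)) // block-release
console.log(mutationGate("overall", 72, 75)) // alert
```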
+---
+## When to Run
+| Trigger | Command | Enforcement |
+|---------|---------|-------------|
+| Pre-release gate | `npm run test:mutation` | ≥ 60% overall |
+| Critical module change | `npx stryker run --mutate 'src/auth/**'` | ≥ 80% |
+| AI-generated test review | `npx stryker run` | ≥ 50% |
+**Never** add mutation testing to commit hooks: it is too slow (typically 10 to 60 minutes per run).
+---
+## Stryker Quick Start (TypeScript + Vitest)
+```bash
+npm install --save-dev @stryker-mutator/core @stryker-mutator/vitest-runner
+```
+`stryker.config.json` (plain JSON, so no comments inside the file):
+```json
+{
+  "testRunner": "vitest",
+  "coverageAnalysis": "perTest",
+  "mutate": ["src/license/**/*.ts", "!src/**/*.test.ts"],
+  "thresholds": { "high": 80, "low": 60, "break": 50 }
+}
+```
+---
+## Anti-Patterns
+- Treating line coverage as a proxy for test effectiveness
+- Adding mutation testing to CI for every PR (too slow)
+- Accepting AI-generated tests without mutation score validation
+- Trying to kill surviving mutants with weak `toBeDefined()` assertions instead of asserting exact output values
+---
+## Relationship to Other Standards
+- `test-completeness-dimensions`: Dimension 8 (AI Test Quality) references mutation score
+- `mock-boundary`: Hollow tests survive many mutations; mock boundary rules prevent hollow tests
+- `testing`: Mutation testing is the quality gate on top of the test pyramid

package/bundled/core/performance-standards.md CHANGED

@@ -323,6 +323,70 @@ Analogous to the SRE Error Budget concept, a Performance Budget defines the tole
 ---
+## Per-Release Capacity Sign-off
+This section defines the **capacity gate** that must be satisfied before production release (Dimension 10 in `release-readiness-gate.md`, Tier-3).
+### Capacity Forecast
+Before each release candidate, produce a capacity forecast based on:
+1. **Baseline**: 90-day rolling average of peak TPS and resource utilization (CPU, memory, DB connections, storage growth rate)
+2. **Release impact estimate**: expected traffic delta from new features (e.g., +15% TPS from new notification flow)
+3. **Seasonal adjustment**: any known traffic spikes within the next 30 days (marketing campaigns, seasonal peaks)
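A sketch of turning the three inputs into a projected peak, under two assumptions the standard does not state: the release and seasonal deltas compound multiplicatively, and resource utilization scales linearly with TPS. The field names are hypothetical, and real capacity models often need per-resource curves, so treat this as a first-order estimate.

```typescript
interface CapacityInputs {
  baselinePeakTps: number // 90-day rolling average of peak TPS
  releaseDeltaPct: number // e.g. 15 for "+15% TPS from new notification flow"
  seasonalPct: number // known spike within the next 30 days, 0 if none
}

// Assumption: release impact and seasonal spike compound multiplicatively.
function projectedPeakTps(c: CapacityInputs): number {
  return c.baselinePeakTps * (1 + c.releaseDeltaPct / 100) * (1 + c.seasonalPct / 100)
}

// Assumption: utilization scales linearly with TPS; headroom = 100% - utilization.
function projectedHeadroomPct(baselineUtilizationPct: number, c: CapacityInputs): number {
  const utilization = (baselineUtilizationPct * projectedPeakTps(c)) / c.baselinePeakTps
  return 100 - utilization
}

const inputs: CapacityInputs = { baselinePeakTps: 1000, releaseDeltaPct: 15, seasonalPct: 20 }
console.log(projectedPeakTps(inputs).toFixed(0)) // 1380
console.log(projectedHeadroomPct(50, inputs).toFixed(1)) // 31.0 (CPU at 50% baseline utilization)
```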
+### Headroom Thresholds
+| Metric | Target (PASS) | Warn Band | Fail Threshold |
+|--------|--------------|-----------|----------------|
+| CPU headroom at projected peak | ≥ 30% | 20–30% | < 20% |
+| Memory headroom | ≥ 25% | 15–25% | < 15% |
+| DB connection pool headroom | ≥ 40% | 25–40% | < 25% |
+| p99 latency vs baseline | ≤ +5% | +5% to +10% | > +10% regression |
+| Error rate at peak load | < 0.1% | 0.1–0.5% | > 0.5% |
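The bands can be encoded directly. A minimal sketch, assuming headroom is expressed as a percentage (100% minus projected utilization, or for the DB pool, the unused share of the pool) and that each band edge is inclusive exactly as written in the table; the function names are hypothetical.

```typescript
type Status = "PASS" | "WARN" | "FAIL"

// Headroom bands (higher is better): PASS at or above `pass`, WARN down to `fail`, else FAIL.
const HEADROOM_BANDS = {
  cpu: { pass: 30, fail: 20 },
  memory: { pass: 25, fail: 15 },
  dbPool: { pass: 40, fail: 25 },
} as const

function classifyHeadroom(metric: keyof typeof HEADROOM_BANDS, headroomPct: number): Status {
  const { pass, fail } = HEADROOM_BANDS[metric]
  if (headroomPct >= pass) return "PASS"
  if (headroomPct >= fail) return "WARN"
  return "FAIL"
}

// p99 latency regression vs baseline (lower is better): <= +5% PASS, <= +10% WARN, else FAIL.
function classifyP99(baselineMs: number, projectedMs: number): Status {
  const deltaPct = ((projectedMs - baselineMs) / baselineMs) * 100
  if (deltaPct <= 5) return "PASS"
  if (deltaPct <= 10) return "WARN"
  return "FAIL"
}

// Error rate at peak load: < 0.1% PASS, 0.1-0.5% WARN, > 0.5% FAIL.
function classifyErrorRate(errorPct: number): Status {
  if (errorPct < 0.1) return "PASS"
  if (errorPct <= 0.5) return "WARN"
  return "FAIL"
}

// The gate as a whole takes the worst individual status.
function overallStatus(statuses: Status[]): Status {
  if (statuses.includes("FAIL")) return "FAIL"
  return statuses.includes("WARN") ? "WARN" : "PASS"
}

console.log(
  overallStatus([
    classifyHeadroom("cpu", 31),
    classifyHeadroom("memory", 22),
    classifyHeadroom("dbPool", 45),
    classifyP99(180, 185),
    classifyErrorRate(0.05),
  ]),
) // WARN (memory headroom falls in the 15-25% band)
```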
+### Load Test Requirement
+Run the load test scenario defined in the Performance Testing sections above (Soak + Spike test minimum) before finalizing the capacity sign-off:
+```bash
+# Example: k6 capacity verification run
+k6 run --vus 500 --duration 20m scripts/perf/soak-test.js
+# Pass criterion: headroom metrics above, p99 within budget
+```
+### Sign-off Evidence
+The capacity gate requires **two named sign-offs** — both Engineering Lead and SRE Lead:
+```markdown
+## Capacity Sign-off — <version>
+**Projection date**: YYYY-MM-DD
+**Baseline period**: last 90 days
+| Metric | Baseline peak | Projected peak | Headroom | Status |
+|--------|-------------|---------------|----------|--------|
+| CPU | [X]% | [Y]% | [Z]% | PASS/WARN/FAIL |
+| Memory | [X]% | [Y]% | [Z]% | PASS/WARN/FAIL |
+| DB pool | [X]% | [Y]% | [Z]% | PASS/WARN/FAIL |
+| p99 latency | [X]ms | [Y]ms | [±Z]% | PASS/WARN/FAIL |
+**Load test artifact**: [link to load test report]
+**Eng Lead sign-off**: _______________ Date: __________
+**SRE Lead sign-off**: _______________ Date: __________
+```
+### When the Tier-3 Capacity Gate Is N/A
+The capacity sign-off is `N/A` (with documented rationale) when:
+- Project has < 100 DAU and no significant traffic growth expected
+- Internal tooling with fixed user count
+- Static content / documentation site
+---
 ## Related Standards
 - [Testing Standards](testing-standards.md) - Performance testing integration
@@ -330,6 +394,7 @@ Analogous to the SRE Error Budget concept, a Performance Budget defines the tole
 - [Logging Standards](logging-standards.md) - Performance logging
 - [Code Review Checklist](code-review-checklist.md) - Performance review
 - [Deployment Standards](deployment-standards.md) - Performance validation pre-deployment
+- [Release Readiness Gate](release-readiness-gate.md) - Dimension 1 (load) and Dimension 10 (capacity)
 ---

package/bundled/core/policy-as-code-testing.md ADDED

@@ -0,0 +1,188 @@
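+# Policy as Code Testing Standards
+> Standard ID: `policy-as-code-testing`
+> Version: v1.0.0
+> Last Updated: 2026-05-05
+---
+## Why Test Policy as Code?
+OPA (Open Policy Agent) Rego policies control whether an AI agent may execute production operations. **An untested policy is a silent security hole.**
+Risks specific to Policy as Code:
+1. **Boundary conditions are hard to reason about**: does the combination `reversible: false` + `target_env: "prod"` trigger the rule?
+2. **Type errors surface only at evaluation time**: `array.concat()` applied to a set type fails silently
+3. **Fail-open risk**: if a failed evaluation returns `allow: true`, an attacker can deliberately trigger an undefined path
+4. **Policy change regressions**: adding a single rule can unintentionally allow a case that was previously blocked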
+## 1. OPA Testing Framework
+### Test Rule Format
+```rego
+# File naming: <policy_module>_test.rego
+# Package: <policy_package>_test
+package vibeops.guardian.forbidden_patterns_test
+import future.keywords.if
+# Positive test: the rule should fire
+test_drop_database_is_forbidden if {
+    data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern with input as {
+        "plan": [{"command_type": "sql", "command": "DROP DATABASE prod_main", "reversible": false}]
+    }
+}
+# Negative test: the rule should NOT fire
+test_safe_select_is_not_forbidden if {
+    not data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern with input as {
+        "plan": [{"command_type": "sql", "command": "SELECT * FROM users LIMIT 10", "reversible": true}]
+    }
+}
+```
+### Running the Tests
+```bash
+# With OPA installed locally
+opa test src/guardian/policies/ -v
+# Via Docker (no local OPA install required)
+docker run --rm \
+  -v "$(pwd)/src/guardian/policies:/policies:ro" \
+  openpolicyagent/opa:latest-static \
+  test /policies -v
+```
+---
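+## 2. Minimum Test Requirements per Policy Module
+| Type | Minimum cases | Description |
+|------|---------|------|
+| ALLOW cases | 2 | Normal operations that should pass |
+| DENY cases | 3 | Dangerous operations that should be blocked |
+| Boundary cases | 1 | Boundary conditions (e.g. reversible=true vs. false) |
+| Integration (main policy) | 2 | The allow and deny paths through main.rego |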
+---
+## 3. Policy Module Design Principles
+### 3.1 Fail-Closed by Default
+```rego
+# main.rego must include this default
+default allow = false
+allow if {
+    not data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern
+    not data.vibeops.guardian.env_policy.prod_violation
+    not data.vibeops.guardian.logic_constraints.has_logic_violation
+}
+```
+Any `undefined` evaluation result must yield DENY, never ALLOW.
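The fail-closed rule should also hold on the caller's side, where the agent runtime reads OPA's decision. A minimal sketch, assuming the decision is fetched from OPA's Data API, which omits `result` from the response when the queried rule is undefined; `evaluate` is a hypothetical callback standing in for that HTTP call.

```typescript
type Decision = "ALLOW" | "DENY"

// OPA Data API response shape: `result` is omitted when the queried rule is undefined.
interface OpaDataResponse {
  result?: unknown
}

// Fail-closed reading: only a literal boolean `true` allows; undefined, null, "true", {} all deny.
function decisionFromOpa(response: OpaDataResponse | null | undefined): Decision {
  return response?.result === true ? "ALLOW" : "DENY"
}

// Transport errors, timeouts, and malformed responses also deny instead of failing open.
async function evaluateFailClosed(evaluate: () => Promise<OpaDataResponse>): Promise<Decision> {
  try {
    return decisionFromOpa(await evaluate())
  } catch {
    return "DENY"
  }
}

// Usage: an unreachable OPA sidecar must not let the operation through.
evaluateFailClosed(async () => {
  throw new Error("connect ECONNREFUSED")
}).then((decision) => console.log(decision)) // DENY
```

Comparing with `=== true` rather than truthiness keeps a misconfigured rule that returns a string or object from being read as ALLOW.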
+### 3.2 Use Sets (Not array.concat)
+OPA ≥ 0.40 has a type system that strictly distinguishes arrays from sets. A `violations` partial rule has set type, so **`array.concat()` must not be used on it**.
+```rego
+# ✅ Correct: collect violations with partial set rules
+deny_reasons[r] if { r := data.vibeops.guardian.forbidden_patterns.violations[_] }
+deny_reasons[r] if { r := data.vibeops.guardian.env_policy.violations[_] }
+deny_reasons[r] if { r := data.vibeops.guardian.logic_constraints.violations[_] }
+# ❌ Wrong: array.concat on a set → rego_type_error
+# deny_reasons := array.concat(violations1, violations2)
+```
+### 3.3 Never Parse Free-Text Fields
+Policy decisions **must not depend on** user-controllable text fields such as `intent`, `description`, or `annotation`.
+```rego
+# ❌ Dangerous: parsing the intent field opens a Prompt Injection attack surface (OWASP LLM01)
+allow if { contains(input.intent, "EMERGENCY") }
+# ✅ Safe: use only structured fields
+# (on OPA < 1.0, `every` requires `import future.keywords.every`)
+allow if {
+    input.target_env != "prod"
+    every step in input.plan { step.reversible == true }
+}
+```
+### 3.4 One Module per Concern
+```
+policies/
+  forbidden_patterns.rego      ← forbidden command patterns
+  forbidden_patterns_test.rego
+  env_policy.rego              ← environment-specific rules (prod protection)
+  env_policy_test.rego
+  logic_constraints.rego       ← logical consistency (use restart instead of stop+start)
+  logic_constraints_test.rego
+  risk_gate.rego               ← risk score threshold
+  risk_gate_test.rego
+  main.rego                    ← combines all modules, fail-closed
+  main_test.rego               ← integration tests
+```
+---
+## 4. CI Integration
+### GitHub Actions Step
+```yaml
+- name: Test OPA Rego Policies
+  run: |
+    docker run --rm \
+      -v "${{ github.workspace }}/src/guardian/policies:/policies:ro" \
+      openpolicyagent/opa:latest-static \
+      test /policies -v
+```
+### npm script
+```json
+{
+  "scripts": {
+    "test:policy": "docker run --rm -v \"$(pwd)/src/guardian/policies:/policies:ro\" openpolicyagent/opa:latest-static test /policies -v"
+  }
+}
+```
+---
+## 5. Quality Gates
+| Gate | Threshold | Enforcement |
+|------|------|---------|
+| OPA test pass rate (CI) | 100% (every test_ rule passes) | Block merge |
+| Root policy is fail-closed | `default allow = false` is present | Block merge |
+| Every policy module has a _test.rego | Each .rego has a matching test file | Advisory |
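The two file-level gates can be checked without OPA itself; the pass-rate gate still needs `opa test`. A sketch of a hypothetical CI helper, assuming a flat policies directory like the layout in section 3.4 and a root policy named `main.rego`; the regex accepts both `=` and `:=` in the default rule and ignores commented-out lines. Setting `process.exitCode = 1` when `failClosed` is false would turn it into a blocking step.

```typescript
import { mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

interface PolicyGateReport {
  missingTests: string[] // advisory gate: modules without a matching _test.rego
  failClosed: boolean // blocking gate: root policy declares `default allow = false`
}

function checkPolicyDir(dir: string, rootPolicy = "main.rego"): PolicyGateReport {
  const files = new Set(readdirSync(dir).filter((f) => f.endsWith(".rego")))
  const missingTests = Array.from(files)
    .filter((f) => !f.endsWith("_test.rego"))
    .filter((f) => !files.has(f.replace(/\.rego$/, "_test.rego")))
    .sort()
  const failClosed =
    files.has(rootPolicy) &&
    /^\s*default\s+allow\s*:?=\s*false\b/m.test(readFileSync(join(dir, rootPolicy), "utf8"))
  return { missingTests, failClosed }
}

// Usage sketch: a throwaway directory where one module lacks tests.
const demo = mkdtempSync(join(tmpdir(), "policies-"))
writeFileSync(join(demo, "main.rego"), "package demo\ndefault allow = false\n")
writeFileSync(join(demo, "main_test.rego"), "package demo_test\n")
writeFileSync(join(demo, "risk_gate.rego"), "package risk_gate\n")
console.log(checkPolicyDir(demo)) // { missingTests: [ 'risk_gate.rego' ], failClosed: true }
```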
+---
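+## 6. Anti-patterns
+| Anti-pattern | Problem | Correct approach |
+|--------|------|---------|
+| `array.concat()` on violations (set type) | OPA type error | Use a partial set rule |
+| Root policy lacks `default allow = false` | Fail-open vulnerability | Add the default |
+| Intent field used in security decisions | Prompt Injection attack surface | Use structured fields only |
+| Only DENY cases tested (no ALLOW tests) | Over-restriction goes undetected | Add ALLOW cases |
+| _test.rego run only locally, not in CI | Policy changes have no safety net | Add an `opa test` step to CI |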
+---
+## References
+- [OPA Testing Guide](https://www.openpolicyagent.org/docs/latest/policy-testing/)
+- NIST SP 800-204B: Attribute-based Access Control for Microservices-based Applications Using a Service Mesh
+- [UDS `secure-op.ai.yaml`](./secure-op.md): the six pillars of secure AI agent operations
+- [UDS `adversarial-test.ai.yaml`](./adversarial-test.md): adversarial testing (OWASP LLM01)
+- [UDS `container-security.ai.yaml`](./container-security.md): container security (OPA sidecar deployment)

package/bundled/core/prompt-regression.md ADDED

@@ -0,0 +1,72 @@
+# Prompt Regression Standards
+## Overview
+AI agent prompts are code. Unintended changes silently degrade agent behaviour without triggering type errors or unit test failures. Prompt regression tests use golden SHA-256 checksums to detect any modification, forcing developers to explicitly acknowledge and document prompt changes.
+## Why Checksums
+- Diffs alone don't block CI — checksums do
+- Prompts are large markdown files; minor edits (whitespace, punctuation) can shift model behaviour
+- Checksum update + comment creates an audit trail of why each prompt changed
+## Implementation
+### 1. Compute Initial Checksums
+```bash
+for f in agents/*/prompt.md; do
+  echo -n "$f: "
+  sha256sum "$f" | cut -d' ' -f1
+done
+```
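`sha256sum` is a GNU coreutils tool that is not present on every platform (macOS has traditionally provided `shasum -a 256` instead), so a Node script can be a more portable way to print entries for `GOLDEN_CHECKSUMS`. A sketch, assuming the `agents/<name>/prompt.md` layout used in this standard; `computeGoldenChecksums` is a hypothetical helper. It hashes raw file bytes, as the Vitest test does.

```typescript
import { createHash } from "node:crypto"
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

// Hash raw bytes (no newline or encoding normalization), like the golden test does.
function sha256File(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex")
}

// Returns { agentName: checksum } for every <root>/agents/<name>/prompt.md, sorted by name.
function computeGoldenChecksums(root: string): Record<string, string> {
  const agentsDir = join(root, "agents")
  const result: Record<string, string> = {}
  const names = readdirSync(agentsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort()
  for (const name of names) {
    const prompt = join(agentsDir, name, "prompt.md")
    if (existsSync(prompt)) result[name] = sha256File(prompt)
  }
  return result
}

// Usage sketch against a throwaway tree; point it at the repo root in practice.
const root = mkdtempSync(join(tmpdir(), "prompts-"))
const samples: Array<[string, string]> = [["architect", "abc"], ["builder", ""]]
for (const [agent, text] of samples) {
  mkdirSync(join(root, "agents", agent), { recursive: true })
  writeFileSync(join(root, "agents", agent, "prompt.md"), text)
}
for (const [agent, sum] of Object.entries(computeGoldenChecksums(root))) {
  console.log(`  ${agent}: "${sum}",`) // paste-ready GOLDEN_CHECKSUMS entry
}
```

Because the hash covers raw bytes, a checkout that converts line endings (for example Git's `core.autocrlf` on Windows) yields different checksums; a `.gitattributes` rule such as `agents/*/prompt.md text eol=lf` is one way to keep them stable.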
+### 2. Golden Checksum Test (Vitest)
+```typescript
+// SPDX-License-Identifier: AGPL-3.0-only
+import { createHash } from "crypto"
+import { readFileSync } from "fs"
+import { join } from "path"
+import { describe, it, expect } from "vitest"
+// Update these values ONLY when prompt changes are intentional.
+// Add a comment on the same line explaining WHY the prompt changed.
+const GOLDEN_CHECKSUMS: Record<string, string> = {
+  architect: "98017d39b0e48cda88b796687d21e0f884c810805e534453a23b7ad935e4a5ef",
+  builder:   "5c2acda3e48dae771c61f55d3a5b0d5ac7383870054ef71e757714e367c50031",
+  // ... all agents
+}
+describe("Agent prompt regression (XSPEC-162)", () => {
+  for (const [agent, expected] of Object.entries(GOLDEN_CHECKSUMS)) {
+    it(`agents/${agent}/prompt.md checksum matches golden`, () => {
+      const filePath = join(__dirname, "..", "..", "agents", agent, "prompt.md")
+      const content = readFileSync(filePath)
+      const actual = createHash("sha256").update(content).digest("hex")
+      expect(actual, `Prompt for '${agent}' changed unexpectedly. If intentional, update GOLDEN_CHECKSUMS with a comment.`).toBe(expected)
+    })
+  }
+})
+```
+### 3. CI Integration
+The checksum test runs as part of the standard `npm run test:coverage` gate (already enforced via XSPEC-156). No additional CI step needed.
+### 4. Updating Checksums
+When a prompt change is intentional:
+```typescript
+// Updated entry (checksum abbreviated here) with its mandatory reason comment:
+architect: "98017d39...",  // updated 2026-05-05: added Guardian policy XSPEC-160 reference
+```
+The comment is mandatory. PRs that update checksums without explanatory comments should be rejected in code review.
+## Related Standards
+- [LLM Output Validation](llm-output-validation.md) — schema-level validation
+- [Adversarial Test](adversarial-test.md) — red-team corpus
+- [Testing Standards](testing.md) — overall testing pyramid

package/bundled/core/property-based-testing.md ADDED

@@ -0,0 +1,73 @@
+# Property-Based Testing Standards
+## Overview
+Example-based tests only verify the cases a developer thought to write. Property-based testing inverts this: you define an invariant ("the score is always between 0 and 100") and the framework generates hundreds of inputs to try to falsify it. When it finds a failing input, it shrinks it to the minimal counterexample.
+## When to Use
+| Use Property Tests | Use Example Tests |
+|-------------------|------------------|
+| Pure math functions | Complex business logic |
+| Parsers / serializers | Integration paths |
+| Score clamping / rounding | UI behaviour |
+| Hash / encoding | Database operations |
+| Security validators | External API calls |
+## Tool: fast-check (TypeScript)
+```bash
+npm install --save-dev fast-check
+```
+```typescript
+import fc from "fast-check"
+import { describe, it, expect } from "vitest"
+import { classifyTokenZone } from "../types/index.js"
+describe("classifyTokenZone property: result is always a valid zone", () => {
+  it("for any ratio in [0, 2], returns a valid TokenBudgetZone", () => {
+    fc.assert(
+      fc.property(
+        fc.float({ min: 0, max: 2, noNaN: true }),
+        (ratio) => {
+          const zone = classifyTokenZone(ratio)
+          return ["safe", "warning", "danger", "blocking"].includes(zone)
+        }
+      ),
+      { numRuns: 1000 }
+    )
+  })
+})
+```
+## Guardian scoreReviewable Properties
+Key invariants to test:
+| Property | Description |
+|----------|-------------|
+| **Range clamping** | `score` is always within `[0, 100]` |
+| **Determinism** | Same input always produces same score |
+| **Monotonicity** | prod > staging > dev for same operation |
+| **Non-negativity** | `breakdown` values are all >= 0 |
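fast-check is the tool this standard names; to show what the four invariants look like as executable checks without that dependency, here is a hand-rolled sketch with a seeded generator. `scoreReviewable` below is a hypothetical stand-in with made-up weights, not Guardian's implementation. Because clamping at 100 can make prod and staging tie, the sketch checks monotonicity as `>=`; a scorer that guarantees strict ordering can tighten it.

```typescript
type Env = "dev" | "staging" | "prod"
interface Operation {
  targetEnv: Env
  commandRisk: number // hypothetical raw risk of the command itself
  reversible: boolean
}
interface Scored {
  score: number
  breakdown: Record<string, number>
}
type Scorer = (op: Operation) => Scored

// Hypothetical stand-in for scoreReviewable, with made-up weights.
const scoreReviewable: Scorer = (op) => {
  const env = { dev: 0, staging: 20, prod: 40 }[op.targetEnv]
  const breakdown = {
    env,
    command: Math.max(0, op.commandRisk) * 0.6,
    irreversible: op.reversible ? 0 : 25,
  }
  const raw = breakdown.env + breakdown.command + breakdown.irreversible
  return { score: Math.min(100, Math.max(0, raw)), breakdown }
}

// mulberry32: a small seeded PRNG, so any failure can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = Math.imul(a ^ (a >>> 15), a | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

const ENVS: Env[] = ["dev", "staging", "prod"]

// Returns a description of every violated invariant across `runs` random operations.
function checkScoreProperties(score: Scorer, seed: number, runs = 1000): string[] {
  const rand = mulberry32(seed)
  const failures: string[] = []
  for (let i = 0; i < runs; i++) {
    const op: Operation = {
      targetEnv: ENVS[Math.floor(rand() * ENVS.length)],
      commandRisk: rand() * 300 - 100, // deliberately outside [0, 100] to exercise clamping
      reversible: rand() < 0.5,
    }
    const first = score(op)
    const at = (targetEnv: Env) => score({ ...op, targetEnv }).score
    const where = JSON.stringify(op)
    if (first.score < 0 || first.score > 100) failures.push(`range: ${where}`)
    if (score(op).score !== first.score) failures.push(`determinism: ${where}`)
    if (!(at("prod") >= at("staging") && at("staging") >= at("dev"))) failures.push(`monotonicity: ${where}`)
    if (Object.values(first.breakdown).some((v) => v < 0)) failures.push(`non-negativity: ${where}`)
  }
  return failures
}

console.log(checkScoreProperties(scoreReviewable, 42).length) // 0
```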
+## Counterexample Shrinking
+When fast-check finds a failing case, it automatically shrinks:
+```
+Original failure: { target_env: "prod", command: "rm -rf /tmp/xyz123...", ... }
+Shrunk to:        { target_env: "prod", command: "rm", ... }
+```
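+Save the seed reported in the error message to reproduce the run (fast-check also reports a `path`, which can be passed alongside the seed to replay the shrunk counterexample directly):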
+```typescript
+fc.assert(property, { seed: 1234567890 })
+```
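Shrinking itself can be pictured with a deliberately tiny greedy shrinker. This illustrates the idea only and is not fast-check's algorithm (its shrinkers are type-aware and explore many more candidates); `isBlocked` is a hypothetical validator with a planted bug that blocks only the exact string `rm -rf /`.

```typescript
// Hypothetical buggy validator: meant to block every `rm`, but matches one exact string.
function isBlocked(command: string): boolean {
  return command === "rm -rf /"
}

// Property: any command whose first word is `rm` must be blocked. Returns true when violated.
function violatesProperty(command: string): boolean {
  return command.split(" ")[0] === "rm" && !isBlocked(command)
}

// Smaller variants of s: drop the last space-separated token, then drop one character at a time.
function shrinkCandidates(s: string): string[] {
  const candidates: string[] = []
  const tokens = s.split(" ")
  if (tokens.length > 1) candidates.push(tokens.slice(0, -1).join(" "))
  for (let i = 0; i < s.length; i++) candidates.push(s.slice(0, i) + s.slice(i + 1))
  return candidates
}

// Greedy shrink: keep taking the first smaller candidate that still fails, until none does.
// Terminates because every accepted candidate is strictly shorter than the current value.
function shrink(failing: string, stillFails: (s: string) => boolean): string {
  let current = failing
  for (;;) {
    const next = shrinkCandidates(current).find(stillFails)
    if (next === undefined) return current
    current = next
  }
}

console.log(shrink("rm -rf /tmp/xyz123", violatesProperty)) // rm
```

The shrunk value `rm` points straight at the bug (a bare `rm` is not blocked), which is why a minimal counterexample is easier to diagnose than the original random input.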
+## Related Standards
+- [Mutation Testing Standards](mutation-testing.md) — complement to PBT
+- [Testing Standards](testing-standards.md) — overall test pyramid
+- [Adversarial Test Standards](adversarial-test.md) — security-focused fuzzing