npm - universal-dev-standards - Versions diffs - 5.4.0 → 5.5.0 - Mend

universal-dev-standards 5.4.0 → 5.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (114)

package/bundled/core/policy-as-code-testing.md ADDED

@@ -0,0 +1,188 @@
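+# Policy as Code Testing Standard
+> Standard ID: `policy-as-code-testing`
+> Version: v1.0.0
+> Last updated: 2026-05-05
+---
+## Why Test Policy as Code?
+Rego policies for OPA (Open Policy Agent) decide whether an AI agent may perform operations in production. **An untested policy is a silent security hole.**
+Risks specific to Policy as Code:
+1. **Boundary conditions are hard to reason about**: does the combination `reversible: false` + `target_env: "prod"` trigger the rule?
+2. **Type errors surface only at run time**: `array.concat()` applied to a set fails silently
+3. **Fail-open risk**: if a failed evaluation returns `allow: true`, an attacker can trigger undefined paths
+4. **Regressions from policy changes**: adding one rule can accidentally allow a case that was previously blocked
+---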
+## 1. OPA Test Framework
+### Test Rule Format
+```rego
+# File name: <policy_module>_test.rego
+# Package: <policy_package>_test
+package vibeops.guardian.forbidden_patterns_test
+import future.keywords.if
+# Positive test: the rule should fire
+test_drop_database_is_forbidden if {
+    data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern with input as {
+        "plan": [{"command_type": "sql", "command": "DROP DATABASE prod_main", "reversible": false}]
+    }
+}
+# Negative test: the rule should NOT fire
+test_safe_select_is_not_forbidden if {
+    not data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern with input as {
+        "plan": [{"command_type": "sql", "command": "SELECT * FROM users LIMIT 10", "reversible": true}]
+    }
+}
+```
+### How to Run
+```bash
+# With OPA installed
+opa test src/guardian/policies/ -v
+# Via Docker (no OPA install required)
+docker run --rm \
+  -v "$(pwd)/src/guardian/policies:/policies:ro" \
+  openpolicyagent/opa:latest-static \
+  test /policies -v
+```
+---
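+## 2. Minimum Test Requirements per Policy Module
+| Type | Minimum cases | Description |
+|------|---------------|-------------|
+| ALLOW cases | 2 | Normal operations that should pass |
+| DENY cases | 3 | Dangerous operations that should be blocked |
+| Boundary cases | 1 | Boundary conditions (e.g. reversible=true vs. false) |
+| Integration (main policy) | 2 | Allow and deny paths through main.rego |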
+---
+## 3. Policy Module Design Principles
+### 3.1 Fail-Closed by Default
+```rego
+# main.rego must contain the following default
+default allow = false
+allow if {
+    not data.vibeops.guardian.forbidden_patterns.has_forbidden_pattern
+    not data.vibeops.guardian.env_policy.prod_violation
+    not data.vibeops.guardian.logic_constraints.has_logic_violation
+}
+```
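+Any evaluation that comes out `undefined` must resolve to DENY, never ALLOW.
+### 3.2 Use Sets (not array.concat)
+The type system in OPA ≥ 0.40 strictly distinguishes arrays from sets. The `violations` partial rule is a set, so **`array.concat()` must not be used on it**.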
+```rego
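+# ✅ Correct: collect violations with a partial set rule.
+# With the `if` keyword, a set rule head needs `contains` (import future.keywords.contains);
+# `deny_reasons[r] if { ... }` would define an object keyed by r, not a set.
+deny_reasons contains r if { r := data.vibeops.guardian.forbidden_patterns.violations[_] }
+deny_reasons contains r if { r := data.vibeops.guardian.env_policy.violations[_] }
+deny_reasons contains r if { r := data.vibeops.guardian.logic_constraints.violations[_] }
+# ❌ Wrong: array.concat on a set → rego_type_error
+# deny_reasons := array.concat(violations1, violations2)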
+```
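+### 3.3 Never Parse Free-Text Fields
+Policy decisions **must not depend on** user-controllable text fields such as `intent`, `description`, or `annotation`.
+```rego
+# ❌ Dangerous: parsing the intent field opens a prompt-injection attack surface (OWASP LLM01)
+allow if { contains(input.intent, "EMERGENCY") }
+# ✅ Safe: use structured fields only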
+allow if {
+    input.target_env != "prod"
+    every step in input.plan { step.reversible == true }
+}
+```
+### 3.4 One Module, One Concern
+```
+policies/
+  forbidden_patterns.rego      ← forbidden command patterns
+  forbidden_patterns_test.rego
+  env_policy.rego              ← environment-specific rules (prod protection)
+  env_policy_test.rego
+  logic_constraints.rego       ← logical consistency (use restart instead of stop+start)
+  logic_constraints_test.rego
+  risk_gate.rego               ← risk-score thresholds
+  risk_gate_test.rego
+  main.rego                    ← integrates all modules, fail-closed
+  main_test.rego               ← integration tests
+```
+---
+## 4. CI Integration
+### GitHub Actions Step
+```yaml
+- name: Test OPA Rego Policies
+  run: |
+    docker run --rm \
+      -v "${{ github.workspace }}/src/guardian/policies:/policies:ro" \
+      openpolicyagent/opa:latest-static \
+      test /policies -v
+```
+### npm script
+```json
+{
+  "test:policy": "docker run --rm -v \"$(pwd)/src/guardian/policies:/policies:ro\" openpolicyagent/opa:latest-static test /policies -v"
+}
+```
+---
+## 5. Quality Gates
+| Gate | Threshold | Enforcement |
+|------|-----------|-------------|
+| OPA test pass rate (CI) | 100% (every test_ rule passes) | Block merge |
+| Root policy is fail-closed | `default allow = false` is present | Block merge |
+| Every policy module has a _test.rego | Each .rego has a matching test file | Advisory |
+---
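+## 6. Anti-patterns
+| Anti-pattern | Problem | Correct approach |
+|--------------|---------|------------------|
+| `array.concat()` on violations (a set) | OPA type error | Use a partial set rule |
+| Root policy lacks `default allow = false` | Fail-open vulnerability | Add the default |
+| Intent field takes part in security decisions | Prompt-injection attack surface | Use structured fields only |
+| Only DENY is tested (no ALLOW tests) | Over-restriction goes undetected | Add ALLOW cases |
+| _test.rego runs only locally, not in CI | Policy changes have no safety net | Add an `opa test` step to CI |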
+---
+## References
+- [OPA Testing Guide](https://www.openpolicyagent.org/docs/latest/policy-testing/)
+- NIST SP 800-204B: Attribute-based Access Control for Microservices-based Applications Using a Service Mesh
+- [UDS `secure-op.ai.yaml`](./secure-op.md): six pillars of secure AI agent operations
+- [UDS `adversarial-test.ai.yaml`](./adversarial-test.md): adversarial testing (OWASP LLM01)
+- [UDS `container-security.ai.yaml`](./container-security.md): container security (OPA sidecar deployment)
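
Editor's sketch (not part of the package diff): the advisory gate "each .rego has a matching test file" in policy-as-code-testing.md could be checked mechanically along these lines. The helper name `findModulesMissingTests` and the sample file list are hypothetical; the only rule taken from the document is the `<policy_module>_test.rego` naming convention.

```typescript
// Hypothetical helper: list policy modules that lack a sibling <module>_test.rego file.
function findModulesMissingTests(files: string[]): string[] {
  const regoFiles = files.filter((f) => f.endsWith(".rego"))
  const present = new Set(regoFiles)
  return regoFiles
    .filter((f) => !f.endsWith("_test.rego")) // keep policy modules only
    .filter((f) => !present.has(f.replace(/\.rego$/, "_test.rego")))
    .sort()
}

// Illustrative directory listing; env_policy.rego has no test file here.
const missingTests = findModulesMissingTests([
  "forbidden_patterns.rego",
  "forbidden_patterns_test.rego",
  "env_policy.rego",
  "risk_gate.rego",
  "risk_gate_test.rego",
  "main.rego",
  "main_test.rego",
])
console.log(missingTests) // ["env_policy.rego"]
```

In CI, a non-empty result could be reported as a warning, which matches the "Advisory" enforcement level in the quality-gate table.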

package/bundled/core/prompt-regression.md ADDED

@@ -0,0 +1,72 @@
+# Prompt Regression Standards
+## Overview
+AI agent prompts are code. Unintended changes silently degrade agent behaviour without triggering type errors or unit test failures. Prompt regression tests use golden SHA-256 checksums to detect any modification, forcing developers to explicitly acknowledge and document prompt changes.
+## Why Checksums
+- Diffs alone don't block CI — checksums do
+- Prompts are large markdown files; minor edits (whitespace, punctuation) can shift model behaviour
+- Checksum update + comment creates an audit trail of why each prompt changed
+## Implementation
+### 1. Compute Initial Checksums
+```bash
+for f in agents/*/prompt.md; do
+  echo -n "$f: "
+  sha256sum "$f" | cut -d' ' -f1
+done
+```
+### 2. Golden Checksum Test (Vitest)
+```typescript
+// SPDX-License-Identifier: AGPL-3.0-only
+import { createHash } from "crypto"
+import { readFileSync } from "fs"
+import { join } from "path"
+import { describe, it, expect } from "vitest"
+// Update these values ONLY when prompt changes are intentional.
+// Add a comment on the same line explaining WHY the prompt changed.
+const GOLDEN_CHECKSUMS: Record<string, string> = {
+  architect: "98017d39b0e48cda88b796687d21e0f884c810805e534453a23b7ad935e4a5ef",
+  builder:   "5c2acda3e48dae771c61f55d3a5b0d5ac7383870054ef71e757714e367c50031",
+  // ... all agents
+}
+describe("Agent prompt regression (XSPEC-162)", () => {
+  for (const [agent, expected] of Object.entries(GOLDEN_CHECKSUMS)) {
+    it(`agents/${agent}/prompt.md checksum matches golden`, () => {
+      const filePath = join(__dirname, "..", "..", "agents", agent, "prompt.md")
+      const content = readFileSync(filePath)
+      const actual = createHash("sha256").update(content).digest("hex")
+      expect(actual, `Prompt for '${agent}' changed unexpectedly. If intentional, update GOLDEN_CHECKSUMS with a comment.`).toBe(expected)
+    })
+  }
+})
+```
+### 3. CI Integration
+The checksum test runs as part of the standard `npm run test:coverage` gate (already enforced via XSPEC-156). No additional CI step needed.
+### 4. Updating Checksums
+When a prompt change is intentional:
+```typescript
+// Entry format after an intentional update (checksum elided):
+architect: "98017d39...",  // updated 2026-05-05: added Guardian policy XSPEC-160 reference
+```
+The comment is mandatory. PRs that update checksums without explanatory comments should be rejected in code review.
+## Related Standards
+- [LLM Output Validation](llm-output-validation.md) — schema-level validation
+- [Adversarial Test](adversarial-test.md) — red-team corpus
+- [Testing Standards](testing.md) — overall testing pyramid
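
Editor's sketch (not part of the package diff): the claim in prompt-regression.md that even whitespace edits are caught can be illustrated with the same digest the golden test computes. `sha256Hex` and both prompt strings are hypothetical names and values chosen for illustration; only `createHash("sha256")` with a hex digest comes from the document.

```typescript
import { createHash } from "crypto"

// Same digest as the golden test: SHA-256 over the raw content, hex-encoded.
function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex")
}

// Hypothetical prompt text, for illustration only.
const originalPrompt = "You are the architect agent.\n"
const editedPrompt = "You are the architect agent. \n" // one trailing space added

// A one-character whitespace edit yields a different checksum, so the golden test fails.
console.log(sha256Hex(originalPrompt) === sha256Hex(editedPrompt)) // false
```

Hashing the raw bytes rather than a normalised form is what makes the check strict: any normalisation step would hide exactly the small edits the standard wants surfaced.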

package/bundled/core/property-based-testing.md ADDED

@@ -0,0 +1,73 @@
+# Property-Based Testing Standards
+## Overview
+Example-based tests only verify the cases a developer thought to write. Property-based testing inverts this: you define an invariant ("the score is always between 0 and 100") and the framework generates hundreds of inputs to try to falsify it. When it finds a failing input, it shrinks it to the minimal counterexample.
+## When to Use
+| Use Property Tests | Use Example Tests |
+|-------------------|------------------|
+| Pure math functions | Complex business logic |
+| Parsers / serializers | Integration paths |
+| Score clamping / rounding | UI behaviour |
+| Hash / encoding | Database operations |
+| Security validators | External API calls |
+## Tool: fast-check (TypeScript)
+```bash
+npm install --save-dev fast-check
+```
+```typescript
+import fc from "fast-check"
+import { describe, it, expect } from "vitest"
+import { classifyTokenZone, TOKEN_BUDGET } from "../types/index.js"
+describe("classifyTokenZone property: result is always a valid zone", () => {
+  it("for any ratio in [0, 2], returns a valid TokenBudgetZone", () => {
+    fc.assert(
+      fc.property(
+        fc.float({ min: 0, max: 2, noNaN: true }),
+        (ratio) => {
+          const zone = classifyTokenZone(ratio)
+          return ["safe", "warning", "danger", "blocking"].includes(zone)
+        }
+      ),
+      { numRuns: 1000 }
+    )
+  })
+})
+```
+## Guardian scoreReviewable Properties
+Key invariants to test:
+| Property | Description |
+|----------|-------------|
+| **Range clamping** | `score` is always within `[0, 100]` |
+| **Determinism** | Same input always produces same score |
+| **Monotonicity** | For the same operation, the score ranks prod > staging > dev |
+| **Non-negativity** | `breakdown` values are all >= 0 |
+## Counterexample Shrinking
+When fast-check finds a failing case, it automatically shrinks:
+```
+Original failure: { target_env: "prod", command: "rm -rf /tmp/xyz123...", ... }
+Shrunk to:        { target_env: "prod", command: "rm", ... }
+```
+Save the seed from the error message to reproduce:
+```typescript
+fc.assert(property, { seed: 1234567890 })
+```
+## Related Standards
+- [Mutation Testing Standards](mutation-testing.md) — complement to PBT
+- [Testing Standards](testing-standards.md) — overall test pyramid
+- [Adversarial Test Standards](adversarial-test.md) — security-focused fuzzing
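
Editor's sketch (not part of the package diff): the range, determinism, and monotonicity invariants from property-based-testing.md can be expressed as plain predicates. To stay dependency-free this uses a hand-rolled seeded loop rather than fast-check, so there is no shrinking. `toyScore`, `ENV_WEIGHT`, and the weights are hypothetical stand-ins for the real `scoreReviewable`, which the source does not show.

```typescript
type Env = "dev" | "staging" | "prod"
const ENVS: Env[] = ["dev", "staging", "prod"]
// Hypothetical weights; the real scoring engine is not shown in the source.
const ENV_WEIGHT: Record<Env, number> = { dev: 10, staging: 30, prod: 60 }

// Toy stand-in for scoreReviewable: environment weight plus command risk, clamped to [0, 100].
function toyScore(env: Env, commandRisk: number): number {
  return Math.min(100, Math.max(0, Math.round(ENV_WEIGHT[env] + commandRisk)))
}

// mulberry32: small seeded PRNG so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Returns one message per violated invariant; an empty array means every property held.
function checkScoreProperties(seed: number, runs: number): string[] {
  const rand = mulberry32(seed)
  const failures: string[] = []
  for (let i = 0; i < runs; i++) {
    const risk = (rand() - 0.5) * 400 // command risk in [-200, 200)
    const [dev, staging, prod] = ENVS.map((env) => toyScore(env, risk))
    if ([dev, staging, prod].some((s) => s < 0 || s > 100)) failures.push(`range: risk=${risk}`)
    if (toyScore("prod", risk) !== prod) failures.push(`determinism: risk=${risk}`)
    // Clamping can make scores equal at the extremes, so this toy checks >= rather than >.
    if (!(prod >= staging && staging >= dev)) failures.push(`monotonicity: risk=${risk}`)
  }
  return failures
}

console.log(checkScoreProperties(1234567890, 1000).length) // 0
```

One design point this surfaces: once scores are clamped, a strict ordering between environments cannot hold at 0 or 100, so a monotonicity property for a clamped score is usually stated with `>=`.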

package/bundled/core/release-quality-manifest.md ADDED

@@ -0,0 +1,147 @@
+# Release Quality Manifest
+## Overview
+A Release Quality Manifest (RQM) is a machine-readable document generated automatically by CI for every release. It aggregates the results of all quality gates into a single artifact that serves as the authoritative evidence of release readiness — both for internal go/no-go automation and for customer audits.
+## Why a Manifest?
+Without a manifest, quality evidence is scattered across CI logs, coverage HTML reports, SARIF files, and container scan summaries. When a customer asks "how was this release tested?", the answer is either "trust us" or a 45-minute manual aggregation exercise.
+A Release Quality Manifest makes quality evidence:
+- **Aggregated**: one file, all gates
+- **Machine-readable**: downstream tooling can parse and enforce
+- **Timestamped and commit-pinned**: tied to a specific release artifact
+- **Customer-shareable**: ready to attach to a release package
+## Schema
+```yaml
+release: vibeops-commercial-1.2.0
+generated_at: "2026-05-05T04:00:00Z"
+commit: "abc1234"
+gates:
+  unit_coverage:
+    actual: "73%"
+    target: "80%"
+    status: warn        # within 10pp of target → warn, not fail
+  mutation_score:
+    actual: "62%"
+    target: "60%"
+    status: pass
+  sca_critical_cve:
+    actual: 0
+    target: 0
+    status: pass
+  sca_high_cve:
+    actual: 0
+    target: 0
+    status: pass
+  sast_high:
+    actual: 0
+    target: 0
+    status: pass
+  e2e_pass_rate:
+    actual: "96%"
+    target: "95%"
+    status: pass
+  container_cve_critical:
+    actual: 0
+    target: 0
+    status: pass
+  image_signed:
+    actual: true
+    target: true
+    status: pass
+  sbom_present:
+    actual: true
+    target: true
+    status: pass
+overall: WARN   # worst gate status (1 warn, no fails)
+```
+## Status Semantics
+| Status | Meaning | Action |
+|--------|---------|--------|
+| `pass` | Meets or exceeds target | None required |
+| `warn` | Within acceptable deviation (see per-gate policy) | Document reason; no release block |
+| `fail` | Below hard minimum | **Blocks release** |
+### Per-Gate Hard Minimums (Examples)
+| Gate | Warn Band | Fail Threshold |
+|------|-----------|----------------|
+| unit_coverage | target - 10pp to target | below target - 10pp |
+| mutation_score | target - 5pp to target | below target - 5pp |
+| sca_critical_cve | — | any critical CVE = fail |
+| container_cve_critical | — | any critical CVE = fail |
+| e2e_pass_rate | target - 3pp to target | below target - 3pp |
+## Automated Generation
+Generate the manifest in CI after all gate jobs complete:
+```bash
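+#!/usr/bin/env bash
+# scripts/generate-quality-manifest.sh
+set -euo pipefail
+: "${RELEASE_TAG:?RELEASE_TAG must be set}"   # fail early with a clear message under `set -u`
+COVERAGE=$(node -e "
+  const r = JSON.parse(require('fs').readFileSync('coverage/coverage-summary.json'));
+  console.log(r.total.lines.pct.toFixed(1) + '%')
+")
+MUTATION=$(node -e "
+  const r = JSON.parse(require('fs').readFileSync('reports/mutation/mutation-testing-report.json'));
+  console.log(r.metrics.mutationScore.toFixed(1) + '%')
+")
+CRITICAL_CVE=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL")] | length' trivy-report.json)
+# Percentages are decimals ("73.4%"), so compare with awk; `[ ... -ge ... ]` accepts integers only.
+# Usage: gate_status <actual%> <target> <warn band in pp>
+gate_status() {
+  awk -v a="${1%\%}" -v t="$2" -v b="$3" \
+    'BEGIN { if (a >= t) print "pass"; else if (a >= t - b) print "warn"; else print "fail" }'
+}
+COVERAGE_STATUS=$(gate_status "$COVERAGE" 80 10)
+MUTATION_STATUS=$(gate_status "$MUTATION" 60 5)
+CVE_STATUS=$([ "$CRITICAL_CVE" -eq 0 ] && echo pass || echo fail)
+# Derive the overall status first: the manifest cannot be grepped while the heredoc is still writing it.
+case " $COVERAGE_STATUS $MUTATION_STATUS $CVE_STATUS " in
+  *" fail "*) OVERALL=FAIL ;;
+  *" warn "*) OVERALL=WARN ;;
+  *)          OVERALL=PASS ;;
+esac
+cat > quality-manifest.yaml <<YAML
+release: ${RELEASE_TAG}
+generated_at: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+commit: "${GITHUB_SHA:-$(git rev-parse HEAD)}"
+gates:
+  unit_coverage:
+    actual: "${COVERAGE}"
+    target: "80%"
+    status: ${COVERAGE_STATUS}
+  mutation_score:
+    actual: "${MUTATION}"
+    target: "60%"
+    status: ${MUTATION_STATUS}
+  sca_critical_cve:
+    actual: ${CRITICAL_CVE}
+    target: 0
+    status: ${CVE_STATUS}
+overall: ${OVERALL}
+YAML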
+```
+## Customer-Facing Summary
+Generate a Markdown table alongside the YAML for inclusion in release notes:
+```markdown
+## Release Quality Gates — vibeops-commercial-1.2.0
+| Gate | Actual | Target | Status |
+|------|--------|--------|--------|
+| Unit Test Coverage | 73% | 80% | ⚠️ WARN |
+| Mutation Score | 62% | 60% | ✅ PASS |
+| Critical CVEs | 0 | 0 | ✅ PASS |
+...
+| **Overall** | | | ⚠️ WARN |
+```
+## Anti-Patterns
+- **Manually authoring the manifest** — defeats the purpose; must be generated from tool outputs
+- **Using warn for critical security gates** — `sca_critical_cve` and `container_cve_critical` are binary
+- **Generating the manifest before all gates have run** — values must reflect actual results, not estimates
+- **Not attaching the manifest to the release artifact** — a manifest in git history is not accessible to customers
+## See Also
+- `verification-evidence.ai.yaml` — audit evidence principles
+- `supply-chain-attestation.ai.yaml` — SBOM and provenance
+- `testing.ai.yaml` — overall test strategy
+- `deployment-standards.ai.yaml` — release gate integration
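
Editor's sketch (not part of the package diff): the status semantics and warn bands in release-quality-manifest.md reduce to two small functions. The names `percentGateStatus`, `zeroToleranceStatus`, and `overallStatus` are hypothetical; the bands (10pp, 5pp, 3pp) and the sample values come from the document's tables and schema example.

```typescript
type GateStatus = "pass" | "warn" | "fail"

// Higher-is-better percentage gate: pass at or above target, warn inside the band, fail below it.
function percentGateStatus(actual: number, target: number, warnBandPp: number): GateStatus {
  if (actual >= target) return "pass"
  if (actual >= target - warnBandPp) return "warn"
  return "fail"
}

// Binary security gate (e.g. critical CVEs): there is no warn band.
function zeroToleranceStatus(count: number): GateStatus {
  return count === 0 ? "pass" : "fail"
}

// Overall status is the worst status among all gates.
function overallStatus(statuses: GateStatus[]): "PASS" | "WARN" | "FAIL" {
  if (statuses.includes("fail")) return "FAIL"
  if (statuses.includes("warn")) return "WARN"
  return "PASS"
}

const gateStatuses: GateStatus[] = [
  percentGateStatus(73, 80, 10), // unit_coverage: warn
  percentGateStatus(62, 60, 5),  // mutation_score: pass
  percentGateStatus(96, 95, 3),  // e2e_pass_rate: pass
  zeroToleranceStatus(0),        // sca_critical_cve: pass
]
console.log(overallStatus(gateStatuses)) // "WARN"
```

Keeping the binary gates on a separate function makes the anti-pattern "using warn for critical security gates" impossible to express by accident.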

package/bundled/core/replay-test.md ADDED

@@ -0,0 +1,86 @@
+# Replay Test Standards
+## Overview
+AI agent systems interact with users through complex multi-step pipelines. When a customer reports unexpected behaviour, reproducing the exact failure is often difficult — the model output may be non-deterministic, the environment may have changed, or the exact inputs may be unclear. Golden fixture replay solves this by serialising the exact inputs and expected outputs at time of discovery, enabling deterministic regression tests.
+## Fixture Format
+```json
+{
+  "meta": {
+    "recorded": "2026-05-05",
+    "source": "customer-report | ci-regression | red-team | incident",
+    "description": "Human-readable description of what this tests"
+  },
+  "input": { /* exact component input */ },
+  "expected": { /* expected output fields to assert */ }
+}
+```
+## Fixture Naming
+`<component>-<outcome>-<description>.json`
+| Good | Bad |
+|------|-----|
+| `guardian-deny-prod-drop-table.json` | `test1.json` |
+| `guardian-allow-dev-npm-test.json` | `fixture.json` |
+| `guardian-hitl-prod-irreversible.json` | `scenario_3.json` |
+## Replay Test Implementation (Vitest)
+```typescript
+// SPDX-License-Identifier: AGPL-3.0-only
+import { readdirSync, readFileSync } from "fs"
+import { join } from "path"
+import { describe, it, expect } from "vitest"
+import { scoreReviewable } from "../scoring/risk-engine.js"
+const FIXTURES_DIR = join(__dirname, "..", "__fixtures__")
+interface ReplayFixture {
+  meta: { recorded: string; source: string; description: string }
+  input: Parameters<typeof scoreReviewable>[0]
+  expected: { decision: string }
+}
+function deriveDecision(score: number): string {
+  if (score >= 76) return "DENY"
+  if (score >= 51) return "REQUIRE_HITL"
+  return "ALLOW"
+}
+describe("Guardian replay fixtures", () => {
+  const fixtures = readdirSync(FIXTURES_DIR)
+    .filter(f => f.endsWith(".json"))
+    .map(f => ({
+      name: f,
+      fixture: JSON.parse(readFileSync(join(FIXTURES_DIR, f), "utf-8")) as ReplayFixture,
+    }))
+  for (const { name, fixture } of fixtures) {
+    it(`${name} [${fixture.meta.source}] ${fixture.meta.description}`, () => {
+      const result = scoreReviewable(fixture.input)
+      const decision = deriveDecision(result.score)
+      expect(decision).toBe(fixture.expected.decision)
+    })
+  }
+})
+```
+## Bug Regression Workflow
+1. Customer reports unexpected Guardian verdict
+2. Capture the exact `Reviewable` input (from audit logs)
+3. Create fixture file: `guardian-<outcome>-<description>.json`
+4. Reproduce failure locally (test should fail)
+5. Fix the bug
+6. Confirm test passes
+7. The fixture now permanently prevents regression
+## Related Standards
+- [Adversarial Test Standards](adversarial-test.md) — red-team corpus
+- [Verification Evidence Standards](verification-evidence.md) — AC traceability
+- [Testing Standards](testing.md) — overall test pyramid
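
Editor's sketch (not part of the package diff): the `<component>-<outcome>-<description>.json` naming rule in replay-test.md could be enforced with a small check. The outcome vocabulary (`allow`, `deny`, `hitl`) is an assumption inferred from the three "Good" examples, and `isValidFixtureName` is a hypothetical helper name.

```typescript
// Assumed outcome vocabulary, inferred from the example names (allow, deny, hitl).
// Pattern: lowercase component, one known outcome, then a hyphen-separated description.
const FIXTURE_NAME = /^[a-z0-9]+-(allow|deny|hitl)-[a-z0-9]+(-[a-z0-9]+)*\.json$/

function isValidFixtureName(fileName: string): boolean {
  return FIXTURE_NAME.test(fileName)
}

console.log(isValidFixtureName("guardian-deny-prod-drop-table.json")) // true
console.log(isValidFixtureName("scenario_3.json")) // false
```

Such a check could run inside the same replay suite, so a badly named fixture fails fast instead of silently joining the corpus.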