npm - universal-dev-standards - Versions diffs - 5.4.0 → 5.6.0 - Mend

universal-dev-standards 5.4.0 → 5.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (138)

package/bundled/core/flow-based-testing.md ADDED

@@ -0,0 +1,275 @@
+# Flow-Based Testing
+**Version**: 1.3.0
+**Last Updated**: 2026-05-05
+**Applicability**: All software projects with multi-step workflows
+**Scope**: universal
+**Industry Standards**: ISO/IEC/IEEE 29119-4 (Test Techniques), ISTQB Foundation Syllabus
+**References**: Decision Table Testing (ISTQB), Pairwise Testing, State Transition Testing
+[English](.) | [繁體中文](../locales/zh-TW/core/flow-based-testing.md)
+---
+## Purpose
+This document defines a systematic methodology for testing multi-step processes. It addresses the gap between AC-centric tests (which verify individual behaviors in isolation) and flow-level tests (which verify sequential behavior with accumulated state and branch coverage).
+---
+## The Core Problem: AC-Centric vs. Flow-Centric Testing
+AC-centric tests verify that each acceptance criterion works in isolation. However, they miss two critical categories of bugs:
+1. **Step interaction bugs**: A bug that only manifests when Step 1's output becomes Step 2's input
+2. **Branch coverage gaps**: Decision points that are never exercised with all possible values
+**Example**: A pipeline has 8 steps. Each AC passes independently. But when the quota check in Step 3 depends on state accumulated in Steps 1 and 2, the interaction is never tested.
+---
+## Three-Step Flow Decomposition
+### Step 1: Flow Identification
+Before writing any test code, document:
+- **Preconditions**: The system's initial state
+- **Step sequence**: The ordered list of actions (Step 1 → Step N)
+- **Decision points**: Every if/else/condition in the flow
+- **Terminal states**: All possible end states (success + each distinct failure)
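The four items above can be recorded in a small data structure so that reviews and tests work from the same flow description. The following is a minimal sketch under stated assumptions: the `FlowSpec` type, the `missingElements` helper, and the example precondition are hypothetical names and values chosen for illustration, not part of this standard.

```typescript
// Hypothetical shape for a documented flow; field names are illustrative.
interface FlowSpec {
  name: string
  preconditions: string[]   // the system's initial state
  steps: string[]           // ordered: Step 1 → Step N
  decisionPoints: string[]  // every branch condition in the flow
  terminalStates: string[]  // success plus each distinct failure
}

// Returns the names of the required elements that are still empty.
function missingElements(spec: FlowSpec): string[] {
  const required: (keyof FlowSpec)[] = [
    "preconditions",
    "steps",
    "decisionPoints",
    "terminalStates",
  ]
  return required.filter((key) => spec[key].length === 0)
}

// Example values are assumptions loosely based on the "Create Order" flow.
const createOrderFlow: FlowSpec = {
  name: "Create Order",
  preconditions: ["test user exists"],
  steps: ["Login", "Create order", "Verify order state"],
  decisionPoints: ["Authorization", "Quota"],
  terminalStates: ["order pending", "QUOTA_EXCEEDED (429)"],
}

const undocumented = missingElements(createOrderFlow)
```

An empty `undocumented` array would suggest the flow is ready for decision table expansion; any listed name points at an element that still needs to be written down.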
+### Step 2: Decision Table Expansion
+For each decision point, list all possible values. Then apply a coverage strategy:
+| Strategy | When to Use | Scenario Count |
+|----------|-------------|---------------|
+| **Each-Choice** (minimum) | Low-risk flows, fast feedback | Sum of unique values |
+| **Pairwise** | Medium-risk flows | ~N × max_values |
+| **All-Combinations** | Auth, payment, security | Product of value counts |
+**Decision Table Example**:
+| Decision Point | Values |
+|----------------|--------|
+| Authorization | valid / expired / missing |
+| Quota | sufficient / exceeded |
+| External Service | available / timeout / error |
+Each-Choice with one scenario per value: 3 + 2 + 3 = 8 scenarios (vs. the 1-2 that teams typically write).
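The scenario counts for the example table can be checked mechanically. The sketch below is illustrative only: `DecisionTable`, `eachChoiceCount`, and `allCombinations` are hypothetical names, and the Each-Choice count follows this document's convention of one scenario per value (the sum of value counts).

```typescript
type DecisionTable = Record<string, string[]>

// The decision table from the example above.
const table: DecisionTable = {
  Authorization: ["valid", "expired", "missing"],
  Quota: ["sufficient", "exceeded"],
  "External Service": ["available", "timeout", "error"],
}

// One scenario per value: the sum of the value counts.
function eachChoiceCount(t: DecisionTable): number {
  let count = 0
  for (const point of Object.keys(t)) {
    count += t[point].length
  }
  return count
}

// Cartesian product of all value lists (All-Combinations).
function allCombinations(t: DecisionTable): Record<string, string>[] {
  let scenarios: Record<string, string>[] = [{}]
  for (const point of Object.keys(t)) {
    const next: Record<string, string>[] = []
    for (const partial of scenarios) {
      for (const value of t[point]) {
        next.push({ ...partial, [point]: value })
      }
    }
    scenarios = next
  }
  return scenarios
}

const eachChoiceTotal = eachChoiceCount(table)           // 3 + 2 + 3
const allCombinationsTotal = allCombinations(table).length // 3 × 2 × 3
```

For this table the two totals are 8 and 18, which is why All-Combinations is reserved for critical flows.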
+### Step 3: Journey Test Structure
+Write tests with shared state threading — a `ctx` object accumulates state across steps:
+```typescript
+describe("Flow: Create Order", () => {
+  const ctx: { token?: string; orderId?: string } = {}
+  it("Step 1: Login", async () => {
+    ctx.token = await login(credentials)
+    expect(ctx.token).toBeTruthy()
+  })
+  it("Step 2: Create order (uses Step 1 token)", async () => {
+    ctx.orderId = await createOrder(ctx.token!, orderData)
+    expect(ctx.orderId).toMatch(/^ord-/)
+  })
+  it("Step 3: Verify order state (uses Step 2 orderId)", async () => {
+    const order = await getOrder(ctx.token!, ctx.orderId!)
+    expect(order.status).toBe("pending")
+  })
+})
+describe("Flow Branch: Quota exceeded path", () => {
+  it("should return 429 and NOT create order when quota is exhausted", async () => {
+    await exhaustQuota(testUser)
+    const response = await attemptCreateOrder(testToken, orderData)
+    expect(response.status).toBe(429)
+    expect(response.body.code).toBe("QUOTA_EXCEEDED")
+    // Verify side effects: no order was created
+    const orders = await getOrders(testUser)
+    expect(orders.length).toBe(0)
+  })
+})
+```
+---
+## Anti-Patterns
+- Testing only the happy path flow (missing failure terminal states)
+- Resetting shared state between steps (breaks state threading)
+- Testing each step in isolation without verifying accumulated state
+- Using a single test for a flow with multiple decision points
+- Applying All-Combinations to every flow (reserve for critical paths only)
+- Not verifying side effects (or absence thereof) in branch tests
+---
+## Relationship to Other Standards
+- **test-completeness-dimensions**: Dimensions 9 (Flow Completeness) and 10 (Branch Coverage) are defined here
+- **behavior-driven-development**: BDD Scenario Outline tables map to decision table expansion
+- **mock-boundary**: Flow tests must respect mock boundary rules (no mocking own module logic)
+- **e2e-testing**: Journey tests run at the system-test (ST) or E2E level; flow tests can run at the integration-test (IT) level against a real database
+---
+## Multi-Gate Flow Verification Model
+Flow coverage is not a single pre-release check — it is a **progressive verification chain** across the entire SDLC. There are two fundamentally different questions that must be answered at different stages:
+| Verification Type | Question | Executor | Timing |
+|------------------|----------|----------|--------|
+| **Coverage** | Are all terminal states tested? | Automated CI | Dev → Staging → Pre-UAT |
+| **Correctness** | Are the terminal state definitions right? | Human UAT | UAT phase |
+Confusing the two wastes UAT cycles on technical coverage issues that CI should have caught.
+### Gate 0 — PRD Sign-off (Before Implementation Starts)
+The three testability elements (§2.4) and the test-planning items derived from them (§9.4) MUST be written into the PRD before any code is written. Use `templates/requirement-template.md` §2.4 and §9.4:
+| Element | PRD Section | When Required |
+|---------|-------------|---------------|
+| Preconditions + Ordered Steps | §2.4 | Flows with ≥ 3 steps |
+| Decision Points list | §2.4 | Every branch condition |
+| Terminal States list | §2.4 | All distinct end states |
+| Decision Table (Each-Choice) | §9.4 | All flows |
+| Upgrade to All-Combinations | §9.4 | Auth / payment / security |
+| UAT acceptance script (pre-filled) | §9.4 | Before PRD approval |
+> **Why at PRD stage?** Test engineers cannot derive branch coverage from a spec that only describes the happy path. Discovering missing decision points during test design can cost a full sprint.
+### Gate 1 — PR Merge (Per Feature Branch)
+Every PR that touches a flow with ≥ 3 steps MUST include automated tests covering the terminal states introduced or modified by that PR. Reviewers block merge if terminal states are added to §2.4 without corresponding tests.
+### Gate 3 — Pre-UAT Deployment (Automated + QA Lead Sign-off)
+CI must prove coverage completeness **before** UAT begins. UAT is for correctness validation, not technical testing.
+Required CI checks:
+- All Decision Table scenarios have a passing automated test
+- Zero terminal states without test coverage
+- Branch coverage ≥ 90% (or project-defined threshold)
+- All-Combinations fully passing for auth / payment / security flows
+> Deploying to UAT without Gate 3 forces business stakeholders to act as technical QA — a costly and demoralizing misuse of UAT time.
+### Gate 4 — UAT Sign-off (Business Correctness, Pre-Production)
+UAT validates that terminal state **definitions are correct** against real business rules, not that they are covered. Use the UAT Acceptance Script in §9.4 (derived directly from the Decision Table — no separate script creation needed):
+- Business stakeholders sign off each row (terminal state)
+- If UAT reveals a previously undefined terminal state: add it to §2.4 + Decision Table + automated test, re-run Gate 3, then resume UAT
+- No new terminal states discovered during UAT = strong signal that §2.4 was thorough
+### Gate Model Summary
+```
+PRD Sign-off
+    │ Gate 0: §2.4 + §9.4 complete (Decision Points, Terminal States,
+    │         Decision Table, UAT script pre-filled)
+    ▼
+Implementation + PR Reviews
+    │ Gate 1: Each PR covering a flow includes terminal state tests
+    ▼
+Staging / Integration
+    │ (no formal gate — CI green is sufficient)
+    ▼
+Pre-UAT Deployment
+    │ Gate 3: CI proves 100% terminal state coverage + branch coverage ≥ 90%
+    ▼
+UAT Execution
+    │ Gate 4: Business sign-off on terminal state correctness
+    │         New terminal states → back to Gate 3 before proceeding
+    ▼
+Production
+```
+---
+## RQM Integration
+Gate 3 (Pre-UAT CI coverage gate) MUST produce a **`flow_gate_report.json`** artifact consumed by the Release Quality Manifest (`release-quality-manifest.md`, field `flow_gate_report`).
+### flow_gate_report.json Schema
+```json
+{
+  "generated_at": "2026-05-05T04:00:00Z",
+  "commit": "abc1234",
+  "flows": [
+    {
+      "flow_id": "login-authentication",
+      "spec_ref": "docs/specs/SPEC-001.md#2.4",
+      "decision_points": 3,
+      "terminal_states": 7,
+      "gate_0_complete": true,
+      "gate_1_pr_coverage": true,
+      "gate_3": {
+        "all_scenarios_green": true,
+        "terminal_states_covered": 7,
+        "terminal_states_defined": 7,
+        "branch_coverage_pct": 94,
+        "coverage_target": 90,
+        "all_combinations_required": false,
+        "status": "pass"
+      },
+      "gate_4_uat_signoff": true
+    }
+  ],
+  "summary": {
+    "total_flows": 5,
+    "gate_0_complete": true,
+    "gate_1_pr_coverage": true,
+    "gate_3_ci_pass": true,
+    "gate_4_uat_signoff": true,
+    "status": "pass"
+  }
+}
+```
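The `gate_3` object in the example report lends itself to a typed check. The sketch below is one possible reading of the Gate 3 rules, not the package's actual generator: `Gate3Result` and `evaluateGate3` are hypothetical names, and the `"fail"` status value is an assumption, since the example only shows `"pass"`.

```typescript
// Mirrors the fields of the `gate_3` object in the example report.
interface Gate3Result {
  all_scenarios_green: boolean
  terminal_states_covered: number
  terminal_states_defined: number
  branch_coverage_pct: number
  coverage_target: number
  all_combinations_required: boolean
  status: "pass" | "fail" // "fail" is assumed; the example only shows "pass"
}

// Hypothetical helper: derives the status from the other fields.
function evaluateGate3(g: Omit<Gate3Result, "status">): "pass" | "fail" {
  const terminalStatesCovered = g.terminal_states_covered >= g.terminal_states_defined
  const branchCoverageMet = g.branch_coverage_pct >= g.coverage_target
  const passed = g.all_scenarios_green && terminalStatesCovered && branchCoverageMet
  return passed ? "pass" : "fail"
}

// Values copied from the "login-authentication" flow in the example report.
const loginGate3 = evaluateGate3({
  all_scenarios_green: true,
  terminal_states_covered: 7,
  terminal_states_defined: 7,
  branch_coverage_pct: 94,
  coverage_target: 90,
  all_combinations_required: false,
})
```

Deriving `status` from the raw counts, rather than storing it independently, keeps the report from claiming a pass while a terminal state is still uncovered.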
+### Generation Script Hook
+Add to CI after test run (Gate 3):
+```bash
+# scripts/generate-flow-gate-report.sh
+node scripts/generate-flow-gate-report.mjs \
+  --coverage-report coverage/coverage-summary.json \
+  --flow-specs "docs/specs/**/*.md" \
+  --uat-signoffs ".release-readiness/*.md" \
+  --output flow_gate_report.json
+```
+The `summary.status` field feeds into `release-quality-manifest.yaml` under `flow_gate_report.status`.
+---
+## Quick Reference Checklist
+```
+Flow: ___________________
+□ Step 1 — Flow Identification
+  □ Preconditions documented
+  □ Ordered step sequence listed
+  □ All decision points extracted
+  □ All terminal states defined
+□ Step 2 — Decision Table
+  □ Decision table created
+  □ Coverage strategy chosen (Each-Choice / Pairwise / All-Combinations)
+  □ Critical flows (auth/payment/security) → All-Combinations
+□ Step 3 — Journey Test Structure
+  □ Happy path journey test (shared ctx, sequential steps)
+  □ Each branch outcome has its own describe block
+  □ Branch tests verify both response AND absence of side effects
+  □ No beforeEach resetting ctx between steps
+```

package/bundled/core/full-coverage-testing.md ADDED

@@ -0,0 +1,183 @@
+# Full Coverage Testing Standards
+> **AI-optimized version**: `ai/standards/full-coverage-testing.ai.yaml`
+> **XSPEC**: XSPEC-178
+> **Replaces**: Pyramid threshold model (UT≥80%, IT≥70%, E2E happy-path-only)
+## Overview
+Full Coverage Testing is a behavior-completeness paradigm designed for the AI era, where the cost of generating tests equals the cost of generating code. Traditional pyramid thresholds assumed tests were expensive to write; this assumption no longer holds.
+**Core principle**: Every public function must be tested for all three behavioral paths. Coverage is measured by behavior completeness, not percentage floors. CI enforces a ratchet: coverage can only increase, never decrease.
+---
+## Behavior-Completeness Model
+Instead of "80% line coverage", require:
+| Path | Description | Example |
+|------|-------------|---------|
+| **Happy path** | Normal input produces correct output | `calculateDiscount(100, 0.1) → 90` |
+| **Edge case** | Boundary values do not cause unexpected errors | `calculateDiscount(0, 1.0) → 0 without throwing` |
+| **Error path** | Invalid input raises clear error or error state | `calculateDiscount(-1, 2.0) → throws ArgumentError` |
+Every public function requires all three. This replaces the "80% of business logic" target with a qualitative, behavior-driven requirement.
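The three paths in the table can be exercised against a concrete function. The implementation below is a hypothetical stand-in written only so the checks can run; the document does not define `calculateDiscount`, and `RangeError` is used here as an assumption because JavaScript has no built-in `ArgumentError`.

```typescript
// Hypothetical implementation, assumed only so the three paths can run.
function calculateDiscount(price: number, rate: number): number {
  if (!Number.isFinite(price) || price < 0) {
    throw new RangeError(`price must be a non-negative number, got ${price}`)
  }
  if (!Number.isFinite(rate) || rate < 0 || rate > 1) {
    throw new RangeError(`rate must be between 0 and 1, got ${rate}`)
  }
  return price - price * rate
}

// Happy path: normal input produces the discounted price.
const happyPath = calculateDiscount(100, 0.1)

// Edge case: boundary values must not throw.
const edgeCase = calculateDiscount(0, 1.0)

// Error path: invalid input raises a clear error.
let errorPathThrew = false
try {
  calculateDiscount(-1, 2.0)
} catch (err) {
  errorPathThrew = err instanceof RangeError
}
```

In a real suite each of the three checks would be its own test case with a specific assertion, so that a failure names the path that broke.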
+---
+## Ratchet CI Policy
+- The current coverage baseline is the minimum acceptable coverage
+- Any PR that decreases coverage is blocked from merging
+- Improvements update the baseline automatically on merge
+- No fixed percentage floor — the coverage achieved today is tomorrow's floor
+```text
+# Stored in .coverage-baseline.json
+{ "line": 91.3, "branch": 88.7, "timestamp": "2026-05-06" }
+# PR regression → blocked
+Coverage regression: 91.3% → 89.1%. Ratchet threshold violated.
+# PR improvement → baseline updated
+Coverage improved: 91.3% → 92.0%. New baseline set.
+```
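The ratchet rule reduces to a comparison against the stored baseline. The following sketch assumes the baseline shape shown above; `checkRatchet` and its result type are hypothetical names, and the package's real `scripts/check-coverage-ratchet.sh` may behave differently.

```typescript
interface CoverageBaseline {
  line: number
  branch: number
  timestamp: string
}

type RatchetResult =
  | { status: "blocked"; message: string }
  | { status: "ok"; baseline: CoverageBaseline }

// Blocks on any decrease; otherwise returns the baseline to store on merge.
function checkRatchet(
  baseline: CoverageBaseline,
  current: { line: number; branch: number },
  today: string,
): RatchetResult {
  if (current.line < baseline.line) {
    return {
      status: "blocked",
      message: `Coverage regression: ${baseline.line}% → ${current.line}%. Ratchet threshold violated.`,
    }
  }
  if (current.branch < baseline.branch) {
    return {
      status: "blocked",
      message: `Branch coverage regression: ${baseline.branch}% → ${current.branch}%. Ratchet threshold violated.`,
    }
  }
  const improved = current.line > baseline.line || current.branch > baseline.branch
  return {
    status: "ok",
    baseline: {
      line: current.line,
      branch: current.branch,
      timestamp: improved ? today : baseline.timestamp,
    },
  }
}

const stored: CoverageBaseline = { line: 91.3, branch: 88.7, timestamp: "2026-05-06" }
const regression = checkRatchet(stored, { line: 89.1, branch: 88.7 }, "2026-05-07")
const improvement = checkRatchet(stored, { line: 92.0, branch: 88.7 }, "2026-05-07")
```

Checking line and branch coverage separately is a design choice in this sketch: a gain in one metric does not offset a loss in the other.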
+---
+## Anti-Fake Test Rules
+### Forbidden: Tautology Assertions
+Assertions that always pass regardless of behavior provide false coverage.
+```typescript
+// ❌ FORBIDDEN — always passes, tests nothing
+expect(true).toBe(true)
+expect(result).toBeDefined()  // passes for any value except undefined; asserts no specific value
+// ✅ REQUIRED — verifies actual behavior
+expect(result).toBe(90)
+expect(result).toEqual({ discount: 10, total: 90 })
+```
+### Forbidden: Mocking Core Business Logic
+Mocking your own code means the business logic is never actually executed.
+```typescript
+// ❌ FORBIDDEN — business logic never runs
+jest.mock('./orderService', () => ({ calculateTotal: jest.fn(() => 100) }))
+// ✅ ALLOWED — mock only external dependencies
+// MOCK: External Stripe API — no sandbox available in CI
+jest.mock('./payment-gateway', () => ({ charge: jest.fn().mockResolvedValue({ id: 'ch_test' }) }))
+```
+### Required: Mock Reason Comments
+Every mock must explain why the dependency cannot be real.
+```typescript
+// ❌ FORBIDDEN — no explanation
+jest.mock('./payment-gateway')
+// ✅ REQUIRED — explicit reason
+// MOCK: External payment gateway — network dependency, no sandbox in CI
+jest.mock('./payment-gateway', () => ({ ... }))
+```
+### Mock Boundary: What Can Be Mocked
+| ✅ Allowed to Mock | ❌ Forbidden to Mock |
+|-------------------|---------------------|
+| External HTTP APIs (payment, OAuth) | Core business calculation functions |
+| Hardware interfaces (sensors, GPIO) | Your own service layer methods |
+| Third-party SDKs without test mode | Database queries (use in-memory SQLite) |
+| Docker daemon | Your own utility functions |
+---
+## STUB Marker Protocol
+All temporary/placeholder implementations MUST be marked with the standard STUB marker. This is enforced by pre-push hooks and deploy.sh.
+### Marking a STUB
+```typescript
+// WARNING: STUB — Remove before UAT
+async function validatePayment(card: Card): Promise<boolean> {
+  return true; // Always approve — replace with real Stripe call
+}
+```
+### Exempting a Genuine Limitation
+When a dependency truly cannot be tested (hardware, live API without sandbox):
+```typescript
+// COVERAGE_EXEMPT: Hardware temperature sensor — no simulation available in CI
+async function readTemperature(): Promise<number> {
+  return hardwareSensor.read();
+}
+```
+The exemption reason MUST be non-empty and specific.
+### Deployment Gates
+| Environment | STUB Present | Action |
+|-------------|-------------|--------|
+| Feature branch push | Yes | ⚠️ Warning (not blocked) |
+| `main` branch push | Yes | ❌ Blocked |
+| Staging deploy | Yes | ⚠️ Warning (not blocked) |
+| UAT deploy | Yes | ❌ Blocked |
+| Production deploy | Yes | ❌ Blocked (critical log) |
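The gate table above maps directly onto a lookup. This is a minimal sketch, not the package's `check-stubs.sh`: the target names, the `stubGateAction` helper, and the marker regex are assumptions, and the `"allow"` result for STUB-free code is inferred because the table only covers the case where a STUB is present.

```typescript
type Target =
  | "feature-push"
  | "main-push"
  | "staging-deploy"
  | "uat-deploy"
  | "production-deploy"

type GateAction = "allow" | "warn" | "block"

// Matches the start of the standard marker comment ("// WARNING: STUB ...").
const STUB_MARKER = /\/\/\s*WARNING:\s*STUB\b/

function hasStub(source: string): boolean {
  return STUB_MARKER.test(source)
}

// Applies the deployment gate table to a piece of source code.
function stubGateAction(target: Target, source: string): GateAction {
  if (!hasStub(source)) {
    return "allow" // assumed: the table only lists the STUB-present case
  }
  const warnOnly = target === "feature-push" || target === "staging-deploy"
  return warnOnly ? "warn" : "block"
}

const stubbedSource = "// WARNING: STUB\nreturn true"
const featureAction = stubGateAction("feature-push", stubbedSource)
const uatAction = stubGateAction("uat-deploy", stubbedSource)
```

A real check would scan every tracked file and, per the table, write a critical log entry when it blocks a production deploy.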
+---
+## AC Traceability
+Link each test to its Acceptance Criteria using the `@ac` JSDoc tag:
+```typescript
+/**
+ * @ac AC-US03-2
+ */
+it('should block PR when coverage regresses below baseline', () => {
+  // test body
+})
+// If no AC maps to this test:
+/**
+ * @ac UNTRACED
+ */
+it('helper utility returns correct format', () => { ... })
+```
+CI reports AC coverage rate. If more than 20% of ACs lack `@ac`-tagged tests, a warning is shown.
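The AC coverage rate can be approximated by scanning test sources for `@ac` tags. The sketch below is an assumption about how such a report might work, not the tool's actual logic; `taggedAcIds` and `acCoverage` are hypothetical names, and the AC id pattern is guessed from the `AC-US03-2` example.

```typescript
// Collects AC ids referenced by `@ac` tags; `UNTRACED` is not an AC id.
function taggedAcIds(testSource: string): Set<string> {
  const ids = new Set<string>()
  const pattern = /@ac\s+([A-Za-z0-9-]+)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(testSource)) !== null) {
    if (match[1] !== "UNTRACED") {
      ids.add(match[1])
    }
  }
  return ids
}

// Computes the share of ACs with at least one tagged test.
function acCoverage(allAcIds: string[], testSource: string): { rate: number; warn: boolean } {
  const tagged = taggedAcIds(testSource)
  const covered = allAcIds.filter((id) => tagged.has(id)).length
  const uncovered = allAcIds.length - covered
  const rate = allAcIds.length === 0 ? 1 : covered / allAcIds.length
  // Warn when more than 20% of ACs lack a tagged test (integer comparison).
  return { rate, warn: uncovered * 100 > allAcIds.length * 20 }
}

const sampleTests = `
/** @ac AC-US03-2 */
it('should block PR when coverage regresses below baseline', () => {})
/** @ac UNTRACED */
it('helper utility returns correct format', () => {})
`
const report = acCoverage(["AC-US03-1", "AC-US03-2"], sampleTests)
```

Here one of two ACs is covered, so the rate is 0.5 and the warning is raised. Exactly 20% uncovered does not warn, matching the "more than 20%" wording.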
+---
+## Migration from Pyramid Model
+If your project previously used pyramid thresholds:
+1. **Delete** any hardcoded coverage thresholds from `jest.config.js` / `vitest.config.ts` (`coverageThreshold` in Jest, `coverage.thresholds` in Vitest)
+2. **Create** `.coverage-baseline.json` with current coverage as the starting ratchet
+3. **Add** `scripts/check-coverage-ratchet.sh` to CI
+4. **Add** `scripts/check-stubs.sh` to `deploy.sh` and the pre-push hook
+5. **Add** `scripts/check-anti-fake-tests.sh` to pre-commit or CI
+The ratchet starts at your current coverage. From that point on, it can only increase.
+---
+## Related Standards
+- `testing.ai.yaml` — Test structure, FIRST principles, AAA pattern (pyramid thresholds deprecated here)
+- `unit-testing.ai.yaml` — Unit test scope and organization
+- `integration-testing.ai.yaml` — Integration test patterns
+- `deployment-standards.ai.yaml` — Deploy gate requirements
+- XSPEC-178 — Full specification and implementation phases

package/bundled/core/llm-output-validation.md ADDED

@@ -0,0 +1,178 @@
+# LLM Output Validation Standard
+> Standard ID: `llm-output-validation`
+> Version: v1.0.0
+> Last Updated: 2026-05-05
+---
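+## Why Validate LLM Output?
+LLM output is **non-deterministic**: the same prompt can produce inconsistently formatted output at different times or under different model versions. Left unvalidated, such output can cause silent failures in downstream pipelines: instead of raising an error, the pipeline proceeds with a wrong default value or `undefined`.
+LLM output validation has three layers:
+| Layer | Question | Tools |
+|------|------|------|
+| Structural validation | Is the output format correct? | JSON Schema, Zod, Pydantic |
+| Semantic validation | Are the claimed facts grounded? | NLI probe, grounding check |
+| Behavioral validation | Does the agent correctly refuse out-of-bounds requests? | Red-team corpus, refusal evaluation |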
+---
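+## 1. Schema Contract Test (Structural Validation)
+### Core Concept
+Each AI agent should declare an `output-schema.json` (in JSON Schema format) and provide a matching contract test.
+**Purpose of the contract test**:
+- Confirm that the schema itself is valid JSON Schema
+- Confirm that valid fixtures pass validation
+- Confirm that invalid fixtures (missing required fields, wrong types, enum violations) are rejected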
+### Recommended Directory Structure
+```
+agents/<agent-name>/
+  output-schema.json       ← JSON Schema definition
+  __tests__/
+    contract.test.ts       ← Contract test suite
+  __fixtures__/
+    valid.json             ← Golden fixture from real LLM output
+    invalid-missing-id.json ← Fixture missing a required field
+```
+### TypeScript Example (Using Ajv)
+```typescript
+import Ajv from "ajv"
+import { expect, it } from "vitest"
+// JSON imports require `resolveJsonModule` in tsconfig.json
+import schema from "../output-schema.json"
+import validFixture from "../__fixtures__/valid.json"
+const ajv = new Ajv({ strict: false })
+const validate = ajv.compile(schema)
+// Test 1: the schema itself is valid JSON Schema
+it("schema is valid JSON Schema", () => {
+  expect(ajv.validateSchema(schema)).toBe(true)
+})
+// Test 2: the valid fixture passes validation
+it("valid fixture passes schema", () => {
+  expect(validate(validFixture)).toBe(true)
+})
+// Test 3: an empty object is rejected
+it("empty object is rejected", () => {
+  expect(validate({})).toBe(false)
+})
+// Test 4: an object missing source_agent is rejected
+it("object missing source_agent is rejected", () => {
+  const { source_agent, ...without } = validFixture
+  expect(validate(without)).toBe(false)
+})
+```
+### Python Example (Using Pydantic)
+```python
+from pydantic import ValidationError
+from your_module import AgentOutput
+# Test the valid fixture
+valid_data = {"version": "1.0.0", "source_agent": "planner"}  # ... remaining fields elided
+output = AgentOutput(**valid_data)  # must not raise an exception
+# Test the invalid fixture
+try:
+    AgentOutput(version="bad-format", source_agent="planner")
+    assert False, "Should have raised"
+except ValidationError:
+    pass  # expected behavior
+```
+---
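+## 2. Hallucination Detection (Semantic Validation)
+### What Is a Hallucination?
+The LLM produces content that sounds correct but has no factual basis. Examples:
+- Fabricated API documentation URLs
+- Database column names that do not exist
+- Dependency versions that never appeared in the context
+### Detection Strategies
+| Strategy | Use Case | Automation Level |
+|------|---------|-----------|
+| **Structured schema output** | Agent emits JSON; enums constrain the allowed values | High (automatic) |
+| **Grounding check** | RAG systems where answers must cite the context | Medium (requires an NLI model) |
+| **Confidence tagging** | Agent includes a `confidence` score in its output | Medium (requires prompt design) |
+| **Red-team corpus** | Actively tests refusal of out-of-bounds requests | High (automatic) |
+### Hallucination Rate Targets
+| Agent Type | Schema Compliance Rate | Factual Hallucination Rate |
+|-----------|-------------|----------|
+| Structured JSON agent | ≥ 99% | ≤ 5% |
+| RAG agent | ≥ 95% | ≤ 5% |
+| Conversational agent | ≥ 90% | ≤ 10% |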
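For the fabricated-URL case listed above, even a string-level check catches some ungrounded output before an NLI model is involved. The sketch below is a deliberately naive stand-in for a grounding check, not the NLI-based approach named in the table; `extractUrls` and `ungroundedUrls` are hypothetical names, and the sample URLs are illustrative.

```typescript
// Extracts http(s) URLs from text, trimming trailing sentence punctuation.
function extractUrls(text: string): string[] {
  const matches = text.match(/https?:\/\/[^\s)"'<>]+/g) ?? []
  return matches.map((url) => url.replace(/[.,;:]+$/, ""))
}

// Returns URLs in the agent output that never appear in the supplied context.
function ungroundedUrls(output: string, context: string): string[] {
  const known = new Set(extractUrls(context))
  return extractUrls(output).filter((url) => !known.has(url))
}

const context = "Validation docs: https://ajv.js.org/"
const agentOutput =
  "See https://api.example.com/v2/docs and https://ajv.js.org/."
const suspicious = ungroundedUrls(agentOutput, context)
```

Here `suspicious` contains only the `api.example.com` URL, since the Ajv URL appears in the context. Exact matching misses paraphrased or reformatted references, which is where a semantic check would still be needed.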
+---
+## 3. Prompt Regression Testing
+### When to Run Prompt Regression Tests
+- Any `agents/*/prompt.md` is modified
+- The model version is upgraded (same prompt, different model)
+- A required field is added to the schema
+### Regression Test Flow
+```bash
+# 1. Before the change: record the golden output with temperature=0
+vibeops run planner --input fixtures/planner-input.json --temp 0 > golden.json
+# 2. After the change: re-run and compare
+vibeops run planner --input fixtures/planner-input.json --temp 0 > after.json
+# 3. Use the contract test to verify that after.json still conforms to the schema
+npx vitest run agents/__tests__/contract.test.ts
+```
+---
+## 4. Quality Gates
+| Gate | Threshold | Enforcement |
+|------|------|---------|
+| Schema compliance (CI) | 100% | Block merge |
+| Empty-object rejection (CI) | 100% | Block merge |
+| Regression after prompt change (CI) | Schema compliance maintained | Block merge |
+| Hallucination rate (pre-release) | ≤ 5% | Advisory |
+---
+## 5. Recommended Tools
+| Tool | Language | Purpose |
+|------|------|------|
+| [Ajv](https://ajv.js.org/) | TypeScript/JS | JSON Schema contract test |
+| [Zod](https://zod.dev/) | TypeScript | Runtime type validation |
+| [Pydantic](https://docs.pydantic.dev/) | Python | Schema + type validation |
+| [DeepEval](https://deepeval.com/) | Python | LLM hallucination rate, faithfulness scoring |
+| [Ragas](https://docs.ragas.io/) | Python | RAG grounded answer rate |
+---
+## Reference Standards
+- NIST AI RMF (AI 100-1, 2023): AI risk management framework
+- OWASP Top 10 for LLM Applications v1.1 (LLM01: Prompt Injection)
+- ISO/IEC 42001:2023: AI management systems
+- [UDS `security-testing.ai.yaml`](./security-testing.md): SAST + DAST integration
+- [UDS `adversarial-test.ai.yaml`](./adversarial-test.md): Prompt injection red-team standard