universal-dev-standards 5.4.0 → 5.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (138)
  1. package/bundled/ai/options/testing/integration-testing.ai.yaml +2 -2
  2. package/bundled/ai/options/testing/unit-testing.ai.yaml +2 -2
  3. package/bundled/ai/standards/adversarial-test.ai.yaml +277 -0
  4. package/bundled/ai/standards/audit-trail.ai.yaml +113 -0
  5. package/bundled/ai/standards/browser-compatibility-standards.ai.yaml +63 -0
  6. package/bundled/ai/standards/chaos-injection-tests.ai.yaml +91 -0
  7. package/bundled/ai/standards/container-image-standards.ai.yaml +88 -0
  8. package/bundled/ai/standards/container-security.ai.yaml +331 -0
  9. package/bundled/ai/standards/contract-testing-standards.ai.yaml +62 -0
  10. package/bundled/ai/standards/cost-budget-test.ai.yaml +96 -0
  11. package/bundled/ai/standards/cross-flow-regression.ai.yaml +61 -0
  12. package/bundled/ai/standards/data-contract.ai.yaml +110 -0
  13. package/bundled/ai/standards/data-migration-testing.ai.yaml +96 -0
  14. package/bundled/ai/standards/data-pipeline.ai.yaml +113 -0
  15. package/bundled/ai/standards/disaster-recovery-drill.ai.yaml +89 -0
  16. package/bundled/ai/standards/flaky-test-management.ai.yaml +89 -0
  17. package/bundled/ai/standards/flow-based-testing.ai.yaml +240 -0
  18. package/bundled/ai/standards/full-coverage-testing.ai.yaml +192 -0
  19. package/bundled/ai/standards/iac-design-principles.ai.yaml +83 -0
  20. package/bundled/ai/standards/incident-response.ai.yaml +107 -0
  21. package/bundled/ai/standards/license-compliance.ai.yaml +106 -0
  22. package/bundled/ai/standards/llm-output-validation.ai.yaml +269 -0
  23. package/bundled/ai/standards/mock-boundary.ai.yaml +250 -0
  24. package/bundled/ai/standards/mutation-testing.ai.yaml +192 -0
  25. package/bundled/ai/standards/pii-classification.ai.yaml +109 -0
  26. package/bundled/ai/standards/policy-as-code-testing.ai.yaml +227 -0
  27. package/bundled/ai/standards/prd-standards.ai.yaml +88 -0
  28. package/bundled/ai/standards/product-metrics-standards.ai.yaml +111 -0
  29. package/bundled/ai/standards/prompt-regression.ai.yaml +94 -0
  30. package/bundled/ai/standards/property-based-testing.ai.yaml +105 -0
  31. package/bundled/ai/standards/release-quality-manifest.ai.yaml +135 -0
  32. package/bundled/ai/standards/release-readiness-gate.ai.yaml +77 -0
  33. package/bundled/ai/standards/replay-test.ai.yaml +111 -0
  34. package/bundled/ai/standards/runbook.ai.yaml +104 -0
  35. package/bundled/ai/standards/sast-advanced.ai.yaml +135 -0
  36. package/bundled/ai/standards/schema-evolution.ai.yaml +111 -0
  37. package/bundled/ai/standards/secret-management-standards.ai.yaml +105 -0
  38. package/bundled/ai/standards/secure-op.ai.yaml +365 -0
  39. package/bundled/ai/standards/security-testing.ai.yaml +171 -0
  40. package/bundled/ai/standards/server-ops-security.ai.yaml +274 -0
  41. package/bundled/ai/standards/slo-sli.ai.yaml +97 -0
  42. package/bundled/ai/standards/smoke-test.ai.yaml +87 -0
  43. package/bundled/ai/standards/supply-chain-attestation.ai.yaml +109 -0
  44. package/bundled/ai/standards/test-completeness-dimensions.ai.yaml +52 -5
  45. package/bundled/ai/standards/testing.ai.yaml +20 -13
  46. package/bundled/ai/standards/user-story-mapping.ai.yaml +108 -0
  47. package/bundled/core/accessibility-standards.md +58 -0
  48. package/bundled/core/adversarial-test.md +212 -0
  49. package/bundled/core/branch-completion.md +4 -0
  50. package/bundled/core/browser-compatibility-standards.md +220 -0
  51. package/bundled/core/chaos-injection-tests.md +116 -0
  52. package/bundled/core/checkin-standards.md +1 -0
  53. package/bundled/core/container-security.md +521 -0
  54. package/bundled/core/contract-testing-standards.md +182 -0
  55. package/bundled/core/cost-budget-test.md +69 -0
  56. package/bundled/core/cross-flow-regression.md +190 -0
  57. package/bundled/core/data-migration-testing.md +110 -0
  58. package/bundled/core/disaster-recovery-drill.md +73 -0
  59. package/bundled/core/flaky-test-management.md +73 -0
  60. package/bundled/core/flow-based-testing.md +275 -0
  61. package/bundled/core/full-coverage-testing.md +183 -0
  62. package/bundled/core/llm-output-validation.md +178 -0
  63. package/bundled/core/mock-boundary.md +100 -0
  64. package/bundled/core/mutation-testing.md +97 -0
  65. package/bundled/core/performance-standards.md +65 -0
  66. package/bundled/core/policy-as-code-testing.md +188 -0
  67. package/bundled/core/prompt-regression.md +72 -0
  68. package/bundled/core/property-based-testing.md +73 -0
  69. package/bundled/core/release-quality-manifest.md +193 -0
  70. package/bundled/core/release-readiness-gate.md +184 -0
  71. package/bundled/core/replay-test.md +86 -0
  72. package/bundled/core/sast-advanced.md +300 -0
  73. package/bundled/core/secure-op.md +314 -0
  74. package/bundled/core/security-testing.md +87 -0
  75. package/bundled/core/server-ops-security.md +493 -0
  76. package/bundled/core/smoke-test.md +65 -0
  77. package/bundled/core/supply-chain-attestation.md +117 -0
  78. package/bundled/locales/zh-CN/CHANGELOG.md +3 -3
  79. package/bundled/locales/zh-CN/README.md +1 -1
  80. package/bundled/locales/zh-CN/skills/ai-instruction-standards/SKILL.md +5 -5
  81. package/bundled/locales/zh-TW/CHANGELOG.md +3 -3
  82. package/bundled/locales/zh-TW/README.md +1 -1
  83. package/bundled/locales/zh-TW/core/browser-compatibility-standards.md +11 -0
  84. package/bundled/locales/zh-TW/core/contract-testing-standards.md +11 -0
  85. package/bundled/locales/zh-TW/core/cross-flow-regression.md +11 -0
  86. package/bundled/locales/zh-TW/core/release-readiness-gate.md +11 -0
  87. package/bundled/locales/zh-TW/skills/ai-instruction-standards/SKILL.md +183 -79
  88. package/bundled/skills/README.md +4 -3
  89. package/bundled/skills/SKILL_NAMING.md +94 -0
  90. package/bundled/skills/ai-instruction-standards/SKILL.md +181 -88
  91. package/bundled/skills/atdd-assistant/SKILL.md +8 -0
  92. package/bundled/skills/bdd-assistant/SKILL.md +7 -0
  93. package/bundled/skills/checkin-assistant/SKILL.md +8 -0
  94. package/bundled/skills/code-review-assistant/SKILL.md +7 -0
  95. package/bundled/skills/journey-test-assistant/SKILL.md +203 -0
  96. package/bundled/skills/orchestrate/SKILL.md +167 -0
  97. package/bundled/skills/plan/SKILL.md +234 -0
  98. package/bundled/skills/pr-automation-assistant/SKILL.md +8 -0
  99. package/bundled/skills/push/SKILL.md +49 -2
  100. package/bundled/skills/{process-automation → skill-builder}/SKILL.md +1 -1
  101. package/bundled/skills/{forward-derivation → spec-derivation}/SKILL.md +1 -1
  102. package/bundled/skills/spec-driven-dev/SKILL.md +7 -0
  103. package/bundled/skills/sweep/SKILL.md +145 -0
  104. package/bundled/skills/tdd-assistant/SKILL.md +7 -0
  105. package/package.json +6 -6
  106. package/src/commands/check.js +43 -0
  107. package/src/commands/flow.js +8 -0
  108. package/src/commands/init.js +2 -1
  109. package/src/commands/start.js +14 -0
  110. package/src/commands/sweep.js +8 -0
  111. package/src/commands/update.js +10 -0
  112. package/src/commands/workflow.js +8 -0
  113. package/standards-registry.json +483 -5
  114. package/bundled/locales/zh-CN/skills/ac-coverage-assistant/SKILL.md +0 -190
  115. package/bundled/locales/zh-CN/skills/forward-derivation/SKILL.md +0 -71
  116. package/bundled/locales/zh-CN/skills/forward-derivation/guide.md +0 -130
  117. package/bundled/locales/zh-CN/skills/methodology-system/SKILL.md +0 -88
  118. package/bundled/locales/zh-CN/skills/methodology-system/create-methodology.md +0 -350
  119. package/bundled/locales/zh-CN/skills/methodology-system/guide.md +0 -131
  120. package/bundled/locales/zh-CN/skills/methodology-system/runtime.md +0 -279
  121. package/bundled/locales/zh-CN/skills/process-automation/SKILL.md +0 -143
  122. package/bundled/locales/zh-TW/skills/ac-coverage-assistant/SKILL.md +0 -195
  123. package/bundled/locales/zh-TW/skills/deploy-assistant/SKILL.md +0 -178
  124. package/bundled/locales/zh-TW/skills/forward-derivation/SKILL.md +0 -69
  125. package/bundled/locales/zh-TW/skills/forward-derivation/guide.md +0 -415
  126. package/bundled/locales/zh-TW/skills/methodology-system/SKILL.md +0 -86
  127. package/bundled/locales/zh-TW/skills/methodology-system/create-methodology.md +0 -350
  128. package/bundled/locales/zh-TW/skills/methodology-system/guide.md +0 -131
  129. package/bundled/locales/zh-TW/skills/methodology-system/runtime.md +0 -279
  130. package/bundled/locales/zh-TW/skills/process-automation/SKILL.md +0 -144
  131. /package/bundled/skills/{ac-coverage-assistant → ac-coverage}/SKILL.md +0 -0
  132. /package/bundled/skills/{methodology-system → dev-methodology}/SKILL.md +0 -0
  133. /package/bundled/skills/{methodology-system → dev-methodology}/create-methodology.md +0 -0
  134. /package/bundled/skills/{methodology-system → dev-methodology}/guide.md +0 -0
  135. /package/bundled/skills/{methodology-system → dev-methodology}/integrated-flow.md +0 -0
  136. /package/bundled/skills/{methodology-system → dev-methodology}/prerequisite-check.md +0 -0
  137. /package/bundled/skills/{methodology-system → dev-methodology}/runtime.md +0 -0
  138. /package/bundled/skills/{forward-derivation → spec-derivation}/guide.md +0 -0
@@ -0,0 +1,275 @@
# Flow-Based Testing

**Version**: 1.3.0
**Last Updated**: 2026-05-05
**Applicability**: All software projects with multi-step workflows
**Scope**: universal
**Industry Standards**: ISO/IEC/IEEE 29119-4 (Test Techniques), ISTQB Foundation Syllabus
**References**: Decision Table Testing (ISTQB), Pairwise Testing, State Transition Testing

[English](.) | [繁體中文](../locales/zh-TW/core/flow-based-testing.md)

---

## Purpose

This document defines a systematic methodology for testing multi-step processes. It closes the gap between AC-centric tests, which verify individual behaviors in isolation, and flow-level tests, which verify sequential behavior with accumulated state and branch coverage.

---

## The Core Problem: AC-Centric vs. Flow-Centric Testing

AC-centric tests verify that each acceptance criterion works in isolation. However, they miss two critical categories of bugs:

1. **Step interaction bugs**: bugs that only manifest when Step 1's output becomes Step 2's input
2. **Branch coverage gaps**: decision points that are never exercised with all of their possible values

**Example**: A pipeline has 8 steps, and every AC passes independently. But the quota check in Step 3 depends on state accumulated in Steps 1 and 2, and that interaction is never tested.

---

## Three-Step Flow Decomposition

### Step 1: Flow Identification

Before writing any test code, document:

- **Preconditions**: the system's initial state
- **Step sequence**: the ordered list of actions (Step 1 → Step N)
- **Decision points**: every if/else or other condition in the flow
- **Terminal states**: all possible end states (success plus each distinct failure)
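
The four elements above can be recorded as data, so a reviewer or a CI script can check that nothing is missing before any tests are written. The following is a minimal TypeScript sketch. `FlowSpec`, `checkFlowSpec`, and the Create Order values are hypothetical illustrations, not part of this standard. The check verifies only that each element is present, not that it is correct.

```typescript
// Hypothetical shape for a documented flow; names are illustrative only.
interface TerminalState {
  id: string
  kind: "success" | "failure"
}

interface FlowSpec {
  name: string
  preconditions: string[]
  steps: string[] // ordered: Step 1 → Step N
  decisionPoints: Record<string, string[]> // decision point → its possible values
  terminalStates: TerminalState[]
}

// Returns documentation gaps; an empty list means all four elements are present.
function checkFlowSpec(spec: FlowSpec): string[] {
  const problems: string[] = []
  if (spec.preconditions.length === 0) problems.push("no preconditions")
  if (spec.steps.length === 0) problems.push("no steps")
  for (const [point, values] of Object.entries(spec.decisionPoints)) {
    if (values.length < 2) problems.push(`decision point "${point}" has fewer than 2 values`)
  }
  if (!spec.terminalStates.some((t) => t.kind === "success")) problems.push("no success terminal state")
  if (!spec.terminalStates.some((t) => t.kind === "failure")) problems.push("no failure terminal states")
  return problems
}

const createOrderFlow: FlowSpec = {
  name: "Create Order",
  preconditions: ["registered user", "quota not yet used"],
  steps: ["Login", "Create order", "Verify order state"],
  decisionPoints: {
    Authorization: ["valid", "expired", "missing"],
    Quota: ["sufficient", "exceeded"],
  },
  terminalStates: [
    { id: "order-pending", kind: "success" },
    { id: "auth-rejected", kind: "failure" },
    { id: "quota-exceeded", kind: "failure" },
  ],
}

console.log(checkFlowSpec(createOrderFlow)) // []
```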
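
### Step 2: Decision Table Expansion

For each decision point, list all possible values. Then apply a coverage strategy:

| Strategy | When to Use | Scenario Count |
|----------|-------------|----------------|
| **Each-Choice** (minimum) | Low-risk flows, fast feedback | Every value appears in at least one scenario: at least the largest value count, at most the sum of all value counts |
| **Pairwise** | Medium-risk flows | Every pair of values appears in at least one scenario: at least the product of the two largest value counts |
| **All-Combinations** | Auth, payment, security | Product of all value counts |

**Decision Table Example**:

| Decision Point | Values |
|----------------|--------|
| Authorization | valid / expired / missing |
| Quota | sufficient / exceeded |
| External Service | available / timeout / error |

Each-Choice with one scenario per value gives at most 3 + 2 + 3 = 8 scenarios. A failing value such as `expired` ends the flow before later decision points are reached. The three happy-path values can therefore share one scenario, so 1 + 2 + 1 + 2 = 6 scenarios already cover every value. Either way, that is far more than the 1-2 scenarios teams typically write. For comparison, Pairwise needs at least 3 × 3 = 9 scenarios and All-Combinations needs 3 × 2 × 3 = 18.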
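
The counts in the strategy table can be computed directly from a decision table. The following is a minimal sketch. As in the example above, it assumes that the first value listed for each decision point is the happy-path value and that every other value ends the flow early. `scenarioCounts` is a hypothetical helper, and the pairwise figure is a lower bound, not the size of a generated suite.

```typescript
// Decision table: decision point → its possible values (happy-path value first, by assumption).
type DecisionTable = Record<string, string[]>

function scenarioCounts(table: DecisionTable) {
  const sizes = Object.values(table).map((values) => values.length)
  // Two largest value counts (defaults keep the product meaningful for 0 or 1 decision points).
  const [first = 0, second = 1] = [...sizes].sort((a, b) => b - a)
  return {
    // One scenario per value with no sharing: the Each-Choice upper bound.
    eachChoiceUpperBound: sizes.reduce((sum, n) => sum + n, 0),
    // One shared happy-path scenario plus one scenario per failing value.
    eachChoiceEarlyExit: 1 + sizes.reduce((sum, n) => sum + (n - 1), 0),
    // Every value pair of the two largest decision points must appear at least once.
    pairwiseLowerBound: first * second,
    // Every combination of values.
    allCombinations: sizes.reduce((product, n) => product * n, 1),
  }
}

const counts = scenarioCounts({
  Authorization: ["valid", "expired", "missing"],
  Quota: ["sufficient", "exceeded"],
  ExternalService: ["available", "timeout", "error"],
})
console.log(counts.eachChoiceUpperBound, counts.eachChoiceEarlyExit, counts.pairwiseLowerBound, counts.allCombinations)
// 8 6 9 18
```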

### Step 3: Journey Test Structure

Write tests that thread shared state: a `ctx` object accumulates state across steps, and each `it` block consumes what the previous step produced. This relies on the runner executing the `it` blocks of a `describe` in declaration order, which Jest and Vitest do by default.

```typescript
// login, createOrder, getOrder, exhaustQuota, attemptCreateOrder, getOrders and the
// credentials / orderData / testUser / testToken fixtures are project-specific helpers.
describe("Flow: Create Order", () => {
  const ctx: { token?: string; orderId?: string } = {}

  it("Step 1: Login", async () => {
    ctx.token = await login(credentials)
    expect(ctx.token).toBeTruthy()
  })

  it("Step 2: Create order (uses Step 1 token)", async () => {
    ctx.orderId = await createOrder(ctx.token!, orderData)
    expect(ctx.orderId).toMatch(/^ord-/)
  })

  it("Step 3: Verify order state (uses Step 2 orderId)", async () => {
    const order = await getOrder(ctx.token!, ctx.orderId!)
    expect(order.status).toBe("pending")
  })
})

describe("Flow Branch: Quota exceeded path", () => {
  it("should return 429 and NOT create an order when quota is exhausted", async () => {
    await exhaustQuota(testUser)
    const response = await attemptCreateOrder(testToken, orderData)
    expect(response.status).toBe(429)
    expect(response.body.code).toBe("QUOTA_EXCEEDED")
    // Verify side effects: no order was created
    const orders = await getOrders(testUser)
    expect(orders.length).toBe(0)
  })
})
```
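
Branch scenarios can be generated from the decision table instead of being written by hand. The following is a minimal sketch under the same early-exit assumption as the counts above. `expandEachChoice` (hypothetical) returns the all-happy baseline plus one scenario per failing value, varying one decision point at a time. The resulting rows could then drive a parameterized test such as Jest's or Vitest's `it.each`.

```typescript
type DecisionTable = Record<string, string[]>
type Scenario = Record<string, string>

// Each-Choice expansion for flows where any non-happy value ends the flow:
// start from the all-happy baseline and vary exactly one decision point at a time.
function expandEachChoice(table: DecisionTable): Scenario[] {
  const baseline: Scenario = {}
  for (const [point, values] of Object.entries(table)) baseline[point] = values[0]

  const scenarios: Scenario[] = [baseline]
  for (const [point, values] of Object.entries(table)) {
    for (const value of values.slice(1)) {
      scenarios.push({ ...baseline, [point]: value })
    }
  }
  return scenarios
}

const scenarios = expandEachChoice({
  Authorization: ["valid", "expired", "missing"],
  Quota: ["sufficient", "exceeded"],
  ExternalService: ["available", "timeout", "error"],
})

console.log(scenarios.length) // 6
// In a test file, e.g.: it.each(scenarios)("scenario %o", async (s) => { /* drive the flow with s */ })
```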

---

## Anti-Patterns

- Testing only the happy-path flow (missing failure terminal states)
- Resetting shared state between steps (breaks state threading)
- Testing each step in isolation without verifying accumulated state
- Using a single test for a flow with multiple decision points
- Applying All-Combinations to every flow (reserve it for critical paths)
- Not verifying side effects (or their absence) in branch tests

---

## Relationship to Other Standards

- **test-completeness-dimensions**: Dimensions 9 (Flow Completeness) and 10 (Branch Coverage) are specified in detail by this standard
- **behavior-driven-development**: BDD Scenario Outline `Examples` tables map directly to decision table expansion
- **mock-boundary**: Flow tests must respect mock boundary rules (no mocking of your own module's logic)
- **e2e-testing**: Journey tests run at the system test (ST) or E2E level; flow tests can also run at the integration test (IT) level against a real database

---

## Multi-Gate Flow Verification Model

Flow coverage is not a single pre-release check. It is a **progressive verification chain** across the entire SDLC, and that chain must answer two fundamentally different questions at different stages:

| Verification Type | Question | Executor | Timing |
|-------------------|----------|----------|--------|
| **Coverage** | Are all terminal states tested? | Automated CI | Dev → Staging → Pre-UAT |
| **Correctness** | Are the terminal state definitions right? | Human UAT | UAT phase |

Confusing the two wastes UAT cycles on technical coverage issues that CI should have caught.

### Gate 0: PRD Sign-off (Before Implementation Starts)

The testability elements below MUST be written into the PRD before any code is written. Use `templates/requirement-template.md` §2.4 and §9.4:

| Element | PRD Section | When Required |
|---------|-------------|---------------|
| Preconditions + Ordered Steps | §2.4 | Flows with ≥ 3 steps |
| Decision Points list | §2.4 | Every branch condition |
| Terminal States list | §2.4 | All distinct end states |
| Decision Table (Each-Choice) | §9.4 | All flows |
| Upgrade to All-Combinations | §9.4 | Auth / payment / security |
| UAT acceptance script (pre-filled) | §9.4 | Before PRD approval |

> **Why at PRD stage?** Test engineers cannot derive branch coverage from a spec that only describes the happy path. Discovering missing decision points during test design can cost a full sprint.

### Gate 1: PR Merge (Per Feature Branch)

Every PR that touches a flow with ≥ 3 steps MUST include automated tests covering the terminal states it introduces or modifies. Reviewers block the merge if terminal states are added to §2.4 without corresponding tests.

### Gate 3: Pre-UAT Deployment (Automated + QA Lead Sign-off)

CI must prove coverage completeness **before** UAT begins. UAT is for validating correctness, not for technical testing. (Staging/integration, between Gate 1 and Gate 3, has no formal gate: a green CI run is sufficient.)

Required CI checks:

- Every Decision Table scenario has a passing automated test
- Zero terminal states without test coverage
- Branch coverage ≥ 90% (or the project-defined threshold)
- All-Combinations scenarios fully passing for auth / payment / security flows

> Deploying to UAT without passing Gate 3 forces business stakeholders to act as technical QA, which is a costly and demoralizing misuse of UAT time.
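
The Gate 3 checks can be expressed as a function over per-flow results. The following is a minimal sketch. It assumes CI has already collected, for each flow, the scenario pass counts, the defined versus covered terminal states, and the branch coverage. `FlowResult` and `evaluateGate3` are hypothetical, and `coverageTarget` defaults to the 90% named above.

```typescript
interface FlowResult {
  flowId: string
  scenariosTotal: number
  scenariosPassing: number
  terminalStatesDefined: number
  terminalStatesCovered: number
  branchCoveragePct: number
  critical: boolean // auth / payment / security flow → All-Combinations required
  allCombinationsPassing: boolean
}

// Returns the Gate 3 violations for one flow; an empty list means the flow passes.
function evaluateGate3(flow: FlowResult, coverageTarget = 90): string[] {
  const failures: string[] = []
  if (flow.scenariosPassing < flow.scenariosTotal) {
    failures.push(`${flow.scenariosTotal - flow.scenariosPassing} decision table scenario(s) not passing`)
  }
  if (flow.terminalStatesCovered < flow.terminalStatesDefined) {
    failures.push(`${flow.terminalStatesDefined - flow.terminalStatesCovered} terminal state(s) without coverage`)
  }
  if (flow.branchCoveragePct < coverageTarget) {
    failures.push(`branch coverage ${flow.branchCoveragePct}% below target ${coverageTarget}%`)
  }
  if (flow.critical && !flow.allCombinationsPassing) {
    failures.push("critical flow without fully passing All-Combinations")
  }
  return failures
}

const loginFlow: FlowResult = {
  flowId: "login-authentication",
  scenariosTotal: 7,
  scenariosPassing: 7,
  terminalStatesDefined: 7,
  terminalStatesCovered: 7,
  branchCoveragePct: 94,
  critical: true,
  allCombinationsPassing: true,
}
console.log(evaluateGate3(loginFlow)) // []
```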

### Gate 4: UAT Sign-off (Business Correctness, Pre-Production)

UAT validates that terminal state **definitions are correct** against real business rules, not that they are covered. Use the UAT Acceptance Script in §9.4. It is derived directly from the Decision Table, so no separate script needs to be written:

- Business stakeholders sign off each row (terminal state)
- If UAT reveals a previously undefined terminal state, add it to §2.4, the Decision Table, and an automated test. Then re-run Gate 3 and resume UAT.
- Discovering no new terminal states during UAT is a strong signal that §2.4 was thorough

### Gate Model Summary

```
PRD Sign-off
  │  Gate 0: §2.4 + §9.4 complete (Decision Points, Terminal States,
  │          Decision Table, UAT script pre-filled)
  ▼
Implementation + PR Reviews
  │  Gate 1: Each PR covering a flow includes terminal state tests
  ▼
Staging / Integration
  │  (no formal gate: CI green is sufficient)
  ▼
Pre-UAT Deployment
  │  Gate 3: CI proves 100% terminal state coverage + branch coverage ≥ 90%
  ▼
UAT Execution
  │  Gate 4: Business sign-off on terminal state correctness
  │          New terminal states → back to Gate 3 before proceeding
  ▼
Production
```

---

## RQM Integration

Gate 3 (the Pre-UAT CI coverage gate) MUST produce a **`flow_gate_report.json`** artifact. The Release Quality Manifest consumes this artifact through its `flow_gate_report` field (see `release-quality-manifest.md`).

### flow_gate_report.json Schema

The example below shows the expected shape. Only one flow entry is shown, while `summary` aggregates all flows:

```json
{
  "generated_at": "2026-05-05T04:00:00Z",
  "commit": "abc1234",
  "flows": [
    {
      "flow_id": "login-authentication",
      "spec_ref": "docs/specs/SPEC-001.md#2.4",
      "decision_points": 3,
      "terminal_states": 7,
      "gate_0_complete": true,
      "gate_1_pr_coverage": true,
      "gate_3": {
        "all_scenarios_green": true,
        "terminal_states_covered": 7,
        "terminal_states_defined": 7,
        "branch_coverage_pct": 94,
        "coverage_target": 90,
        "all_combinations_required": true,
        "status": "pass"
      },
      "gate_4_uat_signoff": true
    }
  ],
  "summary": {
    "total_flows": 5,
    "gate_0_complete": true,
    "gate_1_pr_coverage": true,
    "gate_3_ci_pass": true,
    "gate_4_uat_signoff": true,
    "status": "pass"
  }
}
```
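
The `summary` object can be derived mechanically from `flows`. The following is a minimal sketch. It assumes that a summary gate passes only when it passes for every flow. It also assumes that `summary.status` is `"pass"` only when all four summary gates pass and at least one flow is present. This aggregation rule is an assumption, not something this standard specifies. `FlowEntry` includes only the fields the sketch reads.

```typescript
interface FlowEntry {
  flow_id: string
  gate_0_complete: boolean
  gate_1_pr_coverage: boolean
  gate_3: { status: "pass" | "fail" }
  gate_4_uat_signoff: boolean
}

interface Summary {
  total_flows: number
  gate_0_complete: boolean
  gate_1_pr_coverage: boolean
  gate_3_ci_pass: boolean
  gate_4_uat_signoff: boolean
  status: "pass" | "fail"
}

// A summary gate passes only if it passes for every flow; an empty report never passes.
function summarize(flows: FlowEntry[]): Summary {
  const all = (pred: (f: FlowEntry) => boolean) => flows.every(pred)
  const gates = {
    gate_0_complete: all((f) => f.gate_0_complete),
    gate_1_pr_coverage: all((f) => f.gate_1_pr_coverage),
    gate_3_ci_pass: all((f) => f.gate_3.status === "pass"),
    gate_4_uat_signoff: all((f) => f.gate_4_uat_signoff),
  }
  const passed =
    flows.length > 0 &&
    gates.gate_0_complete && gates.gate_1_pr_coverage &&
    gates.gate_3_ci_pass && gates.gate_4_uat_signoff
  return { total_flows: flows.length, ...gates, status: passed ? "pass" : "fail" }
}

const flows: FlowEntry[] = [
  { flow_id: "login-authentication", gate_0_complete: true, gate_1_pr_coverage: true, gate_3: { status: "pass" }, gate_4_uat_signoff: true },
  { flow_id: "checkout", gate_0_complete: true, gate_1_pr_coverage: true, gate_3: { status: "pass" }, gate_4_uat_signoff: false },
]
console.log(summarize(flows).status) // fail
```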

### Generation Script Hook

Add to CI after the test run (Gate 3):

```bash
# scripts/generate-flow-gate-report.sh
node scripts/generate-flow-gate-report.mjs \
  --coverage-report coverage/coverage-summary.json \
  --flow-specs "docs/specs/**/*.md" \
  --uat-signoffs ".release-readiness/*.md" \
  --output flow_gate_report.json
```

The `summary.status` field feeds into `release-quality-manifest.yaml` under `flow_gate_report.status`.

---

## Quick Reference Checklist

```
Flow: ___________________

□ Step 1: Flow Identification
  □ Preconditions documented
  □ Ordered step sequence listed
  □ All decision points extracted
  □ All terminal states defined

□ Step 2: Decision Table
  □ Decision table created
  □ Coverage strategy chosen (Each-Choice / Pairwise / All-Combinations)
  □ Critical flows (auth/payment/security) → All-Combinations

□ Step 3: Journey Test Structure
  □ Happy path journey test (shared ctx, sequential steps)
  □ Each branch outcome has its own describe block
  □ Branch tests verify both the response AND the absence of side effects
  □ No beforeEach resetting ctx between steps
```
@@ -0,0 +1,183 @@
# Full Coverage Testing Standards

> **AI-optimized version**: `ai/standards/full-coverage-testing.ai.yaml`
> **XSPEC**: XSPEC-178
> **Replaces**: Pyramid threshold model (UT ≥ 80%, IT ≥ 70%, E2E happy-path-only)

## Overview

Full Coverage Testing is a behavior-completeness paradigm designed for the AI era, in which generating tests costs about as much as generating code. Traditional pyramid thresholds assumed tests were expensive to write, and that assumption no longer holds.

**Core principle**: Every public function must be tested on all three behavioral paths. Coverage is measured by behavior completeness, not by percentage floors. CI enforces a ratchet: coverage can only increase, never decrease.

---

## Behavior-Completeness Model

Instead of "80% line coverage", require:

| Path | Description | Example |
|------|-------------|---------|
| **Happy path** | Normal input produces the correct output | `calculateDiscount(100, 0.1) → 90` |
| **Edge case** | Boundary values do not cause unexpected errors | `calculateDiscount(0, 1.0) → 0` (no exception) |
| **Error path** | Invalid input raises a clear error or produces an error state | `calculateDiscount(-1, 2.0) → throws ArgumentError` |

Every public function requires all three. This replaces the "80% of business logic" target with a qualitative, behavior-driven requirement.
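
The following is a minimal sketch of the three paths for the `calculateDiscount` example, in plain TypeScript without a test framework. The implementation is hypothetical: it takes a price and a discount rate between 0 and 1, and rounds the result to cents. It throws the built-in `RangeError` because JavaScript has no `ArgumentError`.

```typescript
// Hypothetical implementation: applies a fractional discount rate (0 to 1) to a price,
// rounding the result to cents.
function calculateDiscount(price: number, rate: number): number {
  if (price < 0) throw new RangeError(`price must be >= 0, got ${price}`)
  if (rate < 0 || rate > 1) throw new RangeError(`rate must be between 0 and 1, got ${rate}`)
  return Math.round(price * (1 - rate) * 100) / 100
}

// Happy path: normal input, exact expected value.
console.log(calculateDiscount(100, 0.1)) // 90

// Edge case: boundary values (zero price, 100% discount) must not throw.
console.log(calculateDiscount(0, 1.0)) // 0

// Error path: invalid input is rejected with a clear error.
try {
  calculateDiscount(-1, 2.0)
} catch (e) {
  console.log(e instanceof RangeError) // true
}
```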

---

## Ratchet CI Policy

- The current coverage baseline is the minimum acceptable coverage
- Any PR that decreases coverage is blocked from merging
- Improvements update the baseline automatically on merge
- No fixed percentage floor: the coverage achieved today is tomorrow's floor

```text
# Stored in .coverage-baseline.json
{ "line": 91.3, "branch": 88.7, "timestamp": "2026-05-06" }

# PR regression → blocked
Coverage regression: 91.3% → 89.1%. Ratchet threshold violated.

# PR improvement → baseline updated
Coverage improved: 91.3% → 92.0%. New baseline set.
```
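
The ratchet comparison fits in a few lines. The following is a minimal sketch. It assumes the baseline has the `line`/`branch` shape shown above and that the current percentages come from the coverage tool's summary report. `checkRatchet` is hypothetical, and reading or writing `.coverage-baseline.json` is left to the CI script.

```typescript
interface Coverage {
  line: number
  branch: number
}

type RatchetResult =
  | { ok: false; message: string }
  | { ok: true; message: string; newBaseline: Coverage }

// Blocks any drop in line or branch coverage; otherwise the current run becomes the new baseline.
function checkRatchet(baseline: Coverage, current: Coverage): RatchetResult {
  for (const metric of ["line", "branch"] as const) {
    if (current[metric] < baseline[metric]) {
      return {
        ok: false,
        message: `Coverage regression (${metric}): ${baseline[metric]}% → ${current[metric]}%. Ratchet threshold violated.`,
      }
    }
  }
  return { ok: true, message: "No coverage regression. New baseline set.", newBaseline: { ...current } }
}

const baseline = { line: 91.3, branch: 88.7 }
console.log(checkRatchet(baseline, { line: 89.1, branch: 88.7 }).ok) // false
console.log(checkRatchet(baseline, { line: 92.0, branch: 88.9 }).ok) // true
```

In CI, a result with `ok: false` would fail the job. On merge, `newBaseline` would be written back to `.coverage-baseline.json`.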

---

## Anti-Fake Test Rules

### Forbidden: Tautology Assertions

Assertions that always pass regardless of behavior provide false coverage.

```typescript
// ❌ FORBIDDEN: always passes, tests nothing
expect(true).toBe(true)
expect(result).toBeDefined() // without checking a specific value

// ✅ REQUIRED: verifies actual behavior
expect(result).toBe(90)
expect(result).toEqual({ discount: 10, total: 90 })
```
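
A check like `scripts/check-anti-fake-tests.sh` (listed under Migration from Pyramid Model) could start with line-level pattern matching. The following is a minimal TypeScript sketch. The two regular expressions are illustrative assumptions that catch only the literal forms shown above, so a production check would more likely be an AST-based lint rule.

```typescript
interface Finding {
  line: number
  text: string
  rule: string
}

// Illustrative patterns only: a literal compared with itself, and any bare `toBeDefined()`.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "tautology", pattern: /expect\((true|false|\d+|"[^"]*"|'[^']*')\)\.toBe\(\1\)/ },
  { rule: "bare-toBeDefined", pattern: /\.toBeDefined\(\)/ },
]

function findFakeAssertions(source: string): Finding[] {
  const findings: Finding[] = []
  source.split("\n").forEach((text, i) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) findings.push({ line: i + 1, text: text.trim(), rule })
    }
  })
  return findings
}

const sample = [
  "expect(true).toBe(true)",
  "expect(result).toBeDefined()",
  "expect(result).toBe(90)",
].join("\n")
console.log(findFakeAssertions(sample).map((f) => `${f.line}:${f.rule}`).join(", "))
// 1:tautology, 2:bare-toBeDefined
```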

### Forbidden: Mocking Core Business Logic

Mocking your own code means the business logic is never actually executed.

```typescript
// ❌ FORBIDDEN: business logic never runs
jest.mock('./orderService', () => ({ calculateTotal: jest.fn(() => 100) }))

// ✅ ALLOWED: mock only external dependencies
// MOCK: External Stripe API — no sandbox available in CI
jest.mock('./payment-gateway', () => ({ charge: jest.fn().mockResolvedValue({ id: 'ch_test' }) }))
```

### Required: Mock Reason Comments

Every mock must explain why the real dependency cannot be used.

```typescript
// ❌ FORBIDDEN: no explanation
jest.mock('./payment-gateway')

// ✅ REQUIRED: explicit reason
// MOCK: External payment gateway — network dependency, no sandbox in CI
jest.mock('./payment-gateway', () => ({ ... }))
```

### Mock Boundary: What Can Be Mocked

| ✅ Allowed to Mock | ❌ Forbidden to Mock |
|-------------------|---------------------|
| External HTTP APIs (payment, OAuth) | Core business calculation functions |
| Hardware interfaces (sensors, GPIO) | Your own service layer methods |
| Third-party SDKs without a test mode | Database queries (use in-memory SQLite instead) |
| Docker daemon | Your own utility functions |

---

## STUB Marker Protocol

All temporary or placeholder implementations MUST carry the standard STUB marker. Pre-push hooks and `deploy.sh` enforce this.

### Marking a STUB

```typescript
// WARNING: STUB — Remove before UAT
async function validatePayment(card: Card): Promise<boolean> {
  return true; // Always approves; replace with a real Stripe call
}
```

### Exempting a Genuine Limitation

When a dependency truly cannot be tested (hardware, or a live API without a sandbox):

```typescript
// COVERAGE_EXEMPT: Hardware temperature sensor — no simulation available in CI
async function readTemperature(): Promise<number> {
  return hardwareSensor.read();
}
```

The exemption reason MUST be non-empty and specific.

### Deployment Gates

| Environment | STUB Present | Action |
|-------------|--------------|--------|
| Feature branch push | Yes | ⚠️ Warning (not blocked) |
| `main` branch push | Yes | ❌ Blocked |
| Staging deploy | Yes | ⚠️ Warning (not blocked) |
| UAT deploy | Yes | ❌ Blocked |
| Production deploy | Yes | ❌ Blocked (critical log) |
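
The following is a minimal sketch of what `scripts/check-stubs.sh` (listed under Migration from Pyramid Model) might check, written in TypeScript for consistency with the other examples. It assumes markers appear exactly as `// WARNING: STUB` and `// COVERAGE_EXEMPT: <reason>`. `scanMarkers`, `gateAction`, and the environment keys are hypothetical, and the action map transcribes the Deployment Gates table. The sketch checks only that an exemption reason is non-empty; whether the reason is specific still needs human review.

```typescript
type Environment = "feature-push" | "main-push" | "staging" | "uat" | "production"
type Action = "pass" | "warn" | "block"

interface ScanResult {
  stubs: number[] // 1-based line numbers of STUB markers
  invalidExemptions: number[] // COVERAGE_EXEMPT markers with an empty reason
}

function scanMarkers(source: string): ScanResult {
  const result: ScanResult = { stubs: [], invalidExemptions: [] }
  source.split("\n").forEach((line, i) => {
    if (/\/\/\s*WARNING:\s*STUB\b/.test(line)) result.stubs.push(i + 1)
    const exempt = line.match(/\/\/\s*COVERAGE_EXEMPT:(.*)$/)
    if (exempt && exempt[1].trim() === "") result.invalidExemptions.push(i + 1)
  })
  return result
}

// Action per environment when at least one STUB is present (from the Deployment Gates table).
const STUB_ACTION: Record<Environment, Action> = {
  "feature-push": "warn",
  "main-push": "block",
  staging: "warn",
  uat: "block",
  production: "block",
}

function gateAction(env: Environment, scan: ScanResult): Action {
  return scan.stubs.length > 0 ? STUB_ACTION[env] : "pass"
}

const src = [
  "// WARNING: STUB — Remove before UAT",
  "async function validatePayment() { return true }",
  "// COVERAGE_EXEMPT:",
].join("\n")
const scan = scanMarkers(src)
console.log(scan.stubs, scan.invalidExemptions) // [ 1 ] [ 3 ]
console.log(gateAction("staging", scan), gateAction("uat", scan)) // warn block
```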

---

## AC Traceability

Link each test to its Acceptance Criteria with the `@ac` JSDoc tag:

```typescript
/**
 * @ac AC-US03-2
 */
it('should block PR when coverage regresses below baseline', () => {
  // test body
})

// If no AC maps to this test:
/**
 * @ac UNTRACED
 */
it('helper utility returns correct format', () => { /* ... */ })
```

CI reports the AC coverage rate. If more than 20% of ACs lack an `@ac`-tagged test, CI shows a warning.
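
The following is a minimal sketch of the AC coverage computation. It assumes the list of defined AC IDs is available (for example, extracted from the spec) and that test files are scanned as text. `acCoverage` is hypothetical, and `UNTRACED` tags are ignored because they map to no AC.

```typescript
interface AcReport {
  covered: string[]
  uncovered: string[]
  uncoveredRatio: number
  warn: boolean // true when more than 20% of ACs lack a tagged test
}

function acCoverage(definedAcs: string[], testSources: string[]): AcReport {
  const tagged = new Set<string>()
  for (const source of testSources) {
    const re = /@ac\s+([\w-]+)/g
    let m: RegExpExecArray | null
    while ((m = re.exec(source)) !== null) {
      if (m[1] !== "UNTRACED") tagged.add(m[1])
    }
  }
  const covered = definedAcs.filter((id) => tagged.has(id))
  const uncovered = definedAcs.filter((id) => !tagged.has(id))
  const uncoveredRatio = definedAcs.length === 0 ? 0 : uncovered.length / definedAcs.length
  return { covered, uncovered, uncoveredRatio, warn: uncoveredRatio > 0.2 }
}

const sources = [
  "/**\n * @ac AC-US03-2\n */\nit('blocks regressions', () => {})",
  "/**\n * @ac UNTRACED\n */\nit('helper', () => {})",
]
const report = acCoverage(["AC-US03-1", "AC-US03-2", "AC-US03-3"], sources)
console.log(report.uncovered.join(", "), report.warn) // AC-US03-1, AC-US03-3 true
```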

---

## Migration from Pyramid Model

If your project previously used pyramid thresholds:

1. **Delete** any hardcoded coverage thresholds from `jest.config.js` (`coverageThreshold`) or `vitest.config.ts` (`coverage.thresholds`)
2. **Create** `.coverage-baseline.json` with current coverage as the starting ratchet
3. **Add** `scripts/check-coverage-ratchet.sh` to CI
4. **Add** `scripts/check-stubs.sh` to `deploy.sh` and the pre-push hook
5. **Add** `scripts/check-anti-fake-tests.sh` to pre-commit or CI

The ratchet starts at your current coverage. From that point on, it can only increase.

---

## Related Standards

- `testing.ai.yaml`: test structure, FIRST principles, AAA pattern (its pyramid thresholds are deprecated by this standard)
- `unit-testing.ai.yaml`: unit test scope and organization
- `integration-testing.ai.yaml`: integration test patterns
- `deployment-standards.ai.yaml`: deploy gate requirements
- XSPEC-178: full specification and implementation phases
@@ -0,0 +1,178 @@
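# LLM Output Validation Standards

> Standard ID: `llm-output-validation`
> Version: v1.0.0
> Last Updated: 2026-05-05

---

## Why Validate LLM Output?

LLM output is **nondeterministic**: the same prompt can produce inconsistently formatted output at different times or under different model versions. Unvalidated output can cause silent failures in downstream pipelines. Instead of raising an error, the pipeline continues with a wrong default value or `undefined`.

LLM output validation covers three layers:

| Layer | Question | Tools |
|-------|----------|-------|
| Structural validation | Is the output format correct? | JSON Schema, Zod, Pydantic |
| Semantic validation | Are the claimed facts grounded? | NLI probe, grounding check |
| Behavioral validation | Does the agent correctly refuse out-of-scope requests? | Red-team corpus, refusal evaluation |

---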

## 1. Schema Contract Test (Structural Validation)

### Core Concept

Every AI agent should declare an `output-schema.json` in JSON Schema format and provide a matching contract test.

**Purpose of the contract test**:

- Confirm that the schema itself is valid JSON Schema
- Confirm that valid fixtures pass validation
- Confirm that invalid fixtures (missing required fields, wrong types, enum violations) are rejected

### Recommended Directory Structure

```
agents/<agent-name>/
  output-schema.json          ← JSON Schema definition
  __tests__/
    contract.test.ts          ← Contract test suite
  __fixtures__/
    valid.json                ← Golden fixture captured from real LLM output
    invalid-missing-id.json   ← Fixture missing a required field
```

### TypeScript Example (Using Ajv)

```typescript
import { expect, it } from "vitest"
import Ajv from "ajv"
// JSON imports need "resolveJsonModule" (and "esModuleInterop") in tsconfig.json.
// The default Ajv export targets JSON Schema draft-07; use "ajv/dist/2020" for draft 2020-12.
import schema from "../output-schema.json"
import validFixture from "../__fixtures__/valid.json"

const ajv = new Ajv({ strict: false })
const validate = ajv.compile(schema)

// Test 1: the schema itself is valid JSON Schema
it("schema is valid JSON Schema", () => {
  expect(ajv.validateSchema(schema)).toBe(true)
})

// Test 2: the valid fixture passes validation
it("valid fixture passes schema", () => {
  expect(validate(validFixture)).toBe(true)
})

// Test 3: an empty object is rejected
it("empty object is rejected", () => {
  expect(validate({})).toBe(false)
})

// Test 4: an object missing source_agent is rejected
it("object missing source_agent is rejected", () => {
  const { source_agent: _omitted, ...without } = validFixture
  expect(validate(without)).toBe(false)
})
```
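
Hand-written invalid fixtures tend to cover one field at a time. The following minimal sketch derives them from the valid fixture and the schema instead. It assumes a flat object schema with top-level `required` and `properties`; nested schemas are out of scope. `deriveInvalidFixtures` is a hypothetical helper. In the contract test above, each variant would then be asserted with `expect(validate(v.data)).toBe(false)`.

```typescript
interface FlatSchema {
  required?: string[]
  properties?: Record<string, { type?: string; enum?: unknown[] }>
}

type Fixture = Record<string, unknown>

// One invalid variant per rule: each required field removed, each enum field set to an
// out-of-enum value, and each other typed field set to a value of the wrong JSON type.
function deriveInvalidFixtures(valid: Fixture, schema: FlatSchema): { name: string; data: Fixture }[] {
  const variants: { name: string; data: Fixture }[] = []
  for (const field of schema.required ?? []) {
    const data = { ...valid }
    delete data[field]
    variants.push({ name: `missing-${field}`, data })
  }
  for (const [field, prop] of Object.entries(schema.properties ?? {})) {
    if (prop.enum) {
      variants.push({ name: `enum-${field}`, data: { ...valid, [field]: "__not_in_enum__" } })
    } else if (prop.type) {
      // A number is the wrong type for every JSON Schema type except number/integer.
      const wrong = prop.type === "number" || prop.type === "integer" ? "not-a-number" : 12345
      variants.push({ name: `type-${field}`, data: { ...valid, [field]: wrong } })
    }
  }
  return variants
}

const variants = deriveInvalidFixtures(
  { version: "1.0.0", source_agent: "planner" },
  {
    required: ["version", "source_agent"],
    properties: { version: { type: "string" }, source_agent: { enum: ["planner", "coder"] } },
  },
)
console.log(variants.map((v) => v.name).join(", "))
// missing-version, missing-source_agent, type-version, enum-source_agent
```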

### Python Example (Using Pydantic)

```python
import pytest
from pydantic import ValidationError

from your_module import AgentOutput  # your agent's Pydantic output model


def test_valid_fixture_is_accepted():
    # Fill in the remaining required fields of your model
    valid_data = {"version": "1.0.0", "source_agent": "planner"}
    AgentOutput(**valid_data)  # must not raise


def test_invalid_fixture_is_rejected():
    # Assumes AgentOutput validates the version format
    with pytest.raises(ValidationError):
        AgentOutput(version="bad-format", source_agent="planner")
```

---
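
## 2. Hallucination Detection (Semantic Validation)

### What Is a Hallucination?

A hallucination is LLM-generated content that sounds correct but has no actual basis. For example:

- Fabricated API documentation URLs
- Database column names that do not exist
- Dependency versions that never appeared in the context

### Detection Strategies

| Strategy | Applicable Scenario | Automation Level |
|----------|---------------------|------------------|
| **Schema-structured output** | Agent emits JSON, and enums restrict the possible values | High (automatic) |
| **Grounding check** | RAG systems whose answers must cite the context | Medium (requires an NLI model) |
| **Confidence labeling** | Agent includes a `confidence` score in its output | Medium (requires prompt design) |
| **Red-team corpus** | Actively tests refusal behavior on out-of-scope requests | High (automatic) |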
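
A full grounding check needs an NLI model. Fabricated URLs, the first hallucination example above, can often be caught with a much cheaper lexical check. The following minimal sketch substitutes plain string matching for NLI: every URL in the output must appear verbatim in the context supplied to the model. `ungroundedUrls` is hypothetical, and its URL regex is a simplification.

```typescript
// Lexical grounding check (string matching, not NLI): returns URLs in the output
// that do not appear verbatim anywhere in the context the model was given.
function ungroundedUrls(output: string, context: string): string[] {
  const urls = output.match(/https?:\/\/[^\s)"'<>]+/g) ?? []
  // Drop trailing punctuation that usually ends the sentence rather than the URL.
  const cleaned = urls.map((u) => u.replace(/[.,;:]+$/, ""))
  return Array.from(new Set(cleaned)).filter((u) => !context.includes(u))
}

const context = "See https://docs.example.com/api/v2 for the Orders API."
const output = "Use https://docs.example.com/api/v2 and also https://docs.example.com/api/v3/orders."
console.log(ungroundedUrls(output, context).join("\n"))
// https://docs.example.com/api/v3/orders
```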

### Hallucination Rate Targets

| Agent Type | Schema Compliance Rate | Factual Hallucination Rate |
|------------|------------------------|----------------------------|
| Structured JSON agent | ≥ 99% | ≤ 5% |
| RAG agent | ≥ 95% | ≤ 5% |
| Conversational agent | ≥ 90% | ≤ 10% |
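
Both rates are ratios over an evaluation set. The following minimal sketch checks measured counts against the targets table. The `EvalCounts` shape and `meetsTargets` are hypothetical. How each output is judged hallucinated (human review, or a tool such as DeepEval or Ragas) is left to the evaluation pipeline.

```typescript
type AgentType = "structured-json" | "rag" | "conversational"

// Targets transcribed from the table above, as fractions.
const TARGETS: Record<AgentType, { minSchemaCompliance: number; maxHallucination: number }> = {
  "structured-json": { minSchemaCompliance: 0.99, maxHallucination: 0.05 },
  rag: { minSchemaCompliance: 0.95, maxHallucination: 0.05 },
  conversational: { minSchemaCompliance: 0.9, maxHallucination: 0.1 },
}

interface EvalCounts {
  total: number // outputs evaluated
  schemaValid: number // outputs that passed schema validation
  hallucinated: number // outputs judged to contain an unsupported factual claim
}

function meetsTargets(type: AgentType, c: EvalCounts) {
  if (c.total === 0) throw new Error("evaluation set is empty")
  const schemaCompliance = c.schemaValid / c.total
  const hallucinationRate = c.hallucinated / c.total
  const t = TARGETS[type]
  return {
    schemaCompliance,
    hallucinationRate,
    pass: schemaCompliance >= t.minSchemaCompliance && hallucinationRate <= t.maxHallucination,
  }
}

console.log(meetsTargets("rag", { total: 200, schemaValid: 194, hallucinated: 8 }).pass) // true
```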

---

## 3. Prompt Regression Testing

### When to Run Prompt Regression Tests

- After modifying any `agents/*/prompt.md`
- After a model version upgrade (same prompt, different model)
- After a new required field is added to the schema

### Regression Test Procedure

```bash
# 1. Before the change: record the golden output with temperature=0
vibeops run planner --input fixtures/planner-input.json --temp 0 > golden.json

# 2. After the change: rerun, then compare against golden.json
vibeops run planner --input fixtures/planner-input.json --temp 0 > after.json

# 3. Run the contract test to verify that after.json still conforms to the schema
npx vitest run agents/planner/__tests__/contract.test.ts
```
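
Even at temperature 0, model output is not guaranteed to be identical between runs, so an exact text diff of `golden.json` and `after.json` tends to be noisy. The following minimal sketch does a structural comparison instead. It reports added or removed keys and values whose JSON type changed, and it ignores wording. `structuralDiff` is hypothetical; which differences should fail CI is a project decision.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

function jsonType(v: Json): string {
  if (v === null) return "null"
  if (Array.isArray(v)) return "array"
  return typeof v
}

// Compares shape only: key sets and value types, recursing into objects and into array
// elements pairwise (extra array elements are ignored). Returns readable differences.
function structuralDiff(golden: Json, after: Json, path = "$"): string[] {
  const tg = jsonType(golden)
  const ta = jsonType(after)
  if (tg !== ta) return [`${path}: type ${tg} → ${ta}`]
  const diffs: string[] = []
  if (tg === "array") {
    const g = golden as Json[]
    const a = after as Json[]
    for (let i = 0; i < Math.min(g.length, a.length); i++) {
      diffs.push(...structuralDiff(g[i], a[i], `${path}[${i}]`))
    }
  } else if (tg === "object") {
    const g = golden as { [key: string]: Json }
    const a = after as { [key: string]: Json }
    for (const key of Object.keys(g)) {
      if (!(key in a)) diffs.push(`${path}.${key}: removed`)
      else diffs.push(...structuralDiff(g[key], a[key], `${path}.${key}`))
    }
    for (const key of Object.keys(a)) {
      if (!(key in g)) diffs.push(`${path}.${key}: added`)
    }
  }
  return diffs // same primitive type: value and wording changes are ignored
}

const golden = { source_agent: "planner", steps: [{ id: 1, title: "Set up" }] }
const after = { source_agent: "planner", steps: [{ id: "1", title: "Setup" }], notes: "" }
console.log(structuralDiff(golden, after).join("\n"))
// $.steps[0].id: type number → string
// $.notes: added
```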

---

## 4. Quality Gates

| Gate | Threshold | Enforcement |
|------|-----------|-------------|
| Schema compliance (CI) | 100% | Block merge |
| Empty-object rejection (CI) | 100% | Block merge |
| Regression after prompt change (CI) | Schema compliance maintained | Block merge |
| Hallucination rate (pre-release) | ≤ 5% | Advisory |

---

## 5. Recommended Tools

| Tool | Language | Purpose |
|------|----------|---------|
| [Ajv](https://ajv.js.org/) | TypeScript/JS | JSON Schema contract test |
| [Zod](https://zod.dev/) | TypeScript | Runtime type validation |
| [Pydantic](https://docs.pydantic.dev/) | Python | Schema + type validation |
| [DeepEval](https://deepeval.com/) | Python | LLM hallucination rate, faithfulness scoring |
| [Ragas](https://docs.ragas.io/) | Python | RAG grounded answer rate |

---

## References

- NIST AI RMF (AI 100-1, 2023): AI Risk Management Framework
- OWASP Top 10 for LLM Applications v1.1: LLM01, Prompt Injection
- ISO/IEC 42001:2023: AI management systems
- [UDS `security-testing.ai.yaml`](./security-testing.md): SAST + DAST integration
- [UDS `adversarial-test.ai.yaml`](./adversarial-test.md): prompt injection red-team standard