universal-dev-standards 5.4.0 → 5.6.0

Files changed (138)
  1. package/bundled/ai/options/testing/integration-testing.ai.yaml +2 -2
  2. package/bundled/ai/options/testing/unit-testing.ai.yaml +2 -2
  3. package/bundled/ai/standards/adversarial-test.ai.yaml +277 -0
  4. package/bundled/ai/standards/audit-trail.ai.yaml +113 -0
  5. package/bundled/ai/standards/browser-compatibility-standards.ai.yaml +63 -0
  6. package/bundled/ai/standards/chaos-injection-tests.ai.yaml +91 -0
  7. package/bundled/ai/standards/container-image-standards.ai.yaml +88 -0
  8. package/bundled/ai/standards/container-security.ai.yaml +331 -0
  9. package/bundled/ai/standards/contract-testing-standards.ai.yaml +62 -0
  10. package/bundled/ai/standards/cost-budget-test.ai.yaml +96 -0
  11. package/bundled/ai/standards/cross-flow-regression.ai.yaml +61 -0
  12. package/bundled/ai/standards/data-contract.ai.yaml +110 -0
  13. package/bundled/ai/standards/data-migration-testing.ai.yaml +96 -0
  14. package/bundled/ai/standards/data-pipeline.ai.yaml +113 -0
  15. package/bundled/ai/standards/disaster-recovery-drill.ai.yaml +89 -0
  16. package/bundled/ai/standards/flaky-test-management.ai.yaml +89 -0
  17. package/bundled/ai/standards/flow-based-testing.ai.yaml +240 -0
  18. package/bundled/ai/standards/full-coverage-testing.ai.yaml +192 -0
  19. package/bundled/ai/standards/iac-design-principles.ai.yaml +83 -0
  20. package/bundled/ai/standards/incident-response.ai.yaml +107 -0
  21. package/bundled/ai/standards/license-compliance.ai.yaml +106 -0
  22. package/bundled/ai/standards/llm-output-validation.ai.yaml +269 -0
  23. package/bundled/ai/standards/mock-boundary.ai.yaml +250 -0
  24. package/bundled/ai/standards/mutation-testing.ai.yaml +192 -0
  25. package/bundled/ai/standards/pii-classification.ai.yaml +109 -0
  26. package/bundled/ai/standards/policy-as-code-testing.ai.yaml +227 -0
  27. package/bundled/ai/standards/prd-standards.ai.yaml +88 -0
  28. package/bundled/ai/standards/product-metrics-standards.ai.yaml +111 -0
  29. package/bundled/ai/standards/prompt-regression.ai.yaml +94 -0
  30. package/bundled/ai/standards/property-based-testing.ai.yaml +105 -0
  31. package/bundled/ai/standards/release-quality-manifest.ai.yaml +135 -0
  32. package/bundled/ai/standards/release-readiness-gate.ai.yaml +77 -0
  33. package/bundled/ai/standards/replay-test.ai.yaml +111 -0
  34. package/bundled/ai/standards/runbook.ai.yaml +104 -0
  35. package/bundled/ai/standards/sast-advanced.ai.yaml +135 -0
  36. package/bundled/ai/standards/schema-evolution.ai.yaml +111 -0
  37. package/bundled/ai/standards/secret-management-standards.ai.yaml +105 -0
  38. package/bundled/ai/standards/secure-op.ai.yaml +365 -0
  39. package/bundled/ai/standards/security-testing.ai.yaml +171 -0
  40. package/bundled/ai/standards/server-ops-security.ai.yaml +274 -0
  41. package/bundled/ai/standards/slo-sli.ai.yaml +97 -0
  42. package/bundled/ai/standards/smoke-test.ai.yaml +87 -0
  43. package/bundled/ai/standards/supply-chain-attestation.ai.yaml +109 -0
  44. package/bundled/ai/standards/test-completeness-dimensions.ai.yaml +52 -5
  45. package/bundled/ai/standards/testing.ai.yaml +20 -13
  46. package/bundled/ai/standards/user-story-mapping.ai.yaml +108 -0
  47. package/bundled/core/accessibility-standards.md +58 -0
  48. package/bundled/core/adversarial-test.md +212 -0
  49. package/bundled/core/branch-completion.md +4 -0
  50. package/bundled/core/browser-compatibility-standards.md +220 -0
  51. package/bundled/core/chaos-injection-tests.md +116 -0
  52. package/bundled/core/checkin-standards.md +1 -0
  53. package/bundled/core/container-security.md +521 -0
  54. package/bundled/core/contract-testing-standards.md +182 -0
  55. package/bundled/core/cost-budget-test.md +69 -0
  56. package/bundled/core/cross-flow-regression.md +190 -0
  57. package/bundled/core/data-migration-testing.md +110 -0
  58. package/bundled/core/disaster-recovery-drill.md +73 -0
  59. package/bundled/core/flaky-test-management.md +73 -0
  60. package/bundled/core/flow-based-testing.md +275 -0
  61. package/bundled/core/full-coverage-testing.md +183 -0
  62. package/bundled/core/llm-output-validation.md +178 -0
  63. package/bundled/core/mock-boundary.md +100 -0
  64. package/bundled/core/mutation-testing.md +97 -0
  65. package/bundled/core/performance-standards.md +65 -0
  66. package/bundled/core/policy-as-code-testing.md +188 -0
  67. package/bundled/core/prompt-regression.md +72 -0
  68. package/bundled/core/property-based-testing.md +73 -0
  69. package/bundled/core/release-quality-manifest.md +193 -0
  70. package/bundled/core/release-readiness-gate.md +184 -0
  71. package/bundled/core/replay-test.md +86 -0
  72. package/bundled/core/sast-advanced.md +300 -0
  73. package/bundled/core/secure-op.md +314 -0
  74. package/bundled/core/security-testing.md +87 -0
  75. package/bundled/core/server-ops-security.md +493 -0
  76. package/bundled/core/smoke-test.md +65 -0
  77. package/bundled/core/supply-chain-attestation.md +117 -0
  78. package/bundled/locales/zh-CN/CHANGELOG.md +3 -3
  79. package/bundled/locales/zh-CN/README.md +1 -1
  80. package/bundled/locales/zh-CN/skills/ai-instruction-standards/SKILL.md +5 -5
  81. package/bundled/locales/zh-TW/CHANGELOG.md +3 -3
  82. package/bundled/locales/zh-TW/README.md +1 -1
  83. package/bundled/locales/zh-TW/core/browser-compatibility-standards.md +11 -0
  84. package/bundled/locales/zh-TW/core/contract-testing-standards.md +11 -0
  85. package/bundled/locales/zh-TW/core/cross-flow-regression.md +11 -0
  86. package/bundled/locales/zh-TW/core/release-readiness-gate.md +11 -0
  87. package/bundled/locales/zh-TW/skills/ai-instruction-standards/SKILL.md +183 -79
  88. package/bundled/skills/README.md +4 -3
  89. package/bundled/skills/SKILL_NAMING.md +94 -0
  90. package/bundled/skills/ai-instruction-standards/SKILL.md +181 -88
  91. package/bundled/skills/atdd-assistant/SKILL.md +8 -0
  92. package/bundled/skills/bdd-assistant/SKILL.md +7 -0
  93. package/bundled/skills/checkin-assistant/SKILL.md +8 -0
  94. package/bundled/skills/code-review-assistant/SKILL.md +7 -0
  95. package/bundled/skills/journey-test-assistant/SKILL.md +203 -0
  96. package/bundled/skills/orchestrate/SKILL.md +167 -0
  97. package/bundled/skills/plan/SKILL.md +234 -0
  98. package/bundled/skills/pr-automation-assistant/SKILL.md +8 -0
  99. package/bundled/skills/push/SKILL.md +49 -2
  100. package/bundled/skills/{process-automation → skill-builder}/SKILL.md +1 -1
  101. package/bundled/skills/{forward-derivation → spec-derivation}/SKILL.md +1 -1
  102. package/bundled/skills/spec-driven-dev/SKILL.md +7 -0
  103. package/bundled/skills/sweep/SKILL.md +145 -0
  104. package/bundled/skills/tdd-assistant/SKILL.md +7 -0
  105. package/package.json +6 -6
  106. package/src/commands/check.js +43 -0
  107. package/src/commands/flow.js +8 -0
  108. package/src/commands/init.js +2 -1
  109. package/src/commands/start.js +14 -0
  110. package/src/commands/sweep.js +8 -0
  111. package/src/commands/update.js +10 -0
  112. package/src/commands/workflow.js +8 -0
  113. package/standards-registry.json +483 -5
  114. package/bundled/locales/zh-CN/skills/ac-coverage-assistant/SKILL.md +0 -190
  115. package/bundled/locales/zh-CN/skills/forward-derivation/SKILL.md +0 -71
  116. package/bundled/locales/zh-CN/skills/forward-derivation/guide.md +0 -130
  117. package/bundled/locales/zh-CN/skills/methodology-system/SKILL.md +0 -88
  118. package/bundled/locales/zh-CN/skills/methodology-system/create-methodology.md +0 -350
  119. package/bundled/locales/zh-CN/skills/methodology-system/guide.md +0 -131
  120. package/bundled/locales/zh-CN/skills/methodology-system/runtime.md +0 -279
  121. package/bundled/locales/zh-CN/skills/process-automation/SKILL.md +0 -143
  122. package/bundled/locales/zh-TW/skills/ac-coverage-assistant/SKILL.md +0 -195
  123. package/bundled/locales/zh-TW/skills/deploy-assistant/SKILL.md +0 -178
  124. package/bundled/locales/zh-TW/skills/forward-derivation/SKILL.md +0 -69
  125. package/bundled/locales/zh-TW/skills/forward-derivation/guide.md +0 -415
  126. package/bundled/locales/zh-TW/skills/methodology-system/SKILL.md +0 -86
  127. package/bundled/locales/zh-TW/skills/methodology-system/create-methodology.md +0 -350
  128. package/bundled/locales/zh-TW/skills/methodology-system/guide.md +0 -131
  129. package/bundled/locales/zh-TW/skills/methodology-system/runtime.md +0 -279
  130. package/bundled/locales/zh-TW/skills/process-automation/SKILL.md +0 -144
  131. /package/bundled/skills/{ac-coverage-assistant → ac-coverage}/SKILL.md +0 -0
  132. /package/bundled/skills/{methodology-system → dev-methodology}/SKILL.md +0 -0
  133. /package/bundled/skills/{methodology-system → dev-methodology}/create-methodology.md +0 -0
  134. /package/bundled/skills/{methodology-system → dev-methodology}/guide.md +0 -0
  135. /package/bundled/skills/{methodology-system → dev-methodology}/integrated-flow.md +0 -0
  136. /package/bundled/skills/{methodology-system → dev-methodology}/prerequisite-check.md +0 -0
  137. /package/bundled/skills/{methodology-system → dev-methodology}/runtime.md +0 -0
  138. /package/bundled/skills/{forward-derivation → spec-derivation}/guide.md +0 -0
@@ -0,0 +1,96 @@
+ # Data Migration Testing Standards - AI Optimized
+ # Source: core/data-migration-testing.md
+
+ id: data-migration-testing
+ meta:
+   version: "1.0.0"
+   updated: "2026-05-05"
+   source: core/data-migration-testing.md
+   description: Database migration up/down/idempotency testing standards for schema changes
+
+ requirements:
+   REQ-1:
+     id: REQ-DMT-001
+     title: Up Migration Test
+     rule: >
+       Every schema migration MUST have an automated test that applies it to a clean
+       baseline schema and verifies the expected post-state (table names, columns,
+       indexes, constraints).
+     rationale: >
+       Unverified migrations silently corrupt production schema; up tests catch
+       incompatible changes before deployment.
+
+   REQ-2:
+     id: REQ-DMT-002
+     title: Down Migration (Rollback) Test
+     rule: >
+       Every migration that has a down/rollback function MUST have a test that
+       applies up, then down, and verifies the schema returns to its exact pre-state.
+       Zero data loss is the acceptance criterion.
+     rationale: >
+       Rollback is only reliable if it is tested; untested rollbacks fail at the worst
+       possible moment — during a production incident.
+
+   REQ-3:
+     id: REQ-DMT-003
+     title: Idempotency Test
+     rule: >
+       Migration runners MUST be tested to ensure applying the same migration twice
+       does not fail or cause side effects (e.g., duplicate columns, duplicate indexes).
+     rationale: >
+       CI retries and operator mistakes can trigger double-apply; idempotent migrations
+       prevent partial state corruption.
+
+   REQ-4:
+     id: REQ-DMT-004
+     title: Data Preservation Test
+     rule: >
+       Migrations that ALTER or DROP columns MUST include a test that seeds representative
+       data before the migration and verifies data integrity or expected transformation
+       after the migration.
+     rationale: >
+       Schema correctness does not imply data correctness; a column rename can silently
+       null-out existing data if the migration omits a data transform step.
+
+   REQ-5:
+     id: REQ-DMT-005
+     title: Migration History Integrity
+     rule: >
+       The migration runner's applied-migrations table/ledger MUST be validated:
+       each test run uses an isolated in-memory or temporary database so migrations
+       do not interfere with each other or with production state.
+     rationale: >
+       Shared migration state between tests causes non-deterministic failures that are
+       extremely difficult to diagnose.
+
+ test_structure:
+   isolation: "Each migration test MUST run against an isolated in-memory or temp DB"
+   baseline: "Start from the schema state immediately before the migration under test"
+   assertions:
+     up: "Assert table/column/index existence and types"
+     down: "Assert schema returns to pre-migration state"
+     idempotency: "Apply migration twice; second apply must succeed without error"
+     data: "Seed rows before migration; assert row count and values after"
+
+ tooling_guidance:
+   sqlite: "Use ':memory:' DSN; apply migrations via ORM migrate() or raw SQL"
+   postgres: "Use pgmock or test containers; run migrate up/down in a transaction"
+   drizzle_orm: >
+     Call db.run(sql`...`) with raw DDL, or use drizzle-kit's migrate() function
+     against a fresh :memory: SQLite database for each test file.
+
+ anti_patterns:
+   - description: >
+       Testing migrations against a shared development database — causes cross-test
+       pollution and non-repeatable results.
+   - description: >
+       Skipping down migration tests because "we never rollback" — rollbacks happen
+       during incidents; the worst time to discover they are broken.
+   - description: >
+       Writing migration tests after the fact without seeding data — misses the
+       data preservation class of bugs entirely.
+
+ related_standards:
+   - testing
+   - database-standards
+   - verification-evidence
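
The up, down, and idempotency assertions listed under test_structure can be sketched without a real database. The TypeScript below is a minimal illustration under stated assumptions, not the package's implementation: `Schema`, `MigrationRunner`, `snapshot`, `addEmail`, and `freshBaseline` are hypothetical names, and a plain object stands in for the isolated in-memory database. The data preservation requirement is not covered here because the toy schema holds no rows.

```typescript
// Toy in-memory stand-in for a database: table name -> column names.
type Schema = { [table: string]: string[] };

interface Migration {
  id: string;
  up(schema: Schema): void;
  down(schema: Schema): void;
}

// Hypothetical runner: a ledger of applied ids makes a repeated apply a no-op.
class MigrationRunner {
  private applied: string[] = [];
  constructor(private schema: Schema) {}

  up(m: Migration): void {
    if (this.applied.indexOf(m.id) !== -1) return; // already applied
    m.up(this.schema);
    this.applied.push(m.id);
  }

  down(m: Migration): void {
    const at = this.applied.indexOf(m.id);
    if (at === -1) return; // nothing to roll back
    m.down(this.schema);
    this.applied.splice(at, 1);
  }
}

// Canonical text form of a schema so pre- and post-states can be compared.
function snapshot(schema: Schema): string {
  return Object.keys(schema)
    .sort()
    .map((t) => t + "(" + schema[t].slice().sort().join(",") + ")")
    .join(";");
}

// Hypothetical migration under test: adds users.email.
const addEmail: Migration = {
  id: "0002_add_users_email",
  up: (s) => {
    if (s.users.indexOf("email") !== -1) throw new Error("duplicate column: email");
    s.users.push("email");
  },
  down: (s) => {
    s.users = s.users.filter((c) => c !== "email");
  },
};

// Each check starts from its own fresh baseline (isolation).
function freshBaseline(): Schema {
  return { users: ["id", "name"] };
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error("assertion failed: " + msg);
}

// Up: the expected post-state exists.
const upDb = freshBaseline();
new MigrationRunner(upDb).up(addEmail);
check(snapshot(upDb) === "users(email,id,name)", "up adds users.email");

// Down: up then down returns to the exact pre-state.
const downDb = freshBaseline();
const before = snapshot(downDb);
const downRunner = new MigrationRunner(downDb);
downRunner.up(addEmail);
downRunner.down(addEmail);
check(snapshot(downDb) === before, "down restores pre-state");

// Idempotency: a second apply succeeds and changes nothing.
const twiceDb = freshBaseline();
const twiceRunner = new MigrationRunner(twiceDb);
twiceRunner.up(addEmail);
twiceRunner.up(addEmail);
check(snapshot(twiceDb) === "users(email,id,name)", "double apply is a no-op");
```

In a real suite each of the three checks would be its own test case against a fresh `:memory:` database or test container, with the same compare-snapshots idea applied to the actual catalog tables.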
@@ -0,0 +1,113 @@
+ # Data Pipeline Standards - AI Optimized
+ # Source: XSPEC-068 Wave 3 Data Engineering Pack
+
+ id: data-pipeline
+ title: Data Pipeline Standards
+ version: "1.0.0"
+ status: Active
+ tags: [data-engineering, pipeline, etl, data-quality, orchestration, idempotency]
+ summary: |
+   Defines engineering standards for building reliable, observable, and
+   maintainable data pipelines. Covers idempotency and exactly-once semantics,
+   error handling and dead-letter queues, checkpoint and recovery patterns,
+   data lineage tracking, pipeline observability (metrics, alerting), testing
+   requirements, and deployment practices. Applicable to batch ETL, streaming
+   pipelines, and ML feature pipelines.
+
+ requirements:
+   - id: REQ-001
+     title: Idempotency and Exactly-Once Processing
+     description: |
+       Every data pipeline MUST be designed for idempotent execution:
+       re-running the same pipeline for the same time window or batch MUST
+       produce identical output without duplication or data loss. Pipelines
+       MUST use deterministic keys for deduplication. Batch pipelines MUST
+       support re-processing historical partitions cleanly. Streaming pipelines
+       MUST implement exactly-once or at-least-once with deduplication using
+       unique event IDs. Overwrites of output partitions are preferred over
+       appends for batch jobs.
+     level: MUST
+     examples:
+       - "Batch: pipeline writes to date-partitioned output and overwrites the partition on re-run"
+       - "Streaming: dedup using Kafka message key + consumer group offset tracking"
+       - "Test: running pipeline twice for 2026-04-01 produces same row count both times"
+
+   - id: REQ-002
+     title: Error Handling and Dead-Letter Queues
+     description: |
+       Data pipelines MUST implement structured error handling with
+       categorized failure modes. Transient errors (network timeout, API
+       rate limit) MUST use exponential backoff retry (max 3 attempts).
+       Permanent errors (schema violation, invalid data) MUST route records
+       to a Dead-Letter Queue (DLQ) with the original record, error type,
+       error message, and processing timestamp. DLQ records MUST be
+       monitored and addressed within the pipeline's SLA.
+     level: MUST
+     examples:
+       - "Transient retry: retry_policy: {max_attempts: 3, backoff_base: 2s, max_backoff: 30s}"
+       - "DLQ record: {original_record: {...}, error_type: 'SCHEMA_VIOLATION', error_msg: 'field amount is null', ts: '...'}"
+       - "DLQ alert: >100 DLQ messages in 1 hour → PagerDuty alert to data-oncall"
+
+   - id: REQ-003
+     title: Checkpoint and Recovery
+     description: |
+       Long-running batch pipelines and stateful streaming pipelines MUST
+       implement checkpointing to enable recovery from mid-run failures
+       without full reprocessing. Checkpoints MUST record: last successfully
+       processed partition/offset/watermark, job run ID, and timestamp.
+       Recovery MUST resume from the last checkpoint, not from the beginning.
+       Checkpoint state MUST be stored in durable external storage (not
+       local disk).
+     level: MUST
+     examples:
+       - "Batch: checkpoint stores {last_processed_date: '2026-04-28', last_id: 12345678} in S3"
+       - "Streaming: Flink checkpoint interval 5 minutes, stored in S3 with 3 checkpoints retained"
+       - "Recovery test: kill job mid-run, restart, verify output matches full run with no duplicates"
+
+   - id: REQ-004
+     title: Data Lineage Tracking
+     description: |
+       Every data pipeline MUST emit lineage metadata describing its data
+       flow: source datasets (with versions/timestamps), transformation logic
+       applied, and output datasets produced. Lineage MUST be machine-readable
+       and ingested into a central lineage store or data catalog. Lineage
+       enables root-cause analysis of data quality issues and impact assessment
+       of upstream changes.
+     level: MUST
+     examples:
+       - "Lineage emit: {job: 'orders-aggregator', inputs: ['raw_orders@2026-04-30'], outputs: ['daily_order_summary@2026-04-30'], transform_version: 'v1.3.2'}"
+       - "OpenLineage event emitted to Marquez or DataHub on job start and completion"
+       - "Lineage query: 'Which pipelines read from raw_orders?' returns 5 downstream jobs"
+
+   - id: REQ-005
+     title: Pipeline Observability and SLOs
+     description: |
+       Every production data pipeline MUST expose the following metrics:
+       records processed (counter), processing latency (histogram), error
+       rate (gauge), DLQ depth (gauge), and last successful run timestamp.
+       Pipelines MUST define SLOs for: freshness (data available within N
+       hours of source), completeness (≥ X% records successfully processed),
+       and latency (p95 processing time within threshold). SLO violations
+       MUST trigger alerts.
+     level: MUST
+     examples:
+       - "Metric: pipeline_records_processed_total{pipeline='orders-agg',status='success'}"
+       - "Freshness SLO: daily_order_summary available by 03:00 UTC — alert if missing by 04:00 UTC"
+       - "Completeness alert: processed_records / expected_records < 0.99 → P2 alert"
+
+   - id: REQ-006
+     title: Pipeline Testing Requirements
+     description: |
+       Data pipelines MUST have automated tests covering: unit tests for
+       transformation logic (test with sample input/output DataFrames),
+       integration tests validating end-to-end flow with synthetic data,
+       and schema conformance tests validating output matches declared
+       data contract. Pipelines SHOULD have regression tests for historically
+       problematic edge cases (nulls in key fields, negative amounts,
+       duplicate records). Test coverage MUST be ≥ 80% for transformation
+       logic.
+     level: MUST
+     examples:
+       - "Unit test: test_calculate_order_total() — asserts discount applied correctly on sample rows"
+       - "Integration test: runs full pipeline on 1000 synthetic orders, validates output row count and schema"
+       - "Edge case test: pipeline handles duplicate order_id gracefully, deduplication logic verified"
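
The retry and dead-letter rules in REQ-002 can be illustrated with a small, dependency-free sketch. Everything named here (`Outcome`, `RetryPolicy`, `processRecord`, `backoffMs`, `Order`) is hypothetical and chosen for illustration. The policy values mirror the `retry_policy` example above, and the DLQ field names follow the DLQ record example. Sending a transient error to the DLQ once its retries are exhausted is an assumption, since the standard does not say what happens in that case.

```typescript
// Result of one processing attempt, using the transient/permanent split from REQ-002.
type Outcome =
  | { status: "ok" }
  | { status: "transient" | "permanent"; errorType: string; message: string };

// Field names follow the DLQ record example in the standard.
interface DlqRecord<T> {
  original_record: T;
  error_type: string;
  error_msg: string;
  ts: string;
}

interface RetryPolicy {
  maxAttempts: number;
  backoffBaseMs: number;
  maxBackoffMs: number;
}

// Delay before the next attempt: base * 2^(attempt - 1), capped at maxBackoffMs.
function backoffMs(policy: RetryPolicy, attempt: number): number {
  return Math.min(policy.backoffBaseMs * Math.pow(2, attempt - 1), policy.maxBackoffMs);
}

// Returns true if the record was processed, false if it went to the DLQ.
// `sleep` is injected so that tests can record delays instead of waiting.
function processRecord<T>(
  record: T,
  handler: (record: T, attempt: number) => Outcome,
  policy: RetryPolicy,
  dlq: DlqRecord<T>[],
  sleep: (ms: number) => void
): boolean {
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    const outcome = handler(record, attempt);
    if (outcome.status === "ok") return true;

    const exhausted = attempt === policy.maxAttempts;
    if (outcome.status === "permanent" || exhausted) {
      // Assumption: transient errors that exhaust their retries also go to the DLQ.
      dlq.push({
        original_record: record,
        error_type: outcome.errorType,
        error_msg: outcome.message,
        ts: new Date().toISOString(),
      });
      return false;
    }
    sleep(backoffMs(policy, attempt));
  }
  return false; // only reached when maxAttempts < 1
}

// Usage: a record that hits two transient failures, then succeeds.
type Order = { amount: number | null };
const policy: RetryPolicy = { maxAttempts: 3, backoffBaseMs: 2000, maxBackoffMs: 30000 };
const deadLetters: DlqRecord<Order>[] = [];
const delays: number[] = [];
const processed = processRecord<Order>(
  { amount: 10 },
  (_record, attempt) =>
    attempt < 3
      ? { status: "transient", errorType: "NETWORK_TIMEOUT", message: "timed out" }
      : { status: "ok" },
  policy,
  deadLetters,
  (ms) => { delays.push(ms); }
);
console.log(processed, delays, deadLetters.length);
```

Injecting `sleep` keeps the retry schedule observable in a unit test. A production version would likely be asynchronous and add jitter, neither of which the standard specifies.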
@@ -0,0 +1,89 @@
+ # SPDX-License-Identifier: MIT
+ name: Disaster Recovery Drill Standards
+ nameZh: 災難恢復演練標準
+ id: disaster-recovery-drill
+ version: "1.0.0"
+ category: operations
+ scope: reliability
+ summary: >
+   Structured DR drill standards: quarterly runbook execution, RTO/RPO
+   measurement, backup restore verification, and Game Day protocols.
+   Untested recovery plans fail at the worst moment.
+
+ requirements:
+   - id: REQ-01
+     title: RTO/RPO Targets Defined
+     titleZh: RTO/RPO 目標定義
+     level: MUST
+     description: >
+       Each system MUST have documented RTO (Recovery Time Objective) and RPO
+       (Recovery Point Objective) targets. These must be agreed with stakeholders
+       before any DR drill can be considered meaningful.
+     examples:
+       - "VibeOps commercial: RTO < 1 hour, RPO < 24 hours (daily backup)"
+
+   - id: REQ-02
+     title: Backup Restore Test
+     titleZh: 備份還原測試
+     level: MUST
+     description: >
+       At minimum quarterly, a full backup restore MUST be executed in an
+       isolated environment and verified for data integrity. The restore time
+       MUST be measured and compared to the RTO target.
+
+   - id: REQ-03
+     title: Runbook Completeness
+     titleZh: 運行手冊完整性
+     level: MUST
+     description: >
+       A DR runbook MUST exist covering: (1) detection (how do we know disaster
+       occurred?), (2) decision (who declares DR?), (3) recovery steps
+       (step-by-step, executable commands), (4) verification (how do we confirm
+       recovery?), (5) communication plan.
+
+   - id: REQ-04
+     title: Game Day Exercise
+     titleZh: Game Day 演練
+     level: SHOULD
+     description: >
+       At minimum annually, a Game Day exercise SHOULD be conducted where the
+       team simulates a realistic failure scenario and executes the runbook from
+       scratch. Results SHOULD be documented and used to improve the runbook.
+
+   - id: REQ-05
+     title: Drill Record
+     titleZh: 演練記錄
+     level: MUST
+     description: >
+       Every DR drill MUST produce a written record including: date, participants,
+       scenario tested, RTO achieved, RPO achieved, issues found, remediation
+       actions. Records MUST be retained for 12 months.
+
+ examples:
+   - name: "DR drill record template"
+     code: |
+       date: 2026-05-05
+       participants: [alice, bob]
+       scenario: "Database total loss — restore from daily backup"
+       rto_target: "1 hour"
+       rto_achieved: "42 minutes"
+       rpo_target: "24 hours"
+       rpo_achieved: "23 hours 15 minutes"
+       issues_found:
+         - "backup script path was stale — fixed in XSPEC-170"
+       remediation:
+         - "Update backup path in backup-restore.sh"
+       status: PASS
+
+ anti_patterns:
+   - description: >
+       Only verifying that a backup file exists — always restore it and
+       verify data integrity. An untested backup is not a backup.
+   - description: >
+       Running DR drills in production — always use an isolated environment
+       to avoid turning a drill into an actual disaster.
+
+ related_standards:
+   - deployment-standards
+   - chaos-engineering-standards
+   - verification-evidence
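
The comparison of achieved RTO/RPO against targets in the drill record template can be automated. The sketch below is illustrative only: `DrillTimings`, `toMinutes`, and `evaluateDrill` are hypothetical names, the parser handles only the "N hours M minutes" wording used in the template, and the strict less-than comparison is an assumption taken from the "RTO < 1 hour" example.

```typescript
// The four timing fields from the drill record template.
interface DrillTimings {
  rto_target: string;
  rto_achieved: string;
  rpo_target: string;
  rpo_achieved: string;
}

// Parses strings such as "1 hour", "42 minutes", or "23 hours 15 minutes" into minutes.
function toMinutes(text: string): number {
  const pattern = /(\d+)\s*(hour|minute)s?/g;
  let total = 0;
  let matched = false;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    matched = true;
    total += Number(m[1]) * (m[2] === "hour" ? 60 : 1);
  }
  if (!matched) throw new Error("unparseable duration: " + text);
  return total;
}

// A drill passes only if both achieved values are strictly under their targets.
function evaluateDrill(d: DrillTimings): { rtoMet: boolean; rpoMet: boolean; status: "PASS" | "FAIL" } {
  const rtoMet = toMinutes(d.rto_achieved) < toMinutes(d.rto_target);
  const rpoMet = toMinutes(d.rpo_achieved) < toMinutes(d.rpo_target);
  return { rtoMet, rpoMet, status: rtoMet && rpoMet ? "PASS" : "FAIL" };
}

// Usage with the values from the template above.
const drill: DrillTimings = {
  rto_target: "1 hour",
  rto_achieved: "42 minutes",
  rpo_target: "24 hours",
  rpo_achieved: "23 hours 15 minutes",
};
console.log(evaluateDrill(drill).status);
```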
@@ -0,0 +1,89 @@
+ # SPDX-License-Identifier: MIT
+ name: Flaky Test Management Standards
+ nameZh: 不穩定測試管理標準
+ id: flaky-test-management
+ version: "1.0.0"
+ category: testing
+ scope: test-reliability
+ summary: >
+   Policies and tooling for detecting, quarantining, and eliminating flaky
+   tests. Flaky tests erode CI confidence, cause false failures, and mask
+   real bugs.
+
+ requirements:
+   - id: REQ-01
+     title: Flaky Test Definition
+     titleZh: 不穩定測試定義
+     level: MUST
+     description: >
+       A test is considered flaky if it produces different results (pass/fail)
+       on consecutive runs with the same code. Teams MUST define a flakiness
+       threshold: a test that fails ≥ 2% of runs on main branch without code
+       changes is flaky.
+
+   - id: REQ-02
+     title: Quarantine Protocol
+     titleZh: 隔離協議
+     level: MUST
+     description: >
+       Flaky tests MUST be quarantined within 48 hours of detection by:
+       (1) adding a `.skip` or `.todo` annotation, (2) opening a tracking
+       issue, (3) adding a comment with the issue link and known failure mode.
+       Quarantined tests MUST NOT block CI merges.
+
+   - id: REQ-03
+     title: Retry Policy
+     titleZh: 重試策略
+     level: SHOULD
+     description: >
+       CI SHOULD allow a maximum of 2 retries for tests in the quarantine list.
+       Retries SHOULD be applied only to known-flaky tests, not the entire suite.
+       A test that passes after retry is still considered flaky and MUST be fixed.
+
+   - id: REQ-04
+     title: Flaky Test Elimination SLA
+     titleZh: 修復 SLA
+     level: MUST
+     description: >
+       Quarantined tests MUST be either fixed or permanently removed within
+       30 days of quarantine. Tests left quarantined for > 30 days with no
+       activity SHOULD be automatically deleted.
+
+   - id: REQ-05
+     title: Root Cause Categories
+     titleZh: 根因分類
+     level: SHOULD
+     description: >
+       When eliminating a flaky test, the root cause SHOULD be documented in
+       the fixing PR. Common root causes: timing/race conditions, test isolation
+       failures (shared state), external service dependencies, random seed
+       dependence, file system ordering.
+
+ examples:
+   - name: "Quarantine annotation (Vitest)"
+     code: |
+       // TODO: flaky test quarantined 2026-05-05 — see issue #42
+       // Root cause: race condition in WebSocket reconnection
+       it.skip("reconnects after disconnect", async () => { ... })
+
+   - name: "Vitest retry config for known flaky tests"
+     code: |
+       // vitest.config.ts
+       export default defineConfig({
+         test: {
+           retry: 2, // global retry for all tests
+         }
+       })
+
+ anti_patterns:
+   - description: >
+       Allowing flaky tests to block CI without quarantine — developers learn
+       to ignore CI failures, which hides real bugs.
+   - description: >
+       Using arbitrary sleeps (setTimeout/sleep) to fix race conditions —
+       this makes tests slower and more fragile. Use proper async coordination.
+
+ related_standards:
+   - testing
+   - test-governance
+   - ci-cd-standards
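
The 2% flakiness threshold from REQ-01 and the 30-day limit from REQ-04 are both simple to compute from CI history. The sketch below assumes a hypothetical `TestRun` log of results on main with unchanged code and a hypothetical `Quarantine` list. Requiring at least one pass alongside the failures is one reading of "produces different results", not something the standard states outright.

```typescript
// One recorded result of a test on main with no code change (hypothetical shape).
interface TestRun {
  test: string;
  passed: boolean;
}

// Returns the names of tests with mixed results whose failure rate meets the threshold.
function findFlakyTests(runs: TestRun[], threshold: number = 0.02): string[] {
  const stats: { [test: string]: { total: number; failed: number } } = {};
  for (const run of runs) {
    const s = stats[run.test] || (stats[run.test] = { total: 0, failed: 0 });
    s.total += 1;
    if (!run.passed) s.failed += 1;
  }
  return Object.keys(stats)
    .filter((name) => {
      const s = stats[name];
      const mixed = s.failed > 0 && s.failed < s.total; // both passes and failures seen
      return mixed && s.failed / s.total >= threshold;
    })
    .sort();
}

// A quarantined test and the ISO date (YYYY-MM-DD) it was quarantined on.
interface Quarantine {
  test: string;
  quarantinedOn: string;
}

// Returns tests quarantined for more than maxDays as of `today` (ISO date).
function overdueQuarantines(items: Quarantine[], today: string, maxDays: number = 30): string[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return items
    .filter((q) => (Date.parse(today) - Date.parse(q.quarantinedOn)) / msPerDay > maxDays)
    .map((q) => q.test);
}

// Usage: build 100 runs per test with a given number of failures.
function history(test: string, failures: number): TestRun[] {
  const runs: TestRun[] = [];
  for (let i = 0; i < 100; i++) runs.push({ test, passed: i >= failures });
  return runs;
}

const flaky = findFlakyTests(
  history("reconnects after disconnect", 3).concat(history("parses config", 1))
);
console.log(flaky);
```

A test that fails on every run is deliberately excluded here, since that is a plain failure rather than flakiness.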
@@ -0,0 +1,240 @@
+ # Flow-Based Testing Standards - AI Optimized
+ # Source: core/flow-based-testing.md
+
+ id: flow-based-testing
+ meta:
+   version: "1.0.0"
+   updated: "2026-05-04"
+   source: core/flow-based-testing.md
+   description: Flow decomposition methodology for testing multi-step processes with branch coverage
+
+ core_concepts:
+   problem: >
+     AC-centric tests verify individual behaviors in isolation but miss integration gaps
+     between steps and leave decision-point branches untested.
+     A flow where AC-1 passes and AC-2 passes independently does NOT guarantee
+     the AC-1 → AC-2 → AC-3 sequential flow works correctly.
+   solution: >
+     Decompose each workflow into flows, identify all decision points,
+     expand into a scenario matrix, then write journey tests with shared state threading.
+
+ # ─────────────────────────────────────────────────────────
+ # Step 1: Flow Identification
+ # ─────────────────────────────────────────────────────────
+ flow_identification:
+   description: Extract testable flows from SPEC/AC before writing any test code
+   activities:
+     - name: Define Preconditions
+       description: Document the system's initial state before the flow begins
+       examples:
+         - User is authenticated (token present)
+         - Database has seed data
+         - External services are available
+     - name: List Ordered Action Sequence
+       description: List all steps in the exact execution order
+       examples:
+         - Step 1: Validate input
+         - Step 2: Check quota
+         - Step 3: Call external service
+         - Step 4: Persist result
+         - Step 5: Return response
+     - name: Extract Decision Points
+       description: Identify every if/else/conditional in the flow
+       examples:
+         - "Is the user authorized? [yes → continue | no → 401]"
+         - "Is quota sufficient? [yes → continue | no → 429]"
+         - "Did external service respond? [200 → success | timeout → retry | error → fail]"
+     - name: Define Terminal States
+       description: List all possible end states (success + each distinct failure)
+       examples:
+         - "SUCCESS: Resource created, 201 returned"
+         - "FAIL_AUTH: 401 with error code AUTH_INVALID"
+         - "FAIL_QUOTA: 429 with error code QUOTA_EXCEEDED"
+         - "FAIL_EXTERNAL: 504 with error code EXTERNAL_TIMEOUT"
+   output: Flow definition (decision point list + terminal state map)
+
+ # ─────────────────────────────────────────────────────────
+ # Step 2: Decision Table Expansion
+ # ─────────────────────────────────────────────────────────
+ decision_table_expansion:
+   description: Expand decision points into a scenario matrix using a coverage strategy
+   coverage_strategies:
+     - name: Each-Choice
+       description: Each decision value appears in at least one scenario
+       formula: "Minimum scenarios = sum of unique values across all decision points"
+       use_when: Low-risk flows, fast feedback cycles, initial coverage
+       example: "3 values + 2 values + 3 values = 8 minimum scenarios"
+     - name: Pairwise
+       description: All pairs of decision values are covered (OWASP T-way testing)
+       formula: Approximately N × max_values scenarios (N = decision points)
+       use_when: Medium-risk flows, balance between coverage and test count
+       tools: [allpairs, pairwiseJS]
+     - name: All-Combinations
+       description: Full Cartesian product of all decision values
+       formula: "Scenarios = product of value counts per decision point"
+       use_when: Critical flows (auth, payment, license validation, security controls)
+       warning: Can grow exponentially; only use for flows with < 4 decision points or < 3 values each
+
+   decision_table_template: |
+     Flow: [Flow Name]
+
+     Decision Points:
+     | Point          | Values                          |
+     |----------------|---------------------------------|
+     | Authorization  | valid / expired / missing       |
+     | Quota          | sufficient / exceeded           |
+     | ExternalSvc    | available / timeout / error     |
+
+     Each-Choice Scenarios (minimum coverage):
+     | Scenario | Auth     | Quota      | ExternalSvc | Expected          |
+     |----------|----------|------------|-------------|-------------------|
+     | S1       | valid    | sufficient | available   | success           |
+     | S2       | expired  | sufficient | available   | 401_expired       |
+     | S3       | missing  | sufficient | available   | 401_missing       |
+     | S4       | valid    | exceeded   | available   | 429_quota         |
+     | S5       | valid    | sufficient | timeout     | 504_retry         |
+     | S6       | valid    | sufficient | error       | 502_external_fail |
+
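
The scenario rows in the template can be generated rather than written by hand. The sketch below is a hedged illustration with hypothetical names (`DecisionPoints`, `Scenario`, `allCombinations`, `oneChangeAtATime`). It assumes the first value listed for each decision point is the happy-path value, which matches row S1, and it builds the remaining rows by changing one decision at a time, which reproduces rows S2 to S6. It also enumerates the All-Combinations product for the same three decision points.

```typescript
type DecisionPoints = { [point: string]: string[] };
type Scenario = { [point: string]: string };

// All-Combinations: the full Cartesian product of decision values.
function allCombinations(points: DecisionPoints): Scenario[] {
  let scenarios: Scenario[] = [{}];
  for (const point of Object.keys(points)) {
    const next: Scenario[] = [];
    for (const partial of scenarios) {
      for (const value of points[point]) {
        const extended: Scenario = {};
        for (const key of Object.keys(partial)) extended[key] = partial[key];
        extended[point] = value;
        next.push(extended);
      }
    }
    scenarios = next;
  }
  return scenarios;
}

// Happy path first, then one row per non-default value with everything else held
// at its happy-path value. Every decision value appears in at least one scenario.
function oneChangeAtATime(points: DecisionPoints): Scenario[] {
  const names = Object.keys(points);
  const base: Scenario = {};
  for (const name of names) base[name] = points[name][0];
  const scenarios: Scenario[] = [base];
  for (const name of names) {
    for (const value of points[name].slice(1)) {
      const variant: Scenario = {};
      for (const key of names) variant[key] = base[key];
      variant[name] = value;
      scenarios.push(variant);
    }
  }
  return scenarios;
}

// The three decision points from the template.
const decisionPoints: DecisionPoints = {
  Authorization: ["valid", "expired", "missing"],
  Quota: ["sufficient", "exceeded"],
  ExternalSvc: ["available", "timeout", "error"],
};

const full = allCombinations(decisionPoints);
const minimal = oneChangeAtATime(decisionPoints);
console.log(full.length, minimal.length); // 18 6
```

For these inputs the product is 3 × 2 × 3 = 18 scenarios, while the one-change-at-a-time expansion yields the six rows shown in the template. Expected outcomes still have to be filled in by hand, since they come from the flow's terminal states rather than from the combinatorics.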
98
+ # ─────────────────────────────────────────────────────────
99
+ # Step 3: Journey Test Structure
100
+ # ─────────────────────────────────────────────────────────
101
+ journey_test_structure:
102
+ description: Write flow tests with shared state threading — state accumulates across steps
103
+ key_principle: >
104
+ Each test step inherits state from previous steps through a shared context object.
105
+ Never reset the context between steps in a journey test.
106
+
107
+ happy_path_pattern: |
108
+ describe("Flow: [Flow Name]", () => {
109
+ // Shared context — state accumulates, NOT reset between steps
110
+ const ctx: {
111
+ token?: string;
112
+ resourceId?: string;
113
+ result?: ResponseType;
114
+ } = {}
115
+
116
+ it("Step 1: [Precondition setup]", async () => {
117
+ ctx.token = await setupAuth()
118
+ expect(ctx.token).toBeTruthy()
119
+ })
120
+
121
+ it("Step 2: [Core action using Step 1 state]", async () => {
122
+ ctx.resourceId = await createResource(ctx.token!, inputData)
123
+ expect(ctx.resourceId).toMatch(/^[a-z0-9-]+$/)
124
+ })
125
+
126
+ it("Step 3: [Verification using accumulated state]", async () => {
127
+ ctx.result = await getResource(ctx.token!, ctx.resourceId!)
128
+ expect(ctx.result.status).toBe("active")
129
+ expect(ctx.result.id).toBe(ctx.resourceId)
130
+ })
131
+ })
132
+
133
+ branch_pattern: |
134
+ // Each branch outcome gets its own describe block
135
+ describe("Flow Branch: [Decision Point] → [Outcome]", () => {
136
+ it("should [expected behavior] when [condition]", async () => {
137
+ // Setup: put the system into the branch condition
138
+ const expiredToken = buildExpiredJwt()
139
+
140
+ // Act: trigger the flow with branch condition
141
+ const response = await callApi(expiredToken)
142
+
143
+ // Assert: verify BOTH the response AND any side effects
144
+ expect(response.status).toBe(401)
145
+ expect(response.body.code).toBe("AUTH_TOKEN_EXPIRED")
146
+ // Verify NO side effects occurred (resource not created)
147
+ const count = await getResourceCount()
148
+ expect(count).toBe(0)
149
+ })
150
+ })
151
+
152
+ anti_patterns_in_structure:
153
+ - "Using beforeEach to reset ctx — breaks state threading between steps"
154
+ - "Putting all steps in a single it() block — hides which step failed"
155
+ - "Asserting only on the final state — intermediate step bugs go undetected"
156
+ - "Using arbitrary delays between steps — use proper async/await"
157
+
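The state-threading idea can also be sketched without a test framework. The example below is an illustration only: `JourneyCtx`, `Step`, and `runJourney` are hypothetical names, and the step bodies are stand-ins for real calls such as `setupAuth()` or `createResource()`. It runs every step against one context object and reports the name of the first step that fails, which is the information a single `it()` block would hide.

```typescript
// Hypothetical context shape, loosely mirroring the ctx object in happy_path_pattern.
interface JourneyCtx {
  token?: string;
  resourceId?: string;
  status?: string;
}

type Step = { name: string; run: (ctx: JourneyCtx) => Promise<void> };

// Runs steps in order against ONE ctx object and stops at the first failing step.
async function runJourney(
  steps: Step[]
): Promise<{ ctx: JourneyCtx; failedStep?: string }> {
  const ctx: JourneyCtx = {}; // created once, never reset between steps
  for (const step of steps) {
    try {
      await step.run(ctx);
    } catch {
      return { ctx, failedStep: step.name };
    }
  }
  return { ctx };
}

// Stand-in steps: each one depends on state written by the previous step.
const steps: Step[] = [
  {
    name: "Step 1: auth",
    run: async (ctx) => {
      ctx.token = "token-123";
    },
  },
  {
    name: "Step 2: create",
    run: async (ctx) => {
      if (!ctx.token) throw new Error("missing token from Step 1");
      ctx.resourceId = "res-1";
    },
  },
  {
    name: "Step 3: verify",
    run: async (ctx) => {
      if (!ctx.resourceId) throw new Error("missing resourceId from Step 2");
      ctx.status = "active";
    },
  },
];
```

Calling `runJourney(steps.slice(1))` skips the auth step, so the run stops at "Step 2: create" with an otherwise empty context.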
158
+ # ─────────────────────────────────────────────────────────
159
+ # Feature Type Mapping
160
+ # ─────────────────────────────────────────────────────────
161
+ feature_type_mapping:
162
+ - type: Workflow / Multi-step Process
163
+ requires: flow_decomposition
164
+ minimum_coverage: Each-Choice
165
+ test_structure: journey-chained-test
166
+ dimensions: [1, 3, 4, 5, 9, 10]
167
+ with_ai: [1, 3, 4, 5, 9, 10, 8]
168
+ note: Apply flow-based-testing standard; use shared ctx; Each-Choice minimum branch coverage
169
+
170
+ # ─────────────────────────────────────────────────────────
171
+ # Rules
172
+ # ─────────────────────────────────────────────────────────
173
+ rules:
174
+ - id: flow-identification-required
175
+ trigger: feature has 3 or more sequential steps
176
+ instruction: Apply 3-step flow decomposition (identify → decision table → journey structure) before writing tests
177
+ priority: required
178
+
179
+ - id: decision-table-for-branches
180
+ trigger: flow has any if/else or conditional logic
181
+ instruction: Create decision table with all decision values; apply Each-Choice minimum coverage
182
+ priority: required
183
+
184
+ - id: shared-state-in-journey
185
+ trigger: writing multi-step flow test
186
+ instruction: Use shared context object (ctx); never reset it between steps within the same flow
187
+ priority: required
188
+
189
+ - id: separate-branch-describes
190
+ trigger: decision point has 2 or more outcome values
191
+ instruction: Each distinct outcome gets its own describe block with a clear name
192
+ priority: required
193
+
194
+ - id: all-combinations-for-critical
195
+ trigger: flow involves authentication, payment, license validation, or security controls
196
+ instruction: Apply All-Combinations coverage strategy; test every value combination
197
+ priority: required
198
+
199
+ - id: verify-side-effects-in-branches
200
+ trigger: writing branch test for failure path
201
+ instruction: Assert BOTH the error response AND that no unintended side effects occurred
202
+ priority: required
203
+
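For the `all-combinations-for-critical` rule, the scenario list is the Cartesian product of the decision values. A minimal sketch, assuming the same three decision points used in the Each-Choice example; `allCombinations` and the type names are hypothetical, not something the standard prescribes.

```typescript
// Hypothetical types: each decision point maps to the values it can take.
type DecisionTable = Record<string, string[]>;
type Scenario = Record<string, string>;

// Builds every combination of values, one decision point at a time.
function allCombinations(table: DecisionTable): Scenario[] {
  let combos: Scenario[] = [{}];
  for (const name of Object.keys(table)) {
    const next: Scenario[] = [];
    for (const partial of combos) {
      for (const value of table[name]) {
        next.push({ ...partial, [name]: value });
      }
    }
    combos = next;
  }
  return combos;
}

const criticalFlow: DecisionTable = {
  Auth: ["valid", "expired", "missing"],
  Quota: ["sufficient", "exceeded"],
  ExternalSvc: ["available", "timeout", "error"],
};

const combos = allCombinations(criticalFlow);
```

With these inputs the product has 3 × 2 × 3 = 18 scenarios, three times the six listed for Each-Choice, and the count multiplies with every decision point added.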
204
+ # ─────────────────────────────────────────────────────────
205
+ # Anti-Patterns
206
+ # ─────────────────────────────────────────────────────────
207
+ anti_patterns:
208
+ - Testing only the happy path flow (missing failure terminal states)
209
+ - Resetting shared state between steps in a journey test (breaks state threading)
210
+ - Testing each step in isolation without verifying accumulated state
211
+ - Using a single test case for a flow with multiple decision points (hides which branch failed)
212
+ - Applying Pairwise or All-Combinations to every flow (creates unmaintainable test counts; reserve for critical flows)
213
+ - Not verifying side effects (or absence of side effects) in branch tests
214
+ - Starting a flow test from a midpoint without establishing preconditions
215
+
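One way to check the absence of side effects in a failure branch is to compare a count taken before the call with one taken after it, rather than asserting an absolute value. The sketch below uses an in-memory store as a stand-in for a real backend; `InMemoryStore`, `createResource`, and `ApiResponse` are hypothetical names introduced for illustration.

```typescript
type ApiResponse = { status: number; code?: string };

// Stand-in for a real persistence layer.
class InMemoryStore {
  private ids: string[] = [];
  add(id: string): void {
    this.ids.push(id);
  }
  count(): number {
    return this.ids.length;
  }
}

// Hypothetical handler: rejects expired tokens before touching the store.
function createResource(
  store: InMemoryStore,
  tokenExpired: boolean,
  id: string
): ApiResponse {
  if (tokenExpired) {
    return { status: 401, code: "AUTH_TOKEN_EXPIRED" };
  }
  store.add(id);
  return { status: 201 };
}

const store = new InMemoryStore();
store.add("pre-existing"); // the store is deliberately not empty

const before = store.count();
const response = createResource(store, true, "new-resource");
const after = store.count();

// Branch assertions: the error response AND an unchanged resource count.
const branchOk =
  response.status === 401 &&
  response.code === "AUTH_TOKEN_EXPIRED" &&
  after === before;
```

Comparing `before` and `after` keeps the check valid when other data already exists, whereas asserting that the count equals 0 assumes the store starts empty.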
216
+ # ─────────────────────────────────────────────────────────
217
+ # Quick Reference
218
+ # ─────────────────────────────────────────────────────────
219
+ quick_reference:
220
+ flow_test_checklist: |
221
+ Flow: ___________________
222
+
223
+ □ Step 1 — Flow Identification
224
+ □ Preconditions documented
225
+ □ Ordered step sequence listed (Step 1 → Step N)
226
+ □ All decision points extracted (every if/else/condition)
227
+ □ All terminal states defined (success + each distinct failure)
228
+
229
+ □ Step 2 — Decision Table
230
+ □ Decision table created with all values per decision point
231
+ □ Coverage strategy chosen (Each-Choice / Pairwise / All-Combinations)
232
+ □ Critical flows (auth/payment/security) use All-Combinations
233
+ □ Scenario count meets the Each-Choice minimum (every value of every decision point appears in at least one scenario)
234
+
235
+ □ Step 3 — Journey Test Structure
236
+ □ Happy path journey test exists (shared ctx, all steps in sequence)
237
+ □ Each branch outcome has its own describe block
238
+ □ Step assertions verify accumulated ctx state, not just final result
239
+ □ Branch tests verify both error response AND absence of side effects
240
+ □ No beforeEach resetting ctx between steps