@paths.design/caws-cli 2.0.1 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. package/dist/index.d.ts.map +1 -1
  2. package/dist/index.js +101 -96
  3. package/package.json +3 -2
  4. package/templates/agents.md +820 -0
  5. package/templates/apps/tools/caws/COMPLETION_REPORT.md +331 -0
  6. package/templates/apps/tools/caws/MIGRATION_SUMMARY.md +360 -0
  7. package/templates/apps/tools/caws/README.md +463 -0
  8. package/templates/apps/tools/caws/TEST_STATUS.md +365 -0
  9. package/templates/apps/tools/caws/attest.js +357 -0
  10. package/templates/apps/tools/caws/ci-optimizer.js +642 -0
  11. package/templates/apps/tools/caws/config.ts +245 -0
  12. package/templates/apps/tools/caws/cross-functional.js +876 -0
  13. package/templates/apps/tools/caws/dashboard.js +1112 -0
  14. package/templates/apps/tools/caws/flake-detector.ts +362 -0
  15. package/templates/apps/tools/caws/gates.js +198 -0
  16. package/templates/apps/tools/caws/gates.ts +237 -0
  17. package/templates/apps/tools/caws/language-adapters.ts +381 -0
  18. package/templates/apps/tools/caws/language-support.d.ts +367 -0
  19. package/templates/apps/tools/caws/language-support.d.ts.map +1 -0
  20. package/templates/apps/tools/caws/language-support.js +585 -0
  21. package/templates/apps/tools/caws/legacy-assessment.ts +408 -0
  22. package/templates/apps/tools/caws/legacy-assessor.js +764 -0
  23. package/templates/apps/tools/caws/mutant-analyzer.js +734 -0
  24. package/templates/apps/tools/caws/perf-budgets.ts +349 -0
  25. package/templates/apps/tools/caws/property-testing.js +707 -0
  26. package/templates/apps/tools/caws/provenance.d.ts +14 -0
  27. package/templates/apps/tools/caws/provenance.d.ts.map +1 -0
  28. package/templates/apps/tools/caws/provenance.js +132 -0
  29. package/templates/apps/tools/caws/provenance.ts +211 -0
  30. package/templates/apps/tools/caws/schemas/waivers.schema.json +30 -0
  31. package/templates/apps/tools/caws/schemas/working-spec.schema.json +115 -0
  32. package/templates/apps/tools/caws/scope-guard.js +208 -0
  33. package/templates/apps/tools/caws/security-provenance.ts +483 -0
  34. package/templates/apps/tools/caws/shared/base-tool.ts +281 -0
  35. package/templates/apps/tools/caws/shared/config-manager.ts +366 -0
  36. package/templates/apps/tools/caws/shared/gate-checker.ts +597 -0
  37. package/templates/apps/tools/caws/shared/types.ts +444 -0
  38. package/templates/apps/tools/caws/shared/validator.ts +305 -0
  39. package/templates/apps/tools/caws/shared/waivers-manager.ts +174 -0
  40. package/templates/apps/tools/caws/spec-test-mapper.ts +391 -0
  41. package/templates/apps/tools/caws/templates/working-spec.template.yml +60 -0
  42. package/templates/apps/tools/caws/test-quality.js +578 -0
  43. package/templates/apps/tools/caws/tools-allow.json +331 -0
  44. package/templates/apps/tools/caws/validate.js +76 -0
  45. package/templates/apps/tools/caws/validate.ts +228 -0
  46. package/templates/apps/tools/caws/waivers.js +344 -0
  47. package/templates/apps/tools/caws/waivers.yml +19 -0
  48. package/templates/codemod/README.md +1 -0
  49. package/templates/codemod/test.js +1 -0
  50. package/templates/docs/README.md +150 -0
@@ -0,0 +1,820 @@
1
+ # CAWS v1.0 — Engineering-Grade Operating System for Coding Agents
2
+
3
+ ## Purpose
4
+
5
+ CAWS is our "engineering-grade" operating system for coding agents. It (1) forces planning before code, (2) bakes in tests as first-class artifacts, (3) creates explainable provenance, and (4) enforces quality via automated CI gates. It is expressed as a Working Spec + Ruleset the agent must follow, with schemas, templates, scripts, and verification hooks that enable better collaboration between the agent and the human in the loop.
6
+
7
+ ## 1) Core Framework
8
+
9
+ ### Risk Tiering → Drives Rigor
10
+
11
+ • **Tier 1** (Core/critical path, auth/billing, migrations): highest rigor; mutation ≥ 70, branch cov ≥ 90, contract tests mandatory, chaos tests optional, manual review required.
12
+ • **Tier 2** (Common features, data writes, cross-service APIs): mutation ≥ 50, branch cov ≥ 80, contracts mandatory if any external API, e2e smoke required.
13
+ • **Tier 3** (Low risk, read-only UI, internal tooling): mutation ≥ 30, branch cov ≥ 70, integration happy-path + unit thoroughness, e2e optional.
14
+
15
+ Agent must infer and declare the tier in the plan; the human reviewer may raise the rigor (move toward Tier 1) but never lower it.
16
+
17
+ ### New Invariants (Repository-Level "Operating Envelope")
18
+
19
+ 1. **Atomic Change Budget**
20
+ - _Invariant:_ "A PR must fit into one of: `refactor`, `feature`, `fix`, `doc`, `chore`—and must touch only files that the Working Spec's `scope.in` names."
21
+ - _Reason:_ Kills scope-creep; enables deterministic review.
22
+ - _Gate:_ CI rejects PRs that modify files outside `scope.in` unless `spec_delta` is present.
23
+
24
+ 2. **In-place Refactor (No Shadow Copies)**
25
+ - _Invariant:_ Refactors perform **in-place** edits with AST codemods; **no parallel files** (e.g., `enhanced-*.ts`).
26
+ - _Gate:_ a naming linter blocks new files whose name reuses an existing stem with an added prefix or suffix (`enhanced|new|v2|copy|final`).
27
+
28
+ 3. **Determinism & Idempotency**
29
+ - _Invariant:_ All new code must be testable with injected clock/uuid/random; repeated requests must be safe (where applicable) and asserted in tests.
30
+ - _Gate:_ mutation tests + property tests include at least one idempotency predicate for Tier ≥2.
31
+
32
+ 4. **Prompt & Tool Security Envelope** (for agent workflows)
33
+ - _Invariant:_ Agents operate with **tool allow-lists**, **redacted secrets**, and **context firebreaks** (no raw secrets in model context; never post `.env`, keys, or tokens back into diffs).
34
+ - _Gate:_ prompt-lint and secret-scan on the agent prompt files + PR diffs.
35
+
36
+ 5. **Supply-chain Provenance**
37
+ - _Invariant:_ Every CI build produces an SBOM + SLSA-style attestation attached to the PR.
38
+ - _Gate:_ trust score requires valid SBOM/attestation.
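
The gate for invariant 1 (changes must stay inside `scope.in` unless a `spec_delta` is present) can be sketched as a pure function. This is a minimal illustration, not the package's `scope-guard.js`: the function names, the prefix-matching rule, and the boolean `hasSpecDelta` flag are all assumptions.

```typescript
// Hypothetical helpers: names and matching rules are assumptions for illustration.

/** Normalize a scope entry so "src/auth" and "src/auth/" behave the same. */
function normalizeScope(entry: string): string {
  return entry.replace(/\/+$/, "");
}

/** Return the changed files that are not covered by any scope.in entry. */
function filesOutsideScope(changedFiles: string[], scopeIn: string[]): string[] {
  const scopes = scopeIn.map(normalizeScope);
  return changedFiles.filter(
    (file) => !scopes.some((scope) => file === scope || file.startsWith(scope + "/")),
  );
}

/** A PR passes when every file is in scope, or when a spec delta is declared. */
function scopeGatePasses(
  changedFiles: string[],
  scopeIn: string[],
  hasSpecDelta: boolean,
): boolean {
  return hasSpecDelta || filesOutsideScope(changedFiles, scopeIn).length === 0;
}
```

Matching on a path prefix followed by `/` keeps `src/authz/` from being accepted when only `src/auth/` is in scope.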
39
+
40
+ ### Required Inputs (No Code Until Present)
41
+
42
+ • **Working Spec YAML** (see schema below) with user story, scope, invariants, acceptance tests, non-functional budgets, risk tier.
43
+ • **Interface Contracts**: OpenAPI/GraphQL SDL/proto/Pact provider/consumer stubs.
44
+ • **Test Plan**: unit cases, properties, fixtures, integration flows, e2e smokes; data setup/teardown; flake controls.
45
+ • **Change Impact Map**: touched modules, migrations, roll-forward/rollback.
46
+ • **A11y/Perf/Sec budgets**: keyboard path(s), axe rules to enforce; perf budget (TTI/LCP/API latency); SAST/secret scanning & deps policy.
47
+
48
+ If any are missing, agent must generate a draft and request confirmation inside the PR description before implementing.
49
+
50
+ ### The Loop: Plan → Implement → Verify → Document
51
+
52
+ #### 2.1 Plan (agent output, committed as feature.plan.md)
53
+
54
+ • **Design sketch**: sequence diagram or pseudo-API table.
55
+ • **Test matrix**: aligned to user intent (unit/contract/integration/e2e) with edge cases and property predicates.
56
+ • **Data plan**: factories/fixtures, seed strategy, anonymized sample payloads.
57
+ • **Observability plan**: logs/metrics/traces; which spans and attributes will verify correctness in prod.
58
+
59
+ #### 2.2 Implement (rules)
60
+
61
+ • **Contract-first**: generate/validate types from OpenAPI/SDL; add contract tests (Pact/WireMock/MSW) before impl.
62
+ • **Unit focus**: pure logic isolated; mocks only at boundaries you own (clock, fs, network).
63
+ • **State seams**: inject time/uuid/random; ensure determinism; guard for idempotency where relevant.
64
+ • **Migration discipline**: forwards-compatible; provide up/down, dry-run, and backfill strategy.
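
The "state seams" rule above (inject time/uuid/random, guard for idempotency) might look like the following sketch. Every name here (`Clock`, `IdSource`, `PaymentService`, the idempotency-key approach) is hypothetical and chosen only to illustrate the idea.

```typescript
// Hypothetical example: none of these types come from the CAWS package.

interface Clock {
  now(): Date;
}

interface IdSource {
  next(): string;
}

interface Payment {
  id: string;
  idempotencyKey: string;
  amountCents: number;
  createdAt: string;
}

class PaymentService {
  private readonly byKey = new Map<string, Payment>();
  private readonly clock: Clock;
  private readonly ids: IdSource;

  constructor(clock: Clock, ids: IdSource) {
    this.clock = clock;
    this.ids = ids;
  }

  /** Repeating a request with the same idempotency key returns the first result. */
  charge(idempotencyKey: string, amountCents: number): Payment {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing;
    const payment: Payment = {
      id: this.ids.next(),
      idempotencyKey,
      amountCents,
      createdAt: this.clock.now().toISOString(),
    };
    this.byKey.set(idempotencyKey, payment);
    return payment;
  }
}

// Test doubles: a fixed clock and a counter-based id source make output deterministic.
const fixedClock: Clock = { now: () => new Date("2024-01-01T00:00:00.000Z") };

function sequentialIds(prefix: string): IdSource {
  let n = 0;
  return {
    next: () => {
      n += 1;
      return `${prefix}-${n}`;
    },
  };
}
```

Because the clock and id source are constructor arguments, a test can assert exact ids and timestamps, and an idempotency predicate reduces to "calling `charge` twice with one key yields the same object".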
65
+
66
+ ### Mode Matrix
67
+
68
+ | Mode | Contracts | New Files | Required Artifacts |
69
+ | ------------ | ------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------ |
70
+ | **refactor** | Must not change | Discouraged; only when splitting modules with 1:1 mapping and codemod provided | Codemod script + semantic diff report |
71
+ | **feature** | Required first; consumer/provider tests green before implementation | Allowed; must be listed in scope.in | Migration plan, feature flag, performance budget |
72
+ | **fix** | Unchanged | Discouraged; prefer in-place edits | Red test → green; root cause note in PR |
73
+ | **doc** | N/A | Allowed for documentation files | Updated README/usage snippets |
74
+ | **chore** | N/A | Limited to build/tooling changes | Version updates, dependency changes |
75
+
76
+ ### Cursor/Codex Execution Guard
77
+
78
+ Add a commit policy hook to reject commit sets that introduce duplicate stems:
79
+
80
+ ```bash
81
+ # .git/hooks/pre-commit (or CI script)
82
+ PATTERN='(^|/)(copy|final|enhanced|v2)[.-]|(^|/)new-| - copy\.'
83
+ if git diff --cached --name-only --diff-filter=A | grep -E "$PATTERN"; then
84
+   echo "❌ Disallowed filename pattern. Use in-place refactor or codemod."
85
+   exit 1
86
+ fi
87
+ ```
88
+
89
+ #### 2.3 Verify (must pass locally and in CI)
90
+
91
+ • **Static checks**: typecheck, lint (code + tests), import hygiene, dead-code scan, secret scan.
92
+ • **Tests**:
93
+ • **Unit**: fast, deterministic; cover branches and edge conditions; property-based where feasible.
94
+ • **Contract**: consumer/provider; versioned and stored under apps/contracts/.
95
+ • **Integration**: real DB or Testcontainers; seed data via factories; verify persistence, transactions, retries/timeouts.
96
+ • **E2E smoke**: Playwright/Cypress; critical user paths only; semantic selectors; screenshot+trace on failure.
97
+ • **Mutation testing**: minimum scores per tier; non-conformant builds fail.
98
+ • **Non-functional checks**: axe rules; Lighthouse CI budgets or API latency budgets; SAST/dep scan clean.
99
+ • **Flake policy**: tests that intermittently fail are quarantined within 24h with an open ticket; no retries as policy, only as temporary band-aid with expiry.
100
+
101
+ #### 2.4 Document & Deliver
102
+
103
+ • **PR bundle** (template below) with:
104
+ • Working Spec YAML
105
+ • Test Plan & Coverage/Mutation summary, Contract artifacts
106
+ • Risk assessment, Rollback plan, Observability notes (dashboards/queries)
107
+ • Changelog (semver impact), Migration notes
108
+ • Traceability: PR title references ticket; commits follow conventional commits; each test cites the requirement ID in test name or annotation.
109
+ • Explainability: agent includes a 10-line "rationale" and "known-limits" section.
110
+
111
+ ## 2) Machine-Enforceable Implementation
112
+
113
+ ### A) Executable Schemas & Validation
114
+
115
+ #### Working Spec JSON Schema
116
+
117
+ ```json
118
+ {
119
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
120
+ "title": "CAWS Working Spec",
121
+ "type": "object",
122
+ "required": [
123
+ "id",
124
+ "title",
125
+ "risk_tier",
126
+ "mode",
127
+ "change_budget",
128
+ "blast_radius",
129
+ "operational_rollback_slo",
130
+ "scope",
131
+ "invariants",
132
+ "acceptance",
133
+ "non_functional",
134
+ "contracts"
135
+ ],
136
+ "properties": {
137
+ "id": { "type": "string", "pattern": "^[A-Z]+-\\d+$" },
138
+ "title": { "type": "string", "minLength": 8 },
139
+ "risk_tier": { "type": "integer", "enum": [1, 2, 3] },
140
+ "mode": { "type": "string", "enum": ["refactor", "feature", "fix", "doc", "chore"] },
141
+ "change_budget": {
142
+ "type": "object",
143
+ "properties": {
144
+ "max_files": { "type": "integer", "minimum": 1 },
145
+ "max_loc": { "type": "integer", "minimum": 1 }
146
+ },
147
+ "required": ["max_files", "max_loc"],
148
+ "additionalProperties": false
149
+ },
150
+ "blast_radius": {
151
+ "type": "object",
152
+ "properties": {
153
+ "modules": { "type": "array", "items": { "type": "string" } },
154
+ "data_migration": { "type": "boolean" }
155
+ },
156
+ "required": ["modules", "data_migration"],
157
+ "additionalProperties": false
158
+ },
159
+ "operational_rollback_slo": { "type": "string", "pattern": "^[0-9]+m$|^[0-9]+h$" },
160
+ "threats": { "type": "array", "items": { "type": "string" } },
161
+ "scope": {
162
+ "type": "object",
163
+ "required": ["in", "out"],
164
+ "properties": {
165
+ "in": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
166
+ "out": { "type": "array", "items": { "type": "string" } }
167
+ }
168
+ },
169
+ "invariants": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
170
+ "acceptance": {
171
+ "type": "array",
172
+ "minItems": 1,
173
+ "items": {
174
+ "type": "object",
175
+ "required": ["id", "given", "when", "then"],
176
+ "properties": {
177
+ "id": { "type": "string", "pattern": "^A\\d+$" },
178
+ "given": { "type": "string" },
179
+ "when": { "type": "string" },
180
+ "then": { "type": "string" }
181
+ }
182
+ }
183
+ },
184
+ "non_functional": {
185
+ "type": "object",
186
+ "properties": {
187
+ "a11y": { "type": "array", "items": { "type": "string" } },
188
+ "perf": {
189
+ "type": "object",
190
+ "properties": {
191
+ "api_p95_ms": { "type": "integer", "minimum": 1 },
192
+ "lcp_ms": { "type": "integer", "minimum": 1 }
193
+ },
194
+ "additionalProperties": false
195
+ },
196
+ "security": { "type": "array", "items": { "type": "string" } }
197
+ },
198
+ "additionalProperties": false
199
+ },
200
+ "contracts": {
201
+ "type": "array",
202
+ "minItems": 1,
203
+ "items": {
204
+ "type": "object",
205
+ "required": ["type", "path"],
206
+ "properties": {
207
+ "type": { "type": "string", "enum": ["openapi", "graphql", "proto", "pact"] },
208
+ "path": { "type": "string" }
209
+ }
210
+ }
211
+ },
212
+ "observability": {
213
+ "type": "object",
214
+ "properties": {
215
+ "logs": { "type": "array", "items": { "type": "string" } },
216
+ "metrics": { "type": "array", "items": { "type": "string" } },
217
+ "traces": { "type": "array", "items": { "type": "string" } }
218
+ }
219
+ },
220
+ "migrations": { "type": "array", "items": { "type": "string" } },
221
+ "rollback": { "type": "array", "items": { "type": "string" } }
222
+ },
223
+ "additionalProperties": false
224
+ }
225
+ ```
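
A few of the schema rules above can be checked by hand, which helps when reading validator errors. The sketch below mirrors only part of the schema; the `WorkingSpecCore` type and `checkSpecBasics` function are hypothetical, and a real setup would more likely run a JSON Schema validator against the full schema.

```typescript
// Partial, hand-written mirror of the schema above. Names are hypothetical.

interface WorkingSpecCore {
  id: string;
  title: string;
  risk_tier: number;
  mode: string;
  operational_rollback_slo: string;
  scope: { in: string[]; out: string[] };
}

const MODES = ["refactor", "feature", "fix", "doc", "chore"];

/** Return violations for a handful of schema rules; an empty array means these rules pass. */
function checkSpecBasics(spec: WorkingSpecCore): string[] {
  const errors: string[] = [];
  if (!/^[A-Z]+-\d+$/.test(spec.id)) errors.push("id must look like PROJ-123");
  if (spec.title.length < 8) errors.push("title needs at least 8 characters");
  if (![1, 2, 3].includes(spec.risk_tier)) errors.push("risk_tier must be 1, 2, or 3");
  if (!MODES.includes(spec.mode)) errors.push("mode is not one of the allowed values");
  if (!/^[0-9]+m$|^[0-9]+h$/.test(spec.operational_rollback_slo)) {
    errors.push("operational_rollback_slo must be minutes or hours, e.g. 5m or 2h");
  }
  if (spec.scope.in.length < 1) errors.push("scope.in needs at least one entry");
  return errors;
}
```

For instance, an `id` of `proj-1` fails the uppercase pattern, and an `operational_rollback_slo` of `1d` fails because only `m` and `h` suffixes are accepted.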
226
+
227
+ #### Provenance Manifest Schema
228
+
229
+ ```json
230
+ {
231
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
232
+ "type": "object",
233
+ "required": [
234
+ "agent",
235
+ "model",
236
+ "model_hash",
237
+ "tool_allowlist",
238
+ "commit",
239
+ "artifacts",
240
+ "results",
241
+ "approvals",
242
+ "sbom",
243
+ "attestation"
244
+ ],
245
+ "properties": {
246
+ "agent": { "type": "string" },
247
+ "model": { "type": "string" },
248
+ "model_hash": { "type": "string" },
249
+ "tool_allowlist": { "type": "array", "items": { "type": "string" } },
250
+ "prompts": { "type": "array", "items": { "type": "string" } },
251
+ "commit": { "type": "string" },
252
+ "artifacts": { "type": "array", "items": { "type": "string" } },
253
+ "results": {
254
+ "type": "object",
255
+ "properties": {
256
+ "coverage_branch": { "type": "number" },
257
+ "mutation_score": { "type": "number" },
258
+ "tests_passed": { "type": "integer" },
259
+ "contracts": {
260
+ "type": "object",
261
+ "properties": { "consumer": { "type": "boolean" }, "provider": { "type": "boolean" } }
262
+ },
263
+ "a11y": { "type": "string" },
264
+ "perf": { "type": "object" }
265
+ },
266
+ "additionalProperties": true
267
+ },
268
+ "approvals": { "type": "array", "items": { "type": "string" } },
269
+ "sbom": { "type": "string" },
270
+ "attestation": { "type": "string" }
271
+ }
272
+ }
273
+ ```
274
+
275
+ #### Tier Policy Configuration
276
+
277
+ ```json
278
+ {
279
+ "1": {
280
+ "min_branch": 0.9,
281
+ "min_mutation": 0.7,
282
+ "requires_contracts": true,
283
+ "requires_manual_review": true,
284
+ "max_files": 40,
285
+ "max_loc": 1500,
286
+ "allowed_modes": ["feature", "refactor", "fix"]
287
+ },
288
+ "2": {
289
+ "min_branch": 0.8,
290
+ "min_mutation": 0.5,
291
+ "requires_contracts": true,
292
+ "max_files": 25,
293
+ "max_loc": 1000,
294
+ "allowed_modes": ["feature", "refactor", "fix"]
295
+ },
296
+ "3": {
297
+ "min_branch": 0.7,
298
+ "min_mutation": 0.3,
299
+ "requires_contracts": false,
300
+ "max_files": 15,
301
+ "max_loc": 600,
302
+ "allowed_modes": ["feature", "refactor", "fix", "doc", "chore"]
303
+ }
304
+ }
305
+ ```
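
The tier policy above translates directly into a gate check. The following is a minimal sketch and not the contents of `gates.js`: the type and function names are hypothetical, only three policy fields are mirrored, and it assumes coverage and mutation results are reported as fractions between 0 and 1.

```typescript
// Thresholds copied from the tier policy above; everything else is a hypothetical illustration.

interface TierPolicy {
  min_branch: number;
  min_mutation: number;
  requires_contracts: boolean;
}

const TIER_POLICY: Record<number, TierPolicy> = {
  1: { min_branch: 0.9, min_mutation: 0.7, requires_contracts: true },
  2: { min_branch: 0.8, min_mutation: 0.5, requires_contracts: true },
  3: { min_branch: 0.7, min_mutation: 0.3, requires_contracts: false },
};

interface GateResults {
  coverage_branch: number; // assumed to be a fraction in [0, 1]
  mutation_score: number; // assumed to be a fraction in [0, 1]
  contracts_passed: boolean;
}

/** Return the names of failed gates for a tier; an empty array means all gates pass. */
function failedGates(tier: number, results: GateResults): string[] {
  const policy = TIER_POLICY[tier];
  if (!policy) throw new Error(`Unknown risk tier: ${tier}`);
  const failures: string[] = [];
  if (results.coverage_branch < policy.min_branch) failures.push("coverage");
  if (results.mutation_score < policy.min_mutation) failures.push("mutation");
  if (policy.requires_contracts && !results.contracts_passed) failures.push("contracts");
  return failures;
}
```

A Tier 1 change with 92% branch coverage but a 65% mutation score fails only the mutation gate, while the same numbers pass every Tier 2 gate.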
306
+
307
+ ### B) CI/CD Quality Gates (Automated)
308
+
309
+ #### Complete GitHub Actions Pipeline
310
+
311
+ ```yaml
312
+ name: CAWS Quality Gates
313
+ on:
314
+ pull_request:
315
+ types: [opened, synchronize, reopened, ready_for_review]
316
+
317
+ jobs:
318
+ naming_guard:
319
+ runs-on: ubuntu-latest
320
+ steps:
321
+ - { uses: actions/checkout@v4, with: { fetch-depth: 0 } } # full history so the diff against origin/<base> resolves
322
+ - name: Block shadow file patterns
323
+ run: |
324
+ BAD=$(git diff --name-only origin/${{ github.base_ref }}... | \
325
+ grep -E '/(copy|final|enhanced|v2)[.-]|/(new-)|(^|/)_.+\.| - copy\.' || true)
326
+ if [ -n "$BAD" ]; then
327
+ echo "❌ Shadow/duplicate filename patterns detected:"
328
+ echo "$BAD"
329
+ exit 1
330
+ fi
331
+
332
+ scope_guard:
333
+ runs-on: ubuntu-latest
334
+ steps:
335
+ - { uses: actions/checkout@v4, with: { fetch-depth: 0 } } # full history so the diff against origin/<base> resolves
336
+ - name: Ensure changes are within scope.in
337
+ run: |
338
+ yq -o=json '.' .caws/working-spec.yaml > .caws/ws.json
339
+ jq -r '.scope.in[]' .caws/ws.json | sed 's|^|^|; s|$|/|' > .caws/paths.txt
340
+ CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...)
341
+ OUT=""
342
+ for f in $CHANGED; do
343
+ if ! grep -q -E -f .caws/paths.txt <<< "$f"; then OUT="$OUT\n$f"; fi
344
+ done
345
+ if [ -n "$OUT" ]; then
346
+ echo -e "❌ Files outside scope.in:\n$OUT"
347
+ echo "If intentional, add a Spec Delta to .caws/working-spec.yaml and include affected paths."
348
+ exit 1
349
+ fi
350
+
351
+ budget_guard:
352
+ runs-on: ubuntu-latest
353
+ steps:
354
+ - { uses: actions/checkout@v4, with: { fetch-depth: 0 } } # full history so the diff against origin/<base> resolves
355
+ - name: Enforce max files/LOC from change_budget
356
+ run: |
357
+ yq -o=json '.' .caws/working-spec.yaml > .caws/ws.json
358
+ MAXF=$(jq -r '.change_budget.max_files' .caws/ws.json)
359
+ MAXL=$(jq -r '.change_budget.max_loc' .caws/ws.json)
360
+ FILES=$(git diff --name-only origin/${{ github.base_ref }}... | wc -l)
361
+ LOC=$(git diff --numstat origin/${{ github.base_ref }}... | awk '{ n += $1 + $2 } END { print n + 0 }')
362
+ echo "Files:$FILES LOC:$LOC (budget Files:$MAXF LOC:$MAXL)"
363
+ [ "$FILES" -le "$MAXF" ] && [ "$LOC" -le "$MAXL" ] || (echo "❌ Budget exceeded"; exit 1)
364
+
365
+ setup:
366
+ runs-on: ubuntu-latest
367
+ outputs:
368
+ risk: ${{ steps.risk.outputs.tier }}
369
+ steps:
370
+ - uses: actions/checkout@v4
371
+ - uses: actions/setup-node@v4
372
+ with: { node-version: '20' }
373
+ - run: npm ci
374
+ - name: Parse Working Spec
375
+ id: risk
376
+ run: |
377
+ yq --version # expects mikefarah/yq (Go), which accepts -o=json; the Python yq from pipx does not
378
+ yq -o=json '.' .caws/working-spec.yaml > .caws/working-spec.json
379
+ echo "tier=$(jq -r .risk_tier .caws/working-spec.json)" >> $GITHUB_OUTPUT
380
+ - name: Validate Spec
381
+ run: node apps/tools/caws/validate.js .caws/working-spec.json
382
+
383
+ static:
384
+ needs: setup
385
+ runs-on: ubuntu-latest
386
+ steps:
387
+ - uses: actions/checkout@v4
388
+ - uses: actions/setup-node@v4
389
+ with: { node-version: '20' }
390
+ - run: npm ci
391
+ - run: npm run typecheck && npm run lint && npm run dep:policy && npm run sast && npm run secret:scan
392
+
393
+ unit:
394
+ needs: setup
395
+ runs-on: ubuntu-latest
396
+ steps:
397
+ - uses: actions/checkout@v4
398
+ - uses: actions/setup-node@v4
399
+ with: { node-version: '20' }
400
+ - run: npm ci
401
+ - run: npm run test:unit -- --coverage
402
+ - name: Enforce Branch Coverage
403
+ run: node apps/tools/caws/gates.js coverage --tier ${{ needs.setup.outputs.risk }}
404
+
405
+ mutation:
406
+ needs: [setup, unit]
407
+ runs-on: ubuntu-latest
408
+ steps:
409
+ - uses: actions/checkout@v4
410
+ - uses: actions/setup-node@v4
411
+ with: { node-version: '20' }
412
+ - run: npm ci
413
+ - run: npm run test:mutation
414
+ - run: node apps/tools/caws/gates.js mutation --tier ${{ needs.setup.outputs.risk }}
415
+
416
+ contracts:
417
+ needs: setup
418
+ runs-on: ubuntu-latest
419
+ steps:
420
+ - uses: actions/checkout@v4
421
+ - uses: actions/setup-node@v4
422
+ with: { node-version: '20' }
423
+ - run: npm ci
424
+ - run: npm run test:contract
425
+ - run: node apps/tools/caws/gates.js contracts --tier ${{ needs.setup.outputs.risk }}
426
+
427
+ integration:
428
+ needs: [setup]
429
+ runs-on: ubuntu-latest
430
+ services:
431
+ postgres: { image: postgres:16, env: { POSTGRES_PASSWORD: pass }, ports: ["5432:5432"], options:
432
+ '--health-cmd="pg_isready -U postgres" --health-interval=10s --health-timeout=5s --health-retries=5' }
433
+ steps:
434
+ - uses: actions/checkout@v4
435
+ - uses: actions/setup-node@v4
436
+ with: { node-version: '20' }
437
+ - run: npm ci
438
+ - run: npm run test:integration
439
+
440
+ e2e_a11y:
441
+ needs: [integration]
442
+ runs-on: ubuntu-latest
443
+ steps:
444
+ - uses: actions/checkout@v4
445
+ - uses: actions/setup-node@v4
446
+ with: { node-version: '20' }
447
+ - run: npm ci
448
+ - run: npm run test:e2e:smoke
449
+ - run: npm run test:axe
450
+
451
+ perf:
452
+ if: needs.setup.outputs.risk != '3'
453
+ needs: [setup, integration]
454
+ runs-on: ubuntu-latest
455
+ steps:
456
+ - uses: actions/checkout@v4
457
+ - uses: actions/setup-node@v4
458
+ with: { node-version: '20' }
459
+ - run: npm ci
460
+ - run: npm run perf:budgets
461
+
462
+ provenance_trust:
463
+ needs: [setup, naming_guard, scope_guard, budget_guard, static, unit, mutation, contracts, integration, e2e_a11y, perf]
464
+ runs-on: ubuntu-latest
465
+ steps:
466
+ - uses: actions/checkout@v4
467
+ - uses: actions/setup-node@v4
468
+ with: { node-version: '20' }
469
+ - run: npm ci
470
+ - name: Generate SBOM
471
+ run: npx @cyclonedx/cyclonedx-npm --output-file .agent/sbom.json
472
+ - name: Create Attestation
473
+ run: node apps/tools/caws/attest.js > .agent/attestation.json
474
+ - name: Prompt/Tool lint
475
+ run: node apps/tools/caws/prompt-lint.js .agent/prompts/*.md --allowlist .agent/tools-allow.json
476
+ - name: Generate Provenance
477
+ run: node apps/tools/caws/provenance.js > .agent/provenance.json
478
+ - name: Validate Provenance
479
+ run: node apps/tools/caws/validate-prov.js .agent/provenance.json
480
+ - name: Compute Trust Score
481
+ run: node apps/tools/caws/gates.js trust --tier ${{ needs.setup.outputs.risk }}
482
+ ```
483
+
484
+ ### C) Repository Scaffold
485
+
486
+ ```
487
+ .caws/
488
+   policy/tier-policy.json
489
+   schemas/{working-spec.schema.json, provenance.schema.json}
490
+   templates/{pr.md, feature.plan.md, test-plan.md}
491
+ apps/contracts/ # OpenAPI/GraphQL/Pact
492
+ docs/ # human docs; ADRs
493
+ src/
494
+ tests/
495
+   unit/
496
+   contract/
497
+   integration/
498
+   e2e/
499
+   axe/
500
+   mutation/
501
+ apps/tools/caws/
502
+   validate.ts
503
+   gates.ts # thresholds, trust score
504
+   provenance.ts
505
+   prompt-lint.js # prompt hygiene & tool allowlist
506
+   attest.js # SBOM + SLSA attestation generator
507
+   tools-allow.json # allowed tools for agents
508
+ codemod/ # AST transformation scripts for refactor mode
509
+   rename.ts # example codemod for renaming modules
510
+ .agent/ # provenance artifacts (generated)
511
+   sbom.json
512
+   attestation.json
513
+   provenance.json
514
+   tools-allow.json
515
+ .github/
516
+   workflows/caws.yml
517
+ CODEOWNERS
518
+ ```
519
+
520
+ ## 3) Templates & Examples
521
+
522
+ ### Working Spec YAML Template
523
+
524
+ ```yaml
525
+ id: {{PROJECT_ID}}
526
+ title: '{{PROJECT_TITLE}}'
527
+ risk_tier: {{PROJECT_TIER}}
528
+ mode: {{PROJECT_MODE}}
529
+ change_budget:
530
+   max_files: {{MAX_FILES}}
531
+   max_loc: {{MAX_LOC}}
532
+ blast_radius:
533
+   modules: [{{BLAST_MODULES}}]
534
+   data_migration: {{DATA_MIGRATION}}
535
+ operational_rollback_slo: '{{ROLLBACK_SLO}}'
536
+ threats: {{PROJECT_THREATS}}
537
+ scope:
538
+   in: [{{SCOPE_IN}}]
539
+   out: [{{SCOPE_OUT}}]
540
+ invariants: {{PROJECT_INVARIANTS}}
541
+ acceptance: {{ACCEPTANCE_CRITERIA}}
542
+ non_functional:
543
+   a11y: [{{A11Y_REQUIREMENTS}}]
544
+   perf: { api_p95_ms: {{PERF_BUDGET}} }
545
+   security: [{{SECURITY_REQUIREMENTS}}]
546
+ contracts:
547
+   - type: {{CONTRACT_TYPE}}
548
+     path: '{{CONTRACT_PATH}}'
549
+ observability:
550
+   logs: [{{OBSERVABILITY_LOGS}}]
551
+   metrics: [{{OBSERVABILITY_METRICS}}]
552
+   traces: [{{OBSERVABILITY_TRACES}}]
553
+ migrations: {{MIGRATION_PLAN}}
554
+ rollback: [{{ROLLBACK_PLAN}}]
555
+ ```
556
+
557
+ ### PR Description Template
558
+
559
+ ```markdown
560
+ ## Summary
561
+
562
+ {{PR_SUMMARY}}
563
+
564
+ ## Working Spec
565
+
566
+ - Risk Tier: {{RISK_TIER}}
567
+ - Mode: {{PR_MODE}}
568
+ - Invariants: {{INVARIANTS}}
569
+
570
+ ## Tests
571
+
572
+ - Unit: {{UNIT_COVERAGE}}% (target {{TARGET_COVERAGE}}%)
573
+ - Mutation: {{MUTATION_SCORE}}% (target {{TARGET_MUTATION}}%)
574
+ - Integration: {{INTEGRATION_TESTS}} flows
575
+ - E2E smoke: {{E2E_TESTS}} ({{E2E_STATUS}})
576
+ - A11y: {{A11Y_SCORE}} ({{A11Y_STATUS}})
577
+
578
+ ## Non-functional
579
+
580
+ - API p95: {{API_PERF}}ms (budget {{API_BUDGET}}ms)
581
+ - Security: {{SAST_STATUS}}
582
+
583
+ ## Migration & Rollback
584
+
585
+ {{MIGRATION_NOTES}}
586
+
587
+ ## Known Limits
588
+
589
+ {{KNOWN_LIMITS}}
590
+ ```
591
+
592
+ ## 4) Agent Conduct Rules (Hard Constraints)
593
+
594
+ 1. **Spec adherence**: Do not implement beyond scope.in; if a discovered dependency changes the spec, open a "Spec delta" in the PR and update tests first.
595
+ 2. **No hidden state/time/net**: All non-determinism injected and controlled in tests.
596
+ 3. **Explainable mocks**: Only mock boundaries; never mock the function under test; document any mock behavior in comments.
597
+ 4. **Idempotency & error paths**: Provide tests for retries/timeouts/cancel; assert invariants on error.
598
+ 5. **Observability parity**: Every key acceptance path emits logs/metrics/traces; tests assert on them when feasible (e.g., fake exporter assertions).
599
+ 6. **Data safety**: No real PII in fixtures; factories generate realistic but synthetic data.
600
+ 7. **Accessibility required**: For UI changes: keyboard path test + axe scan; for API: error messages human-readable and localizable.
601
+ 8. **Performance ownership**: Include micro-bench (where hot path) or budget check; document algorithmic complexity if changed.
602
+ 9. **Docs as code**: Update README/usage snippets; add example code; regenerate typed clients from contracts.
603
+ 10. **Rollback ready**: Feature-flag new behavior; write a reversible migration or provide kill-switch.
604
+
605
+ ## 5) Trust & Telemetry
606
+
607
+ • **Provenance manifest** (.agent/provenance.json): agent name/version, prompts, model, commit SHAs, test results hashes, generated files list, and human approvals. Stored with the PR for auditability.
608
+ • **Trust score per PR**: composite of rubric + gates + historical flake rate; expose in a PR check and weekly dashboard.
609
+ • **Drift watch**: monitor contract usage in prod; alert if undocumented fields appear.
610
+
611
+ ## 6) Operational Excellence
612
+
613
+ ### Flake Management
614
+
615
+ • **Detector**: compute week-over-week pass variance per spec ID.
616
+ • **Policy**: >0.5% variance → auto-label flake:quarantine, open ticket with owner + expiry (7 days).
617
+ • **Implementation**: Store test run hashes in .agent/provenance.json; nightly job aggregates and posts a table to dashboard.
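
The detector and policy above can be sketched as follows. The text does not define "pass variance" precisely, so this sketch assumes it means the absolute week-over-week change in pass rate; the type and function names are hypothetical.

```typescript
// Assumption: "variance" is read as |passRate(thisWeek) - passRate(lastWeek)|.

interface WeeklyRuns {
  passed: number;
  total: number;
}

interface SpecHistory {
  specId: string;
  lastWeek: WeeklyRuns;
  thisWeek: WeeklyRuns;
}

/** Fraction of passing runs; a week with no runs counts as fully passing. */
function passRate(week: WeeklyRuns): number {
  return week.total === 0 ? 1 : week.passed / week.total;
}

/** True when the pass rate moved by more than the threshold (0.5% by default). */
function shouldQuarantine(history: SpecHistory, threshold: number = 0.005): boolean {
  return Math.abs(passRate(history.thisWeek) - passRate(history.lastWeek)) > threshold;
}

/** Spec IDs that would receive the flake:quarantine label. */
function flakySpecIds(histories: SpecHistory[], threshold: number = 0.005): string[] {
  return histories.filter((h) => shouldQuarantine(h, threshold)).map((h) => h.specId);
}
```

Under this reading, a spec that drops from 200/200 to 198/200 passing runs (a 1% change) is quarantined, while one that drops from 1000/1000 to 999/1000 (0.1%) is not.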
618
+
619
+ ### Waivers & Escalation
620
+
621
+ • **Temporary waiver requires**:
622
+ • waivers.yml with: gate, reason, owner, expiry ISO date (≤ 14 days), compensating control.
623
+ • PR must link to a ticket; the trust score is capped at 79 while any waiver is active.
624
+ • **Escalation**: unresolved flake/waiver past expiry auto-blocks merges across the repo until cleared.
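
The waiver rules above (expiry no more than 14 days out, trust score capped at 79 while a waiver is active) could be enforced along these lines. This is a sketch under assumptions: the field names are guesses at what `waivers.yml` contains, and treating a waiver that expires more than 14 days out as invalid is one possible interpretation.

```typescript
// Hypothetical shapes and names; the limits (14 days, cap of 79) come from the text above.

interface Waiver {
  gate: string;
  reason: string;
  owner: string;
  expiry: string; // ISO date, e.g. "2024-01-10"
  compensating_control: string;
}

const MAX_WAIVER_DAYS = 14;
const WAIVER_SCORE_CAP = 79;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** A waiver counts only if it has not expired and expires within the allowed window. */
function isWaiverActive(waiver: Waiver, now: Date): boolean {
  const expiry = Date.parse(waiver.expiry);
  if (Number.isNaN(expiry)) return false;
  const msLeft = expiry - now.getTime();
  return msLeft > 0 && msLeft <= MAX_WAIVER_DAYS * MS_PER_DAY;
}

/** Cap the trust score while any waiver is active. */
function cappedTrustScore(rawScore: number, waivers: Waiver[], now: Date): number {
  const anyActive = waivers.some((w) => isWaiverActive(w, now));
  return anyActive ? Math.min(rawScore, WAIVER_SCORE_CAP) : rawScore;
}
```

Passing `now` as an argument rather than reading the system clock keeps the check deterministic in tests, in line with the injected-clock rule.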
625
+
626
+ ### Security & Performance Checks
627
+
628
+ • **Secrets**: run gitleaks/trufflehog on changed files; CAWS gate blocks any hit above low severity.
629
+ • **SAST**: language-appropriate tools; gate requires zero criticals.
630
+ • **Performance**: k6 scripts for API budgets; LHCI for web budgets; regressions fail gate.
631
+ • **Migrations**: lint for reversibility; dry-run in CI; forward-compat contract tests.
632
+
633
+ ## 7) Language & Tooling Ecosystem
634
+
635
+ ### TypeScript Stack (Recommended)
636
+
637
+ • **Testing**: Jest/Vitest, fast-check, Playwright, Testcontainers, Stryker, MSW or Pact
638
+ • **Quality**: ESLint + types, LHCI, axe-core
639
+ • **CI**: GitHub Actions with Node 20
640
+
641
+ ### Python Stack
642
+
643
+ • **Testing**: pytest, hypothesis, Playwright (Python), Testcontainers-py, mutmut, Schemathesis
644
+ • **Quality**: bandit/semgrep, Lighthouse CI, axe-core
645
+
646
+ ### JVM Stack
647
+
648
+ • **Testing**: JUnit5, jqwik, Testcontainers, PIT (mutation), Pact-JVM
649
+ • **Quality**: OWASP dependency check, SonarQube, axe-core
650
+
651
+ **Note**: Mutation testing is non-negotiable for tiers ≥2; it's the only reliable guard against assertion theater.
652
+
653
+ ## 8) Review Rubric (Scriptable Scoring)
654
+
655
+ | Category | Weight | Criteria | 0 | 1 | 2 |
656
+ | --------------------------------- | ------ | ----------------------------------- | ----------------- | ------------------ | --------------------------- |
657
+ | Spec clarity & invariants | ×5 | Clear, testable invariants | Missing/unclear | Basic coverage | Comprehensive + edge cases |
658
+ | Contract correctness & versioning | ×5 | Schema accuracy + versioning | Errors present | Minor issues | Perfect + versioned |
659
+ | Unit thoroughness & edge coverage | ×5 | Branch coverage + property tests | <70% coverage | Meets tier minimum | >90% + properties |
660
+ | Integration realism | ×4 | Real containers + seeds | Mocked heavily | Basic containers | Full stack + realistic data |
661
+ | E2E relevance & stability | ×3 | Critical paths + semantic selectors | Brittle selectors | Basic coverage | Semantic + stable |
662
+ | Mutation adequacy | ×4 | Score vs tier threshold | <50% | Meets minimum | >80% |
663
+ | A11y pathways & results | ×3 | Keyboard + axe clean | Major issues | Basic compliance | Full WCAG + keyboard |
664
+ | Perf/Resilience | ×3 | Budgets + timeouts/retries | No checks | Basic budgets | Full resilience |
665
+ | Observability | ×3 | Logs/metrics/traces asserted | Missing | Basic emission | Asserted in tests |
666
+ | Migration safety & rollback | ×3 | Reversible + kill-switch | No rollback | Basic revert | Full rollback + testing |
667
+ | Docs & PR explainability | ×3 | Clear rationale + limits | Minimal | Basic docs | Comprehensive + ADR |
668
+ | **Mode compliance** | ×3 | Changes match declared `mode` | Violations | Minor drift | Full compliance |
669
+ | **Scope & budget discipline** | ×3 | Diff within `scope.in` & budget | Exceeded | Near limit | Within limits |
670
+ | **Supply-chain attestations** | ×2 | SBOM + SLSA attestation | Missing | Partial | Complete & valid |
671
+
672
+ **Target**: ≥ 82/100 (weighted sum). Calculator in `apps/tools/caws/rubric.ts`.
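
The weighted sum can be computed as in the sketch below. The weights are copied from the table; the category keys and function name are hypothetical and not taken from `rubric.ts`. Note that the weights total 49, so the highest raw score is 98; reporting a value normalized to 100 is an assumption about how the "≥ 82/100" target is meant to be read.

```typescript
// Weights copied from the rubric table; keys and normalization are assumptions.

type Rating = 0 | 1 | 2;

const RUBRIC: ReadonlyArray<[string, number]> = [
  ["spec_clarity", 5],
  ["contract_correctness", 5],
  ["unit_thoroughness", 5],
  ["integration_realism", 4],
  ["e2e_relevance", 3],
  ["mutation_adequacy", 4],
  ["a11y", 3],
  ["perf_resilience", 3],
  ["observability", 3],
  ["migration_safety", 3],
  ["docs_explainability", 3],
  ["mode_compliance", 3],
  ["scope_budget", 3],
  ["supply_chain", 2],
];

interface RubricScore {
  raw: number; // sum of weight × rating
  max: number; // sum of weight × 2
  normalized: number; // raw scaled to 0..100 and rounded
}

/** Compute the weighted rubric score; categories without a rating count as 0. */
function rubricScore(ratings: Partial<Record<string, Rating>>): RubricScore {
  let raw = 0;
  let max = 0;
  RUBRIC.forEach(([category, weight]) => {
    raw += weight * (ratings[category] ?? 0);
    max += weight * 2;
  });
  return { raw, max, normalized: Math.round((raw / max) * 100) };
}
```

Keeping both `raw` and `normalized` in the result lets a gate compare against whichever scale the team settles on.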
673
+
674
+ ## 9) Anti-patterns (Explicitly Rejected)
675
+
676
+ • **Over-mocked integration tests**: mocking ORM or HTTP client where containerized integration is feasible.
677
+ • **UI tests keyed on CSS classes**: brittle selectors instead of semantic roles/labels.
678
+ • **Coupling tests to implementation details**: private method calls, internal sequence assertions.
679
+ • **"Retry until green" CI culture**: quarantines without expiry or owner.
680
+ • **100% coverage mandates**: without mutation testing or risk awareness.
681
+
682
+ ## 13) Failure-Mode Cards (Common Traps & Recovery)
683
+
684
+ Quick-reference cards in the form "If you see X, do Y":
685
+
686
+ 1. **Symptom:** Large rename + re-exports create `*-copy.ts` or `enhanced-*.ts`.
687
+ **Action:** Switch to **refactor mode**. Generate `codemod/rename.ts` that updates imports/exports in place. Validate with `tsc --noEmit` and run mutation tests to ensure unchanged behavior.
688
+
689
+ 2. **Symptom:** Contract change proliferates across services.
690
+ **Action:** Declare **blast_radius.modules**; create consumer **Pact** tests first. Stage changes behind a feature flag; ship provider compatibility for both old/new fields.
691
+
692
+ 3. **Symptom:** Flaky time-based tests.
693
+ **Action:** Inject `Clock` and use fixed timestamps; assert **idempotency** with property tests.
694
+
695
+ 4. **Symptom:** Agent proposes new external tool/library.
696
+ **Action:** Fail unless added to `tool_allowlist`. Require SBOM delta review and perf/a11y/security notes in the PR.
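Card 3 above (flaky time-based tests) can be sketched as follows. The `Clock` interface, `fixedClock`, and `isExpired` are illustrative names chosen for this example, not part of CAWS; the point is only that the code under test asks an injected clock for the time instead of calling `new Date()` directly.

```typescript
// Hypothetical Clock abstraction; names are illustrative, not part of CAWS.
interface Clock {
  now(): Date;
}

// Production clock: reads the real system time.
const systemClock: Clock = { now: () => new Date() };

// Test clock: always returns the same instant, so results never depend on wall time.
function fixedClock(iso: string): Clock {
  const instant = new Date(iso);
  return { now: () => new Date(instant.getTime()) };
}

// Example unit under test: something is expired once `now` reaches `expiresAt`.
function isExpired(expiresAt: Date, clock: Clock = systemClock): boolean {
  return clock.now().getTime() >= expiresAt.getTime();
}

const clock = fixedClock('2024-01-01T00:00:00.000Z');
const beforeExpiry = isExpired(new Date('2024-01-01T00:00:01.000Z'), clock); // false
const atExpiry = isExpired(new Date('2024-01-01T00:00:00.000Z'), clock); // true
```

Because the fixed clock returns the same instant on every call, repeating either check gives the same answer, which is the idempotency a property test would assert.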
697
+
698
+ ## 10) Cursor/Codex Agent Integration
699
+
700
+ ### Agent Commands
701
+
702
+ • `agent plan` → emits plan + test matrix
703
+ • `agent verify` → runs local gates; generates provenance
704
+ • `agent prove` → creates provenance manifest
705
+ • `agent doc` → updates README/changelog from spec
706
+
707
+ ### Guardrails
708
+
709
+ • **Templates**: Inject Working Spec YAML + PR template on "New Feature" command
710
+ • **Scaffold**: Pre-wire tests/\* skeletons with containers and contracts
711
+ • **Context discipline**: Restrict writes to spec-touched modules; deny outside scope unless spec updated
712
+ • **Feedback loop**: PR comments show coverage, mutation diff, contract verification summary
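The context-discipline guardrail above could be enforced with a check along these lines. This is a sketch under assumptions: the document names `scope.in`, while the `out` list, the prefix-based matching, and the function names here are hypothetical choices made for the example.

```typescript
// Hypothetical shape: `scope.in` comes from the document; `out` and prefix matching are assumptions.
interface Scope {
  in: string[];
  out: string[];
}

// Normalizes separators and strips a leading "./" and trailing slashes.
function normalizePath(p: string): string {
  return p.replace(/\\/g, '/').replace(/^\.\//, '').replace(/\/+$/, '');
}

// True when `file` is `dir` itself or lives somewhere beneath it.
function isUnder(file: string, dir: string): boolean {
  const f = normalizePath(file);
  const d = normalizePath(dir);
  return f === d || f.startsWith(d + '/');
}

// Returns the changed files that the spec does not allow the agent to write.
function outOfScope(changedFiles: string[], scope: Scope): string[] {
  return changedFiles.filter(
    (file) =>
      scope.out.some((dir) => isUnder(file, dir)) ||
      !scope.in.some((dir) => isUnder(file, dir)),
  );
}

const violations = outOfScope(
  ['src/billing/invoice.ts', 'src/auth/login.ts', 'src/billing/legacy/old.ts'],
  { in: ['src/billing'], out: ['src/billing/legacy'] },
);
// violations: ['src/auth/login.ts', 'src/billing/legacy/old.ts']
```

Matching on a directory prefix plus `/` avoids treating `src/billing-v2` as part of `src/billing`; a real implementation might prefer glob patterns instead.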
713
+
714
+ ## 11) Adoption Roadmap
715
+
716
+ ### Foundation Setup
717
+
718
+ - [ ] Add .caws/ directory with schemas and templates
719
+ - [ ] Create apps/tools/caws/ validation scripts
720
+ - [ ] Wire basic GitHub Actions workflow
721
+ - [ ] Add CODEOWNERS for Tier-1 paths
722
+
723
+ ### Quality Gates Implementation
724
+
725
+ - [ ] Enable Testcontainers for integration tests
726
+ - [ ] Add mutation testing with tier thresholds
727
+ - [ ] Implement trust score calculation
728
+ - [ ] Add axe + Playwright smoke for UI changes
729
+
730
+ ### Operational Excellence
731
+
732
+ - [ ] Publish provenance manifest with PRs
733
+ - [ ] Implement flake detector and quarantine process
734
+ - [ ] Add waiver system with trust score caps
735
+ - [ ] Socialize review rubric and block merges <80
736
+
737
+ ### Continuous Improvement
738
+
739
+ - [ ] Monitor drift in contract usage
740
+ - [ ] Refine tooling based on feedback
741
+ - [ ] Expand language support as needed
742
+ - [ ] Track trust score trends and flake rates
743
+
744
+ ## 12) Trust Score Formula
745
+
746
+ ```typescript
747
+ const weights = {
748
+ coverage: 0.2,
749
+ mutation: 0.2,
750
+ contracts: 0.16,
751
+ a11y: 0.08,
752
+ perf: 0.08,
753
+ flake: 0.08,
754
+ mode: 0.06,
755
+ scope: 0.06,
756
+ supplychain: 0.04,
757
+ };
758
+
759
+ function trustScore(tier: string, prov: Provenance) {
760
+ const wsum = Object.values(weights).reduce((a, b) => a + b, 0);
761
+ const score =
762
+ weights.coverage * normalize(prov.results.coverage_branch, tiers[tier].min_branch, 0.95) +
763
+ weights.mutation * normalize(prov.results.mutation_score, tiers[tier].min_mutation, 0.9) +
764
+ weights.contracts *
765
+ (tiers[tier].requires_contracts
766
+ ? prov.results.contracts.consumer && prov.results.contracts.provider
767
+ ? 1
768
+ : 0
769
+ : 1) +
770
+ weights.a11y * (prov.results.a11y === 'pass' ? 1 : 0) +
771
+ weights.perf * budgetOk(prov.results.perf) +
772
+ weights.flake * (prov.results.flake_rate <= 0.005 ? 1 : 0.5) +
773
+ weights.mode * (prov.results.mode_compliance === 'full' ? 1 : 0.5) +
774
+ weights.scope * (prov.results.scope_within_budget ? 1 : 0) +
775
+ weights.supplychain * (prov.results.sbom_valid && prov.results.attestation_valid ? 1 : 0);
776
+ return Math.round((score / wsum) * 100);
777
+ }
778
+ ```
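The formula calls `normalize` and `budgetOk` without defining them. The sketch below shows one plausible reading, not the framework's actual helpers: `normalize` is assumed to ramp linearly from 0 at the tier minimum to 1 at the target and clamp outside that range, and `budgetOk` is assumed to return 1 only when every measured metric is within its budget. The `PerfResult` shape is hypothetical.

```typescript
// Assumed semantics: linear ramp from 0 at `min` to 1 at `target`, clamped to [0, 1].
function normalize(value: number, min: number, target: number): number {
  if (target <= min) return value >= target ? 1 : 0; // degenerate range: pass/fail only
  return Math.min(1, Math.max(0, (value - min) / (target - min)));
}

// Hypothetical perf result shape: measured value and budget per metric name.
interface PerfResult {
  [metric: string]: { actual: number; budget: number };
}

// Returns 1 when every metric is within budget, otherwise 0.
function budgetOk(perf: PerfResult): number {
  return Object.values(perf).every((m) => m.actual <= m.budget) ? 1 : 0;
}

const belowMinimum = normalize(0.7, 0.8, 0.9); // 0
const aboveTarget = normalize(0.99, 0.8, 0.9); // 1
const withinBudget = budgetOk({ api_p95_ms: { actual: 180, budget: 250 } }); // 1
```

Under this reading, a change that only just meets the tier minimum earns no credit for that term, so a different ramp or a floor would be needed if meeting the minimum is meant to score. As a check on the formula itself: with every term at 1 except a flake rate above 0.005, the weighted sum is 0.92 out of a total weight of 0.96, which rounds to a trust score of 96.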
779
+
780
+ This v1.0 pairs the framework's quality principles (the "why") with automated enforcement (the "how"), giving teams the executable detail they need to adopt it for engineering-grade AI coding agents.
781
+
782
+ ---
783
+
784
+ ## 🚀 Quick Start Guide
785
+
786
+ ### For New Projects
787
+
788
+ 1. Copy this template to your project root
789
+ 2. Run `caws init` to scaffold the project structure
790
+ 3. Customize the Working Spec YAML for your project
791
+ 4. Set up your CI/CD pipeline with the provided GitHub Actions
792
+
793
+ ### For Existing Projects
794
+
795
+ 1. Copy the relevant sections to your existing project
796
+ 2. Run `caws scaffold` to add missing components
797
+ 3. Update your existing workflows to include the CAWS gates
798
+
799
+ ### Customization
800
+
801
+ - **Project ID**: Update `{{PROJECT_ID}}` with your ticket system prefix
802
+ - **Title**: Describe your project in `{{PROJECT_TITLE}}`
803
+ - **Tier**: Set appropriate risk tier (1-3) in `{{PROJECT_TIER}}`
804
+ - **Mode**: Choose from `refactor`, `feature`, `fix`, `doc`, `chore`
805
+ - **Budget**: Set reasonable file/LOC limits in `change_budget`
806
+ - **Scope**: Define what files/features are in/out of scope
807
+ - **Contracts**: Specify API contracts (OpenAPI, GraphQL, etc.)
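The customization checklist above lends itself to a small sanity check before the spec is committed. The sketch below is illustrative only: it models a subset of a Working Spec in TypeScript, and the field names `id`, `title`, `tier`, `max_files`, and `max_loc` are assumptions, as is the `validateSpec` helper. Only `mode`, `change_budget`, `scope`, the tier range, and the mode list come from the text.

```typescript
// Modes listed in the customization checklist.
const MODES: readonly string[] = ['refactor', 'feature', 'fix', 'doc', 'chore'];

// Hypothetical subset of a Working Spec; several field names are assumptions.
interface WorkingSpec {
  id: string;
  title: string;
  tier: number;
  mode: string;
  change_budget: { max_files: number; max_loc: number };
  scope: { in: string[]; out: string[] };
}

// Returns a list of human-readable problems; an empty list means the spec looks sane.
function validateSpec(spec: WorkingSpec): string[] {
  const errors: string[] = [];
  if (spec.id.includes('{{') || spec.title.includes('{{')) {
    errors.push('unreplaced template placeholder');
  }
  if (![1, 2, 3].includes(spec.tier)) errors.push('tier must be 1, 2, or 3');
  if (!MODES.includes(spec.mode)) errors.push(`unknown mode: ${spec.mode}`);
  if (spec.change_budget.max_files <= 0 || spec.change_budget.max_loc <= 0) {
    errors.push('change_budget limits must be positive');
  }
  if (spec.scope.in.length === 0) errors.push('scope.in must list at least one path');
  return errors;
}

const errors = validateSpec({
  id: '{{PROJECT_ID}}',
  title: 'Invoice export',
  tier: 4,
  mode: 'feature',
  change_budget: { max_files: 25, max_loc: 1000 },
  scope: { in: ['src/billing'], out: [] },
});
// errors: ['unreplaced template placeholder', 'tier must be 1, 2, or 3']
```

Catching a leftover `{{PROJECT_ID}}` placeholder at this stage is cheaper than discovering it in a failed CI gate.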
808
+
809
+ ### Support
810
+
811
+ - 📖 Full documentation: See sections above
812
+ - 🛠️ Tools: `apps/tools/caws/` contains all utilities
813
+ - 🎯 Examples: Check `docs/` for implementation examples
814
+ - 🤝 Community: Follow the agent conduct rules for collaboration
815
+
816
+ ---
817
+
818
+ **Author**: @darianrosebrook
819
+ **Version**: 1.0.0
820
+ **License**: MIT