@paths.design/caws-cli 2.0.1 → 3.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +1463 -121
- package/package.json +3 -2
- package/templates/agents.md +820 -0
- package/templates/apps/tools/caws/COMPLETION_REPORT.md +331 -0
- package/templates/apps/tools/caws/MIGRATION_SUMMARY.md +360 -0
- package/templates/apps/tools/caws/README.md +463 -0
- package/templates/apps/tools/caws/TEST_STATUS.md +365 -0
- package/templates/apps/tools/caws/attest.js +357 -0
- package/templates/apps/tools/caws/ci-optimizer.js +642 -0
- package/templates/apps/tools/caws/config.ts +245 -0
- package/templates/apps/tools/caws/cross-functional.js +876 -0
- package/templates/apps/tools/caws/dashboard.js +1112 -0
- package/templates/apps/tools/caws/flake-detector.ts +362 -0
- package/templates/apps/tools/caws/gates.js +198 -0
- package/templates/apps/tools/caws/gates.ts +237 -0
- package/templates/apps/tools/caws/language-adapters.ts +381 -0
- package/templates/apps/tools/caws/language-support.d.ts +367 -0
- package/templates/apps/tools/caws/language-support.d.ts.map +1 -0
- package/templates/apps/tools/caws/language-support.js +585 -0
- package/templates/apps/tools/caws/legacy-assessment.ts +408 -0
- package/templates/apps/tools/caws/legacy-assessor.js +764 -0
- package/templates/apps/tools/caws/mutant-analyzer.js +734 -0
- package/templates/apps/tools/caws/perf-budgets.ts +349 -0
- package/templates/apps/tools/caws/prompt-lint.js.backup +274 -0
- package/templates/apps/tools/caws/property-testing.js +707 -0
- package/templates/apps/tools/caws/provenance.d.ts +14 -0
- package/templates/apps/tools/caws/provenance.d.ts.map +1 -0
- package/templates/apps/tools/caws/provenance.js +132 -0
- package/templates/apps/tools/caws/provenance.js.backup +73 -0
- package/templates/apps/tools/caws/provenance.ts +211 -0
- package/templates/apps/tools/caws/schemas/waivers.schema.json +30 -0
- package/templates/apps/tools/caws/schemas/working-spec.schema.json +115 -0
- package/templates/apps/tools/caws/scope-guard.js +208 -0
- package/templates/apps/tools/caws/security-provenance.ts +483 -0
- package/templates/apps/tools/caws/shared/base-tool.ts +281 -0
- package/templates/apps/tools/caws/shared/config-manager.ts +366 -0
- package/templates/apps/tools/caws/shared/gate-checker.ts +597 -0
- package/templates/apps/tools/caws/shared/types.ts +444 -0
- package/templates/apps/tools/caws/shared/validator.ts +305 -0
- package/templates/apps/tools/caws/shared/waivers-manager.ts +174 -0
- package/templates/apps/tools/caws/spec-test-mapper.ts +391 -0
- package/templates/apps/tools/caws/templates/working-spec.template.yml +60 -0
- package/templates/apps/tools/caws/test-quality.js +578 -0
- package/templates/apps/tools/caws/tools-allow.json +331 -0
- package/templates/apps/tools/caws/validate.js +76 -0
- package/templates/apps/tools/caws/validate.ts +228 -0
- package/templates/apps/tools/caws/waivers.js +344 -0
- package/templates/apps/tools/caws/waivers.yml +19 -0
- package/templates/codemod/README.md +1 -0
- package/templates/codemod/test.js +1 -0
- package/templates/docs/README.md +150 -0
@@ -0,0 +1,820 @@
# CAWS v1.0 — Engineering-Grade Operating System for Coding Agents

## Purpose

CAWS is our "engineering-grade" operating system for coding agents. It (1) forces planning before code, (2) bakes in tests as first-class artifacts, (3) creates explainable provenance, and (4) enforces quality via automated CI gates. It is expressed as a Working Spec + Ruleset the agent must follow, with schemas, templates, scripts, and verification hooks that enable better collaboration between the agent and our human in the loop.

## 1) Core Framework

### Risk Tiering → Drives Rigor

• **Tier 1** (Core/critical path, auth/billing, migrations): highest rigor; mutation score ≥ 70%, branch coverage ≥ 90%, contract tests mandatory, chaos tests optional, manual review required.
• **Tier 2** (Common features, data writes, cross-service APIs): mutation score ≥ 50%, branch coverage ≥ 80%, contracts mandatory if any external API, e2e smoke required.
• **Tier 3** (Low risk, read-only UI, internal tooling): mutation score ≥ 30%, branch coverage ≥ 70%, integration happy-path + unit thoroughness, e2e optional.

The agent must infer and declare the tier in the plan; a human reviewer may bump it up, never down.

### New Invariants (Repository-Level "Operating Envelope")

1. **Atomic Change Budget**
   - _Invariant:_ "A PR must fit into one of: `refactor`, `feature`, `fix`, `doc`, `chore`, and must touch only files that the Working Spec's `scope.in` names."
   - _Reason:_ Kills scope-creep; enables deterministic review.
   - _Gate:_ CI rejects PRs that modify files outside `scope.in` unless `spec_delta` is present.

2. **In-place Refactor (No Shadow Copies)**
   - _Invariant:_ Refactors perform **in-place** edits with AST codemods; **no parallel files** (e.g., `enhanced-*.ts`).
   - _Gate:_ a naming linter blocks new files that share a stem with an existing file and add a suffix/prefix (`enhanced|new|v2|copy|final`).

3. **Determinism & Idempotency**
   - _Invariant:_ All new code must be testable with injected clock/uuid/random; repeated requests must be safe (where applicable) and asserted in tests.
   - _Gate:_ mutation tests + property tests include at least one idempotency predicate for Tier ≥2.

4. **Prompt & Tool Security Envelope** (for agent workflows)
   - _Invariant:_ Agents operate with **tool allow-lists**, **redacted secrets**, and **context firebreaks** (no raw secrets in model context; never post `.env`, keys, or tokens back into diffs).
   - _Gate:_ prompt-lint and secret-scan on the agent prompt files + PR diffs.

5. **Supply-chain Provenance**
   - _Invariant:_ Every CI build produces an SBOM + SLSA-style attestation attached to the PR.
   - _Gate:_ trust score requires valid SBOM/attestation.
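
The naming gate of invariant 2 could be sketched as follows. This is only an illustration under stated assumptions: `stemOf` and `findShadowFiles` are hypothetical helpers, not part of the CAWS tooling, and the separator set (`-`, `_`, `.`) is an assumption about how shadow copies are usually named.

```typescript
// Hypothetical sketch of a shadow-file check (not the CAWS implementation).
// A new file is flagged when its stem is an existing stem plus one of the
// disallowed words as a prefix or suffix.
const SHADOW_WORDS = ["enhanced", "new", "v2", "copy", "final"];
const SEPARATORS = ["-", "_", "."];

// File name without directory and without its last extension, lowercased.
function stemOf(path: string): string {
  const base = path.split("/").pop() ?? path;
  const dot = base.lastIndexOf(".");
  return (dot > 0 ? base.slice(0, dot) : base).toLowerCase();
}

function findShadowFiles(existing: string[], added: string[]): string[] {
  const stems = new Set(existing.map(stemOf));
  return added.filter((path) => {
    const stem = stemOf(path);
    return SHADOW_WORDS.some((word) =>
      SEPARATORS.some((sep) => {
        const asPrefix =
          stem.startsWith(word + sep) && stems.has(stem.slice(word.length + 1));
        const asSuffix =
          stem.endsWith(sep + word) && stems.has(stem.slice(0, -(word.length + 1)));
        return asPrefix || asSuffix;
      })
    );
  });
}

const flagged = findShadowFiles(
  ["src/user-service.ts"],
  ["src/enhanced-user-service.ts", "src/user-service-v2.ts", "src/new-report.ts"]
);
```

Here `src/new-report.ts` is not flagged because no existing file has the stem `report`; only names that shadow an existing stem are rejected.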

### Required Inputs (No Code Until Present)

• **Working Spec YAML** (see schema below) with user story, scope, invariants, acceptance tests, non-functional budgets, risk tier.
• **Interface Contracts**: OpenAPI/GraphQL SDL/proto/Pact provider/consumer stubs.
• **Test Plan**: unit cases, properties, fixtures, integration flows, e2e smokes; data setup/teardown; flake controls.
• **Change Impact Map**: touched modules, migrations, roll-forward/rollback.
• **A11y/Perf/Sec budgets**: keyboard path(s), axe rules to enforce; perf budget (TTI/LCP/API latency); SAST/secret scanning & deps policy.

If any are missing, the agent must generate a draft and request confirmation inside the PR description before implementing.

### The Loop: Plan → Implement → Verify → Document

#### 2.1 Plan (agent output, committed as feature.plan.md)

• **Design sketch**: sequence diagram or pseudo-API table.
• **Test matrix**: aligned to user intent (unit/contract/integration/e2e) with edge cases and property predicates.
• **Data plan**: factories/fixtures, seed strategy, anonymized sample payloads.
• **Observability plan**: logs/metrics/traces; which spans and attributes will verify correctness in prod.

#### 2.2 Implement (rules)

• **Contract-first**: generate/validate types from OpenAPI/SDL; add contract tests (Pact/WireMock/MSW) before impl.
• **Unit focus**: pure logic isolated; mocks only at boundaries you own (clock, fs, network).
• **State seams**: inject time/uuid/random; ensure determinism; guard for idempotency where relevant.
• **Migration discipline**: forwards-compatible; provide up/down, dry-run, and backfill strategy.
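
The "state seams" rule can be illustrated with a small sketch. Everything named here (`Deps`, `Order`, `makeOrderService`, the request-key scheme) is hypothetical and chosen only for illustration; the point is that the clock and id generator are injected, so a test can pin them and assert idempotency.

```typescript
// Hypothetical sketch: time and id generation are injected, never read globally.
interface Deps {
  now: () => Date;
  newId: () => string;
}

interface Order {
  id: string;
  requestKey: string;
  createdAt: string;
}

function makeOrderService(deps: Deps) {
  const byRequestKey = new Map<string, Order>();
  return {
    // Repeating a request with the same key returns the stored order
    // instead of creating a second one (idempotency guard).
    createOrder(requestKey: string): Order {
      const existing = byRequestKey.get(requestKey);
      if (existing) return existing;
      const order: Order = {
        id: deps.newId(),
        requestKey,
        createdAt: deps.now().toISOString(),
      };
      byRequestKey.set(requestKey, order);
      return order;
    },
  };
}

// In a test, deterministic fakes replace the real clock and id generator.
let counter = 0;
const service = makeOrderService({
  now: () => new Date("2024-01-01T00:00:00.000Z"),
  newId: () => `order-${++counter}`,
});
const first = service.createOrder("req-1");
const retry = service.createOrder("req-1"); // same object as `first`
```

Because both seams are fakes, the test can assert exact values for `id` and `createdAt`, and the retry assertion doubles as an idempotency predicate.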

### Mode Matrix

| Mode         | Contracts                                                           | New Files                                                                      | Required Artifacts                               |
| ------------ | ------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------ |
| **refactor** | Must not change                                                     | Discouraged; only when splitting modules with 1:1 mapping and codemod provided | Codemod script + semantic diff report            |
| **feature**  | Required first; consumer/provider tests green before implementation | Allowed; must be listed in scope.in                                            | Migration plan, feature flag, performance budget |
| **fix**      | Unchanged                                                           | Discouraged; prefer in-place edits                                             | Red test → green; root cause note in PR          |
| **doc**      | N/A                                                                 | Allowed for documentation files                                                | Updated README/usage snippets                    |
| **chore**    | N/A                                                                 | Limited to build/tooling changes                                               | Version updates, dependency changes              |

### Cursor/Codex Execution Guard

Add a commit policy hook to reject commit sets that introduce duplicate stems:

```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit (or CI script)
# Reject staged files whose names look like shadow copies.
# (^|/) also catches files at the repository root, whose paths have no leading slash.
PATTERN='(^|/)(copy|final|enhanced|v2)[.-]|(^|/)new-| - copy\.'
# Use `if` so a clean commit (grep finds nothing) exits 0 instead of inheriting grep's status 1.
if git diff --cached --name-only | grep -E "$PATTERN"; then
  echo "❌ Disallowed filename pattern. Use in-place refactor or codemod."
  exit 1
fi
```

#### 2.3 Verify (must pass locally and in CI)

• **Static checks**: typecheck, lint (code + tests), import hygiene, dead-code scan, secret scan.
• **Tests**:
  • **Unit**: fast, deterministic; cover branches and edge conditions; property-based where feasible.
  • **Contract**: consumer/provider; versioned and stored under apps/contracts/.
  • **Integration**: real DB or Testcontainers; seed data via factories; verify persistence, transactions, retries/timeouts.
  • **E2E smoke**: Playwright/Cypress; critical user paths only; semantic selectors; screenshot+trace on failure.
• **Mutation testing**: minimum scores per tier; non-conformant builds fail.
• **Non-functional checks**: axe rules; Lighthouse CI budgets or API latency budgets; SAST/dep scan clean.
• **Flake policy**: tests that intermittently fail are quarantined within 24h with an open ticket; retries are never policy, only a temporary band-aid with an expiry.

#### 2.4 Document & Deliver

• **PR bundle** (template below) with:
  • Working Spec YAML
  • Test Plan & Coverage/Mutation summary, Contract artifacts
  • Risk assessment, Rollback plan, Observability notes (dashboards/queries)
  • Changelog (semver impact), Migration notes
• Traceability: PR title references ticket; commits follow conventional commits; each test cites the requirement ID in test name or annotation.
• Explainability: agent includes a 10-line "rationale" and "known-limits" section.

## 2) Machine-Enforceable Implementation

### A) Executable Schemas & Validation

#### Working Spec JSON Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CAWS Working Spec",
  "type": "object",
  "required": [
    "id",
    "title",
    "risk_tier",
    "mode",
    "change_budget",
    "blast_radius",
    "operational_rollback_slo",
    "scope",
    "invariants",
    "acceptance",
    "non_functional",
    "contracts"
  ],
  "properties": {
    "id": { "type": "string", "pattern": "^[A-Z]+-\\d+$" },
    "title": { "type": "string", "minLength": 8 },
    "risk_tier": { "type": "integer", "enum": [1, 2, 3] },
    "mode": { "type": "string", "enum": ["refactor", "feature", "fix", "doc", "chore"] },
    "change_budget": {
      "type": "object",
      "properties": {
        "max_files": { "type": "integer", "minimum": 1 },
        "max_loc": { "type": "integer", "minimum": 1 }
      },
      "required": ["max_files", "max_loc"],
      "additionalProperties": false
    },
    "blast_radius": {
      "type": "object",
      "properties": {
        "modules": { "type": "array", "items": { "type": "string" } },
        "data_migration": { "type": "boolean" }
      },
      "required": ["modules", "data_migration"],
      "additionalProperties": false
    },
    "operational_rollback_slo": { "type": "string", "pattern": "^[0-9]+m$|^[0-9]+h$" },
    "threats": { "type": "array", "items": { "type": "string" } },
    "scope": {
      "type": "object",
      "required": ["in", "out"],
      "properties": {
        "in": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
        "out": { "type": "array", "items": { "type": "string" } }
      }
    },
    "invariants": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
    "acceptance": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["id", "given", "when", "then"],
        "properties": {
          "id": { "type": "string", "pattern": "^A\\d+$" },
          "given": { "type": "string" },
          "when": { "type": "string" },
          "then": { "type": "string" }
        }
      }
    },
    "non_functional": {
      "type": "object",
      "properties": {
        "a11y": { "type": "array", "items": { "type": "string" } },
        "perf": {
          "type": "object",
          "properties": {
            "api_p95_ms": { "type": "integer", "minimum": 1 },
            "lcp_ms": { "type": "integer", "minimum": 1 }
          },
          "additionalProperties": false
        },
        "security": { "type": "array", "items": { "type": "string" } }
      },
      "additionalProperties": false
    },
    "contracts": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["type", "path"],
        "properties": {
          "type": { "type": "string", "enum": ["openapi", "graphql", "proto", "pact"] },
          "path": { "type": "string" }
        }
      }
    },
    "observability": {
      "type": "object",
      "properties": {
        "logs": { "type": "array", "items": { "type": "string" } },
        "metrics": { "type": "array", "items": { "type": "string" } },
        "traces": { "type": "array", "items": { "type": "string" } }
      }
    },
    "migrations": { "type": "array", "items": { "type": "string" } },
    "rollback": { "type": "array", "items": { "type": "string" } }
  },
  "additionalProperties": false
}
```
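
To make a few of these constraints concrete, here is a hand-rolled check of five rules taken from the schema above (`id`, `title`, `risk_tier`, `operational_rollback_slo`, `scope.in`). It is a sketch, not a JSON Schema validator and not the packaged `validate.js`; `SpecDraft` and `checkWorkingSpec` are hypothetical names.

```typescript
// Hypothetical sketch: checks a handful of Working Spec rules by hand.
// A real pipeline would run the full JSON Schema through a schema validator.
interface SpecDraft {
  id?: unknown;
  title?: unknown;
  risk_tier?: unknown;
  operational_rollback_slo?: unknown;
  scope?: { in?: unknown };
}

function checkWorkingSpec(spec: SpecDraft): string[] {
  const errors: string[] = [];
  const id = spec.id;
  if (typeof id !== "string" || !/^[A-Z]+-\d+$/.test(id)) {
    errors.push("id must look like PROJ-123");
  }
  const title = spec.title;
  if (typeof title !== "string" || title.length < 8) {
    errors.push("title must be at least 8 characters");
  }
  if (spec.risk_tier !== 1 && spec.risk_tier !== 2 && spec.risk_tier !== 3) {
    errors.push("risk_tier must be 1, 2, or 3");
  }
  const slo = spec.operational_rollback_slo;
  if (typeof slo !== "string" || !/^[0-9]+m$|^[0-9]+h$/.test(slo)) {
    errors.push("operational_rollback_slo must be minutes or hours, e.g. 30m or 2h");
  }
  const scopeIn = spec.scope?.in;
  if (!Array.isArray(scopeIn) || scopeIn.length < 1) {
    errors.push("scope.in must list at least one path");
  }
  return errors;
}

const okErrors = checkWorkingSpec({
  id: "FEAT-123",
  title: "Add invoice export",
  risk_tier: 2,
  operational_rollback_slo: "30m",
  scope: { in: ["src/billing"] },
});
const badErrors = checkWorkingSpec({
  id: "feat-1",
  title: "short",
  risk_tier: 4,
  operational_rollback_slo: "2d",
  scope: { in: [] },
});
```

The first call yields no errors; the second violates all five rules, which shows that a rollback SLO expressed in days is rejected by the schema's pattern.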

#### Provenance Manifest Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": [
    "agent",
    "model",
    "model_hash",
    "tool_allowlist",
    "commit",
    "artifacts",
    "results",
    "approvals",
    "sbom",
    "attestation"
  ],
  "properties": {
    "agent": { "type": "string" },
    "model": { "type": "string" },
    "model_hash": { "type": "string" },
    "tool_allowlist": { "type": "array", "items": { "type": "string" } },
    "prompts": { "type": "array", "items": { "type": "string" } },
    "commit": { "type": "string" },
    "artifacts": { "type": "array", "items": { "type": "string" } },
    "results": {
      "type": "object",
      "properties": {
        "coverage_branch": { "type": "number" },
        "mutation_score": { "type": "number" },
        "tests_passed": { "type": "integer" },
        "contracts": {
          "type": "object",
          "properties": { "consumer": { "type": "boolean" }, "provider": { "type": "boolean" } }
        },
        "a11y": { "type": "string" },
        "perf": { "type": "object" }
      },
      "additionalProperties": true
    },
    "approvals": { "type": "array", "items": { "type": "string" } },
    "sbom": { "type": "string" },
    "attestation": { "type": "string" }
  }
}
```

#### Tier Policy Configuration

```json
{
  "1": {
    "min_branch": 0.9,
    "min_mutation": 0.7,
    "requires_contracts": true,
    "requires_manual_review": true,
    "max_files": 40,
    "max_loc": 1500,
    "allowed_modes": ["feature", "refactor", "fix"]
  },
  "2": {
    "min_branch": 0.8,
    "min_mutation": 0.5,
    "requires_contracts": true,
    "max_files": 25,
    "max_loc": 1000,
    "allowed_modes": ["feature", "refactor", "fix"]
  },
  "3": {
    "min_branch": 0.7,
    "min_mutation": 0.3,
    "requires_contracts": false,
    "max_files": 15,
    "max_loc": 600,
    "allowed_modes": ["feature", "refactor", "fix", "doc", "chore"]
  }
}
```
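
A gate that consumes this policy might look roughly like the sketch below. The policy values are copied from the configuration above; `RunResults`, `evaluateTier`, and the `contracts_passed` field are hypothetical and do not describe the packaged `gates.js`.

```typescript
// Hypothetical sketch of a tier gate; thresholds mirror the tier policy above.
interface TierPolicy {
  min_branch: number;
  min_mutation: number;
  requires_contracts: boolean;
  requires_manual_review?: boolean;
  max_files: number;
  max_loc: number;
  allowed_modes: string[];
}

const TIER_POLICY: Record<number, TierPolicy> = {
  1: { min_branch: 0.9, min_mutation: 0.7, requires_contracts: true, requires_manual_review: true,
       max_files: 40, max_loc: 1500, allowed_modes: ["feature", "refactor", "fix"] },
  2: { min_branch: 0.8, min_mutation: 0.5, requires_contracts: true,
       max_files: 25, max_loc: 1000, allowed_modes: ["feature", "refactor", "fix"] },
  3: { min_branch: 0.7, min_mutation: 0.3, requires_contracts: false,
       max_files: 15, max_loc: 600, allowed_modes: ["feature", "refactor", "fix", "doc", "chore"] },
};

interface RunResults {
  coverage_branch: number; // fraction in [0, 1]
  mutation_score: number;  // fraction in [0, 1]
  contracts_passed: boolean;
}

// Returns a list of human-readable failures; an empty list means the gate passes.
function evaluateTier(tier: number, mode: string, results: RunResults): string[] {
  const policy = TIER_POLICY[tier];
  if (!policy) return [`unknown tier ${tier}`];
  const failures: string[] = [];
  if (!policy.allowed_modes.includes(mode)) {
    failures.push(`mode ${mode} is not allowed for tier ${tier}`);
  }
  if (results.coverage_branch < policy.min_branch) {
    failures.push(`branch coverage ${results.coverage_branch} < ${policy.min_branch}`);
  }
  if (results.mutation_score < policy.min_mutation) {
    failures.push(`mutation score ${results.mutation_score} < ${policy.min_mutation}`);
  }
  if (policy.requires_contracts && !results.contracts_passed) {
    failures.push("contract tests are required and did not pass");
  }
  return failures;
}

const tier2 = evaluateTier(2, "feature", { coverage_branch: 0.85, mutation_score: 0.45, contracts_passed: true });
const tier1Doc = evaluateTier(1, "doc", { coverage_branch: 0.95, mutation_score: 0.8, contracts_passed: true });
const tier3 = evaluateTier(3, "chore", { coverage_branch: 0.7, mutation_score: 0.3, contracts_passed: false });
```

Note that `doc` and `chore` modes are only permitted at Tier 3 under this policy, so a Tier 1 documentation PR fails on mode alone.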

### B) CI/CD Quality Gates (Automated)

#### Complete GitHub Actions Pipeline

```yaml
name: CAWS Quality Gates
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  naming_guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 } # full history so origin/<base> resolves for git diff
      - name: Block shadow file patterns
        run: |
          BAD=$(git diff --name-only origin/${{ github.base_ref }}... | \
            grep -E '(^|/)(copy|final|enhanced|v2)[.-]|(^|/)new-|(^|/)_.+\.| - copy\.' || true)
          if [ -n "$BAD" ]; then
            echo "❌ Shadow/duplicate filename patterns detected:"
            echo "$BAD"
            exit 1
          fi

  scope_guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - name: Ensure changes are within scope.in
        run: |
          # yq takes the expression first, then the file to read
          yq -o=json '.' .caws/working-spec.yaml > .caws/ws.json
          jq -r '.scope.in[]' .caws/ws.json | sed 's|^|^|; s|$|/|' > .caws/paths.txt
          CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...)
          OUT=""
          for f in $CHANGED; do
            if ! grep -q -E -f .caws/paths.txt <<< "$f"; then OUT="$OUT\n$f"; fi
          done
          if [ -n "$OUT" ]; then
            echo -e "❌ Files outside scope.in:\n$OUT"
            echo "If intentional, add a Spec Delta to .caws/working-spec.yaml and include affected paths."
            exit 1
          fi

  budget_guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - name: Enforce max files/LOC from change_budget
        run: |
          yq -o=json '.' .caws/working-spec.yaml > .caws/ws.json
          MAXF=$(jq -r '.change_budget.max_files' .caws/ws.json)
          MAXL=$(jq -r '.change_budget.max_loc' .caws/ws.json)
          FILES=$(git diff --name-only origin/${{ github.base_ref }}... | wc -l)
          # --numstat counts added + deleted lines without the +++/--- file headers
          LOC=$(git diff --numstat origin/${{ github.base_ref }}... | awk '{ s += $1 + $2 } END { print s + 0 }')
          echo "Files:$FILES LOC:$LOC (budget Files:$MAXF LOC:$MAXL)"
          [ "$FILES" -le "$MAXF" ] && [ "$LOC" -le "$MAXL" ] || (echo "❌ Budget exceeded"; exit 1)

  setup:
    runs-on: ubuntu-latest
    outputs:
      risk: ${{ steps.risk.outputs.tier }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - name: Parse Working Spec
        id: risk
        run: |
          # assumes the Go yq (v4 syntax) is available on the runner, as in the guard jobs above
          yq -o=json '.' .caws/working-spec.yaml > .caws/working-spec.json
          echo "tier=$(jq -r .risk_tier .caws/working-spec.json)" >> "$GITHUB_OUTPUT"
      - name: Validate Spec
        run: node apps/tools/caws/validate.js .caws/working-spec.json

  static:
    needs: setup
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run typecheck && npm run lint && npm run dep:policy && npm run sast && npm run secret:scan

  unit:
    needs: setup
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:unit -- --coverage
      - name: Enforce Branch Coverage
        run: node apps/tools/caws/gates.js coverage --tier ${{ needs.setup.outputs.risk }}

  mutation:
    needs: [setup, unit] # setup must be a direct dependency to read its outputs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:mutation
      - run: node apps/tools/caws/gates.js mutation --tier ${{ needs.setup.outputs.risk }}

  contracts:
    needs: setup
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:contract
      - run: node apps/tools/caws/gates.js contracts --tier ${{ needs.setup.outputs.risk }}

  integration:
    needs: [setup]
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env: { POSTGRES_PASSWORD: pass }
        ports: ['5432:5432']
        options: >-
          --health-cmd="pg_isready -U postgres" --health-interval=10s --health-timeout=5s --health-retries=5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:integration

  e2e_a11y:
    needs: [integration]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:e2e:smoke
      - run: npm run test:axe

  perf:
    if: needs.setup.outputs.risk != '3'
    needs: [setup, integration]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run perf:budgets

  provenance_trust:
    # still run when perf is skipped for tier 3, but not after a failure or cancellation
    if: ${{ !failure() && !cancelled() }}
    needs: [setup, naming_guard, scope_guard, budget_guard, static, unit, mutation, contracts, integration, e2e_a11y, perf]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - name: Generate SBOM
        run: npx @cyclonedx/cyclonedx-npm --output-file .agent/sbom.json
      - name: Create Attestation
        run: node apps/tools/caws/attest.js > .agent/attestation.json
      - name: Prompt/Tool lint
        run: node apps/tools/caws/prompt-lint.js .agent/prompts/*.md --allowlist .agent/tools-allow.json
      - name: Generate Provenance
        run: node apps/tools/caws/provenance.js > .agent/provenance.json
      - name: Validate Provenance
        run: node apps/tools/caws/validate-prov.js .agent/provenance.json
      - name: Compute Trust Score
        run: node apps/tools/caws/gates.js trust --tier ${{ needs.setup.outputs.risk }}
```
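
The scope check performed by the `scope_guard` job can also be expressed as a small pure function, which is easier to unit-test than shell. This is a sketch under one assumption: like the `sed` step above, it treats every `scope.in` entry as a directory prefix, but it compares literal prefixes rather than regular expressions. `filesOutsideScope` is a hypothetical name.

```typescript
// Hypothetical sketch: list changed files that fall outside scope.in.
// Each scope entry is treated as a directory; "src/billing" covers
// "src/billing/invoice.ts" but not "src/billing-v2/invoice.ts".
function filesOutsideScope(scopeIn: string[], changed: string[]): string[] {
  const prefixes = scopeIn.map((p) => (p.endsWith("/") ? p : p + "/"));
  return changed.filter(
    (file) => !prefixes.some((prefix) => file.startsWith(prefix))
  );
}

const outside = filesOutsideScope(
  ["src/billing", "tests/billing"],
  [
    "src/billing/invoice.ts",
    "src/auth/login.ts",
    "tests/billing/invoice.test.ts",
    "src/billing-v2/invoice.ts",
  ]
);
```

A non-empty result would correspond to the job failing and asking for a Spec Delta.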

### C) Repository Scaffold

```
.caws/
  policy/tier-policy.json
  schemas/{working-spec.schema.json, provenance.schema.json}
  templates/{pr.md, feature.plan.md, test-plan.md}
apps/contracts/           # OpenAPI/GraphQL/Pact
docs/                     # human docs; ADRs
src/
tests/
  unit/
  contract/
  integration/
  e2e/
  axe/
  mutation/
apps/tools/caws/
  validate.ts
  gates.ts                # thresholds, trust score
  provenance.ts
  prompt-lint.js          # prompt hygiene & tool allowlist
  attest.js               # SBOM + SLSA attestation generator
  tools-allow.json        # allowed tools for agents
codemod/                  # AST transformation scripts for refactor mode
  rename.ts               # example codemod for renaming modules
.agent/                   # provenance artifacts (generated)
  sbom.json
  attestation.json
  provenance.json
  tools-allow.json
.github/
  workflows/caws.yml
  CODEOWNERS
```

## 3) Templates & Examples

### Working Spec YAML Template

```yaml
id: {{PROJECT_ID}}
title: '{{PROJECT_TITLE}}'
risk_tier: {{PROJECT_TIER}}
mode: {{PROJECT_MODE}}
change_budget:
  max_files: {{MAX_FILES}}
  max_loc: {{MAX_LOC}}
blast_radius:
  modules: [{{BLAST_MODULES}}]
  data_migration: {{DATA_MIGRATION}}
operational_rollback_slo: '{{ROLLBACK_SLO}}'
threats: {{PROJECT_THREATS}}
scope:
  in: [{{SCOPE_IN}}]
  out: [{{SCOPE_OUT}}]
invariants: {{PROJECT_INVARIANTS}}
acceptance: {{ACCEPTANCE_CRITERIA}}
non_functional:
  a11y: [{{A11Y_REQUIREMENTS}}]
  perf: { api_p95_ms: {{PERF_BUDGET}} }
  security: [{{SECURITY_REQUIREMENTS}}]
contracts:
  - type: {{CONTRACT_TYPE}}
    path: '{{CONTRACT_PATH}}'
observability:
  logs: [{{OBSERVABILITY_LOGS}}]
  metrics: [{{OBSERVABILITY_METRICS}}]
  traces: [{{OBSERVABILITY_TRACES}}]
migrations: {{MIGRATION_PLAN}}
rollback: [{{ROLLBACK_PLAN}}]
```

### PR Description Template

```markdown
## Summary

{{PR_SUMMARY}}

## Working Spec

- Risk Tier: {{RISK_TIER}}
- Mode: {{PR_MODE}}
- Invariants: {{INVARIANTS}}

## Tests

- Unit: {{UNIT_COVERAGE}}% (target {{TARGET_COVERAGE}}%)
- Mutation: {{MUTATION_SCORE}}% (target {{TARGET_MUTATION}}%)
- Integration: {{INTEGRATION_TESTS}} flows
- E2E smoke: {{E2E_TESTS}} ({{E2E_STATUS}})
- A11y: {{A11Y_SCORE}} ({{A11Y_STATUS}})

## Non-functional

- API p95: {{API_PERF}}ms (budget {{API_BUDGET}}ms)
- Security: {{SAST_STATUS}}

## Migration & Rollback

{{MIGRATION_NOTES}}

## Known Limits

{{KNOWN_LIMITS}}
```

## 4) Agent Conduct Rules (Hard Constraints)

1. **Spec adherence**: Do not implement beyond scope.in; if a discovered dependency changes the spec, open a "Spec delta" in the PR and update tests first.
2. **No hidden state/time/net**: All non-determinism is injected and controlled in tests.
3. **Explainable mocks**: Only mock boundaries; never mock the function under test; document any mock behavior in comments.
4. **Idempotency & error paths**: Provide tests for retries/timeouts/cancel; assert invariants on error.
5. **Observability parity**: Every key acceptance path emits logs/metrics/traces; tests assert on them when feasible (e.g., fake exporter assertions).
6. **Data safety**: No real PII in fixtures; factories generate realistic but synthetic data.
7. **Accessibility required**: For UI changes: keyboard path test + axe scan; for API: error messages human-readable and localizable.
8. **Performance ownership**: Include a micro-benchmark (on hot paths) or a budget check; document algorithmic complexity if changed.
9. **Docs as code**: Update README/usage snippets; add example code; regenerate typed clients from contracts.
10. **Rollback ready**: Feature-flag new behavior; write a reversible migration or provide a kill-switch.

## 5) Trust & Telemetry

• **Provenance manifest** (.agent/provenance.json): agent name/version, prompts, model, commit SHAs, test result hashes, generated files list, and human approvals. Stored with the PR for auditability.
• **Trust score per PR**: composite of rubric + gates + historical flake rate; exposed in a PR check and a weekly dashboard.
• **Drift watch**: monitor contract usage in prod; alert if undocumented fields appear.

## 6) Operational Excellence

### Flake Management

• **Detector**: compute week-over-week pass variance per spec ID.
• **Policy**: >0.5% variance → auto-label flake:quarantine, open ticket with owner + expiry (7 days).
• **Implementation**: Store test run hashes in .agent/provenance.json; nightly job aggregates and posts a table to dashboard.
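
One way to read the detector and policy above is sketched below. The text does not define "pass variance" precisely, so this sketch assumes it means the absolute change in pass rate between two consecutive weeks; `WeeklyRuns`, `passRate`, and `shouldQuarantine` are hypothetical names, and the packaged `flake-detector.ts` may compute something different.

```typescript
// Hypothetical sketch: week-over-week pass-rate change per spec ID.
// Assumption: "variance" = |passRate(thisWeek) - passRate(lastWeek)|.
interface WeeklyRuns {
  passed: number;
  total: number;
}

// A week with no runs is treated as fully passing so it cannot trigger quarantine on its own.
function passRate(week: WeeklyRuns): number {
  return week.total === 0 ? 1 : week.passed / week.total;
}

// threshold 0.005 corresponds to the 0.5% policy limit.
function shouldQuarantine(
  lastWeek: WeeklyRuns,
  thisWeek: WeeklyRuns,
  threshold = 0.005
): boolean {
  return Math.abs(passRate(thisWeek) - passRate(lastWeek)) > threshold;
}

// 200/200 last week, 198/200 this week: a 1% swing, above the 0.5% limit.
const flaky = shouldQuarantine({ passed: 200, total: 200 }, { passed: 198, total: 200 });
// 1000/1000 vs 999/1000: a 0.1% swing, within the limit.
const stable = shouldQuarantine({ passed: 1000, total: 1000 }, { passed: 999, total: 1000 });
```

A nightly job could apply this per spec ID and attach the `flake:quarantine` label whenever the function returns true.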

### Waivers & Escalation

• **Temporary waiver requires**:
  • waivers.yml with: gate, reason, owner, expiry ISO date (≤ 14 days), compensating control.
  • PR must link to ticket; the trust score is capped at 79 while any waiver is active.
• **Escalation**: unresolved flake/waiver past expiry auto-blocks merges across the repo until cleared.
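
The waiver rules can be sketched as two small functions. Several details are assumptions made for illustration: the field name `compensating_control`, measuring the 14-day limit from the evaluation date rather than from the waiver's creation date, and the names `isWaiverValid` and `cappedTrustScore`. None of this describes the packaged `waivers.js`.

```typescript
// Hypothetical sketch of the waiver policy: expiry within 14 days, score capped at 79.
interface Waiver {
  gate: string;
  reason: string;
  owner: string;
  expiry: string; // ISO date, e.g. "2024-03-10"
  compensating_control: string;
}

const MAX_WAIVER_DAYS = 14;
const WAIVER_SCORE_CAP = 79;
const MS_PER_DAY = 86400000;

// Valid means: parseable expiry, not yet expired, and no more than 14 days ahead.
// Waivers that fail this check should be rejected upstream rather than ignored.
function isWaiverValid(waiver: Waiver, today: Date): boolean {
  const days = (new Date(waiver.expiry).getTime() - today.getTime()) / MS_PER_DAY;
  return !Number.isNaN(days) && days >= 0 && days <= MAX_WAIVER_DAYS;
}

// Any valid, unexpired waiver caps the PR's trust score.
function cappedTrustScore(raw: number, waivers: Waiver[], today: Date): number {
  const anyActive = waivers.some((w) => isWaiverValid(w, today));
  return anyActive ? Math.min(raw, WAIVER_SCORE_CAP) : raw;
}

const today = new Date("2024-03-01T00:00:00.000Z");
const waiver: Waiver = {
  gate: "mutation",
  reason: "Mutation tooling upgrade in progress",
  owner: "team-billing",
  expiry: "2024-03-10",
  compensating_control: "Extra manual review",
};
const capped = cappedTrustScore(92, [waiver], today);
const uncapped = cappedTrustScore(92, [], today);
```

Scores already below the cap are left unchanged, so the cap only affects PRs that would otherwise score 80 or higher.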

### Security & Performance Checks

• **Secrets**: run gitleaks/trufflehog on changed files; CAWS gate blocks any hit above low severity.
• **SAST**: language-appropriate tools; gate requires zero criticals.
• **Performance**: k6 scripts for API budgets; LHCI for web budgets; regressions fail gate.
• **Migrations**: lint for reversibility; dry-run in CI; forward-compat contract tests.

## 7) Language & Tooling Ecosystem

### TypeScript Stack (Recommended)

• **Testing**: Jest/Vitest, fast-check, Playwright, Testcontainers, Stryker, MSW or Pact
• **Quality**: ESLint + types, LHCI, axe-core
• **CI**: GitHub Actions with Node 20

### Python Stack

• **Testing**: pytest, hypothesis, Playwright (Python), Testcontainers-py, mutmut, Schemathesis
• **Quality**: bandit/semgrep, Lighthouse CI, axe-core

### JVM Stack

• **Testing**: JUnit5, jqwik, Testcontainers, PIT (mutation), Pact-JVM
• **Quality**: OWASP dependency check, SonarQube, axe-core

**Note**: Mutation testing is non-negotiable for tiers ≥2; it's the only reliable guard against assertion theater.
|
|
652
|
+
|
|
653
|
+
## 8) Review Rubric (Scriptable Scoring)
|
|
654
|
+
|
|
655
|
+
| Category | Weight | Criteria | 0 | 1 | 2 |
|
|
656
|
+
| --------------------------------- | ------ | ----------------------------------- | ----------------- | ------------------ | --------------------------- |
|
|
657
|
+
| Spec clarity & invariants | ×5 | Clear, testable invariants | Missing/unclear | Basic coverage | Comprehensive + edge cases |
|
|
658
|
+
| Contract correctness & versioning | ×5 | Schema accuracy + versioning | Errors present | Minor issues | Perfect + versioned |
|
|
659
|
+
| Unit thoroughness & edge coverage | ×5 | Branch coverage + property tests | <70% coverage | Meets tier minimum | >90% + properties |
|
|
660
|
+
| Integration realism | ×4 | Real containers + seeds | Mocked heavily | Basic containers | Full stack + realistic data |
|
|
661
|
+
| E2E relevance & stability | ×3 | Critical paths + semantic selectors | Brittle selectors | Basic coverage | Semantic + stable |
|
|
662
|
+
| Mutation adequacy | ×4 | Score vs tier threshold | <50% | Meets minimum | >80% |
|
|
663
|
+
| A11y pathways & results | ×3 | Keyboard + axe clean | Major issues | Basic compliance | Full WCAG + keyboard |
|
|
664
|
+
| Perf/Resilience | ×3 | Budgets + timeouts/retries | No checks | Basic budgets | Full resilience |
|
|
665
|
+
| Observability | ×3 | Logs/metrics/traces asserted | Missing | Basic emission | Asserted in tests |
|
|
666
|
+
| Migration safety & rollback | ×3 | Reversible + kill-switch | No rollback | Basic revert | Full rollback + testing |
|
|
667
|
+
| Docs & PR explainability | ×3 | Clear rationale + limits | Minimal | Basic docs | Comprehensive + ADR |
|
|
668
|
+
| **Mode compliance** | ×3 | Changes match declared `mode` | Violations | Minor drift | Full compliance |
|
|
669
|
+
| **Scope & budget discipline** | ×3 | Diff within `scope.in` & budget | Exceeded | Near limit | Within limits |
|
|
670
|
+
| **Supply-chain attestations** | ×2 | SBOM + SLSA attestation | Missing | Partial | Complete & valid |
|
|
671
|
+
|
|
672
|
+
**Target**: ≥ 82/100 (weighted sum). Calculator in `apps/tools/caws/rubric.ts`.
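
To make the rubric arithmetic concrete, here is a minimal scoring sketch. It is not the contents of `apps/tools/caws/rubric.ts`; the category keys are shortened, hypothetical names, and normalizing the weighted sum to a 100-point scale is an assumption (the weights in the table sum to 49, so the raw maximum at a rating of 2 everywhere is 98).

```typescript
type Rating = 0 | 1 | 2;

// Weights copied from the rubric table; keys are shortened, hypothetical names.
const rubricWeights = {
  specClarity: 5,
  contracts: 5,
  unitTests: 5,
  integration: 4,
  e2e: 3,
  mutation: 4,
  a11y: 3,
  perfResilience: 3,
  observability: 3,
  migration: 3,
  docs: 3,
  modeCompliance: 3,
  scopeBudget: 3,
  supplyChain: 2,
} as const;

type Category = keyof typeof rubricWeights;

// Weighted sum of ratings, scaled so that a rating of 2 in every category yields 100.
function rubricScore(ratings: Record<Category, Rating>): number {
  let raw = 0;
  let max = 0;
  for (const key of Object.keys(rubricWeights) as Category[]) {
    raw += rubricWeights[key] * ratings[key];
    max += rubricWeights[key] * 2;
  }
  return Math.round((raw / max) * 100);
}
```

Under this normalization, a PR rated 2 everywhere except a 1 on one ×5 category scores 93/98, which rounds to 95.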
|
|
673
|
+
|
|
674
|
+
## 9) Anti-patterns (Explicitly Rejected)
|
|
675
|
+
|
|
676
|
+
• **Over-mocked integration tests**: mocking ORM or HTTP client where containerized integration is feasible.
|
|
677
|
+
• **UI tests keyed on CSS classes**: brittle selectors instead of semantic roles/labels.
|
|
678
|
+
• **Coupling tests to implementation details**: private method calls, internal sequence assertions.
|
|
679
|
+
• **"Retry until green" CI culture**: quarantines without expiry or owner.
|
|
680
|
+
• **100% coverage mandates**: without mutation testing or risk awareness.
|
|
681
|
+
|
|
682
|
+
## 13) Failure-Mode Cards (Common Traps & Recovery)
|
|
683
|
+
|
|
684
|
+
Each card pairs a symptom with a recovery action ("If you see X, do Y"):
|
|
685
|
+
|
|
686
|
+
1. **Symptom:** Large rename + re-exports create `*-copy.ts` or `enhanced-*.ts`.
|
|
687
|
+
**Action:** Switch to **refactor mode**. Generate `codemod/rename.ts` that updates imports/exports in place. Validate with `tsc --noEmit` and run mutation tests to ensure unchanged behavior.
|
|
688
|
+
|
|
689
|
+
2. **Symptom:** Contract change proliferates across services.
|
|
690
|
+
**Action:** Declare **blast_radius.modules**; create consumer **Pact** tests first. Stage changes behind a feature flag; ship provider compatibility for both old/new fields.
|
|
691
|
+
|
|
692
|
+
3. **Symptom:** Flaky time-based tests.
|
|
693
|
+
**Action:** Inject `Clock` and use fixed timestamps; assert **idempotency** with property tests.
|
|
694
|
+
|
|
695
|
+
4. **Symptom:** Agent proposes new external tool/library.
|
|
696
|
+
**Action:** Fail unless added to `tool_allowlist`. Require SBOM delta review and perf/a11y/security notes in the PR.
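
Card 3 (flaky time-based tests) can be illustrated with a small clock-injection sketch. The `Clock` interface, `fixedClock`, and the example function `isTokenExpired` are hypothetical names chosen for this example; the point is only that code under test reads time from an injected dependency rather than from `new Date()` directly.

```typescript
// Abstraction over "current time" so tests can pin it.
interface Clock {
  now(): Date;
}

// Production clock: reads the real system time.
const systemClock: Clock = { now: () => new Date() };

// Test clock: always returns the same instant.
function fixedClock(iso: string): Clock {
  const instant = new Date(iso);
  return { now: () => new Date(instant.getTime()) };
}

// Hypothetical unit under test: expiry depends only on the injected clock.
function isTokenExpired(issuedAt: Date, ttlSeconds: number, clock: Clock = systemClock): boolean {
  return clock.now().getTime() - issuedAt.getTime() >= ttlSeconds * 1000;
}
```

With a fixed clock the same inputs always produce the same answer, so repeated calls can be asserted to be idempotent without sleeping or racing the wall clock.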
|
|
697
|
+
|
|
698
|
+
## 10) Cursor/Codex Agent Integration
|
|
699
|
+
|
|
700
|
+
### Agent Commands
|
|
701
|
+
|
|
702
|
+
• `agent plan` → emits plan + test matrix
|
|
703
|
+
• `agent verify` → runs local gates; generates provenance
|
|
704
|
+
• `agent prove` → creates provenance manifest
|
|
705
|
+
• `agent doc` → updates README/changelog from spec
|
|
706
|
+
|
|
707
|
+
### Guardrails
|
|
708
|
+
|
|
709
|
+
• **Templates**: Inject Working Spec YAML + PR template on "New Feature" command
|
|
710
|
+
• **Scaffold**: Pre-wire tests/\* skeletons with containers and contracts
|
|
711
|
+
• **Context discipline**: Restrict writes to spec-touched modules; deny outside scope unless spec updated
|
|
712
|
+
• **Feedback loop**: PR comments show coverage, mutation diff, contract verification summary
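
The context-discipline guardrail can be sketched as a path check against the spec's scope. This assumes `scope.in` and `scope.out` are lists of path prefixes and that `out` takes precedence over `in`; the `Scope` interface and both functions are hypothetical, not the CAWS implementation.

```typescript
// Assumed shape: lists of repository-relative path prefixes.
interface Scope {
  in: string[];
  out: string[];
}

// Normalize separators and strip a leading "./" so comparisons are consistent.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

// A write is allowed when the path is under some `in` prefix and under no `out` prefix.
function isWriteAllowed(path: string, scope: Scope): boolean {
  const p = normalizePath(path);
  const under = (prefix: string): boolean => {
    const n = normalizePath(prefix).replace(/\/$/, "");
    return p === n || p.startsWith(n + "/");
  };
  if (scope.out.some(under)) return false;
  return scope.in.some(under);
}

// Collect every proposed write that falls outside the declared scope.
function outOfScopeWrites(paths: string[], scope: Scope): string[] {
  return paths.filter((p) => !isWriteAllowed(p, scope));
}
```

Matching on whole path segments keeps `src/billing-v2/` from being treated as part of `src/billing/`. The sketch does not resolve `..` segments, so real input would need canonicalizing first.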
|
|
713
|
+
|
|
714
|
+
## 11) Adoption Roadmap
|
|
715
|
+
|
|
716
|
+
### Foundation Setup
|
|
717
|
+
|
|
718
|
+
- [ ] Add .caws/ directory with schemas and templates
|
|
719
|
+
- [ ] Create apps/tools/caws/ validation scripts
|
|
720
|
+
- [ ] Wire basic GitHub Actions workflow
|
|
721
|
+
- [ ] Add CODEOWNERS for Tier-1 paths
|
|
722
|
+
|
|
723
|
+
### Quality Gates Implementation
|
|
724
|
+
|
|
725
|
+
- [ ] Enable Testcontainers for integration tests
|
|
726
|
+
- [ ] Add mutation testing with tier thresholds
|
|
727
|
+
- [ ] Implement trust score calculation
|
|
728
|
+
- [ ] Add axe + Playwright smoke for UI changes
|
|
729
|
+
|
|
730
|
+
### Operational Excellence
|
|
731
|
+
|
|
732
|
+
- [ ] Publish provenance manifest with PRs
|
|
733
|
+
- [ ] Implement flake detector and quarantine process
|
|
734
|
+
- [ ] Add waiver system with trust score caps
|
|
735
|
+
- [ ] Socialize review rubric and block merges <80
|
|
736
|
+
|
|
737
|
+
### Continuous Improvement
|
|
738
|
+
|
|
739
|
+
- [ ] Monitor drift in contract usage
|
|
740
|
+
- [ ] Refine tooling based on feedback
|
|
741
|
+
- [ ] Expand language support as needed
|
|
742
|
+
- [ ] Track trust score trends and flake rates
|
|
743
|
+
|
|
744
|
+
## 12) Trust Score Formula
|
|
745
|
+
|
|
746
|
+
```typescript
|
|
747
|
+
const weights = {
|
|
748
|
+
coverage: 0.2,
|
|
749
|
+
mutation: 0.2,
|
|
750
|
+
contracts: 0.16,
|
|
751
|
+
a11y: 0.08,
|
|
752
|
+
perf: 0.08,
|
|
753
|
+
flake: 0.08,
|
|
754
|
+
mode: 0.06,
|
|
755
|
+
scope: 0.06,
|
|
756
|
+
supplychain: 0.04,
|
|
757
|
+
};
|
|
758
|
+
|
|
759
|
+
function trustScore(tier: string, prov: Provenance) {
|
|
760
|
+
const wsum = Object.values(weights).reduce((a, b) => a + b, 0);
|
|
761
|
+
const score =
|
|
762
|
+
weights.coverage * normalize(prov.results.coverage_branch, tiers[tier].min_branch, 0.95) +
|
|
763
|
+
weights.mutation * normalize(prov.results.mutation_score, tiers[tier].min_mutation, 0.9) +
|
|
764
|
+
weights.contracts *
|
|
765
|
+
(tiers[tier].requires_contracts
|
|
766
|
+
? prov.results.contracts.consumer && prov.results.contracts.provider
|
|
767
|
+
? 1
|
|
768
|
+
: 0
|
|
769
|
+
: 1) +
|
|
770
|
+
weights.a11y * (prov.results.a11y === 'pass' ? 1 : 0) +
|
|
771
|
+
weights.perf * budgetOk(prov.results.perf) +
|
|
772
|
+
weights.flake * (prov.results.flake_rate <= 0.005 ? 1 : 0.5) +
|
|
773
|
+
weights.mode * (prov.results.mode_compliance === 'full' ? 1 : 0.5) +
|
|
774
|
+
weights.scope * (prov.results.scope_within_budget ? 1 : 0) +
|
|
775
|
+
weights.supplychain * (prov.results.sbom_valid && prov.results.attestation_valid ? 1 : 0);
|
|
776
|
+
return Math.round((score / wsum) * 100);
|
|
777
|
+
}
|
|
778
|
+
```
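
The formula above calls `normalize` and `budgetOk` without defining them. One plausible reading, offered here as an assumption rather than the actual helpers, is a clamped linear scale from the tier minimum to the target, and an all-or-nothing budget check; the `PerfResult` shape is likewise hypothetical. Note that the listed weights sum to 0.96, which is why the formula divides by `wsum` before scaling to 100.

```typescript
// Assumed helper: map `value` linearly onto [0, 1], where the tier minimum
// scores 0 and `target` (or better) scores 1. Values are clamped to that range.
function normalize(value: number, min: number, target: number): number {
  if (target <= min) return value >= min ? 1 : 0;
  return Math.min(1, Math.max(0, (value - min) / (target - min)));
}

// Assumed shape for perf results: measured value and budget per metric.
interface PerfResult {
  [metric: string]: { actual: number; budget: number };
}

// Assumed helper: 1 when every metric is within its budget, otherwise 0.
function budgetOk(perf: PerfResult): number {
  return Object.values(perf).every((m) => m.actual <= m.budget) ? 1 : 0;
}
```

Under this reading, a PR that only just meets its tier minimum for branch coverage earns no coverage credit, and credit grows linearly until the 0.95 target is reached.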
|
|
779
|
+
|
|
780
|
+
This v1.0 combines the philosophical foundation of the system with the executable implementation details needed for immediate adoption. It covers both the "why" (quality principles) and the "how" (automated enforcement) required for engineering-grade AI coding agents.
|
|
781
|
+
|
|
782
|
+
---
|
|
783
|
+
|
|
784
|
+
## 🚀 Quick Start Guide
|
|
785
|
+
|
|
786
|
+
### For New Projects
|
|
787
|
+
|
|
788
|
+
1. Copy this template to your project root
|
|
789
|
+
2. Run `caws init` to scaffold the project structure
|
|
790
|
+
3. Customize the Working Spec YAML for your project
|
|
791
|
+
4. Set up your CI/CD pipeline with the provided GitHub Actions
|
|
792
|
+
|
|
793
|
+
### For Existing Projects
|
|
794
|
+
|
|
795
|
+
1. Copy the relevant sections to your existing project
|
|
796
|
+
2. Run `caws scaffold` to add missing components
|
|
797
|
+
3. Update your existing workflows to include the CAWS gates
|
|
798
|
+
|
|
799
|
+
### Customization
|
|
800
|
+
|
|
801
|
+
- **Project ID**: Update `{{PROJECT_ID}}` with your ticket system prefix
|
|
802
|
+
- **Title**: Describe your project in `{{PROJECT_TITLE}}`
|
|
803
|
+
- **Tier**: Set appropriate risk tier (1-3) in `{{PROJECT_TIER}}`
|
|
804
|
+
- **Mode**: Choose from `refactor`, `feature`, `fix`, `doc`, `chore`
|
|
805
|
+
- **Budget**: Set reasonable file/LOC limits in `change_budget`
|
|
806
|
+
- **Scope**: Define what files/features are in/out of scope
|
|
807
|
+
- **Contracts**: Specify API contracts (OpenAPI, GraphQL, etc.)
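
The customization points above can be checked mechanically once the Working Spec is parsed. The sketch below assumes a simplified in-memory form; the `WorkingSpec` interface, the `max_files` and `max_loc` keys, and `validateSpec` are hypothetical, while the mode list and the 1-3 tier range come from the bullets above.

```typescript
const MODES = ["refactor", "feature", "fix", "doc", "chore"] as const;

// Hypothetical in-memory form of the Working Spec after YAML parsing.
interface WorkingSpec {
  id: string;
  title: string;
  tier: number;
  mode: string;
  change_budget: { max_files: number; max_loc: number };
  scope: { in: string[]; out: string[] };
}

// Return a list of problems; an empty list means the spec passes these basic checks.
function validateSpec(spec: WorkingSpec): string[] {
  const errors: string[] = [];
  const placeholder = /\{\{.*\}\}/;
  if (placeholder.test(spec.id) || placeholder.test(spec.title)) {
    errors.push("unreplaced template placeholder");
  }
  if (![1, 2, 3].includes(spec.tier)) errors.push("tier must be 1, 2, or 3");
  if (!(MODES as readonly string[]).includes(spec.mode)) {
    errors.push(`mode must be one of: ${MODES.join(", ")}`);
  }
  if (spec.change_budget.max_files <= 0 || spec.change_budget.max_loc <= 0) {
    errors.push("change_budget limits must be positive");
  }
  if (spec.scope.in.length === 0) errors.push("scope.in must list at least one path");
  return errors;
}
```

Running a check like this before any gate executes catches specs that still contain `{{PROJECT_ID}}`-style placeholders from the template.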
|
|
808
|
+
|
|
809
|
+
### Support
|
|
810
|
+
|
|
811
|
+
- 📖 Full documentation: See sections above
|
|
812
|
+
- 🛠️ Tools: `apps/tools/caws/` contains all utilities
|
|
813
|
+
- 🎯 Examples: Check `docs/` for implementation examples
|
|
814
|
+
- 🤝 Community: Follow the agent conduct rules for collaboration
|
|
815
|
+
|
|
816
|
+
---
|
|
817
|
+
|
|
818
|
+
**Author**: @darianrosebrook
|
|
819
|
+
**Version**: 1.0.0
|
|
820
|
+
**License**: MIT
|