codebyplan 1.13.55 → 1.13.57

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codebyplan",
3
- "version": "1.13.55",
3
+ "version": "1.13.57",
4
4
  "description": "CLI for CodeByPlan — AI-powered development planning and tracking",
5
5
  "type": "module",
6
6
  "bin": {
@@ -132,7 +132,8 @@ One subdirectory per app module. Shared flows under `_shared/`. Probe under `_pr
132
132
 
133
133
  ## Spec-Writing Patterns
134
134
 
135
- **One flow per screen/feature.** Steps:
135
+ **One flow per screen/feature.** A flow that only taps and asserts visibility is NOT done —
136
+ prove behavior:
136
137
 
137
138
  ```yaml
138
139
  appId: ${APP_ID}
@@ -141,15 +142,97 @@ tags:
141
142
  ---
142
143
  - runFlow: _shared/login.yaml
143
144
  - assertVisible: "Dashboard"
144
- - takeScreenshot: "dashboard-loaded"
145
- - tapOn: "Create"
145
+ - waitForAnimationToEnd
146
+ - assertNoDefectsWithAI: # AI visual-defect check — see AI Assertions below
147
+ optional: false
148
+ - takeScreenshot: "dashboard-loaded" # NEW states only — see Visual Baselines
149
+ - tapOn:
150
+ text: "Create"
151
+ enabled: true # state selector — waits for interactivity, catches broken gating
146
152
  - assertVisible: "New item"
147
- - takeScreenshot: "create-modal-open"
148
153
  ```
149
154
 
150
155
  Use text-based targeting first (`tapOn: "Button"`); use testID when ambiguous
151
- (`tapOn: { id: "btn" }`). For CRUD: create + verify visible; edit + verify updated;
152
- delete + confirm + verify removed.
156
+ (`tapOn: { id: "btn" }`). `text`/`id` are REGEX by default, so escape `$` and `[`; quote
157
+ `YES`/`NO`/`ON`/`OFF` (unquoted they parse as YAML booleans).
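
To make the escaping and quoting rules above concrete, here is a minimal TypeScript sketch of two helpers that could be run over a literal UI label before it is written into a flow selector. Both names (`escapeMaestroLiteral`, `needsYamlQuotes`) are hypothetical and are not part of Maestro or this package. Escaping the full set of common regex metacharacters, rather than only `$` and `[`, is an assumption made for safety.

```ts
// Hypothetical helper (not part of Maestro or codebyplan): escape regex
// metacharacters so a literal UI label matches itself in a regex selector.
function escapeMaestroLiteral(label: string): string {
  // "$&" re-inserts the matched character, prefixed with a backslash.
  return label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// YAML 1.1 parsers read these bare words as booleans, so they need quoting.
const YAML_BOOLEAN_WORDS = new Set(["yes", "no", "on", "off", "true", "false"]);

// Hypothetical helper: true when an unquoted label would parse as a boolean.
function needsYamlQuotes(label: string): boolean {
  return YAML_BOOLEAN_WORDS.has(label.toLowerCase());
}
```

When the escaped value is pasted into a flow, single-quoting it in YAML avoids a second layer of backslash escaping, since single-quoted YAML strings do not interpret backslashes.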
158
+
159
+ **Assertion depth requirements**:
160
+
161
+ - **State selectors prove logic**: `enabled`, `checked`, `focused`, `selected` — e.g. assert
162
+ Submit is `enabled: false` before required fields are filled, `enabled: true` after.
163
+ - **Data round-trips** via `copyTextFrom` + `assertTrue`: copy the value on screen A
164
+ (snapshot into `output.*` via `evalScript` before the next copy overwrites
165
+ `maestro.copiedText`), navigate, assert screen B shows the same value.
166
+ - **Persistence proof** for create/edit flows — after the UI reports success, verify via
167
+ `runScript` `http.get` against the backend API (`json()` parse + `assertTrue` on the
168
+ field), or at minimum kill + relaunch and re-assert:
169
+
170
+ ```yaml
171
+ - killApp
172
+ - launchApp: { stopApp: false }
173
+ - assertVisible: ${output.createdTitle}
174
+ ```
175
+
176
+ - For CRUD: create + verify (round-trip); edit + verify updated; delete + confirm + verify
177
+ removed.
178
+
179
+ ## Visual Baselines (assertScreenshot)
180
+
181
+ Committed PNGs under `e2e/screenshots/maestro/` are BASELINES, not run artifacts.
182
+
183
+ - **New state** (`git ls-files --error-unmatch <path>` exits non-zero): `waitForAnimationToEnd`,
184
+ then `takeScreenshot: "{flow}-{state}"` and `git add` the PNG (auto-new model).
185
+ - **Existing baseline**: do NOT retake/overwrite. Assert against it:
186
+
187
+ ```yaml
188
+ - waitForAnimationToEnd
189
+ - assertScreenshot:
190
+ path: e2e/screenshots/maestro/{flow}-{state}.png
191
+ thresholdPercentage: 95
192
+ ```
193
+
194
+ On failure classify `visual_regression`: capture the live screen under a transient
195
+ diagnostic name (`{flow}-{state}-actual`, written to `--test-output-dir`), report it in
196
+ `screenshots[]`, and NEVER overwrite the committed baseline. The user accepts the change at
197
+ `/cbp-verify`; only then is the baseline re-captured and re-added.
198
+ - `baseline_diff_pct` stays `null` (Maestro reports threshold pass/fail, not a percentage);
199
+ set `is_new` per git tracking as before.
200
+
201
+ ## AI Assertions (assertNoDefectsWithAI / assertWithAI)
202
+
203
+ Maestro's AI commands screenshot the current screen and detect rendering defects (cut-off
204
+ text, overlapping elements, mis-centered content). Run `assertNoDefectsWithAI` at every
205
+ primary screen state; use `assertWithAI` for states selectors can't express:
206
+
207
+ ```yaml
208
+ - assertNoDefectsWithAI:
209
+ optional: false
210
+ - assertWithAI:
211
+ assertion: The 6-digit verification input is visible with all six boxes empty.
212
+ optional: false
213
+ ```
214
+
215
+ **Critical**: AI commands default to `optional: true` (warn-only — a detected defect does
216
+ NOT fail the flow). ALWAYS set `optional: false`.
217
+
218
+ **Auth preflight (Step 6.5.1 addition)**: AI commands require Maestro auth — a `maestro login`
219
+ session or `MAESTRO_CLOUD_API_KEY` (a free account suffices; the legacy `MAESTRO_CLI_AI_KEY`
220
+ BYO-key path is retired). Probe before authoring AI steps. When unavailable, ask the user once
221
+ (provide key / skip AI), record `ai_checks: 'unavailable'` in the output, omit AI commands,
222
+ and rely on `assertScreenshot` baselines — never let an AI step fail a run on a missing key.
223
+ AI artifacts (`ai-report-*.html`, `ai-*.json`) land under `--test-output-dir`; reference them
224
+ in `critical_issues[].reason` when a defect is found.
225
+
226
+ ## Anti-Patterns
227
+
228
+ - `waitForAnimationToEnd` is NOT an assertion — it succeeds even on timeout; always pair it
229
+ with a real assert or screenshot.
230
+ - Don't wrap whole flows in `retry` (hides product flakiness); bound `repeat` loops with
231
+ `times` + `while` together.
232
+ - No `point:` coordinate taps — device-dependent; combine attribute + relational selectors instead.
233
+ - Don't max out timeouts ("60s everywhere") — defaults catch performance regressions.
234
+ - Platform limits: `back` is Android/Web only; airplane-mode commands are Android-only;
235
+ Android `inputText` is ASCII-only; system biometric/HealthKit dialogs need XCUITest.
153
236
 
154
237
  ## Screenshot Capture
155
238
 
@@ -161,7 +244,9 @@ Screenshots written to `e2e/screenshots/maestro/` (via `screenshotsDir` in `conf
161
244
  Committed path convention: `e2e/screenshots/maestro/{flow}-{state}.png` (repo root).
162
245
  This path is intentionally outside `apps/web/e2e/screenshots/` (which is gitignored).
163
246
 
164
- After the flow completes, `git add e2e/screenshots/maestro/` to track new PNGs.
247
+ After the flow completes, `git add` each NEW PNG individually — never `git add` the whole
248
+ directory (that silently stages drifted baselines; existing states are gated by
249
+ `assertScreenshot`, see Visual Baselines).
165
250
 
166
251
  **`is_new` detection**: `git ls-files --error-unmatch <path>` exits non-zero → `is_new: true`.
167
252
 
@@ -186,9 +271,13 @@ Include this in the specialist output alongside `screenshots[]`.
186
271
  ## Run Command
187
272
 
188
273
  ```bash
189
- maestro test maestro/flows/{module}/{flow}.yaml --format=junit --output maestro/results.xml
274
+ maestro test maestro/flows/{module}/{flow}.yaml --format junit --output maestro/results.xml \
275
+ --test-output-dir maestro/output
190
276
  ```
191
277
 
278
+ `maestro/output/` holds transient diagnostics (AI reports, `-actual` regression captures) —
279
+ gitignore it; committed baselines live only under `e2e/screenshots/maestro/`.
280
+
192
281
  ## pnpm Scripts
193
282
 
194
283
  ```json
@@ -21,7 +21,7 @@ accordingly.
21
21
  ## Install
22
22
 
23
23
  ```bash
24
- pnpm add -D @playwright/test
24
+ pnpm add -D @playwright/test @axe-core/playwright
25
25
  pnpm exec playwright install chromium
26
26
  # CI with system deps:
27
27
  pnpm exec playwright install --with-deps chromium
@@ -265,32 +265,123 @@ port from `.codebyplan/server.local.json` (worktree overlay, checked first) then
265
265
  `.codebyplan/server.json` (committed base). On mismatch ask which is correct, then propose
266
266
  an Edit to align them.
267
267
 
268
+ ## Quality Fixture (MANDATORY)
269
+
270
+ `apps/{app}/e2e/fixtures.ts` — the single `test` source for ALL specs. It auto-enforces the
271
+ console-clean mandate (an `{ auto: true }` fixture runs in every test with zero per-spec
272
+ opt-in) and provides the axe builder. Create it if absent; when touching an existing spec
273
+ that still imports from `@playwright/test`, migrate its import.
274
+
275
+ ```ts
276
+ import { test as base, expect } from "@playwright/test";
277
+ import AxeBuilder from "@axe-core/playwright";
278
+
279
+ // Known, triaged errors only — every entry needs a comment linking its fix task.
280
+ const ALLOWED_CONSOLE: RegExp[] = [];
281
+
282
+ type QualityFixtures = {
283
+ consoleGuard: void;
284
+ makeAxeBuilder: () => AxeBuilder;
285
+ };
286
+
287
+ export const test = base.extend<QualityFixtures>({
288
+ consoleGuard: [
289
+ async ({ page, baseURL }, use) => {
290
+ const errors: string[] = [];
291
+ page.on("console", (msg) => {
292
+ if (msg.type() === "error" && !ALLOWED_CONSOLE.some((re) => re.test(msg.text())))
293
+ errors.push(`console.error: ${msg.text()}`);
294
+ });
295
+ page.on("pageerror", (err) => errors.push(`pageerror: ${err.message}`));
296
+ page.on("requestfailed", (req) => {
297
+ // Own-origin, non-aborted failures only (cancelled prefetches are noise)
298
+ if (baseURL && req.url().startsWith(baseURL) && req.failure()?.errorText !== "net::ERR_ABORTED")
299
+ errors.push(`requestfailed: ${req.method()} ${req.url()} — ${req.failure()?.errorText}`);
300
+ });
301
+ await use();
302
+ expect(errors, "console/page errors captured during test").toEqual([]);
303
+ },
304
+ { auto: true },
305
+ ],
306
+ makeAxeBuilder: async ({ page }, use) => {
307
+ await use(() => new AxeBuilder({ page }).withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"]));
308
+ },
309
+ });
310
+ export { expect };
311
+ ```
312
+
313
+ Collected errors from failing tests feed the `console_errors[]` output field (see Output
314
+ Additions below).
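
The guard's filtering rules can be pulled out into pure functions so they are unit-testable without a browser. The sketch below mirrors the conditions in the fixture above (own-origin, non-aborted request failures; allowlisted console errors skipped). The function names are hypothetical and are not exported by the package.

```ts
// Hypothetical pure helper mirroring the fixture's requestfailed filter:
// record only own-origin failures that were not cancelled by the browser.
function shouldRecordRequestFailure(
  baseURL: string | undefined,
  url: string,
  errorText: string | undefined,
): boolean {
  if (!baseURL) return false; // no baseURL configured: nothing counts as own-origin
  if (!url.startsWith(baseURL)) return false; // third-party request
  return errorText !== "net::ERR_ABORTED"; // cancelled prefetches are noise
}

// Hypothetical pure helper mirroring the ALLOWED_CONSOLE check.
function isAllowedConsoleError(text: string, allowed: RegExp[]): boolean {
  return allowed.some((re) => re.test(text));
}
```

One design note: allowlist patterns are best written without the `g` flag, because `RegExp.prototype.test` on a global regex keeps `lastIndex` state between calls and can skip matches on repeated use.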
315
+
268
316
  ## Spec-Writing Patterns
269
317
 
270
- **One spec file per page/flow.** Mandatory per spec:
318
+ **One spec file per page/flow.** Specs import `{ test, expect }` from the quality fixture
319
+ (`./fixtures` or relative path) — NEVER directly from `@playwright/test`.
271
320
 
272
- - Smoke test: loads, title correct, no console errors.
273
- - Primary user flow: main interaction.
321
+ Mandatory per spec (a spec that only proves elements are visible is NOT done):
322
+
323
+ - Smoke test: loads, title correct (the console guard fails it on any console/page error).
324
+ - Primary user flow: main interaction **with a behavior proof** (below).
274
325
  - Visual regression: `toHaveScreenshot` at every primary state.
326
+ - Structure: `toMatchAriaSnapshot` on the primary state — catches hierarchy/label/role
327
+ breakage without pixel fragility.
328
+ - Accessibility: one axe scan per page state, zero violations.
329
+
330
+ ### Functional Proof (mutations)
275
331
 
276
- For forms: fill + submit + verify success; validation errors.
277
- For CRUD: create + verify; edit + verify; delete + confirm + verify.
332
+ Every flow that mutates state MUST prove the mutation happened — asserting the optimistic UI
333
+ is not proof:
278
334
 
279
335
  ```ts
280
- import { test, expect } from "@playwright/test";
336
+ // 1. Prove the API call succeeded
337
+ const resp = page.waitForResponse((r) => r.url().includes("/api/items") && r.request().method() === "POST");
338
+ await page.getByRole("button", { name: "Create" }).click();
339
+ expect((await resp).status()).toBeLessThan(400);
340
+
341
+ // 2. Prove persistence — reload and re-assert (or poll the API for eventual consistency)
342
+ await page.reload();
343
+ await expect(page.getByRole("listitem").filter({ hasText: itemName })).toBeVisible();
344
+ // await expect.poll(async () => (await page.request.get(`/api/items/${id}`)).status()).toBe(200);
345
+ ```
281
346
 
282
- test.describe("Home page", () => {
283
- test.beforeEach(async ({ page }) => {
284
- await page.goto("/");
285
- });
347
+ ### Error-State Proof (forms / CRUD)
286
348
 
287
- test("loads and shows heading", async ({ page }) => {
288
- await expect(page.getByRole("heading", { level: 1 })).toBeVisible();
289
- await expect(page).toHaveScreenshot("home-loaded.png", { maxDiffPixelRatio: 0.001 });
290
- });
349
+ At least one test per form/CRUD spec injects a failure and asserts the rendered error UI —
350
+ error paths are where untested UIs break in production:
351
+
352
+ ```ts
353
+ await page.route("**/api/items", (r) => r.fulfill({ status: 500 }));
354
+ await page.getByRole("button", { name: "Create" }).click();
355
+ await expect(page.getByRole("alert")).toContainText(/failed|went wrong/i);
356
+ ```
357
+
358
+ ### Permission / RLS Proof
359
+
360
+ When the route is role-gated, include one denial test (lower-privilege storage state or
361
+ seeded non-member): assert the explicit denial UI or redirect — a blank render is a bug,
362
+ not a pass.
363
+
364
+ ### Accessibility Scan
365
+
366
+ ```ts
367
+ test("a11y: dashboard has no WCAG A/AA violations", async ({ page, makeAxeBuilder }) => {
368
+ await page.goto("/dashboard");
369
+ const results = await makeAxeBuilder().analyze();
370
+ expect(results.violations).toEqual([]);
291
371
  });
292
372
  ```
293
373
 
374
+ Known issues are excluded via `.disableRules([...])` with a comment linking the fix task —
375
+ never by deleting the scan.
376
+
377
+ ### Anti-Patterns (reject in review)
378
+
379
+ - `page.waitForTimeout(...)` — web-first assertions auto-retry; hard sleeps mask races.
380
+ - `expect(await locator.isVisible()).toBe(true)` — one-shot, no retry; use `await expect(locator).toBeVisible()`.
381
+ - `.nth(n)` / `.first()` positional selection — except the documented SCSS-module fallback.
382
+ - In-spec env skips (`test.skip(!process.env.X, ...)`) — forbidden per `rules/e2e-mandatory.md`.
383
+ - Visibility-only assertions after a mutation — see Functional Proof.
384
+
294
385
  ## Screenshot Capture
295
386
 
296
387
  **Baseline regression** (preferred):
@@ -332,6 +423,18 @@ when the playwright.config project/device emulation indicates a mobile viewport
332
423
 
333
424
  Include this in the specialist output alongside `screenshots[]`.
334
425
 
426
+ ## Output Additions (Playwright)
427
+
428
+ Beyond the shared contract, ALWAYS report:
429
+
430
+ - `console_errors[]` — every entry the console guard collected on failed tests
431
+ (`{test_name, type: 'console' | 'pageerror' | 'requestfailed', text}`). Empty array on a
432
+ clean run — never omit the field.
433
+ - `a11y` — `{scanned_pages: string[], violations: [{rule, impact, page}]}` aggregated from
434
+ the axe scans. A `status: 'completed'` output with non-empty `violations` is inconsistent —
435
+ fix in-scope or classify the failures as category `a11y`; `codebyplan e2e verify-round`
436
+ hard-fails the inconsistency.
437
+
335
438
  ## Run Command
336
439
 
337
440
  ```bash
@@ -202,6 +202,14 @@ The deterministic e2e gate (`codebyplan e2e verify-round`) and the unit/lint/typ
202
202
  here). If the diff touches an e2e-eligible UI surface, note it in `summary` so the orchestrator
203
203
  confirms its gate ran — but do not assert a build/test result this agent did not run.
204
204
 
205
+ E2E verdict gates (refuse `READY` per `rules/e2e-mandatory.md`): a zero-assertion run
206
+ (`passed === 0 && skipped > 0` on a touched path); an empty `e2e_gallery[]` when the round
207
+ touched UI for an eligible framework (sole exception: `vscode-test`-only rounds with explicit
208
+ `e2e_gallery: []`); a `status: 'completed'` e2e output carrying non-empty `console_errors[]`
209
+ or `a11y.violations[]`. Treat a `{type: 'shallow_coverage'}` critical issue on a mutation
210
+ surface as a real finding (visibility-only specs prove rendering, not behavior) — severity
211
+ `medium` minimum, routed to a follow-up round.
212
+
205
213
  ### Phase 6: Build Findings, Verdict & Routing
206
214
 
207
215
  Assign severity by impact: `critical` (runtime error / data corruption / security), `high`
@@ -54,7 +54,7 @@ output:
54
54
  - test_name: string
55
55
  error: string
56
56
  file: string
57
- category: 'env' | 'auth' | 'access' | 'flake' | 'real' | 'visual_regression'
57
+ category: 'env' | 'auth' | 'access' | 'flake' | 'real' | 'visual_regression' | 'console_error' | 'a11y'
58
58
  classification_reason: string
59
59
  framework_configured: boolean
60
60
  preflight:
@@ -77,6 +77,17 @@ output:
77
77
  committed_path: string # repo-relative; MUST be git-tracked after the run
78
78
  is_new: boolean # true => no prior baseline; auto-captured+committed this run
79
79
  baseline_diff_pct: number | null # null for non-playwright frameworks
80
+ console_errors: # REQUIRED for playwright (empty array on a clean run);
81
+ - test_name: string # null/omitted for frameworks without console capture
82
+ type: 'console' | 'pageerror' | 'requestfailed'
83
+ text: string
84
+ a11y: # REQUIRED for playwright; null/omitted otherwise
85
+ scanned_pages: string[]
86
+ violations:
87
+ - rule: string # axe rule id (e.g. color-contrast)
88
+ impact: string # critical | serious | moderate | minor
89
+ page: string
90
+ ai_checks: 'ran' | 'unavailable' | null # maestro only — AI assertion availability (see agent body)
80
91
  user_interactions: [{question, answer}]
81
92
  tech_stack_reconciliation:
82
93
  db_framework: string | null
@@ -177,12 +188,32 @@ For each failed test, assign exactly one category:
177
188
  | `auth` | Login-page redirect, 401 after credential submit, `invalid_grant`, `email_not_confirmed` | AskUserQuestion per Step 6.5.3 |
178
189
  | `access` | 403/404 on an accessible route, RLS denial text, missing seed data | AskUserQuestion: "Test failed with access error: `{error}`. Options: (1) fix + reply steps, (2) abort." |
179
190
  | `flake` | Timeout on first run, passes on immediate retry, network jitter | Retry up to 3 times before reclassifying to `real` |
180
- | `visual_regression` | `toHaveScreenshot` pixel-diff exceeded threshold | Do NOT retry. Include baseline + actual paths in `screenshots[]` with `baseline_diff_pct`. Do NOT auto-accept baselines. |
191
+ | `visual_regression` | `toHaveScreenshot` / `assertScreenshot` diff exceeded threshold | Do NOT retry. Include baseline + actual paths in `screenshots[]` with `baseline_diff_pct`. Do NOT auto-accept baselines. |
192
+ | `console_error` | Console guard collected console/page/request errors during the flow | App defect — fix in-scope or report; never allowlist without a linked fix task |
193
+ | `a11y` | Axe scan reported WCAG A/AA violations | Do NOT retry. Report rule ids in `a11y.violations`; fix in-scope or surface at `/cbp-verify` |
181
194
  | `real` | Assertion failure on app behavior (wrong text, state, navigation) | Attempt fix (selector, timeout, assertion), max 3 attempts, then report |
182
195
 
183
196
  `env`, `auth`, `access` failures MUST NOT count toward `test_results.failed` until
184
197
  preflight passes — they block the run instead.
185
198
 
199
+ ## Functional Assertion Mandate
200
+
201
+ Visibility-only specs are NOT sufficient coverage — they prove rendering, not behavior.
202
+ Every spec/flow covering a mutation (create / edit / delete / submit) MUST include at
203
+ least one behavior proof:
204
+
205
+ - **network success proof** — response-status assertion on the mutating call
206
+ (`waitForResponse` in Playwright; `runScript` `http.*` in Maestro), AND/OR
207
+ - **persistence proof** — reload / kill-and-relaunch / direct API re-read showing the
208
+ change survived, PLUS
209
+ - **one error-state test per form/CRUD surface** — inject a failure (`page.route` 500 in
210
+ Playwright) and assert the rendered error UI.
211
+
212
+ When a suite's assertions are entirely visibility/navigation-level, the specialist MUST
213
+ report `critical_issues[]` entry `{type: 'shallow_coverage', ...}` — the run may pass, but
214
+ the gap is flagged for the next round. `cbp-verify-reviewer` treats `shallow_coverage` on a
215
+ mutation surface as a finding, not noise.
216
+
186
217
  ## Committed-Screenshot Mandate
187
218
 
188
219
  Every eligible e2e run MUST persist relevant screenshots to the framework's committed
@@ -215,6 +246,11 @@ classify as `visual_regression`. Do NOT auto-update. Surface as a blocking accep
215
246
  at `/cbp-verify` (round scope). The user must explicitly approve (`--update-snapshots`) or open a
216
247
  fix task. This relaxes the prior always-manual contract ONLY for new screens.
217
248
 
249
+ The model applies to ALL screenshot-capable frameworks, not just Playwright: Maestro gates
250
+ existing baselines with `assertScreenshot` against the committed PNG (the agent never
251
+ retakes/overwrites an existing baseline; acceptance = re-capture + `git add` after user
252
+ approval at `/cbp-verify`).
253
+
218
254
  ## Screenshot Collection Rule
219
255
 
220
256
  After every run, enumerate all committed PNGs and populate BOTH `screenshots[]` and
@@ -242,6 +278,11 @@ New-screen auto-capture (above) is the only exception to the always-manual contr
242
278
  - `tests_run === true`
243
279
  - `preflight.*.ok === true` for every required prerequisite
244
280
  - Every failure has `category` other than `env`, `auth`, or `access`
281
+ - `console_errors[]` is empty and `a11y.violations[]` is empty (where the framework reports
282
+ them — Playwright always does). Non-empty values with `status: 'completed'` are
283
+ inconsistent and hard-fail `codebyplan e2e verify-round` (`console_errors_reported`,
284
+ `a11y_violations_reported`); either fix in-scope or return `status: 'failed'` with the
285
+ matching failure category.
245
286
 
246
287
  Otherwise return `status: 'failed'`.
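
As an illustration of the consistency rule above, here is a minimal TypeScript sketch of a check over the output contract. It is not the actual `codebyplan e2e verify-round` implementation; the function name and the trimmed `E2eOutput` shape are assumptions. Only the two reason codes (`console_errors_reported`, `a11y_violations_reported`) come from the text.

```ts
// Trimmed, assumed shape of the specialist output; only the fields needed here.
interface E2eOutput {
  status: "completed" | "failed";
  console_errors: { test_name: string; type: string; text: string }[] | null;
  a11y: {
    scanned_pages: string[];
    violations: { rule: string; impact: string; page: string }[];
  } | null;
}

// Hypothetical check: returns the reason codes that make a 'completed'
// output inconsistent. An empty array means the output is consistent.
function completedInconsistencies(out: E2eOutput): string[] {
  const reasons: string[] = [];
  if (out.status !== "completed") return reasons; // rule applies to 'completed' only
  if (out.console_errors !== null && out.console_errors.length > 0) {
    reasons.push("console_errors_reported");
  }
  if (out.a11y !== null && out.a11y.violations.length > 0) {
    reasons.push("a11y_violations_reported");
  }
  return reasons;
}
```

A `null` value is treated as "framework does not report this field", matching the contract's note that these fields are null or omitted for frameworks without console capture or axe scans.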
247
288
 
@@ -17,9 +17,10 @@
17
17
  # Two jobs:
18
18
  # ci SOFT tier (authoritative required check) — the baseline-tolerant
19
19
  # inner loop: lint, typecheck, test, build across the repo.
20
- # ci-strict HARDCORE tier (report-only) — whole-repo ABSOLUTE GREEN via
21
- # `codebyplan check --scope merged --no-baseline`. Non-blocking for
22
- # now; flip to a required check once the repo is absolute-green.
20
+ # ci-strict HARDCORE tier — whole-repo ABSOLUTE GREEN via
21
+ # `codebyplan check --scope merged --no-baseline`. Report-only by
22
+ # default; set `workflow.strict_check_enforced: true` in
23
+ # `.codebyplan/ci.json` to make it a real gate (then enforce-check).
23
24
 
24
25
  name: CI
25
26
 
@@ -69,19 +70,21 @@ jobs:
69
70
  - name: Build
70
71
  run: pnpm turbo build
71
72
 
72
- # ── HARDCORE strict tier (report-only) ──────────────────────────────────────
73
+ # ── HARDCORE strict tier ────────────────────────────────────────────────────
73
74
  # Whole-repo ABSOLUTE GREEN: `codebyplan check --scope merged --no-baseline`
74
75
  # ignores .check-baseline.json entirely, so ANY failing package (lint,
75
- # typecheck, test) fails this job. This is the future checkpoint→main gate.
76
+ # typecheck, test) fails this job. This is the checkpoint→main gate.
76
77
  #
77
- # report-only until apps/web baseline is burned down; flip to required after.
78
- # `continue-on-error: true` keeps it non-blocking; the `ci` job above stays
79
- # the authoritative required check. Do NOT wire this job as a branch-protection
80
- # required check until the whole repo is absolute-green.
78
+ # report-only vs enforced is driven by `.codebyplan/ci.json`
79
+ # `workflow.strict_check_enforced` (scaffold-ci-workflow substitutes the
80
+ # tokens below): when false (default) the job name carries " (report-only)"
81
+ # and `continue-on-error: true` keeps it non-blocking; when true the suffix is
82
+ # dropped and `continue-on-error: false` makes it a real gate. Only flip the
83
+ # flag once the whole repo is absolute-green AND the job has run green in CI,
84
+ # then add it to branch protection via `codebyplan ci enforce-check`.
81
85
  ci-strict:
82
- name: Strict whole-repo green (report-only)
83
- runs-on: ubuntu-latest
84
- continue-on-error: true
86
+ name: Strict whole-repo green{{STRICT_NAME_SUFFIX}}
87
+ runs-on: ubuntu-latest{{STRICT_CONTINUE_ON_ERROR_LINE}}
85
88
  steps:
86
89
  - name: Checkout
87
90
  uses: actions/checkout@v4
@@ -112,10 +115,12 @@ jobs:
112
115
  # In the monorepo run the freshly-built bundle directly (the bin shim may
113
116
  # be missing because dist/cli.js did not exist at install time); in a
114
117
  # consumer repo that path is absent, so fall back to the installed bin.
118
+ # --concurrency=1 serializes turbo so the whole-repo matrix does not
119
+ # CPU-starve timing-sensitive test suites into flaky timeouts on the runner.
115
120
  - name: Strict check (no baseline)
116
121
  run: |
117
122
  if [ -f packages/codebyplan-package/dist/cli.js ]; then
118
- node packages/codebyplan-package/dist/cli.js check --scope merged --no-baseline
123
+ node packages/codebyplan-package/dist/cli.js check --scope merged --no-baseline --concurrency=1
119
124
  else
120
- pnpm exec codebyplan check --scope merged --no-baseline
125
+ pnpm exec codebyplan check --scope merged --no-baseline --concurrency=1
121
126
  fi
@@ -420,9 +420,6 @@ function main() {
420
420
  if (shouldShow("PACKAGE_FRESHNESS", cfg.package_freshness)) {
421
421
  let guarded = false;
422
422
  let installed = "";
423
- let newer = false;
424
- let latest = "";
425
- let inSync = true;
426
423
 
427
424
  const cachePath = path.join(
428
425
  root,
@@ -444,9 +441,6 @@ function main() {
444
441
  } else {
445
442
  installed =
446
443
  typeof cache.installed === "string" ? cache.installed : "";
447
- newer = cache.newer === true;
448
- latest = typeof cache.latest === "string" ? cache.latest : "";
449
- inSync = cache.in_sync !== false;
450
444
  }
451
445
  }
452
446
  } catch {
@@ -466,21 +460,14 @@ function main() {
466
460
  guarded = true;
467
461
  } else {
468
462
  try {
469
- const mRaw = fs.readFileSync(manifestPath, "utf8");
470
- const mParsed = JSON.parse(mRaw);
471
- const mVer =
472
- typeof mParsed?.version === "string" ? mParsed.version : "";
463
+ // Reading + parsing the manifest validates it — unreadable/invalid
464
+ // JSON falls into the catch below and hides the segment. (The
465
+ // manifest-vs-installed ⟳ nag was removed in CHK-195.)
466
+ JSON.parse(fs.readFileSync(manifestPath, "utf8"));
473
467
  const pRaw = fs.readFileSync(pkgPath, "utf8");
474
468
  const pParsed = JSON.parse(pRaw);
475
- const iVer =
469
+ installed =
476
470
  typeof pParsed?.version === "string" ? pParsed.version : "";
477
- installed = iVer;
478
- if (mVer && iVer && mVer !== iVer) {
479
- // manifest ≠ installed → .claude is out of sync. The ⟳ nag was removed
480
- // (CHK-195); only the bare version renders. inSync is retained for the
481
- // guard shape but no longer drives any output.
482
- inSync = false;
483
- }
484
471
  } catch {
485
472
  // Can't read files → hide segment.
486
473
  guarded = true;
@@ -492,6 +479,26 @@ function main() {
492
479
  const L8 = `${C.DIM}cbp${C.RST} ${installed}`;
493
480
  out.push(L8);
494
481
  }
482
+
483
+ // Settings-contract violations (read from cache regardless of guard state).
484
+ if (fs.existsSync(cachePath)) {
485
+ try {
486
+ const cacheRaw2 = fs.readFileSync(cachePath, "utf8");
487
+ const cache2 = JSON.parse(cacheRaw2);
488
+ if (cache2 && typeof cache2 === "object") {
489
+ const settingsMissing = cache2.settings_missing === true;
490
+ const settingsIgnored = cache2.settings_ignored === true;
491
+ if (settingsMissing) {
492
+ out.push(`${C.RED}⚠ settings.json missing!${C.RST}`);
493
+ }
494
+ if (settingsIgnored) {
495
+ out.push(`${C.YELLOW}⚠ settings.json gitignored!${C.RST}`);
496
+ }
497
+ }
498
+ } catch {
499
+ // Unreadable / invalid → no indicators
500
+ }
501
+ }
495
502
  }
496
503
 
497
504
  process.stdout.write(out.length ? out.join("\n") + "\n" : "");
@@ -401,6 +401,19 @@ def main():
401
401
  l8 = "%scbp%s %s" % (DIM, RST, _installed)
402
402
  out.append(l8)
403
403
 
404
+ # Settings-contract violations (read from cache regardless of guard state).
405
+ if os.path.isfile(cache_path):
406
+ try:
407
+ with open(cache_path, "r", encoding="utf-8") as fh:
408
+ cache2 = json.load(fh)
409
+ if isinstance(cache2, dict):
410
+ if cache2.get("settings_missing") is True:
411
+ out.append("%s⚠ settings.json missing!%s" % (RED, RST))
412
+ if cache2.get("settings_ignored") is True:
413
+ out.append("%s⚠ settings.json gitignored!%s" % (YELLOW, RST))
414
+ except Exception:
415
+ pass # Unreadable / invalid → no indicators
416
+
404
417
  sys.stdout.write(("\n".join(out) + "\n") if out else "")
405
418
 
406
419
 
@@ -494,4 +494,16 @@ if should_show PACKAGE_FRESHNESS "$CFG_PACKAGE_FRESHNESS"; then
494
494
  L8="${DIM}cbp${RST} ${_CBP_INSTALLED}"
495
495
  printf "%b\n" "$L8"
496
496
  fi
497
+
498
+ # Settings-contract violations (read from cache regardless of guard state).
499
+ if [ -f "$CBP_STATUS_CACHE" ] && command -v jq >/dev/null 2>&1; then
500
+ _CBP_SETTINGS_MISSING="$(jq -r '.settings_missing == true' "$CBP_STATUS_CACHE" 2>/dev/null)"
501
+ _CBP_SETTINGS_IGNORED="$(jq -r '.settings_ignored == true' "$CBP_STATUS_CACHE" 2>/dev/null)"
502
+ if [ "$_CBP_SETTINGS_MISSING" = "true" ]; then
503
+ printf "%b\n" "${RED}⚠ settings.json missing!${RST}"
504
+ fi
505
+ if [ "$_CBP_SETTINGS_IGNORED" = "true" ]; then
506
+ printf "%b\n" "${YELLOW}⚠ settings.json gitignored!${RST}"
507
+ fi
508
+ fi
497
509
  fi
@@ -73,6 +73,27 @@ The sole exception is `vscode-test`: the committed dir may be empty when the ext
73
73
  has no visual output (behavior-only tests). Agents must still define the dir and report
74
74
  `e2e_gallery: []` explicitly — not omit the field.
75
75
 
76
+ ## Quality-Capture Mandates
77
+
78
+ A green run that captured no quality signals is not evidence. Per framework:
79
+
80
+ - **playwright**: every spec imports `test` from the shared quality fixture
81
+ (`e2e/fixtures.ts`) — console/pageerror guard auto-active in every test; one axe WCAG A/AA
82
+ scan per page state. The output MUST carry `console_errors[]` (empty on clean) and `a11y`
83
+ per `context/testing/e2e.md`.
84
+ - **maestro**: existing committed screenshots are baselines — gate them with
85
+ `assertScreenshot` (never retake/overwrite); run `assertNoDefectsWithAI` with
86
+ `optional: false` at primary states when Maestro auth is available (the default
87
+ `optional: true` is warn-only and forbidden; record `ai_checks: 'unavailable'` when auth
88
+ is absent).
89
+ - A `status: 'completed'` output carrying non-empty `console_errors[]` or
90
+ `a11y.violations[]` is inconsistent — `codebyplan e2e verify-round` hard-fails it
91
+ (`console_errors_reported`, `a11y_violations_reported`).
92
+
93
+ Mutation flows MUST carry a behavior proof per `context/testing/e2e.md` § Functional
94
+ Assertion Mandate (network success proof / persistence proof / error-state test);
95
+ visibility-only suites are flagged `{type: 'shallow_coverage'}` in `critical_issues[]`.
96
+
76
97
  ## Cross-References
77
98
 
78
99
  - `context/testing/e2e.md` — Input/Output contract, pre-flight loop, failure classification,
@@ -47,13 +47,21 @@ The branch model is **feat→main direct**; `.codebyplan/git.json` has `integrat
47
47
  IS the per-checkpoint feat branch. The hardcore tier runs against that feat branch's merged
48
48
  state before it lands on main; do not assume a staging/integration hop exists.
49
49
 
50
- ## Report-Only Rollout
50
+ ## Strict-Tier Enforcement (report-only ⇄ enforced)
51
51
 
52
- The whole-repo hardcore CI **job** lands **report-only first** (`continue-on-error: true`) and is
53
- flipped to a required check ONLY after the `apps/web` baseline is burned down. Until then,
54
- `--scope merged --no-baseline` is advisory in CI — surfaced, not enforced — so a pre-existing
55
- `apps/web` red does not block a merge while the baseline is still being paid down. Locally,
56
- `cbp-verify` still runs and reports it.
52
+ The whole-repo hardcore CI **job** (`ci-strict`) is config-driven via `.codebyplan/ci.json`
53
+ `workflow.strict_check_enforced`, which `codebyplan ci scaffold-workflow` substitutes into the
54
+ generated `.github/workflows/ci.yml`:
55
+
56
+ - **`false` (default)** (report-only): the job carries the " (report-only)" name suffix and
57
+ `continue-on-error: true`, so `--scope merged --no-baseline` is advisory in CI — surfaced, not
58
+ enforced. A repo whose baseline is still red keeps merging while it pays the baseline down.
59
+ - **`true`** — enforced: the suffix is dropped and `continue-on-error` is omitted (defaults to
60
+ `false`), making the job a real gate. Flip ONLY after the whole repo is absolute-green AND the
61
+ job has already run green in CI, then wire the check name `Strict whole-repo green` into branch
62
+ protection via `codebyplan ci enforce-check --check-name "Strict whole-repo green"`.
63
+
64
+ Locally, `cbp-verify` runs and reports the same check regardless of the flag.
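
The token substitution described above can be sketched as follows. This is a hedged TypeScript illustration, not the scaffolder's real code: the function names are hypothetical, the four-space indentation of the `continue-on-error` line is an assumption, and it follows the "omitted when enforced" wording from this section rather than emitting `continue-on-error: false`.

```ts
interface StrictTokens {
  STRICT_NAME_SUFFIX: string;
  STRICT_CONTINUE_ON_ERROR_LINE: string;
}

// Hypothetical mapping from `workflow.strict_check_enforced` to token values.
function strictTokens(enforced: boolean): StrictTokens {
  return enforced
    ? { STRICT_NAME_SUFFIX: "", STRICT_CONTINUE_ON_ERROR_LINE: "" }
    : {
        STRICT_NAME_SUFFIX: " (report-only)",
        // Leading newline: the token sits at the end of the `runs-on` line.
        STRICT_CONTINUE_ON_ERROR_LINE: "\n    continue-on-error: true",
      };
}

// Hypothetical renderer: replaces the first occurrence of each token.
function renderStrictJob(template: string, tokens: StrictTokens): string {
  return template
    .replace("{{STRICT_NAME_SUFFIX}}", tokens.STRICT_NAME_SUFFIX)
    .replace("{{STRICT_CONTINUE_ON_ERROR_LINE}}", tokens.STRICT_CONTINUE_ON_ERROR_LINE);
}
```

Because the enforced job name has no suffix, the check name passed to `codebyplan ci enforce-check` stays stable as `Strict whole-repo green` once the flag is flipped.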
57
65
 
58
66
  ## Cross-References
59
67