@mootup/moot-templates 0.2.1 → 0.3.1
- package/package.json +1 -1
- package/templates/claude/settings.json +15 -1
- package/templates/devcontainer/run-moot-channel.sh +39 -17
- package/templates/devcontainer/run-moot-mcp.sh +40 -17
- package/templates/skills/doc-curation/SKILL.md +63 -43
- package/templates/skills/handoff/SKILL.md +62 -14
- package/templates/skills/implementation-workflow/SKILL.md +587 -0
- package/templates/skills/leader-workflow/SKILL.md +182 -68
- package/templates/skills/librarian-workflow/SKILL.md +56 -14
- package/templates/skills/memory-audit/SKILL.md +60 -33
- package/templates/skills/merge-to-main/SKILL.md +100 -0
- package/templates/skills/product-workflow/SKILL.md +181 -34
- package/templates/skills/spec-checklist/SKILL.md +490 -28
- package/templates/skills/stack-reset/SKILL.md +111 -0
- package/templates/skills/verify/SKILL.md +109 -31
- package/templates/teams/loop-6/CLAUDE.md +34 -18
- package/templates/teams/loop-6/team.toml +3 -3
|
@@ -4,7 +4,39 @@ description: Generate a prefilled verification checklist for writing a spec. Use
|
|
|
4
4
|
argument-hint: [feature scope or description]
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
# Spec Checklist
|
|
8
|
+
|
|
9
|
+
## Purpose
|
|
10
|
+
|
|
11
|
+
Provide Spec with a retro-accumulated checklist of spec-authoring disciplines: grounding probes to run during Phase A, common invariant patterns to include in § 8, known trap categories (operator-name scrub, path drift, schema-version bumps, LOC estimation calibration), and scoping-level rules learned from prior retros. The failure class this skill prevents is **Spec re-discovering the same class of problem at SPEC-READY time that previous runs already caught** — most items here exist because exactly one prior run missed them, and the carry-forward was promoted to this checklist so it doesn't recur.
|
|
12
|
+
|
|
13
|
+
**Why it exists as a named skill:** the spec-authoring surface has broadened substantially across ~30 pipeline runs. Single-spec drafting discipline is no longer carried in CLAUDE.md alone — it accumulates by necessity as each retro surfaces a new class of spec-drafting miss. The skill captures the accumulated disciplines so Spec can re-read them at draft-time phase entry (general-to-sharp ordering: CLAUDE.md is general; this checklist is sharp).
|
|
14
|
+
|
|
15
|
+
## Preconditions
|
|
16
|
+
|
|
17
|
+
- **Role:** caller is Spec. Other roles may reference individual items for cross-cutting rules (e.g., Impl reads the commit-hygiene rule during their own pre-commit check), but the skill is authored for Spec-at-draft-time.
|
|
18
|
+
- **Phase:** Spec is at Phase A grounding or Phase B drafting for a named feature. The checklist is not useful to Spec at SPEC-READY or retro time — it's a pre-draft tool.
|
|
19
|
+
- **Feature scope available:** Product has posted a scoping doc on main; Spec has it in context. Many checklist items reference § 4 / § 6 / § 8 / § 11 / § 13 of the forthcoming spec — those sections are what Spec will write using the checklist as guardrails.
|
|
20
|
+
|
|
21
|
+
## Invariants
|
|
22
|
+
|
|
23
|
+
- **The rule-list evolves via retro carry-forwards, not editorial whim.** Every checklist item traces to a specific retro-surfaced failure mode. When a new item is added, the originating retro synthesis names the failure class and provides the rationale. Items are never added speculatively ("this might bite us someday"); they're added in response to confirmed bites.
|
|
24
|
+
- **Retirement requires 3+ consecutive clean runs.** An item may be retired when three consecutive retros across similar spec shapes show no recurrence of the failure class AND the spec-drafting discipline has moved to a more durable artifact (e.g., promoted to CLAUDE.md, absorbed into a tool's default behavior). Single-run clean does not justify retirement.
|
|
25
|
+
- **Items are normative, not advisory.** Checklist items use "MUST" language for invariants and "MAY" for discretionary patterns; they don't dress up "consider" as a requirement or vice-versa. Read the item's keyword literally.
|
|
26
|
+
- **Section headings reflect life-of-a-spec phases.** API & Data Layer; Architecture Tests; Dependencies / Lockfiles; Backend-Wiring; Security; Tests; Observability; Migrations; etc. When adding a new item, classify it by where in spec drafting it would first apply, not by chronological retro-land order.
|
|
27
|
+
|
|
28
|
+
## Postconditions
|
|
29
|
+
|
|
30
|
+
- Before writing the spec, Spec has worked through each item and marked:
|
|
31
|
+
  - Applicable: planned treatment in the forthcoming spec (which section, what form).
|
|
32
|
+
  - Not applicable: reason why (e.g., "no SDK attribute references in this run's § 4").
|
|
33
|
+
  - Deferred: flagged as a known-risk with an F-finding plan.
|
|
34
|
+
- Spec's forthcoming draft inherits the disciplines from applicable items without re-discovering each one.
|
|
35
|
+
- Items with grep-shape verifications have been run and their output embedded in § 13 Phase A.
|
|
36
|
+
|
|
37
|
+
## How to use
|
|
38
|
+
|
|
39
|
+
For the feature named below, work through each item before writing the spec. Check off as you go. When an item's discipline applies, plan the treatment (what section, what form) before drafting. When it doesn't apply, note why (makes post-ship retro easier).
|
|
8
40
|
|
|
9
41
|
## Feature
|
|
10
42
|
|
|
@@ -15,60 +47,239 @@ $ARGUMENTS
|
|
|
15
47
|
Work through each item before writing the spec. Check off as you go.
|
|
16
48
|
|
|
17
49
|
### API & Data Layer
|
|
50
|
+
- [ ] **Backend-response-shape + helper-name + SDK-attribute grep-verification at § 13 Phase A.** When the spec parses OR constructs a backend response body (frontend interceptor, SDK client, test assertion on response JSON), OR when § 6 drop-ins reference any cross-module helper by name (`connection_service.ws_key`), OR when the spec references `SDK.Class.attribute` for any third-party SDK (`boto3`, `redis-py`, `asyncpg`, `FastMCP`, `mcp`), grep the defining source at the pinned version and verify the name exists verbatim at § 13 Phase A. Don't accept the scoping doc's framing — scoping docs describe intent, not wire shape. Templates: `grep -rn "class <ResponseName>" backend/core/models/` for response shapes; `grep -rn "def <helper_name>" backend/core/<subdir>/` for referenced helpers; `grep -rn "<attr>" ~/.cache/uv/.../<pkg>/` OR `docker exec <impl-stack> uv run python -c "import <sdk>; print(hasattr(<sdk>.<Class>, '<attr>'))"` for SDK attrs. Read the definition inline; compare against spec's parse/call-site claim. 6 consecutive runs (B-2 F-SCOPING-B1-FLAKE-MIS-DIAGNOSIS; B-3 F-1 SDK-attr + handler-chain; A-1 F-SCOPING-ERRORENVELOPE-SHAPE-MISMATCH; A-2 4× F-SCOPING including helper-name mismatch `ws_connected_set_key` → `ws_key`; C-1 4× F-SCOPING; C-1a F-2 `_created_connections` at pinned redis-py 7.4.0). 30 seconds per reference; saves a guaranteed-broken first-pass + one mid-run amendment cycle. Pairs with the declared-field-trap rule below (same class: "spec references a symbol; grep-verify at source").
|
|
51
|
+
- [ ] **§ 14 explicit escalation trigger for known-unverified dependencies.** When spec § 11 has an F-finding that couldn't be fully grounded at spec time (e.g., pinned-SDK attribute spec references but can't verify without installed-source-access), § 14 implementation guidance explicitly names the escalation trigger AND the expected resolution path: "If X differs at pinned version, escalate via feature-thread question before coding." Not just "escalate if needed" — name the exact trigger + signal the right comm channel. Validated on C-1a F-2: Impl followed the trigger verbatim, saved ~15 min of rework, resulted in strictly-better code (F-2 retired via public API rather than queued to beta-hardening backlog). Low-cost addition at spec-draft; high-value when it fires.
|
|
52
|
+
- [ ] **Per-metric isolation test shape template.** When an evaluator emits N metrics with independent try/except isolation per metric, the isolation test MUST drive the evaluator at least twice: first with ONE broken source (raises on that metric's read; others emit cleanly), second CLEAN (all N emit). Assert metric X is absent from call 1 + present in call 2, and metrics Y/Z are present in both. Prevents unusual / hard-to-read test shapes (C-1a B3 had the right behavior but took a re-read to parse). Applies to observability-emitter-with-isolation, rate-limit-window-with-fallback, and any similar "N independent operations with per-op failure isolation" surface.
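A minimal sketch of the two-call test shape described in the per-metric isolation item, using only the standard library. The `evaluate` function, the `broken` source, and the metric names `x`, `y`, `z` are hypothetical stand-ins for illustration; they are not names from the codebase.

```python
from typing import Callable, Dict


def evaluate(sources: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Emit one metric per source; a failing source drops only its own metric."""
    emitted: Dict[str, float] = {}
    for name, read in sources.items():
        try:
            emitted[name] = read()
        except Exception:
            # Per-metric isolation: skip this metric and keep emitting the rest.
            continue
    return emitted


def broken() -> float:
    """Hypothetical source that raises on read."""
    raise RuntimeError("source unavailable")


def test_per_metric_isolation() -> None:
    # Call 1: ONE broken source (x); the other sources read cleanly.
    first = evaluate({"x": broken, "y": lambda: 1.0, "z": lambda: 2.0})
    # Call 2: CLEAN; all sources read.
    second = evaluate({"x": lambda: 0.5, "y": lambda: 1.0, "z": lambda: 2.0})

    # Metric x is absent from call 1 and present in call 2.
    assert "x" not in first and "x" in second
    # Metrics y and z are present in both calls.
    for name in ("y", "z"):
        assert name in first and name in second


test_per_metric_isolation()
```

Driving the evaluator twice keeps the failure case and the clean case in the same test, so a reader can see from the assertions alone which metric was isolated.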
|
|
53
|
+
- [ ] **Operator-name scrub on spec content before SPEC-READY — case-sensitive.** CLAUDE.md's "operator identity in artifacts" rule applies to ALL durable repo artifacts — the existing forbidden-words scrub in `tests/test_skills_bundle_complete.py` covers moot-cli source + commit messages; the same rule applies to convo spec files (`docs/specs/*.md`). Pre-handoff grep: **`grep -nE "\bPat\b|\bLasswell\b" docs/specs/<feature>.md`** — case-sensitive, word-boundary anchored. **Do NOT use `-i` / case-insensitive:** uppercase `PAT` is the Personal Access Token acronym (appears legitimately in auth-related files) and is NOT an operator-name violation. TE-2 Q13 caught this false-positive class. Every case-sensitive hit needs substitution with generic role ("the operator", "operator + dev team"). C-1a shipped with 6 violations in the spec doc; QA repaired via direct-repair path. 30 seconds at spec-commit.
|
|
54
|
+
- [ ] **Defensive-validation-beyond-library rationale in § 4/§ 6 narrative.** When a spec adds validation guards beyond what the primary library provides (TE-2 used `packaging.Version` + regex double-guard because `packaging.Version` is lenient on 2-part semvers like "1.0" when strict 3-part is required), the reason for the belt-and-suspenders goes in § 4 (data model) or § 6 (code narrative), NOT in a code comment. Makes the review surface + Impl confidence clearer. Template: "Uses `<primary lib>` for <purpose>; adds <guard> because <library>'s behavior accepts <unwanted case> which would <consequence>."
|
|
55
|
+
- [ ] **F-finding log-line spec for observable suboptimal paths.** When an F-finding documents a runtime fallback (TE-2 F-3 "`main` ref fallback on un-synced team"), the spec includes the log line so operators see when the fallback fires post-ship. Otherwise the deviation is invisible in production. Template: "Emit `logger.warning(\"<message>\", extra={<structured fields>})` when the fallback path activates." Turns an acknowledged suboptimal path into an observable one.
|
|
56
|
+
- [ ] **Pydantic `Base64Bytes` for binary data over JSON boundaries.** When a Pydantic model receives binary bytes via JSON (gzip blobs, image data, protobuf payloads), use `pydantic.Base64Bytes` at the field declaration — NOT `bytes`. Pydantic v2's default `bytes` coerces the JSON string as UTF-8 bytes; `Base64Bytes` decodes base64 at the boundary. TA-1 first pytest run failed on this trap; 3-minute fix but avoidable at spec time. Template: `transcript_blob: Base64Bytes` with client sending `base64.b64encode(gzip_data).decode()`.
|
|
57
|
+
- [ ] **Client-local storage format: single-JSON-body over sidecar-split.** When spec describes a storage format for client-local state (retry queues, pending-write caches, offline buffers), prefer the simplest "single JSON body" form (POST body serialized inline, base64 for binary) unless there's a specific reason to split (payload > 100MB, streaming requirement, partial-write survival). Sidecars create sync problems; inline base64 is fine at typical payload sizes. TA-1 Spec specified `.jsonl.gz + .meta.json` sidecar; Impl shipped simpler single `.json`; QA flagged as undocumented-but-simpler deviation. Spec's form was over-engineered.
|
|
58
|
+
- [ ] **Shell-script fail-open requires error-trapping, not just explicit-exit-code grep.** When a spec prescribes `set -euo pipefail` + a "fail-open on error" requirement, the invariant MUST specify error-trapping in critical sections (`|| true`, explicit `trap ... ERR`, or `set +e` around risky commands) — not just "grep for explicit `exit 1|2` → 0 matches." `set -e` causes implicit non-zero exits on unhandled command failures that evade the grep. TA-1 invariant 16 had this gap: `gzip` on a vanishing transcript would exit non-zero without matching the grep pattern. Template: invariant says "after critical command X, wrap with `|| (logger_and_exit_0 \"<context>\" || true)` so any failure still reaches the explicit exit 0 path," AND add a Required test that injects a mid-script failure to verify the fail-open invariant holds.
|
|
59
|
+
- [ ] **Schema-version bump sweep enforcement — pre-SPEC-READY requirement.** When a spec bumps `PUBLIC_TARGET` (or any schema-version constant), spec MUST run `grep -rn 'version == <OLD>\|ver == <OLD>' backend/tests/` AND enumerate ALL matching files in § 5.2 before SPEC-READY. If ≥3 files match, either flag as a separate mini-commit OR bundle the test updates into § 6 as drop-ins. **This rule has now been missed on 2 consecutive runs (TA-1 + TA-2)** despite being in the spec-checklist — enforcement gap, not rule gap. Treat as a pre-SPEC-READY blocker, not a nice-to-have. TA-1 broke 5 test files hardcoding `version == 11`; TA-2 broke 5 test files hardcoding `version == 12`; same shape, same 5-min wasted Impl-time. **Test-infra refactor flagged:** introduce `CURRENT_PUBLIC_SCHEMA_VERSION` constant in `backend/tests/conftest.py` (or similar) that version-asserting tests import — bumps become a single-line change. Backlog candidate for a future devcontainer-infra run.
|
|
60
|
+
- [ ] **F-findings with "alpha-rare / alpha-acceptable" silent-drift risks require observability spec.** When an F-finding in § 11 flags a risk as "alpha-acceptable" OR "alpha-rare" OR similar deferral language, the spec MUST include either (a) a log-line spec so operators see when the drift fires (`logger.warning(...)` with structured fields), OR (b) an explicit "we won't detect this if it fires" acknowledgment in § 11 language. Pure acceptance without observability is a false-positive risk — operators think the edge case never fires when really they just can't tell. TA-2 F-2 (multi-archive-same-session) shipped without observability; caught at TA-2 synthesis as a gap. Applied at spec-draft time: every alpha-acceptance F-finding has a log-line OR an explicit "silent" flag.
|
|
61
|
+
- [ ] **Shell variable naming: ≤3-char vars deserve operator-name scrub scrutiny at draft time.** Short shell variables (`$pat`, `$usr`, `$lass`, `$boss`, etc.) can accidentally match the case-sensitive `\bPat\b|\bLasswell\b` scrub regex even when semantically unrelated. Pre-commit grep the shell-script drop-ins for short variables that might collide; rename to unambiguous forms (`$pat` → `$post_at`, `$usr` → `$user_id`, etc.). TA-2 caught `$pat` during drafting; low-cost pre-empt. Minor spec-drafting discipline.
|
|
62
|
+
- [ ] **Service-module LOC estimation distinct from store-layer 40-60 band.** Greenfield SERVICE modules that **enumerate + dispatch + emit audit events + compute partial-success** run ~150–250 LOC (CP-1 `control_plane/service.py` hit 218 vs estimated 80 = 2.7× miss). Store-layer helpers stay 40-60 (below). Service-module template: **~180 LOC anchor + ~30 LOC per additional operation**. Apply when spec adds a new `core/<name>/service.py` module; distinct from CRUD-store helpers.
|
|
63
|
+
- [ ] **`ls` every path literal in § 4/§ 6 at spec-draft time.** Path drift (`core/utils/` vs `core/config/`, `core/control/` vs `core/control_plane/`, `api/routes/` vs `api/routers/`) is recurring across runs (CP-1 spec had `core/utils/id_encoding.py`; actual is `core/config/id_encoding.py`). Pre-SPEC-READY: `for p in $(grep -oE 'backend/(core|api)/[a-z_/]+\.py' docs/specs/<file>.md); do ls "$p" 2>/dev/null || echo "MISSING: $p"; done`. 30-second discipline; catches all path drift.
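The shell loop in the path-literal item can also be sketched in Python, which may be easier to reuse from a test. This assumes the same regex as the item (so, like the original, it does not match path segments containing digits); `missing_paths` and `demo` are hypothetical helper names, and the demo builds a throwaway directory rather than touching a real repo.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Same pattern as the grep in the checklist item above.
PATH_LITERAL = re.compile(r"backend/(?:core|api)/[a-z_/]+\.py")


def missing_paths(spec_text: str, repo_root: Path) -> List[str]:
    """Return path literals cited in the spec that do not exist under repo_root."""
    cited = sorted(set(PATH_LITERAL.findall(spec_text)))
    return [p for p in cited if not (repo_root / p).exists()]


def demo() -> List[str]:
    """Build a temporary tree with one real file and check a spec citing two paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        real = root / "backend/core/config/id_encoding.py"
        real.parent.mkdir(parents=True)
        real.touch()
        spec = (
            "Uses `backend/core/utils/id_encoding.py` and "
            "`backend/core/config/id_encoding.py`."
        )
        return missing_paths(spec, root)
```

Calling `demo()` reports only the `core/utils/` path as missing, which is the CP-1 drift the item describes.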
|
|
64
|
+
- [ ] **§ 9 Security tenant-fence assertion on `is_admin`-gated routes.** When a route gates on `actor.is_admin` AND takes a foreign resource ID (`space_id`, `team_id`, `archetype_id`, etc.), § 9 MUST answer: (a) is `is_admin` tenant-scoped at the DB layer (is there a tenant boundary invariant that prevents cross-tenant admin)? (b) does the service layer re-verify target resource's tenant matches issuer's tenant? TA-3, TE-3, CP-1 all shipped without this assertion — intentional-alpha pattern but unspecified. Leaving it unstated means QA flags non-blocking but the pattern bites at beta-hardening or audit time. Template: **"`is_admin` is tenant-scoped at DB; service re-verifies target tenant matches issuer tenant" OR "explicitly alpha-only: admin can act cross-tenant; F-<N> for beta."**
|
|
65
|
+
- [ ] **Store-layer LOC estimate uses ~40–60 per function, not ~15.** When § 5.2 enumerates N new store functions in a single module, estimate per-function at **~40–60 LOC each** — typical CRUD-with-tenant-scoping + error handling + row-to-model mapping adds up fast. TA-3 Spec projected +110 LOC / 6 functions ≈ 18/function; Impl shipped +345 LOC ≈ 58/function. 3× systematic miss. Template: `N × ~50 LOC baseline + (novel-algorithm functions get +20 LOC) + (pure helpers get -25 LOC)`. Calibration: `CRUD-with-tenant-scope ≈ 45`, `multi-join query builder ≈ 55`, `background-task driver ≈ 60`, `pure classifier helper ≈ 25`.
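The store-layer template can be written out as arithmetic. The sketch below reads the template as "every function gets the ~50 LOC baseline, novel-algorithm functions add 20, pure helpers subtract 25"; the function name and parameters are hypothetical, and the 50 LOC default is the midpoint of the 40–60 band rather than a measured value.

```python
def estimate_store_loc(
    total_functions: int,
    novel: int = 0,
    pure_helpers: int = 0,
    baseline: int = 50,
) -> int:
    """Estimate LOC for a store module with `total_functions` new functions.

    `novel` and `pure_helpers` count the subset of functions that get the
    +20 and -25 adjustments; both are included in `total_functions`.
    """
    if novel + pure_helpers > total_functions:
        raise ValueError("adjusted functions cannot exceed the total")
    return total_functions * baseline + novel * 20 - pure_helpers * 25


# TA-3 shape: 6 store functions, no adjustments.
ta3_estimate = estimate_store_loc(6)
```

For the TA-3 case this gives 6 × 50 = 300 LOC, much closer to the +345 LOC that shipped than the +110 LOC originally projected.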
|
|
66
|
+
- [ ] **Idempotency-test invariant-to-assertion mapping: assert the MECHANISM, not just behavior.** When spec § 11 or § 8 says "idempotent via X mechanism" (ON CONFLICT, UNIQUE constraint, unique index, atomic UPDATE WHERE ... RETURNING), the Required test MUST explicitly assert the MECHANISM — not merely observational behavior. "0 new rows on re-run" is stronger observationally but doesn't prove `ON CONFLICT DO NOTHING` is what silenced duplicates; a bug could produce the same observation via a different (incorrect) code path. TA-3 B10 had this gap. Template: Required test captures PG constraint-name via `pg_stat_statements` OR via a mocked duplicate-insert asserting the constraint in an ignored error OR uses a probe where the mechanism's absence would produce a different observable outcome (e.g., a raised exception that wouldn't be raised under the mechanism).
|
|
67
|
+
- [ ] **(A) narrow-now-with-reserved-fields pattern when scoping assumes unsupported schema state.** When F-SCOPING surfaces "scoping doc assumes schema state X but schema doesn't support X today," prefer **(A) narrow the MVP + reserve forward-compat fields** over (B) schema-change-now. Ship the primitive that matches current schema; reserve the fields in the response/bundle/model schema as explicit `None` / `null` / empty with forward-compat comments. Future run (when schema evolves) populates the reserved fields without breaking `bundle_version`-1 consumers. TE-3 validated — `db_state` + `transcript_archive` reserved-null in bundle schema; TE-4 future run populates when team→space hierarchy lands. Reusable for any feature where scoping intent exceeds current schema capability.
|
|
68
|
+
- [ ] **Pydantic `Field(pattern=...)` regex on encoded IDs MUST use lowercase.** Encoded IDs use lowercase Crockford base32. Any `Field(pattern=r"^<prefix>_[A-Z0-9]+$")` on `tem_/spc_/usr_/agt_/evt_/thr_/oac_` encoded IDs is WRONG. Use `[a-z0-9]+`. Grep at spec-draft time: `grep -nE "pattern=.*\[A-Z0-9\]" docs/specs/<file>.md` → any hit on an encoded-ID field is a red flag. TE-3 B1 caught this on first run; 30-second pre-commit discipline.
|
|
69
|
+
- [ ] **Operator-name scrub regex: `\bpat\b` both-side word boundaries; Q14 grep uses `-nE` (NOT `-niE`).** The previous `pat\b` regex (word-boundary right only) matches "com**pat**" in "forward-compat" / "backward-compat" — real false-positive class that cost TE-3 6 commits iterating. Use `\bpat\b` (both-side word boundaries) to exclude "-pat" suffix false positives. Also: Q14 grep in § 14 uses `-nE` (case-sensitive) to avoid the `PAT` acronym (Personal Access Token appears legitimately in auth files); case-insensitive `-niE` hits PAT as false positive. Example: `grep -nE "\bPat\b|\bLasswell\b" docs/specs/<file>.md`.
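To see why the scrub stays case-sensitive and word-boundary anchored, the example grep pattern can be exercised in Python's `re` module, whose `\b` behaves the same way for these inputs. The helper name `scrub_hits` and the sample lines are made up for illustration.

```python
import re
from typing import List, Tuple

# Same pattern as the example grep above; no re.IGNORECASE on purpose.
SCRUB = re.compile(r"\bPat\b|\bLasswell\b")


def scrub_hits(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line the scrub pattern flags."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SCRUB.search(line)
    ]


sample = "\n".join(
    [
        "Pat approved the plan",            # operator name: flagged
        "Rotate the PAT before expiry",     # acronym: not flagged (case-sensitive)
        "forward-compat fields stay null",  # substring of compat: not flagged
        "Patch the loader",                 # longer word: not flagged (\b)
    ]
)
hits = scrub_hits(sample)
```

Only the first sample line is reported. Adding `re.IGNORECASE` (the equivalent of `-i`) would also flag the `PAT` line, which is the false-positive class the item warns about.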
|
|
70
|
+
- [ ] **Drafting discipline: scrub once pre-commit, iterate in-file, then single-commit.** When a spec commit has a grep Q-gate rule, run the check pre-commit, apply fixes in the working tree, THEN single-commit the clean version. TE-3 took 6 commits for scrub work that should have been 1. Applies to operator-name scrub, encoded-ID pattern, version-bump sweep — any automated grep Q-gate.
|
|
71
|
+
- [ ] **New global-ID-bearing tables register in `id_encoding.py` TWO places.** When spec adds a new table that uses `generate_global_id(table=...)`, § 5.2 lists `id_encoding.py` as a modified file AND calls out BOTH registrations: (1) add the prefix to `PREFIXES` frozenset (e.g., `"tra"`), (2) add the table name to `_GLOBAL_ID_TABLES` frozenset (e.g., `"transcript_archives"`). Missing (2) surfaces as `ValueError` on first `generate_global_id` call. TA-1 missed (2); the fix took 30 seconds after the first pytest run. 5-second spec-draft discipline.
|
|
72
|
+
- [ ] **`test_architecture.py` `expected_subdirs` update when adding new `core/` subdir.** When spec adds a new `backend/core/<subdir>/` directory (TA-1 added `core/aws/`; similar for any future C-1b/TA-3 new-subdir spec), § 5.2 must list `test_architecture.py` as a modified file + enumerate the specific assertions to update: `expected_subdirs` set gets new entry, per-file lists (`models/`, `stores/`, etc.) may need new test-file entries. TA-1 spec said "+2 lines" but Impl landed ~8 LOC across 3 set-additions. Under-estimating this surface is a recurring spec miss.
|
|
73
|
+
- [ ] **Q-gate count-occurrences-in-drop-in rule.** When a § 14 Q-gate counts `grep` occurrences (e.g., "`toastStore.push` appears exactly 3 times in `api.ts`"), Spec counts the occurrences in their own § 6 drop-in BEFORE writing the Q-gate's literal count. Prevents off-by-N wording where the spec source block contains N+K call sites but the Q-gate asserts N. Run A-1 Q8 asserted "exactly 3 `toastStore.push`" but § 6.1 drop-in literally contained 5 (3 in `_handleErrorResponse` + 1 each in `_handle*Error` helpers). Behavioral invariant held; wording didn't. 5-second count at spec-draft time.
|
|
74
|
+
- [ ] **Retained-intentional partial-coverage cross-reference in § 10.** When spec § 10 decides to retain partial coverage of some surface (B-2 retained access-log on `/health`; A-1 retained access-log on pre-auth routes), later runs touching that surface cross-reference the earlier § 10 decision in their own § 10: either (a) explicitly inherit the decision, OR (b) revisit with fresh rationale. Avoids silent drift on intentional-vs-accidental partial coverage; keeps the decision auditable. Grep template at spec-draft: `grep -rn "retained-intentional\|§ 10\|partial coverage" docs/product/run-*-synthesis.md` for the surface name; cross-reference if a prior decision exists.
|
|
75
|
+
- [ ] **Vitest file-split naming when tests span component + store surfaces.** When § 7 tests span both a Svelte component (UI) and a `.svelte.ts` rune-store (reactive state), spec § 5.1 names both test files inline (`Toast.test.ts` for the component + `toasts.test.ts` for the store) rather than listing one and leaving the split to Impl judgment. Impl's natural instinct is to split per-module; pre-empting in spec avoids the "one file became two" retro bullet. A-1 landed this way (Impl judged well); spec had listed only one. **A-2 extension:** when adding a new concern (e.g., `connection_status`) to an existing component, name a dedicated test file for that concern (`EventCard.connection.test.ts`), separate from any existing component-test file. Same principle — pre-empt the per-concern split Impl would naturally apply.
|
|
76
|
+
- [ ] **§ 5.2 file inventory marks Impl-judgment touchpoints.** When § 6 explicitly defers a mechanism choice to Impl (e.g., "create a new store OR reuse existing buffer"), § 5.2 must list the optional file with a `(optional per § 6.X judgment)` marker. Prevents Impl-judged additions from looking like scope creep in review. A-2 added `recentEventsStore.svelte.ts` + a 4-line `SpaceRoom.svelte` wire under § 6.4's deferred judgment; spec § 5.2 hadn't enumerated the touchpoint, so the additions read as "surprise changes" until § 10 deviations explained them. 5-second addition at spec draft time.
|
|
77
|
+
|
|
18
78
|
- [ ] **Existing API endpoints:** Which endpoints are relevant? Do any already serve the data needed?
|
|
79
|
+
- [ ] **`attr is None` as sentinel requires verifying the loader preserves the None.** When a D-decision or § 6 code branches on `obj.field is None` to mean "role/user did not set field in the input," grep the loader constructor (`__init__`, `from_dict`, `from_toml`, etc.) for eager-cascade patterns like `self.field = data.get("field", default_field)`. If the loader collapses the None → default at load time, the attribute alone can no longer distinguish "explicit-value-equals-default" from "inherited-default" — the sentinel is destroyed by the load. Spec must either (a) change the loader to preserve source distinction (separate `_raw` dict, boolean cascade flag, sentinel singleton), or (b) re-parse the raw source at the call site that needs the distinction. Run AF's helper called `_render_with_default(agent.model, ...)` expecting `None` to mean "role omitted," but `AgentConfig.__init__` eagerly cascaded to global default — Impl caught it on first pytest, Spec amended to add `AgentConfig._raw`. Catches at § 13 Phase A, saves one mid-run amendment cycle.
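A small sketch of how an eager-cascade loader destroys the `None` sentinel, and how keeping the raw input preserves the distinction. The class names, the `model` field, and `GLOBAL_DEFAULT_MODEL` are hypothetical; the `_raw` attribute mirrors the amendment named above, but its exact shape in the real `AgentConfig` is an assumption.

```python
from typing import Any, Dict

GLOBAL_DEFAULT_MODEL = "default-model"  # hypothetical global default


class EagerConfig:
    """Loader that collapses a missing field to the default at load time."""

    def __init__(self, data: Dict[str, Any]) -> None:
        # After this line, `self.model is None` can never be true.
        self.model = data.get("model", GLOBAL_DEFAULT_MODEL)


class RawPreservingConfig:
    """Loader that keeps the raw input so callers can tell set from inherited."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._raw = dict(data)
        self.model = data.get("model", GLOBAL_DEFAULT_MODEL)

    def model_was_set(self) -> bool:
        # True only when the input explicitly carried the field.
        return "model" in self._raw


omitted = EagerConfig({})
explicit = EagerConfig({"model": GLOBAL_DEFAULT_MODEL})
# The eager loader makes these two inputs indistinguishable.
indistinguishable = omitted.model == explicit.model
```

With `RawPreservingConfig`, `model_was_set()` separates "explicit value equals default" from "inherited default", which is the distinction the `is None` branch was relying on.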
|
|
80
|
+
- [ ] **"Match family auth" ≠ "match family 404 semantics."** When scope says "mirror the existing `/family/<id>/...` auth pattern" AND the new route must return 404 on unknown ID, check whether the family uses a `registry.get_or_create` (auto-create) pattern. If so, the new route must DIVERGE from the family by calling the store method directly (e.g., `space_store.get_space` not `registry.get_or_create`) — otherwise the 404 is impossible because unknown IDs just auto-create and return 200. Call this out as an explicit D-decision ("D-NO-AUTO-CREATE" or similar), not as implicit "match family." Run AE's D-NO-AUTO-CREATE resolved this correctly; worth making the audit explicit so future routes don't silently inherit get_or_create when they needed real-404 semantics.
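The auto-create trap can be shown with an in-memory stand-in. This is a sketch under assumptions: `SpaceStore`, `Registry`, and the two status helpers are hypothetical simplifications, and only the method names `get_space` and `get_or_create` are taken from the item above.

```python
from typing import Dict, Optional


class SpaceStore:
    """Hypothetical in-memory store."""

    def __init__(self) -> None:
        self._spaces: Dict[str, dict] = {}

    def get_space(self, space_id: str) -> Optional[dict]:
        return self._spaces.get(space_id)

    def create_space(self, space_id: str) -> dict:
        return self._spaces.setdefault(space_id, {"id": space_id})


class Registry:
    """Hypothetical registry with the auto-create behavior."""

    def __init__(self, store: SpaceStore) -> None:
        self.store = store

    def get_or_create(self, space_id: str) -> dict:
        return self.store.get_space(space_id) or self.store.create_space(space_id)


def family_style_status(registry: Registry, space_id: str) -> int:
    # Mirrors the family: unknown IDs are created, so a 404 is unreachable.
    registry.get_or_create(space_id)
    return 200


def strict_status(store: SpaceStore, space_id: str) -> int:
    # Diverges from the family: reads the store directly, so 404 is possible.
    return 200 if store.get_space(space_id) is not None else 404
```

Calling `strict_status` on a fresh store with an unknown ID returns 404, while `family_style_status` returns 200 and leaves the space created as a side effect.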
|
|
19
81
|
- [ ] **API gaps:** Are new endpoints needed? What request/response shapes?
|
|
20
82
|
- [ ] **Data model changes:** Any new fields, tables, or migrations?
|
|
21
|
-
- [ ] **
|
|
83
|
+
- [ ] **SSE events:** Does this feature need real-time updates? Are the right events already emitted?
|
|
84
|
+
- [ ] **Route handler naming.** Refer to FastAPI route handlers by their *route function* name (e.g., `add_speech`), not the bridge method they call (e.g., `add_human_speech`). Impl greps the route file by handler name; a wrong name burns a clarification cycle.
|
|
85
|
+
- [ ] **New entity type checklist.** When adding a new encoded-ID entity type, the spec must enumerate FOUR steps, not just "add the route": (1) register the prefix in `backend/core/config/id_encoding.py`; (2) decode `tenant_id` in any route that accepts the new ID; (3) write an API smoke test for the happy path; (4) update `expected_subdirs["stores"]` in `backend/tests/test_architecture.py` if the entity adds a new stores subdirectory. Missing any of these is a recurring spec gap.
|
|
86
|
+
- [ ] **Rate-limit reads and writes separately.** When a new endpoint family has both reads and writes, don't copy a write-endpoint rate limit across all verbs. Default reads to `rl_standard` (20/min), writes to `rl_low` (5/min). Spec a per-verb rate-limit table when the family has more than one method.
|
|
87
|
+
- [ ] **FastAPI 422-override for RFC-specified 400-class errors.** When an RFC mandates a specific 400-class response for malformed input (RFC 7591 § 3.2.2: `invalid_client_metadata`; RFC 7009 revoke: `invalid_request`; RFC 6749 token: `invalid_request`), FastAPI's default `Body(model)` path returns 422 via Pydantic validation — wrong status code, wrong payload shape. Pattern to spec (D-CONTENT-TYPE-MANUAL-CHECK, canonicalized from AH-b): accept `Request` instead of `Body(model)`, read `Content-Type` header manually (reject non-`application/json` with the RFC-specified 400 + error code), call `await request.json()`, feed to `Model.model_validate(...)`, catch `(ValueError, ValidationError)` and convert to the RFC-specified 400-class response. Route handler signature becomes `async def handler(request: Request) -> RFCResponse: ...` — `response_model=RFCResponse` still applies. Reusable for every OAuth / OIDC / RFC-pinned route; Run AH-b's `POST /oauth/register` validates Content-Type this way, and sub-doc #3's `/oauth/revoke` will inherit the pattern.
|
|
88
|
+
- [ ] **Multi-tenant schema migration triple-write.** Any schema migration must include three components: (1) DDL in `backend/core/stores/tenant_schema.sql`, (2) DDL in `backend/core/stores/public_create.sql` (or wherever the public schema lives), AND (3) a `DO $$ LOOP` block in the schema file that applies the DDL to every existing tenant schema at runtime. Missing the loop ships migrations that work for new tenants and silently break for existing ones.
|
|
89
|
+
- [ ] **Schema version bump sweep list.** When bumping `PUBLIC_TARGET` or `TENANT_TARGET`, grep `backend/tests/` for hardcoded `ver == N` / `"version N"` literals AND grep `backend/core/` for literal `target = <N>` / `target=<N>` assignments. Enumerate both sets in spec § 4. Known test sites: `test_migration_runner.py`, `test_api.py::test_migration_v2_to_v3`. Known core site: `backend/core/stores/migrate.py` has historically carried a hardcoded `target = 8` local alongside the `PUBLIC_TARGET` constant — bumping the constant alone leaves the local stale and the migration silently no-ops (Run AH-a hit this on T1: `oauth_clients does not exist` until Impl switched the local to use `PUBLIC_TARGET`). Missing any leaves stale tests asserting the old version or stale migration paths that don't actually migrate.
|
|
90
|
+
- [ ] **New-table PK column is always `id`, never `<entity>_id`.** Every convo public-schema table uses `id BIGINT PRIMARY KEY` — not `space_id BIGINT PRIMARY KEY`, not `client_id BIGINT PRIMARY KEY`. FK columns in dependent tables CAN be named `<entity>_id` (e.g., `oauth_tokens.client_id BIGINT REFERENCES oauth_clients(id)`), but the owning table's PK is always `id`. The convention is load-bearing: `generate_global_id(conn, table=...)` queries `WHERE id = $1`, so a PK named anything else raises `UndefinedColumn` at first call. Run AH-a shipped `oauth_clients.client_id BIGINT PRIMARY KEY`; Impl renamed to `id` + kept FK columns named `client_id` (correct). Spec § 6 DDL blocks for new tables should follow the convention directly at draft time.
|
|
91
|
+
- [ ] **Side-effectful error paths: commit the side effect OUTSIDE the raising transaction.** When a store function must both (a) persist a side effect — INSERT/UPDATE for an audit row, a revocation, a security-event log — AND (b) raise an exception to signal failure, the side effect MUST commit outside the transaction that raises. Pattern that SILENTLY FAILS: `async with conn.transaction(): await revoke(...); raise OAuthStoreError(...)` — the raise rolls back the transaction, so the `revoke` is discarded. Correct pattern: autocommit the side effect (`await conn.execute(...)` outside any `transaction()` block), THEN raise. Run AH-a's replay-chain revocation hit this — spec put revoke-then-raise inside the rotation transaction; T14 failed until Impl restructured to revoke-via-autocommit, raise after. Relevant to: replay detection, rate-limit soft-block audit rows, security-event logging, audit trails on any failure path. When a store function has this shape, spec § 6 should call out the transaction boundary explicitly: "revoke runs on `pool.acquire()` (autocommit), the raise fires after."
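The revoke-then-raise trap can be reproduced without a Postgres instance. The sketch below swaps asyncpg for the standard-library `sqlite3` module, whose connection context manager also rolls back on an exception; the table, the `ReplayDetected` exception, and the function names are hypothetical and only illustrate the transaction boundary.

```python
import sqlite3


class ReplayDetected(Exception):
    """Hypothetical failure signal raised after a replayed token is seen."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (id INTEGER PRIMARY KEY, revoked INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO tokens (id) VALUES (1)")
    conn.commit()
    return conn


def revoke_inside_transaction(conn: sqlite3.Connection, token_id: int) -> None:
    # SILENTLY FAILS: the raise rolls back the transaction, discarding the revoke.
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE tokens SET revoked = 1 WHERE id = ?", (token_id,))
        raise ReplayDetected(token_id)


def revoke_then_raise(conn: sqlite3.Connection, token_id: int) -> None:
    # Correct: commit the side effect first, then signal the failure.
    conn.execute("UPDATE tokens SET revoked = 1 WHERE id = ?", (token_id,))
    conn.commit()
    raise ReplayDetected(token_id)


def is_revoked(conn: sqlite3.Connection, token_id: int) -> bool:
    row = conn.execute(
        "SELECT revoked FROM tokens WHERE id = ?", (token_id,)
    ).fetchone()
    return row[0] == 1
```

Both functions raise `ReplayDetected`, but only `revoke_then_raise` leaves the row revoked afterwards. A Required test that checks the persisted row after the exception would distinguish the two shapes.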
|
|
92
|
+
- [ ] **OAS endpoint citations are verb-aware.** When § 13 Phase D cites an OAS endpoint as grounding evidence ("`GET /api/spaces/{space_id}` exists at line 944"), the grep must key on `<path>:\n.*<verb>:` or explicitly enumerate the verbs present on that path — NOT a bare path grep. A bare `grep "/api/spaces/{space_id}"` matches the path under any verb (GET / POST / PATCH / DELETE) AND sub-paths that contain the literal as a prefix. Run AD-c cited a phantom `GET` on a PATCH-only path; Impl hit it at first paste and Spec had to amend mid-run. Single bare-path grep is not sufficient proof the verb exists. Pair with the existing "verify Product's grounding claims" rule — both aim to catch aspirational grounding before Impl.
|
|
93
|
+
- [ ] **"Mirror Python" rationale requires runtime-correctness anchor.** When a D-decision uses "mirror Python" / "mirror the existing CLI" / "mirror the working implementation" as its justification, the § 13 grounding log must state explicitly whether the mirrored path is confirmed-succeeding-against-a-real-backend or just confirmed-exists-as-code. Python's `scaffold.py` called a phantom GET and silently fell back on 404 — "mirror Python" treated as correctness-proof meant the phantom propagated into the JS spec. Grounding note template: `"mirror Python: confirmed <function> succeeds against alpha (or local) backend as of <date>"` vs `"mirror Python: code path exists at <file:function>; runtime correctness not verified — treat as suggestive, not authoritative."` Either is fine; the conflation is the bug.
|
|
94
|
+
- [ ] **Schema-scoped queries are global on the default tenant.** A `SELECT … FROM {schema}.table` in resource-cap or rate-limit SQL counts ALL rows on `ten_1` because `ten_1`'s schema IS `public`. Grep schema-templated SQL at spec time and prefer per-owner scoping (`WHERE owner_id = ?`) for resource limits.
|
|
95
|
+
- [ ] **Internal consistency sweep between § 4 schema and § 6 handler / § 11 F-findings.** Before SPEC-READY, grep § 4's schema DDL for `NOT NULL` / `NULLABLE` / `REQUIRED` / `OPTIONAL` constraints and diff against behavioral directives in § 6 (handler code) + § 11 (F-findings). Any contradiction — column declared `NOT NULL` but handler passes `None` under some branch; field declared `OPTIONAL` but behavior requires it; unique constraint in § 4 but § 6 inserts without collision handling — is a spec defect. Impl will catch these at pre-draft or (worse) Impl time + escalate as `message_type="question"` per the new implementation-workflow defect-probe rule. **Spec's job is to catch them first.** Run AH-e-bootstrap-backend had § 4.1 declare `auth_code_id BIGINT NOT NULL REFERENCES oauth_codes(id)` while § 6.3 + § 11 both mandated `None` for PAT-authenticated installs — Impl picked nullable at commit time and documented as deviation; the cleaner escalation was pre-code. Grep template: read § 4 + § 6/§ 11 sequentially; for every `NOT NULL` in § 4, verify no § 6 code path + no § 11 finding shows the field as `None`; for every `NULLABLE` in § 4, verify the § 6 code path has a branch handling `None`. 30-second cross-reference; catches contradictions that would otherwise cost a mid-run escalation or a pick-at-commit deviation.
|
|
96
|
+
- [ ] **§ 7 fixture-name grep at draft time.** When § 7 instructs Impl to "reuse fixture X from conftest.py" or "extend conftest's fixture Y," grep `backend/tests/**/conftest.py` + any referenced per-file conftests for the exact identifier before writing the use sentence. Fixture-name drift over months is common (`admin_pat` → `admin_key` when sessions-vs-PATs was re-separated; `oauth_actor_with_scopes` → `_full_oauth_flow` when the AH-a author refactored the test helper). Spec text that references the old name ships a paste-unfriendly test snippet; Impl pastes, pytest errors with `fixture '<old_name>' not found`, Impl pivots, minor friction. Run AH-e-bootstrap-backend QA pre-draft caught four such drifts (F-QA-1..4). 30-second grep per fixture reference at § 7 drafting prevents the class.
|
|
97
|
+
- [ ] **API → store call paths check for intervening service layer.** When § 6 introduces a new API-layer → store-layer call (`api/auth.py` calling `actor_store.xxx`, `api/routes/*.py` calling `<domain>_store.xxx`), re-read the arch-doc for the relevant domain to confirm whether a service layer is interposed. Current convo arch-invariants with service layers: **R6 Stage B auth_service** (API → auth_service → actor_store, NOT API → actor_store directly); future AH-c oauth_service layer may follow. Grep template: `backend/core/auth/` + `backend/core/<domain>/` for `service.py` modules; if present, spec routes the call through the service, not the store. Run AH-e-bootstrap-backend Spec specced a direct `actor_store` call from `api/auth.py` which tripped `test_api_auth_has_no_business_logic`; Impl added `auth_service.resolve_revoked_installation_for_bearer` wrapper. Prevents retroactive "deviate to preserve arch" at Impl time.
|
|
98
|
+
- [ ] **SDK-attribute grep at § 13 Phase A (not Phase D during Impl pre-draft).** When a spec § 4 / § 6 references an attribute on a third-party SDK object by name (`AccessToken.session_id`, `StreamableHTTPSessionManager._server_instances`, `CallToolResult._meta`, `AgentConfig._raw`, etc.), grep the SDK source at the pinned version to verify the attribute exists — AS PART OF § 13 Phase A grounding, not deferred to "verify at grounding later" or "confirm at handoff." Same class as declared-field-trap for request models (rule below) but for SDK surfaces. Run B-3 § 6.6 referenced `AccessToken.session_id` (doesn't exist at pinned `mcp` version); 10-second grep at Phase A would have caught; instead Impl hit at pre-draft → ~20-min escalation-round-trip for a mid-run amendment. 3rd data point across runs: TE-1's `AgentConfig._raw` (declared-field-trap class), AH-f's `_server_instances` (caught via function-API-smoke), B-3's `AccessToken.session_id` (missed). Rule: every spec-named SDK attribute has a Phase A grep verifying presence at the pinned-lock-file version. `docker exec <impl-stack> uv run python -c "import <sdk>; print(hasattr(<sdk>.<Class>, '<attr>'))"` works; `grep -rn '<attr>' /path/to/sdk/<file>.py` works. Takes 10 seconds; saves a mid-run amendment cycle.
|
|
99
|
+
- [ ] **One-chain-all-the-way trace at § 13 Phase A.** When the spec claims "function X calls function Y via path Z" (e.g., "MCP tool handler calls `bridge.add_event` directly," "route handler calls `service.foo` which calls `store.bar`"), trace that chain end-to-end through adapters + transports + middleware during Phase A grounding rather than accepting the scoping-doc framing. Scoping docs describe intent; actual call graphs may go through unexpected indirections (httpx → REST → bridge rather than direct; service layer interposing where spec assumes direct store access; etc.). Run B-3 § 6.6 assumed MCP handlers call `bridge.add_event` directly; reality: they go via httpx → REST → bridge (MCP adapter is a network client, not a direct bridge consumer). 2-minute trace of `share()` → `adapter._request` → httpx → REST would have surfaced this + the `AccessToken.session_id` premise-failure with a single check. Cheap insurance for architecture-knowledge gaps; critical when spec-authoring agent's mental model of the target subsystem is older than the code's current shape.
|
|
100
|
+
- [ ] **Emission-site per-file enumeration when wrapping a pattern.** When spec § 6 instructs Impl to "wrap emission site X at function F" (e.g., "add synthetic request_id fallback at `_apply_flip`"), grep the full target file for all sites matching the emission pattern before writing § 6. Parallel emitters at `release_actor` / `discard_actor` / `on_terminate` / etc. drift out of grep-sweeps if the first call-site pattern is the only one Spec checked. Run B-3 § 6.5 named `_apply_flip` only; Impl correctly extended the wrap to `release_actor` at `connection/service.py:423` when grepping during coding. Mechanical; 30-second additional grep at § 6 drafting (template: `grep -n 'add_event\|emit\|publish' backend/core/<domain>/*.py` for the emission pattern) closes the gap. Pairs with the existing "Removing a side-effect: grep existing-test assertions that depend on the trigger" rule — same class (enumerate ALL sites, not just the primary).
|
|
101
|
+
- [ ] **Declared-field-trap: request-model field declaration is necessary but NOT sufficient proof the handler uses it.** When spec § 13 grounding asserts "backend supports X via request field Y" or D-decision asserts "the existing route already accepts field Z," grep the FULL handler chain, not just the request model: (a) the Pydantic request model (`grep "<field>" backend/core/models/*.py`), (b) the route handler function body (`grep "<field>" backend/api/routes/<file>.py` — verify the handler reads `body.<field>`, not just that the parameter exists), (c) every bridge / service method the handler calls (`grep "<field>" backend/core/<subdir>/*.py` — trace the value forward through the call chain), (d) any event / model factory the handler constructs (e.g., `ContextEvent.from_<source>(...)` — verify the factory accepts + persists `<field>`). Miss any link and the claim is aspirational; the request model declares the field, but the handler silently drops it. Run UI-1: `AddSpeechRequest.thread_id` was declared, but `add_speech` + `add_human_speech` + `from_human_speech` all dropped it; Impl caught at pre-draft + amendment added 15 LOC backend wiring. Run AH-e's D-MIDDLEWARE-ACTIVATION-PREDICATES rule is the library-side analog; this one is the application-code analog. Pairs with the existing "verify Product's grounding claims" rule.
|
|
102
|
+
- [ ] **OAS YAML paired regen.** Any commit that changes (a) a Pydantic model docstring, (b) a route handler docstring, (c) a `response_model=` annotation, or (d) adds/removes a route must regenerate `docs/api/openapi.yaml` in the SAME commit. The `test_openapi_yaml_matches_fastapi_app` drift-gate test fires on any mismatch between generated OAS and committed YAML — unpaired changes leave the YAML stale and the drift test fails in the NEXT sub-run's baseline where it looks like a "known flake." Mechanical regen: `docker exec <impl-stack> uv run python -m api.openapi_gen > docs/api/openapi.yaml` in the same commit as the source change. Symmetric to the `uv.lock` paired-reformat rule (TE-1 promoted); same class — a paired-artifact whose drift accumulates silently across runs. Run UI-1 absorbed a TE-1 docstring-scrub-unpaired-with-OAS-regen at QA time (QA repair commit `fe053d0`).
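
The transaction-boundary rule for side-effectful error paths can be sketched as follows. This is a minimal illustration, not the project's store code: the standard library's `sqlite3` stands in for asyncpg so the example runs without a database server, and `ReplayDetected`, the `tokens` table, and every helper name are hypothetical.

```python
import sqlite3


class ReplayDetected(Exception):
    """Hypothetical stand-in for a store-level error such as OAuthStoreError."""


def make_db() -> sqlite3.Connection:
    # In-memory database with one token that is not yet revoked.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (id INTEGER PRIMARY KEY, revoked INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO tokens (id) VALUES (1)")
    conn.commit()
    return conn


def revoke_inside_transaction(conn: sqlite3.Connection) -> None:
    # `with conn:` commits on success and rolls back when an exception escapes,
    # so the UPDATE below is discarded by the raise.
    with conn:
        conn.execute("UPDATE tokens SET revoked = 1 WHERE id = 1")
        raise ReplayDetected("replay detected")


def revoke_then_raise(conn: sqlite3.Connection) -> None:
    # Persist the side effect first, then signal the failure.
    conn.execute("UPDATE tokens SET revoked = 1 WHERE id = 1")
    conn.commit()
    raise ReplayDetected("replay detected")


def is_revoked(conn: sqlite3.Connection) -> bool:
    return conn.execute("SELECT revoked FROM tokens WHERE id = 1").fetchone()[0] == 1


def run(fn) -> bool:
    """Call fn against a fresh database and report whether the revoke survived."""
    conn = make_db()
    try:
        fn(conn)
    except ReplayDetected:
        pass
    return is_revoked(conn)
```

Calling `run(revoke_inside_transaction)` shows the revocation lost to the rollback, while `run(revoke_then_raise)` shows it persisted. A Required test for this shape should assert the persisted row after the exception, not only that the exception fired.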
|
|
22
103
|
|
|
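
The verb-aware citation rule for OAS endpoints can be checked mechanically once the OpenAPI document is loaded into a dictionary. The sketch below assumes the YAML has already been parsed by whatever loader the project uses; `SAMPLE` is an invented fragment for illustration, not the contents of `docs/api/openapi.yaml`, and both function names are hypothetical.

```python
HTTP_VERBS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def verbs_on_path(openapi: dict, path: str) -> set[str]:
    """Return the HTTP verbs declared on exactly this path (no prefix matching)."""
    item = openapi.get("paths", {}).get(path, {})
    # Path items also carry non-verb keys such as "parameters"; filter those out.
    return {key.lower() for key in item if key.lower() in HTTP_VERBS}


def endpoint_exists(openapi: dict, verb: str, path: str) -> bool:
    return verb.lower() in verbs_on_path(openapi, path)


# Invented sample: a PATCH-only path plus a sub-path sharing the same prefix.
SAMPLE = {
    "paths": {
        "/api/spaces/{space_id}": {
            "parameters": [],
            "patch": {"summary": "Update a space"},
        },
        "/api/spaces/{space_id}/members": {
            "get": {"summary": "List members"},
        },
    }
}

# A bare-path substring check succeeds even though no GET exists on the path.
bare_path_hit = any("/api/spaces/{space_id}" in p for p in SAMPLE["paths"])
```

Here `bare_path_hit` is true while `endpoint_exists(SAMPLE, "GET", "/api/spaces/{space_id}")` is false, which is the phantom-verb case the rule guards against.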
23
104
|
### Frontend
|
|
24
105
|
- [ ] **Existing components:** Which components are affected? Read them before proposing changes.
|
|
25
|
-
- [ ] **Existing helpers:** Check
|
|
26
|
-
- [ ] **
|
|
106
|
+
- [ ] **Existing helpers:** Check `api.ts` and other utilities — is there code that already does part of this?
|
|
107
|
+
- [ ] **Svelte 5 compatibility:** Confirm runes mode and the `$state` / `$derived` / `$effect` patterns.
|
|
108
|
+
- [ ] **Svelte 5 runes filename suffix.** Non-component files using `$state` / `$derived` / `$effect` must be named `.svelte.ts` (or `.svelte.js`). A bare `.ts` filename silently fails to compile reactivity — no error, the runes just don't track. Spec the suffix explicitly when introducing a new state file.
|
|
109
|
+
- [ ] **SvelteKit redirect-only / API-only routes use `+server.ts`, not `+page.server.ts`.** `+page.server.ts` without a sibling `+page.svelte` works in the dev server but creates no route in the production build under adapter-node. Redirect-only routes (like `/settings/api-keys` → `/settings/access`) and JSON-only endpoints MUST use `+server.ts` with exported `GET`/`POST`/etc. handlers. Run AH-d shipped `+page.server.ts` for a redirect; dev was green, production adapter-node 404'd. QA caught with a Playwright redirect-existence test (T6); Impl repaired via rename. Spec template for a redirect: `export const GET: RequestHandler = ({ url }) => { throw redirect(302, "/target" + url.search); };` in `+server.ts`. Pairs with the deployment-mode verification rule below.
|
|
27
110
|
- [ ] **Styling:** Document exact colors, sizes, and spacing using the existing palette.
|
|
111
|
+
- [ ] **Logout-clear invariant enumerates ALL introduced stores.** When a spec adds N Svelte 5 rune stores, § 6 logout hook calls `.clear()` on all N (not just the primary), and § 7 asserts the full enumeration via a Required test. A-1 established T-6 for `toastStore` + `criticalBannerStore` (both cleared). A-2 added `connectionStateStore` + `recentEventsStore` but spec § 6 only enumerated the first in `logout()`: a silent T-6-class partial gap (benign because SSE rebuilds the buffer post-login, but stale events could briefly appear). Rule: count the `.svelte.ts` stores introduced by the spec, match them 1:1 in the logout hook + the Required test. Structural invariant grep: `grep -nE '\.clear\(\)' frontend/src/lib/api.ts` must return at least N matching lines, where N is the number of stores the spec introduces.
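
The runes filename-suffix rule lends itself to a small draft-time scan. The sketch below is a heuristic under stated assumptions: it takes a mapping of file names to source text (the names and contents in the usage note are invented), and it will also flag rune names that appear only in comments or strings.

```python
import re

# Matches $state, $derived, $effect as whole rune names (e.g. "$state(" or "$derived.by(").
RUNE_CALL = re.compile(r"\$(?:state|derived|effect)\b")


def misnamed_rune_files(sources: dict[str, str]) -> list[str]:
    """Return .ts/.js files that use runes but lack the .svelte.ts/.svelte.js suffix."""
    flagged = []
    for name, text in sources.items():
        is_script = name.endswith((".ts", ".js"))
        has_suffix = name.endswith((".svelte.ts", ".svelte.js"))
        if is_script and not has_suffix and RUNE_CALL.search(text):
            flagged.append(name)
    return sorted(flagged)
```

For example, given a hypothetical `src/lib/toast.ts` containing `export const count = $state(0);` alongside a correctly named `src/lib/banner.svelte.ts`, only `src/lib/toast.ts` is reported.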
|
|
28
112
|
|
|
29
113
|
### Dependencies
|
|
30
|
-
- [ ] **New libraries needed?** Confirm compatibility with the project's toolchain.
|
|
114
|
+
- [ ] **New libraries needed?** Confirm npm/pip compatibility with the project's toolchain (Vite 7, SvelteKit 5, Python 3.11+).
|
|
31
115
|
- [ ] **Bundle size impact:** Estimate size delta for any new frontend dependencies.
|
|
32
116
|
- [ ] **Security:** Does the library handle untrusted input safely? Do we need sanitization?
|
|
117
|
+
- [ ] **Lock-file check before designing a new code-branch.** Before specifying a new config-reading branch, env-switching helper, or conditional client construction, grep the lock-file (`backend/uv.lock`, `frontend/package-lock.json`) to confirm the installed dependency version doesn't already provide the semantic natively. Run AB's tentpole decision (D-NO-CONFIG-CODE-CHANGE) rested on `boto3 1.42.89` honoring `AWS_ENDPOINT_URL` natively via botocore ≥ 1.31 — discovered by grepping uv.lock before drafting § 5. Zero edits to `config.py` because the dep did the work. When the native path exists, it's almost always the right choice (no new test surface, no production behavior drift, no new env-to-code branch to maintain). Make the check a § 13 Phase A command when the feature's shape suggests "add a code branch that reads new env var."
|
|
33
118
|
|
|
34
119
|
### Test Infrastructure
|
|
35
|
-
- [ ] **Test runners available?** Verify the test tooling exists before specifying tests
|
|
120
|
+
- [ ] **Test runners available?** Verify the test tooling exists before specifying tests:
|
|
121
|
+
- Backend: `uv run pytest` (available via docker exec)
|
|
122
|
+
- Frontend e2e: Playwright (available)
|
|
123
|
+
- Frontend component: Vitest (`npm run test:unit` — config exists, happy-dom + @testing-library/svelte; first Vitest test in a Svelte+Vite repo needs three config wires: jsdom→happy-dom, `resolve.conditions: ['browser']`, and the `$lib` alias — budget ~15 min on the first file).
|
|
36
124
|
- [ ] **Test plan:** Define test cases covering: happy path, edge cases, error cases, regressions.
|
|
37
|
-
- [ ] **Test
|
|
125
|
+
- [ ] **Test responsibility split.** Label every test in § 7 as either "Required (Implementation)" — common-case behavioral, gates Impl handoff — or "Suggested (QA)" — edge cases, extended scenarios, QA's discretion. Don't overload the Required tier; keep it to tests that would catch a broken feature. Implementation proves the feature works; QA proves the system works; Spec provides the shared baseline.
|
|
126
|
+
- [ ] **Parametrized tests expand for counting — AND Spec must commit to the form, not punt to Impl.** When a § 7 test uses `it.each(ARRAY)` (vitest), `@pytest.mark.parametrize` (pytest), `@parameterized.expand` (unittest), or similar expansion syntax, the test count in § 3 targets, § 7.3 adds/rewrites, and any § 14 "pytest total = N" gate must count expanded cases — `|ARRAY|` iterations, not the single source `it()` / `def test_*()` line. Run AD-b's § 7.3 targeted "6 Required, 14 total workspace" but vitest reported 13 in-package / 21 workspace because T3's `it.each(BUNDLED_SKILLS)` over 8 skills expanded to 8 sub-cases. The merge message read "13 via `it.each` expansion" — a deviation-looking diagnostic that was actually a spec-bookkeeping miss. Flag parametrized tests explicitly too ("T3 is `it.each` over 8 items; expansion → 8 sub-cases") so Impl/QA don't have to infer from reading § 6 source. **Spec commitment rule (AH-b strengthening):** when spec writes a matrix-style test (URI-scheme validation, reserved-name matrix, grant-type allowlist, etc.), Spec MUST pick ONE form in § 7 and commit to the matching count in § 3 / § 7.3 / § 14 — either (a) flat standalone tests projecting N functions, or (b) parametrize-expanded with an enumerated matrix projecting |matrix| cases. Do NOT write "Impl's choice" or "if Impl parametrizes" — that produces a verification-count gap at QA time (Run AH-b spec § 15 offered parametrize as Impl's choice; actual 1,037 vs projected 1,013 diverged by the 24 expanded cases). If (b), list the parametrize matrix inline in § 6/§ 7 so Impl + QA have the canonical expansion.
|
|
127
|
+
- [ ] **Test data setup (UI features):** If the feature involves visual changes, include concrete API calls (curl commands or fetch snippets) that create the right data shape for manual or Playwright verification. Don't rely on code review alone for visual correctness.
|
|
128
|
+
- [ ] **Structural invariant tests for refactors.** Every refactor that enforces a module-surface boundary (count of files in a directory, no-imports-from-X rule, class hierarchy depth, etc.) needs a pytest assertion that fails loudly on regression. Without it, the next feature can quietly violate the invariant and nobody notices until the next arch run. Include the invariant test in § 6/§ 7 of the spec; encode the count or rule literally.
|
|
129
|
+
- [ ] **Grep-invariant tests self-match — allow-list the test file, not just the canonical source.** When a § 7 invariant test greps source for a literal pattern (`r'"cli_"'`, `r'@router\.get'`, `r'from core.auth.tokens import'`, etc.), the test file containing the regex literal always matches itself because the regex source and the target string are the same bytes on disk. The test allow-list must include `tests/<invariant_test_file>.py` in addition to the canonical source file(s). Run AH-b's `test_cli_prefix_literal_centralized` was initially specced to allow only `core/auth/tokens.py`; Impl added the test file to the allow-set after the first run matched itself. Sub-rule is cheap: any grep-based invariant test allow-list must include the file that defines the test. When naming the allow-list in § 6/§ 7, include both. This is the same class as the "artifact-testing location" rule but narrower — about self-reference inside a single test, not cross-repo placement.
|
|
130
|
+
- [ ] **httpx Secure-cookie test trap.** If the spec involves session cookies with the `Secure` attribute, test clients must use `https://test` as `base_url`. With `http://test`, httpx silently drops the Secure cookie and the test passes for the wrong reason (cookie not set, no auth check). Spec the `base_url` explicitly in § 7 fixtures.
|
|
131
|
+
- [ ] **Mock library auto-injection silently breaks both presence AND absence assertions.** When the chosen mock library auto-persists or auto-injects HTTP state across requests within a single test — msw v2's internal cookie-store (via `@mswjs/cookies` + tough-cookie), fetch-mock with automatic Cookie replay, CORS emulation, auth-header injection — a test that asserts either (a) absence of a header the library may auto-add, OR (b) presence of a header the library re-emits regardless of the code under test, silently passes for the wrong reason. Run AD-a caught msw v2 replaying Set-Cookie within a test regardless of the SDK's middleware chain: test 5 (absence) appeared to fail impossibly, test 4 (presence) appeared to pass even with `onResponse` fully disabled. Fix pattern: for any test that asserts header-level behavior controlled by the code under test (cookie middleware, auth middleware, retry logic), bypass the mock library with a hand-crafted `fetch` (or http interceptor) that captures `Request` objects directly. Spec the bypass in § 7 for both directions when the mock library has auto-injection semantics — one test of each direction without the bypass is not enough; both can silently pass.
|
|
132
|
+
- [ ] **Pydantic subclass field-collision audit on `**parent.model_dump()` construction.** When § 6 code uses `Subclass(**parent_obj.model_dump(), <kwarg>=<value>)` to construct a subclass-extending model, grep the parent class's field declarations for `<kwarg>`. If the name already appears on the parent, the construction raises `TypeError: multiple values for keyword argument`. Grep pattern: in the parent class's model file, search for `^ <kwarg>\s*:` within the body of the class the subclass inherits from. Run AG-c's `RedeemInviteResponse(**new_actor.model_dump(), default_space_id=default_space_id)` collided because `Actor` already declares `default_space_id`; 37 tests failed on first Impl run. Fix at construction: pop the field before unpack and add `= None` default on the subclass field for pyright. Subclass-pattern risk is recurring (AG-b's `ActorCredentialsResponse(Actor)` avoided a collision by luck, not audit); catch it at draft time.
|
|
133
|
+
- [ ] **Decorator-string test grep when modifying route decorators.** When a sweep adds a kwarg (`response_model=`) or otherwise changes the source-text rendering of a `@router.<verb>(...)` decorator, grep `tests/` for literal decorator strings (`'@router.get("<path>")'`, `'@router.post("<path>")'`, etc.). Existing architectural tests that match the decorator with `==` or `line.strip() == '<literal>'` will fail silently when the kwarg is added. Run AG-b's existing `test_get_orientation_handler_is_thin` matched the literal `@router.get("/api/actors/me/orientation")` string — adding `response_model=` broke it until Impl broadened the match to `"@router.get(" in line and "<path>" in line`. Grep template: `grep -rn '"@router\.' backend/tests/` — any hit is a candidate for broadening (or at least a flag to Impl that the test may need adjustment).
|
|
134
|
+
- [ ] **Container-mount ENDPOINT asymmetry for invariant test paths.** The backend container's WORKDIR is `/app`, and `docker-compose.yml` maps `backend/api/` → `/app/api/` (NOT `backend/` → `/app/backend/`). When a § 7 structural invariant test reads a route file via `Path("/app/api/routes/<file>.py")`, that is the correct path inside the container — NOT `Path("/app/backend/api/routes/<file>.py")`. Verify the bind-mount ENDPOINT, not the host-side source dir. Run AG-a, AG-b, AND AG-c ALL shipped specs with `/app/backend/...` paths; Impl corrected all three. Strong recurring defect. Grep template at spec draft time: check an existing peer structural test in the same file (`backend/tests/test_architecture.py` is the common home) for how it constructs the path — copy that pattern, don't reinvent. Related to (but distinct from) the existing "container-mount assumptions for repo-rooted test paths" rule below, which covers walks that cross up the tree.
|
|
135
|
+
- [ ] **Container-mount assumptions for repo-rooted test paths.** When a § 7 test reads a file *outside* `backend/` or `frontend/` via `Path(__file__).resolve().parents[N]` (or any walk that crosses up the repo tree), verify the target directory is bind-mounted into the test container before prescribing the test. Many repo subtrees (`.claude/`, `docs/`, `infra/`) are not in the default backend bind-mount set, so the walk produces a path the container can't read. Two fixes: (a) bind-mount the target directory into `docker-compose.yml` (read-only is fine for test artifacts), or (b) locate the test in a host-run repo (moot-cli structural tests, frontend Vitest) where the filesystem walk resolves naturally. Run Z caught this mid-Q-gate via an in-thread amendment; one round-trip avoidable by checking the mount surface at § 13. Pairs with the existing "container bind-mount surface for committed generated artifacts" rule.
|
|
136
|
+
- [ ] **Assert-skip-condition OR document-as-load-bearing for environment-preconditioned tests.** Tests that call `pytest.skip()` (or equivalent) conditionally on an environmental precondition (bind mount present, env var set, service reachable, feature-flag on) silently conceal under-run coverage in the "N passed, M skipped" summary. Misconfigured environment → "13 passed, 2 skipped" looks healthy when the 2 skipped were load-bearing structural invariants that never ran. **Fix class:** either (a) assert the precondition explicitly — `assert target_file.exists(), "<precondition> must hold; update docker-compose.yml bind mounts"` — so misconfiguration fails loudly, OR (b) document the precondition as load-bearing in the test's docstring + spec § 5 prerequisites ("this test requires `./team.toml:/app/team.toml:ro` in docker-compose.yml; misconfiguration surfaces as skip"). Run TE-1 T12 + T13 shipped with `pytest.skip()` paths when mounts were missing; QA flagged on retro. Choose (a) for structural invariants whose skip has non-zero blast radius; (b) for legitimately-optional tests (e.g., LocalStack-only coverage).
|
|
137
|
+
- [ ] **Artifact-testing location: test the artifact in the repo that owns it.** When a feature ships a bundled artifact (a skill copied into the moot-cli template, a generated client copied into another repo, a doc copied into a sibling), the structural test for that artifact lives in the *consuming* repo's test suite, not the *producing* repo's. The producing repo's bundling test verifies "I emit the artifact"; the consuming repo's structural test verifies "the artifact has the expected shape." Mixing the two leads to the container-mount problem above.
|
|
138
|
+
- [ ] **Parity tests point at the feat worktree, not host main, for pre-ship validation.** When a test compares two copies of an artifact across repos (e.g., `test_claude_template_matches_convo` with `CONVO_REPO_PATH`), point the env var at the feat worktree (`/workspaces/convo/.worktrees/<role>/`) NOT the host main path. The host main hasn't been squash-merged yet pre-ship, so the parity test fails on a stale-source comparison — correct test behavior, but trips QA into diagnosing a phantom failure. Spec specifies the env-var target explicitly in § 7. After ship, the test runs against host main as the regression guard. Run AA caught this in QA's post-ship retro.
|
|
139
|
+
- [ ] **When § 10 names a Required test explicitly, mark whether it's standalone or embedded.** When a Required test is named (`test_<feature>_<behavior>`), Spec should indicate whether to (a) create a new top-level test function or (b) add the assertions inside an existing test. Both can satisfy intent; without the indication, Impl picks one and QA may flag the choice in verification. Run AA had this on T2 — Spec named `test_launch_script_drops_skip_permissions`, Impl embedded into `test_cmd_exec_launch_full_flow`. Both passed; the ambiguity cost ~2 min of QA verification thought.
|
|
140
|
+
- [ ] **Redis-seeding tests flagged xdist-sensitive with key-namespace guard.** When a new backend test seeds Redis state by key (not just by member value) — e.g., `SADD ws:connected:<actor_id> <session_id>` where `<actor_id>` is shared across workers — § 14 flags the test as xdist-sensitive and requires a test-unique key-namespace guard: prefix the Redis key with a worker/test-unique identifier (e.g., `uuid4()` or the xdist worker ID) so parallel workers don't collide. A-2 B1 (`test_session_count_returns_scard_union`) flaked under `-n auto` because two workers wrote to overlapping key spaces; passed in isolation. Same class as AH-f B1-reconciler + A-1 rate-limit-window + B-2 subprocess-env flakes. **Without the namespace guard, the test ships as "passes in isolation" flaky** — annoying in retros, not blocking, but avoidable. Spec template: when § 6 introduces Redis-key seeding, § 7/§ 14 adds "xdist guard: use `ws_connected_set_key(f'{uuid.uuid4().hex}_{actor_id}')`" or equivalent keyspace isolation.
|
|
141
|
+
- [ ] **Workaround-fixture grep pairs literal-key AND helper-name patterns.** When Phase-A enumerates workaround fixtures slated for removal (tests that paper over the bug the current run fixes), pair the literal-key pattern with a helper-name grep. Literal-key (catches direct call sites): `schema_name\s*=\s*["\']<prefix>`. Helper-name (catches f-string / computed-value hiding shapes): `_make_.*_with_.*|_fixture_with_.*|_workaround_.*|create_.*_with_.*`. F-string interpolation hides the literal key from the straightforward regex, but the helper-definition naming usually preserves the signal. ONB-1's `_make_tenant_with_tenant_schema` fixture at `test_te3_export_import.py:137` defined its schema via `schema_name = f"tenant_te3_{slug}_..."`; Spec's strict literal-key grep missed it, Impl's helper-name grep caught it pre-SPEC-READY via the pre-draft ping. Codify at § 13 when enumerating workarounds to remove.
|
|
142
|
+
|
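
The self-match behavior of grep-invariant tests is easy to reproduce. The sketch below is illustrative only: it builds a throwaway tree in a temporary directory, and the file names, the `CLI_PREFIX` constant, and the helper names are assumptions, not the project's actual layout.

```python
import re
import tempfile
from pathlib import Path

# The literal the invariant forbids outside the canonical source file.
PATTERN = re.compile(r'"cli_"')


def offenders(root: Path, allowed: set[str]) -> list[str]:
    """Return repo-relative .py files that contain the literal and are not allow-listed."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel in allowed:
            continue
        if PATTERN.search(path.read_text(encoding="utf-8")):
            hits.append(rel)
    return hits


def demo() -> tuple[list[str], list[str]]:
    """Scan a tiny tree twice: canonical source allowed, then source plus test file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "core" / "auth").mkdir(parents=True)
        (root / "tests").mkdir()
        (root / "core" / "auth" / "tokens.py").write_text(
            'CLI_PREFIX = "cli_"\n', encoding="utf-8"
        )
        # The test file holds the regex source, which contains the same bytes.
        (root / "tests" / "test_prefix.py").write_text(
            "PATTERN = r'\"cli_\"'\n", encoding="utf-8"
        )
        source_only = offenders(root, {"core/auth/tokens.py"})
        with_test = offenders(root, {"core/auth/tokens.py", "tests/test_prefix.py"})
    return source_only, with_test
```

With only the canonical source allow-listed, the test file is reported as an offender; adding it to the allow-list clears the scan.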
|
143
|
+
### asyncpg Type Safety
|
|
144
|
+
- [ ] **All datetime params to asyncpg must be Python `datetime` objects, not ISO strings.** asyncpg does NOT coerce strings to timestamptz. This has caused bugs in 3 consecutive features. If your spec has SQL queries with timestamptz parameters, ensure the code calls `datetime.fromisoformat()` before passing to asyncpg, or accepts `str | datetime` and normalizes.
|
|
38
145
|
|
|
39
146
|
### Interface Design
|
|
40
|
-
- [ ] **Hide internal topology from consumers.** If the spec requires the client to enumerate sources, manage subscriptions, or know the internal structure of the data, the abstraction is wrong. The server should aggregate.
|
|
147
|
+
- [ ] **Hide internal topology from consumers.** If the spec requires the client to enumerate sources, manage subscriptions, or know the internal structure of the data, the abstraction is wrong. The server should aggregate. See VISION.md design philosophy.
|
|
148
|
+
- [ ] **Minimize total complexity, not one side's complexity.** Design interfaces from both sides — consumer and producer. The right shape is the one that minimizes the sum of work across all participants, not the one that pushes complexity onto whichever side you happen to be touching. A change that simplifies the producer by 10 lines but forces every consumer to add 5 lines of plumbing is usually wrong.
|
|
41
149
|
|
|
42
150
|
### Cross-Cutting
|
|
43
|
-
- [ ] **Verify claims in the feature scope.** Don't trust Product's assumptions about what exists — check the code.
|
|
151
|
+
- [ ] **Verify claims in the feature scope.** Don't trust Product's assumptions about what exists — check the code. Grep/read every Product claim about existing infrastructure, module names, function signatures, and implementation-slice steps. Product docs drift; some "to-do" steps are already shipped, some module names are aspirational, some function signatures have been renamed. Verify before § 5.
|
|
152
|
+
- [ ] **Resolve Product-doc silences in-draft, not as Open Questions.** When Product's doc is decisive on N decisions and silent on the N+1th (usually a small UX detail or naming convention), Spec resolves in-draft in the simpler direction and documents as a D-decision with a `D-UI-*` or `D-NAMING-*` label. Do NOT escalate small silences as Open Questions — they round-trip an extra Product/Pat cycle for a decision Spec can make autonomously. Validated across 10 consecutive runs.
|
|
153
|
+
- [ ] **F-label findings for adjacent-but-out-of-scope discoveries.** When § 13 grounding surfaces a real bug or gap that is *not* part of the current scope (phantom endpoint, stale comment, missing implementation), document it in § 11 as an F-finding with: (a) the exact backend line + the exact moot-cli/frontend line, (b) the proposed disposition (fix later, deprioritize, etc.), (c) explicit "deferred to Product follow-up" tag. The run ships on-scope; Product picks up the F-findings during retro synthesis without Spec re-researching. Validated on Run X (F1 phantom endpoint, F2 phantom endpoint).
|
|
154
|
+
- [ ] **Augment, don't fork, dispatching functions.** When an existing function already dispatches on the axis your feature cares about (e.g., agent-vs-human branch, role type, sponsor flag), augment the relevant branch instead of forking a new parallel helper. Check for existing dispatchers before designing a new function. Run R folded a `set_key_and_connect` proposal into `actor_store.rotate_api_key`'s existing agent branch (5 LOC vs ~30 LOC fork).
|
|
44
155
|
- [ ] **Accessibility:** Any ARIA labels, keyboard navigation, or screen reader concerns?
|
|
45
156
|
- [ ] **Performance:** Any large data sets, heavy computation, or expensive renders?
|
|
46
157
|
- [ ] **Security:** XSS, injection, auth boundaries?
|
|
158
|
+
- [ ] **bcrypt 72-byte limit applies to production credentials AND to spec/test literals.** `bcrypt.hashpw` errors on inputs > 72 bytes. When designing token storage or any bcrypt-hashed credential, budget the prefix + random portion to fit under 72 bytes; spec the byte budget explicitly in § 4 if any new credential format is introduced. **Extension from AH-c:** the same limit applies to ANY `bcrypt.hashpw(...)` call a spec puts in place — including timing-dummy hashes (`_DUMMY_BCRYPT_HASH = bcrypt.hashpw(b"oat_timing_dummy_" + b"0" * 60, ...)` = 77 bytes — FAILS at module import). Production-shape tokens are 64 bytes (e.g., `oat_` + 60 chars), which is the correct width for any dummy literal the spec ships. Rule: every spec-block `bcrypt.hashpw(LITERAL, ...)` must be ≤ 72 bytes, and should match production-token width (typically 64) when the literal is a test / timing / fixture constant. Grep check at draft time: `grep -A1 "bcrypt\.hashpw" <spec>` → for each, count the literal's byte length.
|
|
159
|
+
- [ ] **Cross-repo HTTP mocks.** When the spec calls for mocking convo HTTP endpoints in another repo (moot-cli respx mocks, npm CLI fixtures, etc.), anchor the mock body shape to the actual response model in `backend/api/` — grep the route handler for its return type and copy fields exactly. Author-imagined mock dicts self-confirm and ship broken integrations (Run R / moot init bug). Now that `docs/api/openapi.yaml` exists (Run W / b6f6d13), mocks should be generated from it.
|
|
160
|
+
- [ ] **OAS shape gap: response bodies are NOT body-authoritative on convo today.** The convo OAS at `docs/api/openapi.yaml` is body-authoritative for *request* bodies (`$ref`-linked Pydantic models) but NOT for *response* bodies — most routes don't declare `response_model=`, so the OAS emits `additionalProperties: true` on the response schema. "Anchor mocks to OAS" means: anchor the *paths and request bodies* to OAS, but cross-check response body shapes against the route handler's `model_dump()` source (i.e., grep `backend/core/models/models.py` for the model the handler returns). Until convo adds `response_model=` declarations to every route, this two-layer anchoring is required for cross-repo mock correctness. Validated on Run X / oas-mock-refresh.
|
|
161
|
+
- [ ] **`@respx.mock` is symmetric — refreshing mocks must add AND delete in the same pass.** `@respx.mock` decorator defaults to `assert_all_called=True AND assert_all_mocked=True`. If a refresh adds a new endpoint stub without deleting a stale one (or vice versa), tests flip from one AssertionError type to the other rather than passing. When the spec changes the set of mocked endpoints, list both adds and deletes explicitly in § 7, and verify in § 13 that the symmetric pair is complete.
|
|
162
|
+
- [ ] **Container bind-mount surface for committed generated artifacts.** When a spec commits a generated artifact (`docs/api/openapi.yaml`, generated SDK, etc.) and the artifact is generated *inside* the container, check whether the artifact's directory is bind-mounted into the container's filesystem. If not, the spec must add a compose mount (`./docs:/app/docs:ro` for read-only, or rw if the container writes back). Forgetting this causes the generation step to write to a container-local path that disappears on next start, with no visible host-side update. Run W / D1 emerged from this gotcha.
|
|
163
|
+
- [ ] **Stdout-redirect regen pattern for committed generated artifacts.** When the artifact is mechanically derived from runtime state (e.g., `docs/api/openapi.yaml` from `app.openapi()`), prefer a stdout-writing generator + host-side `>` redirect over a generator that writes to a path:
|
|
164
|
+
```bash
|
|
165
|
+
docker compose exec backend uv run python -m api.openapi_gen > docs/api/openapi.yaml
|
|
166
|
+
```
|
|
167
|
+
Benefits: (1) no writable bind mount needed (read-only is enough or no mount at all if the input doesn't need docs/), (2) one-way data flow (container generates, host commits), (3) no Makefile or shell wrapper. Reusable any time a committed artifact is derived mechanically from runtime state.
|
|
168
|
+
- [ ] **Drift-gate sanity check (Q-3 pattern).** For artifact-gate specs, the QA spot-check plan should include a drift-injection test: append a known-bad token to the committed artifact, confirm the gate FAILS with the correct regen hint, restore, confirm it PASSES. Actively proving the gate bites is more trustworthy than just confirming it passes today. Validated on Run W (Q-3 was the most-valuable QA spot-check).
|
|
169
|
+
- [ ] **Drift-inject on multi-occurrence source files: use Edit tool, not `sed`.** When QA or Impl performs a drift-inject Q-gate on a source file where the target pattern appears more than once (same `response_model=X` kwarg on multiple routes, same type annotation on multiple fields, etc.), `sed -i 's|pattern|replacement|'` rewrites EVERY matching line and silently corrupts adjacent endpoints / fields that share the pattern. Run AH-d's Q-gate used `sed` on a `response_model=ActorResponse` literal and caught an adjacent `/api/actors/me/agents` endpoint; the drift-inject restore cleaned it up, but the diff was noisy. **Rule:** drift-inject using the Edit tool (anchor on surrounding context to make the match unique) or a line-addressed `sed -i '<line>s|...|...|'` (specific line number), not a bare `sed -i 's|...'`. Applies any time a spec's § 14 Q-gate modifies a file with grep-count > 1 for the target pattern. QA's § 3 baseline should include the grep-count per Q-gate target to flag multi-occurrence patterns up front.
|
|
170
|
+
- [ ] **FastAPI `response_model=` drift-inject: kwarg removal alone is NOT an effective gate.** FastAPI 0.115+ infers `response_model` from the handler's return-type annotation (`-> SpaceInfo`), so a drift-inject that only removes the `response_model=<X>` decorator kwarg leaves the OAS unchanged. The route continues to emit the same typed schema because the return annotation is authoritative. Effective drift-inject options: (a) delete the entire route handler (OAS drops the path entirely — Run AE's Q11 pattern); (b) remove BOTH the `response_model=` kwarg AND the return annotation (OAS falls back to `additionalProperties: true`); (c) change the return annotation to a different model (OAS schema changes). Spec drift-inject Q-gates on `response_model=` routes accordingly — Run AE shipped with Q10 targeting only the kwarg, which QA confirmed didn't bite.
|
|
171
|
+
- [ ] **Empty-body responses use `response_class=Response`, NOT `response_model=None`.** When an RFC (or equivalent contract) mandates an empty response body — RFC 7009 `/oauth/revoke` 200-empty, status-ping endpoints, admin no-content operations — declare the route with starlette's `Response` via `response_class=Response` (and optionally `status_code=<N>` on the decorator). Do NOT use `response_model=None`, which FastAPI serializes as the JSON literal `null` — non-empty body, violates the contract, and shows up in OAS as a nullable schema instead of "no response schema." AH-c's `POST /oauth/revoke` is the first use in the OAuth surface; D-RESPONSE-CLASS-RESPONSE convention canonicalized from AH-c synthesis. **OAS drift-inject extension** (AE rule extended): Q-gate for a `response_class=Response` route MUST use handler-deletion or route-rename, NOT `response_class=` kwarg manipulation — FastAPI's introspection falls back on the class default, so kwarg removal may not bite consistently. Same behavior class as the AE `response_model=` gate; apply the same mitigation.
|
|
172
|
+
- [ ] **Timing-defense completeness audit on all early-exit paths.** When spec specifies a timing-side-channel defense (dummy-op, constant-time compare, equalized error path) on a function that can return through multiple paths, enumerate EVERY early-exit path and confirm each one either (a) runs the dummy-op before returning, OR (b) is explicitly documented as "intentional fast-path, non-leaking." Four path categories must be covered for any don't-leak-existence function (revoke, login, reset-token, email-exists-check): (1) **input-format-error paths** — decode / parse raises, malformed prefix, empty input; (2) **lookup-miss paths** — DB returns no rows; (3) **attribute-mismatch paths** — rows found but per-row verification (bcrypt, hmac, etc.) fails; (4) **already-in-target-state paths** — already revoked, already consumed, already disabled. AH-c's spec covered only path (3); Impl extended dummy-op to cover all four (QA classified as security improvement). Grounding check: list exit paths in § 13 before writing § 6, confirm defense applies to each. Related Suggested-tier test: the timing-probe matrix should exercise each path category, not just one (add `{malformed-prefix, malformed-id, already-revoked}` to the probe alongside `{unknown, wrong-attr, happy-path}`).
|
|
173
|
+
- [ ] **Silent-no-op invariant for don't-leak-existence endpoints: store function catches decode errors.** RFC 7009-style "don't leak token existence" endpoints (also applicable to forgot-password, email-exists-check, reset-token-confirm) require that **any `ValueError` / `KeyError` / decode-failure raised during input processing is caught WITHIN the store function and treated as the same silent-no-op path as an unknown entity**. The route handler must never see these exceptions. Without this rule, a store function that raises on malformed input produces a 500 response (or a fast-path 400) instead of the 200-empty-body the RFC requires — both a functional bug AND a timing-distinguishable response (fast fail vs slow happy-path). Spec must state the contract explicitly in § 4 / § 6: "Any decode/parse error on `token` or `client_id` is caught within `revoke_token` and returned as the silent no-op." Impl then catches at the store boundary, not at the route. AH-c Spec flagged as missed-from-spec; Impl handled correctly but the spec should have pinned it. Paired with the timing-defense completeness rule above.
|
|
174
|
+
- [ ] **Multi-package npm-workspace fresh worktree: root build before per-package gates.** When Impl/QA is working in a fresh multi-package npm-workspace worktree (no inherited `dist/` from the host clone), the first step before any per-package test/build/lint gate must be a root `npm run build` (builds all workspaces). Per-package gates that depend on sibling packages' compiled output will fail TS2307 or run against stale state otherwise. Run AD-c caught this on T14 (shebang + exec bit check): `npm run build -w @mootup/moot-cli` failed until sibling sdk + templates had `dist/`. Spec this sequencing explicitly in § 14 / § 16 for multi-package runs: "Step 0: `npm run build` (root, all workspaces). Subsequent per-package gates assume sibling `dist/` present."
|
|
175
|
+
- [ ] **Bin-shipping packages need a shebang + exec-bit structural test.** For any package whose `package.json` declares a `bin` entry, include a § 7 structural test that asserts (a) the bin file starts with a correct shebang (`#!/usr/bin/env node` or equivalent) and (b) the file mode has the executable bit set (`statSync(bin).mode & 0o111`). CI publishes ship broken bins when `chmod +x` runs against a stale/missing `dist/`; normal invocation tests (`mootup --help`) use `node dist/cli.js` directly and don't exercise the shebang/exec-bit path. Run AD-c's T14 caught exactly this during QA's fresh-worktree run. Apply to every package with a `bin` entry.
|
|
176
|
+
- [ ] **Artifact-count Q-gates over built outputs must specify a fresh-build prefix.** When a Q-gate asserts a count over a built artifact (`unzip -Z1 mootup-*.whl | wc -l`, `unzip -l *.zip`, `ls dist/*.{js,d.ts}`), the gate wording must include an explicit pre-step that removes the prior build output before rebuilding: `rm -rf dist/ && <build-command>` or equivalent. A stale cached artifact from a prior build can silently satisfy a lower count than the current source should produce, passing the gate for the wrong reason. Run AD-b Q7 asserted "34 template files in the wheel" but the cached `dist/mootup-*.whl` had 28 (predated subsequent template additions); QA caught it by running `rm -rf dist/ && uv build` explicitly. Don't rely on QA's instinct to refresh; spec the refresh step in § 14.
|
|
177
|
+
- [ ] **Proxy-layer testing for new URL prefixes.** Backend unit tests using `ASGITransport` bypass Vite + Caddy routing. When the spec adds a top-level URL prefix (`/api/x/`, `/auth/x/`, etc.), require a browser-level smoke test through both proxies before the gate passes — Vite's dev server proxies, Caddy's reverse-proxy rules. Flag the proxy configs that need updating in § 11.
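The bcrypt byte-budget item above lends itself to a mechanical length check. The sketch below is a minimal illustration in plain Python: it deliberately does not import `bcrypt` and only measures literal lengths against the 72-byte limit. `BCRYPT_MAX_BYTES` and `bcrypt_headroom` are hypothetical names introduced for this example; the two literals are the ones quoted in the checklist item.

```python
# Hypothetical helper (not part of the codebase): measures a literal against
# bcrypt's 72-byte input limit without calling bcrypt itself.
BCRYPT_MAX_BYTES = 72


def bcrypt_headroom(literal: bytes) -> int:
    """Return the bytes left under the limit; a negative value means over budget."""
    return BCRYPT_MAX_BYTES - len(literal)


# The timing-dummy literal flagged in the checklist: 17-byte prefix + 60 zeros.
oversized_dummy = b"oat_timing_dummy_" + b"0" * 60
# A production-width literal: 4-byte prefix + 60 characters.
production_width_dummy = b"oat_" + b"0" * 60

for name, literal in [
    ("oversized_dummy", oversized_dummy),
    ("production_width_dummy", production_width_dummy),
]:
    headroom = bcrypt_headroom(literal)
    verdict = "ok" if headroom >= 0 else "over budget"
    print(f"{name}: {len(literal)} bytes, headroom {headroom} ({verdict})")
```

A check of this shape could run over every `bcrypt.hashpw(LITERAL, ...)` literal found by the draft-time grep, so the byte count is computed rather than eyeballed.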
|
|
47
178
|
|
|
48
179
|
### Spec Document
|
|
49
180
|
- [ ] **Reference code by function/class name + file path** — never use line numbers (they drift between commits)
|
|
50
181
|
- [ ] **Files to create/modify** — explicit table with file paths and actions
|
|
51
182
|
- [ ] **Open questions** — list anything that needs Product input
|
|
52
183
|
- [ ] **Out of scope** — confirm alignment with feature boundaries
|
|
184
|
+
- [ ] **TypeScript NodeNext: `.js` extensions on relative imports in § 6 drop-ins.** When the tsconfig uses `"module": "NodeNext"` (or `"NodeNext"`-derived), relative imports MUST include `.js` extensions at TypeScript compile time — `./foo` fails `tsc --noEmit` with "Relative import paths need explicit file extensions in ECMAScript imports." Spec § 6 drop-ins that ship TS source with relative imports should include the `.js` extensions from the start (`./cookies.js`, `./generated/paths.js`, etc.). Placing the rule on the spec side rather than the Impl side means one spec review catches it for all future runs; the alternative (Impl appends `.js` after lint fails) is mechanical but recurring. Run AD-a caught this on the first JS run in the pipeline.
|
|
185
|
+
- [ ] **npm workspace lock-file must include all workspace packages before the first `npm install`.** When specifying a new npm-workspaces monorepo with N package stubs, every stub (even empty placeholder `package.json`) must exist BEFORE the first `npm install` that generates `package-lock.json`, or the lock will be missing the stubs and `npm ci` will fail with "Missing: @scope/pkg@0.0.0 from lock file." Two fixes, Spec's choice: (a) place all stubs in § 6 BEFORE the `npm install` step in § 16 touch order, so the lock captures them on first generation; (b) regenerate the lock as part of the stubs commit. Either works; option (a) is cleaner and cheaper. Run AD-a QA caught this because Impl's touch order placed stubs after `npm install` — Q1 `npm ci` failed until QA regenerated.
|
|
186
|
+
- [ ] **No half-drafts in § 6/§ 7 source blocks.** Grep the spec for `NotImplementedError`, `TODO`, `FIXME`, `placeholder`, `XXX` before SPEC-READY. § 6/§ 7 source blocks are paste-and-go by spec contract; a stub there is either dead scaffolding (delete) or an unresolved decision (decide and rewrite, OR escalate to Product as `message_type="question"`). Run T's Spec retro flagged a `_build_claude_cmd` placeholder caught at draft review and removed before SPEC-READY — the cheap audit prevents the much more expensive Impl-time discover-and-escalate cycle. See `feedback_dead_helper_half_draft.md`.
|
|
187
|
+
- [ ] **Operator-name scrub of fenced code blocks before SPEC-READY.** CLAUDE.md's "operator identity in artifacts" rule forbids operator names (personal or role-neutral role-labels like "Pat-locked", "per Pat", "Pat-resolved") in any durable repo artifact — commit messages, code comments, docstrings, test names, fixture names, log messages, SQL comments. **Narrative spec prose is chat-context and exempt.** **Source blocks are durable artifacts and are NOT exempt** — whatever lands in a `` ```python `` / `` ```sql `` / `` ```toml `` / `` ```typescript `` block gets pasted verbatim into the repo. Spec source blocks inherit narrative prose during drafting; operator-name references ride into docstrings/comments/identifiers without scrubbing unless explicitly audited. **Audit pattern:** `grep -nE 'Pat-locked|per Pat|Pat said|Pat approved|Pat-resolved|Pat-confirmed|Pat-direction' <spec-file>.md` restricted to fenced code blocks (awk/sed out the narrative). Every hit requires substitution with the corresponding D-decision ID (e.g., `Pat-locked "DB index, git content"` → `D-DB-INDEX-NOT-CONTENT`) or a neutral paraphrase ("design decision:"). Run TE-1 leaked 3 sites (`TeamArchetype` docstring + 2 SQL comments); QA repaired at commit `6075765`. 2nd or 3rd retro with this class of defect; mechanical grep at SPEC-READY catches it before Impl pastes.
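The operator-name audit above asks for a grep restricted to fenced code blocks. One way to sketch that restriction, assuming fences are lines that start with three backticks, is a small Python scanner. The function name `operator_hits_in_fences` and the sample spec text are invented for illustration; the regex alternatives are copied from the audit pattern in the checklist item.

```python
import re

# Pattern copied from the checklist's audit grep.
OPERATOR_PATTERN = re.compile(
    r"Pat-locked|per Pat|Pat said|Pat approved|Pat-resolved|Pat-confirmed|Pat-direction"
)
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence


def operator_hits_in_fences(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for pattern hits inside fenced code blocks only."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # an opening or closing fence line
            continue
        if in_fence and OPERATOR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented sample: one hit in narrative prose (exempt), one inside a fence (flagged).
sample_spec = "\n".join([
    "Narrative: per Pat, the index lives in the DB.",
    FENCE + "python",
    "class TeamArchetype:",
    '    """Pat-locked: DB index, git content."""',
    FENCE,
])

for number, line in operator_hits_in_fences(sample_spec):
    print(f"line {number}: {line}")
```

Only the docstring inside the fence is reported; the narrative line is skipped, which matches the exemption described in the item.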
|
|
188
|
+
|
|
189
|
+
### § 7 Verification Gate Wording
|
|
190
|
+
- [ ] **grep-pattern tolerance for TypeScript literal forms.** When a § 14 Q-gate grep targets a string literal in TypeScript code, a bare single-quoted pattern (`grep "'thread:'"`) misses the idiomatic template-literal form (`` `thread:${var}` ``). Authors write prefixed strings as template literals when interpolation is involved; a grep authored against the wrong quote form fails even though the code is correct (invariant met behaviorally; grep pattern doesn't find it). Two fix patterns: (a) `grep -nE "['\\\`]<prefix>"` accepts both the single-quote and template-literal forms (it does not cover double-quoted literals), (b) phrase the Q-gate as a behavioral assertion ("`keyFor()` contains the literal substring `thread:`") rather than a literal grep. Run UI-1 Q8: spec prescribed `grep -n "'thread:'"` but Impl's `` `thread:${thread_id}` `` satisfied the invariant; QA flagged as doc-note only. Minor; mechanical.
|
|
191
|
+
- [ ] **Single-site raise-assertion Q-gates target the construction literal, not the string label.** When an invariant is "feature X raises from exactly one site," a grep over the string label (e.g., `grep -nR "revoked_installation" backend/`) counts every occurrence (docstrings, comments, test assertions, unrelated route handlers), inflating the count above the invariant's ceiling even when the behavioral invariant is met. Tighter pattern: target the dict/HTTPException **construction** site that carries the label, not the label itself. Template: `grep -rnE "detail=\{[\"']error[\"']:\s*[\"']<label>[\"']" backend/` → matches only the actual `HTTPException(detail={"error": "<label>", ...})` site. The pattern is double-quoted because a single-quoted shell string cannot contain `'`, and `-r` is needed to recurse into `backend/`. Same class as UI-1's TS-literal gap above: both are "behavioral invariant met, grep pattern imprecise" failures. Second data point (UI-1 Q8 + AH-e-bootstrap-backend Q6). Promoted to a stable rule: when a Q-gate asserts a count ceiling on an invariant, the pattern MUST target the construction / raise / decorator / definition site, not the bare label string that appears in prose + tests + call sites. If the count still needs to include the test file, the Q-gate says so explicitly.
|
|
192
|
+
- [ ] **Debounce TTL > debounce sleep (4× recommended).** When spec prescribes a fire-and-forget-with-debounce pattern using a Redis "pending" key + `asyncio.sleep(N)` (or equivalent), the TTL on the pending key MUST be at least 3× the sleep interval (4× recommended), or the payload-own-check at wake time is a race: the key can expire exactly when the task wakes to re-read it. Template: `PX=4000` for `sleep(1.0)`; `PX=2000` for `sleep(0.5)`. Run AH-f Impl caught this pre-merge (§ 6.1 step 3 set `flip:pending:<actor_id>` to `PX=500` matching `sleep(_DEBOUNCE_MS/1000)`, a 50/50 random-fail). Spec authors specifying debounced-flip patterns set the TTL headroom explicitly; a bare PX-equals-sleep-ms value is a defect.
|
|
193
|
+
- [ ] **Ratio-sentence sanity check: whole-diff ≥ source-only always.** When composing the kickoff-ratio sentence, verify whole-diff ≥ source-only before posting. Tests add LOC without removing spec-equivalent, so the inequality holds mechanically; an inverted projection (whole-diff < source-only) is a spec-draft defect and misleads the trend audit. Run AH-f Spec shipped a kickoff ack with "0.8 whole-diff / 1.7 source-only" (inverted); scoping doc's honest "~1.3-1.8 whole-diff" was right but got internalized wrong. Mechanical fix at sentence-composition time; one `<` check.
|
|
194
|
+
- [ ] **Removing a side-effect: grep existing-test assertions that depend on the trigger.** When spec deletes a side-effect from a route/function (e.g., `rotate_api_key → is_connected = TRUE` removal in AH-f), grep every existing test that uses the affected code path for assertions that depended on the side-effect firing — NOT just the column-value assertions, but ALL downstream status / response-shape / behavioral assertions. Rewrite affected tests in the same spec § 7 amendment; don't leave "tests survive unchanged" text that implicitly requires the deleted side-effect. Run AH-f § 6.7 said T5/T6 "survive unchanged" but both asserted `status == "released"` which required the rotate side-effect; Impl amended with `set_agent_connected(True)` preconditions at § 7 rewrite time.
|
|
195
|
+
- [ ] **Retro-flake-fix bundling hygiene: re-read `project_known_issues.md` at § 13 grounding when the scoping doc bundles a carried-forward fix.** When a scoping doc incorporates a retro-flake fix from an earlier run (e.g., B-2 bundled B-1's OAS-drift flake), the original root-cause analysis lives in the retro entry — NOT always in the new doc's framing. At § 13 grounding, re-read the retro source-of-truth (`project_known_issues.md` entry, prior synthesis doc, or memory file) + verify the new scoping doc's framing matches the original root-cause. If they diverge, escalate as a scope-clarification to Product before Impl commits. Run B-2 F-SCOPING-B1-FLAKE-MIS-DIAGNOSIS: scoping doc framed the B-1 flake as a middleware / logging-filter issue; the retro entry correctly identified it as test-isolation (probe routes leaking onto global `app.routes`). Cheap re-read at grounding catches conflation; late catch forces spec-amendment mid-run.
|
|
196
|
+
- [ ] **Filter-surface audit at design review (not draft time).** When a design-review decision lands at "filter probe X at emit time" (suppress health-check noise, drop a request class from shipped logs, etc.), the design-review response should enumerate every emission surface — route handler, middleware, structured logger, access-log — so Product makes the full-vs-partial-filter call at OQ-resolution time. If Spec surfaces a "but we also need to filter path Y" concern at § 6 drafting instead of at design-review, that's a spec-time surprise the kickoff reader has to reconcile. Run B-2's OQ-B2-5 resolution missed the access-log emission path until § 6.6 drafting; explicit enumeration at design-review response would have closed the loop earlier.
|
|
197
|
+
- [ ] **Ship-criterion "N resources added" lists the N items inline.** When a ship criterion asserts "N resources added" (Terraform plan count, migration table count, route count, fixture count, allowlist count, etc.), the spec lists the N items inline so the count is reader-auditable by construction. Template: "Plan shows 10 resources added: 3 `aws_cloudwatch_log_group` + 1 `aws_iam_policy` + 1 `aws_ssm_parameter` + 5 `aws_cloudwatch_query_definition`." Prevents later reverse-engineering + catches off-by-one at spec-draft time. Run B-2 Q13 asserted "10 resources" correctly but without the itemization; the math happened to be right.
|
|
198
|
+
- [ ] **Scoping-doc § 13 path assertions verified by `ls` before shipping.** Scoping-doc authors verify all filesystem paths referenced in § 13 probe targets + § 5 file inventory via `ls` / `test -e` before shipping the scoping doc. Spec shouldn't be the first to discover a phantom path at grounding time. Run AH-f: scoping doc pointed `§ 13 A.1` at `backend/core/channels/` (nonexistent); Spec found the real path `api/routes/ws.py` via direct `ls` in ~5 seconds and flagged for Product amendment pre-kickoff. Low-cost preflight; prevents Spec burning grounding minutes chasing phantoms.
|
|
199
|
+
- [ ] **`schema § 4` asserts "column X on [actor-type] T" → grep T's DDL to verify.** When spec § 4 narrative asserts "column X is non-null on [actor-type] by invariant" / "column Y exists on [type] T's table," grep the entity-type's DDL (`public_create.sql` / `tenant_create.sql`) to confirm the column actually exists on THAT table — not on a sibling / parent / sponsor's table. Run AH-f § 4.3 claimed `actor.default_space_id` non-null on agents post-onboarding — but `agents` has no such column (only `users.default_space_id` exists; agents follow `sponsor_id → users`). F-QA-1 pre-draft caught this; Impl added `_resolve_default_space` helper with sponsor-follow. Preflight discipline: when spec text crosses actor-type boundaries (agent vs user vs actor-union), verify against each table's DDL individually.
|
|
200
|
+
- [ ] **New `message_type=` string → mandatory allowlist-extension in § 5.** When spec § 6 introduces a new `message_type=` string in any `share` / `reply_to` / event-construction call, add the 1-line `ALLOWED_MESSAGE_TYPES` frozenset-extension edit to § 5 file inventory at `backend/core/models/message_type.py`. Pairs with the new-entity-type checklist rule — same shape. Run AH-f F-QA-4: spec § 6.6 wrote the `connection_status` emission path without tracing to the validation gate; QA pre-draft caught pre-Impl. Grep template at spec draft time: `grep -n '"<new_type>"' backend/core/models/message_type.py` → if 0 matches, add the allowlist edit to § 5.
|
|
201
|
+
- [ ] **Projection counts include allowlist count-assertion tests.** When a run adds a new message_type / new entity prefix / new scope / new frozenset value to an allowlist that has a count-assertion test (e.g., `test_allowed_message_types_is_frozenset_eleven_values`), the count-assertion test rewrites → +1 net test. Include in the § 8 pytest projection explicitly (`adds: +N, where N includes +1 for the allowlist-count-test`). Run AH-f: projected 1,174; actual 1,175 because Impl-initiative `test_allowed_message_types_is_frozenset_eleven_values` replaced the 10-value count assertion. Predictable +1 whenever an allowlist grows; don't project as 0 and explain away the delta in retro.
|
|
202
|
+
- [ ] **Pytest Q-gate paths are container-relative when pytest runs via `docker exec`.** When a Q-gate invokes pytest as `docker exec <stack>-backend-1 uv run pytest ...`, the `path` argument must be container-relative (`tests/api/test_X.py`), NOT host-relative (`backend/tests/api/test_X.py`). Pytest's rootdir inside the container is `/app`; the backend source is bind-mounted at `/app/...` (not `/app/backend/...`); a `backend/tests/...` path yields `0 items collected` with no error message. Run AH-e-bootstrap-backend Q2 shipped with the `backend/` prefix; Impl stumbled at first pytest invocation. Mechanical fix at spec-draft time: strip any `backend/` prefix from pytest path arguments in § 14 Q-gates and cross-check against `docker-compose.yml`'s bind-mount endpoint (the `Container-mount ENDPOINT asymmetry` rule above covers the same class for Path-based test reads; this rule is its pytest-CLI analog).
|
|
203
|
+
- [ ] **Cross-repo Q-gate path discipline.** For cross-repo sub-runs (AD-a/b/c precedent; TE-1 convo + moot), every Q-gate that reads from the filesystem must specify **which worktree** the path resolves against — at verify time, the feat tip lives on the Leader's feat worktree, NOT on the host repo's `main` branch. Default host-repo paths (`/workspaces/convo/team.toml`, `/workspaces/convo/mootup-io/moot/.../loop-6/team.toml`) point at whatever `main` currently is, which may be behind the feat branch by hours. Spec Q-gate format: `Q10: verify <path> exists — check against Leader's moot feat worktree at <feat-branch-SHA>, NOT host repo main.` OR annotate each cross-repo Q-gate with "check against feat worktree." Run TE-1's Q10/Q13 hit this: QA's first-pass verification targeted host-repo paths that hadn't yet been merged to main; caused a one-step detour to the feat worktree. Mechanical spec-draft-time fix.
|
|
204
|
+
- [ ] **"Must appear at least once" beats "must return exactly N hits"** for invariant-grep gates that anchor a behavioral property in source code. Run T's spec wrote "Q2: anchor string returns exactly 1 hit" but the actual intent was "anchor string must be present in launch.py" — the invariant has two intentional sites (cmd_exec docstring + _launch_role body) for the dual-anchor pattern (`inspect.getsource`-based static test + behavioral test). QA confirmed both as valid. Spec template wording fix: when the gate's purpose is "this anchor stays in the code," write `≥ 1` ("must appear at least once; multiple sites are intentional") rather than `= N`. Reserve `= N` for cases where exactly N is the contract (e.g., "exactly 4 agents in the kickoff mention list"). Q-check counts that allow for legitimate duplication don't drift QA into false alarms.
|
|
205
|
+
- [ ] **Scope in/out contradiction grep at grounding time.** Product kickoffs and docs accumulate silent drift between "what's in scope" and "what's out of scope" as design decisions evolve. At § 13 grounding, grep the kickoff text + linked product doc for explicit Scope (in) lists and Scope (out) lists, and diff them mentally: any file or subsystem that appears in both (or is implied by one and excluded by the other) is a contradiction that must be resolved in-draft per strongest-specific-wins and documented as a D-decision in § 9. Run Q had 1 contradiction (scaffold.py:22 URL default vs "no scaffold.py changes"); Run R had 3 (D-TOML, D-SHELL, D-PROVISION). Larger features drift more — the explicit grep prevents the contradictions from surfacing as Impl clarifications mid-run. See `feedback_scope_in_out_contradiction_grep.md`.
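The construction-site rule in this section (target the raise site, not the bare label) can be illustrated with Python's `re` module instead of shell grep. This is a sketch under stated assumptions: `SAMPLE_SOURCE` is invented text that is never executed, and the regex mirrors the checklist's template rather than any real Q-gate.

```python
import re

LABEL = "revoked_installation"

# Invented sample source (plain text, never executed): the label appears in a
# docstring, at the single raise site, and in a test assertion.
SAMPLE_SOURCE = '''
def require_active(installation):
    """Raises revoked_installation when the installation was revoked."""
    if installation.revoked:
        raise HTTPException(status_code=401, detail={"error": "revoked_installation"})


def test_revoked():
    assert response.json()["detail"]["error"] == "revoked_installation"
'''

# Label-only search: counts every mention, including prose and tests.
label_hits = len(re.findall(re.escape(LABEL), SAMPLE_SOURCE))

# Construction-site search: only the detail={"error": "<label>"} literal.
QUOTE = "[\"']"  # either quote character
construction_pattern = re.compile(
    r"detail=\{" + QUOTE + "error" + QUOTE + r":\s*" + QUOTE + re.escape(LABEL) + QUOTE
)
construction_hits = len(construction_pattern.findall(SAMPLE_SOURCE))

print(f"label hits: {label_hits}, construction-site hits: {construction_hits}")
```

A "raises from exactly one site" ceiling holds against the construction count but would be reported as violated against the label count, which is the false alarm the rule is meant to avoid.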
|
|
53
206
|
|
|
54
207
|
### Baselines (§ 14 gates)
|
|
55
|
-
- [ ] **Empty-diff shortcut first.** Run `git diff <prior_ship>..<feat_tip> --
|
|
56
|
-
- [ ] **Cross-repo first run: empty-diff shortcut does NOT apply.** When the first pipeline run in a new repo kicks off, there is no prior ship to inherit from. Always remeasure from scratch at the feat tip: run the repo's test command,
|
|
57
|
-
- [ ] **
|
|
58
|
-
- [ ] **
|
|
59
|
-
-
|
|
208
|
+
- [ ] **Empty-diff shortcut first.** Run `git diff <prior_ship>..<feat_tip> -- backend/ frontend/`. If empty, inherit the prior ship's gate matrix verbatim — no re-running pytest/pyright/arch/Vitest. 12+ consecutive runs have validated this through agent-connection-state. Skip to § 14 drafting. Only fall back to full re-measurement if the diff is non-empty or the run is a structural refactor where the diff-based check could miss a latent failure.
|
|
209
|
+
- [ ] **Cross-repo first run: empty-diff shortcut does NOT apply.** When the first pipeline run in a new repo (e.g. Run Q in `mootup-io/moot`) kicks off, there is no prior ship to inherit from and the convo-side baseline is irrelevant. Always remeasure from scratch at the feat tip: run the repo's test command, pyright (or ts equivalent), and any project-specific baseline commands. Explicitly enumerate any pre-existing failures in § 2 ("5 pre-existing `test_example.py` worktree-path failures") so QA doesn't false-alarm on the delta. Validated on Run Q / moot-cli-brand-login (first mootup-io/moot run). See `feedback_cross_repo_first_run_baseline.md`.
|
|
210
|
+
- [ ] **Ratio targeting with fan-out discounts the denominator.** When the feature shape is "N-site fan-out of identical transforms + M unique code sites," compute the spec-LOC ratio against M (unique sites) × typical-unique-site-complexity, not against the full code diff. Fan-out inflates code LOC without inflating spec risk: the spec cost is dominated by the unique sites and one copy of the transform rule. Run AC shipped 556 spec LOC for 374 code LOC (1.49:1, below the mechanical-lift floor) because 200 of those 374 LOC were a 5-template fan-out of the same transform. Against the remaining 174 unique-site LOC the ratio was ≈ 3.2:1, right on the mechanical-lift band once fan-out was discounted. Don't hedge the ratio projection with "actual LOC may land higher"; identify the fan-out up front, commit to the mechanical-lift classification, and project against unique sites only.
|
|
211
|
+
- [ ] **Classify sub-run character before picking a ratio band.** Four bands, picked by character, not by default:
|
|
212
|
+
- **Scaffolding / bootstrap: 1.5–2.5:1.** A sub-run whose deliverable is a reviewable skeleton (monorepo config, tsconfig, CI workflows, test-framework boilerplate, package.json layout) and whose paste-and-go blocks ARE the judgment work. The code surface doesn't decompose into unique-site complexity the way fan-out does; the spec's job is to justify the D-decisions that shape the skeleton. Run AD-a hit 1.59:1 (715 spec / ~450 code) with 13 D-decisions and 0 deviations — below the new-surface band, but correct for the character. Don't hedge a scaffolding ratio against the 4:1–6:1 band.
|
|
213
|
+
- **Fan-out (see rule above): discount the denominator by M unique sites.**
|
|
214
|
+
- **Mechanical-lift: 3:1.** Move code from A to B, rename, adjust imports, add structural invariants. The spec's job is to enumerate sites and gates, not to justify design.
|
|
215
|
+
- **Architecture / new-subsystem: 4:1–6:1.** New abstractions, new protocols, new surface-area grounding. The new-surface-area rule above.
|
|
216
|
+
Pick the character upfront in § 4 (or § 2 grounding) and note it explicitly — "character: scaffolding, target band 1.5–2.5:1." This tells Impl/QA what to expect and prevents the ratio-projection hedge that Run AC and Run AD-a both retro-flagged.
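
The band selection above can be sketched as a small lookup. This is only an illustration: `BANDS`, `spec_ratio`, and `check_ratio` are hypothetical names, the ±0.5 tolerance around the 3:1 mechanical-lift figure is an assumption made for the sketch, and fan-out is modelled simply by subtracting fan-out LOC from the denominator.

```python
# Hypothetical helper; names and the mechanical-lift tolerance are assumptions.
BANDS = {
    "scaffolding": (1.5, 2.5),
    "mechanical-lift": (2.5, 3.5),  # the rule states 3:1; the +/-0.5 window is assumed
    "architecture": (4.0, 6.0),
}


def spec_ratio(spec_loc: int, code_loc: int, fanout_loc: int = 0) -> float:
    """Spec-to-code LOC ratio, with fan-out LOC discounted from the denominator."""
    unique_loc = code_loc - fanout_loc
    if unique_loc <= 0:
        raise ValueError("unique code LOC must be positive")
    return spec_loc / unique_loc


def check_ratio(character: str, ratio: float) -> bool:
    """True when the ratio lands inside the band for the declared character."""
    low, high = BANDS[character]
    return low <= ratio <= high


# Run AD-a figures quoted above: 715 spec LOC against ~450 code LOC.
ad_a = spec_ratio(715, 450)
print(f"{ad_a:.2f}", check_ratio("scaffolding", ad_a), check_ratio("architecture", ad_a))
# prints: 1.59 True False
```

Declaring the character first and checking the ratio second mirrors the rule: the same 1.59:1 passes as scaffolding and fails as architecture.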
|
|
217
|
+
- [ ] **§ 8 pytest count formula.** When § 8 projects a pytest delta, use the explicit formula `baseline + N new test functions - M deleted test functions = new total`. **Additive assertions inside existing test functions do NOT add to the count** — they don't change the pytest passed-count at all. **Rewrites of existing test functions are 0 net** — replacing the body of `test_foo` with new logic doesn't add a new function; the count stays the same. Only `+ adds` (genuinely new `def test_*` lines) and `- drops` (test functions removed entirely) move the number. **Write the projection as three explicit inputs**, not a single hand-summed figure: `drops: −D, rewrites: ±0, adds: +N → net: +(N−D)`. **Deleted test functions must be enumerated, not implied.** Run P shipped 945 → 951 (+6), but § 8's headline claimed ≥ 953 because T7/T8 (additive assertions on existing tests) were incorrectly counted as +2. Run R projected 95 but landed 82 because 5 tests deleted by the scaffold.py rewrite (3 QA `cmd_init` tests + 2 `TestScaffoldIntegration` tests) were not subtracted from the projection. Run U projected 99 (96 + 4 − 1) because "rewrite 2" was double-counted as both rewrites AND adds; actual was 97 (drop 2 + add 3 = net +1). When a rewrite changes a function signature, list every doomed test in the spec body as "§ 7.X tests to delete" so the § 8 subtraction is explicit and Impl doesn't silently drop tests that § 8 still counted. Compute new-test count with `grep "def test_" <spec_file> | wc -l` for the new bodies, NOT a hand estimate. Converging signal from Run R + Run U retros (Spec, Impl, QA all flagged on at least one).
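
A minimal sketch of the three-input projection, assuming a hypothetical `CountProjection` helper (not part of any project tooling). Rewrites and additive assertions are recorded but deliberately contribute nothing to the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountProjection:
    """Hypothetical helper for the § 8 pytest-count projection."""

    baseline: int
    adds: int                     # genuinely new `def test_*` functions
    drops: int                    # test functions removed entirely
    rewrites: int = 0             # bodies replaced in place: always net 0
    additive_assertions: int = 0  # new asserts inside existing tests: always net 0

    @property
    def net(self) -> int:
        return self.adds - self.drops

    @property
    def total(self) -> int:
        return self.baseline + self.net

    def summary(self) -> str:
        return (
            f"drops: -{self.drops}, rewrites: ±0, adds: +{self.adds} "
            f"→ net: {self.net:+d} → total: {self.total}"
        )


# Run U figures quoted above: baseline 96, drop 2, add 3.
run_u = CountProjection(baseline=96, adds=3, drops=2, rewrites=2)
# Run P figures quoted above: 945 baseline, 6 new tests, T7/T8 additive only.
run_p = CountProjection(baseline=945, adds=6, drops=0, additive_assertions=2)

print(run_u.summary())  # drops: -2, rewrites: ±0, adds: +3 → net: +1 → total: 97
print(run_p.total)      # 951
```

Keeping `rewrites` and `additive_assertions` as explicit fields makes the "counts as zero" decision visible in the spec instead of leaving it to a hand sum.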
|
|
218
|
+
- [ ] **Full re-measure if diff is non-empty.** Run baseline commands at the current `feat/<slug>` tip — `pytest -n auto`, `pyright`, `npm run test:unit`, `grep -rn <sym>`. Never inherit counts from a prior spec or memory of what the count was last run. Inter-run merges (librarian passes, non-feature commits) drift the numbers silently. See `feedback_baseline_at_feat_tip.md`.
|
|
219
|
+
- [ ] **Verify which stack sits on feat-tip before trusting its numbers.** Each agent worktree has its own docker stack (`convo-impl-backend-1`, `convo-qa-backend-1`) and each stack's branch drifts independently between runs. Before running baseline commands against a stack, check `git -C /workspaces/convo/.worktrees/<role> branch --show-current` — if the worktree is on a stale branch (e.g. QA's worktree still on `qa/prior-feat` from a previous run), its baseline numbers come from a different feat and will mislead the spec. Run S caught this: QA's worktree was on `qa/svelte-brand-sweep`, so Spec pivoted to `convo-impl-backend-1` (correctly on feat-tip via the kickoff pull) for the baseline.
|
|
220
|
+
- [ ] **Pyright gate wording:** "0 errors on the QA container" — NOT "0 errors" unqualified. Impl container state drifts (rebuild cadence differs); the QA container is the canonical gate.
|
|
221
|
+
- [ ] **EXECUTE `black --check` at baseline § 2, don't just promise to include it.** Alongside `pytest -n auto` and `pyright .`, Spec must literally run `docker exec <container> uv run black --check <target-dirs>` (or the worktree-host equivalent) at § 2 draft time and paste the exit code + first ~10 lines of output into a BASELINE-FROZEN block. Recording "black is clean at baseline" without running the command is NOT sufficient: AG-a, AG-b, and AG-c each shipped with pre-existing black drift that surfaced during Impl's edit cycle, producing diff bulges (+181/-30, +608/-169, etc.) where Spec had projected ~120 LOC. The rule landed in AF synthesis, but the recurring friction shows Spec was encoding the rule without executing the command. When the baseline IS dirty, Spec must state its choice in § 2 or § 16: either (a) keep new code black-clean and leave pre-existing drift untouched, OR (b) treat cleanup as in-scope. This applies to every run, not just arch-class runs.
|
|
222
|
+
- [ ] **Scope baseline to refactor blast radius.** Re-establish baselines (pyright errors, test counts, grep invariants) at the same scope as the refactor's blast radius; don't inherit a baseline from a prior run that touched different directories. When the refactor enters a new directory, widen the baseline scope to include it. In past arch runs, baselines inherited from prior runs undercounted errors once the refactor moved into a new directory.
|
|
223
|
+
- [ ] **Paste literal output** — BASELINE-FROZEN blocks with the exact command and its output, mirroring the pattern in `feedback_arch_spec_baseline_freeze.md`.
|
|
60
224
|
- [ ] **Prefer "≤ N" over "= N"** when the residual count lives in suppressed / unrelated files (scripts/, auto-generated).
|
|
61
225
|
|
|
62
226
|
### § 13 Draft-time Command Execution (NON-NEGOTIABLE)
|
|
63
227
|
|
|
64
|
-
**Run these commands BEFORE writing § 5 / § 11 / § 14, not as a review pass.**
|
|
228
|
+
**Run these commands BEFORE writing § 5 / § 11 / § 14, not as a review pass.** Five consecutive runs (fix-update-status, platform-instability, admin-key-bootstrap, mentions-fallback-matcher, frictionless-onboarding) each produced 1–2 spec amendments once commands that had previously been copied from memory or hand-estimated were actually run. Reading commands is NOT the same as executing them. See `feedback_execute_commands_in_spec_review.md`.
|
|
65
229
|
|
|
66
|
-
**Position in the workflow:** §
|
|
230
|
+
**Position in the workflow:** §13 commands are *grounding*, not *review*. The frictionless-onboarding retro (2026-04-14) flagged that Spec's first-pass §11 guessed "`api_key_prefix` should already be on the Actor interface — verify before coding", which was wrong. The correct claim came out of running `grep -n api_key_prefix backend/core/models/models.py` (empty result) during the §13 review pass and forced a mid-draft D9 revision + §5 rewrite. Running §13 *first* would have made §11 correct on first write, saved ~5 min of re-threading, and kept D9 in the D-decision list from the start instead of being a mid-draft patch.
|
|
67
231
|
|
|
68
|
-
- [ ] **Execute §
|
|
232
|
+
- [ ] **Execute §13 commands at the START of spec drafting**, before writing §5/§11/§14. The output drives what goes into those sections, not the other way around.
|
|
69
233
|
- [ ] **Every quoted count** in § 11 / § 14 / § 6 is the literal output of an executed command, not a hand-estimate.
|
|
70
|
-
- [ ] **Every file path** referenced in Product's doc has been verified with `ls` or `test -e`. Product docs drift; phantom paths get caught at spec time, not at Impl's first test run.
|
|
71
|
-
- [ ] **Every Product-enumerated implementation step** has been verified against current code. Product
|
|
234
|
+
- [ ] **Every file path** referenced in Product's doc has been verified with `ls` or `test -e`. Product docs drift; phantom paths get caught at spec time, not at Impl's first test run. (admin-key-bootstrap Run B caught a phantom `l3-app/` path this way.)
|
|
235
|
+
- [ ] **Every Product-enumerated implementation step** has been verified against current code. The frictionless-onboarding Product doc described 4-of-5 steps as "to do" that had already shipped on main; the 5th was a latent bug, not the scope Product described. Grep/read each step before writing §5. See `feedback_verify_product_grounding_claims.md` — extended from "verify Product's grounding claims" to also cover "verify Product's to-do list hasn't been partially shipped."
|
|
236
|
+
- [ ] **Pyright on § 6/§ 7 drop-in test code at spec freeze, not just on source under test.** Run pyright (or equivalent type-checker) over the test snippets in § 6/§ 7 before SPEC-READY. Type errors in tests must be zero or the spec ships a known fix-up that costs Impl time. Run T (devcontainer-orchestration) had `list[dict[str, object]]` capture patterns in `test_cmd_exec_launch_full_flow` that produced 5 new pyright errors on subscript reads — pyright on the source files alone wouldn't have caught this since the test file existed only in the spec body. Cheap audit, prevents an Impl-time refactor. See `feedback_pyright_object_subscript_in_test_captures.md`.
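
A minimal sketch of the capture pattern behind this rule. `LaunchCall` and the sample values are hypothetical, not the Run T test code. The loosely typed capture runs fine at runtime, which is why only a type-checker pass over the test snippet surfaces the problem.

```python
from typing import TypedDict


class LaunchCall(TypedDict):
    """Hypothetical shape for one captured call in a test double."""

    cmd: list[str]
    cwd: str


# Loosely typed capture: valid at runtime, but every value is typed `object`.
loose_calls: list[dict[str, object]] = []
loose_calls.append({"cmd": ["docker", "exec"], "cwd": "/workspace"})
# A strict type checker rejects the next line, because `object` has no __getitem__:
#     first_arg = loose_calls[0]["cmd"][0]

# Typed capture: the same read is accepted because the value type is known.
typed_calls: list[LaunchCall] = []
typed_calls.append({"cmd": ["docker", "exec"], "cwd": "/workspace"})
first_arg = typed_calls[0]["cmd"][0]
print(first_arg)  # docker
```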
|
|
237
|
+
- [ ] **Dry-run any non-trivial embedded helper code in § 7 against sample target content.** When § 7 includes a test that contains parser logic — shell-flag tokenization, regex matching, multi-step string walking, AST walking, etc. — paste the helper into a Python REPL or a one-liner (`python -c "..."`) with a representative sample of the content it'll parse, and confirm the assertions evaluate as intended. Pyright catches type errors but not semantic bugs in handwritten parsers. Run V (post-create-fixes) shipped a `test_post_create_uses_strict_mode` helper that used `"-u" in stripped` to detect nounset in `set -euo pipefail`, which silently returned False because `-` is followed by `e` not `u`. ~30 sec of REPL would have surfaced it before SPEC-READY; instead it cost a closed-loop `message_type="question"` round-trip (cheap, but avoidable). See `feedback_shell_flag_substring_check_trap.md`.
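
The trap can be reproduced in a few lines. The sketch below contrasts the substring check with a token-based check; `has_nounset` is a hypothetical helper written for this illustration, not the Run V test code, and it handles only the simple `set ...` forms shown.

```python
def has_nounset(line: str) -> bool:
    """Return True if a shell `set` line enables nounset (-u or -o nounset)."""
    tokens = line.split()
    if not tokens or tokens[0] != "set":
        return False
    args = tokens[1:]
    for index, token in enumerate(args):
        # Only short-flag clusters such as -e, -euo, -o are inspected.
        if not token.startswith("-") or token.startswith("--"):
            continue
        flags = token[1:]
        if "u" in flags:
            return True
        next_arg = args[index + 1] if index + 1 < len(args) else ""
        if flags.endswith("o") and next_arg == "nounset":
            return True
    return False


line = "set -euo pipefail"
naive = "-u" in line         # substring check: "-" is followed by "e", not "u"
robust = has_nounset(line)   # token check: "u" appears inside the -euo cluster
print(naive, robust)         # False True
```

Thirty seconds in a REPL with the real target line is enough to see the `False`.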
|
|
238
|
+
- [ ] **Test-snippet patch targets must match the resolved namespace.** For every `monkeypatch.setattr(...)` in § 7, verify the patched name is where the function under test *resolves* the symbol, not where the symbol is *defined*. When module B does `from module_a import func` and the function under test lives in B and calls `func(...)`, the call resolves against B's own module globals, so `monkeypatch.setattr("module_b.func", ...)` takes effect and `monkeypatch.setattr("module_a.func", ...)` is a no-op for that call. The reverse holds when B does `import module_a` and calls `module_a.func(...)`, or when the function under test lives in `module_a` itself: then `module_a.func` is the name to patch. Run T spec had this wrong in 3 test bodies; Impl pivoted during incremental carve. See `feedback_cross_module_monkeypatch_indirection.md`.
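
A self-contained sketch of the re-imported-alias case, using in-memory modules so it runs without any project files. `module_a`, `module_b`, `target`, and `call_with_patch` are hypothetical names, and plain `setattr` stands in for `monkeypatch.setattr` so that pytest is not required.

```python
import sys
import types

# Build two throwaway modules in memory (illustrative only).
module_a = types.ModuleType("module_a")
exec("def func():\n    return 'real'\n", module_a.__dict__)
sys.modules["module_a"] = module_a

module_b = types.ModuleType("module_b")
exec(
    "from module_a import func\n"  # binds `func` in module_b's own namespace
    "def target():\n"
    "    return func()\n",         # resolved against module_b's globals
    module_b.__dict__,
)
sys.modules["module_b"] = module_b


def fake():
    return "patched"


def call_with_patch(module, name, replacement):
    """Temporarily replace module.<name>, call module_b.target(), then restore."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    try:
        return module_b.target()
    finally:
        setattr(module, name, original)


patched_at_definition = call_with_patch(module_a, "func", fake)
patched_at_use_site = call_with_patch(module_b, "func", fake)
print(patched_at_definition)  # real
print(patched_at_use_site)    # patched
```

Patching the defining module leaves `target()` untouched here because `module_b` already holds its own reference to the original function.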
|
|
239
|
+
- [ ] **Arch rename "verified absent" greps must cover three categories.** When the spec asserts a symbol or path has been removed (§ 6 verification gates), grep three categories independently — not one combined grep:
|
|
240
|
+
1. **Python imports:** `grep -rn "from .* import .*<old>\b\|^import .*<old>\b" backend/ scripts/ tests/` — catches `from x import old`, `import old`, `import x.old as alias`.
|
|
241
|
+
2. **String literals (mock.patch, importlib, getattr):** `grep -rnE "[\"'][^\"']*<old>[\"']" backend/ tests/` (catches `mock.patch("<module>.<old>")`, `importlib.import_module("<old>")`, `getattr(obj, "<old>")`; the `[^\"']*` prefix is what lets a dotted path ending in `<old>` match).
|
|
242
|
+
3. **Path / config references:** `grep -rn "<old>" docs/ Makefile *.toml *.yaml *.yml .devcontainer/ infra/` — catches docs, build configs, deploy scripts.
|
|
243
|
+
Recurring miss across R4, R5, R10. Spec the three grep commands literally in § 6; QA reproduces them at verification.
|
|
244
|
+
- [ ] **Symbol rename file-list discipline.** For an arch rename, the spec's § 6/§ 7 file list must come from `grep -rln <old_name>`, not hand-curation. For a Python rename, run `grep -rln <old> backend/ tests/`; for a Svelte prop rename, run TWO greps (callers in `*.svelte` files + uses inside the component file itself). Hand-curated lists miss callers.
|
|
245
|
+
- [ ] **Use `grep -F` (fixed-string) for verification gates whose target contains regex metacharacters.** When a § 7 verification gate greps for an anchor string that contains `.`, `^`, `$`, `*`, `(`, `[`, etc., and the gate's intent is "this exact string is present," use `grep -F` (or `rg -F`). A bare `grep "^memory audit:"` treats `^` as a line-start anchor, so it matches any line starting with "memory audit:" and does not match a line containing the literal text `^memory audit:`. Run Z caught this in Q12: harmless for that run, but a recurring class of typo in gate authoring.
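
The same distinction can be checked in Python, where `re.search` plays the role of a bare `grep` and the `in` operator (or `re.escape`) plays the role of `grep -F`. The sample lines are invented for the illustration.

```python
import re

lines = [
    "memory audit: 3 stale entries",
    "gate: grep '^memory audit:' notes.md",
]
pattern = "^memory audit:"

# Regex interpretation: `^` anchors to the start of the line.
regex_hits = [line for line in lines if re.search(pattern, line)]

# Fixed-string interpretation: the caret is an ordinary character.
literal_hits = [line for line in lines if pattern in line]

# Escaping the pattern makes the regex engine behave like the fixed-string search.
escaped_hits = [line for line in lines if re.search(re.escape(pattern), line)]

print(regex_hits)    # ['memory audit: 3 stale entries']
print(literal_hits)  # ["gate: grep '^memory audit:' notes.md"]
```

The two interpretations select disjoint lines from the same input, which is why the gate must state which one it means.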
|
|
246
|
+
- [ ] **Classmethod refactor: grep all construction sites.** When factoring inline Pydantic construction into a `from_row` (or similar) classmethod, grep every construction site first — `grep -rn "<ModelName>(" backend/`. Multiple code paths may feed differently-shaped rows, and a single classmethod that handles only the most-common shape silently breaks the others. Enumerate all sites in § 6.
|
|
247
|
+
- [ ] **Handler-deletion mass-test-migration: one regex-anchored rewrite, not N individual Edits.** When a D-decision deletes a handler (e.g., D-SF4-1-REVISED dropped the HTML-responding GET `/oauth/authorize` in favor of a new `/api/oauth/authorize-info` endpoint), its tests are likely to exist in N sites with a near-identical shape. Mass-migrating them via N individual Edit-tool calls is slow and error-prone. Better: anchor a regex rewrite on a distinctive multi-line pattern that captures the call shape without also matching unrelated routes — e.g., for Python tests, anchor on the `api_client.get(\n "/oauth/authorize"` shape, which won't match POST sites. Run AH-d Impl retro: 16 GET sites migrated in one mechanical pass without touching the 6 POST sites. Spec authoring guidance: when deleting a handler, count the affected test sites up front (§ 13 grounding command) and flag the anchor-pattern for Impl in § 16 touch-order; don't leave it as "migrate the tests" without a recipe.
|
|
248
|
+
- [ ] **Arch-invariant container-mount count asymmetry.** When an architecture invariant test reads files outside the container's bind-mount (typically via `Path(__file__).resolve().parents[N]` walking up the tree beyond `backend/`), the test skips inside the container but runs on the host. This produces an in-container pytest count one lower than the host's full-count for every such invariant test. Run AH-d's spec § 14 Q1 said 1,071 but actual in-container was 1,070 (one arch invariant skipped per § 6.15(a)). **Spec rule:** when § 14 names a pytest total count AND the run has arch-invariant tests that reach into host-only paths, state the in-container count separately — `pytest total in container = N` vs `pytest total on host = N + K` where `K` is the count of such skip-in-container invariants. Impl + QA then verify against the right number. Cheap to fix; prevents a spurious count-mismatch diagnostic at Impl-baseline and QA-verification time.
|
|
249
|
+
- [ ] **Operator CLI scripts: library / wrapper split.** When the spec adds an operator CLI script (a `scripts/<name>.py` invoked by hand or by ops automation), put production logic under `backend/core/<subdir>/` and ship a thin wrapper in `scripts/`. The wrapper is `from backend.core.<subdir> import main; main()` (or argparse + dispatch). Tests import the library via normal imports; no `sys.path` dance, no subprocess shell-out for unit coverage. Run T's `_validate_team_template` ate three days of import gymnastics by living directly under `scripts/`.
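
A single-file sketch of the split, with both halves shown together only so it runs standalone. `validate_team_name` and its checks are hypothetical; in the layout the rule describes, the two functions would live under `backend/core/<subdir>/` and the wrapper in `scripts/` would contain nothing but the import and the `main()` call.

```python
import argparse
from typing import Optional


# Library side: importable by tests without any sys.path manipulation.
def validate_team_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is valid."""
    problems = []
    if not name:
        problems.append("name is empty")
    if " " in name:
        problems.append("name contains spaces")
    return problems


def main(argv: Optional[list[str]] = None) -> int:
    """Argparse + dispatch entry point; returns a process exit code."""
    parser = argparse.ArgumentParser(description="Validate a team name (illustrative).")
    parser.add_argument("--name", required=True)
    args = parser.parse_args(argv)
    problems = validate_team_name(args.name)
    for problem in problems:
        print(f"error: {problem}")
    return 1 if problems else 0


# Wrapper side: the script would only import `main` and exit with its return value.
ok_code = main(["--name", "loop-6"])
bad_code = main(["--name", "bad name"])
print(ok_code, bad_code)  # 0 1
```

Because `main` accepts `argv`, unit tests can drive the CLI path in-process instead of shelling out.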
|
|
250
|
+
- [ ] **"If platform-behavior-X, do Y; else Z" hedges in D-decisions resolve at § 13 grounding, not in Impl.** When a D-decision contains a branch keyed on platform/framework/library behavior — "if FastAPI auto-emits exception-handler schemas, the refs land for free; else manual `responses=` kwargs," "if SvelteKit's adapter-node serves `+page.server.ts` for routeless pages, redirect works; else use `+server.ts`," "if asyncpg coerces strings, no `datetime.fromisoformat()` needed; else convert" — **Spec runs the grounding probe at draft time and picks A or B explicitly**. Do NOT defer the branch to Impl. Impl will reasonably pick the simpler path, and if the platform doesn't behave that way, the defect is silent until QA's structural invariant test fails to bite (or until production exercises the wrong branch). Probe cost is typically 2 minutes (`docker exec <impl-stack> uv run python -c "from <framework> import <thing>; ..."`); cost of NOT running it is one QA re-verification cycle (best case) or a latent production defect (worst case). Run B-1's D-OAS-4xx-5xx-EXPANSION hedged FastAPI auto-emission of custom exception handler schemas; Impl assumed auto-emit; Q15 OAS drift-inject couldn't fire because `ErrorEnvelope` wasn't in the OAS. Run AH-d's similar pattern on `+page.server.ts` defaulting cost a Playwright redirect-existence test repair. Pairs with the framework-pattern deployment-mode rule below — both target uncertainty-bleed from Spec into Impl. Generalization template: any D-decision with the words "if … then … else …" or "depending on whether" referencing platform behavior is a grounding-probe candidate at § 13 Phase A.
|
|
251
|
+
- [ ] **FastAPI custom exception handlers do NOT appear in `app.openapi()` auto-generation.** When a spec declares an envelope-shaped error schema (`ErrorEnvelope`-style, structured `{message, request_id, ...}` for 4xx/5xx) as a structural invariant — "schema X present in `components.schemas`; every Y route `$ref`s it" — the schema does NOT land in the OAS unless it's wired explicitly. Two patterns work: (a) per-route `responses={"4XX": {"model": ErrorEnvelope}, "5XX": {"model": ErrorEnvelope}}` kwargs on each route handler — fine for ≤ 5 routes, verbose past that; (b) post-process the OAS dict after `app.openapi()` returns — single integration point in `backend/api/openapi_gen.py`, iterate `app.routes`, inject the schema into `components.schemas` and `$ref` blocks into each non-RFC-preserved route. Choose (b) when the route count is high. Spec § 6 MUST commit to one path explicitly; do not trust auto-emission. Spec § 13 grounding probe: `docker exec <impl-stack> uv run python -c "from api.app import app; import json; oas=app.openapi(); print('ErrorEnvelope' in oas.get('components', {}).get('schemas', {}))"` — prints `False` on stock FastAPI, confirming the post-process or per-route wiring is mandatory. Run B-1 hit this: spec hedged auto-emission, Impl reasonably assumed it, QA caught the empty schema set. Pattern reusable: the ErrorEnvelope post-process in `openapi_gen.py` is the project's canonical injection site for any future structural-error-shape invariant; new RFC-preserved routes are added to `_RFC_PRESERVED_ROUTES`.
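
Pattern (b) can be sketched against a plain OpenAPI dict, with no FastAPI import, so the logic is easy to dry-run. This is an assumption-laden illustration rather than the project's `openapi_gen.py`: `inject_error_envelope`, the schema fields, and the sample paths are hypothetical, and the sketch walks the `paths` mapping of the generated document instead of iterating `app.routes`.

```python
import copy
from typing import Any

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

# Hypothetical envelope schema; field names follow the {message, request_id} shape above.
ERROR_ENVELOPE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"message": {"type": "string"}, "request_id": {"type": "string"}},
    "required": ["message"],
}

ERROR_RESPONSE: dict[str, Any] = {
    "description": "Error envelope",
    "content": {
        "application/json": {"schema": {"$ref": "#/components/schemas/ErrorEnvelope"}}
    },
}


def inject_error_envelope(oas: dict[str, Any], preserved_paths: set[str]) -> dict[str, Any]:
    """Add the schema to components.schemas and $ref it from each non-preserved route."""
    schemas = oas.setdefault("components", {}).setdefault("schemas", {})
    schemas.setdefault("ErrorEnvelope", copy.deepcopy(ERROR_ENVELOPE_SCHEMA))
    for path, operations in oas.get("paths", {}).items():
        if path in preserved_paths:
            continue  # RFC-preserved routes keep their existing error shape
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            responses = operation.setdefault("responses", {})
            for status in ("4XX", "5XX"):
                responses.setdefault(status, copy.deepcopy(ERROR_RESPONSE))
    return oas


sample = {
    "paths": {
        "/api/things": {"get": {"responses": {"200": {"description": "OK"}}}},
        "/oauth/token": {"post": {"responses": {"200": {"description": "OK"}}}},
    }
}
result = inject_error_envelope(sample, preserved_paths={"/oauth/token"})
print("ErrorEnvelope" in result["components"]["schemas"])  # True
```

A single integration point like this keeps the "schema present" and "every route refs it" halves of the invariant in one place.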
|
|
252
|
+
- [ ] **OAS structural invariants need their drift-inject Q-gate functional in the same PR.** When § 8 declares "schema X present in OAS; every Y route `$ref`s it," § 14's Q-gate that asserts the invariant via drift-inject (e.g., "comment out `class X` → drift test FAILS") only bites if X is ALREADY in the regenerated OAS. If § 6 doesn't include the OAS-injection mechanism (post-process call, per-route kwargs, etc.) as part of the same source change, the drift-inject test passes silently because there's no ref to lose. Spec rule: § 7 invariant-test list AND § 14 drift-inject Q-gate must reference the § 6 source block that performs the injection — and § 14 must include a "schema present in OAS pre-inject" precondition step ("`grep ErrorEnvelope docs/api/openapi.yaml | wc -l` → > 0 before drift-inject runs"). Run B-1's Q15 was non-functional on first ship pass; the schema wasn't in the OAS and the drift-inject couldn't observe its absence. Generalizes to any structural invariant where a generated artifact is the proxy for source state — OAS, generated TypeScript types, JSON Schema dumps, fixture graphs. The Q-gate asserts the invariant CAN regress; it MUST also assert the invariant currently HOLDS.
|
|
253
|
+
- [ ] **Response-shape changes: grep existing test assertions before projecting `rewrites: 0`.** When a sub-run changes the response shape on a route family, whether by wrapping (`{x}` → `{detail: {x}}`), renaming (`detail` → `message`), enveloping, or key insertion, Spec MUST run `grep -rnE 'r\.json\(\)\["<old-key>"\]' backend/tests/` (or the equivalent for the affected key) at § 13 grounding and enumerate every assertion that breaks. The `-E` flag matters: in grep's default basic-regex mode `\(\)` is an empty group, not a literal pair of parentheses. Paste the enumeration into § 7.3 as a concrete rewrite list with file:line counts. No more "rewrites: 0" projections on wire-shape changes. Run B-1's `ErrorEnvelope` migration changed `r.json()["detail"]` → `r.json()["message"]` for ~22 non-RFC-preserved routes; Impl absorbed cleanly but the count was unaccounted, fed an inaccurate spec § 7.3 into Impl's pre-draft, and required mid-run scope clarification. Generalization: any time spec changes the wire shape on a family of routes, the grep is mandatory at draft time; the projected rewrite count is part of § 7.3's accounting. Pairs with the existing "wire-format preservation vs compliance" disambiguation rule.
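
The enumeration step can be dry-run in Python against sample content before trusting the grep. In this sketch `enumerate_rewrites` and the two sample files are hypothetical; the regex targets the `detail` key from the Run B-1 example.

```python
import re

# Matches assertions that read the old key off the JSON body.
OLD_KEY_ASSERTION = re.compile(r'r\.json\(\)\["detail"\]')


def enumerate_rewrites(sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line number) for every line that reads the old key."""
    hits = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_KEY_ASSERTION.search(line):
                hits.append((path, lineno))
    return hits


# Invented file contents standing in for backend/tests/.
sample_sources = {
    "backend/tests/test_a.py": 'r = client.get("/x")\nassert r.json()["detail"] == "nope"\n',
    "backend/tests/test_b.py": 'assert r.json()["message"] == "ok"\n',
}
rewrites = enumerate_rewrites(sample_sources)
print(rewrites)       # [('backend/tests/test_a.py', 2)]
print(len(rewrites))  # 1
```

The length of the list is the `rewrites` input for § 7.3, and the pairs are the file:line entries to paste.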
|
|
254
|
+
- [ ] **Wire-format preservation vs compliance disambiguation in spec text.** When § 6 / D-decision text contains "as-is," "preserved," "unchanged," or "matches existing," qualify WHAT'S preserved: the RFC-ideal shape OR the prior-shipped wire format. Run B-1's D-RFC-EXCEPTION-BYPASS spec text said `return JSONResponse(content=exc.detail, status_code=exc.status_code)` with the prose comment "returns the detail as-is." Impl correctly read this as "flat RFC dict" (since `exc.detail` is the raw dict), but the AH-a/b/c/d-shipped wire format actually wraps it (`{"detail": {...}}`). The ambiguity surfaced as a 5-min mid-run judgment-call clarification (which Spec ratified as "preserve AH-a/b/c/d-shipped wrapped form"). Cheap fix at spec-draft time: write `return JSONResponse(content={"detail": exc.detail}, ...)` AND add a comment explaining the deliberate wrap-preservation choice ("preserves AH-a/b/c/d FastAPI-wrapped form for client back-compat; strict-RFC-flat is a latent F-finding for a future sub-run"). Generalization: when a D-decision's intent is "preserve the prior wire format" AND the prior wire format differs from the protocol's ideal, BOTH the source code AND the prose MUST name the divergence explicitly. Otherwise Impl reasonably picks the protocol-ideal interpretation and ships a regression.
|
|
255
|
+
- [ ] **Existing-body audit at § 13 grounding when "# ... existing body ..." or "existing body unchanged".** When spec uses ellipsis-or-equivalent phrasing for "the existing tool body unchanged" (or "existing handler unchanged" / "existing class unchanged"), ground the existing code's state assumptions against the new surface. Read at minimum one exemplar body per source file. Specifically for ASGI-mount / framework-integration / decorator-pattern sub-runs where a pre-existing surface gets mounted on a new dispatch model: check `self.<attr>` dispatch-time state — does the existing code assume single-tenant per-instance state that won't hold under the new dispatch's multi-tenant model? Run AH-e: spec § 6.5 said "tools' existing bodies unchanged" while mounting them behind a multi-tenant `/mcp` route; the existing `MCPSpaceAdapter`'s `agent_id` / `api_key` / `_space_id` per-instance state was hidden under the ellipsis and broke under the multi-tenant mount. ~30-second probe at spec time would have surfaced it. Symmetric to the "function-API smoke" rule above (which probes the NEW API surface) — this rule probes the EXISTING code being mounted BEHIND that new surface. Both sides of the integration seam need grounding.
|
|
256
|
+
- [ ] **Spec grep invariants must match Impl's actual enforcement pattern.** When spec § 8 declares a grep-based structural invariant (`grep -rn "require_scope" backend/adapters/mcp_tools/` ≥ N), Impl may legitimately introduce a wrapper or adapter that subsumes the spec's literal pattern (`_check_session_scope("...")` calls `require_scope` internally). When that happens, the spec's literal grep returns 0 hits (or the wrong count), and QA can't verify the invariant via the spec's command. Two fixes: (a) Spec writes the grep invariant as `require_scope|<expected_wrapper>` to accept legitimate wrapping; (b) when Impl introduces a wrapper, Impl updates the spec's grep invariant in the same commit (or flags it for Spec's amendment). Better: spec asserts on the intended SEMANTIC (e.g., "every tool body gates via `require_scope` directly OR via a wrapper that calls it"; concrete grep accepts both) rather than the literal call. Run AH-e: spec wanted `require_scope` grep ≥ 38; Impl used `_check_session_scope` wrapper for stdio-vs-HTTP dual-mode; QA had to reason about intent rather than asserting the literal grep.
|
|
257
|
+
- [ ] **Tool-count grep discipline at § 13 grounding.** When a D-decision enumerates "N items in surface X" (40 MCP tools, 12 message types, 35 routes, etc.), the count MUST be verified via direct grep at spec time, not trusted from subagent inventory or memory. 30 seconds of mechanical work. Concrete grep templates: for MCP tool surface, `grep -rn '@adapter\.mcp\.tool\(\)\|@mcp\.tool\(\)' backend/adapters/mcp_tools/ | wc -l`; for FastAPI route surface, `grep -rn '@router\.\(get\|post\|put\|patch\|delete\)' backend/api/routes/ | wc -l`; for message-type allowlist, `grep -c '"' backend/core/models/message_type.py`. Run AH-e: spec D-TOOL-SCOPE-TABLE listed `list_participants ×2` (memory error — only in `spaces.py`) and missed 5 tools entirely (`get_space_status`, `get_transcript`, `get_thread`, `orientation`, `wait_for_health`); net spec said 40, actual 35. Caught at parametrize matrix; would have been caught at spec time with one grep. Pairs with the existing "subagent results need consistency review before commit" rule — applies to enumerations as much as transforms.
|
|
258
|
+
- [ ] **Vitest must run inside the container when a frontend runtime dep is added.** When a sub-run adds a new frontend runtime dependency (`@sentry/sveltekit`, a Svelte component library, anything that imports at module-load time), the host-side `node_modules/` may not have the dep installed (e.g., the QA stack's `npm install` happens inside the container). Vitest run from the host filesystem fails to resolve the import even though the production build is fine. Rule: after adding a frontend runtime dep, always run Vitest via `docker exec <stack>-frontend-1 npx vitest run` — NOT the host-side `npx vitest`. Spec § 14 Q-gate for Vitest must specify the container-exec form when a new dep is in the diff. Run B-1's QA hit this on `@sentry/sveltekit` — host node_modules empty, container vitest worked immediately. Pairs with the existing "container bind-mount" + "Vitest first-time three-config-wires" rules.
|
|
259
|
+
- [ ] **Framework-pattern deployment-mode verification at grounding.** Generalization of the function-API smoke rule (below) to framework file-types, lifecycle hooks, and API patterns: when spec names a specific file/hook/pattern from a framework with distinct dev-vs-production runtimes — SvelteKit (`+page.svelte` / `+page.server.ts` / `+server.ts` / `+layout.*`), FastAPI (`response_class=` / `response_model=` / dependency overrides / lifespan events), Vite build-output expectations, pytest fixture scopes, etc. — **verify the pattern works under the target deployment mode, not just the dev runtime**. The AH-a (authlib API shape) / AH-b (NFC vs NFKC) / AH-c (timing-defense completeness) / AH-d (`+page.server.ts` production-404) defects all share a root cause: *named a specific pattern without verifying it works in the specific runtime it'll run under*. Pragmatic grounding check template: (a) for framework file-types, run the framework's production build locally and hit the route; (b) for lifecycle hooks, trigger the hook in the production-equivalent environment; (c) for API patterns, generate the downstream artifact (OAS, TypeScript types, fixture graph) and inspect. The function-API smoke rule is the library-call flavor of this same class; use both together.
|
|
260
|
+
- [ ] **Q-gate-to-source-block cross-reference audit.** Before SPEC-READY, trace every § 14 Q-gate's pass criterion to either (a) the specific Required test that exercises it, OR (b) the specific § 6 source block that implements the asserted behavior. If they disagree — Q-gate says "→ 200" but the test asserts 400, or § 6 source returns a different shape — **fix one or the other** before SPEC-READY. Run AH-d shipped with Q20 asserting "cross-origin session-auth revoke → 200" while D-SESSION-REVOKE-BRANCH-ORDER + P6 test correctly asserted 400. Not a security bug (the invariant held), but a spec-doc defect QA correctly flagged. Root cause: Q-gates written as "feels-like-what-should-happen" rather than traced to the test-or-source pair that verifies them. Audit pattern: for each Q-gate, grep the spec for the referenced test name or source pattern; diff the asserted value; resolve the disagreement. Extends the existing "no half-drafts in § 6/§ 7" rule into § 14 gate wording.
|
|
261
|
+
- [ ] **Skill-agent grounding for framework-behavior questions.** When § 13 grounding needs a definitive answer about framework behavior at a higher level than a single function signature — "does Claude Code discover `.claude/playbooks/` alongside `.claude/skills/`?", "does FastAPI middleware ordering affect X?", "does SvelteKit's adapter-node serve `+page.server.ts` without a `+page.svelte` sibling?" — delegate the probe to the relevant specialized agent (`claude-code-guide` for Claude Code features / hooks / skills / settings; `general-purpose` with the framework's docs loaded for FastAPI / SvelteKit / etc.) rather than inferring from codebase reads or memory. The skill-agent returns authoritative answers faster and with higher confidence than "I think the framework does X based on this grep." Complements the function-API smoke rule (which probes library-level signatures at pinned versions) with higher-level framework-behavior probes. Run TE-1: § 13 A.1 delegated "does Claude Code walk `.claude/playbooks/`?" to `claude-code-guide`; returned definitive "ONLY walks `.claude/skills/`; no settings.json workaround; symlinks not viable" — forced carve-conditional resolution at grounding time rather than mid-Impl surprise. Pairs with: function-API smoke (library-level) + existing-body audit (existing code) + framework-pattern deployment-mode verification (runtime-mode behavior). Four probe types, same spirit: resolve uncertainty BEFORE SPEC-READY, not during Impl.
|
|
262
|
+
- [ ] **Function-API smoke at grounding — library AND stdlib, signature AND semantics AND dispatch-side attribute paths AND activation predicates.** When § 4 / § 6 names a specific function/class/method by qualified path — third-party (`authlib.oauth2.rfc7636.CodeChallenge.is_valid_challenge(...)`, `fastapi.FastAPI.mount(...)`, `pydantic.BaseModel.model_validate_json(...)`, `sentry_sdk.init(...)`, `mcp.server.fastmcp.FastMCP(...)`) OR stdlib (`unicodedata.normalize("NFC", ...)`, `hashlib.sha256(...)`, `base64.urlsafe_b64decode(...)`, `re.match(...)`, `urllib.parse.urlparse(...)`, `secrets.token_urlsafe(...)`, `hmac.compare_digest(...)`, `json.loads(...)`) — at the version pinned in the lock file (or the stdlib shipped with the container's Python), run `docker exec <impl-stack> uv run python -c "..."` during § 13 grounding. Verify FOUR things: (a) **import AND signature** — the symbol exists and the spec-named kwargs match the pinned-version signature, read via `inspect.signature(<symbol>)` or `help(<symbol>)` (NOT `print(<symbol>)`, which only shows the repr — the kwarg names are what matters). Example: `python -c "import inspect, sentry_sdk; print(inspect.signature(sentry_sdk.init))"` reveals whether the kwarg is `with_locals` (1.x) or `include_local_variables` (2.x); (b) **exact semantics against a sample input that a test depends on** (`python -c "import unicodedata; print(unicodedata.normalize('NFC', '𝗺𝗼𝗼𝘁𝘂𝗽'))"` — verifies whether the chosen form actually collapses math-script homographs); (c) **dispatch-side attribute paths** — when spec asserts an attribute access path on a library-typed object (`ctx.request_context.session_context.actor` from MCP 1.27.0's Context), probe the attribute existence at the pinned version via `inspect.getmembers(<TypeName>)` or by reading the SDK's source for the type. 
Run AH-e: spec § 6.5 amended path didn't exist at MCP 1.27.0; Impl probed Context source + BearerAuthBackend.authenticate to find the actual contextvar pattern (`get_access_token()` from `auth_context_var`); (d) **activation predicates between hook points** — when wiring N hook points on the same library object (`FastMCP(token_verifier=..., event_store=..., auth=...)`), probe whether one gates another. Run AH-e: `FastMCP(token_verifier=...)` alone wraps `RequireAuthMiddleware` (scope check) but does NOT install `AuthenticationMiddleware` + `AuthContextMiddleware` — those gate on `if self.settings.auth:` requiring `auth=AuthSettings(...)` to also be set. Read the conditional code in the library that activates each hook; the activation chain must match the spec's intent. If any of (a)-(d) fails, pivot BEFORE SPEC-READY, not mid-Impl. Run AH-a specced `CodeChallenge.is_valid_challenge(verifier, challenge, method)` against authlib 1.7 where that call shape exists only on `AuthorizationServer` (a D-decision forbade using) — Impl pivoted to stdlib PKCE in 3 LOC but `authlib>=1.3` shipped as dead weight. Run AH-b specced `unicodedata.normalize("NFC", ...)` for homograph rejection, but **NFC does not collapse math-script bold** — only NFKC (compatibility decomposition) does. Impl pivoted to NFKC; T7 test verified. Run B-1 specced `sentry_sdk.init(with_locals=False)` from sentry-sdk 1.x memory; sentry-sdk 2.x renamed the kwarg to `include_local_variables` — Impl caught at grounding via the AH-a smoke rule, but the rule's "print(symbol)" wording wouldn't have surfaced the rename. Three defects share a class: the spec named a function without exercising its exact behavior at the pinned version. Rule scope: spec-named function + test-assertion-depends-on-its-output (or kwarg-depends-on-version) ⇒ mandatory grounding check of signature AND behavior on the test's sample input. 
Stdlib is not exempt; `unicodedata` / `hashlib` / `base64` / `re` / `urllib.parse` semantics are as fiddly as third-party surface. Parallels AG-b's F-ACTOR-CREDENTIALS-API-KEY-EXPOSURE trap.
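
The (a) signature and (b) semantics probes from the function-API smoke rule can be sketched with stdlib-only calls. This is a minimal illustration, assuming Python 3.9+; `json.loads` stands in for whichever spec-named symbol is being probed, and the helper names `kwarg_names` and `math_bold` are hypothetical. A real probe would run inside the impl-stack container against the pinned version.

```python
import inspect
import json
import unicodedata


def kwarg_names(func) -> list[str]:
    """Return parameter names from the live signature (not the repr)."""
    return list(inspect.signature(func).parameters)


def math_bold(word: str) -> str:
    """Map ASCII lowercase letters to Mathematical Sans-Serif Bold (U+1D5EE onward)."""
    return "".join(chr(0x1D5EE + ord(c) - ord("a")) for c in word)


# (a) Signature probe: confirm a spec-named kwarg exists at this version.
params = kwarg_names(json.loads)
print("object_hook" in params)

# (b) Semantics probe on the sample input a test would depend on.
homograph = math_bold("mootup")
nfc = unicodedata.normalize("NFC", homograph)
nfkc = unicodedata.normalize("NFKC", homograph)
print(nfc == "mootup", nfkc == "mootup")
```

Reading parameter names from `inspect.signature` is what surfaces a renamed kwarg; comparing NFC and NFKC on the actual homograph input is what would have caught the AH-b defect before SPEC-READY.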
|
|
263
|
+
|
|
264
|
+
### Env Propagation Trace (infra-touching specs)
|
|
265
|
+
|
|
266
|
+
When a spec mutates env vars or adds runtime dependencies, trace the full propagation path through every consuming layer. Run S (backend-config-from-aws) traced env vars only as far as `config.py` and `user-data.sh.tftpl`, missing 5 downstream layers — Pat hit each one as a separate alpha-deploy failure (compose env empty, uv.lock not regenerated → boto3 missing, AWS_DEFAULT_REGION not set, IMDS hop limit, KMS RequestAlias condition). The Librarian's flag: "specs that ship env-var changes should trace the full env propagation path."
|
|
267
|
+
|
|
268
|
+
- [ ] **For every new env var or mutated env var:** enumerate every layer that reads it. Layers to check (omit any that don't apply):
|
|
269
|
+
- Application code (`backend/`, `frontend/`)
|
|
270
|
+
- `docker-compose*.yml` `environment:` blocks for every relevant service — **these are a distinct layer from `user-data.sh.tftpl`, do not bundle them.** The tftpl writes the env var into the host shell environment; the compose `environment:` block decides whether the var crosses into the container. A spec that only updates the tftpl ships inert because the backend container never sees the var. Run Y caught this as F1 only because § 13 Phase D explicitly grepped `docker-compose.mootup.yml`'s `environment:` block.
|
|
271
|
+
- `Dockerfile` `ENV` directives and any `ARG` that flows into them
|
|
272
|
+
- `scripts/deploy-mootup.sh` and other deploy-time scripts (rsync, ssh-exec, env file generation)
|
|
273
|
+
- `infra/` user-data scripts and `*.tftpl` templates
|
|
274
|
+
- EC2 instance metadata access (IMDS hop limit, instance profile permissions)
|
|
275
|
+
- IAM role policies (KMS conditions, Secrets Manager scoping, SSM path scoping)
|
|
276
|
+
- Local-dev fallback paths (devcontainer.json, `.env.example`, host-only env mechanisms)
|
|
277
|
+
- **`config.py` empty-string handling.** When a new env var is introduced via `${VAR:-}` (default-to-empty) at the shell layer and consumed by `int(os.getenv("VAR", "0"))` in `config.py`, an empty string crashes the `int()` cast (Python rejects `int("")`). Pattern fix: `int(os.getenv("VAR") or "0")`. Spec the smoke test: `VAR= python -c "from backend.core.config import config"` should not raise. Run Y caught this as F2.
|
|
278
|
+
Each layer must either consume the new var correctly or be explicitly noted as out of scope with a reason.
|
|
279
|
+
- [ ] **For every new Python runtime dep added to `pyproject.toml`:** confirm `uv.lock` is regenerated AND committed. The production Dockerfile uses `uv sync --frozen` — without a regenerated lock, the new dep is silently absent from the production image even though local dev (which uses `--group test` against pyproject.toml) works fine. Run S shipped `boto3` in pyproject without a uv.lock bump; alpha hit `ModuleNotFoundError: boto3` until Pat regenerated.
|
|
280
|
+
- [ ] **`uv.lock` regens that bump formatting/linting tools must include the paired `black .` (or `ruff format .`) reformat in the same commit.** When `uv lock` bumps a formatter version (black 25 → 26.x, ruff N → N+1), the formatter's output may change (argument wrapping, trailing comma rules, string quoting conventions). Tests still pass, but `black --check` (Q6) against the new version flags pre-existing files that were clean under the old version. Unpaired bumps leave a drift landmine for the NEXT sub-run's Impl: their first `black --check` as part of Q6 reports N files needing formatting (unrelated to their work) and either blows their scope (reformat-to-satisfy-gate, bulging the diff by hundreds of LOC) OR forces mid-run scope discussion. Worse, black's arg-wrapping rewrites can silently drop `# type: ignore` comments out of their intended target lines, producing pyright errors that need per-line suppression restoration. **Rule:** any commit that regenerates `uv.lock` AND bumps a formatter version must include `uv run black .` (or `uv run ruff format .`) in the same commit. If that reformat touches N files, document in commit message ("70 pre-existing files reformatted for black 26.3.1"). Applies symmetrically for npm/yarn lock-file regens that bump prettier/eslint. Run TE-1 Impl absorbed 70 files of pre-existing drift + restored 12 dropped type-ignores because `66cdb9d` (uv.lock regen) shipped unpaired.
|
|
281
|
+
- [ ] **For IAM role policies that grant access via KMS-encrypted resources:** never use `kms:RequestAlias` conditions when the consumer is Secrets Manager, RDS, or another AWS service that calls KMS on the caller's behalf. Those services pass key ARNs, not aliases, so the condition evaluates false and the Allow doesn't match. Resource-list the specific KMS key ARNs directly. Run S's L4-platform `app_kms` policy used `kms:RequestAlias = "alias/mootup/*"`, which silently failed under Secrets Manager `GetSecretValue`. See commit `f1227a1` for the fix shape.
|
|
282
|
+
- [ ] **For Docker containers on EC2 with instance profiles:** confirm `http_put_response_hop_limit = 2` in the launch template's `metadata_options`. Default is 1, which fails the Docker bridge's extra hop, so boto3 inside containers can't reach IMDS for credentials/region.
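
The `config.py` empty-string trap from the env-var layer list above can be reproduced in a few lines. This is a sketch only: `read_int_env`, `read_int_env_fragile`, and the variable name `DEMO_LIMIT` are hypothetical and not taken from the real `config.py`.

```python
import os


def read_int_env_fragile(name: str) -> int:
    """The failing pattern: the "0" default applies only when the var is unset."""
    return int(os.getenv(name, "0"))


def read_int_env(name: str, default: int = 0) -> int:
    """Treat both unset (None) and empty-string values as `default`."""
    raw = os.getenv(name)
    # `raw or default` covers None and "" (the value produced by `${VAR:-}`).
    return int(raw or default)


os.environ["DEMO_LIMIT"] = ""  # simulates `${DEMO_LIMIT:-}` at the shell layer
try:
    read_int_env_fragile("DEMO_LIMIT")
    fragile_raised = False
except ValueError:
    fragile_raised = True

safe_value = read_int_env("DEMO_LIMIT")
print(fragile_raised, safe_value)
```

The `or` form is preferred here because a default passed to `os.getenv` is consulted only when the variable is missing, not when it is present but empty.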
|
|
72
283
|
|
|
73
284
|
### § 11 Surprises for Impl — Missing-Imports Audit
|
|
74
285
|
For every new symbol referenced in § 5 code snippets (function calls, exception catches, type annotations), grep the target file for the import:
|
|
@@ -77,23 +288,274 @@ For every new symbol referenced in § 5 code snippets (function calls, exception
|
|
|
77
288
|
grep -n "^import <sym>\|^from [^ ]* import.*<sym>" <target-file>
|
|
78
289
|
```
|
|
79
290
|
|
|
80
|
-
If the grep returns empty, add a § 11 line: "**Missing import** — `<sym>` is not imported in `<target-file>`. Add `import <sym>` (or `from <module> import <sym>`) to the top of the file."
|
|
291
|
+
If the grep returns empty, add a § 11 line: "**Missing import** — `<sym>` is not imported in `<target-file>`. Add `import <sym>` (or `from <module> import <sym>`) to the top of the file." Caught `json` on fix-update-status and `asyncio` on platform-instability — 2 consecutive runs. See `feedback_missing_imports_audit_in_spec_11.md`.
|
|
81
292
|
|
|
82
293
|
Common culprits:
|
|
83
294
|
- [ ] Stdlib modules newly introduced by § 5: `json`, `asyncio`, `re`, `uuid`, `time`, `os`, `sys`.
|
|
84
|
-
- [ ] New exception types in `except` clauses.
|
|
295
|
+
- [ ] New exception types in `except` clauses: `httpx.ReadError`, `asyncpg.ForeignKeyViolationError`, etc.
|
|
85
296
|
- [ ] New typing imports: `Any`, `Callable`, `Awaitable`.
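
As a complement to the grep above, the missing-imports audit can be approximated with the stdlib `ast` module. This is a hedged sketch, not the project's tooling: `imported_names` and `missing_imports` are hypothetical helpers, and unlike the `^import` grep they also count imports nested inside functions.

```python
import ast


def imported_names(source: str) -> set[str]:
    """Collect every name bound by an import statement anywhere in `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b.c` binds `a`; `import a.b as x` binds `x`.
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                names.add(alias.asname or alias.name)
    return names


def missing_imports(source: str, symbols: list[str]) -> list[str]:
    """Return the spec-referenced symbols that the target file does not import."""
    bound = imported_names(source)
    return sorted(sym for sym in symbols if sym not in bound)


target = "import os\nfrom typing import Any\n"
print(missing_imports(target, ["json", "os", "Any", "asyncio"]))
```

Each name returned would become one "Missing import" line in § 11.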
|
|
86
297
|
|
|
87
|
-
**Test snippet imports must be self-contained.** Every symbol a § 7 test snippet calls must be explicitly imported inside that snippet (at the top of the test function or at the top of the test file). Do NOT assume `import foo as foo_mod` at the file top brings bare `foo_mod.cmd` into the test function's namespace — call sites that use the bare name will fail with `NameError` even if the alias import exists. Rule: for every symbol referenced by its bare name in a § 7 test snippet, grep the snippet for a matching `from <module> import <name>` or `import <name>` line. If absent, add it.
|
|
298
|
+
**Test snippet imports must be self-contained.** Every symbol a § 7 test snippet calls must be explicitly imported inside that snippet (at the top of the test function or at the top of the test file). Do NOT assume `import foo as foo_mod` at the file top brings the bare name `cmd` into the test function's namespace: the alias import binds only `foo_mod`, so call sites that use the bare name (`cmd(args)`) fail with `NameError` even though `foo_mod.cmd(args)` would resolve. Run Q / moot-cli-brand-login hit this on T7/T8 (alias import at top, bare `cmd_login(args)` call in the test); Impl fixed in-place with `from moot.auth import cmd_login` inside each function, but the 30s test-red roundtrip is avoidable with a pre-commit spec-snippet review. Rule: for every symbol referenced by its bare name in a § 7 test snippet, grep the snippet for a matching `from <module> import <name>` or `import <name>` line. If absent, add it.
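
The alias-import failure mode is easy to reproduce with a stdlib module standing in for the real one. In this sketch `json` plays the role of `foo`, and the function names are hypothetical.

```python
import json as json_mod  # alias import: binds only the name `json_mod`


def bare_call() -> dict:
    # `loads` was never imported as a bare name, so this raises NameError
    # when called, even though the alias import above succeeded.
    return loads('{"ok": true}')  # noqa: F821


def qualified_call() -> dict:
    # Going through the alias works.
    return json_mod.loads('{"ok": true}')


def self_contained_call() -> dict:
    # The in-function import makes the snippet self-contained.
    from json import loads

    return loads('{"ok": true}')


try:
    bare_call()
    bare_failed = False
except NameError:
    bare_failed = True

print(bare_failed, qualified_call(), self_contained_call())
```

The third form mirrors the in-place fix described for T7/T8: the import travels with the snippet, so it survives being pasted into any test file.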
|
|
299
|
+
|
|
300
|
+
**Package-vs-submodule collision on `core.config.*` imports.** When a new test file in `backend/tests/` imports a `core.config.<name>` submodule, prefer `from core.config import <name>` over `import core.config.<name> as <alias>`. The dotted form fails at runtime with `ImportError: cannot import name '<name>' from 'core.config.<name>'` because `core/config/__init__.py`'s `from .<name> import *` pattern combined with the subdir-name collision breaks the dotted-import-with-alias form. The `from core.config import <name> as alias` variant works identically because the package is topologically initialized before the submodule lookup. Run S / backend-config-from-aws hit this in § 6.3's `test_aws_config.py` template (`import core.config.config as cfg_mod` → `ImportError`); Impl fixed in-place with `from core.config import config as cfg_mod`, one `replace_all`. Spec drafts that add test files touching `core.config.*` should grep the § 6 snippet for `import core\.config\.` and flip any matches to the `from core.config import` form. See `feedback_package_submodule_identity_split.md` for the underlying mechanics.
|
|
88
301
|
|
|
89
302
|
### Test Cleanup Fixtures
|
|
90
|
-
- [ ] **FK cascade enumeration.** If a test fixture deletes rows from a table with FK-dependent children, grep `REFERENCES <parent>(id)` in the schema
|
|
91
|
-
- [ ] **Verify the fixture helper at the target line before prescribing additive assertions.** When
|
|
92
|
-
- [ ] **Subprocess env forwarding under pytest-xdist.** If any test runs a subprocess that imports
|
|
303
|
+
- [ ] **FK cascade enumeration.** If a test fixture deletes rows from a table with FK-dependent children, grep `REFERENCES <parent>(id)` in `backend/core/stores/public_create.sql` (or the relevant schema file) to find ALL child tables. Enumerate them in dependency order in the spec's test-infra section, OR specify `TRUNCATE ... CASCADE` as the default. Partial recipes silently fail when the schema grows new FKs, and new tests can silently shift xdist distribution to expose latent FK gaps that previously happened to never collide. See `feedback_fk_cascade_in_test_cleanup_fixtures.md`.
|
|
304
|
+
- [ ] **Verify the fixture helper at the target line before prescribing additive assertions.** When § 4 / § 5 says "add `assert X` to test Y at line N," grep test Y for the fixture call (`_create_human` vs `_create_agent` vs etc.) to confirm the fixture matches what the assertion needs. Run Q / agent-connection-state shipped with a §4.1 prescription that targeted tests using `_create_human`, but the new field (`is_connected`) only applies to agents — Impl had to recast the test as an agent-only standalone instead of an additive assertion. Cheap check at spec-draft time, saves Impl the recast work.
|
|
305
|
+
- [ ] **Subprocess env forwarding under pytest-xdist is bidirectional.** If any test runs a subprocess that imports `core.config`, spec § 7 must list ALL mutated config vars to forward: `PYTHONPATH=/app`, `POSTGRES_DB=config.POSTGRES_DB`, and any other per-worker values. In-memory config mutations don't cross process boundaries. **AND** for every new env var that a spec ADDS at the compose layer, grep `backend/tests/` for `subprocess|Popen|_run_wrapper` and verify each subprocess call either (a) forwards the new var consistently with the in-process test expectation, or (b) unsets/blanks the new var (`"<NEW_VAR>": ""`) to preserve whatever isolation the subprocess was designed for. Run AB caught this when `MOOTUP_STAGE=local` (new compose default) leaked into `_run_wrapper` subprocesses and routed POSTGRES_DB through LocalStack's seeded value, breaking per-worker DB isolation on 4 tests. The env-propagation trace has two directions: "where SET" (new consumers) AND "where explicitly UNSET" (existing subprocesses that must stay on the pre-change env path). See `feedback_subprocess_under_xdist_env_forwarding.md`.
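
Both directions of the subprocess env trace (forward and blank) can be sketched with stdlib calls. This is an illustration under assumptions: `build_subprocess_env` and `child_getenv` are hypothetical helpers, `test_gw0` is a made-up per-worker database name, and the empty-string behavior shown is for POSIX hosts.

```python
import os
import subprocess
import sys


def build_subprocess_env(forward: dict[str, str], blank: list[str]) -> dict[str, str]:
    """Copy the parent env, forward mutated values, and blank vars that must not leak."""
    env = dict(os.environ)
    env.update(forward)  # "where SET": values the child must see
    for name in blank:   # "where explicitly UNSET": keep the pre-change env path
        env[name] = ""
    return env


def child_getenv(env: dict[str, str], name: str) -> str:
    """Run a child interpreter and report what it sees for `name`."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


env = build_subprocess_env(
    forward={"PYTHONPATH": "/app", "POSTGRES_DB": "test_gw0"},
    blank=["MOOTUP_STAGE"],
)
print(child_getenv(env, "POSTGRES_DB"))
```

Passing `env=` explicitly is the point: in-memory config mutations in the parent never reach the child unless they are written into this mapping.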
|
|
93
306
|
|
|
94
307
|
### Token / Secret Literals in § 7 Test Snippets
|
|
95
|
-
- [ ] **Never hard-code token prefix literals in § 7 test snippets.** Import from
|
|
308
|
+
- [ ] **Never hard-code token prefix literals in § 7 test snippets.** Import from `core.auth.tokens` instead: `API_KEY_PREFIX`, `SESSION_COOKIE_NAME`, etc. Two consecutive runs (admin-key-bootstrap had `"sk_"`, frictionless-onboarding had `"convo_key_"`) produced Impl friction because the arch invariant `test_token_prefix_literals_centralized` blocks raw literal usage in tests. Grep `"convo_key_"`, `"convo_sess_"`, `"sk_"`, `"bearer "` in the spec before freeze. See `feedback_token_literal_in_spec_test_snippets.md`.
|
|
309
|
+
- [ ] **Prefix-constant completeness audit: every new ID/token prefix gets a constant, even ones used only via `encode_id`.** When a run adds N new ID or token prefixes, every prefix needs a `<NAME>_PREFIX = "<str>"` constant in `core.auth.tokens`, **including prefixes used only as the second arg to `encode_id(bigint, "<prefix>")`** at runtime. Those look "constant-free" because the literal lives inside the `encode_id` call, so it's tempting to skip the constant for them. But any time the prefix then appears OUTSIDE `encode_id(...)` and the `PREFIXES` frozenset, particularly in test assertions (`assert body["oauth_client_id"].startswith("cli_")`), the test has to either import a missing constant or hardcode a literal. Rule: if the prefix appears anywhere outside `encode_id(bigint, "<prefix>")` and `PREFIXES`, it needs a `<NAME>_PREFIX` constant. Run AH-a added `OAUTH_ACCESS_PREFIX`, `OAUTH_REFRESH_PREFIX`, `OAUTH_CODE_PREFIX` but dropped `CLIENT_ID_PREFIX = "cli_"` because `cli_` is only used as an `encode_id` arg at runtime; one test then hardcoded `"cli_"` as a literal, violating the existing token-literal-centralization rule. Pair with the rule above.
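
The pre-freeze grep for raw prefix literals can be approximated by a small scanner over the spec text. This is a sketch, not the `test_token_prefix_literals_centralized` invariant itself: `find_raw_prefix_literals` is a hypothetical helper, and it checks only the four double-quoted literals named in the first rule above.

```python
RAW_PREFIX_LITERALS = ('"convo_key_"', '"convo_sess_"', '"sk_"', '"bearer "')


def find_raw_prefix_literals(text: str) -> list[tuple[int, str]]:
    """Return (line_number, literal) for each raw prefix literal found in `text`."""
    hits: list[tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for literal in RAW_PREFIX_LITERALS:
            if literal in line:
                hits.append((line_no, literal))
    return hits


snippet = (
    'assert key.startswith("sk_")\n'
    "assert key.startswith(API_KEY_PREFIX)\n"
)
print(find_raw_prefix_literals(snippet))
```

Single-quoted variants and newly added prefixes such as `cli_` are not covered; extending the tuple is the obvious follow-up when a run adds prefixes.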
|
|
96
310
|
|
|
97
311
|
### Protocol Cross-Reference
|
|
98
|
-
- [ ] **If any D-decision touches an inter-agent protocol** — mention lists on ship/kickoff/handoff messages, thread discipline, retros-in routing, token-ring mention rules, status-update discipline — scan CLAUDE.md's § Agent Workflow for the current live rule before freezing the spec decision. Live protocol evolves faster than spec templates; a spec that hard-codes a stale mention list forces Impl to either follow it blindly (wrong behavior) or deviate (compliance ambiguity).
|
|
99
|
-
- [ ] **Prefer
|
|
312
|
+
- [ ] **If any D-decision touches an inter-agent protocol** — mention lists on ship/kickoff/handoff messages, thread discipline (`reply_to` vs `share` vs `reply_to_thread`), retros-in routing, token-ring mention rules, status-update discipline — `grep -l <topic> /home/node/.claude/projects/*/memory/` and scan CLAUDE.md's § Agent Workflow for the current live rule before freezing the spec decision. Live protocol evolves faster than spec templates; a spec that hard-codes a stale mention list forces Impl to either follow it blindly (wrong behavior) or deviate (compliance ambiguity). Either way costs review time. See `feedback_spec_protocol_cross_reference.md`.
|
|
313
|
+
- [ ] **Prefer memory references over inline protocol rules** in D-decisions — "follows `feedback_retros_in_handoff`" beats hard-coding a mention list that may drift.
|
|
314
|
+
|
|
315
|
+
### Frontend Estimation Anchors (§ 6 LOC budgeting)
|
|
316
|
+
- [ ] **Svelte dialog/modal components = pure-logic LOC × 1.5–2.5.** Components with backdrop + focus trap + ARIA attributes + submit-state + modal styling carry substantial boilerplate beyond the pure logic. Run CP-2 `PauseConfirmDialog.svelte` projected 90 LOC / actual 188 (2.1×). Distinct anchor from store-layer helpers (40–60) and service-module enumerate+dispatch+audit (150–250). When estimating in § 6, separate modal chrome cost from logic cost.
|
|
317
|
+
- [ ] **Minimum 2 actual data points before proposing a new ratio band.** A single-run projection that misses identifies an estimation gap, not a band-structure gap. Run CP-2 § 6 flagged a speculative "glue-code frontend extension 0.2–0.5" band from one-run projection of 0.31; actual landed at 0.86, squarely inside the existing 0.6–1.2. Retracted at synthesis. Specs that surface a new-band hypothesis must cite ≥2 prior within-ratio data points in the proposed tier.
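
The modal multiplier can be applied mechanically when filling in § 6. The sketch below assumes CP-2's 90-LOC projection was a pure-logic estimate, which the text does not state outright; `modal_loc_band` is a hypothetical helper.

```python
def modal_loc_band(pure_logic_loc: int, low: float = 1.5, high: float = 2.5) -> tuple[float, float]:
    """Budget band for a Svelte dialog/modal: pure-logic LOC x 1.5 to 2.5."""
    return (pure_logic_loc * low, pure_logic_loc * high)


projected, actual = 90, 188  # CP-2 PauseConfirmDialog.svelte figures from the text
band = modal_loc_band(projected)
ratio = round(actual / projected, 1)
print(band, ratio)
```

Under that assumption the actual 188 LOC falls inside the 135 to 225 band, which is the check a spec author would want to see before freezing the row.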
|
|
318
|
+
|
|
319
|
+
### Test-Path Grounding (extension of source-path `ls` rule)
|
|
320
|
+
- [ ] **`ls` every TEST path literal in § 6 at spec-draft time.** The CP-1 source-path rule extends symmetrically to test directories. Each test path proposed in § 6 must be verified to exist via `ls backend/tests/<subdir>/` or equivalent before spec freeze. CP-1 (`utils/` vs `config/`) and CP-2 (`tests/mcp/` vs `tests/api/`) both shipped spec path drift because the test-dir layout wasn't grounded. Non-blocking (Impl finds the right location) but creates QA observation noise on every run.
|
|
321
|
+
|
|
322
|
+
### Baseline Arithmetic Discipline (§ 2 + § 3)
|
|
323
|
+
- [ ] **§ 2 baseline: pass/skip split matches pytest output literally.** Write `N passed + M skipped`, never `N total collected`. `total = passed + skipped + xfail + errors`; using `total` as "baseline" produces off-by-K drift when subtracting a delta on a different dimension. One-line sanity check before spec freeze: copy-paste the pytest summary tail, don't paraphrase. CP-2 shipped with § 2 saying "1,249 passed" (collected-total) vs actual 1,246 passed + 3 skipped.
|
|
324
|
+
- [ ] **§ 3 vitest target = sum of § 7 per-R sub-counts.** When § 3 gives a target vitest count, verify it equals the sum of new tests enumerated in § 7. CP-2 § 3 said "141" but § 7 sub-counts (2+6+3+2) = 13 → correct target was 143. Two-minute arithmetic check at draft time; eliminates a handoff-message reconciliation bullet every run.
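
Both arithmetic checks are small enough to script. This is a sketch under assumptions: the summary line format follows pytest's usual `N passed, M skipped in T` tail, the timing value is invented, and the vitest baseline of 130 is inferred from the text (143 minus 13), not stated in it.

```python
import re


def parse_pytest_summary(tail: str) -> dict[str, int]:
    """Extract per-outcome counts from a pytest summary tail, copied verbatim."""
    pairs = re.findall(r"(\d+) (passed|skipped|failed|xfailed|xpassed|errors?)", tail)
    return {label: int(count) for count, label in pairs}


def vitest_target(baseline: int, per_r_counts: list[int]) -> int:
    """§ 3 target = current baseline + sum of new tests enumerated in § 7."""
    return baseline + sum(per_r_counts)


counts = parse_pytest_summary("===== 1246 passed, 3 skipped in 41.20s =====")
target = vitest_target(130, [2, 6, 3, 2])
print(counts, target)
```

Writing the § 2 baseline from `counts` rather than from the collected total is what avoids the off-by-3 that CP-2 shipped.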
|
|
325
|
+
|
|
326
|
+
### MCP/CLI-Over-Route Rate-Limit Asymmetry (§ 9)
|
|
327
|
+
- [ ] **When layering MCP or CLI triggers on a FastAPI-rate-limited route, § 9 answers explicitly: "does the new trigger re-invoke the route's rate limit, bypass it, or add its own?"** FastAPI's middleware-based rate limits don't apply to MCP tools invoked in-process or CLI commands that call the service layer directly. Alpha may accept bypass (admin-gated + low cadence); beta may require re-layering. Document the decision rather than leave unstated. CP-2 F-CP2-MCP-RATELIMIT trigger: the MCP `pause_all` tool bypasses the `rl_control` rate limit the FastAPI route enforces. Not a blocker at alpha; flagged for beta-hardening.
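
The asymmetry is easier to see in a toy model than in prose. Nothing below is the project's code: `FixedWindowLimiter`, `route_pause_all`, and the two MCP variants are hypothetical stand-ins for a route-level limit and in-process triggers that call the service layer directly.

```python
class RateLimited(Exception):
    """Raised when a caller exceeds the allowed number of calls."""


class FixedWindowLimiter:
    """Toy limiter: allows `max_calls` calls, then rejects (no window reset)."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls = 0

    def check(self) -> None:
        if self.calls >= self.max_calls:
            raise RateLimited()
        self.calls += 1


def pause_all_service() -> str:
    """Service-layer function shared by every trigger."""
    return "paused"


def route_pause_all(limiter: FixedWindowLimiter) -> str:
    limiter.check()  # the limit lives at the route layer
    return pause_all_service()


def mcp_pause_all_bypass() -> str:
    return pause_all_service()  # in-process call: the route limit never runs


def mcp_pause_all_relayered(limiter: FixedWindowLimiter) -> str:
    limiter.check()  # explicit re-layering of the same limit
    return pause_all_service()
```

§ 9 should state which of the last two shapes the new trigger takes, since both compile and pass functional tests.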
|
|
328
|
+
|
|
329
|
+
### Soften Prescriptive Ordering on Not-Load-Bearing Lists
|
|
330
|
+
- [ ] **When a spec prescribes a position (alphabetical insert, specific index, etc.) for a list where source-level documentation states order doesn't matter, soften the prescription.** Use language like "append or insert alphabetically — not load-bearing per `<file>:<line>`." Avoids manufacturing a non-issue for QA observation. CP-2 Spec § 5.4 said `control.register(adapter)` goes "between `links` and `messaging`"; Impl appended to end; QA observed mismatch but `__init__.py:8` explicitly states FastMCP registration order doesn't matter. Prescriptive language on not-load-bearing ordering creates false QA signal.
|
|
331
|
+
|
|
332
|
+
### § 6 Component-Row Discipline (handler vs prop-drill)
|
|
333
|
+
- [ ] **Handler / state / `$effect` rows get their OWN § 6 line; do not subsume under a prop-drill "~3 LOC thread" row.** When a component acquires (a) a handler function that computes state + calls a store, (b) a new `$state` variable, (c) a `$effect` for cleanup, or (d) a lifecycle wiring via `$effect` / `$derived`, that's a distinct work-item with its own LOC estimate — separate from the "thread prop through this component" pass-through row. ux-rtm `SpaceRoom.svelte` was the miss: spec said "~3 LOC prop pass-through" but actual `handleReplyToMessage` + `replyParent` state + cleanup `$effect` = 36 LOC (12× miss). Same pattern in CP-2 `PauseConfirmDialog.svelte` at smaller scale. 2 consecutive frontend-primary runs projected below floor because handler-plus-state hid inside prop-drill line-items. Separate rows fix both the projection accuracy AND give Impl a clear scope signal that a new file/symbol is in-scope (not just plumbing).
|
|
334
|
+
|
|
335
|
+
### Test-FILE Grounding (extension of test-DIR rule)
|
|
336
|
+
- [ ] **`ls` every test FILE at spec-draft time.** "Extend existing `X.test.ts`" can be false even when the directory exists. One-line `ls frontend/src/lib/components/*.test.ts` (or `ls backend/tests/api/*.py`) at § 13 Phase A prevents "extend existing" spec claims that turn out to require file creation. CP-2 added the test-DIR rule; ux-rtm shows the file-level extension. Applies symmetrically to backend + frontend.
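
The `ls` check can be batched over every path literal in § 6. A minimal sketch, assuming the paths are relative to the repo root; `missing_paths` is a hypothetical helper and the two file names are invented for the demonstration.

```python
import tempfile
from pathlib import Path


def missing_paths(root: Path, spec_paths: list[str]) -> list[str]:
    """Return the spec-named paths that do not exist under `root`."""
    return [p for p in spec_paths if not (root / p).exists()]


# Demonstration against a throwaway tree standing in for the repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "backend/tests/api").mkdir(parents=True)
    (root / "backend/tests/api/test_control.py").write_text("")
    result = missing_paths(
        root,
        [
            "backend/tests/api/test_control.py",  # exists: "extend existing" is true
            "backend/tests/mcp/test_control.py",  # absent: spec must say "create"
        ],
    )

print(result)
```

Any path returned means the spec's "extend existing" wording has to change to file creation, or the path has to be corrected.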
|
|
337
|
+
|
|
338
|
+
### § 2 Baseline — QA-Stack vs Impl-Stack Drift
|
|
339
|
+
- [ ] **Note the QA-stack baseline drift in § 2.** Impl-stack and QA-stack pytest baselines can drift ±3 due to environment deltas + timing flakes. When setting § 2, either (a) run baseline against BOTH stacks and note both numbers, or (b) explicitly document "impl-stack baseline at `<commit>`; QA-stack may drift ±3 on flakes." Without this note, QA's deviation check silently misleads. CP-2 + ux-rtm both surfaced the drift (~3 tests); both shipped spec literals from impl-stack only. One-line note in § 2 resolves.
|
|
340
|
+
|
|
341
|
+
### Derived-UI Source Specification
|
|
342
|
+
- [ ] **"Render a snippet / title / label / author-name" UI elements must answer: source object + lookup path.** When § 4 or § 5 says "render X of Y" for a UI element derived from a related object (snippet of parent message, author name from actor ID, thread title from thread record, channel name from space ID), spec answers BOTH: (a) what is the source object (`parent_event_id` alone is not a source — it's an ID; the source is a `ContextEvent`), and (b) how does the UI resolve it (prop threaded through from ancestor, store lookup via `events[]`, metadata field pre-hydrated on the event). Left ambiguous, Impl defaults to "just render the ID" — passes component tests, misses UX intent. ux-rtm `EventCard` reply-pill shipped with `evt_…` fallback instead of parent snippet because § 4.6 didn't specify the source.
|
|
343
|
+
|
|
344
|
+
### A-1 Error Envelope Shape in Pytest 4xx Body Assertions
|
|
345
|
+
- [ ] **Test snippets asserting 4xx response body content assert `r.json()["message"]`, NOT `r.json()["detail"]`.** FastAPI's `HTTPException(detail=str)` is flattened by the A-1 error envelope wrapper to `{code, message, request_id}` with `detail=None`. Raw `detail` is absent from the final response body. ux-rtm Impl's R8/R9/R10 initial drafts asserted `detail` and failed on first run; one-line fix to `message`. Carry-forward for future spec test templates asserting 4xx body content. Pair with the `feedback_token_literal_in_spec_test_snippets.md` discipline — both are test-snippet correctness anchors.
|
|
346
|
+
|
|
347
|
+
### JS/TS Projection Multiplier (×1.5 on source + tests)
|
|
348
|
+
- [ ] **Apply ×1.5 multiplier to JS/TS/frontend projections in § 6 for both source AND test LOC.** 3 consecutive under-projections (CP-2 2.8×, ux-rtm 2.2×, AH-e-cli 1.6× source-only) confirm the pattern as methodology, not per-run anomaly. Specific anchors:
|
|
349
|
+
- **OAuth-flow modules** (PKCE + callback server + redirect URI + state/verifier helpers): budget **250–300 LOC** (not 150–180 "pure logic").
|
|
350
|
+
- **TS/JS command-handler refactors** involving state + dispatch + validation (OAuth-path vs static-token-path branching): budget **200–250 LOC** (not 80–120 "refactor in place").
|
|
351
|
+
- **Vitest test files** for Required clusters: budget **100–150 LOC per cluster** (not 60–80).
|
|
352
|
+
- **Module re-export surfaces** (`index.ts` ambient, `src/auth/index.ts` barrel): budget **20–40 LOC** even when spec would otherwise skip as "zero LOC."
|
|
353
|
+
- **`credentials.ts` / `credential.ts`-class extensions** involving interface + session-memory map + accessor: budget **120–140 LOC** (not ≤40 "field add").
|
|
354
|
+
|
|
355
|
+
Distinct from Svelte-modal multiplier 1.5–2.5× (ux-rtm rule; modal-chrome-specific — backdrop + focus trap + ARIA + submit state). The JS multiplier is cross-language base-rate adjustment: Python projections calibrated over many runs; JS/TS projections were N=1 at AD-c and have been consistently low since. Three data points resolve.
|
|
356
|
+
|
|
357
|
+
### Cross-Repo § 1 Summary Absolute Path
|
|
358
|
+
- [ ] **When a run targets a repo outside convo, § 1 Summary names the FULL absolute path of the target repo.** Example: `/workspaces/convo/mootup-io/moot-cli-js/packages/moot-cli/`, not just `packages/moot-cli/`. Prevents orphan-stub confusion when similarly-named paths exist (AH-e-bootstrap-cli hit `/workspaces/convo/moot-cli-js` orphan-LICENSE-only stub instead of `/workspaces/convo/mootup-io/moot-cli-js` real repo; baseline-hash mismatch was treated as spec-fatal instead of path-fatal). One-line disambiguation at § 1 fixes; saves a Spec-escalation cycle.
|
|
359
|
+
|
|
360
|
+
### Invariant-Phrasing Specificity for Multi-Site Behavior
|
|
361
|
+
- [ ] **Replace generic verbs in § 8 invariants with explicit grep-targetable forms.** Generic: "threaded to all N places," "propagated," "plumbed." Specific: "registered on commander in N places" / "validated against `<regex>` in N places" / "passed through as `<kwarg>` in N call sites" / "keyed off `<value>` in N lookup sites." At draft-time, audit § 8 invariants for verbs that could be read as multiple distinct mechanisms (register / validate / use-for-lookup / pass-as-kwarg) — pick the ONE the Impl-test should exercise. AH-e-cli Inv 12 "`--profile` threaded to all 4 commands" — Impl read as "registered + passed through"; QA read as "validated against regex in all 4." Both reasonable. Cost zero impl time but created QA observation + follow-up known-issue entry.
|
|
362
|
+
|
|
363
|
+
### Design-Around-Scrub-Regex When PAT Is Technical Scope Term
|
|
364
|
+
- [ ] **When feature design involves "PAT" (Personal Access Token) as a technical identifier, route through presence-of-refresh-token-ref or similar indirect-check instead of `credential_type: 'pat' | 'oauth'`-style enum literals.** `\bpat\b` case-sensitive operator-name scrub matches literal `'pat'` in TS code + `<pat>` in CLI metavars + `pat-<suffix>` decision names. In-file-iterate-to-clean works (TE-3 discipline preserved) but breaks clean-first-pass streak. Not a regex-tightening; a design-time discipline. AH-e-cli Spec renamed `'pat'` → `refresh_token_ref`-presence discriminator during draft; 4 rename sites absorbed + single-commit preserved. Apply prospectively when PAT appears in § 4/§ 5 as technical scope.
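
The reach of the scrub pattern can be checked directly with Python's `re` module. The regex is the one quoted above; the sample strings are illustrative, and `moot login <pat>` and `pat-rotation` are invented examples of a CLI metavar and a decision name.

```python
import re

SCRUB = re.compile(r"\bpat\b")  # case-sensitive, as described above

# Sample text mapped to whether the scrub is expected to flag it.
samples = {
    "credential_type: 'pat' | 'oauth'": True,  # enum literal in TS code
    "moot login <pat>": True,                  # CLI metavar
    "pat-rotation": True,                      # decision name with suffix
    "refresh_token_ref": False,                # indirect discriminator
    "path drift": False,                       # no word boundary after "pat"
}

hits = {text: bool(SCRUB.search(text)) for text in samples}
print(hits == samples)
```

The indirect discriminator passes cleanly, which is why the rule favors designing around the regex over loosening it.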
|
|
365
|
+
|
|
366
|
+
### Test-LOC Multiplier Separate from Source (×1.8–2.0 vs Source ×1.5)
|
|
367
|
+
- [ ] **§ 6 projection tables list source and test LOC SEPARATELY with different multipliers.** AH-g validated source ×1.5 (1.09× miss); tests landed 1.4–2.2× over the same 1.5× projection. Test files carry fixture + mock + setup boilerplate ON TOP OF pure-assertion logic; the ratio isn't the same as source. Specific test-LOC anchors:
|
|
368
|
+
- **Hand-crafted MCP/mock-boilerplate fixtures** (`conftest.py` with MCP-client stubs, vitest `setup.ts` with happydom wiring): budget **80–120 LOC per fixture file**.
|
|
369
|
+
- **Parametrized-expansion test files** (`it.each` / `@pytest.mark.parametrize` with 3+ sub-cases): budget source-estimate × **1.8–2.0**. Sub-case expansion reads as 1 source line per case in commits but multiplies ceremony in the file.
|
|
370
|
+
- **Invariant-grep test files** with N structural greps: budget **30–50 LOC per invariant** (each grep has its own test wrapper + comment).
|
|
371
|
+
- **Cross-repo parity test files**: budget **60–80 LOC** for parity-JSON loader + field-by-field assertions + error-class-name mirror.
|
|
372
|
+
Combined with source ×1.5, expect total projection-to-actual to land within ~10% instead of ~40% miss on whole-diff. AH-g delta driven by test-LOC (source was 1.09× off; whole-diff was 1.41× off).
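
Keeping the two multipliers separate is simple to encode. The sketch below assumes both multipliers apply to raw, pre-multiplier estimates; `project_loc` is a hypothetical helper and the 200/100 inputs are invented.

```python
def project_loc(
    source_estimate: int,
    test_estimate: int,
    source_mult: float = 1.5,
    test_mult: tuple[float, float] = (1.8, 2.0),
) -> dict[str, object]:
    """Project source and test LOC separately, then combine into a whole-diff range."""
    source = round(source_estimate * source_mult)
    tests_low = round(test_estimate * test_mult[0])
    tests_high = round(test_estimate * test_mult[1])
    return {
        "source": source,
        "tests": (tests_low, tests_high),
        "total": (source + tests_low, source + tests_high),
    }


projection = project_loc(source_estimate=200, test_estimate=100)
print(projection)
```

Reporting tests as a range rather than a point keeps the § 6 table honest about where the residual uncertainty sits.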
|
|
373
|
+
|
|
374
|
+
### Version-Gated Invariants Require § 14 Impl Version-Bump
|
|
375
|
+
- [ ] **When § 8 adds an invariant of the form "SDK version ≥ contract_version" or "library stamps against external artifact version," § 14 Impl guidance MUST include an explicit "bump `package.json` / `pyproject.toml` version to ≥ contract ref" checklist item.** Without this explicit callout, the invariant ships correctly (runtime check fires as intended) but pre-publish blocks — alpha ships safely; beta-gate fires at first `npm publish` or `uv publish` attempt. AH-g `@mootup/moot-sdk` 0.1.0-rc.0 vs contract `oas_version_ref 0.2.0`: inv 4 enforcement via `sync-contract` `exit(2)` works, but § 14 didn't require the version bump as part of impl. Add prospectively when any version-stamp invariant lands.
|
|
376
|
+
|
|
377
|
+
### Invariant-Enforcement-Level Specificity
|
|
378
|
+
- [ ] **When an invariant has multiple valid enforcement-time interpretations — runtime-check / build-fail / CI-fail / publish-fail — spec names the intended one explicitly.** Extension of AH-e-cli's invariant-phrasing specificity rule: verb choice matters AND enforcement-level choice matters. AH-g inv 4 "version stamps against `info.version`" worked because both `sync-contract` runtime-check and a theoretical commit-time CI gate would have been acceptable; but if spec intends one specifically, say "build-fail" / "CI-fail" / "runtime-fail" / "publish-fail" in the invariant phrasing. No AH-g drift; flag for future version-gated / environment-variable-check / manifest-validation invariants where enforcement-level could reasonably vary.
|
|
379
|
+
|
|
380
|
+
### Golden-Fixture Capture Mechanism (§ 7 / § 14)
|
|
381
|
+
- [ ] **When an invariant or R-test references committed fixtures at `test/fixtures/<path>/`, § 7 or § 14 spells out the CAPTURE mechanism: source-of-truth reference + capture commands + destination path.** Example: "run `mootup init --harness claude-code` against a canned tenant/actor fixture; `cp -r` output to `test/fixtures/harness-claude-code/`; commit." Without this, Impl substitutes runtime-equivalence — weaker semantics (verifies equivalence between two runtime-produced outputs, misses historical-regression anchor to baseline). AH-h inv 1 shipped with runtime-equivalence instead of committed fixtures because spec § 7 R9 didn't specify the capture procedure. Non-blocking ship; beta-hardening item. Applies to any fixture-backed regression test — OAS goldens, template-output fixtures, generated-file fixtures.
|
|
382
|
+
|
|
383
|
+
### Every § 6 Row Gets a LOC Estimate
|
|
384
|
+
- [ ] **Do NOT abdicate projection because implementation detail is deferred to Impl grounding.** Every § 6 table row has a non-empty LOC cell, even when marked "Impl finalizes" or "details at grounding." Project a best-guess estimate + Impl adjusts post-grounding. Deferring projection entirely leaves large line-items unprojected and systematically under-estimates whole-diff. AH-h had 296 LOC of cursor-agent template files unprojected because § 5.4 said "Impl finalizes at grounding"; template LOC was the 2nd-largest line-item in the whole-diff. Spec-checklist audit: grep § 6 for "Impl refines" / "Impl finalizes" / "at grounding" and ensure each has an accompanying LOC estimate.
|
|
385
|
+
|
|
386
|
+
### Whole-Diff-Above-Ceiling Is Expected for JS/Fanout/Test-Heavy Runs
|
|
387
|
+
- [ ] **When synthesis reports whole-diff above 2.3 ceiling: if projection correctly anticipated (within ±20%), it's NOT an outlier — it's the expected shape of that topology.** 5-run pattern established: CP-1 (infra-heavy) + AH-e-cli (cross-repo) + AH-g (three-repo) + AH-h (test-heavy-fanout) all projected + landed above ceiling. Common factor: test-LOC + fixture-LOC + fanout consistently push whole-diff past 2.3. Band ceiling reflects Python-backend central tendency; JS/fanout topologies routinely project 2.3–3.0 whole-diff. Don't recalibrate the band on correctly-projected above-ceiling runs. Reserve "outlier" treatment for unprojected overruns (e.g., a run that projected 1.5 and landed 2.8 is an outlier; a run that projected 2.7 and landed 2.8 is the shape of the topology).
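
The outlier rule above can be sketched as a small classifier. This is an illustration only: the function name is hypothetical, and reading "within ±20%" as a relative miss against the projected ratio is an assumption, not something the checklist pins down.

```python
def classify_above_ceiling_run(
    projected: float,
    actual: float,
    ceiling: float = 2.3,
    tolerance: float = 0.20,
) -> str:
    """Label a run's landed whole-diff ratio against the band ceiling.

    Assumption: the +/-20% window is measured relative to the projection.
    """
    if actual <= ceiling:
        return "within-band"
    # Relative miss of the landed ratio against the projected ratio.
    miss = abs(actual - projected) / projected
    return "expected-shape" if miss <= tolerance else "outlier"


# The two cases named in the rule:
unprojected_overrun = classify_above_ceiling_run(projected=1.5, actual=2.8)
projected_overrun = classify_above_ceiling_run(projected=2.7, actual=2.8)
```

Under these assumptions the first case is labelled an outlier and the second is labelled the expected shape of the topology, so only the first would justify revisiting the band.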
|
|
388
|
+
|
|
389
|
+
### "Impl Picks or Adds" Clauses Include Bounded-Effort Qualifier
|
|
390
|
+
- [ ] **When spec prose offers Impl a "pick existing or add new X" choice (F-findings or § 5 design calls), include an explicit effort-ceiling qualifier.** Format: "If adding X fits within ~5 min / ~50 LOC (minimal, self-contained, testable), default to add. If open-ended (new package, new publish cadence, new dep, new build pipeline), escalate via `message_type='question'` reply in feature thread." Saves Spec round-trip for bounded additions. AH-h F-3 cursor-agent-devcontainer template was on the boundary (~296 LOC of template files, copy-and-tweak of existing); Impl correctly chose "add" in ~5 min but the qualifier would have made the call explicit. Applies to template additions, fixture additions, helper-module additions, CI-yaml additions.
|
|
391
|
+
|
|
392
|
+
### AWS Cross-Resource Constraint Verification (§ 13 Grounding)
|
|
393
|
+
- [ ] **For AWS Terraform resources with known cross-resource validation rules, § 13 Phase A probe MUST cite AWS docs reference OR a prior-verified-run reference.** Known rules (non-exhaustive):
|
|
394
|
+
- **SES `aws_ses_domain_mail_from.mail_from_domain` MUST be a subdomain of verified identity** — e.g., identity `mail.mootup.io` → MAIL FROM `bounce.mail.mootup.io`. Same-as-identity fails at apply.
|
|
395
|
+
- **IAM policy `Resource` ARN must match the principal's actual resource** — role-policy scoped to wrong ARN silently denies at runtime.
|
|
396
|
+
- **S3 bucket names are globally unique** — `aws_s3_bucket` with colliding name fails at apply.
|
|
397
|
+
- **CloudWatch metric namespace starting with `AWS/`** is reserved.
|
|
398
|
+
- **ACM cert domains must match DNS validation records exactly**.
|
|
399
|
+
- **Route53 record set `type` constrains allowed `records` values** (A requires IPs; CNAME single-value; MX includes priority).
|
|
400
|
+
|
|
401
|
+
AE-1 caught MAIL FROM constraint via QA repair at `023a031`; would have broken `terraform apply` without fix. Spec § 13 must demonstrate constraint verification either by linking AWS docs (`# per AWS SES docs: MAIL FROM must be subdomain of verified identity`) or by referencing a prior run that shipped the same shape correctly. First AWS-heavy run since CP-1/TA-1 to expose real cross-resource validation.
|
|
402
|
+
|
|
403
|
+
### Python Stdlib Name-Collision in `tests/core/<name>/` → Omit `__init__.py`
|
|
404
|
+
- [ ] **When § 6 lists a new `tests/<top>/<name>/` subdir, `python -c "import <name>"` at draft-time; if stdlib-resolvable, OMIT `__init__.py` from § 6 + note the pytest-collection workaround.** Creating `__init__.py` in a subdir whose name matches a stdlib module (`email`, `json`, `os`, `sys`, `types`, `collections`, `logging`, `pathlib`, `dataclasses`, etc.) triggers pytest `ModuleNotFoundError: No module named '<name>.test_<foo>'` because rootdir-based discovery walks the nearest-ancestor-`__init__.py` chain and binds the subdir name to top-level, colliding with the stdlib module. Solution: omit the file; pytest falls back to file-based discovery via rootdir conftest. AE-1 hit this on `tests/core/email/`; other `tests/core/*` subdirs (`aws`, `auth`, `models`) work with `__init__.py` because their names don't collide. One-line check at spec-draft time.
|
|
405
|
+
|
|
406
|
+
### `select_autoescape` Extension Tuple for Double-Suffix Templates
|
|
407
|
+
- [ ] **When templates use double-extensions (e.g. `.body.html.j2` / `.body.txt.j2`), Jinja2 `select_autoescape` extension tuples MUST reflect the FULL suffix.** Use `select_autoescape(enabled_extensions=("html.j2",), disabled_extensions=("txt.j2", "subject.j2"))`, NOT `select_autoescape(["html", "j2"])`. Jinja2's `.endswith()`-based matching against a single-token entry like `"j2"` doesn't discriminate `.body.html.j2` from `.body.txt.j2`: both match, both auto-escape, and plain-text templates get their `<` characters HTML-escaped. Alternative: pass a callable for `autoescape=` instead of `select_autoescape` when filenames are double-suffixed. AE-1 hit this; the text-body test expected literal `<script>` but got `&lt;script&gt;` until the extension tuple was corrected.
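
The callable alternative mentioned above could look roughly like this. Jinja2's `Environment` accepts a callable for `autoescape` that receives the template name (or `None` for string templates); the function name and the escape-by-default policy for string templates are assumptions for illustration.

```python
from typing import Optional


def autoescape_for(template_name: Optional[str]) -> bool:
    """Decide autoescaping from the FULL double suffix of the template name."""
    if template_name is None:
        # Templates built from strings have no name; escaping them by
        # default is an assumed policy, not something the rule prescribes.
        return True
    # Only HTML bodies are escaped; .txt.j2 and .subject.j2 stay literal.
    return template_name.lower().endswith(".html.j2")


# Hypothetical wiring (requires jinja2, so it is left as a comment):
#   env = jinja2.Environment(loader=..., autoescape=autoescape_for)
```

Because the decision is an explicit suffix test, adding a new plain-text template kind needs no change, while a new HTML-like suffix has to be added deliberately.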
|
|
408
|
+
|
|
409
|
+
### Projection Multipliers — Whole-Diff Combined-Multiplier Method (graduated post AE-3)
|
|
410
|
+
- [ ] **§ 6 projection tables use per-language + per-shape multipliers. Whole-diff combined-multiplier method is the load-bearing Spec-projection accuracy gate; source-only remains sub-calibration guidance.** Graduated from hypothesis to load-bearing via 4 consecutive whole-diff projections at ≤3% miss (AH-g 0.99× / AH-h 0.99× / AE-2 0.82× / AE-3 1.01×). Codified sub-band structure:
|
|
411
|
+
- **JS / TS (greenfield + fanout):** source **×1.5** (calibrated at 1.09–1.14× miss on AH-g / AH-h), tests **×1.8–2.0**, fixtures as separate category. Whole-diff above 2.3 ceiling is normal; projection-accuracy is the gate, not the band.
|
|
412
|
+
- **Python + backend + small-surface:** source **×0.5–0.6** (tightened from ×0.8 post-AE-3; 3 Python data points: AE-1 1.88× + AE-2 1.03× + AE-3 1.76× over projected ×0.8; mean 1.56× over). Tests **×1.3–1.5** (mixed manual-mock + DB-integration shape). Whole-diff within widened bands; infra-lean Python runs land BELOW widened-floor (TA-2/TE-3/AE-1/AE-2/AE-3 all at 0.20–0.69 source); these are not outliers.
|
|
413
|
+
- **Hybrid-refactor additional multiplier ×1.3 on edited-function LOC** (applies when spec splits or significantly-refactors an existing function). Validated on 2 data points: AH-h `init.ts` refactor + AE-2 `auth_service.issue_magic_link_session` split.
|
|
414
|
+
- When projecting a mixed run: sum each component row independently with its appropriate multiplier. E.g., an AE-2-shape run: `(store_LOC × 0.55) + (auth_service_refactor_LOC × 0.55 × 1.3) + (route_handler_LOC × 0.55) + (template_LOC × 1.0) + (test_LOC × 1.4)`. Don't apply one multiplier to the whole source line.
|
|
415
|
+
- **Source-only variance** is sub-calibration; whole-diff IS the gate. Spec-projection-accuracy assessments in retros should center whole-diff miss, not source-only miss.
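
The per-row summation for a mixed run can be sketched as below. The multipliers come from the AE-2-shape formula above; the row names and base LOC figures are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProjectionRow:
    name: str
    base_loc: int       # LOC figure from the § 6 row
    multiplier: float   # per-language / per-shape multiplier


def project_whole_diff(rows: Iterable[ProjectionRow]) -> float:
    """Sum each row with its own multiplier; never one multiplier overall."""
    return sum(row.base_loc * row.multiplier for row in rows)


PY_SOURCE = 0.55        # midpoint of the x0.5-0.6 Python source band
HYBRID_REFACTOR = 1.3   # extra multiplier on edited-function LOC

# Hypothetical LOC values for an AE-2-shape run.
rows = [
    ProjectionRow("store", 100, PY_SOURCE),
    ProjectionRow("auth_service refactor", 80, PY_SOURCE * HYBRID_REFACTOR),
    ProjectionRow("route handler", 60, PY_SOURCE),
    ProjectionRow("templates", 40, 1.0),
    ProjectionRow("tests", 200, 1.4),
]
projected = project_whole_diff(rows)
print(f"projected whole-diff LOC: {projected:.1f}")
```

Keeping the refactor multiplier on its own row makes it visible in the § 6 table instead of being folded into a blended source figure.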
|
|
416
|
+
|
|
417
|
+
### Caddy / Reverse-Proxy Routing Check at § 13 (before SvelteKit Intermediaries)
|
|
418
|
+
- [ ] **When spec prescribes a SvelteKit route for a backend-navigated flow, § 13 Phase A MUST probe existing Caddy config (`Caddyfile*`) for `/<path>/ → backend` passthrough.** Browser-navigation flows include `/auth/*` redemption, OAuth callback URLs, deep-link redeemers, anything the browser `GET`s directly and the backend 302-redirects. If Caddy already passes the path through to the backend, the SvelteKit intermediary is dead code; drop it from § 4/§ 6. AE-2 shipped without the SvelteKit `/auth/redeem` passthrough because Caddy routes `/auth/*` → backend unconditionally; Impl correctly skipped the dead SvelteKit page. One-line grep at § 13: `grep -nE '^\s*(handle|@)[^{]*/(auth|api|r)' Caddyfile*`. Prevents scope bloat + wasted Impl cycles + false-positive-frontend-testing.
|
|
419
|
+
|
|
420
|
+
### Cite-Verify `config.*` and Model File-Locations at § 13
|
|
421
|
+
- [ ] **When spec § 4 drop-in code cites `config.FOO` literals or Pydantic model file-locations (`api/models.py::FooResponse`, `core/models/bar.py::Baz`), § 13 Phase A MUST verify with explicit greps.** Typical drift pattern: spec mentally-references a likely-looking name; actual name/location differs by 1 token or file. AE-2 had both:
|
|
422
|
+
- `config.APP_BASE_URL` cited in § 4.7 — actual is `config.PUBLIC_BASE_URL`. Verify: `grep -nE '^(APP_BASE_URL|PUBLIC_BASE_URL|BASE_URL)' backend/core/config/config.py`.
|
|
423
|
+
- `AuthRequestResponse` cited at `api/models.py` in § 4.4 — actual at `core/models/models.py:460`. Verify: `grep -rn 'class AuthRequestResponse' backend/`.
|
|
424
|
+
|
|
425
|
+
Each Impl-fix is 1-line; aggregate cost is a deviation bullet in the merge request + a spec-drift retro note. Cheap § 13 probe catches both shapes.
|
|
426
|
+
|
|
427
|
+
### `_GLOBAL_ID_TABLES` Frozenset Is a Separate Scoping Anchor from `PREFIXES`
|
|
428
|
+
- [ ] **Every new BIGINT-PK table that uses `generate_global_id()` MUST be added to `_GLOBAL_ID_TABLES` frozenset at `core/config/id_encoding.py` (lines 144–164), in addition to the prefix entry in `PREFIXES` (lines 38–64).** Two separate scoping anchors; spec § 4.3 typically calls out the prefix but forgets the table name. AE-2 caught at first pytest run: `ValueError: Table 'magic_link_tokens' is not a global-ID table`. Symmetric to the "new ID-prefix-constant completeness audit" rule; this extends it to the table-name side. At § 13 grounding, check both anchors.
|
|
429
|
+
|
|
430
|
+
### Response-Shape-Flip Triggers Derivative Test + OAS Budget
|
|
431
|
+
- [ ] **When § 4 flips an existing endpoint's response shape OR adds a new endpoint, spec § 6 must account for two cost categories beyond new code:**
|
|
432
|
+
- **Derivative test-update surface:** grep existing tests for every assertion touching the old response shape. `grep -n 'data\["<renamed-key>"\]' tests/api/*.py`; `grep -n 'Set-Cookie' tests/api/*.py` when toggling cookie-on-response; enumerate break-sites in § 6 in-place. AE-2 had 8 derivative updates across 3 test files (`test_auth_casual.py` T1/T2/T5/T6/T16 + `test_default_space_provisioning.py` ×2 + AE-1 template test) unprojected.
|
|
433
|
+
- **OAS regen LOC budget:** 50–150 LOC per route-shape-change or new-endpoint in the `docs/api/openapi.yaml` delta. AE-2's 102 LOC OAS regen wasn't budgeted.
|
|
434
|
+
Both drove AE-2's whole-diff 18% over-projection despite source landing 0.97× accurate.
|
|
435
|
+
|
|
436
|
+
### asyncpg JSONB Columns Require `json.dumps` + `::jsonb` Cast
|
|
437
|
+
- [ ] **In drop-in code blocks for § 4/§ 5 that INSERT or UPDATE a JSONB column via asyncpg, use `json.dumps(value)` Python-side + `$N::jsonb` SQL cast.** Passing a raw Python `dict` raises `invalid input for query argument $N: expected str, got dict`. Mirror from existing store precedent (e.g., `actor_store.py:382` pattern: `await conn.execute("UPDATE foo SET bar = $1::jsonb", json.dumps(data))`). AE-2 Impl hit this on first pytest run; 1-line fix but the `::jsonb` cast + `json.dumps` is easy to forget in drop-in SQL that looks valid at a glance. Spec-checklist auto-scan: grep § 4/§ 5 for JSONB column names OR `jsonb` SQL type + verify the adjacent Python code uses `json.dumps` or the `::jsonb` cast.
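
A minimal sketch of the pattern, reusing the `foo` / `bar` names from the example above. The `conn` argument is assumed to be an asyncpg connection; the helper names are hypothetical and nothing here imports asyncpg itself.

```python
import json
from typing import Any

# The ::jsonb cast tells Postgres how to interpret the text parameter.
UPDATE_BAR_SQL = "UPDATE foo SET bar = $1::jsonb"


def jsonb_arg(value: Any) -> str:
    """Serialize a Python value to the JSON text that $N::jsonb expects."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


async def update_bar(conn: Any, data: dict) -> None:
    """Write `data` to the JSONB column; `conn` is an assumed asyncpg connection."""
    # Passing `data` directly would hit the "expected str, got dict" error.
    await conn.execute(UPDATE_BAR_SQL, jsonb_arg(data))
```

Routing every JSONB write through one small serializer also gives the spec-checklist auto-scan a single name to grep for.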
|
|
438
|
+
|
|
439
|
+
### R-Test Enacts Invariant's Threat Class (Not Structurally-Equivalent)
|
|
440
|
+
- [ ] **When an invariant describes a specific threat class, the R-test SHOULD ENACT that threat, not a structurally-equivalent one.** AE-2 Inv 6 ("cross-tenant token redemption rejected") shipped with R7 as "unknown token → 401" — structurally equivalent (cross-tenant attack via magic link reduces to unknown-token attack from the attacker's view), but the test doesn't exercise the literal cross-space scenario. The invariant is asserted by design (tokens FK'd to user; cross-tenant redemption is impossible by structure), not by test enactment. Preferable: mint a token in tenant A, attempt redemption from tenant B actor context, assert 401. Tighten in future reset-token / password-reset / cross-actor-resource invariants. Applies especially to tenant-fence invariants where structural design might pass the test at impl time but wouldn't catch a regression.
|
|
441
|
+
|
|
442
|
+
### `api/exception_handlers.py` `jsonable_encoder` Probe for `@field_validator`-ValueError
|
|
443
|
+
- [ ] **When spec § 4 introduces a Pydantic `@field_validator` raising `ValueError`, § 13 Phase A MUST `grep -n "jsonable_encoder" backend/api/exception_handlers.py`.** If the `RequestValidationError` handler doesn't wrap `exc.errors()` in `jsonable_encoder`, spec must either (a) include the handler fix in § 6 OR (b) use `pydantic.ValidationError` explicitly (which FastAPI auto-handles). Raw `ValueError` propagated from `@field_validator` via `ctx` is not JSON-serializable; the shipped handler returns 500 instead of the intended 422. AE-3 shipped this as a 6-LOC Impl deviation (load-bearing for R11 UTF-8 byte-check test). Spec-side grounding gap that § 13 didn't probe — `bcrypt`/`hashlib`/`secrets` layers were probed but the FastAPI Pydantic-error serialization path was not.
|
|
444
|
+
|
|
445
|
+
### Password-Field Validator Consistency — ALL Password Fields, Not Just Setters
|
|
446
|
+
- [ ] **When spec introduces password-field inputs, `Field(max_length=72)` + UTF-8 byte validator applies to ALL password fields, including verification paths.** `current_password` on SetPassword endpoints, legacy-password checks, migration-time password comparisons — all get the validator. Asymmetric application is harmless (bcrypt silently truncates on verify → mismatch → safe 401) but inconsistent; QA flags as non-blocking. AE-3's `SetPasswordRequest.current_password` QA-observed gap. Rule: spec-checklist password-surface probe = every password field gets the full validator triple (`min_length=8, max_length=72, UTF-8 byte check`).
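
The reason the triple needs a byte check in addition to `max_length=72` is that character count and UTF-8 byte count diverge for non-ASCII input. A framework-free sketch follows; the function name and messages are hypothetical, and a real spec would express this as a Pydantic validator.

```python
from typing import List

MIN_CHARS = 8
MAX_CHARS = 72
MAX_BYTES = 72  # bcrypt only considers the first 72 bytes of input


def password_problems(password: str) -> List[str]:
    """Apply the full validator triple to ANY password field."""
    problems = []
    if len(password) < MIN_CHARS:
        problems.append("shorter than 8 characters")
    if len(password) > MAX_CHARS:
        problems.append("longer than 72 characters")
    if len(password.encode("utf-8")) > MAX_BYTES:
        problems.append("longer than 72 UTF-8 bytes")
    return problems


# 40 two-byte characters pass the character limits but exceed the byte limit.
print(password_problems("é" * 40))
```

Applying the same function to `current_password` and to new-password fields keeps setter and verification paths symmetric.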
|
|
447
|
+
|
|
448
|
+
### § 13 `ls` Probe Covers Parent Directory Index Pages
|
|
449
|
+
- [ ] **When spec § 4 adds navigation links into a subdirectory (`/settings/password`, `/admin/users`, etc.), § 13 MUST probe the parent directory for `+page.svelte` / `index.html` / landing-route existence.** "Subdir exists" ≠ "parent has navigable index." AE-3 prescribed "add link from `/settings` to `/settings/password`" without checking that `/settings/+page.svelte` exists — it doesn't. Pre-existing gap; no user-visible regression for AE-3, but the Spec § 4.8 prescription is dead until the parent index ships. `ls frontend/src/routes/<parent>/+page.svelte` at § 13 catches it; if missing, spec either adds the parent page to § 4 OR calls out the deferral.
|
|
450
|
+
|
|
451
|
+
### Spec Parametrization Shape Explicit When Test-Count Arithmetic Is Load-Bearing
|
|
452
|
+
- [ ] **When spec § 7 labels a test with "parametrized flag" AND § 3 target test-count depends on the parametrization choice, spec MUST explicitly specify `@pytest.mark.parametrize` shape OR declare "N separate test functions acceptable".** AE-3 R8 with 3 sub-cases (expired/consumed/unknown) was written as 3 separate test functions; spec § 3 projected 12 tests, delivered 14. Both legal under § 7.1 "parametrized flag" discretion but shifted target-count arithmetic. When the expected test count matters (ship-criteria gate, verification cadence, CI timing), spec pins the shape. When it doesn't, "either shape acceptable" is fine — just say so.
|
|
453
|
+
|
|
454
|
+
### `ls` Frontend Test-Dir Convention at § 13 Phase A
|
|
455
|
+
- [ ] **When spec § 4 prescribes a test file path (Playwright `.spec.ts`, Vitest `.test.ts`, pytest `test_*.py`), § 13 MUST `find <root> -name '*.spec.ts' -o -name '*.test.ts' -o -name 'test_*.py'` (or equivalent Glob) to confirm the shipped directory convention BEFORE prescribing the path.** 3rd test-path-drift in streak across CP-1 (`utils/` vs `config/`) + CP-2 (`mcp/` vs `api/`) + DS-1 (`e2e/` vs `tests/`). Each individually a non-blocker; pattern is repeatable Spec § 13 oversight. Specifically for Playwright: `ls frontend/tests/ frontend/e2e/` shows which directory has existing spec files — that's the convention. Generalizes the ux-rtm "`ls` every test FILE" rule to directory-convention scope when multiple candidates exist. Use the Glob tool at spec-draft time.
|
|
456
|
+
|
|
457
|
+
### First-Sphinx `exclude_patterns` When SOURCEDIR = `.`
|
|
458
|
+
- [ ] **When spec introduces Sphinx config with `SOURCEDIR = "."` (or equivalent whole-dir scan), spec § 4 MUST include `.venv`, `CLAUDE.md`, and any agent-meta-context files in `exclude_patterns`.** Typical failure modes: local dev venv at `docs/site/.venv/` causes Sphinx to copy pygments assets (90+ PNG files) into `_build/html/`; `CLAUDE.md` (agent-meta context) renders as `_build/html/CLAUDE.html` public doc page. DS-1 Impl caught + fixed both post-draft (2-line `exclude_patterns` add). One-shot spec-checklist item for first-Sphinx-at-root-with-venv runs; N/A when `SOURCEDIR = source/` or similar subdirectory.
|
|
459
|
+
|
|
460
|
+
### Committed-Build-Artifact Whole-Diff Sub-Band (post DS-1 validation)
|
|
461
|
+
- [ ] **When spec tracks generated content under `_build/**`, `dist/**`, `.next/**`, `coverage/**`, or similar, whole-diff denominator EXCLUDES that tree for projection validity.** DS-1 data point: 87× with the committed `_build/html/` tree / 0.50× without; the without-tree figure is the correct projection metric. Infrastructure-snapshot content (regenerated from source on every build) dominates line-counts in a way that's uncorrelated with spec effort; including it breaks the whole-diff combined-multiplier method (4 consecutive prior runs at ≤3% miss would look like wild outliers). Add to the projection methodology alongside JS / Python / hybrid sub-bands. DS-1 is the first data point; a 2nd data point (future DS-N or similar committed-artifact run) will refine it. Spec § 6 MUST note the exclusion explicitly when a projection includes committed generated trees.
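
One way to compute the denominator with generated trees excluded is to filter `git diff --numstat` output, sketched below. The function name is hypothetical, matching on directory names anywhere in the path is an assumption, and rename entries (`old => new`) are not handled.

```python
from typing import Iterable

GENERATED_DIRS = ("_build", "dist", ".next", "coverage")


def changed_lines(numstat: str, excluded_dirs: Iterable[str] = GENERATED_DIRS) -> int:
    """Sum added + deleted lines from `git diff --numstat` text,
    skipping files that live under any excluded directory."""
    excluded = set(excluded_dirs)
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        directories = path.split("/")[:-1]
        if any(segment in excluded for segment in directories):
            continue  # generated tree: keep it out of the denominator
        total += int(added) + int(deleted)
    return total
```

Calling it once with the default exclusions and once with `excluded_dirs=()` yields the "without" and "with" figures side by side, which is the pair the rule asks § 6 to report.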
|
|
462
|
+
|
|
463
|
+
### Container-Colocated Backend Service Probe at Phase A (post AH-f-deflake validation)
|
|
464
|
+
- [ ] **When spec's § 6 file-touches table includes a Redis-mutating test file (`sadd`, `srem`, `expire`, synthetic-key seeding) OR a Postgres/shared-filesystem-mutating test file, § 13 Phase A MUST probe the colocated backend service for lifespan-started background tasks sweeping the same keyspace.** The pytest subprocess and the shipping `api.run` service are distinct runtime contexts on shared infra — `if session_manager is None: return` guards are true in pytest but false in the colocated service. AH-f-deflake shipped a latent 6-run flake because Spec's § 13 A-6 accepted the session_manager short-circuit claim for both contexts. Probe recipe:
|
|
465
|
+
```bash
# Enumerate lifespan-started sweepers:
rg -n -e "reconciler|sweep|scan_iter|start\(\)" backend/api/app.py backend/api/lifespan*.py backend/core/evaluators/
# Disambiguate what the container actually runs (PID 1's argv is NUL-separated):
docker exec <container> bash -c "tr '\0' ' ' < /proc/1/cmdline"
```
|
|
471
|
+
Extension: when multiple Redis DBs are available in the container, prefer tests routing to a non-default DB via conftest `REDIS_URL` rewrite + FLUSHDB at session start. Rule for generic shared-infra: if pytest writes to a keyspace a shipping-service sweeper operates on, isolate at the resource layer (separate DB, separate schema, separate filesystem prefix) — don't rely on the sweeper being idle.
|
|
472
|
+
|
|
473
|
+
### Cheap-Falsification Before Amendment Authorization
|
|
474
|
+
- [ ] **When a mid-run spec amendment's fix-cost exceeds its hypothesis-verification-cost (implementing a new test-infra pattern + Product approval + merge + re-verify vs. running one more diagnostic command), Spec MUST demand the falsification probe first.** Template: "Before I authorize (X), run probe (Y) to rule out (Z)." For parallelism / timing / scheduler hypotheses, cheap falsification is almost always "reproduce without parallelism" or "reproduce without contention." AH-f Amendment 1 (xdist_group) was authorized on the "xdist event-loop pressure races pending-key TTL" hypothesis; a 30-second `pytest` (no `-n auto`) would have falsified it before paying the ~30-min amendment round-trip cost. Applies to: Spec's authorization step when Impl reports a candidate fix failing empirically. Node-internal to Spec (does not add a pipeline edge); tightens amendment-authorization criteria.
|
|
475
|
+
|
|
476
|
+
### Conftest Pre-Import Fixture Budget ~2× Naive Estimate
|
|
477
|
+
- [ ] **When § 4 prescribes a conftest fixture that mutates module-level config before import (`REDIS_URL`, `DATABASE_URL`, `MOOTUP_STAGE`, etc.), spec § 6 LOC estimate should be ~2× the naive size-of-fixture-body estimate.** Covers: (a) import-ordering wiring (mutation MUST run before the target `core.config` / `core.settings` / equivalent module loads — typically requires module-level code at conftest top, not inside a fixture function); (b) FLUSHDB / `TRUNCATE` equivalent at session start for deterministic state; (c) teardown safety if other session-scoped fixtures depend on the mutated config. AH-f-deflake's conftest Redis-isolation fixture landed ~60 LOC vs ~30 LOC naive estimate (2.0× miss).
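
Point (a) above is the part most often missed, so here is a sketch of the module-level shape for a Redis case. The helper name, the default URL, and DB index 15 are assumptions; the flush and teardown steps from (b) and (c) are omitted because they need a live Redis client.

```python
import os
from urllib.parse import urlsplit, urlunsplit

TEST_REDIS_DB = 15  # hypothetical non-default DB reserved for tests


def with_redis_db(url: str, db: int) -> str:
    """Return `url` with its database index (the URL path) replaced."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=f"/{db}"))


# Module level on purpose: this must run when conftest.py is imported,
# before any test module pulls in the config module that reads REDIS_URL.
os.environ["REDIS_URL"] = with_redis_db(
    os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
    TEST_REDIS_DB,
)
```

Placing the same assignment inside a fixture function would run too late if the config module was already imported at collection time, which is where much of the extra LOC in the 2× budget goes.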
|
|
478
|
+
|
|
479
|
+
### Repo-Trim / Artifact-Revert Sub-Band (post DS-2 validation)
|
|
480
|
+
- [ ] **When a run's primary deliverable is un-tracking generated content (`git rm --cached` + `.gitignore` update + optional deploy-guard changes), apply the repo-trim sub-band projection bounds.** Source-only near-zero (~0.05–0.10×; spec work dominates); whole-diff EXCL deleted-artifact-tree 0.10–0.15× (test-infra-lean); whole-diff INCL deleted-artifact-tree 100–250× (deletion-dominated; expected outlier). Spec § 6 MUST exclude the deleted tree from the denominator for the validation-gate metric. **Mirrors DS-1's insertion-dominated "committed-build-artifact" sub-band:** DS-2 at 167× with `_build/**` / 0.11× without is the deletion-side equivalent of DS-1's 87× / 0.50×. Both sides demonstrate why infrastructure-snapshot trees must be excluded from the whole-diff denominator for projection validity. First repo-trim data point at DS-2; the next one refines the bounds.
|
|
481
|
+
|
|
482
|
+
### Scope-Doc "(Optional)" Decisions Promote-or-Skip-Explicit at Spec-Draft
|
|
483
|
+
- [ ] **When a scope doc brackets a deliverable as "(optional belt-and-suspenders)," "(MVP vs polish)," "(nice-to-have)," or similar, spec MUST resolve the ambiguity at draft: either promote to § 4/§ 6 with rationale OR explicitly skip with justification.** Passing "(optional)" through to Impl leaves it to Impl's judgment, and an Impl "polish" pass may lock in behavior the spec didn't commit to (or worse, Impl skips it and the scope doc's "optional" label gets read as "mandatory" in a later audit). DS-2's § 4.2 re-verify-build was bracketed "(optional belt-and-suspenders)" in scoping; Spec promoted at draft with reasoning (cost: seconds; catches source-edit-without-rebuild); QA's S-1 drill validated — clean precedent. Rule: read every "(optional)" or parenthetical-qualifier in scope doc at § 13 Phase A; each gets an explicit promote-or-skip decision in the spec's commit.
|
|
484
|
+
|
|
485
|
+
### Phase-A Build for Numeric-Gate Scope Items (post DS-3 validation)
|
|
486
|
+
- [ ] **When scope includes a numeric build-output gate (warning count, lint count, test count, bundle bytes, etc.), Spec MUST run that build during Phase A.** One `./scripts/<build>.sh 2>&1 | grep -c WARNING` (or equivalent) reveals the actual current count before draft. DS-3's "≤5 warnings" gate was Product-aspirational; Phase A didn't build to verify; the actual count was 14 residuals → spec amendment cycle. **Rule:** numeric output-gate → Phase-A run-the-build → either (a) promote scope to fit the gate at draft, or (b) negotiate the gate with Product before SPEC-READY. Either path avoids the amendment-thrash.
|
|
487
|
+
|
|
488
|
+
### Convention Check for Env-Vars / Config-Vars / Feature-Flags (post reconciler-gate validation)
|
|
489
|
+
- [ ] **When a spec introduces OR reuses an env-var, config-var, or feature-flag, Phase A MUST grep it across backend + frontend + tests before drafting § 4.** If two patterns exist (e.g., `config.MOOTUP_STAGE` module-attr vs. `os.getenv("MOOTUP_STAGE")` call-time), spec MUST either (a) honor one pattern + declare the divergence in § 2 + § 9 as an F-finding, or (b) promote unification to a first-class locked decision. Extension of DX-1's Phase-A-for-numeric-scope-claims carry-forward from numbers to conventions. Reconciler-gate hit path (a) cleanly: spec honored scope's `os.getenv` choice (D-RG-2), flagged the divergence as F-RG-CONFIG-MOOTUP_STAGE-CONVENTION, preserved single-shot carve. Without the grep, spec might have silently-picked a convention and not caught that two exist.
|
|
490
|
+
|
|
491
|
+
### Diff-Stat-Sanity Check Before SPEC-READY (post DS-5 validation)
|
|
492
|
+
- [ ] **Spec's § 8 invariants that count insertions / deletions MUST match the realized target diagram in § 4 / § 6.** Before SPEC-READY, count lines against the drop-in block literally (including blank separators, trailing newlines, etc.) rather than estimating from LOC aspiration. Also recount test-suite cardinality: "N/N passing" MUST reflect (existing tests in the affected files) + (new tests added by this run), not a round-number aspiration. DS-5 had both "exactly 1 insertion" (target block required 2: export line + blank separator) and "10/10 passing" (actual was 8 or 11 depending on counting method) — both low-impact cosmetic drift Spec's own accounting could have caught. Rule: whenever Spec states a count in invariants, the count is a prediction about the realized diff computed from the realized target, not from memory.
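
Counting against the realized target can be automated with the standard library, as sketched below. `difflib` is not git's diff algorithm, so treat the result as an approximation of `git diff --stat`; the function name and the sample lines are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def count_changes(before: List[str], after: List[str]) -> Tuple[int, int]:
    """Return (insertions, deletions) between two line lists.

    Blank separator lines count like any other line.
    """
    insertions = deletions = 0
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1
        if tag in ("replace", "insert"):
            insertions += j2 - j1
    return insertions, deletions


# DS-5 shape: an export line plus its blank separator is two insertions.
before = ["import a", "const x = 1;"]
after = ["import a", "export { y };", "", "const x = 1;"]
insertions, deletions = count_changes(before, after)
```

Running this on the § 4 drop-in block before writing the § 8 invariant turns "exactly 1 insertion" from a remembered figure into a computed one.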
|
|
493
|
+
|
|
494
|
+
### Frontend Source Changes Require QA-Stack Rebuild Before R-Gates (post DS-5 validation)
|
|
495
|
+
- [ ] **When a spec prescribes edits under `frontend/src/**`, § 10 or § 14 MUST include a rebuild step: `docker compose -p convo-qa --env-file .env.qa up -d --build frontend`.** The QA compose stack's frontend container bind-mounts `docs/site/` but bakes `frontend/src/**` into the image at build time. Without a rebuild, Playwright R-gates hit the stale-image version and miss the fix — Impl and QA both will report tests RED when the code is actually correct. DS-5 required the rebuild for both Impl's self-verify (impl stack at port 5180) and QA's verification (QA stack at port 5190). Applies symmetrically to any `frontend/src/**` touch. Analogous to reconciler-gate's `docker cp + uv sync` note for backend dep changes.
|
|
496
|
+
|
|
497
|
+
### Container-Path Discipline for pytest Commands in Specs (post reconciler-gate validation)
|
|
498
|
+
- [ ] **When spec § 10 or § 15 writes `docker exec <container> uv run pytest <path>`, the `<path>` MUST be `/app`-relative (e.g., `tests/api/<file>.py`), NOT repo-root-relative (e.g., `backend/tests/api/<file>.py`).** Container WORKDIR is `/app`; tests dir is bind-mounted at `/app/tests/`. Spec's repo-root path returns "0 items collected" on Impl's first invocation — low-impact (Impl self-corrects at the first pytest call) but reliably annoying and 100% avoidable at draft. Apply symmetrically to any pytest command the spec writes for in-container execution. (Outside the container, repo-root paths are correct; the rule is container-only.)
|
|
499
|
+
|
|
500
|
+
### Asset-Loading Probe for `_static/` Files Under Catch-All
|
|
501
|
+
- [ ] **When § 4 prescribes new files under `docs/site/_static/` or any path served via the SvelteKit catch-all `+server.ts`, Phase A MUST `curl -I http://<test-stack>/docs/_static/<file>` to verify 200 + correct content-type.** F-DS3-ASSET-404-ON-308-REDIRECT (DS-1-latent: catch-all 308-redirect strips trailing slash → relative `_static/` URLs resolve as siblings of `/docs` not children → 404) would have surfaced at spec time, not after amendment-3 wasted a CSS override iteration. Production (Caddy) unaffected; the bug only manifests in dev/preview/QA-stack.
|
|
502
|
+
|
|
503
|
+
### Spec § 11 Amendment-History Subsection for Multi-Amendment-Likely Runs
|
|
504
|
+
- [ ] **For runs with mid-run-amendment risk (test-infra-tuning, novel infra surface, content-cleanup), spec § 11 includes a "History" subsection that captures one line per amendment (date / amend-N / event-id / fix-summary).** Validated on DS-3 across 4 amendments — kept the spec legible despite turbulence. Replaces the narrower "Amendment log at top of spec" pattern. Template:
|
|
505
|
+
```markdown
|
|
506
|
+
### History
|
|
507
|
+
- 2026-04-21 amend-1 `evt_xxx` — relax R-2 ≤5 → ≤14
|
|
508
|
+
- 2026-04-21 amend-2 `evt_yyy` — restore ≤5; § 4.9 content sweep
|
|
509
|
+
- 2026-04-21 amend-3 `evt_zzz` — _static/custom.css for R-5 mobile overflow
|
|
510
|
+
- 2026-04-21 amend-4 `evt_www` — accept R-5 RED + F-DS3-ASSET-404-ON-308-REDIRECT
|
|
511
|
+
```
|
|
512
|
+
|
|
513
|
+
### Default-Direction-of-Drift = F-Finding + Gate-Relax (Spec amendment-authorization)
|
|
514
|
+
- [ ] **When Spec is uncertain between bundle-scope and defer-to-F-finding for a mid-run discovery, default to F-finding + gate-relax.** Compaction-preserving + amend-up-if-needed beats commit-scope-early + amend-down-if-cost-grows. Product can override by directing bundle; Spec doesn't pre-commit Product's preferences. DS-3's R-2 (b)→(a) yo-yo cost was Spec's initial (b) being defensible + Product's (a) being defensible + the asymmetric reversal cost being higher than amend-up. Keep terser first move; Product ratchets up if it matters.
|
|
515
|
+
|
|
516
|
+
### Sphinx Content-Shape Grounding at Phase A (post DS-6 validation)
|
|
517
|
+
- [ ] **When scope depends on a static-site output shape (Sphinx, Hugo, mkdocs, any build-time HTML generator), Phase A MUST `ls docs/<site>/_build/html/` and `grep _static/ <sample-page>.html` to observe (a) flat vs directory output (`page.html` leaves vs `page/index.html` dirs) and (b) Sphinx's relative-URL link shape.** Then enumerate EVERY shape-dependent assertion in § 8 + § 10 + § 14 and mark each "local-verifiable" vs "prod-smoke-only" vs "ambiguous." DS-6 shipped with 3 amendments; all three traced to the same Phase-A miss (scope + spec both assumed directory-style Sphinx output; local is flat-hybrid). One grounding probe would have caught all three. Applies to any catch-all served static-site content.
|
|
518
|
+
|
|
519
|
+
### Amendment-Pass Content-Shape Audit (post DS-6 validation)
|
|
520
|
+
- [ ] **When a content-shape mis-grounding surfaces at amendment time (F-finding filed), audit ALL shape-dependent assertions in § 8 / § 10 / § 14 / test navigation targets / handler predicates in the SAME amendment pass.** DS-6's amendment-1 surfaced the flat-vs-directory mismatch, wrote it as a carry-forward, then didn't apply the rule within its own commit — spawning amendments 2 and 3 from the same root cause. The "sweep all shape-dependent edges in one pass" discipline is load-bearing on high-fanout scope gaps. Applied retroactively: when an F-finding you're about to file implies a class of violations, check the class, not just the instance.
|
|
521
|
+
|
|
522
|
+
### Symptom-Patch Sniff Test for Amendments (post DS-6 validation)
|
|
523
|
+
- [ ] **Before drafting a spec amendment, answer in one sentence: "does this fix the originally-reported bug, or just the failing test?"** If the answer is "just the test," the amendment is a symptom-patch. Retarget to the root cause. DS-6 amendment-2's R-7 retarget to `/docs/` would have passed the test while leaving Pat's reproduction URL `/docs/api-reference/` still broken. Product's override to Option G (bidirectional predicate) was the correct call; Spec's amendment authorship should have caught the retarget earlier. Applies to any mid-run fix where the bug's entry point (user action, reported URL, triggering event) can be distinguished from the failing gate.
|
|
524
|
+
|
|
525
|
+
### Multi-Package npm-Workspace Self-Smoke Install Recipe (post CLI-1 validation)
|
|
526
|
+
- [ ] **When spec § 7 prescribes `npm pack` + install for a workspace that has sibling-dep workspace semver (`"@scope/pkg": "*"`), the recipe MUST install ALL workspace packages together (N-tarball bundle), OR explicitly bump sibling semver to a concrete version that matches the published tag's release channel.** npm's `*` semver **excludes pre-releases** — `@mootup/moot-sdk@*` won't match `0.1.0-rc.0`. CLI-1's single-tarball § 7.2 recipe failed at Impl's first Q8 run; the 3-tarball workaround was mechanically correct. Dry-run the install at § 13 grounding before SPEC-READY; alternative is to author the recipe as an N-tarball bundle from the start. Applies to any npm-workspace monorepo where publishing includes pre-releases or any non-stable channel.
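
A dry-run check for this trap might look like the following sketch. It is an illustration under stated assumptions: the function name `star_prerelease_conflicts` is hypothetical, the input is the already-parsed `package.json` of every workspace package, only the `dependencies` field is inspected, and the package name `@mootup/moot-cli` in the usage note is assumed for the example.

```python
import re

# A version such as 0.1.0-rc.0 carries a pre-release tag after the patch number.
PRERELEASE = re.compile(r"^\d+\.\d+\.\d+-")


def star_prerelease_conflicts(manifests):
    """Find sibling deps pinned to "*" whose workspace version is a pre-release.

    manifests: list of parsed package.json dicts, one per workspace package.
    Returns (dependent, sibling, sibling_version) tuples.
    """
    versions = {m["name"]: m["version"] for m in manifests}
    conflicts = []
    for manifest in manifests:
        for dep, semver_range in manifest.get("dependencies", {}).items():
            sibling_version = versions.get(dep)
            if sibling_version is None:
                continue  # not a workspace sibling
            if semver_range.strip() == "*" and PRERELEASE.match(sibling_version):
                conflicts.append((manifest["name"], dep, sibling_version))
    return conflicts
```

For a workspace where `@mootup/moot-cli` depends on `"@mootup/moot-sdk": "*"` and the SDK is at `0.1.0-rc.0`, the function reports one conflict. A non-empty result means the single-tarball recipe needs either the N-tarball bundle or a concrete sibling version.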
|
|
527
|
+
|
|
528
|
+
### Symbol-Text-Rewrite Test-Side Audit (broadened from CLI-1 / ONB-1)
|
|
529
|
+
- [ ] **When a spec rewrites ANY string value that a test assertion of the form `assert <literal> in ...` could couple to (config-file value, error-message substring, log-message text, rendered HTML substring, structured-log field value), Phase A MUST grep ALL test files for that literal before committing the rewrite.** The rule class is string-literal-in-assert, not config-file-specific. Three concrete data points:
|
|
530
|
+
- **Config-file value rewrite (CLI-1):** `packages/moot-cli/package.json` bin rename from `mootup` → `moot`. `test/invariants.test.ts` T9 asserted `expect(pkg.bin).toEqual({ mootup: ... })` — missed by src-only grep; Impl caught at Q11 vitest.
|
|
531
|
+
- **Source-side hardcoded mirror (CLI-1):** `src/bin.ts` had `.version('0.1.0-rc.0')` hardcoded alongside the `package.json` version. The source-side grep must therefore also search for the config value, not only the test files.
|
|
532
|
+
- **Error-message substring (ONB-1):** onboarding service.py error text rewritten from "does not match the expected pattern..." to "not a valid tenant schema...". `test_service.py:232` asserted `"expected pattern" in detail.lower()` — coupled to the old substring. Spec § 6.3 updated the assertion alongside the message rewrite.
|
|
533
|
+
|
|
534
|
+
Bidirectional: user-visible / test-visible text values tend to have both test assertions coupling to them AND source mirrors. Grep both directions when rewriting.
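
The two-direction grep can be sketched as below. This is a hedged illustration: `literal_hits` and `read_tree` are hypothetical helpers, the file suffixes are an assumption, and the test-vs-source split is a crude heuristic (any path containing "test").

```python
from pathlib import Path


def literal_hits(files, literal):
    """Find every line containing `literal`.

    files maps a relative path to its text.
    Returns (path, line_no, side) tuples, where side is "test" or "source".
    """
    hits = []
    for path in sorted(files):
        # crude heuristic: any path mentioning "test" counts as test-side
        side = "test" if "test" in path.lower() else "source"
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if literal in line:
                hits.append((path, line_no, side))
    return hits


def read_tree(root, suffixes=(".py", ".ts", ".json")):
    """Load candidate files under root, skipping node_modules."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in base.rglob("*")
        if p.is_file() and p.suffix in suffixes and "node_modules" not in p.parts
    }
```

Calling `literal_hits(read_tree("."), "expected pattern")` before rewriting an error message lists both the source mirrors and the assertions that would break, so the spec can update them in the same § 6 drop-in.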
|
|
535
|
+
|
|
536
|
+
## Defined terms
|
|
537
|
+
|
|
538
|
+
- **Phase A** — grounding phase of spec drafting; runs before the draft is written. Probes the codebase state (file paths, SDK attributes, pinned versions, existing test counts, schema versions) against what the scoping doc assumes.
|
|
539
|
+
- **§ N** — a numbered section of the spec being drafted. Checklist items reference these by number: § 4 (data / code model), § 5 (files touched), § 6 (code drop-ins), § 8 (structural invariants), § 9 (security), § 11 (F-findings / known gaps), § 13 (Phase A grounding output), § 14 (implementation guidance), § 15 (ship criteria).
|
|
540
|
+
- **F-finding** — a known gap or risk acknowledged in the spec's § 11 that doesn't block ship but is flagged for future resolution. Pattern: `F-<RUN>-<DESCRIPTOR>` (e.g., `F-AHF-RECONCILER-NO-TEST-ISOLATION-FLAG`).
|
|
541
|
+
- **SPEC-READY** — the state at which Spec has finished drafting and handed off to Impl via a merge-to-feat. Many checklist items have a "before SPEC-READY" trigger — they're gates, not post-hoc cleanup.
|
|
542
|
+
- **Q-gate** — a test-grep or verification step a spec prescribes for Impl to run at commit time (e.g., "Q14: run `grep -nE '\\bPat\\b' ...` and confirm 0 matches").
|
|
543
|
+
|
|
544
|
+
|
|
545
|
+
### Triple-Grep Coverage for Container-Key-Shape Rewrites (post RL-1 validation)
|
|
546
|
+
- [ ] **When a spec modifies the key composition of an in-process or persistent container (rate-limit bucket, cache key, dedup key, token key), Phase A MUST grep THREE access shapes in test files, not two.** Broadens the ONB-1 workaround-fixture rule:
|
|
547
|
+
- **(a) Helper-fn invocation:** `_fake_<thing>|_seed_<thing>|_make_<x>_with_<y>|_reset_<thing>` — catches workaround fixtures (ONB-1 original).
|
|
548
|
+
- **(b) Literal-key argument:** the string passed in to the helper / store call (ONB-1 original).
|
|
549
|
+
- **(c) Bracket-subscript mutation:** `\._<container>\[|\._hits\[|\._cache\[|\._buckets\[|\._tokens\[` — catches tests that DIRECTLY mutate the implementation's internal key-shape container.
|
|
550
|
+
RL-1's `test_rate_limits_qa.py:134` used shape (c); Spec's (a)+(b) grep missed it. The amend-1 at D-RL1-S4 traces to the missed third shape. When spec rewrites what key a container is keyed on, all three access shapes in tests are candidates for key-shape coupling.
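
The three access shapes can be expressed as one pattern set. The sketch below is illustrative only: `access_shape_patterns` and `classify_line` are hypothetical names, and the `thing`, `container`, and `literal_key` arguments stand in for whatever the spec under review actually rewrites.

```python
import re


def access_shape_patterns(thing, container, literal_key):
    """Build one regex per access shape: (a) helper call, (b) literal key, (c) subscript."""
    t = re.escape(thing)
    return {
        # (a) workaround fixtures: _fake_<thing>, _seed_<thing>, _reset_<thing>, _make_<x>_with_<y>
        "helper_call": re.compile(rf"\b_(?:(?:fake|seed|reset)_{t}|make_\w+_with_\w+)"),
        # (b) the literal key string passed to a helper or store call
        "literal_key": re.compile(re.escape(literal_key)),
        # (c) direct mutation of the internal container, e.g. ._hits[
        "subscript_mutation": re.compile(rf"\._{re.escape(container)}\["),
    }


def classify_line(line, patterns):
    """Return the names of every access shape that the line matches."""
    return [name for name, pattern in patterns.items() if pattern.search(line)]
```

Applying `classify_line` to each line of the test files and grouping by shape makes a missing shape (c) visible as an empty bucket rather than a silent gap.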
|
|
551
|
+
|
|
552
|
+
### Scope-Doc Storage-Layer Verification at Phase A (post RL-1 validation)
|
|
553
|
+
- [ ] **When the scope doc names a storage mechanism (Redis, Postgres, S3, in-memory, local disk, etc.) for a module Spec is editing, Phase A § 13 MUST verify the claim at source, not take scope prose as authoritative.** Typical probe: `grep -n 'import redis\|import asyncpg\|import boto3\|import aiofiles\|from collections' <target_module>`. Scope docs drift — either written against an older state OR written against the scope-author's mental model rather than the code. The code is authoritative. RL-1 scope claimed Redis-backed sliding window; actual is `collections.defaultdict(list)` + `time.monotonic()` (in-process). 3-way independent catch (Spec Phase-A + Impl pre-draft + QA pre-draft) corrected it pre-SPEC-READY via D-RL1-S1; had the grounding been formal at § 13 the convergence wouldn't have been needed.
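
An AST-based variant of the grep probe is sketched below. It is an assumption-laden illustration: `storage_imports` is a hypothetical helper, and the module list simply mirrors the names in the probe above. Unlike a plain grep, it ignores matches inside comments and strings.

```python
import ast

# Top-level modules taken from the grep probe above; extend per codebase.
STORAGE_MODULES = {"redis", "asyncpg", "boto3", "aiofiles", "collections"}


def storage_imports(source):
    """Return the sorted storage-related top-level modules imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in STORAGE_MODULES:
                found.add(top_level)
    return sorted(found)
```

If the scope doc says "Redis-backed" and the result for the target module is `["collections"]`, the mismatch belongs in § 13 before drafting continues.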
|
|
554
|
+
|
|
555
|
+
### Spec LOC Projection for Drop-In + Phase-A-Probe Heavy Specs (post RL-1 codification — 2 data points)
|
|
556
|
+
- [ ] **When a spec's § 6 touches ≥5 files AND § 13 Phase-A includes ≥6 probes, project Spec LOC using `(file_count × 40) + (probe_count × 25) + summary_prose_LOC` as a starting estimate** (pending 3rd data point refining the coefficients). Drop-in code blocks + Phase-A command logs dominate total spec size; summary-prose-only projection systematically under-projects by 50-140%. Data points:
|
|
557
|
+
- ONB-1: 5 files + 6 probes; scope projected 230; actual 387 (+68%).
|
|
558
|
+
- RL-1: 9 files + 6 probes; scope projected 300; actual 761 (+141%).
|
|
559
|
+
|
|
560
|
+
Trigger threshold confirmed across 2 consistent runs; exact coefficients held pending 3rd data point. Note in § 6 / § 13 that the projection uses this formula when the threshold fires.
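
The projection formula can be written as a small helper. This is a sketch only: `project_spec_loc` is a hypothetical name, returning `None` below the threshold is a design choice made for the example, and the source does not state `summary_prose_LOC` for either run, so the usage note passes 0 as a placeholder.

```python
def project_spec_loc(file_count, probe_count, summary_prose_loc):
    """Starting LOC estimate for drop-in + probe heavy specs.

    Returns None when the trigger threshold (>=5 files AND >=6 probes)
    does not fire, since the formula is only stated for that case.
    """
    if file_count < 5 or probe_count < 6:
        return None
    # Coefficients (40 per file, 25 per probe) are provisional pending a 3rd data point.
    return file_count * 40 + probe_count * 25 + summary_prose_loc
```

With the prose term left at 0, ONB-1's shape (5 files, 6 probes) yields 350 and RL-1's shape (9 files, 6 probes) yields 510, so the code-block and probe-log terms alone already exceed both scope projections.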
|
|
561
|
+
|