npm - @seanyao/roll - Versions diffs - 0.5.0 → 2.602.1 - Mend

@seanyao/roll 0.5.0 → 2.602.1


Files changed (181)

package/CHANGELOG.md +717 -0
package/LICENSE +21 -0
package/README.md +65 -165
package/bin/dream-test-quality-scan +110 -0
package/bin/roll +14897 -815
package/conventions/config.yaml +17 -1
package/conventions/global/AGENTS.md +146 -100
package/conventions/global/CLAUDE.md +1 -21
package/conventions/global/GEMINI.md +8 -22
package/conventions/global/project_rules.md +9 -0
package/conventions/templates/backend-service/AGENTS.md +30 -81
package/conventions/templates/backend-service/GEMINI.md +3 -3
package/conventions/templates/backend-service/project_rules.md +16 -0
package/conventions/templates/cli/AGENTS.md +31 -58
package/conventions/templates/cli/CLAUDE.md +3 -5
package/conventions/templates/cli/GEMINI.md +3 -3
package/conventions/templates/cli/project_rules.md +16 -0
package/conventions/templates/frontend-only/AGENTS.md +29 -64
package/conventions/templates/frontend-only/GEMINI.md +3 -3
package/conventions/templates/frontend-only/project_rules.md +14 -0
package/conventions/templates/fullstack/AGENTS.md +31 -79
package/conventions/templates/fullstack/CLAUDE.md +1 -1
package/conventions/templates/fullstack/GEMINI.md +3 -3
package/conventions/templates/fullstack/project_rules.md +15 -0
package/lib/README.md +42 -0
package/lib/__pycache__/github_sync.cpython-314.pyc +0 -0
package/lib/__pycache__/loop-fmt.cpython-314.pyc +0 -0
package/lib/__pycache__/loop_result_eval.cpython-314.pyc +0 -0
package/lib/__pycache__/loop_unstick.cpython-314.pyc +0 -0
package/lib/__pycache__/model_prices.cpython-314.pyc +0 -0
package/lib/__pycache__/prices_fetcher.cpython-314.pyc +0 -0
package/lib/__pycache__/roll-home.cpython-314.pyc +0 -0
package/lib/__pycache__/roll-loop-status.cpython-314.pyc +0 -0
package/lib/__pycache__/roll_git.cpython-314.pyc +0 -0
package/lib/__pycache__/roll_render.cpython-314.pyc +0 -0
package/lib/__pycache__/slides-render.cpython-314.pyc +0 -0
package/lib/agent_usage/README.md +49 -0
package/lib/agent_usage/__init__.py +108 -0
package/lib/agent_usage/__pycache__/__init__.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/gemini.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/kimi.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/openai.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/pi.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/pi_emit.cpython-314.pyc +0 -0
package/lib/agent_usage/__pycache__/qwen.cpython-314.pyc +0 -0
package/lib/agent_usage/gemini.py +127 -0
package/lib/agent_usage/kimi.py +278 -0
package/lib/agent_usage/kimi_emit.py +123 -0
package/lib/agent_usage/openai.py +126 -0
package/lib/agent_usage/pi.py +200 -0
package/lib/agent_usage/pi_emit.py +135 -0
package/lib/agent_usage/qwen.py +128 -0
package/lib/backfill-pi-usage.py +243 -0
package/lib/changelog_audit.py +155 -0
package/lib/changelog_generate.py +263 -0
package/lib/context_feed_budget.sh +194 -0
package/lib/github_sync.py +876 -0
package/lib/i18n/README.md +54 -0
package/lib/i18n/agent.sh +75 -0
package/lib/i18n/alert.sh +20 -0
package/lib/i18n/backlog.sh +96 -0
package/lib/i18n/brief.sh +5 -0
package/lib/i18n/changelog.sh +5 -0
package/lib/i18n/ci.sh +15 -0
package/lib/i18n/debug.sh +0 -0
package/lib/i18n/doctor.sh +44 -0
package/lib/i18n/dream.sh +0 -0
package/lib/i18n/init.sh +91 -0
package/lib/i18n/lang.sh +10 -0
package/lib/i18n/loop.sh +140 -0
package/lib/i18n/migrate.sh +74 -0
package/lib/i18n/offboard.sh +31 -0
package/lib/i18n/onboard.sh +0 -0
package/lib/i18n/peer.sh +41 -0
package/lib/i18n/peer_help.sh +25 -0
package/lib/i18n/peer_reset.sh +7 -0
package/lib/i18n/peer_status.sh +5 -0
package/lib/i18n/prices.sh +3 -0
package/lib/i18n/prices_refresh.sh +17 -0
package/lib/i18n/prices_show.sh +7 -0
package/lib/i18n/propose.sh +0 -0
package/lib/i18n/release.sh +0 -0
package/lib/i18n/research.sh +0 -0
package/lib/i18n/review_pr.sh +0 -0
package/lib/i18n/sentinel.sh +0 -0
package/lib/i18n/setup.sh +3 -0
package/lib/i18n/shared.sh +157 -0
package/lib/i18n/skills/roll-brief.sh +47 -0
package/lib/i18n/skills/roll-build.sh +97 -0
package/lib/i18n/skills/roll-design.sh +18 -0
package/lib/i18n/skills/roll-fix.sh +53 -0
package/lib/i18n/skills/roll-loop.sh +28 -0
package/lib/i18n/skills/roll-onboard.sh +33 -0
package/lib/i18n/skills_catalog.sh +30 -0
package/lib/i18n/slides.sh +3 -0
package/lib/i18n/slides_build.sh +38 -0
package/lib/i18n/slides_delete.sh +19 -0
package/lib/i18n/slides_list.sh +14 -0
package/lib/i18n/slides_logs.sh +12 -0
package/lib/i18n/slides_new.sh +15 -0
package/lib/i18n/slides_preview.sh +14 -0
package/lib/i18n/slides_templates.sh +7 -0
package/lib/i18n/status.sh +21 -0
package/lib/i18n/update.sh +24 -0
package/lib/i18n.sh +211 -0
package/lib/loop-exit-summary.py +393 -0
package/lib/loop-fmt.py +589 -0
package/lib/loop_pick_agent.py +316 -0
package/lib/loop_result_eval.py +469 -0
package/lib/loop_unstick.py +180 -0
package/lib/model_prices.py +186 -0
package/lib/prices/README.md +35 -0
package/lib/prices/snapshot-2026-05-22.json +22 -0
package/lib/prices/snapshot-2026-05-23-deepseek.json +15 -0
package/lib/prices/snapshot-2026-05-23-kimi.json +14 -0
package/lib/prices_fetcher.py +285 -0
package/lib/roll-backlog.py +225 -0
package/lib/roll-brief.py +286 -0
package/lib/roll-help.py +158 -0
package/lib/roll-home.py +556 -0
package/lib/roll-init.py +156 -0
package/lib/roll-loop-status.py +1683 -0
package/lib/roll-loop-story.py +191 -0
package/lib/roll-onboard-render.py +378 -0
package/lib/roll-peer.py +252 -0
package/lib/roll-plan-validate.py +386 -0
package/lib/roll-setup.py +102 -0
package/lib/roll-status.py +367 -0
package/lib/roll_git.py +41 -0
package/lib/roll_render.py +414 -0
package/lib/slides/components/README.md +123 -0
package/lib/slides/components/cards-2.html +9 -0
package/lib/slides/components/cards-3.html +9 -0
package/lib/slides/components/cards-4.html +9 -0
package/lib/slides/components/compare.html +22 -0
package/lib/slides/components/highlight.html +9 -0
package/lib/slides/components/pipeline.html +12 -0
package/lib/slides/components/plain.html +7 -0
package/lib/slides/components/quote.html +4 -0
package/lib/slides/components/timeline.html +9 -0
package/lib/slides/templates/introduction-v3.html +571 -0
package/lib/slides/templates/pitch.html +0 -0
package/lib/slides-render.py +778 -0
package/lib/slides-validate.py +357 -0
package/lib/test_quality_gate.py +143 -0
package/package.json +8 -7
package/skills/roll-.changelog/SKILL.md +406 -33
package/skills/roll-.clarify/SKILL.md +5 -2
package/skills/roll-.dream/SKILL.md +374 -0
package/skills/roll-.echo/SKILL.md +5 -2
package/skills/roll-.qa/SKILL.md +57 -3
package/skills/roll-.review/SKILL.md +42 -3
package/skills/roll-brief/SKILL.md +209 -0
package/skills/roll-build/SKILL.md +308 -63
package/skills/roll-debug/SKILL.md +341 -162
package/skills/roll-debug/injectable-bb.js +263 -0
package/skills/roll-deck/SKILL.md +296 -0
package/skills/roll-design/ENGINEERING_CHECKLIST.md +1 -1
package/skills/roll-design/SKILL.md +727 -94
package/skills/roll-doc/SKILL.md +595 -0
package/skills/roll-doctor/SKILL.md +192 -0
package/skills/roll-fix/SKILL.md +149 -32
package/skills/{roll-jot → roll-idea}/SKILL.md +18 -10
package/skills/roll-loop/SKILL.md +578 -0
package/skills/roll-notes/SKILL.md +103 -0
package/skills/roll-onboard/SKILL.md +234 -0
package/skills/roll-peer/SKILL.md +336 -0
package/skills/roll-propose/SKILL.md +157 -0
package/skills/roll-review-pr/SKILL.md +58 -0
package/skills/roll-sentinel/SKILL.md +11 -2
package/skills/roll-spar/SKILL.md +8 -6
package/template/.github/workflows/ci.yml +5 -2
package/template/AGENTS.md +20 -74
package/skills/roll-research/SKILL.md +0 -307
package/skills/roll-research/references/schema.json +0 -162
package/skills/roll-research/scripts/md_to_pdf.py +0 -289
package/tools/roll-fetch/SKILL.md +0 -182
package/tools/roll-fetch/package.json +0 -15
package/tools/roll-fetch/smart-web-fetch.js +0 -558
package/tools/roll-probe/SKILL.md +0 -84
package/template/{BACKLOG.md → .roll/backlog.md} +0 -0

package/skills/roll-build/SKILL.md CHANGED

@@ -1,5 +1,7 @@
 ---
 name: roll-build
+license: MIT
+allowed-tools: "Read, Edit, Write, Glob, Grep, Bash, Skill, Agent"
 description: "Universal delivery skill. Handles any input: a US-XXX ID executes from BACKLOG via TCR; a FIX-XXX redirects to roll-fix; any other text auto-clarifies, designs, and ships as a new Story."
 ---
@@ -17,7 +19,7 @@ One entry point. Any input. Full delivery.
 Input received
   ├── matches "US-[A-Z]+-[0-9]+"  → Story mode: read BACKLOG → TCR workflow
   ├── matches "FIX-[A-Z]+-[0-9]+" → redirect to $roll-fix
-  ├── matches "IDEA-[0-9]+"       → redirect to $roll-jot (lookup and expand)
+  ├── matches "IDEA-[0-9]+"       → redirect to $roll-idea (lookup and expand)
   └── anything else               → Fly mode: clarify → design → execute
 ```
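The routing diagram above can be read as a small dispatcher. The following is a minimal sketch, assuming the whole trimmed input must match a pattern; the patterns are copied from the diagram, while `route` and the returned labels are hypothetical names used only for illustration.

```python
import re

# Patterns copied from the routing diagram; the target labels are illustrative.
ROUTES = [
    (re.compile(r"US-[A-Z]+-[0-9]+"), "story"),      # Story mode (TCR workflow)
    (re.compile(r"FIX-[A-Z]+-[0-9]+"), "roll-fix"),  # redirect to roll-fix
    (re.compile(r"IDEA-[0-9]+"), "roll-idea"),       # redirect to roll-idea
]


def route(text: str) -> str:
    """Return the handler label for an input, falling back to Fly mode."""
    candidate = text.strip()
    for pattern, target in ROUTES:
        # Assumption: the identifier must be the entire input, not a substring.
        if pattern.fullmatch(candidate):
            return target
    return "fly"
```

Whether the real skill matches the whole input or any substring is not stated in the diff, so the `fullmatch` choice here is only one plausible reading.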
@@ -51,9 +53,70 @@ Do not use for:
 Activate when input is a `US-[A-Z]+-[0-9]+` identifier.
+### Step 0: Pre-flight self-check (US-AGENT-007)
+Before reading the Story in depth or splitting actions, **read the Agent profile** from the story's feature md and decide whether this cycle can realistically deliver it. The check is mechanical and turns on a single axis — the story's `est_min` estimate (US-AGENT-022 retired the old three-dimension type/est/risk routing; there is no per-agent capacity range, risk zone, or history threshold anymore):
+```
+inputs:
+  story.est_min       (from **Agent profile:** block, US-AGENT-001)
+  story.chain_depth   (0 unless already a downgrade product)
+complexity tier (lib/loop_pick_agent.py, single source of truth):
+  est_min <= 8        → easy
+  8 < est_min <= 20   → default
+  est_min > 20        → hard
+  missing / illegal   → default
+verdict:
+  too_big when:
+    story.est_min is large enough that even the `hard` tier won't fit one
+    cycle — i.e. the work plainly composes too many files / behaviours to
+    land green in a single cycle — AND story.chain_depth == 0
+    (still have downgrade budget; don't burn a cycle on a guaranteed miss).
+  ok otherwise
+```
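The tier thresholds above amount to a three-way classification on `est_min`. A minimal sketch follows, assuming that "missing / illegal" covers `None`, non-numeric values, NaN, and non-positive numbers; this is not the contents of `lib/loop_pick_agent.py`, and `complexity_tier` is a hypothetical name.

```python
from typing import Optional


def complexity_tier(est_min: Optional[float]) -> str:
    """Map a story's est_min to a tier using the thresholds quoted above."""
    # Missing or illegal values fall back to "default". Which values count as
    # illegal is an assumption of this sketch.
    if isinstance(est_min, bool) or not isinstance(est_min, (int, float)):
        return "default"
    if est_min != est_min or est_min <= 0:  # NaN or non-positive
        return "default"
    if est_min <= 8:
        return "easy"
    if est_min <= 20:
        return "default"
    return "hard"
```

The boundaries are inclusive on the lower tier, so an estimate of exactly 8 is `easy` and exactly 20 is `default`.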
+Output the verdict as the first line of the cycle response:
+```yaml
+verdict: ok    # or: too_big
+reason: <one short line — which condition triggered, with numbers>
+```
+When `verdict: ok` → continue to Step 1 normally.
+When `verdict: too_big` → go to **US-AGENT-008 self-downgrade path**, **but** first run the **US-AGENT-009 chain_depth cap check**:
+```bash
+# 0a. Cap check: refuse the third consecutive auto-split.
+#     exit 0 → split allowed; exit 1 → cap hit, take cap-hit path instead.
+if ! bash -c 'source "$(command -v roll)"; _loop_chain_depth_cap_check US-XXX-NNN'; then
+  # Cap hit (chain_depth ≥ 2): hold + ALERT, exit cleanly.
+  bash -c 'source "$(command -v roll)"; _loop_split_cap_hit US-XXX-NNN "depth >= 2, human triage required"'
+  exit 0
+fi
+# 1. Invoke roll-design to re-split the story into smaller sub-stories.
+#    Each sub-story carries chain_depth = (parent.chain_depth + 1).
+#    Sub-stories land as 📋 Todo with depends-on:<parent> chained.
+Skill("roll-design", "--from-story US-XXX-NNN")
+# 2. After the sub-stories are written to BACKLOG, flip the parent
+#    to 🚫 Hold and emit the downgrade event. The helper handles ALERT.
+bash -c 'source "$(command -v roll)"; _loop_self_downgrade US-XXX-NNN "too_big: <reason from verdict>" "US-XXX-NNNa,US-XXX-NNNb"'
+# 3. Exit cleanly — no TCR commits this cycle. The next loop cycle picks
+#    up the first sub-story (which is smaller and should pass pre-flight).
+exit 0
+```
+If `roll-design` cannot produce ≥2 sub-stories (story is already irreducible), fall through to **US-AGENT-009 cap-hit path** by invoking `_loop_split_cap_hit` directly. The cap is purely about stopping infinite split chains; even on the first re-split, if the design step gives up, the cap-hit handler raises ALERT for human triage.
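The branching described for a `too_big` verdict can be summarised as a pure function. This is a sketch under stated assumptions: the cap value of 2 comes from the shell comments above, while `downgrade_path`, `child_chain_depth`, and the returned labels are hypothetical and do not correspond to the `_loop_*` helpers themselves.

```python
from typing import Sequence

CHAIN_DEPTH_CAP = 2  # from the comments above: chain_depth >= 2 means cap hit


def downgrade_path(chain_depth: int, sub_stories: Sequence[str]) -> str:
    """Pick the path for a story whose pre-flight verdict was too_big."""
    if chain_depth >= CHAIN_DEPTH_CAP:
        return "cap_hit"         # hold + ALERT, no further auto-split
    if len(sub_stories) < 2:
        return "cap_hit"         # design step could not reduce the story
    return "self_downgrade"      # parent goes on Hold, sub-stories are queued


def child_chain_depth(parent_chain_depth: int) -> int:
    """Each sub-story carries the parent's chain_depth plus one."""
    return parent_chain_depth + 1
```

In the real flow the sub-story list only exists after `roll-design` has run, so the two `cap_hit` checks happen at different points in time; the function merges them only to show the outcomes side by side.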
+> Pre-flight is honest, not paranoid: a small story (est_min ≤ 8 — the `easy` tier — with chain_depth=0) should almost always go `ok`. The check pays off on the long tail — stories with a large `est_min` that, on inspection, plainly compose far more files and behaviours than one cycle can land green.
 ### Step 1: Read the Story
-1. Open `BACKLOG.md`, find the US row, follow the link to `docs/features/<feature>.md`
+1. Open `.roll/backlog.md`, find the US row, follow the link to `.roll/features/<feature>.md`
 2. Read the full AC / Files / Dependencies section
 3. If a plan doc (`<feature>-plan.md`) exists, read it for context
@@ -63,6 +126,16 @@ Activate when input is a `US-[A-Z]+-[0-9]+` identifier.
 - Pick the smallest shippable Action first
 - **Granularity constraint**: Each Action completable in 2–5 minutes; split if larger
 - **No placeholders**: Action descriptions must be specific and directly executable
+- **Test-quality self-check (US-QA-011)** — for every Action that adds tests:
+  1. Tests call project functions / public command entry points; do NOT inline
+     external-tool behaviour (`sed`/`awk`/`grep`/`find`/`cut` pipelines that
+     duplicate logic already in `lib/` or `bin/`) — rubric ❼.
+  2. Tests sandbox filesystem state via `BATS_TMPDIR` (or equivalent); do NOT
+     touch or assert on paths outside this repo (`~/.codex`, `~/.kimi`,
+     `~/.roll/`, `/etc/...`) — rubric ❽.
+  3. If you can't satisfy (1) or (2), reshape the Action: extract a project
+     helper, redirect the env var to a tmp dir, or move the test to an
+     integration tier where the boundary is intentional and documented.
 #### 2.5 Parallel Dispatch (auto-determined)
@@ -89,16 +162,16 @@ git worktree add .worktrees/{action-id} -b dispatch/{action-id}
 **Status notifications (required):**
 ```
-🔀 Parallel Dispatch: N Actions running in parallel
+🔀 $(msg build.parallel_dispatch N)
-  Agent 1 [Action: ...]  ⏳ Running...
-  Agent 2 [Action: ...]  ⏳ Running...
+  $(msg build.agent_running 1 "...")
+  $(msg build.agent_running 2 "...")
-  Agent 1 [Action: ...]  ✅ Done (N TCR commits)
-  Agent 2 [Action: ...]  ✅ Done (N TCR commits)
+  $(msg build.agent_done 1 "..." N)
+  $(msg build.agent_done 2 "..." N)
-🔀 Merge: N/N succeeded, merging...
-🧪 Integration tests: running...
+🔀 $(msg build.merge_summary N N)
+🧪 $(msg build.integration_tests)
 ```
 When parallel conditions are not met, execute Actions sequentially.
@@ -122,9 +195,9 @@ Activate when input does not match any `US-XXX` / `FIX-XXX` pattern, or when no
 Before any code, assess clarity:
 ```
-🎯 Clarified Goal: {1-2 sentences capturing user intent}
-📏 Complexity Assessment: {small|medium|large}
-🔍 Uncertainty Areas: {list what needs investigation/decision}
+🎯 $(msg build.clarified_goal): {1-2 sentences capturing user intent}
+📏 $(msg build.complexity_assessment): {small|medium|large}
+🔍 $(msg build.uncertainty_areas): {list what needs investigation/decision}
 ```
 **If uncertainty areas are non-empty or the request is vague, auto-trigger `$roll-.clarify`:**
@@ -132,6 +205,22 @@ Before any code, assess clarity:
 - Follow with 3–5 targeted questions
 - Stop and wait for user answers before proceeding
+**Approach Confirmation (required for UX / format / automation decisions):**
+If the request involves any of: output format, layout, automation level (manual vs automatic), or architecture structure — output a confirmation block **before writing any code**:
+```
+📐 $(msg build.approach_confirmation)
+   1. $(msg build.what_changes): {what will be built or modified}
+   2. $(msg build.the_approach): {specific format / automation level / structure chosen}
+   3. $(msg build.files_touched): {list of files}
+   Proceeding unless you say otherwise.
+```
+Wait for the user's response before editing files. If the user does not object within one exchange, proceed.
 **Complexity Rules (AI coding time):**
 | Level | Scope | Action |
@@ -143,8 +232,8 @@ Before any code, assess clarity:
 ### Phase 2: Create US / Actions
 - Use `$roll-design` to split vague request into INVEST-compliant User Stories
-- Insert US into `BACKLOG.md` under the relevant Epic > Feature group
-- If a new `docs/features/<feature>.md` is needed, create it
+- Insert US into `.roll/backlog.md` under the relevant Epic > Feature group
+- If a new `.roll/features/<feature>.md` is needed, create it
 After creation, switch to **Story mode** and execute the first US immediately.
@@ -156,19 +245,46 @@ Proceed to the **Shared TCR Workflow** (Phase 4 onward).
 The following phases apply to both Story mode and Fly mode after planning is complete.
+### Phase 3.5: Peer Review Gate
+After planning is complete, before entering Test Design Review, assess whether the plan warrants peer review:
+**Auto-trigger `$roll-peer` when any of the following is true:**
+- Plan affects **>3 files** or **crosses modules**
+- **Architecture decisions** or non-obvious trade-offs are involved
+- **Destructive / irreversible operations** (deletions, migrations, production deploys)
+- **High-risk signal words** detected in user request ("critical / important / don't break / 关键 / 别搞砸")
+- User explicitly requests peer review ("/peer", "叫上 peer")
+**With 10s opt-out:**
+```
+Plan affects N files across M modules. Estimated peer review: 2–3 rounds, ~X tokens.
+Press Enter to launch peer review, or type 'n' to skip. Auto-executing in 10s...
+```
+**After peer review result:**
+- **AGREE** → proceed to Phase 4 (Test Design Review)
+- **REFINE** → incorporate feedback, regenerate plan, re-run Phase 3.5
+- **OBJECT** → consider alternative plan, re-run Phase 3.5 with revised proposal
+- **ESCALATE** → present both proposals to user for final decision before proceeding
+**Never trigger:**
+- Single-file changes or well-defined fixes
+- Plans with no cross-module impact and no architecture decisions
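The trigger list for Phase 3.5 can be sketched as a single predicate. The `Plan` fields and `needs_peer_review` are hypothetical names, and the case-insensitive substring matching of signal words is an assumption; the diff does not say how the words are detected.

```python
from dataclasses import dataclass

# Phrases copied from the trigger list above.
RISK_WORDS = ("critical", "important", "don't break", "关键", "别搞砸")
EXPLICIT_REQUESTS = ("/peer", "叫上 peer")


@dataclass
class Plan:
    files_touched: int
    crosses_modules: bool = False
    architecture_decision: bool = False
    destructive: bool = False
    request_text: str = ""


def needs_peer_review(plan: Plan) -> bool:
    """Return True when any auto-trigger condition from the list holds."""
    text = plan.request_text.lower()
    return (
        plan.files_touched > 3
        or plan.crosses_modules
        or plan.architecture_decision
        or plan.destructive
        or any(word in text for word in RISK_WORDS)
        or any(phrase in text for phrase in EXPLICIT_REQUESTS)
    )
```

The sketch does not resolve the overlap between the two lists, for example a single-file change whose request contains a risk word; how the skill prioritises those cases is not specified here.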
 ### Phase 4: Test Design Review
 Before writing implementation code:
 ```
-🧪 Test Design for Action: {Action name}
+🧪 $(msg build.test_design): {Action name}
-   Scenarios:
+   $(msg build.scenarios):
    ├── {Happy path scenario}
    ├── {Edge case scenario}
    └── {Failure/regression scenario}
-   Test Types:
+   $(msg build.test_types):
    ├── Unit tests for: {logic components}
    ├── Integration tests for: {API/data flows}
    └── Manual verification for: {UI/visual elements}
@@ -187,10 +303,10 @@ Reference `$roll-.qa` for coverage requirements and test pyramid strategy.
 ```
 ┌────────────────────────────────────────────────────────────┐
-│  TCR CYCLE (Test && Commit || Revert)                       │
+│  $(msg build.tcr_cycle)                                      │
 └────────────────────────────────────────────────────────────┘
-MICRO-STEP {N}: {description of smallest testable change}
+$(msg build.micro_step {N} "{description of smallest testable change}")
    Step 1: Write/Update Test
       └── Run test → Confirm RED (expected failure)
@@ -219,6 +335,81 @@ MICRO-STEP {N}: {description of smallest testable change}
 Accumulate 3–5 micro-commits per Action. Each commit is a guaranteed working state.
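A single TCR micro-step (test && commit || revert) might look like the sketch below. The `tcr:` commit prefix appears elsewhere in this file; using `git add -A` to stage and `git reset --hard` to revert are assumptions, since the hunk does not show the actual commands, and `tcr_step` is a hypothetical helper.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def tcr_step(
    test_cmd: Sequence[str],
    message: str,
    run: Callable[[Sequence[str]], bool] = run,
) -> bool:
    """Commit when the tests pass, otherwise discard the working-tree changes."""
    if run(test_cmd):
        run(["git", "add", "-A"])
        run(["git", "commit", "-m", f"tcr: {message}"])
        return True
    # Assumed revert strategy. Note that reset --hard leaves untracked files.
    run(["git", "reset", "--hard"])
    return False
```

Passing `run` as a parameter keeps the step testable without touching a real repository, for instance by substituting a function that records the commands it receives.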
+#### Architectural Friction Signal (non-blocking)
+While implementing, watch for these signals:
+- This Action requires touching code in 3+ unrelated modules
+- The existing module boundary has to be bent or bypassed to make this work
+- A data structure or interface needs to change in a way that ripples across contexts
+- The implementation feels "wrong" even when the test passes
+When any signal appears, **do not stop — flag it**:
+```bash
+# 1. Append to .roll/backlog.md under ## ♻️ Refactor
+# REFACTOR-XXX | <one-line description> | 📋 Todo
+# 2. Append a brief entry to .roll/features/autonomous-evolution/refactor-log.md
+```
+**REFACTOR entry format in .roll/backlog.md:**
+```markdown
+| REFACTOR-001 | {one-line plain-language description} | 📋 Todo |
+```
+How to write the description: see the AGENTS.md "Backlog descriptions" rule. State clearly what needs to change and what happens if it does not; put technical details in `.roll/features/autonomous-evolution/refactor-log.md`.
+**refactor-log.md entry format:**
+```markdown
+## REFACTOR-001 Extract payment boundary
+**Flagged**: {YYYY-MM-DD} during US-XXX
+**Signal**: {which friction signal triggered this}
+**Observation**: {1–3 sentences describing what felt wrong}
+**Suggested scope**: {rough sense of what a fix would touch}
+```
+Then continue implementing the current Story normally.
+**Event emission** — after all TCR micro-steps for a Story complete, emit a `build` event so the cycle event stream reflects the work done:
+```bash
+# _tcr_count = number of "tcr:" prefix commits made during this Story
+_loop_event build "$US_ID" "${_tcr_count} commits" "" 2>/dev/null || true
+```
+### Phase 5.5: E2E Deposit
+After TCR micro-steps pass, deposit an E2E test for this Story's core user flow.
+```
+E2E DEPOSIT
+   Step 1: Detect
+      └── Read project's existing E2E infrastructure
+          (test directories, config files, framework, naming conventions)
+   Step 2: Write
+      └── One E2E test covering the Story's golden path
+          (the critical user journey this Story delivers)
+   Step 3: Run
+      └── Execute the new E2E test
+   Step 4: TCR
+      ├── ✅ GREEN → git commit -m "tcr: e2e deposit for {story}"
+      └── ❌ RED   → Fix via TCR cycle until green
+```
+**Rules:**
+- Follow whatever E2E patterns the project already uses — framework, directory, naming
+- If no E2E infrastructure exists, reference `$roll-.qa` "Missing Test Infrastructure" section to bootstrap minimally, then deposit
+- One test per Story — covers the golden path, not exhaustive edge cases (those are unit/integration from Phase 5)
+- Each deposited E2E becomes a replayable case: CI runs it on every push, Sentinel can sample it against production
 ### Phase 6: Pre-Push CI Gate
 After all micro-steps, run full CI locally before pushing:
@@ -259,33 +450,51 @@ EOF
 chmod +x .git/hooks/pre-push
 ```
-### Phase 7: Pre-Push Code Review
+### Phase 7: Pre-Push Code Review (Three-Axis Deep Review)
+This phase runs **once per Story** (not per micro-step) on the full accumulated diff.
+Per-micro-step review uses `$roll-.review staged` inline checklist (zero extra cost).
+**Phase 3.5 vs Phase 7 split**: Phase 3.5 (Peer Review) focuses on architectural direction
+and approach before coding begins. Phase 7 focuses on implementation quality after all
+micro-steps are done — catching issues that only appear at diff scale (parameter sprawl
+across files, copy-paste patterns, cross-file N+1, etc.).
 ```bash
-$roll-.review staged
+# Capture full Story diff
+git diff main...HEAD
 ```
-**Review output:**
+**Launch three review agents in parallel** (each receives the full diff):
 ```
-🔍 Self Review Report
-├── Scope: X files (+Y/-Z lines)
-├── 🔴 Critical: N issues (must fix)
-├── 🟡 Warnings: N issues (should fix)
-├── 🟢 Suggestions: N items (optional)
-└── ✅ Passed dimensions: [Quality, Design, Scope, ...]
+Agent 1: Reuse Review
+  → Search for existing utilities / helpers the new code could use instead
+  → Flag any new function that duplicates existing functionality
+  → Flag inline logic replaceable by existing tools
+Agent 2: Quality Review
+  → Redundant state, Parameter sprawl, Copy-paste near-duplicate,
+     Leaky abstraction, Stringly-typed, JSX nesting,
+     Nested conditionals ≥3 deep, Unnecessary comments
+Agent 3: Efficiency Review
+  → Redundant computation / N+1, Missed concurrency,
+     Hot-path bloat, Loop no-op updates, TOCTOU existence pre-check,
+     Memory leaks, Overly broad operations
 ```
-**Review dimensions** (correctness guaranteed by TCR):
-- 🎯 **Quality**: Naming clarity, DRY, function size, readability
-- 📐 **Design**: Architecture, abstraction level, separation of concerns
-- ⚠️ **Scope**: No opportunistic changes
-- 📝 **Documentation**: Comments where needed
+Wait for all three agents to complete. Aggregate findings → fix each issue
+(false positives: note and skip, no debate) → summarize what was fixed.
+**Fallback**: If parallel agent invocation fails, run `$roll-.review staged` on
+the full diff as a single-pass fallback — do not skip review entirely.
 **Decision:**
 ```
 🔴 Critical > 0 → Fix via new TCR cycle → Re-review
 🟡 Warnings > 0 → Fix if quick (< 5 min) or document
-🟢 Suggestions / ✅ All clear → Proceed to push
+🟢 Suggestions / ✅ All clear → Proceed to Phase 8
 ```
 ### Phase 8: Commit & Push
@@ -348,17 +557,17 @@ Follow the repo's deployment path (Vercel / Railway / etc.) and record the deplo
 **Before marking as DONE, fresh evidence must be provided.**
 ```
-🚦 Verification Gate
+🚦 $(msg build.verification_gate)
-   Evidence checklist (each item must have actual output):
-   ├── [ ] Tests passed: paste actual test run output
-   ├── [ ] Build succeeded: paste build output
-   ├── [ ] Online verification: screenshot / curl output / log snippet
-   └── [ ] No regression: verify at least one existing feature still works
+   $(msg build.evidence_checklist):
+   ├── [ ] $(msg build.tests_passed)
+   ├── [ ] $(msg build.build_succeeded)
+   ├── [ ] $(msg build.online_verification)
+   └── [ ] $(msg build.no_regression)
-   Gate Decision:
-   ├── ✅ All items have evidence → Can mark as DONE
-   └── ❌ Any item missing evidence → Gather evidence before passing the gate
+   $(msg build.gate_decision):
+   ├── ✅ $(msg build.gate_pass)
+   └── ❌ $(msg build.gate_fail)
 ```
 **Hard Rule**: "I confirmed the tests passed" does not count as evidence. Must be **freshly run** command output from this session.
@@ -367,16 +576,16 @@ Follow the repo's deployment path (Vercel / Railway / etc.) and record the deplo
 Both locations must be updated — neither can be skipped:
-**① Update BACKLOG.md index row (Status column):**
+**① Update .roll/backlog.md index row (Status column):**
 ```markdown
-| [US-{ID}](docs/features/<feature>.md#us-{id}) | {Title} | ✅ Done |
+| [US-{ID}](.roll/features/<feature>.md#us-{id}) | {Title} | ✅ Done |
 ```
-Change the Status from `📋 Todo` to `✅ Done`.
+Change the Status from `📋 Todo` or `🔨 In Progress` (whichever the row currently shows) to `✅ Done`. When invoked by `roll-loop`, the row will already be `🔨 In Progress` — that is the expected starting state, and the transition is the same Edit operation.
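The status transition on the index row can be illustrated as a small text transformation. This sketch assumes the Status is the last cell of a pipe-delimited row, that the row ends with `|`, and that no cell contains an escaped pipe; `mark_done` is a hypothetical helper, not part of the package.

```python
OPEN_STATUSES = {"📋 Todo", "🔨 In Progress"}


def mark_done(row: str) -> str:
    """Flip the trailing Status cell of a backlog index row to Done."""
    cells = row.rstrip().split("|")
    # A well-formed row yields an empty string before the first pipe and
    # after the last one, so the Status cell is the second-to-last element.
    if len(cells) < 3 or cells[-2].strip() not in OPEN_STATUSES:
        return row  # already Done, on Hold, or not a table row
    cells[-2] = " ✅ Done "
    return "|".join(cells)
```

Rows in any other state are returned unchanged, which keeps the operation safe to apply twice.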
 For Fly mode: first append an index row under the appropriate Epic > Feature group, then mark it done.
-**② Update `docs/features/<feature>.md` US section:**
+**② Update `.roll/features/<feature>.md` US section:**
 ```markdown
 ## US-{ID} {Story Title} ✅
@@ -399,8 +608,15 @@ For Fly mode: first append an index row under the appropriate Epic > Feature gro
 If the US section does not yet exist, create the full section (AC / Files / Dependencies).
+**Before committing, run `$roll-.changelog`** to stage CHANGELOG.md — then include
+it in the completion commit so no separate changelog commit is created.
 ```bash
-git add BACKLOG.md docs/features/
+# 1. Stage changelog (roll-.changelog stages CHANGELOG.md only, does not commit)
+$roll-.changelog
+# 2. Commit BACKLOG + feature doc + CHANGELOG.md together
+git add .roll/backlog.md .roll/features/ CHANGELOG.md
 git commit -m "docs: mark {US-ID} as completed"
 git push
 ```
@@ -408,19 +624,20 @@ git push
 ### Phase 12: Report & Celebrate
 ```
-✅ Pushed to GitHub: origin/main
-🚀 Deployed: <url>
-✅ Verified: <what was checked>
-📦 Changes: <summary>
-🔢 Commits: <count> micro-commits via TCR
-🧪 Tests: <what tests were added/modified>
-📊 TCR Stats: <success rate, revert count if any>
-📋 Review Gate: <self-review findings summary>
-📝 BACKLOG: <US-ID> marked ✅ Done
+✅ $(msg build.pushed_to)
+🚀 $(msg build.deployed): <url>
+✅ $(msg build.verified): <what was checked>
+📦 $(msg build.changes_summary): <summary>
+🔢 $(msg build.commits_count): <count> micro-commits via TCR
+🧪 $(msg build.tests_added): <what tests were added/modified>
+📊 $(msg build.tcr_stats): <success rate, revert count if any>
+📋 $(msg build.review_gate): <self-review findings summary>
+📝 $(msg build.backlog_updated "<US-ID>")
+📄 $(msg build.changelog_bundled)
-🎉 Shipped.
+🎉 $(msg build.shipped)
-🔄 Next Options:
+🔄 $(msg build.next_options):
 1. Continue to next Action (if Story has more)
 2. Start next US (if Fly mode created multiple)
 3. Done (if all completed)
@@ -469,7 +686,7 @@ Before creating any file or directory:
    - No "while I'm here" refactors unless in a separate TCR cycle
 7. **Always update BACKLOG status**
-   - BACKLOG.md index row and `docs/features/<feature>.md` US section are both required
+   - .roll/backlog.md index row and `.roll/features/<feature>.md` US section are both required
    - Neither can be skipped
 ---
@@ -479,6 +696,7 @@ Before creating any file or directory:
 - [ ] Story and Action clearly defined
 - [ ] Test design reviewed and approved
 - [ ] **TCR cycles completed** (all micro-steps via Test && Commit)
+- [ ] **E2E deposited** (golden path test for this Story, committed via TCR)
 - [ ] All commits are green states (no broken commits)
 - [ ] Local CI checks passed (format + lint + build + test)
 - [ ] Self-code-review passed, blocking issues fixed via TCR
@@ -487,10 +705,36 @@ Before creating any file or directory:
 - [ ] Deployed to production
 - [ ] Online verification performed
 - [ ] **Verification Gate passed** (fresh evidence for tests, build, deploy, no regression)
-- [ ] **BACKLOG.md index status updated** (📋 → ✅, REQUIRED)
-- [ ] **`docs/features/<feature>.md` US section updated** (Completed date + [x] ACs, REQUIRED)
+- [ ] **.roll/backlog.md index status updated** (📋 → ✅, REQUIRED)
+- [ ] **`.roll/features/<feature>.md` US section updated** (Completed date + [x] ACs, REQUIRED)
+- [ ] **CHANGELOG.md staged and bundled** into completion commit via `$roll-.changelog` in Phase 11 (REQUIRED)
+- [ ] **Self-score note written (US-SKILL-010 / 012)** — see "Self-score" subsection below
 - [ ] Summary reported to user
+### Self-score (US-SKILL-012)
+Before reporting completion to the user, write one self-score note. The
+helper lands the note under `.roll/notes/<date>-roll-build-<US-id>-<epoch>.md`
+with YAML frontmatter so trend analysis (US-SKILL-014) can aggregate later:
+```bash
+bash -c 'source "$(command -v roll)"; \
+  _skill_write_self_score roll-build US-XXX-NNN <score 1..10> <good|ok|regression> "<rationale>"'
+```
+Score guidance (integer 1..10):
+- **9..10** — story shipped cleanly: AC fully met, TCR rhythm tight, no
+  re-tries from `verdict: too_big`, peer review concerns addressed inline.
+- **6..8** — shipped with caveats: re-tries on red, edge case left to a
+  follow-up FIX, documentation lagged behind code by one cycle, etc.
+- **1..5** — shipped but at low confidence: AC partially met (note which),
+  TCR rhythm broken (multiple revert iterations), or `regression` verdict.
+Verdict values:
+- `good` — story fully delivered; AC met; no concerning signal.
+- `ok` — shipped but with at least one documented trade-off (use rationale).
+- `regression` — story landed but another behaviour broke (rare; open a FIX).
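The score bands and verdict values above can be checked before the note is written. This is a minimal sketch; `score_band` and `validate_self_score` are hypothetical names, and the band labels paraphrase the guidance rather than reproduce anything `_skill_write_self_score` emits.

```python
VERDICTS = {"good", "ok", "regression"}


def score_band(score: int) -> str:
    """Return the guidance band for an integer score in 1..10."""
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("score must be an integer in 1..10")
    if score >= 9:
        return "shipped cleanly"
    if score >= 6:
        return "shipped with caveats"
    return "shipped at low confidence"


def validate_self_score(score: int, verdict: str) -> str:
    """Validate both fields and return the band for the score."""
    band = score_band(score)
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    return band
```

The guidance associates a `regression` verdict with the 1..5 band, but it does not say whether a higher score combined with `regression` is rejected, so the sketch leaves that combination unchecked.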
 ---
 ## TCR Recovery Patterns
@@ -538,7 +782,7 @@ When complex state management is error-prone → consider full reset + re-initia
 roll-build   → ship anything (new idea, US-ID, free-text request)
 roll-fix     → fix a specific known bug (FIX-XXX / BUG-XXX)
 roll-design  → plan and design before building (no code output)
-roll-jot     → fast capture a bug or idea into BACKLOG.md
+roll-idea    → fast capture a bug or idea into .roll/backlog.md
 roll-.clarify → passive scope clarification for vague build requests
 ```
@@ -555,5 +799,6 @@ The agent must explicitly produce (in text) before or during execution:
 - **Test Design**: scenarios, edge cases, test types
 - **Test Design Review**: coverage validation result
 - **TCR Log**: micro-step descriptions and commit count
+- **E2E Deposit**: golden path E2E test file for this Story
 - **Quality Review**: post-TCR code review result
 - **Deployment target**: where it will be verified